
Topic : Re: How do I remove Nikkud (vowel marks) from a Word 2016 document? I am working on a commentary on Ethics of the Fathers and I want readers to be able to read sources I'm quoting in their - selfpublishingguru.com


A quick Google search for "hebrew remove nikkud" turns up an answer.

On GitHub there is a JavaScript tool with a live preview. If you only have a little text, you can use it online, or download the script (save it as a .js file) and run it on your own PC.

The nikkud and cantillation marks sit between U+0591 and U+05C7 (1425 to 1479 in decimal); the Hebrew letters themselves are U+05D0 to U+05EA (1488 to 1514).
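If you would rather remove the marks by code point instead of relying on Unicode properties, something like the following should work. This is only a sketch: the names NIKKUD_RE and strip_nikkud are mine, and I have chosen to skip the four characters inside U+0591 to U+05C7 that are punctuation rather than marks (maqaf, paseq, sof pasuq, nun hafukha), so they stay in the text.

```python
import re

# Cantillation marks and nikkud in U+0591..U+05C7, leaving out the four
# punctuation characters in that range: maqaf (U+05BE), paseq (U+05C0),
# sof pasuq (U+05C3) and nun hafukha (U+05C6).
NIKKUD_RE = re.compile(r"[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]")

def strip_nikkud(text):
    """Return text with Hebrew points and cantillation marks removed."""
    return NIKKUD_RE.sub("", text)

# "shalom" with nikkud: shin, shin dot, qamats, lamed, vav, holam, final mem
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
print(strip_nikkud(pointed))
```

Unlike the normalize-and-filter approach, this leaves accents in any non-Hebrew text untouched.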

Python implementation (tested):

import unicodedata

# nikkud-test.txt is the file you saved your text in.
with open('nikkud-test.txt', 'r', encoding='utf-8') as f:
    content = f.read()

# NFKD splits precomposed characters into a base letter plus combining marks.
normalized = unicodedata.normalize('NFKD', content)
# Keep only the characters that are not combining marks; this drops the nikkud.
no_nikkud = ''.join(c for c in normalized if not unicodedata.combining(c))

with open('no-nikkud-test.txt', 'w', encoding='utf-8') as f:
    f.write(no_nikkud)

This works very fast.
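One thing to be aware of: filtering on unicodedata.combining() removes every combining mark, not only the Hebrew ones. If your commentary also contains accented Latin text, those accents go too. A small in-memory check (strip_combining is just a name I made up for the demo):

```python
import unicodedata

def strip_combining(text):
    """Decompose text with NFKD and drop all combining marks."""
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(c for c in normalized if not unicodedata.combining(c))

# Hebrew "shalom" with nikkud comes back as bare letters.
print(strip_combining("\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"))

# The accent on the Latin e is a combining mark after NFKD, so it is removed as well.
print(strip_combining("caf\u00e9"))  # prints "cafe"
```

For a document that is Hebrew only, this side effect does not matter.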

UPDATED:
How to use this script?

Download Python 3.x from python.org and install it.
Save your nikkud text as nikkud-test.txt in any directory.
Open a command prompt or terminal (on Windows, start cmd from the Start menu).
Move to the directory where you saved the file by typing cd followed by the directory path.
Type python to start the interpreter, or open an IPython console.
Copy and paste the script, then press Enter.
no-nikkud-test.txt will appear in the same directory.
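If pasting into the interpreter feels awkward, the same logic can be saved as a file and run in one go. A rough sketch, assuming you save it under a name such as remove_nikkud.py (the filename, remove_nikkud and main are my own choices, not part of the original script):

```python
import sys
import unicodedata

def remove_nikkud(text):
    """Decompose text with NFKD and drop all combining marks."""
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(c for c in normalized if not unicodedata.combining(c))

def main(argv):
    # Expect exactly two arguments: the input file and the output file.
    if len(argv) != 3:
        print("usage: python remove_nikkud.py INPUT.txt OUTPUT.txt")
        return
    with open(argv[1], "r", encoding="utf-8") as f:
        content = f.read()
    with open(argv[2], "w", encoding="utf-8") as f:
        f.write(remove_nikkud(content))

if __name__ == "__main__":
    main(sys.argv)
```

From the directory holding your text you would then type: python remove_nikkud.py nikkud-test.txt no-nikkud-test.txt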

UPDATE: without a terminal (tested with Python 3.5 IDLE and IPython)

Download Python 3.5 or higher from python.org and install it.
Save your niqqud text as niqqud.txt in your Documents folder (Windows or Mac).
Open IDLE from the Start menu. (Alternatively, use IPython.)

Copy and paste the function below:

def hasar_niqqud(source="niqqud.txt"):
    """Remove niqqud (vowel diacritics) from a Hebrew text file.
    @param source: The source filename, with .txt extension, in ~/Documents."""
    import os, unicodedata
    path = os.path.expanduser('~/Documents/' + str(source))
    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()
    # NFKD separates base letters from combining marks so they can be filtered out.
    normalized = unicodedata.normalize('NFKD', content)
    no_niqqud = ''.join(c for c in normalized if not unicodedata.combining(c))
    # Write the result next to the source file as <name>-removed.txt.
    path = os.path.expanduser('~/Documents/' + str(source)[:-4] + '-removed.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(no_niqqud)

Then run the function with this code:

hasar_niqqud()

That's it! The output is saved in your Documents folder as niqqud-removed.txt.

