How to Programmatically Strip Nikud from Hebrew Text
See my previous posts on nikud and on Hebrew transliteration. Dicta now has a beta transliteration tool (thanks to Avi Shmidman of Dicta for telling me about it). Its transliteration style looks closest to what the "A little hebrew" tool calls "SBL General", but with nice additional features: /‘a/ for ע, /u-/ for prefix vav.
Wikipedia, “Niqqud”:
In Hebrew orthography, niqqud or nikud (Hebrew: נִקּוּד, Modern: nīqqūd, Tiberian: nīqqūḏ, "dotting, pointing" or Hebrew: נְקֻדּוֹת, Modern: nəqudōt, Tiberian: nequdōṯ, "dots") is a system of diacritical signs used to represent vowels or distinguish between alternative pronunciations of letters of the Hebrew alphabet. Several such diacritical systems were developed in the Early Middle Ages […]
Niqqud marks are small compared to the letters, so they can be added without retranscribing texts whose writers did not anticipate them.
In modern Israeli orthography, niqqud is mainly used in specialised texts such as dictionaries, poetry, or texts for children or new immigrants to Israel […]
Dicta - ‘Remove Nikud’
Dicta has a special tool called Remove Nikud. You can input text with nikud into a text box, or upload a file, and it will output that same text without nikud:
Google Sheets Regex function - Only Output Hebrew text, as well as specific symbols
=REGEXREPLACE(A1, "[^א-ת\s.?:;,]", "")
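Heads up: this formula keeps only Hebrew letters, whitespace, and the listed punctuation, so it also removes double quotes ("), parentheses, and the maqaf (־), all of which are common in Hebrew punctuation. To retain them, add them to the character class inside the square brackets. Inside a Sheets formula string, a double quote has to be written as "".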
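For anyone who prefers a custom function over the formula, here is a minimal sketch of the same keep-list idea in Apps Script, with the double quote, parentheses, and maqaf added to the allowed characters. The function name keepHebrewAndPunctuation is hypothetical (it is not part of any library), and the exact set of allowed punctuation is an assumption you would adjust for your own text.

```javascript
/**
 * Hypothetical helper: keeps Hebrew letters, whitespace, and a small
 * keep-list of punctuation; everything else (including nikud) is removed.
 */
function keepHebrewAndPunctuation(input) {
  // Leave empty cells, numbers, etc. untouched.
  if (typeof input !== 'string') return input;
  // \u05D0-\u05EA is alef through tav (the same span as א-ת in the formula).
  // Assumed additions to the keep-list: double quote, parentheses, maqaf (\u05BE).
  var notAllowed = /[^\u05D0-\u05EA\s.?:;,"()\u05BE]/g;
  return input.replace(notAllowed, '');
}
```

Writing the characters as \u escapes avoids right-to-left rendering confusion when editing the pattern.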
Google Sheets Custom function - to remove nikud by Unicode
In Google Apps Script, create a custom function called REMOVE_NIKUD:
/**
 * Removes Hebrew Nikud from the text.
 *
 * @param {string} input The text from which to remove Nikud.
 * @return {string} The text without Nikud.
 * @customfunction
 */
function REMOVE_NIKUD(input) {
  // Return empty or missing input unchanged.
  if (!input) return input;
  // Hebrew points and cantillation marks, U+0591 through U+05C7
  var nikudPattern = /[\u0591-\u05C7]/g;
  return input.replace(nikudPattern, '');
}
Issue: the range U+0591 to U+05C7 includes the maqaf (־, U+05BE), a hyphen-like mark that is sometimes used in Hebrew punctuation, so the maqaf is removed as well. See next.
Best - Google Sheets Regex function - remove nikud by Unicode
Also mentioned in “Remove Hebrew vowels (nikkud) from selected Unicode Hebrew text” (edited Jul 3, 2020)
=REGEXREPLACE(A1, "[\x{0591}-\x{05C7}]", "")
Issue: this also removes the maqaf (־, U+05BE), which is sometimes used in Hebrew punctuation. Revised formula, which splits the range around U+05BE so that the maqaf is kept:
=REGEXREPLACE(A1, "[\x{0591}-\x{05BD}\x{05BF}-\x{05C7}]", "")
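The same split range can be carried over to the custom function. The sketch below is one possible variant, not a definitive implementation: the name REMOVE_NIKUD_KEEP_MAQAF is hypothetical, and the array handling assumes you may want to call the function on a multi-cell range, in which case Apps Script passes a 2D array rather than a string. Note that the range still covers the cantillation marks (U+0591 to U+05AF) and a few punctuation marks such as paseq (U+05C0) and sof pasuq (U+05C3), so those are removed too.

```javascript
/**
 * Removes Hebrew nikud (and cantillation marks) but keeps the maqaf.
 * Hypothetical variant of REMOVE_NIKUD.
 *
 * @param {string|Array<Array<string>>} input A cell value or a range.
 * @return The input with the marks removed.
 * @customfunction
 */
function REMOVE_NIKUD_KEEP_MAQAF(input) {
  // A range such as A1:B5 arrives as a 2D array; process each cell.
  if (Array.isArray(input)) {
    return input.map(function (row) {
      return row.map(function (cell) {
        return REMOVE_NIKUD_KEEP_MAQAF(cell);
      });
    });
  }
  // Leave empty cells, numbers, and dates untouched.
  if (typeof input !== 'string') return input;
  // U+0591-U+05BD and U+05BF-U+05C7; U+05BE (maqaf) is skipped.
  return input.replace(/[\u0591-\u05BD\u05BF-\u05C7]/g, '');
}
```

In a sheet this could then be used as =REMOVE_NIKUD_KEEP_MAQAF(A1) or over a range such as =REMOVE_NIKUD_KEEP_MAQAF(A1:A10).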
Example, from a Sefaria export of Ein Yaakov: