Yeshivish Dialect in the Landscape of Generative Artificial Intelligence
My Curious Journey in Leveraging State-of-the-Art Generative Artificial Intelligence Tools on Yeshivish Content
The video: Video in Yeshivish using Generative artificial intelligence - YouTube
Tweeted here on 23-Apr-2023: https://twitter.com/BrandEzra/status/1650159628985028610
Retweeted here:
https://twitter.com/DBashIdeas/status/1650497814319734795 (“AI generated yeshiva guy sounds like a Harry.“)
https://twitter.com/TechRav/status/1650828169937551361 (“This AI generated video is simultaneously crazy good and bizarrely bad.“)
https://twitter.com/Mottel/status/1650523411876134916 (“Results are... Interesting“)
Based on two previous pieces of mine, available at my Academia.edu page (requires registration):
Recently, a Facebook post with screenshots showing GPT-4's surprising proficiency at writing in the Yeshivish dialect of English sparked a conversation.
Among the many comments expressing surprise, one commenter asked about the training dataset. I responded by speculating on a few possible sources; another commenter pushed back, which led to a discussion.
Here I’d like to point to some possible sources of Yeshivish that may have been used in training GPT-4, with a short discussion of each.
Wikipedia. The entry on Yeshivish is pretty good (there are plenty of minor things to quibble with, but overall it holds up). If one goes down the "wiki rabbit hole" via categories and lists (e.g., here, here, here), there are many Yiddish words with their own entries, with a focus on ones used in an American English context.
Popular books about conversational Yiddish. There are a number of books from the 1960s onward that discuss Yiddish in a popular secular context and would be useful as training material. For example, Leo Rosten's The Joys of Yiddish.
Dictionaries of Yeshivish. Frumspeak: The First Dictionary of Yeshivish (1995), by Chaim M. Weiser. Regrettably, it’s not a great book for what it's trying to do. It's quite amateurish, in the worst way.
Yeshivish magazines. Mishpacha and other magazines and newspapers have articles written in a toned-down Yeshivish. Some humor articles there are written in an exaggerated Yeshivish. Some articles are online.
Yeshivish websites. Yeshiva World News (=YWN), Imamother, etc., are often good examples of the Yeshivish dialect, especially YWN’s forum, “Coffee Room”.
Blogs. For example, Rabbi Pinky Schmeckelstein, a blog from around 2010 written under that pseudonym, which offered ridiculous parody divrei torah in exaggerated Yeshivish and ran to at least a few dozen posts. See more examples of blogs in my forthcoming “Notes on Online Critics of mainstream Orthodox Judaism in English” (email me for a draft). Many of the pieces on those websites are written in a form of Yeshivish.
Yeshivish lectures/shiurim. There are tens of thousands of hours of recorded shiurim online. No one knows exactly what corpora went into ChatGPT, but given that OpenAI is at the forefront of AI transcription (Whisper), it's likely that transcriptions are a significant part of the training data.
YouTube videos. Assuming transcriptions could be used, examples of YouTube channels parodying Yeshivish:
How I did it
Intro
Generative artificial intelligence - Wikipedia
YouTube video for a good tutorial:
Create Voiced and Animated AI Avatar | Step by Step
Midjourney (Pic) - $8/month
https://discord.com/channels/662267976984297473/1008571074981658694
/imagine prompt:portrait shot of 30-year old yeshivish new york orthodox jew with white bright shirt, full body, very small gray beard, cropped hair, no hat, indoors, no background, facing forward, neutral expression. hyper detailed. ultra realistic. --v 5
ElevenLabs (voice) - $5/month
To download YouTube video audio: Offliberty
"old New York yeshivish voice, giving a boring lecture"
ChatGPT, for lecture:
“rephrase in conversational style, as an informal lecture. Rephrase all hebrew terms into english, it is important that no hebrew loanwords appear:”” ”
Optimizations:
~90% of Hebrew words are pronounced incorrectly. In particular, the voice can’t pronounce the “kh” sound (ח/כ).
In a third-party app, change the speed to 0.8x or 0.85x (I used Clipchamp, which requires registration).
Respellings, for better pronunciations:
Berakhot = Bra-khot
Mishna = Mish-na
Shema = Shma
Masechet = Masekhet
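The respellings above can be applied automatically before pasting text into the voice tool. What follows is only a minimal sketch, not part of my original workflow: the function name respell and the table name RESPELLINGS are made up for illustration, and the table holds just the four pairs listed above. It replaces whole words only, ignoring case.

```python
import re

# The four respellings listed above; extend with whatever the voice mispronounces.
RESPELLINGS = {
    "Berakhot": "Bra-khot",
    "Mishna": "Mish-na",
    "Shema": "Shma",
    "Masechet": "Masekhet",
}


def respell(text, table=RESPELLINGS):
    """Replace whole-word, case-insensitive matches with their respellings."""
    if not table:
        return text
    lookup = {key.lower(): value for key, value in table.items()}
    # Longest keys first, so a longer term wins over a shorter one it contains.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


if __name__ == "__main__":
    print(respell("Tractate Berakhot opens with the Mishna about Shema."))
    # prints: Tractate Bra-khot opens with the Mish-na about Shma.
```

Matching on whole words means a variant spelling such as "Mishnah" is left untouched until it is added to the table.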
D-ID (video) - $5/month
https://studio.d-id.com/editor
ChatGPT-4 (content) - $20/month
write a four sentence paragraph in the style of yeshivish incorporating these yiddish words and phrases:
"
Shalom shalom
Schul
Litvish
Shtiebel
Shtick
Heimish
Hashem
Shmooze
Schnitzel
Don’t use the following words:
Daven
Shidduch
Guten
Aleichem
Torah
Nosh
Gefilte fish
Sheitel
tzedakah
“
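For anyone who wants to vary the word lists, the prompt above can be assembled from two lists rather than typed by hand. This is a rough sketch under my own assumptions: build_prompt, INCLUDE, and EXCLUDE are illustrative names, and the output is plain text to paste into ChatGPT, not a call to any API.

```python
# Word lists copied from the prompt above.
INCLUDE = [
    "Shalom shalom", "Schul", "Litvish", "Shtiebel", "Shtick",
    "Heimish", "Hashem", "Shmooze", "Schnitzel",
]
EXCLUDE = [
    "Daven", "Shidduch", "Guten", "Aleichem", "Torah",
    "Nosh", "Gefilte fish", "Sheitel", "tzedakah",
]


def build_prompt(include, exclude, sentences="four"):
    """Assemble the prompt text in the same layout as the one shown above."""
    lines = [
        f"write a {sentences} sentence paragraph in the style of yeshivish "
        "incorporating these yiddish words and phrases:",
        '"',
    ]
    lines.extend(include)
    lines.append("Don’t use the following words:")
    lines.extend(exclude)
    lines.append('"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(INCLUDE, EXCLUDE))
```

Keeping the lists separate from the wording makes it easy to move a word from one list to the other when the model leans on it too heavily.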
Final (after light edits):
Shalom shalom! It's always good to see a fellow Yid in the Schul, especially one who is Litvish like myself. I prefer the cozy atmosphere of a Shtiebel over the grandeur of a big shul any day. My shtick is to make sure everyone feels heimish and welcome in Hashem's house. Let's sit down for a shmooze over a delicious schnitzel. May we always strive to bring more Yiddishkeit into our lives.
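A quick way to confirm that a result like the one above honors both word lists is a whole-word search. The sketch below is an illustration only: find_terms and check_output are hypothetical helper names, and the matching is a simple case-insensitive check that does not handle inflected or variant spellings (it treats "shul" and "Schul" as different words).

```python
import re

INCLUDE = [
    "Shalom shalom", "Schul", "Litvish", "Shtiebel", "Shtick",
    "Heimish", "Hashem", "Shmooze", "Schnitzel",
]
EXCLUDE = [
    "Daven", "Shidduch", "Guten", "Aleichem", "Torah",
    "Nosh", "Gefilte fish", "Sheitel", "tzedakah",
]

# The final paragraph from above, after light edits.
FINAL = (
    "Shalom shalom! It's always good to see a fellow Yid in the Schul, "
    "especially one who is Litvish like myself. I prefer the cozy atmosphere "
    "of a Shtiebel over the grandeur of a big shul any day. My shtick is to "
    "make sure everyone feels heimish and welcome in Hashem's house. Let's sit "
    "down for a shmooze over a delicious schnitzel. May we always strive to "
    "bring more Yiddishkeit into our lives."
)


def find_terms(text, terms):
    """Return the terms that appear in text as whole words, ignoring case."""
    return [
        term for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
    ]


def check_output(text, include, exclude):
    """Return (required terms that are missing, banned terms that are present)."""
    present = set(find_terms(text, include))
    missing = [term for term in include if term not in present]
    banned = find_terms(text, exclude)
    return missing, banned


if __name__ == "__main__":
    print(check_output(FINAL, INCLUDE, EXCLUDE))
    # prints: ([], [])
```

Two empty lists mean every requested word made it in and none of the excluded words slipped through.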
great post, and fun video. but AI will never get all the nuances right. heimish, not cozy. gezinta, not grandeur. gevaldik, not delicious. and welcoming others into the shul as a goal by a litvak? um, no.