General Update, and Discussion of Widely Available Cutting-Edge Software Tools that Read Text in Images (=OCR), including Hebrew

Updates to Academia.edu; "Accessible Talmud" project; Hebrew OCR in Windows; ChatGPT4 Vision; Google Lens; Google Photos; Google Drive (Docs)

Oct 24, 2023

I wanted to provide some updates. Due to the outbreak of war in Israel, I took a break from regular daily posts. I was anyway planning on shifting to focusing on larger projects, and updating pieces on my Academia.edu page.

Here are a few of the things I’ve done, and projects I’m working on. At the end, I give an overview, review, and comparison of some cutting-edge OCR tools.

Updates to Academia.edu

As a reminder, access to reading pieces on Academia.edu requires registration.

Updated the organization of all my pieces there. Split all the pieces into the following sections:
1. Guides - 5 pieces
2. Talmudic Names - 6
3. Talmudic Greek Loanwords - 5
4. Medieval Kabbalah - 6
5. Modern Period - 9
6. Other Papers - 12
7. Drafts - 3
Updated my “Index, By Century Discussed, of Interviews of Historians on Seforim Chatter Podcast”. I highly recommend this podcast.

Talmudic Greek Loanwords

I’ve been continuing to work on the topic of Talmudic Greek Loanwords (“Talmudic” in the broad sense of literature of Chazal - Mishnah, Talmud, and classical Midrash). As part of this, I translated (from German) “Krauss's Comprehensive Mapping Table of Letters for Transcription of Greek Loanwords in Classical Rabbinic Literature”. I will be making further updates to this piece (adding further introduction and bibliography): “Krauss's Rules of Transcription of Greek Loanwords in Classical Rabbinic Literature - Consonants - b; g; c; d; g; z; th (=θ, theta); k; l; m; n; x (=ξ, xi)”.

Accessible Talmud

I’m continuing to work on trying out ways of using new AI tools, specifically ChatpGPT4, to make the standard Talmudic sugya more accessible. Each of these will be created per sugya. For a broad discussion, and review of current projects, see the end of my Seforim Blog post: “From Print to Pixel: Digital Editions of the Talmud Bavli” (June 5, 2023). For my initial attempt, see my previous “Some Exciting Techniques for Making a Typical Talmudic Sugya More Accessible” (September 6, 2023). I’m working on improving all of these, and bringing them all together. (For an interesting alternative model to that of Sefaria and Al-Hatorah, compare Perseus’s presentation of classical texts):

Outline/summary of the sugya
Narrative Overview of the sugya
Flowchart or Diagram of the sugya
Historical and Cultural Context
Halachic terms mentioned the sugya, that are specific to the topic(s) being discussed (and linking to Hebrew Wikipedia).
Glossary/highlighting of Formulaic discourse terms in the sugya. For example: תנו רבנן ; מכדי ; רב X אמר. See: “Signposts in Sacred Text: Formulaic Terms Used in Talmud Bavli” (September 15, 2023).
Glossary/highlighting of Formulaic hermeneutic terms, per sugya. For example: היקש; קל וחומר; קולא; חומרא. See: “Using A Linguistic Lens on Midrash: Does a Deep Technical Understanding of Modern Linguistics Provide Significant Help in Understanding Drashot in the Talmud and Midrash?”
Glossary/highlighting of Sages named in the sugya (generation, short bio, and linking to Hebrew Wikipedia). See: “From Abba to Zebedee: A Comprehensive Survey of Naming Conventions in Hebrew and Jewish Aramaic in Late Antiquity” (May 12, 2023)
Section, line, and word counts. See “Quantifying the Talmud: A Technical Dive into Chapter Word Counts” (September 19, 2023)
Splitting into short lines, and matching Hebrew with translation line-by-line. See: “Pt.4 of Scripting the Talmud: Using ChatGPT4 to Automatically Optimize the Formatting of side-by-side Hebrew-English Talmud” (September 13, 2023)
Transliteration of Hebrew to English. See: “Bridging Scripts: A Comparison of Tools and Methods for Automated Transliteration of Hebrew characters to Latin characters” (September 28, 2023).
Part-of-Speech Tagging. See: Tag, You're It: A Review of Dicta's New Part-of-Speech Tagging Tool, and Comparing it with ChatGPT4” (September 14, 2023)

I plan on returning to the project of Talmudic Names. Specifically:
1. Medieval sources on Talmudic biography. See, for now, my: “Hebrew History: A List of Some Pre-Modern Jewish Chronicles” (September 29, 2023).
2. Continue to add to my major piece: “From Abba to Zebedee: A Comprehensive Survey of Naming Conventions in the Mishnah, Talmud, and Late Antique Midrash”. I have much additional material accumulated.

Hebrew OCR in Windows; ChatGPT4; Google Lens; Google Drive

For a short overview of the major OCR tools built specifically for Hebrew manuscripts, see my “Guide to Online Resources for Scholarly Jewish Study and Research - 2023”, section “Tools for Transcription of manuscripts” (pp. 23-4; Academia.edu, registration required)

Wikipedia, “Optical character recognition”:

“Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example: from a television broadcast).”

As I said in my previous discussion of Google Lens OCR (“Android Unleashed: Optimizing My Smartphone” (October 5, 2023), section “Google Lens, for OCR”), Google Lens’ OCR works shockingly well, including for Hebrew text. As of now, Google Lens is still better than ChatGPT4.

Windows now (since ~1.5 months ago) has native OCR in "Snipping Tool", and it works for Hebrew. I tested it on a screenshot of Bar-Ilan Responsa CD word counts, and it worked flawlessly, including correctly retaining the separation into lines.

ChatGPT4 now has the ability to upload images, and to interpret them. It’s called “Vision”, and it can read text in images (OCR). I gained access to this around two weeks ago, when it became available for all users. It is great for multiple languages. It also allows for follow-up conversations, and interpretation. But Google Lens is still better.

Google Photos, as well, has "Copy text from image" (as of ~April 2021). I tested it, and it works quite well. However, it had an issue: it didn't realize that the text is formatted as two columns. It read each two rows as one continuous row.

Windows "Snipping Tool" was better in this regard, it overall recognized that it was two formatted as two columns, and copied the text accordingly, in order. But it did mess up on a few rows, and read them as a single continuous row.

Google Drive also has OCR capabilities that are quite good. I used this when translating from German the above-mentioned “Krauss's Rules of Transcription”.

Test

I tested it on a screenshot of Yeriah Gedolah.

On this medieval kabbalistic work, my article in Academia.edu: “The Yeriah Gedolah: An Allusive Rendering of the Sefirot“, and my Seforim Blog article: “Towards Decoding Ha-Yeriah Ha-Gedolah (The Great Parchment), A Cryptic 14th Century Italian Kabbalistic Text“ (July 7, 2019).

I used manuscript JTS, available at NLI website. The Hebrew handwriting style is medieval Italian cursive.

Here's a screenshot of the first page of that work, in the manuscript:

Here's my transcription of the manuscript:

דע לך כי כוונת המחבר האיגרת הזאת יש בו שני {מספרים/נסתרים?} אחד

על צד החקירה והצירוף . ואחד על צד הקבלה האמיתית . ועוד כי במקומות רבים

האריך הסיפור ופרט אותו בתכלית ההפרטה ואחר כך כלל אותו בסוף ספורו גם אני

הכותב בקצת מקומות אבאר לך הסיפור כולו על דרך כללות כדי להקל המעיין

ואחר כך אבאר כל ספור וספור בפרט בפני עצמו עד סוף הספור: מבוא

לביאורו: כוונת המחבר שכתב שני פסוקים בראש הספור והם מגדל

עוז וכו' יהי ? המים וכו' ראשי תיבות מן שני פסוקים הם מ"י לרמוז כי הם

ההויה הראשונה היא מקור חמשים שערי בינה עם הוראת הפסוקים מורים

אל ההויה הראשונה ואמר שמגדל עוז היא שם ייי וכבר אמר ספר הבהיר כל מקום

Google Lens, Drive, and Photos recognized that it was Hebrew, but it only got a few letters/words right, out of hundreds.

ChatGPT4 image reader couldn't figure it out any words at all, even after providing context, and transcription of the first line

In contrast, Windows Snipping Tools' OCR was surprisingly good. Not only did it recognize that it was Hebrew, but it had quite a high accuracy rate. Even where the letters were incorrect, many of them were close to being correct, and good "guesses".

Here's my analysis of Windows OCR. Screenshot follows; I highlighted in yellow the words that are correct, or close to correct:

I tested using ChatGPT4 to attempt to correct the transcription, by copy-pasting the text provided by Windows Snipping Tool. ChatGPT4 made some correct changes, but also incorrect changes, so overall, wasn't helpful.

Discussion about this post

Ready for more?