Pushing the Boundaries of Talmudic Analysis: A Year in Review
With appendix: text file of Talmud Steinsaltz Translation with Commentary, Verses, and Names removed
In celebration of Rosh Hashanah, the Jewish New Year, I wanted to dedicate this piece to a recap and summary.
Over the last year, I believe that I’ve made substantial progress towards making a few dozen excerpts of aggadic sugyot in the Talmud more accessible, potentially providing a template for how this might be done on a larger scale.
In an article on Seforim Blog (June 5, 2023), “From Print to Pixel: Digital Editions of the Talmud Bavli”, I mentioned the potential for AI to accomplish this:
Given the rapid advancements in artificial intelligence and large language models, it appears highly likely that AI will continue to improve its ability to interpret the Talmud, along with all other sources, including digitized manuscripts. It may not be long before we turn to AI to obtain the ultimate p’shat in the Talmud.
Furthermore, there is a possibility that AI could eventually bring comparative sources to bear [...]
Another interesting idea that is ripe to be explored is visualization of the Talmudic text.
Although we haven’t reached that stage yet, the digital tools currently available can greatly improve the accessibility of the Talmud. This has been a central theme on my blog since its inception.
In this blogpost, I’d like to focus on a few aspects of the work I’ve done on names. For a more general summary, see my update a year ago here (24-Oct-2023), section “Accessible Talmud”.1
At around the same time as the Seforim Blog article, I published on my Academia page “Prospectus for a Large Language Model (LLM) to Facilitate Talmud Research” (version 1 published 16-May-2024; recently revised in version 2).
I write in the abstract:
Objective:
To build a large language model (LLM) to assist with researching the Talmud.
The initial goal is a comprehensive survey of onomastics in Hebrew and Jewish Aramaic in Late Antiquity.
The large language model (LLM) will include the entire corpus of Mishnah, Talmud, and Midrash.
Over the past year, I’ve been steadily working on the task of making the Talmud more accessible and better contextualized. My base text has been the Steinsaltz translation, which is overall outstanding and conveniently available on Sefaria. Its accessibility represents a significant milestone in bringing the Talmud to a wider audience.
When it comes to programmatically analyzing the Talmudic text, I’ve found the Steinsaltz translation to be far more accessible than the original Hebrew and Aramaic. For starters, it includes punctuation—a rarity in many Hebrew texts on Sefaria—and capitalization, which makes identifying proper names easier. As Josh Waxman notes,2 the use of capitalization is critical in distinguishing names from common nouns. From a purely technical perspective, the English translation allows for smoother scripting.
This ease isn’t just about convenience. Working with Hebrew, especially in digital formats, brings a slew of complications. Right-to-left writing is far harder to handle on most platforms; just try formatting a blog post with it on Substack, and you’ll see what I mean.
Add to that the fact that natural language processing libraries are much more developed for English, and the decision to work with the Steinsaltz translation becomes an obvious one.
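To make the capitalization point concrete, here is a minimal Python sketch of how names might be pulled out of English text. It is not the script linked in this post. The list of honorifics, the connector words, and the function name extract_titled_names are all assumptions chosen for illustration, and the sample sentence is invented.

```python
import re
import unicodedata

# Assumed, illustrative list of honorifics; not taken from the original script.
TITLES = {"Rabbi", "Rav", "Rabban", "Mar"}
# Lowercase patronymic connectors allowed inside a name (also an assumption).
CONNECTORS = {"ben", "bar"}

# A word (letters, with internal apostrophes or hyphens), or any other single
# non-space character, so that punctuation ends a run of capitalized words.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[’'-][^\W\d_]+)*|\S")


def extract_titled_names(text):
    """Return honorific-led runs of capitalized words, in order of appearance."""
    # NFC keeps letters such as "ḥ" as single characters for the tokenizer.
    tokens = TOKEN_RE.findall(unicodedata.normalize("NFC", text))
    names = []
    i = 0
    while i < len(tokens):
        if tokens[i] not in TITLES:
            i += 1
            continue
        j = i + 1
        # Extend the run while tokens are capitalized or are connectors.
        while j < len(tokens) and (tokens[j][0].isupper() or tokens[j] in CONNECTORS):
            j += 1
        # A name should not end on a dangling "ben" or "bar".
        while j > i + 1 and tokens[j - 1] in CONNECTORS:
            j -= 1
        if j > i + 1:
            names.append(" ".join(tokens[i:j]))
        i = j
    return names


sample = "Rabbi Yoḥanan said to Rav Huna bar Ḥiyya: The matter is clear."
print(extract_titled_names(sample))
```

The same approach would not work on unpunctuated, uncased Hebrew or Aramaic text, which is the practical advantage described above. Names that appear without an honorific are missed by this sketch.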
But even with these advantages, the task is far from simple. I’ve been matching and extracting names from the Talmud (see the most recent results here, and the original script here), often sifting through duplicates caused by variations in spelling. Some of these inconsistencies exist in the original text, while others arise from differences in the translation. Thankfully, the Steinsaltz edition remains fairly consistent in this regard, and the duplicates are a minority that can be handled programmatically.
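One way such spelling duplicates might be grouped programmatically is sketched below. This is an illustration under stated assumptions, not the method used in the linked script: it assumes variants differ only in diacritics, apostrophes, hyphens, or letter case, and the sample spellings are hypothetical. The helper names normalize_name and group_variants are invented for the example.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Reduce a name to a comparison key: no diacritics, apostrophes, hyphens, or case."""
    # NFKD splits a letter like "ḥ" into "h" plus a combining dot, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for ch in "’'‘-":
        stripped = stripped.replace(ch, "")
    return " ".join(stripped.lower().split())


def group_variants(names):
    """Map each comparison key to its spellings, keeping only keys with several spellings."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_name(name)].add(name)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical spellings, for illustration only.
sample = ["Rabbi Yoḥanan", "Rabbi Yohanan", "Rav Huna", "Rabbi El’azar", "Rabbi Elazar"]
for key, spellings in group_variants(sample).items():
    print(key, "->", spellings)
```

Variants that differ by more than this (for example, a different vowel in the transliteration) would need fuzzy matching or a manual mapping, and the groups are best reviewed by hand before merging.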
The progress so far? I’ve mapped thousands of names to several major works on the subject, all of which can be found on my Academia page, section “Talmudic Names”.
As an aside, the Talmud is an enormous work for its time. I recently learned that the Indian epic poem Mahabharata, another foundational ancient religious text written around the same period, is similar in size. Of course, the content couldn’t be more different: the Mahabharata is narrative epic, much more in line with the Homeric tradition than with rabbinic debate.
There’s still much to be done, but the progress over the past year has been steady. And I’m confident that the tools I’ve built will continue to bear fruit as I push this project forward.
Appendix - text file of Talmud Steinsaltz Translation with Commentary, Verses, and Names removed
I’ve created a text file that contains only:
The bold text from Steinsaltz’s translation—essentially, the actual Talmudic translation minus the commentary.
I’ve stripped out the biblical verses as well, which were all encased in quotation marks. There’s a chance I removed a few lines of Talmud in the process, but it’s a small price to pay for clarity.
I’ve replaced all the names with placeholder {NAME} to allow for more efficient analysis.
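The three steps above can be sketched in Python as follows. This is a rough illustration, not the code that produced the file. It assumes the translation is marked with <b>...</b> tags in the downloaded text, that verses sit inside quotation marks, and that a list of names is already available. The function name clean_segment and the sample passage are hypothetical.

```python
import re

# Assumption: the bold (translation) text is wrapped in <b>...</b> tags.
BOLD_RE = re.compile(r"<b>(.*?)</b>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
# Quoted passages, in curly or straight double quotation marks.
QUOTE_RE = re.compile(r'“[^”]*”|"[^"]*"')


def clean_segment(html, names):
    """Keep bold text only, drop quoted passages, and replace names with {NAME}."""
    # Step 1: join the bold spans first, so a verse split across spans is reunited.
    text = " ".join(BOLD_RE.findall(html))
    text = TAG_RE.sub("", text)
    # Step 2: remove everything inside quotation marks (verses, but possibly more).
    text = QUOTE_RE.sub("", text)
    # Step 3: replace longer names first, so "Rav Huna bar X" is not cut at "Rav Huna".
    for name in sorted(names, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(name) + r"\b", "{NAME}", text)
    return re.sub(r"\s+", " ", text).strip()


# Invented passage in the assumed markup, for illustration only.
sample = (
    "<b>Rabbi Yoḥanan said</b> that is, he taught, "
    "<b>as it is stated: “In the beginning”</b> (Genesis 1:1); "
    "<b>and Rav Huna disagreed.</b>"
)
print(clean_segment(sample, ["Rabbi Yoḥanan", "Rav Huna"]))
```

Removing every quoted span is a blunt instrument: any quoted speech that is not a verse is dropped as well, which is consistent with the caveat above about losing a few lines of Talmud.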
This file has proven to be a helpful tool in my names research, as well as for finding interesting passages for analysis. It runs to around three million words, roughly 50 percent larger than the original Hebrew text, and others may find it helpful as well for Talmud analysis.
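As one example of the kind of analysis the {NAME} placeholder makes easier, the sketch below counts the words that most often follow a name. It is a hypothetical illustration: the function name words_after_placeholder, the two-word window, and the sample text are assumptions, and the real file would be read from disk rather than from a string.

```python
import re
from collections import Counter


def words_after_placeholder(text, placeholder="{NAME}", width=2):
    """Count the `width` words that directly follow each placeholder."""
    pattern = re.compile(
        re.escape(placeholder) + r"((?:\s+[^\W\d_]+){%d})" % width
    )
    counts = Counter()
    for match in pattern.finditer(text):
        # Normalize spacing and case so "Said to" and "said  to" count together.
        counts[" ".join(match.group(1).split()).lower()] += 1
    return counts


# Invented text in the file's format, for illustration only.
sample = (
    "{NAME} said to {NAME}: It is permitted. "
    "{NAME} said to him: No. "
    "{NAME} raised an objection."
)
for phrase, count in words_after_placeholder(sample).most_common():
    print(count, phrase)
```

Because every name collapses to the same token, phrases such as attribution formulas can be counted without first enumerating every individual sage.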
I was unable to upload it to Substack as a text file, so instead I uploaded it to Google Drive, and I’m sharing it as a link.
As a .txt file, it is 13 MB. In a Word file, it comes to ~3,300 pages.
The Talmud text is in alphabetical order of tractate, for technical reasons related to how it was downloaded from Sefaria (see my piece on this here).
Here’s the link:
https://drive.google.com/file/d/19-nh8tBWsYLzVR60nm7mJW9gg06uT7Qx/view?usp=sharing
I plan to turn the last half year of my blogposts on Talmudic aggada into an ebook sometime soon.
See also:
Joshua Waxman, “A graph database of scholastic relationships in the Babylonian Talmud”, Digital Scholarship in the Humanities, Volume 36, Issue Supplement_2, October 2021, Pages ii277–ii289.
I cite him also here.