Proposal for ChavrutAI: An AI Chavruta for Studying Talmud
Appendix - Custom Formatter of Sefaria Talmud Text Via API
Talmud study has traditionally been an interactive process. The back-and-forth of a chavruta (study partner) sharpens understanding, challenges assumptions, and brings texts to life. But what if an AI could serve as a chavruta—providing structured analysis, background information, and even summarizing debates?1
This is the goal of ChavrutAI, an AI-assisted Talmud study tool I’m planning to develop.2 The idea is straightforward: leverage large language models (LLMs) to provide structured assistance in analyzing the Talmud, while ensuring that the process remains rigorous and faithful to the text.
Outline
How It Will Work
Expanding AI-Assisted Talmud Research
The Future of AI and Talmud
Appendix - Custom Formatter of Sefaria Talmud Text Via API
The Challenge of Digital Jewish Texts
What the program does
Illustrations (Screenshots)
Implications for Digital Humanities
How It Will Work
ChavrutAI will pull text directly from Sefaria’s API, then clean and format it for optimal readability and accuracy. I've already built an initial script for this, available in my GitHub repo.3 It does the following (see the Appendix to this piece for more on the script):
Split into lines based on punctuation (period, colon, question mark, exclamation point [.:?!])
Generate numbered section headers
Remove nikud (Hebrew vowel marks) while preserving punctuation
Standardize terminology according to scholarly preferences
Convert spelled-out numbers to Arabic numerals
Provide a graphical interface via a Google Colab notebook
Clean text formatting for Google Docs and other word processors; copy-paste ready output4
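The first two steps in the list above (splitting on punctuation and generating numbered section headers) can be sketched roughly as follows. This is a minimal illustration, not the script from the repo: the function names are hypothetical, and the header style (`Megillah.11b.4`) is only an assumption modeled on Sefaria's segment URLs.

```python
import re

# Split at whitespace that follows one of the marks . : ? !
_SPLIT_AFTER_PUNCT = re.compile(r"(?<=[.:?!])\s+")


def split_into_lines(text: str) -> list[str]:
    """Break a passage into one line per punctuation-delimited unit."""
    parts = _SPLIT_AFTER_PUNCT.split(text.strip())
    return [part for part in parts if part]


def format_sections(ref: str, sections: list[str]) -> str:
    """Emit a numbered header for each section, followed by its lines."""
    out = []
    for number, section in enumerate(sections, start=1):
        out.append(f"{ref}.{number}")  # assumed header style, e.g. Megillah.11b.4
        out.extend(split_into_lines(section))
        out.append("")  # blank line between sections
    return "\n".join(out).rstrip()


print(format_sections("Megillah.11b", ["Rava said: Why is this so?", "It is taught."]))
```

A mark followed directly by a closing quotation mark is not treated as a break here, so quoted speech would need extra handling.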
Once the text is prepared, a large language model (LLM) will interpret it. Current AI capabilities for processing the Talmud and other classical Jewish texts are surprisingly strong. If you’re skeptical, check out the Substack of Ari Friedman et al., “LLMOD”, where they analyze and discuss AI and classic Jewish texts and p’sak.
The LLM will provide:
Introductory overviews of a sugya (Talmudic discussion)
Summary tables outlining key halakhic positions
Required background information (historical, linguistic, and halakhic context)5
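One way to request these three outputs is to assemble them into a single prompt around the formatted passage. The sketch below only builds the prompt string. It assumes no particular model or provider, and the instruction wording and the names `SYSTEM_INSTRUCTIONS`, `REQUESTED_OUTPUTS`, and `build_prompt` are hypothetical.

```python
# Hypothetical instruction text; the wording is an assumption, not a tested prompt.
SYSTEM_INSTRUCTIONS = (
    "You are a study partner for Talmud. Stay close to the text provided, "
    "and mark anything not stated in the text as background or conjecture."
)

# The three outputs listed above.
REQUESTED_OUTPUTS = [
    "An introductory overview of the sugya",
    "A summary table of the key halakhic positions",
    "Required background (historical, linguistic, and halakhic context)",
]


def build_prompt(ref: str, formatted_text: str) -> str:
    """Combine instructions, the passage, and the requested outputs."""
    numbered = "\n".join(
        f"{i}. {item}" for i, item in enumerate(REQUESTED_OUTPUTS, start=1)
    )
    return (
        f"{SYSTEM_INSTRUCTIONS}\n\n"
        f"Passage: {ref}\n"
        f"---\n{formatted_text}\n---\n\n"
        f"Please provide:\n{numbered}"
    )


print(build_prompt("Megillah 11b", "(formatted text goes here)"))
```

Keeping the passage between explicit delimiters makes it easier to check that the model's answer stays tied to the text it was given.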
Expanding AI-Assisted Talmud Research
Beyond structured interpretation, I’m also hoping to develop a Talmudic Named Entity Recognition (NER) model. The goal is to systematically identify and categorize names, places, and other key entities in the Talmud. If you’re interested in this, see my proposal on my Academia page, and see related material under the section on Talmudic names.
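To make the task concrete, here is a deliberately naive baseline: a lookup against a hand-made list of names. It is not the proposed NER model, which would have to handle variant spellings, ambiguous names, and context. The `GAZETTEER` entries and the `tag_entities` function are hypothetical illustrations.

```python
import re

# Hypothetical hand-made name list; a real model would be trained on annotated text.
GAZETTEER = {
    "Rava": "PERSON",
    "Abaye": "PERSON",
    "Rabbi Yohanan": "PERSON",
    "Pumbedita": "PLACE",
    "Babylonia": "PLACE",
}

# Longest names first, so multi-word names win over any shorter overlap.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\b")


def tag_entities(text: str) -> list[tuple[str, str, int]]:
    """Return (name, category, start offset) for each known name in the text."""
    return [
        (m.group(1), GAZETTEER[m.group(1)], m.start())
        for m in _PATTERN.finditer(text)
    ]


print(tag_entities("Rava and Abaye studied in Pumbedita."))
# [('Rava', 'PERSON', 0), ('Abaye', 'PERSON', 9), ('Pumbedita', 'PLACE', 26)]
```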
The Future of AI and Talmud
ChavrutAI is meant to offer structured insights, organize material, and act as a tireless study partner. If you’re interested in collaborating, testing, or providing feedback, I’d love to hear from you.
Talmud study has always evolved with new tools, from manuscripts to print editions to digital resources.6 An AI chavruta might just be the next step.
Appendix - Custom Formatter of Sefaria Talmud Text Via API
For scholars, rabbis, students, and enthusiasts of Jewish texts, Sefaria has revolutionized access to the vast canon of Jewish literature. This digital library provides free access to thousands of texts, but when it comes to incorporating these texts into academic papers, lesson plans, or study sheets, the formatting process can be cumbersome.
In this article, I'll share how I developed a custom Python script that interfaces with Sefaria's API to retrieve texts and format them according to specific scholarly preferences. This tool addresses common needs in academic and religious study environments: consistent terminology, clean Hebrew text, and formatting that transitions seamlessly to word processors.
The Challenge of Digital Jewish Texts
The digital revolution has transformed how we interact with ancient texts. Yet, digital convenience sometimes comes with unexpected friction points. For those working with Hebrew texts, these include:
Nikud handling - Hebrew vowel marks (nikud) might need to be removed while preserving punctuation7
Terminology standardization - Academic or denominational preferences often require different terminology than exists in a translation
Formatting inconsistencies - Copy-pasting from web interfaces often introduces unwanted formatting
Numerical representation - Converting spelled-out numbers to numerical form for ease of reading and calculating
Consider a scholar preparing materials on Tractate Sotah who needs to follow specific style guidelines. Manually adjusting each text excerpt takes time and is error-prone. Our solution automates this process through a programmatic approach.
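As an illustration of the nikud point, the sketch below strips Hebrew vowel points by Unicode code point while leaving Hebrew punctuation in place. The choice of which code points to remove is my assumption about what "nikud" should cover. Cantillation marks (U+0591 to U+05AF) are left alone here, and the function name is hypothetical.

```python
import re

# Removed: U+05B0-U+05BD (vowel points, dagesh, meteg), U+05BF (rafe),
# U+05C1-U+05C2 (shin and sin dots), U+05C4-U+05C5 (upper/lower dots),
# U+05C7 (qamats qatan).
# Kept on purpose: U+05BE (maqaf), U+05C0 (paseq), U+05C3 (sof pasuq).
_NIKUD = re.compile("[\u05B0-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]")


def strip_nikud(text: str) -> str:
    """Remove Hebrew vowel points; letters and punctuation are untouched."""
    return _NIKUD.sub("", text)


print(strip_nikud("שָׁלוֹם"))  # prints the same word with the vowel points removed
```

Because the maqaf and sof pasuq sit inside the same Unicode block as the vowel points, removing the whole block range at once would also delete punctuation, which is why the character class lists the points explicitly.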
What the program does
The notebook contains a Python script that creates a custom formatter for Sefaria Talmud texts. In summary, it:
Retrieves text from the Sefaria API by specifying a reference (like "Megillah.11b")
Formats the text for easy copying into Google Docs with these features:
Option to remove nikud (Hebrew vowel marks)
Standardizes terminology according to preferences (e.g., changing "Gemara" to "Talmud")
Splits text by punctuation marks for better readability
Formats output in 12pt Calibri, which suits Google Docs
Organizes text into clearly marked sections
Can include adjacent pages of the Talmud
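The adjacent-pages option comes down to simple arithmetic on the reference. A minimal sketch, assuming references of the form `Tractate.11b` and assuming each tractate begins at 2a. The function name is hypothetical, and the code does not know where a tractate ends, so the "next" reference may not exist.

```python
import re
from typing import Optional

# Tractate name, then a dot, a daf number, and the side (a or b).
_AMUD_REF = re.compile(r"^(?P<tractate>.+)\.(?P<daf>\d+)(?P<side>[ab])$")


def adjacent_amudim(ref: str) -> tuple[Optional[str], str]:
    """Return (previous, next) references for a page such as 'Megillah.11b'."""
    m = _AMUD_REF.match(ref)
    if m is None:
        raise ValueError(f"not a daf reference: {ref!r}")
    tractate, daf, side = m["tractate"], int(m["daf"]), m["side"]
    if side == "b":
        previous = f"{tractate}.{daf}a"
        following = f"{tractate}.{daf + 1}a"
    else:
        # Assumption: nothing precedes 2a.
        previous = f"{tractate}.{daf - 1}b" if daf > 2 else None
        following = f"{tractate}.{daf}b"
    return previous, following


print(adjacent_amudim("Megillah.11b"))  # ('Megillah.11a', 'Megillah.12a')
```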
The code is structured with:
Helper functions for text processing
Sefaria API functions for retrieving and formatting the texts
A user interface built with ipywidgets that includes:
Text input for references
Various options like language selection
Checkboxes for formatting preferences
This interface provides a simple form where users can specify their reference, choose display options, and retrieve formatted text ready for copying.
Bottom line, the major use-case for the program is retrieving Talmudic texts in a clean, consistent format that's ready for study purposes, with the output designed specifically for pasting into Google Docs.
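The retrieval step can be sketched with the standard library alone. This is not the notebook's code. It assumes Sefaria's texts endpoint at `/api/texts/<ref>` and assumes the JSON response carries the English segments under `"text"` and the Hebrew under `"he"`. Both should be checked against Sefaria's current API documentation, and the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://www.sefaria.org/api/texts/"  # assumed endpoint


def build_url(ref: str) -> str:
    """Build the request URL for a reference such as 'Megillah.11b'."""
    return API_BASE + quote(ref)


def fetch_payload(ref: str, timeout: float = 30.0) -> dict:
    """Download and decode the JSON response (requires network access)."""
    with urlopen(build_url(ref), timeout=timeout) as response:
        return json.load(response)


def extract_segments(payload: dict, language: str = "en") -> list[str]:
    """Pull the list of text segments for one language out of the response."""
    key = "he" if language == "he" else "text"  # assumed field names
    segments = payload.get(key, [])
    if isinstance(segments, str):
        segments = [segments]
    return [segment for segment in segments if segment]


# Usage (needs a network connection):
# payload = fetch_payload("Megillah.11b")
# for segment in extract_segments(payload, language="en"):
#     print(segment)
```

Separating the download from the extraction keeps the formatting helpers testable without network access.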
Illustrations (Screenshots)
I set Megillah.11b as the default because of its relatively high frequency of spelled-out numbers and its relatively complex calculations, which make it a good test of the number-conversion feature.8
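For readers curious how such a conversion might work, here is a minimal sketch for English number words. It is not the notebook's implementation. It handles units, tens, "hundred", and "thousand", ignores ordinals and "and" (so "four hundred and ten" is not combined), and leaves a lone "one" untouched because in translations it is usually a pronoun. All names are hypothetical.

```python
import re

_UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
_TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
_SCALES = ("hundred", "thousand")

# Longest alternatives first, so "sixteen" is tried before "six".
_WORD = "|".join(sorted([*_UNITS, *_TENS, *_SCALES], key=len, reverse=True))
# A run of number words joined by single spaces or hyphens.
_RUN = re.compile(rf"\b(?:{_WORD})(?:[ -](?:{_WORD}))*\b", re.IGNORECASE)


def _value(words: list[str]) -> int:
    total = current = 0
    for word in words:
        if word in _UNITS:
            current += _UNITS[word]
        elif word in _TENS:
            current += _TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        else:  # "thousand"
            total += max(current, 1) * 1000
            current = 0
    return total + current


def spelled_to_digits(text: str) -> str:
    """Replace runs of English number words with Arabic numerals."""
    def replace(match: re.Match) -> str:
        run = match.group(0)
        if run.lower() == "one":  # usually a pronoun ("one who says"), so keep it
            return run
        return str(_value(re.split(r"[ -]", run.lower())))
    return _RUN.sub(replace, text)


print(spelled_to_digits("forty-five years and then twenty-three more"))
# 45 years and then 23 more
```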
The Interface:
Example #1:
https://www.sefaria.org.il/Megillah.11b.4:
My notebook output:
Example #2:
https://www.sefaria.org.il/Megillah.11b.9:
My notebook output:
Implications for Digital Humanities
This project demonstrates several broader principles applicable to digital humanities work:
API Integration - Programmatic access to digital libraries opens new possibilities for text processing
Linguistic Standardization - Automated approaches can customize to preferred terminology and style
Unicode Manipulation - Working with non-Latin scripts requires careful handling of Unicode character ranges
Cross-Platform Compatibility - Bridging digital libraries and document preparation systems reduces friction
For scholars working with specialized texts, similar approaches could be developed for other digital collections, creating customized pipelines that enforce institutional or disciplinary standards.
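Enforcing a house style can be as simple as a preference table applied in one pass. In the sketch below, only the "Gemara" to "Talmud" pair comes from this piece. The table name and function name are hypothetical, and any further entries would depend on the institution or discipline.

```python
import re
from typing import Optional

# Preference table: term found in the translation -> preferred term.
TERM_PREFERENCES = {
    "Gemara": "Talmud",
}


def standardize_terms(text: str, preferences: Optional[dict] = None) -> str:
    """Replace whole-word occurrences of each term with the preferred one."""
    preferences = TERM_PREFERENCES if preferences is None else preferences
    if not preferences:
        return text
    # Longest terms first so a longer term wins over a shorter overlap.
    terms = sorted(preferences, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    # One pass over the text, so a replacement is never itself replaced.
    return pattern.sub(lambda m: preferences[m.group(1)], text)


print(standardize_terms("The Gemara asks: Why is this so?"))
# The Talmud asks: Why is this so?
```

Matching on word boundaries keeps the substitution from touching longer words that merely contain a listed term.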
Conclusion
Digital humanities tools are most effective when they bridge the gap between digital libraries and scholars' actual workflows. By creating a custom formatter for Sefaria texts, we've demonstrated how API access combined with text processing techniques can streamline research and teaching preparation.
The completed tool allows scholars to retrieve texts with preferred formatting in seconds rather than minutes of manual adjustment, illustrating how small-scale digital tools can have an outsized impact on research efficiency.
For those interested in adapting this approach, the complete code is available as a Google Colab notebook (linked in a footnote), making it accessible to researchers regardless of their programming experience. As digital Jewish studies continues to evolve, tools like this represent an important step toward integrating digital libraries more seamlessly into scholarly workflows.
See my 2023 discussion of this in my article “From Print to Pixel: Digital Editions of the Talmud Bavli”, at the Seforim Blog.
Compare Moshe Koppel's 2024 article in Mosaic magazine, “What Artificial Intelligence Has In Store for Judaism” (archived version here); his focus there is the opposite of mine: human chavruta vs. AI.
Another major advantage of using the API rather than the Sefaria website UI is speed: the website often loads very slowly, and sometimes does not load at all.
See my related discussions in previous pieces, especially my update here (24-Oct-2023), section “Accessible Talmud”.
See especially Prof. Menachem Katz's monumental project of a digital edition of Talmud Yerushalmi; I plan on reviewing that project soon.
As I’ve argued a number of times, nikud has far less value for the advanced learner than punctuation. See my piece “Symbols and Syntax: Punctuation and Nikud in the Talmud”. On removing nikud, see my piece “How to Programatically Strip Hebrew Nikud from a Hebrew Text”.
See my piece from yesterday, where I discuss and analyze that passage.