Pt1 ChavrutAI Web App in Progress: Advancing the Vision of a More Accessible Talmud

Apr 20, 2025

This is the first part of a two-part series. Part 1 focuses on the detailed vision and on Inspiration & Models, while Part 2 focuses on prototypes and mock-ups. The outline for the series is below.1

ChavrutAI is a work-in-progress open-access web app aimed at making the Talmud more accessible, explorable, and navigable for contemporary audiences—scholars, students, and the curious public alike.

Outline

Rationale

The Vision

Features (Planned, Roadmap)

Textual Processing, Analysis & Tools
Side-by-Side Bilingual View
URL Structure & Navigation
Integration with Existing Resources
Contextual Metadata & Wikification
Semantic Highlighting
ChavrutAI bot - personalized LLM-assistance
Additional possible longer time-horizon ideas, esp. Customization
Academic / scientific / scholarly

Inspiration & Models

Work done by Prof. Menahem Katz
Perseus Project
Digital Dante (Columbia University)

Current Status (prototypes / pre-alpha versions)

Fetching Talmud text (Hebrew and English)

Custom instructions for processing text, and for querying a general LLM about Talmud questions

MVP (Minimum Viable Product) Specification - Upcoming Beta Version

Text Source Integration
Primary View (inspired by Sefaria + Perseus)
Frontend
Backend
Initial Development Focus

Mock-up of the Proposed Interface of the web app - with focus on the UX/UI and Aesthetics

Mock-up of main tab (“Text & Translation”)
Mock-up of tab “Summaries & Key Terms”, top (in tablet/mobile view)
Mock-up of tab “Summaries & Key Terms”, scrolled down (in tablet/mobile view)
Mock-up of tab “Broader Analysis”, top (on tablet/mobile)
Mock-up of tab “Broader Analysis”, scrolled down to the bottom (on tablet/mobile)

Rationale

As I outlined almost two years ago in my post at the Seforim Blog2 and developed extensively on this blog, the traditional and highly conservative “tzurat hadaf” of the Talmud imposes major constraints (highly arbitrary pagination, visual density, extremely minimal formatting, and layered commentary) that digital editions can easily and thoroughly transcend.

This project aims to reflect the structure of Talmudic sugyot more faithfully, aided throughout by contemporary tools.

Ultimately, this is not just about access but about intelligent access: making the Talmud readable and explorable with sensitivity to its literary, legal, and narrative structures, in ways that reflect contemporary research and technological capabilities.

The Vision

Building upon existing monumental resources, especially Sefaria, this project aims to extend (or, in contemporary tech parlance, provide a “wrapper” for) the capabilities of current platforms and to customize them for Talmud reading and study, drawing inspiration from best practices in the broader fields of digital humanities and user experience.

This web app will present the Talmud in a fully digitized format offering clean, semantic HTML. It will incorporate both the original Hebrew/Aramaic text and the major open-access English translation/interpretation (Steinsaltz edition), with rich typographic formatting, side-by-side display, and responsive design.

Features (Planned, Roadmap)

Textual Processing, Analysis & Tools

Each section is broken further into smaller, less arbitrary units based on punctuation,3 along with other processing of the English translation/interpretation to make it more accessible, readable, modern, and accurate.4
Quoted Bible verses: highlight them and make them more accessible by adding punctuation.5
The user can highlight a word to pull up its Jastrow dictionary entry.6 The dictionary definition pane will also display and hyperlink to the Hebrew Wiktionary definition.
Named Entity Recognition (NER) for rabbis, places, and other technical terms, with links to Wikipedia entries.7 Visually indicate recognized names in the text.8
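
As a rough illustration of the first item above (splitting a section into smaller units based on punctuation), here is a minimal Python sketch. It is not the script mentioned in the notes; the set of punctuation marks, the function name, and the sample sentence are all assumptions made up for the example.

```python
import re

# Split on whitespace that follows a sentence- or clause-final mark,
# keeping the mark attached to the preceding segment.
# Which marks count as break points is an assumption for this sketch.
_SPLIT_AFTER = re.compile(r"(?<=[.?!;:])\s+")


def split_by_punctuation(text: str) -> list[str]:
    """Break one translated section into shorter, punctuation-delimited segments."""
    segments = _SPLIT_AFTER.split(text.strip())
    return [segment for segment in segments if segment]


# Invented sample sentence, for illustration only.
sample = "The Sages taught: one may do this; another may not. Why is that so? It depends."
for segment in split_by_punctuation(sample):
    print(segment)
```

A real pipeline would likely need exceptions (abbreviations, verse citations, nested quotations), which this sketch deliberately ignores.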

Side-by-Side Bilingual View

Hebrew and English aligned horizontally9
Rich text formatting preserving bold and italics10
Optimized layout that updates dynamically based on device/window aspect ratio (desktop/tablet/mobile), with customization options11

URL Structure & Navigation

Unique URL for every daf/amud/section (e.g., Sanhedrin.90a.2), as well as for ranges of sections, for deep linking, mirroring Sefaria
Topic-based navigation and section headers12
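
To make the deep-linking idea concrete, here is a small Python sketch that parses a reference such as Sanhedrin.90a.2 into its parts. The `TalmudRef` class, the `parse_ref` function, and especially the `-5` suffix used for ranges are assumptions for illustration; the actual URL scheme may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Tractate name, daf number, amud (a/b), optional section, optional range end.
# The "-<end>" range syntax is an assumption made for this sketch.
_REF = re.compile(
    r"^(?P<tractate>[A-Za-z_]+)\.(?P<daf>\d+)(?P<amud>[ab])"
    r"(?:\.(?P<start>\d+)(?:-(?P<end>\d+))?)?$"
)


@dataclass(frozen=True)
class TalmudRef:
    tractate: str
    daf: int
    amud: str
    start: Optional[int] = None  # first section, if given
    end: Optional[int] = None    # last section of a range, if given


def parse_ref(ref: str) -> TalmudRef:
    """Parse a reference like 'Sanhedrin.90a.2' or 'Sanhedrin.90a.2-5'."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"Unrecognized reference: {ref!r}")
    start, end = match["start"], match["end"]
    parsed = TalmudRef(
        tractate=match["tractate"],
        daf=int(match["daf"]),
        amud=match["amud"],
        start=int(start) if start else None,
        end=int(end) if end else None,
    )
    if parsed.end is not None and parsed.end < parsed.start:
        raise ValueError(f"Range end precedes start: {ref!r}")
    return parsed


print(parse_ref("Sanhedrin.90a.2"))
print(parse_ref("Sanhedrin.90a.2-5"))
```

Keeping the parsed reference as a small immutable object makes it easy to build both the page route and the matching link back to Sefaria from the same value.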

Integration with Existing Resources

Link back to Sefaria page for additional resources, commentaries, and cross-references
Modular design to eventually support Mishna, then Bible

Contextual Metadata & Wikification

Topic labeling for sections and pages13
"Wikification" of key concepts with links to relevant reference materials: Keyword/concept tagging per page with links to Hebrew and English Wikipedia entries.14 Query the Wikipedia API to fetch the relevant pages.15
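
As a sketch of what querying the Wikipedia API might look like, the Python below builds a MediaWiki Action API request for a page's plain-text introduction and pulls the extract out of the JSON response. The function names, the User-Agent string, and the cache size are placeholders, not part of any existing ChavrutAI code.

```python
import json
from functools import lru_cache
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_url(title: str, lang: str = "en") -> str:
    """Build an Action API URL requesting the intro extract for one title."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",   # TextExtracts: page text without markup
        "exintro": 1,         # only the lead section
        "explaintext": 1,     # plain text rather than HTML
        "redirects": 1,       # follow redirects to the canonical page
        "titles": title,
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_intro(payload: dict) -> Optional[str]:
    """Return the first extract found in a parsed API response, if any."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None


@lru_cache(maxsize=256)  # simple in-memory cache for frequently used entries
def fetch_intro(title: str, lang: str = "en") -> Optional[str]:
    """Fetch the intro over the network (not called here)."""
    request = Request(
        build_query_url(title, lang),
        headers={"User-Agent": "ChavrutAI-sketch/0.1 (placeholder)"},
    )
    with urlopen(request, timeout=10) as response:
        return extract_intro(json.load(response))


print(build_query_url("Rabbi Akiva"))
print(build_query_url("Rabbi Akiva", lang="he"))
```

Passing `lang="he"` targets Hebrew Wikipedia with the same code; the Hebrew page title would need to be supplied separately, since titles differ between the two editions.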

Semantic Highlighting

Identify and color-code halakhic vs. aggadic content per sugya.16
Display section summaries, summary tables, and labels (e.g. “Story about Elijah,” “Law of Apostates”), aided by queries to an LLM

ChavrutAI bot - personalized LLM-assistance

An on-page chatbot will provide a wrapper for questions to an aligned LLM (for alignment/custom instructions for a general LLM, see the upcoming Part 2), giving customized assistance to the user/“learner”
Labels, overviews/background info/intros, summary tables, and various other visualizations, aided by LLMs.17

Additional possible longer time-horizon ideas, esp. Customization

Allowing extensive customization of fonts (both Hebrew and English) and other display settings
More granular customization of preferred level of explanation
Customization of terminology in translation/interpretation (terminology used, amount/type of Hebrew-to-English, type and amount of transliteration/romanization)
Audio and video features.18

Academic / scientific / scholarly

Providing an interface for advanced search.19
Manuscript and other variants (= גירסאות, שינויי נוסח): Moving past the traditional edition (the 19th-century Romm-Vilna edition) and incorporating manuscript and other historical attestations
Source criticism (= רבדים): separating by historical layers (tannaitic; named statements; Stam)
Even more speculative is AI-generated scholarship (חידושים).20

Inspiration & Models

Work done by Prof. Menahem Katz

The work done by Prof. Menahem Katz.21

Screenshot of sample of ibid., Yerushalmi Yevamot

https://www.talmudyerushalmi.com/talmud/yevamot/001/001

(Note, for all screenshots: where the page is a web app/website, I narrowed the view, to be closer to tablet/mobile view, to be able to show it more easily on a standard aspect ratio Substack reader)

Screenshot of sample of ibid., Bavli Yevamot, Table of Contents (p. 2):

Screenshot of Yerushalmi sugya mapping

https://assets.talmudyerushalmi.com/documents/research/sugyot_map_yevamot_01.pdf

Perseus Project

While the Jewish digital text space contains some major, relatively well-developed and modern resources (esp. Sefaria, Hebrew Wikisource, Al-HaTorah), this project looks outside that ecosystem for additional inspiration from the wider digital humanities field:

Scaife Viewer (Perseus Project)
- Best-in-class model for bilingual Greek-English reading
- Hover definitions, word frequency lists, and deep metadata integration

Screenshot of sample page, in the newer interface

( https://scaife.perseus.org/reader/urn:cts:greekLit:tlg0059.tlg002.perseus-grc2:17/?right=perseus-eng2&highlight=%40%E1%BC%90%CE%B8%CE%B1%CF%8D%CE%BC%CE%B1%CF%83%CE%B1%5B1%5D ):

Screenshot from Perseus’s older interface, illustrating its notable navigation system, visualized using horizontal bars (chunked by chapter vs. section). Tabs to the right are set to “hide”:

( https://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.01.0126%3Abook%3D1%3Achapter%3D163

Note: I erased from the screenshot a lot of the surrounding navigation links and meta-data that appear on that page, as it’s mostly irrelevant for our purposes here)

Screenshot of ibid., showing “Notes” opened (“focused”):

Screenshot showing opened-up “[original] Greek” and “Places (automatically extracted)”

Digital Dante (Columbia University)


Screenshot of sample page, tab “Text & Translations”

( https://digitaldante.columbia.edu/dante/divine-comedy/purgatorio/purgatorio-6/ > tab “Text & Translations”):

Screenshot of sample page, ibid., tab “Commento Baroliniano” (summary/overview of the passage under discussion):

Screenshot of Table of contents:

https://digitaldante.columbia.edu/commento-baroliniano/

See my previous piece on the topic here: Proposal for ChavrutAI: An AI Chavruta for Studying Talmud.

“From Print to Pixel: Digital Editions of the Talmud Bavli” (June 5, 2023). Reposted at my Academia page here.

The script for this has already been developed and the web app is live; see Part 2.

Modernization of style, more accurate / precise translation, changing spelled-out words to Arabic numerals, etc

I.e., modern punctuation (not nikud), which in the Hebrew/Aramaic text of the Sefaria edition of the Talmud is entirely missing from Bible verses, unlike in the rest of the original Hebrew/Aramaic text and in the English translation/interpretation.

As currently exists on Sefaria. The plan is to use JavaScript to detect text selection within the Hebrew/Aramaic text panes, with the Jastrow text modernized via a dedicated processing script (I have a Python script in a Google Colab notebook for this).

See my piece on this at my Academia page.

Much of the initial work on gazetteers--lists of known names--has already been done, see my extensive, maximalist list of personal names extracted from the Talmud (3000+ unique names).
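
A gazetteer lookup of this kind could be sketched in Python roughly as follows. The three names and the sample sentence are invented placeholders, not entries from the actual list, and the function names are hypothetical.

```python
import re


def build_name_pattern(names: list[str]) -> re.Pattern:
    """Compile one regex that matches any name in the gazetteer."""
    if not names:
        raise ValueError("Gazetteer is empty")
    # Longest names first, so "Rabbi Yehuda HaNasi" wins over "Rabbi Yehuda".
    ordered = sorted(set(names), key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(rf"\b(?:{alternation})\b")


def find_names(text: str, pattern: re.Pattern) -> list[tuple[int, int, str]]:
    """Return (start, end, name) for each gazetteer match in the text."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


# Placeholder gazetteer and sentence, for illustration only.
gazetteer = ["Rabbi Yehuda", "Rabbi Yehuda HaNasi", "Rava"]
pattern = build_name_pattern(gazetteer)
for start, end, name in find_names("Rabbi Yehuda HaNasi disagreed with Rava.", pattern):
    print(start, end, name)
```

The character offsets are what a front end would need in order to underline each recognized name. Plain string matching does not resolve ambiguous names; that is the disambiguation problem raised in the following note.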

E.g., subtle underline, as already done by Sefaria.

However, Sefaria’s implementation leaves much to be desired.

A major aspect that can be improved upon is the number of entities recognized, as well as the information provided for each recognized entity.

For all of this, as well as for definitions (discussed previously) and “Wikification” (discussed next), a customized, robust semantic identification and disambiguation script will be required to identify the correct semantic sense.

As offered by Sefaria, Scaife, and Digital Dante, see below. However, the former two have blocks of text; it's desirable to have further intelligent splitting into “paragraphs” by clause, at least as an option; see earlier.

Bolding used by ed. Steinsaltz to differentiate between translation vs. gloss/interpretation; while italics indicates a transliteration, as is standard.

Sefaria has customization options, but there's much that can be improved upon, with current tools.

As well as breadcrumbs. See my mock-up in Part 2.

Compare Scaife’s navigation UI, using horizontal bars, see below.

Important models for splitting into named sugyot are Prof. Menachem Katz’s “Menachem’s Notebooks” (מחברות מנחמיות - see citation in a later note) and the volumes of the Talmud HaIgud, under the editorship of Prof. Shamma Friedman. (On this project, see my overview in my “Guide”; citation in a later note.)

It should be pointed out that the ed. Steinsaltz at Sefaria does have some form of splitting into sugyot, using the section symbol (‘§’).

However, I don’t believe that that splitting is especially useful, especially since it’s given with no title for the sugya, and it overall still seems fairly arbitrary.

In general, the topic of the “stream of consciousness” /associative style of the Bavli is a major one, that I hope to discuss further.

For now, see my recent piece where I explored this somewhat via a case study: Selection of Men for Intercalation of the Hebrew Calendar and Stories of Emotional Self-Sacrifice to Protect Others From Embarrassment (Sanhedrin 10b-11a); see there “Appendix 3 - Making Talmudic Aggadah More Accessible with AI: A Case Study”.

Compare Scaife's “Word List” box, see screenshots below.

Caching the major ones; see my list at my Academia page of top 100 relevant entries.

This can currently be done using word density per page as a proxy; I’ll demonstrate this in an upcoming piece.

Of course, the issue of hallucinations is a serious one (currently, and for the foreseeable future), so a prominent disclaimer will be in order.

I.e., shiurim, via text-to-speech and text-to-video.

See my notes on this in my piece “Pixel”, as well as various other pieces at my Academia page.

As an aside, one of the major challenges is getting non-English words to be pronounced correctly.

Compare Dicta and Sefaria search--while somewhat advanced, the options are still relatively limited, compared with what can be done relatively simply with regex.

Compare Scaife’s search, which is fairly advanced, while still having a straightforward UI.

Other ideas for search: by tradent/speaker/Tanna/Amora; linguistic search aware of Hebrew/Aramaic roots and morphology; by topic; or by halakhic vs. aggadic content.

Compare Avi Shmidman’s work on programmatic disambiguation and spell check for rabbinic Hebrew using machine learning (BEREL), and the recent work of Satlow et al. extending that to comparing word senses between the Talmud Bavli and Yerushalmi.

This doesn’t currently seem to be relevant.

Compare discussions on reasoning.

However, some claim that OpenAI’s recently released o3 model may be capable of true reasoning.

This includes a number of projects of Katz’s, that he’s worked on together with Hillel Gershuni:

The monumental “Hachi Garsinan” website, which gathers all the major versions of the Talmud Bavli, and has already been a fundamental tool in this field for many years. See my overview of this resource in my “Guide to Online Resources for Scholarly Jewish Study and Research” (at my Academia page), p. 18.

Katz’s formatting of Talmud Bavli. See the links listed in my “Guide”, ibid., p. 15. On that, see my short review in my piece “Pixel” (cited earlier), as well as pieces a while ago where I made initial attempts to programmatically emulate it.

Katz’s ongoing monumental project on Talmud Yerushalmi (I plan on reviewing that project soon).

https://www.talmudyerushalmi.com/

See some of his other relevant previous work here, relating to Talmud and digital humanities:

https://www.talmudyerushalmi.com/resources

Especially see a combined splitting into sugyot of Bavli tractate Yevamot:
https://assets.talmudyerushalmi.com/documents/research/bavli_yevamot.pdf

Bavli tractate Ketubot:

https://assets.talmudyerushalmi.com/documents/research/bavli_ketubbot.pdf

See also his work on Tosefta tractate Yevamot:
https://assets.talmudyerushalmi.com/documents/research/tosefta_yevamot.pdf
