Some Technical Notes on Sefaria Relating to Talmud: The Missing API Manual, ‘Grounding’ LLMs, and Emulating and Enhancing the Outlines of Talmud Tractates Chapters and Pages
In Part 1 of this piece, I discuss Sefaria’s API, as it relates to fetching text; especially the original Hebrew text and translations of Talmud, Mishnah, and Bible.
Sefaria has made their API remarkably open. Unlike most APIs that require authentication keys or complex setup, Sefaria's can be accessed directly through any web browser.
This accessibility is very helpful for a number of reasons. One major implication is that AI systems can query these texts just as easily as humans can. The implications are profound: instead of AI systems vaguely trying to recall (and often hallucinating) training data about Jewish texts, they can access the authoritative sources directly.1
In Part 2 of this piece, unrelated to previous, I briefly note my recent work on emulating and Enhancing Sefaria’s Outline Talmud Chapters and Folios (as part of my prep work for ChavrutAI).
Outline
Outline 1
Part 1: Sefaria API as it relates to fetching the original Hebrew text and translations of Talmud, Mishnah, and Bible.
How the API Actually Works
Critical Detail: Retrieving micro-level Talmud/Mishna Sections and Bible verses (vs. macro-level Talmud pages/chapters)
Another Critical Detail: Zero-Indexed Sections
Real-World Applications; What is "Grounding" and Why Does It Matter?
Part 2: Emulating and Enhancing Sefaria’s Outline of Talmud Chapters and Folios
Outlining Talmud Chapters and Folios: Enhancing the Interface
Part 1: Sefaria API as it relates to fetching the original Hebrew text and translations of Talmud, Mishnah, and Bible.
Consider this practical example: when an AI system needs to reference a specific Talmudic passage, it can query https://www.sefaria.org/api/texts/Berakhot.2b and receive both the original Hebrew text and English translation in a structured format. This eliminates the guesswork and potential errors that come from relying on pre-trained knowledge.
How the API Actually Works
The basic technical structure of Sefaria's API is straightforward: https://www.sefaria.org/api/texts/{reference}.{Talmud Page/Mishnah or Bible Chapter}
, where the reference follows specific patterns for different types of texts.
For Talmud texts, the format is {Tractate}.{Page}
. Examples include:
Berakhot.2a
for Tractate Berakhot, page 2aShabbat.15b
for Tractate Shabbat, page 15bSanhedrin.99a
for Tractate Sanhedrin, page 99a
For Bible texts, use {Book}.{Chapter}:
Genesis.2
for Genesis chapter 2Exodus.15
for Exodus chapter 15
For Mishnah texts, the pattern is Mishnah_{Tractate}.{Chapter}
:
Mishnah_Peah.2
for Mishnah Peah, chapter 2
The API returns a JSON object with two key arrays: text contains the English translation (for Talmud, this is the Steinsaltz translation and commentary), while he contains the original Hebrew text with vowel points (nikud). These arrays correspond by index—element 0 in each represents the same passage.
Critical Detail: Retrieving micro-level Talmud/Mishna Sections and Bible verses (vs. macro-level Talmud pages/chapters)
The URL itself doesn't change for different sections—https://www.sefaria.org/api/texts/Berakhot.2b returns the entire page. You extract specific sections from the returned arrays, not by modifying the URL.
This is fairly confusing (AI chatbots often make this mistake), since the Sefaria website allows linking to a specific section by adding an additional parameter, e.g. https://www.sefaria.org/Berakhot.2b.2 (notice: this is the main website, not the API; the API is accessed via the path ‘/api/texts/
’); however, this does not work in the API URLs.
Screenshot, in Chromium desktop browser, of first two section of English:
Screenshot of first two sections of Hebrew:
Screenshots of prettified JSON response:2
So, key fields (of the JSON object response):
text
- Array of English (translation) text segmentshe
- Array of (original) Hebrew text segments
Another Critical Detail: Zero-Indexed Sections
The API returns all sections of a page in arrays, but these arrays are zero-indexed like most programming structures.
This means that to access what's referred to on the Sefaria website as (for example) "section/verse 2", you need to slice text[1]
from the response.
Real-World Applications; What is "Grounding" and Why Does It Matter?
The concept of "grounding" has become central to making AI systems more reliable and trustworthy. See this definition of “grounding” , in this 2023 article (“Grounding LLMs | Microsoft Community Hub”):
Grounding is the process of using large language models (LLMs) with information that is use-case specific, relevant, and not available as part of the LLM's trained knowledge. It is crucial for ensuring the quality, accuracy, and relevance of the generated output.
While LLMs come with a vast amount of knowledge already, this knowledge is limited and not tailored to specific use-cases. To obtain accurate and relevant output, we must provide LLMs with the necessary information. In other words, we need to "ground" the models in the context of our specific use-case.
For example, see this convo (in a ‘Claude’ project, where I gave exact documentation for how to properly access the Sefaria API, as I detailed above):
https://claude.ai/share/adaf81b0-fb64-4086-b4a4-9c06f36704c9
In simpler terms: instead of an AI system trying to remember what it learned about the Talmud during training (with all the potential for errors that entails), it can look up the actual text in real-time from an authoritative source. This creates a form of computational citation where every claim can be traced back to its source.
This is particularly important for scholarship, where precision matters enormously. A misquoted passage can completely change the meaning. When AI systems can access Sefaria directly, they can provide not just summaries or paraphrases, but actual verbatim quotations with proper attribution.
The difference is transformative. Instead of asking an AI "What does the Talmud say about X?" and hoping for an accurate response, you can have an AI that retrieves the relevant passages, presents them in context, and allows you to verify the information yourself. The AI becomes a research tool rather than just a generator of potentially unreliable information.
The practical applications of this grounded approach are already emerging. AI systems can now:
Verify citations: When someone claims a text says something specific, an AI can check the actual source
Provide context: Instead of isolated quotes, AI can retrieve surrounding passages that clarify meaning
Cross-reference texts: AI can quickly locate parallel discussions across different tractates or books
Maintain accuracy: By accessing authoritative sources rather than relying on training data, AI reduces the risk of propagating errors
This enables AI research assistants that can retrieve and analyze passages with scholarly precision.
Part 2: Emulating and Enhancing Sefaria’s Outline of Talmud Chapters and Folios
Outlining Talmud Chapters and Folios: Enhancing the Interface
This is a continuation of my research discussed in yesterday’s piece: “Mapping the High-Level Hierarchical Structure of Classical Hebrew Texts: A Case Study of Graphing Talmudic Chapters and Word Counts“.3
Sefaria’s outline of tractate Shabbat: https://www.sefaria.org.il/Shabbat.
Hebrew:
English:
Screenshot of my mock-up: tractate Shabbat (note: the data may not be fully accurate, it hasn’t been fully checked):
Sefaria’s visualizations are similar to mine, but I believe that my visualization has certain advantages (this is a mock-up; simply food for thought):
Sefaria shows only chapter names with no quantitative information. Our tool explicitly displays folio counts and page counts for each chapter ("18 folios • 36 pages")
Interestingly, Sefaria’s Hebrew Outline doesn’t include chapter numbers (which the traditional printed edition has, at the top of every page (Chapter 1,2, etc), only Sefaria’ English Outline does.4
Possible other improvements would be adding a synopsis of each chapter; Sefaria does this for each tractate, under “About This Text”.
Using ‘web search’, which is a relatively new feature for many AI chatbots. For example, see this announcement from Anthropic on Mar 20, 2025: “Claude can now search the web“. As noted there, ‘web search’ only fully became available on all Claude plans on May 27 (i.e. a week ago).
Another good source for static Talmud text is Hebrew Wikisource.
To clarify here: Sefaria itself loads its texts by querying its own API through Javascript, which often results in noticeable delays and sluggish performance. For that reason (it being Javascript), LLM ‘web search’ can't access the content (and often ends up finding Sefaria source sheets, to quote from).
Unfortunately, even after extensive tweaking of prompts and custom instructions in the chatbot interfaces, Claude and Chatgpt, even the most advanced models, won’t consistently query these sources.
However, it’s still early days, and I believe this will become feasible in the next year or two, in the regular chatbots.
I haven’t extensively tested this via Anthropic and OpenAI api.
Copy-pasting the text into here: https://jsonformatter.org/json-viewer.
Compare also how I process Sefaria’s API Talmud text here: https://github.com/EzraBrand/talmud-nlp-indexer/tree/main/data.
Same as there, the foundation of this visualization project is my previous analysis of the 300+ chapters of the Babylonian Talmud: "Bavli By the Numbers: Word Counts of All Chapters in Talmud Bavli".
It’s worth noting that previous to the first printed edition of Talmud (the 16th century Venice-Bomberg edition), all citations were to chapter names/numbers.
As opposed to citations to page numbers (2a, 2b, etc), that are based on the pagination of the first edition, as I’ve noted a number of times in related pieces.