Improved Concordance of Biblical Figures in the Talmud

Feb 08, 2026

I built an improved concordance — essentially an index — that maps over 400 Biblical figures to the specific Talmud passages where they appear. The result is a 446-page reference document, available as a PDF. Uploaded to my Academia page, here. Note that this is a new and improved version of my previous effort at this, see my previous discussion in “Biblical Figures in the Talmud: Counts, Contexts, and Index”.1

The major improvement of this version is:

It also includes ed. Steinsaltz commentary (i.e. non-bolded)
It uses the entire natural section as context, following the segmentation of Steinsaltz, rather than the earlier approach, which relied on a fixed number of words before and after.

Screenshot from there, for example (page 6):

How It Works

I started with two inputs:

A gazetteer of 531 biblical figures that appear in the Talmud (in TXT format), from Aaron to Zophar the Naamathite. This includes major figures like Abraham, Moses, and David, but also less prominent names — Achsah, Barzillai the Gileadite, Zelophehad’s daughters. Many names appear in multiple variant forms (e.g., “Joab,” “Joab ben Zeruiah,” “Joab, son of Zeruiah”), and each variant gets its own entry.
The full text of the Steinsaltz Talmud in English (in CSV format), covering 81,794 individual sections across all tractates.

The concordance matches each biblical name against the bolded text in each Talmud section. In the Steinsaltz edition, bolded text represents the translated Talmudic source text itself — as opposed to the surrounding commentary and explanation. This means we’re finding places where the Talmud itself names a biblical figure, not just where the Steinsaltz commentary happens to mention one.2

For each name, I record every section where it appears in the source text, and display up to five representative passages. The full count of appearances is always shown, so you can see at a glance whether a name comes up twice or two hundred times.

What You’ll Find

The concordance covers 423 names that appear at least once in the Talmud’s own words (or in verses explicitly quoted). Some results are expected — Moses appears constantly, as do Abraham, David, and Solomon. Others are less obvious. You might not have guessed how often the Talmud circles back to Balaam, or how many different contexts bring up Jephthah.

Each entry shows the passage in full, preserving the formatting of the Steinsaltz translation with its mix of source text (in bold) and explanatory commentary. Every citation links directly to the relevant page on ChavrutAI, so you can jump straight to the full context and continue learning from there.

What It’s For

This is a reference tool — something you can browse, flip through, or consult when you want to see how a particular biblical figure shows up in rabbinic discussion. For anyone who enjoys the Talmud’s habit of pulling a biblical figure into an unexpected argument, it’s a pleasant thing to browse.

Limitations

A few things this project does not do:

It only catches names that appear in the bolded source text of the Steinsaltz translation. If a biblical figure is discussed in the commentary but not named in the Talmud’s own words, that passage won’t appear.
Name matching is based on exact string matching against a predefined list. If the Talmud refers to someone by a nickname or epithet that isn’t in our gazetteer, we’ll miss it.
The five-passage limit per entry is a practical choice. For frequently mentioned figures like Moses or David, the concordance shows a small sample of a much larger set of appearances.

Appendix - Technical

The full code and input texts are freely available at my Github.

The project consists of two Python scripts that run sequentially:

build_concordance.py — Reads the Talmud CSV and the name gazetteer, performs matching, and outputs an HTML concordance.

Input: A CSV file (81,794 rows, one per Talmud section) with rich HTML formatting, and a plain-text gazetteer of 531 biblical names.
Encoding: The source CSV uses Windows-1252 encoding. Literal `?` characters in the data serve as placeholders for the Hebrew letter het(`ḥ`) and are replaced with the proper Unicode diacritics (U+1E25 lowercase, U+1E24 uppercase).
Matching strategy: For each row, bold text is extracted from <b> and <strong> tags. This text is tokenized into words and all n-grams (up to n=10, matching the longest multi-word name) are generated as a lowercased set. The full set of lowercased names is intersected with each row’s n-gram set — an O(1) lookup per name, making the entire scan complete in about 25 seconds across 81K rows.
Output: A self-contained HTML file with alphabetical navigation, highlighted name occurrences within bold text, ChavrutAI hyperlinks for each citation, and text substitutions applied in text nodes only (preserving HTML structure).

html_to_pdf.py — Converts the HTML concordance to a paginated PDF with running headers.

Pass 1: Playwright (headless Chromium) renders the HTML to a base PDF. Print-specific CSS is injected to enforce one entry per page, remove colored backgrounds, and compress text density (9pt body, 8.5pt content, tighter line-height).
Pass 2: pdfplumber extracts text from each page of the base PDF and identifies which name entry appears first on each page by matching against the known list of entry names from the HTML.
Pass 3: reportlab generates a transparent overlay PDF with stamped headers (centered title + entry name, right-aligned page number, decorative rule) on each page. pypdf merges the overlay onto the base PDF.

See especially in the intro and footnotes there.

And see also my recent two-part series based on that piece, on Talmudic etymologies of the names of Biblical figures, final part here.

And see also at ChavrutAI my general index of “Biblical Citations in the Talmud”.

As an aside, unrelated to the topic in the current piece, I’ve made a number of recent updates to ChavrutAI, see the Changelog page, section “February 2026”:

Talmud Text Processing Improvements:

Added 30 cardinal number conversions for large and compound numbers (e.g., “three hundred and sixty-five thousand” → “365,000”, “forty and two thousand three hundred and sixty” → “42,360”, “Five thousand eight hundred and eighty-eight” → “5,888”)
Added ordinal mappings for compound ordinals (e.g., “six hundred and first” → “601st”) and fractional ordinals (”two hundred fifty-sixth” → “1/256th”)
Removed ambiguous ordinal mappings (”hundredth”, “one hundredth”, “two-hundredth”) that could be either ordinal or fractional depending on context

Sugya Viewer Export Options:

Added export buttons for Markdown (.md) and HTML (.html) formats
HTML export retains rich formatting (bold, italics, RTL direction)
Markdown export preserves Hebrew text as bold for readability
Useful for uploading text to chatbot assistants that don’t retain formatting from copy/paste

AI Chatbot Improvements:

Chatbot responses are now more direct and specific, avoiding vague filler phrases
Removed meta-commentary from responses (e.g., “further exploration might be needed”)
Expanded chat input to a multi-line text box (4 rows) for longer questions
Press Enter to send, Shift+Enter for new line

Note that the gazetteer is restricted to names that occur at least once in the Talmud outside of direct biblical quotations. By contrast, the concordance script also scans names within direct quotations. As a result, any name that appears exclusively in quoted biblical verses will be absent from the concordance.

Discussion about this post

Ready for more?