Finding Rare Voices in the Talmud: A Quantitative Analysis and Concordance
Over the past several years, I’ve been working on a project that sits at the intersection of Jewish learning and computational text analysis.1 I spend a lot of time with the Talmud, and I’ve always been struck by how many individual figures appear in its pages. Some are well known, like Rabbi Akiva or Rava, but many more flicker onto the stage just once or twice. They might teach a single halakha, ask a question, or appear in a story, and then vanish from the textual record.
I became curious about those brief voices. Who are all the minor figures in the Talmud, the ones who appear only once or twice? Could I gather them together and look at what they say, or in what context they appear? Such a list is of course tedious to produce manually. The Bavli is long, and names come in many forms: sometimes with titles, sometimes with patronymics, sometimes with nicknames or place-names attached.
So I decided to build a concordance automatically.
Outline
Finding Rare Voices in the Talmud: A Quantitative Analysis and Concordance
Appendix 1 - Excerpt
Appendix 2 - Technical
Input Data
Tokenization
Token-Sequence Matching
Extracting Context
Selecting Rare Names
Output
Finding Rare Voices in the Talmud: A Quantitative Analysis and Concordance
I’ve been assembling a list of names that appear in the Talmud, as transliterated by ed. Steinsaltz (at Sefaria).2 This list is a “gazetteer”: a simple text file with one name per line. It’s drawn from an existing project of mine, and it includes thousands of entries. Some are simple (“Rava”), some are more elaborate (“Rabbi Ḥanina bar Ḥama”), and others reflect the Talmud’s narrative range (“a certain butcher,” “the maidservant of Rabbi Yehuda HaNasi”).
Next, I took a plain-text version of the full Babylonian Talmud and wrote a script that would go through the text word by word, looking for every occurrence of every name in the gazetteer. Whenever the script found a match, it pulled out a small amount of context, five words on each side, so I could later read the appearance in something like its natural setting.
Once all the occurrences were counted, I filtered the list down to names that appear only once or twice in the entire text. This gives a different kind of portrait of the Talmud: not a map of the major voices, but a gallery of the quieter ones. Many of these small figures show up in brief narrative moments, while others appear in halakhic chains or as transmitters of a single teaching. It’s a reminder that the Talmud is not just the product of a handful of canonical sages, but a sprawling ecosystem of personalities, households, communities, and students whose words—sometimes just a few—were preserved.
This project is a way of listening more closely to the texture of the text. When a person speaks only once, the line becomes easier to overlook. But gathered together, these rare voices take on their own significance.
The final concordance is here:3
https://docs.google.com/document/d/1gBwgrLPdnMwVVsRLk-GXMufUghVyNbd8zW3q1QjlXuE/edit?usp=sharing
Appendix 1 - Excerpt
First ~100 rows of the text file:4
A certain Galilean
... as it is written: “” A certain Galilean taught before Rav Ḥisda: a ...
A certain Roman officer
... against Rabban Gamliel for execution. A certain Roman officer came and stood in the ...
A certain Sage
... it, he may move it. A certain Sage his first day raised an ...
A certain elderly woman
... will go and greet him. A certain elderly woman said to him: There is ...
A certain heretic named Sason
... as it is written: “” A certain heretic named Sason said to Rabbi Abbahu: You ...
A certain magus
... death, are they all appeased? A certain magus said to Ameimar: From your ...
A certain matron
... were going on a boat. A certain matron [<i>matronita</i>] said to them: Let ...
... were traveling on a boat. A certain matron said to them: Seat me ...
A certain noblewoman
... between them, without touching them, A certain noblewoman [<i>matronita</i>] said to them: Your ...
A certain philosopher
... to sign against My will, A certain philosopher asked Rabban Gamliel: It is ...
A certain pregnant woman
... of that man is redder, A certain pregnant woman smelled came before Rabbi He ...
Abaye bar Abba
... Rather, apart from that as Abaye bar Abba and Rav Ḥinnana bar Abaye ...
Abaye bar Huna
... “” read, about them “” Abaye bar Huna says Rav Ḥama bar Gurya ...
Abaye bar Ravin
... immersed themselves during the day Abaye bar Ravin and Rav Ḥanina bar Avin ...
Abaye, son of Rabbi Abbahu
... consent? with her consent. As Abaye, son of Rabbi Abbahu, taught: “” teaches that he ...
Abaye’s mother
... in accordance with Rabbi Yoḥanan, Abaye’s mother, prepared for him, and he ...
... tradition suffering silence and prayer. Abaye’s mother raised a lamb to accompany ...
Abaye’s tenant farmer
... to marry after fifteen months. Abaye’s tenant farmer came before Abaye He said ...
Abaye’s wife, Ḥoma
... yes, at the outset, no, Abaye’s wife, Ḥoma, came before Rava She said ...
Abba Binyamin
... of wine. It was taught Abba Binyamin say: All of my life ...
... written there, “” was taught Abba Binyamin says: If two enter in ...
Abba Bira’a, son of Rabbi Eliezer ben Ya’akov
... hear, as it is taught Abba Bira’a, son of Rabbi Eliezer ben Ya’akov, says his hand upon the ...
Abba Elazar ben Gimmel
... Gimmel, as it is taught Abba Elazar ben Gimmel says: “” A tenth part ...
... said: Measures. whose is this Abba Elazar ben Gimmel, as it is taught Abba ...
Abba Gurya
... says in the name of Abba Gurya: A person may not teach ...
Abba Guryan of Tzadyan
... and have lost my livelihood. Abba Guryan of Tzadyan says in the name of ...
Abba Mar bar Rav Ashi
... before What is the sides Abba Mar bar Rav Ashi said: Come hear as we ...
Abba Mar, son of Rav Pappa
... for talking similar to this Abba Mar, son of Rav Pappa, would take pans of wax ...
Abba Mari, the Exilarch
... this involving the wife of Abba Mari, the Exilarch, who quarreled Rav Naḥman bar ...
Appendix 2 - Technical
Below is a brief description of how I generated the list and concordance, for readers interested in the computational side.
Input Data
I used two locally stored plain-text sources:
Full Babylonian Talmud text (≈13.8 million characters), plain UTF-8 text.
talmud_names_gazetteer.txt, a gazetteer of ~3.4k entries, one per line.
This list contains names of individuals, titles, nicknames, professions, and descriptive referents found in rabbinic literature.
The goal was to find all occurrences of these names in the Talmud text.
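As a rough illustration of the loading step, a gazetteer with one name per line can be read as sketched below. The helper name `load_gazetteer`, and the choice to skip blank lines and drop duplicate entries, are assumptions made for this example rather than details of the original script.

```python
from typing import Iterable, List


def load_gazetteer(lines: Iterable[str]) -> List[str]:
    """Return unique, non-empty names in first-seen order (one name per line)."""
    seen = set()
    names = []
    for line in lines:
        name = line.strip()
        # Assumption: blank lines and repeated entries are ignored.
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Hypothetical usage with the file named above:
# with open("talmud_names_gazetteer.txt", encoding="utf-8") as f:
#     names = load_gazetteer(f)
sample = ["Rava\n", "\n", "Rabbi Ḥanina bar Ḥama\n", "Rava\n"]
print(load_gazetteer(sample))
```

Because the function accepts any iterable of lines, it works equally on an open file handle or on a small in-memory list for testing.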
Tokenization
I tokenized the Talmud text using a simple whitespace-based split combined with basic punctuation stripping:
raw_tokens = text.split()
norm_tokens = [strip_punct(tok) for tok in raw_tokens]
This produced roughly 2.5 million tokens.
The gazetteer names were also tokenized into normalized token sequences, allowing multi-word names like:
["Rabbi", "Shimon", "ben", "Lakish"]
["Rav", "Shmuel"]
["Ben", "Bag", "Bag"]
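The `strip_punct` helper is not shown above. One plausible version, sketched here under the assumption that only leading and trailing punctuation is removed (so that the apostrophe inside a token like "Abaye’s" survives), looks like this. The exact character set in `EDGE_PUNCT` and the wrapper function `tokenize` are illustrative guesses.

```python
import string

# Assumption: ASCII punctuation plus curly quotes and the ellipsis character
# are stripped from token edges only; interior marks are left untouched.
EDGE_PUNCT = string.punctuation + "“”‘’…"


def strip_punct(tok: str) -> str:
    """Remove punctuation from both ends of a token."""
    return tok.strip(EDGE_PUNCT)


def tokenize(text: str):
    """Return (raw_tokens, norm_tokens), two lists aligned index by index."""
    raw_tokens = text.split()
    norm_tokens = [strip_punct(tok) for tok in raw_tokens]
    return raw_tokens, norm_tokens


raw, norm = tokenize("Abaye’s wife, Ḥoma, came before Rava")
print(norm)

# Gazetteer names go through the same normalization:
print(tokenize("Rabbi Shimon ben Lakish")[1])
```

Keeping the raw and normalized lists the same length matters later: a match found at an index in the normalized tokens can be displayed using the raw tokens at the same index, punctuation included.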
Token-Sequence Matching
For each token position i in norm_tokens, I checked whether any gazetteer name sequence began at that position.
I indexed gazetteer entries by their first token to keep the lookup efficient:
name_patterns_by_first[first_token].append((norm_seq, orig_name, length))
Matches were found via direct equality on normalized token slices:
if norm_tokens[i:i+L] == norm_seq:
    occurrences[name].append((i, L))
This approach avoids complex regex overhead and ensures deterministic matching.
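Put together, the first-token index and the slice comparison can be sketched as a single function. The name `find_occurrences` is hypothetical, and for brevity the example normalizes names with a plain whitespace split; it also records overlapping matches (a short name inside a longer one), since the description above does not say whether the original script preferred the longest match.

```python
from collections import defaultdict


def find_occurrences(norm_tokens, names):
    """Map each matched name to a list of (start_index, length) pairs."""
    # Index name patterns by first token so each position checks few candidates.
    name_patterns_by_first = defaultdict(list)
    for orig_name in names:
        norm_seq = orig_name.split()  # assumption: names are already normalized
        if norm_seq:
            name_patterns_by_first[norm_seq[0]].append(
                (norm_seq, orig_name, len(norm_seq))
            )

    occurrences = defaultdict(list)
    for i, tok in enumerate(norm_tokens):
        for norm_seq, orig_name, L in name_patterns_by_first.get(tok, ()):
            # Direct equality on the token slice starting at position i.
            if norm_tokens[i:i + L] == norm_seq:
                occurrences[orig_name].append((i, L))
    return dict(occurrences)


tokens = "Abaye bar Abba and Rav Ḥinnana bar Abaye".split()
hits = find_occurrences(tokens, ["Abaye bar Abba", "Abaye", "Rav Ḥinnana"])
print(hits)
```

In this toy input, "Abaye" is counted both on its own and as the first word of "Abaye bar Abba". A real run would need a policy for such overlaps, because it changes which names fall under the rare-occurrence threshold.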
Extracting Context
For each match I extracted a fixed-window concordance:
left = max(0, pos - 5)
right = min(len(raw_tokens), pos + L + 5)
snippet = " ".join(raw_tokens[left:right])
Each snippet shows roughly five words on each side of the name.
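The window arithmetic can be wrapped in a small function, sketched here with a hypothetical name (`extract_context`) and a `window` parameter added for illustration. The sample tokens are taken from the "A certain Roman officer" snippet in Appendix 1.

```python
def extract_context(raw_tokens, pos, length, window=5):
    """Join the matched name with up to `window` tokens on each side."""
    left = max(0, pos - window)                           # clamp at text start
    right = min(len(raw_tokens), pos + length + window)   # clamp at text end
    return " ".join(raw_tokens[left:right])


raw_tokens = (
    "against Rabban Gamliel for execution. "
    "A certain Roman officer came and stood in the"
).split()

# "A certain Roman officer" starts at index 5 and is 4 tokens long.
print(extract_context(raw_tokens, 5, 4, window=2))
print(extract_context(raw_tokens, 5, 4))
```

The `max` and `min` clamps are what make snippets near the very beginning or end of the text shorter than the full window, which is why the description says "roughly" five words per side.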
Selecting Rare Names
Finally, I kept only the names that appeared a total of 1 or 2 times:
if 1 <= len(occurrences[name]) <= 2:
    rare_names.append(name)  # rare_names: illustrative list; context snippets are then collected for each
This yielded 2,367 concordance entries for names meeting the “rare occurrence” threshold.
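The threshold filter can be expressed as a dictionary comprehension. In this sketch the function name `select_rare` and its `min_count`/`max_count` parameters are illustrative, and the positions in the sample data are made up purely to exercise the filter.

```python
def select_rare(occurrences, min_count=1, max_count=2):
    """Keep only names whose number of matches lies within the given bounds."""
    return {
        name: hits
        for name, hits in occurrences.items()
        if min_count <= len(hits) <= max_count
    }


# Made-up (start_index, length) pairs, for illustration only.
occurrences = {
    "Rava": [(10, 1), (50, 1), (90, 1)],
    "Abba Gurya": [(120, 2)],
    "A certain matron": [(200, 3), (340, 3)],
}
rare = select_rare(occurrences)
print(sorted(rare))
```

Names with zero matches never enter `occurrences` in the first place, so the lower bound of 1 mainly documents intent.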
Output
The script writes a structured concordance file (talmud_rare_names_concordance.txt), grouping entries by name, with clear separation lines and all contexts preserved.
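A minimal sketch of the writing step follows. The layout (name on its own line, then each snippet wrapped in "..." as in the Appendix 1 excerpt) is inferred from that excerpt; the separator string and the function name `format_concordance` are assumptions. Plain `sorted()` is used because the excerpt's order, with "A certain Roman officer" before "A certain elderly woman", is consistent with simple code-point ordering, though the original script may sort differently.

```python
def format_concordance(snippets_by_name, separator="-" * 40):
    """Render {name: [snippet, ...]} as text, one block per name."""
    blocks = []
    for name in sorted(snippets_by_name):  # code-point order (assumption)
        lines = [name] + [f"... {s} ..." for s in snippets_by_name[name]]
        blocks.append("\n".join(lines))
    return f"\n{separator}\n".join(blocks) + "\n"


sample = {
    "Abba Gurya": ["says in the name of Abba Gurya: A person may not teach"],
    "A certain magus": ["death, are they all appeased? A certain magus said to Ameimar: From your"],
}
text = format_concordance(sample)
print(text)

# Hypothetical usage with the output file named above:
# with open("talmud_rare_names_concordance.txt", "w", encoding="utf-8") as f:
#     f.write(text)
```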
For my general research on Talmudic names, see my Academia page, section “Talmud Names”. See especially ibid., “From Abba to Zebedee: A Comprehensive Survey of Naming Conventions in the Mishnah, Talmud, and Late Antique Midrash”.
And see my three-part summary of that research on this blog: “What’s in a Talmudic Name? Unpacking the World of Personal Names in Talmudic Literature”, final part here.
Compare this previous piece of mine on the general topic of quantitative analysis of names in the Talmud: Scalable Natural Language Processing (NLP) for Named Entities, Topics, and Tags in the Talmudic Corpus (May 04, 2025)
For the actual list of Talmud names, see my GitHub: talmud_names_gazetteer.txt
For Bible names that were excluded, see this list: bible_names_gazetteer.txt
See also the list of Talmud place names in the same repo, which I recently updated: talmud_toponyms_gazetteer.txt
For the base Talmud text, see my discussion at: How To Download Sefaria’s Entire Talmud At Once (Apr 09, 2024). For the Hebrew text for this analysis, I stripped out ed. Steinsaltz interpretation (=non-bolded text), and biblical verses (=text within quotation marks).
Of course, it is likely that many of these names are mistakes (e.g., scribal errors or errors in transmission) or variants of other names. Sorting these out requires a lot of further work.
I also uploaded it to my Academia page: “Talmud Rare Names Concordance”.
Note: the exact location of each of the passages can be found by searching in Sefaria, and filtering for “Talmud”, set for “Exact match”.
For example, searching for the first passage, “A certain Galilean”, finds the exact source, in Sanhedrin.113a.15:
דרש ההוא גלילאה קמיה דרב חסדא:
משל דאליהו, למה הדבר דומה?
A certain Galilean (ההוא גלילאה) taught before Rav Ḥisda:
There is a parable for the actions of Elijah; to what is this matter comparable?


Regarding the topic of the article, I found myself nodding along with the 'names come in many forms' challenge, and I'm genuinely curious whether the automated process ever felt like wrangling a particularly stubborn regex, because that's what I picture.