The Rabbis of Talmudic Eretz Yisrael: Automated Personal Name Extraction from the Talmud Yerushalmi
As with the Bavli, I wanted to know: who are the most-cited figures in the Yerushalmi, and how does that compare to the Bavli?1 The same automated approach I used for the Bavli — a regex that matches names in their surrounding context of honorifics and patronymics — gave the answer, with additional extensive modifications.
The Data
I used Heinrich Guggenheimer’s English translation of the Jerusalem Talmud, published by De Gruyter between 1999 and 2015, which is available open-access on Sefaria.2 The full text is downloadable in bulk from Sefaria’s public export repository,3 organized by tractate. I wrote a script that downloads one JSON file per tractate, strips HTML formatting, and runs the regex against every segment of translated text.
The corpus covers all 39 tractates of the Yerushalmi: 12,243 text segments in total. The script ran in under a minute.
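The clean-up step mentioned above (stripping HTML before matching) might look roughly like the following. This is a minimal sketch, not the script's actual code: the function name `stripHtml` is hypothetical, and whether the real script decodes or simply drops entities is not stated, so decoding a handful of common ones here is an assumption.

```typescript
// Hypothetical helper: remove tags and decode a few common entities.
// Assumption: the real script may handle entities differently.
function stripHtml(segment: string): string {
  return segment
    .replace(/<[^>]*>/g, " ")   // drop tags such as <b>, <i>, <sup ...>
    .replace(/&nbsp;/g, " ")
    .replace(/&quot;/g, '"')
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")     // decode &amp; last to avoid double-decoding
    .replace(/\s+/g, " ")       // collapse the whitespace left behind
    .trim();
}

const cleaned = stripHtml("<b>Rebbi Abbahu</b> said:&nbsp;<i>it is pure</i>");
console.log(cleaned);
```

Replacing tags with a space rather than an empty string avoids gluing two words together when a tag sits between them.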
The Results
Across all 39 tractates, the script found ~3k distinct name strings representing ~50k total name occurrences.
Before discussing who’s at the top, a note on counting: one sage can appear under many surface forms. “Rebbi Joḥanan” and “R. Joḥanan” are the same person, and Guggenheimer spells Zeira’s name in at least four different ways across the corpus (“Zeïra”, “Zeˋira”, “Ze’ira”, “Ze`ira”).4
The top 20 most-cited figures:5
The dominant figure is R’ Yoḥanan, the major third-century Amora who headed the Tiberias academy and whose teachings saturate the Yerushalmi (as well as the Bavli).
R’ Ze’ira — counted across his large number of spelling variants — totals roughly 1,650 appearances, which would place him in the top five once consolidated.
Other figures and groups that are far more prominent in the Yerushalmi than in the Bavli are R’ Yonah (רבי יונה), R’ Ḥaggai (רב חגא), and “the rabbis of Caesarea”.6
Outliers
Mononyms, meaning sages known by a single name or title (e.g., Shmuel, Rav [=Abba Aricha], “Rabbi” [=Rabbi Yehuda HaNasi]), are problematic with this method, as I’ve discussed elsewhere.7 The regex works by detecting a name in context, requiring either an honorific prefix (“Rebbi”, “R.”, “Rabban”) or a patronymic connector (“ben”, “bar”). So a name like “Rav” in running text is undetectable without additional logic.8
The full list of 3,147 name strings, with occurrence counts and example references, is in the appendix. The full data is also available as a CSV on GitHub.
Appendix 1: Top 100 Most-Mentioned Rabbis in Talmud Yerushalmi, by Count
The names are normalized, broadly following the Steinsaltz Talmud translation’s conventions for transliterating Hebrew names, and rows that share a normalized canonical name are combined.9
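Combining rows by canonical name can be pictured as a lookup table plus a sum. The sketch below is illustrative only: the spelling variants come from the examples in the note on normalization, but the names `CANONICAL` and `mergeByCanonical` are hypothetical, and the counts are made-up placeholders, not results from the corpus.

```typescript
const YOHANAN = "R’ Yoḥanan";
const YEHUDAH = "R’ Yehudah";

// Raw surface form -> canonical name. Keys are NFC-normalized so that
// composed and decomposed spellings of "ḥ" hit the same entry.
const CANONICAL = new Map<string, string>(
  ([
    ["Rebbi Joḥanan", YOHANAN],
    ["R. Joḥanan", YOHANAN],
    ["Rebbi Yoḥanan", YOHANAN],
    ["R. Yoḥanan", YOHANAN],
    ["Rebbi Jehudah", YEHUDAH],
    ["R. Jehudah", YEHUDAH],
  ] as [string, string][]).map(([raw, canon]): [string, string] => [raw.normalize("NFC"), canon]),
);

// Sum counts per canonical name; unknown strings keep their raw form.
function mergeByCanonical(rawCounts: Map<string, number>): Map<string, number> {
  const merged = new Map<string, number>();
  for (const [raw, count] of rawCounts) {
    const key = CANONICAL.get(raw.normalize("NFC")) ?? raw;
    merged.set(key, (merged.get(key) ?? 0) + count);
  }
  return merged;
}

// Placeholder counts, for illustration only.
const merged = mergeByCanonical(
  new Map<string, number>([
    ["Rebbi Joḥanan", 5],
    ["R. Joḥanan", 3],
    ["R. Jehudah", 2],
    ["Rebbi Abbahu", 1],
  ]),
);
console.log(merged);
```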
Appendix 2 - Technical
Relevant files, at the ChavrutAI Github repo:
/scripts/yerushalmi-names-redacted.txt (for manual QA: lets me spot-check that no major recurring names are missing)
The Script
The extraction script is at the ChavrutAI Github repo, at scripts/extract-yerushalmi-names.ts, written in TypeScript and run with npx tsx. It:
Reads shared/data/yerushalmi-shapes.json to know which tractates and chapters exist.
For each tractate, fetches the Guggenheimer JSON dump from the Sefaria-Export GCS bucket (gs://sefaria-export/json/Talmud/Yerushalmi/...), caching it locally.
Strips HTML tags and entities from each segment, strips double-quoted content (which in Guggenheimer typically marks Bible verse citations), and NFC-normalizes the Unicode.
Collects all regex matches from both patterns across the segment.
Applies greedy longest-match deduplication: all match spans are sorted by start position (ties broken by length descending), and overlapping spans are dropped. This prevents a match like “Simeon ben Laqish” and the sub-match “ben Laqish” from both being counted for the same textual occurrence.
Aggregates counts globally and per-tractate, recording up to three example references per name string.
Writes yerushalmi-names-results.json, yerushalmi-names-results.md, and yerushalmi-names-results.csv.
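The deduplication step in the list above can be sketched as follows. This is a minimal illustration of sort-then-drop-overlaps under stated assumptions, not the script's own code: the `MatchSpan` type, its field names, and the helpers `spanOf` and `dropOverlaps` are all hypothetical.

```typescript
// Hypothetical span type; field names are assumptions.
interface MatchSpan {
  start: number; // index of the first character of the match
  end: number;   // index one past the last character
  text: string;
}

// Sort by start position (longer span first on ties), then keep a span
// only if it begins at or after the end of the last span that was kept.
function dropOverlaps(spans: MatchSpan[]): MatchSpan[] {
  const sorted = [...spans].sort(
    (a, b) => a.start - b.start || (b.end - b.start) - (a.end - a.start),
  );
  const kept: MatchSpan[] = [];
  let lastEnd = -1;
  for (const span of sorted) {
    if (span.start >= lastEnd) {
      kept.push(span);
      lastEnd = span.end;
    }
  }
  return kept;
}

// Helper for the demo: locate a substring and turn it into a span.
function spanOf(segment: string, text: string): MatchSpan {
  const start = segment.indexOf(text);
  return { start, end: start + text.length, text };
}

const segment = "Rebbi Simeon ben Laqish said";
const keptTexts = dropOverlaps([
  spanOf(segment, "ben Laqish"),
  spanOf(segment, "Simeon ben Laqish"),
  spanOf(segment, "Rebbi Simeon ben Laqish"),
]).map((s) => s.text);
console.log(keptTexts);
```

In this demo only the span that starts earliest survives, so the occurrence is counted once rather than three times.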
The Regex
Two patterns are used, both adapted from the original Bavli patterns in my earlier blog post:
Pattern 1 (honorific-first): matches an opening honorific or relational phrase followed by a capitalized name, optionally followed by a connector and patronymic, and optionally a place name. Examples:
Rebbi Joḥanan
R. Simeon ben Laqish
the son of Rebbi Abbahu
Pattern 2 (name-first): matches a capitalized name followed by a patronymic connector and another name. Examples:
Simeon ben Laqish
Joḥanan bar Nappaḥa
The name token character class is: [A-Z + Latin Extended] [a-z + Latin Extended + apostrophe variants]+. This proved the most consequential part to get right (see Challenges below).
Guggenheimer-specific additions to the honorific list (absent from the original Bavli patterns):
“Rebbi”: used for Eretz Yisrael sages (vs. “Rabbi” for Babylonian ones in Steinsaltz)
“R.”: Guggenheimer’s standard abbreviation, appearing ~12,000 times
All relational variants: “son of Rebbi”, “bar Rebbi”, “daughter of Rebbi”, etc.
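To make the two patterns concrete, here is a heavily simplified sketch. It is an assumption-laden stand-in, not the patterns from the script: the constants `NAME`, `HONORIFIC`, and `CONNECTOR`, the Unicode ranges chosen for "Latin Extended", and the helper `findAll` are all hypothetical, and the relational phrases and optional place names are left out.

```typescript
// Rough stand-in for the name token: a capital letter, then lowercase
// letters, Latin Extended letters, or an apostrophe-like character.
const NAME = String.raw`[A-Z\u00C0-\u024F\u1E00-\u1EFF][a-z\u00C0-\u024F\u1E00-\u1EFF'\u2018\u2019\u02CB\u0060]+`;
const HONORIFIC = String.raw`(?:Rebbi|Rabban|R\.)`;
const CONNECTOR = String.raw`(?:ben|bar)`;

// Pattern 1 (honorific-first): honorific, name, optional patronymic.
const honorificFirst = new RegExp(
  `${HONORIFIC}\\s+${NAME}(?:\\s+${CONNECTOR}\\s+${NAME})?`,
  "gu",
);
// Pattern 2 (name-first): name, connector, name.
const nameFirst = new RegExp(`${NAME}\\s+${CONNECTOR}\\s+${NAME}`, "gu");

function findAll(pattern: RegExp, text: string): string[] {
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

const sample = "Rebbi Jo\u1E25anan said in the name of R. Simeon ben Laqish.";
const byHonorific = findAll(honorificFirst, sample);
const byPatronymic = findAll(nameFirst, sample);
// A bare mononym has neither an honorific nor a connector, so neither
// pattern finds anything in it.
const mononym = findAll(honorificFirst, "Rav said it is permitted");

console.log(byHonorific, byPatronymic, mononym);
```

Note that both patterns hit “Simeon ben Laqish” in the sample sentence, which is why overlapping matches have to be deduplicated afterwards.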
Challenges
1. Customizing honorifics. The original Bavli regex was tuned to the Steinsaltz translation, which uses “Rabbi” and “Rav”. Guggenheimer uses “Rebbi” for Eretz Yisrael sages and “R.” as a universal abbreviation. These two forms together account for over 50,000 occurrences in the Yerushalmi corpus.
2. Decomposed Unicode. Guggenheimer’s text on Sefaria stores the transliteration character ḥ (h-with-dot-below, U+1E25) in decomposed form: ASCII “h” followed by combining dot below (U+0323). Without NFC normalization, the regex character class [ḥ] would not match the decomposed sequence, and names like “Joḥanan” would be truncated to “Jo”. The fix was to apply .normalize('NFC') to each segment before matching.
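The effect of normalization can be seen in a small sketch. The regex here is a deliberately tiny stand-in (an assumption, not the script's pattern) whose class contains only ASCII lowercase letters plus the precomposed ḥ.

```typescript
// "h" followed by U+0323 (combining dot below): the decomposed form.
const decomposed = "Rebbi Jo\u0068\u0323anan";

// Toy name pattern: the class knows the precomposed U+1E25 but not the
// combining mark U+0323.
const toyName = /Rebbi [A-Z][a-z\u1E25]+/u;

// On the raw text the match stops at the combining mark, so the name is
// cut short.
const beforeNfc = decomposed.match(toyName)?.[0] ?? "";

// NFC composes "h" + U+0323 into the single code point U+1E25, and the
// whole name matches.
const afterNfc = decomposed.normalize("NFC").match(toyName)?.[0] ?? "";

console.log(beforeNfc.length, afterNfc.length);
```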
3. Four different apostrophes. Guggenheimer represents the Aramaic glottal stop (in names like Zeira, Ze’ira) using four distinct characters depending on where in the text it appears:10
U+2018 (left single quotation mark): Ze‘ira
U+02CB (modifier letter grave accent): Zeˋira
U+0060 (grave accent / backtick): Ze`ira
U+00EF (i-diaeresis / ï): Zeïra
Only the backtick (U+0060) is ASCII, and none of the four is matched by a plain [A-Za-z] letter class. The first three were being treated as non-name characters, causing “Rebbi Ze’ira” to match as “Rebbi Ze”, a truncated non-name that was appearing near the top of the frequency list with ~900 false occurrences. Adding all four variants to the character class resolved this.
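The truncation is easy to reproduce with a toy pattern. In this sketch (hypothetical patterns, not the script's), the narrow class is ASCII-only, so all four spellings are cut at “Ze”; the wide class adds a Latin Extended range and the three apostrophe-like characters.

```typescript
// The four spellings, written with explicit escapes so the code points
// are unambiguous.
const spellings = ["Ze\u2018ira", "Ze\u02CBira", "Ze\u0060ira", "Ze\u00EFra"];

// ASCII-only class: every variant character ends the name early.
const narrow = /Rebbi [A-Z][a-z]+/u;
// Widened class: Latin-1/Latin Extended letters (covers U+00EF) plus the
// apostrophe-like characters.
const wide = /Rebbi [A-Z][a-z\u00C0-\u024F\u2018\u2019\u02CB\u0060]+/u;

const comparison = spellings.map((name) => {
  const text = `Rebbi ${name} said`;
  return {
    expected: `Rebbi ${name}`,
    narrow: text.match(narrow)?.[0] ?? "",
    wide: text.match(wide)?.[0] ?? "",
  };
});

console.log(comparison);
```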
4. Quote-stripping collateral damage. The approach in the original blog post blanks out content in quotation marks (Bible verse citations) before matching. My initial implementation extended this to single-quoted content using U+2018/U+2019. This correctly strips verse citations but also strips apostrophes inside names: “Ze’ira” becomes “Ze” after the stripper removes everything between ‘ and ‘. Fixed by limiting quote-stripping to double-quote delimiters only (U+201C/U+201D and the straight double quote ").
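A small sketch of the difference between the two strippers, under the assumption that stripping is done with a simple non-nested regex (the function names are hypothetical and the real implementation may differ):

```typescript
// Blank out content between double-quote delimiters only
// (U+201C, U+201D, and the straight double quote).
function stripDoubleQuoted(text: string): string {
  return text.replace(/[\u201C"][^\u201C\u201D"]*[\u201D"]/gu, " ");
}

// The problematic variant: also treats U+2018/U+2019 as delimiters.
function stripSingleQuoted(text: string): string {
  return text.replace(/[\u2018\u2019][^\u2018\u2019]*[\u2018\u2019]/gu, " ");
}

const line =
  "Rebbi Ze\u2018ira said, \u201CYou shall love\u201D, and Rebbi Ze\u2018ira agreed.";

// Keeps both names intact and removes only the quoted verse.
const safe = stripDoubleQuoted(line);
// Removes everything between the two name-internal apostrophes,
// destroying both names.
const damaged = stripSingleQuoted(line);

console.log(safe);
console.log(damaged);
```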
This post follows up on my extensive project on personal names in the Babylonian Talmud (Bavli) using the Steinsaltz translation. Here I apply the same approach — with substantial modifications — to the Jerusalem Talmud (Yerushalmi), using the Guggenheimer translation.
It’s the default Yerushalmi translation in Sefaria, and the one I also use in ChavrutAI, via Sefaria.
Their data dump was recently (a few months ago) moved to a Google Cloud Storage bucket; see instructions at their Github for how to access the new data dump.
I’m also actively working on normalizing names as they appear in the Yerushalmi at ChavrutAI; see the Changelog page, section “April 2026”, sub-section “Talmud English: New Term Replacements”, for the current state. See also the note on the technical appendix at the end of this piece.
Consolidated by normalized names. See the longer table in the appendix - top 100 figures, and my note there.
On “the rabbis of Caesarea” (רבנן דקיסרין), see its entry in Michlol and Toldot Tanna’im VeAmora’im.
See especially Michlol ibid., section “רבנן דקיסרי“, my translation (with added formatting via numbered list):
Among the sages who were in Caesarea were:
R’ Abbahu
R’ Ami
Rabban Gamliel V
Rav Safra
R’ Ya’akov bar Idi
R’ Tahlifa of Caesarea
R’ Hanina bar Pappi
R’ Yitzhak Nappaha
R’ Huna b. Ika
Rav Kahana of Pum Nahara
R’ Zerika
R’ Zeira
R’ Yonah
R’ Yose bar Zevida
R’ Hizkiah
R’ Mana
R’ Hanina b. Abbahu
R’ Yitzhak b. Elazar
Some say that R’ Hizkiah and R’ Yitzhak b. Elazar were heads of the academy after Rabbi Abbahu’s death.
This academy is mentioned dozens of times in the Jerusalem Talmud, and it had a decisive influence on spiritual life in Eretz Yisrael. In one source, the Jerusalem Talmud contrasts “the Rabbis of Caesarea” with “the Rabbis here,” meaning the sages of the academy of Tiberias.
According to Saul Lieberman, the Rabbis of Caesarea edited the tractates Bava Kamma, Bava Metzia, and Bava Batra of the Jerusalem Talmud, tractates whose editing many scholars have observed to be different from the rest of the tractates in the Jerusalem Talmud.
This conclusion is based on comparison with statements in other orders that are cited in the name of Rabbi Abbahu and his colleagues, whereas in these tractates they are cited anonymously.
In general, see my preliminary/work-in-progress effort to compare counts and prominence of figures in Bavli and Yerushalmi:
script: /scripts/merge-names-tables.ts
output CSV: /scripts/yerushalmi-bavli-merged.csv
See especially my recent “How Often Are the rabbis ‘Rabbi’, ‘Rav’, and Shmuel Mentioned in the Talmud?“ (Mar 18, 2026).
I’ve since added this logic; see the note on the technical appendix at the end of this piece.
Note that “Ulla”, a major mononymic figure who is common in the Bavli, is referred to in the Yerushalmi by a standard-style name with a patronymic: “Ulla b. Yishmael”; see Hebrew Wikipedia, “עולא”. In any case, he seems to be a less prominent figure in the Yerushalmi.
For example:
Guggenheimer’s “Rebbi Joḥanan; R. Joḥanan; Rebbi Yoḥanan; R. Yoḥanan” are all normalized to “R’ Yoḥanan”;
“Rebbi Zeïra; Rebbi Ze‘ira; Rebbi Zeˋira; Rebbi Zeira; Rebbi Zeїra; R. Zeïra; R. Ze‘ira; R. Zeˋira; R. Zeira; Rebbi Ze’ira; R. Zeїra” are all normalized to “R’ Ze’ira”;
“Rebbi Jehudah; R. Jehudah; Rebbi Yehudah; R. Yehudah” to “R’ Yehudah”;
“Rebbi Simeon ben Laqish; R. Simeon ben Laqish; Rebbi Simon ben Laqish; R. Simon ben Laqish” to “R’ Shimon ben Lakish”
“The rabbis of Caesarea; the rabbis of Caesarea; The rabbis of Caesaria; the rabbis of Cesarea” normalized to “the rabbis of Caesarea”
“The House of Shammai; the school of Shammai; The Hause of Shammai; The school of Shammai” to “Beit Shammai”; and “The House of Hillel; the school of Hillel” to “Beit Hillel”
Special: “in the name of Rav; Rav said; in the name of Rab; said in the name of Rab” normalized to “Rav” (=Abba Aricha. Note: there are likely some false positives.)
“Rebbi said; in the name of Rebbi” normalized to “R’ Yehuda HaNasi” (Note: there are likely some false positives.)
“Samuel” normalized to “Shmuel” (note: there are false positives with the biblical figure)
The full list of raw strings and their corresponding normalized forms can be seen in the main table, in the GitHub CSV.
In general, the Guggenheimer edition on Sefaria contains a wide variety of inconsistencies, typos, spelling mistakes, and OCR artifacts.


