Introducing a New Index of Biblical, Talmudic, and Medieval Jewish Figures
The Talmud NLP indexer glossary I published earlier this year covers personal names, place names, and key terms found in the Bavli: about 4,900 entries, grounded in occurrences in the Steinsaltz English text. (It is now incorporated into ChavrutAI, with a much more user-friendly display: https://chavrutai.com/term-index ) It stops at the boundary of the Talmudic period itself. I wanted to use the same approach to extend this coverage for personal names: backward to the Hebrew Bible and forward through the Geonim and Rishonim, so that a single table spans the full arc of Jewish (or Israelite) figures through the end of the Medieval period.
The result is a new index with 908 deduplicated entries, published as a sortable table. (Based on this GitHub repo: jewish-figures-index.) I built the entry list by scraping Wikipedia category trees in both English and Hebrew, then resolved each entry to a single Wikidata Q-ID so that a figure appearing in both wikis collapses into one row. Wikidata supplies the biographical fields: father, teachers, students, and birth and death dates and places.
Distribution
The current distribution across periods is:
Some entries belong to more than one bucket — Hanina bar Hama, for instance, sits on the Tannaim/Amoraim border — which is why the column sums slightly exceed 908.
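To make the arithmetic concrete, here is a minimal sketch of how multi-bucket membership inflates the per-period sum. The three sample rows and the function name period_counts are illustrative assumptions, not data or code taken from the repo.

```python
from collections import Counter

# Illustrative sample only: each figure maps to the set of period buckets it sits in.
figures = {
    "Hanina bar Hama": {"Tannaim", "Amoraim"},  # border case, counted twice
    "Rabbi Akiva": {"Tannaim"},
    "Rava": {"Amoraim"},
}


def period_counts(figures):
    """Count figures per period; a figure in two buckets is counted in both."""
    counts = Counter()
    for periods in figures.values():
        counts.update(periods)
    return counts


counts = period_counts(figures)
total_rows = len(figures)          # number of distinct figures (rows)
bucket_sum = sum(counts.values())  # sum over the per-period column
print(dict(counts), total_rows, bucket_sum)
```

With one border figure among three rows, the per-period counts add up to four, one more than the number of rows. The same effect, at scale, explains a column sum above 908.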
Coverage of biographical fields drops off as you go further back in time, which is what one would expect. About 215 entries have a Wikidata birth date and 216 have a death date, concentrated heavily among the Geonim and Rishonim where dates are documented. The Tannaim and Amoraim mostly have only relational information (father, teachers, students). The Biblical figures are sparser still; for many of them Wikidata records a symbolic or rabbinic-tradition date such as “1813 BCE” for Abraham.
A central design choice was that this is an index of Jewish/Israelite figures, not biblical figures in general. The Hebrew Bible names dozens of foreign rulers — Sennacherib, Nebuchadnezzar, Cyrus, Pharaoh Necho — and even more foreign antagonists from Moab, Aram, Philistia, Amalek. Wikipedia’s “Hebrew Bible people” category tree includes all of them. I filter them out at the row-building stage, along with the Hamite founders of non-Israelite nations (Canaan, Mizraim, Cush, Nimrod), Esau and his line (Eliphaz, Reuel, Amalek), Ishmael and his sons (Dumah, Raamah), and figures from non-Jewish religious traditions who leak in via category overlap (Zoroaster, Mazdak, Tiresias). The exact list is in the NON_NAME_TITLES set in the scraper, and every dropped entry is logged to cache/skipped_non_names.json so the filter can be audited.
The other major filtering pass removes entries that are not personal names at all. Wikipedia categories about people tend to include topic articles (“Binding of Isaac”, “Reconciliation of Jacob with Esau”, “Responsa of the Geonim”), texts (Mishnah, Tosefta, Baraita on the Erection of the Tabernacle), books of the Bible, lists (“List of minor biblical figures, A–K”), tribes and nations (Ammonites, Hittites, Egyptians), places (Babylon, Edom, Hebron), and the occasional modern intrusion (the moshav Beit Gamliel; the youth movement Bnei Akiva). These all come out via a combination of an explicit title list and regex patterns.[1]
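A filter of this kind might look roughly like the sketch below. It is not the code in scrape.py: the set here holds only a handful of the titles mentioned above, the name NON_NAME_PATTERNS and the helper functions are hypothetical, and the regexes are the example patterns listed in the footnote to this post.

```python
import json
import re

# Small stand-in for the explicit title list; the real set lives in scrape.py.
NON_NAME_TITLES = {"Binding of Isaac", "Mishnah", "Tosefta", "Babylon", "Bnei Akiva"}

# Hypothetical name; the patterns are the examples given in the footnote.
NON_NAME_PATTERNS = [
    re.compile(r"^Baraita\b"),
    re.compile(r"^List of"),
    re.compile(r"^Book of"),
    re.compile(r"in rabbinic literature$"),
    re.compile(r"Empire$"),
    re.compile(r"^\d"),  # year-prefixed events
]


def is_non_name(title):
    """True if the title is on the explicit list or matches any pattern."""
    if title in NON_NAME_TITLES:
        return True
    return any(p.search(title) for p in NON_NAME_PATTERNS)


def split_titles(titles):
    """Partition titles into (kept, skipped), preserving input order."""
    kept, skipped = [], []
    for title in titles:
        (skipped if is_non_name(title) else kept).append(title)
    return kept, skipped


titles = [
    "Rabbi Akiva",
    "Binding of Isaac",
    "List of minor biblical figures, A–K",
    "Baraita on the Erection of the Tabernacle",
    "Saadia Gaon",
    "Book of Ruth",
    "Neo-Assyrian Empire",
]
kept, skipped = split_titles(titles)

# The skipped list could then be dumped to an audit file such as
# cache/skipped_non_names.json; here it is only serialized in memory.
skipped_json = json.dumps(skipped, ensure_ascii=False, indent=2)
print(kept)
```

Keeping the skipped titles in a separate list, rather than silently dropping them, is what makes the filter auditable after the fact.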
Screenshots (by category, in chronological order of period)
Biblical:
Tannaim:
Amoraim:
Geonim:
Medieval:
Appendix 1 - Technical
The full source is at https://github.com/EzraBrand/jewish-figures-index under the MIT license.
The scraper is scrape.py, a single Python file using the requests library. It runs in six phases, each cached as JSON in a cache/ directory so re-runs only redo what changed:
1. Category members. For each seed category in SEEDS, fetch all mainspace pages via the MediaWiki Action API (list=categorymembers&cmnamespace=0). For umbrella categories flagged with expand=True, also fetch one level of subcategories and treat each as an additional source; this is what picks up the 30-odd book-by-book subcats under Category:Hebrew Bible people and the 11th–15th-century rabbi subcats under Category:Rishonim.
2. Title → Q-ID resolution. Batch 50 titles per call to prop=pageprops&ppprop=wikibase_item&redirects=1. The redirects parameter is important; many biblical figures are accessed via redirects.
3. Dedupe by Q-ID. Build by_qid: {qid → {labels: set, src: dict}} so that a figure in both English and Hebrew categories yields one row with both source buckets joined.
4. Wikidata entity fetch. Batched wbgetentities calls (50 IDs per call), requesting labels|sitelinks|claims for languages en|he and sitefilter enwiki|hewiki.
5. Referenced-entity labels. The father (P22), teacher (P1066), student (P802), birthplace (P19), and deathplace (P20) claims are themselves Q-IDs, which need to be resolved to display labels: a second pass of batched wbgetentities for the Q-IDs referenced by all entries combined.
6. Filter and write. Apply the non-name blocklist, write names_extended.csv, and bake the CSV into a <script type="text/plain" id="csv-data"> block inside index.html so the file works when opened directly from disk without a server.
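The first three phases can be sketched without touching the network. The block below is an illustration under stated assumptions, not an excerpt from scrape.py: the helper names (chunked, categorymembers_params, pageprops_params, dedupe_by_qid) are hypothetical, the Q-IDs are placeholders that are not meant to match real Wikidata entities, and the sample labels are for illustration only. The query parameters themselves are standard MediaWiki Action API parameters.

```python
from itertools import islice

BATCH_SIZE = 50  # titles or Q-IDs per API call, as described in the phases above


def chunked(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def categorymembers_params(category):
    """Query parameters for listing the mainspace pages in one category."""
    return {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 0,
        "cmlimit": "max",
    }


def pageprops_params(titles):
    """Query parameters for resolving a batch of titles to Wikidata Q-IDs."""
    return {
        "action": "query",
        "format": "json",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,  # follow redirects so redirected titles still resolve
        "titles": "|".join(titles),
    }


def dedupe_by_qid(resolved):
    """Collapse (qid, label, wiki, category) tuples into one entry per Q-ID."""
    by_qid = {}
    for qid, label, wiki, category in resolved:
        entry = by_qid.setdefault(qid, {"labels": set(), "src": {}})
        entry["labels"].add(label)
        entry["src"].setdefault(wiki, set()).add(category)
    return by_qid


# Placeholder Q-IDs and sample labels, for illustration only.
resolved = [
    ("Q900001", "Saadia Gaon", "enwiki", "Category:Geonim"),
    ("Q900001", "סעדיה גאון", "hewiki", "קטגוריה:גאונים"),
    ("Q900002", "Rashi", "enwiki", "Category:Rishonim"),
]
by_qid = dedupe_by_qid(resolved)
batch_sizes = [len(b) for b in chunked(range(120))]
print(len(by_qid), batch_sizes)
```

In a real run, each parameter dict would be passed to something like requests.get(api_url, params=...) against the relevant wiki's api.php endpoint, with continuation handled for categories larger than one response.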
Methodological caveats
A few methodological caveats are worth noting. The scope is the union of a curated set of Wikipedia category seeds, expanded one level into subcategories where the umbrella category has no direct members. That makes the coverage uneven: the Tannaim are well-represented because Hebrew Wikipedia has a deep category tree for them, while the Amoraim are sparser on the English side. The Wikidata enrichment depends entirely on what has been recorded in Wikidata; for less prominent figures, especially Geonim and earlier Tannaim, the father/teacher/student fields are often empty.
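Because so many entries lack these claims, any code that reads them has to tolerate missing properties and valueless statements. The following is a rough sketch of that extraction against the standard Wikidata entity JSON layout (claims → property → mainsnak → datavalue). The function names, the field names in RELATION_PROPS, and the sample entity with its placeholder Q-ID are assumptions for illustration, not the repo's code.

```python
# Property IDs as named in the pipeline description; the field names are hypothetical.
RELATION_PROPS = {
    "father": "P22",
    "teachers": "P1066",
    "students": "P802",
    "birthplace": "P19",
    "deathplace": "P20",
}


def claim_qids(entity, pid):
    """Return the Q-IDs that an entity's claims for `pid` point to (may be empty)."""
    qids = []
    for statement in entity.get("claims", {}).get(pid, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue" / "novalue" snaks carry no datavalue
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            qids.append(value["id"])
    return qids


def relation_fields(entity):
    """Map each relational field to a (possibly empty) list of Q-IDs."""
    return {name: claim_qids(entity, pid) for name, pid in RELATION_PROPS.items()}


# Hand-built sample entity with placeholder Q-IDs: one father claim, one unknown teacher.
sample_entity = {
    "claims": {
        "P22": [
            {"mainsnak": {"snaktype": "value",
                          "datavalue": {"value": {"id": "Q900010"}}}}
        ],
        "P1066": [{"mainsnak": {"snaktype": "somevalue"}}],
    }
}
fields = relation_fields(sample_entity)
print(fields)
```

Returning empty lists rather than raising keeps sparse entries in the table, which matches the uneven coverage described above.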
Concretely, the current Geonim count of 52 is dominated by the English Wikipedia category; only four Geonim are tagged in the Hebrew Wikipedia category for גאונים, which suggests that the Hebrew side relies more on per-individual articles without consistent category membership. This is the kind of gap the next iteration would address by querying Wikidata’s “instance of” property (P31) rather than relying on Wikipedia categories.
Appendix 2 - Field Coverage (out of 908)
As mentioned in the main piece, coverage is heaviest among the Medieval cohort (where rabbis have well-documented dates and lineages on Wikidata) and lightest among Biblical figures (where dates are largely symbolic and relational fields are often absent).
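For anyone who wants to recompute coverage figures from the published CSV, a count of non-empty cells per column is enough. The sketch below runs on a tiny inline sample; the column names and rows are hypothetical and may not match the actual headers in names_extended.csv.

```python
import csv
import io

# Hypothetical columns and rows, standing in for names_extended.csv.
SAMPLE_CSV = """name,father,teachers,birth_date,death_date
A,X,,1040,1105
B,,,,
C,Y,Z,,1204
"""


def field_coverage(csv_text, fields):
    """Return ({field: number of non-empty cells}, total number of rows)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    coverage = {
        field: sum(1 for row in rows if (row.get(field) or "").strip())
        for field in fields
    }
    return coverage, len(rows)


coverage, total = field_coverage(
    SAMPLE_CSV, ["father", "teachers", "birth_date", "death_date"]
)
for field, count in coverage.items():
    print(f"{field}: {count} of {total}")
```

Pointing the same function at the text of the real file, with its real column names, would give counts out of 908.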
Appendix 3 - Source Wikipedia Category Seeds
English Wikipedia: Hebrew Bible people (expanded), Torah people, Books of Samuel people, Books of Kings people, Tannaim (expanded), Mishnah rabbis, Amoraim (expanded), Geonim, Rishonim (expanded), Medieval Jewish theologians (expanded), Medieval rabbis.
Hebrew Wikipedia: אישים בתנ”ך (expanded), אישים בתורה, נביאים, מלכי ישראל, מלכי יהודה, תנאים (expanded), אמוראים (expanded), גאונים, ראשונים (expanded).
[1] For example, the regexes include:
^Baraita\b
^List of
^Book of
in rabbinic literature$
Empire$
^\d for year-prefixed events








