A Prospectus for a Large Language Model (LLM) to Facilitate Researching Rabbinic Onomastics
Nomenclature Networks: An LLM Approach to researching the etymology, history, and use of proper names in Chazal. Revolutionizing Talmudic Research with AI and Machine Learning
This prospectus is a work in progress. It is partially based on the research in a previous study of mine: “From Abba to Zebedee: A Comprehensive Survey of Naming Conventions in Hebrew and Jewish Aramaic in Late Antiquity”. The current post is crossposted at my Academia.edu page (requires registration).
Objective
To build a large language model (LLM) that assists research toward a comprehensive survey of onomastics in Hebrew and Jewish Aramaic in Late Antiquity. The model will cover the entire corpus of Mishnah, Talmud, and Midrash.
Suggested approach
**1. Data Collection and Preparation:**
Prepare a text corpus of the Talmud, Mishnah, and Midrash. Organizations that have transcribed the primary text corpora:
Academy of the Hebrew Language
Sefaria (open-access)
Wikisource (open-access)
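Transcriptions from different sources may differ in vocalization and punctuation, so some normalization is likely to be needed before tagging. The following is a minimal sketch, assuming the goal is to compare unvocalized word forms; the function name `normalize_hebrew` is hypothetical, and whether vowel points should actually be discarded is a design decision left open here.

```python
import unicodedata

MAQAF = "\u05BE"  # Hebrew hyphen-like punctuation that joins two words

def normalize_hebrew(text: str) -> str:
    """Strip vowel points and cantillation marks, then collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Hebrew points and cantillation marks are nonspacing marks (category "Mn").
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Treat the maqaf as a word boundary, then squeeze runs of whitespace.
    return " ".join(without_marks.replace(MAQAF, " ").split())

# "Rabbi" with vowel points (resh, patah, bet, hiriq, dagesh, yod)
vocalized = "\u05E8\u05B7\u05D1\u05B4\u05BC\u05D9"
print(normalize_hebrew(vocalized) == "\u05E8\u05D1\u05D9")  # True
```

Stripping marks loses information that may matter for the mishkal analysis, so a real pipeline might keep the vocalized form alongside the normalized one.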
**2. Named Entity Recognition (NER):**
The core component is a Named Entity Recognition (NER) system that can identify specific categories of names and related terms in the text. This would involve training a model to recognize and categorize phrases according to rules.
This could be done by fine-tuning a transformer-based model (like GPT-4) on a labeled dataset in which instances of these categories have been manually tagged. Manual tagging can be a time-consuming process, but there are NLP libraries (like spaCy or Hugging Face's Transformers) that can help with the training workflow.
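A manually tagged dataset of this kind is commonly stored as token-level BIO tags (Begin, Inside, Outside). The sketch below converts character-span annotations into BIO tags. It assumes whitespace tokenization and uses a transliterated example for readability; the label names (`HONORIFIC`, `GIVEN_NAME`, `PATRONYMIC`) and the function names are illustrative, not a fixed schema.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label), end exclusive

def tokens_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and record each token's character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens

def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Assign B-/I-/O tags to tokens based on annotated character spans."""
    tagged = []
    for tok, start, end in tokens_with_offsets(text):
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                # The first token of a span gets B-, later tokens get I-.
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged

text = "amar Rabbi Yehuda bar Ilai"
spans = [(5, 10, "HONORIFIC"), (11, 17, "GIVEN_NAME"), (18, 26, "PATRONYMIC")]
for token, tag in to_bio(text, spans):
    print(token, tag)
```

Token/tag pairs in this shape are the usual input for token-classification training, whichever library is chosen.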
The Named Entity Recognition (NER) system will encompass a broad scope of tags: generations, naming conventions, and surname origins.
On generations of Tannaim and Amoraim, see the Wikipedia templates on Tannaim, Amoraim of Eretz Yisrael, and Amoraim of Babylonia.
It will identify variations and diminutives of given names, honorifics, mononyms, pseudonyms, and epithets. The NER will also explore how different people may share the same name and the concept of aptronyms.
Further, the NER will enable analysis of naming conventions for anonymous individuals and for groups of individuals, along with the descriptors used for these groups. It will also address the use of placeholder names and demonyms.
The NER will enable a detailed study of honorifics (such as "Abba", "Imma", "Pappa", "Mar", and others) and their various combinations with given names, other honorifics, toponyms, occupations, and physical traits.
Another focal point will be patronymics, matronymics, papponymics, teknonymy, and other familial relationships. The exploration of surnames also encompasses toponymic surnames, occupational surnames, and non-Semitic loanwords, among others.
Several patterns of surnames will be analyzed by mishkal (word formation pattern in Hebrew and other Semitic languages), such as katal, katlan, kotel, and katol. The NER will also include titles and surnames related to personal status, personality, and physical traits, as well as surnames with unclear etymology.
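Before any model is trained, a rule-based baseline can show how some of the components listed above (honorific, given name, patronymic) might be separated. The sketch below is only an illustration on transliterated names; the word lists, the `ParsedName` structure, and the `parse_name` function are all assumptions, and the lists are far from complete.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Small illustrative lists; a real system would need far fuller ones.
HONORIFICS = {"Rabbi", "Rav", "Rabban", "Mar", "Abba", "Imma"}
FILIATION_MARKERS = {"ben", "bar", "bat"}

@dataclass
class ParsedName:
    honorifics: List[str] = field(default_factory=list)
    given_name: Optional[str] = None
    marker: Optional[str] = None       # e.g. "ben" or "bar"
    parent_name: Optional[str] = None  # name following the marker

def parse_name(name: str) -> ParsedName:
    tokens = name.split()
    parsed = ParsedName()
    i = 0
    # Consume leading honorifics, but always leave one token as the name.
    while i < len(tokens) - 1 and tokens[i] in HONORIFICS:
        parsed.honorifics.append(tokens[i])
        i += 1
    rest = tokens[i:]
    # A filiation marker in the middle splits given name from parent name.
    for j, tok in enumerate(rest):
        if tok in FILIATION_MARKERS and 0 < j < len(rest) - 1:
            parsed.given_name = " ".join(rest[:j])
            parsed.marker = tok
            parsed.parent_name = " ".join(rest[j + 1:])
            return parsed
    parsed.given_name = " ".join(rest) or None
    return parsed

print(parse_name("Rabbi Yehuda bar Ilai"))
print(parse_name("Abba Shaul"))
```

Names in which an honorific stands in for the given name, such as "Mar bar Rav Ashi", are not handled by these rules and would need additional logic or a trained model.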
**3. Contextual Understanding:**
The model should be trained to understand context to differentiate between different usages of the same word. For instance, "Abba" can be a given name, an honorific, or a part of a toponym or occupation. The model should understand the context to categorize it correctly.
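To make the "Abba" example concrete, here is a toy heuristic that looks only at the neighboring tokens. It is a sketch under strong assumptions: the input is transliterated, capitalization marks proper names (Hebrew script has no letter case), and the function `classify_abba` and its label names are hypothetical. A trained model would learn such cues from the labeled data rather than from hand-written rules.

```python
from typing import List

HONORIFICS = {"Rabbi", "Rav", "Rabban", "Mar"}
FILIATION_MARKERS = {"ben", "bar", "bat"}

def classify_abba(tokens: List[str], i: int) -> str:
    """Guess the role of 'Abba' at position i from its immediate context."""
    if tokens[i] != "Abba":
        raise ValueError("token at position i is not 'Abba'")
    prev_tok = tokens[i - 1] if i > 0 else None
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
    if prev_tok in HONORIFICS:
        return "GIVEN_NAME"  # e.g. "Rabbi Abba"
    if next_tok is None or next_tok in FILIATION_MARKERS:
        return "GIVEN_NAME"  # e.g. "Abba bar Kahana"
    if next_tok[0].isupper():
        return "HONORIFIC"   # e.g. "Abba Shaul"
    return "UNKNOWN"         # context too weak for these rules

print(classify_abba("Rabbi Abba bar Kahana".split(), 1))
print(classify_abba("Abba Shaul".split(), 0))
```

The `UNKNOWN` outcome is deliberate: cases the rules cannot settle are the ones most worth routing to a human annotator.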
**4. Building a Searchable Database:**
Once the model has been trained and can accurately recognize and categorize the entities, it could be run over the entire corpus to build a database of instances of each category. This database could then be made searchable, allowing users to quickly and easily find all instances of a particular category.
This would be a substantial improvement over the current major open-access search tools for Talmudic literature:
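One possible shape for such a database is a single table of tagged mentions, indexed by category. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout, the column names, and the sample rows (including the placeholder references) are invented for illustration and are not real corpus data.

```python
import sqlite3
from typing import Iterable, List, Tuple

Mention = Tuple[str, str, str, str]  # (surface, category, work, ref)

def build_index(mentions: Iterable[Mention]) -> sqlite3.Connection:
    """Create an in-memory table of tagged mentions, indexed by category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE mentions (
               id INTEGER PRIMARY KEY,
               surface TEXT NOT NULL,   -- the name as it appears in the text
               category TEXT NOT NULL,  -- NER label
               work TEXT NOT NULL,      -- e.g. Mishnah, Talmud, Midrash
               ref TEXT NOT NULL        -- location within the work
           )"""
    )
    conn.execute("CREATE INDEX idx_mentions_category ON mentions(category)")
    conn.executemany(
        "INSERT INTO mentions (surface, category, work, ref) VALUES (?, ?, ?, ?)",
        mentions,
    )
    conn.commit()
    return conn

def find_by_category(conn: sqlite3.Connection, category: str) -> List[Tuple[str, str, str]]:
    """Return every mention tagged with the given category, in insert order."""
    cur = conn.execute(
        "SELECT surface, work, ref FROM mentions WHERE category = ? ORDER BY id",
        (category,),
    )
    return cur.fetchall()

# Made-up sample rows with placeholder references.
sample = [
    ("Abba", "HONORIFIC", "Mishnah", "placeholder-ref-1"),
    ("Yehuda", "GIVEN_NAME", "Mishnah", "placeholder-ref-1"),
    ("Mar", "HONORIFIC", "Talmud", "placeholder-ref-2"),
]
conn = build_index(sample)
print(find_by_category(conn, "HONORIFIC"))
```

A production system would likely need further columns (generation, normalized name, confidence score), but the category index is what makes the "all instances of a category" lookup fast.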
Sefaria
Dicta
**5. User Interface and Deployment:**
The system will need an interface for users to interact with it. This could be a web-based interface where users can select a category and see a list of all instances, or it could be more complex, allowing users to enter queries or use other search parameters.
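For the more complex variant, one option is a small query syntax that mixes field filters with free text. The sketch below parses a query such as `category:HONORIFIC work:Mishnah Abba`; the syntax, the field names, and the function `parse_query` are assumptions made for illustration, not a proposed standard.

```python
from typing import Dict, Tuple

# Hypothetical set of fields a user may filter on.
ALLOWED_FIELDS = {"category", "work", "generation"}

def parse_query(query: str) -> Tuple[Dict[str, str], str]:
    """Split a query into field filters and remaining free-text terms."""
    filters: Dict[str, str] = {}
    free_text = []
    for part in query.split():
        key, sep, value = part.partition(":")
        if sep and key in ALLOWED_FIELDS and value:
            filters[key] = value
        else:
            # Unknown fields and plain words are kept as free text.
            free_text.append(part)
    return filters, " ".join(free_text)

print(parse_query("category:HONORIFIC work:Mishnah Abba"))
```

The filters map directly onto database columns, while the free-text part could be matched against the name as it appears in the text.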
**6. Evaluation and Iteration:**
The system should be continuously evaluated and improved based on feedback from users and experts. This might involve adding new categories, improving the NER model, or making the interface more user-friendly.
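Improving the NER model presupposes a way to measure it. A common choice is span-level precision, recall, and F1 against a manually tagged reference set. The sketch below assumes exact-match scoring, in which a predicted entity counts only if its start, end, and label all agree with the reference; the function name `span_prf` and the sample spans are illustrative.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

def span_prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = [(5, 10, "HONORIFIC"), (11, 17, "GIVEN_NAME"), (18, 26, "PATRONYMIC")]
# The prediction cuts the patronymic span short, so it does not count.
predicted = [(5, 10, "HONORIFIC"), (11, 17, "GIVEN_NAME"), (18, 21, "PATRONYMIC")]
print(span_prf(gold, predicted))
```

Reporting these scores per category would show which tags (for example, honorifics versus toponymic surnames) need more annotated examples.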
It's worth noting that, due to the nature of the texts and the languages involved, there may be significant challenges in terms of language variability, ambiguity, and the availability of digitized and annotated resources.
Bibliography
Traditional
סדר תנאים ואמוראים (חובר בתקופת גאונים או בתקופת ימי הביניים המוקדמים)
ר' שרירא גאון, איגרת רב שרירא גאון (סביב שנת 987)
ר' יהודה בן קלונימוס משפיירא, יחוסי תנאים ואמוראים (מחצית השנייה של המאה ה-12)[1]
ר' אברהם זכות, ספר היוחסין (1566, השלם: 1857)
ר' יחיאל היילפרין, סדר הדורות (1769)
ר' יעקב צבי יאליש, כנוס סופרים : הנקרא בית ועד לחכמים : תולדות תנאים ואמוראים, סדר יחוסיהם, משפחותיהם, תלמידיהם (1884/1975)
ר' יצחק אייזיק הלוי, דורות הראשונים (1897 - 1918)
ר' אהרן היימן, תולדות תנאים ואמוראים (1910)[2]
ר' ראובן מרגליות, לחקר שמות וכינויים בתלמוד (1960)
ר' אברהם אורנשטיין, אנציקלופדיה לתוארי כבוד בישראל: אוצר תארים וכנויים לסוגיהם, מימי המקרא עד סוף ימי הגאונים, 1960[3]
מרדכי מרגליות (עורך), אנציקלופדיה לחכמי התלמוד והגאונים, (מהדורה מחודשת בידי יהודה איזנברג, 2006)
רפאל הלפרין, אטלס עץ חיים (תש"מ), תנאים ואמוראים א-ב[4]
ר' שלמה בניזרי, סדר הדורות הקצר (2015), עמ' 49 ואילך, עמ' 338 ואילך.
Aspaklaria > "ר"
Academic books - biographies
צבי גרץ, דברי ימי ישראל, כרכים א-ג (1855/1916)
אייזיק (יצחק) הירש וייס, דור דור ודורשיו (1871-1891)
בנימין זאב בכר:
אגדת אמוראי בבל (1878)
אגדת התנאים, תורגם מגרמנית על ידי א"ז רבינוביץ (תרפ"ב)
ערכי מדרש, תורגם מגרמנית על ידי א"ז רבינוביץ (תרפ"ג)
אגדת אמוראי ארץ ישראל (7 חלקים). תורגם מגרמנית על ידי א"ז רבינוביץ (תרצ"ח)
חנוך אלבק:
מבוא למשנה (תש"ך), פרק תשיעי[5]
מבוא לתלמודים (1987), פרק שישי
Academic papers - digital tools
Menachem Katz, “Introduction to Personalities in Rabbinic Literature” (2005)
Michael Satlow et al.:
“Naming Rabbis: A Digital List” (2017)[6]
“The Rabbinic Citation Network”, AJS Review (2020).
“AllNameReferences” (Excel file, 2020)[7]
Maayan Zhitomirsky-Geffet et al., “SageBook: toward a cross-generational social network for the Jewish sages’ prosopography”, Digital Scholarship in the Humanities, 34(3) (2019), pp. 676–695.
Josh Waxman, “A graph database of scholastic relationships in the Babylonian Talmud”, Digital Scholarship in the Humanities, Volume 36, Issue Supplement_2, (October 2021), Pages ii277–ii289.
Open-access biographies
אוצר הדמויות (של ארגון "בונייך", תקליטור)[11]
דמויות מהתלמוד | טקסטים ודפי מקורות מן התורה, התלמוד וספריית המקורות של ספריא.
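[1] “The book remained hidden in manuscript and was not published for hundreds of years, until the 19th century. The continuation of the manuscript was later printed under the title ערכי תנאים ואמוראים (Erkhei Tannaim ve-Amoraim).”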
[2] See Katz, “Introduction”, p. 15 for critique of Heiman’s work.
קטגוריה:תולדות תנאים ואמוראים – ויקיטקסט (Category: Toldot Tannaim ve-Amoraim, Hebrew Wikisource): 1218 entries indexed, as of 10-May-2023.
[3] Four volumes. (These volumes are available on HebrewBooks: second; third; fourth. The first volume is at “Otzar HaHochma”, here.)
Contrary to the title, this book covers all titles, nicknames, rabbinic occupations, and the like, not only honorific ones, as Ornstein himself points out in the introduction to vol. 4. It focuses primarily on titles used in Hazalic literature. However, judging by the entries, Ornstein defines “titles” extremely broadly, so many of the terms fall well outside the scope of this study. The book discusses a very wide variety of words used to describe rabbis and descriptive terms for rabbinic occupations in general, as well as a wide variety of Talmudic superlatives used for rabbis and students and a variety of names of social groupings. Presumably, the kavod in the title is meant to exclude negative epithets, such as rasha. However, some insults are indeed discussed; see the footnote later, in the section “Epithets”.
[4] See Katz, “Introduction”, p. 16 for critique of Halperin’s work.
[5] See Katz, “Introduction”, pp. 15-16 for critique of Albeck’s work.
[6] “[T]here are somewhere in the range of 5,000 individual rabbis named in the literature surveyed by [Heiman].”
[7] A pivot table of unique names shows that there are 2299 unique names in Satlow et al.’s file.
[8] 183 entries indexed, as of 22-Apr-2023.
[9] 355 entries indexed, as of 22-Apr-2023.
[10] קטגוריה:אמוראי בבל – המכלול (Category: Amoraim of Babylonia, HaMichlol): 303 entries indexed, as of 10-May-2023.
[11] Zhitomirsky-Geffet, “SageBook”: “A review of some of these studies can be found in the introduction to an important bibliographical project called: "Biographical Data Base for Rabbinic Literature", which was published by "Bonayich Educational Services” by Pinchas Heiman (2001) - the database including almost 3,000 names found in Rabbinic literature from the period of the Mishna and Talmud. ”