Aramaic Identifier: A Python Script for Identifying Aramaic Passages in Talmudic Texts

Jul 07, 2024

One of the Talmud’s distinctive features is the interweaving of Hebrew and Aramaic throughout its text. In this piece I’d like to introduce a Python script that automatically flags likely Aramaic passages in Talmudic texts, which opens up new possibilities for linguistic analysis and study.

The Challenge of Talmudic Language

Traditionally, identifying Aramaic passages has relied on the reader's familiarity with both languages. Now that digital texts are widely available, simple computational methods can assist in this process.

Introducing the Talmudic Aramaic Identifier

Compare Dicta’s recent tool for POS tagging, which I review here.

The Python script, which I’m calling the "Talmudic Aramaic Identifier," is designed to automatically detect and highlight Aramaic passages within a given Talmudic text. Here's how it works:

Text Processing: The script takes a Talmudic text file as input.
Aramaic Pattern Recognition: It uses a predefined list of common Aramaic words and phrases to identify potential Aramaic content.
Sliding Window Analysis: The script analyzes the text in overlapping 50-word chunks, or "windows," advancing 10 words at a time, to determine the concentration of Aramaic words in each section.
Ratio Calculation: For each window, it calculates the ratio of Aramaic words to total words.
High-Aramaic Passage Identification: Sections with a high concentration of Aramaic words (above a certain threshold) are flagged as Aramaic passages.
Unique Passage Filtering: The script skips any window whose text exactly duplicates one already reported. Overlapping windows with different text are still reported separately.
Output Generation: Finally, it outputs the identified Aramaic passages, along with their start and end positions (counted in words, not characters) and the Aramaic words found.
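
The windowing and ratio steps above can be illustrated on a toy scale. This is a minimal sketch, not the script itself: the five-word marker set, the tiny window size, and the name window_ratios are all assumptions chosen for illustration, and it uses plain set membership on whitespace-split words rather than regular expressions, so words with attached punctuation would not match.

```python
# Toy marker set for illustration only (an assumption, not the script's list).
ARAMAIC_MARKERS = {"מאי", "ליה", "ביה", "האי", "הוה"}


def window_ratios(text, window_size=5, step=2):
    """Yield (start_index, ratio) for each window of `window_size` words."""
    words = text.split()
    for start in range(0, len(words) - window_size + 1, step):
        window = words[start:start + window_size]
        hits = sum(1 for word in window if word in ARAMAIC_MARKERS)
        yield start, hits / window_size


sample = "אמר ליה מאי טעמא האי גברא אזיל ולא אתי"
ratios = list(window_ratios(sample))

# Keep only windows whose marker ratio exceeds a (hypothetical) threshold of 0.3.
flagged = [start for start, ratio in ratios if ratio > 0.3]
print(flagged)  # [0, 2]
```

The nine-word sample yields windows starting at words 0, 2, and 4, containing three, two, and one marker words respectively, so only the first two pass the threshold.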

Applications and Benefits

This tool has several potential applications in the field of Talmudic studies:

Linguistic Analysis: Researchers can use this tool to study the distribution and frequency of Aramaic usage across different Talmudic tractates or historical periods.
Study Aid: Students of the Talmud can quickly identify Aramaic sections, which often require different interpretive strategies than Hebrew sections. (The Aramaic material is often understood to be a different layer, referred to as the “Stam”.)
Translation Assistance: For projects involving Talmudic translation, this tool can help prioritize sections that may require specialized Aramaic expertise.
Textual Criticism: By analyzing the patterns of Aramaic usage, scholars might gain new insights into the composition and editing processes of the Talmud.

Technical

The code is in Python.

I executed it in a new Jupyter notebook, specifically in Google Colab.

I used Claude 3.5 to generate the code.

The Talmud text that I used is Sefaria’s ‘Ein Yaakov - he - Daat’, with nikud removed. (On removing nikud, see my piece here.)
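
For readers who want to reproduce the preprocessing, here is one way to strip nikud in Python. It is a minimal sketch and not necessarily the method used to prepare the file above; the function name strip_nikud is hypothetical. It assumes that the goal is to remove every nonspacing mark (Unicode category Mn) in the Hebrew block, which covers vowel points and cantillation marks while leaving letters and punctuation such as the maqaf in place.

```python
import unicodedata


def strip_nikud(text):
    """Remove Hebrew vowel points and cantillation marks, keeping letters and punctuation."""
    return ''.join(
        ch for ch in text
        # U+0591..U+05C7 spans the Hebrew marks; category 'Mn' excludes the
        # punctuation characters (maqaf, paseq, sof pasuq) in that range.
        if not ('\u0591' <= ch <= '\u05C7' and unicodedata.category(ch) == 'Mn')
    )


print(strip_nikud("שָׁלוֹם"))  # שלום
```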

Sample output

End index: 66320
Aramaic ratio: 0.96
Aramaic words found: , מאי
Passage:
כוליה לבישו. (מאי טעמא? דאיברו ביה אור וחשך) האי מאן דבתרי בשבא, יהא גבר רגזן. מאי טעמא? משום דאיפליגו ביה מיא. האי מאן דבתלתא בשבא, יהא גבר עתיר וזנאי יהא. מאי טעמא? משום דאיברו ביה עשבים. האי מאן דבארבעה בשבא, יהא גבר חכים ונהיר. מאי טעמא? משום דאתלו ביה מאורות.
Start index: 66650
End index: 66700
Aramaic ratio: 1.08
Aramaic words found: , אמר, מאי, אי, הנך
Passage:
והוו קאזלי הנך אנשי לאגמא. אמר ליה אבלט לשמואל: האי גברא אזיל ולא אתי, דטריק ליה חויא ומיית. אמר ליה שמואל: אי בר ישראל הוא, אזיל ואתי. אדיתבי, אזיל ואתא. קם אבלט, שדי לטוניה, אשכח ביה חויא, דפסיק ושדי בתרתי גובי. אמר ליה שמואל: מאי עבדת? אמר ליה: כל יומי
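
A ratio above 1, and the empty entry at the start of each "Aramaic words found" line, are consistent with how re.findall behaves when a pattern contains capture groups: it returns one tuple of groups per match, and flattening those tuples counts every match once per group, mostly as empty strings. With six capture groups in the pattern list, a ratio of 0.96 would correspond to 8 real matches in a 50-word window (8 × 6 / 50). The sketch below demonstrates the effect with a shortened, hypothetical two-group pattern.

```python
import itertools
import re

# Shortened pattern for illustration: two capture groups plus one ungrouped word.
pattern = r'\b(מאי|מנא)\b|\b(הנך|הני)\b|\bביה\b'
text = "מאי טעמא משום דאיברו ביה"

# findall returns one tuple of capture groups per match.
raw = re.findall(pattern, text)

# Flattening the tuples gives four entries for two real matches, three of them ''.
flattened = list(itertools.chain(*raw))
print(len(flattened), flattened.count(''))  # 4 3

# finditer with group(0) returns each whole match exactly once.
whole = [m.group(0) for m in re.finditer(pattern, text)]
print(len(whole))  # 2
```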

The code

import re
import itertools


# Score a chunk of text against a list of common Aramaic marker words.
def is_aramaic_text(text):
    aramaic_patterns = [
        r'\b(הוה|הוו)\b', # was, were
        r'\b(קא|קאי)\b', # present tense marker, standing
        r'\b(אמר|אמרו) ליה\b', # he said to him, they said to him
        r'\bאית\b', # there is
        r'\bלית\b', # there isn't
        r'\b(מאי|מנא)\b', # what, from where
        r'\bדקא\b', # that is
        r'\b(אי|אית)\b', # if, there is
        r'\b(הנך|הני)\b', # those
        r'\bגברא\b', # man
        r'\bאיכא\b', # there is
        r'\bביה\b', # in him/it
        r'\bהכי\b', # thus, so
        r'\bניהו\b', # it is (as in "what is it")
        r'\bלקמיה\b', # before him
        r'\bקמיה\b', # before him
        r'\bשלח\b', # sent
        r'\bחזא\b', # saw
        r'\bאזל\b', # went
        r'\bבעי\b', # asked, requested
    ]
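    combined_pattern = '|'.join(aramaic_patterns)
    # Collect each whole match once via finditer/group(0). re.findall would instead
    # return a tuple of six capture groups per match; flattening those yields ratios
    # six times larger, as in the sample output above.
    matches = [m.group(0) for m in re.finditer(combined_pattern, text)]
    total_words = len(text.split())
    aramaic_ratio = len(matches) / total_words if total_words > 0 else 0
    return aramaic_ratio, matches


# threshold=0.15 corresponds to 0.9 on the sixfold scale seen in the sample output.
def analyze_text(text, window_size=50, threshold=0.15):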
    words = text.split()
    results = []
    seen_passages = set()
    for i in range(0, len(words) - window_size + 1, 10): # Step by 10 to reduce overlap
        window = ' '.join(words[i:i+window_size])
        aramaic_ratio, matches = is_aramaic_text(window)
        if aramaic_ratio > threshold and window not in seen_passages:
            results.append({
                'start_index': i,
                'end_index': i + window_size,
                'aramaic_ratio': aramaic_ratio,
                'aramaic_words': matches
            })
            seen_passages.add(window)
    return results


def main():
    file_path = '/content/Ein Yaakov - he - Daat - all - no nikud.tsv'
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            text = file.read()
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
        return
    except IOError:
        print(f"Error: There was an issue reading the file '{file_path}'.")
        return
    results = analyze_text(text)
    words = text.split()  # split once; result indices are word offsets into this list
    max_results = 150  # cap on how many passages to print
    if results:
        print(f"Unique passages with high Aramaic content (first {min(max_results, len(results))} results):")
        for result in results[:max_results]:
            print(f"Start index: {result['start_index']}")
            print(f"End index: {result['end_index']}")
            print(f"Aramaic ratio: {result['aramaic_ratio']:.2f}")
            print("Aramaic words found:", ', '.join(sorted(set(result['aramaic_words']))))
            print("Passage:")
            print(" ".join(words[result['start_index']:result['end_index']]))
            print()
    else:
        print("No passages with high Aramaic content found.")


if __name__ == "__main__":
    main()
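
Because windows advance 10 words at a time, consecutive flagged windows overlap heavily and the printed passages repeat much of the same text. One possible follow-up, which is not part of the script above, is to merge overlapping windows into continuous spans. The sketch below assumes result dictionaries with the same start_index and end_index keys that analyze_text returns; the function name merge_windows and the demo values are hypothetical.

```python
def merge_windows(results):
    """Merge overlapping or touching word-index windows into (start, end) spans."""
    spans = []
    for result in sorted(results, key=lambda r: r['start_index']):
        start, end = result['start_index'], result['end_index']
        if spans and start <= spans[-1][1]:
            # This window overlaps the previous span, so extend that span.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return [tuple(span) for span in spans]


# Made-up windows for illustration, not taken from the real output.
demo = [
    {'start_index': 100, 'end_index': 150},
    {'start_index': 110, 'end_index': 160},
    {'start_index': 300, 'end_index': 350},
]
print(merge_windows(demo))  # [(100, 160), (300, 350)]
```

Each merged span can then be printed once with " ".join(words[start:end]), where words is the whitespace-split text.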