Counting Words in the Mishnah and Tosefta: What the Numbers Reveal
I recently completed a project that I think many of you might find interesting, especially if you’re into Jewish texts and numbers. The project is called "Mishnah and Tosefta By the Numbers: Word Counts of Tractates and Chapters", and it’s exactly what it sounds like: an analysis of how many words each chapter of the Mishnah and Tosefta contains.1 I’ve crunched the numbers and the results, available on my Academia page, are pretty revealing about the structure and length of these foundational texts.
First, a bit of context. The Mishnah is a central text of Jewish law, written in terse Hebrew. The Tosefta, a companion work, often expands on the Mishnah’s content with additional material.
See Hebrew Wikipedia, “Tosefta” > “Structural Comparison”, my translation, with adjustments:
Both the Mishnah and the Tosefta contain the same six orders (sedarim, singular seder סדר) and the same sixty tractates, with the exceptions of Avot, Middot,2 Kinnim, and Tamid, which appear only in the Mishnah and not in the Tosefta.
Both are divided into chapters.
However, there are several differences between them:
The division of chapters does not always align.
The order of laws is not always the same.
In some cases, the parallel to a Mishnah is found in a different tractate in the Tosefta.
Sometimes the Tosefta provides parallels to a Mishnah that is found in another tractate of the Mishnah.
Some Mishnayot have double parallels (i.e., appear in two places) in the Tosefta.
Some Mishnayot have no parallel in the Tosefta.
Some sections of the Tosefta stand alone without the statement they are commenting on.
But just how much does the Tosefta expand on the Mishnah? To answer that, I wrote a Python script to calculate word counts for each chapter of the Mishnah and Tosefta, pulling the data directly from the Sefaria API (see the full technical details at the end of this piece). This allowed me to compare the two works quantitatively.
Here’s what I found: the Mishnah has 187,875 words, while the Tosefta clocks in at a hefty 294,241. On average, a Mishnah chapter contains 362 words, compared to 699 words for the Tosefta. In other words, Tosefta chapters are almost double the length of Mishnah chapters.
Some chapters really stood out. The longest chapter in the Mishnah is Sotah 9, with 873 words. But in the Tosefta, the longest chapter is Yoma 2, which hits a whopping 1,571 words. On the other end of the spectrum, Shabbat 4 in the Mishnah and Meilah 3 in the Tosefta were the shortest chapters, with 111 and 104 words, respectively.
To help visualize this, I created a histogram comparing the word counts across the two works. For the Mishnah, most chapters fall in the 300-400 word range, while Tosefta chapters show a broader spread, peaking at around 500-600 words but stretching all the way up to 1,600. This highlights a key difference: while the Mishnah tends to be more concise and consistent, the Tosefta varies much more in chapter length.
One of the most exciting aspects of this project is that it opens the door for further research. For example, I’d love to see if there’s a correlation between the length of Mishnah chapters and the corresponding Talmud discussions. Does a longer Mishnah chapter lead to a longer Talmudic debate? Is there a pattern here? Future projects could dig into that.
For those of you who are technically inclined, the analysis was done using a Python script that processes each chapter by pulling the text from Sefaria, removing Hebrew diacritics, and counting the words. It’s parallelized to speed things up, since there are a lot of chapters to get through. The final output is a data table that lists the word count for each chapter.
If you’re interested in diving deeper into these numbers or playing around with the data yourself, the full script is available at the end of this piece. Let me know if you have any thoughts, or if this sparks any ideas for future projects!
The piece and stats, summarized
The piece is available on my Academia page (registration is required to view, and it's also attached below): “Mishnah and Tosefta By the Numbers: Word Counts of Tractates and Chapters”.
Here are some stats from the analysis of word counts for chapters in the Mishnah and Tosefta:
Total Word Counts:
Mishnah: 187,875 words
Tosefta: 294,241 words
Average Word Count per Chapter:
Mishnah: 362 words
Tosefta: 699 words
Chapters with the Highest Word Count:
Mishnah: Sotah 9 with 873 words
Tosefta: Yoma 2 with 1,571 words
Chapters with the Lowest Word Count:
Mishnah: Shabbat 4 with 111 words
Tosefta: Meilah 3 with 104 words
This shows that, on average, Tosefta chapters are almost double the length of Mishnah chapters.
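The "almost double" figure can be checked with a few lines of arithmetic. The sketch below uses only the totals and rounded averages reported above; the chapter counts it prints are back-calculated from those rounded figures, so they are approximations and not counts taken from the data table.

```python
# Figures reported in the summary above.
mishnah_total, tosefta_total = 187_875, 294_241
mishnah_avg, tosefta_avg = 362, 699

# Ratio of average chapter lengths (Tosefta : Mishnah).
avg_ratio = tosefta_avg / mishnah_avg

# Ratio of total lengths (Tosefta : Mishnah).
total_ratio = tosefta_total / mishnah_total

# Chapter counts implied by total / rounded average (approximate).
mishnah_chapters = round(mishnah_total / mishnah_avg)
tosefta_chapters = round(tosefta_total / tosefta_avg)

print(f"Average chapter length ratio: {avg_ratio:.2f}")   # 1.93
print(f"Total length ratio: {total_ratio:.2f}")           # 1.57
print(f"Implied chapters: Mishnah ~{mishnah_chapters}, Tosefta ~{tosefta_chapters}")
```

Overall, the Tosefta is roughly 1.6 times as long as the Mishnah, but the same words are spread over fewer chapters, which is why the per-chapter gap is larger than the gap in totals.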
Chart: Histogram of Word Counts for Mishnah and Tosefta (20 Buckets)
In the histogram, the word counts for the Mishnah and Tosefta are displayed in 20 buckets. Here are some key observations:
Mishnah Word Counts (yellow):
The distribution peaks around 300-400 words per chapter.
Most of the Mishnah chapters tend to be shorter, with a clear concentration in the lower word count range.
The Mishnah's distribution is concentrated on the left (lower word counts), with a sharp drop-off after 600 words.
Tosefta Word Counts (orange):
The distribution for Tosefta is broader, with more variation in chapter length.
The peak occurs around 500-600 words, but it also extends far to the right, with several chapters having higher word counts (up to 1,600).
Tosefta chapters, on average, are longer than those of the Mishnah, as indicated by the shift in the peak toward higher word counts.
Comparison:
The Tosefta generally has more chapters with higher word counts compared to the Mishnah. While the Mishnah chapters cluster in the lower range (200-400 words), the Tosefta chapters show a broader spread, with many chapters falling between 400 and 1,000 words.
There is significant overlap in the mid-range (300-600 words), but the Tosefta includes many more chapters that exceed 600 words.
(The Y-axis, labeled “Frequency”, shows the number of chapters whose word count falls in each bucket.)
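For readers who want to reproduce the bucketing, here is a minimal sketch of how chapter word counts can be grouped into equal-width buckets. How the original chart was binned is not stated beyond "20 buckets", so equal-width bins over the observed range are an assumption, and the function name `bucket_counts` and the sample numbers are hypothetical.

```python
def bucket_counts(values, n_buckets=20):
    """Split the range [min, max] into equal-width buckets and count values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1  # fall back to 1 when all values are equal
    counts = [0] * n_buckets
    for v in values:
        # The maximum value lands on the upper edge; place it in the last bucket.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

# Hypothetical word counts, for illustration only (not the project's data).
sample = [111, 250, 310, 340, 365, 390, 420, 560, 873]
for start, end, count in bucket_counts(sample, n_buckets=4):
    print(f"{start:6.1f}-{end:6.1f}: {'#' * count}")
```

To compare two corpora on one chart, the same bucket edges would need to be used for both, for example by computing them over the combined list of counts.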
Future work
Future work would match these counts against the lengths of the corresponding Talmudic chapters, to see whether Mishnah length and Talmud length are related and, if so, how strong the correlation is.
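One simple way to measure such a relationship is the Pearson correlation coefficient. The sketch below is only an illustration of the calculation: the function name `pearson_r` is my own, and the paired word counts are invented placeholders, since no Talmud chapter lengths have been collected yet.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (Mishnah words, Talmud words) pairs for matching chapters.
# These numbers are invented placeholders, not measurements.
mishnah_words = [200, 300, 400, 500]
talmud_words = [4000, 6500, 7000, 11000]
r = pearson_r(mishnah_words, talmud_words)
print(f"r = {r:.3f}")
```

A value near 1 would mean longer Mishnah chapters tend to go with longer Talmud chapters; a value near 0 would mean no linear relationship.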
Technical
This Python script is designed to calculate the word count for each chapter of the Mishnah using the Sefaria API. It handles multiple tractates in parallel, counts Hebrew words after removing diacritics (nikud),3 and compiles the results into a DataFrame. Here's a breakdown of what it does:
Key Functions:
remove_hebrew_diacritics(text): This removes Hebrew diacritics from the text to ensure a clean word count.
count_hebrew_words(text): This splits the input string into words and returns the word count.
process_chapter(tractate, chapter, base_url): This retrieves the text for a specific chapter from the Sefaria API, processes the Hebrew text, counts the words, and returns the results (tractate, chapter, word count, and edition used).
process_tractate(tractate): Iterates through each chapter of a tractate and processes it using process_chapter. Stops at the first chapter that fails to load: a 404 error marks the end of the tractate, and any other failed request also ends the loop.
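To see what the two text helpers do in practice, here is a small demonstration. The two functions are copied from the script; the sample strings are my own illustration, written with Unicode escapes so that the combining marks are visible in the source.

```python
import re

def remove_hebrew_diacritics(text):
    # Same character range as the script: U+0591 through U+05C7.
    return re.sub(r'[\u0591-\u05C7]', '', text)

def count_hebrew_words(text):
    # Whitespace-delimited tokens, as in the script.
    return len(text.split())

# The word "shalom" with vowel points (shin dot, qamats, holam).
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
phrase = f"{pointed} {pointed}"
clean = remove_hebrew_diacritics(phrase)
print(clean)                      # consonants only
print(count_hebrew_words(clean))  # 2

# A free-standing mark from the same range, e.g. paseq (U+05C0),
# is a token of its own until the stripping step removes it.
with_paseq = f"{pointed} \u05c0 {pointed}"
print(count_hebrew_words(with_paseq))                            # 3
print(count_hebrew_words(remove_hebrew_diacritics(with_paseq)))  # 2
```

Because words are split on whitespace, vowel points attached to letters do not change the count by themselves. The stripping step matters for the count only when a token consists entirely of characters in that range.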
How it Works
Parallel Execution: It processes all 63 Mishnah tractates in parallel using the ThreadPoolExecutor. Each tractate is processed chapter by chapter (up to 31 chapters).4
Data Collection: Results are stored in a list of dictionaries and later compiled into a DataFrame.
Output:
The total word count for all processed tractates and chapters is calculated.
A TSV (tab-separated values) format output is generated and saved to a file, allowing easy copy-pasting into a spreadsheet.
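One detail worth knowing about the parallel step: as_completed yields results in the order the tasks finish, not the order they were submitted, so rows can come out in a different tractate order on each run. The sketch below shows one way to restore the canonical order afterwards. It replaces the network call with a hypothetical stub (`process_tractate_stub`) that returns made-up word counts, so it runs without contacting Sefaria.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-in data: chapter counts for three tractates. The word counts
# generated below are made up; nothing here queries Sefaria.
FAKE_CHAPTERS = {"Berakhot": 9, "Peah": 8, "Demai": 7}

def process_tractate_stub(tractate):
    # Hypothetical replacement for the script's process_tractate.
    return [{"Tractate": tractate, "Chapter": ch, "Word Count": 100 + ch}
            for ch in range(1, FAKE_CHAPTERS[tractate] + 1)]

tractates = list(FAKE_CHAPTERS)
rows = []
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(process_tractate_stub, t): t for t in tractates}
    for future in as_completed(futures):  # completion order, not list order
        rows.extend(future.result())

# Restore the canonical tractate order before building a table.
order = {name: i for i, name in enumerate(tractates)}
rows.sort(key=lambda r: (order[r["Tractate"]], r["Chapter"]))
print(rows[0], rows[-1])
```

The totals are unaffected by row order; sorting only matters if the table is meant to be read in the traditional sequence of tractates.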
The Full Python Script
# Import necessary libraries
import requests
import json
import re
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed

# Function to remove Hebrew diacritics (Nikud)
def remove_hebrew_diacritics(text):
    # Remove Hebrew diacritics (Nikud) using regular expression
    return re.sub(r'[\u0591-\u05C7]', '', text)

# Function to count Hebrew words in a string
def count_hebrew_words(text):
    # Split the text by whitespace to get words and return the count
    words = text.split()
    return len(words)

# Function to process a single chapter
def process_chapter(tractate, chapter, base_url):
    # Define the specific section reference for each chapter
    tref = f"Mishnah_{tractate}.{chapter}"
    # Define the API endpoint URL
    url = f"{base_url}{tref}"
    # Send the GET request to the Sefaria API
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the JSON response
        data = response.json()
        chapter_word_count = 0
        edition_used = "Unknown Edition"

        # Process the Hebrew text from the first available version (if it exists)
        if "versions" in data and len(data["versions"]) > 0:
            for version in data["versions"]:
                if "text" in version:
                    # Get the edition used
                    edition_used = version.get("versionTitle", "Unknown Edition")
                    # Iterate through the lines and count the words
                    for line in version["text"]:
                        # Remove diacritics from the line
                        clean_line = remove_hebrew_diacritics(line)
                        # Count words in the clean line
                        word_count = count_hebrew_words(clean_line)
                        chapter_word_count += word_count
                    # Count only the first version that has text, so that
                    # additional versions are not added to the same total
                    break

        return {"Tractate": tractate, "Chapter": chapter, "Word Count": chapter_word_count, "Edition Used": edition_used}
    elif response.status_code == 404:
        print(f"404 Error for {tractate} Chapter {chapter}. Moving to next tractate.")
        return None  # Stop processing further chapters for this tractate
    else:
        print(f"Failed to retrieve data for {tractate} Chapter {chapter}. Status code: {response.status_code}")
        return None

# Base URL for Sefaria API v3 Texts
base_url = "https://www.sefaria.org/api/v3/texts/"

# List of 63 tractates, names exactly as transliterated by Sefaria
tractates = [
    "Berakhot", "Peah", "Demai", "Kilayim", "Sheviit", "Terumot", "Maasrot", "Maaser Sheni", "Challah",
    "Orlah", "Bikkurim", "Shabbat", "Eruvin", "Pesachim", "Shekalim", "Yoma", "Sukkah", "Beitzah", "Rosh Hashanah",
    "Ta'anit", "Megillah", "Moed Katan", "Chagigah", "Yevamot", "Ketubot", "Nedarim", "Nazir", "Sotah", "Gittin",
    "Kiddushin", "Bava Kamma", "Bava Metzia", "Bava Batra", "Sanhedrin", "Makkot", "Shevuot", "Eduyot", "Avodah Zarah",
    "Pirkei Avot", "Horayot", "Zevachim", "Menachot", "Chullin", "Bekhorot", "Arakhin", "Temurah", "Keritot", "Meilah",
    "Tamid", "Middot", "Kinnim", "Kelim", "Oholot", "Negaim", "Parah", "Tahorot", "Mikvaot", "Niddah", "Makhshirin",
    "Zavim", "Tevul Yom", "Yadayim", "Oktzin"
]

# Set max chapters as 31
max_chapters = 31

# Initialize total word count and create a list to store tractate-wise data
total_word_count = 0
tractate_data = []

# Function to process a single tractate (for parallelization)
def process_tractate(tractate):
    tractate_results = []
    for chapter in range(1, max_chapters + 1):
        result = process_chapter(tractate, chapter, base_url)
        if result:
            tractate_results.append(result)
        else:
            # Stop further processing on a 404 error or any other failed request
            break
    return tractate_results

# Parallelize the processing of tractates using ThreadPoolExecutor
with ThreadPoolExecutor() as executor:
    # Submit tasks for each tractate
    futures = {executor.submit(process_tractate, tractate): tractate for tractate in tractates}
    # As tasks complete, add their results to the data
    for future in as_completed(futures):
        tractate_results = future.result()
        if tractate_results:
            tractate_data.extend(tractate_results)

# Create a DataFrame from the tractate data
df = pd.DataFrame(tractate_data)

# Calculate the total word count
total_word_count = df['Word Count'].sum()

# Add a total row at the end of the table
df.loc["Total"] = ["Total", "Total", total_word_count, ""]

# Prepare the DataFrame for easy copy-pasting into a spreadsheet (TSV format)
df_tsv = df.to_csv(sep='\t', index=False)

# Save to a TSV file (optional if you want to download the file in Colab)
with open('/content/mishnah_wordcount.tsv', 'w') as f:
    f.write(df_tsv)

# Print TSV formatted data for easy copy-pasting
print(df_tsv)
See also my earlier works on word counts, at my Academia page:
Special thanks to Avraham Yoskovich, via email, for suggesting the idea of expanding my word count project to include word counts of all the chapters in Mishnah and Tosefta.
See my recent reformatting of tractate Middot, at my Academia page (registration required): “Mishnah Tractate Middot: Featuring Reader-Friendly Formatting, Summaries, Tables, Hyperlinks, and Loanword Etymologies”.
The Mishnah tractate with the highest number of chapters is Kelim, at 30 chapters.