Word Counts of Classical Jewish Works: A Deep Dive
Comments on Individual Works; Word Count tables - Tanakh; Chazal (Mishnah; Talmud); Mishneh Torah; Tur; Shulchan Aruch
Update: See my subsequent research at my Academia.edu page: “Words of Wisdom: Word Counts of Classical Jewish Works”; and “Bavli By the Numbers: Word Counts of All Chapters in Talmud Bavli”
In an era dominated by digital advances, it’s surprising that, as far as I’m aware, a straightforward word count of classic Hebrew books isn't readily accessible to the public.
Ari Z. Zivotofsky, “The Longest Masechta is …”, The Seforim Blog (March 31, 2023), mentions that the Bar-Ilan Responsa Project (BIRP) CD shows word counts for the works contained in that corpus. YD was kind enough to send me a screenshot of BIRP’s word counts (via the ‘Ask the Beit Midrash’ Facebook group, a private group).
I hope to independently check the numbers against Sefaria texts (see my discussion at the end of this post), as well as the Academy of the Hebrew Language’s Ma’agarim (מאגרים),1 and to document the results in a future post.
Comments on Individual Works
Tanakh: See this Hebrew Wikipedia entry, which discusses the 1,500-year history of attempts to get the exact word count of the Pentateuch, starting from the Talmud: סטטיסטיקות של התנ"ך – ויקיפדיה. It’s noteworthy that each book of the Pentateuch approximates the size of an ancient 'book' or scroll (=~40,000 words). The ancient 'book' or scroll is roughly equivalent to 30-40 pages in a contemporary book with standard formatting.
Talmud: There have been a few discussions of the Babylonian Talmud's word count, with estimates around 1.865 million words. See here: Shalom, “What is the longest masechta?”, Mi Yodeya (Aug 26, 2010); Jeremy Brown, “Berachot 2 ~ How Many Words Are In the Babylonian Talmud?”, Talmudology (January 5, 2020); and Zivotofsky, ibid.
I haven’t found online word counts for the Mishnah, classic Midrash (midrashei halacha and aggada from the period of Chazal), Mishneh Torah, Zohar, Tur, or Shulchan Aruch.
Interestingly, Zivotofsky there mentions one-volume editions of Bavli and Yerushalmi, edited by Zvi Preisler. (In my Seforim Blog post, I mistakenly cited this from a commenter on Zivotofsky’s post, instead of from Zivotofsky himself.)
Zohar: A question on the Judaism Stack Exchange from four years ago (see here) inquiring about the Zohar's word count remains unanswered.
Word Count tables
Tanakh
OCR text:
Work Letter Count Word Count
------------------- ------------ --------------
תנ"ך 1,202,969 306,755
בראשית 78,133 20,628
שמות 63,592 16,726
ויקרא 44,807 11,955
במדבר 63,573 16,417
דברים 54,948 14,306
יהושע 39,878 10,063
שופטים 39,044 9,906
שמואל א 51,714 13,336
שמואל ב 42,588 11,129
מלכים א 50,838 13,186
מלכים ב 48,162 12,352
ישעיהו 67,119 16,986
ירמיהו 85,571 21,975
יחזקאל 75,208 18,871
הושע 9,412 2,387
יואל 3,876 958
עמוס 8,049 2,045
עובדיה 1,124 292
יונה 2,700 688
מיכה 5,586 1,400
נחום 2,272 562
חבקוק 2,603 672
צפניה 3,005 769
חגי 2,342 601
זכריה 12,464 3,134
מלאכי 3,450 876
תהלים 79,155 19,656
משלי 26,818 6,984
איוב 32,118 8,395
שיר השירים 5,167 1,254
רות 5,002 1,306
איכה 6,084 1,564
קהלת 11,014 2,998
אסתר 12,196 3,058
דניאל 24,870 6,049
עזרא 15,950 3,791
נחמיה 22,632 5,336
דברי הימים א 44,769 10,789
דברי הימים ב 55,136 13,355
Chazal
OCR text:
Work Letter Count Word Count
ספרות חז"ל 39,210,393 9,993,109
משנה 781,068 191,563
תוספתא 1,213,882 302,847
תלמוד בבלי מסכתות קטנות 7,948,735 2,004,215
תלמוד ירושלמי (וילנא) 3,133,340 794,835
תלמוד ירושלמי (ונציה) 3,206,396 814,642
מדרשי הלכה 2,201,420 576,633
מדרשי אגדה 20,725,552 5,308,374
Zohar
OCR text:
Work Letter Count Word Count
זוהר 5,183,545 1,239,420
זוהר - הקדמה 58,819 14,264
זוהר 2,869,377 691,909
זוהר - תוספתא 15,908 3,699
זוהר - סתרי תורה 48,632 11,440
זוהר - מדרש הנעלם 58,764 14,449
זוהר - רעיא מהימנא 330,025 78,214
זוהר - רזא דרזין 12,171 2,772
זוהר - ס"א 1,571 396
זוהר - ספרא דצניעותא 11,168 2,642
זוהר - האדרא רבא 77,087 17,773
זוהר - האדרא זוטא 42,041 9,721
זוהר - השמטות 87,865 21,387
Mishneh Torah; Tur; Shulchan Aruch
OCR text:
Work Letter Count Word Count
משנה תורה לרמב"ם (עם ראב"ד) 3,530,053 871,871
טור 2,878,992 707,401
שולחן ערוך 3,050,105 756,198
Preliminary discussion of Word counts using Wikisource and Sefaria
Wikisource, one of the primary open-access sources of the Talmud, primarily showcases text per daf or chapter, rather than per tractate. For a comprehensive approach, I turned to Sefaria's BSON dump (BSON is a binary format used by MongoDB).
To extract the desired data, I employed Python within Google Colab. This enabled me to navigate the BSON files, using the ‘pymongo’ library to filter and process the Hebrew texts. The Python script also strips nikkud before counting the words.
I hope to share the final, full Python script that fetches these word counts, and to discuss it in a future post.
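The nikkud-stripping and counting step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actual script; the function names (`strip_nikkud`, `count_words`, `count_letters`) are hypothetical. It assumes that vowel points and cantillation marks can be identified by their Unicode category ("Mn", nonspacing mark), that a "word" is a run of Hebrew letters, and that words joined by a maqaf count separately. Other tallies, including BIRP's, may well use different conventions.

```python
import re
import unicodedata

# A Hebrew word: one or more letters (alef..tav, including final forms),
# optionally joined by an internal geresh/gershayim or ASCII quote mark,
# so that abbreviations written with a quote mark count as a single word.
# Assumption: a maqaf (U+05BE) is not part of a word, so it splits words.
HEBREW_WORD = re.compile(
    "[\u05D0-\u05EA]+(?:[\"'\u05F3\u05F4][\u05D0-\u05EA]+)*"
)


def strip_nikkud(text: str) -> str:
    """Remove vowel points and cantillation marks, keeping the letters."""
    # NFD splits precomposed presentation forms into letter + mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Nikkud and te'amim are nonspacing marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def count_words(text: str) -> int:
    """Count Hebrew words after stripping nikkud."""
    return len(HEBREW_WORD.findall(strip_nikkud(text)))


def count_letters(text: str) -> int:
    """Count Hebrew letters (alef..tav) after stripping nikkud."""
    return sum(1 for ch in strip_nikkud(text) if "\u05D0" <= ch <= "\u05EA")


if __name__ == "__main__":
    verse = "בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ"
    print(count_words(verse), count_letters(verse))
```

Choices such as how to treat maqaf-joined words, abbreviations, and ketiv/qere variants change the totals, which is one reason independent counts of the same work rarely agree exactly.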
Thanks to JH in the above-mentioned Facebook group for pointing out that Ma’agarim shows word counts.