Scripting the Talmud, Part 2: Automated Rashi Text Extraction and Digital Layout of Tzurat Hadaf
Intro to the traditional tzurat hadaf; Emulating tzurat hadaf and typography in digital layout; Automated Rashi Text Extraction; Appendix: Purim Daf (דף פורים), Yeshiva Shaar Hatorah 2011
Part of a series. Part 1: “Scripting the Talmud: Automated Talmudic Text Extraction and Formatting Programmatically - Extracting and formatting Talmudic text from Sefaria using Google Apps Script” (May 24, 2023). See also my parody daf, which I created 12 years ago when I was in yeshiva, on my Academia.edu page (requires registration; the PDF is also available in the appendix at the end of this blog post): (PDF) Purim Daf (דף פורים), Yeshiva Shaar Hatorah 2011 | Ezra Brand - Academia.edu.
Intro to the traditional tzurat hadaf layout of the Talmud Bavli
Vilna Edition Shas - Wikipedia :
The Vilna Edition of the Talmud, printed in Vilna [...], Lithuania, is by far the most common printed edition of the Talmud still in use today as the basic text for Torah study in yeshivas and by all scholars of Judaism [...] This edition comprises 37 volumes and contains the entire Babylonian Talmud. In its entirety there are 2,711 double sided folio pages. It follows the typical pagination due to Bomberg of printing with the Gemara and/or Mishnah centered with Rashi's commentary on the inner margin and Tosafot on the outer margin [...] This edition was first printed in the 1870s and 1880s [...]
This layout is called the tzurat hadaf. On the history of tzurat hadaf, see Yoel Finkelstein’s important recent article (open-access), “From Bomberg to the Beit Midrash: A Cultural and Material History of Talmudic Page Layout”, Tradition (Winter 2023), Issue 55.1.[1]
What is called a daf in the Talmud (Gemara) is called a folio in English.
The term "folio" (from Latin folium 'leaf'), has three interconnected but distinct meanings in the world of books and printing: [...] second, it is a general term for a sheet, leaf or page in (especially) manuscripts and old books [...]
Second, folio is used in terms of page numbering for some books and most manuscripts that are bound but without page numbers as an equivalent of "page" (both sides), "sheet" or "leaf", using "recto" and "verso" to designate the first and second sides, and (unlike the usage in printing) disregarding whether the leaf concerned is actually physically still joined with another leaf. This usually appears abbreviated: "f26r." means the first side of the 26th leaf in a book. This will be on the right hand side of the opening of any book composed in a script that is read from left-to-right, such as Latin (as used in English), Cyrillic, or Greek, and will be opposite for books composed in a script that is read from right-to-left, such as Hebrew and Arabic.
For the Talmud, the letters A and B are traditionally used instead of the terms recto and verso. Their Hebrew equivalents are also used: the letters א and ב, or a period and a colon. When Hebrew letters are used, the reference is often abbreviated, so amud A is often shortened to ע”א.
To summarize:
Folio 2 recto = 2r = 2a = דף ב עמוד א = ב ע"א = ב.
Folio 2 verso = 2v = 2b = דף ב עמוד ב = ב ע"ב = ב:
Sefaria has added yet another level of granularity for Talmudic citation: the section number (or “line”, as they call it). Every page (=amud) of Talmud is split by Sefaria into around 10-15 sections. The section number is written after a period. So the second section of page 2b is written as follows:
2b.2
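The notations summarized above can be converted into one another mechanically. The following is a minimal sketch, not part of the original script: the function names (parseAmudRef, toFolioNotation, toHebrewNumeral, toTraditionalNotation) are hypothetical helpers introduced here for illustration, and the Hebrew numeral conversion assumes daf numbers between 1 and 399.

```javascript
// Parse a Sefaria-style reference such as "2b" or "2b.2" into its parts.
function parseAmudRef(ref) {
  var match = /^(\d+)([ab])(?:\.(\d+))?$/.exec(ref);
  if (!match) {
    throw new Error("Unrecognized reference: " + ref);
  }
  return {
    daf: parseInt(match[1], 10),
    amud: match[2],
    section: match[3] ? parseInt(match[3], 10) : null
  };
}

// Render the page in recto/verso folio notation, e.g. "2b" becomes "2v".
function toFolioNotation(ref) {
  var parsed = parseAmudRef(ref);
  return parsed.daf + (parsed.amud === "a" ? "r" : "v");
}

// Convert a number (1-399) to Hebrew letters, using טו and טז for 15 and 16.
function toHebrewNumeral(n) {
  var units = ["", "א", "ב", "ג", "ד", "ה", "ו", "ז", "ח", "ט"];
  var tens = ["", "י", "כ", "ל", "מ", "נ", "ס", "ע", "פ", "צ"];
  var hundreds = ["", "ק", "ר", "ש"];
  if (n < 1 || n > 399) {
    throw new Error("Number out of supported range: " + n);
  }
  var out = hundreds[Math.floor(n / 100)];
  var rest = n % 100;
  if (rest === 15) { return out + "טו"; }
  if (rest === 16) { return out + "טז"; }
  return out + tens[Math.floor(rest / 10)] + units[rest % 10];
}

// Render the page in the traditional shorthand: a period marks amud A
// and a colon marks amud B.
function toTraditionalNotation(ref) {
  var parsed = parseAmudRef(ref);
  return toHebrewNumeral(parsed.daf) + (parsed.amud === "a" ? "." : ":");
}
```

For example, toFolioNotation("2b.2") gives "2v", and parseAmudRef("2b.2") separates the daf (2), the amud ("b"), and the Sefaria section number (2).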
Emulating tzurat hadaf and typography in digital layout
Set up the Google Doc as landscape, which is a better fit for a computer monitor's aspect ratio than the physical page of the typical tzurat hadaf.
Set the format to two columns, for Rashi and Tosafot.
Added a text box for the Talmud text.
Used the fonts that were closest to the traditional fonts that I could find that are available natively in Google Docs:
Talmud: Frank Ruhl Libre, Semi Bold
Rashi: Noto Rashi Hebrew, Bold
Screenshot of the final layout (lorem ipsum-style placeholder content):
Automated Rashi Text Extraction
Downloading Rashi from Sefaria:
Highlight part of the Talmud text, then go to Mefarshim > Rashi > אודות ספר זה.
Download as Hebrew text.
Don't download the CSV; it comes out as gibberish.
Don't download the English translation; hardly any of Rashi is translated.
Tractate Gittin in the traditional tzurat hadaf is ~180 pages, meaning 180 amudim. This calculation is based on the 90 daf in Gittin (folios, as explained earlier), each of which has two sides.
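As a rough check on that count, the page references of a tractate can be enumerated. This is a minimal sketch with a hypothetical helper (listAmudim); it assumes the tractate begins on daf 2, as is standard in the Vilna edition, and that Gittin runs through 90b.

```javascript
// List every amud reference from firstDaf through lastDaf, e.g. "2a", "2b", ...
function listAmudim(firstDaf, lastDaf) {
  var pages = [];
  for (var daf = firstDaf; daf <= lastDaf; daf++) {
    pages.push(daf + "a");
    pages.push(daf + "b");
  }
  return pages;
}

// Assumption: Gittin runs from 2a through 90b.
var gittinPages = listAmudim(2, 90);
// gittinPages.length is 178, consistent with the ~180 estimate.
```

A list like this could also serve as the outer loop if one wanted to fetch commentary for a whole tractate rather than a single page.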
It should be pointed out that Sefaria uses the term line to refer to the sections into which they split each Talmudic page (=amud), typically 10-15 sections per page.
Rashi on Gittin, downloaded from Sefaria, is ~380 pages in a Google Doc, and takes a while to import.
Instead, we can use Google Apps Script to fetch only the text we need. See my previous post for how I did this for the text of a page of Talmud: “Scripting the Talmud: Automated Talmudic Text Extraction and Formatting” (May 24, 2023).
I wanted to loop through the section numbers. However, because the Sefaria API doesn't indicate how many sections are in a given page of Talmud, I needed to manually set a maximum section number. As I mentioned earlier, Sefaria typically splits a page of Talmud into around 10-15 sections. To be on the safe side, I set the max number of sections to 50. When the function exceeds the number of actual sections, the Sefaria API returns a 404 error, which stops the loop.
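The stopping rule described above can be sketched in isolation from the Google Docs parts. This is only an illustrative variant, not the script I ran: buildRashiUrl and collectRashiComments are hypothetical names, and fetchReply is a hypothetical callback that takes a URL and returns an object with a status code and a body. Checking the status code explicitly lets the loop stop on a 404 while still surfacing other failures (for example, a network problem) instead of silently ending.

```javascript
// Build the URL for one section, following the pattern used in this post.
function buildRashiUrl(tractate, pageNumber, sectionNumber) {
  return "https://www.sefaria.org/api/texts/Rashi_on_" +
    tractate + "." + pageNumber + "." + sectionNumber;
}

// fetchReply (hypothetical): function(url) -> { status: Number, body: String }.
// In Apps Script it could wrap UrlFetchApp.fetch(url, { muteHttpExceptions: true })
// and return { status: response.getResponseCode(), body: response.getContentText() }.
function collectRashiComments(tractate, pageNumber, maxSectionNumber, fetchReply) {
  var comments = [];
  for (var sectionNumber = 1; sectionNumber <= maxSectionNumber; sectionNumber++) {
    var reply = fetchReply(buildRashiUrl(tractate, pageNumber, sectionNumber));
    if (reply.status === 404) {
      break; // we have gone past the last section of this page
    }
    if (reply.status !== 200) {
      throw new Error("Unexpected HTTP status " + reply.status);
    }
    var hebrew = JSON.parse(reply.body)["he"] || [];
    if (!Array.isArray(hebrew)) {
      hebrew = [hebrew]; // tolerate a single string instead of a list
    }
    for (var i = 0; i < hebrew.length; i++) {
      if (hebrew[i]) {
        comments.push({
          ref: pageNumber + "." + sectionNumber,
          text: hebrew[i]
        });
      }
    }
  }
  return comments;
}
```

Because the fetching is passed in as a callback, the loop logic can be tried out with a fake fetcher before pointing it at the real API.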
Final script (Replace 'DOCIDHERE' with the ID of your Google Docs document):
function fetchAndInsertRashiCommentary() {
  var tractate = "Gittin";
  var pageNumber = "17a";
  var maxSectionNumber = 50; // set a maximum number for the section count

  var doc = DocumentApp.openById('DOCIDHERE');
  var body = doc.getBody();

  for (var sectionNumber = 1; sectionNumber <= maxSectionNumber; sectionNumber++) {
    var pageSection = pageNumber + "." + sectionNumber;
    var url = "https://www.sefaria.org/api/texts/Rashi_on_" + tractate + "." + pageSection;
    var response;
    try {
      response = UrlFetchApp.fetch(url);
    } catch (error) {
      // Sefaria returns a 404 error when a section doesn't exist
      // So we break the loop when we've gone past the last section
      break;
    }

    // Parse the JSON response and take the Hebrew comments (empty list if missing)
    var dataAll = JSON.parse(response.getContentText());
    var dataHebrew = dataAll['he'] || [];

    // Append each non-empty comment, prefixed with its section reference
    for (var i = 0; i < dataHebrew.length; i++) {
      if (dataHebrew[i]) {
        body.appendParagraph(pageSection + ": " + dataHebrew[i]);
      }
    }
  }
}
Screenshot of output:
Appendix: Purim Daf (דף פורים), Yeshiva Shaar Hatorah 2011
(PDF) Purim Daf (דף פורים), Yeshiva Shaar Hatorah 2011 | Ezra Brand - Academia.edu:
[1] Thanks to Eliezer Brodt for pointing out this article to me. See also my study on digital layouts of the Talmud, forthcoming in Seforim Blog.