Further Experiments in Automating Accessible Talmud Layout
Using Google Apps Script to fetch content from a Wikisource page and format it in Google Docs
Follow-up to my previous post, where I wrote “The one place I found punctuation is Hebrew Wikisource’s Commentary (ביאור), from R’ Yishaya Hacohen Hollander’s ‘Easy Talmud’ (גמרא נוחה). I pulled this punctuated text automatically, and copy-pasted it here. In addition to the very important punctuation work, R’ Hollander’s text also opens up all acronyms. A future blogpost will be about how to pull this text automatically.” See my previous pieces discussing potential ways of making the Talmud Bavli more accessible: “Cataloguing My Blogposts: An Organized Breakdown by Theme” > section “Automation and digital layouts of rabbinic works”
Google Apps Script is a powerful, relatively user-friendly tool for automating tasks that are time-consuming to do manually. In this blog post, I walk through using it to fetch content from a Wikisource page and format it in Google Docs.
Source Text
Hollander’s Easy Talmud (גמרא נוחה) on Hebrew Wikisource:
ביאור:בבלי בבא מציעא דף נט – ויקיטקסט
Screenshot:
Final Output
Final Output: Automating Accessible Talmud Layout with Google Apps Script:
Final Code: See below, at the end
Coding steps
Fetching Content from Wikisource
The first step in my process was to fetch content from a specific page on Wikisource. I did this using the Wikisource API, which allowed me to retrieve the content of the page in a format that could be easily parsed and manipulated.
This script uses Google Apps Script's UrlFetchApp service to make an HTTP request to the Wikisource API, fetching the content of a specified page. The content is then parsed as JSON and logged to the Apps Script console.
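The fetch step described above can be sketched in isolation as follows. The helper names buildApiQuery, extractWikitext, and logPageContent are hypothetical and only split the logic of the final script into smaller pieces; the URL and query parameters are the ones used in the final code.

```javascript
// Hypothetical helper: builds the API query URL for one page title.
function buildApiQuery(pageTitle) {
  var apiUrl = "https://he.wikisource.org/w/api.php";
  return apiUrl +
    "?action=query&prop=revisions&rvprop=content&format=json&titles=" +
    encodeURIComponent(pageTitle);
}

// Hypothetical helper: pulls the raw wikitext out of the parsed API response.
// Returns null when the page or its revisions are missing.
function extractWikitext(json) {
  if (!json || !json.query || !json.query.pages) {
    return null;
  }
  for (var pageId in json.query.pages) {
    var page = json.query.pages[pageId];
    if (page.revisions && page.revisions.length > 0) {
      return page.revisions[0]["*"];
    }
  }
  return null;
}

// Runs only inside Google Apps Script, where UrlFetchApp and Logger exist.
function logPageContent() {
  var response = UrlFetchApp.fetch(
    buildApiQuery("ביאור:בבלי_בבא_מציעא_דף_נט"),
    { muteHttpExceptions: true }
  );
  Logger.log(extractWikitext(JSON.parse(response.getContentText())));
}
```

Keeping the URL building and JSON handling in separate functions means they can be checked without making a network request.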
Cleaning Up the Content
Once I had the content from the Wikisource page, I needed to clean it up before inserting it into my Google Doc. Specifically, I wanted to remove footnotes and empty lines. To do this, I used a regular expression and JavaScript's built-in string methods. The code strips the <ref> footnote tags, splits the content into lines, and checks each line to see if it is empty. If it is not empty, the line is added to my Google Doc.
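A minimal sketch of this cleanup step on its own, using the same regular expression as the final script. The helper name cleanWikitext is hypothetical, and returning an array of lines (rather than writing straight to the document) is a choice made here for illustration.

```javascript
// Hypothetical helper: strips <ref>...</ref> footnotes and drops blank lines.
function cleanWikitext(pageContent) {
  // Remove footnote tags together with their contents (non-greedy, multi-line).
  var withoutRefs = pageContent.replace(/<ref[^>]*>[\s\S]*?<\/ref>/g, "");
  // Split into lines and keep only those that contain visible text.
  return withoutRefs.split("\n").filter(function (line) {
    return line.trim() !== "";
  });
}
```

The pattern assumes every <ref> tag has a matching </ref>. A self-closing <ref ... /> tag would cause it to remove text up to the next </ref>.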
Formatting the Document
With my cleaned-up content in hand, the next step was to format it in my Google Doc. I set the font family and size, added space after each paragraph, and inserted the content into the document.
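The formatting step can be sketched as a small function that takes the document body and a list of lines. The helper name appendFormattedParagraphs is hypothetical; the font, size, and spacing values are the ones from the final script.

```javascript
// Hypothetical helper: appends each line as a formatted paragraph.
// `body` is expected to be a DocumentApp Body, for example
// DocumentApp.getActiveDocument().getBody().
function appendFormattedParagraphs(body, lines) {
  var count = 0;
  for (var i = 0; i < lines.length; i++) {
    var para = body.appendParagraph(lines[i]);
    para.setSpacingAfter(10);               // Space after the paragraph, in points
    para.setFontFamily("Frank Ruhl Libre"); // Hebrew-friendly serif font
    para.setFontSize(12);
    count++;
  }
  return count; // Number of paragraphs appended
}
```

Passing the body in as a parameter keeps the function independent of which document is active.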
Limitations and Manual Formatting
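I did not find a way in Google Apps Script to set a font weight other than bold or to add page numbers to the header, so I did these manually in Google Docs. I also set the right-to-left direction manually, although the DocumentApp Paragraph class does list a setLeftToRight method that may cover this.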
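Of the manual steps above, text direction may be scriptable: DocumentApp's Paragraph class documents a setLeftToRight method, and Body documents getParagraphs. A minimal sketch, not tested against this layout (the helper name setBodyRightToLeft is hypothetical):

```javascript
// Hypothetical helper: marks every paragraph in the body as right-to-left.
// `body` is expected to be a DocumentApp Body.
function setBodyRightToLeft(body) {
  var paragraphs = body.getParagraphs();
  for (var i = 0; i < paragraphs.length; i++) {
    // false means "not left-to-right", i.e. right-to-left
    paragraphs[i].setLeftToRight(false);
  }
  return paragraphs.length; // Number of paragraphs updated
}
```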
I also deleted irrelevant headers.[1] For this example, I also deleted parts of the output that aren’t part of the story of ‘the Oven of Akhnai’.
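Removing headers could probably be scripted as well. The sketch below assumes the headers in question are standard wikitext section headings of the form == Title ==, which may not cover every header on the page; the helper names isWikitextHeading and dropHeadings are hypothetical.

```javascript
// Hypothetical helper: true for wikitext headings such as "== Title ==".
// The same number of "=" signs must open and close the line.
function isWikitextHeading(line) {
  return /^(={1,6})[^=].*\1$/.test(line.trim());
}

// Hypothetical helper: returns the lines with all heading lines removed.
function dropHeadings(lines) {
  return lines.filter(function (line) {
    return !isWikitextHeading(line);
  });
}
```

This drops every heading; keeping some and deleting others would need an additional check on the heading text.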
It would be great to be able to programmatically combine this layout with the Steinsaltz commentary available at Sefaria.
Final Code
function fetchPageContent() {
  var pageTitle = "ביאור:בבלי_בבא_מציעא_דף_נט"; // Replace with your own page title
  var apiUrl = "https://he.wikisource.org/w/api.php";
  var options = {
    method: "GET",
    headers: {"Api-User-Agent": "Example/1.0"},
    muteHttpExceptions: true
  };
  var apiQuery = apiUrl + "?action=query&prop=revisions&rvprop=content&format=json&titles=" + encodeURIComponent(pageTitle);
  var response = UrlFetchApp.fetch(apiQuery, options);
  // muteHttpExceptions suppresses HTTP errors, so check the status before parsing
  if (response.getResponseCode() !== 200) {
    Logger.log("Request failed with status " + response.getResponseCode());
    return;
  }
  var json = JSON.parse(response.getContentText());
  if (json && json.query && json.query.pages) {
    for (var pageId in json.query.pages) {
      if (json.query.pages[pageId].revisions && json.query.pages[pageId].revisions.length > 0) {
        var pageContent = json.query.pages[pageId].revisions[0]['*'];
        // Remove footnotes
        pageContent = pageContent.replace(/<ref[^>]*>[\s\S]*?<\/ref>/g, "");
        // Split the content into lines
        var lines = pageContent.split('\n');
        // Append the text content to the current Google Doc
        var doc = DocumentApp.getActiveDocument();
        var body = doc.getBody();
        for (var i = 0; i < lines.length; i++) {
          // Skip empty lines
          if (lines[i].trim() !== '') {
            var para = body.appendParagraph(lines[i]);
            para.setSpacingAfter(10); // Adds space after paragraph
            para.setFontFamily("Frank Ruhl Libre"); // Sets the font to "Frank Ruhl Libre"
            para.setFontSize(12); // Sets the font size to 12
          }
        }
      } else {
        Logger.log("No revisions found for the page.");
      }
    }
  } else {
    Logger.log("No data retrieved. Check the page title.");
  }
}
[1] It’s likely that there’s a way to do this in Google Apps Script, but I didn’t get around to it. I also need to find a way to remove footnotes in brackets, such as in this line:
מאי ([[דברים ל יב]]) לא בשמים היא [לאמר: מי יעלה לנו השמימה ויקחה לנו, וישמענו אתה, ונעשנה]?
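One possible approach to the bracketed references, sketched under the assumption that they always take the form of a wikilink wrapped in parentheses, as in the line above. The helper name stripParenthesizedLinks is hypothetical, and the pattern deliberately leaves plain square-bracket expansions and unparenthesized links alone.

```javascript
// Hypothetical helper: removes parenthesized wikilinks such as ([[target]])
// together with the whitespace immediately before them.
function stripParenthesizedLinks(line) {
  return line.replace(/\s*\(\[\[[^\]]*\]\]\)/g, "");
}
```

It could be applied to each line before the line is appended to the document.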