5 Comments
Shmuel

This looks very cool, but I have a few questions/comments:

1. Can you explain why your previous analysis identified Sanhedrin 108a?

2. I don't think the Sefaria text has any contractions (even simple ones like א"ר for אמר רב, "Rav said"), but it does contain in-line citations of biblical verses, which will result in overcounting the words on folios that have more of these citations.

More generally: I guess you'll get into this in the next post, but I don't really understand why you needed a good indicator for identifying aggadic texts, when Ein Yaakov already did that. I mean, it's not perfect (and there's some grey area between aggadah and halakha anyway), but I would imagine that it is a more robust indicator than any other? Have you done a comparison between these correlative indices vs. a simple check of whether the talmudic passage appears in Ein Yaakov?

Ezra Brand

Hi Shmuel!

Thanks for the thoughtful questions.

Responses:

> 1. "Can you explain why your previous analysis identified Sanhedrin 108a?"

That's a good question; I'm not sure myself. In my opinion, my previous calculation was far less accurate (aside from being far less comprehensive in terms of pages checked), for a number of reasons:

- I used Wikisource, and had to extract the relevant text.

- See my next response (#2, re Sefaria vs. Wikisource).

> 2. "I don't think the Sefaria text has any contractions (even simple ones like א"ר for אמר רב) but it does contain in-line citations of biblical verses, which will result in overcounting the words on folios that have more of these citations."

Actually, the Sefaria text via the API is ideal for our purposes (word count): it indeed contains no contractions, and it also contains no in-line citations of biblical verses (unlike Wikisource, which I used previously; see response #1).
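For readers who want to try this, here is a minimal Python sketch of a per-amud word count over Sefaria text. It assumes that Sefaria's legacy `/api/texts/{ref}` endpoint returns an amud's Hebrew segments as a list of strings under the `"he"` key, possibly with inline HTML; the function names are hypothetical, and this is not the code used for the analysis in the post.

```python
import json
import re
import urllib.request

# Inline HTML tags that may appear in Sefaria segments (assumption).
TAG_RE = re.compile(r"<[^>]+>")


def count_words(segments):
    """Count whitespace-separated words across text segments,
    replacing any HTML tags with spaces first."""
    total = 0
    for seg in segments:
        plain = TAG_RE.sub(" ", seg)
        total += len(plain.split())
    return total


def fetch_hebrew_segments(ref):
    """Fetch Hebrew segments for one amud, e.g. "Sanhedrin.108a".
    Assumes the legacy v1 texts endpoint and its "he" field."""
    url = f"https://www.sefaria.org/api/texts/{ref}?context=0"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    he = data.get("he", [])
    # A single-segment ref may come back as a plain string.
    return [he] if isinstance(he, str) else he
```

Usage (requires network access): `count_words(fetch_hebrew_segments("Sanhedrin.108a"))`. Counting is kept separate from fetching so the same function can be run on text from any source.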

> 3. "More generally: I guess you'll get into this in the next post, but I don't really understand why you needed a good indicator for identifying aggadic texts, when Ein Yaakov did that already? I mean, it's not perfect (and there's some grey area between aggadah and halakha anyways) but I would imagine that it is more robust of an indicator than any other? Have you done a comparison between these correlative indices vs a simple check of whether the talmudic passage appears in Ein Yaakov?"

Ein Yaakov was indeed the first thing I thought of when I started thinking about this question. In fact, a year ago I spent some time analyzing Ein Yaakov. See the following posts of mine:

Identifying the Most Quoted Sages in the Talmud's Aggada: A Programmatic and Quantitative Study (Jan 17, 2024) -

https://www.ezrabrand.com/p/discovering-the-talmuds-most-cited

Automated Aggada formatting - splitting into sections and lines, bolding verses, and underlining sages (sampling the beginning of Sanhedrin Perek Chelek) (Jan 24, 2024) -

https://www.ezrabrand.com/p/automated-aggada-formatting-splitting

But what I came to realize was that:

a) Ein Yaakov significantly undercounts/under-labels aggadah (in my opinion). That is, it takes a much more "minimalist" view of what aggadah is. I haven't done a systematic investigation to back that up (I'd love to find out if anyone has studied this), but that's my impression. Of course, as you mentioned, "there's some grey area between aggadah and halakha". I personally have a more "maximalist" definition of aggadah. So I'd include the long medical sugya in Gittin, whereas I'd assume that Ein Yaakov doesn't.

b) You write: "I would imagine that it is more robust of an indicator than any other? Have you done a comparison between these correlative indices vs a simple check of whether the talmudic passage appears in Ein Yaakov?" From a purely technical perspective, Ein Yaakov is difficult to use for programmatically labeling aggadah, for the following reason: unless I'm missing something, it's non-trivial to match up Ein Yaakov's text with Sefaria's main Talmud text. If you look at Sefaria's edition of Ein Yaakov, it's "paginated" by tractate + number, rather than by daf and amud, so there's no direct mapping back to the Talmud's own folio references. See here: https://www.sefaria.org.il/Ein_Yaakov; and here: https://www.sefaria.org.il/Ein_Yaakov_(Glick_Edition)
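One possible workaround for the alignment problem is fuzzy text matching instead of relying on Sefaria's references. The sketch below scores each candidate daf by the share of an Ein Yaakov passage's word 4-grams that also occur in that daf's text (a containment score). The function names, the choice of n = 4, and the input format (a dict mapping a daf ref to its plain text) are all assumptions for illustration.

```python
def ngrams(words, n):
    """Set of consecutive word n-grams in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def best_match(passage, dapim, n=4):
    """Return (ref, score) for the daf whose text contains the largest share
    of the passage's word n-grams; (None, 0.0) if nothing overlaps.
    `dapim` maps a ref such as "Berakhot 3a" to that daf's plain text."""
    p = ngrams(passage.split(), n)
    if not p:
        return None, 0.0
    best_ref, best_score = None, 0.0
    for ref, text in dapim.items():
        score = len(p & ngrams(text.split(), n)) / len(p)
        if score > best_score:
            best_ref, best_score = ref, score
    return best_ref, best_score
```

Containment is used instead of symmetric overlap because an Ein Yaakov passage is usually much shorter than a full daf. In practice, textual variants between Ein Yaakov and the Vilna text would call for a minimum-score threshold, and passages that span several amudim would need their own handling.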

Shmuel

Cool, thanks for the reply! My guess is that if you were interested in publishing this in a journal (if you care about that kinda thing), you'd probably want to do the Ein Yaakov comparison and pull out a few discrepancies to demonstrate the strengths/weaknesses of either approach. But yeah, unfortunately Sefaria does not have anything that would allow for this. I thought it had "connections" in Ein Yaakov that link to the Talmudic source text, but I just checked now and actually most of those appear to be missing (and even for well-annotated texts, those "connections" are not 100% clean/accurate anyway, as I discovered when trying to determine which tractate is Shas Kattan). Sefaria does have Maharsha's chiddushei aggados with connections, and he comments on >80% of aggadah, so that's potentially useful, but it will undercount. Either way, I'm looking forward to when you can tell us (by any metric) what percentage of Talmud Bavli is halakha vs. aggadah!
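If one did try the Maharsha route, the core step is collapsing segment-level link anchors into amudim. This sketch assumes you already have a list of Talmud anchor refs for the chiddushei aggados links, as Sefaria-style strings like `Gittin 68a:3` or `Gittin 68a:5-7`; how those links are exported, and the helper names, are hypothetical.

```python
import re

# Tractate name, then an amud like "68a", then an optional ":segment" or
# ":start-end" suffix. Ranges crossing an amud boundary do not match.
AMUD_RE = re.compile(r"^(?P<tractate>.+?) (?P<daf>\d+[ab])(?::[\d-]+)?$")


def amudim_with_commentary(anchor_refs):
    """Collapse segment-level anchor refs into (tractate, amud) pairs."""
    found = set()
    for ref in anchor_refs:
        m = AMUD_RE.match(ref.strip())
        if m:
            found.add((m.group("tractate"), m.group("daf")))
    return found


def coverage(flagged, all_amudim):
    """Share of the given amudim that have at least one commentary anchor."""
    all_amudim = set(all_amudim)
    return len(flagged & all_amudim) / len(all_amudim) if all_amudim else 0.0
```

Since Maharsha doesn't comment on every aggadic passage, any share computed this way is best read as a lower bound, in line with the undercounting noted above.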

Ezra Brand

Responses:

> "My guess is that if you were interested in publishing this in a journal (if you care about that kinda thing) you'd probably want to do the Ein Yaakov comparison and pull out a few discrepancies to demonstrate the strengths/weaknesses of either approach."

Forget publishing in a journal; it's an interesting question in its own right! But it's really a separate study/analysis.

>"Sefaria does have Maharsha's chiddushei aggados with connections, and he comments on >80% of aggadah, so that's potentially useful, but will be undercounting."

Right. I don't think that would be a great proxy/indicator. As I mentioned in a footnote in this piece, I'm working on an additional, separate proxy/indicator: the ratio of bolding in Steinsaltz. That has also turned out to be more technically complex than I expected (due to issues relating to section numbers; long story).
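As a rough illustration of what a bolding ratio could look like: the sketch assumes the Steinsaltz text arrives as HTML in which the literal translation of the Talmud is wrapped in `<b>` (or `<strong>`) tags and the added explanation is not. It counts non-whitespace characters inside and outside bold tags using only Python's standard `html.parser`. The class and function names are hypothetical, and the sketch does nothing about the section-numbering alignment issues mentioned above.

```python
from html.parser import HTMLParser


class BoldRatio(HTMLParser):
    """Tally non-whitespace characters inside <b>/<strong> vs. all text."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth of open bold tags
        self.bold_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        n = len("".join(data.split()))  # count non-whitespace characters only
        self.total_chars += n
        if self.depth > 0:
            self.bold_chars += n


def bold_ratio(html):
    """Fraction of text characters that are bold; 0.0 for empty input."""
    parser = BoldRatio()
    parser.feed(html)
    parser.close()
    return parser.bold_chars / parser.total_chars if parser.total_chars else 0.0
```

Counting characters rather than words avoids depending on how the bold spans split words at their edges.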

Anyway, tomorrow's piece will expand on page word count as an indicator of aggadah, so stay tuned.

> "I'm looking forward to when you can tell us (by any metric) what percentage of Talmud Bavli is halakha vs. aggadah!"

Interesting point; I hadn't really thought about revisiting that question (I did mention it in my Ein Yaakov piece). But thinking about it now, I'm not sure it would work. Page word count is a reasonable proxy (in my opinion) for labeling whether a page contains aggadah. But scaling up from that to calculating what percentage of the Talmud is halacha vs. aggadah would, I think, be too loose to yield an accurate percentage in any sense. In that regard, I think the Ein Yaakov proxy is indeed a good one: Ein Yaakov's word count is roughly 1/4 to 1/3 of the total Talmud's, which is a reasonable ratio (and in line with what's generally known). Though again, I think that's an undercount.
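The gap between a page-level label and a percentage can be seen with a toy example. The numbers below are invented purely for illustration, not measured: if two of four pages each contain a short aggadic aside, page-level labels flag half the pages, while the aggadic share of words is under 5%.

```python
def word_share(aggadic_words, total_words):
    """Fraction of all words that are aggadic."""
    return aggadic_words / total_words


def page_share(pages):
    """Fraction of pages flagged as containing any aggadah.
    `pages` is a list of (aggadic_words, total_words) per page."""
    flagged = sum(1 for agg, _ in pages if agg > 0)
    return flagged / len(pages)


# Invented toy data: four pages of 500 words, two with a short aggadic aside.
pages = [(50, 500), (0, 500), (40, 500), (0, 500)]
print(page_share(pages))  # 0.5
print(word_share(sum(a for a, _ in pages), sum(t for _, t in pages)))  # 0.045
```

A binary page label can't distinguish a page with a two-line aside from a page that is entirely aggadic, which is why the word-level ratio (as with Ein Yaakov) is the better basis for a percentage.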

Long term, I'd love to do a full (programmatic) labeling (via regex, NLP, or LLM) of all sections of Talmud Bavli, for a much more robust segmentation of halacha vs. aggadah (as part of a broader indexing project). Then I think we'd have a much better ratio of halacha vs. aggadah. But that's a big project that I personally probably won't be working on in the foreseeable future.

Ezra Brand

To clarify a bit more here:

I use the term 'aggadah' in a broad, functional sense: essentially, anything in the Talmud that isn't halakhic argument or legal analysis falls under that label. That includes stories (of either biblical or rabbinic figures), folklore, ethical reflections, cosmological or theological claims, medical material, metaphysics, demonology, and various scientific observations (botanical, linguistic, etc.). I'm not treating 'aggadah' as a literary genre with fixed features (e.g. homiletics), but rather as a catch-all for the Talmud's non-legal discourse.
