Quirky Queries: Querying Wikidata with SPARQL for Hebrew Wikipedia Entries on Talmudic Biography
How I queried Wikidata using the SPARQL query language to find Wikidata IDs and related Hebrew Wikipedia entry names
Continuation of my previous post: “From Mishnah to Josephus: A New Digital Index of Jewish Personalities in Classical Antiquity” (October 29, 2023)
In this post, I’ll explain how I queried Wikidata using the SPARQL query language to find Wikidata IDs and related Hebrew Wikipedia entry names for this piece of mine: “Index of Online Entries on Personalities in the Mishnah, Talmud, Late Antique Midrash, and Josephus” (Academia.edu, registration required).
Wikidata IDs are helpful because they provide a single source of truth for a piece of data. This is especially important for Talmudic names, which come with many spelling variations, multiple people sharing the same name, and other challenges. (For more details, see my “From Abba to Zebedee: A Comprehensive Survey of Naming Conventions in the Mishnah, Talmud, and Late Antique Midrash” [Academia.edu, registration required], especially the sections “Given Names and Their Variations” (pp. 9-12), “Different People With the Same Name” (p. 16), and “Male Jewish names, by etymology” (pp. 63-71).)
Wikidata’s homepage explains:
“Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.”
The primary way to query Wikidata is with ‘SPARQL’. Wikipedia defines “SPARQL”:
“SPARQL is [...] a semantic query language for databases”
This post walks through the whole process, from initial exploration to refining and optimizing the queries. Data retrieval of this kind is usually iterative: with a clear objective, the right tools, and some patience, you can shape the data to fit your specific needs. Whether you're a researcher, student, or data enthusiast, SPARQL and Google Sheets' REGEX formulas are well worth learning.
The full queries I used are in the appendix at the end of this piece.
Details
The final query focuses on individuals whose occupation was specifically listed as "Tannaim" on Wikidata. This was achieved by altering the predicate to ‘wdt:P106’ (occupation) with the value ‘wd:Q975574’ (Tannaim).
The query not only identifies persons with the occupation "Tannaim", but also includes their respective Hebrew Wikipedia page titles and URLs.
An important consideration was preserving Hebrew characters when exporting SPARQL results. To do this, I:
Used the TSV download option from the Wikidata Query Service.
Imported the TSV into Google Sheets. This successfully retained Hebrew characters.
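The same rule applies if you process the downloaded TSV in a script instead of Google Sheets: read it explicitly as UTF-8. Below is a minimal Python sketch using only the standard library. The column names `item` and `page_titleHE` are hypothetical; the real headers depend on the variables in your query and on which TSV variant you download.

```python
import csv
import io
from typing import TextIO


def read_tsv(stream: TextIO) -> list[dict[str, str]]:
    """Parse tab-separated text into one dict per row, keyed by the header line."""
    # QUOTE_NONE keeps values such as "אונקלוס"@he exactly as written
    # instead of treating the double quotes as CSV quoting.
    return list(csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))


def read_tsv_file(path: str) -> list[dict[str, str]]:
    """Open a downloaded TSV explicitly as UTF-8 so Hebrew text is decoded correctly."""
    with open(path, encoding="utf-8", newline="") as f:
        return read_tsv(f)


# Hypothetical two-column sample standing in for a real export.
sample = "item\tpage_titleHE\nhttp://www.wikidata.org/entity/Q2533433\tאונקלוס\n"
for row in read_tsv(io.StringIO(sample)):
    print(row["item"], row["page_titleHE"])
```

For a real download, call `read_tsv_file(...)` with whatever path the file was saved under.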
To make the data easier to work with, I used a Google Sheets REGEX formula to extract specific parts of URLs. The formula:
=REGEXEXTRACT(A1, ".*/(.*)$")
obtains the unique Wikidata ID from a complete Wikidata URL, turning "https://www.wikidata.org/wiki/Q2533433" (the Wikidata URL for Onkelos, אונקלוס) into just "Q2533433".
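The same extraction can be done in Python with the standard `re` module and the identical pattern: the greedy `.*/` consumes everything up to the last slash, and the capture group keeps the rest. The sketch also adds a hypothetical `wikipedia_title` helper that goes one step beyond the formula, assuming the article URL is percent-encoded: it decodes the last segment back into a readable Hebrew title.

```python
import re
from typing import Optional
from urllib.parse import unquote


def last_path_segment(url: str) -> Optional[str]:
    """Mirror =REGEXEXTRACT(A1, ".*/(.*)$"): return the text after the last slash."""
    match = re.search(r".*/(.*)$", url)
    return match.group(1) if match else None


def wikipedia_title(url: str) -> Optional[str]:
    """Decode the last segment of a Wikipedia article URL into a readable title."""
    segment = last_path_segment(url)
    if segment is None:
        return None
    # Article URLs percent-encode non-ASCII text and use "_" for spaces.
    return unquote(segment).replace("_", " ")


print(last_path_segment("https://www.wikidata.org/wiki/Q2533433"))
```

Unlike REGEXEXTRACT, which shows an error when the pattern does not match, this version returns `None` for a string with no slash.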
Appendix - final code
By ‘title’ = ‘Tannaim’
SELECT DISTINCT ?item ?itemLabel ?page_titleHE ?itemDescription ?article
WHERE {
?item wdt:P31 wd:Q975574. # Instance of Tannaim
?item wikibase:sitelinks ?sitelinks. # Number of sitelinks, needed by the ORDER BY clause
?article schema:about ?item;
schema:isPartOf <https://he.wikipedia.org/>;
schema:name ?page_titleHE.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],he" }
}
ORDER BY DESC(?sitelinks)
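The queries in this appendix can also be run from a script instead of the Wikidata Query Service web interface. Here is a minimal Python sketch using only the standard library. It assumes the public endpoint `https://query.wikidata.org/sparql` and the standard SPARQL 1.1 JSON results format. The `USER_AGENT` value is a hypothetical placeholder: Wikimedia's User-Agent policy asks scripts to identify themselves, so replace it with your own project name and contact.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical placeholder; put your own project name and contact address here.
USER_AGENT = "TalmudicIndexSketch/0.1 (contact@example.org)"


def build_request(query: str) -> urllib.request.Request:
    """Prepare a GET request asking WDQS for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": USER_AGENT, "Accept": "application/sparql-results+json"},
    )


def parse_bindings(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain {variable: value} dicts.

    Variables left unbound in a row are simply absent from that row's dict.
    """
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query: str) -> list[dict[str, str]]:
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return parse_bindings(json.loads(response.read().decode("utf-8")))
```

Pasting the query above into a Python string and calling `run_query(query)` should return one dict per result, keyed by the selected variable names (`item`, `itemLabel`, `page_titleHE`, and so on), with Hebrew values already decoded.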
‘Part of’
To modify the code to use the "part of Tannaim (Q975574)" predicate instead of the "instance of Tannaim (Q975574)", you'll want to adjust the ‘wdt:P31 wd:Q975574’ portion of the query. The predicate for "part of" is ‘wdt:P361’.
By ‘Occupation’
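The predicate for "occupation" is ‘wdt:P106’. To select by occupation, replace ‘wdt:P31 wd:Q975574’ with ‘wdt:P106 wd:Q975574’.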
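Because the "instance of", "part of", and "occupation" variants differ only in the property used in the first triple, they can be generated from a single template. The Python sketch below does this; the `PREDICATES` mapping, `QUERY_TEMPLATE`, and `build_query` are hypothetical helpers, not part of any Wikidata tooling. The template follows the appendix query's pattern, including a `wikibase:sitelinks` binding so that the `ORDER BY` clause has a value to sort on.

```python
# Property IDs mentioned in this post.
PREDICATES = {
    "instance of": "P31",
    "part of": "P361",
    "occupation": "P106",
}

TANNAIM = "Q975574"

# Doubled braces are literal { } for str.format; {prop} and {value} are filled in.
QUERY_TEMPLATE = """SELECT DISTINCT ?item ?itemLabel ?page_titleHE ?itemDescription ?article
WHERE {{
  ?item wdt:{prop} wd:{value}.
  ?item wikibase:sitelinks ?sitelinks.
  ?article schema:about ?item;
           schema:isPartOf <https://he.wikipedia.org/>;
           schema:name ?page_titleHE.
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],he" }}
}}
ORDER BY DESC(?sitelinks)"""


def build_query(relation: str, value: str = TANNAIM) -> str:
    """Return the Hebrew Wikipedia query for items linked to `value` via `relation`."""
    return QUERY_TEMPLATE.format(prop=PREDICATES[relation], value=value)


print(build_query("occupation"))
```

The resulting string can be pasted into the Wikidata Query Service as-is.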
By ‘Date’
SELECT DISTINCT ?person ?page_titleHE ?article ?personLabel ?dateOfBirth ?personDescription WHERE {
  ?person wdt:P31 wd:Q5;        # Instance of a human
          wdt:P140 wd:Q9268;    # Religion: Judaism
          wdt:P106 wd:Q133485;  # Occupation: Rabbi
          wdt:P569 ?dateOfBirth. # Date of birth
  # Hebrew Wikipedia article and its entry name
  ?article schema:about ?person;
           schema:isPartOf <https://he.wikipedia.org/>;
           schema:name ?page_titleHE.
  # Filter for date of birth between 300 BCE and 501 CE (month and day must be valid, i.e. not 00)
  FILTER(?dateOfBirth >= "-0300-01-01T00:00:00Z"^^xsd:dateTime && ?dateOfBirth <= "0501-12-31T23:59:59Z"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],he". }
}
ORDER BY ?dateOfBirth
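Date-of-birth values come back as xsd:dateTime strings such as "-0050-01-01T00:00:00Z", including negative (BCE) years. Python's `datetime` cannot represent years before 1 (`datetime.MINYEAR` is 1), and sorting the raw strings alphabetically puts "-0050..." before "-0300...", which is chronologically backwards. A small helper that reads the signed year is therefore handy when post-processing an export. This is a sketch assuming the values keep that layout; it reads the year exactly as written and does not convert between astronomical numbering (where year 0 is 1 BCE) and historical BCE numbering, so check which convention your data uses.

```python
import re

# Optional minus sign, at least four digits, then the hyphen before the month.
_YEAR = re.compile(r"^(-?\d{4,})-")


def signed_year(xsd_datetime: str) -> int:
    """Return the signed year from a string like "-0050-01-01T00:00:00Z"."""
    match = _YEAR.match(xsd_datetime)
    if match is None:
        raise ValueError(f"not an xsd:dateTime string: {xsd_datetime!r}")
    return int(match.group(1))  # int() accepts leading zeros, e.g. "-0050" -> -50


def in_year_range(xsd_datetime: str, first: int, last: int) -> bool:
    """True if the value's year lies between `first` and `last`, inclusive."""
    return first <= signed_year(xsd_datetime) <= last


def sort_by_year(values: list[str]) -> list[str]:
    """Sort date strings chronologically by numeric year, not alphabetically."""
    return sorted(values, key=signed_year)
```

For example, `sort_by_year(["0070-01-01T00:00:00Z", "-0050-01-01T00:00:00Z", "-0300-01-01T00:00:00Z"])` puts the -0300 value first, whereas a plain `sorted()` would not.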