Language technology in the service of the humanities

Eetu Mäkelä, D.Sc.
Assistant Professor in Digital Humanities / University of Helsinki
Docent (Adjunct Professor) in Computer Science / Aalto University

What’s it about: Contextual reader CORE with Thea Lindquist, University of Colorado Boulder

A Contextual Reader for WWI Primary Sources

A Contextual Reader for Finnish Law

Needs from language technology
● (Language identification)
● NER against configurable vocabularies
  ○ Recall more important than precision
  ○ Need for strong identification of found entities
  ○ Limited domain alleviates need for disambiguation
● (Inflected form generation)
● (Synonym generation)
● Configurability!
● Evaluation: is interesting context highlighted, links relevant?
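A minimal sketch of what "NER against configurable vocabularies" could look like, assuming a plain dictionary lookup rather than the method the contextual reader actually uses. The vocabulary, the entity identifiers and the function names are all hypothetical. Matching is deliberately loose (case-insensitive, every hit kept) to favour recall, and each hit carries an identifier so it can be linked.

```python
import re

def build_matcher(vocabulary):
    """Compile one case-insensitive pattern from {surface form: entity id}."""
    # Longest forms first, so a multi-word entry wins over a shorter one inside it.
    forms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {form.lower(): entity_id for form, entity_id in vocabulary.items()}
    return pattern, lookup

def find_entities(text, pattern, lookup):
    """Return (start, end, surface form, entity id) for every vocabulary hit."""
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

# Hypothetical vocabulary; in practice this would be loaded from configuration.
vocabulary = {
    "Verdun": "place:verdun",
    "Battle of Verdun": "event:battle_of_verdun",
    "Somme": "place:somme",
}
pattern, lookup = build_matcher(vocabulary)
hits = find_entities(
    "After the battle of Verdun the regiment moved to the Somme.", pattern, lookup
)
for start, end, surface, entity_id in hits:
    print(start, end, surface, entity_id)
```

Swapping the vocabulary dictionary is the only change needed to point the same matcher at another domain, which is the kind of configurability the slide asks for. Inflected forms and synonyms would have to be added to the vocabulary as extra surface forms.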

A Contextual Reader for Finnish WWII Magazines

Place disambiguation
● Is one of the candidates in the focus area?
● Which is the largest place?
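The two questions above can be read as an ordered heuristic. The sketch below is one possible reading, under assumptions the slide does not state: the focus area is a bounding box, "largest" is measured by population, and the `Place` record and all values are made up.

```python
from dataclasses import dataclass

@dataclass
class Place:
    name: str
    lat: float
    lon: float
    population: int  # assumed proxy for "largest place"

def in_focus_area(place, bbox):
    """bbox is (south, west, north, east) in decimal degrees."""
    south, west, north, east = bbox
    return south <= place.lat <= north and west <= place.lon <= east

def disambiguate_place(candidates, bbox):
    """Prefer candidates inside the focus area, then pick the largest one."""
    focused = [p for p in candidates if in_focus_area(p, bbox)]
    pool = focused or candidates  # fall back to all candidates if none is in focus
    return max(pool, key=lambda p: p.population)

# Made-up candidates and focus area, for illustration only.
candidates = [
    Place("Example village (A)", 61.0, 29.0, 1200),
    Place("Example village (B)", 65.0, 25.0, 30000),
]
focus = (60.0, 27.0, 63.0, 32.0)
chosen = disambiguate_place(candidates, focus)
fallback = disambiguate_place(candidates, (0.0, 0.0, 1.0, 1.0))
print(chosen.name, "|", fallback.name)
```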

Person disambiguation: Karl or Klaus Oesch?
● K. Oesch → Karl
● K. Oesch after 1.10.1944 → Klaus
● Privates Talvela, Oesch and Walden & after 8.7.1942 → Klaus
● K. Oesch when 2nd platoon mentioned → Klaus
● Mannerheim cross knight private Oesch → Klaus
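The rules above amount to a default (Karl) with a handful of overrides (Klaus). A rough sketch of how such rules might be encoded follows. It assumes the dates are in day.month.year order, uses simplified English substring checks in place of whatever matching the real Finnish-language pipeline performs, and the function name is hypothetical.

```python
from datetime import date

def disambiguate_oesch(text, published):
    """Return 'Karl Oesch', 'Klaus Oesch', or None if Oesch is not mentioned."""
    lowered = text.lower()
    if "oesch" not in lowered:
        return None
    # Override rules, checked before falling back to the default reading.
    if "k. oesch" in lowered and published > date(1944, 10, 1):
        return "Klaus Oesch"
    if "privates talvela, oesch and walden" in lowered and published > date(1942, 7, 8):
        return "Klaus Oesch"
    if "k. oesch" in lowered and "2nd platoon" in lowered:
        return "Klaus Oesch"
    if "mannerheim cross" in lowered and "private oesch" in lowered:
        return "Klaus Oesch"
    return "Karl Oesch"  # default reading

print(disambiguate_oesch("General K. Oesch visited the front.", date(1941, 8, 1)))
print(disambiguate_oesch("K. Oesch was promoted.", date(1944, 11, 1)))
```

Keeping the rules as an ordered list of small checks makes each one easy to add, remove or tweak per collection.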

Needs from language technology
● (Language identification)
● NER against configurable vocabularies
  ○ Recall more important than precision
  ○ Need for strong identification of found entities
  ○ Need for disambiguation depends on scope and domain
● (Inflected form generation)
● (Synonym generation)
● Configurability! HACK HACK HACK TWEAK
● Evaluation: is interesting context highlighted, links relevant?

Current projects 1: Sociohistorical Language Change with Tanja Säily, University of Helsinki

Case study: derivational productivity of -er and -or
● Verb + suffixes -er and -or: driver, governor, filler
● Corpora of Early English Correspondence: spelling variation, false positives
  ○ er(e), ar(e), or(e), our(e), owr(e), ur(e), r + plural, possessive…
  ○ \S*(([rR]|[eEoO]~)(=?|=?[eE]=?|[='~]*[eEiIyY]?[='~]*[sSzZ][=']*))(?![a-zA-Z'~=+])
  ○ 6800 candidate words, 400 000 appearances
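To show how a pattern like the one above behaves, here is a small sketch that runs it with Python's `re` module. It assumes the final group is meant as a negative lookahead, `(?![a-zA-Z'~=+])`, and that `=`, `'` and `~` are corpus markup characters. The sample sentence is invented, not taken from the corpus.

```python
import re
from collections import Counter

# The slide's pattern, read with a negative lookahead as its last group (assumption).
CANDIDATE = re.compile(
    r"\S*(([rR]|[eEoO]~)(=?|=?[eE]=?|[='~]*[eEiIyY]?[='~]*[sSzZ][=']*))(?![a-zA-Z'~=+])"
)

def candidate_words(text):
    """Return every token that ends in an -er/-or-like spelling, false positives included."""
    return [m.group(0) for m in CANDIDATE.finditer(text)]

# Invented sample; "Your" and "brother" illustrate the false positives the slide mentions.
sample = "Your lovinge brother and the governours sent a drivere"
words = candidate_words(sample)
counts = Counter(w.lower() for w in words)
print(words)
```

The pattern is recall-oriented: it keeps anything that could be a suffixed form, which is why most of the candidates have to be discarded by hand afterwards.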

FiCa

Needs from language technology
● Support for manual perusal of results, context
● Support for grouping spelling variants
● 5080 words out of 6800 irrelevant after manual study
● 153 words out of 6800 needed further study
  ○ 11768 individual uses
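One crude way to support grouping spelling variants is to map each form to a normalisation key and collect forms that share it. The sketch below is only an illustration, not what FiCa does: the key function, its regular expressions and the example forms are assumptions, and it does not handle variation inside the stem (for example u/v or i/y).

```python
import re
from collections import defaultdict

MARKUP = re.compile(r"[='~]")                      # assumed corpus markup characters
PLURAL = re.compile(r"[sz]$")                      # trailing plural or possessive s/z
AGENT_SUFFIX = re.compile(r"(?:ou|ow|[aeou])re?$") # er(e), ar(e), or(e), our(e), owr(e), ur(e)

def variant_key(word):
    """Collapse suffix spelling variation into a shared key such as 'govern-R'."""
    w = MARKUP.sub("", word.lower())
    w = PLURAL.sub("", w)
    return AGENT_SUFFIX.sub("-R", w)

def group_variants(words):
    """Group word forms by their normalisation key."""
    groups = defaultdict(set)
    for w in words:
        groups[variant_key(w)].add(w)
    return dict(groups)

groups = group_variants(
    ["governor", "governour", "Governours", "governores", "governor's", "driver"]
)
for key, forms in sorted(groups.items()):
    print(key, sorted(forms))
```

Groups produced this way are a starting point for manual review rather than a final answer, since unrelated words can share a key.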

Case study: newly coined words
• Compare Corpus of Early English Correspondence words to:
  • The millions of words in Eighteenth Century Collections Online, Early English Books Online, British Library Newspapers, Burney Collection, Nichols Collection
  • Structured information in the Oxford English Dictionary
• Needs from language technology:
  • Handling spelling variation
  • Handling OCR errors
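As a hedged illustration of tolerating spelling variation and OCR errors when comparing against a reference word list, the sketch below uses approximate string matching from Python's standard `difflib` module. The headword list, the cutoff and the function name are assumptions; a real comparison against the collections named above would need something that scales far better.

```python
import difflib

def closest_headwords(form, headwords, cutoff=0.7):
    """Return up to three headwords whose similarity to `form` is at least `cutoff`."""
    return difflib.get_close_matches(form.lower(), headwords, n=3, cutoff=cutoff)

# Hypothetical reference list standing in for dictionary headwords.
headwords = ["governor", "government", "gardener"]
matches = closest_headwords("gouernour", headwords)
print(matches)
```

Raising the cutoff trades recall for precision, so it is a natural parameter to expose when the same comparison is run on noisier OCR text.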

Current projects 2: Analysing public communication in 18th century Britain and 19th century Finland with the COMHIS collective, University of Helsinki

Linguistic fingerprint, either of works or of a word's neighbourhood

Temporal/geographical perspective

Close reading

Robust tracking of particular discourses through term vectors

Charting conceptual distance: human nature vs benevolence
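To make "term vectors" and "conceptual distance" concrete, here is a toy sketch that builds co-occurrence vectors from a context window and compares them with cosine distance. This is a generic technique chosen for illustration, not necessarily the one used in the project. The token list is invented, and multi-word terms are assumed to be pre-joined into single tokens such as `human_nature`.

```python
import math
from collections import Counter

def term_vector(tokens, term, window=2):
    """Count the words occurring within `window` positions of each use of `term`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            vec.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction, 1 means no shared context."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

# Invented toy text, for illustration only.
tokens = "human_nature is prone to frailty and benevolence is prone to virtue".split()
v_nature = term_vector(tokens, "human_nature")
v_benevolence = term_vector(tokens, "benevolence")
distance = cosine_distance(v_nature, v_benevolence)
print(round(distance, 3))
```

Computing the same vectors separately per decade or per subcorpus would give the temporal and comparative views the earlier slides describe.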

Subcorpora formulation
“Human nature” → “Human nature” or “frailty” or “misanthropy” or “pravity” in
1) books between 100 and 200 pages long that do not have the word “sermon” or “christ” in their PREFACE or TITLE and that also contain the word “philosophy”, against
2) books that DO have the word “christ” or “sermon” in their PREFACE or TITLE
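The formulation above can be expressed as two predicates over book metadata plus an expanded query. The sketch below assumes a hypothetical `Book` record with title, preface, body and page count, uses plain substring tests (so "christ" also matches "Christian"), and assumes "philosophy" is looked for in the body. The example books are invented.

```python
from dataclasses import dataclass

QUERY_TERMS = ("human nature", "frailty", "misanthropy", "pravity")
RELIGIOUS = ("sermon", "christ")

@dataclass
class Book:
    title: str
    preface: str
    body: str
    pages: int

def mentions(text, terms):
    """True if any of the terms occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(term in lowered for term in terms)

def religious_front_matter(book):
    return mentions(book.title, RELIGIOUS) or mentions(book.preface, RELIGIOUS)

def in_subcorpus_1(book):
    """100 to 200 pages, no sermon/christ in title or preface, mentions philosophy."""
    return (100 <= book.pages <= 200
            and not religious_front_matter(book)
            and mentions(book.body, ("philosophy",)))

def in_subcorpus_2(book):
    """Books that DO have sermon/christ in their title or preface."""
    return religious_front_matter(book)

def hits(books, predicate):
    """Titles in the subcorpus that contain any of the expanded query terms."""
    return [b.title for b in books if predicate(b) and mentions(b.body, QUERY_TERMS)]

# Invented books, for illustration only.
books = [
    Book("An Essay on Morals", "To the reader", "A work of philosophy on human nature.", 150),
    Book("A Sermon on Charity", "Preached in London", "On the frailty of man.", 40),
    Book("A Long Treatise", "Preface", "Philosophy and misanthropy.", 400),
]
print(hits(books, in_subcorpus_1), hits(books, in_subcorpus_2))
```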

Needs from language technology
● Support for fluid term vector operations in terms of subcorpora, parameters
● OCR error handling

Future project?: Analysing newspaper fiction literature genres in 19th century Finland
Academy proposal with people from the Digital Humanities Hackathon 2017

Objectives
• Literary scholarship: discover and analyze the world of fiction published on the pages of newspapers in Finland in the 19th century
• Needs from language technology:
  • Robust genre dissection, summarization and comparisons
    ‒ fact/fiction/poetry/drama → ads, patriotic poetry, religious texts, socialist realism, experimental fiction, …

[email protected]
http://j.mp/s-makela
This presentation: http://j.mp/lt-hums
