Ancient Greek Dependency Treebanks
Federico Boschetti, University of Pavia
[email protected]
Erfurt, 12 October 2011
Available Treebanks Data Entry Retrieval Analysis Converters
Overview
1. Available Treebanks
2. Data Entry
3. Retrieval
4. Analysis
5. Converters
Available Treebanks for Ancient Greek
Perseus Project http://www.perseus.tufts.edu/hopper
Pragmatic Resources in Old Indo-European Languages (PROIEL) http://www.hf.uio.no/ifikk/english/research/projects/proiel
Herodotus Dependency Treebank (HdtDep) http://www.crs.rm.it/hdtdep/demo.asp
Perseus
http://nlp.perseus.tufts.edu/syntax/treebank/greek.html
Epic Poetry
Homer: Iliad (128,102 words), Odyssey (104,467)
Hesiod: Shield of Heracles (3,834), Theogony (8,106), Works and Days (6,941)
Perseus
Tragedy
Aeschylus: Agamemnon (9,806), Eumenides (6,380), Libation Bearers (6,563), Persians (6,223), Prometheus Bound (7,045), Seven Against Thebes (6,206), Suppliants (5,949)
Sophocles: Ajax (9,474)
Total: 309,096 words
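The per-work figures listed for the Perseus treebank can be checked against the stated total with a few lines of Python. This is only a sketch: the counts are copied from the slides, while the dictionary layout and the function names are illustrative assumptions.

```python
# Word counts as listed on the Perseus slides (epic poetry and tragedy).
WORD_COUNTS = {
    "Homer": {"Iliad": 128_102, "Odyssey": 104_467},
    "Hesiod": {
        "Shield of Heracles": 3_834,
        "Theogony": 8_106,
        "Works and Days": 6_941,
    },
    "Aeschylus": {
        "Agamemnon": 9_806,
        "Eumenides": 6_380,
        "Libation Bearers": 6_563,
        "Persians": 6_223,
        "Prometheus Bound": 7_045,
        "Seven Against Thebes": 6_206,
        "Suppliants": 5_949,
    },
    "Sophocles": {"Ajax": 9_474},
}


def words_per_author(counts):
    """Sum the word counts of all works for each author."""
    return {author: sum(works.values()) for author, works in counts.items()}


def total_words(counts):
    """Sum the word counts over every work of every author."""
    return sum(words_per_author(counts).values())


if __name__ == "__main__":
    for author, n in words_per_author(WORD_COUNTS).items():
        print(f"{author}: {n:,}")
    print(f"Total: {total_words(WORD_COUNTS):,}")  # Total: 309,096
```

The sum agrees with the total of 309,096 words given on the slide.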
Perseus – Format sample
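The sample shown on the original slide is not reproduced here. As a rough illustration, the sketch below reads a small XML fragment shaped like the Perseus (AGDT) files, where each word carries `id`, `form`, `lemma`, `postag`, `head` and `relation` attributes. The fragment itself, including its attribute values and the `document_id`, is a hypothetical example written for this sketch and is not copied from the treebank.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of the AGDT XML; the values are
# illustrative only and are not taken from the published treebank.
SAMPLE = """<treebank>
  <sentence id="1" document_id="hypothetical-doc" subdoc="1.1">
    <word id="1" form="μῆνιν" lemma="μῆνις" postag="n-s---fa-" head="2" relation="OBJ"/>
    <word id="2" form="ἄειδε" lemma="ἀείδω" postag="v2spma---" head="0" relation="PRED"/>
    <word id="3" form="θεά" lemma="θεά" postag="n-s---fv-" head="2" relation="ExD"/>
  </sentence>
</treebank>"""


def read_edges(sentence):
    """Return (dependent form, relation, head form) for each word.

    A head value that matches no word id (here "0") is reported as ROOT.
    """
    words = {w.get("id"): w for w in sentence.iter("word")}
    edges = []
    for word in words.values():
        head = words.get(word.get("head"))
        head_form = head.get("form") if head is not None else "ROOT"
        edges.append((word.get("form"), word.get("relation"), head_form))
    return edges


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for sentence in root.iter("sentence"):
        for dependent, relation, head in read_edges(sentence):
            print(f"{dependent} --{relation}--> {head}")
```

Each word points to its head by id, so the dependency tree can be rebuilt from a flat list of words.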
Perseus – Guidelines
http://nlp.perseus.tufts.edu/syntax/treebank/agdt/1.2/docs/guidelines.pdf
“These guidelines are based on those developed for the annotation of Latin syntax in collaboration with the Index Thomisticus”
Specific constructions:
Ellipsis
Accusative + Infinitive
Relative Clauses
Tmesis
Particles
The Genitive and Accusative Absolute
Direct Speech
Direct Address
PROIEL
http://www.hf.uio.no/ifikk/english/research/projects/proiel
Download and/or browse: http://foni.uio.no:3000 (Rails server)
Fully annotated: New Testament
Partially annotated: Herodotus’ Histories (first book+)
Not yet annotated: Historia Lausiaca
PROIEL
Declared objectives:
word order
discourse particles
pronominal reference and the use of null pronouns
expressions of definiteness
the use of participles to refer to background events
PROIEL – Format sample
ἀμὴν λέγω ὑμῖν ὅτι πολλοὶ προφῆται καὶ δίκαιοι ἐπεθύμησαν ἰδεῖν ἃ βλέπετε καὶ οὐκ ἴδαν καὶ ἀκοῦσαι ἃ ἀκούετε καὶ οὐκ ἤκουσαν
Mt. 13.17
PROIEL – Guidelines
http://folk.uio.no/daghaug/syntactic_guidelines.pdf
Some differences from the IT-TB guidelines:
all parenthetical sentences depend on the sentence root
the predicates of subordinate clauses are annotated with PRED when headed by a subjunction
OBL is used for oblique OBJ
conjunctions and prepositions are not annotated with AuxC/AuxP
complement clauses are not distinguished as SBJ or OBJ (conversion is not trivial)
PROIEL uses slashes and does not use Co (issues in attaching a SBJ shared by two PREDs)
...
(Courtesy of M. Passarotti)
HdtDep
http://www.crs.rm.it/hdtdep/browse.asp
Currently only the first book of Herodotus is available.
“HdtDep only encodes the information that seemed strictly necessary for word order studies (neither morphological data nor syntactic relationship types are encoded), but provides a powerful and user-friendly search engine, which allows searching for precise dependency patterns involving specific grammatical categories or lexemes in exact sequences through a graphic interface.” (http://www.digitalclassicist.org/wip/wip2011-04av.html)
Data Entry
Perseus / Alpheios
[Screenshots: Od. 15.28-30; Propertius 1.8]
PROIEL
Web interface
http://foni.uio.no:3000
TrEd
[Screenshots: wrong dependency; correct dependency; edit features]
Retrieval
Annis
http://www.sfb632.uni-potsdam.de/d1/annis
(from the Homepage)
Access from Perseus: http://annis.perseus.tufts.edu
Od. 15.28-30
Query: form-beta="*sa/moio/"
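The query value is written in Beta Code, the ASCII transliteration of Greek in which `*` marks a capital letter and `/` an acute accent. The sketch below decodes a small subset of Beta Code (letters, breathings, accents, iota subscript, diaeresis, final sigma) to Unicode, so that a query such as the one above can be read back as Greek. The function name `beta_to_unicode` and the tables are assumptions made for this sketch; it is not the converter used by Annis or Perseus.

```python
import re
import unicodedata

# Beta Code letters and their Greek equivalents, in matching order.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritics mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(beta):
    """Convert a Beta Code string (subset only) to precomposed Unicode."""
    out = []
    capital = False   # set by "*", cleared once the letter has been emitted
    pending = []      # diacritics written between "*" and the capital letter
    for ch in beta.lower():
        if ch == "*":
            capital = True
        elif ch in DIACRITICS:
            (pending if capital else out).append(DIACRITICS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch]
            out.append(letter.upper() if capital else letter)
            out.extend(pending)
            capital, pending = False, []
        else:
            out.append(ch)
    text = unicodedata.normalize("NFC", "".join(out))
    # A sigma that is not followed by another word character is word-final.
    return re.sub(r"σ(?!\w)", "ς", text)


if __name__ == "__main__":
    print(beta_to_unicode("*sa/moio/"))
```

With the query value above, the decoded form is the genitive Σάμοιό, with the second accent that the word carries before an enclitic.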
PROIEL
http://foni.uio.no:3000
Morphological analysis
HdtDep Demo
TrEd – Query
PML Tree Query Guide: http://ufal.mff.cuni.cz/~pajas/pmltq/doc/pmltq_doc.html
TrEd – Analytical level
TrEd – Tectogrammatical level
Analysis
PDT Valency Lexicon
Prague Dependency Treebank Valency Lexicon http://ufal.mff.cuni.cz/pdt2.0/visual-data/pdt-vallex/vallex.html
IT-VaLex
Index Thomisticus Valency Lexicon http://itreebank.marginalia.it/itvalex
Subcategorization frames and classes
“The SCFs record the linear order of the arguments in the sentence whereas SCCs do not. The linguistic reason for this choice is that these structures respond to different needs and can be used for different analyses”
B. McGillivray, 2010. A computational approach to Latin verbs: new resources and methods, Ph.D. Thesis, p. 68
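The distinction between subcategorization frames (SCFs) and subcategorization classes (SCCs) can be illustrated with a small sketch: the same observations are counted once with the linear order preserved and once with the order discarded. The observations, the argument labels and the function names below are hypothetical and do not reproduce the representation used in the thesis.

```python
from collections import Counter

# Hypothetical observations for one verb: the arguments in linear order,
# with "V" marking the position of the verb itself.
OBSERVATIONS = [
    ("SBJ[nom]", "V", "OBJ[acc]"),
    ("OBJ[acc]", "SBJ[nom]", "V"),
    ("SBJ[nom]", "V", "OBJ[acc]"),
]


def scf(sequence):
    """Frame: keeps the linear order of the arguments and of the verb."""
    return tuple(sequence)


def scc(sequence):
    """Class: keeps only which arguments occur, regardless of order."""
    return tuple(sorted(arg for arg in sequence if arg != "V"))


scf_counts = Counter(scf(obs) for obs in OBSERVATIONS)
scc_counts = Counter(scc(obs) for obs in OBSERVATIONS)

if __name__ == "__main__":
    print("SCFs:", dict(scf_counts))  # two distinct frames
    print("SCCs:", dict(scc_counts))  # one class covering all three
```

The three observations yield two frames but a single class, which is why frames suit word order studies while classes suit valency counts.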
Analysis Tools
Many approaches are possible:
btred: the non-interactive, scriptable version of TrEd
parsing the PML files with the XML parsers built into general-purpose programming languages (e.g. Java) and scripting languages (e.g. Python)
scripts written in R, the scripting language oriented to statistical analysis
R embedded in other languages (e.g. Java), in order to provide statistics and dynamic graphs online
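As an illustration of the second approach, the sketch below uses the XML parser in the Python standard library to count analytical function labels in a PML-like document. The sample is a simplified, hypothetical fragment: the `afun` element and the namespace are modelled on the PDT 2.0 analytical layer and should be checked against the actual files before use.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, hypothetical fragment modelled on a PML analytical-layer file.
SAMPLE_PML = """<adata xmlns="http://ufal.mff.cuni.cz/pdt/pml/">
  <trees>
    <LM id="a-s1">
      <children>
        <LM id="a-s1w2"><afun>Pred</afun><ord>2</ord>
          <children>
            <LM id="a-s1w1"><afun>Obj</afun><ord>1</ord></LM>
            <LM id="a-s1w3"><afun>ExD</afun><ord>3</ord></LM>
          </children>
        </LM>
      </children>
    </LM>
  </trees>
</adata>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_afun(xml_text):
    """Count how often each analytical function label occurs."""
    root = ET.fromstring(xml_text)
    return Counter(
        element.text.strip()
        for element in root.iter()
        if local_name(element.tag) == "afun" and element.text
    )


if __name__ == "__main__":
    for label, n in count_afun(SAMPLE_PML).most_common():
        print(label, n)
```

Matching on the local tag name keeps the function independent of the exact namespace declared in a given file.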
Converters
Perseus to PROIEL
J. Lee, D. Haug, 2010. Porting an Ancient Greek and Latin Treebank, LREC http://www.lrec-conf.org/proceedings/lrec2010/pdf/631_Paper.pdf
The conversion is non-trivial but highly automated, although adjustments and additions are still required (e.g. to represent pro-drop).
PROIEL to Perseus
This converter is not ready yet. The conversion appears to be simpler, but some cases need manual intervention (as pointed out by Passarotti in his list of observations).
PROIEL to PML (PDT 2.0 conformant)
According to the developers, the chain PROIEL ⇒ Perseus ⇒ PML is a viable solution. In addition, the information belonging to the tectogrammatical level should be extracted and converted separately to PML (.t level).
Perseus to PML (PDT 2.0 conformant)
The AGDT (Perseus) format has been converted to PML by the Alpheios team.
Courtesy of F. Mambrini
Thank you for your attention