Data-driven Approaches to Ancient Languages
Premodern or historically attested languages are invaluable resources for the study of both diachronic linguistics and the cultures of their time. Although these languages may belong to different language families or use different scripts, researchers face common challenges. These include illegible or lost passages, the absence of gold standards and, increasingly important today, a scarcity of data. Fortunately, more and more texts are becoming available. However, the language of those texts may differ so much from its modern counterpart (should one exist) that the performance of existing tools suffers considerably. This workshop aims to provide a platform for a broad range of researchers engaged in digital approaches to premodern languages.
Practical information:
Registration
- Prerequisites: basic knowledge of Digital Humanities
- Price: €15-€45
Theme
This workshop invites contributions centered on digital approaches to ancient languages. Its focal point is the presence or absence of data for historically attested languages. The current landscape of computational linguistics is dominated by large language models, such as the generative GPT model underlying ChatGPT, which are trained on vast amounts of data. Despite their apparent infallibility, these systems do face challenges. In particular, their performance depends on the availability of substantial training data, a condition that historically attested languages rarely meet. Notably, our recent findings show that more traditional machine learning approaches yield better results than contemporary transformer-based techniques when processing languages such as Medieval Greek. This underscores the importance of linguistic expertise in computational approaches to lower-resourced languages. Profound knowledge of the corpus at hand is imperative, which makes knowledge exchange among researchers of such languages all the more necessary.
With the DAAL workshop, we aim to bring together a broad range of researchers engaged in digital approaches to premodern languages. In doing so, we hope to provide a forum that further advances the thriving research domain of NLP for ancient languages, where researchers and practitioners can meet and discuss their latest work. We also hope to foster discussion and the exchange of ideas on the language-processing challenges shared across ancient languages.
Programme
Morning
09.00 Welcome coffee
09.30 Welcome
09.45 "Nescio Carneades iste qui fuerit": Evaluation of Knowledge Bases for Named Entity Linking for Latin Texts
Evelien De Graaf & Margherita Fantoli (KU Leuven)
10.05 Evaluating Generative LLMs for Named Entity Recognition in Literary-Historical Texts
Tess Dejaeghere, Julie Birkholz, Els Lefever & Pranaydeep Singh (Universiteit Gent)
10.25 Automatic Generation of Greek Word Forms: a Corpus-Based Approach
Alek Keersmaekers (KU Leuven)
10.45 Coffee break
11.05 Decoding Byzantine Book Epigrams: an Exploration of Machine-Assisted Extraction of Formulaic Material
Kyriaki Giannikou, Colin Swaelens, Els Lefever & Klaas Bentein (Universiteit Gent) & Ilse De Vos (VAIA)
11.25 Hybrid Approach to Orthographic Similarity in Graph Databases
Colin Swaelens, Maxime Deforche, Guy De Tré & Els Lefever (Universiteit Gent) & Ilse De Vos (VAIA)
11.45 Panel discussion: Technology in Philology
Afternoon
12.30 Lunch break
13.30 Keynote: Modelling Latin Semantics with Computational Methods
Barbara McGillivray (King's College London)
14.30 Viability of Automatic Lexical Semantic Change Detection on a Diachronic Corpus of Literary Ancient Greek
Silvia Stopponi, Saskia Peels-Matthey & Malvina Nissim (Rijksuniversiteit Groningen)
14.50 Unsupervised Authorship Attribution for Medieval Latin using Transformer-based Embeddings
Loic De Langhe, Orphée De Clercq & Veronique Hoste (Universiteit Gent)
15.10 Coffee break
15.30 NLP Pipelines for Classical Armenian
Lilit Kharatyan & Petr Kocharov (Universität Würzburg)
15.50 Early Modern Dutch Comedies and Farces in the Spotlight: Introducing EmDComF and its Emotion Framework
Florian Debaene, Kornee van der Haven & Veronique Hoste (Universiteit Gent)
16.10 Syncing Syntax: Building a Word Alignment Corpus through Morphological, Lemmatic and Syntactic Annotations
Wouter Mercelis & Toon Van Hal (KU Leuven)
16.30 Concluding remarks
16.45 Closing reception