Post-conference workshop

Data-driven Approaches to Ancient Languages

27 jun 2024 09:00 - 17:00

Premodern or historically attested languages are invaluable resources for the study of both diachronic linguistics and the cultures in which they were used. Although these languages may belong to different language families or use different scripts, researchers face common challenges, among them illegible or lost text (or parts of it), nonexistent gold standards and, very important these days, scarcity of data. Fortunately, more and more texts are becoming available, but the language of those texts can differ so much from its modern counterpart, if one exists at all, that it considerably degrades the performance of existing tools. This workshop aims to provide a platform for a broad field of researchers engaged in digital approaches to pre-modern languages.


Practical information:

27 jun 2024 09:00 - 17:00
Mercator A104 - Abdisstraat 1, 9000 Gent
Language: English
Target audience: researchers engaged in digital approaches to pre-modern languages

Register?

  • Prerequisites: basic knowledge within the field of Digital Humanities
  • Price: €15-€45

Organised by:

Theme

This workshop invites contributions centered around digital approaches to ancient languages. The focal point of the workshop is the presence or absence of data for historically attested languages. The current landscape of computational linguistics is dominated by large language models, such as the generative GPT model underlying ChatGPT, which is trained on an inconceivable amount of data. Despite their apparent infallibility, these systems do face challenges. In particular, their performance is contingent on the availability of substantial training data, a condition not typically met by historically attested languages. Notably, our recent findings highlight that more traditional machine learning approaches yield superior outcomes compared to contemporary transformer-based techniques, particularly when processing languages such as Medieval Greek. This underscores the importance of linguistic expertise in computational approaches to lower-resourced languages. Profound knowledge of the corpus at hand is imperative, which makes it all the more necessary for researchers of such lower-resourced languages to engage in knowledge exchange.

With the DAAL workshop, we aim to bring together a broad field of researchers engaged in digital approaches to pre-modern languages. In this way, we hope to provide a forum where researchers and practitioners can meet and discuss their latest work, and so further advance the thriving research domain of NLP for ancient languages. We hope to foster discussion and the exchange of ideas on shared challenges in language processing across various ancient languages.

Programme

Morning

09.00 Welcome coffee

09.30 Welcome

09.45 "Nescio Carneades iste qui fuerit": Evaluation of Knowledge Bases for Named Entity Linking for Latin Texts

Evelien De Graaf & Margherita Fantoli (KU Leuven)

10.05 Evaluating Generative LLMs for Named Entity Recognition in Literary-Historical Texts

Tess Dejaeghere, Julie Birkholz, Els Lefever & Pranaydeep Singh (Universiteit Gent)

10.25 Automatic Generation of Greek Word Forms: a Corpus-Based Approach

Alek Keersmaekers (KU Leuven)

10.45 Coffee break

11.05 Decoding Byzantine Book Epigrams: an Exploration of Machine-Assisted Extraction of Formulaic Material

Kyriaki Giannikou, Colin Swaelens, Els Lefever & Klaas Bentein (Universiteit Gent) & Ilse De Vos (VAIA)

11.25 Hybrid Approach to Orthographic Similarity in Graph Databases

Colin Swaelens, Maxime Deforche, Guy De Tré & Els Lefever (Universiteit Gent) & Ilse De Vos (VAIA)

11.45 Panel discussion: Technology in Philology

Afternoon

12.30 Lunch break

13.30 Keynote: Modelling Latin Semantics with Computational Methods

Barbara McGillivray (King's College London)

14.30 Viability of Automatic Lexical Semantic Change Detection on a Diachronic Corpus of Literary Ancient Greek

Silvia Stopponi, Saskia Peels-Matthey & Malvina Nissim (Rijksuniversiteit Groningen)

14.50 Unsupervised Authorship Attribution for Medieval Latin using Transformer-based Embeddings

Loic De Langhe, Orphée De Clercq & Veronique Hoste (Universiteit Gent)

15.10 Coffee break

15.30 NLP Pipelines for Classical Armenian

Lillit Kharatyan & Petr Kocharov (Universität Würzburg)

15.50 Early Modern Dutch Comedies and Farces in the Spotlight: Introducing EmDComF and its Emotion Framework

Florian Debaene, Kornee van der Haven & Veronique Hoste (Universiteit Gent)

16.10 Syncing Syntax: Building a Word Alignment Corpus through Morphological, Lemmatic and Syntactic Annotations

Wouter Mercelis & Toon Van Hal (KU Leuven)

16.30 Concluding remarks

16.45 Closing reception
