A new biennial conference series on Language, Data and Knowledge (LDK) takes place at the Insight Centre for Data Analytics at NUI Galway today, organised in collaboration with Goethe University Frankfurt and Leipzig University.

Have you ever thought about the fact that language is data? Everything you write on social media, your emails, the contents of digital archives, and company files and records all contain a wealth of data. However, language is complicated, and the quirks and inconsistencies in how we use it can prove challenging for computers. Yet understanding and interpreting language as humans use it is becoming a pivotal part of both machine learning and artificial intelligence.

LDK brings together researchers from across disciplines concerned with the acquisition, curation and use of language data in the context of data science and knowledge-based applications. With the advent of the web and digital technologies, an ever-increasing amount of language data is now available across application areas and industry sectors, including social media, digital archives, company records and so on. The efficient and meaningful exploitation of this data in scientific and commercial innovation is at the core of data science research, which employs Natural Language Processing (NLP) and machine learning methods as well as semantic technologies based on knowledge graphs.

Conference organiser Dr Paul Buitelaar said, “Language data is of increasing importance to machine learning-based approaches in NLP, Linked Data and Semantic Web research. The acquisition, provenance, representation, maintenance, usability, quality as well as legal, organisational and infrastructure aspects of language data are rapidly becoming major areas of interest and these will be a big focus of the conference.”

Another conference focus will be knowledge graphs, an active field of research concerned with the extraction, integration, maintenance and use of semantic representations of language data in combination with semantically or otherwise structured data, numerical data and multimodal data. The conference will also address the combined use and exploitation of language data and knowledge graphs in data science approaches to industry use cases, including biomedical applications, as well as to use cases in the humanities and social sciences.

Speakers include:

Zoltán Szlávik, who leads the IBM Benelux Center for Advanced Studies
Kathleen McKeown, director of the Data Science Institute and Henry and Gertrude Rothschild Professor of Computer Science at Columbia University
Antal van den Bosch, director of the Meertens Institute in Amsterdam and Professor of Language and Speech Technology at the Centre for Language Studies at Radboud University, the Netherlands
Graham R Isaac, lecturer in Welsh at NUI Galway