The Unit for Linguistic Data (ULD) is concerned with the creation, improvement and maintenance of linguistic data (also known as language resources) through a variety of methods. The term linguistic data refers to a range of data types that are of use to researchers in linguistics and natural language processing (NLP). Linguistic data can be split into four major categories: firstly, lexical data contains descriptions of words and their meanings, syntax and relations; secondly, corpora consist of collections of texts made for a particular purpose; thirdly, language descriptions document typological properties of languages to enable comparative studies; and finally, metadata describes language resources and their availability.
As its primary research method, the group focusses on exploring the use of linked data technologies, in particular Linguistic Linked Open Data (LLOD), for processing linguistic data. This has led to the development of several key tools and resources that use linked data as a central part of their mechanism. One such tool, Naisc, was developed by the group for linking together resources of different kinds and has been applied to the task of linking lexicographical resources in the context of the ELEXIS project. Another tool, Teanga, enables the construction of pipelines of NLP tools that can be composed and integrated through the use of linked data and standards for linguistic data, such as the OntoLex-Lemon standard developed in this project. Finally, ULD maintains and develops several catalogues for the discovery of linguistic data resources, including the Linghub website as well as the Linked Open Data Cloud and its Linguistic Linked Open Data Subcloud. In the context of the Prêt-à-LLOD project, ULD is further exploring how the quality and availability of resources can be improved.
A major use of linguistic data is in applying already developed NLP technologies to new languages and domains. As such, a major part of the group's work concerns under-resourced languages: there is much ongoing work on the development of technologies for minority languages, as well as an active collaboration with the Irish Department and the Moore Institute on the development of NLP techniques for historical languages, in particular Old Irish. Furthermore, the unit is working on expanding WordNet to many under-resourced languages by means of machine translation.
Areas of work:
Linked data, Under-resourced languages, Digital humanities, Language resources, Lexicography, Metadata, Linguistic linked open data, Linked-data-based services
Unit Leader:
Unit Members:




Publications:

ICACCS 2020: International Conference on Advanced Computing & Communication Systems (ICACCS)

Ruba Priyadharshini and Bharathi Raja Chakravarthi and Mani Vegupatti and John P. McCrae
ICACCS 2020: International Conference on Advanced Computing & Communication Systems (ICACCS)

Proceedings of 1st Joint SLTU (Spoken Language Technologies for Under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) Workshop at LREC 2020

Proceedings of 1st Joint SLTU (Spoken Language Technologies for Under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) Workshop at LREC 2020

Christian Chiarcos and Bettina Klimek and Christian Fäth and Thierry Declerck and John P. McCrae
Proceedings of the 1st International Workshop on Language Technology Platforms at LREC 2020

Proceedings of the 1st International Workshop on Language Technology Platforms at LREC 2020

Shardul Suryawanshi and Bharathi Raja Chakravarthi and Pranav Verma and Mihael Arcan and John P. McCrae and Paul Buitelaar
Proceedings of the 5th Workshop on Indian Language Data: Resources and Evaluation (WILDRE-5) at LREC-2020

Ana Salgado and Sina Ahmadi and Alberto Simões and John P. McCrae and Rute Costa
Proceedings of the 7th Workshop on Linked Data in Linguistics: Building tools and infrastructure at LREC 2020

John P. McCrae and Alexandre Rademaker and Ewa Rudnicka and Francis Bond
Proceedings of the Multimodal Wordnets Workshop at LREC 2020

Omnia Zayed and John P. McCrae and Paul Buitelaar
Proceedings of the 12th Language Resource and Evaluation Conference (LREC 2020)

Francis Bond and Luis Morgado da Costa and Michael Wayne Goodman and John P. McCrae and Ahti Lohk
Proceedings of the 12th Language Resource and Evaluation Conference (LREC 2020)

Proceedings of the 12th Language Resource and Evaluation Conference (LREC 2020)

Proceedings of the 12th Language Resource and Evaluation Conference (LREC 2020)

Morphosyntactic Variation in Medieval Celtic Languages: Corpus-based approaches

Mihael Arcan and Daniel Torregrosa and Sina Ahmadi and John P. McCrae
Proceedings of the 2nd Translation Inference Across Dictionaries (TIAD) Shared Task

Proceedings of the 2nd Translation Inference Across Dictionaries (TIAD) Shared Task

Mustafa Jarrar and Hamzeh Amayreh and John McCrae
Proceedings of the Poster Track of LDK 2019

Adrian Doyle and John P. McCrae and Clodagh Downey
Proceedings of the Celtic Language Technology Workshop 2019

John P. McCrae and Adrian Doyle
Proceedings of the Celtic Language Technology Workshop 2019

Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019) at ACL 2019

Sina Ahmadi and Mihael Arcan and John McCrae
Proceedings of the Poster Track of LDK 2019

Proceedings of Sixth Biennial Conference on Electronic Lexicography, eLex 2019

Proceedings of Sixth Biennial Conference on Electronic Lexicography, eLex 2019

Sina Ahmadi and Hossein Hassani and John P. McCrae
Proceedings of Sixth Biennial Conference on Electronic Lexicography, eLex 2019

Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages (LoResMT 2019)

John P. McCrae and Alexandre Rademaker and Francis Bond and Ewa Rudnicka and Christiane Fellbaum
Proceedings of the 10th Global WordNet Conference (GWC 2019)

Sai Krishna Lakshminarayanan and John McCrae
Proceedings for the 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science

Omnia Zayed and John P. McCrae and Paul Buitelaar
2nd Conference on Language, Data and Knowledge (LDK 2019)

John P. McCrae and Thierry Declerck
Proceedings of the Language Technology 4 All Conference

Simon Krek and Thierry Declerck and John Philip McCrae and Tanja Wissik
Proceedings of the Language Technology 4 All Conference

John Philip McCrae and Theodorus Fransen
International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide

Bharathi Raja Chakravarthi and Mihael Arcan and John P. McCrae
Proceedings of the 9th Global WordNet Conference

Bolette Pedersen and John McCrae and Carole Tiberius and Simon Krek
Proceedings of the 9th Global WordNet Conference

Omnia Zayed and John P. McCrae and Paul Buitelaar
Proceedings of the Workshop on Figurative Language Processing

Rajdeep Sarkar and John P. McCrae and Paul Buitelaar
Proceedings of the 11th Language Resource and Evaluation Conference (LREC)

Ian D. Wood and John P. McCrae and Vladimir Andryushechkin and Paul Buitelaar
Proceedings of the 11th Language Resource and Evaluation Conference (LREC)

Housam Ziad and John P. McCrae and Paul Buitelaar
Proceedings of the 11th Language Resource and Evaluation Conference (LREC)

Proceedings of the XVIII EURALEX International Congress on Lexicography in Global Contexts

Proceedings of Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Adrian Doyle and John P. McCrae and Clodagh Downey
Proceedings of the 3rd Workshop for Collaboration and Computing for Under-Resourced Languages

Proceedings of the 9th Global WordNet Conference

John P. McCrae and Ian D. Wood and Amanda Hicks
Proceedings of the 9th Global WordNet Conference

John P. McCrae and Ian D. Wood and Amanda Hicks
Proceedings of the 9th Global WordNet Conference

Bharathi Raja Chakravarthi and Mihael Arcan and John P. McCrae
Proceedings of the 9th Global WordNet Conference

Workshop on eLexicography: Between Digital Humanities and Artificial Intelligence

Andrejs Abele and John P. McCrae and Paul Buitelaar
Proceedings of the First Conference on Language, Data and Knowledge (LDK2017)

Bettina Klimek and John P. McCrae and Christian Chiarcos and Sebastian Hellmann
Proceedings of the First Conference on Language, Data and Knowledge (LDK2017)

John P. McCrae and Ian Wood and Amanda Hicks
Proceedings of the First Conference on Language, Data and Knowledge (LDK2017)


John P. McCrae and Mihael Arcan and Paul Buitelaar
Proceedings of the First Workshop on Multi-Language Processing in a Globalising World (MLP2017)

John P. McCrae and Kartik Asooja and Nitish Aggarwal and Paul Buitelaar
NUIG-UNLP at SemEval-2016 Task 1: Soft Alignment and Deep Learning for Semantic Textual Similarity

John P. McCrae and Georgeta Bordea and Paul Buitelaar
1st Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability

John P. McCrae and Philipp Cimiano and Paul Buitelaar and Georgeta Bordea
PARSEME/ENeL workshop on MWE e-lexicons

John P. McCrae and Narumol Prangnawarat
Proceedings of the First Workshop on Knowledge Extraction and Knowledge Integration (KEKI-2016)

John P. McCrae and Philipp Cimiano
Proceedings of the ISWC 2016 Posters and Demo Track

Proceedings of the ISWC 2016 Posters and Demo Track

10th Language Resource and Evaluation Conference (LREC)

Piek Vossen and Francis Bond and John P. McCrae
Proceedings of the Global WordNet Conference 2016

Francis Bond and Piek Vossen and John P. McCrae and Christiane Fellbaum
Proceedings of the Global WordNet Conference 2016

Proceedings of the 4th Workshop on Linked Data in Linguistics

John P. McCrae and Philipp Cimiano
Proceedings of the 11th International Conference on Semantic Systems

Benjamin Siemoneit and John P. McCrae and Philipp Cimiano
Proceedings of the 4th Workshop on Linked Data in Linguistics

Proceedings of the 4th Workshop on the Multilingual Semantic Web

Proceedings of 12th Extended Semantic Web Conference

Lars Borin and Dana Dannells and Markus Forsberg and John P. McCrae
Proceedings of the ISWC 2014 Posters & Demonstrations Track - a track within the 13th International Semantic Web Conference

Proceedings of the 9th Language Resource and Evaluation Conference

Francesca Quattri and Adam Pease and John P. McCrae
Proceedings of 4th Workshop on Cognitive Aspects of the Lexicon

John P. McCrae and Christina Unger and Francesca Quattri and Philipp Cimiano
Proceedings of 4th Workshop on Cognitive Aspects of the Lexicon

John P. McCrae and Philipp Cimiano
Proceedings of the 8th International Workshop on Semantic Evaluation

John P. McCrae and Christiane Fellbaum and Philipp Cimiano
Proceedings of the 3rd Workshop on Linked Data in Linguistics

Proceedings of the 3rd Workshop on Linked Data in Linguistics

John P. McCrae and Cord Wiljes and Philipp Cimiano
Proceedings of the 1st Workshop on Linked Data Quality

Christina Unger and John McCrae and Sebastian Walter and Sara Winter and Philipp Cimiano
Proceedings of 1st International Workshop on NLP and DBpedia

John P. McCrae and Philipp Cimiano
Proceedings of the Joint Workshop on NLP&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction

Peter Menke and John P. McCrae and Philipp Cimiano
Proceedings of the 2nd Workshop on Linked Data in Linguistics

Elena Montiel-Ponsoda and John McCrae and Guadalupe Aguado-de-Cea and Jorge Gracia
Proceedings of the 10th International Conference on Terminology and Artificial Intelligence

John McCrae and Philipp Cimiano and Roman Klinger
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Philipp Cimiano and John McCrae and Paul Buitelaar and Elena Montiel-Ponsoda
New Trends of Research in Ontologies and Lexical Resources

Dennis Spohr and Philipp Cimiano and John McCrae and Sean O'Riain
First International Workshop on Finance and Economics on the Semantic Web in conjunction with 9th Extended Semantic Web Conference

John McCrae and Elena Montiel-Ponsoda and Philipp Cimiano
Proc. of the 2012 International Conference on Language Resource and Evaluation

John McCrae and Philipp Cimiano
Proc. of the Workshop on Collaborative Resource Development and Delivery at the 2012 International Conference on Language Resource and Evaluation

Paul Buitelaar and Philipp Cimiano and John McCrae and Elena Montiel-Ponsoda and Thierry Declerck
Proc. of 9th International Conference on Terminology and Artificial Intelligence

Elena Montiel-Ponsoda and Guadalupe Aguado-de-Cea and John McCrae
Proc. of 9th International Conference on Terminology and Artificial Intelligence

John McCrae and Dennis Spohr and Philipp Cimiano
Proc. of the 8th Extended Semantic Web Conference

Fifth Workshop on Syntax, Structure and Semantics in Statistical Translation in conjunction with 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Internal Financial Control Assessment Applying Multilingual Ontology Framework

John McCrae and Jesus R. Campaña and Philipp Cimiano
Proceedings of the 1st Workshop on the Multilingual Semantic Web

Proc. of the 23rd International Conference on Computational Linguistics

Proc. of Workshop on Semantic Authoring, Annotation and Knowledge Markup in conjunction with the 5th International Conference on Knowledge Capture