Research Seminars

Multilingual corpus analysis software: Problems and limitations from an end-user point of view

Hugo Sanjurjo González (University of Huddersfield) 

Wednesday 20th March 2019

This talk reports on the problems and limitations of current software for the linguistic analysis of multilingual corpora. It is well known that corpus linguistics software has evolved considerably in recent years. Through any web browser, users can now create and analyse their own corpora and obtain complex, visually appealing statistics without installing any additional software. However, consider a non-technical user who wants to analyse a bi/multilingual parallel corpus with linguistic annotation. For such a user, most of the available software still fails to take into account the usability of the corpus-building process. Most of the usability problems relate to activities such as aligning, tagging or producing the required corpus format. These activities are often carried out using programs that lack a user interface or demand complex system configurations. In addition, working with languages other than English may affect software requirements and make resources less reliably available, both in quantity and in quality.

In this talk, some possible solutions will be described. Special focus will be given to ACTRES Corpus Manager (ACM), a corpus analysis application belonging to the ACTRES Research Group. ACM allows users to create their own corpora (monolingual, bi/multilingual parallel and comparable) with linguistic annotation, run linguistic queries and obtain the most common statistics. Users need no technical assistance at any point in the process, regardless of their technical skills. ACM tries to overcome usability problems by automating critical activities, combining existing tools and resources with additional custom-built software. Usability from the linguist user's point of view is treated as a key factor throughout.