Making big data smaller: Developing low-resource accent recognition technology
Georgina Brown (Lancaster University)
Wednesday 6th February 2019
Big data is alive and well across disciplines. Its arrival has extended our research toolkits, and it is vital to the development of many technologies. This talk raises some of the issues that come with big data, with specific reference to forensic speech technology. Forensic speech casework presents very particular problems, and these problems do not necessarily respond well to big data solutions. The main task undertaken by forensic speech scientists is the speaker comparison task, in which multiple recordings are analysed and assessed to establish how likely it is that the same speaker features in each. Automatic speaker recognition technology has emerged as a possible option to assist in these kinds of cases. These systems require masses of data for training. However, the recording conditions are often so specific to a given case that we do not yet know how well speaker recognition technology will cope with them. It can be simply impractical to find the quantities of case-relevant data that data-hungry speaker recognition technologies demand. Research into some of these unknowns is currently underway in the forensic speech science community. While pursuing technological innovations for forensic speech casework is often encouraged, we still need to ensure that our methods are explicable, transparent and transferable across different cases.
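To make the data demand concrete, here is a minimal sketch of a score-based likelihood-ratio comparison, a common framing of the speaker comparison task (this is an illustration, not the specific systems discussed in the talk; all distribution parameters are hypothetical). A similarity score between the two recordings is evaluated against same-speaker and different-speaker score distributions estimated from background data, and it is precisely these background distributions that must be estimated from large amounts of case-relevant data.

```python
# Illustrative score-based likelihood ratio for speaker comparison.
# The background distributions below are hypothetical stand-ins for models
# that, in practice, must be trained on large, case-relevant datasets.
from statistics import NormalDist

same_speaker = NormalDist(mu=0.8, sigma=0.1)    # hypothetical same-speaker scores
diff_speaker = NormalDist(mu=0.3, sigma=0.15)   # hypothetical different-speaker scores

def likelihood_ratio(score):
    """LR > 1 supports the same-speaker hypothesis; LR < 1 supports different speakers."""
    return same_speaker.pdf(score) / diff_speaker.pdf(score)
```

A high similarity score (e.g. 0.8) yields an LR well above 1, while a low score (e.g. 0.3) yields an LR well below 1; if the background distributions do not match the case's recording conditions, the LR is unreliable, which is the core concern raised above.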
Drawing on parallels with work on automatic speaker recognition, this talk discusses how recent work has aimed to achieve more favourable properties in automatic accent recognition technology (technology that assigns an accent label to a given speech recording). It has been proposed that automatic accent recognition technology could offer some assistance to forensic speech scientists in speaker profiling tasks. These are cases where we have recordings of a speaker but no suspects, and the task is to extract as much information about the speaker as possible. Automatic accent recognition lags far behind automatic speaker recognition in terms of success rates, and the task of collecting enough data to improve performance seems even more daunting for accent recognition than it is for speaker recognition. This talk will present the York ACCDIST-based automatic accent recognition system (Y-ACCDIST), a linguistically-informed approach to the accent recognition problem. As a consequence, it demands only a fraction of the data that other approaches do, while also remaining an explicable method, which is crucial in the context of forensic applications.