Linguistics@Huddersfield

Research Seminars

Multilingual corpus analysis software: Problems and limitations from an end-user point of view

Hugo Sanjurjo González (University of Leeds) 

Wednesday 20th March 2019

This talk reports on the problems and limitations of current corpus linguistic software for the analysis of multilingual corpora. It is well known that corpus linguistics software has evolved considerably in recent years: through any web browser, users can create and analyse their own corpora and obtain complex, visually appealing statistics without installing any additional software. However, from the point of view of a non-technical user who wants to analyse a bi/multilingual parallel corpus with linguistic annotation, most of the available software still fails to take into account the usability of the corpus-building process. Most usability problems relate to activities such as alignment, tagging and the required corpus formatting. These activities are often carried out with programs that lack a user interface or demand complex system configurations. In addition, working with languages other than English may affect software requirements and make resources less reliable, both in quantity and in quality.

In this talk, some possible solutions will be described. Special focus will be given to ACTRES Corpus Manager (ACM), corpus analysis software belonging to the ACTRES Research Group. ACM allows users to create their own corpora (monolingual, bi/multilingual parallel and comparable) with linguistic annotation, make linguistic queries and obtain the most common statistics without technical assistance at any stage and regardless of their technical skills. ACM tries to overcome usability problems by automating critical activities, combining existing tools and resources with additional custom-built software. Usability from the point of view of the linguist as end user is treated as a key design factor.

What kinds of adjectives do preschoolers encounter in the input, and how do they process what they hear?

Catherine Davies (University of Leeds) 

Wednesday 13th March 2019

Adjectives are a challenging and relatively late-developing word class. In this novel corpus analysis of British English, we measure three- and four-year-olds’ quantitative and qualitative exposure to adjectives across a range of interactive and socioeconomic contexts in order to: i) measure the syntactic, semantic, and pragmatic variability of adjectives in child-directed speech (CDS); and ii) investigate how features of the input might scaffold adjective acquisition. Adjectives occurred more frequently in prenominal than in postnominal syntactic frames, though less familiar adjectives were more likely to appear postnominally. They also occurred much more frequently with a descriptive than a contrastive function, especially for less familiar adjectives. This pattern held across free play CDS, shared book reading CDS, and children’s book texts. Our findings present a partial mismatch between the forms of adjectives found in real-world CDS and those forms that should be most developmentally useful, i.e. in postnominal frames and with a contrastive function. Results are discussed in light of their implications for sentence processing, clinical practice, and models of adjective acquisition.

"If a Lion Could Speak": Wittgenstein and Alien Forms of Life

Alexander Carter (University of Cambridge) 

Wednesday 6th March 2019

According to Wittgenstein, that which makes it possible for us to understand (or misunderstand) other human beings makes it impossible for us to understand (or misunderstand) other non-human beings. Crucially, this barrier to understanding is not language per se—hence Wittgenstein’s pronouncement that ‘if a lion could speak, we could not understand him.’ Rather, our inability to understand the lion lies in its living an alien ‘form of life’. 

‘Wittgenstein’s lion’ therefore raises a number of important questions about i) the co-dependence of language and thought, ii) the extent to which all languages are essentially human and iii) the possibility of creating an artificial intelligence. These questions form the basis of my wider research into Wittgenstein’s later philosophy and his relevance to contemporary philosophical debates. 

However, my aim in this brief talk will be to consider the significance of ‘Wittgenstein’s lion’ in the context of the blockbuster film Arrival. Just how much of our reality is shaped by the language we use? To what extent are the aliens (mis)understood? And does the film do more than act as an allegory for human misunderstanding and disagreement?

"Happy shopping, me ducks!": The Dukki Facebook Corpus

Luke Collins (Lancaster University) 

Wednesday 27th February 2019

Businesses engaging with social media face the challenge of distinguishing themselves through a clear and coherent brand identity. I present a case study of how the Nottingham-based independent business ‘Dukki’ uses Facebook to establish a brand identity that is firmly rooted in the local community and manifest in the appropriation of the Nottinghamshire dialect. This case study demonstrates how researchers can use corpus approaches to investigate highly stylised forms of online communication as well as how businesses construct their own identity and that of their customer base.

In addition to representing the local dialect as a feature of the products that they sell, Dukki also constructs the business’s Facebook posts through an approximation of the dialect. This study examines a corpus of 862 posts collected throughout 2017 and considers the following questions:

  • How does Dukki represent itself as a business in its Facebook posts?

  • How does Dukki characterise its customer base in its Facebook posts?

  • How does Dukki contribute to the enregisterment (Agha, 2003) of the Nottinghamshire dialect through its Facebook posts?

Keyword analysis identified the labels through which Dukki refers to itself and to its customers, and by analysing the collocates of those terms we can see what types of qualities and processes Dukki projects onto itself and presupposes of its customers.
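Keyword analysis of this kind typically compares word frequencies in a study corpus against a reference corpus and ranks words by a keyness statistic. The Python sketch below uses log-likelihood (G2), one common choice. The abstract does not say which statistic or reference corpus was used, so the function names, the 3.84 threshold (p < 0.05 with one degree of freedom) and the toy counts are illustrative assumptions only.

```python
import math


def log_likelihood_keyness(freq_study: int, size_study: int,
                           freq_ref: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness of one word: study corpus vs reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study_counts: dict[str, int], ref_counts: dict[str, int],
             min_g2: float = 3.84) -> list[tuple[str, float]]:
    """Hypothetical helper: (word, G2) pairs for words overused in the study corpus."""
    size_study = sum(study_counts.values())
    size_ref = sum(ref_counts.values())
    results = []
    for word, f_study in study_counts.items():
        f_ref = ref_counts.get(word, 0)
        g2 = log_likelihood_keyness(f_study, size_study, f_ref, size_ref)
        # Keep only positive keywords (relatively more frequent in the study corpus)
        if f_study / size_study > f_ref / size_ref and g2 >= min_g2:
            results.append((word, g2))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy counts invented for illustration; not the Dukki data
    study = {"duck": 30, "shop": 20, "the": 50}
    reference = {"duck": 1, "shop": 20, "the": 979}
    for word, score in keywords(study, reference):
        print(f"{word}\t{score:.1f}")
```

Only positive keywords are kept here; dropping the relative-frequency condition would also surface words that are underused in the study corpus.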

References

Agha, A. (2003). The social life of cultural value. Language & Communication, 23, 231–273.

Researching Interdisciplinary Discourse

Susan Hunston (University of Birmingham)

Wednesday 13th February 2019

This talk reports on a project designed to identify what is special about the language used in research articles in the field of environmental science, particularly in journals identified as ‘interdisciplinary’. The methods used in the project range from genre analysis of individual texts, through studies of individual words and phrases using concordancing software, to more quantitative, technical approaches to corpus data such as multi-dimensional analysis and topic modelling.

The findings point to two main conclusions. Firstly, researchers who write for an interdisciplinary journal bring with them the practices of their own discipline. As a consequence, interdisciplinary journals incorporate a variety of such practices in comparison with monodisciplinary journals. Secondly, researchers in this field, as in all fields, construct their identity in their discourse. This identity can be more or less interdisciplinary and can be ‘conciliatory’ or ‘antagonistic’ in its stance towards other disciplines.

Carrying out this project has raised many questions for us, including: What data should we collect? What methods should we use to analyse it? What does it all mean and does it matter? In the talk I shall give a sense of how these questions were approached as well as what we think the answers are. The talk is therefore about how a project of this kind comes about as well as what it finds.

Making big data smaller: Developing low-resource accent recognition technology

Georgina Brown (Lancaster University)

Wednesday 6th February 2019

Big data is alive and well across disciplines. Its arrival has extended our research toolkits, and it is vital to the development of many technologies. This talk raises some issues that come with big data, with specific reference to forensic speech technology. In forensic speech casework, we face very particular problems, and these kinds of problems do not necessarily respond well to big data solutions. The main task undertaken by forensic speech scientists is the speaker comparison task, in which multiple recordings are analysed and assessed to establish how likely it is that the same speaker features in each. Automatic speaker recognition technology has emerged as a possible option to assist in these kinds of cases. These systems require masses of data for training. However, recording conditions are often so specific to a given case that we do not yet know how well speaker recognition technology will cope with them. It can simply be impractical to find the quantities of case-relevant data that data-hungry speaker recognition technologies demand. Research into some of these unknowns is currently under way in the forensic speech science community. While pursuing technological innovations for forensic speech casework is often encouraged, we still need to ensure that our methods are explicable, transparent and transferable across different cases.

Drawing on parallels with work on automatic speaker recognition, this talk discusses how recent work has aimed to achieve more favourable properties in automatic accent recognition technology (technology that assigns an accent label to a given speech recording). It has been proposed that automatic accent recognition technology could assist forensic speech scientists in speaker profiling tasks. These are cases in which we have recordings of a speaker but no suspects, and the task is to extract as much information about the speaker as possible. Automatic accent recognition lags far behind automatic speaker recognition in terms of success rates, and collecting enough data to improve performance seems even more daunting for accent recognition than for speaker recognition. This talk will present the York ACCDIST-based automatic accent recognition system (Y-ACCDIST), a linguistically informed approach to the accent recognition problem. Because it is linguistically informed, it demands only a fraction of the data that other approaches do, while remaining explicable, which is crucial in forensic applications.
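The comparison idea behind ACCDIST-style systems can be sketched briefly. As we understand the general approach, each speaker is represented not by raw acoustic values but by a table of distances between their own realisations of a fixed set of segments, so that some speaker-specific properties are factored out; speakers are then compared by correlating these tables. The Python sketch below is a simplified illustration under stated assumptions, not Y-ACCDIST itself: the two-dimensional feature vectors (imagined as formant-like pairs), the function names and the toy data are all hypothetical, and every speaker's segments must be listed in the same order.

```python
import math
from itertools import combinations

# Hypothetical representation: one speaker = one feature vector per segment,
# always in the same order (e.g. the vowels of a fixed word list).
Speaker = list[tuple[float, ...]]


def distance_table(segments: Speaker) -> list[float]:
    """Flattened table of Euclidean distances between all pairs of a speaker's segments."""
    return [math.dist(a, b) for a, b in combinations(segments, 2)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length distance tables."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def accent_models(training: dict[str, list[Speaker]]) -> dict[str, list[float]]:
    """Average the distance tables of all training speakers in each accent group."""
    models = {}
    for accent, speakers in training.items():
        tables = [distance_table(s) for s in speakers]
        models[accent] = [sum(cells) / len(tables) for cells in zip(*tables)]
    return models


def classify(segments: Speaker, models: dict[str, list[float]]) -> str:
    """Assign the accent whose average table correlates best with this speaker's table."""
    table = distance_table(segments)
    return max(models, key=lambda accent: pearson(table, models[accent]))


if __name__ == "__main__":
    # Toy formant-like values for four vowels, invented for illustration
    training = {
        "A": [[(300, 2300), (700, 1200), (300, 800), (500, 1800)]],
        "B": [[(300, 2300), (350, 1200), (300, 800), (500, 1000)]],
    }
    # New speaker: accent A's vowel pattern, but a different overall scale and offset
    unknown = [(1.2 * f1 + 50, 1.2 * f2 + 100) for f1, f2 in training["A"][0]]
    print(classify(unknown, accent_models(training)))  # → A
```

Because only relative distances within each speaker enter the comparison, the unknown speaker's global scaling and offset leave the correlation with group A unchanged in this toy case.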

Doctor-patient interaction at a Jordanian university hospital: A conversation analysis study

Rula Abu-Elrob (PhD student, University of Huddersfield)

Wednesday 30th January 2019

This study analyses medical talk from a conversation analysis (CA) viewpoint by identifying the fundamental patterns that underpin medical consultations, both in the overall structure of the interactions and in the turns that make up each segment. Attention is paid to those parts where the participants orient to the medical agenda and those where they depart from it (referred to as ‘side talk’).

Medical talk has been studied in a number of countries, but not in Jordan. Investigating the patterns in Jordanian medical talk is important to discover the culturally specific features of Jordanian consultations as well as their similarities with consultations in other countries. Thus, the analysis focused on how consultations are opened, how doctors elicit the necessary information, how diagnosis and treatment are managed and how the interaction is closed. The lack of studies analysing medical talk in Arab countries in general, and in Jordan in particular, is a further reason to describe medical interaction from a CA point of view.

The findings, based on 20 audio-recorded consultations from the internal clinic at King Abdullah University Hospital (KAUH), show that the medical phases occur in most of the consultations. Each phase had elements that characterise medical talk; some of these are specific to Jordanian medical talk, such as the use of the religious greeting ‘peace upon you’ in the opening phase and the use of ‘invocations’ in the closing phase. Side talk occurred in all phases of the medical interaction, with a higher frequency in the middle of the consultations (presenting the complaint, history-taking, diagnosis and treatment phases) than at the margins (opening and closing). Side talk was found to affect the way sequences are opened and closed, the sequences themselves and the turns that constitute them. These findings provide a compelling resource for KAUH and other hospitals to help improve doctors’ communication skills. The use of CA provides hospitals with naturalistic, empirical data as well as a detailed description of how effective communication occurs in medical consultations.

The Magic Words: Please and Thank You in American and British English

M. Lynne Murphy (University of Sussex)

Wednesday 16th January 2019

Various corpus studies have found that please and thank* (i.e. thanks or thank you) occur in inverse proportions in American and British English, with Britons using please at around twice the American rate and Americans thanking up to twice as often as Britons. Given such a marked difference, we have to wonder: do these words perform the same functions in the two countries?

This talk presents the results of three corpus studies. Two (with Rachele De Felice of University College London) examine please and thank* in US and UK corporate emails from the 1990s–early 2000s. Because the corpora are speech-act tagged, we were able to look at both the presence and absence of please in requests and to analyse which types of impositions attract please in the two corporate cultures. For thank* we analysed (among other things) its use as a marker of gratitude versus its use as a request marker, especially in US English. The third study looks at usage of please in the GloWBE corpus of web-based English, and considers its full range of usage: as a sincere request marker, but also as an expression of exasperation/disbelief, as a tool of mock politeness, etc.

Different rates of usage reflect the different functions and meanings the words have in the two cultures, but also, perhaps more generally, different values for formulaicness in politeness marking, recalling Alexis de Tocqueville’s 1840 observation that American manners are “neither so tutored nor so uniform” as the British but “they are frequently more sincere”.

About the speaker:
M. Lynne Murphy is Professor of Linguistics at the University of Sussex. She is the author of Semantic Relations and the Lexicon (Cambridge UP, 2003), Lexical Meaning (Cambridge UP, 2010) and The Prodigal Tongue: The Love–Hate Relationship between American and British English (Penguin, 2018).

Linguistics Research Seminars 2018-2019 - Term 2 Schedule

The new schedule is here!

The Linguistics Research Seminar Series offers you the chance to hear about the latest research developments in Linguistics and Modern Languages. Seminars last around an hour and are open to anyone interested.

Fransina de Jager has taken over the coordination of these seminars from Hazel Price, who has done an amazing job over the past few years. Thank you, Hazel!

For more information, please contact Fransina.

How people use iconicity to create words from scratch

Marcus Perlman (University of Birmingham)

Wednesday 5th December 2018

Iconicity clearly played an important role in the formation of many of the signs of signed languages. But how were the first spoken words created? In this talk, I present a series of experiments demonstrating how people can use vocal iconicity to create words from scratch. These studies show: 1) people can innovate iconic vocalizations to express a wide variety of meanings; 2) these vocalizations are understandable to naïve listeners, 3) including listeners from disparate cultural and linguistic backgrounds; and 4) through repeated interactions – and even just rote imitations – the vocalizations become more word-like in form and function. Taken together, these studies show how iconicity can play a vital role in the creation of spoken symbols, comparable to its function in the creation of many signs. Thus, I speculate that the use of iconic vocalizations was fundamental in the formation of the first spoken words.

The many meanings of English: An ontological framework for Applied English Linguistics

Christopher J Hall (York St John University)

Wednesday 28th November 2018

Searle (2008, pp. 443–4) states that, in the social sciences, “[u]nless you have a clear conception of the nature of the phenomena you are investigating, you are unlikely to develop the right methodology and the right theoretical apparatus for conducting the investigation”. Addressing teachers, Harris (2009, p. 25) asserts: “Whether you realize it or not, you are teaching not just English […], but a certain view of what that language is, and also a certain view of what a language is [...].” So for both research and practice, considering the ontological status of (the) English (language) is fundamental. Yet currently there is no explicit framework for specifying the many ways in which English can be said to exist. In this presentation I will propose such a framework, claiming that English, when used in relation to language, names types of entities associated with two ontological categories. One set of types sits within the ontological category of the language capacity, the species property. Within this category, English refers to individual instantiations of the broader capacity. The second set of ontological types is socially constructed on the basis of the contemplation of the first set; these types are all directly or indirectly derived from the process of collective identification (Jenkins, 2004) holding at the level of the nation. Polemically, I will suggest that understandings of English provided within linguistics and purveyed in teaching are derived from, conditioned by, or defined with reference to, this second ontological category, rather than directly from the first. Some critical and pedagogical implications of this for English applied linguistics will be discussed.

 References 

Harris, R. (2009). Implicit and explicit language teaching. In Toolan, M. (ed.), Language teaching: Integrational linguistic approaches (pp. 24–46). London: Routledge.

Jenkins, R. (2004). Social identity (3rd edn). London: Routledge.

Searle, J. R. (2008). Language and social ontology. Theory and Society, 37(5), 443–459.

Liverpool lexicography: compiling the Liverpool English Dictionary

Tony Crowley (University of Leeds)

Wednesday 21st November 2018

The Liverpool English Dictionary: A Record of the Language of Liverpool 1850-2015 on Historical Principles (Liverpool University Press, 2017) was based on more than thirty years’ research and was compiled using both traditional and contemporary lexicographical methods. Together with Scouse: A Social and Cultural History (LUP, 2012), the LED refutes the traditional account of the history of language in Liverpool.

In this paper I will address a number of questions that arose during the historical documentation of this local urban vernacular. These include issues such as: the difficulty of explaining the creation of Liverpool English and its dating; cultural and linguistic boundaries; spatio-linguistic variation within the city; major influences on the form; historical attitudes towards it; its role in the formation of cultural identity. 

Lovely Nurses and Rude Receptionists: A corpus analysis of patient comments about the NHS.

Paul Baker (Lancaster University)

Wednesday 6th November 2018

This talk reports on the analysis of a 29-million-word corpus of over 200,000 patient comments posted on the NHS Choices website between 2013 and 2015. The study was funded by an ESRC Knowledge Exchange Grant and involved answering questions set by the Patients and Information Directorate, NHS England. In this talk I address one area of the research project, which aimed to examine key differences in patients’ experiences across different types of healthcare providers (e.g. dentists vs GPs). Taking a corpus-based approach, we identified frequent forms of positive and negative evaluation for different types of NHS staff, and also considered the most frequently associated collocates and keywords in different sub-sections of the corpus. Concordance analyses helped to interpret and explain the patterns we found. The findings reveal insights into both the underlying nature of patient feedback and the current state of the NHS.

The importance of gesture for documenting language: A study of gesture in two Modern South Arabian Languages.

Jack Wilson (University of Salford)

Wednesday 31st October 2018

Most observers are aware that when people speak, or interact with each other more generally, they also move their bodies in a variety of ways. Such movements may be referred to as gestures. In recent years, linguists, sociologists and psychologists have explored gestures from a variety of perspectives. Research into language and gesture has demonstrated that the two are linked phonetically, syntactically, and semantically. While this observation has had a dramatic impact on theories within linguistics, it has not had a similar impact on standard practice for language documentation. This has resulted in a continued bias towards the collection of audio rather than audio-video data. 

In this talk, I will demonstrate the importance of collecting audio-video data for linguistic analysis by showing how critical gesture is during face-to-face interaction. Throughout the presentation I will use examples from two endangered Modern South Arabian Languages, Mehri and Śḥerεt, to demonstrate the importance of gesture annotation and analysis. I argue that the collection of audio-video data is especially important for the documentation and analysis of Mehri and Śḥerεt (and many other endangered languages) because face-to-face communicative practices (which always include gesture) constitute the language. 

“The bloodiness and horror of it”: Exploring metaphorical accounts of endometriosis pain

Stella Bullo (Manchester Metropolitan University)

Wednesday 24th October 2018

This work explores the challenges of communicating endometriosis pain and the conceptualisation of pain by women who suffer from endometriosis, a debilitating gynaecological disease. Endometriosis affects 1 in 10 women, yet the worldwide average time to diagnosis is 7.5 years. Among other manifestations, it causes severe pain. However, it is not uncommon to find health-care practitioners who dismiss pain or normalise it as part of the female condition (Bullo, 2018). Studies also suggest that dismissal or normalisation leading to delayed diagnosis may result from miscommunication of symptoms, in particular the way in which pain is communicated and explained during early consultations. Medical research advises that assessment of endometriosis pain should consider not only its severity but also its mechanisms (Morotti et al., 2016). 

In this paper, I work with elicited and non-elicited data collected through interviews and online forum contributions to investigate women’s metaphorical accounts of endometriosis pain. I compare the affordances of metaphors and similes to offer insights into how pain is conceptualised and communicated. The findings of the study have implications for endometriosis communication practices and will provide the basis for broader enquiries into the experience of making sense of pain and its communication.

Gendered social structure in 19th century children’s literature: A corpus linguistic approach

Anna Cermakova (University of Birmingham)

Wednesday 17th October 2018

In this talk, I am going to investigate the social structure in 19th-century children’s literature, with a particular focus on gender representation. Using ChiLit, a corpus of 19th-century children’s literature, and other corpus resources, I am going to illustrate how a range of textual evidence can help us gain access to different layers of society. Taking as an example how ‘mothers’ are portrayed, constructed, and otherwise present or absent in children’s books published in the 19th century, I will show how a corpus linguistic approach makes it possible to identify some of the characteristics of social structures that are shared across children’s books. The shared elements of fictional worlds provide key links to the real world of the time, which in turn has profoundly influenced contemporary social structures. Gender is one of the fundamental structuring principles of our society and as such it is reflected and reproduced in everyday language practice. Everyone grows up into a gendered discourse. However, gender is constructed in different ways in different discourses, and these discourses also have a diachronic dimension. In order to understand today’s discourse on gender, we need to understand its origins. 

Assertion & Presupposition: A Cross-Linguistic Experimental Investigation into the Syntax-Meaning Mapping

Kajsa Djarv (University of Pennsylvania)

Wednesday 10th October 2018

In this talk I present new data from a large-scale cross-linguistic experimental study, addressing two empirical questions regarding attitude predicates that take declarative clausal complements:

  1. (To what extent) are ASSERTION and PRESUPPOSITION syntactically encoded in the embedded clause?

  2. How does the lexical meaning of attitude verbs constrain: (i) the availability of different kinds of declarative complements; and (ii) the interpretation of the embedded clause, and the utterance as a whole?

These questions are at the heart of several theoretical debates, going back to Kiparsky & Kiparsky 1970, Emonds 1970, Hooper & Thompson 1973, and Stalnaker 1974, 1978. Much of the work in this area has centered around a family of constructions, so-called Main Clause Phenomena (MCP), which, although typically confined to unembedded clauses, are also observed in some embedded declaratives. The consensus view in the literature is that the availability of MCP correlates positively with ASSERTION and negatively with PRESUPPOSITION.

However, despite its long history, providing a precise characterization of this syntax-meaning relationship, and hence a predictive theory of it, has proven challenging. I attribute this to two gaps in our knowledge:

  1. First, it is not clear to what extent these constructions actually represent a (syntactically or semantically) homogeneous class.

  2. Secondly, assertion and presupposition are both multifaceted concepts; what specific aspects are relevant to the syntax?

This talk presents new data from an ongoing large-scale experimental study investigating the semantic-pragmatic conditions governing the availability of four different MCP across three languages, thus addressing these knowledge gaps and paving the way for a predictive theoretical account.

Towards a corpus linguistics of sign languages: The case of indicating verbs in British Sign Language

Adam Schembri (University of Birmingham)

Wednesday 3rd October 2018

In this talk, I will begin by addressing some of the widespread myths and misconceptions around sign languages (e.g., their origins, the universality of sign languages, their relationship to spoken languages) and what basic facts every (hearing) linguist really ought to know so that they can respond when they encounter these widespread misunderstandings. I will then focus on some of my work in collaboration with the British Sign Language Corpus (www.bslcorpusproject.org) team, discussing how the BSL Corpus was created, how it is being annotated, and what discoveries we are making in analysing the data. In particular, I will focus on recent work on a subset of verbs in BSL known as ‘indicating verbs’. These signs, found in the vast majority of sign languages documented to date, can move between locations in space associated with the referents of the arguments of the verb, and thus can use space to distinguish subject and/or object. They have been the subject of intense debate, because this directionality has been compared to person agreement marking in spoken languages. A number of claims about indicating verbs have been made in the BSL and the wider sign language linguistics literature, and we will explore how corpus-based studies have begun to challenge some of these assumptions. 

The psychological validity of collocation and related association measures

Jennifer Hughes (Lancaster University)

Wednesday 26th September 2018

In this presentation, I discuss the results of four experiments which combine methods from corpus linguistics and cognitive neuroscience in order to investigate the psychological validity of collocation and different measures of collocation strength. For each experiment, I extracted collocational adjective-noun bigrams from the BNC1994. I then constructed matched non-collocational bigrams which are absent from the BNC1994, and examined concordance lines to find suitable sentence contexts for each bigram pair. Participants then read these sentences on a computer screen one word at a time while their brain activity was recorded using scalp electrodes. This method of detecting the electrical activity of the brain by placing electrodes across the scalp is known as electroencephalography (EEG). More specifically, I used the Event-Related Potential (ERP) technique of analysing brainwave data, in which brain activity is measured in response to particular stimuli.
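The core of the ERP technique is to cut the continuous EEG recording into short epochs time-locked to each stimulus (here, a word appearing on screen), subtract a pre-stimulus baseline, and average across trials so that activity unrelated to the stimulus tends to cancel out. The single-channel Python sketch below illustrates only that averaging step. The function name `erp_average` and its parameters are hypothetical, and a real pipeline would also involve filtering, artefact rejection and many electrodes, none of which is described in the abstract.

```python
def erp_average(signal: list[float], onsets: list[int], pre: int, post: int) -> list[float]:
    """Average baseline-corrected epochs time-locked to stimulus onsets (one channel).

    signal: voltage samples from one electrode
    onsets: sample index at which each stimulus (e.g. a word) appeared
    pre, post: samples kept before and after each onset (pre must be > 0)
    """
    epochs = []
    for onset in onsets:
        start, end = onset - pre, onset + post
        if start < 0 or end > len(signal):
            continue  # skip epochs that would run off the recording
        epoch = signal[start:end]
        baseline = sum(epoch[:pre]) / pre  # mean voltage before the stimulus
        epochs.append([v - baseline for v in epoch])
    if not epochs:
        raise ValueError("no complete epochs to average")
    # Average sample by sample across trials
    return [sum(values) / len(epochs) for values in zip(*epochs)]


if __name__ == "__main__":
    # Toy signal: two trials with different baseline drift but the same response shape
    signal = [0.0] * 20
    signal[2:7] = [10.0, 10.0, 13.0, 14.0, 10.0]    # stimulus at sample 4
    signal[10:15] = [-5.0, -5.0, -2.0, -1.0, -5.0]  # stimulus at sample 12
    print(erp_average(signal, onsets=[4, 12], pre=2, post=3))  # → [0.0, 0.0, 3.0, 4.0, 0.0]
```

Baseline correction is what lets the two toy trials, recorded at very different voltage levels, average to a single clean response shape.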

The aim of Experiment 1 was to pilot this procedure for determining whether or not there is a neurophysiological difference in the way that the native speaker brain processes collocational adjective-noun bigrams (e.g. clinical trials) compared to matched non-collocational adjective-noun bigrams (e.g. clinical devices). The aim of Experiment 2 was to replicate the results of Experiment 1 in another group of native English speakers, and the aim of Experiment 3 was to investigate the same phenomena in non-native speakers of English (specifically, native speakers of Mandarin Chinese). Finally, in Experiment 4, I treated collocationality as a continuous rather than a dichotomous variable in order to investigate the gradience of the ERP response, and I also aimed to investigate the psychological validity of the following association measures: transition probability, mutual information, log-likelihood, z-score, t-score, Dice-coefficient, MI3, and raw frequency. The results of this research have important implications for the field of corpus linguistics.
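For readers unfamiliar with the association measures listed above, the Python sketch below computes them for a single adjective-noun bigram from four counts, using the standard 2×2 contingency-table formulation. The exact variants used in the experiments are not stated in the abstract, so these textbook forms (for example, the simple t-score and z-score approximations, and transition probability taken as P(noun | adjective)) are assumptions, and the function name is hypothetical.

```python
import math


def association_measures(f12: int, f1: int, f2: int, n: int) -> dict[str, float]:
    """Association scores for a bigram (w1, w2), e.g. adjective + noun.

    f12: frequency of the bigram (must be > 0)
    f1:  frequency of w1 as first element; f2: frequency of w2 as second element
    n:   total number of bigram tokens in the corpus
    """
    # Expected bigram frequency if w1 and w2 were independent
    e11 = f1 * f2 / n
    # Observed and expected cells of the 2x2 contingency table
    observed = [f12, f1 - f12, f2 - f12, n - f1 - f2 + f12]
    expected = [e11, f1 * (n - f2) / n, (n - f1) * f2 / n, (n - f1) * (n - f2) / n]
    log_likelihood = 2 * sum(o * math.log(o / e)
                             for o, e in zip(observed, expected) if o > 0)
    return {
        "raw_frequency": f12,
        "transition_probability": f12 / f1,  # P(w2 | w1)
        "mutual_information": math.log2(f12 / e11),
        "mi3": math.log2(f12 ** 3 / e11),
        "t_score": (f12 - e11) / math.sqrt(f12),
        "z_score": (f12 - e11) / math.sqrt(e11),
        "log_likelihood": log_likelihood,
        "dice": 2 * f12 / (f1 + f2),
    }


if __name__ == "__main__":
    # Invented counts: bigram seen 5 times; each word 10 times; 1,000 bigrams in total
    for name, value in association_measures(f12=5, f1=10, f2=10, n=1000).items():
        print(f"{name:>22}: {value:.3f}")
```

The measures weigh frequency very differently: MI tends to favour rare pairs, whereas MI3 and t-score give more weight to frequent ones, and log-likelihood as computed here is high for strongly repelled pairs as well as strongly attracted ones.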
