Attributing Short Texts
Jack Grieve (University of Birmingham)
Short texts are a common problem in authorship analysis across forensic, historical, and literary contexts. The basic issue for stylometric methods in particular, which attribute texts on the basis of quantitative analysis, is that a short text is too small a sample to yield reliable estimates of the relative frequencies of most linguistic features. For example, a 100-word text will generally lack many common function words, but we certainly cannot assume that those words are also absent from the author’s writings more generally. Consequently, stylometric methods are usually intended for relatively long texts of at least 500 or 1,000 words. Many anonymous texts, however, are far shorter. To address this issue, we have developed a new quantitative approach for attributing short texts, known as n-gram tracing. In this presentation, I will introduce the method, evaluate its general applicability, and apply it to a famous historical case of disputed authorship: the Bixby Letter, a 139-word letter thought to have been written either by Abraham Lincoln, to whom it is usually attributed, or by his personal secretary, John Hay.
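The sampling problem described above can be illustrated with a minimal sketch. It is not an implementation of n-gram tracing, which the abstract does not specify; it only shows why relative frequencies from short texts are unreliable. The function-word list, the tokenizer, and all function names are assumptions made for illustration.

```python
import math
import re

# Illustrative list of common English function words (an assumption,
# not a feature set taken from the presentation).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is",
                  "was", "for", "with", "as", "but", "by", "which", "from", "or"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text, words=FUNCTION_WORDS):
    """Return each word's share of all tokens in the text."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {}
    return {w: tokens.count(w) / n for w in words}


def missing_words(text, words=FUNCTION_WORDS):
    """List the words that never occur in the text."""
    present = set(tokenize(text))
    return [w for w in words if w not in present]


def standard_error(p, n):
    """Binomial standard error of a relative frequency p estimated from n tokens."""
    return math.sqrt(p * (1 - p) / n)
```

Under a simple binomial assumption, a word with a true rate of 1% is expected to occur once in a 100-word text, and `standard_error(0.01, 100)` is roughly 0.01, so the uncertainty is about as large as the estimate itself. At 1,000 words the same calculation gives roughly 0.003, which is one way to see why longer samples are normally preferred.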