Monday, June 27, 2022

From aardvark to woke: inside the Oxford English Dictionary

The OED's task – to define every English word – is as ambitious as it was 150 years ago.

By Pippa Bailey

The team at the Oxford English Dictionary felt some nervousness about writing the definition for "Terf", an acronym for trans-exclusionary radical feminist, which this month has been added to its pages. "To a certain extent, it is like any other word," says Fiona McPherson, a 50-year-old lexicographer from Grangemouth, Stirlingshire, who has worked at the dictionary since 1997. "But it would be disingenuous to say that it is exactly the same. There seems more at stake. You want to be accurate, you want to be neutral. But it's a lot easier to be neutral about a word that isn't controversial."

The Oxford English Dictionary (OED) has served as a lexical record of the world's most widely spoken language – and its culture – since it was founded in the mid-19th century. "Post-truth", for example, was the dictionary's word of 2016, the year of Brexit and Trump, while in 2020 it elected not to choose one – because no single word could sum up the pandemic experience. Last year, "police brutality", "deadname", "cancel culture" and "anti-vaxxer" entered the dictionary for the first time; previous years gave us "fake news" (2019), "Silent Generation" (2018) and "woke" (2017).

The June 2022 update includes several terms that reflect our changing understanding of sexuality and gender: "multisexual", "pangender", "gender expression", "gender presentation" and "enby" (derived from "NB", meaning "non-binary"), as well as Terf. But this wasn't, McPherson says, a conscious decision; rather, these additions organically came together as their usage grew. The team decided against labelling Terf "offensive", instead explaining in a usage note that it might be considered so; it was felt that this "was a bit more nuanced than just slapping on 'derogatory' or 'chiefly derogatory'".

McPherson, who has an easy laugh and a melodic Scottish lilt, is part of a team that has been revising the OED since 1993, their progress published quarterly. Outdated entries are revised, new words are added and those that pass from use will be marked "rare" or "obsolete"; changing sensibilities mean that others will be labelled "offensive" or "derogatory". It is an enormous task, and one in which I have a professional as well as a personal interest: part of my role at the New Statesman involves maintaining our style guide, enforcing the rules of grammar and excising cliché. The decisions McPherson and her colleagues make filter into these pages; on questions of spelling and meaning, the team of sub-editors I lead defers to Oxford dictionaries.

The English language evolves at such a pace that, for the OED lexicographers, the goalposts aren't so much shifting as sprinting away from them. Once a word has gained its place, it may be moved – for example, to be listed as a variant spelling – but it is never taken out, meaning that the dictionary only ever expands. (This is true even of mistakes. The word "astirbroad" was added in 1885, but when an editor came to revise it in 2019, they discovered that it was an early-modern typo: the typesetter for the 17th-century book in which the word was originally found had dropped the word "stir" into "abroad". Still, astirbroad remains.) Nor is the OED limited to British English: the dictionary includes varieties spoken outside the UK – what its editors refer to as "World Englishes" – from Singapore to Jamaica.

The resulting dictionary details some 600,000 words. The most recent print edition – the second, published in 1989 – fills 20 volumes and would set you back £862.50. The initial plan was to complete the third edition by 2005, but 17 years later its editors are just halfway through. To an outside observer, the scale of the project – its grand ambition and granular detail – seems almost impossible, but when I put this to McPherson, she is unfazed. "We're in trouble if the language ends, and not just professionally," she says. "The English language is huge and so therefore the job we're doing is a big job – but I don't find it overwhelming. It's energising. It's exhilarating."

Beyond the sandstone facade, immaculate lawns and wisteria-fringed courtyard of the Oxford University Press building, in the Jericho area of the city, around 70 OED staff work in an open-plan office that is humble compared to the august Dictionary Room its editors once occupied at the Old Ashmolean. It is here in late May that I meet McPherson and six of her colleagues – most of them in person, though a few join by video link (remote working at the OED pre-dated the pandemic; McPherson is based in Munich). Among them are Jane Johnson, another Scot and a new-words editor who is partial to a dataset; and Bernadette Paton, an Australian former art teacher who has been at the OED since 1987. A decade ago, Paton interviewed Danica Salazar, now 38, for her role as an editor; Salazar told Paton that the OED was "doing this World English thing wrong, and I will fix it" – something she has been doing ever since.

When they began working on the third edition, the team progressed alphabetically – though they started from M, because it was felt that, by that point, the editors of the first edition would have been better established in their approach. They got to the end of R before rethinking. Many entries hadn't been edited since the late-19th and early-20th centuries; if they continued in this manner, it would be a long time before they got to outdated definitions towards the beginning of the alphabet – such as "digital", the first sense of which in the previous edition was: "Of or pertaining to a finger, or to the fingers or digits." Such entries are known as priority words and tagged as needing urgent attention whenever an editor comes across one.

The lexicographers' work is collaborative, and it's clear from the way they bounce off each other in our conversation that this suits them. Editors draft in pairs, swapping entries in a kind of informal peer review, after which their words go to etymologists, bibliographers, the pronunciations team, external consultants and finalisation editors. The first step of this process can take anything from a few hours to a few weeks, depending on the word: there are more than 200 senses, for example, of the verb "run".

Paton recalls spending four weeks revising "business", the definition for which was first published in 1888 ("the quality or state of being busy", which we would now differentiate as "busyness"). "There was almost nothing about commercial enterprise," she says. "You were covering 120 years of development in probably the most enormous area of activity of the 20th century. It was hell at the time, but really interesting to see it at the end."

In 1857 a group of gentleman scholars from the Philological Society – Herbert Coleridge, grandson of the poet Samuel Taylor Coleridge, Frederick Furnivall (immortalised by his friend Kenneth Grahame as Ratty in The Wind in the Willows), and Richard Chenevix Trench – established the Unregistered Words Committee, with the aim of capturing those parts of the English language that had not yet been recorded. Previous attempts at a dictionary had been made, but none was comprehensive. Robert Cawdrey's 1604 Table Alphabeticall was the first monolingual English dictionary, but was more of a synonymicon; Samuel Johnson's Dictionary of the English Language, published in 1755, drew only on sources published after 1586, omitting the lexicon of great works from Chaucer to Bede. "Every word," wrote Coleridge at the time, "should be made to tell its own story – the story of its birth and life, and in many cases of its death, and even occasionally of its resuscitation."

But where to begin? Imagining that "an entire army would join hand in hand till it covered the breadth of the island", the Unregistered Words Committee called upon the public to help. In 1857 a reading project was begun: the committee issued a circular with instructions for writing quotation slips – postcard-size pieces of paper on which a reader, having found a quotation in a source that illustrated a particular usage, would write the details and send them to the dictionary team. Trench described this work as "drawing as with a sweep-net over the whole surface of English literature".


A quotation slip for "astirbroad" – the typo that took its place in the OED. Image reproduced by permission of Oxford University Press Archives

It was, as the OED's archivist Beverley McCulloch describes it to me, a "sort of early crowd-sourcing project". The volunteer readers were paid nothing for their efforts, though the most prolific sent in thousands of quotations. The story of one of them, a Broadmoor patient called William Minor, was told in a 2019 film starring Mel Gibson and Sean Penn, The Professor and the Madman – adapted from a book by Simon Winchester. (One editor laments that the film is "full of myths".) For women who were literate but prevented from seeking paid employment, the volunteer reading programme offered an approved-of occupation. "They must have devoted a significant fraction of their lives to it," says Peter Gilliver, a lexicographer who joined the OED on the same day as Paton in the late 1980s, and the author of The Making of the Oxford English Dictionary (2016).

Anyone expecting the OED archive to have a grand, library-like interior will be disappointed (McCulloch says people are often surprised to discover her office has windows), but the musty smell of slowly decomposing paper is pervasive. Shelving units hold boxes upon boxes of the handwritten slips generated by the reading programme, some more than 150 years old and tied together in bundles. There are around 250 boxes of slips that made it into the dictionary, and a further 200 to 300 of what McCulloch calls "superfluous slips" – those that weren't used. Once sorted alphabetically, the slips were numbered so that order could be restored should a bundle be dropped. Some have old dictionary galley proofs or concert programmes on the back; paper was reused, particularly during the First World War. Others bear quotations cut and pasted from their source: "If you cut it out there are no copying errors. You have ruined the book, but…" says Gilliver with a grimace.

The slips don't bear the names of those who worked on them, but the small, precise script of JRR Tolkien, who was an editorial assistant at the dictionary from 1919 to 1920, is easily identified. The handwriting of Henry Hucks Gibbs, a benefactor who sent in many quotations, sloped to the right – until he lost his right hand in a shooting accident; after that it sloped to the left. One assistant, Arthur Maling, wrote on dust jackets, drafts of his will, even chocolate wrappers.

It wasn't until 1879 that Oxford University Press (OUP) signed up to publish the project; it could have been the Cambridge English Dictionary, but the rival institution's press turned it down. As part of the same deal, the indefatigable lexicographer and philologist James Murray joined as editor. (Work had foundered after the death of Coleridge, aged 30, from tuberculosis; he had lived long enough to reach the word "abrupt".)

A sort of corrugated iron shed was built in Murray's garden from which his team could work. In this "scriptorium", around 1,000 slips arrived daily; it took two women more than two years to sort those that had accumulated even before Murray took over. The Murray children (there were 12) were paid between one penny and sixpence an hour, depending on their age, to help organise slips into alphabetical order. When an editor came to work on a word they would retrieve all the relevant slips, identify a word's different meanings and which quotations best illustrated them, put them in chronological order, and then write the top slip, with the definition and etymology. Later, the printers at the OUP building would have to decipher this paper patchwork, with all its crossings-out and revisions.

The deal struck between the Philological Society and OUP specified that the finished dictionary would run to 7,000 pages, and take ten years and cost £9,000 to produce. But five years after Murray took over, he and his team had got only as far as "ant". It was decided that the dictionary should be serialised to begin bringing in funds, and a first fascicle, from "A" to "Ant", was published in January 1884 (the dictionary eventually totalled ten such instalments). Further editors – Henry Bradley, William Craigie and Charles Onions – were hired to speed progress, but Murray did not live to see the completion of his magnum opus. He died on 26 July 1915, aged 78. The last entry bearing his handwriting is "twilight".


Paper chain: James Murray (right) with his staff compiling the New English Dictionary, later the Oxford English Dictionary, date unknown. Photo by Granger – Historical Picture Archive / Alamy

By the time the complete first edition was published in 1928, at a length of 16,000 pages, more than 70 years had passed since Coleridge, Furnivall and Trench first formed their committee. It was already out of date, and so several supplementary volumes were published in the mid-20th century. In 1984 OUP began a project to integrate the dictionary into a single work – the second edition – and to digitise its entries, an incredibly ambitious project in its time, costing $13.5m (then around £9m) over five years. This is the text whose revision occupies the team at the OED today.

"Our histories, our novels, our poems, our plays – they are all in this one book," said Stanley Baldwin, the then prime minister, at a dinner to mark the completion of the first edition. Today, we might add: advertisements, newspapers, recipe books, journals, song lyrics, film and TV scripts – and Twitter. Fiona McPherson reflects that in the time she's worked at the OED, "the whole idea of what's published has changed".

Unlike the concise dictionaries once found in almost every household, the OED is a historical dictionary: it shows how a word's meaning has changed over time, as illustrated by quotations – around 3.5 million in total. (The New Statesman is cited 1,832 times, with quotes from writers including DH Lawrence and Doris Lessing.) It is these that make the OED so lengthy: the two other best-known dictionaries of British English, Chambers and Collins, are both single-volume. Each OED lexicographer is a sort of word detective, poring over sources, from medieval books to song lyrics, to find the earliest example they can of a particular sense. "It's like trying to sort out the pieces of a jigsaw puzzle," says Bernadette Paton. "You put the edges in first, the obvious bits. And then, 'Where does this fit in?' and, 'Oh, this is blue so it must be part of the sky… Oh, no, it's part of the sea.'"

Despite technology, source-sifting remains time-consuming. Paton recently drafted "what the what" as a euphemism, and sat through "about 30 episodes" of 30 Rock to find its first usage. (Five years earlier, the phrase had been placed on the dictionary's "words to watch" database after someone heard it on the children's TV show The Amazing World of Gumball.) This is one reason the collaborative approach is so key: one lexicographer might have the popular culture reference that unlocks a definition for another. McPherson recalls drafting the entry for "burner phone" and dating it to an episode of The Wire, only for Paton to produce an earlier reference, in Kingpin Skinny Pimp's track "One Life 2 Live". (One of Paton's earliest memories of working at the dictionary is transcribing the lyrics to the Who song "I'm a Boy" for the "headcase" entry.)

The OED relies on the collective mind of the public, too. When James Murray was editor, the Post Office installed a postbox outside his house to deal with the volume of mail generated by the scriptorium. He wrote 30 to 40 letters a day, corresponding with figures from William Gladstone to Alfred Tennyson, seeking their expertise or asking what they intended by their use of a specific word. (The team still sometimes receives letters addressed to Murray, more than 100 years after his death; on the second day of my visit, one arrives all the way from Sacramento.)

In 1891 Murray asked the public to tell him which syllable they emphasised in the word "content" in different contexts; in two months he received almost 400 responses. While the process is now speeded by technology, the team continues to gather evidence from the public through social media and the OED website. In 2012 they appealed for information on the origins of "to come in from the cold", presuming that it was used by the secret service before it made its way into John le Carré's The Spy Who Came in from the Cold. Le Carré wrote in to correct the record: he had coined the term, and then the intelligence agencies had started using it.

The need for a dateable, written source can be frustrating: by the time a word wends its way into the OED, it must have moved from speech into writing. Jane Johnson gives drafting the entry for "bucket list" as an example of the delay this can create. She was only able to date the phrase back to 2006, to a UPI newswire ahead of the release of the film The Bucket List – far later than expected: "If you ask people, when did 'bucket list' come in, everybody is thinking, 'Oh, yeah, that was a thing when I was a teenager.'"

"It's especially true of slang and colloquial language," adds Paton. "At the moment we're doing an Australian batch, and things are coming up all the time. I'll say, 'I know I used this in the Seventies,' but there's no way of pinning it down."

"Because we can't cite: 'Bernie in the Seventies,'" quips Danica Salazar.

"There was a joke that the first edition used to say: 'Heard at a north Oxford tea party in 1898,'" says Paton.

Twitter has proved a valuable resource. "[Social media] is the closest thing we've had to speech since we started," says Paton, "because a lot of written language is very self-conscious." Tweets are dated and time-stamped, and cannot (currently) be edited after posting, making them perfect sources. Salazar, who is from the Philippines, uses the example of the adjectival use of traffic – as in, "so traffic" or "very traffic" – which she knew to be common in Philippine English: "We say things like, 'Oh, it's very traffic here today,' meaning there's a lot of traffic." When she started at the OED, she was keen to include it but couldn't find a written usage. "A couple of years ago, we started quoting Twitter, and I searched for 'very traffic', 'so traffic', and we got hundreds of hits in one day." A Twitter user with the handle @davidg0411 is quoted in the OED as a result.

"You can make a case for including any lexical item in a dictionary, it's just a case of priorities," says Johnson. "You have to work out which ones seem to have the most value." Some are brought to the dictionary's attention by current events: in the June 2022 update, "stealthing" (the act of removing a condom during sex without a partner's knowledge) was considered after California became the first US state to make it illegal, in October 2021. "Sportswashing" ("the use of a sport or sporting event to promote a positive public image for a sponsor or host") came to the OED's attention in the discourse around this year's World Cup in Qatar and the Beijing Winter Olympics.

The OED also maintains a "monitor corpus" of web pages, totalling around 16 billion words, which is updated monthly – a very modern outworking of Trench's 1857 "sweep-net". By comparing the words that have been frequently used in recent months against the corpus as a whole, the team can identify which are growing in usage. Some build in popularity over time; others – such as new senses of "bubble" and "shield" during the pandemic – suddenly spike. The team also tracks failed searches on OED.com, which reveals which words the public expects to find. Every quarter, a prioritisation list is created and reviewed. In this way, the digital is balanced by the human, the data sets moderated by editorial judgement.
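
The comparison described here, recent usage set against the corpus as a whole, can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the article does not describe the OED's actual tooling, and the function name `rising_words`, the `min_count` threshold and the add-one smoothing are all hypothetical choices made for this example.

```python
from collections import Counter


def rising_words(recent_tokens, corpus_tokens, min_count=3, top_n=5):
    """Rank words by how much more frequent they are in recent text
    than in the corpus as a whole (hypothetical helper, not OED code)."""
    recent = Counter(recent_tokens)
    corpus = Counter(corpus_tokens)
    recent_total = sum(recent.values())
    corpus_total = sum(corpus.values())

    scored = []
    for word, count in recent.items():
        if count < min_count:
            continue  # too little recent evidence to judge
        recent_rate = count / recent_total
        # Add-one smoothing (an assumption of this sketch) avoids dividing
        # by zero for a word that never appears in the full corpus.
        corpus_rate = (corpus[word] + 1) / (corpus_total + 1)
        scored.append((word, recent_rate / corpus_rate))

    # Highest ratio first: these are the words growing fastest in usage.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data: "bubble" and "shield" appear only in the recent sample.
older = "the cat and the dog".split() * 30
recent = "the bubble and the shield".split() * 3
corpus = older + recent  # the corpus as a whole includes the recent text

for word, ratio in rising_words(recent, corpus, top_n=2):
    print(word, round(ratio, 1))
```

A ratio close to 1 means a word is used about as often as it always has been; a much higher ratio flags the kind of sudden spike the pandemic senses of "bubble" and "shield" produced. A list like this would only ever be a starting point, to be weighed by an editor.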

"There's no magic number" after which a new word is added, says McPherson. "It's not, 'Well, we've got ten examples so we're going to look at it.' It's breadth and depth." The expected weight of evidence is lower for World Englishes. Salazar references a batch of recently added Bermudian words, including "chingas" ("used to express surprise, awe, etc") and "greeze" ("a large, satisfying meal"). "There are 65,000 people in Bermuda – I think there were more people in the town where I was born in the Philippines. You cannot expect the same amount of evidence [for Bermudian words]. But that doesn't mean Bermudian English does not deserve a space in the dictionary – in fact, it's the oldest variety of English after British and American English."

Do any of the lexicographers ever feel disquiet about language change? Paton admits the use of "of" instead of "have" – "I would of" – "gives me a bit of a jolt. [But] putting it in the dictionary wouldn't upset me in any way, because I recognise it is used." We pause to look it up, and find Charlotte Brontë is quoted as a source. Salazar laughs: "Charlotte Brontë! She doesn't know how to write proper English."

Johnson tells me about a wedding she attended, where she "was sitting with people I didn't know at all and talking a bit about what I did. The guy beside me said, 'Oh, it sounds very complicated.' And I said, 'Well, it is complicated, but it's doable.' And he went off on a rant: 'You of all people, using this word, doable!'" Johnson regrets that she didn't have a comeback; a quick check finds that "doable" dates back to 1443. McPherson points out that, similarly, the first quotation for the hyperbolic use of "literally" – "for some people, the wrong use" – is from 1769.

Those who profess a desire to "protect" the language out of love for it may be surprised to find that dictionary-makers do not consider themselves arbiters of what is "right" and "wrong". Whether the question is about culture-wars issues of race and gender or a grammatical quibble, the answer is the same: the OED describes how language is already being used; it does not prescribe how it should be used, nor endorse a word's use. McPherson observes that the dictionary's reliance on written sources means it is often "at the rearguard" of language change rather than leading the charge.

The OED's lexicographers believe that slang, such as "bae" (June 2019) and "lol" (surprisingly late: June 2020), is worthy of inclusion – or, at least, they won't say otherwise. (One recent study found that Multicultural London English, a blend of migrant influences, could become Britain's main dialect within 100 years.) "There are some things that you like more than others," says McPherson. "But because everything's earning its place, I don't think there's any point at which I've thought, 'Why are we putting this in?'"

The goal, she explains, is objectivity. "You always want to get everything over in as neutral a way as possible, so nobody would be able to tell what you think. If you're working on a political word, it should not be apparent from the definition how you vote." Collaboration provides a kind of shield against individual bias. Salazar says she approaches words with a "World Englishes" mindset, while Paton, who is in her mid-sixties, appreciates the perspective of the younger members of the team. The hope is that, together, these differences create as great a degree of impartiality as is humanly possible.

Having an open mind is essential for the role. "When you're a lexicographer, if you hear a word that you've never heard before, you don't go, 'Ew, yuck, what's that?' You go, 'Hmm, that's interesting, why did they say it like that?' You want to learn more.

"Words, to me, are like people," she continues. "There are people I don't like – but that doesn't mean I don't recognise their right to exist." This takes on particular significance for Salazar: "I work with varieties that are minoritised, because they're not what we think of as 'correct' English." She tells me about a woman who cried with happiness on learning that the African-American "finna" – meaning "intending to", as in, "I finna make dinner" – had been included in the OED. "These are people who, for their whole lives, have been told, 'You are lesser, you are stupid, because of the way you speak.' To say that, 'No, actually, the words [you use] are in the dictionary' – it has an impact on how people see themselves."

Ask any of my interviewees what makes for a good lexicographer and they will say curiosity. Bernadette Paton describes the role as "sitting in the middle of a great big web and sending out feelers". I recall this comment when the OED's chief editor, Michael Proffitt, who has worked at the dictionary for 33 years, tells me he has a "magpie kind of mindset. You have to be interestable in anything, and you have to be comfortable with working at the limits of your knowledge."

Proffitt started out in comedy writing. Is there any evidence of this former life? "No, the scene of the crime has been completely cleared." One of the words he uses to describe lexicographers is "unassuming", and he seems uncomfortable when I ask if he sees himself as the latest in a line of esteemed OED editors. "I have no sense of being able to compete with those figures," he says. "I think they are extraordinary. Murray, particularly – a polymath, a linguist, in a way that I'm not." Ironically, his humility recalls the words of Murray himself: "I am a nobody. Treat me as a solar myth, or an echo, or an irrational quantity, or ignore me altogether."

Proffitt values the synergic nature of dictionary writing as much as the rest of his team do, because it allows his interventions to pass unmarked. "One thing I like about the OED is the lack of a byline – I like the anonymity. You can do something that has cultural value without inserting yourself into that culture." If he has a legacy, it will be a more natural prose style: freed from the constraints of print, under Proffitt's editorship the OED has moved away from the "sometimes pretty idiosyncratic" level of abbreviation once mandated by cost and practicality. "The idea that the OED should yield its content to the reader, and fairly readily, is important to me. A dictionary should decode language, not encode it."

The OED has never turned a profit: by 1911 the university press had spent £150,000 on it and made less than £60,000 in sales. Today's revision programme is backed by £34m in funding, and Proffitt says income from OED.com subscriptions covers "quite a lot of the editorial project". The dictionary, he adds, brings "a more intangible reputational benefit" to the university press.

What began as a conventional third edition has evolved into a project divorced from the print deadlines that plagued Murray and his co-editors. But without those constraints, the process of revision is potentially limitless. Will there ever be a point when Proffitt's team consider their work done?

"There'll be a point when we say we have revisited every single entry that was in the second edition," says Proffitt. "That will be an important threshold for us." He sees the future of the OED as a hybrid project, balancing a focus on the more rapidly changing words with the goal of updating every single entry. The aim is to prevent a repeat of "the position we were in at the start, which is that you have a completely out-of-date text that you have to revise fully again. We want to avoid that cycle of renovation and dilapidation."

Will the third edition ever be published in print form? "We've not said yes or no," says Proffitt. "We're a long way from completing. At that point, if there's an appetite for a print version, I'm sure we'd consider it." It is hard to imagine, given the expansive possibilities of the digital medium, that there will be.

The Oxford English Dictionary remains, in many ways, a Victorian phenomenon, born in an era of remarkable innovation: of railways and steelworks, anthropology and anaesthesia, Charleses Dickens and Darwin. It is difficult, now, when the thought of consulting a paper dictionary seems so analogue, to grasp how audacious it once was to try to capture, for the very first time, every word and make it tell its story.

The English language continues to twist and flex. Without the pressures of print, the answer to the question of how and where the process of revision ends seems to be: never. But none of the lexicographers I meet appears overwhelmed by this fact, nor by the scale of the task still ahead. Several express impatience about the pace – but more as a form of greed, of the kind a reader might feel in a bookshop, at once hungry to consume every tome and aware that they don't have enough years to do so.

It is unlikely that the third edition will be in some way complete within many of the lexicographers' working lives. Michael Proffitt does not seem to mind this idea; he can imagine a life beyond dictionary-making, likely involving other forms of writing, and a lot of listening to music. I wonder what his last word will be – "unassuming", perhaps, or "twilight", like James Murray, all those years before.
