
Now, have any of you all ever looked up this word? You know, in a dictionary? (Laughter) Yeah, that's what I thought. How about this word? You know, I'll show it to you: Lexicography: the practice of compiling dictionaries. Notice -- we're very specific. That word "compile." The dictionary is not carved out of a piece of granite, out of a lump of rock. It's made up of lots of little bits. It's little discrete -- that's spelled D-I-S-C-R-E-T-E -- bits. And those bits are words.
Now one of the perks of being a lexicographer -- besides getting to come to TED -- is that you get to say really fun words, like lexicographical. Lexicographical has this great pattern -- it's called a double dactyl. And just by saying double dactyl, I've sent the geek needle all the way into the red. But "lexicographical" is the same pattern as "higgledy-piggledy." Right? It's a fun word to say, and I get to say it a lot. Now, one of the non-perks of being a lexicographer is that people don't usually have a kind of warm, fuzzy, snuggly image of the dictionary. Right? Nobody hugs their dictionaries. But what people really often think about the dictionary is, they think more like this. Just to let you know, I do not have a lexicographical whistle. But people think that my job is to let the good words make that difficult left-hand turn into the dictionary, and keep the bad words out.
But the thing is, I don't want to be a traffic cop. For one thing, I just do not do uniforms. And for another -- deciding what words are good and what words are bad is actually not very easy. And it's not very fun, and when parts of your job are not easy or fun, you kind of look for an excuse not to do them. So if I had to think of some kind of occupation as a metaphor for my work, I would much rather be a fisherman. I wanna throw my big net into the deep blue ocean of English and see what marvelous creatures I can drag up from the bottom. But why do people want me to direct traffic, when I would much rather go fishing? Well, I blame the Queen. Why do I blame the Queen? Well, first of all, I blame the Queen cause it's funny. But secondly, I blame the Queen because dictionaries have really not changed.
Our idea of what a dictionary is has not changed since her reign. The only thing that Queen Victoria would not be amused by in modern dictionaries is our inclusion of the F-word, which has happened in American dictionaries since 1965. So, there's this guy, right? Victorian era, James Murray, first editor of the Oxford English Dictionary. I do not have that hat. I wish I had that hat. So he's really responsible for a lot of what we consider modern in dictionaries today. When a guy who looks like that -- in that hat -- is the face of modernity, you have a problem. And so, James Murray could get a job on any dictionary today. There'd be virtually no learning curve.
And of course, a few of us are saying: Computers! Computers! What about computers? The thing about computers is -- I love computers. I mean, I'm a huge geek, I love computers. I would go on a hunger strike before I let them take away Google Book Search from me. But computers don't do much else other than speed up the process of compiling dictionaries. They don't change the end result. Because what a dictionary is, is Victorian design merged with a little bit of modern propulsion. It's steampunk. What we have is an electric velocipede. You know, we have Victorian design with an engine on it. That's all! The design has not changed.
And OK, what about online dictionaries, right? Online dictionaries must be different. This is the Oxford English Dictionary Online, one of the best online dictionaries. This is my favorite word, by the way: Erinaceous: pertaining to the hedgehog family; of the nature of a hedgehog. Very useful word. So look at that. Online dictionaries right now are paper thrown up on a screen. This is flat. Look how many links there are in the actual entry: two! Right? Those little buttons -- I had them all expanded except for the date chart. So there's not very much going on here. There's not a lot of clickiness. And in fact, online dictionaries replicate almost all the problems of print, except for searchability. And when you improve searchability, you actually take away the one advantage of print, which is serendipity. Serendipity is when you find things you weren't looking for because finding what you are looking for is so damned difficult.
So -- (Laughter) -- now, when you think about this, what we have here is a ham butt problem. Does everyone know the ham butt problem? Woman's making a ham for a big family dinner. She goes to cut the butt off the ham and throw it away, and she looks at this piece of ham and she's like, "This is a perfectly good piece of ham. Why am I throwing this away?" She thought, "Well my mom always did this." So she calls up Mom, and she says, "Mom, why'd you cut the butt off the ham when you're making a ham?" She says, "I don't know, my mom always did it!" So they call Grandma, and Grandma says, "My pan was too small!" (Laughter)
So it's not that we have good words and bad words -- we have a pan that's too small! You know, that ham butt is delicious! There's no reason to throw it away. The bad words -- see, when people think about a place and they don't find a place on the map, they think, "This map sucks!" When they find a nightspot or a bar and it's not in the guidebook, they're like, "Ooh, this place must be cool! It's not in the guidebook." When they find a word that's not in the dictionary, they think, "This must be a bad word." Why? It's more likely to be a bad dictionary. Why are you blaming the ham for being too big for the pan? So you can't get a smaller ham. The English language is as big as it is.
So if you have a ham butt problem, and you're thinking about the ham butt problem, the conclusion that leads you to is inexorable and counter-intuitive: paper is the enemy of words. How can this be? I mean, I love books. I really love books. Some of my best friends are books. But the book is not the best shape for the dictionary. Now they're gonna think "Oh, boy. People are gonna take away my beautiful, paper dictionaries?" No. There will still be paper dictionaries. When we had cars -- when cars became the dominant mode of transportation, we didn't round up all the horses and shoot them. You know, there're still gonna be paper dictionaries, but it's not gonna be the dominant dictionary. The book-shaped dictionary is not gonna be the only shape dictionaries come in. And it's not gonna be the prototype for the shapes dictionaries come in.
So think about it this way: if you've got an artificial constraint, artificial constraints lead to arbitrary distinctions and a skewed worldview. What if biologists could only study animals that made people go, "Aww." Right? What if we made aesthetic judgments about animals, and only the ones we thought were cute were the ones that we could study? We'd know a whole lot about charismatic megafauna, and not very much about much else. And I think this is a problem. I think we should study all the words, because when you think about words, you can make beautiful expressions from very humble parts. Lexicography is really more about material science. We are studying the tolerances of the materials that you use to build the structure of your expression: your speeches and your writing. And then often people say to me, "Well, OK -- how do I know that this word is real?" They think, "OK, if we think words are the tools that we use to build the expressions of our thoughts, how can you say that screwdrivers are better than hammers? How can you say that a sledgehammer is better than a ball-peen hammer? They're just the right tool for the job."
And so people say to me, "How do I know if a word is real?" You know, anyone that's read a children's book knows that love makes things real. If you love a word, use it. That makes it real. Being in the dictionary is an artificial distinction. It doesn't make a word any more real than any other way. If you love a word, it becomes real. So if we're not worrying about directing traffic, if we've transcended paper, if we are worrying less about control and more about description, then we can think of the English language as being this beautiful mobile. And any time one of those little parts of the mobile changes, is touched -- any time you touch a word, you use it in a new context, you give it a new connotation, you verb it -- you make the mobile move. You didn't break it; it's just in a new position, and that new position can be just as beautiful.
Now, if you're no longer a traffic cop -- the problem with being a traffic cop is there can only be so many traffic cops in any one intersection, or the cars get confused. Right? But if your goal is no longer to direct the traffic, but maybe to count the cars that go by, then more eyeballs are better. You can ask for help! If you ask for help, you get more done. And we really need help. Library of Congress: 17 million books. Of which half are in English. If only one out of every 10 of those books had a word that's not in the dictionary in it, that would be equivalent to more than two unabridged dictionaries.
And I find an un-dictionaried word -- a word like "un-dictionaried," for example -- in almost every book I read. What about newspapers? Newspaper archive goes back to 1759. 58.1 million newspaper pages. If only one in 100 of those pages had an un-dictionaried word on it, it would be an entire other OED. That's 500,000 more words. So that's -- that's a lot. And I'm not even talking about magazines, I'm not talking about blogs -- and I find more new words on BoingBoing in a given week than I do in Newsweek or Time. There's a lot going on there.
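The back-of-the-envelope figures for books and newspapers can be checked with a short sketch. It assumes, as the talk does, that each qualifying book or page contributes exactly one un-dictionaried word, and it ignores the same word turning up in more than one source. The function and variable names are illustrative and not from the talk.

```python
def undictionaried_estimate(units: int, one_in: int) -> int:
    """Estimate new words if one out of every `one_in` units yields a single new word."""
    return units // one_in


# Figures quoted in the talk.
LIBRARY_OF_CONGRESS_BOOKS = 17_000_000
NEWSPAPER_PAGES = 58_100_000
OED_WORDS = 500_000  # the talk's round figure for one OED

# Half of the Library of Congress books are in English.
english_books = LIBRARY_OF_CONGRESS_BOOKS // 2

books_estimate = undictionaried_estimate(english_books, 10)
pages_estimate = undictionaried_estimate(NEWSPAPER_PAGES, 100)

print(f"Books:      {books_estimate:,} words")
print(f"Newspapers: {pages_estimate:,} words")
print(f"Newspapers vs. one OED: {pages_estimate / OED_WORDS:.2f}x")
```

On these assumptions the books alone give 850,000 words, and the newspaper pages give 581,000, a bit more than the 500,000 quoted for one OED.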
And I'm not even talking about polysemy, which is the greedy habit some words have of taking more than one meaning for themselves. So if you think of the word "set" -- a set can be a badger's burrow, a set can be one of the pleats in an Elizabethan ruff -- and each of those is one numbered definition in the OED. The OED has 33 different numbered definitions for set. Tiny little word, 33 numbered definitions. One of them is just labeled "miscellaneous technical senses." Do you know what that says to me? That says to me it was Friday afternoon and somebody wanted to go down the pub. That's a lexicographical cop-out, to say, "Miscellaneous technical senses."
So we have all these words, and we really need help! And the thing is, we could ask for help -- asking for help's not that hard. I mean, lexicography is not rocket science. See, I just gave you a lot of words and a lot of numbers, and this is more of a visual explanation. If we think of the dictionary as being the map of the English language, these bright spots are what we know about and the dark spots are where we are in the dark. If that was the map of all the words in American English, we don't know very much. And we don't even know the shape of the language. If this was the dictionary -- if this was the map of American English -- look, we have a kind of lumpy idea of Florida, but there's no California! We're missing California from American English. We just don't know enough, and we don't even know that we're missing California. We don't even see that there's a gap on the map.
So again, lexicography is not rocket science. But even if it were, rocket science is being done by dedicated amateurs these days. You know? It can't be that hard to find some words! So, enough scientists in other disciplines are really asking people to help, and they're doing a good job of it. For instance: there's eBird, where amateur birdwatchers can upload information about their bird sightings. And then ornithologists can go and help track populations, migrations, et cetera.
And there's this guy Mike Oates. Mike Oates lives in the U.K. He's a director of an electroplating company. He's found more than 140 comets. He's found so many comets, they named a comet after him. It's kind of out past Mars -- it's a hike. I don't think he's getting his picture taken there anytime soon. But he found 140 comets without a telescope. He downloaded data from the NASA SOHO satellite, and that's how he found them. If we can find comets without a telescope, shouldn't we be able to find words?
Now, you all know where I'm going with this, because I'm going to the Internet, which is where everybody goes. And the Internet is great for collecting words, because the Internet's full of collectors. And this is a little-known technological fact about the Internet, but the Internet is actually made up of words and enthusiasm. And words and enthusiasm actually happen to be the recipe for lexicography. Isn't that great? So there are a lot of really good word-collecting sites out there right now, but the problem with some of them is that they're not scientific enough. They show the word, but they don't show any context: Where did it come from? Who said it? What newspaper was it in? What book?
Because a word is like an archaeological artifact. If you don't know the provenance or the source of the artifact, it's not science -- it's a pretty thing to look at. So a word without its source is like a cut flower. You know -- it's pretty to look at for a while, but then it dies. It dies too fast. So this whole time I've been saying, "The dictionary, the dictionary, the dictionary, the dictionary." Not "a dictionary" or "dictionaries." And that's because -- well, people use the dictionary to stand for the whole language. They use it synecdochically -- and one of the problems of knowing a word like "synecdochically" is that you really want an excuse to say synecdochically. And so this whole talk has just been an excuse to get me to the point where I could say synecdochically to all of you. So I'm really sorry. But when you use a part of something -- like the dictionary is a part of the language, or a flag stands for the United States, a symbol of the country -- then you're using it synecdochically. But the thing is, we could make the dictionary the whole language. If we get a bigger pan, then we can put all the words in. We can put in all the meanings. Doesn't everyone want more meaning in their lives? And we can make the dictionary not just be a symbol of the language -- we can make it be the whole language.
You see, what I'm really hoping for is that my son -- who turns seven this month -- I want him to barely remember that this is the form factor that dictionaries used to come in. This is what dictionaries used to look like. I want him to think of this kind of dictionary as an eight-track tape. It's a format that died because it wasn't useful enough. It wasn't really what people needed. And the thing is, if we can put in all the words, no longer have that artificial distinction between good and bad, we can really describe the language like scientists. We can leave the aesthetic judgments to the writers and the speakers. If we can do that, then I can spend all my time fishing and I don't have to be a traffic cop anymore. Thank you very much for your kind attention.
