{"id":273,"date":"2016-06-11T15:47:59","date_gmt":"2016-06-11T15:47:59","guid":{"rendered":"http:\/\/www.twonewthings.com\/gabrielrecchia\/?p=273"},"modified":"2016-06-15T10:36:14","modified_gmt":"2016-06-15T10:36:14","slug":"numberless-degrees-of-similitude-word-vectors","status":"publish","type":"post","link":"http:\/\/www.twonewthings.com\/gabrielrecchia\/2016\/06\/11\/numberless-degrees-of-similitude-word-vectors\/","title":{"rendered":"\u201cNumberless degrees of similitude\u201d: A response to Ryan Heuser\u2019s<br>\u2018Word Vectors in the Eighteenth Century, Part 1\u2019"},"content":{"rendered":"<div class=\"section\">\n<p>As someone who has been involved in the world of word vectors since 2007, it\u2019s been fascinating to watch the innovative applications that they\u2019ve been put to over the last few years. After reading Ryan Heuser\u2019s \u2018<a href=\"http:\/\/ryanheuser.org\/word-vectors-1\/\">Word Vectors in the Eighteenth Century, Part 1<\/a>,\u2019 I was left impressed not only by the thoughtful scholarship but also by the sheer mystery of how such an unusual analogy (\u201criches are to virtue as learning is to genius\u201d) might be encoded in the lexical statistics of a corpus of eighteenth-century texts (<a href=\"http:\/\/www.textcreationpartnership.org\/tcp-ecco\/\">ECCO-TCP<\/a>). This post is an attempt to answer that question for myself. 
My discoveries along the way may be of interest to others curious about the lexical contexts of \u2018virtue,\u2019 \u2018riches,\u2019 \u2018learning,\u2019 and \u2018genius\u2019 in the eighteenth century, or to anyone who finds themselves in the business of interpreting word vectors.<\/p>\n<p>Very good descriptions of word vectors and vector space models have been provided by others <a href=\"http:\/\/bookworm.benschmidt.org\/posts\/2015-10-25-Word-Embeddings.html\">inside<\/a> and <a href=\"https:\/\/www.jair.org\/media\/2934\/live-2934-4846-jair.pdf\">outside<\/a> of the digital humanities, but they&#8217;re still unknown enough within DH that I thought it might be worth giving a brief overview and history. If you\u2019re already familiar with the mechanics and history of vector space models and want to get straight to the investigation of<em> virtue, riches, learning, <\/em>and <em>genius<\/em> in ECCO-TCP, feel free to <a href=\"#skipahead\">skip ahead<\/a><a name=\"lexicalmaths\"><\/a>.<\/p>\n<\/div>\n<div class=\"section\">\n<h2>Lexical maths and fresh politeness<\/h2>\n<p>A vector is simply an ordered list of numbers. The vectors in a vector space model need not necessarily represent words, but for\u00a0the models that interest us here, every word is assigned a unique vector that contains information about the lexical contexts in which it appears in a large body of text. There are many different ways in which this information can be encoded. Here\u2019s one particularly simplistic approach: Start by creating a vector that is intended to represent the contextual usage of a particular word (the so-called <em>target word<\/em>) in some particular corpus, denoted here by V(<em>target<\/em>). Let each component of that vector correspond to a different <em>context word <\/em>that appears in the same sentence as the target word, and instantiate that component with the number of sentences in which the two words co-occur. 
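In code, the counting procedure just described might look like the following minimal Python sketch; the corpus and the resulting counts are invented purely for illustration:

```python
from collections import Counter

def context_vector(target, context_words, sentences):
    """Count, for each context word, the number of sentences
    in which it co-occurs with the target word."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        if target in tokens:
            for word in context_words:
                if word in tokens:
                    counts[word] += 1
    # order the components to match the chosen list of context words
    return [counts[word] for word in context_words]

# A tiny invented corpus, purely for illustration
sentences = [
    "The king wore his crown upon the throne",
    "He was a king of royal blood",
    "She admired the king and his majesty",
    "The farmer fed his horse",
]
context_words = ["he", "his", "him", "she", "her", "hers",
                 "crown", "throne", "royal", "majesty"]
print(context_vector("king", context_words, sentences))
```

Run over a real corpus with thousands of context words, the same procedure yields vectors like the ones tabulated next.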
Suppose our target word is <em>king<\/em>, and our context words are <em>he, his, him, she, her, hers, crown, throne, royal<\/em>, and <em>majesty<\/em>. The resulting ten-dimensional vector might look something like this:<\/p>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td>he<\/td>\n<td>his<\/td>\n<td>him<\/td>\n<td>she<\/td>\n<td>her<\/td>\n<td>hers<\/td>\n<td>crown<\/td>\n<td>throne<\/td>\n<td>royal<\/td>\n<td>majesty<\/td>\n<\/tr>\n<tr>\n<td>6509<\/td>\n<td>8612<\/td>\n<td>3093<\/td>\n<td>486<\/td>\n<td>1053<\/td>\n<td>1<\/td>\n<td>360<\/td>\n<td>223<\/td>\n<td>335<\/td>\n<td>491<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In other words, <em>king <\/em>appears in the same sentence as <em>he <\/em>6509 times, in the same sentence as <em>his <\/em>8612 times, and so on. A handy thing about vectors is that we can do arithmetic with them. For example, if we assume the following plausible vector for the target word <em>man<\/em>\u2026<\/p>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td>he<\/td>\n<td>his<\/td>\n<td>him<\/td>\n<td>she<\/td>\n<td>her<\/td>\n<td>hers<\/td>\n<td>crown<\/td>\n<td>throne<\/td>\n<td>royal<\/td>\n<td>majesty<\/td>\n<\/tr>\n<tr>\n<td>10438<\/td>\n<td>10466<\/td>\n<td>3409<\/td>\n<td>1582<\/td>\n<td>1961<\/td>\n<td>12<\/td>\n<td>70<\/td>\n<td>31<\/td>\n<td>42<\/td>\n<td>49<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\u2026we can calculate what we get if we subtract V(<em>man<\/em>) from <em>V<\/em>(<em>king<\/em>), by taking the difference between corresponding components:<\/p>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td>he<\/td>\n<td>his<\/td>\n<td>him<\/td>\n<td>she<\/td>\n<td>her<\/td>\n<td>hers<\/td>\n<td>crown<\/td>\n<td>throne<\/td>\n<td>royal<\/td>\n<td>majesty<\/td>\n<\/tr>\n<tr>\n<td>-3929<\/td>\n<td>-1854<\/td>\n<td>-316<\/td>\n<td>-1096<\/td>\n<td>-908<\/td>\n<td>-11<\/td>\n<td>290<\/td>\n<td>192<\/td>\n<td>293<\/td>\n<td>442<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><em>\u00a0<\/em><\/p>\n<p>Not surprisingly, we end up with a vector 
V(<em>king<\/em>)<em> \u2013 <\/em>V(<em>man<\/em>) that has low values for words that appear more frequently with <em>man <\/em>than with <em>king<\/em>, and higher values for words that appear more frequently with <em>king <\/em>than with <em>man. <\/em>There are some obvious problems with vectors that have been constructed in this way. For one thing, we\u2019re not controlling for the frequency of the target words. A relatively rare word like <em>Croesus <\/em>might yield a vector of very small numbers across the board. So although we might hope that V(<em>king<\/em>) \u2013 V(<em>Croesus<\/em>) would yield a vector that could tell us something about the differences between the discourse contexts of <em>king <\/em>and <em>Croesus<\/em>, in actuality we\u2019d likely get a vector that is nearly identical to <em>king<\/em>\u2019s, which would not be particularly useful. And we\u2019re not controlling for the frequency of context words, so common words like <em>he<\/em>, <em>his<\/em>, and so on will have undue influence. Even so, if we continue the thought experiment by adding V(<em>woman<\/em>) to V(<em>king<\/em>)<em> \u2013 <\/em>V(<em>man<\/em>), it\u2019s plausible that we\u2019d get something like the following:<\/p>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td>he<\/td>\n<td>his<\/td>\n<td>him<\/td>\n<td>she<\/td>\n<td>her<\/td>\n<td>hers<\/td>\n<td>crown<\/td>\n<td>throne<\/td>\n<td>royal<\/td>\n<td>majesty<\/td>\n<\/tr>\n<tr>\n<td>-2680<\/td>\n<td>-1003<\/td>\n<td>-4<\/td>\n<td>701<\/td>\n<td>1698<\/td>\n<td>19<\/td>\n<td>306<\/td>\n<td>196<\/td>\n<td>301<\/td>\n<td>448<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><em>\u00a0<\/em><\/p>\n<p>In other words, we&#8217;d get a vector that has <em>low values<\/em> for context words which we expect <em>man <\/em>and <em>king <\/em>to share in common (e.g. <em>he, his, him<\/em>), <em>high values <\/em>for context words with which <em>woman <\/em>occurs much more frequently than <em>man <\/em>(e.g. 
<em>she, her, hers<\/em>), and <em>high values<\/em> for context words with which <em>king <\/em>occurs much more frequently than <em>man <\/em>(e.g. <em>crown, throne, royal, majesty<\/em>). These are all properties that we\u2019d expect V(<em>queen<\/em>) to have. If we\u2019re lucky, it\u2019s even possible that V(<em>queen<\/em>) would come up as the best match if we were to quantify the overall similarity of V(<em>king<\/em>)<em> \u2013 <\/em>V(<em>man<\/em>)<em> + <\/em>V(<em>woman<\/em>) to the vectors of every possible target word in our corpus. If this turned out to be true, then this vector space model could be said to have successfully completed the analogy \u201c<em>man is to king as woman is to ___.<\/em>\u201d<\/p>\n<p>The above example is not a particularly good one except for illustrative purposes, because of how na\u00efve this particular algorithm for building a vector space model is. Not only does it fail to control for word frequency, it only attends to ten context words that we cherry-picked rather than the thousands of different words that appear in our corpus. This model is not likely to do well on any battery of analogy questions. However, I include the example because I hope it\u2019s intuitive to readers who aren\u2019t intimately familiar with these models, and also to underscore that it\u2019s possible to do analogical vector arithmetic in\u00a0fairly traditional vector spaces, even though this sort of thing has become much more closely associated with popular recent tools such as <em>word2vec<\/em>, which construct a rather different sort of vector that I\u2019ll describe shortly<em>. <\/em>Indeed, Levy, Goldberg, &amp; Dagan (2015) have demonstrated that traditional count-based models can hold their own on analogy tasks against word2vec, with the requisite tweaks.<\/p>\n<p>A historical aside: Vector space models have a long history in computational linguistics, information retrieval, and cognitive science. 
One interesting predecessor in the world of cognitive psychology was the famed psychologist Charles Osgood\u2019s <em>semantic differential<\/em>, an experimental technique in which study participants had to rate words on dimensions such as <em>active <\/em>versus <em>passive<\/em>, <em>angular <\/em>versus <em>rounded<\/em>, and <em>small <\/em>versus <em>large<\/em>. On a scale of <em>fresh <\/em>to <em>stale<\/em>, how fresh (stale) is the concept <em>POLITE<\/em>? (Empirical answer below.)<\/p>\n<div id=\"attachment_274\" style=\"width: 612px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-274\" class=\"wp-image-274 size-full\" src=\"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2016\/06\/osgood.png\" alt=\"Semantic differentials, averaged across two groups of twenty study participants each. From The Nature and Measurement of Meaning, p. 229. Psychological Bulletin, 49(3), May 1952.\" width=\"602\" height=\"458\" srcset=\"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2016\/06\/osgood.png 602w, http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2016\/06\/osgood-300x228.png 300w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><p id=\"caption-attachment-274\" class=\"wp-caption-text\">Semantic differentials, averaged across two groups of twenty study participants each. From <em>The Nature and Measurement of Meaning<\/em>, p. 229. Psychological Bulletin, 49(3), May 1952.<\/p><\/div>\n<p>Tasks to which vector space models have been applied include retrieving documents related to a particular search query, grouping together documents with related meanings, classifying texts by genre, essay grading, modelling how the human brain might represent some aspects of lexical semantics, and many others. 
As a result, much attention has been paid to building models that optimize performance on various evaluation criteria, such as maximizing correlations with human judgments of word similarity, scores on multiple-choice tests of synonymy, or accuracy on sentence completion tasks. Less attention has been paid to questions of interpretation that are more relevant to the humanities (how best to interpret the dimensions of various species of word vector, what kinds of information can and cannot be retrieved from the statistics of language, what kinds of analogies are most and least successfully represented in vector spaces, etc.), although there has been important research in all of these areas as well.<\/p>\n<p>Vector space models for which the initial step is to count the number of contexts (sentences, documents, etc.) in which context words co-occur with the target word are generally called <em>count-based <\/em>models; these can be viewed as upgraded versions of the na\u00efve model illustrated earlier in this post. These are generally strengthened by incorporating transformations that control for overall word frequency and\/or make the model more robust to data sparsity. It\u2019s also common for count-based models to use alternative notions of the \u2018context\u2019 of a target word, such as a \u2018window\u2019 that includes only a small number of terms that appear just to its left or right.<\/p>\n<p>In some count-based models, the dimensions correspond directly to transformed counts of particular words (e.g., Bullinaria &amp; Levy 2007), whereas others use statistical methods to transform high-dimensional vectors to lower-dimensional versions (Landauer &amp; Dumais 1997; Pennington, Socher, &amp; Manning 2014), making the interpretation of individual dimensions a bit more fraught. 
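One widely used frequency-controlling transformation is positive pointwise mutual information (PPMI), which discounts a raw co-occurrence count by how frequent the two words are overall; a truncated SVD can then produce lower-dimensional vectors, in the spirit of Landauer and Dumais. Here is a minimal numpy sketch, with an invented count matrix standing in for real corpus counts:

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information: damps the influence
    of co-occurrences that are expected from word frequency alone."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # target-word totals
    col = counts.sum(axis=0, keepdims=True)   # context-word totals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    return np.maximum(pmi, 0.0)               # negative/undefined PMI -> 0

# Toy 3x4 count matrix (targets x context words), invented for illustration
counts = np.array([[10.0, 0.0, 3.0, 1.0],
                   [ 8.0, 1.0, 2.0, 0.0],
                   [ 0.0, 9.0, 0.0, 7.0]])
M = ppmi(counts)

# Optional second step: reduce dimensionality with a truncated SVD
U, S, Vt = np.linalg.svd(M, full_matrices=False)
embeddings = U[:, :2] * S[:2]                 # 2-dimensional word vectors
```

The variable names and toy numbers are my own; the PPMI-plus-SVD pipeline itself is the standard count-based recipe the paragraph above alludes to.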
Depending on who you ask, the class of \u2018count-based models\u2019 may or may not include <em>random vector models <\/em>(Jones, Willits, &amp; Dennis, 2015) such as random indexing (Kanerva, Kristofersson, &amp; Holst 2000; Sahlgren 2005) and BEAGLE (Jones, Kintsch, &amp; Mewhort, 2006), which assign a randomly initialized and unchanging <em>index vector <\/em>to each context word, and an initially empty <em>memory vector <\/em>to each target word. As the algorithm chugs through a corpus of texts, the index vectors of words that appear in the context of a target word are added to the target word\u2019s memory vector. Ultimately, memory vectors that repeatedly have the same index vectors added to them (<em>crown, throne, royal, majesty<\/em>) end up more similar to each other, resulting once again in a space of memory vectors in which words that appear in similar contexts end up having similar vectorial representations.<\/p>\n<div id=\"attachment_348\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-348\" class=\"size-full wp-image-348\" src=\"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2016\/06\/table8_800.jpg\" alt=\"An example of how vector similarity between vectors representing a word and the word's context changes with context in a random vector model. From Jones &amp; Mewhort (2007), Psychological Review, 114(1), p. 
18.\" width=\"800\" height=\"162\" srcset=\"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2016\/06\/table8_800.jpg 800w, http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2016\/06\/table8_800-300x61.jpg 300w, http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2016\/06\/table8_800-768x156.jpg 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><p id=\"caption-attachment-348\" class=\"wp-caption-text\">An example of how vector similarity between vectors representing a word and the word&#8217;s context changes with context in a random vector model. From Jones &amp; Mewhort (2007), <em>Psychological Review<\/em>, 114(1), p. 18.<\/p><\/div>\n<p>Count-based models are frequently contrasted with prediction-based<a href=\"#_ftn1\" name=\"_ftnref1\">[1]<\/a> models, such as those generated by the popular software package <em>word2vec <\/em>(Mikolov, Chen, Corrado, &amp; Dean, 2013). I won\u2019t explain the inner workings of <em>word2vec<\/em> in detail as this has been done by <a href=\"https:\/\/www.quora.com\/How-does-word2vec-work\">many<\/a> <a href=\"http:\/\/rare-technologies.com\/making-sense-of-word2vec\/\">others<\/a>, other than to say that <em>word2vec<\/em> consists of two distinct algorithms. The first, <em>continuous bag of words (CBOW)<\/em>, attempts to predict the target word given its context, while the <em>skip-gram <\/em>algorithm attempts to predict a word\u2019s context given the word itself. Skip-grams seem to achieve equal or superior performance to CBOW for most purposes, and as such this is the algorithm employed most frequently in the literature. To optimize its ability to predict contexts that actually appear in the corpus, and to minimize its tendency to predict contexts that don\u2019t, the algorithm continuously tweaks a large number of parameters according to a partially stochastic process. 
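The raw material for this prediction task is simply the set of (target, context) pairs that a sliding window generates. A minimal sketch of that extraction step (this is not word2vec's actual implementation, which adds tricks such as negative sampling and subsampling of frequent words):

```python
def skipgram_pairs(tokens, window=5):
    """Generate the (target, context) pairs that the skip-gram objective
    is trained to predict: every word within `window` positions of each
    target counts as one of its contexts."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

# With a window of 2, 'king' is trained to predict 'the', 'sat', and 'on'
print(skipgram_pairs("the king sat on the throne".split(), window=2))
```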
Ultimately, the learned model parameters corresponding to a given target word are treated as that word\u2019s vector representation. These relatively low-dimensional <a href=\"#_ftn2\" name=\"_ftnref2\">[2]<\/a> vectors are often described as \u201cword embeddings,\u201d to distinguish them from vectors of models in which every dimension corresponds to one of an extremely large number of context words.<\/p>\n<p>Levy &amp; Goldberg rocked the world of word embeddings in 2014 when they demonstrated that despite the apparent gulf between the inner workings of prediction-based and count-based models, <em>word2vec<\/em>\u2019s skip-gram algorithm is implicitly factorizing a shifted word-context PMI matrix. In other words, the core of what it\u2019s doing, mathematically speaking, is not much different from the sort of thing that some count-based models have been doing for a while. This is particularly clear in the case of the GloVe model (Pennington, Socher, &amp; Manning, 2014), which explicitly factorizes the log of the word-context co-occurrence matrix, and generally achieves comparable results to word2vec. However, word2vec has two other properties that contribute to its continued popularity:<\/p>\n<ul>\n<li>There is a very good, fast implementation available from Google at <a href=\"https:\/\/code.google.com\/archive\/p\/word2vec\/\">https:\/\/code.google.com\/archive\/p\/word2vec\/<\/a> which runs easily on a laptop with limited memory, and an excellent <a href=\"https:\/\/radimrehurek.com\/gensim\/models\/word2vec.html\">Python implementation<\/a> available from Radim \u0158eh\u016f\u0159ek.<\/li>\n<li>Compared against several alternatives, it works quite well \u201coff the shelf\u201d with the default settings, as recently demonstrated by Pereira et al. 
(2016).<\/li>\n<\/ul>\n<p>Given this combination of ease of use and state-of-the-art performance, it\u2019s no surprise that researchers have been using word2vec for everything from sentiment classification (Xue, Fu, &amp; Shaobin, 2014), to experimental deformation of <em>Pride and Prejudice <\/em>(Cherny, 2014), to predicting relationships between smell-related words using a combination of linguistic and olfactory data (Kiela, Bulat &amp; Clark 2015). In the sections that follow, I want to follow up on Ryan Heuser\u2019s use of <em>word2vec<\/em> to investigate the contexts of \u2018virtue,\u2019 \u2018learning,\u2019 \u2018riches,\u2019 and \u2018genius\u2019 within the eighteenth-century corpus ECCO-TCP. In the words of 21<sup>st<\/sup> century Internet media company and clickbait purveyor BuzzFeed.com (2016), \u201cthe answer may surprise you.<a name=\"skipahead\"><\/a>\u201d<\/p>\n<\/div>\n<h2>A sceptical interlude<\/h2>\n<div class=\"section\">\n<p>So far, we\u2019ve strongly suggested that these analogical arithmetic problems work in vector space models for exactly the reason you\u2019d think they should. If A is to B as C is to D, then the conceptual difference between D and C should be approximately equal to the conceptual difference between B and A. Or to say the same thing in terms of vectors,<br \/>\n&nbsp;<br \/>\nV<em>(woman) \u2013 <\/em>V<em>(man) \u2248 <\/em>V<em>(queen) \u2013 <\/em>V<em>(king)<\/em><\/p>\n<p>and, rearranging the terms algebraically,<\/p>\n<p>V<em>(king) \u2013 <\/em>V<em>(man) + <\/em>V<em>(woman) \u2248 <\/em>V<em>(queen)<\/em><\/p>\n<p>which is the very proposition we tested in the illustrative example that we stepped through earlier. 
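The arithmetic itself is easy to verify with the toy count vectors tabulated in the first section. In the numpy sketch below, the 'woman' vector is not given in the tables above, so the one used here is reconstructed to be consistent with them:

```python
import numpy as np

# Toy count vectors from the tables in the first section
# (components: he, his, him, she, her, hers, crown, throne, royal, majesty)
king  = np.array([6509, 8612, 3093, 486, 1053, 1, 360, 223, 335, 491])
man   = np.array([10438, 10466, 3409, 1582, 1961, 12, 70, 31, 42, 49])
# Not tabulated in the post: a 'woman' vector reconstructed to be
# consistent with the V(king) - V(man) + V(woman) table
woman = np.array([1249, 851, 312, 1797, 2606, 30, 16, 4, 8, 6])

print(king - man)          # reproduces the V(king) - V(man) table
print(king - man + woman)  # reproduces the V(king) - V(man) + V(woman) table
```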
But what if the reason this works has nothing to do with the analogical relationship between <em>king\/queen <\/em>and <em>man\/woman<\/em>, but is rather due solely to the fact that the contexts in which <em>queen<\/em> is found share lexical material with contexts in which <em>king <\/em>and <em>woman <\/em>tend to appear? On this view, the presence of <em>man <\/em>is irrelevant, and we should expect V(<em>king<\/em>)<em> + <\/em>V(<em>woman<\/em>) to yield a vector that is even more similar to V(<em>queen<\/em>) than the needlessly complex V<em>(king) \u2013 <\/em>V<em>(man) + <\/em>V<em>(woman).<\/em><\/p>\n<p>How would we test this hypothesis? As previously implied, the canonical test for whether a vector space model can correctly complete the analogy \u201cA is to B as C is to __\u201d is to consider the vectors of all the different target words in our model, and to sort them by their degree of similarity to V(C) \u2013 V(A) + V(B). If the word corresponding to the correct answer is on top (or near the top, if our criterion for success is a little looser), then we declare success and move on.<\/p>\n<p>But, sceptics that we are, we don\u2019t just want to compare V(<em>queen<\/em>) to V<em>(king) \u2013 <\/em>V<em>(man) + <\/em>V<em>(woman); <\/em>we also want to compare it directly to V(<em>king) + <\/em>V<em>(woman). <\/em>If it\u2019s more similar to the latter than to the former, then that\u2019s evidence that our vector space model hasn\u2019t picked up on an analogical relationship, but rather a combinatorial one: that is, that the discourse contexts of <em>queen<\/em> share lexical material with those of <em>king<\/em> and <em>woman<\/em>, and the lexical context of <em>man <\/em>is largely irrelevant.<\/p>\n<p>Let\u2019s test whether this is true in ECCO-TCP. 
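The two rankings we want to compare can be computed in a few lines of numpy. The 3-dimensional vectors below are invented purely to show the mechanics; the real test ranks the full ECCO-TCP vocabulary:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def rank_by_similarity(probe, vectors, exclude=()):
    """Rank every vocabulary vector by cosine similarity to the probe,
    as in the canonical analogy test."""
    sims = [(cosine(probe, vec), word)
            for word, vec in vectors.items() if word not in exclude]
    return sorted(sims, reverse=True)

# Hypothetical vectors, invented only to show the mechanics
vectors = {
    "king":     np.array([0.9, 0.8, 0.1]),
    "man":      np.array([0.1, 0.9, 0.1]),
    "woman":    np.array([0.1, 0.1, 0.9]),
    "queen":    np.array([0.9, 0.1, 0.9]),
    "princess": np.array([0.5, 0.1, 0.9]),
}
analogy  = vectors["king"] - vectors["man"] + vectors["woman"]  # the analogical probe
combined = vectors["king"] + vectors["woman"]                   # the sceptic's probe
print(rank_by_similarity(analogy, vectors, exclude=("king", "man", "woman")))
print(rank_by_similarity(combined, vectors, exclude=("king", "man", "woman")))
```

With a trained gensim model, wv.most_similar(positive=['king', 'woman'], negative=['man']) and wv.most_similar(positive=['king', 'woman']) produce the corresponding rankings, though gensim normalizes each vector before combining them, so its numbers can differ slightly from raw-vector cosines.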
Because the word2vec algorithms involve a stochastic component, I created six models from ECCO-TCP using Google\u2019s code, using the skip-gram algorithm with the default window size (5 words on either side of the target). Just as Mikolov et al. reported in their original paper using the Google News corpus, computing V(<em>king) \u2013 <\/em>V<em>(man) + V(woman) <\/em>in ECCO-TCP yields a vector closer to the vector of the target word V(<em>queen) <\/em>than any other. Here are the ten most similar vectors to V(<em>king) \u2013 <\/em>V<em>(man) + <\/em>V<em>(woman) <\/em>in each of the six models, using the vector cosine as the similarity function:<\/p>\n<p><strong><em>Most similar vectors to V(king) &#8211;\u00a0V(man) + V(woman)<\/em><\/strong><\/p>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td><strong>Model 1<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 2<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 3<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0.8428452<\/td>\n<td>queen<\/td>\n<td>0.8482416<\/td>\n<td>queen<\/td>\n<td>0.8460983<\/td>\n<td>queen<\/td>\n<\/tr>\n<tr>\n<td>0.8044825<\/td>\n<td>king<\/td>\n<td>0.7945445<\/td>\n<td>king<\/td>\n<td>0.8085347<\/td>\n<td>king<\/td>\n<\/tr>\n<tr>\n<td>0.7614051<\/td>\n<td>princess<\/td>\n<td>0.7806808<\/td>\n<td>princess<\/td>\n<td>0.7889509<\/td>\n<td>princess<\/td>\n<\/tr>\n<tr>\n<td>0.6964183<\/td>\n<td>berengaria<\/td>\n<td>0.7120882<\/td>\n<td>adelais<\/td>\n<td>0.7216414<\/td>\n<td>ethelburga<\/td>\n<\/tr>\n<tr>\n<td>0.6949631<\/td>\n<td>adelais<\/td>\n<td>0.7057881<\/td>\n<td>archduchess<\/td>\n<td>0.7086453<\/td>\n<td>berengaria<\/td>\n<\/tr>\n<tr>\n<td>0.6942522<\/td>\n<td>infanta<\/td>\n<td>0.7038351<\/td>\n<td>ethelburga<\/td>\n<td>0.7041441<\/td>\n<td>elizabeth<\/td>\n<\/tr>\n<tr>\n<td>0.6883563<\/td>\n<td>elizabeth<\/td>\n<td>0.7000595<\/td>\n<td>atheling<\/td>\n<td>0.7023171<\/td>\n<td>anjou<\/td>\n<\/tr>\n<tr>\n<td>0.6814717<\
/td>\n<td>ethelburga<\/td>\n<td>0.6953432<\/td>\n<td>infant<\/td>\n<td>0.7022427<\/td>\n<td>infanta<\/td>\n<\/tr>\n<tr>\n<td>0.6765546<\/td>\n<td>maude<\/td>\n<td>0.6914362<\/td>\n<td>berengaria<\/td>\n<td>0.702105<\/td>\n<td>atheling<\/td>\n<\/tr>\n<tr>\n<td>0.6713672<\/td>\n<td>mary<\/td>\n<td>0.6906615<\/td>\n<td>maude<\/td>\n<td>0.7013055<\/td>\n<td>adelais<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td><strong>Model 4<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 5<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 6<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0.8556742<\/td>\n<td>queen<\/td>\n<td>0.8349165<\/td>\n<td>queen<\/td>\n<td>0.8482416<\/td>\n<td>queen<\/td>\n<\/tr>\n<tr>\n<td>0.8013376<\/td>\n<td>princess<\/td>\n<td>0.7957921<\/td>\n<td>king<\/td>\n<td>0.7945445<\/td>\n<td>king<\/td>\n<\/tr>\n<tr>\n<td>0.8002654<\/td>\n<td>king<\/td>\n<td>0.7841321<\/td>\n<td>princess<\/td>\n<td>0.7806808<\/td>\n<td>princess<\/td>\n<\/tr>\n<tr>\n<td>0.7235335<\/td>\n<td>infanta<\/td>\n<td>0.7190303<\/td>\n<td>ethelburga<\/td>\n<td>0.7120882<\/td>\n<td>adelais<\/td>\n<\/tr>\n<tr>\n<td>0.6992922<\/td>\n<td>archduke<\/td>\n<td>0.7120405<\/td>\n<td>infant<\/td>\n<td>0.7057881<\/td>\n<td>archduchess<\/td>\n<\/tr>\n<tr>\n<td>0.6983816<\/td>\n<td>boleyn<\/td>\n<td>0.7108373<\/td>\n<td>berengaria<\/td>\n<td>0.7038351<\/td>\n<td>ethelburga<\/td>\n<\/tr>\n<tr>\n<td>0.6973003<\/td>\n<td>anjou<\/td>\n<td>0.7077445<\/td>\n<td>atheling<\/td>\n<td>0.7000595<\/td>\n<td>atheling<\/td>\n<\/tr>\n<tr>\n<td>0.6860953<\/td>\n<td>berengaria<\/td>\n<td>0.7030088<\/td>\n<td>adelais<\/td>\n<td>0.6953432<\/td>\n<td>infanta<\/td>\n<\/tr>\n<tr>\n<td>0.6837387<\/td>\n<td>empress<\/td>\n<td>0.697409<\/td>\n<td>ethelred<\/td>\n<td>0.6914362<\/td>\n<td>berengaria<\/td>\n<\/tr>\n<tr>\n<td>0.6816853<\/td>\n<td>adelais<\/td>\n<td>0.6963173<\/td>\n<td>elizabeth<\/td>\n<td>0.6906615<\/td>\n<td>maude
<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Sure enough, \u2018queen\u2019 tops the list in every instance. To continue with our original plan, what do we find when we look for the most similar vectors to V(<em>king) + <\/em>V<em>(woman)<\/em>? That turns out to yield the following:<\/p>\n<p><strong><em>Most similar vectors to V(king) + V(woman)<\/em><\/strong><\/p>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td><strong>Model 1<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 2<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 3<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0.8015435<\/td>\n<td>king<\/td>\n<td>0.8013967<\/td>\n<td>king<\/td>\n<td>0.8045057<\/td>\n<td>king<\/td>\n<\/tr>\n<tr>\n<td>0.7678058<\/td>\n<td>queen<\/td>\n<td>0.7652432<\/td>\n<td>queen<\/td>\n<td>0.7655054<\/td>\n<td>queen<\/td>\n<\/tr>\n<tr>\n<td>0.741822<\/td>\n<td>woman<\/td>\n<td>0.7480951<\/td>\n<td>woman<\/td>\n<td>0.7540007<\/td>\n<td>princess<\/td>\n<\/tr>\n<tr>\n<td>0.741563<\/td>\n<td>princess<\/td>\n<td>0.7397602<\/td>\n<td>princess<\/td>\n<td>0.7470537<\/td>\n<td>woman<\/td>\n<\/tr>\n<tr>\n<td>0.7386138<\/td>\n<td>emperess<\/td>\n<td>0.7268931<\/td>\n<td>monarch<\/td>\n<td>0.7259848<\/td>\n<td>prince<\/td>\n<\/tr>\n<tr>\n<td>0.7283809<\/td>\n<td>prince<\/td>\n<td>0.7170094<\/td>\n<td>prince<\/td>\n<td>0.7146243<\/td>\n<td>man<\/td>\n<\/tr>\n<tr>\n<td>0.7053233<\/td>\n<td>man<\/td>\n<td>0.7058172<\/td>\n<td>man<\/td>\n<td>0.7020829<\/td>\n<td>emperess<\/td>\n<\/tr>\n<tr>\n<td>0.7038987<\/td>\n<td>monarch<\/td>\n<td>0.7048363<\/td>\n<td>nobleman<\/td>\n<td>0.7010611<\/td>\n<td>monarch<\/td>\n<\/tr>\n<tr>\n<td>0.7034665<\/td>\n<td>nobleman<\/td>\n<td>0.6988204<\/td>\n<td>potiphar<\/td>\n<td>0.7009223<\/td>\n<td>nobleman<\/td>\n<\/tr>\n<tr>\n<td>0.6957511<\/td>\n<td>rizio<\/td>\n<td>0.6987603<\/td>\n<td>emperess<\/td>\n<td>0.6943904<\/td>\n<td>hephoestion<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<tbody>\n<tr 
class=\"headrow\">\n<td><strong>Model 4<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 5<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 6<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0.7972889<\/td>\n<td>woman<\/td>\n<td>0.7975317<\/td>\n<td>king<\/td>\n<td>0.8041995<\/td>\n<td>king<\/td>\n<\/tr>\n<tr>\n<td>0.7931416<\/td>\n<td>king<\/td>\n<td>0.7555078<\/td>\n<td>queen<\/td>\n<td>0.7609213<\/td>\n<td>queen<\/td>\n<\/tr>\n<tr>\n<td>0.7569074<\/td>\n<td>queen<\/td>\n<td>0.7445927<\/td>\n<td>princess<\/td>\n<td>0.7521903<\/td>\n<td>woman<\/td>\n<\/tr>\n<tr>\n<td>0.748388<\/td>\n<td>princess<\/td>\n<td>0.7422349<\/td>\n<td>woman<\/td>\n<td>0.7455949<\/td>\n<td>princess<\/td>\n<\/tr>\n<tr>\n<td>0.736528<\/td>\n<td>man<\/td>\n<td>0.7217866<\/td>\n<td>prince<\/td>\n<td>0.7344416<\/td>\n<td>prince<\/td>\n<\/tr>\n<tr>\n<td>0.7338801<\/td>\n<td>husband<\/td>\n<td>0.7178928<\/td>\n<td>emperess<\/td>\n<td>0.7165893<\/td>\n<td>nobleman<\/td>\n<\/tr>\n<tr>\n<td>0.718644<\/td>\n<td>prince<\/td>\n<td>0.7142186<\/td>\n<td>nobleman<\/td>\n<td>0.7152176<\/td>\n<td>man<\/td>\n<\/tr>\n<tr>\n<td>0.7119973<\/td>\n<td>monarch<\/td>\n<td>0.7104293<\/td>\n<td>man<\/td>\n<td>0.7125439<\/td>\n<td>husband<\/td>\n<\/tr>\n<tr>\n<td>0.7093402<\/td>\n<td>lover<\/td>\n<td>0.7009807<\/td>\n<td>hephoestion<\/td>\n<td>0.7089317<\/td>\n<td>monarch<\/td>\n<\/tr>\n<tr>\n<td>0.7019279<\/td>\n<td>dorastus<\/td>\n<td>0.6977934<\/td>\n<td>lucumon<\/td>\n<td>0.7017831<\/td>\n<td>lucumon<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This is an anticlimactic but reassuring result: V(<em>queen)<\/em>\u2019s similarity to V(<em>king) \u2013 <\/em>V<em>(man) + <\/em>V<em>(woman) <\/em>is reliably slightly greater than its similarity to V<em>(king) + <\/em>V<em>(woman)<\/em>, in absolute numerical terms as well as relative to other words. 
Removing what is contextually shared by <em>king <\/em>and <em>man <\/em>and adding the result to <em>woman<\/em> yields a better match to <em>queen <\/em>than merely combining what is common to <em>king <\/em>and <em>woman<\/em>. In other words, the lexical statistics of <em>king, queen, man, <\/em>and <em>woman <\/em>contain analogical, not just combinatorial, information.<\/p>\n<\/div>\n<h2>What is genius?<\/h2>\n<div class=\"section\">\n<p>The investigation that follows was prompted by my curiosity surrounding Ryan Heuser\u2019s exploration of the analogy \u201criches is to virtue as learning is to genius\u201d in Edward Young\u2019s <em>Conjectures on Original Composition<\/em>. Young explains his reasoning as follows: \u201cIf I might speak farther of learning, and genius, I would compare genius to virtue, and learning to riches. As riches are most wanted where there is least virtue; so learning where there is least genius. As virtue without much riches can give happiness, so genius without much learning can give renown.\u201d<\/p>\n<p>Should we then expect to find that V(<em>virtue) \u2013 <\/em>V<em>(riches) + <\/em>V<em>(learning)<\/em> <em>\u2248<\/em> V(<em>genius<\/em>), just as V(<em>king) \u2013 <\/em>V<em>(man) + <\/em>V<em>(woman) \u2248 <\/em>V<em>(queen)<\/em>? Intuitively, this seems like a lot to expect from a vector space model. I\u2019m not sure that most individuals, either today or in the 18C, would\u00a0immediately\u00a0complete the analogy \u201criches is to virtue as learning is to ____\u201d with <em>genius<\/em>. It\u2019s also not clear exactly how or why we should expect this parallel to manifest itself in the statistics of language, whereas with <em>king, man, woman,<\/em> and <em>queen<\/em> it\u2019s a bit more obvious (as I argue <a href=\"#lexicalmaths\">above<\/a>). 
But the math seems to work, more or less: Ryan finds that out of the 129,098 word vectors in his model, V(<em>genius<\/em>) is the sixth most similar to V(<em>virtue<\/em>)<em> \u2013 <\/em>V(<em>riches<\/em>)<em> + <\/em>V(<em>learning<\/em>)! In my own six word2vec models, V(<em>genius<\/em>) has a median rank of 14<sup>th<\/sup> most similar (likely owing to differences in our corpus preprocessing, algorithm implementation, measures of vector similarity, and\/or random variation)\u2014a little lower than in Ryan\u2019s model, but still holding its own against tens of thousands of alternatives.<\/p>\n<p>What could account for this? As Ryan points out, there is another important symmetry in the relationship between riches\/virtue and learning\/genius: Both virtue and genius express &#8220;ethically immanent and comparatively individualist forms of value,&#8221; whereas learning and riches express &#8220;more socially-embedded and class-based forms of value.&#8221; If that is reflected in the linguistic contexts in which each of these words occur in ECCO-TCP, we might expect that the latter would \u2018cancel out\u2019 in the <em>\u2013 <\/em>V<em>(riches) + <\/em>V<em>(learning)<\/em> part of the formula V(<em>virtue) \u2013 <\/em>V<em>(riches) + <\/em>V<em>(learning)<\/em>, leaving <em>virtue <\/em>to be combined with aspects of <em>learning<\/em> that are free of whatever overtones that <em>learning <\/em>and <em>riches <\/em>might share.<\/p>\n<p>But is that the only possibility? There are other potential explanations. For example, could it be\u00a0that the discourse contexts in which <em>genius <\/em>appears are just really similar to the contexts in which <em>learning <\/em>appears<em>, <\/em>and that V(<em>virtue)<\/em> and V(<em>riches) <\/em>are just introducing noise? 
Or that the similarity of V(<em>genius) <\/em>to V(<em>learning) + <\/em>V(<em>virtue<\/em>)<em>\u00a0 <\/em>(or to V(<em>virtue) \u2013 <\/em>V<em>(riches)<\/em>, or to V(<em>learning) \u2013 <\/em>V<em>(riches)<\/em>) is the underlying cause of the finding that V(<em>virtue) \u2013 <\/em>V<em>(riches) + <\/em>V<em>(learning) \u2248 <\/em>V<em>(genius)<\/em>?<\/p>\n<p>Given the build-up in the previous section, you can probably see where this is headed. In my own six word2vec models, V(<em>learning) + <\/em>V<em>(virtue) <\/em>is even more similar to V(<em>genius)<\/em> than is V(<em>virtue) \u2013 <\/em>V<em>(riches) + <\/em>V<em>(learning)<\/em>, having\u00a0a median rank of 6 and median cosine similarity of .767:<\/p>\n<p><strong><em>Most similar vectors to V(learning) + V(virtue)<\/em><\/strong><\/p>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td><strong>Model 1<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 2<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 
3<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0.8989632<\/td>\n<td>learning<\/td>\n<td>0.8990331<\/td>\n<td>learning<\/td>\n<td>0.9003327<\/td>\n<td>learning<\/td>\n<\/tr>\n<tr>\n<td>0.862316<\/td>\n<td>virtue<\/td>\n<td>0.8639017<\/td>\n<td>virtue<\/td>\n<td>0.865199<\/td>\n<td>virtue<\/td>\n<\/tr>\n<tr>\n<td>0.7992792<\/td>\n<td>piety<\/td>\n<td>0.8079271<\/td>\n<td>piety<\/td>\n<td>0.804012<\/td>\n<td>piety<\/td>\n<\/tr>\n<tr>\n<td>0.7816411<\/td>\n<td>science<\/td>\n<td>0.7708147<\/td>\n<td>probity<\/td>\n<td>0.7795218<\/td>\n<td>science<\/td>\n<\/tr>\n<tr>\n<td>0.7774129<\/td>\n<td>genius<\/td>\n<td>0.769802<\/td>\n<td>science<\/td>\n<td>0.7786523<\/td>\n<td>probity<\/td>\n<\/tr>\n<tr>\n<td>0.7764739<\/td>\n<td>wisdom<\/td>\n<td>0.7687995<\/td>\n<td>genius<\/td>\n<td>0.7729051<\/td>\n<td>wisdom<\/td>\n<\/tr>\n<tr>\n<td>0.763268<\/td>\n<td>probity<\/td>\n<td>0.7681421<\/td>\n<td>wisdom<\/td>\n<td>0.7707377<\/td>\n<td>genius<\/td>\n<\/tr>\n<tr>\n<td>0.7384178<\/td>\n<td>knowledge<\/td>\n<td>0.7445111<\/td>\n<td>integrity<\/td>\n<td>0.7378831<\/td>\n<td>knowledge<\/td>\n<\/tr>\n<tr>\n<td>0.7366986<\/td>\n<td>philosophy<\/td>\n<td>0.7346854<\/td>\n<td>morals<\/td>\n<td>0.7342591<\/td>\n<td>morality<\/td>\n<\/tr>\n<tr>\n<td>0.7346987<\/td>\n<td>morality<\/td>\n<td>0.7289329<\/td>\n<td>erudition<\/td>\n<td>0.7320021<\/td>\n<td>integrity<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td><strong>Model 4<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 5<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 
6<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0.901885<\/td>\n<td>learning<\/td>\n<td>0.9010782<\/td>\n<td>learning<\/td>\n<td>0.897611<\/td>\n<td>learning<\/td>\n<\/tr>\n<tr>\n<td>0.8747023<\/td>\n<td>virtue<\/td>\n<td>0.8701212<\/td>\n<td>virtue<\/td>\n<td>0.8649616<\/td>\n<td>virtue<\/td>\n<\/tr>\n<tr>\n<td>0.7941153<\/td>\n<td>piety<\/td>\n<td>0.7893927<\/td>\n<td>piety<\/td>\n<td>0.7979715<\/td>\n<td>piety<\/td>\n<\/tr>\n<tr>\n<td>0.7817841<\/td>\n<td>probity<\/td>\n<td>0.7749184<\/td>\n<td>science<\/td>\n<td>0.7828707<\/td>\n<td>probity<\/td>\n<\/tr>\n<tr>\n<td>0.7657776<\/td>\n<td>wisdom<\/td>\n<td>0.7683302<\/td>\n<td>probity<\/td>\n<td>0.7820131<\/td>\n<td>science<\/td>\n<\/tr>\n<tr>\n<td>0.7628907<\/td>\n<td>genius<\/td>\n<td>0.766835<\/td>\n<td>wisdom<\/td>\n<td>0.7652096<\/td>\n<td>genius<\/td>\n<\/tr>\n<tr>\n<td>0.7628165<\/td>\n<td>science<\/td>\n<td>0.7531987<\/td>\n<td>genius<\/td>\n<td>0.7627894<\/td>\n<td>wisdom<\/td>\n<\/tr>\n<tr>\n<td>0.7384895<\/td>\n<td>erudition<\/td>\n<td>0.7428998<\/td>\n<td>morality<\/td>\n<td>0.7471387<\/td>\n<td>knowledge<\/td>\n<\/tr>\n<tr>\n<td>0.7359856<\/td>\n<td>knowledge<\/td>\n<td>0.741063<\/td>\n<td>integrity<\/td>\n<td>0.7411712<\/td>\n<td>morality<\/td>\n<\/tr>\n<tr>\n<td>0.7331604<\/td>\n<td>unblemished<\/td>\n<td>0.74005<\/td>\n<td>knowledge<\/td>\n<td>0.7368826<\/td>\n<td>erudition<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>V<em>(genius) <\/em>was not at all similar to V(<em>virtue) \u2013 <\/em>V<em>(riches) <\/em>nor to V(<em>learning) \u2013 <\/em>V<em>(riches). 
<\/em>It was moderately similar to V(<em>learning) <\/em>alone, though less similar than to V(<em>virtue) + <\/em>V<em>(learning).<\/em><\/p>\n<p>To recap: Across six <em>word2vec<\/em> models of ECCO-TCP that varied only in the stochastic element inherent in the skip-gram algorithm itself, V(<em>genius)<\/em> was more similar to V(<em>learning) + <\/em>V<em>(virtue) <\/em>(median rank = 6, cos = .767) than it was to V(<em>virtue) \u2013 <\/em>V<em>(riches) + <\/em>V<em>(learning)<\/em> (median rank = 14, cos = .591) or to V(<em>learning) <\/em>alone (median rank = 10, cos = .726). These differences appear to be statistically significant<a href=\"#_ftn3\" name=\"_ftnref3\">[3]<\/a>. In Ryan\u2019s <em>word2vec <\/em>model, V(<em>genius)<\/em> has a cosine similarity of .779 to V(<em>learning) + <\/em>V<em>(virtue)<\/em>, .651 to V(<em>virtue) \u2013 <\/em>V<em>(riches) + <\/em>V<em>(learning)<\/em>, and .731 to V(<em>learning)<\/em>\u2014the same ordinal pattern of effects.<\/p>\n<p>None of this should be taken as a claim that the concept <em>genius<\/em> was merely a virtue-infused notion of <em>learning<\/em>, nor as any sort of claim that the complex constellation of ideas that surrounded <em>genius <\/em>can or should be distilled to a simple story. Entire books have been written on the incredible array of uses to which the term has been put, such as Ann Jefferson\u2019s <em>Genius in France <\/em>and Darrin McMahon\u2019s <em>Divine Fury: A History of Genius<\/em>. As Jefferson notes, it would be misleading to assume a strong cultural consensus around the term at all in the 18C, as different authors described it as something to be admired and something to be suspicious of, as something to be emulated and as something incapable of emulation, as something vigorous and vital and as something aberrant or even sickly. 
Aside from the varying connotations and functions of the term, its very meaning was exceedingly labile, as Carson (2016) relates in this description of its mentions in reference materials of the time:<\/p>\n<blockquote><p>\u201cWith regard to the meaning of \u2018genius,\u2019 there was a range of possibilities, as Samuel Johnson makes clear in his <em>Dictionary of the English Language <\/em>(1755). \u2018Genius\u2019 could signify a spirit (\u2018the protecting or ruling power of men, places, or things,\u2019 as Johnson put it); \u2018a man endowed with superior faculties\u2019; \u2018mental power or faculties\u2019 themselves; a natural disposition \u2018for some peculiar employment\u2019; or nature or disposition broadly understood, such as \u2018the <em>genius <\/em>of the times\u2019 or, very commonly, the <em>genius <\/em>of a people. The <em>Encyclopaedia Britannica <\/em>of 1771 also defined \u2018genius\u2019 first as \u2018good or evil spirit,\u2019 and then as \u2018a natural talent or disposition to do one thing more than another,\u2019 emphasizing with the latter that \u2018art and industry add much to natural endowments, but cannot supply them where they are wanting.\u2019\u201d<br \/>\n<cite><br \/>\nJohn Carson, <em>Genealogies of Genius<\/em>, p. 50<\/cite><\/p><\/blockquote>\n<p>No vector space model will give us anything close to a complete understanding of this multifaceted term. 
Yet if we were forced to come up with a handful of terms such that the combination of their 18C discourse contexts was similar to the lexical contexts in which \u2018genius\u2019 appeared, we could do worse than to look at the words with the most similar vectors to <em>genius <\/em>in ECCO-TCP:<\/p>\n<p><strong><em>Most similar vectors to V(genius)<\/em><\/strong><\/p>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td><strong>Model 1<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 2<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 3<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0.7910758<\/td>\n<td>talents<\/td>\n<td>0.7824867<\/td>\n<td>talents<\/td>\n<td>0.7910977<\/td>\n<td>talents<\/td>\n<\/tr>\n<tr>\n<td>0.7422448<\/td>\n<td>erudition<\/td>\n<td>0.7431535<\/td>\n<td>erudition<\/td>\n<td>0.749464<\/td>\n<td>erudition<\/td>\n<\/tr>\n<tr>\n<td>0.7387395<\/td>\n<td>transcendant<\/td>\n<td>0.7377685<\/td>\n<td>abilities<\/td>\n<td>0.7368499<\/td>\n<td>abilities<\/td>\n<\/tr>\n<tr>\n<td>0.7357711<\/td>\n<td>abilities<\/td>\n<td>0.7279122<\/td>\n<td>talent<\/td>\n<td>0.7325332<\/td>\n<td>learning<\/td>\n<\/tr>\n<tr>\n<td>0.7305901<\/td>\n<td>learning<\/td>\n<td>0.7224676<\/td>\n<td>learning<\/td>\n<td>0.7313749<\/td>\n<td>talent<\/td>\n<\/tr>\n<tr>\n<td>0.713866<\/td>\n<td>talent<\/td>\n<td>0.7190522<\/td>\n<td>transcendant<\/td>\n<td>0.7141883<\/td>\n<td>transcendant<\/td>\n<\/tr>\n<tr>\n<td>0.7044489<\/td>\n<td>poetry<\/td>\n<td>0.7076396<\/td>\n<td>improver<\/td>\n<td>0.7132785<\/td>\n<td>versatility<\/td>\n<\/tr>\n<tr>\n<td>0.7032343<\/td>\n<td>beneficently<\/td>\n<td>0.701682<\/td>\n<td>poesy<\/td>\n<td>0.7111905<\/td>\n<td>poesy<\/td>\n<\/tr>\n<tr>\n<td>0.7012243<\/td>\n<td>acquirements<\/td>\n<td>0.7013722<\/td>\n<td>beneficently<\/td>\n<td>0.6998008<\/td>\n<td>wit<\/td>\n<\/tr>\n<tr>\n<td>0.7005317<\/td>\n<td>wit<\/td>\n<td>0.7012314<\/td>\n<td>poetry<\/td>\n<td>0.6997623<\/td>\n<td>excellence<
\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<tbody>\n<tr class=\"headrow\">\n<td><strong>Model 4<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 5<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<td><strong>Model 6<\/strong><\/td>\n<td><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0.7970124<\/td>\n<td>talents<\/td>\n<td>0.7917068<\/td>\n<td>talents<\/td>\n<td>0.7915666<\/td>\n<td>talents<\/td>\n<\/tr>\n<tr>\n<td>0.7376211<\/td>\n<td>talent<\/td>\n<td>0.7358315<\/td>\n<td>abilities<\/td>\n<td>0.7566795<\/td>\n<td>erudition<\/td>\n<\/tr>\n<tr>\n<td>0.7310964<\/td>\n<td>transcendant<\/td>\n<td>0.7354304<\/td>\n<td>erudition<\/td>\n<td>0.7376744<\/td>\n<td>abilities<\/td>\n<\/tr>\n<tr>\n<td>0.7310451<\/td>\n<td>erudition<\/td>\n<td>0.7310395<\/td>\n<td>talent<\/td>\n<td>0.7295218<\/td>\n<td>learning<\/td>\n<\/tr>\n<tr>\n<td>0.7290431<\/td>\n<td>abilities<\/td>\n<td>0.7198468<\/td>\n<td>learning<\/td>\n<td>0.7285496<\/td>\n<td>talent<\/td>\n<\/tr>\n<tr>\n<td>0.7226858<\/td>\n<td>learning<\/td>\n<td>0.7059976<\/td>\n<td>transcendant<\/td>\n<td>0.7181852<\/td>\n<td>transcendant<\/td>\n<\/tr>\n<tr>\n<td>0.7158517<\/td>\n<td>poesy<\/td>\n<td>0.7057749<\/td>\n<td>wit<\/td>\n<td>0.7139235<\/td>\n<td>poesy<\/td>\n<\/tr>\n<tr>\n<td>0.7023761<\/td>\n<td>achievement<\/td>\n<td>0.7034961<\/td>\n<td>poesy<\/td>\n<td>0.7105052<\/td>\n<td>improver<\/td>\n<\/tr>\n<tr>\n<td>0.7013177<\/td>\n<td>excellence<\/td>\n<td>0.6977917<\/td>\n<td>versatility<\/td>\n<td>0.7092224<\/td>\n<td>poetry<\/td>\n<\/tr>\n<tr>\n<td>0.700995<\/td>\n<td>wit<\/td>\n<td>0.6933404<\/td>\n<td>poetic<\/td>\n<td>0.7051146<\/td>\n<td>wit<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>It turns out that constructing a vector <em>x<\/em> such that V(<em>genius<\/em>) is the most similar vector to <em>x <\/em>in every model requires only three terms: <em>x <\/em>= V(<em>talents) + <\/em>V<em>(abilities) + <\/em>V<em>(erudition)<\/em>. 
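<\/p>\n<p>That check is easy to sketch with numpy (a toy illustration with invented three-dimensional vectors, not the trained ECCO-TCP vectors themselves):<\/p>\n

```python
import numpy as np

def nearest_word(vectors, terms):
    # Sum the vectors for `terms`, then return the word (excluding the
    # terms themselves) whose vector is most cosine-similar to that sum.
    combined = np.sum([vectors[t] for t in terms], axis=0)
    sims = {w: np.dot(combined, vec) /
               (np.linalg.norm(combined) * np.linalg.norm(vec))
            for w, vec in vectors.items() if w not in terms}
    return max(sims, key=sims.get)

# Invented vectors in which 'genius' sits near talents + abilities + erudition
toy = {'talents': np.array([1.0, 0.0, 0.0]),
       'abilities': np.array([0.0, 1.0, 0.0]),
       'erudition': np.array([0.0, 0.0, 1.0]),
       'genius': np.array([1.0, 1.0, 1.0]),
       'riches': np.array([-1.0, 0.2, 0.1])}
```

\n<p>With these toy vectors, nearest_word(toy, ['talents', 'abilities', 'erudition']) returns 'genius'; on a trained model, gensim\u2019s most_similar(positive=[...]) performs the equivalent query over the whole vocabulary. 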
Doing the same for V(<em>learning) <\/em>is even easier, as we need only a single term: V(<em>learning)<\/em> is reliably the most similar vector to V(<em>erudition) <\/em>alone<em>.<\/em><\/p>\n<p>It is tempting to read into this a cultural assumption that <em>genius<\/em> differs from <em>learning <\/em>in that it involves, but is not limited to, \u2018mere\u2019 erudition\u2014that <em>genius<\/em> also \u2018contains\u2019 concepts such as <em>talents<\/em>, <em>abilities<\/em>, and perhaps others (originality, creativity, innateness, etc.). \u00a0There may well be something to this idea. But we should interpret with caution. The fact that <em>talents<\/em> and <em>genius<\/em> appear in similar discourse contexts suggests they are being deployed in similar ways, but may also suggest that they are co-occurring in contexts in which they are being contrasted. For example, far from talent being a component of the concept of genius, Jefferson notes that the two were frequently placed in opposition. \u201cThe emergence of genius as a key concept in the eighteenth century was frequently supported by an opposition between genius and talent: talent is the competent application of rules, whereas genius is the attribute necessary for creating original works of art\u201d (Jefferson, 2015). Indeed, a proximity search turns up some evidence that such contrasts do occur in ECCO-TCP, e.g. 
\u201cThe apes, however, are more remarkable for talents than genius,\u201d (<a href=\"http:\/\/quod.lib.umich.edu\/e\/ecco\/004880992.0001.003\/1:7.1.1?firstpubl1=1700;firstpubl2=1800;rgn=div3;sort=occur;subview=detail;type=simple;view=fulltext;q1=The+apes%2C+however%2C+are+more+remarkable+for+talents+than+genius\">Buffon, <em>Natural history<\/em>, 1780<\/a>); &#8220;&#8230;a happier talent for that species of writing, which tho&#8217; it does not demand the highest genius, yet is as difficult to attain&#8221; (<a href=\"http:\/\/spenserians.cath.vt.edu\/BiographyRecord.php?action=GET&amp;bioid=33124\">Anonymous, in Cibber-Shiels, <em>Lives of the Poets of Great Britain and Ireland<\/em>, 1753<\/a>). That said, implications that talent and genius regularly accompany each other are arguably more common (e.g., &#8220;&#8230;your being a Man whose Talent and Genius lay particularly in Figures&#8230;&#8221;, <a href=\"http:\/\/quod.lib.umich.edu\/e\/ecco\/004780302.0001.000\/1:2?rgn=div1;view=fulltext\">Defoe, 1731<\/a>; &#8220;to cultivate and bring to perfection whatever talent or genius he may possess&#8230;&#8221;, <a href=\"http:\/\/quod.lib.umich.edu\/e\/ecco\/004861571.0001.001\/1:5.2?firstpubl1=1700;firstpubl2=1800;rgn=div2;sort=occur;subview=detail;type=simple;view=fulltext;q1=talent+or+genius+he+may+possess\">Smith, 1776<\/a>; &#8220;the happy Talent of a superior Genius&#8221;, <a href=\"http:\/\/quod.lib.umich.edu\/e\/ecco\/004865704.0001.001\/1:84?firstpubl1=1700;firstpubl2=1800;rgn=div1;sort=occur;subview=detail;type=simple;view=fulltext;q1=the+happy+Talent+of+a+superior+Genius\">&#8216;Mr. T. L.&#8217;, 1778<\/a>). More close reading of the contexts in which \u2018talent\u2019 and \u2018genius\u2019 appear in our corpus is necessary. But the computational exercise has helped us focus our search. 
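<\/p>\n<p>A proximity search of this kind is simple to reproduce over any plain-text corpus. The sketch below is a generic co-occurrence finder (my own helper, not the ECCO search interface), returning a snippet wherever the two words fall within a given window of tokens:<\/p>\n

```python
import re

def proximity_hits(text, word1, word2, window=10):
    # Tokenise crudely on word characters and report a short snippet for
    # each place where word1 and word2 occur within `window` tokens.
    tokens = re.findall(r'\w+', text.lower())
    where = {w: [i for i, t in enumerate(tokens) if t == w]
             for w in (word1, word2)}
    snippets = []
    for i in where[word1]:
        for j in where[word2]:
            if abs(i - j) <= window:
                lo, hi = min(i, j), max(i, j)
                snippets.append(' '.join(tokens[max(0, lo - 3):hi + 4]))
    return snippets
```

\n<p>Applied to the Buffon sentence quoted above, for instance, it returns the single stretch of text linking \u2018talents\u2019 and \u2018genius\u2019. 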
Searching ECCO for conjunctions of the terms \u201cgenius\u201d and the relatively unstudied term \u201cerudition,\u201d for example, may reveal novel and fundamental insights about the multiplicity of ways in which <em>genius <\/em>was understood.<\/p>\n<p>\u201cIn making a comparison, we are placed between the extremes of analogy so close as almost to amount to absolute identity, which can leave no room for a doubtful conclusion; and an analogy so remote as to leave little similitude between the objects but what must exist between any two whatever, as there can be no terrestrial objects that are dissimilar to each in all points,\u201d wrote the English writer William Danby in <a href=\"https:\/\/archive.org\/details\/underhighestlea00adamgoog\">an 1821 discourse largely inspired by Young\u2019s <em>Night-Thoughts<\/em><\/a>. \u201cBetween these two extremes, there are numberless degrees of similitude, each of which affects the observer more or less according to his turn of mind.\u201d \u00a0It is these kinds of analogies that I find the most thought-provoking: not the banal ones frequently used to prove vector space models\u2019 analogical capabilities, nor those that involve the overinterpretation of concepts arbitrarily juxtaposed<a href=\"#_ftn4\" name=\"_ftnref4\">[4]<\/a>, but those like Edward Young\u2019s, \u00a0which may be nonobvious on a first reading but strike the reader as insightful upon explication. Whether vector space models are capable of extracting such analogies from the statistics of language remains to be seen. As for myself, I find it exciting that such \u201cnumberless degrees of similitude\u201d can be quantified at all, and I look forward to the insights that will come as digital humanists continue to explore vector space models of historical corpora.<\/p>\n<\/div>\n<h2>Footnotes<\/h2>\n<p><a href=\"#_ftnref1\" name=\"_ftn1\">[1]<\/a> To borrow terminology from Levy, Goldberg, &amp; Dagan, 2015. 
Baroni, Dinu, &amp; Kruszewski (2014) prefer <em>count models <\/em>vs. <em>predict(ive) models.<\/em><\/p>\n<p><a href=\"#_ftnref2\" name=\"_ftn2\">[2]<\/a> Some researchers use \u201cword embeddings\u201d as a synonym for vectors generated by prediction-based models, but the more common usage now seems to be that \u201cword embeddings\u201d refer to vectors that are low-dimensional (often 100-500 dimensions) relative to the vocabulary size (generally in the tens or hundreds of thousands).<\/p>\n<p><a href=\"#_ftnref3\" name=\"_ftn3\">[3]<\/a> Treating each cosine as an observation, paired-samples t-tests with Bonferroni correction for multiple comparisons found that differences between cos(<em>genius<\/em>, <em>learning + virtue<\/em>) and each of the other similarity scores discussed here (cos(<em>genius<\/em>, <em>virtue \u2013 riches + learning<\/em>) and cos(<em>genius<\/em>, <em>learning<\/em>)) \u00a0were statistically significant (p &lt; .001). Of course, low p-values are all too easy to find in natural language processing for a number of reasons (S\u00f8gaard et al., 2014)&#8211;significant differences aren\u2019t necessarily meaningful differences. But at the least, significance testing allows us to be pretty confident that the differences we observe aren\u2019t merely due to the variation introduced by the stochastic element of <em>word2vec.<\/em><\/p>\n<p><a href=\"#_ftnref4\" name=\"_ftn4\">[4]<\/a> A vector space model will never be at a loss to complete even the most absurd of analogies, as there will always be <em>some <\/em>target word whose vector is most similar to any vector of the form V(C) \u2013 V(A) + V(B). In ECCO-TCP, as it happens, <em>chariot <\/em>is to <em>information <\/em>as <em>egg <\/em>is to <em>fecundated<\/em>.<\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<h2>References<\/h2>\n<p>Baroni, M., Dinu, G., &amp; Kruszewski, G. (2014, June). Don&#8217;t count, predict! A systematic comparison of context-counting vs. 
context-predicting semantic vectors. In ACL (1) (pp. 238-247).<\/p>\n<p>Bullinaria, J.A. &amp; Levy, J.P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39, 510-526.<\/p>\n<p>Carson, J. (2016). Equality, inequality, and difference: Genius as problem and possibility in American political\/scientific discourse. In Genealogies of Genius, eds. Joyce Chaplin &amp; Darrin McMahon. Palgrave Macmillan: London.<\/p>\n<p>Cherny, L. (2014). <a href=\"http:\/\/blogger.ghostweather.com\/2014\/11\/visualizing-word-embeddings-in-pride.html\">&#8220;Visualizing Word Embeddings in Pride and Prejudice.&#8221;<\/a> Ghostweather R&amp;D Blog. 22 Nov. 2014. Web. 10 Jun 2016.<\/p>\n<p>Danby, W. (1821). Thoughts, chiefly on Serious Subjects. Printed for the author by E. Woolmer, Gazette-Office.<\/p>\n<p>Jefferson, A. (2014). Genius in France: An Idea and Its Uses. Princeton University Press.<\/p>\n<p>Jones, M. N., Kintsch, W., &amp; Mewhort, D. J. (2006). High-dimensional semantic space accounts of priming. Journal of Memory and Language, 55(4), 534-552.<\/p>\n<p>Jones, M. N., &amp; Mewhort, D. J. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114(1), 1.<\/p>\n<p>Jones, M. N., Willits, J., &amp; Dennis, S. (2015). Models of semantic memory. In Busemeyer, Wang, Townsend, &amp; Eidels (Eds.) Oxford Handbook of Mathematical and Computational Psychology. Oxford University Press. 232-254.<\/p>\n<p>Levy, O., &amp; Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems (pp. 2177-2185).<\/p>\n<p>Levy, O., Goldberg, Y., &amp; Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, 211-225.<\/p>\n<p>Kanerva, P., Kristofersson, J., &amp; Holst, A. (2000, August). 
Random indexing of text samples for latent semantic analysis. In Proceedings of the 22nd annual conference of the cognitive science society (Vol. 1036). USA: Cognitive Science Society.<\/p>\n<p>Kiela, D., Bulat, L., &amp; Clark, S. (2015). Grounding semantics in olfactory perception. In Proceedings of ACL (Vol. 2, pp. 231-6).<\/p>\n<p>Mikolov, T., Chen, K., Corrado, G., &amp; Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.<\/p>\n<p>Pennington, J., Socher, R., &amp; Manning, C. D. (2014, October). Glove: Global Vectors for Word Representation. In EMNLP (Vol. 14, pp. 1532-1543).<\/p>\n<p>Pereira, F., Gershman, S., Ritter, S., &amp; Botvinick, M. (2016). A comparative evaluation of off-the-shelf distributed semantic representations for modelling behavioural data.<\/p>\n<p>Sahlgren, M. (2005, August). An introduction to random indexing. In Methods and applications of semantic indexing workshop at the 7th international conference on terminology and knowledge engineering, TKE (Vol. 5).<\/p>\n<p>Schmidt, Ben. <a href=\"http:\/\/bookworm.benschmidt.org\/posts\/2015-10-25-Word-Embeddings.html\">&#8220;Word Embeddings for the Digital Humanities.&#8221;<\/a> Ben&#8217;s Bookworm Blog. 25 Oct. 2015. Web. 14 May 2016.<\/p>\n<p>S\u00f8gaard, A., Johannsen, A., Plank, B., Hovy, D., &amp; Alonso, H. M. (2014, June). What&#8217;s in a p-value in NLP? In Proc. CoNLL (pp. 1-10).<\/p>\n<p>Xue, B., Fu, C., &amp; Shaobin, Z. (2014, June). A Study on Sentiment Computing and Classification of Sina Weibo with Word2vec. In Big Data (BigData Congress), 2014 IEEE International Congress on (pp. 358-363). IEEE.<\/p>\n<p><a href=\"#_ftnref1\" name=\"_ftn1\"><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As someone who has been involved in the world of word vectors since 2007, it\u2019s been fascinating to watch the innovative applications that they\u2019ve been put to over the last few years. 
After reading Ryan Heuser\u2019s \u2018Word Vectors in the Eighteenth Century, Part 1,\u2019 I was left impressed not only by the thoughtful scholarship but [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,5],"tags":[8,9],"class_list":["post-273","post","type-post","status-publish","format-standard","hentry","category-computational-linguistics","category-digital-humanities","tag-vectors","tag-word-embeddings"],"_links":{"self":[{"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/posts\/273","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/comments?post=273"}],"version-history":[{"count":10,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/posts\/273\/revisions"}],"predecessor-version":[{"id":591,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/posts\/273\/revisions\/591"}],"wp:attachment":[{"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/media?parent=273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/categories?post=273"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/tags?post=273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}