ELMo models have been pre-converted to the `.magnitude` format for immediate download and usage:
| Contributor | Data | Light (basic support for out-of-vocabulary keys) | Medium (recommended) (advanced support for out-of-vocabulary keys) | Heavy (advanced support for out-of-vocabulary keys and faster `most_similar_approx`) |
|---|---|---|---|---|
| AI2 - AllenNLP ELMo | 1 Billion Word Benchmark | 768D, 1536D, 3072D | 768D, 1536D, 3072D | 768D, 1536D, 3072D |
| AI2 - AllenNLP ELMo with Google News word2vec vocabulary | 1 Billion Word Benchmark | 768D, 1536D, 3072D | 768D, 1536D, 3072D | 768D, 1536D, 3072D |
| AI2 - AllenNLP ELMo | Wikipedia (1.9B) + WMT 2008-2012 (3.6B) | 3072D | 3072D | 3072D |
| AI2 - AllenNLP ELMo with Google News word2vec vocabulary | Wikipedia (1.9B) + WMT 2008-2012 (3.6B) | 3072D | 3072D | 3072D |
ELMo usage is slightly different from other embedding models (word2vec, GloVe, and fastText), which merits some explanation of how Magnitude handles these differences.
ELMo vectors are "contextual", meaning they take into account the sentence in which a word is being used. For example, in word2vec the word "play" has only a single embedding that combines both interpretations of the word (a command to start music or a theatrical act). In ELMo, this is not the case: the embedding for the word "play" would, in theory, be different when trained and used with example sentences like "Play some music on the living room speakers." and "Get tickets for the play tonight.".
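As a rough sketch of what this looks like in practice (using the `elmo_2x1024_128_2048cnn_1xhighway_weights` model referenced below; the token indices and the similarity check are illustrative, not part of the Magnitude API), you can query "play" in two different sentences and compare the resulting contextual vectors:

```python
# A minimal sketch (not part of the Magnitude API): compare the contextual
# vector for "play" in two different sentences.
import numpy as np
from pymagnitude import Magnitude

elmo = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude')

music = elmo.query(["play", "some", "music", "on", "the", "living", "room", "speakers", "."])
theater = elmo.query(["get", "tickets", "for", "the", "play", "tonight", "."])

play_music = music[0]      # contextual vector for "play" (token 0) in the first sentence
play_theater = theater[4]  # contextual vector for "play" (token 4) in the second sentence

# Cosine similarity between the two "play" vectors; with a contextual model
# these are generally not identical.
cosine = np.dot(play_music, play_theater) / (
    np.linalg.norm(play_music) * np.linalg.norm(play_theater))
print(cosine)
```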
An ELMo vector for a target word is actually composed of 3 components (a 2D array of `3 x (embedding dimensions)` instead of just a 1D array of `(embedding dimensions)`):
- A forward pass bi-directional RNN contextual embedding taking into account words before the target word in the sentence.
- A backward pass bi-directional RNN contextual embedding taking into account words after the target word in the sentence.
- A context-independent embedding of the target word.
For ease of use, these three component embeddings for a target word are concatenated into a single 1D embedding when you use an ELMo `.magnitude` model. So, for example, the `elmo_2x1024_128_2048cnn_1xhighway_weights` ELMo `.magnitude` model actually contains 1D embeddings of size `768` (`3 x 256` concatenated).
You can use Magnitude's concatenated 1D representation of ELMo's 2D representation, just like you would any other embedding (word2vec, fastText, GloVe). However, if you need the 2D representation for your application, you can easily unroll them after querying them in Magnitude like so:
```python
from pymagnitude import Magnitude

elmo_vecs = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude')

sentence = elmo_vecs.query(["play", "some", "music", "on", "the", "living", "room", "speakers", "."])
# Returns: an array of size (9 (number of words) x 768 (3 ELMo components concatenated))

unrolled = elmo_vecs.unroll(sentence)
# Returns: an array of size (3 (each ELMo component) x 9 x 256 (the number of dimensions for each ELMo component))
```
Magnitude makes querying with context simple. Magnitude's `query` method already takes in both 1D and 2D lists of words. If you query a 1D list of words, Magnitude will treat it as a sentence and use ELMo to contextualize each word embedding with the words before and after it. If you query a 2D list of words, Magnitude will treat it as a batch of sentences.
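For example, a minimal sketch (the exact shape and padding of the batched return value is not specified here; check it against your model):

```python
from pymagnitude import Magnitude

elmo = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude')

# 1D list: treated as a single sentence; each word is contextualized by the
# words around it. Returns (number of words x 768) for this model.
single = elmo.query(["play", "some", "music", "."])

# 2D list: treated as a batch of sentences; each sentence is contextualized
# independently of the others.
batch = elmo.query([
    ["play", "some", "music", "on", "the", "living", "room", "speakers", "."],
    ["get", "tickets", "for", "the", "play", "tonight", "."],
])
```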
ELMo models typically don't ship with a vocabulary mapping words to vectors, since the vectors require context and must be generated on the fly. This unfortunately means some of Magnitude's functions, like `most_similar` or `doesnt_match`, have no vocabulary to work with and return results from.
We solve this problem by also including flavors of each ELMo model with a vocabulary from Google's word2vec model attached (3,000,000 tokens) so that methods like `most_similar` can be used.
If you don't need to use these methods, we recommend not downloading the models with a vocabulary as they add a significant amount to the file size.
The vectors for these 3,000,000 tokens are generated by running ELMo on a sentence containing only the target word (a single-word sentence).
If you want to use a different vocabulary, see the documentation for the converter.
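If you do download one of the vocabulary-attached flavors, these methods work as they would for any other Magnitude model. A minimal sketch (the filename is a hypothetical placeholder for whichever vocabulary flavor you downloaded from the table above):

```python
from pymagnitude import Magnitude

# Hypothetical filename; substitute the vocabulary-attached ELMo .magnitude
# file you actually downloaded.
elmo_vocab = Magnitude('elmo_with_google_news_vocab.magnitude')

# These methods need a vocabulary to search over, which the attached
# word2vec vocabulary (3,000,000 tokens) provides.
print(elmo_vocab.most_similar("play", topn=10))
print(elmo_vocab.doesnt_match(["breakfast", "cereal", "dinner", "lunch"]))
```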
ELMo models are character-based and thus handle out-of-vocabulary words through learned representations of subword information. Use `ngram_oov=True` on the Magnitude constructor to switch to Magnitude's out-of-vocabulary method instead.
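For example, a minimal sketch using the model filename from the earlier example:

```python
from pymagnitude import Magnitude

# Default: rely on ELMo's own character-based handling of unseen words.
elmo_default = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude')

# Opt in to Magnitude's character n-gram out-of-vocabulary method instead.
elmo_magnitude_oov = Magnitude('elmo_2x1024_128_2048cnn_1xhighway_weights.magnitude',
                               ngram_oov=True)
```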
Magnitude has a remote streaming feature. ELMo models are supported; however, there isn't much benefit to streaming them, as disk space will still need to be consumed for ELMo models in most cases.
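If you do want to stream an ELMo model, the general pattern is sketched below (the URL is a placeholder, and `stream=True` is assumed here from Magnitude's remote streaming documentation):

```python
from pymagnitude import Magnitude

# Placeholder URL; point this at a hosted ELMo .magnitude file.
elmo_remote = Magnitude('http://example.com/path/to/elmo_model.magnitude', stream=True)

# Queries work the same way; portions of the model are fetched over HTTP as needed.
vec = elmo_remote.query(["play", "some", "music", "."])
```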