Today I Learned This Part I: What are word2vec Embeddings?
Recently Quora put out a question-similarity competition on Kaggle. This was the first time I had attempted an NLP problem, so there was a lot to learn. The one thing that blew my mind was word2vec embeddings.
Until now, whenever I heard the term word2vec, I pictured it as a way to create a bag-of-words vector for a sentence.
For those who don’t know bag of words: if we have a series of sentences (documents)
This is good - [1,1,1,0,0]
This is bad - [1,1,0,1,0]
This is awesome - [1,1,0,0,1]
Bag of words would encode them using the mapping 0:This, 1:is, 2:good, 3:bad, 4:awesome.
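To see this concretely, here is a minimal sketch using scikit-learn's CountVectorizer (note that it orders its vocabulary alphabetically, so the columns come out in a different order than the toy mapping above):

```python
# A minimal bag-of-words sketch with scikit-learn's CountVectorizer.
# CountVectorizer orders the vocabulary alphabetically, so the column
# order differs from the 0:This 1:is ... mapping above.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["This is good", "This is bad", "This is awesome"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # ['awesome' 'bad' 'good' 'is' 'this']
print(X.toarray())
# [[0 0 1 1 1]
#  [0 1 0 1 1]
#  [1 0 0 1 1]]
```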
But word2vec is much more powerful than that.
What word2vec does is create a vector for each word: a 300-dimensional vector for every word (and common bigrams too) in the vocabulary.
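To make that concrete, here is a minimal sketch of loading pretrained vectors with gensim and inspecting one word's 300-dimensional vector. The Google News binary and its path are assumptions; point it at whatever embeddings you have downloaded.

```python
# Sketch: loading pretrained word2vec vectors with gensim.
# Assumes the Google News binary has been downloaded locally;
# the file path below is illustrative.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

vec = vectors["king"]  # a 300-dimensional numpy array
print(vec.shape)       # (300,)
print(vec[:5])         # first few components of the vector
```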
How does that help?
We can use these embeddings in multiple scenarios, but the most common are:
A. Using word2vec embeddings we can find the similarity between words. Assume you have to decide whether these two statements mean the same thing:
President g…
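To get a feel for how such a similarity lookup might work in practice, here is a minimal sketch using gensim and pretrained Google News vectors. The file path and the word pairs are illustrative, not taken from the original example.

```python
# Sketch: measuring word similarity with pretrained word2vec vectors.
# The path to the Google News binary is illustrative.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Cosine similarity between two word vectors (higher means more similar).
print(vectors.similarity("president", "king"))
print(vectors.similarity("president", "banana"))

# The nearest neighbours of a word in the embedding space.
print(vectors.most_similar("president", topn=5))
```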