Written by Prashant Basnet
👋 Welcome to my Signature, a space between logic and curiosity.
I’m a Software Development Engineer who loves turning ideas into systems that work beautifully.
This space captures the process: the bugs, breakthroughs, and “aha” moments that keep me building.
Semantics & Sentiment Analysis:
Word2vec:
Word2vec trains each word against the words that neighbour it in the input corpus. It does so in one of two ways: continuous bag-of-words (CBOW), which predicts a word from its surrounding context, or skip-gram, which predicts the surrounding context from a word.
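To make "training against neighbours" concrete, here is a small sketch of how (target, context) pairs are generated from a corpus. The corpus and window size are invented for illustration, not from a real trained model:

```python
def context_pairs(tokens, window=2):
    """Pair every word with its neighbours within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        # Look at up to `window` words on each side of the target word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

corpus = "the king rules the land".split()
pairs = context_pairs(corpus, window=1)
# Each word ends up paired with its immediate neighbours,
# e.g. ("king", "the") and ("king", "rules").
```

In skip-gram these pairs become (input, label) training examples; CBOW instead groups all context words for one target into a single example.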
If you try to implement word2vec yourself, training takes a very long time on a large corpus, so pretrained word2vec embeddings are usually used instead.
If you do want to train your own word-to-vector model, you can theoretically choose anywhere between 100 and 1000 dimensions; 300 is a common choice.
Since each word is mapped to a vector in this 300-dimensional space, we can use cosine similarity to measure how similar word vectors are to each other.
Cosine similarity measures the angle between two vectors: the smaller the angle, the more alike the vectors.
A simple diagram shows this in 2-dimensional space, but the idea extends to N dimensions.
In our case, we'll take several 300-dimensional vectors and calculate the cosine similarity between them to see which vectors are most similar to each other; here each vector represents a word.
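A minimal NumPy sketch of that comparison, using toy 3-dimensional vectors as stand-ins for real 300-dimensional embeddings (the values are invented for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 = same direction)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors: "cat" and "dog" deliberately point in similar directions.
cat = np.array([0.9, 0.8, 0.1])
dog = np.array([0.8, 0.9, 0.2])
car = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(cat, dog))  # close to 1.0
print(cosine_similarity(cat, car))  # much smaller
```

With real embeddings the only change is the source of the vectors; the similarity computation is identical.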
This also means we can perform vector arithmetic with the word vectors: we can calculate a brand-new vector by adding or subtracting existing ones.
So I can take:

Vector(King) - Vector(Man) + Vector(Woman)

and then attempt to find the most similar existing vector to this new vector. The closest existing vector could be Queen. Essentially:

Vector(King) - Vector(Man) + Vector(Woman) = Vector(Queen)
So this is able to establish really interesting relationships between the word vectors, including the male/female relationship, or even the dimension of verb tense: walking is to walk as swimming is to swim.
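The king − man + woman analogy can be sketched end-to-end with hand-made toy vectors. These 2-dimensional values are invented so that a "royalty" axis and a "gender" axis are easy to see; a trained model learns such directions implicitly in 300 dimensions:

```python
import numpy as np

# Invented toy vectors: axis 0 ≈ royalty, axis 1 ≈ maleness.
vocab = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
    "apple": np.array([0.2, 0.5]),
}

def most_similar(target, vocab, exclude=()):
    """Return the vocab word whose vector is closest to `target` by cosine."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude),
               key=lambda w: cos(vocab[w], target))

# king - man + woman lands near queen in this toy space.
new_vec = vocab["king"] - vocab["man"] + vocab["woman"]
print(most_similar(new_vec, vocab, exclude={"king", "man", "woman"}))  # queen
```

Excluding the input words mirrors what real word2vec tooling does, since the nearest neighbour of the arithmetic result is often one of the inputs themselves.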