
Word Embeddings - Word2Vec

Word2Vec - Skip-Gram Model

[figure]

Word2Vec Skip-Gram Illustration

[figure]
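As a reference for the illustration above: in the standard skip-gram formulation (Mikolov et al., 2013), the model takes a center word $w_I$ and predicts each surrounding context word $w_O$ with a softmax over the whole vocabulary of size $V$:

$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^\top v_{w_I}\right)}{\sum_{w=1}^{V} \exp\left({v'_w}^\top v_{w_I}\right)}$$

where $v_w$ (rows of the hidden-layer matrix $W$) are the input embeddings and $v'_w$ (columns of the output matrix $W'$) are the output embeddings.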

Hidden Layer Matrix → Word Embedding Matrix

[figure]
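The point here: because the input is a one-hot vector, multiplying it by the hidden-layer weight matrix simply selects one row, so the trained matrix $W$ doubles as the word-embedding lookup table. A minimal NumPy sketch of this equivalence (the dimensions are arbitrary):

import numpy as np

V, N = 10000, 256          # vocabulary size, embedding dimension (arbitrary)
W = np.random.randn(V, N)  # hidden-layer weight matrix = embedding matrix

one_hot = np.zeros(V)
one_hot[42] = 1.0          # one-hot vector for the word with index 42

# multiplying by a one-hot vector is just a row lookup
assert np.allclose(one_hot @ W, W[42])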

Weight Matrix Relation

[figures]

Word2Vec Skip-Gram Illustration

[figure]

Word Embeddings - Word2Vec Training

Word2Vec Skip-Gram Illustration

[figure]

Loss Function

[figure]
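The loss here is presumably the usual skip-gram objective: for a corpus of $T$ words and window size $c$, minimize the negative log-likelihood of the context words,

$$J = -\frac{1}{T}\sum_{t=1}^{T}\ \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)$$

where each $p(w_{t+j} \mid w_t)$ is the softmax given earlier.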

SGD Update for W′

[figures]
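Assuming the standard derivation (e.g., Rong, "word2vec Parameter Learning Explained"), the update for each output vector $v'_j$ (column $j$ of $W'$) is driven by the prediction error $e_j = \hat{y}_j - t_j$, the predicted probability minus the 0/1 target:

$$v'^{\,(\mathrm{new})}_j = v'^{\,(\mathrm{old})}_j - \eta\, e_j\, h$$

where $h$ is the hidden-layer vector (the center word's embedding) and $\eta$ is the learning rate.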

SGD Update

[figure]
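In the same notation, the error is back-propagated through $W'$ to the input side, so only the center word's embedding row of $W$ changes:

$$v^{(\mathrm{new})}_{w_I} = v^{(\mathrm{old})}_{w_I} - \eta \sum_{j=1}^{V} e_j\, v'_j$$

Both updates involve all $V$ output vectors per training step, which is what motivates the approximations below.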

Negative Sampling

Hierarchical Softmax

[figure]
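Hierarchical softmax replaces the flat $V$-way softmax with a binary Huffman tree over the vocabulary: a word's probability is the product of sigmoid decisions along its root-to-leaf path, so each step costs $O(\log V)$ instead of $O(V)$:

$$p(w \mid w_I) = \prod_{j=1}^{L(w)-1} \sigma\big(\,[\![ n(w,j{+}1) = \mathrm{ch}(n(w,j)) ]\!]\cdot {v'_{n(w,j)}}^\top v_{w_I}\big)$$

where $n(w,j)$ is the $j$-th node on the path to $w$, $\mathrm{ch}(n)$ is a fixed child of $n$, and $[\![x]\!]$ is $1$ if $x$ is true and $-1$ otherwise.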

Negative Sampling

[figures]
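Negative sampling avoids the softmax altogether: each observed (center, context) pair is contrasted with $k$ words drawn from a noise distribution, reducing the update to $k+1$ binary logistic regressions. The objective for one pair (Mikolov et al., 2013):

$$\log \sigma\big({v'_{w_O}}^\top v_{w_I}\big) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\big[\log \sigma\big(-{v'_{w_i}}^\top v_{w_I}\big)\big]$$

with noise distribution $P_n(w) \propto U(w)^{3/4}$, the unigram distribution raised to the $3/4$ power.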

Word2Vec Variants

Word2Vec Skip-Gram Visualization

Word2Vec Variants

[figure]

Word2Vec CBOW

[figure]
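CBOW reverses skip-gram: the hidden vector is the average of the context words' embeddings, and the model predicts the center word from it,

$$h = \frac{1}{C}\sum_{c=1}^{C} v_{w_c}, \qquad p(w_t \mid \mathrm{context}) = \frac{\exp\big({v'_{w_t}}^\top h\big)}{\sum_{w=1}^{V} \exp\big({v'_w}^\top h\big)}$$

This is the variant trained by the gensim code below (sg=0).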

Word2Vec LM

[figure]

Word Embeddings - GloVe

Comparison

[figure]

GloVe

[figures]

GloVe - Weighted Least Squares Regression Model

[figure]
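The GloVe objective is a weighted least-squares regression on the log co-occurrence counts $X_{ij}$ (Pennington et al., 2014):

$$J = \sum_{i,j=1}^{V} f(X_{ij})\,\big(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\big)^2$$

$$f(x) = \begin{cases}(x/x_{\max})^{\alpha} & x < x_{\max}\\ 1 & \text{otherwise}\end{cases}$$

The weighting function $f$ (typically $\alpha = 3/4$, $x_{\max} = 100$) downweights rare pairs and caps the influence of very frequent ones.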

Text Mining 09: Word2Vec

[figures]

gensim

  • Website for keyword vectorization: gensim

[figures]

gensim code

import pickle
import random
import logging
import os

from gensim.models import word2vec

# change back to the main directory
os.chdir("../")
os.getcwd()

# log training progress
logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
logging.root.setLevel(level=logging.INFO)

# load the pre-tokenized corpus 'article_cutted' (a list of token lists)
with open('article_cutted', 'rb') as file:
    data = pickle.load(file)

# build the word2vec model
# sg=0: CBOW ; sg=1: skip-gram
# (note: in gensim >= 4.0 the `size` parameter is named `vector_size`)
model = word2vec.Word2Vec(size=256, min_count=5, window=5, sg=0)

# build the vocabulary from the corpus
model.build_vocab(data)

# train the model, shuffling the data every epoch
for i in range(20):
    random.shuffle(data)
    model.train(data, total_examples=len(data), epochs=1)

# print an example embedding vector
model.wv['人工智慧']

# save the trained model
model.save('word2vec_model/CBOW')
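A quick usage sketch for the saved model; the query words are examples and must appear in the corpus at least min_count times:

# load the saved model and query it
model = word2vec.Word2Vec.load('word2vec_model/CBOW')

# top-5 nearest neighbors by cosine similarity
print(model.wv.most_similar('人工智慧', topn=5))

# cosine similarity between two words ('機器學習' is a hypothetical example)
print(model.wv.similarity('人工智慧', '機器學習'))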