aiacademy: Natural Language Processing (NLP) 2. Word Embeddings
Tags: aiacademy, nlp, word-embeddings, Word2Vec
Word Embeddings - Word2Vec
Word2Vec - Skip-Gram Model
Word2Vec Skip-Gram Illustration
Hidden Layer Matrix → Word Embedding Matrix
Weight Matrix Relation
Word2Vec Skip-Gram Illustration
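The slides above note that the hidden-layer weight matrix is itself the word-embedding matrix: multiplying a one-hot input vector by W just selects one row of W. A minimal numpy sketch (vocabulary size and embedding dimension are illustrative):

```python
import numpy as np

V, N = 6, 4                      # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W = rng.standard_normal((V, N))  # hidden-layer weights = embedding matrix

word_id = 2
one_hot = np.zeros(V)
one_hot[word_id] = 1.0

hidden = one_hot @ W             # forward pass through the "hidden layer"

# the product is exactly row `word_id` of W, i.e. that word's embedding
assert np.allclose(hidden, W[word_id])
```

This is why no separate lookup table is needed: training the hidden layer *is* training the embeddings.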
Word Embeddings - Word2Vec Training
Word2Vec Skip-Gram Illustration
Loss Function
SGD Update for W’
SGD Update
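The loss function and SGD updates referenced above can be sketched in numpy for a single (center, context) training pair, using the full-softmax cross-entropy loss and one gradient step on both W and W'. All sizes and the learning rate are illustrative:

```python
import numpy as np

V, N, lr = 8, 5, 0.1                       # vocab size, dim, learning rate
rng = np.random.default_rng(1)
W = rng.standard_normal((V, N)) * 0.1      # input (embedding) matrix
W_out = rng.standard_normal((N, V)) * 0.1  # output matrix W'

center, context = 3, 5                     # one skip-gram training pair

def loss_and_grads(W, W_out):
    h = W[center]                          # hidden layer = center embedding
    scores = h @ W_out                     # V logits
    scores = scores - scores.max()         # numerical stability
    p = np.exp(scores) / np.exp(scores).sum()
    loss = -np.log(p[context])             # cross-entropy for true context
    e = p.copy()
    e[context] -= 1.0                      # dL/dscores = p - one_hot(context)
    grad_W_out = np.outer(h, e)            # dL/dW'
    grad_h = W_out @ e                     # dL/dh, flows into W[center]
    return loss, grad_W_out, grad_h

loss0, gWo, gh = loss_and_grads(W, W_out)
W_out -= lr * gWo                          # SGD update for W'
W[center] -= lr * gh                       # SGD update for W
loss1, _, _ = loss_and_grads(W, W_out)
assert loss1 < loss0                       # one step reduces the loss
```

The full softmax touches all V output columns per pair, which motivates the cheaper alternatives in the next slides.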
Negative Sampling
Hierarchical Softmax
Negative Sampling
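Negative sampling replaces the full softmax with a few binary logistic losses: one positive (center, context) pair plus k randomly sampled negative words, so each step touches only k+1 output vectors instead of all V. A sketch under illustrative sizes, k, and learning rate (real Word2Vec also samples negatives from a unigram^(3/4) distribution, omitted here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, N, k, lr = 10, 5, 3, 0.1
rng = np.random.default_rng(2)
W = rng.standard_normal((V, N)) * 0.1      # input embeddings
W_out = rng.standard_normal((V, N)) * 0.1  # output embeddings (rows of W')

center, context = 0, 1
negatives = rng.choice(np.arange(2, V), size=k, replace=False)

def loss_fn():
    h = W[center]
    l = -np.log(sigmoid(W_out[context] @ h))       # positive pair, label 1
    for j in negatives:
        l += -np.log(sigmoid(-(W_out[j] @ h)))     # negative pairs, label 0
    return l

def sgd_step():
    h = W[center].copy()
    grad_h = np.zeros_like(h)
    s = sigmoid(W_out[context] @ h)
    grad_h += (s - 1.0) * W_out[context]
    W_out[context] -= lr * (s - 1.0) * h
    for j in negatives:
        s = sigmoid(W_out[j] @ h)
        grad_h += s * W_out[j]
        W_out[j] -= lr * s * h
    W[center] -= lr * grad_h

loss0 = loss_fn()
sgd_step()
loss1 = loss_fn()
assert loss1 < loss0
```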
Word2Vec Variants
Word2Vec Skip-Gram Visualization
Word2Vec Variants
Word2Vec CBOW
Word2Vec LM
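CBOW (the sg=0 variant used in the gensim code below) is the mirror image of skip-gram: the hidden layer is the average of the context-word embeddings, which is then used to predict the center word. A minimal numpy sketch of the forward pass (sizes illustrative):

```python
import numpy as np

V, N = 8, 4
rng = np.random.default_rng(3)
W = rng.standard_normal((V, N))            # input embedding matrix
W_out = rng.standard_normal((N, V))        # output matrix

context_ids = [1, 2, 4, 5]                 # surrounding words in the window
h = W[context_ids].mean(axis=0)            # CBOW hidden layer: mean embedding

scores = h @ W_out
scores = scores - scores.max()
p = np.exp(scores) / np.exp(scores).sum()  # softmax over the vocabulary
predicted_center = int(np.argmax(p))       # most likely center word
```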
Word Embeddings - GloVe
Comparison
GloVe
GloVe - Weighted Least Squares Regression Model
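The weighted least-squares objective named above is, in the notation of the GloVe paper (with $X_{ij}$ the co-occurrence count of words $i$ and $j$):

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^{2},
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```

The weighting function $f$ caps the influence of very frequent co-occurrences; the paper uses $x_{\max} = 100$ and $\alpha = 3/4$.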
Text Mining 09: word2vec
gensim
- Toolkit for keyword vectorization: gensim
gensim code
import pickle
from gensim.models import word2vec
import random
import logging
import os
## turn back to main directory
os.chdir("../")
os.getcwd()
logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
logging.root.setLevel(level=logging.INFO)
## load 'article_cutted'
with open('article_cutted', 'rb') as file:
    data = pickle.load(file)
# build word2vec model
# sg=0: CBOW ; sg=1: skip-gram
# (note: the `size` parameter was renamed `vector_size` in gensim >= 4.0)
model = word2vec.Word2Vec(size=256, min_count=5, window=5, sg=0)
# build vocabulary
model.build_vocab(data)
# train word2vec model ; shuffle data every epoch
for i in range(20):
    random.shuffle(data)
    model.train(data, total_examples=len(data), epochs=1)
## example: look up the embedding for '人工智慧' ("artificial intelligence")
model.wv['人工智慧']
## save model
model.save('word2vec_model/CBOW')