Contextualized Word Embeddings: ELMo
ELMo: Embeddings from Language Models
- Idea: contextualized word representations
- Learn word vectors using long contexts instead of a context window
- Learn a deep Bi-NLM and use all its layers in prediction (see the layer-mixing sketch below)
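As a concrete illustration of "use all its layers in prediction", below is a minimal PyTorch sketch (not from the slides) of the task-specific weighted combination ELMo applies to the biLM layer outputs; the class name `ScalarMix` and the tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-specific weighted sum over the layers of a pre-trained biLM,
    in the spirit of ELMo: ELMo_t = gamma * sum_j softmax(s)_j * h_{t,j}."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.scalar_weights = nn.Parameter(torch.zeros(num_layers))  # s_j, learned per task
        self.gamma = nn.Parameter(torch.ones(1))                     # task-specific scale

    def forward(self, layer_states):
        # layer_states: list of (batch, seq_len, dim) tensors, one per biLM layer
        weights = torch.softmax(self.scalar_weights, dim=0)
        mixed = sum(w * h for w, h in zip(weights, layer_states))
        return self.gamma * mixed
```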
ELMo - Bidirectional LM
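A rough PyTorch sketch of what such a bidirectional LM looks like, under simplifying assumptions: the real ELMo biLM uses a character CNN over words and projected multi-layer LSTMs, whereas here plain token embeddings and two unidirectional LSTMs stand in; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class BiLM(nn.Module):
    """Sketch of a bidirectional language model: a forward LSTM predicts the
    next token, a backward LSTM predicts the previous token, and the token
    embedding and output softmax layers are shared between the two directions."""
    def __init__(self, vocab_size: int, dim: int = 512, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)            # ELMo actually uses a char-CNN here
        self.fwd_lstm = nn.LSTM(dim, dim, num_layers, batch_first=True)
        self.bwd_lstm = nn.LSTM(dim, dim, num_layers, batch_first=True)
        self.proj = nn.Linear(dim, vocab_size)                # shared softmax layer

    def forward(self, token_ids):                             # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        fwd_out, _ = self.fwd_lstm(x)                         # left-to-right states
        bwd_out, _ = self.bwd_lstm(torch.flip(x, dims=[1]))   # right-to-left states
        bwd_out = torch.flip(bwd_out, dims=[1])
        # next-token logits from the forward LM, previous-token logits from the backward LM
        return self.proj(fwd_out), self.proj(bwd_out)
```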
ELMo Illustration
ELMo on Named Entity Recognition
ELMo Results
- Machine Comprehension
- Textual Entailment
- Given two sentences, determine the relationship between them, e.g. cause-and-effect (entailment) or parallel meaning
- Semantic Role Labeling
- Learn the relations between words within a sentence, e.g. subject, verb, preposition
- Not the same task as POS tagging
- Coreference Resolution
- For coreference, resolve which earlier word a pronoun or demonstrative refers to
- Named Entity Recognition
- Sentiment Analysis
- Decide whether a sentence is positive or negative
ELMo Analysis
Contextualized Word Embeddings: BERT
- Idea: contextualized word representations
- Learn word vectors from long contexts with a Transformer instead of an LSTM (see the feature-extraction sketch below)
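As a hedged example of pulling contextualized word vectors out of a pre-trained Transformer encoder (not part of the original slides), the snippet below uses the HuggingFace `transformers` library with the `bert-base-uncased` checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pre-trained BERT encoder and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per word-piece token, shape (1, seq_len, 768).
contextual_embeddings = outputs.last_hidden_state
print(contextual_embeddings.shape)
```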
BERT#1 - Masked Language Model
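A small Python sketch of the masked-LM corruption BERT trains on: roughly 15% of tokens are selected as prediction targets, and of those 80% are replaced by [MASK], 10% by a random token, and 10% left unchanged. The toy vocabulary and function name are illustrative.

```python
import random

MASK = "[MASK]"
VOCAB = ["dog", "cat", "ran", "sat", "the", "a"]  # toy vocabulary for random replacement

def mask_tokens(tokens, mask_prob=0.15):
    """BERT-style masked-LM corruption: select ~15% of positions as targets;
    of those, 80% become [MASK], 10% a random token, 10% stay unchanged."""
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            targets[i] = tok                      # the model must predict the original token
            r = random.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = random.choice(VOCAB)
            # else: keep the original token
    return corrupted, targets

print(mask_tokens("the dog sat on the mat".split()))
```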
BERT#2 - Next Sentence Prediction
- Idea: modeling relationship between sentences
- QA, NLI etc. are based on understanding inter-sentence relationship
- Input embeddings contain (summed as in the sketch below)
- Word-level token embeddings
- Sentence-level segment embeddings
- Position embeddings
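A minimal PyTorch sketch of how these three embeddings are summed for a pair packed as [CLS] sentence A [SEP] sentence B [SEP]; BERT additionally applies layer normalization and dropout, omitted here, and the class name and sizes (BERT-Base) are assumptions.

```python
import torch
import torch.nn as nn

class BertInputEmbeddings(nn.Module):
    """Sum of token, segment (sentence A/B), and learned position embeddings."""
    def __init__(self, vocab_size=30522, hidden=768, max_len=512):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)
        self.segment = nn.Embedding(2, hidden)       # 0 = sentence A, 1 = sentence B
        self.position = nn.Embedding(max_len, hidden)

    def forward(self, token_ids, segment_ids):       # both: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.token(token_ids)
                + self.segment(segment_ids)
                + self.position(positions))          # broadcasts over the batch dimension
```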
BERT Training
- Training data: Wikipedia + BookCorpus
- 2 BERT models (configurations sketched below)
- BERT-Base: 12-layer, 768-hidden, 12-head
- BERT-Large: 24-layer, 1024-hidden, 16-head
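The two configurations can be reproduced with HuggingFace `BertConfig`, as in the sketch below; the intermediate (feed-forward) sizes of 3072/4096 are the standard 4x-hidden values, an assumption not stated on the slide.

```python
from transformers import BertConfig, BertModel

# The two published configurations, reproduced from the slide.
base  = BertConfig(hidden_size=768,  num_hidden_layers=12, num_attention_heads=12,
                   intermediate_size=3072)
large = BertConfig(hidden_size=1024, num_hidden_layers=24, num_attention_heads=16,
                   intermediate_size=4096)

for name, cfg in [("BERT-Base", base), ("BERT-Large", large)]:
    model = BertModel(cfg)
    # Prints on the order of 110M and 340M parameters respectively.
    print(name, f"{model.num_parameters() / 1e6:.0f}M parameters")
```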
BERT Fine-Tuning for Understanding Tasks
- Idea: simply learn a classifier/tagger built on the top layer for each target task (see the fine-tuning sketch below)
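A hedged sketch of this fine-tuning recipe with the HuggingFace `transformers` API: a freshly initialized classification head sits on top of the [CLS] representation and the whole network is updated end-to-end. The checkpoint name, label count, and learning rate are illustrative choices.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pre-trained encoder plus a new classification head on the [CLS] token.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One fine-tuning step: encoder and head are trained together end-to-end.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```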
BERT Overview
BERT Fine-Tuning Results
BERT Results on NER
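For the tagger case such as NER, a token-classification head predicts one label per word piece. The sketch below is an assumed setup using `AutoModelForTokenClassification`, not the exact configuration behind the reported results; the tag set is a toy example and the head is untrained, so the printed tags are meaningless until fine-tuned.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tag_set = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]            # toy NER tag set
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")    # cased text helps NER
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased",
                                                        num_labels=len(tag_set))

inputs = tokenizer("John lives in Berlin", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                             # (1, seq_len, num_labels)

pred_ids = logits.argmax(dim=-1)[0]
print([tag_set[int(i)] for i in pred_ids])                      # untrained head: random tags
```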
BERT Results with Different Model Sizes
- Improving performance by increasing model size
BERT Contextual Embeddings Results on NER
ERNIE: Enhanced Representation through kNowledge IntEgration