
Presentation Transcript


  1. Outline
  • Name tagging
  • LSTM-CRF model
    • Overview
    • Input features
    • Sentence encoder (feature extraction)
    • Label prediction
  Ying Lin • yinglin8@illinois.edu • Room 1115, Siebel

  2. Name Tagging
  • Goal: identify and classify name mentions in unstructured text into pre-defined categories, e.g., person, organization, location, geo-political entity (GPE).
  • Example: "He was born Edgar Poe in Boston on January 19, 1809, the second child of English-born actress Elizabeth Arnold Hopkins Poe and actor David Poe Jr." Mentions: Edgar Poe (PERSON), Boston (GPE), Elizabeth Arnold Hopkins Poe (PERSON), David Poe Jr. (PERSON).
  • Tag schemes:
    • BIO: actress/O Elizabeth/B-PER Arnold/I-PER Hopkins/I-PER Poe/I-PER and/O … in/O Boston/B-GPE on/O
    • BIOES: actress/O Elizabeth/B-PER Arnold/I-PER Hopkins/I-PER Poe/E-PER and/O … in/O Boston/S-GPE on/O
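
To make the two schemes concrete, here is a minimal sketch of converting BIO tags to BIOES in Python; the function name bio_to_bioes and the implementation are illustrative, not from the slides:

```python
def bio_to_bioes(tags):
    """Convert a BIO tag sequence to BIOES (illustrative helper)."""
    bioes = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            bioes.append(tag)
        elif tag.startswith("B-"):
            # A B- tag with no continuation becomes a single-token S- tag.
            bioes.append(tag if nxt == "I-" + tag[2:] else "S-" + tag[2:])
        elif tag.startswith("I-"):
            # An I- tag that ends its span becomes E-.
            bioes.append(tag if nxt == tag else "E-" + tag[2:])
    return bioes

# The slide's BIO examples map to its BIOES examples:
print(bio_to_bioes(["O", "B-PER", "I-PER", "I-PER", "I-PER", "O"]))
# ['O', 'B-PER', 'I-PER', 'I-PER', 'E-PER', 'O']
print(bio_to_bioes(["O", "B-GPE", "O"]))  # ['O', 'S-GPE', 'O']
```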

  3. Name Tagging Models
  • Hidden Markov Models (HMM)
  • Support Vector Machines (SVM)
  • Conditional Random Fields (CRF)
  • Decision trees
  • Bidirectional LSTM-CRF
  • LSTM: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • CRF: Lafferty, John, Andrew McCallum, and Fernando C. N. Pereira. "Conditional random fields: Probabilistic models for segmenting and labeling sequence data." (2001).

  4. LSTM-CRF Model
  [Architecture diagram: Input Sentence → Features → Bi-LSTM → Linear → CRF → labels (e.g., E-PER)] (Chiu and Nichols, 2016)
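
As a rough structural sketch of the pipeline in the diagram, assuming PyTorch; the class name, layer sizes, and hyperparameters below are placeholders, and the CRF component is reduced to its learnable transition matrix (scoring and decoding are sketched under slide 12):

```python
import torch
import torch.nn as nn

class BiLSTMCRF(nn.Module):
    """Structural sketch of the tagger in the diagram; sizes are placeholders."""
    def __init__(self, vocab_size, num_labels, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)                # input features
        self.encoder = nn.LSTM(emb_dim, hidden_dim // 2,
                               batch_first=True, bidirectional=True)  # Bi-LSTM
        self.linear = nn.Linear(hidden_dim, num_labels)               # project to label space
        # CRF component: learnable label-to-label transition scores.
        self.transitions = nn.Parameter(torch.randn(num_labels, num_labels))

    def emissions(self, token_ids):
        # token_ids: (batch, seq_len) -> label scores (batch, seq_len, num_labels)
        hidden_states, _ = self.encoder(self.embed(token_ids))
        return self.linear(hidden_states)
```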

  5. LSTM-CRF Model
  • Each token in the given sentence is represented as the combination of its word embedding and character-level features (character-level convolutional network) (Chiu and Nichols, 2016).
  • Word embeddings: Word2vec, GloVe, FastText, ELMo, BERT, …
  • Training corpora: Wikipedia, WMT, …
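
A minimal sketch of this token representation, assuming PyTorch; the module name, dimensions, and filter sizes are placeholders, not the exact values from Chiu and Nichols (2016):

```python
import torch
import torch.nn as nn

class TokenRepresentation(nn.Module):
    """Word embedding + character-level CNN features, concatenated per token."""
    def __init__(self, vocab_size, char_vocab_size,
                 word_dim=100, char_dim=25, char_filters=30, kernel=3):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, word_dim)   # e.g., GloVe-initialized
        self.char_embed = nn.Embedding(char_vocab_size, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel, padding=1)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq); char_ids: (batch, seq, max_word_len)
        b, s, w = char_ids.shape
        chars = self.char_embed(char_ids).view(b * s, w, -1).transpose(1, 2)
        # Max-pool over character positions to get one vector per token.
        char_feats = self.char_cnn(chars).max(dim=2).values.view(b, s, -1)
        return torch.cat([self.word_embed(word_ids), char_feats], dim=-1)

rep = TokenRepresentation(vocab_size=10000, char_vocab_size=100)
out = rep(torch.randint(0, 10000, (2, 6)), torch.randint(0, 100, (2, 6, 10)))
print(out.shape)  # torch.Size([2, 6, 130]) = word_dim + char_filters
```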

  6. LSTM-CRF Model
  [Same architecture diagram, here from (Ma and Hovy, 2016): each token again combines its word embedding with character-level CNN features.]

  7. LSTM-CRF Model
  [Same architecture diagram, here showing the variant of (Lample et al., 2016), which uses a character-level LSTM in place of the CNN.]

  8. LSTM-CRF Model
  [Architecture diagram from (Liu et al., 2017): an LSTM-CRF augmented with a task-aware neural language model.]

  9. LSTM-CRF Model
  • The bidirectional LSTM (Long Short-Term Memory, an RNN variant) processes the input sentence in both directions, encoding each token and its context into a vector (its hidden state). (Chiu and Nichols, 2016)
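
A minimal PyTorch illustration of the encoder (all sizes are placeholders): the forward pass summarizes each token's left context, the backward pass its right context, and the two are concatenated per token.

```python
import torch
import torch.nn as nn

features = torch.randn(1, 6, 130)          # dummy input: (batch, seq_len, feature_dim)
bilstm = nn.LSTM(input_size=130, hidden_size=100,
                 batch_first=True, bidirectional=True)
hidden_states, _ = bilstm(features)        # one context-aware vector per token
print(hidden_states.shape)                 # torch.Size([1, 6, 200]) = 2 * hidden_size
```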

  10. LSTM-CRF Model
  • The linear layer projects the hidden states to the label space, producing one score per label for each token; e.g., for one token: B-ORG 0.1, I-ORG 0.2, E-ORG 0.1, B-PER 2.5, O 0.1, … (Chiu and Nichols, 2016)
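
A minimal sketch of this projection, assuming PyTorch; the label set and dimensions are placeholders:

```python
import torch
import torch.nn as nn

labels = ["B-PER", "I-PER", "E-PER", "S-PER", "B-ORG", "I-ORG", "E-ORG", "S-ORG", "O"]
linear = nn.Linear(200, len(labels))       # 200 = Bi-LSTM hidden state size (placeholder)
hidden_states = torch.randn(1, 6, 200)     # dummy Bi-LSTM output: (batch, seq_len, 200)
scores = linear(hidden_states)             # (1, 6, 9): one score per token per label
```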

  11. LSTM-CRF Model
  • With a softmax layer, each token's label is predicted independently by taking the highest-scoring label at each position; e.g., "Edgar Poe" is tagged Edgar/B-PER Poe/S-PER, even though B-PER followed by S-PER is not a valid BIOES sequence. (Chiu and Nichols, 2016)
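
A sketch of independent per-token prediction; the scores are dummy values chosen to reproduce the slide's example:

```python
import torch

labels = ["B-PER", "I-PER", "E-PER", "S-PER", "O"]
scores = torch.tensor([[[2.5, 0.1, 0.3, 0.2, 0.1],    # "Edgar"
                        [0.2, 0.4, 0.7, 1.9, 0.1]]])  # "Poe"
probs = torch.softmax(scores, dim=-1)      # normalize scores into probabilities
predictions = probs.argmax(dim=-1)         # best label per token, chosen independently
print([labels[i] for i in predictions[0].tolist()])   # ['B-PER', 'S-PER'] (invalid pair)
```

Because each position is decided in isolation, nothing stops the model from emitting label pairs that cannot occur together, which motivates the CRF layer on the next slide.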

  12. LSTM-CRF Model
  • The CRF (Conditional Random Field) layer models the dependencies between labels, e.g.:
    • ✓ B-PER → I-PER
    • ✗ B-ORG → B-ORG
  (Chiu and Nichols, 2016)
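
A minimal sketch of CRF decoding with transition scores (Viterbi); this is a generic textbook implementation, not code from any of the cited papers:

```python
import torch

def viterbi_decode(emissions, transitions):
    """Highest-scoring label sequence given per-token emission scores
    (seq_len, num_labels) and transition scores transitions[i, j] for i -> j."""
    seq_len, num_labels = emissions.shape
    score = emissions[0]                   # best score of a path ending in each label
    backpointers = []
    for t in range(1, seq_len):
        # total[i, j] = best path ending in label j whose previous label is i.
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    # Trace the best path back from the final position.
    best_last = score.argmax().item()
    path = [best_last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]].item())
    return path[::-1]

emissions = torch.randn(4, 5)              # dummy scores: 4 tokens, 5 labels
transitions = torch.randn(5, 5)
transitions[0, 0] = -1e4                   # e.g., forbid the B-ORG -> B-ORG pattern
print(viterbi_decode(emissions, transitions))
```

Invalid transitions like B-ORG → B-ORG can be given large negative scores, so the decoder never selects them.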

  13. LSTM-CRF Model: variations at each level
  • Label prediction: partial CRF (Yang et al., 2018); no linear layer (Ma and Hovy, 2016); multiple linear layers
  • Sentence encoder: self-attention; Transformer; gated recurrent unit (GRU) (Yang et al., 2017)
  • Features: character-level LSTM; contextualized embeddings (e.g., BERT, ELMo); hand-crafted features; other feature composition methods

  14. LSTM-CRF Model: references
  • Chiu, Jason P. C., and Eric Nichols. "Named entity recognition with bidirectional LSTM-CNNs." Transactions of the Association for Computational Linguistics 4 (2016): 357-370.
  • Lample, Guillaume, et al. "Neural architectures for named entity recognition." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
  • Ma, Xuezhe, and Eduard Hovy. "End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.
  • Yang, Zhilin, Ruslan Salakhutdinov, and William W. Cohen. "Transfer learning for sequence tagging with hierarchical recurrent networks." arXiv preprint arXiv:1703.06345 (2017).
  • Liu, Liyuan, et al. "Empower sequence labeling with task-aware neural language model." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
  • Yang, Yaosheng, et al. "Distantly supervised NER with partial annotation learning and reinforcement learning." Proceedings of the 27th International Conference on Computational Linguistics. 2018.
  • Character features: Kim, Yoon, et al. "Character-aware neural language models." Thirtieth AAAI Conference on Artificial Intelligence. 2016.
