
The Unreasonable Effectiveness of Data

Alon Halevy, Peter Norvig, and Fernando Pereira (Google). 2011. 10. 24, Eun-Sol Kim.


Presentation Transcript


  1. The Unreasonable Effectiveness of Data
     • Alon Halevy, Peter Norvig, and Fernando Pereira (Google)
     • 2011. 10. 24, Eun-Sol Kim

  2. "The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve."
     • Eugene Wigner, "The Unreasonable Effectiveness of Mathematics in the Natural Sciences"
     "Essentially, all models are wrong, but some are useful."
     • George Box

  3. Two approaches to AI
     • GOFAI (Good Old-Fashioned Artificial Intelligence)
       • Based on logic
       • Symbolic AI
     • SML (Statistical Machine Learning)
       • Based on empirical data (sensor data or databases)
       • Inductive inference: generalize from data to rules, then predict on future data

  4. Scene completion using millions of photographs - Hays et al., CMU, SIGGRAPH 2007

  5. The power of data

  6. Learning from Text at Web Scale
     • Brown Corpus
       • 1 million English words
       • Complete sentences, no spelling errors, no grammatical errors
     • Google's trillion-word corpus
       • About a million times larger than the Brown Corpus
       • Frequency counts for all sequences up to 5 words long
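As a rough illustration of what "frequency counts for all sequences up to 5 words long" means, the following minimal sketch counts every contiguous word sequence of length 1 to 5 in a toy sentence. The function name `ngram_counts` and the sample text are assumptions made for this example; a real web-scale corpus would need distributed processing and pruning of rare sequences.

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every contiguous word sequence of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy text standing in for a large corpus (illustrative only).
tokens = "the cat sat on the mat and the cat slept".split()
counts = ngram_counts(tokens)

print(counts[("the",)])        # 3
print(counts[("the", "cat")])  # 2
```

Keys are tuples of words, so the same table holds unigrams through 5-grams.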

  7. Some lessons of web-scale learning
     1. Use available large-scale data rather than annotated data
        • Useful semantic relationships can be found automatically, without annotated data, from the statistics of search queries and their corresponding results, or from the accumulated evidence of web-based text patterns.
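One way to picture "accumulated evidence of web-based text patterns" is a lexical pattern such as "X such as Y", matched over many unannotated sentences so that repeated matches add up to evidence for an is-a relationship. The sketch below is only an assumption about how such a pattern might be applied; the pattern, the function name `extract_is_a`, and the sentences are invented for illustration and are not taken from the slides.

```python
import re
from collections import Counter

# Hypothetical pattern: "<category> such as <instance>".
PATTERN = re.compile(r"(\w+) such as (\w+)")

def extract_is_a(sentences):
    """Accumulate (instance, category) evidence across raw sentences."""
    evidence = Counter()
    for sentence in sentences:
        for category, instance in PATTERN.findall(sentence.lower()):
            evidence[(instance, category)] += 1
    return evidence

# Unlabeled toy sentences (illustrative only).
sentences = [
    "She visited cities such as Paris last year.",
    "Cities such as Paris attract many tourists.",
    "He likes fruits such as mangoes.",
]
evidence = extract_is_a(sentences)

print(evidence.most_common(1))  # [(('paris', 'cities'), 2)]
```

A single match is weak evidence; the count across many independent sentences is what makes a relationship credible.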

  8. 2. Memorization is a good policy
        • Memorizing specific phrases is more effective than learning general patterns.
        • Machine translation example: large memorized phrase tables give candidate mappings between specific source- and target-language phrases.
        • For many tasks, words and word combinations provide all the representational machinery we need to learn from text.
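The phrase-table idea can be sketched as a lookup that prefers the longest memorized phrase at each position. This is a simplified illustration, not how a production translation system works: real decoders score many candidate mappings and handle reordering. The table entries and the function name `translate` are hypothetical.

```python
# Hypothetical toy phrase table: source phrase (tuple of words) -> target phrase.
PHRASE_TABLE = {
    ("bonjour",): "hello",
    ("mon", "ami"): "my friend",
    ("merci",): "thanks",
    ("merci", "beaucoup"): "thank you very much",
}

def translate(tokens, table, max_len=3):
    """Greedy longest-match lookup of memorized phrases."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown word: pass it through unchanged
            i += 1
    return " ".join(out)

print(translate("bonjour mon ami merci beaucoup".split(), PHRASE_TABLE))
# hello my friend thank you very much
```

Because "merci beaucoup" is memorized as a unit, it wins over translating "merci" alone, which is the point of memorizing specific phrases.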

  9. Two conventional approaches to NLP
     • Deep approach
       • Hand-coded grammars and ontologies
       • Complex networks of relations
     • Statistical approach
       • Learning n-gram statistics from large corpora

  10. New approaches to NLP
      • Combination of the two conventional approaches
      • Statistical relational learning
        • Represent relations between objects with rules (first-order logic)
        • Model built by statistical learning

  11. Semantic interpretation
      • Semantic web
        • A convention for formal representation languages that lets software services interact with each other
      • Semantic interpretation
        • Deals with imprecise, ambiguous natural languages
        • Embodied in human cognitive and cultural processes whereby linguistic expression elicits expected responses and expected changes in cognitive states

  12. The challenges for achieving accurate semantic interpretation
      • Interpreting the content
        • Requires methods to infer relationships between column headers or mentions of entities in the world
      • Web-scale data might be an important part of the solution
        • Hundreds of millions of independently created tables
        • Tables represent structured data
        • With tables, we can resolve semantic heterogeneity
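To make "infer relationships between column headers" concrete, one simple heuristic is to count how often pairs of headers appear together across many independently created tables. The sketch below assumes each table is reduced to its list of headers; the function name `header_cooccurrence` and the schemas are invented for illustration and are not the method described in the slides.

```python
from collections import Counter
from itertools import combinations

def header_cooccurrence(schemas):
    """Count how often each pair of column headers appears in the same table."""
    pairs = Counter()
    for headers in schemas:
        # Normalize case and whitespace, drop duplicates within a table.
        unique = sorted({h.strip().lower() for h in headers})
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical header lists from three independently created tables.
schemas = [
    ["Make", "Model", "Year"],
    ["make", "model", "price"],
    ["Company", "Model", "Year"],
]
pairs = header_cooccurrence(schemas)

print(pairs[("make", "model")])  # 2
print(pairs[("model", "year")])  # 2
```

In this toy data, "make" and "company" never occur in the same table but share the neighbours "model" and "year", which is the kind of signal that might suggest two headers play the same role.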

  13. Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data.
