The Unreasonable Effectiveness of Data - PowerPoint PPT Presentation

Presentation Transcript

  1. The Unreasonable Effectiveness of Data • Alon Halevy, Peter Norvig and Fernando Pereira, Google • 2011. 10. 24, Eun-Sol Kim

  2. "The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve." • Eugene Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences • "Essentially, all models are wrong, but some are useful." • George Box

  3. Two approaches to AI • GOFAI (Good Old-Fashioned Artificial Intelligence) • Based on logic • Symbolic AI • SML (Statistical Machine Learning) • Based on empirical data (sensor data or databases) • Inductive inference: generalize from data to rules, then predict on future data

  4. Scene completion using millions of photographs - Hays et al., CMU, SIGGRAPH 2007

  5. The power of data

  6. Learning from Text at Web Scale • Brown Corpus • 1 million English words • Complete sentences, no spelling errors, no grammatical errors • Google's trillion-word corpus • A million times larger than the Brown Corpus • Frequency counts for all sequences up to 5 words long
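
The "frequency counts for all sequences up to 5 words long" on this slide can be illustrated with a minimal sketch. The function name `ngram_counts` and the toy sentence are assumptions made for illustration; a real web-scale corpus would need distributed counting rather than a single in-memory counter.

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every contiguous word sequence of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy corpus (hypothetical), standing in for web-scale text.
tokens = "the cat sat on the mat and the cat slept".split()
counts = ngram_counts(tokens)

print(counts[("the",)])        # unigram count
print(counts[("the", "cat")])  # bigram count
```

Each key is a tuple of words, so unigrams through 5-grams live in the same table and can be looked up directly.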

  7. Some lessons of web-scale learning • 1. Use available large-scale data rather than annotated data • Useful semantic relationships can be found automatically, without annotated data, from the statistics of search queries and their corresponding results, or from the accumulated evidence of web-based text patterns.
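
As one illustration of "accumulated evidence of web-based text patterns", the sketch below counts how often a simple "X such as Y" pattern links an instance to a class. The pattern, the function name `extract_is_a`, and the sentences are assumptions chosen for the example, not the method used in the paper.

```python
import re
from collections import Counter

# Hypothetical single-word pattern: "<class> such as <instance>".
PATTERN = re.compile(r"(\w+) such as (\w+)")

def extract_is_a(sentences):
    """Accumulate (instance, class) evidence from unannotated sentences."""
    evidence = Counter()
    for sentence in sentences:
        for cls, instance in PATTERN.findall(sentence.lower()):
            evidence[(instance, cls)] += 1
    return evidence

SENTENCES = [
    "She visited cities such as Paris last year.",
    "Many cities such as Paris attract tourists.",
    "He likes fruits such as mango.",
]
evidence = extract_is_a(SENTENCES)
print(evidence.most_common())
```

No sentence here is labeled by hand; a relationship becomes more credible simply because the pattern matches it more often.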

  8. 2. Memorization is a good policy • Memorizing specific phrases is more effective than learning general patterns. • Machine translation example: large memorized phrase tables give candidate mappings between specific source- and target-language phrases. • For many tasks, words and word combinations provide all the representational machinery we need to learn from text.
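
The phrase-table idea can be sketched as a greedy longest-match lookup. This is a deliberately simplified illustration: the table entries and the `translate` function are hypothetical, and real statistical translation systems score many candidate mappings probabilistically rather than taking the first match.

```python
def translate(tokens, phrase_table, max_len=3):
    """Greedy longest-match lookup against a memorized phrase table."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in phrase_table:
                out.append(phrase_table[phrase])
                i += n
                break
        else:
            # No entry of any length: pass the word through unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)

# Hypothetical English-to-French entries, for illustration only.
PHRASE_TABLE = {
    ("good", "morning"): "bonjour",
    ("good",): "bon",
    ("morning",): "matin",
    ("thank", "you"): "merci",
}
result = translate("good morning and thank you".split(), PHRASE_TABLE)
print(result)
```

Because the memorized two-word phrase is preferred, "good morning" maps to "bonjour" instead of the word-by-word "bon matin".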

  9. Two conventional approaches to NLP • Deep approach • Hand-coded grammars and ontologies • Complex networks of relations • Statistical approach • Learning n-gram statistics from large corpora

  10. New approaches to NLP • Combination of the two conventional approaches • Statistical relational learning • Relations between objects are represented with rules (first-order logic) • The model is built by statistical learning

  11. Semantic interpretation • Semantic Web • A convention for formal representation languages that lets software services interact with each other • Semantic interpretation • Deals with imprecise, ambiguous natural languages • Embodied in human cognitive and cultural processes, whereby linguistic expression elicits expected responses and expected changes in cognitive state

  12. The challenges of achieving accurate semantic interpretation • Interpreting the content • Methods are needed to infer relationships between column headers or mentions of entities in the world. • Web-scale data might be an important part of the solution. • Hundreds of millions of independently created tables • Tables represent structured data • With tables, we can resolve semantic heterogeneity.
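
One simple way to picture inferring relationships between column headers is to compare the values that appear under them across independently created tables. The sketch below uses Jaccard overlap with an arbitrary threshold; the function names, the threshold, and the two toy tables are assumptions for illustration, not the approach described by the authors.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two value sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def matching_headers(tables, threshold=0.5):
    """Return header pairs whose pooled column values overlap strongly."""
    values = {}
    for table in tables:
        for header, column in table.items():
            values.setdefault(header, set()).update(v.lower() for v in column)
    return sorted(
        (h1, h2)
        for h1, h2 in combinations(sorted(values), 2)
        if jaccard(values[h1], values[h2]) >= threshold
    )

# Two hypothetical, independently created tables.
TABLES = [
    {"country": ["France", "Japan", "Brazil"],
     "capital": ["Paris", "Tokyo", "Brasilia"]},
    {"nation": ["Japan", "Brazil", "Kenya"],
     "population_m": ["125", "214", "54"]},
]
pairs = matching_headers(TABLES)
print(pairs)
```

With many more tables, the pooled value sets grow and such matches become more reliable, which is the sense in which web-scale data can help resolve semantic heterogeneity.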

  13. Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data.