The Unreasonable Effectiveness of Data

Alon Halevy, Peter Norvig and Fernando Pereira, Google
2011. 10. 24, Eun-Sol Kim

"The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve."
• Eugene Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences
"Essentially, all models are wrong, but some are useful."
• George Box

Two approaches to AI
• GOFAI (Good Old-Fashioned Artificial Intelligence)
  • Based on logic
  • Symbolic AI
• SML (Statistical Machine Learning)
  • Based on empirical data (sensor data or databases)
  • Inductive inference: generalize from data to rules, then predict on future data

Scene completion using millions of photographs - Hays et al., CMU, SIGGRAPH 2007

The power of data

Learning from Text at Web Scale
• Brown Corpus
  • 1 million English words
  • Complete sentences, no spelling errors, no grammatical errors
• Google's trillion-word corpus
  • About a million times larger than the Brown Corpus (10^12 words versus 10^6)
  • Frequency counts for all sequences up to 5 words long
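To make "frequency counts for all sequences up to 5 words long" concrete, here is a minimal sketch of n-gram counting. The function name `ngram_counts` and the toy sentence are assumptions made for illustration; a web-scale corpus would need distributed counting rather than an in-memory counter.

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every contiguous word sequence of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy corpus, made up for illustration only.
tokens = "the cat sat on the mat and the cat slept".split()
counts = ngram_counts(tokens)

print(counts[("the",)])        # unigram count
print(counts[("the", "cat")])  # bigram count
```

Each key is a tuple of words, so the same table holds unigrams through 5-grams.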

Some lessons of web-scale learning
1. Use available large-scale data rather than annotated data
• Useful semantic relationships can be found automatically, without annotated data, from the statistics of search queries and their corresponding results, or from the accumulated evidence of web-based text patterns.
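As one illustration of "accumulated evidence of web-based text patterns", the sketch below counts how often a simple "X such as Y" pattern suggests that Y is an instance of X. The pattern, the function name `extract_is_a`, and the sentences are all assumptions chosen for the example; the slides do not specify which patterns are used.

```python
import re
from collections import Counter

# Hypothetical pattern: "<class> such as <instance>" (single words only).
PATTERN = re.compile(r"(\w+) such as (\w+)")

def extract_is_a(sentences):
    """Accumulate (instance, class) evidence from unannotated sentences."""
    evidence = Counter()
    for sentence in sentences:
        for cls, inst in PATTERN.findall(sentence.lower()):
            evidence[(inst, cls)] += 1
    return evidence

# Made-up sentences standing in for web text.
sentences = [
    "Cities such as Paris attract many tourists.",
    "We visited cities such as Paris last year.",
    "Fruits such as mango are sold here.",
]
evidence = extract_is_a(sentences)
print(evidence.most_common())
```

A single match is weak evidence; the point of web-scale text is that the same relation recurs often enough for the counts to become reliable.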

2. Memorization is a good policy
• Memorizing specific phrases is more effective than learning general patterns.
• Machine translation example: large memorized phrase tables give candidate mappings between specific source-language and target-language phrases.
• For many tasks, words and word combinations provide all the representational machinery we need to learn from text.
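The phrase-table idea can be sketched as a lookup that prefers the longest memorized phrase. This is a deliberate simplification under stated assumptions: the table entries are invented, and greedy longest-match stands in for the scoring and search that a real phrase-based translation system would use.

```python
# Hypothetical toy phrase table (English to French), invented for illustration.
PHRASE_TABLE = {
    ("good", "morning"): "bonjour",
    ("thank", "you"): "merci",
    ("good",): "bon",
    ("morning",): "matin",
    ("you",): "vous",
}

def translate(tokens, table, max_len=3):
    """Greedy longest-match lookup against a memorized phrase table."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            # No memorized phrase starts here: pass the word through unchanged.
            out.append(tokens[i])
            i += 1
    return out

result = translate("good morning and thank you".split(), PHRASE_TABLE)
print(result)
```

Because the two-word entry for "good morning" is tried before the one-word entries, the memorized phrase wins over a word-by-word rendering.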

Two conventional approaches to NLP
• Deep approach
  • Hand-coded grammars and ontologies
  • Complex networks of relations
• Statistical approach
  • Learning n-gram statistics from large corpora

New approaches to NLP
• Combination of the two conventional approaches
• Statistical relational learning
  • Relations between objects are represented with rules (first-order logic)
  • The model is built by statistical learning

Semantic interpretation
• Semantic web
  • A convention for formal representation languages that lets software services interact with each other
• Semantic interpretation
  • Deals with imprecise, ambiguous natural languages
  • Embodied in human cognitive and cultural processes, whereby linguistic expression elicits expected responses and expected changes in cognitive states

The challenges of achieving accurate semantic interpretation
• Interpreting the content
  • Requires methods to infer relationships between column headers or mentions of entities in the world
• Web-scale data might be an important part of the solution
  • Hundreds of millions of independently created tables
  • Tables represent structured data
  • With tables, we can resolve semantic heterogeneity
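One simple way to infer that two differently named column headers mean the same thing is to compare the values that appear under them. The sketch below uses Jaccard overlap of column values; this technique, the function names, the threshold, and the two tables are assumptions for illustration, not the method described in the slides.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of cell values."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_columns(table_a, table_b, threshold=0.5):
    """Pair each column of table_a with its best-overlapping column in table_b."""
    matches = []
    for name_a, vals_a in table_a.items():
        best_name, best_score = None, 0.0
        for name_b, vals_b in table_b.items():
            score = jaccard(vals_a, vals_b)
            if score > best_score:
                best_name, best_score = name_b, score
        # Keep only pairs whose value overlap reaches the threshold.
        if best_name is not None and best_score >= threshold:
            matches.append((name_a, best_name, best_score))
    return matches

# Two made-up, independently created tables with different headers.
table_a = {
    "Country": ["France", "Japan", "Brazil"],
    "Capital": ["Paris", "Tokyo", "Brasilia"],
}
table_b = {
    "Nation": ["Japan", "Brazil", "Kenya"],
    "Main city": ["Tokyo", "Brasilia", "Nairobi"],
}
matches = match_columns(table_a, table_b)
print(matches)
```

With only two tables the evidence is thin; across hundreds of millions of tables, repeated overlaps of this kind are what would make header relationships recoverable.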

Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data.
