
Presentation Transcript


  1. International Workshop on Big Data Applications and Principles, Madrid. By Ajit Jaokar, Sep 2014 @ajitjaokar ajit.jaokar@futuretext.com 0

  2. Ajit Jaokar - www.opengardensblog.futuretext.com World Economic Forum. Spoken at MWC (5 times), CEBIT, CTIA, Web 2.0, CNN, BBC, Oxford Uni, Uni St Gallen, European Parliament. @feynlabs – teaching kids Computer Science. Advisory – Connected Liverpool 1

  3. Ajit Jaokar - Machine Learning for IoT and Telecoms. futuretext applies machine learning techniques to complex problems in the IoT (Internet of Things) and Telecoms domains. We aim to provide a distinct competitive advantage to our customers through the application of machine learning techniques. Philosophy: think of NEST. NEST has no interface. Its interface is based on 'machine learning', i.e. it learns and becomes better with use. This will be common with ALL products and will determine the competitive advantage of companies. It's a winner-takes-all game! Every product will have a 'self-learning' interface/component, and the product which learns best will win! 2

  4. Ajit Jaokar - • IoT • Machine Learning • IoT and Machine Learning • Case studies and applications 3

  5. Ajit Jaokar - www.futuretext.com @AjitJaokar ajit.jaokar@futuretext.com 4

  6. Image source: Guardian 5

  7. 6

  8. Ajit Jaokar IOT - THE INDUSTRY - STATE OF PLAY 7

  9. Ajit Jaokar • State of play - 2014 • Our industry is exciting – but mature - now a two-horse race for devices, with Samsung at around 70% of Android • Spectrum allocations and ‘G’ cycles are predictable - 5G around 2020 • 50 billion connected devices by 2020 • ITU World Radiocommunication Conference, November 2015 • IoT has taken off .. not because of EU and corporate efforts – but because of mobile, Kickstarter, health apps, iBeacon and of course NEST (acquired by Google) 8

  10. Ajit Jaokar Stage One: Early innovation 1999 - 2007 Regulatory innovation – net neutrality - Device innovation (Nokia 7110 and Ericsson T68i) - Operator innovation (pricing, bundling, Enterprise) - Connectivity innovation (SMS, BBM) - Content innovation (ringtones, games, EMS, MMS) - Ecosystem innovation (iPhone) Stage Two: Ecosystem innovation - iPhone and Android (2007 – 2010) Social innovation - Platform innovation - Community innovation - Long tail innovation - Application innovation 9

  11. Ajit Jaokar Phase Three: Market consolidation – 2010 - 2013 And then there were two ... Platform innovation and consolidation - Security innovation - App innovation Phase Four – three dimensions – 2014 .. Horizontal apps (iPhone and Android) - Vertical (across the stack) – hardware, security, data - Network – 5G and pricing 10

  12. Ajit Jaokar Many of the consumer IOT cases will happen with iBeacon in the next two years 11

  13. Ajit Jaokar And 5G will provide the WAN connectivity 5G - Source – Ericsson 12

  14. Ajit Jaokar Samsung Gear Fit named “Best Mobile Device” of Mobile World Congress Notification or Quantification? – Displays (LED, e-paper, Mirasol, OLED and LCD) - Touchscreen or hardware controls? - Battery life and charging 13

  15. Ajit Jaokar Hotspot 2.0 14

  16. Ajit Jaokar • Three parallel ecosystems • IoT is connecting things to the Internet – which is not the same as connecting things to the cellular network! • The difference is money .. and customers realise it • IoT local/personal (iBeacon, Kickstarter, health apps) • M2M – Machine to Machine • IoT – pervasive (5G, Hotspot 2.0) • Perspectives • 2014 – 2015 (radio conference) – 2020 (5G) • 2014 – iBeacon (motivate retailers to open WiFi) • Hotspot 2.0 – connect the cellular and WiFi worlds • Default WiFi and local world? • Operator world – (Big) Data, Corporate, pervasive apps – really happen beyond 2020 • So 5G will be timed well. The ecosystems will develop and they will be connected by 5G 15

  17. Ajit Jaokar IOT – INTERNET OF THINGS 16

  18. As the term Internet of Things (IoT) implies – IoT is about smart objects. For an object (say a chair) to be ‘smart’ it must have three things - an identity (to be uniquely identifiable – via IPv6) - a communication mechanism (i.e. a radio) and - a set of sensors / actuators. For example – the chair may have a pressure sensor indicating that it is occupied. Now, if it is able to know who is sitting – it could correlate more data by connecting to the person’s profile. If it is in a cafe, whole new data sets can be correlated (about the venue, about who else is there etc). Thus, IoT is all about Data .. IoT != M2M (M2M is a subset of IoT) 17

  19. Sensors lead to a LOT of Data (relative to mobile) .. (source: David Wood's blog) • By 2020, we are expected to have 50 billion connected devices • To put in context: • The first commercial citywide cellular network was launched in Japan by NTT in 1979 • The milestone of 1 billion mobile phone connections was reached in 2002 • The 2 billion mobile phone connections milestone was reached in 2005 • The 3 billion mobile phone connections milestone was reached in 2007 • The 4 billion mobile phone connections milestone was reached in February 2009 • Gartner: IoT will unearth more than $1.9 trillion in revenue before 2020; Cisco thinks there will be upwards of 50 billion connected devices by the same date; IDC estimates technology and services revenue will grow worldwide to $7.3 trillion by 2017 (up from $4.8 trillion in 2012). 18

  20. So, 50 billion by 2020 is a large number. Smart cities can be seen as an application domain of IoT. In 2008, for the first time in history, more than half of the world’s population was living in towns and cities. By 2030 this number will swell to almost 5 billion, with urban growth concentrated in Africa and Asia and many mega-cities (10 million+ inhabitants). By 2050, 70% of humanity will live in cities. That’s a profound change and will lead to a different management approach than what is possible today. Also, the economic wealth of a nation could be seen as – Energy + Entrepreneurship + Connectivity (sensor level + network level + application level). Hence, if IoT is seen as a part of a network, then it is a core component of GDP. 19

  21. Ajit Jaokar Machine Learning 20

  22. What is Machine Learning? Mitchell's Machine Learning Tom Mitchell in his book Machine Learning: “The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.” More formally: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” Think of it as a design tool where we need to understand: what data to collect for the experience (E), what decisions the software needs to make (T), and how we will evaluate its results (P). A programmer's perspective: Machine Learning involves training of a model from data, which predicts/extrapolates a decision, against a performance measure. 21
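To make the E, T, P framing concrete, here is a minimal sketch (assuming Python and scikit-learn, neither of which is prescribed by the slides): the experience E is a labelled training set, the task T is classification, and the performance measure P is accuracy on held-out data.

# Minimal sketch of Mitchell's E/T/P framing (Python and scikit-learn are assumptions).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                       # E: the experience (labelled data)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)               # T: the task is classification
model.fit(X_train, y_train)                             # training the model from data

y_pred = model.predict(X_test)                          # predicting a decision for unseen data
print("P (accuracy):", accuracy_score(y_test, y_pred))  # P: the performance measure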

  23. What Problems Can Machine Learning Address? (source: Jason Brownlee) • Spam Detection • Credit Card Fraud Detection • Digit Recognition • Speech Understanding • Face Detection • Product Recommendation • Medical Diagnosis • Stock Trading • Customer Segmentation • Shape Detection 22

  24. Types of Problems • Classification: Data is labelled, meaning it is assigned a class, for example spam/non-spam or fraud/non-fraud. The decision being modelled is to assign labels to new unlabelled pieces of data. This can be thought of as a discrimination problem, modelling the differences or similarities between groups. • Regression: Data is labelled with a real value rather than a label. Examples that are easy to understand are time series data like the price of a stock over time. The decision being modelled is the relationship between inputs and outputs. • Clustering: Data is not labelled, but can be divided into groups based on similarity and other measures of natural structure in the data. An example from the above list would be organising pictures by faces without names, where the human user has to assign names to groups, like iPhoto on the Mac. • Rule Extraction: Data is used as the basis for the extraction of propositional rules (antecedent/consequent, or if-then). It is often necessary to work backwards from a problem to the algorithm and then work with data. Hence, you need a depth of domain experience and also algorithm experience. 23

  25. What Algorithms Does Machine Learning Provide? • Regression • Instance-based Methods • Decision Tree Learning • Bayesian • Kernel Methods • Clustering methods • Association Rule Learning • Artificial Neural Networks • Deep Learning • Dimensionality Reduction • Ensemble Methods 24

  26. An Algorithmic Perspective Marsland adopts the Mitchell definition of Machine Learning in his book Machine Learning: An Algorithmic Perspective. “One of the most interesting features of machine learning is that it lies on the boundary of several different academic disciplines, principally computer science, statistics, mathematics, and engineering (multidisciplinary). …machine learning is usually studied as part of artificial intelligence, which puts it firmly into computer science …understanding why these algorithms work requires a certain amount of statistical and mathematical sophistication that is often missing from computer science undergraduates.” 25

  27. Definition of Machine Learning A one-sentence definition is: “Machine Learning is the training of a model from data that generalizes a decision against a performance measure.” 1) Training a model suggests training examples. 2) A model suggests state acquired through experience. 3) Generalizes a decision suggests the capability to make a decision based on inputs, anticipating unseen inputs in the future for which a decision will be required. 4) Finally, against a performance measure suggests a targeted need and directed quality to the model being prepared. 26

  28. Key concepts Data Instance: A single row of data is called an instance. It is an observation from the domain. Feature: A single column of data is called a feature. It is a component of an observation and is also called an attribute of a data instance. Some features may be inputs to a model (the predictors) and others may be outputs, or the features to be predicted. Data Type: Features have a data type. They may be real or integer valued or may have a categorical or ordinal value. You can have strings, dates, times, and more complex types, but typically they are reduced to real or categorical values when working with traditional machine learning methods. Datasets: A collection of instances is a dataset, and when working with machine learning methods we typically need a few datasets for different purposes. Training Dataset: A dataset that we feed into our machine learning algorithm to train our model. Testing Dataset: A dataset that we use to validate the accuracy of our model but is not used to train the model. It may be called the validation dataset. 27
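An illustrative sketch of these terms (numpy assumed; the values are invented): each row of the array is an instance, each column a feature, the categorical feature is encoded as an integer, and the instances are split into a training and a testing dataset.

# Instances, features and datasets, as defined above (values are made up for illustration).
import numpy as np

# Features: temperature (real), humidity (real), device_type (categorical, encoded as int).
data = np.array([
    [21.5, 0.40, 0],
    [19.0, 0.55, 1],
    [23.2, 0.35, 0],
    [18.4, 0.60, 2],
    [22.1, 0.45, 1],
])                                   # each row is an instance, each column a feature
labels = np.array([0, 1, 0, 1, 0])   # the feature to be predicted, one value per instance

# Split the collection of instances into a training dataset and a testing (validation) dataset.
rng = np.random.default_rng(0)
idx = rng.permutation(len(data))
X_train, y_train = data[idx[:4]], labels[idx[:4]]
X_test,  y_test  = data[idx[4:]], labels[idx[4:]]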

  29. Learning Machine learning is indeed about automated learning with algorithms. In this section we will consider a few high-level concepts about learning. Induction: Machine learning algorithms learn through a process called induction or inductive learning. Induction is a reasoning process that makes generalizations (a model) from specific information (training data). Generalization: Generalization is required because the model that is prepared by a machine learning algorithm needs to make predictions or decisions based on specific data instances that were not seen during training. Over-Learning: When a model learns the training data too closely and does not generalize, this is called over-learning. The result is poor performance on data other than the training dataset. This is also called over-fitting. Under-Learning: When a model has not learned enough structure from the database because the learning process was terminated early, this is called under-learning. The result is good generalization but poor performance on all data, including the training dataset. This is also called under-fitting. 28
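A hedged sketch of over- and under-learning (numpy assumed, with synthetic data): polynomials of increasing degree are fit to noisy training data; the too-simple model performs poorly everywhere, while the too-flexible model fits the training data closely but generalizes badly to unseen data from the same domain.

# Under-learning vs over-learning with polynomial fits (synthetic data, numpy assumed).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)     # noisy training data
x_new = rng.uniform(0, 1, 30)                                    # unseen data from the same domain
y_new = np.sin(2 * np.pi * x_new) + rng.normal(0, 0.2, size=30)

for degree in (1, 3, 12):     # 1 ~ under-learning, 3 ~ reasonable, 12 ~ over-learning
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    unseen_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"degree={degree:2d}  training MSE={train_err:.3f}  unseen MSE={unseen_err:.3f}")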

  30. Online Learning: Online learning is when a method is updated with data instances from the domain as they become available. Online learning requires methods that are robust to noisy data but can produce models that are more in tune with the current state of the domain. Offline Learning: Offline learning is when a method is created on pre-prepared data and is then used operationally on unobserved data. The training process can be controlled and can be tuned carefully because the scope of the training data is known. The model is not updated after it has been prepared and performance may decrease if the domain changes. Supervised Learning: This is a learning process for generalizing on problems where a prediction is required. A "teaching process" compares predictions by the model to known answers and makes corrections in the model. Unsupervised Learning: This is a learning process for generalizing the structure in the data where no prediction is required. Natural structures are identified and exploited for relating instances to each other. 29
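A brief sketch of the offline/online distinction (scikit-learn assumed, with synthetic data): the offline model is fit once on pre-prepared data and left unchanged, while the online model is updated incrementally as new batches of instances arrive.

# Offline vs online learning (scikit-learn assumed; the data is synthetic).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Offline learning: fit once on the full, pre-prepared training dataset.
offline = SGDClassifier(random_state=0).fit(X, y)

# Online learning: update the model as instances become available from the domain.
online = SGDClassifier(random_state=0)
for start in range(0, len(X), 20):     # data arriving in small batches
    online.partial_fit(X[start:start + 20], y[start:start + 20], classes=np.array([0, 1]))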

  31. Source: Oracle 30

  32. Source: Oracle 31

  33. Source: Oracle 32

  34. Source: Oracle 33

  35. Recap: machine learning with IoT. Supervised Learning: In supervised learning, a labeled training set (i.e., predefined inputs and known outputs) is used to build the system model. This model is used to represent the learned relation between the input, output and system parameters. K-nearest neighbor (k-NN): This supervised learning algorithm classifies a data sample (called a query point) based on the labels (i.e., the output values) of the nearest data samples. For example, missing readings of a sensor node can be predicted using the average measurements of neighboring sensors within specific diameter limits. There are several functions to determine the nearest set of nodes; a simple method is to use the Euclidean distance between different sensors. Decision tree (DT): This is a classification method for predicting labels of data by iterating the input data through a learning tree. During this process, the feature properties are compared relative to decision conditions to reach a specific category. For example, DT provides a simple but efficient method to identify link reliability in WSNs by identifying a few critical features such as loss rate, corruption rate, mean time to failure (MTTF) and mean time to restore (MTTR). 34
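A minimal sketch of the k-NN idea above (numpy assumed; sensor positions and readings are invented for illustration): a missing reading is estimated as the average of the k sensors nearest to the query point by Euclidean distance.

# Estimate a missing sensor reading from its k nearest neighbours (illustrative values).
import numpy as np

positions = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]])   # sensor coordinates
readings  = np.array([20.1, 20.4, 19.8, 31.0, 30.5])             # e.g. temperature readings

query = np.array([0.5, 0.5])      # location of the node whose reading is missing
k = 3

dists = np.linalg.norm(positions - query, axis=1)   # Euclidean distance to every sensor
nearest = np.argsort(dists)[:k]                     # indices of the k nearest sensors
estimate = readings[nearest].mean()                 # average of the neighbouring measurements
print("estimated reading:", estimate)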

  36. Neural networks (NNs): This learning algorithm can be constructed by cascading chains of decision units (e.g., perceptrons or radial basis functions) used to recognize non-linear and complex functions. In WSNs, using neural networks in a distributed manner is still not pervasive due to the high computational requirements for learning the network weights, as well as the high management overhead. However, in centralized solutions, neural networks can learn multiple outputs and decision boundaries at once, which makes them suitable for solving several network challenges using the same model. Support vector machines (SVMs): This is a machine learning algorithm that learns to classify data points using labeled training samples. For example, one approach for detecting malicious behavior of a node is to use an SVM to investigate temporal and spatial correlations of data. To illustrate: given the WSN's observations as points in the feature space, the SVM divides the space into parts separated by as wide as possible margins (i.e., separation gaps), and new readings will be classified based on which side of the gaps they fall on, as in the sketch below. 35
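A hedged sketch of that SVM idea (scikit-learn assumed; the data is synthetic, not real WSN traffic): a maximum-margin boundary separates normal from suspicious observations, and new readings are classified by which side of the gap they fall on.

# Separate normal and suspicious node observations with a maximum-margin classifier.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
normal     = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))   # e.g. temporal/spatial features
suspicious = rng.normal(loc=[3, 3], scale=0.5, size=(50, 2))
X = np.vstack([normal, suspicious])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)          # fits the widest separating margin
print(clf.predict([[0.2, 0.1], [2.8, 3.1]]))         # new readings classified by side of the gap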

  37. Bayesian statistics: Unlike most machine learning algorithms, Bayesian inference requires a relatively small number of training samples. One application of Bayesian inference in WSNs is assessing event consistency (θ) using incomplete data sets (D) by investigating prior knowledge about the environment. Unsupervised Learning: Unsupervised learners are not provided with labels (i.e., there is no output vector). Basically, the goal of an unsupervised learning algorithm is to classify the sample set into different groups by investigating the similarity between them. This theme of learning algorithms is widely used in node clustering and data aggregation problems. 36

  38. K-means clustering: The k-means algorithm is used to organize data into different classes (known as clusters). This unsupervised learning algorithm is widely used in the sensor node clustering problem due to its linear complexity and simple implementation. The k-means steps to resolve such a node clustering problem are: (a) randomly choose k nodes to be the initial centroids for different clusters; (b) label each node with the closest centroid using a distance function; (c) re-compute the centroids using the current node memberships and (d) stop if the convergence condition is valid (e.g., a predefined threshold for the sum of distances between nodes and their respective centroids), otherwise go back to step (b). 37
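Steps (a)-(d) map almost directly onto code; the following is a minimal sketch (numpy assumed; node coordinates are random illustrative data, and the convergence test here uses centroid movement rather than the node-to-centroid distance sum).

# k-means node clustering following steps (a)-(d) above (illustrative sketch).
import numpy as np

def k_means(nodes, k, tol=1e-4, max_iter=100):
    rng = np.random.default_rng(0)
    centroids = nodes[rng.choice(len(nodes), k, replace=False)]    # (a) random initial centroids
    for _ in range(max_iter):
        dists = np.linalg.norm(nodes[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                              # (b) label each node with its closest centroid
        new_centroids = np.array([nodes[labels == c].mean(axis=0) for c in range(k)])
        if np.abs(new_centroids - centroids).sum() < tol:          # (d) stop once the centroids stabilise
            break
        centroids = new_centroids                                  # (c) re-compute and repeat
    return labels, centroids

nodes = np.random.default_rng(1).random((20, 2)) * 10              # made-up node coordinates
labels, centroids = k_means(nodes, k=3)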

  39. Principal component analysis (PCA): This is a multivariate method for data compression and dimensionality reduction that aims to extract important information from data and present it as a set of new orthogonal variables called principal components. For example, PCA reduces the amount of transmitted data among sensor nodes by finding a small set of uncorrelated linear combinations of the original readings. Furthermore, the PCA method simplifies problem solving by considering only a few conditions in very large variable problems (i.e., turning big data into a tiny data representation). Reinforcement Learning: Reinforcement learning enables an agent (e.g., a sensor node) to learn by interacting with its environment. The agent will learn to take the best actions that maximize its long-term rewards by using its own experience. The most well-known reinforcement learning technique is Q-learning, in which an agent regularly updates its achieved rewards based on the action taken at a given state. 38
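A minimal sketch of the Q-learning update mentioned above (numpy assumed; the states, actions and reward are illustrative): the agent nudges its estimate Q(s, a) towards the received reward plus the discounted best value of the next state.

# One Q-learning update step: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))      # the agent's table of expected long-term rewards
alpha, gamma = 0.1, 0.9                  # learning rate and discount factor

def q_update(state, action, reward, next_state):
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])

# One interaction with the environment: the agent took action 1 in state 0,
# received reward 1.0 and moved to state 2 (values chosen purely for illustration).
q_update(state=0, action=1, reward=1.0, next_state=2)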

  40. Ajit Jaokar IoT and Machine Learning 39

  41. The basic idea of machine learning is to build a mathematical model based on training data (learning stage) – predict results for new data (prediction stage) and tweak the model based on new conditions • What type of model? Predictive, Classification, Clustering, Decision Oriented, Associative • IoT and Machine Learning • On one hand - IoT creates a lot of contextual data which complements existing processes • On the other hand – the sheer scale of IoT calls for unique solutions • Types of problems: • Apply existing machine learning algorithms to IoT data • Use IoT data to complement existing processes • Use the scale of IoT data to gain new insights • Consider some unique characteristics of IoT data (e.g. streaming) 40

  42. IoT: from traditional computing to .. • Gone from making smart things smarter (traditional computing) to • making dumb things smarter .. and • living things more robust • 3 domains: • Consumer, Enterprise, Public infrastructure • 1) Consumer – biosensors (real-time tracking), quantified self – focussing on benefits • 2) Enterprise – complex machinery (preventative maintenance), asset efficiency – reducing assets, increasing efficiency of existing assets. Moving from transactions to relationships (real-time context awareness). • 3) Public infrastructure (dynamically adjust traffic lights). Dis-economies of scale (bad things also scale in cities) – Thanks John Hagel III 41

  43. Three key areas: • Move from exception handling to patterns of exceptions over time (are some exceptions occurring repeatedly? Do I need to redesign my product? Is that a new product?) • Move from optimization to disruption – from ownership to rental (Where are all these dynamic assets?) • Move to self-learning: Robotics – from the assembly line to self-learning robots (Boston Dynamics), autonomous helicopters • Four examples of differences: • Sensor fusion - Deep Learning - Real time - Streaming 42

  44. Sensor fusion • Sensor fusion is the combining of sensory data, or data derived from sensory data, from disparate sources such that the resulting information is in some sense better than would be possible when these sources were used individually. The data sources for a fusion process are not specified to originate from identical sensors. Sensor fusion is a term that covers a number of methods and algorithms, including: Central Limit Theorem, Kalman filter, Bayesian networks, Dempster-Shafer • Examples: http://www.camgian.com/ and http://www.egburt.com/ 43
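As a hedged illustration of the idea (not tied to the products linked above), the sketch below fuses two noisy estimates of the same quantity by weighting each with its assumed variance, so the fused estimate is better than either source on its own; this is essentially the correction step of a one-dimensional Kalman filter.

# Variance-weighted fusion of two noisy sensor estimates of the same quantity.
def fuse(x1, var1, x2, var2):
    k = var1 / (var1 + var2)          # gain: trust the less noisy source more
    fused = x1 + k * (x2 - x1)
    fused_var = (1 - k) * var1        # fused variance is lower than either input variance
    return fused, fused_var

# e.g. a coarse GPS estimate and a more precise odometry estimate of the same coordinate
print(fuse(10.2, 4.0, 9.6, 1.0))      # -> (9.72, 0.8)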

  45. Deep learning • Google's acquisition of DeepMind Technologies • In 2011, Stanford computer science professor Andrew Ng founded Google’s Google Brain project, which created a neural network trained with deep learning algorithms, which famously proved capable of recognizing high-level concepts, such as cats, after watching just YouTube videos, and without ever having been told what a “cat” is. • A smart-object recognition algorithm that doesn’t need humans: http://www.kurzweilai.net/a-smart-object-recognition-algorithm-that-doesnt-need-humans - A feature construction method for general object recognition (Kirt Lillywhite, Dah-Jye Lee, Beau Tippetts, James Archibald) 44

  46. Real time: Beyond ‘Hadoop’ (non-hadoopable) - the BDAS stack • BDAS: Berkeley Data Analytics Stack. Spark – an open-source, in-memory cluster computing framework. Integrated with Hadoop (can work with files stored in HDFS). Written in Scala 45

  47. Real time (Stream processing) 46

  48. 47

  49. Spark – an open-source, in-memory cluster computing framework. Integrated with Hadoop (can work with files stored in HDFS). Written in Scala. Spark comes with tools: interactive query analysis (Shark), graph processing and analysis (Bagel) and real-time analysis (Spark Streaming). RDDs (Resilient Distributed Datasets) are the fundamental data objects used in Spark. RDDs are distributed objects that can be cached in memory across a cluster of compute nodes. Scales to 100s of nodes. Can achieve second-scale latencies. 48
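A minimal sketch using Spark's Python API (PySpark assumed here; the slide itself notes that Spark is written in Scala): a small RDD of sensor readings is parallelized across the cluster, cached in memory and reduced in parallel.

# Create, cache and aggregate an RDD (PySpark assumed; readings are illustrative).
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")

readings = sc.parallelize([("sensor-1", 21.5), ("sensor-2", 19.0),
                           ("sensor-1", 22.0), ("sensor-2", 18.5)])   # an RDD, partitioned over the cluster
readings.cache()                                                      # RDDs can be cached in memory

# Average reading per sensor, computed in parallel over the cached RDD.
averages = (readings.mapValues(lambda v: (v, 1))
                    .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                    .mapValues(lambda s: s[0] / s[1]))
print(averages.collect())

sc.stop()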

  50. Source: Tathagata Das (TD) UC Berkeley 49
