Energy Issues in Data Analytics

Domenico Talia, Carmela Comito. Università della Calabria & CNR-ICAR, Italy. talia@deis.unical.it. Energy Issues in Data Analytics. Motivations for Taking Care of Data. Data is everywhere (big, complex, real-time, unstructured).

Presentation Transcript


  1. Energy Issues in Data Analytics • Domenico Talia, Carmela Comito • Università della Calabria & CNR-ICAR, Italy • talia@deis.unical.it

  2. Motivations for Taking Care of Data • Data is everywhere (Big, complex, real-time, unstructured) • Putting data at the center of research work on energy issues may bring some benefits. (Today the focus is on algorithms). • Cost metrics of data management techniques (communication, storing, access, query, analysis) will help professionals and users to save energy in data-intensive apps. • Energy-scalable data management is important for sustainable data science.

  3. Data Availability or Data Deluge? • Every life process today is data intensive. • The information stored in digital data archives is enormous and its size is still growing very rapidly.

  4. Data Availability or Data Deluge? • Some decades ago the main problem was the shortage of information; now the challenge is • the very large volume of information to deal with and • the associated complexity of processing it and extracting significant and useful parts or summaries.

  5. Complex Big Problems… • Bigger and more complex problems must be solved by using large-scale distributed computing systems. • Data sources are ever larger and ubiquitous (Web, sensor networks, mobile devices, telescopes, …).

  6. …and Big Data • Even where accessible, much data in many fields cannot be read by humans, so • The huge amount of data available today requires smart data analysis techniques to help people deal with it, and • Scalable algorithms, techniques, and systems are needed (time and energy scalability).

  7. Data: From Storing to Analysis • Storing data is not the only main problem. • A key issue is to analyse, mine, and process data to make it useful. Source: The Economist

  8. Towards Models for Energy-aware Data Management • The main focus today is on energy-aware algorithms, tasks, and applications. • The other side of the coin is data and the costs of operating on it. • Abstract energy-cost models for exchanging, accessing, and transforming data are primary elements of energy-aware data management at large scale. • They are useful for sustainable data science.
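
One way to read the "abstract energy-cost model" idea is as a linear model that charges a per-byte energy cost for each kind of data operation. The sketch below is only an illustration under that assumption: the operation names follow the slide (exchanging, accessing, transforming), but the `EnergyCostModel` class and every coefficient are hypothetical placeholders, not values from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyCostModel:
    """Hypothetical linear model: energy (J) = sum of per-byte cost x bytes handled."""

    # Assumed per-byte costs in joules; placeholder values, not measurements.
    exchange_j_per_byte: float
    access_j_per_byte: float
    transform_j_per_byte: float

    def estimate(self, exchanged: int, accessed: int, transformed: int) -> float:
        """Return the estimated energy in joules for the given byte counts."""
        return (
            self.exchange_j_per_byte * exchanged
            + self.access_j_per_byte * accessed
            + self.transform_j_per_byte * transformed
        )


# Placeholder coefficients chosen only to make the arithmetic easy to follow.
model = EnergyCostModel(
    exchange_j_per_byte=2e-6,
    access_j_per_byte=5e-7,
    transform_j_per_byte=1e-6,
)

one_mb = 1_000_000
estimate_j = model.estimate(exchanged=one_mb, accessed=4 * one_mb, transformed=2 * one_mb)
print(f"Estimated energy: {estimate_j:.2f} J")
```

A real model would likely need device-specific coefficients fitted from measurements, and possibly non-linear terms; the linear form is used here only because it is the simplest starting point.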

  9. An Example: Energy-aware Mining of Data • We evaluated the energy cost of analyzing data by using some well-known data mining techniques on mobile devices. • Our interest was mainly in how the energy consumed by the same technique changes when the dimensions of the data change. • Tests with different • Data set dimensions, • Numbers of attributes, • Numbers of classes.

  10. Data Mining Techniques • Energy characterization of data mining techniques running on mobile devices: k-means (data clustering), J48 (data classification), Apriori (association rules) • Common performance parameters: number of instances (data set size), number of attributes • Algorithm-specific performance parameters: number of clusters for k-means; decision tree size for J48; number of rules, minimum support, and minimum confidence for Apriori
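
The experiment design described here (vary the number of instances, attributes, and clusters, then record the cost) can be sketched as a small parameter sweep. This is a minimal illustration, not the authors' setup: it runs a plain k-means on synthetic random data, and it approximates energy as CPU time multiplied by an assumed constant power draw (`ASSUMED_POWER_W`), whereas measurements on real devices would need a hardware or OS-level power monitor. All function names are hypothetical.

```python
import random
import time

ASSUMED_POWER_W = 1.0  # hypothetical average power draw; not from the slides


def kmeans(points, k, iterations=5, seed=0):
    """Plain Lloyd's k-means with a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


def sweep(instance_counts, attribute_counts, cluster_counts, seed=0):
    """Run k-means for every parameter combination and record a rough energy proxy."""
    rng = random.Random(seed)
    rows = []
    for n in instance_counts:
        for d in attribute_counts:
            # Synthetic data set with n instances and d numeric attributes.
            points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
            for k in cluster_counts:
                start = time.process_time()
                kmeans(points, k)
                cpu_s = time.process_time() - start
                rows.append(
                    {"instances": n, "attributes": d, "clusters": k,
                     "cpu_s": cpu_s, "energy_j_est": cpu_s * ASSUMED_POWER_W}
                )
    return rows


results = sweep(instance_counts=[200, 400], attribute_counts=[2, 4], cluster_counts=[2, 4])
for row in results:
    print(row)
```

The same harness shape would apply to J48 or Apriori by swapping the algorithm call and the algorithm-specific parameter (tree size, or minimum support and confidence).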

  11. k-means (1) • Increasing the number of instances, with different numbers of produced clusters

  12. k-means (2) • Increasing the number of attributes, with different numbers of produced clusters

  13. Apriori (1) • Increasing the number of instances, with different numbers of attributes

  14. Apriori (2) • Increasing the data set size, with different numbers of rules

  15. Apriori (3) • Increasing the data set size with different minimum confidence

  16. J48 • Increasing the number of instances, with different numbers of attributes

  17. Results on different devices • Results obtained with different smartphones • Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM • HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM

  18. Results on different devices • Results obtained with different smartphones • Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM • HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM

  19. Results on different devices • Results obtained with different smartphones • Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM • HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM • Samsung Galaxy Ace: 800 MHz Qualcomm processor and 512 MB RAM

  20. Concluding Remarks • Data-intensive applications demand energy cost models based on data characteristics. • Such models should be developed for sensors, smartphones, HPC servers, and clouds and, in general, for large-scale computing systems. • Sustainable data center services and applications may benefit from these models. • Preliminary experiments have produced useful data.

  21. Data Sets • Census (http://archive.ics.uci.edu/ml/datasets/Census+Income): used with k-means; data set size: 14 MB; number of instances: 244348; number of attributes: 11 • Census_disc (http://archive.ics.uci.edu/ml/datasets/Census+Income): used with Apriori; data set size: 19 MB; number of instances: 333011; number of attributes: 11 • Covertype (http://archive.ics.uci.edu/ml/datasets/Covertype): used with J48; data set size: 14.5 MB; number of instances: 114556; number of attributes: 55
