“Big Data”: The wrong name for a major issue?

Presentation Transcript


  1. “Big Data”: The wrong name for a major issue? Clive Longbottom, Service Director, Quocirca Ltd

  2. “Big Data” • It’s not about databases per se • It is about: • Volume – but not just databases • Velocity – results need to be produced in near real-time • Variety – the aspect that is missed by many • Veracity – how good are the inputs • Value – is the data worth it?

  3. Which of the following statements most closely matches your understanding of the term “big data”?

  4. How well do you believe that you understand what tools are needed for “big data”?

  5. From your point of view, big data can be dealt with through:

  6. How important do you believe big data will be to your organisation over the next 2 years?

  7. A basic “rule of thumb” • 20 years ago: • Only 20% of an organisation’s information was in electronic form • 80% of this was in a formal database • Today: • Well over 80% of an organisation’s information is in electronic form • Less than 20% is in a formal database
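
The rule of thumb is easier to read once the percentages are multiplied through. The short Python sketch below is illustrative only: it assumes that “80% of this” and “less than 20%” both refer to the share of electronic information held in a formal database, and it treats the slide’s approximate figures as exact boundary values.

```python
# Rough arithmetic for the slide's rule of thumb.
# Assumption: the database share is a fraction of the *electronic* information,
# and the slide's "well over" / "less than" figures are treated as exact.

def split_information(electronic_share: float, database_share_of_electronic: float) -> dict:
    """Split an organisation's information into three buckets, as fractions of the total."""
    in_database = electronic_share * database_share_of_electronic
    electronic_outside_database = electronic_share - in_database
    not_electronic = 1.0 - electronic_share
    return {
        "in_database": round(in_database, 2),
        "electronic_outside_database": round(electronic_outside_database, 2),
        "not_electronic": round(not_electronic, 2),
    }

then = split_information(electronic_share=0.20, database_share_of_electronic=0.80)
today = split_information(electronic_share=0.80, database_share_of_electronic=0.20)

print("20 years ago:", then)
print("Today:       ", today)
```

At those boundary figures, formal databases hold about 16% of all information in both periods, while electronic information outside any database rises from about 4% to about 64% (and to more than that, given “well over” and “less than”).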

  8. The enterprise application dilemma [Diagram: CRM, ERP and SCM applications, each with its own information silo]

  9. The growth of unstructured data • Not just text, but images, video and other media assets, VoIP, videoconferencing • Replicated/archived data is a large part of the growth • But is it completely unstructured? Source: Ram Subramanyam Gopalan

  10. File formatting • XML (or quasi-XML) • CSV/tab delimited • Text blocks • Metadata • TCP/IP packet header information • Pattern recognition • Colour, shape, texture (CST) • Inferred data
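
Slide 9 asks whether “unstructured” data is really completely unstructured, and the formats listed above suggest much of it is not. As a minimal sketch covering only the first two items (XML and CSV/tab delimited), Python’s standard csv and xml.etree.ElementTree modules can turn both into the same record shape. The sample data and field names are hypothetical; pattern recognition, CST analysis and inferred data would need considerably more than this.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs: a tab-delimited export and a small XML fragment.
TAB_DELIMITED = "order_id\tcustomer\tamount\n1001\tAcme\t250.00\n1002\tGlobex\t99.50\n"
XML_FRAGMENT = """<orders>
  <order id="1003" customer="Initech"><amount>410.00</amount></order>
</orders>"""

def records_from_tab_delimited(text: str) -> list[dict]:
    """Use the header row as field names and return one dict per data row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]

def records_from_xml(text: str) -> list[dict]:
    """Pull attributes and a child element's text out of each <order> element."""
    root = ET.fromstring(text)
    return [
        {"order_id": el.get("id"), "customer": el.get("customer"), "amount": el.findtext("amount")}
        for el in root.iter("order")
    ]

# Both sources now share one record shape and can be handled together.
records = records_from_tab_delimited(TAB_DELIMITED) + records_from_xml(XML_FRAGMENT)
for record in records:
    print(record)
```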

  11. The open “value chain” [Diagram: information flows linking supplier’s supplier, supplier, your organisation, customer and customer’s customer, plus “open” information from e.g. search engines and social networks]

  12. Organisation information sources • Organisation data: • Enterprise application data • Office documents • Reports, analytics • GRC (governance, risk and compliance) information • Information on competitors • Financial performance data • Images, voice, video… • …

  13. Supplier information sources • Supplier data: • Logistics data • Inventory data • Transactional data • Competitive information • Credit and background checks • Invoices, catalogues, contracts, images… • Voice, video… • …

  14. Customer information sources • Customer data: • Orders, payment details, returns information • Past purchases • Credit and background checks • Searches, web analytics • Social media comments • …

  15. Information issues • You no longer have control • The open value chain removes direct control • Security of information assets is critical • Identifying and aggregating information assets • Capturing information when and where possible (and legal) • Bringing structured and unstructured data together • Sifting through the dross to get to the “golden nuggets”

  16. Shrink and filter… • Information under your control: • Deduplicate • Taxonomise • Index • Tag • Information not under your control: • Filter (intelligently) • Tag and index when it crosses your boundaries
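
The steps for information under your control can be sketched in a few lines of Python. This is a minimal, in-memory illustration rather than a description of any product: exact duplicates are detected by SHA-256 content hash (near-duplicates would need fuzzier matching), tagging uses a hypothetical keyword taxonomy, and indexing is a simple inverted index. The same tag and index functions could, in principle, be applied to external information at the point it crosses your boundaries.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical taxonomy: tag name -> keywords that trigger it.
TAXONOMY = {
    "finance": {"invoice", "payment", "credit"},
    "logistics": {"shipment", "delivery", "inventory"},
}

def deduplicate(documents: dict[str, str]) -> dict[str, str]:
    """Keep the first document for each distinct content hash (exact duplicates only)."""
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for doc_id, text in documents.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[doc_id] = text
    return unique

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tag(text: str) -> set[str]:
    """Assign every taxonomy tag whose keywords appear in the text."""
    words = set(tokenize(text))
    return {name for name, keywords in TAXONOMY.items() if words & keywords}

def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: word -> ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)

# Hypothetical documents; "b" is an exact copy of "a".
docs = {
    "a": "Invoice 17: payment due on delivery",
    "b": "Invoice 17: payment due on delivery",
    "c": "Inventory report for warehouse 3",
}
unique_docs = deduplicate(docs)
tags = {doc_id: tag(text) for doc_id, text in unique_docs.items()}
index = build_index(unique_docs)
print(sorted(unique_docs), tags, sorted(index["invoice"]))
```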

  17. Federate and aggregate • Link databases • Use master data management • Bring in unstructured data • Use Hadoop along with NoSQL datastores (e.g. Cassandra, MongoDB) • Use cross-function search and reporting tools • E.g. HP Autonomy, CommVault Simpana • Use analytics to present results in meaningful ways
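
Master data management is largely about recognising that records in different silos describe the same entity and combining them into one trusted record. The Python sketch below is a deliberately simplified illustration, not how any particular MDM product works: it assumes an email address is a reliable matching key, and it uses a hypothetical survivorship rule in which earlier sources take priority and later sources only fill empty fields. The CRM and ERP data are invented for the example.

```python
# Hypothetical records describing overlapping customers in two silos.
CRM = [
    {"email": "Jo.Smith@Example.com", "name": "Jo Smith", "phone": ""},
    {"email": "li.wei@example.org", "name": "Li Wei", "phone": "555-0102"},
]
ERP = [
    {"email": "jo.smith@example.com ", "name": "J. Smith", "phone": "555-0101", "credit_limit": "5000"},
]

def match_key(record: dict) -> str:
    """Assumed matching rule: same email address, ignoring case and stray spaces."""
    return record["email"].strip().lower()

def merge(sources: list[tuple[str, list[dict]]]) -> dict[str, dict]:
    """Build one master record per key.

    Hypothetical survivorship rule: sources listed first take priority,
    and later sources only fill fields that are still empty.
    """
    masters: dict[str, dict] = {}
    for source_name, records in sources:
        for record in records:
            key = match_key(record)
            master = masters.setdefault(key, {"sources": []})
            master["sources"].append(source_name)
            for field_name, value in record.items():
                if field_name == "email":
                    value = key  # store the normalised form
                if value and not master.get(field_name):
                    master[field_name] = value
    return masters

masters = merge([("CRM", CRM), ("ERP", ERP)])
for key, record in masters.items():
    print(key, record)
```

Real matching usually combines several attributes and fuzzy comparison; a single normalised key simply keeps the idea visible.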

  18. Basic schematic approach [Diagram elements: Filter; Apply metadata; MapReduce; App; SQL; NoSQL; Search, analyse and report]
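
Assuming the schematic is read in the order its elements are listed (filter, apply metadata, then MapReduce), that flow can be illustrated with a single-process Python sketch. It is not Hadoop code and assumes nothing about Hadoop’s APIs; it only mimics the map, shuffle and reduce phases that a MapReduce framework would distribute across a cluster. The sample comments, the INTERNAL_SOURCES set and the internal/external metadata field are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw inputs: comments collected from several channels.
RAW = [
    {"source": "twitter", "text": "Love the new Model X printer"},
    {"source": "forum", "text": "   "},  # carries no content, so the filter drops it
    {"source": "email", "text": "Model X printer jammed again, printer support?"},
]
# Hypothetical: channels treated as inside the organisation's boundary.
INTERNAL_SOURCES = {"email"}

def filter_items(items: Iterable[dict]) -> Iterator[dict]:
    """Filter: drop items with no usable text."""
    return (item for item in items if item["text"].strip())

def apply_metadata(items: Iterable[dict]) -> Iterator[dict]:
    """Apply metadata: tag each item as internal or external by its channel."""
    for item in items:
        scope = "internal" if item["source"] in INTERNAL_SOURCES else "external"
        yield {**item, "scope": scope}

def map_phase(item: dict) -> Iterator[tuple[str, int]]:
    """Map: emit ('scope:word', 1) for every word in one item."""
    for word in re.findall(r"[a-z0-9]+", item["text"].lower()):
        yield f"{item['scope']}:{word}", 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (a framework does this between phases)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)

prepared = apply_metadata(filter_items(RAW))
intermediate = (pair for item in prepared for pair in map_phase(item))
counts = dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())
print(counts["internal:printer"], counts["external:printer"])
```

The resulting counts are the kind of output that would then feed the “search, analyse and report” stage.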

  19. A future glimpse? • It’s déjà vu all over again • Remember in-memory databases? • Big data cannot remain a jigsaw solution • Full-service solutions will emerge • Who will be the winners? • Oracle, IBM, Microsoft? • SAP? • EMC, Symantec? • The open source environment (e.g. 10gen, Apache Cassandra, CouchDB)?

  20. Conclusions • Big data has many vectors • Volume, velocity, variety and veracity: each is as important as the others, and value will accrue from getting them right • More information now lies outside your direct control • Capturing what can be captured in a useful manner is key • The market is evolving rapidly • NoSQL and Hadoop provide the underpinnings for a new, information-centric approach • The formal database is not dead • But it is only one aspect of the problem, and of the solution

  21. Thank you Contact details: Clive.Longbottom@Quocirca.com Further reading: http://quocirca.com/reports/150 http://quocirca.com/articles/617 http://quocirca.com/articles/637