
IS6600-10

IS6600-10. Big Data, Intelligence & Surveillance. Hype, Reality or …?. Purpose. The purpose of this class is to introduce the concept of Big Data, examine its potential and value for organisations and governments, as well as the downside effects on privacy

tamas

Presentation Transcript


  1. IS6600-10 Big Data, Intelligence & Surveillance

  2. Hype, Reality or …?

  3. Purpose • The purpose of this class is to introduce the concept of Big Data, examine its potential and value for organisations and governments, as well as the downside effects on privacy • I also hope to stimulate your own thinking about Big Data – and how it affects you

  4. Basics • Big Data refers to the vast quantities of data that businesses and governments gather • This data is believed to contain useful, actionable intelligence that could lead to: • Process efficiencies • Lower costs • Higher profits • Identification of terrorist threats and plans • What is needed is the will and expertise to perform the relevant analysis.

  5. How Big is Big? • It depends on how quickly you can access and process data (with normal database management tools) • For a small company, hundreds of gigabytes could be big. For a larger company, it might take hundreds of terabytes • 1 terabyte = 1000 gigabytes • 1 petabyte = 1000 terabytes • 1 exabyte = 1000 petabytes • 1 zettabyte = 1000 exabytes • 1 yottabyte = 1000 zettabytes
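
The unit ladder on this slide can be sketched as a small conversion helper. This is a minimal illustration that assumes the decimal (factor of 1000) convention the slide uses; the function name `convert` is made up for this example.

```python
# Decimal storage units in ascending order; each step up is a factor of 1000,
# following the convention on the slide (not the binary 1024 convention).
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte", "zettabyte", "yottabyte"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between two of the units listed in UNITS."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


# "Hundreds of terabytes" for a larger company, expressed in gigabytes.
print(convert(300, "terabyte", "gigabyte"))
# One exabyte expressed in terabytes.
print(convert(1, "exabyte", "terabyte"))
```

Under the binary convention (factor of 1024) the same quantities come out slightly larger, which is why figures quoted for one data set sometimes disagree by a few percent.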

  6. Size Contexts • Some areas of science generate huge amounts of data: • Meteorology (weather forecasting) & Remote Sensing • Genomics (genome sequencing) • Physics, e.g. CERN • 150 million sensors each deliver data 40 million times per second • Even though less than 0.001% of the sensor data is kept, that still amounts to 25 petabytes a year • If all the data were recorded, it would be 500 exabytes a day – 200 times more than all other global data sources combined • Social data, RFID data • Surveillance – NSA & GCHQ
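
The CERN figures can be checked with some back-of-envelope arithmetic. The sketch below takes the numbers on the slide at face value and assumes decimal units and a 365-day year; the variable names are illustrative only.

```python
# Figures quoted on the slide.
SENSORS = 150_000_000
READINGS_PER_SENSOR_PER_SECOND = 40_000_000

# Total sensor readings delivered every second.
readings_per_second = SENSORS * READINGS_PER_SENSOR_PER_SECOND

# Raw data volume if everything were recorded: 500 exabytes per day.
PB_PER_EB = 1000
raw_pb_per_year = 500 * PB_PER_EB * 365

# Volume actually kept: 25 petabytes per year.
kept_pb_per_year = 25
kept_fraction = kept_pb_per_year / raw_pb_per_year

print(f"readings per second: {readings_per_second:.1e}")
print(f"raw volume: {raw_pb_per_year:,} PB per year")
print(f"fraction kept: {kept_fraction:.2e}")
```

On these numbers the fraction kept works out to roughly 0.00001% of the raw stream, so the 0.001% on the slide is best read as a generous upper bound rather than an exact share.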

  7. The History • Big Data is not a new topic • Data has been getting bigger continually ever since the first byte was created • It is related to storage capacity and processing power – which also keep growing continually • Over the last 25 years, many governments have attempted to consolidate data holdings into single databases controlled by single parties • National ID Schemes • National Health Records Management

  8. Corporate Examples • Amazon handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. • Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (about 2,500 terabytes) of data • Facebook handles 50 billion photos. • TaoBao & Alibaba – again, billions of transactions • Consumer profile databases, Loyalty Cards, Octopus

  9. Ford • http://www.datanami.com/datanami/2013-03-16/how_ford_is_putting_hadoop_pedal_to_the_metal.html • Ford’s modern hybrid Fusion model generates up to 25 gigabytes of data per hour • This data is a potential goldmine for Ford, as long as it can find the right analytical tools for the job. • The data can be used to: • understand driving behaviours and reduce accidents • understand wear and tear • identify issues early and so lower maintenance costs • avoid collisions • But who should own the data? Ford? The car owner?

  10. Needles & Haystacks • The volume of data is huge, beyond imagination, and the consultants and software firms want us to believe that somewhere, if you can find them, there may be some needles – pieces of actionable intelligence

  11. Who is Pushing Big Data? • IBM! • Because they want to sell you their software that (they claim) will help you to analyse the data and find the needles • Consultants stand to make millions, by panicking their clients into spending on software solutions • Globally, this is a US$100 billion industry, growing 10% a year

  12. Is Everyone Happy? • The consultants suggest not. Accenture: • 22% of companies are very satisfied • 35% are quite satisfied • 34% are dissatisfied • 39% say that they have data that is relevant to their business strategy • Big data can be useful – if you know what to look for and how to get that ‘intelligence’ to the people who can use it

  13. Consultant Perspectives • Companies have lots of data, but “most organisations measure too many things that don’t matter and don’t put sufficient focus onto the things that do” (Accenture). • “Companies are buried in information” and are struggling to use it (McKinsey) • The more data they have, the less they seem to know!

  14. Then What Should the Companies Do? • Spend more money (say the consultants) • “a large investment in new data capabilities” • McKinsey • “embed analytics into business processes” • Accenture • Alternatively • Go and ask people what they think is happening! • Ask your lost customers why they got lost! • A survey or big data analytics won’t tell you why.

  15. Gartner’s Hype Cycle

  16. Big Data and Intelligence • One of the highest impact news stories since June 2013 has concerned the secret surveillance activities of the NSA and GCHQ agencies – as revealed by Edward Snowden • These surveillance activities are fundamentally about big data and analytics, just as they are also about privacy and security, espionage and politics

  17. Key Terms • NSA – National Security Agency (US) • GCHQ – Government Communications Headquarters (UK) • Prism, Tempora, XKeyscore, Bullrun • Programmes and systems that collect, store, retrieve and analyse the data • The Guardian • UK newspaper that first published the stories • Patriot Act • US anti-terrorism Act passed after the attacks of 11 September 2001 http://en.wikipedia.org/wiki/Patriot_Act

  18. Government’s Perspective • Looking for needles in the metadata • Phone numbers, call duration & frequency • Global patterns that may involve terrorism • If a bombing in India can be matched to a sudden increase of calls in another country, that might be of interest • To be effective, they need as much data as possible – in short, everything.
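
The idea of matching an event to "a sudden increase of calls" can be illustrated with a toy spike detector over call metadata. This is only a sketch under stated assumptions: the daily call counts are invented, the function `flag_spikes` and its parameters are hypothetical, and nothing here describes how any agency actually analyses metadata.

```python
from statistics import mean, pstdev


def flag_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose call count sits more than `threshold`
    standard deviations above the mean of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        # Floor the spread at 1.0 so a perfectly flat history cannot divide by zero.
        sigma = max(pstdev(history), 1.0)
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Invented daily counts of calls between two countries; day 9 is the surge.
calls = [100, 104, 98, 101, 99, 103, 97, 102, 100, 350, 101]
print(flag_spikes(calls))
```

Only counts per day are used here, not call content, which is the sense in which the slide speaks of needles in the metadata. A flagged day is merely a candidate for a closer look, since spikes also follow holidays, sporting events and network outages.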

  19. The Surveillance Picture • Edward Snowden has leaked a LOT of information • The stories are still coming. We have learned a LOT about what governments do – with their own citizens’ data, and with data from other countries • You may recall stories about data being captured in Hong Kong and China from the Chinese University and Tsinghua University Internet hubs • http://www.reuters.com/article/2013/06/24/us-usa-security-tsinghua-idUSBRE95N0M220130624 • This is a series of events of global proportions • We should not be surprised at anything any more • If they want to collect something, anything at all, then they can and will.

  20. Selected Events • Publication of a top-secret court order requiring Verizon to hand over the call records of all its customers • http://www.theguardian.com/world/2013/jul/19/nsa-extended-verizon-trawl-through-court-order • Orders for all other telecoms firms also existed • Large-scale collection of data without individual warrants • Prism • http://en.wikipedia.org/wiki/PRISM_(surveillance_program)

  21. Prism • A system that gives the NSA access to the personal information of non-US people held by US Internet companies • Apple, Facebook, Google, Microsoft, Skype, Yahoo,… • These companies always claimed that they protected individual privacy, but … it seems that this was not the case • However, they were legally required to say nothing – the court orders prohibited them from saying anything about their data sharing with the NSA • Data was also obtained by cable tapping • Metadata & content from 4 US telecoms providers’ cables

  22. Facebook • During Jan-June 2013, governments requested info on 38,000 Facebook users • 11,000 + from the US (79% compliance) • 4000+ from India (50% compliance) • 170 from Turkey (47% compliance) • 11 from Egypt (0% compliance) • http://www.theguardian.com/technology/2013/aug/27/facebook-government-user-requests

  23. XKeyscore • This is the data retrieval system used to collect, process and search the data • http://en.wikipedia.org/wiki/XKeyscore • It allows an NSA analyst to query “nearly everything a typical user does on the Internet” in near-real time, including: • Email content • Websites visited and searches • Metadata • In theory these systems were designed to analyse data about foreigners, but many Americans were also included in the databases

  24. GCHQ • This is the UK government agency that deals with signals intelligence and telecommunications • http://www.gchq.gov.uk • http://en.wikipedia.org/wiki/Government_Communications_Headquarters • Access to Prism since 2010 • Operates Tempora, similar to Prism, for collecting data from the Internet and telecommunications networks.

  25. GCHQ • In 2009, GCHQ spied on foreign politicians visiting the UK for a G20 summit • Eavesdropping on phone calls and emails • Monitoring computers • Installing keyloggers and then tracking activities post-summit • Turkish Finance Minister (Simsek) • Russian leader (Medvedev) • Purpose – Economic/Political Intelligence

  26. Tempora • Much of the data is harvested from Internet cables that enter the UK (GBs-TBs per second) • 300 GCHQ and 250 NSA analysts are involved • Telephone calls, email messages, Facebook entries, personal Internet history, IM chats, passwords • Cooperation with private telecoms companies • Content held for 3 days, metadata for 30 days • http://en.wikipedia.org/wiki/Tempora • http://www.theguardian.com/uk/2013/jun/21/gchq-cables-secret-world-communications-nsa

  27. Bullrun • NSA and GCHQ spend millions developing programmes that can break the cryptography behind Internet security protocols such as HTTPS and SSL • They also work directly with the telecom providers to ensure that they have backdoors that help them to access data that clients think is private/secret • There are no secrets! • http://www.theguardian.com/world/2013/sep/05/nsa-gchq-encryption-codes-security

  28. Collusion or Legal Obligation? • One defence offered by the private companies that hold the data (whether in databases or as ISPs) is that they are required to obey the law of the countries in which they operate • They have no choice – they must hand over the data, or cooperate with the security agencies • Also, they cannot reveal that they are cooperating – they are gagged from revealing the existence of the Prism/Tempora/Bullrun systems

  29. Payouts • GCHQ and NSA work together, sharing each other’s data • NSA subsidises GCHQ’s costs by millions of pounds (GBP) annually • http://www.theguardian.com/uk-news/2013/aug/01/nsa-paid-gchq-spying-edward-snowden • NSA benefits because GCHQ operates under less strict operating & oversight rules • NSA expects returns… reports, intelligence.

  30. Problems • Big data is HUGE – there is simply too much data to collect and analyse • GCHQ may collect up to 20% of the actual data flow • Big data is getting bigger • Cables that carry hundreds of GB/second make that task harder still • As always, 99.999% of the data is not useful. • Can you find the 0.001% that might be?
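
The scale of the problem on this slide can be made concrete with some rough arithmetic. The sketch assumes a link rate of 100 GB per second (the slide only says "hundreds of GB/second"), the 20% capture share and the 0.001% useful share quoted above; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volumes_gb(link_rate_gb_per_s, capture_share=0.20, useful_share=0.00001):
    """Return (total, captured, useful) gigabytes per day for one link.

    capture_share is the 20% collection figure from the slide;
    useful_share is 0.001% written as a fraction.
    """
    total = link_rate_gb_per_s * SECONDS_PER_DAY
    captured = total * capture_share
    useful = captured * useful_share
    return total, captured, useful


# Assumed link rate of 100 GB/s, chosen only for illustration.
total, captured, useful = daily_volumes_gb(100)
print(f"total flow:  {total:,.0f} GB/day")
print(f"captured:    {captured:,.0f} GB/day")
print(f"useful part: {useful:,.2f} GB/day")
```

On these assumptions a single link yields well over a petabyte of captured data per day, of which only a few tens of gigabytes would be of any use, and nothing marks in advance which gigabytes those are.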

  31. Reactions • There have been attempts to stop media organizations from reporting on the surveillance programmes • Computers owned by the Guardian newspaper were physically destroyed in an attempt to remove the data & prevent further publication • Additional copies are held in Brazil and the US • http://www.wired.com/threatlevel/2013/08/guardian-snowden-files-destroyed/

  32. Implications for Individuals • Is your data being harvested? • It seems likely. • Are your private communications, including online purchases, secure? • Not very. • Are you protected by data privacy laws? • Not against governments. • Perhaps against private companies. • http://www.pcpd.org.hk/

  33. Questions • What kind of data is being collected? • Where, by whom, and for what purposes? • Can we see/find (some of) the data anywhere? • Are you personally at risk? • That depends on who you are, what you do, who you talk to and what about. • Should we be concerned? • Is there anything we can do as individuals, as decision makers, as companies? • http://www.theguardian.com/world/2013/sep/05/nsa-how-to-remain-secure-surveillance • Or is it more sensible just to get on with our lives? • Do some Internet research now and try to answer some of these questions.
