Big Data

Explore the advantages of big data in terms of volume, velocity, variety, and veracity, and how it has transformed analytics. Discover the potential of non-relational databases for managing big data effectively.

Presentation Transcript


  1. Big Data. Minder Chen, Ph.D., Professor of MIS, CSU Channel Islands, minder.chen@csuci.edu

  2. Benefits of Big Data http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg

  3. Data Management https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf https://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/

  4. With Big Data, We’ve Moved into a New Era of Analytics
     • Volume: 12+ terabytes of Tweets created daily.
     • Velocity: 5+ million trade events per second.
     • Variety: 100’s of different types of data.
     • Veracity: Only 1 in 3 decision makers trust their information.
     Ömer Sever (omers@tr.ibm.com), IBM SWG TR, Enterprise Content Management

  5. Four Characteristics of Big Data
     • Cost-efficiently processing the growing Volume (50x, 35 ZB, 2010 to 2020).
     • Responding to the increasing Velocity: 30 billion RFID sensors and counting.
     • Collectively analyzing the broadening Variety: 80% of the world’s data is unstructured.
     • Establishing the Veracity of big data sources: 1 in 3 business leaders don’t trust the information they use to make decisions.
     Ömer Sever (omers@tr.ibm.com), IBM SWG TR, Enterprise Content Management

  6. Volume http://www.ibmbigdatahub.com/infographic/four-vs-big-data

  7. Metric prefixes in everyday use https://en.wikipedia.org/wiki/Unit_prefix
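
To put the figures on the earlier slides on one scale (12+ terabytes of Tweets, 35 ZB of data), here is a minimal sketch of converting between decimal metric prefixes. The `SI_PREFIXES` table and the `convert` helper are names made up for this illustration; the powers of ten are the standard SI values.

```python
# Decimal (SI) prefixes as powers of ten, applied here to bytes.
SI_PREFIXES = {
    "k": 10**3,   # kilo
    "M": 10**6,   # mega
    "G": 10**9,   # giga
    "T": 10**12,  # tera
    "P": 10**15,  # peta
    "E": 10**18,  # exa
    "Z": 10**21,  # zetta
    "Y": 10**24,  # yotta
}


def convert(value, from_prefix, to_prefix):
    """Convert between prefixes with integer math (fractions are truncated)."""
    return value * SI_PREFIXES[from_prefix] // SI_PREFIXES[to_prefix]


zettabytes_in_terabytes = convert(35, "Z", "T")
print(f"35 ZB = {zettabytes_in_terabytes:,} TB")  # 35 ZB = 35,000,000,000 TB
```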

  8. Variety http://www.ibmbigdatahub.com/infographic/four-vs-big-data

  9. Velocity

  10. 3Vs of Big Data

  11. The 4th V: Veracity

  12. The 5th V: Value http://www.ibmbigdatahub.com/infographic/extracting-business-value-4-vs-big-data

  13. Relational Database
      Here are a few reasons you might choose an SQL database:
      • You need to ensure ACID compliance (Atomicity, Consistency, Isolation, Durability). ACID compliance reduces anomalies and protects the integrity of your database by prescribing exactly how transactions interact with the database. Generally, NoSQL databases sacrifice ACID compliance for flexibility and processing speed, but for many e-commerce and financial applications, an ACID-compliant database remains the preferred option.
      • Your data is structured and unchanging. If your business is not experiencing massive growth that would require more servers and you’re only working with data that’s consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume.
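
The atomicity part of ACID can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not part of the original slides: the `accounts` table, the account names, and the `transfer` and `balances` helpers are all assumptions. A transfer that would overdraw the sender fails a `CHECK` constraint, and the whole transaction, including the credit that already ran, is rolled back.

```python
import sqlite3

# In-memory database; table and account names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (name, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 50)],
)
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move `amount` between accounts as one all-or-nothing transaction."""
    try:
        # `with conn` commits on success and rolls back on any exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances unchanged, which is the "no partial updates" guarantee that e-commerce and financial applications rely on.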

  14. NoSQL
      https://www.mongodb.com/nosql-explained
      https://www.kidscodecs.com/database-design/
      https://www.analyticsvidhya.com/blog/2015/06/beginners-guide-mongodb/
      Non SQL, Non Relational, Not only SQL. NoSQL databases disrupted the database market by offering a more flexible, scalable, and less expensive alternative to relational databases. They were also built to better handle the requirements of Big Data applications.
      Examples:
      • MongoDB

  15. NoSQL
      Here are a few reasons you might choose a NoSQL database:
      • Storing large volumes of data that often have little to no structure. A NoSQL database sets no limits on the types of data you can store together, and allows you to add new types as your needs change. With document-based databases, you can store data in one place without having to define what “types” of data those are in advance.
      • Making the most of cloud computing and storage. Cloud-based storage is an excellent cost-saving solution, but requires data to be easily spread across multiple servers to scale up. Using commodity (affordable, smaller) hardware on-site or in the cloud saves you the hassle of additional software, and NoSQL databases like Cassandra are designed to be scaled across multiple data centers out of the box without a lot of headaches.
      • Rapid development. If you’re developing within two-week Agile sprints, cranking out quick iterations, or needing to make frequent updates to the data structure without a lot of downtime between versions, a relational database can slow you down. NoSQL data doesn’t need to be prepped ahead of time.
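
The "no types defined in advance" point can be sketched with plain Python dictionaries standing in for documents. This is an illustration only, not a real database client: the `collection` list, the `find` helper, and every field name are assumptions.

```python
import json

# A stand-in for a document collection: a plain list of dicts.
# All field names and values below are invented for illustration.
collection = []

collection.append(
    {"_id": 1, "type": "tweet", "text": "Big data!", "hashtags": ["bigdata"]}
)
collection.append(
    {"_id": 2, "type": "sensor", "rfid": "A1-77", "temperature_c": 21.5}
)
# A later requirement adds a new field; earlier documents are left untouched.
collection.append(
    {"_id": 3, "type": "tweet", "text": "Hello", "hashtags": [], "lang": "en"}
)


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


tweets = find(collection, type="tweet")
fields_seen = sorted({key for doc in collection for key in doc})
print(json.dumps(tweets, indent=2))
```

Documents with different shapes sit side by side, and a query on a field that some documents lack (for example `find(collection, lang="en")`) simply skips them rather than failing.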

  16. Relational Database vs. NoSQL
      NoSQL databases differ from relational databases in four main areas:
      • Data models: A NoSQL database lets you build an application without having to define the schema first, unlike relational databases, which make you define your schema before you can add any data to the system. Having no predefined schema makes NoSQL databases much easier to update as your data and requirements change.
      • Data structure: Relational databases were built in an era when data was fairly structured and clearly defined by its relationships. NoSQL databases are designed to handle unstructured data (e.g., texts, social media posts, video, email), which makes up much of the data that exists today.
      • Scaling: It’s usually much cheaper to scale a NoSQL database than a relational database because you can add capacity by scaling out over cheap, commodity servers. Relational databases, on the other hand, have traditionally been hosted on a single server, so to scale you need to buy a bigger, more expensive server.
      • Development model: Many NoSQL databases are open source, whereas many commercial relational databases are closed source with licensing fees built into the use of their software (open-source relational databases such as MySQL and PostgreSQL are exceptions). With an open-source NoSQL database, you can get started on a project without any heavy upfront investment in software fees.
      https://www.mongodb.com/scale/nosql-vs-relational-databases
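
Scaling out, as described in the scaling bullet, usually means partitioning keys across servers. The sketch below simulates that with in-memory dictionaries standing in for servers. `shard_for` and `ShardedStore` are hypothetical names, and the simple hash-modulo placement is an assumption chosen for brevity: production systems typically use schemes such as consistent hashing, so that adding a server does not move most keys.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Key-value store spread over several dicts, one per simulated server."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})

# Each simulated server holds only part of the data.
print([len(shard) for shard in store.shards])
```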

  17. https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/ https://www.analyticsvidhya.com/blog/2015/06/beginners-guide-mongodb/

  18. Data Model
      A user has friends, each of whom may also be a user. People who have liked a post, commented on it, or both may be users as well. This kind of duplication makes it much harder to denormalize an activity stream into a single document. https://www.analyticsvidhya.com/blog/2015/06/beginners-guide-mongodb/
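
The duplication problem can be sketched by comparing an activity that embeds copies of user records with one that stores only user IDs. Everything here is hypothetical (the `users` dictionary, the activity fields, and the `resolve` helper); it is meant only to show why embedded copies go stale.

```python
# Hypothetical users; names and IDs are invented for illustration.
users = {
    "u1": {"name": "Ann"},
    "u2": {"name": "Raj"},
}

# Denormalized: the activity embeds a full copy of each user involved.
activity_embedded = {
    "post": "Hello world",
    "author": dict(users["u1"]),
    "liked_by": [dict(users["u2"])],
}

# Referenced: the activity stores user IDs only.
activity_referenced = {
    "post": "Hello world",
    "author": "u1",
    "liked_by": ["u2"],
}


def resolve(activity, users):
    """Replace user IDs with the current user records at read time."""
    return {
        "post": activity["post"],
        "author": users[activity["author"]],
        "liked_by": [users[uid] for uid in activity["liked_by"]],
    }


# Raj changes his display name once, in the users collection.
users["u2"]["name"] = "Rajesh"

stale_name = activity_embedded["liked_by"][0]["name"]  # still the old copy
fresh_name = resolve(activity_referenced, users)["liked_by"][0]["name"]
print(stale_name, fresh_name)  # Raj Rajesh
```

The embedded version reads in one step but needs every copy updated when a user changes; the referenced version stays consistent at the cost of extra lookups.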

  19. Types of NoSQL https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/
      • Key-value model: the least complex NoSQL option, which stores data in a schema-less way that consists of indexed keys and values. Examples: Azure Table Storage, LevelDB, and Riak.
      • Column store (or wide-column store): stores data tables as columns rather than rows. It’s more than just an inverted table: sectioning out columns allows for excellent scalability and high performance. Examples: HBase, BigTable, HyperTable, and Cassandra.
      • Document database: takes the key-value concept and adds more complexity. Each document in this type of database has its own data and its own unique key, which is used to retrieve it. It’s a great option for storing, retrieving, and managing data that’s document-oriented but still somewhat structured. Examples: MongoDB, CouchDB.
      • Graph database: suited to data that’s interconnected and best represented as a graph. This model can capture a great deal of complexity. Example: Neo4j.
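
The four models above can be approximated with ordinary Python data structures. This is a rough sketch to show the difference in shape, not how any of the named products store data internally; all keys, values, and the `friends_of_friends` helper are invented.

```python
# Key-value: opaque values looked up by key.
key_value = {"session:42": b"\x01\x02", "user:1:name": "Ann"}

# Wide-column: row key -> column family -> column -> value.
# Rows need not share the same columns.
wide_column = {
    "user:1": {"profile": {"name": "Ann", "city": "Oslo"}, "stats": {"logins": 12}},
    "user:2": {"profile": {"name": "Raj"}},
}

# Document: a self-describing nested record retrieved by its own key.
documents = {
    "user:1": {"name": "Ann", "cars": ["Ford", "BMW"], "address": {"city": "Oslo"}},
}

# Graph: nodes plus explicit edges (here, who follows whom).
follows = {"Ann": {"Raj"}, "Raj": {"Mei"}, "Mei": set()}


def friends_of_friends(graph, start):
    """People reachable in two hops, excluding direct links and the start."""
    direct = graph.get(start, set())
    two_hops = set()
    for person in direct:
        two_hops |= graph.get(person, set())
    return two_hops - direct - {start}


print(friends_of_friends(follows, "Ann"))  # {'Mei'}
```

The graph query is the one that is awkward in the other three shapes, which is why relationship-heavy data tends to be singled out for graph databases.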

  20. https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/

  21. Column Store
      • In a Column Store database, data is stored in columns, in contrast to being stored in rows as is done in most relational database management systems.
      • A Column Store comprises one or more Column Families that logically group specific columns of the database. A key is used to identify and point to a number of columns, with a keyspace attribute that defines the scope of this key. Each column contains tuples of name-value pairs, ordered and comma separated.
      • Column Stores have fast read/write access to the information. Rows that correspond to a single column are stored as a single disk entry. This means faster access during read/write operations.
      • The most popular databases that use the column store include Google’s BigTable, HBase, and Cassandra.
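
The row-versus-column distinction can be illustrated by pivoting a small table. This is a simplified sketch of the layout idea only, with invented data and a hypothetical `to_columns` helper; it assumes every row has the same fields and says nothing about on-disk formats.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Oslo", "sales": 120},
    {"id": 2, "city": "Lima", "sales": 80},
    {"id": 3, "city": "Oslo", "sales": 200},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented: all values of one column sit next to each other.
columns = to_columns(rows)

# Reading one column touches only that column's values.
total_sales = sum(columns["sales"])
print(columns["city"], total_sales)  # ['Oslo', 'Lima', 'Oslo'] 400
```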

  22. RDBMS vs. NoSQL

  23. JSON (JavaScript Object Notation)
      {
          "name": "John",
          "age": 30,
          "cars": {
              "car1": "Ford",
              "car2": "BMW",
              "car3": "Fiat"
          }
      }
      {"name":"John","age":30, "cars":[ "Ford", "BMW", "Fiat" ]}
      http://json.org/example.html
      https://www.w3schools.com/js/js_json_objects.asp
      Retrieving and Updating JSON Objects in SQL Server 2016 (link)
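
The two JSON forms on this slide can be read with Python's standard `json` module. This is a small sketch; the variable names are arbitrary, and the added "Volvo" value is invented to show an update.

```python
import json

# Cars as a nested object with named members.
as_object = json.loads(
    '{"name": "John", "age": 30, '
    '"cars": {"car1": "Ford", "car2": "BMW", "car3": "Fiat"}}'
)

# Cars as an ordered array.
as_array = json.loads('{"name":"John","age":30,"cars":["Ford","BMW","Fiat"]}')

# Objects are addressed by member name, arrays by zero-based position.
second_car_from_object = as_object["cars"]["car2"]
second_car_from_array = as_array["cars"][1]

# Update the parsed structure and serialize it back to JSON text.
as_array["cars"].append("Volvo")
round_trip = json.dumps(as_array)
print(second_car_from_object, second_car_from_array, round_trip)
```

The array form is usually the better fit for a list of similar items, since it preserves order and does not need invented member names such as `car1`.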
