
Expanding Big Data Science: Forward & Backward




Presentation Transcript


  1. Expanding Big Data Science: Forward & Backward C. Randall (Randy) Howard, Ph.D., PMP Big Data Scientist, Thought Leader, Systems Innovation Analyst, Solutions Architect Sr. Data Scientist, Novetta Solutions Adjunct Professor, Mason’s Volgenau School of Engineering choward@gmu.edu http://www.crhphdconsulting.net/ May 20, 2014 April 4, 2013 Technology Trends, Big Data and Data-Driven Decisions

  2. C. Randall (Randy) Howard, Ph.D., PMP • Senior Data Scientist, Novetta Solutions • Adjunct Professor, Volgenau School of Engineering, GMU • Big Data Overview • Systems Analysis & Design → Determining Needs in Big Data • Big Data, Small Details & Time (Metadata) • 2013 Teaching Excellence Award Nominee • Co-Organizer of Big Data Lecture Series, EIT Award Nominee • Member, Data Science Working Groups & Sub-teams • International Author & Speaker • 30 years IT & systems engineering, architecture, trouble-shooting, change & innovation • Ph.D., Information Technology, GMU • BS, MS: Information Systems, VCU

  3. Agenda Context: What is Big Data All About? Forward: Considering Multiple Perspectives Backward: Refactor/Repurpose Legacy Approaches

  4. Context: What is Big Data Science All About?

  5. Context of Material • How was the big data collected? • Empirical Observations & Applications • Critical Thinking • Where is it stored? • Case Studies • Feverishly Codifying • Move from Rescuing to Preventing • What are the results? • Clarifying and Connecting Disparate, Contentious Pieces • Still Working…

  6. My Positions on Big Data • Big Data Science • Big Data: Problem & Opportunity Space • Data Science: Potential Solution Discipline • Big Data Science: “Applying Data Science to Big Data” • Technology “Reboot” CAN Usher in New Generation of Capabilities • Big Data Today • New “Big Data” Tomorrow • Must Clarify Business Value • Have To Think Horizontally & Corporately • But, I am a professor… • Heresy Now? Genius Tomorrow?

  7. IT Disasters & Dilemmas: Possible w/ Big Data? [IT-Failures] • Dilemmas: Economic Winter (Do more w/ Less); What is it? Exactly? • Disasters: NSA Trailblazer* ($1.2B: over-budget, ineffective, 7-yr boondoggle); FBI’s Trilogy Virtual Case File* ($170M: Scrapped); UK Inland Revenue* ($3.5B: Software Errors); Obama Care?; Ford’s Purchasing System* ($400M: Abandoned)

  8. My Big Concern!! The Hype Cycle [Gartner][Aiken]: • Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity. Often no usable products exist and commercial viability is unproven. • Peak of Inflated Expectations: Early publicity produces a number of success stories—often accompanied by scores of failures. Some companies take action; many do not. • Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters. • Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to crystallize and become more widely understood. Second- and third-generation products appear from technology providers. More enterprises fund pilots; conservative companies remain cautious. • Plateau of Productivity: Mainstream adoption starts to take off. Criteria for assessing provider viability are more clearly defined. The technology’s broad market applicability and relevance are clearly paying off. • Curve of Complacency [crh]: Early successes satisfy stakeholders that the problem or opportunity is handled, and it is time to move on to the next issue. Meanwhile the Plateau of Productivity that is achieved is much lower. Dr. C. Randall Howard, PMP (not a position of Gartner or Dr. Aiken - yet)

  9. Big Data & Data Science “1-Page Summary” • Big Data → “V”s [IBM]: • Volume (How much in total) • Variety (How many sources) • Velocity (How fast does it come in) • Veracity, Variability, Complexity, etc. [various] • “Hard” Data Science [various]: • Math, Science, Analytics • Data-Driven Organizations • Creating data products • Looking to the future • “Soft” Data Science? (Hold on) • Notional depiction [Conway]: creation & collection capabilities outpace data processing & analytical capabilities over time, leaving capability gaps due to surges in data collections (increases in sensors, social media, mobile data)

  10. Soft Data Science [crh] (changing the term to Tacit Data Science, but that’s another talk) • Goal: Shrink the Capability Gap (notional depiction: creation & collection capabilities vs. processing & analytical capabilities over time; see the sketch below) • Hardening the “Soft”: Automate the “Hard-to-Automate”, Predict the Predictable, To be Performed by Many w/ “Soft” & “Hard” Data Science • Effects of the gap: Backlogs increase exponentially, Signals become noise, “Action” windows lost / missed, We become bottlenecks to partners • “Soft Head Start” w/ “Hard” Data Science Alone: Notoriety to date, Performed by a few, Bottlenecked by a few?
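To make the notional capability-gap chart concrete, here is a minimal Python sketch, assuming illustrative growth rates that are not figures from the deck, of how a surge in collection outpaces processing capability and compounds the unprocessed backlog over time.

```python
# Minimal, notional sketch (not from the slides) of the capability gap the
# chart depicts: data collection surging faster than processing capability,
# so the unprocessed backlog compounds over time. Growth rates are invented
# for illustration only.

def capability_gap(periods=10, collected0=100.0, processed0=100.0,
                   collection_growth=0.40, processing_growth=0.10):
    """Yield (period, collected, processed, cumulative_backlog) per period."""
    collected, processed, backlog = collected0, processed0, 0.0
    for t in range(1, periods + 1):
        backlog += max(collected - processed, 0.0)   # unprocessed data carries over
        yield t, collected, processed, backlog
        collected *= 1 + collection_growth           # e.g., more sensors, social media, mobile
        processed *= 1 + processing_growth           # analytics capacity grows more slowly

if __name__ == "__main__":
    for t, c, p, b in capability_gap():
        print(f"period {t:2d}: collected={c:8.0f}  processed={p:8.0f}  backlog={b:9.0f}")
```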

  11. Big Data Science Value Parameters • Increased Actionable Intelligence • Trends Noticed / Confirmed • Leverage Unstructured • Faster Knowledge / Awareness / Ability to Search Data • Flexibility / Extensibility of Data Utilization • New, More Adaptable HW/SW Acquisition Models • More TBD

  12. Other Big Data Considerations • Capabilities → Their Own Separate ROI’s • Process Data w/in Acceptable Tolerances: • Time • Errors • Accuracy • Reliability • Etc. • Accountability: Find Critical Intelligence & Make Time Windows • Thus, Big Data Is “Having more data than you can process and manage within acceptable tolerances (e.g. time, quality, cost)” [crh]
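The [crh] definition above lends itself to a simple test: data is “big” for an organization whenever the work it demands exceeds the tolerances the mission allows. Below is a rough Python sketch of that idea; the Tolerances and WorkloadEstimate names and all thresholds are hypothetical, not part of the original material.

```python
# Rough sketch operationalizing the definition above: data is "big" for an
# organization when the work it demands exceeds the tolerances (time, error
# rate, cost) the mission allows. All names and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class Tolerances:
    max_hours: float        # acceptable processing window
    max_error_rate: float   # acceptable fraction of records failing validation
    max_cost: float         # acceptable processing cost per run

@dataclass
class WorkloadEstimate:
    hours_needed: float
    expected_error_rate: float
    estimated_cost: float

def is_big_data_for_us(est: WorkloadEstimate, tol: Tolerances) -> bool:
    """True when any estimate falls outside the acceptable tolerances."""
    return (est.hours_needed > tol.max_hours
            or est.expected_error_rate > tol.max_error_rate
            or est.estimated_cost > tol.max_cost)

# Example: a nightly 10-hour job against an 8-hour action window.
print(is_big_data_for_us(WorkloadEstimate(10, 0.01, 500),
                         Tolerances(max_hours=8, max_error_rate=0.02, max_cost=1000)))  # True
```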

  13. Forward: Considering Multiple Perspectives

  14. BDLS: A Broader Look Big Data Science • Each channel is difficult • Each complements the other • Complexities are compounded exponentially in cross-sections

  15. Multiple Perspectives in Publications • Multi-disciplinary [Gartner-ERDS] teams [Patil]: a “broad sample of the population” that involves “teams that frequently partner w/ diverse roles in an organization… to gather, organize, & make use of their data” [EMC-DS] • “Wetware [Gleichauf]” (vs. HW & SW): “People, their skillsets, corporate policies, & organizational structures that define our analytic communities” • Soft Skills [Gartner-ERDS]: • Communication • Collaboration • Leadership • Creativity • Discipline • Passion • A Data Scientist can be invaluable… a unique combination of technical & business skills… makes them difficult to find or cultivate. [Gartner-ERDS]

  16. Data Science Teams • Data Science Teams [Patil]: • Small-team members should sit close to each other • Mix of skill-sets, some experts, some not • Train people to fish • Functional areas must stay in regular contact and communication • Impediments: • Measuring Performance: Rewarding & Disciplining Teams vs. Individuals • Sharing Intellectual Property w/ Integrated Product Teams (esp. cross-vendor) • “Expert Teams”???? • May find Big Data Science trivial • Typically have more control over their environment & don’t need to have the masses engaged • But… most organizations need to have the knowledge & skills spread out to “Non-experts”

  17. Life-Cycle Service Orchestration (diagram: Acquisition (FAR), Legal Review, Life Cycle, OODA Loop)

  18. Classroom Exercise Findings

  19. Wicked Problems

  20. Wicked Problems Tip-off Words [Nixon]: Networked, Integrated, Joint, Shared, Multi-organizational, Interoperable, Coalition, Cross-organizational, Community, Combined, Virtual. Big Data is a Wicked Problem!

  21. Wicked Problems[Nixon] • Requires Multiple Stakeholders’ Perspectives • Key Driver: Social Complexity from Integrated Networks • Traditional linear solution styles are not well suited • Needs focus on: • Social Aspects • Gaining Shared Understanding • Try Things • Let Solution Emerge From Cycle of Adaptation • Thus[crh], • Multiple Perspectives Involves Collaboration • Collaboration Technologies MUST BE INNOVATED

  22. Sample Collaboration Innovation[InnovationGames]

  23. Sample Collaboration Innovation[InnovationGames] [InnovationGames] http://innovationgames.com/

  24. Learning Organizations

  25. Learning Organization [Senge] • Peter Senge (http://www.infed.org/thinkers/senge.htm) • Studied how adaptive capabilities developed • The Fifth Discipline (1990): ‘Learning Organization’ (LO) • Basic Learning Organization Disciplines: • Systems Thinking • Personal Mastery • Mental Models • Building Shared Vision • Team Learning

  26. Learning Organizations’ Disciplines

  27. Changing Culture

  28. Culture Obstacles[econBD]

  29. Changing Culture • Examples: • Hard-drives • Management Visibility of Data Processing • Target’s former CEO? • Leadership needs to foster a culture of: • Increased curiosity about data • Rewarding experimentation • Counting “Assists” • Need “democratization”, or open access, of data [Patil] • Or Horizontal Orientation / Governance of Data [crh] • Not trivial - Sharing data exposes risks of: • Misinterpretation • Loss of “credit” associated with results from the data

  30. Education

  31. Education • Establish a new baseline of knowledge to advance • Mason’s Big Data Lecture Series Purpose: • Separate Hype from Reality • Have marquee experts expose what in Big Data: • Is really working and making a difference? • Shows promise? • Has failed? Needs another try? • What are the impediments? • Convey that the daunting challenge is feasible, but still a challenge

  32. Big Data Adoption [IBM-Analytics]

  33. Learning Revolution [Robinson] • Big Data Science is a REVOLUTION that starts (& continues) w/ LEARNING • Requires new skills • New leadership models • http://www.ted.com/talks/sir_ken_robinson_bring_on_the_revolution.html

  34. Backward: Refactor / Repurpose Legacy Approaches

  35. What is Legacy? • What “brought us here” • Business Basics (e.g., Planning, ROI) • Structured Systems Analysis (e.g. Waterfall methodology, CMMI) • Yes, • Very Cumbersome • Have Failed too But… • Developed by Very Smart People • For Very Similar Issues • Been “Tested” So….. • Re-invent the Wheel? • To leverage: • Consider Context: Intent & Issues • Re-calibrate / Re-factor For Today • Come Back to “Common Sense”, What Works • Examples: • Meeting Management • Scaled Agile

  36. Enterprise Architecture • “Process of translating business vision and strategy into effective enterprise change by creating, communicating and improving the key requirements, principles and models that describe the enterprise's future state and enable its evolution.” [Gartner-EA] • Short: Simple Structure & Alignment of Technical & Business Capabilities So… • Take “Business Back to IT” [crh] • Maintain Line-of-Sight to Value [crh] • Focus on the Mission and Mission Capabilities!

  37. Capability Dependencies Hierarchy Example: Tool x requires staff time for training & learning
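One way to make a capability-dependencies hierarchy operational is to record it as a graph and order it so prerequisites (such as staff time for training & learning) surface before the capabilities that depend on them. The sketch below uses Python's standard graphlib module; the capability names are illustrative placeholders, not taken from the deck.

```python
# Hypothetical sketch of a capability-dependency hierarchy like the slide's
# example ("tool X requires staff time for training & learning"), expressed
# as a graph and ordered so prerequisites surface before the capabilities
# that depend on them. Capability names are illustrative only.

from graphlib import TopologicalSorter  # Python 3.9+

dependencies = {
    "actionable intelligence": {"analytics on tool X"},
    "analytics on tool X": {"tool X deployed", "staff trained on tool X"},
    "staff trained on tool X": {"staff time for training & learning"},
    "tool X deployed": {"infrastructure budget"},
}

# Static ordering: fund/train the prerequisites before expecting the capability.
print(list(TopologicalSorter(dependencies).static_order()))
# e.g. ['staff time for training & learning', 'infrastructure budget',
#       'staff trained on tool X', 'tool X deployed',
#       'analytics on tool X', 'actionable intelligence']
```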

  38. Strategic Planning Survey [Bain] • 14-year Compilation of: • 11 Surveys • 8,504 respondents • Chart data point, 2006: 88%, 3.93

  39. Strategy to Tactics Line-of-Sight[crh] • Establish Enterprise-wide Decision Criteria • Convey & Carry Commander’s Intent to Execution Levels
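A minimal sketch of enterprise-wide decision criteria in practice: weights are set once at the strategy level, and every tactical option is scored against the same weights, preserving line-of-sight to the commander's intent. The criteria, weights, and options below are invented for illustration, not taken from the deck.

```python
# Illustrative sketch of "enterprise-wide decision criteria": weight each
# criterion once at the strategy level, then score candidate tactical options
# against the same weights so every level keeps line-of-sight to intent.
# Weights and scores are made up.

CRITERIA_WEIGHTS = {           # set once, enterprise-wide
    "mission impact": 0.4,
    "time to capability": 0.3,
    "total cost": 0.2,
    "risk": 0.1,
}

def weighted_score(option_scores: dict) -> float:
    """Score an option (criterion -> 0..10) against the shared weights."""
    return sum(CRITERIA_WEIGHTS[c] * option_scores.get(c, 0.0) for c in CRITERIA_WEIGHTS)

options = {
    "build in-house": {"mission impact": 8, "time to capability": 4, "total cost": 5, "risk": 6},
    "buy COTS":       {"mission impact": 6, "time to capability": 8, "total cost": 7, "risk": 7},
}

for name, scores in sorted(options.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f}")
```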

  40. Engineering “Risky Art” Landscape • Most impactful, hardest to tame, most ignored • Least concrete, hardest to sell / prove • Needs the most “innovation attention”

  41. A Big Data Systems Analysis & Engineering “Success” Story • Lots of ways to do this; lots of requirements; lots of ways to get requirements across lots of different stakeholders & users • Big Data Lecture Series Fall 2012, Session 4: Solving the Risk Equation • Big Data Systems Analysis & Engineering “So-What”

  42. Wrapup

  43. Big Data Science Postulates[crh] • If Big Data Science is not a technology problem, then let’s focus on the PROBLEM: the non-technology side, or the human-side. • We must perfect the blending of disciplines to educate & train on Big Data Science (vs. perfecting specific disciplines) • Doing what you are doing will not get you out of the fix you are in since it got you in the fix in the first place – innovate and improve! • Our Big Data Science, Analytics & Intelligence is an ENVIRONMENT and a SYSTEM, not an APP

  44. Big Data / Data Science Postulates (cont’d) Final Quiz: Where do we start? LEARNING!

  45. One last time… How did we do?

  46. References

  47. References • [1000v] URL: http://www.1000ventures.com/design_elements/selfmade/quaity_cost-4components_6x4.png • [Aiken] Dr. Peter Aiken, Data Blueprint, 2012-2013 • [arcweb] http://www.arcweb.com/events/arc-orlando-forum/pages/analytics-for-industry.aspx • [asq] URL: http://asq.org/learn-about-quality/cost-of-quality/overview/read-more.html • [Bain] http://www.bain.com/management_tools/management_tools_and_trends_2007.pdf • [Barbara’] Dr. Daniel Barbara’, George Mason University, 2012 Big Data Lecture Series • [Batni] Carlo Batini, Cinzia Cappiello, Chiara Francalanci, and Andrea Maurino. 2009. Methodologies for data quality assessment and improvement. ACM Comput. Surv. 41, 3, Article 16 (July 2009), 52 pages. DOI=10.1145/1541880.1541883 http://doi.acm.org/10.1145/1541880.1541883 • [Conway] http://www.drewconway.com/zia/?p=2378 • [coq] URL: http://costofquality.org/wp-content/uploads/2011/02/Cost-of-Quality.jpg • [crh] Dr. C. Randall Howard, PMP, crhPhDConsulting.net • [Crosby] http://www.philipcrosby.com/25years/crosby.html • [ct-bdtech] http://cloudtimes.org/2013/06/13/big-data-techniques-for-analyzing-large-data-sets-infographic/ • [dddm] http://www.clrn.org/elar/dddm.cfm • [DTIC] http://www.dtic.mil/doctrine/new_pubs/ • [econBD] http://www.economistinsights.com/analysis/evolving-role-data-decision-making, August 12th 2013 • [EMC-DS] http://www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf • [Forbes] http://www.forbes.com/sites/christopherfrank/2012/03/25/improving-decision-making-in-the-world-of-big-data/ • [FSAM/BAH] http://www.fsam.gov/about-federal-segment-architecture-methodology.php • [Gartner-EA] http://www.gartner.com/technology/it-glossary/enterprise-architecture.jsp • [Gartner-ERDS] "Emerging Role of the Data Scientist and the Art of Data Science", Gartner, 20 March 2012, ID:G00227058, Douglas Laney, Lisa Kart • [Gartner-HC] http://www.gartner.com/newsroom/id/1763814

  48. References • [gayatri-patele-bay] http://www.slideshare.net/AsterData/gayatri-patele-bay • [Gleichauf] See Bob Gleichauf’s article: http://www.iqt.org/technology-portfolio/on-our-radar/Big_Data_Advanced_Analytics.pdf • [IBM-usingBD] ftp://ftp.software.ibm.com/software/tw/Using_Big_Data_for_Smarter_Decision-Making_v.pdf • [IBM] http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2011_Issue2/BigData/index.html?cmp=dw&cpb=dwinf&ct=dwnew&cr=dwnen&ccy=zz&csr=051211 • [IBM-Analytics] http://www-935.ibm.com/services/multimedia/Analytics_The_real_world_use_of_big_data_in_Financial_services_Mai_2013.pdf • [Infocus] http://infocus.emc.com/robert_abate/the-business-case-for-big-data-part-1/ • [Infostory] http://infostory.com/2012/03/28/data-information-knowledge-web/ • [InnovationGames] http://innovationgames.com/ • [IT-Failures] http://it-project-failures.blogspot.com; http://it.slashdot.org/submission; http://www.sfgate.com • [Lwanga] Lwanga, Walenta, Talburt. The Job of the Information/Data Quality Professional (2010), IAIDQ Publication • [Madnick] Stuart E. Madnick, Richard Y. Wang, Yang W. Lee, and Hongwei Zhu. 2009. Overview and Framework for Data and Information Quality Research. J. Data and Information Quality 1, 1, Article 2 (June 2009), 22 pages. DOI=10.1145/1515693.1516680 http://doi.acm.org/10.1145/1515693.1516680 • [Mason-BDLS] George Mason University Volgenau School of Engineering Big Data Lecture Series, 2011-2012 • [MIT] http://lean.mit.edu/downloads/2010-theses/view-category.html • [Nixon] steven.d.nixon@gmail.com - 08/29/2011, Mason Big Data Lecture Series 2011 • [Nonaka, Hirotaka, Knowledge-Creating Company] Nonaka, Ikujiro, and Hirotaka Takeuchi. The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press, USA, 1995. • [O’Reilly] https://docs.google.com/present/view?hl=en_US&id=0AXaXKp9bt6OXZGd4YzlnYmRfNThjMmo4dm5yaA from “What is data science?”, O’Reilly Radar • [p36] http://information-retrieval.info/taipale/papers/p36-popp.pdf • [Patil] Patil, D.J., Building Data Science Teams, 2011 • [RG] http://www.riskglossary.com/link/risk_metric_and_risk_measure.htm • [Robinson] http://www.ted.com/talks/sir_ken_robinson_bring_on_the_revolution.html • [Sagan] Dr. Philip Sagan, Infiniti, 2012 Big Data Lecture Series • [Senge] http://www.infed.org/thinkers/senge.htm • [Talburt] Dr. John Talburt, 2012 Big Data Lecture Series • [Tandem] http://www.tandemlabs.com/documents/CPSA2008.pdf

  49. Backup Slides

  50. J. C. R. Licklider's Man-Computer Symbiosis [Aiken] The best approaches combine manual and automated reconciliation!
