30 likes | 168 Views
Challenges and New Trends in Data Intensive Science. Panel at Data-aware Distributed Computing ( DADC ) Workshop HPDC Boston June 24 2008 Geoffrey Fox Community Grids Laboratory , School of informatics Indiana University gcf@indiana.edu , http://www.infomall.org. 1.
E N D
Challenges and New Trends in Data Intensive Science Panel at Data-aware Distributed Computing (DADC)Workshop HPDC Boston June 24 2008 Geoffrey Fox Community Grids Laboratory, School of informatics Indiana University gcf@indiana.edu, http://www.infomall.org 1
HPDC DADC Panel Questions I • What do you think is the most challenging research problem in data-intensive computing today? What will it be five years from now? • Same problem as for any distributed computing arena • Should we call it clouds • What is a sustainable architecture and associated software • What are special requirements of scientific data processing • How do they relate across sensors, “classic” astronomy/particle physics and “scholarly studies” • How do these relate to commercial systems that perhaps look most like “scholarly studies” • Is there any chance today’s 100 idiosyncratic data and metadata systems will survive • 5 year projection depends a lot on evolution of compute, data and network systems/hardware • The demand for Data-intensive computing has also attracted the attention of the federal funding agencies. NSF, DOE and other agencies are introducing new programs focusing on data-intensive computing such as DataNet, INTEROP, CLuE, SCiDAC Data Centers etc. Can we say Data Intensive Computing will be the next big/hot thing? How long do you think this trend will continue? • Note 30 years ago I was solving LHC problem on a few hundred 1600 BPI tapes (around 10 gigabyte of data ). I used identical analysis software (as did those from 15 years before) but volume of data a factor of million lower • HPCC Initiative shifted emphasis to simulation in 1990 • Datanet: Note raw data and software thrown away (without asking me; both “saved” on data repository chipstore) 12 years ago but I do have results of analysis • So problem is not so new; scale is new • I want to stress that agencies should keep a good balance between research and deployment; I am not so sure community has done very well on deployment
HPDC DADC Panel Questions II • What How do you think the new Petaflop scale systems and multi-core machines will affect data-intensive science? • Multi-core will be very important as the high performance distributed data analysis system for plethora of distributed data • Not clear what computational complexity of important data mining algorithms will be • Unclear if Petaflops very relevant; many smaller machines better and not allowed to use Petaflops in this fashion • Could you comment on the vision of your institution on the "next generation data-intensive science & discovery"? • Very important in usual sciences – Biology, Chemistry, Physics • Indiana University has no engineering school so social science or “scholarly knowledge” especially interesting • Education and Training; clearly identified as a serious problem • Propose (as chair of informatics department) certificates/minors in X-Informatics (a wrinkle on Computational Science) • X = Geography, Library, Health, Business, Media, Sports • Already have full set of offerings in Bio, Chem, Security, Music, Social, HCI ….