
Database Systems



Presentation Transcript


  1. Database Systems

  2. What is “Database systems” research?
     • Input? Large data sets, large files, relational tables
     • How? Fast external algorithms; RAM-efficient data structures at two storage levels
     • Efficiency? Desirable linear time complexity O(n)
     • Hardware? Small computer, single server, parallel DBMS server, parallel cluster; disk, RAID
     • Infrastructure? DBMS, parallel file system; any large files
     • Boring? Theory+programming

  3. Database systems research today
     • Transaction processing? Done
     • Efficient querying? Done
     • Fast external algorithms? Simple tasks
     • Parallel computation? Well-proven DBMS technology, still many challenges
     • Exploiting new hardware? Difficult
     • Analyzing? Most difficult: data mining, stats
     • Future? Information integration (db+docs)

  4. DB Systems involves core CS research: Theory+Programming
     • Theory we use:
       • Time complexity and I/O cost
       • Data structures, especially external
       • Relational model is here to stay
       • Multivariate statistics, machine learning, discrete math
       • Numerical methods
       • Compilers: parsing/compiling/optimizing code; recursion
     • Programming (even some hacking):
       • Systems in a broad sense
       • Languages: C, C++ for efficiency, low-level pointer manipulation, legacy code; Java, C# mainly for portability
       • Numerical, OS libraries
       • DBMS
       • SQL
       • UDFs
       • API with C, C++, C#

  5. Research topics
     • GOAL: Integrating statistical and machine learning algorithms with a DBMS (external algorithms, queries, UDFs)
     • Difference from machine learning algorithms: size, external algorithms (small RAM), queries, low-level optimization, generally simpler models
     • Main topics by students:
       • Zhibo Chen: OLAP cubes, parametric statistical tests, cube ops on flash memory
       • Mario Navas: Singular Value Decomposition for PCA and ML Factor Analysis, data summarization on multicore CPUs
       • Carlos Garcia-Alvarado: keyword search across docs and db, ranking, query recommendation
       • Sasi Pitchaimalai: Bayesian classification, multithreaded summarization
       • Kai Zhao: predictive association rules, frequent subgraphs
       • Manish Limaye: ER modeling for data pre-processing
       • Anu Goyal: accelerating convergence of EM for mixtures of Gaussians
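
The goal on slide 5 (statistical computation over large tables with small RAM) can be illustrated with a one-pass summarization. This is only a minimal sketch: the slides do not say which summary is computed, so the choice of count `n`, linear sum `L`, and sum of cross-products `Q` is an assumption, and the names `summarize`, `mean_and_covariance`, and `chunks` are hypothetical. The data is read one chunk at a time, so memory use depends on the number of dimensions `d`, not on the number of rows.

```python
from typing import Iterable, List, Tuple

Row = List[float]


def summarize(chunks: Iterable[List[Row]], d: int) -> Tuple[int, List[float], List[List[float]]]:
    """Scan the data once, chunk by chunk, keeping only n, L and Q in memory."""
    n = 0
    L = [0.0] * d                        # linear sum per dimension
    Q = [[0.0] * d for _ in range(d)]    # sum of cross-products x[a] * x[b]
    for chunk in chunks:                 # each chunk stands in for one block read from disk
        for x in chunk:
            n += 1
            for a in range(d):
                L[a] += x[a]
                for b in range(d):
                    Q[a][b] += x[a] * x[b]
    return n, L, Q


def mean_and_covariance(n: int, L: List[float], Q: List[List[float]]):
    """Derive the mean vector and population covariance from the summary alone."""
    d = len(L)
    mean = [L[a] / n for a in range(d)]
    cov = [[Q[a][b] / n - mean[a] * mean[b] for b in range(d)] for a in range(d)]
    return mean, cov
```

Once the summary is built, models that depend only on means and covariances can be computed without rescanning the table; the scan itself is linear in the number of rows.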

  6. Representative problems
     • Finding predictive association rules
     • OLAP cubes
     • Clustering, PCA and regression
     • Bayesian classification
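
To make the "OLAP cubes" item on slide 6 concrete, here is a small sketch of what a cube computes: one aggregate per combination of grouping dimensions. It is a naive in-memory illustration, not the group's method; the function name `cube` and the sample column names are hypothetical, and it rescans the rows once per combination of dimensions rather than using an external algorithm.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2**len(dims) group-bys)."""
    result = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(float)
            for row in rows:
                # The key is the tuple of values for the chosen dimensions;
                # the empty tuple is the grand total.
                key = tuple(row[name] for name in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


rows = [
    {"store": "A", "year": 2020, "sales": 10.0},
    {"store": "A", "year": 2021, "sales": 5.0},
    {"store": "B", "year": 2020, "sales": 7.0},
]
c = cube(rows, ["store", "year"], "sales")
```

In this sample, `c[("store",)]` holds the totals per store and `c[()]` holds the grand total.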

  7. Why is our database systems research “cool”?
     • Theory+Programming
     • Optimization, O(f(n)), systems (external data structures, discrete math, compiler, OS)
     • Goes from hardware-level stuff (multi-core, cache memory) to high-level query optimization in SQL
     • Database systems techniques are used in search engines like Google and Yahoo (and vice versa)
     • DBMS technology used everywhere

  8. Why join the DBMS group?
     • Balance between theory (math) and programming
     • We target “cool” conferences: ACM SIGMOD (core database systems); ACM CIKM (IR+DB+DM); IEEE ICDM (DM)
     • Mature and stable CS research area
     • Job/internship opportunities in DBMS and search engines; job security in any IT department
     • Visit my web page, DBLP. Google “Ordonez SQL”
