Rough Sets in Data Warehousing Infobright Community Edition (ICE)

www.infobright.org www.infobright.com slezak@infobright.com RSCTC 2008. Rough Sets in Data Warehousing Infobright Community Edition (ICE). Data Warehousing. Technology Layout. Two-Level Computing. Large Data (10TB) and Mixed Workloads. Rough Sets.

Presentation Transcript


  1. www.infobright.org www.infobright.com slezak@infobright.com RSCTC 2008 Rough Sets in Data Warehousing Infobright Community Edition (ICE)

  2. Data Warehousing

  3. Technology Layout

  4. Two-Level Computing: Large Data (10TB) and Mixed Workloads

  5. Rough Sets: classes of records with the same values on a subset of the attributes (figure label: Sport? = Yes)
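The classes mentioned on this slide can be illustrated with a minimal sketch. It groups records by their values on a chosen attribute subset and derives the standard rough-set lower and upper approximations of the concept "Sport? = Yes". The records, the attribute names (`outlook`, `wind`, `sport`) and the function names are all hypothetical, invented for illustration; none of them come from the presentation.

```python
from collections import defaultdict

# Hypothetical toy records; attribute names and values are illustrative only.
records = [
    {"id": 1, "outlook": "sunny", "wind": "weak", "sport": "yes"},
    {"id": 2, "outlook": "sunny", "wind": "weak", "sport": "no"},
    {"id": 3, "outlook": "rainy", "wind": "strong", "sport": "no"},
    {"id": 4, "outlook": "overcast", "wind": "weak", "sport": "yes"},
    {"id": 5, "outlook": "overcast", "wind": "weak", "sport": "yes"},
]


def indiscernibility_classes(rows, attributes):
    """Group record ids that share the same values on `attributes`."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in attributes)
        classes[key].add(row["id"])
    return list(classes.values())


def approximations(rows, attributes, target_ids):
    """Return (lower, upper) approximations of the set `target_ids`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(rows, attributes):
        if cls <= target_ids:   # class lies fully inside the concept
            lower |= cls
        if cls & target_ids:    # class overlaps the concept
            upper |= cls
    return lower, upper


# The concept to approximate: records with Sport? = Yes.
target = {r["id"] for r in records if r["sport"] == "yes"}
lower, upper = approximations(records, ["outlook", "wind"], target)
boundary = upper - lower

print("lower:", sorted(lower))
print("upper:", sorted(upper))
print("boundary:", sorted(boundary))
```

Records 1 and 2 agree on both chosen attributes but differ on `sport`, so their class can only be placed in the boundary region rather than certainly inside or outside the concept.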

  6. Information Systems: data-based knowledge models, classifiers... Database indices, data partitioning, data sorting... Difficulty with fast updates of structures...

  7. Rough Sets in Infobright. We can imagine the set of all records relevant to a given query, that is, the records satisfying its SQL filter, for example SELECT COUNT(*) FROM Employees WHERE Salary > $. Using the Knowledge Grid, we determine which packs are irrelevant (disjoint from the set), relevant (fully inside the set), or suspect (overlapping the set). We do not need the irrelevant packs at all, and we do not need to decompress the relevant ones, because their local COUNT(*) is stored in the corresponding Data Pack Nodes. Only the suspect packs have to be examined. (Figure: packs storing the values of records for column Salary.)
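The three-way classification of packs can be sketched as follows. This is an illustration under stated assumptions, not Infobright's implementation. It assumes each Data Pack Node keeps the minimum and maximum of its pack alongside the local count; the slide itself only mentions the stored COUNT(*). It also assumes no NULL values. The class `DataPackNode`, the functions, the sample salaries and the threshold 50000 are hypothetical, since the transcript leaves the threshold elided.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataPackNode:
    """Hypothetical per-pack statistics (min/max are an assumption here)."""
    min_value: int
    max_value: int
    count: int


def classify(node: DataPackNode, threshold: int) -> str:
    """Classify a pack against the filter `value > threshold`."""
    if node.max_value <= threshold:
        return "irrelevant"   # no row in the pack can satisfy the filter
    if node.min_value > threshold:
        return "relevant"     # every row in the pack satisfies the filter
    return "suspect"          # some rows may satisfy it, some may not


def count_greater_than(packs: List[List[int]],
                       nodes: List[DataPackNode],
                       threshold: int) -> Tuple[int, int]:
    """Return (COUNT(*), number of packs that had to be opened)."""
    total = 0
    opened = 0
    for values, node in zip(packs, nodes):
        status = classify(node, threshold)
        if status == "relevant":
            total += node.count          # answered from the node alone
        elif status == "suspect":
            opened += 1                  # stands in for decompression
            total += sum(1 for v in values if v > threshold)
        # irrelevant packs are skipped entirely
    return total, opened


# Toy Salary column split into three packs (illustrative values).
packs = [
    [20000, 30000, 45000],
    [60000, 75000, 90000],
    [40000, 55000, 70000],
]
nodes = [DataPackNode(min(p), max(p), len(p)) for p in packs]
total, opened = count_greater_than(packs, nodes, threshold=50000)
print(total, opened)
```

In this toy run only the third pack is suspect, so one pack out of three is opened; the first is skipped and the second is answered from its stored count.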

  8. Information Systems in Infobright

  9. SELECT MAX(A) FROM T WHERE B>15; (Figure panels: DATA, STEP 1, STEP 2, STEP 3)
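The transcript does not say what the three steps on this slide contain, so the following is only one plausible reading, offered as a sketch. Step 1 classifies packs using statistics on B. Step 2 takes a guaranteed lower bound on the answer from the relevant packs. Step 3 opens only those suspect packs that could still raise the result. Per-pack min/max statistics, the absence of NULLs, the names `PackStats` and `rough_max`, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PackStats:
    """Hypothetical per-pack min/max statistics."""
    min_value: int
    max_value: int


def classify(stats: PackStats, threshold: int) -> str:
    """Classify a pack of B values against the filter `B > threshold`."""
    if stats.max_value <= threshold:
        return "irrelevant"
    if stats.min_value > threshold:
        return "relevant"
    return "suspect"


def rough_max(a_packs: List[List[int]],
              b_packs: List[List[int]],
              threshold: int) -> Tuple[Optional[int], List[int]]:
    """Evaluate MAX(A) WHERE B > threshold; return (result, opened packs)."""
    a_stats = [PackStats(min(p), max(p)) for p in a_packs]
    b_stats = [PackStats(min(p), max(p)) for p in b_packs]

    # Step 1: classify each row pack using only the statistics on B.
    status = [classify(s, threshold) for s in b_stats]

    # Step 2: in a relevant pack every row qualifies, so its max(A) is
    # attained by a qualifying row and bounds the answer from below.
    result = max(
        (a_stats[i].max_value for i, s in enumerate(status) if s == "relevant"),
        default=None,
    )

    # Step 3: open a suspect pack only if its max(A) could beat the bound.
    opened = []
    for i, s in enumerate(status):
        if s != "suspect":
            continue
        if result is not None and a_stats[i].max_value <= result:
            continue  # cannot improve the result, so it stays closed
        opened.append(i)
        for a, b in zip(a_packs[i], b_packs[i]):
            if b > threshold and (result is None or a > result):
                result = a
    return result, opened


# Toy table T with columns A and B, split into five packs of three rows.
a_packs = [[5, 18, 3], [22, 9, 14], [30, 1, 7], [12, 25, 40], [10, 20, 15]]
b_packs = [[1, 10, 15], [16, 20, 31], [14, 18, 2], [9, 30, 12], [5, 40, 16]]
result, opened = rough_max(a_packs, b_packs, threshold=15)
print(result, opened)
```

In this toy run the last pack is suspect on B, but its largest A value cannot exceed the running result, so it is never opened.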

  10. Advanced Knowledge Nodes (Figure: Order Detail Table and Supplier/Part Table; assume many more rows in each)

  11. Count Distinct; Count(*) on Self-Joins; Decision Trees; Contingencies; New Objectives; New Schemas; New Volumes; New Queries; New KNs; New Data Types; SQL Extensions; Feature Extraction; Data Compression; Community Inspirations

  12. Conclusion
     • The technology is based on interaction between rough and precise operations and is open to adding new structures.
     • Full product, simple framework, ad-hoc analytics, good load speed, 10:1 "all inclusive" compression.
     • The core technology is based on more data mining, rough sets, computing with rough values, et cetera.
     • Infobright Community Edition (ICE) is available for free use and study, and is open to contributions.

  13. References
     • D. Ślęzak, J. Wróblewski, V. Eastwood, P. Synak: Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries. PVLDB 1(2): 1337-1345 (2008).
     • M. Wojnarski, C. Apanowicz, V. Eastwood, D. Ślęzak, P. Synak, A. Wojna, J. Wróblewski: Method and System for Data Compression in a Relational Database. US Patent Application, 2008/0071818 A1.
     • J. Wróblewski, C. Apanowicz, V. Eastwood, D. Ślęzak, P. Synak, A. Wojna, M. Wojnarski: Method and System for Storing, Organizing and Processing Data in a Relational Database. US Patent Application, 2008/0071748 A1.

  14. Thank you!
