Rough Sets in Data Warehousing: Infobright Community Edition (ICE)

www.infobright.org, www.infobright.com, slezak@infobright.com, RSCTC 2008

Data Warehousing

Technology Layout

Two-Level Computing
Large Data (10 TB) and Mixed Workloads

Rough Sets
Classes of records with the same values on a given subset of the attributes
[Figure: example concept "Sport? = Yes"]
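To make the slide's notion concrete, the Python sketch below groups records into classes sharing the same values on a chosen attribute subset (indiscernibility classes) and derives the lower and upper approximations of the concept "Sport? = Yes". The toy table and its attributes (Weather, Wind, Sport) are hypothetical, made up for illustration only.

```python
from collections import defaultdict

# Hypothetical toy information system: each record maps attribute -> value.
# "Sport" plays the role of the concept "Sport? = Yes" from the slide.
RECORDS = [
    {"Weather": "sunny",  "Wind": "weak",   "Sport": "Yes"},
    {"Weather": "sunny",  "Wind": "weak",   "Sport": "No"},
    {"Weather": "rainy",  "Wind": "strong", "Sport": "No"},
    {"Weather": "cloudy", "Wind": "weak",   "Sport": "Yes"},
    {"Weather": "rainy",  "Wind": "weak",   "Sport": "Yes"},
]

def indiscernibility_classes(records, attributes):
    """Group record indices by their values on the given attribute subset."""
    classes = defaultdict(set)
    for i, rec in enumerate(records):
        key = tuple(rec[a] for a in attributes)
        classes[key].add(i)
    return list(classes.values())

def approximations(records, attributes, concept):
    """Return the (lower, upper) approximations of a set of record indices."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(records, attributes):
        if cls <= concept:   # class lies fully inside the concept
            lower |= cls
        if cls & concept:    # class overlaps the concept
            upper |= cls
    return lower, upper

concept = {i for i, r in enumerate(RECORDS) if r["Sport"] == "Yes"}
lower, upper = approximations(RECORDS, ["Weather", "Wind"], concept)
print(sorted(lower), sorted(upper))
```

Records 0 and 1 agree on Weather and Wind but differ on Sport, so their class falls into the boundary region (upper minus lower approximation). The same three-way split (inside, outside, overlapping) reappears below as relevant, irrelevant, and suspect packs.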

Information Systems
• Data-based knowledge models, classifiers...
• Database indices, data partitioning, data sorting...
• Difficulty with fast updates of these structures...

Rough Sets in Infobright
We can imagine the set of all records relevant to a given query, that is, the records satisfying its SQL filter, e.g.:
SELECT COUNT(*) FROM Employees WHERE Salary > $
Using the Knowledge Grid, we verify which packs are irrelevant (disjoint with the set), relevant (fully inside the set), and suspect (overlapping with the set). We do not need irrelevant packs. We do not need to decompress relevant ones either: their local COUNT(*) is stored in the corresponding Data Pack Nodes.
[Figure: packs storing the values of column Salary]
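A minimal sketch of this pack classification, assuming each Data Pack Node keeps at least the minimum, maximum, and row count of its pack. The `DataPackNode` and `Pack` classes, the sample Salary values, and the threshold 50000 are hypothetical (the slide leaves the value after "$" unspecified), NULLs are ignored, and "decompression" is simulated by scanning a plain Python list.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DataPackNode:
    """Hypothetical summary kept for one pack of a column (NULLs ignored)."""
    min_val: int
    max_val: int
    count: int

@dataclass
class Pack:
    values: List[int]   # stands in for the compressed pack contents
    dpn: DataPackNode

def make_pack(values: List[int]) -> Pack:
    return Pack(values, DataPackNode(min(values), max(values), len(values)))

def classify(dpn: DataPackNode, threshold: int) -> str:
    """Rough classification of a pack against the filter `value > threshold`."""
    if dpn.max_val <= threshold:
        return "irrelevant"   # no value in the pack can satisfy the filter
    if dpn.min_val > threshold:
        return "relevant"     # every value in the pack satisfies the filter
    return "suspect"          # the pack straddles the filter boundary

def count_greater(packs: List[Pack], threshold: int) -> Tuple[int, int]:
    """Return (COUNT(*) for value > threshold, number of packs decompressed)."""
    total, decompressed = 0, 0
    for pack in packs:
        kind = classify(pack.dpn, threshold)
        if kind == "relevant":
            total += pack.dpn.count   # answered from the Data Pack Node alone
        elif kind == "suspect":
            decompressed += 1         # only suspect packs need their data
            total += sum(1 for v in pack.values if v > threshold)
    return total, decompressed

salary_packs = [
    make_pack([20000, 30000, 45000]),   # irrelevant for Salary > 50000
    make_pack([60000, 75000, 90000]),   # relevant
    make_pack([40000, 55000, 70000]),   # suspect
]
print(count_greater(salary_packs, 50000))
```

Only one of the three packs has to be touched; the relevant pack contributes its stored count without any data access.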

Information Systems in Infobright

SELECT MAX(A) FROM T WHERE B > 15;
[Figure: DATA, STEP 1, STEP 2, STEP 3]
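One plausible reading of the three steps, sketched under stated assumptions: each "pack row" holds a pack of A and a pack of B covering the same rows, and min/max per pack would be read from Data Pack Nodes (here simulated with `min()`/`max()`). Step 1 classifies packs of B against B > 15; Step 2 uses the maxima of A in fully relevant pack rows as a lower bound on the answer and discards suspect pack rows that cannot beat it; Step 3 reads only the remaining suspect data. The data and this exact split into steps are illustrative, not necessarily what the slide's figure shows.

```python
from typing import List, Optional, Tuple

# Hypothetical pack row: (values of A, values of B) for the same rows.
PackRow = Tuple[List[int], List[int]]

def rough_max(pack_rows: List[PackRow], b_threshold: int) -> Tuple[Optional[int], int]:
    """Evaluate SELECT MAX(A) FROM T WHERE B > b_threshold with pack pruning.

    Returns (result, number of pack rows whose data had to be read).
    min()/max() on whole packs stand in for values kept in Data Pack Nodes."""
    relevant, suspect = [], []
    # Step 1: classify packs of B using only their min/max summaries.
    for a_vals, b_vals in pack_rows:
        if min(b_vals) > b_threshold:
            relevant.append((a_vals, b_vals))
        elif max(b_vals) > b_threshold:
            suspect.append((a_vals, b_vals))
        # otherwise the pack row is irrelevant and skipped

    # Step 2: all rows of a relevant pack row qualify, so the max of A
    # for that pack is a guaranteed lower bound on the final answer.
    best = max((max(a) for a, _ in relevant), default=None)
    if best is not None:
        # Suspect pack rows whose max of A cannot exceed the bound are dropped.
        suspect = [(a, b) for a, b in suspect if max(a) > best]

    # Step 3: read only the remaining suspect pack rows, row by row.
    for a_vals, b_vals in suspect:
        for a, b in zip(a_vals, b_vals):
            if b > b_threshold and (best is None or a > best):
                best = a
    return best, len(suspect)

rows = [
    ([1, 5, 3], [20, 30, 25]),   # relevant: max A = 5 becomes the bound
    ([2, 9, 4], [10, 16, 12]),   # suspect, max A = 9 > 5, must be read
    ([4, 3, 2], [14, 18, 11]),   # suspect, max A = 4 <= 5, pruned
    ([7, 8, 6], [1, 2, 3]),      # irrelevant
]
print(rough_max(rows, 15))
```

Here only one pack row is read; the answer 9 comes from its second row (A = 9, B = 16).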

Advanced Knowledge Nodes
[Figure: Order Detail table and Supplier/Part table; assume many more rows in each]

Community Inspirations
[Diagram: Count Distinct, Count(*) on Self-Joins, Decision Trees, Contingencies, New Objectives, New Schemas, New Volumes, New Queries, New KNs, New Data Types, SQL Extensions, Feature Extraction, Data Compression]

Conclusion
• Technology based on the interaction between rough and precise operations, open to adding new structures
• Full product, simple framework, ad-hoc analytics, good load speed, 10:1 "all-inclusive" compression
• Core technology based on more data mining, rough sets, computing with rough values, et cetera
• Infobright Community Edition (ICE) ready for free use and study, and open to contributions


THANK YOU!
