1 / 19

Komplexpraktikum Datenbank-Anwendungen

Komplexpraktikum Datenbank-Anwendungen. Organisatorisches. Wer? Claudio Hartmann claudio.hartmann@tu-dresden.de Ulrike Fischer ulrike.fischer@tu-dresden.de Für wen? Diplom PO 2004 Informatik, Medieninformatik Komplexpraktikum (Schein) Diplom PO 2010 Informatik Modul PM-FPA Bachelor

davida
Download Presentation

Komplexpraktikum Datenbank-Anwendungen

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. KomplexpraktikumDatenbank-Anwendungen

  2. Organisatorisches • Wer? • Claudio Hartmann claudio.hartmann@tu-dresden.de • Ulrike Fischer ulrike.fischer@tu-dresden.de • Für wen? • Diplom PO 2004 Informatik, Medieninformatik • Komplexpraktikum (Schein) • Diplom PO 2010 Informatik • Modul PM-FPA • Bachelor • Informatik B-510, B-520 • Medieninformatik B-530, B-540 • Master • Informatik PM-FPA • Medieninformatik E-4 • Leistungsumfang • 4 SWS oder 8 SWS Komplexpraktikum Datenbank-Anwendungen

  3. Organisatorisches • Ziele • Selbständig… • In Sachverhalte einarbeiten • Probleme erkennen und Lösungsansätze entwickeln • Eigene Ansätze und Ideen umsetzen und Evaluieren • Ablauf • Kick-Off in erster Vorlesungswoche (heute) • Sync-Treffen in regelmäßigen Abständen (Termin?) • Alle 1-2 Wochen • Abschlusspräsentation • Vorstellen der entworfenen Ansätze und Ergebnisse • Am Ende des Semesters • Kommunikation/Material • Diskussion - Auditorium (https://auditorium.inf.tu-dresden.de/courses/2154549) • Code & Testdaten - SVN (Zugang folgt) Komplexpraktikum Datenbank-Anwendungen

  4. Scenario & Challenges • Monthly Report • Targets • Sneak Peak Reporting • Missing data: fill gaps • Further targets • Outlier Detection • Fraud Detection • Development Reports Updates Forecast Impute Refine Adjust Historie … ES CC MFD … Aggregate COOL PTV FIRA Estimation Komplexpraktikum Datenbank-Anwendungen

  5. The data look like this… • Time column • period (monthly date stamp) • Measure columns • sales_units / _nc_ne / _nc_e / _c_ne • purchase_units • stock_new_units • Attribute columns • Some product group specific (id, price; e.g. color, energy_label, size, brand, …) • Some outlet specific (id, distributionfactor, extrapolationfactor, turnover_class, nuts1, channel, …) • Size (cooling) • 2.3 mio “item x outlet”-tuple on 5051 items, 1116 outlets and 36 periods outlet item FIRA / SIS

  6. Common scenario • Model usage • Train a statistical model on historical data • Model seasonalandtrendeffects • Requiresequidistantvalues • Calculate forecast values 3rd year 2nd year 1st year Liebherr KT 1434 optimize Model Komplexpraktikum Datenbank-Anwendungen

  7. Problem • Too short time series • Very sparse data on low aggregation levels • No statistical model available for some specific time series 3rd year 2nd year 1st year Liebherr KT 1434 Bosch KSl 20s53 SEG MS210 A … Komplexpraktikum Datenbank-Anwendungen

  8. Solution • Cross-sectional forecasting • Assume similar behavior of some groups of time series • Use transitions over months from a set of time series • Train model on transitions of several time series • Use last known period as input to calculate forecasts 3rd year 2nd year 1st year Liebherr KT 1434 Bosch KSl 20s53 SEG MS210 A … Model Report calculation on previous period Komplexpraktikum Datenbank-Anwendungen

  9. Attribute hierarchies • Many different ways to partition the data • Different forecast error on different forecast targets • Research goal • What is the best partition for which forecast target? outlet outlet outlet ESP - YES TSS - YES outlet item item item ESP - NO TSS - NO item channel X no_frost channel no_frost YES ESP TSS NO All x item FIRA / SIS

  10. Parallel FIRA processes • Distribute each value of each attribute to an dedicated node • Parallel execution of FIRA-process for each value • Split into two phases system configuration Relation cooling Node 1 channel ESP Node 2 channel TS Node 3 nofrost YES Node 4 nofrost NO process configuration Updates Node 5 nofrost N.A. Node 6 brand AEG Node 7 brand SIEMENS Node 8 Channel x nofrost ESP & YES DB-Server FIRA … ES CC MFD Forecast Impute Refine Adjust Variant 2 … Historie Variant 3 COOL Node 9 Channel x nofrost ESP & NO … ES CC PTV MFD … configuration repository Estimation process … Aggregate database COOL PTV FIRA model exploitation / usage Estimation Self-Adjusting Imputation System Komplexpraktikum Datenbank-Anwendungen

  11. Phases of prediction approach • Prediction phase • Fetch all necessary data for model training • Transitions of all time series covered by the attribute value • Train the model • Fetch data from the pre period • Calculate predictions • Evaluation phase • Fetch data of predicted period • Join with predicted data • Calculate forecast error Model Model error Komplexpraktikum Datenbank-Anwendungen

  12. Prediction phase one attribute • 1st Map-Reduce • Map: • Create Model on item x attribute level • Calculate forecasts on item x attribute level • Modell training data: • Query workload • Fetch time series corresponding to nodes task • Once per time slice outlet item DB-Server Zielperiode: 27 SELECT … FROM (SELECT … FROM cooling WHERE nindexIN ( 15, 3 ) AND channel = ‘ELECTRICSP’ AND sales_units>0 group by itemid,nindex ) AS foo, (SELECT … FROM cooling WHERE nindexIN ( 14, 2 ) AND channel = ‘ELECTRICSP’ GROUP BY itemid,nindex ) AS bar WHERE foo.nindex= bar.nindex+1 AND foo.itemid=bar.itemid AND sales_units_1>0 ESP Node 1 channel ‘ELECTRICSP’ Komplexpraktikum Datenbank-Anwendungen

  13. Prediction phase one attribute • 1st Map-Reduce • Map: • Create Model on item x attribute level • Calculate forecasts on item x attribute level • Modell input data: • Query workload • Fetch time series corresponding to nodes task • Once per time slice outlet item DB-Server Zielperiode: 27 SELECT * FROM ( SELECT … FROM cooling WHERE nindex = 26 AND channel = ‘ELECTRICSP’GROUP BY itemid )foo WHERE sales_units_1>0 AND stock_new_units_1>=0 ESP Node 1 channel ‘ELECTRICSP’ Komplexpraktikum Datenbank-Anwendungen

  14. Prediction phase one attribute • 1st Map-Reduce • Map: • Create Model on item x attribute level • Calculate forecasts on item x attribute level • Modell trainingdata: • Query workload • Fetch time series corresponding to nodes task • Once per time slice outlet item DB-Server Zielperiode: 27 SELECT … FROM (SELECT … FROM cooling WHERE nindexIN ( 15, 3 ) AND channel = ‘TECSUPERST’ AND sales_units>0 group by itemid,nindex ) AS foo, (SELECT … FROM cooling WHERE nindexIN ( 14, 2 ) AND channel = ‘TECSUPERST’ GROUP BY itemid,nindex ) AS bar WHERE foo.nindex= bar.nindex+1 AND foo.itemid=bar.itemid AND sales_units_1>0 TSS Node 2 channel ‘TECSUPERST’ Komplexpraktikum Datenbank-Anwendungen

  15. Prediction phase two attributes • 1st Map-Reduce • Map: • Create Model on item x attribute level • Calculate forecasts on item x attribute level • Modell trainingdata: • Query workload • Fetch time series corresponding to nodes task • Once per time slice outlet ESP - YES item DB-Server Zielperiode: 27 SELECT … FROM (SELECT … FROM cooling WHERE nindexIN ( 15, 3 ) AND channel = ‘ELECTRICSP’ AND nofrost = ‘YES’ AND … group by itemid,nindex ) AS foo, (SELECT … FROM cooling WHERE nindexIN ( 14, 2 ) AND channel = ‘ELECTRICSP’ AND nofrost= ‘YES’ GROUP BY itemid,nindex ) AS bar WHERE foo.nindex= bar.nindex+1 AND foo.itemid=bar.itemid AND sales_units_1>0 Node 8 Channel x nofrost ‘ELECTRICSP’ & ‘YES’ Komplexpraktikum Datenbank-Anwendungen

  16. Evaluation phase • 2nd Map-Reduce: • Map • Get real data from database • Joinwithpredictions • Reduce: • Aggregate Data to demanded aggregation level • Calculate error • Query workload • Fetch all data for one task • to join with predictions and calculate errors • Only once DB-Server error SELECT … FROM ( SELECT nindex AS time, itemid AS itemid FROM cooling WHERE nindex>12 AND channel = ‘ELECTRICSP’ GROUP BY time, itemidHAVING sum(sales_units)>0 ) AS t1, ( SELECT … FROM cooling WHERE nindex>13 AND channel = ESP GROUP BY … ) AS t2 WHERE t1.time+1=t2.time AND t1.itemid=t2.itemid Node 1 channel ‘ELECTRICSP’ Komplexpraktikum Datenbank-Anwendungen

  17. Parallel FIRA processes • Distribute each value of each attribute to an dedicated node • Parallel execution of FIRA-process for each value • Split into two phases system configuration Relation cooling Node 1 channel ESP Node 2 channel TS Node 3 nofrost YES Node 4 nofrost NO process configuration Updates Node 5 nofrost N.A. Node 6 brand AEG Node 7 brand SIEMENS Node 8 Channel x nofrost ESP & YES DB-Server FIRA … ES CC MFD Forecast Impute Refine Adjust Variant 2 … Historie Variant 3 COOL Node 9 Channel x nofrost ESP & NO … ES CC PTV MFD … configuration repository Estimation process … Aggregate database COOL PTV FIRA model exploitation / usage Estimation Self-Adjusting Imputation System Komplexpraktikum Datenbank-Anwendungen

  18. Zielstellungen • Einarbeiten in notwendige Technologien • Hadoop, RDBMS, verteilte Datenbanken • Prognoseansatz (Workload) • Verkürzung der Ausführungszeit durch Optimierung des Datentransfers • Erarbeiten verschiedener Ansätze zur Datenhaltung • Hadoop-basierte Lösungen • RDBMS-basierte Lösungen • Andere Lösungsansätze? • Mglw. einschließlich angepasster Prognoseverarbeitung • Evaluation (Erweiterter Aufgabenbereich für 8 SWS) • Vergleich gegen SetUp mit zentraler Datenbank • Besondere Eignung einzelner Ansätze herausstellen und begründen Komplexpraktikum Datenbank-Anwendungen

  19. Einstieg • Einarbeitung in Hadoop • Hadoop v.1.2.1 • http://hadoop.apache.org/ • Einrichten eines Single nodeclusters als erste Testumgebung • http://hadoop.apache.org/docs/stable/single_node_setup.html • http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ • Einrichten von R und RHadoop • https://github.com/RevolutionAnalytics/RHadoop/wiki Komplexpraktikum Datenbank-Anwendungen

More Related