1 / 16

Data Mining&Business Planning of Engineering/Research Projects

Data Mining&Business Planning of Engineering/Research Projects. Presentation 5 Dr. Gá bor Pauler , Associate Professor Faculty of Sciences, University of Pécs Tel:30/9015-488 E-mail: pauler@ t-online.hu. Recording data in Forms and Database tables Database design

manon
Download Presentation

Data Mining&Business Planning of Engineering/Research Projects

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Mining&Business Planning of Engineering/Research Projects Presentation 5 Dr. Gábor Pauler, Associate Professor Faculty of Sciences, University of Pécs Tel:30/9015-488 E-mail:pauler@t-online.hu

  2. Recording data in Forms and Database tables Database design Nasty practical example: Pregnacy Pregnacy Questionnaire 1 Pregnacy Questionnaire 2 Decomposition of Empirical Data Structures (EDS) Relations, Entity Relationship Diagram (ERD) Graphic User Interface (GUI) Design: Basic problem Business Process Diagram Data Flow Diagram Mainforms and Subforms References Content of the Presentation

  3. Recording data in Forms and Database tables • We have to get familiar usage of electronic forms and databases behind because of: • The target market may be reachable mainly by web survey (eg. young high-educated wealthy people working at global firms), or • Even if the survey will be paper based, usa-ge of electronic forms can speed up recor-ding data 10-12 times: instead of manually matching questionnaire and table, recorder can work with a similar form to questionnai-re, and just „visually copying” checked ans-wers. The form will record it in table + check consistency (we usually add Number of res-pondent and Quality of Filling fields for it) • Database table (Adatbázis tábla) of form is a permanent data storage on hard drive conta-ining of fixed-lenght, fixed-structure, sequen-tially (1..n)numbered Records (Rekord).They are set of unique named Data fields (Adat-mező) described by their: • Name(Név): 8±3 char, no special chars! • Type(Típus):Long(4byte),Single(4),Date(8) • Description(Leírás):can contain any char • Min/Max/Default(Min/Max/Alapért.)values • Compulsory/Optional(Kötelező/Opc.)fill A database table can store max.256 fields× (10M:Access..100M:Oracle)records and re-trive them very fast(1000×faster than Excel), as their starting address can be computed in advance: Start=RecNum×RecLenght (5.1) katt

  4. Database Design: Nasty practical example: Pregnacy 1 • Data of a simple questionnaire can be stored in single data-base table • However there are tricky cases when this leads to fatal mista-kes: • See PregnacyQuestionnaire1 for example, which records data of pregnacies for women with babies, fathers and obs-tetrist (Szülész) data incorpo-rated. Whoever is not familiar with databases probably would design something like this, but it can kill mothers and babies! • What if the woman had more than 2pregnacies and babies? • If there are more babies than pregnacies, how we do know which baby belongs to which? • What if the father was different at each pregnacies? • This thing is deadly inaccurate!

  5. Database Design: Nasty practical example: Pregnacy 2 • However if we want to be „deadly accurate”, we have to design a much more complex questionnaire PregnacyQuestionnaire2.doc with: • Lot of jumps across questions • Wasting space space for invalid questions • Wasting recording workforce (Eg. you have to record the same fathers data n times if he „contributed” in n pregnacies) • Moreover, if you store its data in single table, most of the space is wasted (Eg. average modern women have only 0.5 pregnacies with 0.4 baby from 0.5 father), but in the same time you may loose data because it cannot store! (Eg. Dzsenifer Orsós:has 8kids from 5pregnacies from 4father):

  6. Recording data in Forms and Database tables Database design Nasty practical example: Pregnacy Pregnacy Questionnaire 1 Pregnacy Questionnaire 2 Decomposition of Empirical Data Structures (EDS) Relations, Entity Relationship Diagram (ERD) Graphic User Interface (GUI) Design: Basic problem Business Process Diagram Data Flow Diagram Mainforms and Subforms References Content of the Presentation

  7. Database Design: Decomposition of Empirical Data Structures PregnacyQuestnr WomanName WomanBirthDate FirstMensesDate FatherName FatherBirthDate LastMensesDate LayDate ObstetristName BabyName BabyStatus • The reason behind these difficulties is that the Empirical Data Structure, EDS (Gyakorlati Adatstruktúra) you can see on Pregnacy Questionnaire is full with 1:many( ) or many:many( ) nested relationships (Beágyazott kapcsolatok) between their fields, Eg.: • 1 woman can have many pregnacies • 1 pregnacy can result in many babies (twins) • 1 man can have sex with many women, and also 1 woman can have sex with many men This will result non-fixed lenght structure EDS (Eg. 0..8 pregnacies × 0..5 babies !?), which cannot be processed effectively by single fixed-lenght record database table. Therefore they are transformed in a Database Design (Adatbázis Tervezés) process with 3 goals: • Avoid loss of data • Avoid redundancies in space consumption and data recording • Ensure unambigous (Egyértelmű) relations between data fields • EDS is Decomposed (Szétbont) into Entities (Egyed): • These are objects with large number Occourences/Records (Előfordulás/Rekord) (Eg. there are lot of Woman) described by same Attributes/Fields (Tulajdonság/Mező) (Eg. Name, DateOf Birth, DateOfFirstMenses) stored in separate database tables named their plural(Eg. Women) • Decomposition is made by Cardinality Analysis (Számossági Elemzés) of relations between pairs of entities/attributes forth and back (We denote Entity, Attributes, their relation, its cardinality by corresponding colors): • 1:1 Relation: 1Womanhas1Name, 1BirthDate, 1FirstMensesDate: these are attributes should be left in 1 entity/table marked with purple and can referenced as TableName.FieldName (Eg. Women.Name)

  8. Database Design: Relations • 1:many Relation: 1Womancan havemanyPregnacies, but1Pregnacybelongs to1Woman: these are 2 separate entities/tables. To preserve original data of EDS, separated entities are connected by Relations(Relációk): referencial connections of 2 fields: • Primary Key (Elsődleges Kulcs) field of „1”side-entity/table: It Uniquely Identifies (Egyedileg azonosítja) occourences/records, so it cannot contain repeating or Empty/Null (Üres/Hiányzó) values (but can contain Zero = 0) • If there is no such an attribute in entity/table (Eg.Women.Name can have re-petition) we add an auto-numbered Artificial(Mesterséges) key (Eg. WomanID) • Its recommended name is EntityNameID and it is denoted by orange • Foreign Key (Idegen Kulcs) field of „many”side-entity/table: It is a field referencing primary key of another entity/table (Eg. Pregnacies.WomanID) marked with olive • It has the same name and data type as primary key, but can contain repeating (Eg. more pregnacies belong to the same woman) and Null values • Many:many Relation: 1Womancan have Pregnacy frommanyFathers (in longer time period),and 1Fathercan make Pregnacy formanyWomen: this will result in 3 entities/tables: • A Relation Table (Relációs Tábla) (Pregnacies) denoted with blue will connect 2Master Tables (Törzs Tábla) (Women, Fathers) with 21:many relations

  9. Database Design: Entity Relationship Diagram Woman WomanID Name DateOfBirth FirstMenses Father FatherID Name DateOfBirth Pregnacy PregnacyID LastMenses DateOfLay Obstetrist WomanID FatherID Baby BabyID Name Status PregnacyID • Therefore, the relation table must contain at least 2 foreign key fields referencing to connected master tables. As relation table can have millions of records it can describe any possible combinations of connections between fathers and women (1 father – many women, 1 woman – many fathers, total gruppensex, etc.) • A relation table may contain other data fields (Eg. LastMenses), and it is very advisab-le to put there a primary key (even if it is most of the time artificial) (Eg. PregnacyID) otherwise it cannot be referenced and connected later by other tables (Eg. Babies) • As a result of database design, we get a Normalized Data Structure, NDS (Normalizált Adatstruktúra) from EDS:  It has only 1:many relationships left! •  It is capable of storing practically unlimited number of babies from unlimited number of pregnacies for unlimited number of woman, so there cannot be data loss anymore (Eg. not only at Dzsenifer Orsós, but at even more fast breeding organisms also). Moreover, we will not waste any storing space, and also the need of multiple recording of the very same data is eliminated!!! •  The price we have to pay for it that NDS is more complex, containing surprisingly large number of tables and relations • Therefore we use Entity Relationship Diagram, ERD (Egyedkapcsolati Diagram) to design database structre, which is more compact than sample data tables: • Entites/tables are rounded boxes with EntityName at top: • Blue background denotes Code/Master/Lookup (Kód/ Törzs/Kinézegető)tables:with slow data change in time, • Yellow denotes Relational/Transaction (Relációs/ Tranzakció) tables: rapid, irrevocable changes in time • Fields are listed with their data type icons:( , , , , , , , , ) and names: italic means optional-, normal means required-, bold means auto-filled attribute • Data fields are purple, primary keys are orange prompted by ( ), foreign keys are olive prompted by( ), auto-filled system logging attributes are black • 1:many relations connecting primary- and foreign key fields are denoted by ( )

  10. Recording data in Forms and Database tables Database design Nasty practical example: Pregnacy Pregnacy Questionnaire 1 Pregnacy Questionnaire 2 Decomposition of Empirical Data Structures (EDS) Relations, Entity Relationship Diagram (ERD) Graphic User Interface (GUI) Design: Basic problem Business Process Diagram Data Flow Diagram Mainforms and Subforms References Content of the Presentation

  11. Graphic User Interface (GUI) Design: Basic problem • NDS is highly effective in data storage, but terrible to record data manually into, as there are so many tables. Our Blonde Nusi data recorder won’t be impressed by the beauty of database design (she is impressed by the beauty of Robert Redford…) • She would like to see basically the same as on paper, but with infinite capacity, and without jumps

  12. GUI Design: Business Process Diagram (BPD) Proc:Pregnacy Step:Born Step:Born Step:Get Mature Step:First Menses FOR:Have sex? IF:Getting pregnant? IF:Not aborted? Proc:A EndProc:A FOR:A EndFOR:A IF:Pre Term? Step:PreTerm Care EndIF:Preterm? FOR:Baby born? Step:Name EndFOR:No more baby? ElseIF:B: IF:B EndIF:B EndIF:Not aborted? EndIF:Getting pregnant? EndFOR:No more sex? EndProc:Pregnacy • So, we need Electronic Forms (Elektronikus Űrlapok) as Graphic User Interface, GUI (Grafikus felhasználói Felület) of database to bridge the huge gap between NDS (how can we store effectively) and EDS (how can we use effectively) • The first step in GUI Design is mapping all the possible processes of entities on question-naire to correctly analyze their logical interdependencies (Eg. how a woman can have baby? What are preconditions? What are cosequences?) • For this, we use Business Process Diagram (BPD) (Üzleti Folyamat Diagram): it describes a process with given input, output, responsibilities, time- and resource consumption. • It is a Flowchart (Folyamatábra) laid out in a 2-dimensional coordinate system: • Time: it is not really a physical time, but a nonlinear timescale broken by uniquely named Breakpoints (Töréspont) or Milestones (Mérföldkő) of the process • Entities/Roles (Egyedek/Szerepek): they can be units of an organization (not actual people, because they can be fired!), customers, business partners, etc. • Activities (Tevékenység) of the process are represented by blocks of flowchart. Length of blocks is proportional to their time requirement. They can be: • Pairs of blocks (Blokkpárok) describing process control. They can be nested into each other. Nested pairs are always tabulated for easier overview of large BPDs: • Process header /footer • Cycle condition /footer • Conditon header /Else /footer • Single blocks (Egyedi blokkok): Process step • There are four types of Arrows (Nyíl) joining blocks: Yes branch of condition (), No branch of condition () (they are always drawn before blocks to visualize their nesting hierarchy), Step forward (), Feedback (Visszacsatolás) of cycles (),only this one can step back (logically) in time Step:

  13. Recording data in Forms and Database tables Database design Nasty practical example: Pregnacy Pregnacy Questionnaire 1 Pregnacy Questionnaire 2 Decomposition of Empirical Data Structures (EDS) Relations, Entity Relationship Diagram (ERD) Graphic User Interface (GUI) Design: Basic problem Business Process Diagram Data Flow Diagram Mainforms and Subforms References Content of the Presentation

  14. GUI Design: Data Flow Diagram (DFD) PregnacyQuestnr WomanName WomanBirthDate FirstMensesDate FatherName FatherBirthDate LastMensesDate LayDate ObstetristName BabyName BabyStatus Proc:Pregnacy Step:Born Step:Born JoinedTable Step:Get Mature Step:First Menses FOR:Have sex? JoinedTable IF:Getting pregnant? BaseTable Baby BabyID Name Status PregnacyID Father FatherID Name DateOfBirth Pregnacy PregnacyID LastMenses DateOfLay Obstetrist WomanID FatherID Woman WomanID Name DateOfBirth FirstMenses IF:Not aborted? IF:Pre Term? SubFormTable Step:PreTerm Care EndIF:Preterm? FOR:Baby born? Step:Name EndFOR:No more baby? EndIF:Not aborted? EndIF:Getting pregnant? EndFOR:No more sex? EndProc:Pregnacy • BPD is broken up so detailed level, that data nec-cessary to perform an activity (Eg.IF:Getting preg-nant?) has to fit into 1 form: it is given by user as EDS (Eg. Pregnacy Questionnaire) we showed • A BPD with EDS linked (---) is called Data Flow Diagram, DFD(Adatfolyam Diagramm) • EDS of the given form/activity is already split into 3 different types of tables regarding their relations: • Base Table (Bázis Tábla):1:1 linked with the activity (Eg.1 Form should show 1Pregnacy) • Joined Tables (Csatolt Tábla):m:1 related with base table (Eg. for 1Pregnacy there can be only 1Woman and Father) • SubForm Tables(Segédűrlap-tábla): 1:m linked with base table (Eg. for 1Pregnacy there can bemoreBabies) • This differentiation is important because in most systems (MS Access, .Net,Oracle Forms) 1Page/ Body (Oldal/Test) of form can show data from 1 record of 1Data Source Table (Adatforrás-tábla). But, we should show data of manyBabies for 1Pregnacy form page. How we can solve it?

  15. GUI Design: MainForms and SubForms MainFormSource Select W.WomanID, W.Name, W.DateOfBirth, W.FirstMenses F.FatherID, F.Name, F.DateOfBirth, P.LastMenses, P.DateOfLay, P.Obstetrist From Woman W, Father F, Pregnacy P Where W.WomanID= P.WomanID And F.FatherID= P.FatherID Proc:Pregnacy Step:Born Step:Born Step:Get Mature Step:First Menses FOR:Have sex? IF:Getting pregnant? Pregnacy PregnacyID LastMenses DateOfLay Obstetrist WomanID FatherID Baby BabyID Name Status PregnacyID Father FatherID Name DateOfBirth Woman WomanID Name DateOfBirth FirstMenses IF:Not aborted? IF:Pre Term? Step:PreTerm Care EndIF:Preterm? FOR:Baby born? Step:Name SubFormTable EndFOR:No more baby? EndIF:Not aborted? EndIF:Getting pregnant? EndFOR:No more sex? SubFormSource EndProc:Pregnacy • As only 1 record from joined tables can belong 1base record (Eg. 1Wo-man and Father for 1Pregnacy), they are SQL-queried together into a View Table (Nézet Tábla), which will serve as datasource(↔) of MainForm (Főűrlap) carrying bi-directional data transfer between form and database: shows stored data and checks+stores data typed by user into the form • In most systems there is a SubForm (Segédűrlap) control, which can be Nested (Beágyaz) into main form and show many records for 1 record of mainform (Eg. it can show ManyBabies for 1Pregnacy)The subform table will serve as data source of sub-form also carrying bi-directional data transfer (↔) between form/database • This is how an electronic form creates easy-to-use GUI for a database JoinedTable JoinedTable BaseTable

  16. References • Questionnaire planner software: • Pocket survey: http://www.pocketsurvey.info/ 30 days shareware • Relational database managers: • MySql: http://www.mysql.com/ ingyenesen letölthető • MSSql: http://www.microsoft.com/sql/default.mspx 180 days shareware • Electronic form generators: • PHP: http://www.php.net/ freeware • ASP .Net: http://www.asp.net/ 180 days shareware • Oracle Designer: http://www.oracle.com/technology/products/designer/index.html no demo • Web-server software: • Apache: http://httpd.apache.org/ freeware • Statistical sofware (in order of their quality rank): • 1. Statistica: http://www.statsoft.com/ 30 days limited shareware • 2. SAS: http://www.sas.com/ no demo • 3. SPSS: http://www.spss.com/ 14 days shareware

More Related