Improving the Naming Process for Web Site Reverse Engineering

Selima Besbes Essanaa, Nadira Lammari, ISID - CEDRIC Laboratory - CNAM - Paris. Context of the research work: the naming process, which assigns labels to words, concepts, etc., applied here to the reverse engineering of Web sites.

Presentation Transcript


  1. Improving the Naming Process for Web Site Reverse Engineering Selima Besbes Essanaa, Nadira Lammari ISID - CEDRIC Laboratory - CNAM - Paris

  2. Introduction: The context of the research work is the naming process, which assigns labels to words, concepts, etc. This process concerns various computer science domains, among them the reverse engineering of Web sites, and it is a challenging task. Our contribution: improving the process by reducing the number of objects to name. NLDB'04 - Besbes - Lammari

  3. Agenda: Introduction; RetroWeb overview; The naming process in RetroWeb; Conclusion and perspectives.

  4. RetroWeb Overview: RetroWeb is applied to semi-structured and undocumented sites. It is based on the inversion of the life cycle of a Web application design: Extraction turns HTML pages into physical views, Conceptualization turns physical views into EER schemas, and Integration merges the EER schemas into a global EER schema. RetroWeb thus describes the informative content of the site at various abstraction levels, using meta-models and reverse engineering rules.

  5. The Naming Process in RetroWeb (1):
  Extraction: HTML pages → Pre-treatment phase → coded sequence → Extraction phase → unnamed physical views → Naming phase → physical views.
  Conceptualization: physical views → Physical views transformation phase → logical views → Logical views transformation phase → unnamed EER schemas → Naming phase → EER schemas.
  Integration: EER schemas → global EER schema.
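The data flow of this slide can be written down as a small table of phases, which makes it visible that naming happens twice: once on the unnamed physical views and once on the unnamed EER schemas. This is a minimal Python sketch; the `PIPELINE` tuples and the helpers `check_chain` and `naming_inputs` are hypothetical names introduced for illustration, not part of RetroWeb.

```python
# Hypothetical encoding of the RetroWeb data flow: (step, phase, input, output).
PIPELINE = [
    ("Extraction", "Pre-treatment", "HTML pages", "Coded sequence"),
    ("Extraction", "Extraction", "Coded sequence", "Unnamed physical views"),
    ("Extraction", "Naming", "Unnamed physical views", "Physical views"),
    ("Conceptualization", "Physical views transformation", "Physical views", "Logical views"),
    ("Conceptualization", "Logical views transformation", "Logical views", "Unnamed EER schemas"),
    ("Conceptualization", "Naming", "Unnamed EER schemas", "EER schemas"),
    ("Integration", "Integration", "EER schemas", "Global EER schema"),
]


def check_chain(pipeline):
    """Verify that every phase consumes exactly what the previous phase produced."""
    for (_, prev_phase, _, produced), (_, phase, consumed, _) in zip(pipeline, pipeline[1:]):
        if produced != consumed:
            raise ValueError(f"{prev_phase} produces {produced!r} but {phase} expects {consumed!r}")
    return True


def naming_inputs(pipeline):
    """Return (step, artifact) pairs for every Naming phase of the pipeline."""
    return [(step, consumed) for step, phase, consumed, _ in pipeline if phase == "Naming"]


check_chain(PIPELINE)
print(naming_inputs(PIPELINE))
```

Running it prints the two inputs of the naming phases: the unnamed physical views (Extraction step) and the unnamed EER schemas (Conceptualization step).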

  6. Example (1): a page from an academic journal publication Web site: … Volume N° 19 (3) Competitive Strategy, Economics, and the Internet. Authors: Chircu Alina M. and Kauffman Robert J. Volume N° 19 (2) Enterprise Resource Planning. Authors: Ragowsky Arik …

  7. Example (2): the physical view extracted from this page, before and after naming. Before naming, the view is built from multi-valued type variables (MV1, MV2), composed type variables (CV1, CV2) and simple variables with the following domains: SV1 = {N° Volume 19(3), N° Volume 19(2), …}, SV2 = {Compétitive …Internet, Entreprise …planning, …}, SV3 = {Alina M., Robert J., Arik, …}, SV4 = {Chircu, Kauffman, Ragowsky, …}. After naming, the simple variables are labeled Volume Number (SV1), Title (SV2), First Name (SV3) and Last Name (SV4), and the other variables receive the labels Volume-Authors, Volume, Authors and Author.
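A physical view such as this one can be held in a small recursive structure. The Python sketch below rests on assumptions: the `Variable` class and the `walk` and `unnamed` helpers are hypothetical, and the nesting of MV1, CV1, MV2 and CV2 (a list of volume entries, each with a number, a title and a list of authors) is our guess at the slide's diagram. Only the simple variables are labeled, in line with slide 13, where automatic labeling targets simple-type variables.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    ident: str                                   # identifier given at extraction, e.g. "SV1"
    kind: str                                    # "simple", "composed" or "multi-valued"
    children: list = field(default_factory=list)
    domain: set = field(default_factory=set)     # instances, for simple variables only
    name: Optional[str] = None                   # label; None until naming assigns one


def walk(v):
    """Depth-first traversal of a variable and all its sub-variables."""
    yield v
    for child in v.children:
        yield from walk(child)


def unnamed(view):
    """Identifiers of the variables that still lack a label."""
    return [v.ident for v in walk(view) if v.name is None]


# Hypothetical nesting of the example view (domains copied from the slide).
sv1 = Variable("SV1", "simple", domain={"N° Volume 19(3)", "N° Volume 19(2)"})
sv2 = Variable("SV2", "simple", domain={"Compétitive …Internet", "Entreprise …planning"})
sv3 = Variable("SV3", "simple", domain={"Alina M.", "Robert J.", "Arik"})
sv4 = Variable("SV4", "simple", domain={"Chircu", "Kauffman", "Ragowsky"})
cv2 = Variable("CV2", "composed", [sv3, sv4])
mv2 = Variable("MV2", "multi-valued", [cv2])
cv1 = Variable("CV1", "composed", [sv1, sv2, mv2])
mv1 = Variable("MV1", "multi-valued", [cv1])

for var, label in [(sv1, "Volume Number"), (sv2, "Title"), (sv3, "First Name"), (sv4, "Last Name")]:
    var.name = label

print(unnamed(mv1))  # ['MV1', 'CV1', 'MV2', 'CV2']
```

After the simple variables are labeled, only the complex variables remain to be named manually.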

  8. The Naming Process in RetroWeb (1), continued: each Naming phase of the pipeline (on the unnamed physical views and on the unnamed EER schemas) consists of two steps. Defining concept classes determines automatically the classes of concepts that may share the same labels. Assigning names to concepts then finds labels and assigns them to the concepts of these classes.

  9. Defining Concept Classes (1): The approach is based on the comparison of the definition domains of concepts. The definition domain of a simple variable is the set of its instances; the definition domain of another type of variable is the set of variables constructing it; the definition domain of an entity type is the set of properties describing it. Rule: IF D(C1) ⊇ D(C2) THEN any label assigned to C1 can also be assigned to C2. We therefore have to build the IS_A hierarchy of concept classes: a label found for a concept may be assigned to all the concepts of its class and to all the concepts of its subclasses.
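The inclusion rule can be sketched as a function that lists which concepts may reuse a label found for a given concept. This is a minimal Python illustration under a stated assumption: we read the rule as D(C1) ⊇ D(C2), so a label flows from a concept to every concept whose domain lies inside its own, consistent with labels flowing down to subclasses. The function `label_candidates` and the concept `SV1_p2` are hypothetical, and inclusion is exact (no thresholds).

```python
def label_candidates(source, domains):
    """Concepts that may reuse a label found for `source`.

    Assumed reading of the rule: if D(source) ⊇ D(other), the label of
    `source` can also be assigned to `other`. Inclusion is exact here;
    the threshold-based relaxation of the next slides is not applied.
    """
    d_source = domains[source]
    return sorted(c for c, d in domains.items() if c != source and d <= d_source)


# Hypothetical definition domains (sets of instances for simple variables).
domains = {
    "SV1": {"Volume N° 19(3)", "Volume N° 19(2)"},
    "SV1_p2": {"Volume N° 19(3)"},          # same kind of variable seen on another page
    "SV4": {"Chircu", "Kauffman", "Ragowsky"},
}
print(label_candidates("SV1", domains))      # ['SV1_p2']
```

Naming SV1 once is then enough to offer the same label to SV1_p2, which is how concept classes reduce the number of objects to name.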

  10. Defining Concept Classes (2): The relations between definition domains are expressed through existence constraints. Case a: D(C1) and D(C2) coincide, giving the mutual constraint C1 ↔ C2. Case b: D(C1) is included in D(C2), giving the conditioned constraint C1 → C2. Case c: D(C1) and D(C2) are disjoint, giving an exclusive constraint between C1 and C2. When the domains only partially overlap, thresholds define whether the intersection and the differences are big or small, so that the situation can be assimilated to case a, b or c.

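  11. Defining Concept Classes (3): relation between C1 and C2 according to the size of their intersection and of the differences C1 – C2 and C2 – C1.

| Intersection | C1 – C2 big, C2 – C1 big | C1 – C2 big, C2 – C1 small | C1 – C2 small, C2 – C1 big | C1 – C2 small, C2 – C1 small |
|---|---|---|---|---|
| Big | C1 ↔ C2 | C2 → C1 | C1 → C2 | C1 ↔ C2 |
| Small | C1 ↔ C2 | C1 ↔ C2 | C1 ↔ C2 | C1 ↔ C2 |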
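The decision table can be sketched as a function over two definition domains. The thresholds `t_inter` and `t_diff`, and the choice to measure each part relative to the domain it is taken from, are hypothetical: the slides only say that thresholds decide between big and small. Two readings are also assumptions: a small intersection is mapped to case c (exclusive constraint), and the big/big cell is returned as mutual, as the transcript's table shows.

```python
def is_big(part, whole, threshold):
    """A part counts as big when it is at least `threshold` of `whole`."""
    return whole > 0 and part / whole >= threshold


def classify(d1, d2, t_inter=0.5, t_diff=0.2):
    """Relation between concepts C1 and C2 from their definition domains d1, d2.

    t_inter and t_diff are hypothetical thresholds, not values from the source.
    """
    inter = d1 & d2
    if not is_big(len(inter), min(len(d1), len(d2)), t_inter):
        return "exclusive"                            # small intersection: case c
    big12 = is_big(len(d1 - d2), len(d1), t_diff)     # is C1 - C2 big?
    big21 = is_big(len(d2 - d1), len(d2), t_diff)     # is C2 - C1 big?
    if big12 and not big21:
        return "C2 → C1"                              # D(C2) roughly inside D(C1): case b
    if big21 and not big12:
        return "C1 → C2"                              # D(C1) roughly inside D(C2): case b
    return "C1 ↔ C2"                                  # both small (case a) or both big (as tabled)


print(classify({"a", "b"}, {"a", "b", "c", "d", "e"}))   # C1 → C2
d_a = set(range(1, 11))
d_b = set(range(1, 10)) | {11}
print(classify(d_a, d_b))                                # C1 ↔ C2
```

In the second call each domain has one value the other lacks, which stays under `t_diff`, so the partial overlap is assimilated to case a.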

  12. Defining Concept Classes: The Algorithm. Step 1: determine the valid classes, using conditioned, mutual and exclusive existence constraints. Step 2: organize them into an inclusion graph. Step 3: derive the IS_A hierarchy. Example: with the constraints C1 ↔ C2, C3 → C1, C4 → C1 and an exclusive constraint between C3 and C4, the valid classes are {C1, C2}, {C1, C2, C3} and {C1, C2, C4}; in the inclusion graph, {C1, C2} is included in the two other classes, which become its subclasses in the IS_A hierarchy.
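The three steps can be sketched by brute force: a class is a set of concepts that satisfies every existence constraint, classes are ordered by inclusion, and each immediate inclusion becomes an IS_A link. The functions `valid_classes` and `is_a_links` are hypothetical and enumerate all 2^n concept subsets, so they only suit small examples. The constraints reproduce the slide's example, reading C3 and C4 as mutually exclusive (our assumption).

```python
from itertools import combinations


def valid_classes(concepts, mutual=(), conditioned=(), exclusive=()):
    """Step 1: non-empty sets of concepts that satisfy every existence constraint.

    mutual:      pairs (a, b), a ↔ b: a and b are present together or not at all
    conditioned: pairs (a, b), a → b: whenever a is present, b is present
    exclusive:   pairs (a, b): a and b are never present together
    """
    classes = []
    for size in range(1, len(concepts) + 1):
        for combo in combinations(concepts, size):
            s = frozenset(combo)
            if (all((a in s) == (b in s) for a, b in mutual)
                    and all(b in s for a, b in conditioned if a in s)
                    and not any(a in s and b in s for a, b in exclusive)):
                classes.append(s)
    return classes


def is_a_links(classes):
    """Steps 2-3: immediate inclusions of the inclusion graph, as (sub, super) pairs.

    A class with more concepts is more specific, so it IS_A the class it contains.
    """
    return [(sub, sup) for sub in classes for sup in classes
            if sup < sub and not any(sup < mid < sub for mid in classes)]


classes = valid_classes(
    ["C1", "C2", "C3", "C4"],
    mutual=[("C1", "C2")],
    conditioned=[("C3", "C1"), ("C4", "C1")],
    exclusive=[("C3", "C4")],
)
for sub, sup in is_a_links(classes):
    print(sorted(sub), "IS_A", sorted(sup))
```

With these constraints the valid classes are {C1, C2}, {C1, C2, C3} and {C1, C2, C4}, and the sketch prints two IS_A links, from {C1, C2, C3} and from {C1, C2, C4} to {C1, C2}.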

  13. Assigning Labels: Labeling is manual, except for simple-type variables, for which it is based on a set of heuristics. Examples: H1: An invariant string in all instances of a simple variable is a potential label; for instance, {Volume N° 19(3), Volume N° 19(2), …} yields the label "Volume N°". H2: IF a value domain contains the symbol «@» THEN the corresponding simple variable is an electronic address.
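Both heuristics can be sketched for a simple variable given its instances. This Python sketch is an illustration under assumptions: H1 is read at word level (words present in every instance, kept in their original order), H2 fires if any instance contains «@», and the function names and the returned label string are ours.

```python
def h1_invariant_label(instances):
    """H1, word-level reading: words found in every instance form a potential label."""
    if not instances:
        return None
    common = set.intersection(*(set(v.split()) for v in instances))
    label = " ".join(w for w in instances[0].split() if w in common)
    return label or None


def h2_is_electronic_address(instances):
    """H2: a value domain containing '@' denotes an electronic address."""
    return any("@" in v for v in instances)


def suggest_label(instances):
    """Try H2 first, then H1; None means the variable is left for manual naming."""
    if h2_is_electronic_address(instances):
        return "Electronic address"
    return h1_invariant_label(instances)


print(suggest_label(["Volume N° 19(3)", "Volume N° 19(2)"]))   # Volume N°
print(suggest_label(["Chircu", "Kauffman", "Ragowsky"]))       # None
```

H2 is tried first because it is a yes/no test on the domain, while H1 may return a partial label; a variable for which neither fires, such as the author last names, falls back to manual naming.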

  14. Conclusion and Perspectives: This research work applies an algorithm that recovers concepts from a flat set of data scattered across all the pages of a Web site. The naming of the recovered concepts has only been initiated. Perspectives: the enrichment of the set of heuristics, which is in progress; the use of ontologies to find pertinent labels; the study of the applicability of learning approaches to naming in our context.
