
Managing legislative information in Parliaments: new frontiers



  1. Managing legislative information in Parliaments: new frontiers Prof. Fabio Vitali, Department of Computer Science, University of Bologna

  2. Purpose of this talk • To assert that parliamentary processes and citizens’ access to parliamentary records and documents can be improved by: • Adopting the best technologies for document management (mainly, XML and related standards) • Adopting standard formats for naming and electronic representation of documents, possibly a common, multi-lingual, multi-national standard. • Fostering the creation and adoption of many different software tools to be made available to support these standards. Next: Summary 2/34

  3. Summary • My background • Computer support for parliamentary activities • Functionalities • Advantages • Key discussion points • Data/metadata • Different views of the idea of document • Content, structure and presentation • Metadata and ontologies • Naming mechanism Next: Norme In Rete 3/34

  4. Norme In Rete Norms on the Net • Italian-wide initiative sponsored by the Ministry of Justice (1999 - present) to develop • An XML-based data format for national, regional and local norms • A naming schema to identify all relevant documents, both available and unavailable, both existing and potential • A distributed, federated architecture allowing for multiple storage centers with overlapping competencies, official and not official, unified by a single search engine • National standard, adopted by a large number of institutions at both the national and local level. • Large source of inspiration for LexML (Brazil) Next: Akoma Ntoso 4/34

  5. Akoma Ntoso • Sponsored by the UN Department of Economic and Social Affairs (UNDESA), born in 2004 and now adopted by Kenya, Nigeria, South Africa, Cameroon, etc. • Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies. • Describing structures for legislative documents in XML • Referencing documents within and across countries using URIs • Adding systematic metadata to documents using ontologically sound approaches based on OWL, FRBR, etc. for describing and managing legislative documents and Parliamentary workflow documentation needs in Africa • Easy to implement, easy to understand, easy to use, yet complete, precise and reliable Next: CEN Metalex 5/34

  6. CEN Metalex • CEN-sponsored initiative for an XML-based interchange format for European-wide legislative systems. • Born in 2006. Still ongoing • An output of ongoing European projects • Not an actual format, but rather a meta-format allowing individual formats to recognize each other • Basic idea: to identify similar structures through roles rather than vocabulary: • an article is an article regardless of what it is called. • Naming, workflow and references are also managed • to support functionality without giving up generality Next: Computer support for parliamentary activities 6/34

  7. Computer support for parliamentary activities • Support for documents’ generation • Drafting activities, record keeping, translation into national languages, etc. • Support for workflow • Management of documents across lifecycle, storage, security, timely involvement of relevant individuals and offices • Support for citizens’ access • Multi-channel publication (on paper and on the web), search, classification, identification • Further activities • Consolidation, version comparison, language synchronization, etc. Next: Standard Applications, Architectures or Formats? 7/34

  8. Standard Applications, Architectures or Formats? • Applications rely on concrete technologies (e.g., programming languages, operating systems, programming libraries, etc.) and provide actual support for users' processes and experience. • Architectures describe processes and actors and roles, and describe the characteristics of the tools that support them. • Data formats describe the kind of information that is exchanged by tools and that is kept over time. • Standardizing applications forces common architectures and data formats, but also forces uniformity in users' processes and experience, and is the most fragile to technological advances. • Standardizing architectures is less fragile, but forces uniformity in processes and experience • Standardizing formats first provides solutions that are not dependent on technological advances, and fosters the further generation of architectural and applicative standards as a result, rather than as a prerequisite. Next: HTML, PDF 8/34

  9. HTML, PDF • Just a publishing medium, HTML helped make the Web a big success, but it is constrained by its own simplicity • Excessive reliance on typographic rather than semantic description • Few rules, and even those not strongly enforced • PDF is a commercial, opaque data format aimed at guaranteeing the visual aspect of documents • Appropriate when the important characteristic to be maintained is the visual aspect • No support for structure, homogeneity, semantic awareness • A different format is appropriate, one that provides • Clear differentiation between visual aspect and actual meaning • Strong syntactic rules, strictly enforced, to guarantee uniformity, homogeneity and sophisticated applications Next: XML 9/34

  10. XML • XML (Extensible Markup Language) is a W3C standard with incredibly widespread diffusion. • XML is pure syntax, without pre-defined semantics. This allows document designers to provide their own semantics. • Thanks to the associated languages (DTD, XSLT, RDF) we can create sophisticated applications with great flexibility of use. • XML allows the creation of markup languages that are readable, generic, structured and hierarchical. Next: Parliamentary documents and XML 10/34
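The point of the slide above can be made concrete with a short sketch: define a custom XML vocabulary for a legislative act and read it with Python's standard library. All element names (`act`, `section`, `heading`, `content`) are hypothetical illustrations, not taken from any real standard.

```python
# A minimal sketch: a hypothetical legislative XML vocabulary, parsed with
# Python's standard library. Element and attribute names are invented for
# illustration only.
import xml.etree.ElementTree as ET

doc = """
<act id="act-137-2004">
  <section num="1">
    <heading>Initial definitions</heading>
    <content>For the purposes of this act ...</content>
  </section>
</act>
"""

root = ET.fromstring(doc)
for section in root.findall("section"):
    heading = section.find("heading")
    # The markup is readable, generic and hierarchical: tools can navigate it.
    print(section.get("num"), "->", heading.text)
```

Because the semantics are supplied by the document designer, the same parsing machinery works unchanged for any vocabulary the community agrees on.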

  11. Parliamentary documents and XML • XML is ideal for representing parliamentary documents (and especially bills and acts): • They have a well-defined structure, which is systematic and standardized • There are required and optional parts according to rules and tradition • There are containment constraints that determine the global correctness of the document • There are references to other texts (schedules, other acts, etc.) that can fruitfully be used to create a hypertext network. Next: Why is XML good? 11/34

  12. Why is XML good? [Diagram: data formats arranged along an energy/information axis, from Paper, Bitmap and ASCII through Word Processors, WP + styles and DTP applications up to XML. Converting down the scale (discarding structure) is very easy; converting up the scale (adding structure and information) is difficult.] Next: What to look for 12/34

  13. What to look for • Simple, standards-based data formats • to facilitate usage and understanding. • relying on all the relevant W3C and ISO standards. • Long-term feasibility and evolution (backward and forward) • To support documents being drafted now as well as those already drafted and enacted a long time ago. • to support a useful lifespan of the system and the documents measured in tens and possibly hundreds of years. • Self-explaining formats • Documents need to be able to provide all information for their use and meaning through a simple examination, even without the aid of specialized software. • Tools need to be easy to create, providing automatic and semi-automatic aid for data markup and document description. • Manual markup or fine tuning must remain a possible option for exceptions. Next: Approaches 13/34

  14. Approaches • Extensibility • It must be possible to allow local customizations of the data model • It must be possible to extend the reach of the language towards more countries, more document types, larger vocabularies of fragment qualification • Format-induced homogeneity • Documents produced by different tools and individuals need to be, as much as possible, identical • Documents produced by hand and by tools need to be, as much as possible, identical • Multiple uses • Display on PC Screen, display on cell phone, display on Braille terminal, print on paper, print on paper with a different paper size, cataloguing, searching, workflow management (during drafting and active lifecycle), automatic consolidation, textual analysis, semantic analysis, provision analysis, cross-country comparison, synchronized translation, etc. Next: Understanding the data/metadata dichotomy (1) 14/34

  15. Understanding the data/metadata dichotomy (1) • Data • the actual content (text, structure, images, schemas) exactly as provided by the author of the document • Metadata • Any consideration, comment or additional information that can be expressed about the content and the document. • Metadata is generated either by human intervention or through automated processes. • Ontology (in short) • A formalized representation of the conceptual model that shapes all metadata associated with a document. Next: Understanding the data/metadata dichotomy (2) 15/34

  16. Understanding the data/metadata dichotomy (2) • Authors’ contribution: data • The words, punctuation and breaks, exactly as they have been written and accepted by the original author (in the case of legislation, the legislative body) • Editors’ contribution: metadata • Publication data. Lifecycle information. Footnotes. Analysis of provisions. • Metadata is useless unless it is provided following a precise conceptual model, called an ontology. • In a way, editors are the authors of the metadata • Put another way, metadata is information about a document that was not provided by its authors. Next: Different views on the idea of document (1) 16/34

  17. Different views on the idea of document (1) • Different concepts: • Italian Act 137/2004 • The current consolidated version of the Italian Act 137/2004 • An XML representation of the current consolidated version of the Italian Act 137/2004 • The file “act137-2004.xml” stored in a specific folder of my computer • Different properties: What is the name of the document? Who is the author of the document? What is the creation date of the document? • The IFLA FRBR hierarchy… • Work: a distinct intellectual creation • Expression: the specific form in which a work is realized • Manifestation: the representation of an expression according to the requirements of a medium • Item: a single exemplar (an instance) of a manifestation • … provides different answers • E.g.: a different name for each level • E.g.: the legislator, the editor, the publisher, the data provider • E.g.: the enactment date, the consolidation date, the generation date, the copy date Next: Different views on the idea of document (2) 17/34
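The four FRBR levels on the slide above can be sketched as plain Python classes, showing how the same question ("what is the date of the document?") gets a different answer at each level. All titles and dates below are invented placeholders, not real catalogue data.

```python
# A sketch of the IFLA FRBR hierarchy as Python dataclasses. Every concrete
# value (title, dates) is illustrative only.
from dataclasses import dataclass

@dataclass
class Work:                # a distinct intellectual creation
    title: str
    enactment_date: str

@dataclass
class Expression:          # the specific form in which a work is realized
    work: Work
    consolidation_date: str

@dataclass
class Manifestation:       # an expression embodied in a medium (e.g., XML)
    expression: Expression
    generation_date: str

@dataclass
class Item:                # one concrete exemplar of a manifestation
    manifestation: Manifestation
    copy_date: str

work = Work("Italian Act 137/2004", "2004-01-01")       # hypothetical date
expr = Expression(work, "2007-01-01")                   # hypothetical date
mani = Manifestation(expr, "2007-02-15")                # hypothetical date
item = Item(mani, "2007-03-01")                         # hypothetical date

# "What is the creation date?" has four distinct answers:
print(work.enactment_date, expr.consolidation_date,
      mani.generation_date, item.copy_date)
```

The nesting makes explicit that each level carries its own properties, which is exactly why flat metadata schemas are insufficient.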

  18. Different views on the idea of document (2) • Different processes. E.g.: • A repeal is really a process on the work • An amendment is a process on an expression generating a new one • The markup is a process on an expression generating a manifestation • The copy is a process on an item generating another item. • Different peculiarities • A work has no content. The content of an expression is a set of words and drawings. The content of a manifestation is computer data. • Works are eternal and created by Authors. Expressions are stable and created either by Authors or by Editors with domain expertise (consider amendment acts that do not specify the resulting consolidated text). Manifestations are created by computer tools used by secretaries or low level operatives. Next: Content, structure and presentation (1) 18/34

  19. Content, structure and presentation (1) • Content • What exactly was written in the document. • Structure • How the content is organized • Presentation • The typographical choices to present a document on screen or on paper. Next: Content, structure and presentation (2) 19/34

  20. Content, structure and presentation (2) • The structure adds meaning to pieces of content. • The words “Initial definitions” assume meaning once we know they are the title of section #1 of the Italian Act 137/2004 • The structure connects the presentation to the content • Once we know that the text “Initial definitions” is the heading of a section, we can apply the typographical choices associated with section headings. • The structure can be used to test and validate the correctness of a document • We can deduce that a document is incorrect if there is no heading associated with a section. Next: Descriptive vs. prescriptive approach 20/34

  21. Descriptive vs. prescriptive approach • Descriptive schemas: a very loose set of constraints providing a full vocabulary of elements and little or no check on their presence and order. They are meant to: • Describe a set of documents while allowing many exceptions to the basic rule • Describe an existing (and thus non-modifiable) set of documents • Describe a set of documents created by a higher authority than the XML coder. • Prescriptive schemas: a more restricted set of constraints providing the same full vocabulary plus tight checks on presence and order. They are meant to: • Impose adherence to drafting guidelines, and reject non-compliant documents • Impose homogeneity on the work of multiple different authors • Allow applications to expect certain characteristics of the documents to be present • Akoma Ntoso, for instance, provides a two-tiered set of schemas, allowing the full potential of both approaches to be expressed Next: Metadata (and ontologies) (1) 21/34

  22. Metadata (and ontologies) (1) • Documents’ content does not include all that is interesting about them. A metadata schema is necessary to associate with documents all data that is not in their content • Some metadata schemas are flat, i.e., metadata are simply text values referring to the document; e.g.: Dublin Core, MARC 21, etc. • This prevents tools from • differentiating between the different ideas of document, • identifying more precisely the classes of concepts associated with documents, such as actors (persons and organizations), events, provisions, places, terms, etc. • An ontology expressed using Semantic Web concepts and languages (e.g., OWL and/or Topic Maps) offers all the advantages of metadata schemas, and additionally allows one to: • associate appropriate properties with the different ideas of document (e.g., author, creation date, title, etc.) • Make assertions about abstract concepts rather than plain strings Next: Metadata issues 22/34
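The contrast above can be sketched in a few lines: flat metadata is just strings attached to "the document", while ontology-style assertions are triples about distinct, identified resources. Every URI and property name below is invented for illustration.

```python
# Flat metadata (Dublin Core style): plain strings, no notion of which
# FRBR level the assertion refers to. Property names imitate Dublin Core.
flat = {
    "dc:title": "Act 137/2004",
    "dc:creator": "Italian Parliament",   # a string: which level? which role?
}

# Ontology-style triples (subject, predicate, object): each assertion is
# attached to a specific, identified resource. All "ex:" URIs are invented.
triples = [
    ("ex:work/act-137-2004",          "ex:enactedBy",      "ex:body/italian-parliament"),
    ("ex:expr/act-137-2004@2007",     "ex:consolidatedBy", "ex:office/editorial-office"),
    ("ex:mani/act-137-2004@2007.xml", "ex:generatedBy",    "ex:tool/markup-editor"),
]

# With triples, a tool can ask level-specific questions unambiguously:
who_enacted = [o for s, p, o in triples if p == "ex:enactedBy"]
print(who_enacted)
```

A real system would use RDF/OWL tooling rather than Python tuples, but the gain is the same: assertions about concepts, not about undifferentiated strings.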

  23. Metadata issues • Authorship of metadata • The generation of metadata is itself an authoring process and needs to be controlled, dated, signed and clearly identified. • Versioning of metadata • Metadata may change over time, and actually more often than the document content. How do we deal with such changes? • Relationship between metadata and IFLA FRBR document levels • Each metadata assertion refers to one specific idea of document and not the others. We need to make sure that these associations are unambiguous and agreed upon. • Location of metadata: internal or external? • Internal location guarantees co-maintenance of content and metadata, but makes it difficult to allow for multiple views of the same content • External location allows multiple metadata sets to coexist for the same document, but complicates the correct association of data and metadata Next: Metadata terminology 23/34

  24. Metadata terminology • Objective • A piece of information for which no reasonable doubt can exist • E.g. the title of article 15, the publication date • Subjective • A piece of information that requires an active interpretation from a human that may be wrong, or for which different opinions exist • E.g., resolution of implicit citations, classification of provisions • Low competence • the kind of competence one may expect from a non-specialized employee, such as a secretary, armed with just common sense and some topical experience • E.g.: where does article 1 end and article 2 start • High competence • A piece of information whose determination requires the kind of competence one may expect from specialized jurists that come to their results after careful and painful reasoning • e.g.: dates and times in norms. Next: Workflow management 24/34

  25. Workflow management • An important bit of metadata sophistication is the support for workflow • Explicit management of document evolution • Identification of sources of authority (e.g., legislative bodies), sources of changes (e.g., amending acts), time of changes (time of acts is an extremely complex discipline) • Reliable identification of actors and content (through digital signature) Next: Consolidation and side-by-side comparison 25/34

  26. Consolidation and side-by-side comparison • Only possible when the structure, content and presentation of documents are explicitly separated • Traditional approaches are labour-intensive, manual, and require both legislative and typographic competences • Explicit recording of structure, and independence from presentation, allow: • Consolidation as a semi-automatic process based on explicit structural references in amendments and modification laws • Side-by-side comparison as a fully automatic process based on different presentation patterns for the differences between an original and a modified text. Next: Naming documents and fragments 26/34
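As a sketch of fully automatic comparison, Python's standard difflib can compute the differences between an original and an amended provision; a renderer would then present them side by side. The provision text below is invented.

```python
# A sketch of automatic comparison of an original and an amended provision
# using Python's standard difflib. The legal text is purely illustrative.
import difflib

original = ["The Minister may issue a decree.",
            "The decree takes effect immediately."]
modified = ["The Minister may issue a decree.",
            "The decree takes effect after thirty days."]

# unified_diff marks unchanged lines with ' ', deletions with '-',
# insertions with '+'; a presentation layer can style these differently.
for line in difflib.unified_diff(original, modified,
                                 fromfile="original", tofile="modified",
                                 lineterm=""):
    print(line)
```

This only works because the input is the content itself, separated from its presentation; diffing two typeset PDF pages would compare layout artifacts instead of provisions.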

  27. Naming documents and fragments • Uniform Resource Identifiers (URIs) • These are used throughout the World Wide Web to indicate resources. • The best known are URLs (Uniform Resource Locators), which are used to navigate the web • http://www.akomantoso.org/09-examples.html Next: Naming documents and fragments (2) 27/34

  28. Naming documents and fragments (2) • With legislative documents, the situation is more complex. • Works, expressions and manifestations are not physical resources, but abstract entities. Only items are physical resources. • Yet, references are rarely (or never) to items. So works, expressions and manifestations must have their own URIs. • These URIs will not be URLs (i.e., they will not correspond to a physical address on a computer) • The act of finding the URL of the item that best represents the manifestation we are looking for is called URI resolution. Next: Naming documents and fragments (3) 28/34
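URI resolution as described above can be sketched as a lookup from an abstract manifestation URI to the URL of a concrete item. The URI pattern below loosely imitates the "guessable" country/type/year/number style of legislative naming schemes, but it is entirely hypothetical, as are the server URLs.

```python
# A sketch of URI resolution: an abstract manifestation URI is mapped to the
# URL of an item (an actual file on some server). All URIs and URLs are
# invented for illustration; real schemes (e.g., Akoma Ntoso's) differ.
catalogue = {
    # manifestation URI -> item URL
    "/it/act/2004/137/ita@2007/main.xml":
        "http://data.example.org/store/act137-2004-cons2007.xml",
}

def resolve(uri):
    """Return the URL of the item that best represents the requested URI,
    or None if no item is known for it."""
    return catalogue.get(uri)

print(resolve("/it/act/2004/137/ita@2007/main.xml"))
```

In a federated architecture the catalogue would be a distributed resolver service rather than a dictionary, but the contract is the same: abstract, stable names in references; concrete, replaceable locations behind the resolver.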

  29. Naming documents and fragments (3) • A naming schema must guarantee a few properties: • Complete: all relevant documents (at all their levels) must be contemplated • Global: all legislative bodies (ideally even across countries) must be able to use it and clearly identify their documents. • Meaningful: names need to mean something. • It should be possible to make assumptions about the kind, freshness and relevance of a citation by looking only at the reference’s name • Memorizable: names need to be easy to jot down, easy to remember, easy to correct if something was written down wrongly. • Guessable: given a reference to act 136/2005, it should be easy to deduce the form for act 76/2006, etc. Next: The basic features of a good national standard 29/34

  30. The basic features of a good national standard • Compatibility with CEN Metalex • Systematic use of W3C standards (esp. XML, XML Schema, Namespaces, Semantic Web languages, etc.) • Separate: • Structure • Normative content • Presentation • Metadata • Strong naming policies (a future extension of CEN Metalex will provide guidelines) • Allow for exceptions, extensions and customization Next: Why bother? 30/34

  31. Why bother? • An open standard for data format allows for easier, more cost-effective distribution of legislative content • An open standard for data format allows for long-term preservation of investments and supports ease of maintenance • An open standard for data format allows for a thriving, competitive market of tools • An open standard for data format allows integration of authoritative content providers and added-value content providers (esp. private publishers and academics) • An open standard for data format allows comparative studies to be performed with greater ease Next: Inventing, adopting, or… ? 31/34

  32. Inventing, adopting, or… ? • As long as fundamental compatibility is maintained • In terms of basic structures (CEN Metalex) • Naming policies (URI-based) • It is not relevant that you adopt existing standards… • E.g. Akoma Ntoso • … or invent your own national new one • But do behave fairly, and allow for international interoperability. Next: Conclusions (1) 32/34

  33. Conclusions (1) • A successful system is built on three key factors: • Precise and sophisticated content structure • Complete metadata model (with precise time-awareness) • Sophisticated and easy-to-use naming mechanism • NormeInRete, Akoma Ntoso and (increasingly) CEN Metalex share these properties. • Also, it is important to remember that we are discovering interesting new ways to store and use information at this very moment. • So casting in stone design decisions that prevent future evolution of document formats, tool architectures and overall functionality is wrong and doomed. Next: Conclusions (2) 33/34

  34. Conclusions (2) • Adopting an international standard (e.g. Akoma Ntoso) is a first step in the right direction • Open to local customization, yet international • Allows immediate adoption of existing architectures and tools, yet allows for local developments and extensions • Sharing knowledge and experiences with colleagues from other countries increases the chance of success of local initiatives • Chances for training and capacity building exist • Cf.: Summer school on Legislative Informatics in Florence (September 2007, June 2008)… • … but also local initiatives specific to regional and national needs (e.g. African legislative school, Kenya, January 2008) End of presentation 34/34
