Data modeling and metadata: From graphs to graphs
Metadata • Full metadata: relational schemas • Self defining data: XML, key/value, key/document • No metadata: untagged images, video, audio • Parallel metadata: tagged images, video, audio
Full schema metadata • Origins: • Semantic networks in AI • Metadata mixed in with data • Objects (nodes in graph), has-a (arcs in graph), is-a (arcs in graph), types (nodes), subtypes (nodes) • Essentially a network with metadata and all instances of the metadata • Goal was to model knowledge of real world, not to manage volumes of data
Early databases • Slow to adopt data structuring abstractions because speed of access was the focus • Hierarchical and network databases • Links between records of one file to records of another • E.g., each claim record is linked to a subscriber record • Also, sets of records and sets of links
Relational databases • First true abstraction of metadata separated from data • Minimal structure in order to accommodate fast retrieval of tuples • Abstractions • Relation • Attribute • Tuple • PKs, CKs, FKs, null/not null
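The abstractions on this slide (relation, attribute, tuple, keys, null/not null) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not part of the original deck: the `subscriber` and `claim` tables borrow the claim/subscriber example from the early-databases slide, and the column names (`member_no`, `amount`, and so on) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Metadata (the schema) is declared separately from the data.
conn.execute("""
    CREATE TABLE subscriber (
        subscriber_id INTEGER PRIMARY KEY,   -- primary key (PK)
        member_no     TEXT NOT NULL UNIQUE,  -- candidate key (CK)
        name          TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE claim (
        claim_id      INTEGER PRIMARY KEY,
        subscriber_id INTEGER NOT NULL
                      REFERENCES subscriber(subscriber_id),  -- foreign key (FK)
        amount        REAL                   -- nullable attribute
    )""")

# Data: one tuple per relation.
conn.execute("INSERT INTO subscriber VALUES (1, 'M-0001', 'Joe')")
conn.execute("INSERT INTO claim VALUES (10, 1, 250.0)")

# The metadata can be queried on its own, without touching any tuples.
columns = [row[1] for row in conn.execute("PRAGMA table_info(claim)")]

def insert_orphan_claim():
    """Try to insert a claim for a subscriber that does not exist."""
    try:
        conn.execute("INSERT INTO claim VALUES (11, 99, 75.0)")
        return True
    except sqlite3.IntegrityError:
        return False  # the FK constraint rejected the tuple
```

Calling `insert_orphan_claim()` shows the schema, rather than application code, rejecting a tuple that violates the foreign key.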
Concurrent with relational database development: “semantic” databases • Like semantic networks (quite deliberately), but with metadata separated from data • Not object-oriented • No object IDs • No classes instantiated from types • A wide variety of competing models, with “the” Semantic Model being one of them
Semantic databases, continued • Other modeling notions • Components or aggregates that are necessary parts of an object and cannot be changed, like the day you were born or the VIN of a car • Versus properties or attributes that can be changed, like your name or the transmission in a car • Cause and effect relationships • Such as a sales visit leading to a sale • And many other specialized relationships • Interestingly, there were no query facilities and no successful commercial systems
Persistent programming languages • Not necessarily object-oriented • Host language is the only language • Data can be persistent or not, often selectively • Strong notion of metadata as programming data types
Object-oriented databases • Strong notion of object ID and object identity • Types/subtypes and classes • Strong sense of metadata separate from data • Behavioral encapsulation
Object-relational databases • Objects in the small • User defined data types for attribute domains • No behavioral encapsulation
One-of-a-kind semantically rich databases • Engineering/CAD data • Complex objects • Lots of singleton types, but with strict notion of metadata • Complex constraints • Far reaching component and constraint relationships
One-of-a-kind scientific/medical/financial databases • Managing type-based, voluminous data with little internal structure (imaging) • Managing textual data with some structure and lots of domain-based terminology • Often there are real-time demands made on distributed databases – a very difficult problem • Typically addressed by putting timing constraints on specific parts of the data processing code
Self-defining data • Inspired by need to stream data live and process it in one pass • Also inspired by the need to vary the structure of individual pieces of data, like documents and other items that don’t really have a shared type construct • XML developed as a shared language model for semi-structured (or self-defining) data • Developed in part to assist the construction of the semantic web • Data is streamed on the Internet or from sensors
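A minimal sketch of one-pass processing of self-defining data, using `iterparse` from Python's standard `xml.etree.ElementTree`. The sensor feed, its tag names (`reading`, `temp`, `humidity`) and the `sensor` attribute are hypothetical; the point is that each record carries its own structure, so the second reading can have a field the first lacks.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a live stream; in practice this would be a socket or file.
stream = io.BytesIO(b"""<readings>
  <reading sensor="a1"><temp>21.5</temp></reading>
  <reading sensor="b2"><temp>19.0</temp><humidity>40</humidity></reading>
</readings>""")

records = []
# iterparse yields each element as soon as its closing tag has been read,
# so the document is handled in a single pass.
for event, elem in ET.iterparse(stream, events=("end",)):
    if elem.tag == "reading":
        record = {"sensor": elem.get("sensor")}
        for child in elem:
            # Field names come from the data itself, not from a schema.
            record[child.tag] = child.text
        records.append(record)
        elem.clear()  # release the subtree that has already been processed
```

Values arrive as text; interpreting `"21.5"` as a number is left to the consuming program.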
Self-defining data, continued • NoSQL databases that store extremely high volumes of loosely structured data • Documents with internal structure • Values with no meaning within the database • Usually no formal query language, as data is interpreted programmatically (either partially or fully); sometimes there is a library of common query templates
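The idea of values that have no meaning within the database can be sketched as a toy key/document store. This is an assumption-laden illustration, not any real NoSQL product's API: `put`, `get` and `titles_with_field` are hypothetical helpers, and JSON is chosen only as a convenient document encoding.

```python
import json

# The "database": keys mapped to opaque bytes it never inspects.
store = {}

def put(key, document):
    """Serialize a document and store it as uninterpreted bytes."""
    store[key] = json.dumps(document).encode("utf-8")

def get(key):
    """Fetch the bytes and let the program, not the store, interpret them."""
    return json.loads(store[key].decode("utf-8"))

# Documents need not share a structure.
put("doc:1", {"title": "Claim form", "pages": 3})
put("doc:2", {"title": "Scan", "tags": ["x-ray"]})

def titles_with_field(field):
    """A programmatic 'query': decode every document and test it in code."""
    return sorted(get(k)["title"] for k in store if field in get(k))
```

A function such as `titles_with_field` plays the role of an entry in a library of common query templates, since there is no query language to fall back on.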
No metadata databases • Early blob and continuous data • Images • Video • Audio • Flash • All processing of data takes place in complex programs that do not retrieve metadata or insert metadata in the data • E.g., image processing, facial searching, language searching
Recent blob/continuous data • Development of parallel metadata databases that contain low level and semantically rich tagging • Only the metadata database is actively searched • Searching can be enhanced by downloading small samples • Feedback loops to improve tag interpretation • Tags taken from shared namespaces
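A rough sketch of the parallel-metadata arrangement: the blobs sit in one store and are never opened during a search, while a separate tag store is the only thing queried. All names here are hypothetical; the `dc:` and `color:` prefixes merely stand in for tags drawn from shared namespaces.

```python
# Blob store: raw bytes only, never inspected by the search.
blobs = {
    "img1": b"\x00\x01\x02",
    "img2": b"\x03\x04\x05",
    "clip1": b"\x06\x07\x08",
}

# Parallel metadata store: namespaced tags keyed by the same blob ids.
tags = {
    "img1": {"dc:subject": "bridge", "color:dominant": "gray"},
    "img2": {"dc:subject": "bridge", "color:dominant": "red"},
    "clip1": {"dc:subject": "harbor"},
}

def search(tag, value):
    """Return the ids of blobs whose metadata carries tag == value."""
    return sorted(k for k, t in tags.items() if t.get(tag) == value)

def fetch(blob_id):
    """Only after a metadata hit is the blob itself retrieved."""
    return blobs[blob_id]
```

Mixing low-level tags (dominant color) with semantically rich ones (subject) in the same store mirrors the two kinds of tagging named on the slide.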
Assertion based databases • Usually use triples (assertions) • Triples are chained together to make new inferences • Metadata is treated like data • Joe owns a Ford • Fords are cars • SQL-like, triple-hopping query languages
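The slide's example can be sketched as a tiny forward-chaining loop over triples. The predicate names `owns` and `is_a`, the extra `vehicle` fact, and the single inference rule are assumptions chosen to match "Joe owns a Ford" and "Fords are cars"; real triple stores offer far richer rule and query languages.

```python
# Data and metadata live side by side as (subject, predicate, object).
triples = {
    ("Joe", "owns", "Ford"),     # data
    ("Ford", "is_a", "car"),     # metadata, stored the same way
    ("car", "is_a", "vehicle"),
}

# Predicates assumed to carry over to a more general type.
INHERITING = {"owns", "is_a"}

def infer(facts):
    """Chain triples until no new assertion can be derived."""
    facts = set(facts)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p not in INHERITING:
                continue
            for (s2, p2, o2) in facts:
                # (s p o) and (o is_a o2)  =>  (s p o2)
                if p2 == "is_a" and s2 == o:
                    new.add((s, p, o2))
        if new <= facts:      # fixpoint reached
            return facts
        facts |= new

inferred = infer(triples)
```

After the loop, `inferred` contains `("Joe", "owns", "car")`, an assertion that was never stored explicitly.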
Graph databases • Networks of objects that blur the boundary between data and metadata • Support levels of connectivity orders of magnitude greater than in the network and hierarchical databases of old • Have a purpose reminiscent of network/hierarchical databases – to represent the fluid and highly interconnected nature of complex data, such as that collected from social media • Use graph-like query and programming interfaces
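A minimal sketch of the kind of question a graph-like interface answers directly: how many hops separate two people in a social network. The `follows` graph and the `hops` function are hypothetical, and a plain adjacency dictionary stands in for a real graph database.

```python
from collections import deque

# Adjacency list: who follows whom (made-up data).
follows = {
    "ann": ["bob", "cy"],
    "bob": ["dee"],
    "cy": ["dee"],
    "dee": ["eve"],
    "eve": [],
}

def hops(graph, start, goal):
    """Breadth-first search; returns the fewest edges from start to goal."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal is unreachable from start
```

In a relational design the same question needs one self-join per hop, which is part of why highly interconnected data motivates a graph model.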
Graphics/animation/gaming data • Shares a lot of properties with scientific and engineering data • Innately mathematical • Straight and curved line 2D geometry used in 3-space • Bézier and NURBS for curves • Matrix mathematics for 3D manipulation • Translate, Scale, Rotate • Mapping to pixel based data for presentation
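The matrix mathematics for 3D manipulation can be sketched with 4x4 homogeneous matrices in plain Python. This assumes the three operations are the standard affine transforms (translation, scaling, rotation) and a column-vector convention; the function names are illustrative, and production code would use a linear algebra library or the GPU.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scale(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def rotate_z(theta):
    """Counter-clockwise rotation about the z axis, angle in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def apply(m, point):
    """Transform a 3D point by a 4x4 matrix (w is fixed at 1)."""
    v = [point[0], point[1], point[2], 1.0]
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# With column vectors the rightmost matrix acts first:
# scale by 2, rotate 90 degrees about z, then translate by (1, 0, 0).
combined = matmul(translate(1, 0, 0),
                  matmul(rotate_z(math.pi / 2), scale(2, 2, 2)))
moved = apply(combined, (1.0, 0.0, 0.0))  # approximately (1.0, 2.0, 0.0)
```

Folding the three steps into one matrix is the usual design choice, because the combined matrix can then be applied to every vertex of an object at the cost of a single multiplication each.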
Graphics/animation/gaming, continued • For real-time rendering, low polygon objects and bounding box collision mathematics used • Creates the most aggressive demands on processing and graphics card technology • Often no notion of metadata at all • Even non-real-time animation demands low quality interactive rendering
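Bounding box collision mathematics is cheap enough for real-time use because it reduces to a few comparisons per axis. The sketch below assumes axis-aligned boxes described by two corner points; the `Box` class and `boxes_collide` function are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box given by its two extreme corners."""
    min_corner: tuple  # (x, y, z)
    max_corner: tuple  # (x, y, z)

def boxes_collide(a, b):
    """Two boxes overlap only if their extents overlap on every axis."""
    return all(
        a.min_corner[i] <= b.max_corner[i] and b.min_corner[i] <= a.max_corner[i]
        for i in range(3)
    )

player = Box((0, 0, 0), (1, 2, 1))
crate = Box((0.5, 0, 0.5), (1.5, 1, 1.5))
far_wall = Box((10, 0, 0), (11, 5, 5))
```

Here `boxes_collide(player, crate)` is true and `boxes_collide(player, far_wall)` is false; a game would typically run this coarse test first and fall back to finer geometry only for boxes that overlap.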
Procedural data • Used heavily in photo/video processing • Focusing, removing objects, adding color effects, changing lighting, etc. • There are standalone apps and plugin products • Used heavily in animation • Procedural textures and materials that don’t need to be tiled • Environment procedures (often sun and sky) • Cloning to make crowds • Lighting and camera objects
Metadata for procedural data • Big problem • Difficult to crisply define the “meaning” of procedural data • Often, the reason procedural data exists is that the task is too complex to express as declarative data • This sort of data is often inherently non-declarative • The marketplace is filled with competing, varying products, each with its own interface, and they are too powerful to scrap
Procedural data, continued • Mathematical packages used for mining • Almost ironically, these are somewhat easier to package declaratively, since the mathematics can be so complex that its foundation is used in a black box fashion