Course project ideas



Course Project Ideas

Yanlei Diao

University of Massachusetts Amherst



New Directions for DB Research

  • Sensor data: new architecture

  • XML: new data model

  • Streams: new execution model

  • Data quality and lineage: new services

Yanlei Diao, University of Massachusetts Amherst



Querying in Sensor Networks

  • Store data locally at sensors and push queries into the sensor network

    • Flash memory energy-efficiency.

    • Limited capabilities of sensor platforms.

Figure labels: Internet, Gateway, Push query to sensors, Flash Memory, Acoustic stream, Image stream


Optimize for Flash and Limited RAM

Figure: updating data on flash. Memory (~4-10 KB); erase block (~16-64 KB). Steps: 1. Load block into memory; 2. Modify in memory; 3. Save block back.

  • Flash Memory Constraints

    • Data cannot be over-written, only erased

    • Pages can often only be erased in blocks (16-64KB)

    • Unlike magnetic disks, cannot modify in-place

  • Challenges:

    • Energy: Organize data on flash to minimize read/write/erase operations

    • Memory: Minimize use of memory for flash database.
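The load, modify, and save cycle described above can be illustrated with a toy simulation. This is only a sketch: the class name `FlashSim`, the 16 KB block size, and the operation counters are assumptions made for illustration and are not taken from StonesDB.

```python
class FlashSim:
    """Toy flash model: a block must be erased before it can be rewritten."""

    def __init__(self, num_blocks, block_size):
        self.blocks = [bytearray(block_size) for _ in range(num_blocks)]
        self.reads = self.writes = self.erases = 0

    def update_in_place(self, block_no, offset, data):
        # 1. Load the whole block into RAM.
        buf = bytearray(self.blocks[block_no])
        self.reads += 1
        # 2. Modify the copy in memory.
        buf[offset:offset + len(data)] = data
        # 3. Erase the block, then save the modified copy back.
        self.blocks[block_no] = bytearray(len(buf))
        self.erases += 1
        self.blocks[block_no] = buf
        self.writes += 1


flash = FlashSim(num_blocks=4, block_size=16 * 1024)
for i in range(10):
    flash.update_in_place(0, i * 4, b"temp")
print(flash.reads, flash.writes, flash.erases)
```

In this model, ten 4-byte updates cost ten block reads, ten erases, and ten block writes, and each update buffers a full 16 KB block, which would not fit in the ~4-10 KB of RAM noted on the slide. Both costs are what the energy and memory challenges aim to reduce.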


Proxy Cache of Image Summaries

StonesDB: System Operation

Image Retrieval: Return images taken last month with at least two birds, one of which is a bird of type A.

  • Identify “best” sensors to forward query.

  • Provide hints to reduce search complexity at sensor.


StonesDB: System Operation

Image Retrieval: Return images taken last month with at least two birds, one of which is a bird of type A.

Query Engine

Partitioned Access Methods


Research Issues in StonesDB

  • Local Database Layer

    • Reduce updates for indexing and aging.

  • New cost models for self-tuning sensor databases.

    • Energy-optimized query processing.

    • Query processing over aged data.

  • Distributed Database Layer

    • What summaries are relevant to queries?

    • What remainder queries to send to sensors?

    • What resolution of summaries to cache?


XML (Extensible Markup Language)

<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

XML: a tagging mechanism to describe content.


XML Data Model (Graph)

Main structure: ordered, labeled tree

References between nodes: the tree becomes a graph


XQuery: XML Query Language

  • A declarative language for querying XML data

  • XPath: path expressions

    • Patterns to be matched against an XML graph

    • /bib/paper[author/lastname='Croft']/title

  • FLWOR expressions

    • Combining matching and restructuring of XML data

    • for $p in distinct(document("bib.xml")//publisher)

      let $b := document("bib.xml")/book[publisher = $p]

      where count($b) > 100

      order by $p/name

      return $p
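To make the for/let/where/order by/return steps concrete, here is a rough Python analogue of the query above using the standard `xml.etree.ElementTree` module. The sample document, the function name `publishers_with_more_than`, and the lowered threshold are assumptions for illustration. It also compares publisher text only, which simplifies the `$p/name` ordering in the slide.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample data standing in for bib.xml.
BIB = """
<bib>
  <book><title>A</title><publisher>P1</publisher></book>
  <book><title>B</title><publisher>P1</publisher></book>
  <book><title>C</title><publisher>P2</publisher></book>
</bib>
""".strip()


def publishers_with_more_than(root, threshold):
    # for + let: bind each distinct publisher to the number of its books.
    counts = Counter(
        book.findtext("publisher", default="").strip()
        for book in root.iter("book")
    )
    # where + order by + return: keep frequent publishers, sorted by name.
    return sorted(p for p, n in counts.items() if p and n > threshold)


root = ET.fromstring(BIB)
print(publishers_with_more_than(root, 1))
```

A declarative XQuery engine is free to choose its own evaluation plan; this sketch fixes one simple plan (a single pass with a counter) just to show what the clauses compute.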


Metadata Management using XML

  • File systems for large-scale scientific simulations

    • File systems: petabytes or even more

    • Directory tree (metadata): large, can’t fit in memory

    • Links between files: steps in a simulation, data derivation

  • File Searches

    • all the files generated on Oct 1, 2005

    • all the files whose name is like ‘*simu*.txt’

    • all the files that were generated from the file ‘basic-measures.txt’

  • Build an XML store to manage directory trees!

    • XML data model

    • XML Query language

    • XML Indices
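The three file searches listed above can be sketched over a small directory tree encoded as XML. The element and attribute names (`dir`, `file`, `created`, `derivedFrom`) and the sample tree are hypothetical, and treating "generated from" as a transitive relation is also an assumption. A real store would use indices rather than the full scans shown here.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical encoding of a directory tree with derivation links.
TREE = ET.fromstring("""
<dir name="run1">
  <file name="basic-measures.txt" created="2005-09-30"/>
  <file name="simu-step1.txt" created="2005-10-01" derivedFrom="basic-measures.txt"/>
  <dir name="out">
    <file name="simu-step2.txt" created="2005-10-01" derivedFrom="simu-step1.txt"/>
    <file name="notes.md" created="2005-10-02"/>
  </dir>
</dir>
""".strip())


def files_created_on(root, day):
    return [f.get("name") for f in root.iter("file") if f.get("created") == day]


def files_named_like(root, pattern):
    return [f.get("name") for f in root.iter("file")
            if fnmatch.fnmatch(f.get("name"), pattern)]


def files_derived_from(root, source):
    # Follow derivedFrom links upward until the source is found or the chain ends.
    parent = {f.get("name"): f.get("derivedFrom") for f in root.iter("file")}
    result = []
    for name in parent:
        current, seen = parent[name], set()
        while current is not None and current not in seen:
            if current == source:
                result.append(name)
                break
            seen.add(current)
            current = parent.get(current)
    return result


print(files_created_on(TREE, "2005-10-01"))
print(files_named_like(TREE, "*simu*.txt"))
print(files_derived_from(TREE, "basic-measures.txt"))
```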


XML Document Processing

  • Multi-hierarchical XML markup of text documents

    • Multi-hierarchies: part-of-speech, page-line

    • Features in different hierarchies overlap in scope

    • Need a query language & querying mechanism

    • References [Nakov et al., 2005; Iacob & Dekhtyar, 2005]

  • Querying and ranking of XML data

    • XML fragments returned as results

    • Fuzzy matches

    • Ranking of matches

    • References [Amer-Yahia et al., 2005; Luo et al., 2003]

  • Well-defined problems → identify your contributions!


Data Stream Management

Traditional Database (holds Data; takes a Query; returns Results)

  • Data at rest

  • One-shot or periodic queries

  • Query-driven execution

Data Stream Processor (holds Queries, Rules, Event Specs, Subscriptions; takes streaming tuples Attr1 Attr2 Attr3; returns Results)

  • Data in motion, unending

  • Continuous, long-running queries

  • Data-driven execution


In-Network XML Processing

  • XML is becoming the wire format for data

  • In-network XML processing

    • Authentication

    • Authorization

    • Routing

    • Transformation

    • Pattern matching

  • XPath widely used for in-network XML processing

    • Applied directly to streaming XML data

    • Line-speed performance

Figure labels: Expedite traffic; Enhance security; Real-time monitoring & diagnosis


Research Issues

  • Gigabit rate XPath processing

    • Take one look, process XPath, buffer data for future use if necessary

    • Processing needs to be gigabit rate

    • Memory usage needs to be minimized

  • Time/space complexity of XPath stream processing

    • Theoretical analysis for common features of XPath

  • Minimizing memory usage of YFilter technology

    • YFilter: state-of-the-art for multi-XPath processing
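The "take one look" style of processing can be sketched with a stack of open tags over a streaming parser. This is a minimal illustration, not YFilter: it handles a single path made only of child steps, with no predicates or `//`, and the function name `stream_match` is an assumption.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(source, path):
    """Yield the text of elements whose root-to-node tag path equals `path`."""
    steps = path.strip("/").split("/")
    stack = []  # tags of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == steps:
                yield (elem.text or "").strip()
            stack.pop()
            elem.clear()  # release content that has already been processed


doc = io.BytesIO(
    b"<bib><paper><title>T1</title></paper>"
    b"<book><title>X</title></book>"
    b"<paper><title>T2</title></paper></bib>"
)
print(list(stream_match(doc, "/bib/paper/title")))
```

The only per-query state here is the tag stack, whose size is bounded by document depth. Predicates are what force buffering, since a match may not be decidable until later data arrives.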


RFID Technology

  • RFID technology

    • Each reading: (reader_id, tag_id, timestamp)

    • Example tag IDs: 01.01298.6EF.0A, 04.0768E.001.F0, 01.01267.60D.01


RFID Stream Processing

RFID reader

RFID tag

<pml> <tag>01.01298.6EF.0A</tag>

<time>00129038</time>

<location>shelf 2</location></pml>

<pml> <tag>01.01298.6EF.0A</tag>

<time>02183947</time>

<location>exit1</location></pml>

Shoplifting: an item was taken out of store without being checked out.

Out of stocks: the number of items of product X on shelf ≤ 3.
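The two rules above can be sketched as simple checks over a list of (tag, time, location) readings. The function names, the `checkout`/`exit`/`shelf` location naming convention, the tag-to-product mapping, and every reading other than the two shown on the slide are assumptions for illustration. A real stream processor would evaluate such rules incrementally rather than over a stored list.

```python
def detect_shoplifting(readings):
    """Return tags read at an exit with no earlier read at a checkout."""
    checked_out = set()
    alerts = []
    for tag, time, location in sorted(readings, key=lambda r: r[1]):
        if location.startswith("checkout"):
            checked_out.add(tag)
        elif location.startswith("exit") and tag not in checked_out:
            alerts.append(tag)
    return alerts


def low_stock(readings, product_of, threshold=3):
    """Return products whose number of tags last seen on a shelf is <= threshold."""
    last_location = {}
    for tag, time, location in sorted(readings, key=lambda r: r[1]):
        last_location[tag] = location
    counts = {product: 0 for product in set(product_of.values())}
    for tag, location in last_location.items():
        if location.startswith("shelf") and tag in product_of:
            counts[product_of[tag]] += 1
    return sorted(p for p, n in counts.items() if n <= threshold)


readings = [
    ("01.01298.6EF.0A", 129038, "shelf 2"),
    ("01.01298.6EF.0A", 2183947, "exit1"),
    ("04.0768E.001.F0", 130000, "shelf 2"),
    ("04.0768E.001.F0", 140000, "checkout1"),
    ("04.0768E.001.F0", 140500, "exit1"),
    ("01.01267.60D.01", 150000, "shelf 2"),
]
product_of = {tag: "X" for tag, _, _ in readings}  # hypothetical mapping

print(detect_shoplifting(readings))
print(low_stock(readings, product_of))
```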


RFID Processing: Global Tracking

Counterfeit drugs: a bottle is accepted at the retailer if it came from a legal manufacturer and followed all necessary steps in the distribution network.

<pml> <epc>01.001298.6EF.0A</epc>
<ts type="begin"> <date>…</date> </ts>
<entity type="maker"> <name type="legal">X Ltd.</name> </entity>

<pml> <epc>01.001298.6EF.0A</epc>
<ts type="end"> <date>…</date> </ts>
<entity type="retailer"> <name type="legal">CVS</name> </entity> …

Expired/spoiled drugs: a bottle is accepted at the retailer if it went through the distribution network in less than 3 months and was never exposed to temperature > 96 F.

<pml> <epc>01.001298.6EF.0A</epc>
<ts><date>…</date></ts>
<location>…</location> <msr label="temperature" max="2">80</msr>

<pml> <epc>01.001298.6EF.0A</epc>
<ts><date>…</date></ts>
<location>…</location> <msr label="temperature" max="5">95</msr>

<pml> <epc>01.001298.6EF.0A</epc>
<ts><date>…</date></ts>
<location>…</location> <msr label="temperature" max="2">85</msr>

<pml> <epc>01.001298.6EF.0A</epc>
<ts><date>…</date></ts>
<location>…</location> <msr label="temperature" max="2">90</msr>

Missing pallet, expected case, illegally cloned tags…
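The counterfeit and expired/spoiled rules above amount to checks over one bottle's time-ordered event history. The sketch below assumes a flattened event record (`date`, `entity_type`, `legal`, `temp_f`), a hypothetical list of required steps, and 90 days as a stand-in for "3 months". None of these names come from the PML records on the slide.

```python
from datetime import date, timedelta


def accept_bottle(events, required_steps,
                  max_transit=timedelta(days=90), max_temp_f=96):
    """Decide whether a retailer should accept a bottle, given its event history."""
    if not events:
        return False
    first, last = events[0], events[-1]
    # Counterfeit check: legal maker at the start, and every required step in order.
    from_legal_maker = first["entity_type"] == "maker" and first["legal"]
    followed_steps = [e["entity_type"] for e in events] == required_steps
    # Expired/spoiled check: short transit and no reading above the limit.
    fresh = (last["date"] - first["date"]) < max_transit
    never_hot = all(e["temp_f"] <= max_temp_f for e in events)
    return bool(from_legal_maker and followed_steps and fresh and never_hot)


steps = ["maker", "distributor", "retailer"]  # hypothetical distribution path
history = [
    {"date": date(2005, 1, 1), "entity_type": "maker", "legal": True, "temp_f": 80},
    {"date": date(2005, 1, 20), "entity_type": "distributor", "legal": True, "temp_f": 95},
    {"date": date(2005, 2, 15), "entity_type": "retailer", "legal": True, "temp_f": 85},
]
print(accept_bottle(history, steps))
```

Because each reader sees only part of this history, evaluating the rule requires combining events recorded at different sites, which is what makes the tracking global.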


Challenges in RFID Management

  • Data-Information Mismatch

    • RFID raw data: (tag id, reader id, timestamp)

    • Meaningful information: shoplifting, misplaced inventory, out-of-stocks; expired drugs, spoiled drugs…

  • Incomplete, inaccurate data

    • Readers miss tags

    • Readers can pick up tags from overlapping areas

  • High-volume data

    • Readers read constantly, from all tags in range, without line-of-sight

    • Can create up to millions of terabytes of data in a single day

  • Low-latency processing

    • Up-to-the-second information, time-critical actions


Research Issues

  • Real-time event stream processing

    • Handling duplicate readings/results

    • Data cleaning

    • Data compression

  • Handling incomplete readings

    • Inferences in event databases

    • Inferences over event streams

  • Distributed processing

    • Real time anomaly detection

    • Distributed inferences


Adaptive Sensing of Atmosphere

Figure labels: Sense, Send (shown twice), Merge, Detection, Prediction

  • Environmental monitoring: real-time processing of huge-volume meteorological data

  • Challenges

    • Large volume but limited bandwidth

    • Real-time processing

    • Uncertain data

    • Data archiving and querying the history


Managing Uncertain Data

Figure labels: two paths with points (1), (2), (3); Merge (4); Tornado Detection; Prediction (confidence?)

  • Sources of data uncertainty

    • Sensing noise and partial scanning

    • Data compression

    • Lossy wireless links

    • Incomplete merging

  • Managing uncertain data

    • Model sources of data uncertainty

    • Develop uncertainty calculus to combine the effects of these sources

    • Augment results with confidence values


Managing Uncertain Data

Figure labels: two paths with points (1), (2), (3); Merge (4); Tornado Detection; Prediction (confidence?)

  • Sources of data uncertainty

    • Sensing noise and partial scanning

    • Data compression

    • Lossy wireless links

    • Incomplete merging

  • Self diagnosis and tuning

    • Compare prediction at t with observation at t+1 (no ground truth?!)

    • System diagnosis when confidence value is low

    • Automatically tune the system


Questions


Outline

  • An outside look: DB Application

  • An inside look: Anatomy of DBMS

  • Project ideas: DB Application

  • Project ideas: DBMS Internals


Application: UMass CS Pub DB

  • UMass Computer Science Publication Database

    • All papers on professors’ web pages and in their DBLP records

    • All technical reports

  • Search:

    • Catalog search (author, title, year, conference, etc.)

    • Text search (using SQL “LIKE”)

  • Navigation

    • Overview of the structure of document collection

    • Area-based “drill down” and “roll up” with statistics

  • Add document

  • Top hits

  • Example: http://dbpubs.stanford.edu:8090/aux/index-en.html

  • Deliverables: useful software, user-friendly interface
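The catalog search, the LIKE-based text search, and an area-style roll-up can be sketched with Python's built-in `sqlite3` module. The table layout, the function names, and the sample rows are all made up for illustration. The real project would load papers from web pages and DBLP records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paper (title TEXT, author TEXT, year INTEGER, area TEXT)"
)
conn.executemany(
    "INSERT INTO paper VALUES (?, ?, ?, ?)",
    [  # placeholder rows, not real publications
        ("Sample paper on XML filtering", "Author One", 2004, "databases"),
        ("Sample paper on sensor storage", "Author Two", 2005, "systems"),
        ("Sample report on xml indexing", "Author One", 2005, "databases"),
    ],
)


def catalog_search(conn, author=None, year=None):
    """Search by exact catalog fields; omitted fields are not constrained."""
    clauses, params = [], []
    if author is not None:
        clauses.append("author = ?")
        params.append(author)
    if year is not None:
        clauses.append("year = ?")
        params.append(year)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT title FROM paper" + where + " ORDER BY title"
    return [row[0] for row in conn.execute(sql, params)]


def text_search(conn, keyword):
    """Substring search on titles using SQL LIKE."""
    sql = "SELECT title FROM paper WHERE title LIKE ? ORDER BY title"
    return [row[0] for row in conn.execute(sql, ("%" + keyword + "%",))]


def papers_per_area(conn):
    """Roll-up statistic: number of papers in each area."""
    sql = "SELECT area, COUNT(*) FROM paper GROUP BY area ORDER BY area"
    return list(conn.execute(sql))


print(catalog_search(conn, author="Author One", year=2005))
print(text_search(conn, "xml"))
print(papers_per_area(conn))
```

Parameters are passed with `?` placeholders rather than string formatting so that user input cannot alter the query. In SQLite, `LIKE` is case-insensitive for ASCII letters by default, so the keyword `xml` also matches `XML`.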


Application: RFID Database

  • RFID technology

  • RFID supply chain

    • Locations: Manufacturer, Supplier DC, Retail DC, Retail Store

    • Objects: Case, Pallet, Truck


Application: RFID Database

  • RFID technology

  • RFID Supply chain

  • Database propagation

    • Streams of (reader_id, tag_id, time)

    • Semantics: reader_id → location, tag_id → object

    • Containment

      • Location-based, items in a case, cases on a pallet, pallets in a truck…

      • Duration of containment

    • History of movement: (object, location, time_in, time_out)

    • Data compression for duplicate readings

    • Integration with sensors: temperature, location…

  • Track and trace queries
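One way to picture the propagation step is a function that collapses raw (reader_id, tag_id, time) readings into (object, location, time_in, time_out) stays, which also compresses duplicate readings. The function name, the reader-to-location map, and the choice of the last read time as `time_out` are assumptions for illustration. Containment inference is not covered here.

```python
def movement_history(readings, location_of):
    """Collapse raw readings into one (tag, location, time_in, time_out) row per stay."""
    history = []
    open_stay = {}  # tag_id -> index in history of that tag's current stay
    for reader_id, tag_id, time in sorted(readings, key=lambda r: r[2]):
        location = location_of[reader_id]
        idx = open_stay.get(tag_id)
        if idx is not None and history[idx][1] == location:
            # Duplicate reading at the same location: extend the current stay.
            tag, loc, time_in, _ = history[idx]
            history[idx] = (tag, loc, time_in, time)
        else:
            # First reading at a new location: start a new stay.
            history.append((tag_id, location, time, time))
            open_stay[tag_id] = len(history) - 1
    return history


location_of = {"r1": "Supplier DC", "r2": "Retail Store"}  # hypothetical readers
readings = [
    ("r1", "t1", 1), ("r1", "t1", 2), ("r1", "t1", 3),
    ("r2", "t1", 7), ("r2", "t1", 9),
]
print(movement_history(readings, location_of))
```

Five raw readings become two history rows, and a track-and-trace query for a tag becomes a lookup over its rows in time order.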


Data Quality

Figure labels: two paths with points (1), (2), (3); Merge (4)

  • Closed world assumption: not any more!

  • Various sources of data loss

    • Sensing noise

    • Data compression

    • Lossy wireless links

    • Incomplete merging

  • Probabilistic query processing

    • Model sources of data loss

    • Quantify the effect on queries such as max(), avg(), percentile…

    • Output query results with confidence level
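As one simple instance of attaching confidence to a query result, the sketch below computes avg() over the readings that arrived, plus the half-width of an approximate 95% confidence interval. It rests on strong assumptions that the slide does not state: losses are independent of the values (so the received readings act as a random sample of those sent), the number of readings sent is known, and a normal approximation with finite population correction is adequate.

```python
import math
import statistics


def avg_with_confidence(received, sent_count, z=1.96):
    """Return (mean, half_width) of an approximate confidence interval for avg()."""
    k = len(received)
    if k < 2 or sent_count < k:
        raise ValueError("need at least 2 readings and sent_count >= len(received)")
    mean = statistics.mean(received)
    if sent_count == k:
        return mean, 0.0  # nothing was lost, so the average is exact
    s = statistics.stdev(received)  # sample standard deviation
    # Finite population correction: uncertainty shrinks as fewer readings are lost.
    fpc = math.sqrt((sent_count - k) / (sent_count - 1))
    return mean, z * s / math.sqrt(k) * fpc


mean, half_width = avg_with_confidence([10, 12, 14, 16], sent_count=8)
print(f"avg = {mean} +/- {half_width:.2f}")
```

Aggregates such as max() and percentiles do not admit this kind of interval as directly, since a single lost reading can change max() arbitrarily. That is part of why the slide treats quantifying the effect per query type as an open problem.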



Course project ideas

  • Some ideas on INFOD/data dissemination
