
Course Project Ideas

Yanlei Diao

University of Massachusetts Amherst


New Directions for DB Research

  • Sensor data: new architecture

  • XML: new data model

  • Streams: new execution model

  • Data quality and lineage: new services

Yanlei Diao, University of Massachusetts Amherst


Querying in Sensor Networks

  • Store data locally at sensors and push queries into the sensor network

    • Flash memory energy-efficiency.

    • Limited capabilities of sensor platforms.

[Figure labels: Internet, Gateway, Push query to sensors, Flash Memory, Acoustic stream, Image stream]


Optimize for Flash and Limited RAM

[Figure: updating data on flash. 1. Load block into memory (memory: ~4-10 KB); 2. Modify in memory; 3. Save block back (erase block: ~16-64 KB)]

  • Flash Memory Constraints

    • Data cannot be over-written, only erased

    • Pages can often only be erased in blocks (16-64KB)

    • Unlike magnetic disks, cannot modify in-place

  • Challenges:

    • Energy: Organize data on flash to minimize read/write/erase operations

    • Memory: Minimize use of memory for flash database.
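
The constraints above can be made concrete with a small cost model. This is a minimal Python sketch, not StonesDB code: the class `FlashBlock`, the page and block sizes, and the operation counters are all assumptions chosen for illustration. It shows why an update "in place" costs a full load/erase/save cycle, while an append costs a single page write.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 512          # bytes per page (assumed value)
PAGES_PER_BLOCK = 32     # 32 * 512 B = 16 KB erase block (assumed value)


@dataclass
class FlashBlock:
    """Toy model of one erase block; counts page reads, page writes, and erases."""
    pages: list = field(default_factory=lambda: [None] * PAGES_PER_BLOCK)
    reads: int = 0
    writes: int = 0
    erases: int = 0

    def append(self, data: bytes) -> int:
        """Write into the first free page; no erase is needed."""
        idx = self.pages.index(None)   # raises ValueError if the block is full
        self.pages[idx] = data
        self.writes += 1
        return idx

    def update_in_place(self, idx: int, data: bytes) -> None:
        """Simulate an overwrite: load block, modify in memory, erase, save back."""
        used = [p for p in self.pages if p is not None]
        self.reads += len(used)        # 1. load the block into memory
        in_memory = list(self.pages)
        in_memory[idx] = data          # 2. modify in memory
        self.erases += 1               # 3. erase the whole block ...
        self.pages = in_memory         #    ... and save every used page back
        self.writes += sum(p is not None for p in in_memory)


block = FlashBlock()
for i in range(4):
    block.append(b"reading-%d" % i)
block.update_in_place(1, b"corrected")
print(block.reads, block.writes, block.erases)   # 4 8 1
```

Under these assumptions, one corrected page costs four reads, four extra writes, and an erase, which is the kind of overhead an energy-aware data layout tries to avoid.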



StonesDB: System Operation

[Figure label: Proxy Cache of Image Summaries]

Image Retrieval: Return images taken last month with at least two birds one of which is a bird of type A.

  • Identify “best” sensors to forward query.

  • Provide hints to reduce search complexity at sensor.



StonesDB: System Operation

Image Retrieval: Return images taken last month with at least two birds one of which is a bird of type A.

[Figure labels: Query Engine, Partitioned Access Methods]


Research Issues in StonesDB

  • Local Database Layer

    • Reduce updates for indexing and aging.

    • New cost models for self-tuning sensor databases.

    • Energy-optimized query processing.

    • Query processing over aged data.

  • Distributed Database Layer

    • What summaries are relevant to queries?

    • What remainder queries to send to sensors?

    • What resolution of summaries to cache?



XML (Extensible Markup Language)

<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

XML: a tagging mechanism to describe content.
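
As an illustration of how the tags describe content, the bibliography entry above can be read with Python's standard `xml.etree.ElementTree` module. This is only a sketch; the variable names are chosen here, and the elided title is kept exactly as it appears on the slide.

```python
import xml.etree.ElementTree as ET

BIB = """
<bibliography>
  <book>
    <title>Foundations…</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(BIB)
book = root.find("book")

# The tags, not the position in the text, tell us what each value means.
authors = [a.text for a in book.findall("author")]
year = int(book.findtext("year"))
child_tags = [child.tag for child in book]   # document order is preserved

print(authors)      # ['Abiteboul', 'Hull', 'Vianu']
print(year)         # 1995
print(child_tags)
```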



XML Data Model (Graph)

Main structure: ordered, labeled tree

References between nodes: becoming a graph


XQuery: XML Query Language

  • A declarative language for querying XML data

  • XPath: path expressions

    • Patterns to be matched against an XML graph

    • /bib/paper[author/lastname='Croft']/title

  • FLWOR expressions

    • Combining matching and restructuring of XML data

    • for $p in distinct(document("bib.xml")//publisher)

      let $b := document("bib.xml")/book[publisher = $p]

      where count($b) > 100

      order by $p/name

      return $p
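
To show what the two queries above compute, here is a rough Python equivalent built on the standard `xml.etree.ElementTree` module. It is a sketch under stated assumptions: the sample document, the function names, and the element layout are invented for illustration, and the count threshold is lowered from 100 to 1 so that the tiny sample returns something.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample data; the real bib.xml is not part of the slides.
BIB = """
<bib>
  <paper><author><lastname>Croft</lastname></author><title>Paper A</title></paper>
  <paper><author><lastname>Smith</lastname></author><title>Paper B</title></paper>
  <book><publisher>P1</publisher><title>Book 1</title></book>
  <book><publisher>P1</publisher><title>Book 2</title></book>
  <book><publisher>P2</publisher><title>Book 3</title></book>
</bib>
"""
root = ET.fromstring(BIB)


def titles_by_lastname(bib, lastname):
    """Mimics the path expression /bib/paper[author/lastname='...']/title."""
    return [
        paper.findtext("title")
        for paper in bib.findall("paper")
        if any(ln.text == lastname for ln in paper.findall("author/lastname"))
    ]


def publishers_with_more_than(bib, threshold):
    """Mimics the query: group books by publisher, filter on count, sort."""
    counts = Counter(b.findtext("publisher") for b in bib.findall("book"))
    return sorted(p for p, n in counts.items() if n > threshold)


print(titles_by_lastname(root, "Croft"))      # ['Paper A']
print(publishers_with_more_than(root, 1))     # ['P1']
```

The first function only matches; the second also restructures the result (one entry per publisher), which is the distinction the slide draws between path expressions and full query expressions.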



Metadata Management using XML

  • File systems for large-scale scientific simulations

    • File systems: petabytes or even more

    • Directory tree (metadata): large, can’t fit in memory

    • Links between files: steps in a simulation, data derivation

  • File Searches

    • all the files generated on Oct 1, 2005

    • all the files whose name is like ‘*simu*.txt’

    • all the files that were generated from the file ‘basic-measures.txt’

  • Build an XML store to manage directory trees!

    • XML data model

    • XML Query language

    • XML Indices
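
The three file searches above can be sketched against a directory tree stored as XML. Everything in this example is hypothetical: the element names (`dir`, `file`), the attributes (`created`, `derived_from`), and the sample files other than 'basic-measures.txt'. It also assumes that "generated from" includes indirect derivation through intermediate files, and it scans the whole tree in memory, which a real store for large trees would replace with indices.

```python
import fnmatch
import xml.etree.ElementTree as ET

TREE = """
<dir name="run1">
  <file name="basic-measures.txt" created="2005-09-30"/>
  <dir name="out">
    <file name="simu-a.txt" created="2005-10-01" derived_from="basic-measures.txt"/>
    <file name="plot.png" created="2005-10-01" derived_from="simu-a.txt"/>
  </dir>
</dir>
"""
root = ET.fromstring(TREE)
files = list(root.iter("file"))   # all files, in document order

# All the files generated on Oct 1, 2005.
by_date = [f.get("name") for f in files if f.get("created") == "2005-10-01"]

# All the files whose name is like '*simu*.txt'.
by_name = [f.get("name") for f in files
           if fnmatch.fnmatch(f.get("name"), "*simu*.txt")]


def derived_from(source):
    """Files generated from `source`, directly or through intermediate steps.

    Assumes the derivation links contain no cycles.
    """
    result, frontier = [], [source]
    while frontier:
        current = frontier.pop()
        for f in files:
            if f.get("derived_from") == current:
                result.append(f.get("name"))
                frontier.append(f.get("name"))
    return result


print(by_date)
print(by_name)
print(derived_from("basic-measures.txt"))
```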



XML Document Processing

  • Multi-hierarchical XML markup of text documents

    • Multi-hierarchies: part-of-speech, page-line

    • Features in different hierarchies overlap in scope

    • Need a query language & querying mechanism

    • References [Nakov et al., 2005; Iacob & Dekhtyar, 2005]

  • Querying and ranking of XML data

    • XML fragments returned as results

    • Fuzzy matches

    • Ranking of matches

    • References [Amer-Yahia et al., 2005; Luo et al., 2003]

  • Well-defined problems → identify your contributions!


Data Stream Management

[Figure labels: Attr1 Attr2 Attr3; Traditional Database (Data, Query, Results); Data Stream Processor (Queries, Rules, Event Specs, Subscriptions, Results)]

Traditional Database

  • Data at rest

  • One-shot or periodic queries

  • Query-driven execution

Data Stream Processor

  • Data in motion, unending

  • Continuous, long-running queries

  • Data-driven execution
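
The contrast between the two execution models can be sketched in a few lines of Python. The function names and the sliding-window average are assumptions chosen for illustration; they do not come from any particular stream system.

```python
from collections import deque
from typing import Iterable, Iterator


def one_shot_avg(table: list) -> float:
    """Query-driven: the data is at rest and the query runs once over all of it."""
    return sum(table) / len(table)


def continuous_avg(stream: Iterable[float], window: int) -> Iterator[float]:
    """Data-driven: every arriving item produces a new result over the last
    `window` items; the query never finishes while the stream keeps going."""
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


readings = [10.0, 20.0, 30.0, 40.0]
print(one_shot_avg(readings))                     # 25.0
print(list(continuous_avg(iter(readings), 2)))    # [10.0, 15.0, 25.0, 35.0]
```

The generator keeps only the current window in memory, which reflects the point that a stream is unending and cannot be stored before it is queried.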



In-Network XML Processing

  • XML is becoming the wire format for data

  • In-network XML processing

    • Authentication

    • Authorization

    • Routing

    • Transformation

    • Pattern matching

  • XPath widely used for in-network XML processing

    • Applied directly to streaming XML data

    • Line-speed performance

[Callouts: Expedite traffic; Enhance security; Real-time monitoring & diagnosis]


Research Issues

  • Gigabit rate XPath processing

    • Take one look, process XPath, buffer data for future use if necessary

    • Processing needs to be gigabit rate

    • Memory usage needs to be minimized

  • Time/space complexity of XPath stream processing

    • Theoretical analysis for common features of XPath

  • Minimizing memory usage of YFilter technology

    • YFilter: state-of-the-art for multi-XPath processing
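
The "take one look" requirement can be illustrated with a single-pass matcher built on `xml.etree.ElementTree.iterparse` from the Python standard library. This is not YFilter and makes no claim about its design: the sketch handles only absolute child-axis paths such as `/bib/book/title` (no wildcards, descendant axes, or predicates), and the function name `match_paths` is invented here.

```python
import io
import xml.etree.ElementTree as ET


def match_paths(xml_bytes: bytes, paths: list) -> dict:
    """Match several simple paths in one pass over the document.

    Returns, for each path, the text of every element that matched it.
    """
    wanted = {tuple(p.strip("/").split("/")): p for p in paths}
    hits = {p: [] for p in paths}
    stack = []   # tags of the currently open elements, root first
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            stack.append(elem.tag)
        else:
            path = wanted.get(tuple(stack))
            if path is not None:
                hits[path].append((elem.text or "").strip())
            stack.pop()
            elem.clear()   # drop the element's content once it has been seen
    return hits


doc = (b"<bib><book><title>T1</title><year>1995</year></book>"
       b"<book><title>T2</title></book></bib>")
result = match_paths(doc, ["/bib/book/title", "/bib/book/year"])
print(result)
```

All paths share one stack and one pass over the input, so adding a path adds a dictionary entry rather than another scan. Buffering is avoided here only because these paths need no look-ahead; predicates would require holding data until they can be decided.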



RFID Technology

  • RFID technology

[Figure: readings of the form (reader_id, tag_id, timestamp); example tag IDs: 01.01298.6EF.0A, 04.0768E.001.F0, 01.01267.60D.01]


RFID Stream Processing

[Figure labels: RFID reader, RFID tag]

<pml> <tag>01.01298.6EF.0A</tag>

<time>00129038</time>

<location>shelf 2</location></pml>

<pml> <tag>01.01298.6EF.0A</tag>

<time>02183947</time>

<location>exit1</location></pml>

Shoplifting: an item was taken out of store without being checked out.

+

Out of stocks: the number of items of product X on shelf ≤ 3.
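
The shoplifting rule above can be sketched as a filter over the reading stream. This is a minimal illustration with several assumptions: readings arrive in time order, a location named "checkout" exists (the slides name only "shelf 2" and "exit1"), any location starting with "exit" is an exit, and the second tag's readings are invented sample data.

```python
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    tag: str
    time: int
    location: str


def shoplifting_alerts(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Yield every exit reading of a tag that was never seen at checkout."""
    checked_out = set()
    for r in stream:
        if r.location == "checkout":
            checked_out.add(r.tag)
        elif r.location.startswith("exit") and r.tag not in checked_out:
            yield r


stream = [
    Reading("01.01298.6EF.0A", 129038, "shelf 2"),
    Reading("04.0768E.001.F0", 150000, "checkout"),   # invented reading
    Reading("04.0768E.001.F0", 160000, "exit1"),      # invented reading
    Reading("01.01298.6EF.0A", 2183947, "exit1"),
]
alerts = list(shoplifting_alerts(stream))
print(alerts)
```

The only state kept is the set of checked-out tags, so the rule can run continuously as readings arrive instead of being evaluated over a stored table.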



RFID Processing: Global Tracking

Counterfeit drugs: a bottle is accepted at the retailer if it came from a legal manufacturer and followed all necessary steps in the distribution network.

<pml> <epc>01.001298.6EF.0A</epc>

<ts type="begin"> <date>…</date> </ts> <entity type="maker"> <name type="legal">X Ltd. </name> </entity>

<pml> <epc>01.001298.6EF.0A</epc>

<ts type="end"> <date>…</date> </ts> <entity type="retailer"> <name type="legal">CVS </name> </entity> …

Expired/spoiled drugs: a bottle is accepted at the retailer if it went through the distribution network in less than 3 months and was never exposed to temperature > 96 F.

<pml> <epc>01.001298.6EF.0A</epc>

<ts><date>…</date></ts>

<location>…</location> <msr label="temperature" max="2">80</msr>

<pml> <epc>01.001298.6EF.0A</epc>

<ts><date>…</date></ts>

<location>…</location> <msr label="temperature" max="5">95</msr>

<pml> <epc>01.001298.6EF.0A</epc>

<ts><date>…</date></ts>

<location>…</location> <msr label="temperature" max="2">85</msr>

<pml> <epc>01.001298.6EF.0A</epc>

<ts><date>…</date></ts>

<location>…</location> <msr label="temperature" max="2">90</msr>

+

Missing pallet, expected case, illegally cloned tags…
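
The expired/spoiled rule above can be written as a check over one bottle's history. This is a sketch with assumptions called out: "3 months" is approximated as 90 days, the dates are invented because the slide elides them, and the names `Step` and `acceptable` are hypothetical. The four temperatures are the ones shown in the readings above.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Step(NamedTuple):
    when: datetime
    temperature_f: float


MAX_TRANSIT = timedelta(days=90)   # "3 months" approximated as 90 days (assumption)
MAX_TEMP_F = 96.0


def acceptable(steps: list) -> bool:
    """True if the bottle moved through the chain in time and never got too warm."""
    if not steps:
        return False   # no history at all: nothing to vouch for the bottle
    times = [s.when for s in steps]
    in_time = max(times) - min(times) < MAX_TRANSIT
    cool = all(s.temperature_f <= MAX_TEMP_F for s in steps)
    return in_time and cool


start = datetime(2005, 1, 1)   # invented date; the slide elides the real ones
history = [Step(start + timedelta(days=10 * i), t)
           for i, t in enumerate([80, 95, 85, 90])]

print(acceptable(history))                                              # True
print(acceptable(history + [Step(start + timedelta(days=35), 97)]))     # False
```

The check needs readings from every site in the distribution network, which is why the slide frames this as global tracking rather than processing at a single reader.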



Challenges in RFID Management

  • Data-Information Mismatch

    • RFID raw data: (tag id, reader id, timestamp)

    • Meaningful information: shoplifting, misplaced inventory, out-of-stocks; expired drugs, spoiled drugs…

  • Incomplete, inaccurate data

    • Readers miss tags

    • Readers can pick up tags from overlapping areas

  • High-volume data

    • Readers read constantly, from all tags in range, without line-of-sight

    • Can create up to millions of terabytes of data in a single day

  • Low-latency processing

    • Up-to-the-second information, time-critical actions



Research Issues

  • Real-time event stream processing

    • Handling duplicate readings/results

    • Data cleaning

    • Data compression

  • Handling incomplete readings

    • Inferences in event databases

    • Inferences over event streams

  • Distributed processing

    • Real time anomaly detection

    • Distributed inferences



Adaptive Sensing of Atmosphere

[Figure labels: Sense, Send, Merge, Detection, Prediction]

  • Environmental monitoring: real-time processing of huge-volume meteorological data

  • Challenges

    • Large volume but limited bandwidth

    • Real-time processing

    • Uncertain data

    • Data archiving and querying the history



Managing Uncertain Data

[Figure labels: (1), (2), (3), Merge, (4), Tornado Detection, Prediction (confidence?)]

  • Sources of data uncertainty

    • Sensing noise and partial scanning

    • Data compression

    • Lossy wireless links

    • Incomplete merging

  • Managing uncertain data

    • Model sources of data uncertainty

    • Develop uncertainty calculus to combine the effects of these sources

    • Augment results with confidence values
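
One very simple form such an uncertainty calculus could take is sketched below. It is an assumption, not the approach proposed in the slides: each source of uncertainty is summarized by the probability that it leaves a result intact, the sources are treated as independent, and the combined confidence is the product. The reliabilities in the example are invented.

```python
import math


def combined_confidence(stage_reliability: dict) -> float:
    """Combine per-source reliabilities into one confidence value.

    Assumes the sources are independent, so their reliabilities multiply.
    """
    for name, p in stage_reliability.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"reliability of {name!r} must be in [0, 1], got {p}")
    return math.prod(stage_reliability.values())


confidence = combined_confidence({
    "sensing noise and partial scanning": 0.90,   # invented values
    "data compression": 0.95,
    "lossy wireless links": 0.80,
    "incomplete merging": 1.00,
})
print(round(confidence, 3))   # 0.684
```

Independence is the weak point of this sketch: if lossy links and incomplete merging tend to occur together, the product underestimates or overestimates the true confidence, which is part of what a real calculus would have to model.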



Managing Uncertain Data

[Figure labels: (1), (2), (3), Merge, (4), Tornado Detection, Prediction (confidence?)]

  • Sources of data uncertainty

    • Sensing noise and partial scanning

    • Data compression

    • Lossy wireless links

    • Incomplete merging

  • Self-diagnosis and tuning

    • Compare prediction at t with observation at t+1 (no ground truth?!)

    • System diagnosis when confidence value is low

    • Automatically tune the system
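
The comparison of a prediction at t with the observation at t+1 can be sketched as follows. The function name, the tolerance, and the agreement threshold are assumptions for illustration, and the observation itself is treated as the reference even though, as the slide notes, it is not ground truth.

```python
def diagnose(predictions, observations, tolerance, min_agreement):
    """Return (agreement, needs_tuning).

    predictions[t] is the forecast made at step t for step t+1, so it is
    compared with observations[t + 1].
    """
    pairs = list(zip(predictions, observations[1:]))
    if not pairs:
        raise ValueError("need at least one prediction with a later observation")
    agree = sum(abs(p - o) <= tolerance for p, o in pairs)
    agreement = agree / len(pairs)
    return agreement, agreement < min_agreement


predictions = [1.0, 2.0, 3.0]           # invented values
observations = [0.0, 1.1, 2.5, 3.0]     # invented values
agreement, needs_tuning = diagnose(predictions, observations,
                                   tolerance=0.2, min_agreement=0.8)
print(agreement, needs_tuning)
```

In this example two of the three predictions fall within the tolerance, so the agreement is below the threshold and the sketch flags the system for tuning.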



Questions



Outline

  • An outside look: DB Application

  • An inside look: Anatomy of DBMS

  • Project ideas: DB Application

  • Project ideas: DBMS Internals



Application: UMass CS Pub DB

  • UMass Computer Science Publication Database

    • All papers on professors’ web pages and in their DBLP records

    • All technical reports

  • Search:

    • Catalog search (author, title, year, conference, etc.)

    • Text search (using SQL “LIKE”)

  • Navigation

    • Overview of the structure of document collection

    • Area-based “drill down” and “roll up” with statistics

  • Add document

  • Top hits

  • Example: http://dbpubs.stanford.edu:8090/aux/index-en.html

  • Deliverables: useful software, user-friendly interface
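
A possible starting point for the search and navigation features is sketched below with Python's built-in `sqlite3` module. The schema, the sample rows, and the function names are all invented for illustration; the project itself leaves the design open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paper ("
    " id INTEGER PRIMARY KEY, author TEXT, title TEXT, year INTEGER, venue TEXT)"
)
conn.executemany(
    "INSERT INTO paper (author, title, year, venue) VALUES (?, ?, ?, ?)",
    [   # invented sample rows
        ("A. Author", "Streaming XML Filtering", 2005, "ConfA"),
        ("B. Author", "Sensor Database Design", 2005, "ConfB"),
        ("A. Author", "XML Indexing Notes", 2004, "ConfA"),
    ],
)


def catalog_search(author: str, year: int) -> list:
    """Catalog search on structured fields."""
    rows = conn.execute(
        "SELECT title FROM paper WHERE author = ? AND year = ? ORDER BY title",
        (author, year),
    )
    return [r[0] for r in rows]


def text_search(keyword: str) -> list:
    """Text search using SQL LIKE; the keyword is passed as a bound parameter."""
    rows = conn.execute(
        "SELECT title FROM paper WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    )
    return [r[0] for r in rows]


def papers_per_year() -> list:
    """A simple 'roll up' statistic for navigation."""
    return list(conn.execute(
        "SELECT year, COUNT(*) FROM paper GROUP BY year ORDER BY year"))


print(catalog_search("A. Author", 2005))
print(text_search("xml"))
print(papers_per_year())
```

SQLite's LIKE is case-insensitive for ASCII letters by default, which is why searching for "xml" finds titles containing "XML"; other database systems may behave differently.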



Application: RFID Database

[Figure: supply chain locations (Manufacturer, Supplier DC, Retail DC, Retail Store) and objects (Case, Pallet, Truck)]

  • RFID technology

  • RFID supply chain

    • Locations

    • Objects



Application: RFID Database

  • RFID technology

  • RFID Supply chain

  • Database propagation

    • Streams of (reader_id, tag_id, time)

    • Semantics: reader_id → location, tag_id → object

    • Containment

      • Location-based, items in a case, cases on a pallet, pallets in a truck…

      • Duration of containment

    • History of movement: (object, location, time_in, time_out)

    • Data compression for duplicate readings

    • Integration with sensors: temperature, location…

  • Track and trace queries
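
The step from raw readings to a movement history can be sketched as follows. The sketch assumes time-ordered input, uses the tag_id directly as the object identifier, and relies on a hypothetical `location_of` mapping from reader_id to location; the reader and tag names are invented. Duplicate readings at the same location are compressed into one (object, location, time_in, time_out) record.

```python
from typing import Iterable, NamedTuple


class Stay(NamedTuple):
    obj: str
    location: str
    time_in: int
    time_out: int


def compress(readings: Iterable[tuple], location_of: dict) -> list:
    """Collapse (reader_id, tag_id, time) readings into stay records."""
    open_stays = {}   # tag_id -> Stay that is still being extended
    history = []
    for reader_id, tag_id, time in readings:
        location = location_of[reader_id]
        current = open_stays.get(tag_id)
        if current is not None and current.location == location:
            # Duplicate reading at the same place: only extend time_out.
            open_stays[tag_id] = current._replace(time_out=time)
        else:
            if current is not None:
                history.append(current)   # the object has moved on
            open_stays[tag_id] = Stay(tag_id, location, time, time)
    history.extend(open_stays.values())
    return history


location_of = {"r1": "shelf 2", "r2": "exit1"}   # invented reader ids
readings = [("r1", "t1", 1), ("r1", "t1", 2), ("r1", "t1", 3), ("r2", "t1", 9)]
stays = compress(readings, location_of)
print(stays)
```

Four raw readings become two history records here. A track-and-trace query such as "where was object t1 at time 2?" then becomes a range lookup on time_in and time_out.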



Data Quality

[Figure labels: (1), (2), (3), Merge, (4)]

  • Closed world assumption: not any more!

  • Various sources of data loss

    • Sensing noise

    • Data compression

    • Lossy wireless links

    • Incomplete merging

  • Probabilistic query processing

    • Model sources of data loss

    • Quantify the effect on queries max(), avg(), percentile…

    • Output query results with confidence level
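
As one example of quantifying the effect of data loss on a query, the sketch below estimates avg() from the values that actually arrived and attaches an error margin. It rests on strong assumptions that are not in the slides: losses are random (independent of the values), the number of values sent is known, and a normal approximation with z = 1.96 (roughly 95% confidence) is acceptable. The function name and sample values are invented.

```python
import math
import statistics


def avg_with_confidence(received: list, sent: int, z: float = 1.96):
    """Return (estimate, margin) for the average of all `sent` values.

    Uses the sample mean of the received values and the standard error with
    a finite population correction, so the margin shrinks to 0 when nothing
    was lost.
    """
    n = len(received)
    if n < 2 or sent < n:
        raise ValueError("need at least two received values and sent >= received")
    mean = statistics.fmean(received)
    stderr = statistics.stdev(received) / math.sqrt(n)
    stderr *= math.sqrt((sent - n) / (sent - 1))   # finite population correction
    return mean, z * stderr


received = [10.0, 12.0, 11.0, 13.0]   # invented values
mean_all, margin_all = avg_with_confidence(received, sent=4)     # nothing lost
mean_half, margin_half = avg_with_confidence(received, sent=8)   # half lost
print(mean_all, margin_all)
print(mean_half, margin_half)
```

The same idea does not carry over directly to max() or percentiles, where a single lost value can change the answer, which is why the slide lists them as separate cases to study.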



  • Some ideas on INFOD/data dissemination
