Working Group: Practical Policy Rainer Stotzka, Reagan Moore

Agenda
  • Thursday March 27, 2014 3:30-5:00 PM
    • Introduction to policy-based data management
    • Discussion of data policy manager for EUDAT (Mark van de Sanden)
    • Presentation on natural language rule processing (Chitta Baral)
    • Initial presentation of summary of policies across data centers and research projects (Jewel Ward)
  • Friday March 28, 2014 11:00 AM-12:30 PM
    • Discussion of policy summary
      • Identification of best practices
    • Discussion of policy testing – interoperability testbed
    • Integration with deliverables from other working groups
      • Persistent identifiers
      • Linked-data – HIVE
      • Type registry
      • Data Foundation and Terminology
      • Preservation interest group
Practical Policy Working Group Focuses:
  • Identify the most important policies
  • Practical implementations for managing research data collections
  • Provide recommendations for a “starter kit”
  • Testbeds:
    • Evaluate standard policies
    • Test interoperability across WGs

Policy: Assertion or assurance that is enforced about a collection or a dataset

Concept Graph by Reagan Moore

[Figure: concept graph. Nodes: Purpose, Collection, Property, Integrity, Consistency, Persistent State Information, Policy, Procedure, Workflow, Function, SysChksumDataObj. Edge labels: Defines, Isa, Updates, Controls, Chains, HasFeature.]

Concept Graph by Reagan Moore

[Figure: expanded concept graph. Nodes: Purpose, Collection, Digital Object, Attribute, DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM, Persistent State Information, Property, Integrity, Authenticity, Completeness, Correctness, Consensus, Consistency, Policy, Replication Policy, Checksum Policy, Quota Policy, Data Type Policy, Access control, Periodic Assessment Criteria Policy, Policy Enforcement Point, Procedure, Workflow, Function, Operation, Client Action, GetUserACL, SetDataType, SetQuota, DataObjRepl, SysChksumDataObj. Edge labels: Defines, Has, Isa, SubType, Updates, Controls, Chains, Invokes, HasFeature.]

Policy Categories
  • Integrity
  • Management
  • Administrative
  • Assessment
  • AccessControl
  • Replication
  • Provenance
  • Preservation
  • Collection-based Policies
  • Regulatory
  • Description
  • Data Management Plans
  • Data Staging
  • Publication
  • Data Lifecycle Management
  • Federation
  • Compliance

Management
  • Testbeds
    • iRODS: Renaissance Computing Institute
    • E-iRODS: DataNet Federation Consortium (DFC)
    • dCache: Institute of Physics of the Academy of Sciences, CESNET
    • Dataverse: Odum Institute
  • List of policies in the RDA Wiki
  • Monthly telephone conferences (RDA)
  • “Policy of the month”: review of policies that have been submitted
  • 54 persons registered
Interactions with other WGs

Data Foundation and Terminology WG

  • Discussion of a vocabulary for operations

Preservation Infrastructure IG

  • Policies for preservation

Persistent Identifiers

  • Properties versus operations on identifiers

Data Citation WG

  • Type registry

Metadata

  • Linked-data vocabularies
EUDAT Data Policy Manager
  • Peisar – Storage Policies at CESNET
Natural Language Rule Processing
  • Why?
    • Users or domain experts need not learn the syntax of the rule language.
    • They specify their rules using natural language.
  • How?
    • Natural language specification of rules is translated to rules in the syntax of the rule language, in two steps:
      • Step 1: Natural language to an intermediate language (focus is on correct translation of natural language and dealing with the challenges and quirkiness of natural language)
      • Step 2: Intermediate language to Rule language (Should be more straightforward as both languages are formal languages, and the intermediate language has a very restricted vocabulary)
      • Our focus in this presentation is on Step 1.
Underlying Technical Approach
  • Montague’s approach: the meanings of words and phrases are lambda calculus formulas
  • The meaning (or translation) of a sentence is obtained by combining the meanings of its words and phrases.
    • Usually as dictated by a grammar
    • Categorial grammars (especially CCG) are often used, as they give directionality regarding how to combine.
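
To illustrate how a categorial grammar gives directionality, here is a minimal Python sketch of categories with forward and backward application. The class and function names (`Atom`, `Slash`, `forward`, `backward`) are hypothetical and chosen for illustration; they are not taken from NL2KR or any CCG library.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S, NP, or N."""
    name: str


@dataclass(frozen=True)
class Slash:
    """A functor category: result/arg looks right, result\\arg looks left."""
    result: "Cat"
    direction: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[Atom, Slash]


def forward(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward application: X/Y followed by Y yields X."""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    return None


def backward(left: Cat, right: Cat) -> Optional[Cat]:
    """Backward application: Y followed by X\\Y yields X."""
    if isinstance(right, Slash) and right.direction == "\\" and right.arg == left:
        return right.result
    return None


S, NP, N = Atom("S"), Atom("NP"), Atom("N")
financial_report = forward(Slash(NP, "/", N), N)   # NP/N applied to N gives NP
sentence = forward(Slash(S, "/", NP), financial_report)  # S/NP applied to NP gives S
```

The slash direction alone decides on which side a word looks for its argument, so the same lexicon entry cannot be combined in the wrong order.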
NL to Policy Example

  • Print [S/NP]: λz. print(z)
  • financial [NP/N]: λy. y@finance
  • report [N]: λx. report(x)
  • financial report [NP]: (λy. y@finance) @ (λx. report(x)) = (λx. report(x))@finance = report(finance)
  • Print financial report [S]: print(report(finance))

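
The composition in this example can be replayed with ordinary Python lambdas, reading the slide's `@` as function application. This is only a sketch: representing meanings as strings is an assumption made for illustration, not how the prototype stores them.

```python
# Word meanings from the example, written as Python lambdas.
print_ = lambda z: f"print({z})"     # Print [S/NP]:      λz. print(z)
financial = lambda y: y("finance")   # financial [NP/N]:  λy. y@finance
report = lambda x: f"report({x})"    # report [N]:        λx. report(x)

# financial report [NP]: (λy. y@finance) @ (λx. report(x))
financial_report = financial(report)

# Print financial report [S]: (λz. print(z)) @ report(finance)
sentence = print_(financial_report)
```

Here `financial_report` evaluates to `report(finance)` and `sentence` to `print(report(finance))`, matching the two upper nodes of the derivation.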

The Key Issue(s)
  • Where do we get the Lambda expressions from?
  • Handcrafting them is not scalable
    • Lambda expressions get complex in a hurry and handcrafting creates a bottleneck
    • Too many words
    • Since the target language is not unique, we cannot painstakingly build a new dictionary for each target language
    • Target languages evolve
  • Other standard issues
    • Ambiguity: Multiple meanings of words; word sense disambiguation; etc.
How to get the lambda expressions? How did we learn natural languages?
  • Often
    • We know the meaning of a sentence
    • We know the meaning of most of the individual words in that sentence
    • But we do not know a priori the meaning of some particular word(s) in that sentence
    • We are able to correctly guess the meaning of those words
  • Follow a similar approach
    • Given a set of training examples and an initial dictionary, learn the lambda expressions for the words in those examples that are not in the dictionary
    • Inverse Lambda operators
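
As a rough sketch of the idea behind an inverse lambda operator, the Python below handles only the simplest case: given the known meaning H of a phrase and the known meaning G of one of its parts, where G occurs inside H, it recovers F such that F@G = H by abstracting G out of H. The term representation and the function names are assumptions for illustration; the actual inverse operators cover more cases than this one.

```python
from typing import Callable, Tuple, Union

# A term is a constant (str) or an application (functor, arg1, arg2, ...).
Term = Union[str, Tuple]


def contains(term: Term, sub: Term) -> bool:
    """True if sub occurs in term as a whole (sub)term."""
    if term == sub:
        return True
    return isinstance(term, tuple) and any(contains(t, sub) for t in term[1:])


def substitute(term: Term, old: Term, new: Term) -> Term:
    """Replace every occurrence of old in term with new."""
    if term == old:
        return new
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(t, old, new) for t in term[1:])
    return term


def inverse_apply(h: Term, g: Term) -> Callable[[Term], Term]:
    """Return F with F(g) == h, for the simple case where g occurs in h."""
    if not contains(h, g):
        raise ValueError("g does not occur in h; this simple case does not apply")
    return lambda v: substitute(h, g, v)


def show(term: Term) -> str:
    """Render a term in the functor(arg, ...) notation used on the slides."""
    if isinstance(term, tuple):
        return f"{term[0]}({', '.join(show(t) for t in term[1:])})"
    return term


# Known: the sentence means print(report(finance)); "financial report" means report(finance).
h = ("print", ("report", "finance"))
g = ("report", "finance")
learned_print = inverse_apply(h, g)  # behaves like λz. print(z)
```

Applying `learned_print` to a fresh argument shows the learned meaning: `show(learned_print("z"))` gives `print(z)`.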
Inverse λ Example
  • Every boxer walks.
Inverse λ – another Example

  • Print [S/NP]: λz. print(z)
  • financial [NP/N]: λy. y@finance
  • report [N]: λx. report(x)
  • financial report [NP]: report(finance)
  • Print financial report [S]: print(report(finance))

Another Example

  • Send [(S/NP)/NP]: λy. λz. send(y,z)
  • email [NP]: email
  • Send email [S/NP]: λz. send(email,z)
  • to [NP/NP]: λx.x
  • curator [NP]: λx. curator(x)
  • of [(NP\NP)/NP]: λx.λy. y@x
  • the collection [NP]: collection
  • of the collection [NP\NP]: λy. y@collection
  • curator of the collection [NP]: curator(collection)
  • to curator of the collection [NP]: curator(collection)
  • Send email to curator of the collection [S]: send(email, curator(collection))
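
This derivation uses curried meanings and one backward application ("of the collection" takes "curator", the phrase on its left, as its argument). A minimal Python sketch, again assuming string-valued meanings purely for illustration:

```python
# Word meanings from the example, as curried Python lambdas.
send = lambda y: lambda z: f"send({y}, {z})"  # Send [(S/NP)/NP]: λy. λz. send(y,z)
email = "email"                               # email [NP]
to = lambda x: x                              # to [NP/NP]:       λx.x
curator = lambda x: f"curator({x})"           # curator:          λx. curator(x)
of = lambda x: lambda y: y(x)                 # of [(NP\NP)/NP]:  λx.λy. y@x
the_collection = "collection"                 # the collection [NP]

send_email = send(email)                      # λz. send(email,z)
of_the_collection = of(the_collection)        # λy. y@collection
# Backward application: the functor is on the right, its argument on the left.
curator_of_the_collection = of_the_collection(curator)
to_curator = to(curator_of_the_collection)
sentence = send_email(to_curator)
```

`sentence` evaluates to `send(email, curator(collection))`, the meaning at the root of the derivation.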

NL2KR-L System Learning Process
  • Generate all parse trees of the sentences
  • Learn lexicon using Inverse-λ and Generalization
  • Generalize complete lexicon
  • Parameter Estimation

NL2KR-T System Translation Process
  • Generate all parse trees of the sentences
  • Generalize the missing meanings of words and recompute parse trees
  • Use PCCG to rank the translations
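
One common way to formulate a probabilistic CCG is as a log-linear model over the lexical entries used in each parse. The sketch below ranks candidate translations that way. Whether NL2KR-T uses exactly this formulation is an assumption, and the weights and the competing lexical entry in the example are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

LexEntry = Tuple[str, str]  # (word, meaning)


def rank_parses(
    parses: Dict[str, List[LexEntry]],
    weights: Dict[LexEntry, float],
) -> List[Tuple[str, float]]:
    """Rank candidate translations by a softmax over summed lexical weights."""
    scores = {
        translation: sum(weights.get(entry, 0.0) for entry in entries)
        for translation, entries in parses.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    probs = {t: math.exp(s - top) / z for t, s in scores.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical weights, as if produced by the parameter estimation step.
weights = {
    ("Print", "λz. print(z)"): 1.2,
    ("financial", "λy. y@finance"): 0.9,
    ("report", "λx. report(x)"): 1.5,
    ("report", "λx. notify(x)"): 0.2,  # invented competing sense of "report"
}
parses = {
    "print(report(finance))": [
        ("Print", "λz. print(z)"),
        ("financial", "λy. y@finance"),
        ("report", "λx. report(x)"),
    ],
    "print(notify(finance))": [
        ("Print", "λz. print(z)"),
        ("financial", "λy. y@finance"),
        ("report", "λx. notify(x)"),
    ],
}
ranked = rank_parses(parses, weights)
```

The first element of `ranked` is the preferred translation, and the probabilities over all candidates sum to one.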

Current Status
  • We have a prototype that translates English descriptions of policy rules to a formal representation
    • Working towards making it usable in iRODS
    • Step 1: English to a formal policy specification (in an intermediate language)
    • Step 2: Formal policy specification to Rules (in a lower level language)
Natural Language Rule Processing: Conclusion
  • Described an approach to translate a natural language (NL) specification to an intermediate (formal) language, which can then be translated to rules.
  • Theory: augmented Montague’s lambda-calculus-based approach with inverse-lambda-based learning.
  • System: developed the NL2KR system.
  • Used the NL2KR system to build a translation system from NL to an intermediate policy description language.
  • The NL2KR system can be used for developing translation systems from natural language to other formal languages.
    • Has been evaluated in domains such as Geoquery, Robocup language, puzzles, and biology questions.
Invitation

We are seeking:

Data experts & domain scientists!

  • Provide policies already in use: RDA Wiki
    • Description
    • Implementation
  • Express wishes about policies you might need
  • Discuss and analyze policies
  • Enhance the cross-over to other WGs, IGs and initiatives
Summary of policies in production use
  • Policy for data retention. How long, how short? Need preservation, or not? (5) Retention and disposition
  • Notification policies. (Ex. must warn data researcher that their data will be deleted at X time.) (6) notification on event
  • Transferability policies. The data must be transferable from the repository back to the researcher and the repository of origin. Or, in the event of defunding, the data must be de-accessioned and moved to another repository (or not, depending on relevant SOPs, agreements, etc.).
  • Policies re: costs and who pays for all of this data storage (8)
  • Policies around context. Sometimes the original data and additional metadata are needed. Sometimes, the context or derived data is what matters, and not the data itself. (7)
  • Policies re: tagging/annotating data
  • Search/Information Retrieval policies. What parts of the data will you search on, or not search on? (4) Controlling search
  • Standard sys admin policies: (1) replication, backup, (2) integrity checks, syncing with backups.
  • Content policies: do we care what content and file formats users upload? Some do, some don't. (3) Transformative migration
  • Policy to educate researchers about all of the different policies relevant to the data repository. For example, a user agreement/Terms & conditions statement that researchers must check off.
Best Practices for production policies
  • Consensus on a policy
    • Use at multiple institutions
    • Generality
  • Best practice policy components
    • Name of operation that policy controls
    • Constraints that policy implements
    • State information that policy uses or modifies
    • Verification policy
    • Example of running code
    • Documentation
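
The best-practice components listed above could be captured in a simple record so that submitted policies can be checked for completeness. This is a sketch only; the class name, field names, and the completeness check are hypothetical and do not reflect any format used on the RDA Wiki.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class PolicyDescription:
    """One field per best-practice component from the list above."""
    operation: str = ""                                    # operation the policy controls
    constraints: List[str] = field(default_factory=list)   # constraints the policy implements
    state_information: List[str] = field(default_factory=list)  # state used or modified
    verification_policy: str = ""                          # how enforcement is verified
    example_code: str = ""                                 # example of running code
    documentation: str = ""


def missing_components(policy: PolicyDescription) -> List[str]:
    """Names of the components that are still empty."""
    return [f.name for f in fields(policy) if not getattr(policy, f.name)]


replication = PolicyDescription(
    operation="replicate a file",
    constraints=["replicate when a file is ingested"],
    state_information=["DATA_REPL_NUM", "DATA_CHECKSUM"],
)
todo = missing_components(replication)
```

For the partially filled `replication` record, `todo` lists the verification policy, example code, and documentation as still outstanding.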
Operations managed by policies
  • Paper posted that lists 70 operations
    • Policy-verification.docx
  • Candidate operations
    • Access control
    • Backups
    • Data retention
    • Descriptive metadata
    • Format creation
    • Integrity checks
    • Notification
    • Policy constraints
    • Replication
    • Restricted search
    • Storage cost
    • Tags
    • Use agreements
Replication Policy
  • Operation that is being controlled
    • Replicate a file
  • Controls
    • When is replication done?
      • When file is ingested
      • When file is changed
    • Which files are replicated? Choose based on:
      • Collection
      • User
      • Size
    • Replication properties
      • Choice of replication location
      • Choice of access controls on replica
      • Requirement for checksum
      • Verification of checksum on replica creation
    • Variants:
      • Versioning of changes vs replication
      • Backups vs replication (time-stamped copy)
  • Verification
    • When should replica existence be verified?
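
The controls above can be sketched as a small policy object plus a replicate-and-verify routine. This is a file-system illustration under stated assumptions: the names (`ReplicationPolicy`, `replicate`, `verify_replica`), the choice of SHA-256, and the use of a minimum size as the size criterion are all hypothetical, and none of this is the API of iRODS, dCache, or Dataverse.

```python
import hashlib
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Set


@dataclass
class ReplicationPolicy:
    events: Set[str]                         # when: e.g. {"ingest", "change"}
    collections: Optional[Set[str]] = None   # which files: None means any collection
    users: Optional[Set[str]] = None         # None means any user
    min_size: int = 0                        # only replicate files of at least this size
    require_checksum: bool = True            # verify checksum on replica creation

    def applies(self, event: str, collection: str, user: str, size: int) -> bool:
        """Decide whether a file event triggers replication."""
        return (
            event in self.events
            and (self.collections is None or collection in self.collections)
            and (self.users is None or user in self.users)
            and size >= self.min_size
        )


def sha256(path: Path) -> str:
    """Checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(src: Path, replica_dir: Path, policy: ReplicationPolicy) -> Path:
    """Copy src into replica_dir and verify the checksum if the policy requires it."""
    replica_dir.mkdir(parents=True, exist_ok=True)
    dst = replica_dir / src.name
    shutil.copy2(src, dst)
    if policy.require_checksum and sha256(src) != sha256(dst):
        raise IOError(f"checksum mismatch for replica {dst}")
    return dst


def verify_replica(src: Path, dst: Path) -> bool:
    """Verification step: the replica exists and still matches the source."""
    return dst.exists() and sha256(src) == sha256(dst)
```

A scheduler could call `verify_replica` periodically, which corresponds to the open question of when replica existence should be verified.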
Interactions with other Working Groups
  • Interoperability testbed
    • Demonstrate that RDA recommendations can be jointly implemented
  • Control policies
    • Demonstrate that a desired practice can be applied consistently
  • Assessment policies
    • Verify that a recommended practice is followed
  • Integration
    • Demonstrate semantic consistency across system-level integration
    • Example: are data objects considered to be immutable?
Practical Policy WG Interfaces with the other WGs
  • Interoperability testbed provided by Practical Policy WG
    • Persistent identifiers
      • Handle system
    • Metadata
      • HIVE linked-data vocabularies
    • Type registry
      • Expect implementation for integration
    • Data Foundation and Terminology
      • Exchange of concepts based on use cases
    • Preservation interest group
      • ISO 16363 assessment policies
Proposal - Special Interest Group on Interoperability Testbeds
  • The new interest group is driven by the need to have testbeds with a longer lifetime than the Practical Policy working group.
  • Current testbeds
    • Dataverse
    • dCache
    • iRODS
  • Testbed functions
    • Demonstrate interoperability
    • Provide platform to evaluate proposed best practices / software
  • We need working groups to provide software systems or policies for testing.
    • Need a liaison to each working group
Special Interest Group on Interoperability Testbeds
  • Interested participants include:
    • David Antos, CESNET
    • Jon Crabtree, Dataverse
    • Marcio Faerman, OSU
    • Patrick Fuhrmann, dCache testbed, DESY
    • Thomas Jejkal, KIT Data Manager repository
    • Tibor Kalman, Persistent identifier consortium
    • Reagan Moore, DataNet Federation Consortium
    • Jakub Peisar, dCache testbed
    • Raphael Ritz, MPG