Enhancing UML with Data Flow for High-Performance Applications

Data Flow in UML. Dr. Jeffrey E. Smith, Mercury Computer Systems, Inc., jesmith@mc.com [Figure labels: SAGE (12 prod units), UML (50 prod units), PGM (20 prod ...), CORBA (17 prod units), SCE (40 pr ...)]

Agenda • Model-based parallel programming alternatives • Focus on framework/UML Conceptualization • Data Parallel CORBA • Data Flow in UML Superstructure

Development Steps: 1) Develop Model (GUI + Interpreted Code) 2) Make Parallel (Simulator) 3) Auto-Generate Application Code (Cross Compilation). Motivation: From “Portable HP SW for SIP - What’s Next”, Lincoln Labs • Moore’s law addresses computations, not complexity • In their roadmap for advancing RT Embedded Software Development, they identified model-based development and automated mapping support as the key long-term technologies • “Blue Jean” datapoint

Methods to Conceptualize/Apply High Performance Data Flow Applications

Observations • UML doesn’t include a consistent model of data flow … yet … not really • Translating UML diagrams to any source might be an avenue of tool support worth exploring

Profile-Guided Optimization. Goals: Component Reuse, Software Productivity, Leverage Existing Investments & Wider Programming Base. [Diagram labels: Requirements and Design; Model Behavior; UML; Constructor (Programmer 1); Translate; Parallel/DSP Prototypers . . . Graph(ical), CORBA, SCE, V/P Compilers; Executable Prototype; Source; POSIX-Compliant API; Optimizer (Programmer 2); POSIX-Compliant kernel; Executable Deliverable]

Dynamic Compilation Can Provide a Solution. Collect runtime execution behavior: • Memory usage • instruction and data caches • translation lookaside buffers • Control flow • branch probabilities • program “traces” • Call graphs • gprof statistics • Data dependencies • data-dependent control flow • Variable values • value locality • interprocedural dataflow • Hardware counters • pipeline stalls [Diagram labels: High-Level Algorithms; Work with OMG UML; UML with Data Flow; Common CASE & Data-Flow Machine Development (Par.) CORBA IDE; 1-7 Transforms; Non-Optimized Low-Level Algorithms; Profile-Guided Optimizations; Feedback; Optimized Low-Level Algorithms]

Next Steps • Application to IR formation, fusion, template matching • Collect software productivity metrics on above and MITRE benchmarks • Experiment with optimization of UML transformed (through data parallel CORBA or specialized data parallel compiler IDEs) software to efficient embedded platforms • Work with OMG in introducing data flow, in a way that supports streaming high-performance, data-flow distributed computers • Examine possibility of embedding dynamic profile optimization into runtime system • Work with CASE and IDE vendor to integrate model-based development of efficient streaming high-performance, data-flow distributed computer targets

Trick is to: 1) Discover common patterns (SCE, PAS, Par. CORBA, …) 2) Feed this forward into standard OMG specs 3) Simplify our own software architectures/APIs [Diagram labels: Action Semantics, DataFlow, PAS Channel]

CORBA Sequence

Data-Parallel CORBA Sequence

Meta-Classes

Data Structures

Runtime Associations

Control Flow • Each step is taken when the previous one finishes … • … regardless of whether inputs are available, accurate or complete (“pull”) • Emphasis is on the order in which steps are taken [Figure, not UML notation: Start, Weather Info, Analyze Weather Info, Chart Course, Cancel Trip]

Object/Data Flow • Each step is taken when all the required input objects/data are available … • … and only when all the inputs are available (“push”) • Emphasis is on objects flowing between steps [Figure, not UML notation: Design Product, Acquire Capital, Procure Materials, Build Subassembly 1, Build Subassembly 2, Final Assembly]
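The "push" firing rule above can be sketched in a few lines of Python. This is only an illustration: the step names come from the slide's figure, but the dependency edges in `STEPS` and the function `run_data_flow` are assumptions, since the figure's arrows are not recoverable from the transcript.

```python
from collections import deque

# Hypothetical dependency graph: each step maps to the inputs it requires.
# The edges are assumed for illustration; only the step names are from the slide.
STEPS = {
    "Design Product": [],
    "Acquire Capital": [],
    "Procure Materials": ["Design Product", "Acquire Capital"],
    "Build Subassembly 1": ["Procure Materials"],
    "Build Subassembly 2": ["Procure Materials"],
    "Final Assembly": ["Build Subassembly 1", "Build Subassembly 2"],
}


def run_data_flow(steps):
    """Fire each step once, and only when all of its inputs have been produced.

    Returns (fired, blocked): the firing order and the steps that never fired.
    """
    produced = set()
    fired = []
    pending = deque(steps)
    stalled = 0  # consecutive steps examined without any of them firing
    while pending and stalled < len(pending):
        step = pending.popleft()
        if all(inp in produced for inp in steps[step]):
            produced.add(step)
            fired.append(step)
            stalled = 0
        else:
            pending.append(step)  # inputs not ready yet; retry later
            stalled += 1
    return fired, list(pending)


fired, blocked = run_data_flow(STEPS)
print(fired)
print(blocked)
```

Unlike a control-flow sequence, the listing order of the steps does not decide when they run; a step whose input never arrives simply stays in `blocked`.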

UML 2.0 Superstructure RFP Excerpt Further, the way that objects and other data flow between parts of a system is crucial to understanding its architecture. The UML currently supports object/data flow only at the lowest level of granularity {not even}, between the steps in an activity graph {as well as other locations, in a contradictory way}. It is important for architects to be able to model object and data flows between entities at a higher level of granularity, such as classifiers and packages {as well as many other requirements coming up}. {signifies my comments}

Why bring back data flow explicitly into UML? • With parallel computation increasingly used to raise computation speeds, there is interest in linking streaming data flow machines with a matching modeling paradigm • To bring back a data flow standard: developers have been building unique custom DFDs out of standard UML structures (patterns), and some CASE vendors have added data flow at the model and meta-model level • To link/integrate existing DFD toolsets with UML toolsets and existing simulators, e.g. Ptolemy [Parks] • Functional modeling (the only third of OMT left out) fits both OO and non-OO modeling paradigms and can be united with other UML models [SD, DSH] • Currently addressed piecemeal in UML (shown later), in ways that do not conform to the pre-existing modeler's view (OMT view) of data flow

Why bring back data flow explicitly into UML? (cont) • The object model defines system components and the dynamic model (state machines) defines system control, but the functional model (data flow) defines what computations occur in a system and the functional dependencies between processes • Need expressed in software process/workflow, defense, medical, wireless and digital video domains • Example: When the response to the Action Semantics RFP was presented in the OMG Plenary, the diagrams were not done in UML (they were in data flow); the reason given was that they would take too much space in UML

Why bring back data flow explicitly into UML? (continued) • Different (yet related) underlying semantics than State Machines • "is-used-to-produce" relation • A parent/child (state/substate) diagram can be consistent from the state machine point of view and still violate the data consistency model [TK] • Unique inheritance (decomposition) requirement • Example definition: Let P be a process and D a composition of P. D is consistent with P iff the I/O relationships that are 1) specified for P also hold for D and 2) not specified for P do not hold for D [TK] • Relation to state machines: A trigger (t) of a control process "is-used-to-produce" a response (r) of the same process iff there is a transition in the STD that is triggered by t and responds with r [TK]. • It is conceptually simpler for some applications: simply a digraph together with a binary precedence relation. • It is impossible to represent continuous flow, especially with feedback, in a State Machine because of the theoretically infinite number of states required. With data transforms this is a natural modeling view. • STDs are sequential within one machine, DFDs are not [Figure labels: processes P1, P1,1, P1,2, P1,3; flows d0:1 through d0:5]
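The consistency definition quoted from [TK] can be read as a set comparison, sketched below in Python. This is an interpretation, not the algorithm from [TK]: the functions `derived_io` and `is_consistent`, the flow names `a`, `b`, `m`, `x`, `y`, and the idea of deriving D's external relation by chaining sub-process relations are all assumptions made for illustration.

```python
def derived_io(sub_relations, inputs, outputs):
    """Chain the sub-processes' "is-used-to-produce" pairs end to end.

    sub_relations maps a sub-process name to a set of (source, target) flow
    pairs. Returns the (input, output) pairs that hold for the whole diagram.
    """
    successors = {}
    for pairs in sub_relations.values():
        for src, dst in pairs:
            successors.setdefault(src, set()).add(dst)

    result = set()
    for start in inputs:
        seen, stack = set(), [start]
        while stack:  # depth-first walk over flows reachable from this input
            flow = stack.pop()
            for nxt in successors.get(flow, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result |= {(start, out) for out in seen if out in outputs}
    return result


def is_consistent(parent_io, inputs, outputs, sub_relations):
    """D is consistent with P iff it yields exactly P's specified I/O pairs."""
    return derived_io(sub_relations, inputs, outputs) == set(parent_io)


# Hypothetical parent process: a is used to produce x, b is used to produce y.
parent_io = {("a", "x"), ("b", "y")}
good = {"P1.1": {("a", "m")}, "P1.2": {("m", "x")}, "P1.3": {("b", "y")}}
bad = {"P1.1": {("a", "m")}, "P1.2": {("m", "x")}, "P1.3": {("b", "y"), ("b", "x")}}

print(is_consistent(parent_io, {"a", "b"}, {"x", "y"}, good))  # True
print(is_consistent(parent_io, {"a", "b"}, {"x", "y"}, bad))   # False
```

The second decomposition fails clause 2 of the definition: it lets `b` contribute to `x`, a relationship that was not specified for the parent.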

Interaction Diagrams (Sequence, Collaboration) • Different (yet related) underlying semantics than Interaction Diagrams • Interaction diagrams are for interaction among objects • Cannot represent interaction at a lower level (among methods of different classes) • Cannot represent interaction among systems

Why are aspects of data flow not yet supported? • Ambiguous order of input to processes • Considered difficult to unite the semantics of data flow models with other OO models (other research has proved this false) • Some DFDs allowed for control flow, and control flow is duplicated in many of the dynamic models • Could be non-deterministic, since not all processes or data flows are necessarily used to produce the high-level process outputs • No way to represent sequencing, iteration and conditionals • Considered to be included already, but inconsistently and in multiple places

Where do UML experts think data flow either exists or would fit? • UML Profile for Enterprise Distributed Object Computing • Activity Diagrams • Collaboration Diagrams • Action Semantics (data flow is a model element that acts as a temporary data store between in and out pins) • Data-parallel CORBA • Using new (data flow) patterns of existing UML structures • A UML ActivityGraph Profile for EDOC Task Model (ad/99-10-07) • Object interaction diagram (I assume this option is merely seconding the Collaboration Diagrams suggestion)

Initial Requirements Collection • Establish criteria for (automatable) checking of (internal/external) completeness (well-formed, well-connected, well-introduced, well-rooted) [TK], consistency [Kung], decomposition (inheritance), boundedness [Parks], determinacy [Parks, KM] and termination [Parks, KM]. • Model elementary processes, like Petri nets [Petri], with pre- and post-conditions describing the behavior of processes. • Provide the ability to express data dimensions and other data properties (in the Class Diagram) and explicit linkage to these from the Data Flow Diagram. • Map to the rest of UML: consistent with State Diagrams (events trigger the "is-used-to-produce" relation), Action Semantics, EDOC [EDOC], Collaborations, Activity, Class Diagrams (generalization and process functional dependencies to associations) [Kung], Use Case Diagrams (actors) [Parks], RT UML (ports map to I/O specs) and Deployment Diagrams (see next 2 bullets). • Provide the ability to express parallelization along data dimensions and mapping to hardware resources in the Deployment Diagram. Must express data distribution types (sequential or parallel) and subtypes (round robin, random even, random statistical, first available, etc.) • Allow arrows in the DFD to be associated with (virtual) channels (see Virtual Interface Specification) in the Deployment Diagram.
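Two of the distribution subtypes named above, round robin and first available, can be sketched as follows. The function names, the per-item `costs` list, and the reading of "first available" as "the worker that becomes free earliest" are assumptions for illustration; the slide does not define these subtypes.

```python
import heapq


def round_robin(items, num_workers):
    """Deal items to workers in turn: item i goes to worker i % num_workers."""
    return [items[w::num_workers] for w in range(num_workers)]


def first_available(items, costs, num_workers):
    """Give each item to the worker that becomes free earliest.

    costs[i] is a hypothetical processing time for items[i].
    """
    free_at = [(0, w) for w in range(num_workers)]  # (time free, worker id)
    heapq.heapify(free_at)
    assignment = [[] for _ in range(num_workers)]
    for item, cost in zip(items, costs):
        time_free, worker = heapq.heappop(free_at)
        assignment[worker].append(item)
        heapq.heappush(free_at, (time_free + cost, worker))
    return assignment


items = list(range(6))
print(round_robin(items, 2))                          # [[0, 2, 4], [1, 3, 5]]
print(first_available(items, [5, 1, 1, 1, 1, 1], 2))  # [[0], [1, 2, 3, 4, 5]]
```

The two results differ because round robin ignores load, while first available keeps routing work away from the worker that is tied up with the expensive first item. A model that records only "parallel" without the subtype cannot distinguish these behaviors.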

Initial Requirements Collection (cont) • Side-effect-free operations, or previously defined actions, are decomposed using functional models, and these are generally used at the aggregate level [SD]. • Aggregate objects are passed as an input parameter and returned as an output parameter, allowing a process to access any object (data stores, object classes, or associations) through the parameter [SD]. • Place all control flow information in a state machine (to solve 4.4 and 4.5) [SD]. • Provide for data store I/O not included in action semantics. • Must be able to model partial objects (multiple partial partitions of data) as described in the Data Parallel CORBA Spec. • Provide a method to express process synchronization as something external to processes (as opposed to state machines, where this would be defined in a state), without knowledge of composition context. [Figure caption: Constraints to unify behavior, class & functional models]

Initial Requirements Collection (cont) for Modeling Streaming Data • Provide a uni-directional data streaming interface with data flow. • Model structured (number of dimensions, extent in each dimension, packing order & element type) and unstructured global data (number of data sets, size of data) [DRI]. • Model object I/O requirements, e.g. support for structured/non-structured data, dimensions, element types and data partitioning specification (e.g. indivisible or block type and, for each dimension, maximum size, minimum number of required elements, modulo size, block length, left and right overlap specs, etc.) [DRI]. • Model data stream control, e.g. push and pull of data, QoS based on data control (e.g. rate & latency constraints), control data stream, control tagged data, etc. [DRI]. [Figure labels: Name, Name 1, Name 2, Name 3, each with Properties; Data Distribution (sub)Type; Input Specifications; Output Specifications; Global Data. Captions: Associate I/O specs with port attributes; Need semantics to model data]
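As a rough illustration of what a per-dimension partitioning specification might carry, the sketch below models extent, block length, and left/right overlap for one dimension. The class `DimensionPartition` and its fields are hypothetical names chosen to mirror the bullet above; they are not taken from the DRI documents or from Data Parallel CORBA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionPartition:
    """Hypothetical block partitioning of one data dimension."""

    extent: int             # number of elements along this dimension
    block_length: int       # elements owned by each block
    left_overlap: int = 0   # extra elements borrowed from the previous block
    right_overlap: int = 0  # extra elements borrowed from the next block

    def blocks(self):
        """Return half-open (start, stop) index ranges, clipped to the extent."""
        ranges = []
        for start in range(0, self.extent, self.block_length):
            lo = max(0, start - self.left_overlap)
            hi = min(self.extent, start + self.block_length + self.right_overlap)
            ranges.append((lo, hi))
        return ranges


spec = DimensionPartition(extent=10, block_length=4, left_overlap=1, right_overlap=1)
print(spec.blocks())  # [(0, 5), (3, 9), (7, 10)]
```

Overlap of this kind is what a filter or template-matching stage would need at block boundaries, which is one reason the specification has to be attached to the port rather than inferred by the tool.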

Existing Data Flow Semantic Models • Petri Nets [Petri] • Kung, et al [TK, Kung] • Karp and Miller Computation Graphs [KM] • Kahn Process Networks [Kahn] • Parks Bounded Execution [Parks]
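To give a feel for one of these models, a Kahn-style process network [Kahn] can be approximated with Python generators: each process reads tokens from its input streams and writes tokens to its output stream. This is a simplified, demand-driven sketch rather than a full implementation (there are no explicit FIFO queues or concurrent processes), and the process names `source`, `scale` and `add` are made up for the example.

```python
from itertools import islice


def source(start=0):
    """Process with no inputs: emits an unbounded stream of integers."""
    n = start
    while True:
        yield n
        n += 1


def scale(stream, factor):
    """Process with one input channel: multiplies every token by factor."""
    for token in stream:
        yield token * factor


def add(left, right):
    """Process with two input channels: waits for one token from each."""
    for a, b in zip(left, right):
        yield a + b


# Wire the network: two scaled sources feeding an adder.
network = add(scale(source(), 2), scale(source(10), 3))
print(list(islice(network, 3)))  # [30, 35, 40]
```

Because each process only consumes tokens from its own inputs, the output stream is the same for any valid schedule, which is the determinacy property that makes this family of models attractive for parallel targets.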

Completely different connection in action semantics, EDOC, Activity Diagrams, Different CASE vendors

References • [BS] D. Bhatt and J. Shackleton, A Design Notation and Toolset for High-Performance Embedded Systems Development, Lectures on Embedded Systems, LNCS 1494, Springer-Verlag, VIII, October 1998. • [DRI] Document for the DARPA Data Reorganization Effort, www.data-re.org, Feb. 2000. • [EDOC] Cooperative Research Centre for Enterprise Distributed Systems Technology, UML Profile for Enterprise Distributed Object Computing, ad/99-10-07. • [Kahn] G. Kahn, The Semantics of a Simple Language for Parallel Programming, Info. Proc., pages 471-475, Stockholm, Aug. 1974. • [KM] R. M. Karp and R. E. Miller, Properties of a Model for Parallel Computations: Determinacy, Termination, Queueing, SIAM Journal on Applied Mathematics, Vol. 14, No. 6, November 1966. • [Kung] C. H. Kung, Conceptual Modeling in the Context of Software Development, IEEE Transactions on Software Engineering, 15(10):1176-1187, Oct. 1989. • [Parks] T. M. Parks, Bounded Scheduling of Process Networks, Technical Report UCB/ERL-95-105, PhD dissertation, EECS Department, University of California, Berkeley, CA, December 1995. • [Petri] C. A. Petri, Kommunikation mit Automaten, PhD dissertation, translation by C. F. Greene, Supplement 1 to Technical Report RADC-TR-65-337, Vol. 1, Rome Labs, Griffiss Air Force Base, NY, 1965. • [TK] Y. Tao and C. Kung, Formal Definition and Verification of Data Flow Diagrams, J. Systems Software, 16:29-36, 1991. • [SD] S. DeLoach, Formal Transformations from Graphically-Based Object-Oriented Representations to Theory-Based Specification, PhD thesis, Air Force Institute of Technology, Wright-Patterson AFB, OH, June 1996, AFIT/DS/ENG/96-05, AD-A310 608.
