
Tutorial for PARK data fitting

Paul KIENZLE, Wenwu CHEN and Ziwen FU

Reflectometry Group



Objective: Distributed Computing Environment

[Diagram: many Users (Clients) connect to a Service Server, which manages the Working Servers; in the Cluster the Service Server runs on the Master Node and the Working Servers run on the Working Nodes.]



Prerequisite

  • Python: version >= 2.4

  • Windows: cygwin

  • Client: wxPython (version >= 2.6) and matplotlib

  • Most services may need numpy.



Setup of park

  • Download Source code:

    • Source code: svn co svn:[email protected]/park

    • Package for unix/linux: park-0.2.0.tar.gz park-0.2.0.tar.bz2

    • Package for windows: park-0.2.0.zip

  • Edit cluster config file:

    • park/config/hosts

  • Start service server

    • park/servers/mapServer.py

  • Start client

    • park/client/AppJob.py

  • Provide services

    • park/services



Setup of park in Unix/Linux

  • Download park-0.2.0.tar.gz or park-0.2.0.tar.bz2 from http://danse.us

  • Unzip the file:

    tar -xvzf park-0.2.0.tar.gz

  • Make the installation:

    cd park-0.2.0

    make install

    or

    setup.py install --install-purelib=home_directory_of_park

    The command make install is equivalent to setup.py install --install-purelib=~. It installs park in the directory ~/park.



Setup of park in Windows

  • Download park-0.2.0.zip or park-0.2.0.tar.bz2 from http://danse.us

  • Unzip the file:

    unzip park-0.2.0.zip

  • Make the installation in MSDOS window:

    cd park-0.2.0

    setup.py install

    It will install park in directory ~/Lib/site-packages/park.



Edit the config file

The server makes use of park/config/hosts to configure the working nodes.

Example of park/config/hosts:

#

# hosts configure file for park

# example for compufans.ncnr.nist.gov cluster:

# 4 nodes, each node with 2 cpus

#

# the format is similar to that of /etc/hosts:

# ip_address full_name alias_name[:port:number_of_cpus]

#

127.0.0.1 localhost.localdomain localhost:5300:2

#172.16.255.251 n4.ncnr.nist.gov n4:6500:2

#172.16.255.252 n3.ncnr.nist.gov n3:6300:2

#172.16.255.253 n2.ncnr.nist.gov n2:6200:2

#172.16.255.254 n1.ncnr.nist.gov n1:6100:2



Start the server

The server is park/servers/mapServer.py:

cd park/servers

python mapServer.py

Or, in cygwin on Windows:

cd Lib/site-packages/park/servers

python mapServer.py

The full command is:

python mapServer.py --port port --host host_name --log log_file_name
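For example, assuming the default port of 5400 mentioned on the client slide and an arbitrary log file name:

    python mapServer.py --port 5400 --host localhost --log mapserver.log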



Start the server

  • Make sure that python and its environment are set correctly.

  • Make sure that RSH, defined in park/servers/environ.py, is set to the remote shell command for a cluster with multiple working nodes (see the sketch after this list).

  • Make sure that this remote shell command can start the remote command without a password.

  • Make sure that the services are executable files.

    Common errors:

  • [Errno 2] No such file or directory: '~/park/config/hosts': the hosts configuration file is missing.

  • ERROR (111, 'Connection refused')

    • the working server did not start.

    • make sure that the port is not already in use.

  • ERROR (xxx, 'port is used')

    • Wait a while before restarting the server.

    • make sure that the port is not already in use.



Stop the server

Shut down the service server with Ctrl-C or the kill command.

Use kill without the -9 option, so that the working server program is stopped as well. Otherwise the working server will keep running even after the service server is killed.



Start the client

  • Enter ~/park/client

  • Run the client application:

    $ python AppJob.py

  • Connect to the server:

    • Server > Server | Port (the default port is 5400)

    • click the Connect button to connect to the server.

  • Prepare and submit the service request:

    • Shell > Load: load the XML service request, which is shown in the upper text field

    • click the Submit button to submit the service request

    • messages related to the service request are shown in the lower text field.

  • View the service results:

    • View: view the results.

  • There are three types of data to view: experimental data (with error bars), simulation data, and chi-square. The experimental and simulation plots show only the best result, and the chi-square plot shows the improvement of chisq during the fit. Below the panel is a toolbar, which can be used to zoom in/out, save the figure, and change the figure properties (Property button).

  • Shut down the client:

    • Server > Disconnect, then close the window

    • or close the window directly.



Map-reduce parallel pattern

  • Map: the master node assigns working unit [i] to working node [j]:

    • map(fn, input[i]) => output[i] on working node j

  • Reduce: the master node collects the message from each working node, applies the reduce function, and sends the result to the user:

    • reduce(gn, output[0], …, output[n]) => sent to the user client

[Diagram: the Service Server maps work units to the Working Nodes and reduces their replies.]
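A minimal, self-contained sketch of the pattern in plain Python; fn, gn and the inputs below are placeholders for illustration, not the PARK API:

    # Map-reduce illustration only, not the PARK implementation.
    def fn(x):                 # map function: one work unit -> one output
        return x * x

    def gn(outputs):           # reduce function: combine all outputs
        return min(outputs)

    inputs = [1, 2, 3, 4]
    outputs = [fn(i) for i in inputs]   # "map": one call per working node
    result = gn(outputs)                # "reduce": done on the master node
    print result                        # prints 1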



Service request

<?xml version='1.0' encoding='UTF-8'?>

<session version='2.0.1' type='7' user='wwchen' email='[email protected]' priority='0' >

<group name='group1'>

<dataSet>

</dataSet>

<reduce classname='Chisq'/>

<task cmd='longwinstr.py' >

<bufsize value='3000'/>

<home value='/home/wwchen/dansesrc/park/services/tester'/>

<cwd value='/home/wwchen/dansesrc/park/servers/tester'/>

</task>

<joblist name='job1' priority='4' cnt='4' >

<input count='24'>

</input>

</joblist>

</group>

</session>

[Callouts in the slide: the <reduce> element selects the reduce function, the <task> element the map function, and <input> the inputs.]


Software Infrastructure of PARK for data fitting

[Diagram: the User Interface (data view, data presentation, data reduction, data simulation) connects to the Service Server, which dispatches Services to the Working Nodes. Roles shown around the diagram: Scientist, Model Developer, View Developer, Reduce Service Developer.]



Reduce function

The class inherits from park/services/reduce/reduce.Reduce.

    class Reduce:
        """ A base class as the reduce function. """

        def __init__(self):
            """ constructor. """
            self.archive = None
            self.msgqueue = None

        def setArchive(self, archive):
            """ set the archive to store data. """
            self.archive = archive

        def setMsgQueue(self, msgqueue):
            """ set the message queue. """
            self.msgqueue = msgqueue

        def __call__(self, msg):
            """ called by the PARK to process the reply from the working node. """
            pass



An example of Reduce function

park/services/reduce/Chisq.Chisq:

    class Chisq(Reduce):
        """ A class to handle the chisq for data fitting. """

        def __init__(self):
            """ constructor. """
            Reduce.__init__(self)
            self.chisq = None

        def __call__(self, reply):
            # archive the raw reply, keyed by group id and job id
            keys = {}
            keys['gid'] = reply.gid
            keys['jid'] = reply.id
            self.archive.put(keys, str(reply))
            if hasattr(reply, 'chisq'):
                chisqval = reply.chisq          # chisq reported by the working node
                # keep the best (smallest) chisq seen so far
                if self.chisq is None:
                    self.chisq = chisqval
                elif chisqval < self.chisq:
                    self.chisq = chisqval
                # notify the client of the update (XML_HEADER is defined elsewhere in park)
                self.msgqueue.putMsg(reply.gid,
                    '%s<reply gid="%s" update="%s" chisq=%s/>'
                    % (XML_HEADER, str(reply.gid), str(reply.id), str(chisqval)))
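A minimal sketch of how PARK drives such a reducer, based only on the interface shown above; archive, msgqueue and replies stand in for the objects that PARK supplies at run time:

    # Sketch only: 'archive', 'msgqueue' and 'replies' are placeholders for
    # the objects that PARK provides.
    reducer = Chisq()
    reducer.setArchive(archive)      # where the raw replies are stored
    reducer.setMsgQueue(msgqueue)    # where client updates are posted
    for reply in replies:            # one reply per finished work unit
        reducer(reply)               # __call__ processes each reply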



map function

  • The pure python function.

    - Runs as a thread in PARK.

    • Bad scalability for SMP (due to the python multithreading implementation)

    • Only works for a pure python function.

      Format: output_string function_name(input_string) (see the sketch after this list)

  • The executable program.

    - Runs as a separate process in PARK.

    • Excellent scalability for SMP

    • Works for any executable program

    • Needs more memory and has a longer start-up time

      It reads its input from standard input and writes its results to standard output.
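A minimal sketch of a map function of the first kind (string in, string out); the function name and the payload format are made up for illustration:

    # Illustrative pure-python map function; the name and the XML payload
    # are assumptions, not part of PARK.
    from xml.dom import minidom

    def echo_count(input_string):
        # parse the request, do the "work", and return the result as a string
        node = minidom.parseString(input_string).childNodes[0]
        count = int(node.getAttribute('count'))
        return '<output count="%d"/>' % count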



An example of map function

park/services/tester/longwinstr.py:

    import sys

    # longwin() is defined in the same file (shown on the next slide)
    if __name__ == '__main__':
        try:
            longwin()
        except:
            sys.stderr.write('Exception:%s' % (sys.exc_info()[1]))



An example of map function

    import math
    import sys
    from xml.dom import minidom

    def longwin():
        print 'call longwin'
        # read the whole request from standard input
        s0 = sys.stdin.read()
        node = minidom.parseString(s0).childNodes[0]
        t = int(node.getAttribute('count'))
        if t > 25:
            count = t
        else:
            count = 2**t
        print ' Start work with iteration number: ', t
        cnt = 0
        while cnt < count:
            a = math.sqrt(2.0)   # dummy work
            cnt += 1
        print ' finish work: cnt=', cnt
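The script can be exercised by hand by piping an input element to it on standard input; the count value here is arbitrary:

    echo "<input count='10'/>" | python longwinstr.py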


Fully Distributed Services ?

[Diagram: the User's Client talks to distributed Services through a Service Register, a Message Queue and a Job Queue; supporting components include Cluster Management, Task Management, Service Management, Data Fetching, Archive, Logging and Shared Files.]


Pull or put ?

[Diagram: Job Server, Message Server and Working Server.]

1. The job server sends the job to the working server, and the working server sends the results to the message server.

2. The job server sends the job to the working server, and the message server retrieves the results from the working server.

3. The working server retrieves the job from the job server and sends the results to the message server.

4. The working server retrieves the job from the job server, and the message server retrieves the results from the working server.


Security: authentication and authorization

[Diagram: a Security Server alongside the Job Server, Message Server and Working Server.]



Data Transfer

  • Provide a data center server for the cluster, which retrieves data from the remote data server and stores it for access by the local working nodes. This is necessary for diskless nodes in the cluster.

  • Provide a reference to the remote data (similar to a URL), and let each working node access the data individually.
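A minimal sketch of the second approach, assuming the reference is a plain HTTP URL (the PARK data-server protocol itself is not specified in this tutorial):

    # Sketch only: each working node resolves a URL-like reference itself.
    import urllib2

    def fetch_remote_data(url):
        """Fetch the remote data referenced by 'url' on this working node."""
        return urllib2.urlopen(url).read()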



UI/Visualization

MVC model

Traits-UI

2D/3D


Multi-tier of PARK

[Diagram: Client, Service Server, Reduce Server, Working Server and Data Server, connected by explicit direct, implicit direct and possible connections.]

All of them work as both server and client.


