Associative Data Schemes for Cloud Computing

Amir Basirat

PhD Candidate

[email protected]

Supervisor: Dr Asad Khan

Clayton School of IT, Monash University

STINT Workshop, Lulea, Sweden - May 2012


Contents

1. Cloud Computing
2. Hadoop MapReduce
3. Pattern Recognition and Distributed Approach
4. Graph Neuron for Scalable Pattern Recognition
5. HGN and DHGN
6. Research Objective
7. Web-based GN
8. EdgeHGN
9. Simulation Showcase


What is Cloud Computing?

The vision of Cloud Computing encompasses a general shift of computer processing, storage, and software delivery away from the desktop and local servers, across the network, and into the next generation of data centers hosted by large infrastructure companies.


Big Data!

  • An IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006 and forecast tenfold growth to 1.8 zettabytes by 2011.

  • This flood of data is coming from many sources. Consider the following:

    • The New York Stock Exchange generates about one terabyte of new trade data per day.

    • Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.

    • Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.

    • The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.

    • The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.


Challenge?

Our existing capability to generate data seems to outstrip our capability to analyze it.


Data Management in Cloud

There are some underlying issues that need to be addressed properly by any data management scheme deployed for clouds (Abadi, 2009), including:

  • the capability to parallelise the data workload

  • security concerns arising from storing data at an untrusted host

  • data replication functionality.

Thus the question of how to effectively process immense data sets is becoming increasingly urgent.


Hadoop

  • In a nutshell, what Hadoop provides:

  • “A reliable shared storage and analysis system. The storage is provided by HDFS and analysis by MapReduce”

(Hadoop, 2011)


MapReduce

  • The MapReduce programming model requires solutions to be expressed as two functions: Map and Reduce.

  • A map function takes a key/value pair, computes and emits a set of intermediate key/value pairs as output.

  • A reduce function merges all intermediate values associated with the same intermediate key, executes some computation on them, and emits the final output.

(Hadoop, 2011)


Word Count in MapReduce

Pseudocode for the word count algorithm in MapReduce:

class MAPPER
    method MAP(docid a, doc d)
        for all term t in doc d do
            EMIT(term t, count 1)

class REDUCER
    method REDUCE(term t, counts [c1, c2, …])
        sum = 0
        for all count c in counts [c1, c2, …] do
            sum = sum + c
        EMIT(term t, count sum)
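
The pseudocode above can be tried out on a single machine with a small Python sketch. This is not Hadoop code: `run_mapreduce` is a hypothetical in-memory driver that only imitates the map, shuffle/sort and reduce phases, and the names `map_fn` and `reduce_fn` are chosen here for illustration.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Mapper: emit (term, 1) for every term in the document."""
    for term in text.lower().split():
        yield term, 1


def reduce_fn(term, counts):
    """Reducer: sum all counts emitted for the same term."""
    yield term, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Hypothetical single-process driver imitating the three phases."""
    # Map phase: collect intermediate values under their key.
    intermediate = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            intermediate[key].append(value)

    # Shuffle/sort phase: keys are visited in sorted order, so each
    # reducer call sees every value that belongs to one key.
    output = {}
    for key in sorted(intermediate):
        for out_key, out_value in reducer(key, intermediate[key]):
            output[out_key] = out_value
    return output


if __name__ == "__main__":
    docs = {"d1": "the cloud stores data", "d2": "the cloud scales"}
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

In a real cluster the grouping step is carried out by the framework across machines; the sketch only makes the data flow between the two user-supplied functions visible.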


Challenges and Hurdles in MapReduce

  • The Map function operates on the assumption that all related data is distributed vertically, i.e. that records are uniformly distributed across the network. However, parts of related records may be stored at different physical locations.

  • Intermediate records need to be sorted before they are input to the reduce function.

  • A solution must be expressed in terms of Map and Reduce functions working on key/value pairs, which in some cases, such as multi-stage processes, may not be possible or natural.

  • Moreover, dependency on HDFS for data storage and retrieval can create single points of failure for the MapReduce infrastructure, especially at master nodes.


  • Existing data management schemes do not work well when data is partitioned dynamically among numerous available nodes.

  • Approaches towards scalable data management in the cloud, which offer greater portability, manageability and compatibility of applications and data, are yet to be fully realised.


Solution?

To develop a distributed data access scheme that enables data storage and retrieval by association.

Treat data records as patterns.

As a result, data storage and retrieval is performed using a distributed pattern recognition approach that is implemented through the integration of loosely-coupled computational networks, followed by a divide-and-distribute approach that allows distribution of these networks within the cloud dynamically.


Associative Model of Data

This associative model treats data records as patterns, and hence it does not matter how the data is represented.

The associative model uses a single, common structure for all types of data.
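
To make "retrieval by association" concrete, here is a minimal illustration in Python. It is not the graph-neuron scheme presented later in the talk: the `AssociativeStore` class and the use of Hamming distance as the similarity measure are assumptions made purely for illustration. The point is only that a record is looked up by presenting a (possibly distorted) pattern rather than an exact key.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


class AssociativeStore:
    """Hypothetical store: every record is kept as a pattern (a sequence)."""

    def __init__(self):
        self._entries = []  # list of (pattern, payload) pairs

    def store(self, pattern, payload):
        self._entries.append((tuple(pattern), payload))

    def recall(self, probe):
        """Return (payload, distance) of the closest stored pattern, or None."""
        probe = tuple(probe)
        candidates = [
            (hamming(pattern, probe), payload)
            for pattern, payload in self._entries
            if len(pattern) == len(probe)
        ]
        if not candidates:
            return None
        distance, payload = min(candidates, key=lambda item: item[0])
        return payload, distance


if __name__ == "__main__":
    store = AssociativeStore()
    store.store("1011001", "P1")
    store.store("0100110", "P2")
    print(store.recall("1011011"))  # probe is P1 with one flipped bit
```

Because patterns are plain sequences, the same structure holds bit strings, characters or rows of pixel values, which is the sense in which the representation of the data does not matter.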


Distributed Pattern Recognition

  • The distributed computing approach offers seemingly unlimited scalability as the number of patterns grows: rapid advances in network computing technology enable processing to be performed within the body of a network rather than relying on exhaustive single-CPU utilization.

  • Existing approaches still lag behind, owing to the highly complex recognition algorithms they implement.

  • The neural network approach offers a promising tool for large-scale pattern recognition. However, there are also several issues related to its implementation. These include:

    • convergence problems,

    • complex iterative learning procedures,

    • and low scalability with regard to the training data required for optimum recognition.


Graph Neuron (GN)

  • An eight-node GN in the process of storing patterns P1 (red), P2 (blue), P3 (black), and P4 (green) (Khan, 2002).
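
The figure itself did not survive in this transcript, so the following Python sketch gives a rough idea of how a graph-neuron array can store and recall patterns. It is a simplified reading of the GN idea, not the author's implementation: the class name `GraphNeuron`, the two-symbol alphabet and the example patterns are all assumptions. Each node stands for one (position, value) pair; when a pattern is stored, every activated node records the values of its left and right neighbours, and a pattern is recalled only if every activated node has already seen its current neighbours.

```python
class GraphNeuron:
    """Simplified GN array: one node per (position, value) pair."""

    def __init__(self, length, alphabet):
        self.length = length
        # Each node keeps the set of (left, right) neighbour values it has seen.
        self.bias = {(p, v): set() for p in range(length) for v in alphabet}

    def _neighbours(self, pattern, p):
        left = pattern[p - 1] if p > 0 else None
        right = pattern[p + 1] if p < self.length - 1 else None
        return left, right

    def _check(self, pattern):
        if len(pattern) != self.length:
            raise ValueError("pattern length does not match the GN array")

    def store(self, pattern):
        """One pass over the pattern: each activated node memorises its neighbours."""
        self._check(pattern)
        for p, v in enumerate(pattern):
            self.bias[(p, v)].add(self._neighbours(pattern, p))

    def recall(self, pattern):
        """True if every activated node recognises its current neighbours."""
        self._check(pattern)
        return all(
            self._neighbours(pattern, p) in self.bias[(p, v)]
            for p, v in enumerate(pattern)
        )


if __name__ == "__main__":
    gn = GraphNeuron(length=4, alphabet="XO")  # 4 positions x 2 values = 8 nodes
    for pattern in ("XXOO", "OOXX"):           # hypothetical patterns
        gn.store(pattern)
    print(gn.recall("XXOO"), gn.recall("XOXO"))
```

Storing takes a single pass with no iterative training, which is the property that makes the approach attractive for distribution: each node only needs to communicate with its immediate neighbours.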


Hierarchical Graph Neuron (HGN)

  • HGN compositions for two-dimensional (7x5) and three-dimensional (7x5x3) pattern sizes


Distributed Hierarchical Graph Neuron (DHGN)

  • DHGN distributed pattern recognition architecture (Muhamad Amin and Khan, 2009).


Research Objectives

  • Redesigning data management architecture from a scalable associative computing perspective to create database-like functionality that can dynamically scale up or down over the available infrastructure without interruption or degradation

  • Investigating a distributed data access scheme that enables data storage and retrieval by association while data records are treated as patterns

  • Processing the database and handling the dynamic load using a distributed pattern recognition approach

  • Developing an intelligent MapReduce framework that allows complex data representations to be used as keys for Map operations

  • Reducing cloud storage fragmentation by implementing a divide-and-distribute approach

  • Enhancing the existing cloud data management models for scalability

  • Validating results and finding the asymptotic limits of the technique through a rigorously designed computer simulation environment


Progress to Date

  • Proposing a Web-based GN for Real-time Image Recognition


Web-based GN

  • (a) Total number of positive and negative matches. (b) Distortion rates for each line of image (each constructed HGN).

  • Image distortion rates vs. rotation degrees.


Edge Detecting Hierarchical Graph Neuron (EdgeHGN)

7-by-7 bit Binary Character A and its 7 equally-sized DHGN subnets

Reducing number of neurons by applying a drop-fall technique


Drop Fall Scheme

  • Drop-fall is often used for dividing touching pairs of digits into isolated characters. The drop-fall algorithm simulates the path produced by a drop of water falling from above the character and sliding downwards along the contour under the action of gravity.

  • When the drop gets stuck in a groove, it melts the character's stroke and then continues to fall. The dividing path produced by the drop-fall algorithm depends on three aspects: a start point, movement rules, and direction.

  • There are four possible directions, which generally produce four different paths to divide touching digits. The paths can start on the left or right side and can evolve downwards or upwards. One of the four is likely to produce the right result.

  • Therefore, a set of drop-fall algorithms consists of four methods that try to segment a block by simulating a drop-falling process: the descending-left, descending-right, ascending-left, and ascending-right algorithms.
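
The descending case can be sketched in a few lines of Python. This is a deliberately simplified version, not the exact rule set used in the talk: the movement priority (straight down, then one diagonal, then the other), the `prefer` parameter and the convention that 1 marks ink and 0 marks background are assumptions. When no background pixel is available below, the drop "melts" the stroke and continues straight down.

```python
def drop_fall(image, start_col, prefer="left"):
    """Trace a descending drop-fall path through a binary image.

    image     -- list of rows; 1 = ink (stroke), 0 = background
    start_col -- column in the top row where the drop is released
    prefer    -- which diagonal to try first when straight down is blocked
    Returns the path as a list of (row, col) positions, one per row.
    """
    rows, cols = len(image), len(image[0])
    r, c = 0, start_col
    path = [(r, c)]
    while r < rows - 1:
        diagonals = [c - 1, c + 1] if prefer == "left" else [c + 1, c - 1]
        for nc in [c] + diagonals:
            if 0 <= nc < cols and image[r + 1][nc] == 0:
                c = nc  # slide along the contour into free space
                break
        # If no candidate was free, c is unchanged: the drop cuts
        # straight through the stroke below it.
        r += 1
        path.append((r, c))
    return path


if __name__ == "__main__":
    glyph = [
        [0, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [1, 1, 1, 1],
    ]
    print(drop_fall(glyph, start_col=1))
```

Because the drop moves down exactly one row per step, the path always terminates; the ascending variants can be obtained by running the same routine on the vertically flipped image.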


EdgeHGN Performance


Disclaimer

I am not proposing any computer vision scheme for image processing here.

I am not suggesting in any way that my scheme is capable of competing against the many image processing and face recognition algorithms treated in the literature.

I am doing pattern matching, and I could use any form of data representation for the purpose of my research.

Images are complex matrices of values, but people relate to images very well, which is why I found them an easy way to illustrate the effectiveness and strength of my proposed model.


Binary Image Recognition

Fifty different individuals in the face image dataset obtained from the Face Recognition Data.


Sobel Operator

In simple terms, the Sobel operator calculates the gradient of the image intensity at each point, giving the direction of the largest intensity increase and the rate of change in that direction.

The result shows how "abruptly" or "smoothly" the image changes at that point, and hence how likely it is that that part of the image represents an edge, as well as how that edge is likely to be oriented.

Edge map after applying Global Binary Signature and Sobel's edge detection
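
The gradient computation described above can be written out in plain Python. The two 3x3 kernels are the standard Sobel kernels; the function names, the choice to leave border pixels at zero and the `threshold` parameter are assumptions for this sketch, and the Global Binary Signature step mentioned in the caption is not reproduced here.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) change.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image):
    """Return (magnitude, direction) maps; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += KX[i][j] * pixel
                    gy += KY[i][j] * pixel
            magnitude[r][c] = math.hypot(gx, gy)   # rate of change
            direction[r][c] = math.atan2(gy, gx)   # gradient angle in radians
    return magnitude, direction


def edge_map(image, threshold):
    """Binary edge map: 1 where the gradient magnitude reaches the threshold."""
    magnitude, _ = sobel(image)
    return [[1 if m >= threshold else 0 for m in row] for row in magnitude]


if __name__ == "__main__":
    # A vertical step edge: dark on the left, bright on the right.
    step = [[0, 0, 1, 1, 1] for _ in range(5)]
    for row in edge_map(step, threshold=4):
        print(row)
```

A binary edge map of this kind is itself just a pattern of 0s and 1s, which is why it is a convenient input for a pattern-matching scheme.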


References

Abadi, D. J. (2009). Data Management in the Cloud: Limitations and Opportunities, Bulletin of the Technical Committee on Data Engineering, pp. 3-12.

Khan, A. I. and Muhamad Amin, A. (2007). One shot associative memory method for distorted pattern recognition, AI 2007: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 705-709.

Muhamad Amin, A. and Khan, A. I. (2009). Collaborative-comparison learning for complex event detection using distributed hierarchical graph neuron (DHGN) approach in wireless sensor network, AI 2009: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 111-120.

Nasution, B. B. and Khan, A. I. (2008). A hierarchical graph neuron scheme for real-time pattern recognition, IEEE Transactions on Neural Networks 19(2): 212-229.

Shiers, J. (2009). Grid today, clouds on the horizon, Computer Physics Communications, pp. 559-563.

Welsh, M., Malan, D., Duncan, B., Fulford-Jones, T. and Moulton, S. (2004). Wireless sensor networks for emergency medical care, GE Global Conference, Harvard University and Boston University School of Medicine, Boston, MA.


Acknowledgement

I would like to thank everyone who helped make this work possible. First and foremost, I owe immense gratitude to my thesis supervisor, Dr Asad Khan, for his support and kind contributions.

Thank You.

