
Pattern Analysis & Machine Intelligence Research Group, University of Waterloo

LORNET Theme 4

Data Mining and Knowledge Extraction for LO

Theme Leader: Mohamed Kamel

PIs: O. Basir, F. Karray, H. Tizhoosh

Assoc. PIs: A. Wong, C. DiMarco


Theme 4 Team Leader: M. Kamel

PIs: Dr. Basir, Dr. Karray, Dr. Tizhoosh

Assoc. PIs: A. Wong, C. DiMarco

Researchers: H. Ayad, R. Kashef, A. Ghazel, Dr. Makhreshi, M. Shokri, S. Hassan, A. Farahat, Dr. R. Khoury

Funding: CRC/CFI/OIT, NSERC, PAMI Lab

Industry partners: PDS, Vestech, Desire2Learn

Graduated

  • R. Khoury, PhD 07

  • L. Chen, PhD 07

  • M. Makhreshi, PhD 07

  • K. Hammouda, PhD 07

  • R. Dara, PhD 07

  • Y. Sun, PhD 07

  • K. Shaban, PhD 06

  • Y. Sun, PhD 06

  • M. Hussin, PhD 05

  • Jan Bakus, PhD 05

  • A. Adegorite, MASc 04

  • A. Khandani, MASc 05

  • S. Podder, MASc 04

PAMI Research Group, University of Waterloo


Data and Knowledge Mining

  • Knowledge extraction and discovery of patterns from data.

  • Labeling and categorization, summarization, classification, prediction, association rules, clustering

Theme Overview

LO Mining

  • From text: syntactic (keyword/keyphrase-based) and semantic (concept-based)

  • From images: image features, shape features

  • From text + images: describing images with text, enriching text with images

Knowledge Extraction

  • Classification (MCS, data partitioning, imbalanced classes)

  • Clustering (parallel/distributed clustering, cluster aggregation)

  • LO similarity and ranking

  • Association rules / social networks

  • Reinforcement learning

  • Specialized / personalized search

Tagging and organizing; matching and ranking.

Types of Data in LORNET

TELOS / LCMS

  • Subject matter: Course → Module → Lesson → LO, with attached resources (text, images, Flash, applets, metadata, interaction logs)

  • Discussions: Board → Thread → Post (text, interaction logs)

  • Semantic layer

LOR

  • Resources: metadata records with semantic references

  • LO descriptors: metadata

Abstract View of Data for Mining

  • Text (Plain or Markup)

    • Any resource that contains text is viewed as an abstract text document (some markup can be preserved to indicate different weights); e.g. HTML page, Word document, email message, discussion post, even metadata records.

    • Suitable for text mining, information/metadata extraction, summarization, natural language processing, semantic/concept analysis, social network analysis.

  • Numeric Matrix (Vector Space Model)

    • Requires text mining algorithms to convert the original text to numeric form through feature extraction and statistical weighting.

    • Suitable for machine learning algorithms that expect numeric input, especially classification and clustering algorithms.

  • Feature Vectors

    • Suitable for mining images: description, indexing, and retrieval (CBIR). Requires image processing algorithms to extract image features.

    • Also suitable for mining and learning from interaction logs, where each vector describes an event.

  • Relationship

    • Provides domain knowledge about data, such as containment (e.g. LO within Course, Post within Thread) and relatedness (collection of resources, cross-referenced LOs).

    • The extra knowledge could be exploited to improve accuracy, or to apply the same algorithm to different parts of the data (e.g. generating one summary for an entire course, or one summary per lesson).
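
The numeric-matrix (vector space) view above can be sketched in a few lines. A minimal, illustrative conversion, assuming whitespace tokenization and log-idf weighting (no stemming or stop-word removal; function names are hypothetical):

```python
import math
from collections import Counter

def to_vectors(docs):
    """Map raw text documents to tf-idf weighted vectors (as dicts).
    Minimal sketch: whitespace tokenization, no stemming/stop-word removal."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # term weight = term frequency x inverse document frequency
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

docs = ["learning objects store course content",
        "course content includes text and images",
        "images need feature extraction"]
vecs = to_vectors(docs)
```

Terms shared by many documents receive low weights, which is what makes the resulting matrix usable by classification and clustering algorithms.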

Data Representation

  • What level of granularity?

  • One representation or multiple?

  • Feature representation

  • Dimensionality issues

Document Modeling

  • A document is represented by a set of concepts called “indexing terms” → document segmentation levels:

    • sub-word level (decomposition of words and their morphology)

    • word level (words and lexical information)

    • multi-word level (phrases and syntactic information)

    • semantic level (the meaning of the text)

    • pragmatic level (the meaning of the text with respect to context and situation; ontology?)

Document Modeling

Modeling levels form a spectrum (figure): sub-word → word → multiword → semantic → pragmatic. Toward the sub-word end, representations are content-based and suffer from noise, redundancy, and high dimensionality; toward the pragmatic end they are context-based and require complex algorithms and domain knowledge.

Document Modeling

Adoption of the modeling levels (figure): term-level (word-based) models are the most popular; sub-word models are not usual; multiword and semantic models are emerging; pragmatic models are not explored.

Document Modeling

  • Bag-of-words (VSM): most popular document representation model

    • word sequence

    • weighting terms by their importance (based on frequency)

    • terms are independent and uncorrelated

  • Bag-of-words (VSM): drawbacks

    • ignoring term dependencies and correlations

    • ignoring text structure

    • ignoring ordering of the words in the document

      • IR research shows that word ordering is not important.

    • ignoring grammar → language independent

  • Solutions: generalized VSM, LSI, phrase-based model, concept-based representation
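
The order-blindness drawback listed above is easy to demonstrate: two sentences with different meanings produce identical bag-of-words vectors and thus cosine similarity 1. A minimal sketch (illustrative helper names):

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words: term counts only; word order is discarded."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b))

# Different meanings, identical bag-of-words representation:
s1 = "the student evaluates the course"
s2 = "the course evaluates the student"
```

Phrase-based and concept-based models (next slides) are motivated by exactly this loss of structure.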

Curse of Dimensionality

  • The number of training samples required grows exponentially with the number of features

  • For a fixed sample size, increasing the number of features may degrade performance (peaking phenomenon)

  • A limited sample size leads to overfitting, which implies a lack of generalization and low performance
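
The effect can be illustrated numerically: with a fixed sample, as dimensionality grows, pairwise distances concentrate and the nearest and farthest neighbours become almost equally far. A small illustrative experiment (uniform random points; not from the slides):

```python
import math
import random

def min_max_distance_ratio(n_points, dim, seed=0):
    """Draw random points in the unit cube and return min/max pairwise
    distance. In high dimensions distances concentrate, so this ratio
    approaches 1 and 'nearest neighbour' loses discriminative meaning."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
    return min(dists) / max(dists)

low_dim = min_max_distance_ratio(50, 2)
high_dim = min_max_distance_ratio(50, 1000)
```

With the same 50 points, the ratio is far larger in 1000 dimensions than in 2, which is one face of the peaking phenomenon above.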

Dimensionality Reduction

  • Feature extraction

    • employing all dimensions of the measurement space to obtain a new transformed space (compacting the feature space without discarding dimensions)

      • identifying important combinations of the features (PCA, manifold learning, SVD, and factor analysis)

      • low-dimensional embeddings (random projections)

    • Pros and cons

      + promising results

      + solid mathematical background

      - high complexity (time and space)

      - lack of scalability

      - fails in high-dimensional data mining problems

      - extracted features usually have no meaning

Dimensionality Reduction

  • Feature selection

    • reducing the feature-space dimensionality by removing useless, redundant, irrelevant, and noisy features

    • a search for a subset of features among the total set, driven by one or more performance indices (objective functions)

      Makrehchi and Kamel, IEEE SMC 07.
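
A minimal feature-selection sketch in the spirit of the bullets above, using document frequency as a stand-in objective (this is an illustrative simplification, not the information-theoretic index of Makrehchi and Kamel):

```python
from collections import Counter

def select_features(tokenized_docs, k, max_df_ratio=0.9):
    """Feature selection sketch: drop near-ubiquitous terms (stop-word-like
    noise), then keep the k remaining terms with the highest document
    frequency. The objective here is a simple stand-in, not the actual
    index proposed by Makrehchi and Kamel."""
    n = len(tokenized_docs)
    df = Counter(t for doc in tokenized_docs for t in set(doc))
    # remove features that appear in almost every document
    candidates = Counter({t: c for t, c in df.items() if c / n <= max_df_ratio})
    return [t for t, _ in candidates.most_common(k)]

docs = [["clustering", "of", "documents"],
        ["classification", "of", "text"],
        ["clustering", "of", "text", "documents"]]
features = select_features(docs, 2)
```

Any scoring function can be swapped in for document frequency; the search-over-subsets structure stays the same.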

New Representation Models

  • Phrase Based Representation

    Document Index Graph (DIG)

    Hammouda and Kamel, KIS 2004, IEEE KDE 2004

  • Concept Based Representation

    Shehata, Karray and Kamel, ICDM 2006, KDD 07, WI07



Concept-based Mining Model

Concept-based Statistical Analyzer

Pipeline (figure): text docs → text preprocessing → concept-based term analysis → concept-based document similarity → clustering → clusters 1, 2, 3.

  • Text preprocessing: separate sentences, label terms, remove stop-words, stem words

  • Concept-based term analysis: term frequency (tf), conceptual term frequency (ctf)

  • Concept-based document similarity

  • Clustering techniques: Single Pass, HAC (Ward), HAC (complete), k-NN

Evaluation

F-measure of the HAC (Ward) (Higher is better)

Entropy of the HAC (Ward) (Lower is better)
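
The two evaluation measures used in these figures can be sketched as follows (standard clustering F-measure and entropy; the exact experimental setup is not shown on the slides, and the function names are illustrative):

```python
import math
from collections import Counter

def cluster_f_measure(labels, clusters):
    """Overall F-measure: for each true class take the best F1 over all
    clusters, then average weighted by class size (higher is better)."""
    n = len(labels)
    total = 0.0
    for cls, cls_size in Counter(labels).items():
        best = 0.0
        for clu, clu_size in Counter(clusters).items():
            overlap = sum(1 for l, c in zip(labels, clusters)
                          if l == cls and c == clu)
            if overlap:
                p, r = overlap / clu_size, overlap / cls_size
                best = max(best, 2 * p * r / (p + r))
        total += cls_size / n * best
    return total

def cluster_entropy(labels, clusters):
    """Weighted average entropy of the class distribution inside each
    cluster (lower is better; 0 means perfectly pure clusters)."""
    n = len(labels)
    total = 0.0
    for clu, clu_size in Counter(clusters).items():
        members = [l for l, c in zip(labels, clusters) if c == clu]
        h = -sum((cnt / clu_size) * math.log2(cnt / clu_size)
                 for cnt in Counter(members).values())
        total += clu_size / n * h
    return total

labels   = ["a", "a", "b", "b"]
clusters = [0, 0, 1, 1]          # a perfect clustering of the labels
```

A perfect clustering scores F-measure 1 and entropy 0, matching the "higher is better" / "lower is better" captions above.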

Evaluation (cont.)

F-measure of the k-NN

Entropy of the k-NN

Classification

Figure: a classifier maps a set of objects to known classes (e.g., sports, finance, farming).

  • Function that assigns an object to a class

  • Infer that “object X is about sports”

  • Automatically learn the function from a set of examples

Classifiers

  • Template matching: user needs to supply a template and a metric

  • NMC: nearest class mean; simple, no training

  • k-NN: asymptotically optimal, slow in testing

  • Bayes: yields a simple classifier for Gaussian distributions

  • NN: nonlinear, sensitive to parameters, slow training

  • DT: binary, transparent, sensitive to overtraining

  • SVM: nonlinear, insensitive to overtraining, slow, good generalization
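
As an illustration of the simplest entry in the list, a minimal NMC (nearest class mean) sketch: each class is represented by its mean vector and a sample goes to the closest mean (illustrative class name and data):

```python
import math

class NearestMeanClassifier:
    """NMC: represent each class by the mean of its training vectors and
    assign a test sample to the class with the closest mean.
    No iterative training is needed, matching the slide's description."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.means = {c: [s / counts[c] for s in vec]
                      for c, vec in sums.items()}
        return self

    def predict(self, x):
        # nearest class mean under Euclidean distance
        return min(self.means, key=lambda c: math.dist(x, self.means[c]))

X = [[0, 0], [1, 0], [10, 10], [11, 10]]
y = ["low", "low", "high", "high"]
clf = NearestMeanClassifier().fit(X, y)
```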

Multiple Classifier Systems

  • Multiple classifier systems consist of a set of classifiers and a combination strategy.

  • Motivations:

    • Many alternative classifiers exist, each with its own feature and representation space

    • Different training sets exist, collected at different times and possibly with different features

    • Each classifier may perform well in its own region of the feature space

    • Classifiers may make mistakes in different patterns, even when trained on the same data

Multiple Classifier Systems Design

  • Design of MCS can be accomplished at four levels [Kuncheva 04]:

    • Aggregation Level

    • Classifier Level

    • Feature level

    • Data Level

Combining Schemes

  • Static vs. adaptive, fixed vs. trainable

  • Voting methods: max, average, majority, Borda

  • Weighted average, fuzzy integrals, belief theory

  • Decision templates, behavior knowledge space

  • Feature-Based Architecture (adaptive) (Wanas and Kamel 99-02): aggregation is trained and adapts to the data rather than being a post-processing step

  • Data-level combining: a partitioning technique for training multiple classifiers (Dara, .. and Kamel IF04, PR 06) that generates nearly optimal training partitions
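
Two of the combining schemes above, sketched minimally: majority voting as a fixed combiner, and a weighted average of per-class scores as a trainable one (weights and scores here are illustrative, not from the cited work):

```python
from collections import Counter

def majority_vote(predictions):
    """Fixed combiner: each base classifier casts one vote for a label;
    the most frequent label wins."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_average(score_lists, weights):
    """Trainable combiner: average per-class scores using per-classifier
    weights and return the index of the winning class."""
    n_classes = len(score_lists[0])
    combined = [sum(w * scores[i] for w, scores in zip(weights, score_lists))
                for i in range(n_classes)]
    return combined.index(max(combined))

# Three base classifiers label one document:
vote = majority_vote(["sports", "sports", "finance"])
# Per-class scores from three classifiers, with trained weights:
winner = weighted_average([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]],
                          [0.5, 0.25, 0.25])
```

Here the heavily weighted first classifier overrules the other two, which is the behavior a trained combiner is meant to learn.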

Imbalanced Classes

(Sun and Kamel, ICDM 2006, PR 2007)

20-Newsgroup (class size ratio 1/15; base classifier: Naïve Bayes; F-measure):

                    NB      AdaBoost  AdaC1   AdaC2   AdaC3
  Small class       58.25   59.26     64.11   69.08   68.91
  Large class       97.13   97.98     98.28   98.31   98.42
  Acc               94.63   96.15     96.73   96.80   97.00

SchoolNet (class size ratio 1/12; base classifier: C4.5 decision trees; F-measure):

                    C4.5    AdaBoost  AdaC1   AdaC2   AdaC3
  Small class       22.78   31.58     35.16   52.73   53.85
  Large class       92.50   93.63     92.63   93.35   93.91
  Acc               86.32   88.34     86.77   88.34   89.24

Observations:

  • Performance of the base classifier on the small class is poor

  • AdaBoost is capable of improving classification accuracy

  • AdaBoost does not guarantee improved performance on the small class

  • AdaC2 and AdaC3 are effective in increasing identification performance on the small class

Dealing with Time-Dependent Data

  • Time-series data contains dynamic information and is difficult to model with any single representation method

  • Traditional classifiers for time-series data, such as Dynamic Time Warping (DTW), are not robust

  • Aggregating decisions based on different representations can provide better and more reliable performance (Chen and Lei 2004-2006)

Architecture

Experimental Results

Clustering

Inter-cluster distances are maximized

Intra-cluster distances are minimized

  • Finding groups of objects such that objects in a group are similar to one another and dissimilar to objects in other groups
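
The intra/inter-cluster objective above can be illustrated with a minimal k-means sketch (naive deterministic initialization, purely for illustration; real initialization would be randomized or smarter):

```python
import math

def kmeans(points, k, iters=10):
    """Minimal k-means sketch: alternately assign each point to its nearest
    centroid (shrinking intra-cluster distances) and recompute centroids,
    which drives the clusters apart (growing inter-cluster distances).
    Naive initialization: the first k points."""
    centroids = points[:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # new centroid = mean of the group (keep old centroid if group empty)
        centroids = [[sum(x) / len(g) for x in zip(*g)] if g else centroids[i]
                     for i, g in enumerate(groups)]
    return groups

points = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]]
clusters = kmeans(points, 2)
```

On this well-separated data the two natural groups are recovered regardless of the crude initialization.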

Clustering Approaches

  • Hierarchical: single link

  • Partitional: k-means, fuzzy k-means, bisecting, VQ

  • Density-based: DBSCAN, Chameleon

  • Agglomerative: starts from individual clusters, then merges

  • Divisive: starts from one cluster and divides

  • Connectionist: SOM, ART

Multi-clustering

Overview of Combining Cluster Ensembles

Cluster Ensemble

  • Developed a prototype for cluster ensemble methods (Ayad and Kamel 2005-2007), including:

    • Generation of cluster ensembles based on: (1) multiple feature subsets, (2) statistical sampling techniques, and (3) a variable number of clusters (multi-resolution ensembles).

    • Combiners of cluster ensembles based on: (1) shared nearest neighbors, (2) different representations and distance measures between clusters, and (3) voting.

  • Positive experimental results on text data, in addition to a variety of benchmark data for machine learning algorithms

Categorization using cluster ensemble

Projects Overview

Inputs (figure): text documents, images, interaction logs.

  • Information Extraction: analyzing content to extract relevant information

    • Keyword extraction, summarization, concept extraction, social network analysis

  • Categorization: organizing LOs according to their content

    • Classification: traditional, MCS, imbalanced

    • Clustering: traditional, ensembles, distributed

  • Personalization: providing user-specific results

    • Reinforcement learning: traditional, opposition-based

  • Image Mining: describing and finding relevant images

    • CBIR: traditional, fusion-based

  • Integration and Applications (in progress): software components, publications, theme and industry collaboration

Information Extraction: Summarization

LO Content Package Summarization

  • Learning objects stored in IMS content packages are loaded and parsed. Textual content files are extracted for analysis.

  • Statistical term weighting and sentence ranking are performed on each document and on the whole collection.

  • Top relevant sentences are extracted for each document.

  • Planned functionality: Summarization of whole modules or lessons (as opposed to single documents).

  • Benefits

    • Provide summarized overview of learning objects for quick browsing and access to learning material.

  • Scenarios

    • Learning Management Systems can call the summarization component to produce summaries for content packages.

  • Data is courtesy University of Saskatchewan
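
The term-weighting and sentence-ranking steps above can be sketched as a minimal extractive summarizer (frequency-based term weights, average-weight sentence scores; an illustrative simplification of the actual component):

```python
from collections import Counter

def summarize(sentences, n_keep=1):
    """Extractive summarization sketch: weight each term by its frequency
    across the document, score each sentence by the average weight of its
    terms, and keep the top-scoring sentences in their original order."""
    tokenized = [s.lower().replace(".", "").split() for s in sentences]
    weights = Counter(t for tokens in tokenized for t in tokens)
    scores = [sum(weights[t] for t in tokens) / len(tokens)
              for tokens in tokenized]
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_keep]
    return [sentences[i] for i in sorted(ranked)]

doc = ["Learning objects package course content.",
       "Course content mining extracts metadata from course packages.",
       "The weather was nice."]
summary = summarize(doc, 1)
```

Sentences dense in the document's frequent terms are kept; off-topic sentences score low and drop out.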

    Information Extraction: Social Network Analysis

    Social Network Builder

    • Tasks

      • Finding relationships between people based on their web pages

    • Progress

    • Modeling

      • Actors are represented by their associated documents

      • Links are modeled by

        • Pair-wise Similarity of the actors’ documents

        • Merging actors’ documents → relations are also modeled by documents

    • Learning

      • Some links are known → learning the social network translates into a text classification problem

      • No links are revealed → a clustering problem with very low performance

    Information Extraction: Concept Extraction

    Concept-Based Statistical Analyser

    Conceptual Ontological Graph (COG) Ranking

    Information Extraction: Keyword Extraction

    Semantic Keyword Extraction

    • Tasks

      • Developing tools and techniques to extract semantic keywords toward facilitating metadata generation

      • Developing algorithms to enrich metadata (tags) which can be applied in index-based multimedia retrieval

    • Progress

      • Proposed a new information theoretic inclusion index to measure the asymmetric dependency between terms (and concepts), which can be used in term selection (keyword extraction) and taxonomy extraction (pseudo ontology)

    • Makrehchi and Kamel, ICDM 07, WI 07

    Information Extraction: Keyword Extraction

    • Rule base size shows quick initial growth, followed by slow and irregular growth and rule elimination

      • Learns 20 rules from the first 50 training rules

      • Learns 13 additional rules from the next 220 training rules

    Rule-based Keyword Extraction

    • Learn rules to find keywords in English sentences

    • Rules represent sentence fragments

      • Specific enough for reliable keyword extraction

      • General enough to be applied to unseen sentences

    • Rule generalization

      • Begin with an exact sentence fragment

      • Merge with another by moving different words to the lowest common level in the part-of-speech hierarchy

      • Keep merged rule if it does not reduce precision and recall of keyword extraction; keep original rules otherwise

    • Keyword extraction

      • Find sequence of rules that best cover an unseen sentence

      • Extract keywords according to rules

    • Both precision and recall values increase during training

      • Precision (blue) increases 10%

      • Recall (red) shows slight upward trend

    Categorization: Ensemble-based Clustering

    • Consensus Clustering

      • Categorization of learning objects using proposed consensus clustering algorithms.

      • The goal of consensus clustering is to find a clustering of the data objects that optimally summarizes an ensemble of multiple clusterings.

      • Consensus clustering can offer several advantages over a single data clustering: improved clustering accuracy, better scalability to large volumes of data objects, and greater robustness through reduced sensitivity to outlier data objects and noisy attributes.

    • Tasks

      • Development of techniques for producing ensembles of multiple data clusterings where diverse information about the structure of the data is likely to occur.

      • Development of consensus algorithms to aggregate the individual clusterings.

      • Development of solutions for the cluster symbolic-label matching problem.

      • Empirical analysis on real-world data and validation of proposed method.
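
The aggregation step can be sketched with a co-association (voting-style) consensus, one of the combiner families mentioned above; the threshold and single-link merge rule here are illustrative choices, not the proposed algorithms:

```python
def co_association(ensemble):
    """Co-association matrix: fraction of base clusterings in which each
    pair of objects lands in the same cluster (values in [0, 1])."""
    n = len(ensemble[0])
    m = len(ensemble)
    return [[sum(1 for clu in ensemble if clu[i] == clu[j]) / m
             for j in range(n)] for i in range(n)]

def consensus(ensemble, threshold=0.5):
    """Merge objects whose co-association exceeds the threshold
    (single-link over the co-association matrix)."""
    ca = co_association(ensemble)
    n = len(ca)
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if ca[i][j] > threshold:
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels

# Three base clusterings of 4 objects; the last disagrees on object 1:
ensemble = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [0, 1, 1, 1]]
final = consensus(ensemble)
```

The consensus keeps the majority structure (objects 0-1 together, 2-3 together) despite the dissenting base clustering.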



    Distributed environments
    Distributed Environments

    • Distributed data mining: applying data mining in an environment where the data, the mining process, or both are distributed.

    • Motivation

      • Natural distribution of data on the Web.

      • Scenarios that require the integration of disparate data and mining results are emerging (e.g. federation of repositories, news feed aggregation, digital libraries, business intelligence gathering, etc.)

      • Emerging technologies, such as Semantic Web, Web Services, Grid Computing, make it feasible to build distributed mining systems.

      • Availability of cheap low-end hardware that could be utilized in a distributed environment to achieve high-end goals (e.g. Google, SETI@home, Folding@home, etc.)

    Categorization: Distributed Clustering

    Hierarchical P2P Document Clustering

    • Peer nodes are arranged into groups called “neighborhoods”.

    • Multiple neighborhoods are formed at each level of the hierarchy.

    • The size of each neighborhood is determined by a network partitioning factor.

    • Each neighborhood has a designated supernode.

    • Supernodes of level h form the neighborhoods of level h+1.

    • Clustering is done within neighborhood boundaries, then merged up the hierarchy through the supernodes.

  • Benefits

    • Significant speedup over centralized clustering and flat peer-to-peer clustering.

    • Multiple levels of clusters.

    • Distributed summarization of clusters using CorePhrase keyphrase extraction.

  • Scenarios

    • Distributed knowledge discovery in hierarchical organizations.

  • HP2PC architecture (figure): example 3-level network, 16 nodes.

    Categorization: Multiple Classifier Systems

    • Progress

      • Proposed a set of evaluation measures to select sub-optimal training partitions for training classifier ensembles.

      • Proposed an ensemble training algorithm called Clustering, De-clustering, and Selection (CDS).

      • Proposed and optimized a cooperative training algorithm called Cooperative Clustering, De-clustering, and Selection (CO-CDS).

      • Investigated the applications of proposed training methods (CDS and CO-CDS) on LO classification.

    • Tasks

      • To investigate various aspects of cooperation in Multiple Classifier Systems (Classifier Ensembles)

      • To develop evaluation measures in order to estimate various types of cooperation in the system

      • To gain insight into the impact of changes in the cooperative components with respect to system performance using the proposed evaluation measures

      • To apply these findings to optimize existing ensemble methods

      • To apply these findings to develop novel ensemble methods with the goal of improving classification accuracy and reducing computation complexity

    Categorization: Imbalanced Class Distribution

    • Objective

      • Advance classification of multi-class imbalanced data

    • Tasks

      • To develop cost-sensitive boosting algorithm AdaC2.M1

      • To improve the identification performance on the important classes

      • To balance classification performance among several classes
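
The cost-sensitive boosting idea can be sketched as a single AdaC2-style weight update, where the cost item multiplies the sample weight outside the exponent (a hedged sketch after Sun and Kamel; see the cited papers for the full AdaC2.M1 algorithm; the data below is illustrative):

```python
import math

def adac2_round(weights, costs, correct):
    """One AdaC2-style boosting round (sketch): the cost item C_i multiplies
    the sample weight outside the exponent, so costly (minority-class)
    examples gain weight faster when misclassified.
    correct[i] is True when the weak learner got example i right."""
    right = sum(c * w for c, w, ok in zip(costs, weights, correct) if ok)
    wrong = sum(c * w for c, w, ok in zip(costs, weights, correct) if not ok)
    # classifier weight from the cost-weighted correct/wrong mass
    alpha = 0.5 * math.log(right / wrong)
    new = [c * w * math.exp(-alpha if ok else alpha)
           for c, w, ok in zip(costs, weights, correct)]
    z = sum(new)  # normalize so the weights remain a distribution
    return [w / z for w in new], alpha

# Four equally weighted examples; the misclassified one belongs to the
# costly minority class (cost 2.0):
w, alpha = adac2_round([0.25] * 4, [1.0, 1.0, 1.0, 2.0],
                       [True, True, True, False])
```

After one round, the misclassified minority example carries the largest weight, steering later weak learners toward the small class.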

    Categorization: Imbalanced Class Distribution

    Performance of Base Classification and AdaBoost

    Class Distribution

    Balanced performance among classes - Evaluated by G-mean

    Personalization

    • Opposition-based Reinforcement Learning for Personalizing Image Search

      • Developing a reliable technique to assist users and to facilitate and enhance the learning process

      • The personalized ORL tool helps the user find the retrieved images that are desirable to him or her

      • The personalized tool gathers the images of the search results and selects a sample of them

      • By interacting with the user and presenting the sample, it learns the user’s preferences

    Personalization

    Opposition-based RL algorithms: OQ(lambda) (International Joint Conference on Neural Networks, 2006) and NOQ(lambda) (IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007).

    Image Mining: CBIR

    • Content-based image retrieval

      • Build an IR system that can retrieve images based on textual cues, image content, and NL queries

    Image Retrieval Tool Set (figure): queries may be a query image (QI), query text (QT), or a query document; over a collection of rich documents and images, the tool set returns documents that contain QI, images that contain QT, images that match QI, and NL descriptions of images, with automated image tagging.



    Illustrative Example

    Figure: retrieval accuracy of the shape descriptors IZM, FD, and MTAR versus the proposed approach (reported accuracies: 55%, 60%, 70%, and 95%).
    Experimental Results (Cont’d)

    The Performance of the proposed approach

    Image Mining: CBIR

    Figure: interface module to TELOS. Images and compound documents are admitted through an image admission interface (IKB-BLDR) into the LO image repository; TELOS sends image and text queries to the IR component and receives responses drawing on the LOR.
    Integration and Applications

    • Progress

      • Finished core parts of the common data mining framework.

      • Built components and services from theme researchers’ work around the data mining framework.

      • Provided documentation for the data mining framework and software components.

      • Launched web site to host components and documentation from Theme 4: http://pami.uwaterloo.ca/projects/lornet/software/

    Integration and Applications

    • Progress

      • Core parts of the common data mining framework are available, including:

        • Vector and matrix manipulation.

        • Document parsing and tokenization.

        • Statistical term and sentence analysis.

        • Similarity calculation using multiple distance functions.

        • IMS Content Package compliant parser.

      • Components and tools built around the common data mining framework:

        • Metadata extraction from single documents; supports Dublin Core encoding.

        • Document similarity calculation using cosine similarity.

        • Single document and content package summarization.

        • Building of standard text datasets from large document collections.

      • Integration with TELOS:

        • Developed C# TELOS connector for integrating Theme 4 components.

        • Worked on component manifest specification with Theme 6.

        • Provided metadata extraction as part of a complete scenario for TELOS components integration.

        • The following components were wrapped for use by TELOS through the C# connector: Automatic Metadata Extractor, Document Similarity, and Document Summarizer.

    Theme and Industry Collaboration

    • Other LORNET themes

      • Providing tools for concept-based metadata extraction to SFU and U of Saskatchewan.

      • Providing tools for semantic-based ontology representation to SFU.

      • Providing tools for searching course content and discussion data provided by U of Saskatchewan.

      • Providing tools for comparing between course content and discussion board data provided by U of Saskatchewan.

    • Industry

      • Pattern Discovery Software (PDS) provided data mining software tools for use by researchers.

      • Vestech provided opportunities for researchers to work on speech technologies.

      • Desire2Learn opened job opportunities for LORNET researchers.

    Software Components

    Overview of Components

    Scenarios for Use of Software Components

    • General Tools

      • C# Connector for TELOS

      • Common Data Mining Framework

    • Standard Text Mining Tools

      • Metadata Extractor

      • Document Summarizer

      • Content Package Summarizer

      • Document Similarity

      • LO Recommender

      • Metadata Harvester

      • Keyword Extractor

      • Taxonomy Extractor

      • Metadata Enrichment Tools

    • Concept-based and Semantic Text Mining Tools

      • Metadata Extractor

      • LO Search Engine

      • Document Similarity

      • Document Classifier

      • Document Clusterer

      • Semantic-based Ontology Representation

      • Semantic Metadata Matching

      • POS Rule-Learning System

      • Triplet Representation System

    • Categorization Tools

      • LO Classifier

      • LO Multiple Classifier

      • LO Clusterer

      • LO Ensemble Clusterer

      • LO Consensus Clusterer

      • LO Distributed Clusterer

    Scenarios (Environment / Data Types / Tasks):

    • TELOS

      • Data types: metadata, ontology

      • Tasks: ontology construction and unification; finding relations between components; ranking, grouping, and tagging components

    • Learning Object Repository

      • Data types: metadata, structured text, categorical

      • Tasks: automatic metadata extraction; LO automatic classification; LO organization through clustering; multiple organization strategies through cluster ensembles

    • e-Learning Environment

      • Data types: structured text, images, object relationships, context

      • Tasks: extracting concepts from LOs; summarizing documents; grouping and tagging LOs; discovering similar topics and similar peers; building social networks; detecting plagiarism; LO recommendation using similarity ranking; personalization/specialization through reinforcement learning

    • User-centric Tools

      • Personalized Search Engine

      • Social Network Learner

    • Image Mining Tools

      • Content-based Image Search

      • Personalized Image Search

      • Consensus-based Fusion for Image Retrieval

    Legend (component status in the original figure): Integrated, Ready, In Progress, Year 5.

    Publications



    Pattern Analysis and Machine Intelligence Lab

    Electrical and Computer Engineering

    University of Waterloo

    Canada

    www.pami.uwaterloo.ca

    www.pami.uwaterloo.ca/projects/lornet/software/

    www.pami.uwaterloo.ca/kamel.html (publications)
