Capstone Project
Capstone Project Documents Management. Supervisor: Mr. Phan Trường Lâm. Students: Vũ Nhật Linh Lê Quang Hoàn Nguyễn Duy Quyền Hoàng Nam Nguyễn Thế Anh. Team information. Agenda. Introduction. Project plan. System Requirement Specifications. System Analysis and Design. Testing.

Capstone Project Documents Management

Supervisor: Mr. Phan Trường Lâm

Students: Vũ Nhật Linh, Lê Quang Hoàn, Nguyễn Duy Quyền, Hoàng Nam, Nguyễn Thế Anh



Agenda

Introduction

Project plan

System Requirement Specifications

System Analysis and Design

Testing

Deployment and User Guide

Summary

Demo and Q&A


Introduction

  • Initial Idea

  • Literature Review of Existing System

  • Proposal & Product


Initial Idea

We decided to develop a new system that integrates:

  • Collecting documents

  • Organizing these documents

  • Extracting keywords

  • Ranking

  • Searching


Literature Review of Existing System

  • Methods that these websites use to build their systems:

    • Big database

    • Search

    • Ranking and highlighting of returned results

    • Comparing documents to detect plagiarism


Literature Review

  • Achievements of the existing systems

    • Attractive

      • Easy to use

      • Speed & Reliability

      • Quality Results

      • Ensuring Security

    • Awareness

  • Limitations of the existing systems

    • Costs

    • Privacy


Proposal

  • Collect and manage Capstone projects

  • Support looking up Capstone projects

  • Avoid repeating and copying ideas

  • Ranking results

  • Refer to other materials

  • Friendly interface like Google

  • Public for everyone

  • Inside and outside University

  • Cheaper to build

  • Free to use


Product

  • Web application

  • Mobile application (in future)


Project Plan

Development environment

Process

Project organization

Project schedule

Risk management


Development Environment

Hardware

  • 2 GB of RAM, 100 GB of hard disk, Core 2 Duo 2.0 GHz

  • 1 GB of RAM, 100 GB of hard disk, Core 2 Duo 2.0 GHz

Software


Process

  • Follows the Waterfall model


Project Organization

  • Controlling and Monitoring

    • Meeting

    • Task assignment

    • Task tracking

    • Issue resolution

    • Task review

    • Report



  • Communication control

    • Online activity

      • Email

      • Chat

      • Phone

    • Offline activity

      • Kick-Off project

      • Team building


Project Schedule

Overall plan


Risk Management


System Requirement Specifications

  • User Requirements

  • System Requirements

  • Non-functional Requirements


User Requirements

  • Lecturers and Students:

    • Search project documents.

    • Download documents.

  • Librarians:

    • Edit profile.

    • Search documents.

    • Add/Edit/Delete document.

    • Add/Edit/Delete category.

  • Administrator

    • Edit profile.

    • Add/Edit/Delete account.



  • Other requirements

    • Search results will be ranked.

    • Each document has the following information:

      • Name

      • Author

      • Supervisor

      • Category

      • Description



  • Input files:

    • Keyword file

    • Abstract file

    • Full document file

    • Other materials


System Requirements

  • The server communicates with client computers over HTTP and uses standard protocols to complete service-based interactions.

  • Configuration

    • Server:

      • Windows Server 2008 operating system

      • .NET Framework 3.5

      • SQL Server 2008

      • IIS 7

    • Client: Web browser


Non-functional Requirements

  • Usability

  • Availability

  • Reliability

  • Security

  • Performance

  • Maintainability


System Analysis and Design

  • Architectural design

  • Detail design

  • Database design

  • Coding convention

  • Extract Keyword algorithm

  • Ranking


Architectural design

MVC architecture design pattern

Overall architecture


Detail design

CProDMS Component Diagram


Database design

Entity diagram


Coding convention

  • Follow:

    • Microsoft .NET Library Standards

    • FxCop rules and Code Analysis for Managed Code Warnings


Extract Keyword Algorithm

  • Introduction

  • Study Algorithm

  • Evaluation

Keyword Extraction from a Single Document using Word Co-occurrence Statistical Information

(Yutaka Matsuo and Mitsuru Ishizuka)

(Dec. 10, 2003)


Algorithm – What is the keyword?

Keyword

  • Meaning

  • Frequency

  • Position


Algorithm – Step by step

  • Preprocessing

    • Discard stop words

    • Stem

    • Extract frequency

  • Processing

    • Select frequent terms

    • Calculate expected probability

    • Calculate χ'² value

  • Output


Algorithm – Studying

Example:

Original Text

Information is the most powerful weapon in the modern society. Every day we are overflowed with a huge amount of data in form of electronic newspaper articles, emails, web pages and search results. Often, information we receive is incomplete, such that further search activities are required to enable correct interpretation and usage of this information.

Step 1: Discarded Stop Words

Information powerful weapon modern society day overflowed huge amount data electronic newspaper articles emails web pages search results Often information receive incomplete such further search activities required enable correct interpretation usage information

Step 2: Stemmed Words (using Porter Stemming Algorithm)

Informat power weapon modern societi day overflow huge amoun data electronic newspaper articl email web page search result Often informat receive incomplet such further search activ requir enable correct interpret usag informat
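The two preprocessing steps can be sketched in Python. This is only a minimal sketch under stated assumptions: `STOP_WORDS` is a tiny hypothetical sample list, and `crude_stem` is a naive suffix stripper standing in for the Porter Stemming Algorithm, so its stems will not match the slide's output exactly.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"is", "the", "most", "in", "we", "are", "with", "a", "of",
              "and", "to", "that", "this", "every", "form"}

# Naive suffix list used as a stand-in for the Porter stemmer (assumption).
SUFFIXES = ("ation", "ies", "ed", "s")


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def discard_stop_words(tokens):
    """Step 1: drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Step 2 (simplified): strip the first matching suffix, keeping >= 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_frequencies(text):
    """Run both steps and count how often each stem occurs."""
    stems = [crude_stem(t) for t in discard_stop_words(tokenize(text))]
    return Counter(stems)
```

Calling `term_frequencies` on the original text above yields a frequency table that the later steps (frequent-term selection, co-occurrence counting) can consume.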


Select frequent terms

According to the study, the number of keywords is about 10% of the number of terms in the document, and no more than 30 terms.

The top ten frequent terms (denoted as G) and their probability of occurrence, normalized so that the sum is 1.
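A minimal sketch of this selection step, assuming "number of terms" means the number of distinct terms in the document. The function name `select_frequent_terms` and its parameters `ratio` and `cap` are hypothetical and only mirror the "about 10%, no more than 30" rule.

```python
from collections import Counter


def select_frequent_terms(freqs, ratio=0.10, cap=30):
    """Return {term: probability} for the most frequent terms (the set G).

    freqs: {term: frequency in the document}
    The probabilities are normalized so that they sum to 1.
    At least one term is always kept (an assumption for tiny inputs).
    """
    k = max(1, min(cap, int(len(freqs) * ratio)))
    top = Counter(freqs).most_common(k)
    total = sum(count for _, count in top)
    return {term: count / total for term, count in top}
```

For a document with 20 distinct terms this keeps the 2 most frequent ones and rescales their counts into probabilities.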



Co-occurrence and Importance

Two terms in a sentence are considered to co-occur once.

  • Example:

    • The imitation game could then be played with the machine in question and the mimicking digital computer and the interrogator would be unable to distinguish them.

“imitation” and “digital computer” have one co-occurrence
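Sentence-level co-occurrence counting can be sketched as follows. This is an illustrative sketch, assuming each sentence has already been reduced to a list of terms; `cooccurrence_counts` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count co-occurrences: two terms in the same sentence co-occur once.

    sentences: list of term lists (already preprocessed).
    Returns a Counter keyed by alphabetically ordered term pairs.
    """
    counts = Counter()
    for terms in sentences:
        # set() ensures a pair is counted only once per sentence.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts
```

With the example sentence, the pair ("digital computer", "imitation") receives a count of one.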


Co-occurrence and Importance

The degree of bias of co-occurrence can be used as an indicator of term importance.


The statistical value of χ² is defined as

$$\chi^2(w) = \sum_{g \in G} \frac{(freq(w, g) - n_w p_g)^2}{n_w p_g}$$

  • p_g: Unconditional probability of a frequent term g ∈ G (the expected probability)

  • n_w: The total number of co-occurrences of term w and frequent terms G

  • freq(w, g): Frequency of co-occurrence of term w and term g
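The χ² value compares the observed co-occurrence freq(w, g) with the expected value n_w · p_g for every frequent term g. A minimal Python sketch, assuming all p_g are strictly positive; the function name `chi_square` and the dictionary layout are hypothetical.

```python
def chi_square(cooc_with_g, p):
    """Sum over g of (freq(w, g) - n_w * p_g)^2 / (n_w * p_g) for one term w.

    cooc_with_g: {g: freq(w, g)}, co-occurrence counts of w with frequent terms
    p:           {g: p_g}, expected probability of each frequent term (p_g > 0)
    """
    # n_w: total number of co-occurrences of w with the frequent terms G.
    n_w = sum(cooc_with_g.get(g, 0) for g in p)
    if n_w == 0:
        return 0.0
    total = 0.0
    for g, p_g in p.items():
        expected = n_w * p_g
        total += (cooc_with_g.get(g, 0) - expected) ** 2 / expected
    return total
```

A term that co-occurs with the frequent terms exactly in proportion to p_g scores 0; a strongly biased term scores high.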


We consider the length of each sentence and revise our definitions:

  • p_g: (the sum of the total number of terms in sentences where g appears) divided by (the total number of terms in the document)

  • n_w: The total number of terms in the sentences where w appears, including w


We use the following function to measure the robustness of bias values:

$$\chi'^2(w) = \chi^2(w) - \max_{g \in G} \left\{ \frac{(freq(w, g) - n_w p_g)^2}{n_w p_g} \right\}$$

It subtracts the maximal term from the χ² value.
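Subtracting the maximal term keeps a term from ranking high only because it co-occurs heavily with one particular frequent term. A minimal sketch, assuming all p_g are strictly positive; `chi_square_terms` and `robust_chi_square` are hypothetical names.

```python
def chi_square_terms(cooc_with_g, p):
    """Per-g contributions (freq(w, g) - n_w * p_g)^2 / (n_w * p_g)."""
    n_w = sum(cooc_with_g.get(g, 0) for g in p)
    if n_w == 0:
        return []
    return [
        (cooc_with_g.get(g, 0) - n_w * p_g) ** 2 / (n_w * p_g)
        for g, p_g in p.items()
    ]


def robust_chi_square(cooc_with_g, p):
    """Chi-square value with its single largest contribution removed."""
    terms = chi_square_terms(cooc_with_g, p)
    if not terms:
        return 0.0
    return sum(terms) - max(terms)
```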


  • To improve the extracted keywords, we cluster terms.

  • Two major approaches (Hofmann & Puzicha 1998) are:

    • Similarity-based clustering: if terms w1 and w2 have a similar distribution of co-occurrence with other terms, w1 and w2 are considered to be in the same cluster.

    • Pairwise clustering: if terms w1 and w2 co-occur frequently, w1 and w2 are considered to be in the same cluster.

E.g.: Monday is a day in the week.

Tuesday is a day in the week.

Wednesday is a day in the week.



Similarity-based clustering centers upon Red Circles

Pairwise clustering focuses on Green Circles


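Similarity-based clustering

Cluster a pair of terms according to a threshold on their Jensen-Shannon divergence:

$$J(w_1, w_2) = \log 2 + \frac{1}{2} \sum_{w'} \left\{ h\big(P(w'|w_1) + P(w'|w_2)\big) - h\big(P(w'|w_1)\big) - h\big(P(w'|w_2)\big) \right\}$$

Where:

$$h(x) = -x \log x$$

and:

$$P(w'|w_1) = \frac{freq(w', w_1)}{freq(w_1)}$$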
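A minimal Python sketch of the divergence computation, assuming the standard Jensen-Shannon divergence between the two terms' co-occurrence distributions. `cooccurrence_distribution` and `js_divergence` are hypothetical names, and no clustering threshold is built in.

```python
import math


def h(x):
    """Entropy contribution -x * log(x), with h(0) taken as 0."""
    return -x * math.log(x) if x > 0 else 0.0


def cooccurrence_distribution(row):
    """Turn {other_term: co-occurrence count} into a probability distribution."""
    total = sum(row.values())
    return {term: count / total for term, count in row.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence between two {term: probability} distributions.

    The value lies between 0 (identical) and log 2 (no shared terms).
    """
    keys = set(p) | set(q)
    return math.log(2) + 0.5 * sum(
        h(p.get(k, 0.0) + q.get(k, 0.0)) - h(p.get(k, 0.0)) - h(q.get(k, 0.0))
        for k in keys
    )
```

Terms such as "Monday" and "Tuesday", which co-occur with the same neighbours, have distributions that are close to each other and therefore a divergence near 0.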


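Pairwise clustering

Cluster a pair of terms according to a threshold on their mutual information:

$$M(w_1, w_2) = \log \frac{P(w_1, w_2)}{P(w_1) P(w_2)}$$

Where: P(w_1, w_2) is the probability that w_1 and w_2 co-occur, and P(w) is the probability of occurrence of term w.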
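A minimal sketch of the mutual information between two terms, assuming the probabilities are estimated from raw counts, so that P(w) = freq(w) / N and P(w1, w2) = freq(w1, w2) / N. The function name and its parameters are hypothetical.

```python
import math


def mutual_information(freq_w1, freq_w2, freq_pair, n_total):
    """log( P(w1, w2) / (P(w1) * P(w2)) ) estimated from counts.

    freq_w1, freq_w2: occurrence counts of each term (must be > 0)
    freq_pair:        co-occurrence count of the pair
    n_total:          total count used to normalize all probabilities
    """
    if freq_pair == 0:
        return float("-inf")  # the terms never co-occur
    return math.log(n_total * freq_pair / (freq_w1 * freq_w2))
```

A value of 0 means the two terms co-occur exactly as often as independence would predict; larger values indicate that they tend to appear together.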


Algorithm – Evaluation

Precision: ratio of correct keywords to the total number of extracted keywords

Coverage: ratio of indispensable keywords in the list to all the indispensable terms

Frequency index: average frequency of the keywords in the list
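The three measures can be sketched in Python as below. This is an illustrative sketch: the function names, and the choice to return 0.0 for empty inputs, are assumptions.

```python
def precision(extracted, correct):
    """Share of the extracted keywords that are judged correct."""
    if not extracted:
        return 0.0
    return len(set(extracted) & set(correct)) / len(extracted)


def coverage(extracted, indispensable):
    """Share of the indispensable terms that appear in the extracted list."""
    if not indispensable:
        return 0.0
    return len(set(extracted) & set(indispensable)) / len(indispensable)


def frequency_index(extracted, freqs):
    """Average document frequency of the extracted keywords."""
    if not extracted:
        return 0.0
    return sum(freqs.get(term, 0) for term in extracted) / len(extracted)
```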


Ranking – Why?

Ranking Result


Ranking

Rank formula for a term in a collection of documents:

( Automatic Keyword Extraction for Database Search

First examiner : Prof. Dr. techn. Dipl.-Ing. Wolfgang Nejdl

Second examiner : Prof. Dr. Heribert Vollmer

Supervisor : MSc. Dipl.-Inf. Elena Demidova)

R(t) = Fd(t)*log(1 + N/N(t)) (1)

  • R(t): Rank of term t in the whole collection

  • Fd(t): Frequency of term t in the given document

  • N: Total number of documents in the collection

  • N(t): Total number of documents that contain term t

Ranking formula:

Rank = d * Rd(t) / R(t) (2)

=> Rank = d * Rd(t) / (Fd(t)*log(1 + N/N(t))) (3)

  • Rd(t): Rank of term t in the document, as extracted by the Extract Service

  • d: Reliability coefficient
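Formulas (1) to (3) can be sketched in Python as follows. The function and parameter names are hypothetical, and the natural logarithm is an assumption because the slides do not state the base.

```python
import math


def collection_rank(fd_t, n_docs, n_docs_with_t):
    """Formula (1): R(t) = Fd(t) * log(1 + N / N(t))."""
    return fd_t * math.log(1 + n_docs / n_docs_with_t)


def rank(d, rd_t, fd_t, n_docs, n_docs_with_t):
    """Formulas (2) and (3): Rank = d * Rd(t) / R(t).

    d:    reliability coefficient
    rd_t: rank of term t in the document, as produced by the extraction step
    """
    return d * rd_t / collection_rank(fd_t, n_docs, n_docs_with_t)
```

For example, `rank(0.5, 4.0, 2, 10, 10)` divides 0.5 * 4.0 by R(t) = 2 * log(2).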


Searching


Testing

V-model



Test result



Deployment

  • Package Source Code

  • Client side

  • Server side


User guide


Summary

  • Strong points

    • Enthusiasm

    • Creativity

    • Ability to cope with change

  • Weak points

    • Lack of technical skills

    • Lack of management skills

  • Lessons learned

    • Improve technical & management skills

    • Release the product on time despite limited time and resources

    • Improve communication & problem-solving skills


Demo & Q&A