DOCUMENT SUMMARIZATION (RINGKASAN DOKUMEN)
SHINTA P
Presentation Transcript
Introduction
  • What is the first thing you read in a novel? Summaries are likewise shown for web pages retrieved in response to a user's query.
  • Automatic summarization engines are needed: summaries produced by humans are expensive.
Information: headline news

SIGIR'99 Tutorial Automated Text Summarization, August 15, 1999, Berkeley, CA

TV guides — Decision making


Abstracts of papers — Saving time


Graphical maps — Orientation



TRIBUNNEWS.COM, BANDA ACEH - The nine members of the girl band Cherrybelle gave a concert at the Hotel Hermes Palace, Banda Aceh, on Tuesday night (30/4/2013).

In last night's concert they performed songs from both their first album and their latest one, Diam-diam Suka. When they opened with the song Best Friend Forever, Cherrybelle's fans greeted them hysterically.

On stage, the nine girls wore different outfits than usual. Unlike their appearances in other regions, their skimpy costumes were set aside; instead, long-sleeved blouses in calm colors covered their bodies.

And their hair? Cherrybelle appeared plain, with no accessories at all in their hair; only ponytails and a few hair clips could be seen. Nor were there headscarves or hijabs on their heads, as is customary for other Acehnese women.

Only during preparations before last night's performance did the Cherrybelle members set aside time to answer questions from Serambinews.com (Tribunnews.com Network). It was during the interview and photo session that they put on white headscarves.

In the short interview they said that Banda Aceh was the first city they had visited on this roadshow.

They were taken with the city of Banda Aceh from the moment they first arrived at Sultan Iskandar Muda Airport in Blangbintang, Aceh Besar. The Aceh concert is part of the Cherrybelle Beat Indonesia roadshow agenda, covering 33 provinces in 31 days.

"We were already charmed by Banda Aceh the moment we set foot in the airport. The airport is really nice, so different from airports in other cities. Here the roof is shaped like a mosque dome; it's really beautiful," they said in unison during an exclusive interview with Serambinews.com.


Purpose

  • Indicative vs Informative
    • Indicative - indicates types of information (“alerts”)
      • “The work of Consumer Advice Centres is examined…”
    • Informative
      • “The work of Consumer Advice Centres was found to be a waste of resources due to low availability…”
    • Critical / Evaluative
      • Evaluates the content of the document
Form
  • Abstract (IE)
  • Extract (IR)
Dimension
  • Single Doc
  • Multi Doc
Context
  • Query-independent
  • Query-specific
Approach
  • Shallow approach
    • Works only on the surface of the document
    • Output is a sentence extraction
    • Can be out of context
  • Deep method
    • Output is an abstract
Computational Approach: Basics
  • Bottom-Up:
    • I’m dead curious: what’s in the text?
    • The user wants to get all the important information.
    • The system needs generic importance criteria to guide its search.
  • Top-Down:
    • I know what I want! — don’t confuse me with drivel!
    • The user wants only certain types of information.
    • The system needs particular criteria of interest, used to focus its search.


Top-Down: Info. Extraction (IE)
  • IE task: Given a form and a text, find all the information relevant to each slot of the form and fill it in.
  • Summ-IE task: Given a query, select the best form, fill it in, and generate the contents.
  • Questions:
    • 1. IE works only for very particular forms; can it scale up?
    • 2. What about info that doesn’t fit into any form—is this a generic limitation of IE?
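The form-filling idea above can be illustrated with a toy sketch; the form slots and the pattern rules here are invented for illustration (real IE systems use far richer linguistic analysis than regular expressions):

```python
import re

# A hypothetical two-slot form, each slot paired with a simple pattern rule.
FORM_RULES = {
    "venue": re.compile(r"at the ([A-Z]\w+(?: [A-Z]\w+)*)"),
    "day":   re.compile(r"on (\w+day)"),
}

def fill_form(text):
    """Fill each slot of the form with the first matching span in the text."""
    filled = {}
    for slot, pattern in FORM_RULES.items():
        match = pattern.search(text)
        filled[slot] = match.group(1) if match else None
    return filled

print(fill_form("The band performed at the Hermes Palace on Tuesday."))
# {'venue': 'Hermes Palace', 'day': 'Tuesday'}
```

A Summ-IE system would then generate the summary text from the filled slots; slots that no rule can fill stay empty, which is exactly the scaling question raised above.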

[Figure: a schematic source text alongside a form whose slots have been filled from it.]

[Figure: schematic documents with the relevant passages to be retrieved highlighted.]

Bottom-Up: Info. Retrieval (IR)
  • IR task: Given a query, find the relevant document(s) from a large set of documents.
  • Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents).
  • Questions:
    • 1. IR techniques work on large volumes of data; can they scale down accurately enough?
    • 2. IR works on words; do abstracts require abstract representations?


Paradigms: IE vs. IR

IR:
  • Approach: operate at the word level—use word frequency, collocation counts, etc.
  • Need: large amounts of text.
  • Strengths: robust; good for query-oriented summaries.
  • Weaknesses: lower quality; inability to manipulate information at abstract levels.

IE:
  • Approach: try to ‘understand’ the text—transform its content into a ‘deeper’ notation, then manipulate that.
  • Need: rules for text analysis and manipulation, at all levels.
  • Strengths: higher quality; supports abstracting.
  • Weaknesses: speed; still needs to scale up to robust open-domain summarization.


The Optimal Solution...

Combine strengths of both paradigms…

...use IE/NLP when you have suitable form(s),

...use IR when you don’t…

…but how exactly to do it?


A Summarization Machine

[Figure: a summarization machine takes a DOC or MULTIDOCS plus a QUERY and emits EXTRACTS and ABSTRACTS at various lengths (headline, very brief, brief, 10%, 50%, 100%, long) and of various types (extract vs. abstract, indicative vs. informative, generic vs. query-oriented, just-the-news vs. background), built over representations such as case frames, templates, core concepts, core events, relationships, clause fragments, and index terms.]

The Modules of the Summarization Machine

[Figure: a pipeline of modules. EXTRACTION turns a DOC or MULTIDOC into extracts; INTERPRETATION maps extracts into case frames, templates, core concepts, core events, relationships, clause fragments, or index terms; GENERATION produces abstracts; FILTERING selects the final output.]

Characteristics of a Summary
  1. Measurement
     - Compression rate = summary length / original document length
  2. Informativeness
     - Trust in the source, and whether it is biased or not, especially for evaluative summaries
  3. Well-formedness
     - Repair of dangling references, disconnected sentences, and anaphora (unclear references)
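The compression-rate formula computes directly; measuring length in words is an assumption here (characters or sentences are also common choices):

```python
def compression_rate(summary, document):
    """Compression rate = summary length / original document length, in words."""
    return len(summary.split()) / len(document.split())

doc = "one two three four five six seven eight nine ten"
print(compression_rate("one two", doc))  # 0.2
```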
Steps:

  1. Document collection
  2. Select sentences: Zipf method, TF-IDF, etc.
  3. Selected sentences
  4. Extract: order the sentences by their location, apply smoothing, and turn them into well-formed sentences
  5. Coherent summary: readable, and its meaning can be understood
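The selection pipeline above can be sketched end-to-end with a simple word-frequency score (one Zipf/Luhn-style option among those mentioned); the sentence splitter and tokenizer here are deliberately crude assumptions:

```python
import re
from collections import Counter

def summarize(text, n=2):
    """Pick the n sentences with the highest average word frequency,
    then restore their original order (the 'smoothing' step here is
    just re-ordering by location in the document)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [s.lower().split() for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    scores = [sum(freq[w] for w in ws) / len(ws) for ws in words]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(top))

print(summarize("Cats sleep. Cats sleep daily. Dogs bark."))
# Cats sleep. Cats sleep daily.
```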

Typical 3 Stages of Summarization

1. Topic Identification: find/extract the most important material

2. Topic Interpretation: compress it

3. Summary Generation: say it in your own words

…as easy as that!


Some Definitions
  • Language:
    • Syntax = grammar, sentence structure
      “sleep colorless furiously ideas green” — no syntax
    • Semantics = meaning
      “colorless green ideas sleep furiously” — no semantics
  • Evaluation:
    • Recall = how many of the things you should have found/did, did you actually find/do?
    • Precision = of those you actually found/did, how many were correct?
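The two evaluation measures translate directly into code over sets of found vs. correct items; the item labels below are made up:

```python
def precision_recall(found, correct):
    """Precision: of what you found, how much was correct?
    Recall: of what you should have found, how much did you find?"""
    true_positives = len(found & correct)
    return true_positives / len(found), true_positives / len(correct)

p, r = precision_recall(found={"s1", "s2", "s3", "s4"}, correct={"s1", "s2", "s5"})
print(p, r)  # 0.5 0.6666666666666666
```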


Evaluation Methods
  • Intrinsic
    • Test the summary itself against certain criteria:
      • Coherence: easy to read and understand
      • Informativeness: conveys information about the original document
  • Extrinsic
    • Test the system in the context of another task, asking other people to evaluate it.
SUMMARIZER 1
  • The main steps of SUMMARIZER 1 are:
    1. For each sentence Si ∈ S, compute the relevance measure between Si and the document vector D: inner product, cosine similarity, or Jaccard coefficient.
    2. Select the sentence Sk with the highest relevance score and add it to the summary.
    3. Delete Sk from S, and eliminate all the terms contained in Sk from the document vector and the sentence vectors; re-compute the weighted term-frequency vectors (D and all Si).
    4. If the number of sentences in the summary reaches the predefined value, terminate; otherwise go to step 1.
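A minimal sketch of this loop, using cosine similarity as the relevance measure and raw term counts as the weights (both simplifying assumptions relative to the slide's weighted vectors):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def summarizer1(sentences, n):
    """Greedily add the sentence most relevant to the document vector D,
    then eliminate the chosen sentence's terms from all vectors."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    doc = sum(vecs, Counter())                    # document vector D
    summary, remaining = [], list(range(len(sentences)))
    while remaining and len(summary) < n:
        k = max(remaining, key=lambda i: cosine(vecs[i], doc))
        summary.append(sentences[k])
        remaining.remove(k)
        for w in list(vecs[k]):                   # remove Sk's terms everywhere
            del doc[w]
            for v in vecs:
                del v[w]                          # Counter ignores missing keys
    return summary

print(summarizer1(["the cat sat", "the cat", "dogs run fast"], 2))
# ['the cat sat', 'dogs run fast']
```

Note how the term elimination in step 3 pushes the second pick away from "the cat", which is already covered by the first sentence.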

A. Bellaachia

SUMMARIZER 2
  • This summarizer is the simplest among all the proposed techniques.
  • It uses the TF*IDF weighting schema to select sentences.
  • It works as follows:
    • Create the weighted term-frequency vector Si for each sentence Si ∈ S using TF*IDF (term frequency * inverse document frequency).
    • Sum up the TF*IDF score for each sentence and rank them.
    • Select the predefined number of sentences in the summary from S.
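Those three steps fit in a few lines; the tokenizer is naive, and IDF is computed over the sentences themselves (an assumption, since the slide leaves the background collection unspecified):

```python
import math

def summarizer2(sentences, n):
    """Rank sentences by their summed TF*IDF score; return the top n in document order."""
    docs = [s.lower().split() for s in sentences]
    idf = {w: math.log(len(docs) / sum(w in d for d in docs))
           for w in {w for d in docs for w in d}}
    scores = [sum(d.count(w) * idf[w] for w in set(d)) for d in docs]
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]

print(summarizer2(["Alpha beta gamma.", "Alpha beta.", "The the the."], 2))
# ['Alpha beta gamma.', 'The the the.']
```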


SUMMARIZER 3
  • This summarizer uses the popular k-means clustering algorithm, where k is the size of the summary.
  • K-means:
    • Start with random positions of k centroids.
    • Iterate until the centroids are stable:
      • Assign points to centroids.
      • Move each centroid to the center of its assigned points.

Iteration = 0


SUMMARIZER 3 (Cont’d)

[Figure: three further animation frames (Iteration = 1, 2, 3) showing points being re-assigned to centroids and the centroids moving until stable.]

SUMMARIZER 3 (Cont’d)
  • This summarizer works as follows:
    • Create the weighted term-frequency vector Ai for each sentence Si using TF*IDF.
    • Form a sentences-by-terms matrix and feed it to the K-means clustering algorithm to generate k clusters.
    • Sum up the TF*IDF score for each sentence in each cluster.
    • Pick the sentence with the highest TF*IDF score from within each cluster and add it to the summary.

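The whole procedure (TF*IDF vectors, k-means clustering, one sentence per cluster) can be sketched as below; for reproducibility this sketch places the initial centroids at the first k points instead of at random positions, a deliberate deviation from the slide:

```python
import math

def tfidf_vectors(sentences):
    """Weighted term-frequency vector Ai for each sentence, using TF*IDF."""
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    idf = {w: math.log(len(docs) / sum(w in d for d in docs)) for w in vocab}
    return [[d.count(w) * idf[w] for w in vocab] for d in docs]

def kmeans(points, k, iters=50):
    """Assign points to the nearest centroid, move centroids to the center
    of their assigned points, and repeat until the assignment is stable."""
    centroids = [list(points[i]) for i in range(k)]   # deterministic init (assumption)
    assign = None
    for _ in range(iters):
        new = [min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
               for p in points]
        if new == assign:
            break
        assign = new
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def summarizer3(sentences, k):
    """Cluster the sentence vectors into k groups and pick the sentence
    with the highest summed TF*IDF score from each cluster."""
    vecs = tfidf_vectors(sentences)
    assign = kmeans(vecs, k)
    picks = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            picks.append(max(members, key=lambda i: sum(vecs[i])))
    return [sentences[i] for i in sorted(picks)]

print(summarizer3(["cats purr", "cats purr softly", "dogs bark", "dogs bark loudly"], 2))
# ['cats purr softly', 'dogs bark loudly']
```

Because each cluster contributes one sentence, the summary covers each topic group once rather than repeating the single strongest topic.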