
Functional Programming



Presentation Transcript


  1. 13 Miscellaneous Functional Programming

  2. More… • Computer language rankings • http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all&lang2=sbcl • “Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot. This is the same argument you tend to hear for learning Latin. It won't get you a job, except perhaps as a classics professor, but it will improve your mind, and make you a better writer in languages you do want to use, like English.” (Eric S. Raymond, “How to Become a Hacker”)

  3. Introduction • Search for papers using keywords such as “text summarization”, “summarizer”, … • “Summarizing based on concept counting and hierarchy analysis”, H. Ji, Z. Luo, M. Wan, and X. Gao, IEEE SMC • An effective English text summarization system • Concept extraction • Semantic analysis

  4. Introduction • Methods of weighting sentences • Position information • Cue words • Word counting • Lexical chains • Structural information • Heuristic rules

  5. Introduction • The word-counting-based Vector Space Model (VSM) is the leading method • Every sentence corresponds to a vector S(T1, W1; T2, W2; …; Tn, Wn) • Ti is a word in the text and Wi is the frequency of Ti in S • This misses the semantic relations between words
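A minimal sketch of the word-counting VSM just described, in Python; the function name is illustrative, not from the paper. Each sentence becomes a sparse term-frequency vector.

    from collections import Counter

    def sentence_vector(sentence_tokens):
        # Word-counting VSM: a sentence S maps to the sparse vector
        # S(T1, W1; T2, W2; ...; Tn, Wn), where Ti is a word and Wi
        # is its frequency in S.
        return Counter(sentence_tokens)

    # "network" and "net" occupy different dimensions -- exactly the
    # semantic relation this representation misses.
    print(sentence_vector("the network uses a net of nodes".split()))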

  6. Summarization • E.g., a text about Bayesian networks • The topic “network” is expressed by the words “network”, “net”, and “system” • When deciding whether “network” is a topic, the word-counting-based VSM misses the contribution of “net” and “system” • This paper constructs the VSM and extracts abstracts based on concept counting instead of word counting

  7. Concept counting algorithm • Concept hierarchy tree • A concept is a generalization of particular instances at the abstract level • A concept may correspond to one word or to several semantically related words

  8. Concept counting algorithm • Concept hierarchy tree
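The tree figure itself is not reproduced in this transcript, so here is a minimal Python sketch of a concept hierarchy tree (CHT) node; the class and method names are illustrative assumptions, not identifiers from the paper.

    class ConceptNode:
        # A concept groups one or several semantically related words;
        # son nodes are more specific concepts.
        def __init__(self, name, words=None):
            self.name = name
            self.words = words or [name]  # surface words expressing this concept
            self.sons = []                # more specific sub-concepts

        def add_son(self, node):
            self.sons.append(node)
            return node

        def offspring(self):
            # All descendant nodes of this concept.
            for son in self.sons:
                yield son
                yield from son.offspring()

    # Toy tree echoing slide 6: "network" generalizes "net" and "system".
    network = ConceptNode("network")
    network.add_son(ConceptNode("net"))
    network.add_son(ConceptNode("system"))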

  9. Concept counting algorithm • Selection of topic concepts • A topic concept should possess generalization power over its son concepts • Three evaluation parameters • S-Frequency • T-Frequency • Conclusion Rate

  10. Concept counting algorithm • Concept S-Frequency: FS(C) = ΣF(Wi), where F(Wi) is the frequency of word Wi in the text and {W1, W2, W3, …, Wn} are the words belonging to concept C • Concept T-Frequency: FT(C) also counts the offspring nodes {A1, A2, A3, …, An} of C
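Building on the ConceptNode sketch above, the two frequencies might be computed as follows. The exact form of FT — the S-frequency of C plus the S-frequencies of all its offspring — is an assumption, chosen because it is consistent with the slide-12 values FS = 0, FT = 11.

    def s_frequency(concept, freq):
        # FS(C): summed frequency F(Wi) of the words {W1..Wn} that
        # belong directly to concept C; freq maps word -> count in text.
        return sum(freq.get(w, 0) for w in concept.words)

    def t_frequency(concept, freq):
        # FT(C): ASSUMED to be FS(C) plus the S-frequencies of the
        # offspring nodes {A1..An} of C.
        return s_frequency(concept, freq) + sum(
            s_frequency(a, freq) for a in concept.offspring())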

  11. Concept counting algorithm • Concept Conclusion Rate R(C) • {S1, S2, S3, …, Sn}: the son concepts of C • A higher R(C) means the parent node C generalizes its sons better → it is more reasonable to use C as the topic concept
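The slide does not give the R(C) formula itself, only the worked value R = 1 − 6/11 on the next slide. One reading consistent with that value and with the "more generalization" intuition is that the numerator is the largest son T-frequency; this is an assumption, sketched below.

    def conclusion_rate(concept, freq):
        # ASSUMED form: R(C) = 1 - max FT(Si) / FT(C) over the son
        # concepts {S1..Sn}; a parent dominated by a single son
        # generalizes poorly and gets a low R.
        ft = t_frequency(concept, freq)
        if not concept.sons or ft == 0:
            return 1.0
        return 1.0 - max(t_frequency(s, freq) for s in concept.sons) / ft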

  12. Concept counting algorithm • Concept Selection Rate: Sel(C) = (α·log(FS(C)+1) + β·log(FT(C)+1)) · (γ·R(C) + δ) • In the experiments, α=1, β=0.25, γ=1, and δ=0.5 • Example: FS(“subject_matter”) = 0; FT(“subject_matter”) = 11; R(“subject_matter”) = 1 − 6/11 ≈ 0.45; Sel(“subject_matter”) = (log 1 + 0.25·log 12)(0.45 + 0.5) ≈ 0.256
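The worked example largely pins the formula down: with base-10 logarithms and +1 inside each log (both inferred, not stated on the slide), the form above reproduces 0.256 exactly.

    import math

    def selection_rate(fs, ft, r, alpha=1, beta=0.25, gamma=1, delta=0.5):
        # Sel(C) = (alpha*log10(FS+1) + beta*log10(FT+1)) * (gamma*R + delta);
        # the +1 and the base 10 are inferred from the worked example.
        return ((alpha * math.log10(fs + 1) + beta * math.log10(ft + 1))
                * (gamma * r + delta))

    # The slide's example (R rounded to 0.45, as shown there):
    print(round(selection_rate(0, 11, 0.45), 3))  # 0.256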

  13. Concept counting algorithm • 1. Place all the nodes on the second level into CandConceptSet; • 2. Take the node C with the maximum selection rate from CandConceptSet; if CandConceptSet is empty, then end; • 3. If C is a leaf node, then place C into TopicConceptSet; go to 2; • 4. If Sel(C) >= SelThreshold, then: (1) add C into TopicConceptSet; (2) delete the subtree rooted at C from the CHT and recount the parameters of the related nodes; go to 2; • 5. If Sel(C) < SelThreshold, add the son concepts of C into CandConceptSet; go to 2.
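A direct Python transcription of steps 1–5, reusing the sketches above; "delete the subtree rooted at C" is modelled by unlinking C from its parent, after which the frequencies are simply recomputed on the next iteration.

    def select_topic_concepts(root, freq, sel_threshold):
        def sel(n):
            return selection_rate(s_frequency(n, freq),
                                  t_frequency(n, freq),
                                  conclusion_rate(n, freq))
        topic_set = []
        cand = list(root.sons)                 # 1. second-level nodes
        parent = {s: root for s in root.sons}
        while cand:                            # 2. stop when CandConceptSet is empty
            c = max(cand, key=sel)             #    node with maximum selection rate
            cand.remove(c)
            if not c.sons:                     # 3. leaf node
                topic_set.append(c)
            elif sel(c) >= sel_threshold:      # 4. accept C as a topic concept
                topic_set.append(c)
                parent[c].sons.remove(c)       #    delete the subtree rooted at C
            else:                              # 5. descend into the son concepts
                for s in c.sons:
                    parent[s] = c
                    cand.append(s)
        return topic_set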

  14. Concept counting algorithm • After the selection step, we obtain possible topic concepts, e.g., {language, subject_matter, performance, summarization, punctuation, text software, macro} • Importance of topic concepts • To compute the importance of the sentences in the text → first compute the importance I(Ti) of every topic concept • FT(Ti) is the number of words that express topic Ti • λT is 1.2 if Ti appears at the title position, and 1 otherwise (this evaluates which topics are important)
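The I(Ti) formula was a slide image and is not reproduced in the transcript; the sketch below is one plausible reconstruction from the stated ingredients (FT(Ti) and the title weight λT), normalizing so importances are comparable across topics. Treat the whole form as an assumption.

    def topic_importance(topics, freq, title_topics=()):
        # ASSUMED form: I(Ti) proportional to lambda_T * FT(Ti), with
        # lambda_T = 1.2 for title-position topics and 1.0 otherwise.
        total = sum(t_frequency(t, freq) for t in topics) or 1
        return {t.name: (1.2 if t.name in title_topics else 1.0)
                        * t_frequency(t, freq) / total
                for t in topics}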

  15. Summarization • Topic-concept-based VSM • For each sentence S in the text, every word in S is mapped to its related topic concept • S can be represented by a point in an n-dimensional vector space: S(T1, W1; T2, W2; …; Tn, Wn), where Ti is a topic concept of S and Wi is the frequency of Ti in S • After the VSM is built, we compute the importance of every sentence and extract the most important ones to form the abstract (this evaluates how often the important topics appear in each sentence)
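A sketch of building the topic-concept vector for one sentence; word_to_topic, a plain dict from surface word to topic-concept name derived from the CHT, is an illustrative helper rather than a name from the paper.

    from collections import Counter

    def sentence_topic_vector(sentence_tokens, word_to_topic):
        # Topic-concept VSM: S(T1, W1; ...; Tn, Wn), counting each word
        # toward its related topic concept; unmapped words are ignored.
        vec = Counter()
        for w in sentence_tokens:
            if w in word_to_topic:
                vec[word_to_topic[w]] += 1
        return vec

    # "net" and "system" now count toward the single topic "network".
    w2t = {"network": "network", "net": "network", "system": "network"}
    print(sentence_topic_vector("the net and the system".split(), w2t))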

  16. Summarization • Computing the importance of a sentence: λpos is the position weight of S and λpar is the importance of the paragraph containing S • Finally, the sentences are sorted by their importance, and the abstract draft is composed of the sentences with the highest importance
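The sentence-importance formula itself was a slide image; multiplying the topic-frequency/importance dot product by the two named weights λpos and λpar is one consistent reading, labeled as an assumption below.

    def sentence_importance(topic_vec, importance, lam_pos=1.0, lam_par=1.0):
        # ASSUMED form: lam_pos * lam_par * sum_i Wi * I(Ti), where
        # lam_pos is the position weight of S and lam_par is the
        # importance of the paragraph containing S.
        return lam_pos * lam_par * sum(
            w * importance.get(t, 0.0) for t, w in topic_vec.items())

    def extract_abstract(scored, k):
        # Sort (sentence, score) pairs by score; the k most important
        # sentences form the abstract draft.
        return [s for s, _ in sorted(scored, key=lambda p: p[1], reverse=True)[:k]]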

  17. Summarization • Topic-concept-based partition (this handles the multi-topic case) • If we extract sentences only by their importance, the structure of the abstract may be unbalanced • Especially for a multi-topic text • A multi-topic text includes several concept hierarchy trees • P(Tr1, V1; Tr2, V2; …; Trn, Vn), where Tri is a concept hierarchy tree and Vi is the frequency of the topic concepts of P that are located in Tri • Compute the similarity between Pi and Pj to decide which continuous paragraphs form a “topic part” • Extract important sentences based on topic parts to make up the final abstract
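The slide says only "compute the similarity between Pi and Pj"; cosine similarity over the paragraph vectors is a standard choice and an assumption here.

    import math

    def paragraph_similarity(p1, p2):
        # Cosine similarity between two paragraph vectors, each a dict
        # mapping a concept-hierarchy-tree id Tri to the frequency Vi
        # of the paragraph's topic concepts located in that tree.
        dot = sum(v * p2.get(t, 0) for t, v in p1.items())
        n1 = math.sqrt(sum(v * v for v in p1.values()))
        n2 = math.sqrt(sum(v * v for v in p2.values()))
        return dot / (n1 * n2) if n1 and n2 else 0.0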

  18. Experiment Results • Measuring the performance • Nhm: number of extracted sentences that also appear in the human summary • Nh: number of sentences in the human summary • Nm: number of sentences extracted by a given method • α: relative importance of recall R and precision P
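With the counts above, recall is R = Nhm/Nh and precision is P = Nhm/Nm; how α combines them is not shown in the transcript, so the usual van Rijsbergen form is assumed below.

    def f_measure(n_hm, n_h, n_m, alpha=1.0):
        # R = Nhm/Nh (recall), P = Nhm/Nm (precision).
        # ASSUMED combination: F = (alpha+1)*P*R / (alpha*P + R).
        r = n_hm / n_h
        p = n_hm / n_m
        return (alpha + 1) * p * r / (alpha * p + r) if (alpha * p + r) else 0.0

    print(f_measure(n_hm=6, n_h=10, n_m=12))  # R=0.6, P=0.5 -> F~0.545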

  19. Experiment Results • F-measure values of the two methods

  20. Other Features • Sentence Length Cut-off Feature • Short sentences tend not to be included in summaries • Fixed-Phrase Feature • Sentences containing any of a list of fixed phrases, mostly two words long (e.g., “this letter…”, “In conclusion…”, etc.) • Paragraph Feature • This feature records information for the first ten paragraphs and the last five paragraphs in a document • Thematic Word Feature • The most frequent content words are defined as thematic words • Uppercase Word Feature • Proper names are often important
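A sketch of these five features as a Kupiec-style extractor; every threshold and phrase list below is an illustrative assumption (the slide gives the feature names, not the values).

    def sentence_features(tokens, para_index, n_paras, thematic_words,
                          fixed_phrases=("this letter", "in conclusion")):
        # Binary features from the slide's list; thresholds are assumptions.
        text = " ".join(tokens).lower()
        return {
            "length_ok": len(tokens) > 5,                    # length cut-off
            "fixed_phrase": any(p in text for p in fixed_phrases),
            "paragraph": para_index < 10 or para_index >= n_paras - 5,
            "thematic": any(t in thematic_words for t in tokens),
            "uppercase": any(w[:1].isupper() for w in tokens[1:]),
        }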
