BRAINSUP: Brainstorming Support for Creative Sentence Generation
Gözde Özbal, Carlo Strapparava (FBK-irst, Trento, Italy)
Daniele Pighin (Google Inc., Zürich, Switzerland)
ACL 2013
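Introduction
• In the real world, creative writing is time-consuming and labor-intensive.
• Advertising slogans need to be punchy, catchy, and memorable.
• Earlier work has studied similar tasks, but none of it proposed a unified format.
• The authors propose BRAINSUP, an extensible framework in which the user can control every parameter used in the creative process, so that the output better matches the user's needs.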
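Architecture of BRAINSUP
• First, the user chooses the target words that must appear in the sentence. The user may also choose:
  • a specific semantic domain: sports, blankets, ...
  • a specific emotion: joy, anger, or a negative emotion
  • a specific color: red, blue, ...
  • phonetic properties of the words: rhymes, alliteration, and plosives
• User input: U = <t, d, c, e, p, w>
• For target and domain words, the user can specify which parts of speech to consider, for example "drink/verb" or "drink/verb,noun".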
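The input specification above can be sketched as a small data structure. This is only an illustration: the class and field names are hypothetical, and reading w as a set of per-feature weights is an assumption, since the slide does not define it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class WordSpec:
    """A target or domain word with the parts of speech the user allows."""
    lemma: str
    pos: frozenset  # e.g. {"verb"} or {"verb", "noun"}; empty means "any"


def parse_word_spec(spec: str) -> WordSpec:
    """Parse a string such as 'drink/verb,noun' into a WordSpec."""
    lemma, _, tags = spec.partition("/")
    allowed = frozenset(t.strip() for t in tags.split(",") if t.strip())
    return WordSpec(lemma.strip(), allowed)


@dataclass
class UserInput:
    """Hypothetical container for U = <t, d, c, e, p, w>."""
    target_words: list                                  # t
    domain_words: list = field(default_factory=list)    # d
    color: Optional[str] = None                         # c
    emotion: Optional[str] = None                       # e
    phonetic: set = field(default_factory=set)          # p
    weights: dict = field(default_factory=dict)         # w (assumed: feature weights)


u = UserInput(
    target_words=[parse_word_spec("drink/verb,noun")],
    color="red",
    phonetic={"alliteration"},
)
```

Keeping the part-of-speech restriction next to each word mirrors the "drink/verb,noun" notation, so later stages can test slot compatibility without re-parsing strings.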
Architecture of BRAINSUP
• Pattern selection
• Searching the solution space
• Filler selection and solution scoring
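Architecture of BRAINSUP
• Set of meta-parameters Θ: the maximum/minimum number of sentences to generate, the maximum number of patterns to consider, the maximum sentence length, ...
• User input: U = <t, d, c, e, p, w>
• Pattern selection: pick patterns from corpus P that are frequent and satisfy the user's requirements.
• Solution search: given the user input U, select from treebank L the best solutions that fit pattern p.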
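Architecture of BRAINSUP: Pattern selection
• Morpho-syntactic patterns are extracted from a corpus P.
• First: choose the corpus. Different corpora produce sentences in different styles.
• Second: parse the sentences in the corpus with the Stanford parser, then remove the content words to obtain patterns, and record how many times each pattern occurs in the corpus.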
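The extraction step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the sentences are already POS-tagged with Penn Treebank tags, it treats nouns, verbs, adjectives, and adverbs as content words, it uses a tiny hypothetical stop-word list, and it ignores the dependency relations that a full pattern would also carry.

```python
from collections import Counter

# Assumption: tags starting with these prefixes mark content words.
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")
# Assumption: a tiny illustrative stop-word list; stop words stay in the pattern.
STOP_WORDS = {"is", "are", "be", "was", "were", "not"}


def to_pattern(tagged_sentence):
    """Replace every content word by a slot that keeps only its POS tag."""
    pattern = []
    for word, tag in tagged_sentence:
        if tag.startswith(CONTENT_PREFIXES) and word.lower() not in STOP_WORDS:
            pattern.append(f"<{tag}>")      # slot with a morphological constraint
        else:
            pattern.append(word.lower())    # function word kept verbatim
    return tuple(pattern)


def count_patterns(tagged_corpus):
    """Count how often each pattern occurs in the corpus."""
    return Counter(to_pattern(sentence) for sentence in tagged_corpus)


corpus = [
    [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")],
]
pattern_counts = count_patterns(corpus)
# Both sentences collapse to the same pattern: ("the", "<NN>", "<VBZ>")
```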
Architecture of BRAINSUP: Pattern selection
• Can the slots of the pattern be filled with the target words chosen by the user?
• Target words t = [heading/VBG, edge/NN]: no (X)
• Target words t = [heading/NN, edge/NN]: yes (V)
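Architecture of BRAINSUP: Pattern selection
• The number of slots must be greater than the number of target words: CompatiblePatterns(.) requires slots > |t|. The maximum/minimum number of slots is controlled in Θ. In addition, to keep identical inputs from always producing identical results, a random component (also controlled in Θ) is added to the sorting algorithm.
• CompatiblePatterns(.) finally returns the patterns ordered by how often they occur, from most to least frequent.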
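A minimal sketch of such a filter is shown below. The function names, the tuple representation of patterns (slots written as "<TAG>"), and the multiplicative jitter used as the random component are all assumptions made for illustration; each target word is assumed to carry exactly one POS tag.

```python
import random


def slot_tags(pattern):
    """Return the POS tags of all slots in a pattern."""
    return [tok[1:-1] for tok in pattern if tok.startswith("<") and tok.endswith(">")]


def can_host(pattern, targets):
    """True if every target word can take a distinct slot with the same POS tag."""
    free = slot_tags(pattern)
    for _, tag in targets:
        if tag not in free:
            return False
        free.remove(tag)  # a slot can host only one target word
    return True


def compatible_patterns(pattern_counts, targets, min_slots, max_slots,
                        jitter=0.0, seed=None):
    """Filter patterns, then sort by (optionally perturbed) frequency, descending."""
    rng = random.Random(seed)
    kept = []
    for pattern, count in pattern_counts.items():
        n_slots = len(slot_tags(pattern))
        if n_slots <= len(targets):                  # slots must exceed |t|
            continue
        if not (min_slots <= n_slots <= max_slots):  # bounds taken from Θ
            continue
        if can_host(pattern, targets):
            kept.append((pattern, count))
    # Random component: perturb each count by up to +/- jitter (relative).
    kept.sort(key=lambda pc: pc[1] * (1 + rng.uniform(-jitter, jitter)), reverse=True)
    return [pattern for pattern, _ in kept]


counts = {
    ("the", "<NN>", "of", "the", "<NN>", "is", "<JJ>"): 5,
    ("<NN>", "<VBZ>"): 9,
}
rejected = compatible_patterns(counts, [("heading", "VBG"), ("edge", "NN")], 1, 5)
accepted = compatible_patterns(counts, [("heading", "NN"), ("edge", "NN")], 1, 5)
```

With jitter left at 0 the ordering is deterministic; a small positive value lets near-ties swap places between runs.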
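Architecture of BRAINSUP: Searching the solution space
• Once the patterns have been selected, the next step is to choose which words fill each slot, starting from the slot with the largest number of dependencies.
• A pattern contains only stop words, syntactic relations, and morphological constraints (POS tags).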
Architecture of BRAINSUP: Searching the solution space
• A large corpus L of parsed sentences is analyzed, and the number of occurrences of each head-relation-modifier (<h, r, m>) dependency relation is recorded (operator τ_r(h)).
• Examples: τ^{-1}_{nsubj}(fires), τ_{amod}(smoke)
Architecture of BRAINSUP: Searching the solution space
• Example operators: τ^{-1}_{nsubj}(fires), τ^{-1}_{dobj}(smoke), τ^{-1}_{prep}(in)
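The dependency operators can be sketched as lookups over counted <h, r, m> triples. This is an interpretation of the slide's notation rather than the authors' code: tau(r, h) is read as "modifiers seen under head h via relation r" and tau_inv(r, m) as the inverse lookup "heads seen above modifier m". Intersecting the resulting sets to satisfy several constraints on one slot is also an assumption, and the triples are toy data.

```python
from collections import Counter, defaultdict


class DependencyIndex:
    """Counts of <head, relation, modifier> triples with forward/inverse lookup."""

    def __init__(self, triples):
        self.mods = defaultdict(Counter)    # (relation, head) -> modifier counts
        self.heads = defaultdict(Counter)   # (relation, modifier) -> head counts
        for head, rel, mod in triples:
            self.mods[(rel, head)][mod] += 1
            self.heads[(rel, mod)][head] += 1

    def tau(self, rel, head):
        """Words observed as modifiers of `head` through `rel`."""
        return set(self.mods.get((rel, head), ()))

    def tau_inv(self, rel, mod):
        """Words observed as heads of `mod` through `rel`."""
        return set(self.heads.get((rel, mod), ()))


def candidates(constraint_sets):
    """Candidate fillers for a slot: words allowed by every constraint."""
    sets = list(constraint_sets)
    return set.intersection(*sets) if sets else set()


index = DependencyIndex([
    ("burns", "nsubj", "fire"),
    ("burns", "nsubj", "candle"),
    ("fire", "amod", "bright"),
    ("candle", "amod", "bright"),
    ("sun", "amod", "bright"),
])
# A slot that must be the subject of "burns" and be modified by "bright":
fillers = candidates([index.tau("nsubj", "burns"), index.tau_inv("amod", "bright")])
```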
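Architecture of BRAINSUP: Filler selection and solution scoring
• Once the lists of candidate words are available, the next step is to choose the fillers that score highest and satisfy the user's requirements.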
Architecture of BRAINSUP: Filler selection and solution scoring
• 12 feature functions:
• Chromatic and emotional connotation
  • c is the color chosen by the user; s_i is the i-th word of the sentence
• Domain relatedness
  • d is the domain chosen by the user; s_i is the i-th word of the sentence
Architecture of BRAINSUP: Filler selection and solution scoring
• Semantic cohesion
  • Same as domain relatedness, with the domain d replaced by the target words t
• Target-words scorer
  • Forces the target words t to appear in the sentence
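Architecture of BRAINSUP: Filler selection and solution scoring
• Phonetic features (plosives, alliteration, and rhyme)
  • plosives: the proportion of plosives in a sentence
  • alliteration: recorded with a trie, where c_i is the number of times node i is traversed
  • rhyme: same as alliteration, except that each string is reversed before it is added to the trie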
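The phonetic features can be illustrated with a rough sketch. Two simplifications are assumed here: spelling stands in for a phonetic transcription (the letters p, b, t, d, k, g are counted as plosives), and a counter keyed by prefix stands in for an explicit trie, since each prefix corresponds to one trie node and its count to c_i. The slide's formula for turning the node counts into a score is not recoverable, so the sketch stops at the counts.

```python
from collections import Counter

# Assumption: orthographic proxy for plosive sounds.
PLOSIVES = set("pbtdkg")


def plosive_ratio(text):
    """Share of alphabetic characters that are (proxy) plosives."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in PLOSIVES for ch in letters) / len(letters)


def node_visits(words, reverse=False):
    """Visit count c_i per trie node; each node is identified by its prefix.

    With reverse=True the words are reversed first, so shared nodes
    correspond to shared endings (rhyme) instead of shared beginnings.
    """
    counts = Counter()
    for word in words:
        word = word.lower()
        if reverse:
            word = word[::-1]
        for end in range(1, len(word) + 1):
            counts[word[:end]] += 1
    return counts


def shared_nodes(words, reverse=False):
    """Nodes traversed by more than one word, with their visit counts."""
    return {p: c for p, c in node_visits(words, reverse).items() if c > 1}


alliteration = shared_nodes(["dark", "drink"])           # shared beginning "d"
rhyme = shared_nodes(["kiss", "miss"], reverse=True)     # shared ending "iss"
```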
Architecture of BRAINSUP: Filler selection and solution scoring
• Variety scorer
  • Calculated as the number of distinct words in the sentence divided by the length of the sentence
• Unusual-words scorer
  • c_i is the number of times each word s_i ∈ s is observed in a separate corpus V
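The variety scorer follows directly from its definition and can be sketched in a few lines. For the unusual-words scorer only the lookup of the counts c_i is shown, because the slide does not preserve the formula that combines them. Whitespace tokenization, lowercasing, and the toy corpus counts are assumptions.

```python
from collections import Counter


def variety(sentence):
    """Number of distinct words divided by the number of words."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def observed_counts(sentence, corpus_counts):
    """c_i for every word s_i of the sentence, looked up in corpus V."""
    return [corpus_counts[word] for word in sentence.lower().split()]


# Hypothetical counts standing in for corpus V.
corpus_v = Counter({"the": 1000, "kiss": 12, "drama": 3})

v = variety("the cat and the dog")                  # 4 distinct words out of 5
c = observed_counts("the drama kiss", corpus_v)     # unseen words would map to 0
```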
Architecture of BRAINSUP: Filler selection and solution scoring
• N-gram likelihood
• Dependency likelihood
Evaluation
• Five experienced annotators were asked to rate 432 creative sentences:
• 1) Catchiness: is the sentence attractive, catchy or memorable? [Yes/No]
• 2) Humor: is the sentence witty or humorous? [Yes/No]
• 3) Relatedness: is the sentence semantically related to the target domain? [Yes/No]
• 4) Correctness: is the sentence grammatically correct? [Ungrammatical/Slightly disfluent/Fluent]
• 5) Success: could the sentence be a good slogan for the target domain? [As it is/With minor editing/No]
Evaluation
• A subset of these slogans was selected at random, and an input specification U was generated for each of them.
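Evaluation
• t: 2 to 3 words chosen at random from the sentence
• d: the commercial domain
• e: positive
• c: a color is used only if the domain has a strongly associated color; otherwise a color is chosen at random
• 10 tuples <t, d, c, e, p> are generated and combined with 5 different feature combinations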
Evaluation
• Base: Target-words scorer + N-gram likelihood + Dependency likelihood + Variety scorer + Unusual-words scorer + Semantic cohesion
• Base + D: Base + Domain relatedness
• Base + D + C: Base + D + Chromatic connotation
• Base + D + E: Base + D + Emotional connotation
• Base + D + P: Base + D + Phonetic features
• Each of the 50 inputs generates up to 10 sentences, for a total of 432 sentences
Evaluation
• Weights: set heuristically
  • Target-words scorer: 1.0
  • Variety and Unusual-words scorers: 0.99
  • Phonetic features, Chromatic/Emotional connotation and Semantic cohesion scorers: 0.98
  • Domain relatedness, N-gram likelihood and Dependency likelihood scorers: 0.97
• Patterns: a corpus of 16,000 proverbs
• Dependency operators: British National Corpus
• Only sentences of at most 20 words in which every word can be found in WordNet are considered
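The weights and the sentence filter above can be put together in a short sketch. The slide lists the weights but not how they are combined, so the weighted sum below is an assumption. The feature names are hypothetical labels, and a plain set of words stands in for a real WordNet lookup.

```python
# Heuristic weights from the slide; the dictionary keys are hypothetical labels.
WEIGHTS = {
    "target_words": 1.0,
    "variety": 0.99,
    "unusual_words": 0.99,
    "phonetic": 0.98,
    "chromatic": 0.98,
    "emotional": 0.98,
    "semantic_cohesion": 0.98,
    "domain": 0.97,
    "ngram": 0.97,
    "dependency": 0.97,
}


def combined_score(feature_values, weights=WEIGHTS):
    """Assumed combination: weighted sum over the active features only."""
    return sum(weights[name] * value for name, value in feature_values.items())


def admissible(words, lexicon, max_len=20):
    """Keep sentences of at most max_len words whose words are all in the lexicon."""
    return len(words) <= max_len and all(w.lower() in lexicon for w in words)


# Toy lexicon standing in for WordNet.
lexicon = {"lips", "eyes", "want", "kiss"}

s = combined_score({"target_words": 1.0, "variety": 0.5})
ok = admissible(["Lips", "want", "kiss"], lexicon)
```

Passing only the features of one configuration (for example Base, or Base + D) to combined_score switches the remaining scorers off without changing the weight table.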
Evaluation - result
• In 63 cases every dimension was marked YES; the examples in Table 1 are taken from these cases. Besides correctness, many rhetorical devices can be observed:
  • metaphor: a summer sunshine
  • pun: lash your drama
  • personification: lips and eyes want
• Use of phonetic properties:
  • plosives: passionate kiss, perfect lips
  • alliteration: the dark drink
  • rhyme: lips and eyes want the kiss
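Conclusion
• The authors propose BRAINSUP, an extensible framework whose parameters users can define according to their own needs.
• The system makes heavy use of dependency-parsed data to ensure that the generated sentences are grammatical.
• Although the generated sentences may not fully match what the user wants, they can at least serve as inspiration.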
Conclusion
It is wiser to believe in science than in everlasting love.