
Scientific and technical achievements


Presentation Transcript


  1. Scientific and technical achievements
     The FIRST Consortium
     FIRST Y3 Review Meeting

  2. Data acquisition pipeline (DacqPipe)
     [Figure: pipeline diagram. Component labels: RSS reader, 0MQ channel, boilerplate remover & duplicate detector, language detector, filter, NLP pipe, HTML tokenizer, syntactic analysis, semantic preprocessing, OBIE, DB writer, DB. Stage labels: Read & parse, Clean, Store, Emit.]
     • Resembles big data streaming architectures such as Twitter Storm
     • Running continuously since April 2011
     • Several scientific contributions
       • Boilerplate remover & gold standard dataset
       • Ontology & ontology-based information extractor
     • Executable available at http://first.ijs.si/software/DacqPipeJun2013.zip
     • Source code: https://github.com/project-first/dacqpipe
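The read & parse, clean, store, and emit stages listed above can be illustrated with chained Python generators. This is only a minimal sketch under stated assumptions, not DacqPipe's actual code: the function names, the dictionary layout of a document, the regex-based tag stripping, the hash-based duplicate check, and the keyword filter are all hypothetical stand-ins.

```python
import hashlib
import re
from typing import Dict, Iterable, Iterator, List

Doc = Dict[str, str]  # hypothetical document shape: {"url": ..., "text": ...}


def read_and_parse(feed_items: Iterable[Doc]) -> Iterator[Doc]:
    """Read & parse: strip HTML tags (a crude stand-in for real parsing)."""
    for item in feed_items:
        text = re.sub(r"<[^>]+>", " ", item["html"])
        yield {"url": item["url"], "text": " ".join(text.split())}


def drop_duplicates(docs: Iterable[Doc]) -> Iterator[Doc]:
    """Clean: skip documents whose lower-cased text was already seen."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha1(doc["text"].lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def keep_relevant(docs: Iterable[Doc], keywords: List[str]) -> Iterator[Doc]:
    """Filter: keep documents that mention at least one keyword."""
    for doc in docs:
        lowered = doc["text"].lower()
        if any(keyword in lowered for keyword in keywords):
            yield doc


def store(docs: Iterable[Doc], database: List[Doc]) -> Iterator[Doc]:
    """Store: append to an in-memory list, then emit the document downstream."""
    for doc in docs:
        database.append(doc)
        yield doc


def run_pipeline(feed_items: Iterable[Doc], keywords: List[str],
                 database: List[Doc]) -> List[Doc]:
    """Chain the stages; each document streams through them one at a time."""
    cleaned = drop_duplicates(read_and_parse(feed_items))
    return list(store(keep_relevant(cleaned, keywords), database))


if __name__ == "__main__":
    items = [
        {"url": "a", "html": "<p>Bank shares fall</p>"},
        {"url": "b", "html": "<div>Bank  shares fall</div>"},
        {"url": "c", "html": "<p>Weather is sunny</p>"},
    ]
    db: List[Doc] = []
    print(run_pipeline(items, ["bank"], db))
```

Because every stage is a generator, a document passes through all stages before the next one is read, which loosely mirrors the streaming style the slide refers to.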

  3. Dataset of news & blogs
     • Since April 2011
     • Data from 219 Web sites; 3,159 RSS feeds
     • Roughly 15 million unique documents collected
     • Actively used by many non-FIRST people, basis for new projects
     • Available at http://first.ijs.si/FIRSTDataset

  4. Knowledge-based sentiment analysis
     • Sentence-level knowledge-based approach
     • Glass box: detailed drill-down capability
     • Best paper award at CEC 2011
     • Gold standard sentiment corpus (evaluation, hybrid model)
     • First such attempt in the financial domain and on this scale (connected to DacqPipe)
     • Source code & rules: https://github.com/project-first/semanticinformationextraction
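To make the "sentence-level" and "glass box" ideas concrete, here is a small hedged sketch of lexicon-driven scoring that keeps the evidence behind each score so a user could drill down to it. It is not the project's rule set: the four-word lexicon, the one-token negation rule, and all function names are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon; a real knowledge base would be far richer.
LEXICON: Dict[str, int] = {"profit": 1, "growth": 1, "loss": -1, "default": -1}
NEGATORS = {"no", "not", "never"}


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence: str) -> Tuple[int, List[Tuple[str, int]]]:
    """Return the sentence score plus the (term, polarity) pairs behind it."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    evidence: List[Tuple[str, int]] = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            polarity = LEXICON[token]
            # Assumed rule: a negator directly before the term flips its polarity.
            if i > 0 and tokens[i - 1] in NEGATORS:
                polarity = -polarity
            evidence.append((token, polarity))
    return sum(p for _, p in evidence), evidence


def analyse(text: str) -> List[Dict[str, object]]:
    """Score every sentence and keep its evidence for drill-down."""
    results = []
    for sentence in split_sentences(text):
        score, evidence = score_sentence(sentence)
        results.append({"sentence": sentence, "score": score, "evidence": evidence})
    return results


if __name__ == "__main__":
    for row in analyse("The bank reported a profit. It expects no growth next year."):
        print(row)
```

Keeping the matched terms next to each score is what distinguishes this style from a black-box classifier: every number can be traced back to the words that produced it.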

  5. Quantitative/qualitative models: Quantitative models
     • Quantitative models for document categorization, pump-and-dump detection, and Twitter sentiment classification
     • More or less black boxes
     • Source code
       • https://github.com/project-first/documentcategorizerdemo
       • https://github.com/project-first/pumpanddumpclassifierdemo
     [Figure: P&D model]

  6. Quantitative/qualitative models: Qualitative models
     • Qualitative multi-attribute models for reputational risk assessment and pump-and-dump detection
     • Rule-based, developed by domain experts
     • Glass box: drill-down, what-if analysis…
     • Integrated into use-case prototypes
     • Best paper award at Bled eConference 2013
     • Source code: https://github.com/project-first/rimmodel
     [Figure: Streams, P&D model]
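A rule-based qualitative multi-attribute model with what-if analysis can be sketched as a lookup table from qualitative attribute values to an aggregated class. The two attributes, their value scales, and the rules below are invented for illustration and are not taken from the project's RIM or pump-and-dump models.

```python
from typing import Dict, Tuple

# Hypothetical expert rule table: (exposure, sentiment) -> reputational risk.
RULES: Dict[Tuple[str, str], str] = {
    ("low", "positive"): "low",
    ("low", "neutral"): "low",
    ("low", "negative"): "medium",
    ("high", "positive"): "medium",
    ("high", "neutral"): "high",
    ("high", "negative"): "high",
}


def assess(exposure: str, sentiment: str) -> Dict[str, object]:
    """Look up the rule for the given inputs and report which rule fired."""
    key = (exposure, sentiment)
    if key not in RULES:
        raise ValueError(f"no rule for inputs {key!r}")
    return {"risk": RULES[key], "fired_rule": key}


def what_if(base: Dict[str, str], **changes: str) -> Dict[str, object]:
    """Re-run the assessment with some attribute values overridden."""
    inputs = {**base, **changes}
    return assess(inputs["exposure"], inputs["sentiment"])


if __name__ == "__main__":
    current = {"exposure": "low", "sentiment": "negative"}
    print(assess(**current))
    print(what_if(current, sentiment="positive"))
```

Returning the fired rule alongside the result is one simple way to support drill-down, and overriding a single input supports the what-if style of question mentioned on the slide.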

  7. Visualization API
     • Comprehensive technical documentation with examples
     • Both data sources
       • News & blogs
       • Twitter
     • Easy to use, used in use-case prototypes
     • Available at http://first.ijs.si/VisApi/indexVis.html

  8. b-next: Market manipulation prototype
     • Java-based software prototype for capital market surveillance
     • Two new & unique market abuse scenarios based on unstructured information
     • Alerts based on individual threshold configurations
     • Exploration of suspicious market constellations based on alerting and visualisation components
     • Positive end-user feedback
       • Customers think that these scenarios are real problems and need to be addressed

  9. MPS: Reputational risk prototype
     • Sentiment analysis included in reputational risk module on financial counterparts
     • Data sources: a mix of structured (Basel II, Pillar 3) and unstructured information (Web sources)
     • RIM model is fully scalable (by counterpart and by financial product)
     • Visualisation tools to support decisions
     • …fills in a methodological gap in quantitative reputational risk assessment for financial institutions
     • …can also fulfill the needs of non-financial organisations
     • Available at http://first-vm1.ijs.si/mps

  10. IDMS: Retail brokerage prototypes
     • Additional indicators based on sentiment and tweet volume
     • Content exploration and drill-down
     • Exploring lagged correlations
       • Trading volume : tweeting volume
       • Price : sentiment polarity
     • Positive feedback from potential customers
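Exploring lagged correlations between two daily series, such as tweeting volume and trading volume, can be sketched as computing a Pearson correlation at each shift. The code below is an illustrative assumption about how such an exploration might look, not the prototype's implementation; the function names and the toy series are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(leader: Sequence[float], follower: Sequence[float],
                        max_lag: int) -> Dict[int, float]:
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag."""
    if len(leader) != len(follower):
        raise ValueError("series must have equal length")
    result = {}
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]   # drop the last `lag` points
        y = follower[lag:]                # drop the first `lag` points
        result[lag] = pearson(x, y)
    return result


if __name__ == "__main__":
    tweets = [1, 5, 2, 8, 3, 9, 4]
    # Toy follower series: yesterday's tweet volume, i.e. a one-day lag.
    trades = [0] + tweets[:-1]
    by_lag = lagged_correlations(tweets, trades, max_lag=2)
    print(max(by_lag, key=by_lag.get), by_lag)
```

The lag with the highest coefficient suggests how many periods one series trails the other; with short series the estimates are noisy, so any such reading would need proper statistical validation.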

  11. Sovereign debt prototype
     • New§tream: Web-based visual interface for exploratory news analysis
     • When, how much, with which sentiment?
     • Volume and sentiment charts, canyon flows, tag clouds, drill-down
     • Use cases
       • Globalization of local news
       • Effects of news on CDS
     • Available at http://first.ijs.si/Occurrences (http://newstream.ijs.si)

  12. Sentify portal
     • End-user oriented GUI as an entry point to showcase the FIRST results
     • Document navigation and sentiment drill-down
     • Exploration of aggregated sentiment data
     • Comparative analysis between fuzzy and crisp sentiments
     • Reputation topics analysis
     • Available at http://sentify.project-first.eu
     • Source code: http://github.com/project-first/sentify-portal

  13. Political sentiment on Twitter
     • Slovene presidential elections, November 2012
     • Live sentiment stream shown on POP TV
     • Political leaning based on sentiment well correlated with the election results
     • Polling agencies and newspapers failed to predict the victor

  14. Political sentiment on Twitter
     • Bulgarian parliamentary elections, May 2013
     • Big scandal on the day before the elections (illegal ballots)
     • Prevailing negative sentiment
     • Nearly perfect match between Twitter volume and election results
