TANGO Progress Report on X-Y Trees and Table Processing Techniques
The TANGO Progress Report summarizes advancements in understanding and processing X-Y trees and related table structures. It includes discussions on grammar parsing, distance metrics for table similarity, and prediction models for table interaction times based on various features such as size and footnotes. Ongoing and completed projects are highlighted, including the development of software for converting and analyzing tables (such as TAT). The report serves as a reference for improvements in table layouts and editing efficiencies, presenting valuable insights for future work and collaborations.
TANGO (RPI, June 2009)
George Nagy, Mukkai Krishnamoorthy, Sharad Seth
Raghav Padmanabhan, Ramana C. Jandhyala, Sean Kelley
Max Muthalathu, William Silversmith
Completed Stuff
• WNT (Piyushee, MS May 2008)
• TAT (Raghav, MS May 2009)
Pubs:
• ICPR08, WNT, PJ & GN, Dec. 2008
• ICPR08, QBT, RP & GN, Dec. 2008
• MKM09, Tessellations, RJ, RP, MK, GN, SS, WS, July 2009
• GREC09, TAT results, RP, RP, MK, GN, SS, WS, July 2009
TANGO PROGRESS REPORT
Software
• TAT (demo)
• EX2XY, XY2EX (Ramana)
• OO2XY, XY2OO (Sean, in progress)
• XY2LN (SS, MK)
• XY2WN (Bill)
• TAT stat analysis (RB & GN, in progress)
Partial grammar for X-Y trees (MK & SS)
SXY = { c [ c c ] c [ c { c [ c c ] } c { c [ c c ] } ] }
Grammar G1 for parsing all layout-equivalent tessellations of this kind is:
S := A
A := { B }
B := c [ X ] B | c [ X ]
X := c X | A X | A | c
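A minimal recursive-descent recognizer for G1 may help in checking candidate strings by hand. It is only a sketch: the names `accepts`, `ParseError`, and `_Parser` are made up for this illustration and are not part of the TANGO software. It treats B as one or more `c [ X ]` groups and X as one or more items, each either `c` or a nested A, which is what the productions above generate. The example string is written with its closing brace so that it matches A := { B }.

```python
class ParseError(Exception):
    """Raised when the token stream does not match grammar G1."""


class _Parser:
    def __init__(self, text):
        # One token per non-blank character; only c [ ] { } are legal.
        self.tokens = [ch for ch in text if not ch.isspace()]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r} at token {self.pos}")
        self.pos += 1

    def parse_A(self):
        # A := { B }
        self.expect("{")
        self.parse_B()
        self.expect("}")

    def parse_B(self):
        # B := c [ X ] B | c [ X ]   (one or more "c [ X ]" groups)
        while True:
            self.expect("c")
            self.expect("[")
            self.parse_X()
            self.expect("]")
            if self.peek() != "c":
                break

    def parse_X(self):
        # X := c X | A X | A | c     (one or more items, each c or A)
        count = 0
        while self.peek() in ("c", "{"):
            if self.peek() == "c":
                self.pos += 1
            else:
                self.parse_A()
            count += 1
        if count == 0:
            raise ParseError(f"expected 'c' or '{{' at token {self.pos}")


def accepts(text):
    """Return True if text is derivable from S := A in G1."""
    parser = _Parser(text)
    try:
        parser.parse_A()
    except ParseError:
        return False
    return parser.pos == len(parser.tokens)


SXY = "{ c [ c c ] c [ c { c [ c c ] } c { c [ c c ] } ] }"
```

Calling `accepts(SXY)` checks the example string; dropping the final `}` or leaving a bracket pair empty makes the recognizer reject the input.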
A’ and A’’ table formats
(Figure panels: A’, A’’, Hybrid)
Appearance-based distance (WS?)
Each table cell is described by a vector: width, type size, typeface, indent, justification, alpha/num, color, #_of_chars, …
Compute differences between horizontally and vertically adjacent cells.
From the resulting “gradient map”, determine the row header, column header, and delta cell regions.
(Show GN’s Excel example)
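The adjacent-cell differences can be sketched as follows. This is an illustration under assumptions, not the project's method: the slide does not say how features are encoded or which difference measure is used, so the sketch assumes each cell is already a numeric tuple and uses an L1 difference. The function names and the two-feature toy grid (bold flag, numeric flag) are invented for the example.

```python
def l1(u, v):
    """L1 difference between two equal-length feature vectors (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def gradient_maps(grid):
    """Return (horizontal, vertical) difference maps for a rectangular grid.

    horizontal[r][c] compares cell (r, c) with its right neighbour (r, c + 1);
    vertical[r][c] compares cell (r, c) with the cell below it (r + 1, c).
    """
    rows, cols = len(grid), len(grid[0])
    horizontal = [
        [l1(grid[r][c], grid[r][c + 1]) for c in range(cols - 1)]
        for r in range(rows)
    ]
    vertical = [
        [l1(grid[r][c], grid[r + 1][c]) for c in range(cols)]
        for r in range(rows - 1)
    ]
    return horizontal, vertical


# Toy 3x3 table, invented for illustration. Each cell is (is_bold, is_numeric).
# The first row and first column look like headers; the rest look like data.
HEADER, DATA = (1, 0), (0, 1)
toy_grid = [
    [HEADER, HEADER, HEADER],
    [HEADER, DATA, DATA],
    [HEADER, DATA, DATA],
]

horizontal, vertical = gradient_maps(toy_grid)
```

In the toy grid the large differences line up along the boundary between the first row or column and the remaining cells, which is the kind of edge a header-detection step would look for.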
Prediction of TAT-time
Multiple regression of interaction time from:
• Size of table (#cols, #rows, or #cells)
• Number of aggregates
• Number of footnotes
• Number of units
• Other?
(GN has tried it with 20 tables; have Excel ‘GN_Data_Analysis’)
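A small sketch of such a regression, assuming ordinary least squares with an intercept, solved through the normal equations in plain Python. The function name `fit_linear` is hypothetical, and the numbers below are synthetic, generated from a made-up linear formula purely so the fit can be checked. They are not the 20-table data mentioned on the slide. Only three of the listed predictors are used to keep the example short.

```python
def fit_linear(features, targets):
    """Least-squares fit of targets ~ intercept + features via normal equations.

    Returns [intercept, coef_1, ..., coef_k].
    """
    design = [[1.0] + [float(v) for v in row] for row in features]
    n = len(design[0])
    # Augmented matrix [A^T A | A^T y].
    m = [
        [sum(row[i] * row[j] for row in design) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(design, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# SYNTHETIC data for illustration only: (#cells, #aggregates, #footnotes).
tables = [(12, 0, 0), (20, 1, 0), (30, 2, 1), (48, 1, 2), (60, 3, 0), (80, 0, 3)]
# Made-up ground truth: time = 10 + 2*cells + 5*aggregates + 3*footnotes.
times = [10 + 2 * c + 5 * a + 3 * f for c, a, f in tables]

coefficients = fit_linear(tables, times)
```

Because the synthetic times are exactly linear in the predictors, the fit should recover the made-up coefficients up to rounding. Real interaction times would leave residuals, and with few tables the number of predictors should stay small.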
Table similarity
• May be useful to determine similar edit sequences.
• Tree distance between X-Y representations (symmetry?)
• Edit distance between linear P-notation for X-Y trees
• Metric for parse sequences??
• Tree distance between Wang category forests? (new)
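One of the candidates above, edit distance between linear notations, can be sketched with the standard Levenshtein recurrence over tokens. The slide does not define the linear notation, so the sketch assumes bracketed strings of the form used in the grammar slide, with one token per non-blank character and unit costs for insert, delete, and substitute. The function names are invented for the example.

```python
def tokens(text):
    """Split a bracketed X-Y string into one token per non-blank character."""
    return [ch for ch in text if not ch.isspace()]


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,             # delete x
                    cur[j - 1] + 1,          # insert y
                    prev[j - 1] + (x != y),  # substitute, or match at no cost
                )
            )
        prev = cur
    return prev[-1]


flat = tokens("{ c [ c c ] }")
wider = tokens("{ c [ c c c ] }")
nested = tokens("{ c [ c { c [ c c ] } ] }")
```

With unit costs this distance is symmetric, which bears on the symmetry question raised for the tree distance. A string-level distance ignores nesting, though: replacing one cell by a nested subtree costs as many edits as the subtree has extra tokens.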
Learning ???
• Retain edit sequences from TAT
• Make X-Y tree from each imported but not edited table
• Find distance of X-Y tree from new table to all previous
• Execute edit sequences of nearest neighbor(s)
• Check algorithmically if resulting X-Y tree corresponds to correct WN
• Check visually if table corresponding to resulting X-Y tree is equivalent to original table
• If not, edit
• Concatenate further edit and associate with X-Y tree of new table, then add to reference set
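The retrieval and update steps above can be sketched as a nearest-neighbor lookup over a reference set. Everything here is an assumption made for illustration: X-Y trees are represented as bracketed strings, the distance is a stand-in based on Python's `difflib.SequenceMatcher` rather than any metric chosen by the project, and the edit names (`merge_header`, `split_col`, `fix_footnote`) are hypothetical placeholders for TAT edit operations. The correctness checks against WN and by eye are left out.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Stand-in dissimilarity in [0, 1]; 0 means identical strings."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def nearest_edits(reference, xy):
    """Return the edit sequence stored with the X-Y string nearest to xy."""
    if not reference:
        return []
    _, edits = min(reference, key=lambda entry: distance(entry[0], xy))
    return list(edits)


def remember(reference, xy, applied, further=()):
    """Store xy with the applied edits plus any further manual edits."""
    reference.append((xy, list(applied) + list(further)))


# Hypothetical reference set: (X-Y string, edit sequence retained from TAT).
reference = [
    ("{c[cc]}", ["merge_header"]),
    ("{c[c{c[cc]}]}", ["split_col", "merge_header"]),
]

new_table = "{c[ccc]}"
suggested = nearest_edits(reference, new_table)

# Suppose the checks fail and one more manual edit is needed (hypothetical).
remember(reference, new_table, suggested, further=["fix_footnote"])
```

The reference set grows with every processed table, so later tables with a similar layout can reuse the concatenated sequence. Using several nearest neighbors, as the slide hints, would need a rule for reconciling their edit sequences.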
Discussion Items
• Lists & Ordering
• XML format and verification
• Augmentations (spotting and processing)
• Open Office
• Table ontology
• XY tree to WN via lexical parse (checks?)
• Use of parse trees for XY2WN
• Learning?
• Overall TANGO evaluation for final report
• Critique draft slides for GREC and MKM
• Tools: RPI: OO, VBA, Matlab, Python; BYU: ??
• Other RPI projects: PERFECT, CERVITOR, CAVIAR
• NSF TANGO Final Report!
• New NSF proposal (Maria)
• Other possible sponsors?
• Confs
• Archival Journals
• Collaborators
• Demos and dissemination
• Next visit
• Survival Plans