
DEiXTo

DEiXTo is a powerful web data extraction tool built on the W3C Document Object Model (DOM). It comprises a freeware GUI tool (built with Turbo Delphi, Windows-only), a free, cross-platform Command Line Executor (in Perl) and the DEiXToBot agent (implemented in Perl). Extraction is driven by DOM-based rules (wrappers).


Presentation Transcript


  1. DEiXTo

  2. DEiXTo
     • Powerful web data extraction tool
     • Freeware GUI tool (built with Turbo Delphi, Windows-only)
     • Free, cross-platform Command Line Executor (in Perl)
     • DEiXToBot agent (implemented in Perl)
     • W3C Document Object Model (DOM)
     • DOM-based extraction rules (wrappers)
     • Extracted data can be exported to a wide variety of formats (tab-delimited, XML, RSS, etc.)
     • Command Line Executor:
       • has database support via DBI (the Database independent interface for Perl)
       • supports additional formats: Excel, CSV, OpenDocument Spreadsheet (.ods), HTML
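To make the idea of a DOM-based extraction rule and its export formats concrete, here is a minimal Python sketch. It is not DEiXTo code and does not use DEiXTo's wrapper format: the sample page, the rule ("one record per `div` with class `item`") and the function names are all assumptions made for illustration, and only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from the presentation).
PAGE = """<html><body>
<div class="item"><h2>Alpha</h2><span class="price">10</span></div>
<div class="item"><h2>Beta</h2><span class="price">25</span></div>
</body></html>"""


def extract_records(page):
    """Apply a toy DOM rule: one record per <div class="item">."""
    root = ET.fromstring(page)
    records = []
    for node in root.iter("div"):
        if node.get("class") != "item":
            continue
        records.append({
            "title": node.findtext("h2", default=""),
            "price": node.findtext("span[@class='price']", default=""),
        })
    return records


def to_tab_delimited(records):
    """Serialize the records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records):
    """Serialize the same records as a small XML document."""
    root = ET.Element("records")
    for record in records:
        element = ET.SubElement(root, "record")
        for field_name, value in record.items():
            ET.SubElement(element, field_name).text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_tab_delimited(extract_records(PAGE))` and `to_xml(extract_records(PAGE))` produces the two exports from the same extracted records. Real pages are rarely well-formed XML, so a tolerant HTML parser would be needed in practice.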

  3. GUI DEiXTo
     • user-friendly graphical interface
     • enhanced, tree-based extraction rules
     • HTML tag filtering
     • fast, flexible and high-performance tree pattern matching algorithm
     • regular expression support
     • can follow "Next Page" links and submit simple forms
     • can export results to XML and tab-delimited formats and create RSS feeds
     • XML-encoded wrapper project files (.wpf) that can be executed at will
     • last but not least, it's freeware!
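The following Python sketch illustrates what a tree-based extraction rule with regular expression support could look like. It is a simplified illustration, not DEiXTo's actual matching algorithm: `PatternNode`, `match` and `find_all` are hypothetical names, and the matching policy (child patterns must match child elements in document order, with unmatched siblings skipped) is an assumption.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatternNode:
    """One node of a hypothetical tree pattern."""
    tag: str
    capture: Optional[str] = None   # field name under which the text is stored
    regex: Optional[str] = None     # optional condition on the node's text
    children: list = field(default_factory=list)


def match(pattern, node):
    """Return the captured fields if node matches pattern, else None."""
    if node.tag != pattern.tag:
        return None
    text = (node.text or "").strip()
    if pattern.regex is not None:
        found = re.search(pattern.regex, text)
        if found is None:
            return None
        # Keep the first group if the regex defines one, else the whole match.
        text = found.group(1) if found.groups() else found.group(0)
    captured = {}
    if pattern.capture is not None:
        captured[pattern.capture] = text
    remaining = iter(node)  # child elements, consumed in document order
    for child_pattern in pattern.children:
        for candidate in remaining:
            result = match(child_pattern, candidate)
            if result is not None:
                captured.update(result)
                break
        else:
            return None  # no remaining sibling satisfied this child pattern
    return captured


def find_all(pattern, root):
    """Collect one record per element in the tree that matches the pattern."""
    records = []
    for node in root.iter(pattern.tag):
        result = match(pattern, node)
        if result is not None:
            records.append(result)
    return records


# Hypothetical sample data and rule.
PAGE = """<ul>
<li><a>Widget</a><span>Price: 12 EUR</span></li>
<li><a>Gadget</a><span>Out of stock</span></li>
<li><a>Gizmo</a><em>new</em><span>Price: 7 EUR</span></li>
</ul>"""

RULE = PatternNode("li", children=[
    PatternNode("a", capture="name"),
    PatternNode("span", capture="price", regex=r"Price: (\d+)"),
])
```

With this rule, `find_all(RULE, ET.fromstring(PAGE))` keeps the first and third list items and drops the second, whose `span` fails the regular expression.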

  4. DEiXTo Command Line Executor (CLE)
     • portable, efficient and fast command line executor of GUI DEiXTo generated wrappers
     • provides options and flexibility that you cannot get with GUI DEiXTo
     • supports additional output formats such as CSV, Excel and OpenDocument Spreadsheet
     • provides database support via DBI (the Database independent interface for Perl)
     • supports HTML output using an HTML template processor and an editable template file
     • overwrite, append and prepend output modes for all supported formats
     • can be scheduled to execute wrappers automatically (e.g. using cron in GNU/Linux)
     • it is free and open source, distributed under the GNU General Public License (GPL) Version 3!
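As an illustration of the three output modes mentioned above, the sketch below shows one plausible way overwrite, append and prepend could behave for a plain text output file. It is written in Python rather than Perl and is not taken from the CLE source; `write_output`, `demo` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def write_output(path, new_text, mode="overwrite"):
    """Write new_text to path using one of three assumed output modes."""
    target = Path(path)
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    if mode == "overwrite":
        combined = new_text             # discard earlier results
    elif mode == "append":
        combined = existing + new_text  # newest results last
    elif mode == "prepend":
        combined = new_text + existing  # newest results first
    else:
        raise ValueError(f"unknown output mode: {mode!r}")
    target.write_text(combined, encoding="utf-8")
    return combined


def demo():
    """Simulate three scheduled runs writing to the same output file."""
    with tempfile.TemporaryDirectory() as folder:
        out = Path(folder) / "results.txt"
        write_output(out, "run1\n")                  # file is created
        write_output(out, "run2\n", mode="append")   # run1, run2
        write_output(out, "run3\n", mode="prepend")  # run3, run1, run2
        return out.read_text(encoding="utf-8")
```

Formats with a header row or a root element (tab-delimited with column names, XML, spreadsheets) would need format-aware merging rather than plain string concatenation, which this sketch leaves out.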

  5. DEiXToBot
     • A Mechanize agent (essentially a browser emulator) capable of extracting data of interest.
     • Flexible and efficient.
     • Allows extensive customization.
     • Supports multiple patterns on a single page and the combination of their results.
     • Allows post-processing of the extracted data and enables you to transform it into any format you wish.
     • Programming skills are required to use it, though.
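The idea of running several patterns on one page, combining their results and post-processing them can be sketched as follows. This is a Python illustration only and does not reflect DEiXToBot's Perl API: the sample page, the pairing of results by position and the names `run_pattern`, `combine` and `post_process` are all assumptions.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical page where titles and dates live in separate containers.
PAGE = """<html><body>
<div id="titles"><h3>first post</h3><h3>second post</h3></div>
<div id="dates"><time>2011-03-01</time><time>2011-03-09</time></div>
</body></html>"""


def run_pattern(root, container_id, tag):
    """Toy pattern: the text of every <tag> inside <div id=container_id>."""
    container = root.find(f".//div[@id='{container_id}']")
    if container is None:
        return []
    return [(node.text or "").strip() for node in container.iter(tag)]


def combine(page):
    """Run two patterns on the same page and pair their results by position."""
    root = ET.fromstring(page)
    titles = run_pattern(root, "titles", "h3")
    dates = run_pattern(root, "dates", "time")
    return [{"title": t, "date": d} for t, d in zip(titles, dates)]


def post_process(records):
    """Normalize the combined records and emit them as JSON."""
    cleaned = [
        {"title": r["title"].title(), "year": int(r["date"][:4])}
        for r in records
    ]
    return json.dumps(cleaned, sort_keys=True)
```

Here `post_process(combine(PAGE))` yields a JSON array with title-cased titles and the year taken from each date. Any other target format could be produced at the same step.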

  6. Corgialenios Library use case: from unstructured HTML data to ESE format!

  7. DEiXTo Services
     • We can definitely help you to:
       • transform the contents of your digital library into OAI-PMH or another suitable format
       • quickly populate product catalogues with full specifications
       • search various web resources in real time and extract the results returned
       • prepare large, focused datasets for scientific tasks (e.g. data mining)
       • monitor prices of the competition
       • <your extraction task goes here!>

  8. Happy DEiXTo users! For further information, please visit http://deixto.com
