1 / 57

If you have the Content, then Apache has the Technology!

A whistle-stop tour of the Apache content related projects. If you have the Content, then Apache has the Technology!. Software Engineer Alfresco. Nick Burch. Apache Projects. 79 Top Level Projects 40 Incubating Projects 30 “Content Related” Main Projects

morgan
Download Presentation

If you have the Content, then Apache has the Technology!

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A whistle-stop tour of the Apache content related projects If you have the Content,then Apache has theTechnology!

  2. Software Engineer Alfresco Nick Burch

  3. Apache Projects 79 Top Level Projects 40 Incubating Projects 30 “Content Related” Main Projects 7 “Content Related” Incubating Projects

  4. 37 Projects in 50 minutes With time for questions... This is not a comprehensive guide!

  5. Different Technologies Serving Storing Transforming Generating Hosting Web Framework Rendering / Templating / etc

  6. What can we get in 50 mins? A quick overview of each project When talks on the project are happening When meetups on the project are happening Anything new/exciting about the project? What interests me in the project!

  7. Serving up your Content

  8. Apache HTTPD Serverhttp://httpd.apache.org/ Talks – All day WednesdayMeetup – Thursday evening Very wide range of features (Fairly) easy to extend Can host most programming languages Can front most content systems Can proxy your content applications Can host code and content

  9. Apache TrafficServerhttp://trafficserver.apache.org/ High performance web proxy Forward and reverse proxy Ideally suited to sitting between your content application and the internet For proxy-only use cases, will probably be better than httpd Fewer other features though Often used as a cloud-edge http router

  10. Apache Tomcathttp://tomcat.apache.org/ Talks – All day Friday! Java based, as many of the Apache Content Technologies are Java Servlet Container And you probably all know the rest!

  11. Tomcat – What's Newhttp://tomcat.apache.org/ Memory leak detection – for your applications, and for the JVM! Easier to embed – no need for large numbers of config files! Asynchronous request processing for things like Comet / Bayeux Servlet 3.0 Improved JMX configurability

  12. Storing all that Content

  13. Apache Cassandrahttp://cassandra.apache.org/ Talk - 11am WednesdayMeetup - Wednesday evening One of our many NoSQL Databases Column-Family store Eventually consistent Distributed, replicating, no SPF Can elastically add machines

  14. Apache CouchDBhttp://couchdb.apache.org/ 12pm Wednesday Relax! Erlang NoSQL Document orientated distributed store Eventually consistent if replicating Map-Reduce queries

  15. Apache HBasehttp://hbase.apache.org/ 2pm Wednesday Recently graduated from Hadoop Another NoSQL Database Column-Family store, modelled on Google's Big Table paper Some transactions and locking Fast range queries and sorting Built on HDFS

  16. Which Apache NoSQL? Do you have tuples, documents, variable key/values or complex object? Must data always be consistent? If you loose a chunk of machines (partition), should read/write still work? Query by id, range, arbitrary key/value or map-reduce function? How much human interaction is required to add or remove nodes?

  17. Apache DB: Derbyhttp://db.apache.org/derby/ Small, easy to embed SQL database Can be embedded and accessed via an embedded JDBC driver Can be accessed over the network Can be run entirely in-memory Efficient on-disk format Has a JavaME version – run it on basic cell phones!

  18. Apache Directoryhttp://directory.apache.org/ LDAP Directory Optimised for many reads per write Hierarchical, class/attribute based storage Triggers, stored procedures, queries and views Multi-master replication Rich permissions model built in

  19. Apache JackRabbithttp://jackrabbit.apache.org/ 1.30pm Thursday JCR (Java Content Repository) Hierarchical content store Supports structured and unstructured data Transactional Support versions Full text search built in

  20. Apache Lucenehttp://lucene.apache.org/ All day Friday + Meetup Tuesday night Inverted index store (Each term lists it documents, rather than each document listing terms) Searching is faster than adding Normally stores text, but additional data can be associated with it Can hold indexed and un-indexed data

  21. Lucene – What's New?http://lucene.apache.org/ Lucene and SOLR have merged Near real-time support when indexing Better storing of attributes and other data in the token stream Numeric fields improved – no need to externally process numbers into range buckets yourself Fast vector highlighter for large docs

  22. Apache Subversionhttp://subversion.apache.org/ Meetup Thursday evening Versioning content store Efficient at storing changes Normally stores code, text and the odd binary blob If you have textual data and you want a versioning store, it's a good fit! Used by the new Apache CMS

  23. Apache Xindicehttp://xml.apache.org/xindice/ Native XML Database No need to map your complex XML files to a different data structure Ideally suited to problems where you have large numbers of XML files, and little / no other content Schema independent model XPath queries

  24. Transforming and Reading Content

  25. Apache PDFBoxhttp://pdfbox.apache.org/ 4pm Wednesday Read, Write, Create and Edit PDFs Create PDFs from text Fill in PDF forms Extract text and formatting (Lucene, Tika etc) Edit existing files, add images, add text etc

  26. Apache POIhttp://poi.apache.org/ 3pm Wednesday + FastFeatherTrack File format reader and writer for Microsoft office file formats Support binary & ooxml formats Strong read edit write for .xls & .xlsx Read and basic edit for .doc & .docx Read and basic edit for .ppt & .pptx Read for Visio, Publisher, Outlook

  27. Apache Tikahttp://tika.apache.org/ 9am Friday + Fast Feather Track Java (+ command line) toolkit for detecting and extracting content Identifies what a blob of content is Gives you consistent metadata back for it Parses the contents into plain text, HTML, XHTML or sax events

  28. Tika – What's New?http://tika.apache.org/ Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc Long standing parsers improved – better HTML from word for example Embedded resources and containers Use expanding – used by many SOLR users, Alfresco, lots of people crunching masses of data on Hadoop

  29. Apache Cocoonhttp://cocoon.apache.org/ Component Pipeline framework Plug together “Lego-Like” generators, transformers and serialisers Generate your content once in your application, serve to different formats Read in formats, translate and publish Can power your own “Yahoo Pipes” Modular, powerful and easy

  30. Apache Xalanhttp://xalan.apache.org/ XSLT processor XPath engine Java and C++ flavours Cross platform Library and command line executables Transform your XML Fast and reliable XSLT transformation engine

  31. Apache XML Graphics: Batikhttp://xmlgraphics.apache.org/#batik Java SVG toolkit + library SVG Parser – read and process existing SVG files SVG Generator – Graphics2D implementation that outputs SVG SVG Dom – easy way to manipulate your SVG files SVG viewer program (Squiggle) Command line SVG rasteriser

  32. Apache XML Graphics: FOPhttp://xmlgraphics.apache.org/#fop XSL-FO processor in Java Reads W3C XSL-FO, applies the formatting rules to your XML document, and renders it Output to Text, PS, PDF, SVG, RTF, Java Graphics2D etc Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it

  33. Apache Commons: Codechttp://commons.apache.org/codec/ Commons Track – Thursday Morning Encode and decode a variety of encoding formats Base64, Hex, Phonetic and URLs Handy when interchanging content with external systems

  34. Apache Commons: Compresshttp://commons.apache.org/compress/ Commons Track – Thursday Morning Standard way to deal with archive formats Read and write support zip, tar, gzip, bzip, cpio and ar Wider range of capabilities than java.util.Zip Common API across all formats

  35. Apache Commons: Sanselanhttp://commons.apache.org/sanselan/ Commons Track – Thursday Morning Pure Java image reader and writer Fast parsing of image metadata and information (size, color space, icc etc) Much easier to use than ImageIO Slower though, as pure Java Wider range of formats supported PNG, GIF, TIFF, JPEG + Exif, BMP, ICO, PNM, PPM, PSD, XMP

  36. Generating Content

  37. Apache Forresthttp://forrest.apache.org/ Document rendering solution build on top of cocoon Reads in content in a variety of formats (xml, wiki etc), applies the appropriate formatting rules, then outputs to different formats Heavily used for documentation and websites eg read in a file, format as changelog and readme, output as html + pdf

  38. Apache Abderahttp://abdera.apache.org/ Atom – syndication and publishing High performance Java implementation of RFC 4287 + 5023 Generate Atom feeds from Java or by converting Parse and process Atom feeds Atompub server and clients Supports Atom extensions like GeoRSS, MediaRSS & OpenSearch

  39. Apache Droids (Incubating)http://incubator.apache.org/droids/ Intelligent Robots! Generic standalone crawler framework Easy to extending existing common crawlers Easy to write custom ones Queue requests for content, protocol handler gets it, multi threaded Uses Apache Tika for core of handling fetched resources

  40. Apache JSPWiki (Incubating)http://incubator.apache.org/jspwiki/ Feature-rich extensible wiki Written in Java (Servlets + JSP) Fairly easy to extend Can be used as a wiki out of the box Provides a good platform for new wiki based application Rich wiki markup and syntax Attachments, security, templates etc

  41. Apache ManifoldCF (Incubating)http://incubator.apache.org/connectors/ Name has changed a few times... (Lucene/Apache Connectors) Provides a standard way to get content out of other systems, ready for sending to Lucene etc Different goals to CMIS (Chemistry) Uses many parsers and libraries to talk to the different repositories / systems Analogous to Tika but for repos

  42. Apache PhotArk (Incubating)http://incubator.apache.org/photark/ 5pm Thursday Open Source Photo Gallery application Standalone or servlet modes Can host photos locally Can aggregate external photo albums (Flickr, Picassa) for a unified view SCA programming model – uses Apache Tuscany to power it

  43. Hosting Content

  44. Apache Chemistry (Incubating)http://incubator.apache.org/chemistry/ 2pm Wednesday Java, Python and PHP, Atom and WS* OASIS CMIS (Content Management Interoperability Services) Client and Server bindings “SQL for Content” Consistent view on content across different repositories Read / Write / Manipulate content

  45. Chemistry vs ManifoldCFincubator /chemistry/ /connectors/ ManifoldCF treats repo as nasty black box, and handles talking to the parsers Chemistry talks / exposes repo's contents through CMIS ManifoldCF supports a wider range of repositories Chemistry supports read and write Chemistry delivers a richer model ManifoldCF great for getting text out

  46. Apache Lenyahttp://lenya.apache.org/ 9am Thursday XML Content Management system Powered by Apache Cocoon WSIWYG editors onto Relax-NG XML Rich workflow engine + staging Clean URLs, CSS for styling Sensible handling of metadata, assets, internal links, users, permissions etc

  47. Apache Rollerhttp://roller.apache.org/ Multi-user blog server Used by the ASF internally Scales to thousands of users & blogs Should work with any JavaEE servlet container and SQL database Comment moderation and spam filters Each author has full layout control Indexes, feeds and Metaweblog API support for 3rd party clients

  48. Apache Shindighttp://shindig.apache.org/ Open Social Application Container Hosts your open social widgets Renders OpenSocial applications into HTML + JavaScript Stores the data for your application Full client-side JavaScript libraries to deliver gadget functionality Reference implementation

  49. Apache Wookie (Incubating)http://incubator.apache.org/wookie/ 5.30pm Wednesday W3C Widgets server Upload, Deploy and Host Widgets Widgets can range from a badge, through a small app to a full-blown collaborative system like chat Connector framework to make it easy to write widgets in many languages

  50. Web Frameworks (those with a strong Content focus to them)

More Related