An overview of all the different content-related technologies at the Apache Software Foundation
Talk from ApacheCon NA 2010 in Atlanta in November 2010
6. What can we get in 50 mins?
• A quick overview of each project
• When talks on the project are happening
• When meetups on the project are happening
• Anything new/exciting about the project?
• What interests me in the project!
8. Apache HTTP Server (httpd)
http://httpd.apache.org/
• Talks – All day Wednesday + Meetup – Thursday evening
• Very wide range of features
• (Fairly) easy to extend
• Can host most programming languages
• Can front most content systems
• Can proxy your content applications
• Can host code and content
9. Apache Traffic Server
http://trafficserver.apache.org/
• High performance web proxy
• Forward and reverse proxy
• Ideally suited to sitting between your content application and the internet
• For proxy-only use cases, it will probably perform better than httpd
• Has fewer other features, though
• Often used as a cloud-edge HTTP router
10. Apache Tomcat
http://tomcat.apache.org/
• Talks – All day Friday!
• Java based, as are many of the Apache Content Technologies
• Java Servlet Container
• And you probably all know the rest!
11. Tomcat – What's New
http://tomcat.apache.org/
• Memory leak detection – for your applications, and for the JVM!
• Easier to embed – no need for large numbers of config files!
• Asynchronous request processing for things like Comet / Bayeux
• Servlet 3.0 support
• Improved JMX configurability
15. Apache HBase
http://hbase.apache.org/
• 2pm Wednesday
• Recently graduated from Hadoop subproject to top-level project
• Another NoSQL database
• Column-family store, modelled on Google's Bigtable paper
• Row-level transactions and locking
• Fast range queries and sorting
• Built on HDFS
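To make "column-family store" and "fast range queries" a bit more concrete, here is a conceptual Java sketch of the logical data model: sorted row keys, then column family, then column qualifier. It is not the HBase client API (the class and method names are made up for illustration), and it ignores cell timestamps/versions, regions and HDFS entirely. It only shows why keeping rows sorted by key turns a range query into a cheap, contiguous read.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Conceptual sketch only: NOT the HBase client API.
// Logical layout: row key -> column family -> qualifier -> value, all sorted.
public class ColumnFamilySketch {
    // Row keys kept in sorted order, so any key range is contiguous.
    private final NavigableMap<String, Map<String, NavigableMap<String, String>>> rows =
            new TreeMap<>();

    public void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    public String get(String row, String family, String qualifier) {
        Map<String, NavigableMap<String, String>> families = rows.get(row);
        if (families == null) {
            return null;
        }
        NavigableMap<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    // Range scan: start row inclusive, stop row exclusive (the convention HBase scans use).
    public NavigableMap<String, Map<String, NavigableMap<String, String>>> scan(
            String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        ColumnFamilySketch table = new ColumnFamilySketch();
        table.put("user#0001", "info", "name", "Ada");
        table.put("user#0002", "info", "name", "Grace");
        table.put("user#0002", "stats", "logins", "42");
        table.put("user#0003", "info", "name", "Linus");
        System.out.println(table.scan("user#0001", "user#0003").keySet());
        System.out.println(table.get("user#0002", "stats", "logins"));
    }
}
```

Choosing row keys so related rows sort next to each other (the `user#` prefix here) is what lets a scan touch only the rows it needs.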
16. Which Apache NoSQL?
• Do you have tuples, documents, variable key/values or complex objects?
• Must data always be consistent?
• If you lose a chunk of machines (partition), should reads/writes still work?
• Query by id, range, arbitrary key/value or map-reduce function?
• How much human interaction is required to add or remove nodes?
17. Apache DB: Derby
http://db.apache.org/derby/
• Small, easy to embed SQL database
• Can be embedded and accessed via an embedded JDBC driver
• Can be accessed over the network
• Can be run entirely in-memory
• Efficient on-disk format
• Has a Java ME version – run it on basic cell phones!
18. Apache Directory
http://directory.apache.org/
• LDAP Directory
• Optimised for many reads per write
• Hierarchical, class/attribute-based storage
• Triggers, stored procedures, queries and views
• Multi-master replication
• Rich permissions model built in
19. Apache Jackrabbit
http://jackrabbit.apache.org/
• 1.30pm Thursday
• JCR (Java Content Repository)
• Hierarchical content store
• Supports structured and unstructured data
• Transactional
• Supports versioning
• Full text search built in
20. Apache Lucene
http://lucene.apache.org/
• All day Friday + Meetup Tuesday night
• Inverted index store
• (Each term lists its documents, rather than each document listing its terms)
• Searching is faster than adding
• Normally stores text, but additional data can be associated with it
• Can hold indexed and un-indexed data
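The inverted-index idea on this slide can be sketched in a few lines of plain Java. This is a toy, not Lucene's API: `InvertedIndexSketch`, its naive tokeniser and `searchAll` are hypothetical names. Each term maps to a sorted set of document ids (its postings list), so a query becomes a map lookup plus a set intersection rather than a scan over every document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index, NOT Lucene's API: term -> sorted ids of the documents containing it.
public class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Naive analysis: lower-case, then split on anything that is not a letter or digit.
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Boolean AND query: intersect the postings lists of every query term.
    public SortedSet<Integer> searchAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = postings.getOrDefault(
                    term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Apache Lucene is a search library");
        index.addDocument("Apache Solr is built on Lucene");
        index.addDocument("Apache Tika extracts text");
        System.out.println(index.searchAll("apache", "lucene")); // [0, 1]
    }
}
```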
21. Lucene – What's New?
http://lucene.apache.org/
• Lucene and Solr development have merged
• Near real-time search support while indexing
• Better storing of attributes and other data in the token stream
• Numeric fields improved – no need to externally process numbers into range buckets yourself
• Fast vector highlighter for large docs
22. Apache Subversion
http://subversion.apache.org/
• Meetup Thursday evening
• Versioning content store
• Efficient at storing changes
• Normally stores code, text and the odd binary blob
• If you have textual data and you want a versioning store, it's a good fit!
• Used by the new Apache CMS
23. Apache Xindice
http://xml.apache.org/xindice/
• Native XML Database
• No need to map your complex XML files to a different data structure
• Ideally suited to problems where you have large numbers of XML files, and little / no other content
• Schema independent model
• XPath queries
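A native XML database lets you query the stored documents directly with XPath. Xindice's own client interface (the XML:DB API) is not shown here; as a dependency-free stand-in, this sketch runs the same kind of XPath query with the JDK's built-in `javax.xml.xpath` package. The catalog document, class and method names are hypothetical examples.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Uses the JDK's javax.xml.xpath API, NOT Xindice's XML:DB interface,
// to show the style of query a native XML database answers.
public class XPathQuerySketch {
    public static List<String> titlesCheaperThan(String xml, int maxPrice) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The query runs against the XML as-is: no mapping to tables or objects first.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/catalog/book[price < " + maxPrice + "]/title", doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<book><title>Native XML</title><price>15</price></book>"
                + "<book><title>XPath in Depth</title><price>30</price></book>"
                + "</catalog>";
        System.out.println(titlesCheaperThan(xml, 20)); // [Native XML]
    }
}
```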
25. Apache PDFBox
http://pdfbox.apache.org/
• 4pm Wednesday
• Read, Write, Create and Edit PDFs
• Create PDFs from text
• Fill in PDF forms
• Extract text and formatting (for Lucene, Tika etc.)
• Edit existing files, add images, add text etc.
26. Apache POI
http://poi.apache.org/
• 3pm Wednesday + Fast Feather Track
• File format reader and writer for Microsoft Office file formats
• Supports binary & OOXML formats
• Strong read, edit and write support for .xls & .xlsx
• Read and basic edit for .doc & .docx
• Read and basic edit for .ppt & .pptx
• Read support for Visio, Publisher, Outlook
27. Apache Tika
http://tika.apache.org/
• 9am Friday + Fast Feather Track
• Java (+ command line) toolkit for detecting and extracting content
• Identifies what a blob of content is
• Gives you consistent metadata back for it
• Parses the contents into plain text, HTML, XHTML or SAX events
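Tika's real entry point is its `org.apache.tika.Tika` facade (with `detect(...)` and `parseToString(...)` methods), which needs the Tika jars. To show the simplest idea behind "identifies what a blob of content is" without that dependency, here is a hypothetical magic-byte sniffer that compares the first few bytes against well-known file signatures. Tika itself also uses file names and looks inside container formats, so treat this as only the first layer of what it does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical magic-byte sniffer, NOT Tika's API. Real detectors combine several
// strategies (magic bytes, file-name globs, container inspection); this shows only the first.
public class MagicSniffer {
    private static final class Signature {
        final String mimeType;
        final byte[] magic;

        Signature(String mimeType, int... magic) {
            this.mimeType = mimeType;
            this.magic = new byte[magic.length];
            for (int i = 0; i < magic.length; i++) {
                this.magic[i] = (byte) magic[i];
            }
        }

        // True if the content starts with this signature's bytes.
        boolean matches(byte[] head) {
            if (head.length < magic.length) {
                return false;
            }
            for (int i = 0; i < magic.length; i++) {
                if (head[i] != magic[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    private static final Signature[] SIGNATURES = {
        new Signature("application/pdf", '%', 'P', 'D', 'F', '-'),
        new Signature("image/png", 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A),
        new Signature("image/gif", 'G', 'I', 'F', '8'),
        new Signature("image/jpeg", 0xFF, 0xD8, 0xFF),
        // ZIP local file header. OOXML (.docx/.xlsx) files are ZIP containers too,
        // which is why real detectors also look inside the container.
        new Signature("application/zip", 'P', 'K', 0x03, 0x04),
    };

    public static String detect(byte[] head) {
        for (Signature s : SIGNATURES) {
            if (s.matches(head)) {
                return s.mimeType;
            }
        }
        return "application/octet-stream"; // unknown: fall back to generic binary
    }

    public static void main(String[] args) {
        System.out.println(detect("%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(detect(new byte[] {0x01, 0x02}));
    }
}
```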
28. Tika – What's New?
http://tika.apache.org/
• Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc.
• Long-standing parsers improved – better HTML from Word, for example
• Support for embedded resources and containers
• Use expanding – used by many Solr users, Alfresco, and lots of people crunching masses of data on Hadoop
29. Apache Cocoon
http://cocoon.apache.org/
• Component Pipeline framework
• Plug together “Lego-like” generators, transformers and serialisers
• Generate your content once in your application, serve it in different formats
• Read in formats, translate and publish
• Can power your own “Yahoo Pipes”
• Modular, powerful and easy
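A hypothetical Java sketch of the "Lego-like" pipeline: a generator produces content, transformers rewrite it, and a serialiser renders it. Real Cocoon components pass SAX events and are wired together in a sitemap; here content is just a `List<String>` and every name (`PipelineSketch`, `run`, the serialisers) is made up for illustration. Swapping only the serialiser shows the "generate once, serve to different formats" point.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch of generator -> transformer(s) -> serialiser, NOT Cocoon's API.
public class PipelineSketch {
    // A transformer: content in, rewritten content out.
    public static final UnaryOperator<List<String>> UPPER_CASE = items -> items.stream()
            .map(s -> s.toUpperCase(Locale.ROOT))
            .collect(Collectors.toList());

    // Two serialisers rendering the same content in different formats.
    public static final Function<List<String>, String> HTML_SERIALISER = items -> items.stream()
            .map(s -> "<li>" + s + "</li>")
            .collect(Collectors.joining("", "<ul>", "</ul>"));

    public static final Function<List<String>, String> CSV_SERIALISER =
            items -> String.join(",", items);

    public static String run(Supplier<List<String>> generator,
                             List<UnaryOperator<List<String>>> transformers,
                             Function<List<String>, String> serialiser) {
        List<String> content = generator.get();
        for (UnaryOperator<List<String>> transformer : transformers) {
            content = transformer.apply(content);
        }
        return serialiser.apply(content);
    }

    public static void main(String[] args) {
        // Generate the content once...
        Supplier<List<String>> generator = () -> Arrays.asList("cocoon", "forrest", "lenya");
        List<UnaryOperator<List<String>>> transformers = Arrays.asList(UPPER_CASE);
        // ...and serve it in different formats by swapping only the serialiser.
        System.out.println(run(generator, transformers, HTML_SERIALISER));
        System.out.println(run(generator, transformers, CSV_SERIALISER));
    }
}
```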
30. Apache Xalan
http://xalan.apache.org/
• XSLT processor
• XPath engine
• Java and C++ flavours
• Cross platform
• Library and command line executables
• Transform your XML
• Fast and reliable XSLT transformation engine
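Xalan-J plugs in behind the standard JAXP `javax.xml.transform` API, so application code typically looks like the sketch below. Assumptions: the stylesheet, input document and class name are made-up examples, and which processor actually runs depends on the classpath and the `javax.xml.transform.TransformerFactory` system property; the JDK's built-in default processor is itself derived from Xalan.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Standard JAXP usage; the concrete XSLT processor is chosen by TransformerFactory.
public class XsltSketch {
    // Tiny example stylesheet: list each project's name attribute as plain text.
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/projects'>"
          + "<xsl:for-each select='project'>"
          + "<xsl:value-of select='@name'/><xsl:text>;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    public static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
                "<projects><project name='Xalan'/><project name='FOP'/></projects>"));
    }
}
```

Compiling the stylesheet once (via `TransformerFactory.newTemplates`) and reusing it is the usual next step when transforming many documents.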
31. Apache XML Graphics: Batik
http://xmlgraphics.apache.org/#batik
• Java SVG toolkit + library
• SVG Parser – read and process existing SVG files
• SVG Generator – Graphics2D implementation that outputs SVG
• SVG DOM – easy way to manipulate your SVG files
• SVG viewer program (Squiggle)
• Command line SVG rasteriser
32. Apache XML Graphics: FOP
http://xmlgraphics.apache.org/#fop
• XSL-FO processor in Java
• Reads W3C XSL-FO, applies the formatting rules to your XML document, and renders it
• Output to text, PS, PDF, SVG, RTF, Java Graphics2D etc.
• Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it
33. Apache Commons: Codec
http://commons.apache.org/codec/
• Commons Track – Thursday Morning
• Encode and decode a variety of encoding formats
• Base64, hex, phonetic and URL encodings
• Handy when interchanging content with external systems
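Base64 and hex are the everyday cases. Commons Codec provides them as `org.apache.commons.codec.binary.Base64` and `Hex`; to keep this sketch dependency-free it uses JDK 8's `java.util.Base64` and a hand-rolled hex encoder instead, so the class and method names here are illustrative and are not Codec's.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Dependency-free stand-in using the JDK, NOT the Commons Codec API.
public class EncodingSketch {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    public static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static byte[] fromBase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    // Each byte becomes two lower-case hex digits, high nibble first.
    public static String toHex(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            int v = data[i] & 0xFF; // treat the byte as unsigned
            out[2 * i] = HEX_DIGITS[v >>> 4];
            out[2 * i + 1] = HEX_DIGITS[v & 0x0F];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        byte[] data = "Hi!".getBytes(StandardCharsets.UTF_8);
        System.out.println(toBase64(data)); // SGkh
        System.out.println(toHex(data));    // 486921
        System.out.println(new String(fromBase64("SGkh"), StandardCharsets.UTF_8)); // Hi!
    }
}
```

Phonetic encoders (Soundex and friends) have no JDK equivalent, which is one reason to reach for Codec itself.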
34. Apache Commons: Compress
http://commons.apache.org/compress/
• Commons Track – Thursday Morning
• Standard way to deal with archive formats
• Read and write support
• zip, tar, gzip, bzip2, cpio and ar
• Wider range of capabilities than java.util.zip
• Common API across all formats
35. Apache Commons: Sanselan
http://commons.apache.org/sanselan/
• Commons Track – Thursday Morning
• Pure Java image reader and writer
• Fast parsing of image metadata and information (size, color space, ICC etc.)
• Much easier to use than ImageIO
• Slower though, as it's pure Java
• Wider range of formats supported
• PNG, GIF, TIFF, JPEG + Exif, BMP, ICO, PNM, PPM, PSD, XMP
37. Apache Forrest
http://forrest.apache.org/
• Document rendering solution built on top of Cocoon
• Reads in content in a variety of formats (XML, wiki etc.), applies the appropriate formatting rules, then outputs to different formats
• Heavily used for documentation and websites
• e.g. read in a file, format it as a changelog and README, output as HTML + PDF
38. Apache Abdera
http://abdera.apache.org/
• Atom – syndication and publishing
• High-performance Java implementation of RFC 4287 + 5023
• Generate Atom feeds from Java or by converting
• Parse and process Atom feeds
• AtomPub server and clients
• Supports Atom extensions like GeoRSS, MediaRSS & OpenSearch
39. Apache Droids (Incubating)
http://incubator.apache.org/droids/
• Intelligent Robots!
• Generic standalone crawler framework
• Easy to extend existing common crawlers
• Easy to write custom ones
• Queue requests for content, a protocol handler fetches them, multi-threaded
• Uses Apache Tika at the core of handling fetched resources
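The "queue requests, protocol handler fetches them, multi-threaded" design can be sketched with a `BlockingQueue` and a thread pool. This is not the Droids API: `CrawlQueueSketch` and `crawl` are hypothetical, the handler is a stand-in function (no real network I/O), and the seed list is fixed. A real crawler also feeds discovered links back into the queue, which needs a smarter stopping rule than "the queue is empty", and would hand fetched content to Tika for parsing.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch, NOT the Droids API. The handler stands in for a protocol
// handler (http, file, ...) and must return non-null content.
public class CrawlQueueSketch {
    public static Map<String, String> crawl(Collection<String> seeds,
                                            Function<String, String> handler,
                                            int threads) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(seeds);
        Map<String, String> fetched = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                String url;
                // Each worker keeps taking requests until the (fixed) queue is drained.
                while ((url = queue.poll()) != null) {
                    fetched.put(url, handler.apply(url));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fetched;
    }

    public static void main(String[] args) {
        Map<String, String> pages = crawl(
                Arrays.asList("http://example.org/a", "http://example.org/b", "http://example.org/c"),
                url -> "<html>content of " + url + "</html>", // fake fetch, no network I/O
                2);
        System.out.println(pages.size() + " pages fetched");
    }
}
```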
40. Apache JSPWiki (Incubating)
http://incubator.apache.org/jspwiki/
• Feature-rich extensible wiki
• Written in Java (Servlets + JSP)
• Fairly easy to extend
• Can be used as a wiki out of the box
• Provides a good platform for new wiki-based applications
• Rich wiki markup and syntax
• Attachments, security, templates etc
41. Apache ManifoldCF (Incubating)
http://incubator.apache.org/connectors/
• Name has changed a few times... (Lucene/Apache Connectors)
• Provides a standard way to get content out of other systems, ready for sending to Lucene etc.
• Different goals to CMIS (Chemistry)
• Uses many parsers and libraries to talk to the different repositories / systems
• Analogous to Tika but for repos
42. Apache PhotArk (Incubating)
http://incubator.apache.org/photark/
• 5pm Thursday
• Open Source Photo Gallery application
• Standalone or servlet modes
• Can host photos locally
• Can aggregate external photo albums (Flickr, Picasa) for a unified view
• SCA programming model – uses Apache Tuscany to power it
44. Apache Chemistry (Incubating)
http://incubator.apache.org/chemistry/
• 2pm Wednesday
• Java, Python and PHP; AtomPub and WS-*
• OASIS CMIS (Content Management Interoperability Services)
• Client and Server bindings
• “SQL for Content”
• Consistent view on content across different repositories
• Read / Write / Manipulate content
45. Chemistry vs ManifoldCF
http://incubator.apache.org/chemistry/ + http://incubator.apache.org/connectors/
• ManifoldCF treats the repo as a nasty black box, and handles talking to the parsers
• Chemistry talks to / exposes a repo's contents through CMIS
• ManifoldCF supports a wider range of repositories
• Chemistry supports read and write
• Chemistry delivers a richer model
• ManifoldCF great for getting text out
46. Apache Lenya
http://lenya.apache.org/
• 9am Thursday
• XML Content Management system
• Powered by Apache Cocoon
• WYSIWYG editors for RELAX NG-validated XML
• Rich workflow engine + staging
• Clean URLs, CSS for styling
• Sensible handling of metadata, assets, internal links, users, permissions etc.
47. Apache Roller
http://roller.apache.org/
• Multi-user blog server
• Used by the ASF internally
• Scales to thousands of users & blogs
• Should work with any Java EE servlet container and SQL database
• Comment moderation and spam filters
• Each author has full layout control
• Indexes, feeds and MetaWeblog API support for third-party clients
48. Apache Shindig
http://shindig.apache.org/
• OpenSocial Application Container
• Hosts your OpenSocial widgets
• Renders OpenSocial applications into HTML + JavaScript
• Stores the data for your application
• Full client-side JavaScript libraries to deliver gadget functionality
• Reference implementation of OpenSocial
49. Apache Wookie (Incubating)
http://incubator.apache.org/wookie/
• 5.30pm Wednesday
• W3C Widgets server
• Upload, Deploy and Host Widgets
• Widgets can range from a badge, through a small app, to a full-blown collaborative system like chat
• Connector framework to make it easy to write widgets in many languages
51. Apache Sling
http://sling.apache.org/
• 12pm Wednesday
• “Fun” and easy web framework
• REST based
• Backed by Jackrabbit content repo
• Powered by OSGi
• Easy to script, supports multiple scripting languages (JSP, server-side JavaScript, Scala etc.)
• Stores both templates and content
52. Apache Tapestry
http://tapestry.apache.org/
• Object Orientated web applications
• Build your application in terms of objects, methods and properties
• Tapestry handles URLs, query parameters and state for you
• Pages built with simple HTML
• Concentrate on the content that backs each part, and the business logic for it
• Tapestry glues it together for you
53. Apache Tiles
http://tiles.apache.org/
• Templating framework for Java
• Works well with Struts and Shale
• Lets you build your page from lots of tiles (components), which can nest
• Build tiles together to make templates
• Clean separation between your content, the business logic to select it, and the rendering rules
54. Apache Velocity
http://velocity.apache.org/
• Templating engine
• MVC webapp or standalone
• Can generate HTML, SQL, PostScript, XML, Java code or email from templates
• Anakia lets you make an xdoc file available to a Velocity template, handy when generating HTML from xdoc
• Fairly rich templating language
55. Apache Wicket
http://wicket.apache.org/
• Build your web applications in Java
• Uses Java in preference to JavaScript, CSS etc.
• Handy if you have a strong Java team and you need to do some web stuff
• Fits well with your Java components
• But JS / CSS front-end devs tend to be cheaper than Java ones...
56. Apache Clerezza (Incubating)
http://incubator.apache.org/clerezza/
• OSGi-based modular semantic web application framework
• Lets you build applications that fit into the Semantic Web
• Stores and easily manipulates RDF
• Full control over REST and URIs
• Build applications that both consume semantic data (e.g. RDF files) and expose content to others