If you have the Content,
then Apache has the
Technology!
A whistle-stop tour of the
Apache content related projects
Nick Burch
Software Engineer
Alfresco
Apache Projects
• 79 Top Level Projects
• 40 Incubating Projects
• 30 “Content Related” Main Projects
• 7 “Content Related” Incubating
Projects
37 Projects in 50 minutes
With time for questions...
This is not a comprehensive guide!
Different Technologies
• Serving
• Storing
• Transforming
• Generating
• Hosting
• Web Framework Rendering /
Templating / etc
What can we get in 50 mins?
• A quick overview of each project
• When talks on the project are
happening
• When meetups on the project are
happening
• Anything new/exciting about the
project?
• What interests me in the project!
Serving up
your Content
Apache HTTPD Server
http://httpd.apache.org/
• Talks – All day Wednesday
Meetup – Thursday evening
• Very wide range of features
• (Fairly) easy to extend
• Can host most programming
languages
• Can front most content systems
• Can proxy your content applications
• Can host code and content
Apache TrafficServer
http://trafficserver.apache.org/
• High performance web proxy
• Forward and reverse proxy
• Ideally suited to sitting between your
content application and the internet
• For proxy-only use cases, will probably
be better than httpd
• Fewer other features though
• Often used as a cloud-edge http router
Apache Tomcat
http://tomcat.apache.org/
• Talks – All day Friday!
• Java based, as many of the Apache
Content Technologies are
• Java Servlet Container
• And you probably all know the rest!
Tomcat – What's New
http://tomcat.apache.org/
• Memory leak detection – for your
applications, and for the JVM!
• Easier to embed – no need for large
numbers of config files!
• Asynchronous request processing for
things like Comet / Bayeux
• Servlet 3.0
• Improved JMX configurability
Storing all
that Content
Apache Cassandra
http://cassandra.apache.org/
• Talk - 11am Wednesday
Meetup - Wednesday evening
• One of our many NoSQL Databases
• Column-Family store
• Eventually consistent
• Distributed, replicating, no SPOF (single point of failure)
• Can elastically add machines
Apache CouchDB
http://couchdb.apache.org/
• 12pm Wednesday
• Relax!
• Erlang
• NoSQL
• Document orientated distributed store
• Eventually consistent if replicating
• Map-Reduce queries
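To make "Map-Reduce queries" concrete, here is a minimal sketch of the idea in plain Python. It is not CouchDB's API: CouchDB views are normally written as JavaScript functions and run by the database, and the documents, field names and function names below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents; in CouchDB these would be JSON documents in a database.
docs = [
    {"_id": "a", "type": "post", "author": "nick", "words": 120},
    {"_id": "b", "type": "post", "author": "nick", "words": 80},
    {"_id": "c", "type": "post", "author": "jane", "words": 200},
    {"_id": "d", "type": "comment", "author": "jane", "words": 15},
]

def map_doc(doc):
    """Map step: emit (key, value) pairs for the documents of interest."""
    if doc["type"] == "post":
        yield doc["author"], doc["words"]

def reduce_values(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def run_view(documents):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}

word_counts = run_view(docs)
print(word_counts)  # {'jane': 200, 'nick': 200}
```

The map step decides which documents contribute and under which key; the reduce step only ever sees the values for one key, which is what lets the work be spread across machines.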
Apache HBase
http://hbase.apache.org/
• 2pm Wednesday
• Recently graduated from Hadoop
• Another NoSQL Database
• Column-Family store, modelled on
Google's Big Table paper
• Some transactions and locking
• Fast range queries and sorting
• Built on HDFS
Which Apache NoSQL?
• Do you have tuples, documents,
variable key/values or complex objects?
• Must data always be consistent?
• If you lose a chunk of machines
(partition), should read/write still work?
• Query by id, range, arbitrary key/value
or map-reduce function?
• How much human interaction is
required to add or remove nodes?
Apache DB: Derby
http://db.apache.org/derby/
• Small, easy to embed SQL database
• Can be embedded and accessed via
an embedded JDBC driver
• Can be accessed over the network
• Can be run entirely in-memory
• Efficient on-disk format
• Has a JavaME version – run it on
basic cell phones!
Apache Directory
http://directory.apache.org/
• LDAP Directory
• Optimised for many reads per write
• Hierarchical, class/attribute based
storage
• Triggers, stored procedures, queries
and views
• Multi-master replication
• Rich permissions model built in
Apache Jackrabbit
http://jackrabbit.apache.org/
• 1.30pm Thursday
• JCR (Java Content Repository)
• Hierarchical content store
• Supports structured and unstructured
data
• Transactional
• Supports versions
• Full text search built in
Apache Lucene
http://lucene.apache.org/
• All day Friday + Meetup Tuesday night
• Inverted index store
• (Each term lists its documents, rather
than each document listing terms)
• Searching is faster than adding
• Normally stores text, but additional
data can be associated with it
• Can hold indexed and un-indexed data
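The inverted index idea can be sketched in a few lines of plain Python. This is only an illustration of the data structure, not Lucene's API or its on-disk format; the tokenizer is deliberately naive and all names are hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Very naive analysis: lowercase and split on whitespace."""
    return text.lower().split()

def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

documents = {
    1: "Apache Lucene stores an inverted index",
    2: "Apache Tika extracts text for the index",
    3: "Apache Tomcat serves content",
}
index = build_index(documents)
print(sorted(search(index, "apache index")))  # [1, 2]
print(sorted(search(index, "tomcat")))        # [3]
```

A search is a handful of dictionary lookups and set intersections, while adding a document touches one entry per term, which hints at why searching is faster than adding.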
Lucene – What's New?
http://lucene.apache.org/
• Lucene and SOLR have merged
• Near real-time support when indexing
• Better storing of attributes and other
data in the token stream
• Numeric fields improved – no need to
externally process numbers into range
buckets yourself
• Fast vector highlighter for large docs
Apache Subversion
http://subversion.apache.org/
• Meetup Thursday evening
• Versioning content store
• Efficient at storing changes
• Normally stores code, text and the odd
binary blob
• If you have textual data and you want
a versioning store, it's a good fit!
• Used by the new Apache CMS
Apache Xindice
http://xml.apache.org/xindice/
• Native XML Database
• No need to map your complex XML
files to a different data structure
• Ideally suited to problems where you
have large numbers of XML files, and
little / no other content
• Schema independent model
• XPath queries
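As a rough feel for querying XML directly with XPath, the sketch below uses Python's standard library `xml.etree.ElementTree`, which supports only a subset of XPath. It is not Xindice's API, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML content; a native XML database would hold many such files.
XML_DOC = """<library>
  <book lang="en"><title>Content at Apache</title><year>2010</year></book>
  <book lang="fr"><title>Le Contenu</title><year>2009</year></book>
  <book lang="en"><title>Searching Text</title><year>2008</year></book>
</library>"""

root = ET.fromstring(XML_DOC)

# Select by attribute value: titles of all English-language books.
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# Select by child element text: titles of books from 2009.
titles_2009 = [t.text for t in root.findall("./book[year='2009']/title")]

print(english_titles)  # ['Content at Apache', 'Searching Text']
print(titles_2009)     # ['Le Contenu']
```

The queries run against the XML structure as it stands, with no mapping to tables or objects first.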
Transforming and
Reading Content
Apache PDFBox
http://pdfbox.apache.org/
• 4pm Wednesday
• Read, Write, Create and Edit PDFs
• Create PDFs from text
• Fill in PDF forms
• Extract text and formatting (Lucene,
Tika etc)
• Edit existing files, add images, add text
etc
Apache POI
http://poi.apache.org/
• 3pm Wednesday + FastFeatherTrack
• File format reader and writer for
Microsoft office file formats
• Supports binary & OOXML formats
• Strong read edit write for .xls & .xlsx
• Read and basic edit for .doc & .docx
• Read and basic edit for .ppt & .pptx
• Read for Visio, Publisher, Outlook
Apache Tika
http://tika.apache.org/
• 9am Friday + Fast Feather Track
• Java (+ command line) toolkit for
detecting and extracting content
• Identifies what a blob of content is
• Gives you consistent metadata back
for it
• Parses the contents into plain text,
HTML, XHTML or SAX events
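"Identifies what a blob of content is" usually starts with looking at the first few bytes. The sketch below shows that one technique in plain Python; it is not Tika's API, a real detector considers far more than this, and the function and table names are hypothetical. The signatures themselves are the well-known leading bytes of each format.

```python
# (leading bytes, media type) pairs for a few common formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]

def detect_type(data: bytes) -> str:
    """Return a media type based on the leading bytes, with a generic fallback."""
    for magic, media_type in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return media_type
    return "application/octet-stream"

print(detect_type(b"%PDF-1.4\n..."))                     # application/pdf
print(detect_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))   # image/png
print(detect_type(b"hello"))                             # application/octet-stream
```

Note that a zip signature alone cannot tell a plain archive from a container format built on zip, which is one reason a toolkit that looks inside containers is useful.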
Tika – What's New?
http://tika.apache.org/
• Lots of new parsers – text, office
formats, publishing formats, images,
audio, CAD, fonts etc
• Long standing parsers improved –
better HTML from word for example
• Embedded resources and containers
• Usage expanding – used by many SOLR
users, Alfresco, lots of people
crunching masses of data on Hadoop
Apache Cocoon
http://cocoon.apache.org/
• Component Pipeline framework
• Plug together “Lego-Like” generators,
transformers and serialisers
• Generate your content once in your
application, serve to different formats
• Read in formats, translate and publish
• Can power your own “Yahoo Pipes”
• Modular, powerful and easy
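The generator / transformer / serialiser split can be sketched with plain Python functions. This only illustrates the pipeline pattern; it is not Cocoon's API or its sitemap configuration, and every name here is hypothetical.

```python
import json

def generate(records):
    """Generator: produce the content once, as a stream of plain dicts."""
    for record in records:
        yield dict(record)  # copy, so the source records are left untouched

def uppercase_titles(items):
    """Transformer: modify each item as it flows through the pipeline."""
    for item in items:
        item["title"] = item["title"].upper()
        yield item

def serialize_html(items):
    """Serialiser: render the stream as an HTML list."""
    return "<ul>" + "".join(f"<li>{item['title']}</li>" for item in items) + "</ul>"

def serialize_json(items):
    """Serialiser: render the same stream as JSON."""
    return json.dumps(list(items))

def run_pipeline(source, transformers, serializer):
    """Plug the components together: generate, transform in order, serialise."""
    stream = generate(source)
    for transform in transformers:
        stream = transform(stream)
    return serializer(stream)

records = [{"title": "httpd"}, {"title": "tomcat"}]
html_out = run_pipeline(records, [uppercase_titles], serialize_html)
json_out = run_pipeline(records, [uppercase_titles], serialize_json)
print(html_out)  # <ul><li>HTTPD</li><li>TOMCAT</li></ul>
print(json_out)  # [{"title": "HTTPD"}, {"title": "TOMCAT"}]
```

Swapping only the last component changes the output format, which is the "generate once, serve to different formats" point.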
Apache Xalan
http://xalan.apache.org/
• XSLT processor
• XPath engine
• Java and C++ flavours
• Cross platform
• Library and command line executables
• Transform your XML
• Fast and reliable XSLT transformation
engine
Apache XML Graphics: Batik
http://xmlgraphics.apache.org/#batik
• Java SVG toolkit + library
• SVG Parser – read and process
existing SVG files
• SVG Generator – Graphics2D
implementation that outputs SVG
• SVG Dom – easy way to manipulate
your SVG files
• SVG viewer program (Squiggle)
• Command line SVG rasteriser
Apache XML Graphics: FOP
http://xmlgraphics.apache.org/#fop
• XSL-FO processor in Java
• Reads W3C XSL-FO, applies the
formatting rules to your XML
document, and renders it
• Output to Text, PS, PDF, SVG, RTF,
Java Graphics2D etc
• Lets you leave your XML clean, and
define semantically meaningful rich
rendering rules for it
Apache Commons: Codec
http://commons.apache.org/codec/
• Commons Track – Thursday Morning
• Encode and decode a variety of
encoding formats
• Base64, Hex, Phonetic and URLs
• Handy when interchanging content
with external systems
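For a feel of what these encodings look like when interchanging content, here is a small sketch using Python's standard library rather than Commons Codec itself. It covers Base64, Hex and URL encoding only (no phonetic encoders), and the sample values are made up.

```python
import base64
import binascii
from urllib.parse import quote, unquote

payload = b"content"

# Base64: binary-safe text form, common in email and HTTP headers.
b64_text = base64.b64encode(payload).decode("ascii")

# Hex: two characters per byte, handy for checksums and debugging.
hex_text = binascii.hexlify(payload).decode("ascii")

# URL encoding: escape characters that are not safe inside a URL component.
url_text = quote("a b&c")

print(b64_text)  # Y29udGVudA==
print(hex_text)  # 636f6e74656e74
print(url_text)  # a%20b%26c

# Each encoding is reversible.
assert base64.b64decode(b64_text) == payload
assert binascii.unhexlify(hex_text) == payload
assert unquote(url_text) == "a b&c"
```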
Apache Commons: Compress
http://commons.apache.org/compress/
• Commons Track – Thursday Morning
• Standard way to deal with archive
formats
• Read and write support
• zip, tar, gzip, bzip, cpio and ar
• Wider range of capabilities than
java.util.zip
• Common API across all formats
Apache Commons: Sanselan
http://commons.apache.org/sanselan/
• Commons Track – Thursday Morning
• Pure Java image reader and writer
• Fast parsing of image metadata and
information (size, color space, icc etc)
• Much easier to use than ImageIO
• Slower though, as pure Java
• Wider range of formats supported
• PNG, GIF, TIFF, JPEG + Exif, BMP,
ICO, PNM, PPM, PSD, XMP
Generating
Content
Apache Forrest
http://forrest.apache.org/
• Document rendering solution built on
top of Cocoon
• Reads in content in a variety of
formats (xml, wiki etc), applies the
appropriate formatting rules, then
outputs to different formats
• Heavily used for documentation and
websites
• eg read in a file, format as changelog
and readme, output as html + pdf
Apache Abdera
http://abdera.apache.org/
• Atom – syndication and publishing
• High performance Java
implementation of RFC 4287 + 5023
• Generate Atom feeds from Java or by
converting
• Parse and process Atom feeds
• Atompub server and clients
• Supports Atom extensions like
GeoRSS, MediaRSS & OpenSearch
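To show what generating and parsing an Atom (RFC 4287) feed involves, the sketch below uses Python's standard library XML support. It is not Abdera's API; the ids, titles, author and timestamps are hypothetical sample values, and only the elements a feed and its entries must carry are included.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace

def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"

def build_feed(feed_id, title, updated, author, entries):
    """Generate a small Atom feed document as a string."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    ET.SubElement(ET.SubElement(feed, atom("author")), atom("name")).text = author
    for entry in entries:
        node = ET.SubElement(feed, atom("entry"))
        for field in ("id", "title", "updated"):
            ET.SubElement(node, atom(field)).text = entry[field]
    return ET.tostring(feed, encoding="unicode")

def entry_titles(feed_xml):
    """Parse a feed and return the titles of its entries, in document order."""
    root = ET.fromstring(feed_xml)
    return [entry.findtext(atom("title")) for entry in root.findall(atom("entry"))]

# Hypothetical sample data.
feed_xml = build_feed(
    "urn:example:feed", "Apache Content News", "2010-01-01T12:00:00Z", "Example Author",
    [
        {"id": "urn:example:1", "title": "Tika update", "updated": "2010-01-01T10:00:00Z"},
        {"id": "urn:example:2", "title": "POI update", "updated": "2010-01-01T11:00:00Z"},
    ],
)
print(entry_titles(feed_xml))  # ['Tika update', 'POI update']
```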
Apache Droids (Incubating)
http://incubator.apache.org/droids/
• Intelligent Robots!
• Generic standalone crawler framework
• Easy to extend existing common
crawlers
• Easy to write custom ones
• Queue requests for content, protocol
handler gets it, multi threaded
• Uses Apache Tika for core of handling
fetched resources
Apache JSPWiki (Incubating)
http://incubator.apache.org/jspwiki/
• Feature-rich extensible wiki
• Written in Java (Servlets + JSP)
• Fairly easy to extend
• Can be used as a wiki out of the box
• Provides a good platform for new
wiki-based applications
• Rich wiki markup and syntax
• Attachments, security, templates etc
Apache ManifoldCF (Incubating)
http://incubator.apache.org/connectors/
• Name has changed a few times...
(Lucene/Apache Connectors)
• Provides a standard way to get content
out of other systems, ready for sending
to Lucene etc
• Different goals to CMIS (Chemistry)
• Uses many parsers and libraries to talk
to the different repositories / systems
• Analogous to Tika but for repos
Apache PhotArk (Incubating)
http://incubator.apache.org/photark/
• 5pm Thursday
• Open Source Photo Gallery application
• Standalone or servlet modes
• Can host photos locally
• Can aggregate external photo albums
(Flickr, Picasa) for a unified view
• SCA programming model – uses
Apache Tuscany to power it
Hosting
Content
Apache Chemistry (Incubating)
http://incubator.apache.org/chemistry/
• 2pm Wednesday
• Java, Python and PHP, Atom and WS*
• OASIS CMIS (Content Management
Interoperability Services)
• Client and Server bindings
• “SQL for Content”
• Consistent view on content across
different repositories
• Read / Write / Manipulate content
Chemistry vs ManifoldCF
http://incubator.apache.org/chemistry/ vs http://incubator.apache.org/connectors/
• ManifoldCF treats repo as nasty black
box, and handles talking to the parsers
• Chemistry talks / exposes repo's
contents through CMIS
• ManifoldCF supports a wider range of
repositories
• Chemistry supports read and write
• Chemistry delivers a richer model
• ManifoldCF great for getting text out
Apache Lenya
http://lenya.apache.org/
• 9am Thursday
• XML Content Management system
• Powered by Apache Cocoon
• WYSIWYG editors onto Relax-NG XML
• Rich workflow engine + staging
• Clean URLs, CSS for styling
• Sensible handling of metadata, assets,
internal links, users, permissions etc
Apache Roller
http://roller.apache.org/
• Multi-user blog server
• Used by the ASF internally
• Scales to thousands of users & blogs
• Should work with any JavaEE servlet
container and SQL database
• Comment moderation and spam filters
• Each author has full layout control
• Indexes, feeds and MetaWeblog API
support for 3rd party clients
Apache Shindig
http://shindig.apache.org/
• Open Social Application Container
• Hosts your open social widgets
• Renders OpenSocial applications into
HTML + JavaScript
• Stores the data for your application
• Full client-side JavaScript libraries to
deliver gadget functionality
• Reference implementation
Apache Wookie (Incubating)
http://incubator.apache.org/wookie/
• 5.30pm Wednesday
• W3C Widgets server
• Upload, Deploy and Host Widgets
• Widgets can range from a badge,
through a small app to a full-blown
collaborative system like chat
• Connector framework to make it easy
to write widgets in many languages
Web Frameworks
(those with a strong
Content focus to them)
Apache Sling
http://sling.apache.org/
• 12pm Wednesday
• “Fun” and easy web framework
• REST based
• Backed by Jackrabbit content repo
• Powered by OSGi
• Easy to script, supports multiple output
languages (JSP, server side JavaScript,
Scala etc)
• Stores both templates and content
Apache Tapestry
http://tapestry.apache.org/
• Object Orientated web applications
• Build your application in terms of
objects, methods and properties
• Tapestry handles URLs, query
parameters and state for you
• Pages built with simple HTML
• Concentrate on the content that backs
each part, and the business logic for it
• Tapestry glues it together for you
Apache Tiles
http://tiles.apache.org/
• Templating framework for Java
• Works well with Struts and Shale
• Lets you build your page from lots of
tiles (components), which can nest
• Build tiles together to make templates
• Clean separation between your
content, the business logic to select it,
and the rendering rules
Apache Velocity
http://velocity.apache.org/
• Templating engine
• MVC webapp or standalone
• Can generate HTML, SQL, PostScript,
XML, Java Code or email from
templates
• Anakia lets you make an xdoc file
available to a velocity template, handy
when generating HTML from xdoc
• Fairly rich templating language
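The core idea of a templating engine, one set of data merged into several output formats, can be sketched with Python's standard library `string.Template`. This is a stand-in, not Velocity: it shares the `$name` reference style but has none of Velocity's directives (loops, conditionals, macros), and the templates and values are hypothetical.

```python
from string import Template

# Two templates for two output formats, fed from the same context.
html_template = Template("<h1>$title</h1>\n<p>Maintained by $owner</p>")
email_template = Template("Subject: $title news\n\nHello from $owner.")

context = {"title": "Apache Velocity", "owner": "the Velocity community"}

html_out = html_template.substitute(context)
email_out = email_template.substitute(context)

print(html_out)
print(email_out)

# substitute() raises KeyError when a referenced name is missing from the context;
# safe_substitute() leaves the placeholder in place instead.
partial = Template("$title by $author").safe_substitute(context)
print(partial)  # Apache Velocity by $author
```

Keeping the data in the context and the layout in the template is what lets the same content come out as HTML, email or anything else text-based.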
Apache Wicket
http://wicket.apache.org/
• Build your web applications in Java
• Uses Java in preference to JavaScript,
CSS etc
• Handy if you have a strong Java team
and you need to do some web stuff
• Fits well with your Java components
• But JS / CSS front end devs tend to be
cheaper than Java ones....
Apache Clerezza (Incubating)
http://incubator.apache.org/clerezza/
• OSGi based modular semantic web
application framework
• Lets you build applications that fit into
the Semantic Web
• Stores and easily manipulates RDF
• Full control over REST and URIs
• Build applications that both consume
semantic data (eg RDF files), and that
expose content to others
Any Questions?
Any cool projects that
I happened to miss?

Más contenido relacionado

La actualidad más candente

Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLRichard Schneeman
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
 
A Tale of 2 Systems
A Tale of 2 SystemsA Tale of 2 Systems
A Tale of 2 SystemsDavid Newman
 
Redis Everywhere - Sunshine PHP
Redis Everywhere - Sunshine PHPRedis Everywhere - Sunshine PHP
Redis Everywhere - Sunshine PHPRicard Clau
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceBrian Culver
 
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Chris Fregly
 
Sizing your alfresco platform
Sizing your alfresco platformSizing your alfresco platform
Sizing your alfresco platformLuis Cabaceira
 
Dec6 meetup spark presentation
Dec6 meetup spark presentationDec6 meetup spark presentation
Dec6 meetup spark presentationRamesh Mudunuri
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessNick Barkas
 
Apachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowApachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowChristian Gügi
 
Git - Introduction and Overview
Git - Introduction and OverviewGit - Introduction and Overview
Git - Introduction and Overviewasmajlovic
 
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQLNoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQLAndrew Morgan
 
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...Nexcess.net LLC
 
Redis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolRedis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolEberhard Wolff
 
The Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceThe Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceAbdelmonaim Remani
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightTillmann Eitelberg
 

La actualidad más candente (20)

Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQL
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
 
A Tale of 2 Systems
A Tale of 2 SystemsA Tale of 2 Systems
A Tale of 2 Systems
 
Redis Everywhere - Sunshine PHP
Redis Everywhere - Sunshine PHPRedis Everywhere - Sunshine PHP
Redis Everywhere - Sunshine PHP
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 Performance
 
Drop acid
Drop acidDrop acid
Drop acid
 
Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416
 
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
 
Sizing your alfresco platform
Sizing your alfresco platformSizing your alfresco platform
Sizing your alfresco platform
 
Dec6 meetup spark presentation
Dec6 meetup spark presentationDec6 meetup spark presentation
Dec6 meetup spark presentation
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great Success
 
Apachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowApachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to know
 
Git - Introduction and Overview
Git - Introduction and OverviewGit - Introduction and Overview
Git - Introduction and Overview
 
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQLNoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
 
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
 
Empower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and HadoopEmpower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and Hadoop
 
Redis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolRedis - The Universal NoSQL Tool
Redis - The Universal NoSQL Tool
 
The Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceThe Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot Persistence
 
Drupal Skils Lab 302Labs
Drupal Skils Lab 302Labs Drupal Skils Lab 302Labs
Drupal Skils Lab 302Labs
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 

Similar a Apache Content Technologies

If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!gagravarr
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache StanbolAlkuvoima
 
Lois Patterson: Markup Languages and Warp-Speed Documentation
Lois Patterson:  Markup Languages and Warp-Speed DocumentationLois Patterson:  Markup Languages and Warp-Speed Documentation
Lois Patterson: Markup Languages and Warp-Speed DocumentationJack Molisani
 
Markup languages and warp-speed documentation
Markup languages and warp-speed documentationMarkup languages and warp-speed documentation
Markup languages and warp-speed documentationLois Patterson
 
QueryPath, Mash-ups, and Web Services
QueryPath, Mash-ups, and Web ServicesQueryPath, Mash-ups, and Web Services
QueryPath, Mash-ups, and Web ServicesMatt Butcher
 
Integrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemIntegrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemStreamNative
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataWes McKinney
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyCeph Community
 
High Voltage - Building Static Sites With Wordpress-Managed Content
High Voltage - Building Static Sites With Wordpress-Managed ContentHigh Voltage - Building Static Sites With Wordpress-Managed Content
High Voltage - Building Static Sites With Wordpress-Managed ContentNicolle Morton
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv larsgeorge
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoopgregchanan
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleDomino Data Lab
 
Caching strategies with lucee
Caching strategies with luceeCaching strategies with lucee
Caching strategies with luceeGert Franz
 

Similar a Apache Content Technologies (20)

If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache Stanbol
 
Apache drill
Apache drillApache drill
Apache drill
 
Lois Patterson: Markup Languages and Warp-Speed Documentation
Lois Patterson:  Markup Languages and Warp-Speed DocumentationLois Patterson:  Markup Languages and Warp-Speed Documentation
Lois Patterson: Markup Languages and Warp-Speed Documentation
 
Markup languages and warp-speed documentation
Markup languages and warp-speed documentationMarkup languages and warp-speed documentation
Markup languages and warp-speed documentation
 
QueryPath, Mash-ups, and Web Services
QueryPath, Mash-ups, and Web ServicesQueryPath, Mash-ups, and Web Services
QueryPath, Mash-ups, and Web Services
 
Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
 
Integrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemIntegrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data Ecosystem
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Apache Spark in Industry
Apache Spark in IndustryApache Spark in Industry
Apache Spark in Industry
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
High Voltage - Building Static Sites With Wordpress-Managed Content
High Voltage - Building Static Sites With Wordpress-Managed ContentHigh Voltage - Building Static Sites With Wordpress-Managed Content
High Voltage - Building Static Sites With Wordpress-Managed Content
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
 
Caching strategies with lucee
Caching strategies with luceeCaching strategies with lucee
Caching strategies with lucee
 

Más de gagravarr

Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy
Turning XML to XLS on the JVM, without loosing your Sanity, with GroovyTurning XML to XLS on the JVM, without loosing your Sanity, with Groovy
Turning XML to XLS on the JVM, without loosing your Sanity, with Groovygagravarr
 
But we're already open source! Why would I want to bring my code to Apache?
But we're already open source! Why would I want to bring my code to Apache?But we're already open source! Why would I want to bring my code to Apache?
But we're already open source! Why would I want to bring my code to Apache?gagravarr
 
What's new with Apache Tika?
What's new with Apache Tika?What's new with Apache Tika?
What's new with Apache Tika?gagravarr
 
What's with the 1s and 0s? Making sense of binary data at scale - Berlin Buzz...
What's with the 1s and 0s? Making sense of binary data at scale - Berlin Buzz...What's with the 1s and 0s? Making sense of binary data at scale - Berlin Buzz...
What's with the 1s and 0s? Making sense of binary data at scale - Berlin Buzz...gagravarr
 
The Apache Way
The Apache WayThe Apache Way
The Apache Waygagravarr
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsgagravarr
 
How Big is Big – Tall, Grande, Venti Data?
How Big is Big – Tall, Grande, Venti Data?How Big is Big – Tall, Grande, Venti Data?
How Big is Big – Tall, Grande, Venti Data?gagravarr
 
But We're Already Open Source! Why Would I Want To Bring My Code To Apache?
But We're Already Open Source! Why Would I Want To Bring My Code To Apache?But We're Already Open Source! Why Would I Want To Bring My Code To Apache?
But We're Already Open Source! Why Would I Want To Bring My Code To Apache?gagravarr
 
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...gagravarr
 
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...gagravarr
 
The other Apache technologies your big data solution needs!
The other Apache technologies your big data solution needs!The other Apache technologies your big data solution needs!
The other Apache technologies your big data solution needs!gagravarr
 
Apache Tika end-to-end
Apache Tika end-to-endApache Tika end-to-end
Apache Tika end-to-endgagravarr
 

Apache Content Technologies

  • 1. If you have the Content, then Apache has the Technology! A whistle-stop tour of the Apache content related projects
  • 3. Apache Projects • 79 Top Level Projects • 40 Incubating Projects • 30 “Content Related” Main Projects • 7 “Content Related” Incubating Projects
  • 4. 37 Projects in 50 minutes With time for questions... This is not a comprehensive guide!
  • 5. Different Technologies • Serving • Storing • Transforming • Generating • Hosting • Web Framework Rendering / Templating / etc
  • 6. What can we get in 50 mins? • A quick overview of each project • When talks on the project are happening • When meetups on the project are happening • Anything new/exciting about the project? • What interests me in the project!
  • 8. Apache HTTPD Server http://httpd.apache.org/ • Talks – All day Wednesday Meetup – Thursday evening • Very wide range of features • (Fairly) easy to extend • Can host most programming languages • Can front most content systems • Can proxy your content applications • Can host code and content
  • 9. Apache TrafficServer http://trafficserver.apache.org/ • High performance web proxy • Forward and reverse proxy • Ideally suited to sitting between your content application and the internet • For proxy-only use cases, will probably be better than httpd • Fewer other features though • Often used as a cloud-edge http router
  • 10. Apache Tomcat http://tomcat.apache.org/ • Talks – All day Friday! • Java based, as many of the Apache Content Technologies are • Java Servlet Container • And you probably all know the rest!
  • 11. Tomcat – What's New http://tomcat.apache.org/ • Memory leak detection – for your applications, and for the JVM! • Easier to embed – no need for large numbers of config files! • Asynchronous request processing for things like Comet / Bayeux • Servlet 3.0 • Improved JMX configurability
  • 13. Apache Cassandra http://cassandra.apache.org/ • Talk - 11am Wednesday Meetup - Wednesday evening • One of our many NoSQL Databases • Column-Family store • Eventually consistent • Distributed, replicating, no SPOF (single point of failure) • Can elastically add machines
  • 14. Apache CouchDB http://couchdb.apache.org/ • 12pm Wednesday • Relax! • Erlang • NoSQL • Document orientated distributed store • Eventually consistent if replicating • Map-Reduce queries
  • 15. Apache HBase http://hbase.apache.org/ • 2pm Wednesday • Recently graduated from Hadoop • Another NoSQL Database • Column-Family store, modelled on Google's Big Table paper • Some transactions and locking • Fast range queries and sorting • Built on HDFS
  • 16. Which Apache NoSQL? • Do you have tuples, documents, variable key/values or complex objects? • Must data always be consistent? • If you lose a chunk of machines (partition), should read/write still work? • Query by id, range, arbitrary key/value or map-reduce function? • How much human interaction is required to add or remove nodes?
  • 17. Apache DB: Derby http://db.apache.org/derby/ • Small, easy to embed SQL database • Can be embedded and accessed via an embedded JDBC driver • Can be accessed over the network • Can be run entirely in-memory • Efficient on-disk format • Has a JavaME version – run it on basic cell phones!
  • 18. Apache Directory http://directory.apache.org/ • LDAP Directory • Optimised for many reads per write • Hierarchical, class/attribute based storage • Triggers, stored procedures, queries and views • Multi-master replication • Rich permissions model built in
  • 19. Apache Jackrabbit http://jackrabbit.apache.org/ • 1.30pm Thursday • JCR (Java Content Repository) • Hierarchical content store • Supports structured and unstructured data • Transactional • Supports versions • Full text search built in
  • 20. Apache Lucene http://lucene.apache.org/ • All day Friday + Meetup Tuesday night • Inverted index store • (Each term lists its documents, rather than each document listing terms) • Searching is faster than adding • Normally stores text, but additional data can be associated with it • Can hold indexed and un-indexed data
  • 21. Lucene – What's New? http://lucene.apache.org/ • Lucene and SOLR have merged • Near real-time support when indexing • Better storing of attributes and other data in the token stream • Numeric fields improved – no need to externally process numbers into range buckets yourself • Fast vector highlighter for large docs
  • 22. Apache Subversion http://subversion.apache.org/ • Meetup Thursday evening • Versioning content store • Efficient at storing changes • Normally stores code, text and the odd binary blob • If you have textual data and you want a versioning store, it's a good fit! • Used by the new Apache CMS
  • 23. Apache Xindice http://xml.apache.org/xindice/ • Native XML Database • No need to map your complex XML files to a different data structure • Ideally suited to problems where you have large numbers of XML files, and little / no other content • Schema independent model • XPath queries
  • 25. Apache PDFBox http://pdfbox.apache.org/ • 4pm Wednesday • Read, Write, Create and Edit PDFs • Create PDFs from text • Fill in PDF forms • Extract text and formatting (Lucene, Tika etc) • Edit existing files, add images, add text etc
  • 26. Apache POI http://poi.apache.org/ • 3pm Wednesday + FastFeatherTrack • File format reader and writer for Microsoft Office file formats • Supports binary & OOXML formats • Strong read edit write for .xls & .xlsx • Read and basic edit for .doc & .docx • Read and basic edit for .ppt & .pptx • Read for Visio, Publisher, Outlook
  • 27. Apache Tika http://tika.apache.org/ • 9am Friday + Fast Feather Track • Java (+ command line) toolkit for detecting and extracting content • Identifies what a blob of content is • Gives you consistent metadata back for it • Parses the contents into plain text, HTML, XHTML or SAX events
  • 28. Tika – What's New? http://tika.apache.org/ • Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc • Long standing parsers improved – better HTML from Word for example • Embedded resources and containers • Usage is expanding – used by many SOLR users, Alfresco, lots of people crunching masses of data on Hadoop
  • 29. Apache Cocoon http://cocoon.apache.org/ • Component Pipeline framework • Plug together “Lego-Like” generators, transformers and serialisers • Generate your content once in your application, serve to different formats • Read in formats, translate and publish • Can power your own “Yahoo Pipes” • Modular, powerful and easy
  • 30. Apache Xalan http://xalan.apache.org/ • XSLT processor • XPath engine • Java and C++ flavours • Cross platform • Library and command line executables • Transform your XML • Fast and reliable XSLT transformation engine
  • 31. Apache XML Graphics: Batik http://xmlgraphics.apache.org/#batik • Java SVG toolkit + library • SVG Parser – read and process existing SVG files • SVG Generator – Graphics2D implementation that outputs SVG • SVG Dom – easy way to manipulate your SVG files • SVG viewer program (Squiggle) • Command line SVG rasteriser
  • 32. Apache XML Graphics: FOP http://xmlgraphics.apache.org/#fop • XSL-FO processor in Java • Reads W3C XSL-FO, applies the formatting rules to your XML document, and renders it • Output to Text, PS, PDF, SVG, RTF, Java Graphics2D etc • Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it
  • 33. Apache Commons: Codec http://commons.apache.org/codec/ • Commons Track – Thursday Morning • Encode and decode a variety of encoding formats • Base64, Hex, Phonetic and URLs • Handy when interchanging content with external systems
  • 34. Apache Commons: Compress http://commons.apache.org/compress/ • Commons Track – Thursday Morning • Standard way to deal with archive formats • Read and write support • zip, tar, gzip, bzip, cpio and ar • Wider range of capabilities than java.util.zip • Common API across all formats
  • 35. Apache Commons: Sanselan http://commons.apache.org/sanselan/ • Commons Track – Thursday Morning • Pure Java image reader and writer • Fast parsing of image metadata and information (size, color space, icc etc) • Much easier to use than ImageIO • Slower though, as pure Java • Wider range of formats supported • PNG, GIF, TIFF, JPEG + Exif, BMP, ICO, PNM, PPM, PSD, XMP
  • 37. Apache Forrest http://forrest.apache.org/ • Document rendering solution built on top of Cocoon • Reads in content in a variety of formats (xml, wiki etc), applies the appropriate formatting rules, then outputs to different formats • Heavily used for documentation and websites • eg read in a file, format as changelog and readme, output as html + pdf
  • 38. Apache Abdera http://abdera.apache.org/ • Atom – syndication and publishing • High performance Java implementation of RFC 4287 + 5023 • Generate Atom feeds from Java or by converting • Parse and process Atom feeds • Atompub server and clients • Supports Atom extensions like GeoRSS, MediaRSS & OpenSearch
  • 39. Apache Droids (Incubating) http://incubator.apache.org/droids/ • Intelligent Robots! • Generic standalone crawler framework • Easy to extend existing common crawlers • Easy to write custom ones • Queue requests for content, protocol handler gets it, multi threaded • Uses Apache Tika for core of handling fetched resources
  • 40. Apache JSPWiki (Incubating) http://incubator.apache.org/jspwiki/ • Feature-rich extensible wiki • Written in Java (Servlets + JSP) • Fairly easy to extend • Can be used as a wiki out of the box • Provides a good platform for new wiki based application • Rich wiki markup and syntax • Attachments, security, templates etc
  • 41. Apache ManifoldCF (Incubating) http://incubator.apache.org/connectors/ • Name has changed a few times... (Lucene/Apache Connectors) • Provides a standard way to get content out of other systems, ready for sending to Lucene etc • Different goals to CMIS (Chemistry) • Uses many parsers and libraries to talk to the different repositories / systems • Analogous to Tika but for repos
  • 42. Apache PhotArk (Incubating) http://incubator.apache.org/photark/ • 5pm Thursday • Open Source Photo Gallery application • Standalone or servlet modes • Can host photos locally • Can aggregate external photo albums (Flickr, Picasa) for a unified view • SCA programming model – uses Apache Tuscany to power it
  • 44. Apache Chemistry (Incubating) http://incubator.apache.org/chemistry/ • 2pm Wednesday • Java, Python and PHP, Atom and WS* • OASIS CMIS (Content Management Interoperability Services) • Client and Server bindings • “SQL for Content” • Consistent view on content across different repositories • Read / Write / Manipulate content
  • 45. Chemistry vs ManifoldCF incubator /chemistry/ /connectors/ • ManifoldCF treats repo as nasty black box, and handles talking to the parsers • Chemistry talks / exposes repo's contents through CMIS • ManifoldCF supports a wider range of repositories • Chemistry supports read and write • Chemistry delivers a richer model • ManifoldCF great for getting text out
  • 46. Apache Lenya http://lenya.apache.org/ • 9am Thursday • XML Content Management system • Powered by Apache Cocoon • WYSIWYG editors onto Relax-NG XML • Rich workflow engine + staging • Clean URLs, CSS for styling • Sensible handling of metadata, assets, internal links, users, permissions etc
  • 47. Apache Roller http://roller.apache.org/ • Multi-user blog server • Used by the ASF internally • Scales to thousands of users & blogs • Should work with any JavaEE servlet container and SQL database • Comment moderation and spam filters • Each author has full layout control • Indexes, feeds and Metaweblog API support for 3rd party clients
  • 48. Apache Shindig http://shindig.apache.org/ • OpenSocial Application Container • Hosts your OpenSocial widgets • Renders OpenSocial applications into HTML + JavaScript • Stores the data for your application • Full client-side JavaScript libraries to deliver gadget functionality • Reference implementation
  • 49. Apache Wookie (Incubating) http://incubator.apache.org/wookie/ • 5.30pm Wednesday • W3C Widgets server • Upload, Deploy and Host Widgets • Widgets can range from a badge, through a small app to a full-blown collaborative system like chat • Connector framework to make it easy to write widgets in many languages
  • 50. Web Frameworks (those with a strong Content focus to them)
  • 51. Apache Sling http://sling.apache.org/ • 12pm Wednesday • “Fun” and easy web framework • REST based • Backed by Jackrabbit content repo • Powered by OSGi • Easy to script, supports multiple output languages (JSP, server side javascript, scala etc) • Stores both templates and content
  • 52. Apache Tapestry http://tapestry.apache.org/ • Object Orientated web applications • Build your application in terms of objects, methods and properties • Tapestry handles URLs, query parameters and state for you • Pages built with simple HTML • Concentrate on the content that backs each part, and the business logic for it • Tapestry glues it together for you
  • 53. Apache Tiles http://tiles.apache.org/ • Templating framework for Java • Works well with Struts and Shale • Lets you build your page from lots of tiles (components), which can nest • Build tiles together to make templates • Clean separation between your content, the business logic to select it, and the rendering rules
  • 54. Apache Velocity http://velocity.apache.org/ • Templating engine • MVC webapp or standalone • Can generate HTML, SQL, PostScript, XML, Java Code or email from templates • Anakia lets you make an xdoc file available to a Velocity template, handy when generating HTML from xdoc • Fairly rich templating language
  • 55. Apache Wicket http://wicket.apache.org/ • Build your web applications in Java • Uses Java in preference to JavaScript, CSS etc • Handy if you have a strong Java team and you need to do some web stuff • Fits well with your Java components • But JS / CSS front end devs tend to be cheaper than Java ones....
  • 56. Apache Clerezza (Incubating) http://incubator.apache.org/clerezza/ • OSGi based modular semantic web application framework • Lets you build applications that fit into the Semantic Web • Stores and easily manipulates RDF • Full control over REST and URIs • Build applications that both consume semantic data (eg RDF files), and that expose content to others
  • 57. Any Questions? Any cool projects that I happened to miss?
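
As a footnote to the Lucene slide, the "inverted index" idea (each term lists its documents, rather than each document listing its terms) can be sketched in a few lines. This is a minimal illustration in plain Python, not Lucene's API or storage format; the function names `build_inverted_index` and `search` and the sample documents are made up for the example, and tokenising is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenising: lowercase and split on whitespace (an assumption;
        # a real search library applies far richer analysis).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the first term's documents, then intersect with the rest.
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Apache Tika extracts text",
    2: "Apache Lucene indexes text",
    3: "Apache Tomcat serves content",
}
index = build_inverted_index(docs)
```

Building the index touches every term of every document, while a query only looks up a handful of terms and intersects their document lists, which is one way to read the slide's remark that searching is faster than adding.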