Building SaaS solutions with
        Apache Solr
   Alberto Mijares, Canoo Engineering AG
   alberto.mijares@canoo.com, 26/05/2011
               Twitter: @lemaiol
Bullet point time!




What I Will Cover
§  Practical applications of Apache Solr and
    Apache Lucene: how to increase the time
    a user spends on a website and do
    website “cross-selling”.
§  Use case: how Canoo helped Axel Springer
    Switzerland increase page impressions,
    time on site and traffic for their
    online financial newspapers.
§  Key concepts:
   •  How to achieve this using Lucene & Solr
   •  How to profit from a SaaS business model
Who I am
§  Alberto Mijares
§  Canoo Engineering AG
§  Background in web applications and standards:
  •  Participated in W3C Semantic Web interest
     group (SWEO)
  •  Led web standards compliance tools
     development in the past (Web Accessibility and
     Mobile Web)
  •  Led enterprise information retrieval projects in
     the recent past
  •  Currently coaching Google Web Toolkit
     development projects
Who is Canoo
§  People:
   •  Dirk Koenig: Groovy founder
   •  Andres Almiray: Griffon project lead and Java
      Champion
   •  Hamlet D’Arcy: Groovy committer and enthusiast
   •  … almost 40 more top software engineers
§  Products:
   •    WebTest: framework for web functional testing
   •    RIA Suite (aka ULC): Java based RIA framework
   •    FindIT: information retrieval and search tools
   •    WMTrans: language analysis tools
Canoo FindIT




http://www.canoo.com/videos/FindIT.html




Stop “bullet-pointing”!




The facts
Axel Springer group is a market leader


              Bilanz, Handelszeitung and Stocks


     In Switzerland financials are important!

Financial language is German

                         Online media is the future

The gap



Make the online versions more profitable



         Make all newspapers “market leaders”




The how

Workshop



                     “Related articles”



   “Cross-selling”


The analysis
Use Lucene’s “More like this”


          Integrate back the suggestions


               Implement a selection mechanism


    Find a funding model
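
As an illustration of the first point, here is a minimal sketch of how a “More like this” request to Solr could be built. The parameters (mlt, mlt.fl, mlt.mintf, mlt.mindf, mlt.count) belong to Solr's MoreLikeThis component. The host, the id field, the title and body field names, the threshold values and the function name build_mlt_url are assumptions made for illustration and are not taken from the talk.

```python
from urllib.parse import urlencode


def build_mlt_url(solr_base, article_id, fields=("title", "body"), count=5):
    """Build a Solr select URL that asks for documents similar to one article.

    Assumptions (not from the talk): the unique key field is called "id",
    the article id needs no query-syntax escaping, and the similarity
    fields are "title" and "body".
    """
    params = [
        ("q", f"id:{article_id}"),      # select the source article
        ("mlt", "true"),                # enable the MoreLikeThis component
        ("mlt.fl", ",".join(fields)),   # fields used to find similar documents
        ("mlt.mintf", "2"),             # ignore terms rarer than this in the source doc
        ("mlt.mindf", "2"),             # ignore terms found in fewer docs than this
        ("mlt.count", str(count)),      # number of suggestions to return
        ("wt", "json"),                 # ask for a JSON response
    ]
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_mlt_url("http://localhost:8983/solr", "42"))
```

The thresholds mlt.mintf and mlt.mindf are the usual first knobs to tune when the suggestions look noisy; the values shown are placeholders.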

The issues
 “More like this” was “experimental”



     Without “semantics”, results do not always make sense



Indexing full pages produces noise



               Works out-of-the-box only in English
The key




The functional requirements

Discover and index articles




                       Extract only content




      Simple and flexible query service
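
The “extract only content” requirement can be sketched as follows. This is a hypothetical example that uses only Python's standard html.parser module: it keeps text found inside an <article> element and drops scripts, styles and navigation. The <article> marker, the list of skipped tags and the names ArticleTextExtractor and extract_article_text are assumptions; the talk does not say how the real extraction was done.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect text inside <article>, skipping script/style/nav/aside blocks.

    Assumption: the article body is wrapped in an <article> element.
    A real site would need its own marker (for example a specific div).
    """

    SKIP = {"script", "style", "nav", "aside"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0   # > 0 while inside <article>
        self.skip_depth = 0      # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is inside the article and not skipped.
        if self.article_depth and not self.skip_depth:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_article_text(html):
    """Return the article text of an HTML page as one space-joined string."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Indexing only this extracted text, rather than the full page, is one way to address the noise that comes from menus, teasers and advertisements.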


The funding model




The business model




        SaaS




The “other” requirements
Lucene-based analysis pipeline

                           Web oriented platform

  Multi-application platform

                Reliable, fast and scalable

                                        Plan B?

The search
Wraps Lucene in a nice way

                It is mature and Open Source

   Supports scheduling, REST API, DIH, …

                    Scalability out-of-the-box

Well documented and has professional support


The plan




From POC to PROD in “80 days”




The results




Google Analytics




The conclusions




The Q&A




 Thanks!




Sources
§  Links
   •    http://people.canoo.com/share
   •    http://www.canoo.com
   •    http://www.canoo.net
   •    http://www.leo.org
   •    http://www.bilanz.ch
   •    http://www.handelszeitung.ch
   •    http://www.stocks.ch




Contact
§  Alberto Mijares
   •  alberto.mijares@canoo.com
   •  Twitter: @lemaiol




Architecture

 Platform: Apache Solr 1.4.1
 Architecture:

   Internal access            External access

   Solr container             Web container
                                                   Requests
   Springer Solr              Springer WebApp
   Customer 2 Solr            Customer 2 WebApp
   Customer 3 Solr            Customer 3 WebApp