Big Data Meets Metadata – Analyzing Large Data Sets
Smartlogic™
Lucene Revolution 2012
Jeremy Bentley, CEO
  
1st degree of order


Filing management
• 80% of enterprise information is unstructured
• Doubling every 19 months and accelerating [Gartner]
• Increasing burden of compliance
• Enterprise 2.0 additions
• Big Data connotations
2nd degree of order


Index management
• File plans and metadata schema
• Manually applied classification
• Low level of consistency and quality
3rd degree of order

[Diagram: "Automation of 1st & 2nd Degrees" at the center, surrounded by:]
• Enterprise Search
• Content Management
• Portal Infrastructure
• Document Management
• SharePoint
• Records Management
• Publishing Systems
• Process Management & Workflow
• Digital Asset Management
• eDiscovery

A 10 year Flatline

[Chart: User Search Satisfaction, 50% in 2001 and 48% in 2011]

• 2001, IDC, “Quantifying Enterprise Search”: Searchers are successful in finding what they seek 50% of the time or less
• 2011, MindMetre/SmartLogic: More than half (52%) cannot find the information they need using their Enterprise search system
  
The explosion of information

[Chart: terabytes of data by period]
• 1993-2001: 4Tb
• 2001-2009: 80Tb
• Next period: ?

20 times increase in information volume
Source: the National Archives
  
Volume + other disruptive factors

• Velocity
• Variety
• Complexity
• Cross-organizational and cross platform information needs
• Changing requirements for information over time

Copyright © 2011 Smartlogic Semaphore Limited
  
New 4th degree of order

[Diagram: "Content Intelligence" at the center, surrounded by:]
• Enterprise Search
• Content Management
• Portal Infrastructure
• Document Management
• SharePoint
• Records Management
• Publishing Systems
• Process Management & Workflow
• Digital Asset Management
• eDiscovery
  
Content Intelligence

• Information Manufacturing
• Monetisation
• Knowledge Recovery
• Metadata
• Data Loss Prevention
• Risk & Compliance
• Content Analytics
  
Knowing what you have
Metadata

Information
• Subject
• Location
• Project
• Function (IT, HR, Finance)

Structural
• Creation Date
• Modified Date
• Author
• Format (PDF, DOC, XLS)

Process
• Protective Marker
• Expiry
• Publisher
• Expert
• Retention
• Site
4th degree of order
Content Intelligence

[Diagram: Content Intelligence Platform, shown with FAST and SharePoint]
  
What is Content Intelligence

Content Intelligence is the process of

IDENTIFYING, CLASSIFYING, EXTRACTING, ANALYZING and SURFACING

information based on its meaning and context to make timely and informed business decisions.
  
Content Intelligence Solutions

• KNOWLEDGE ACQUISITION & REUSE
• MICROTARGETING & DISTRIBUTION
• GOVERNANCE, COMPLIANCE & RISK
• WEB-BASED SELF SERVICE
  
Big Data + Content Intelligence

[Figure: from Gartner, 2011]
  
Semaphore – Three Core Capabilities

[Diagram: SEMAPHORE at the center, with the labels Expose (Users), Apply (Content) and Inform]
• Ontology Manager (Semantic Model): build, manage and deploy vocabularies/libraries
• Classification Server: automate the metadata enrichment
• Semantic Enhancement Server: explore data to find insights
  
Enterprise Classification

Important requirements for Velocity/Volume:
• Scalability for large volumes of content, users, metadata and systems
• Easy integration with processing systems: search, content, records and document management systems as well as file shares and content migration tools
• Support for all the organization's languages and data formats
  
From Many Different Sources
Metadata Generation

Information
• Brand
• Service
• Geography
• Products

Structural
• Creation Date
• Modified Date
• Author
• Format (PDF, DOC, XLS)

Process
• Expert
• Protective Marker
• Retention
• Publisher
• Expiry
• Site
Different Vocabulary and Ambiguity

Different vocabulary: missing results

You Say        I Say
Perpetrator    Burglar; Thief
Swine Flu      Swine Influenza Virus; H1N1
Touchscreen    Touch screen; Multi-touch

Ambiguity: too many results

You Say        What do you mean?
Apple          A fruit? Fiona - a singer/songwriter? An electronics company?
Rights         Employment rights? Equal rights? Right of way?
Ford           Ford Motor; Forward Industrials (ticker=FORD); a shallow river crossing

© 2010
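The two problems above can be sketched in a few lines of Python. This is a minimal illustration only, not how Semaphore works: the names `SYNONYM_RINGS`, `SENSES`, `expand_query` and `senses_for` are hypothetical, and the data is just the slide's own examples. Expanding a query to every term in its synonym ring addresses missing results; listing candidate senses for an ambiguous term gives the user a way to narrow too many results.

```python
from typing import Dict, List, Set

# Hypothetical synonym rings built from the "You Say / I Say" examples.
SYNONYM_RINGS: List[Set[str]] = [
    {"perpetrator", "burglar", "thief"},
    {"swine flu", "swine influenza virus", "h1n1"},
    {"touchscreen", "touch screen", "multi-touch"},
]

# Hypothetical sense lists built from the "What do you mean?" examples.
SENSES: Dict[str, List[str]] = {
    "apple": ["A fruit", "Fiona, a singer/songwriter", "An electronics company"],
    "rights": ["Employment rights", "Equal rights", "Right of way"],
    "ford": ["Ford Motor", "Forward Industrials (ticker=FORD)", "A shallow river crossing"],
}


def expand_query(term: str) -> Set[str]:
    """Return the term plus every synonym in its ring (fixes missing results)."""
    key = term.strip().lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return set(ring)
    # Unknown terms are searched as typed.
    return {key}


def senses_for(term: str) -> List[str]:
    """Return candidate meanings to offer the user (narrows too many results)."""
    return SENSES.get(term.strip().lower(), [])


if __name__ == "__main__":
    print(sorted(expand_query("Swine Flu")))
    print(senses_for("Ford"))
```

In a real deployment the rings and senses would come from a managed vocabulary rather than hard-coded literals; the sketch only shows the lookup shape.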
  
Without Accurate Metadata

Big Data has its perils. With huge data sets and fine-grained measurement, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data is that “many bits of straw look like needles.”

- Trevor Hastie, Statistics Professor at Stanford University
  
What Classification Must Handle

Capabilities included:
• Look for all the vocabulary associated with topic/entity
• Determine aboutness / avoid passing mentions
• Address term ambiguity
• Handle stemming errors
• Determine if topics are in the same context
• Split documents into components
• Generate scores (so most relevant content bubbles to top)
• Show dynamic summaries to users
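Three of these capabilities (matching a topic's full vocabulary, avoiding passing mentions, and generating scores) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `TOPICS` vocabulary, the `min_hits` threshold and the hits-per-word score are hypothetical stand-ins, not the rules a product such as Semaphore applies.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical topic vocabularies: each topic lists the phrases that evidence it.
TOPICS: Dict[str, List[str]] = {
    "Swine Flu": ["swine flu", "swine influenza", "h1n1"],
    "Touchscreen": ["touchscreen", "touch screen", "multi-touch"],
}


def count_hits(text: str, phrases: List[str]) -> int:
    """Count whole-phrase, case-insensitive occurrences of any phrase."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in phrases
    )


def classify(text: str, min_hits: int = 2) -> List[Tuple[str, float]]:
    """Return (topic, score) pairs, highest score first.

    Topics with fewer than min_hits matches are treated as passing
    mentions and dropped. The score is hits per word, an assumed and
    deliberately simple relevance measure.
    """
    words = max(len(text.split()), 1)
    scored = []
    for topic, phrases in TOPICS.items():
        hits = count_hits(text, phrases)
        if hits >= min_hits:
            scored.append((topic, hits / words))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    doc = ("H1N1, also called swine flu, spread quickly. "
           "The swine flu report was read on a touchscreen.")
    print(classify(doc))
```

With the default threshold the single touchscreen reference is discarded as a passing mention, so only the swine flu topic is returned; lowering `min_hits` to 1 returns both topics with the stronger one first.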
  
Enhancing Metadata
• Accurately classify content into subject areas defined in a taxonomy/ontology
• Entity extraction (Text Mining)
• Sentiment Analysis
• Fact Extraction
  
Physical Architecture

Ontology Management Services
• Ontology Manager Standalone Desktop: Win 7, Vista; 2Gb RAM; 2GHz Dual CPU
• Ontology Manager Desktop (two shown): Win 7, Vista; 2Gb RAM; 2GHz Dual CPU
• Optional RDBMS data store: Oracle; MySQL; SQL Server 2005 + 2008 + 2008 R2
• Ontology Manager Server: Ontology Instance 1 (Port 8001), Ontology Instance 2 (Port 8002); Win 7, Vista, 2003, 2008 +R2; Linux; 2Gb RAM; 2GHz CPU

Semantic Enhancement Server
• Search Enhancement Server: Search Enhancement Instance with GSA Extensions, FAST Extensions and Sharepoint Extensions
• Windows Server 2003, 2008 (32bit/64bit) +R2; Linux; IIS/Apache HTTP Server
• RAM and disk access intensive. Scale to expected peak search throughput

Content Classification Server
• Classification Server: Classification Instance; Port 5058
• Windows Server 2003, 2008 (32bit/64bit) + R2; Linux
• CPU and RAM intensive. Scale to volume of content and number of publishing users
• Classification Test Interface: Internet Explorer, Firefox; Rule and Template Editor; Win 7, Vista; 2Gb RAM; 2GHz Dual CPU

Integration Components
• Google Search Appliance: Google Classification Handler, Dispatcher, Proxy; Search Application Framework; Windows Server 2003, 2008 (32bit/64bit) +R2; scale for throughput of GSA Indexing Crawler
• Microsoft FAST ESP Server Farm: Search Application Framework; Semaphore Document Processor
• SOLR: Search Application Framework; Semaphore Document Processor
• Microsoft Office SharePoint Server 2007 / 2010 Server Farm: Document Library Components; Search Web Parts
  
Leveraging Metadata Schemes
Examples – Customer Service
Examples – Following Trends
Examples – Fact Extraction
How Else Does Semaphore Help
• Disambiguate queries
• Perfectly formed filters organised by facet
• Graphical drill down
• Explore relationships
• Supporting documents
Happy, Successful Customers

Más contenido relacionado

La actualidad más candente

Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Cana Ko
 
Rubik Open Integration Portal
Rubik Open Integration PortalRubik Open Integration Portal
Rubik Open Integration Portalbob_ark
 
Sap Supplier Risk Performance 2011
Sap Supplier Risk  Performance 2011Sap Supplier Risk  Performance 2011
Sap Supplier Risk Performance 2011Henner Schliebs
 
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
Expert Webinar Series:  SharePoint Governance - Managing Content SprawlExpert Webinar Series:  SharePoint Governance - Managing Content Sprawl
Expert Webinar Series: SharePoint Governance - Managing Content Sprawlmartingarland
 
2011 Sharepoint Summit - Overview of enterprise content management in share_...
2011 Sharepoint Summit - Overview of enterprise content management  in share_...2011 Sharepoint Summit - Overview of enterprise content management  in share_...
2011 Sharepoint Summit - Overview of enterprise content management in share_...MSHOWTO Bilisim Toplulugu
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business IntelligenceDon Jackson
 

La actualidad más candente (6)

Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831Talk IT_ Oracle_김태완_110831
Talk IT_ Oracle_김태완_110831
 
Rubik Open Integration Portal
Rubik Open Integration PortalRubik Open Integration Portal
Rubik Open Integration Portal
 
Sap Supplier Risk Performance 2011
Sap Supplier Risk  Performance 2011Sap Supplier Risk  Performance 2011
Sap Supplier Risk Performance 2011
 
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
Expert Webinar Series:  SharePoint Governance - Managing Content SprawlExpert Webinar Series:  SharePoint Governance - Managing Content Sprawl
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
 
2011 Sharepoint Summit - Overview of enterprise content management in share_...
2011 Sharepoint Summit - Overview of enterprise content management  in share_...2011 Sharepoint Summit - Overview of enterprise content management  in share_...
2011 Sharepoint Summit - Overview of enterprise content management in share_...
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business Intelligence
 

Similar a Big Data Meets Metadata – Analyzing Large Data Sets

SharePoint Saturday DC by ImageTech Systems - David Strock
SharePoint Saturday DC by ImageTech Systems - David StrockSharePoint Saturday DC by ImageTech Systems - David Strock
SharePoint Saturday DC by ImageTech Systems - David StrockJeff Shuey
 
CASE-6 Structured Content Authoring and Publishing through Alfresco and Compo...
CASE-6 Structured Content Authoring and Publishing through Alfresco and Compo...CASE-6 Structured Content Authoring and Publishing through Alfresco and Compo...
CASE-6 Structured Content Authoring and Publishing through Alfresco and Compo...Alfresco Software
 
10 key decisions_your_ecm_checklist
10 key decisions_your_ecm_checklist10 key decisions_your_ecm_checklist
10 key decisions_your_ecm_checklistQuestexConf
 
SharePoint & ERM
SharePoint & ERMSharePoint & ERM
SharePoint & ERMNick Inglis
 
Moss 2007 Technology Briefing
Moss 2007 Technology BriefingMoss 2007 Technology Briefing
Moss 2007 Technology BriefingTeguhsantoso
 
Innovative_ecm_with_sharepoint_2010
Innovative_ecm_with_sharepoint_2010Innovative_ecm_with_sharepoint_2010
Innovative_ecm_with_sharepoint_2010QuestexConf
 
Information opportunities in social, mobile, and cloud technologies
Information opportunities in social, mobile, and cloud technologiesInformation opportunities in social, mobile, and cloud technologies
Information opportunities in social, mobile, and cloud technologiesJohn Mancini
 
Nj sharepoint user group
Nj sharepoint user groupNj sharepoint user group
Nj sharepoint user groupPeter1020
 
Why Should Consultants and Systems Integrators Become Certified Information P...
Why Should Consultants and Systems Integrators Become Certified Information P...Why Should Consultants and Systems Integrators Become Certified Information P...
Why Should Consultants and Systems Integrators Become Certified Information P...John Mancini
 
Presenting SharePoint as a service back to your organization
Presenting SharePoint as a service back to your organizationPresenting SharePoint as a service back to your organization
Presenting SharePoint as a service back to your organizationJeremy Thake
 
20100430 introduction to business objects data services
20100430 introduction to business objects data services20100430 introduction to business objects data services
20100430 introduction to business objects data servicesJunhyun Song
 
94670552 alfresco-aiim-2006-05-16
94670552 alfresco-aiim-2006-05-1694670552 alfresco-aiim-2006-05-16
94670552 alfresco-aiim-2006-05-16hishamfire
 
Envision IT - Designing your SharePoint Extranet to work for you
Envision IT - Designing your SharePoint Extranet to work for youEnvision IT - Designing your SharePoint Extranet to work for you
Envision IT - Designing your SharePoint Extranet to work for youEnvision IT
 
Asug SAP HANA Presentation - Perceptive Technologies SAP
Asug SAP HANA Presentation - Perceptive Technologies SAPAsug SAP HANA Presentation - Perceptive Technologies SAP
Asug SAP HANA Presentation - Perceptive Technologies SAPBrendan Kane
 
SharePoint for information Management in The Legal Profession
SharePoint for information Management in The Legal ProfessionSharePoint for information Management in The Legal Profession
SharePoint for information Management in The Legal ProfessionCSIRO National AI Centre
 
Share Point Presentation Introduction To Sharepoint
Share Point Presentation    Introduction To SharepointShare Point Presentation    Introduction To Sharepoint
Share Point Presentation Introduction To Sharepointrpeterson1
 
Business process-outsourcing and ECM 02-04-09
Business process-outsourcing and ECM 02-04-09Business process-outsourcing and ECM 02-04-09
Business process-outsourcing and ECM 02-04-09Ganesha DM
 

Similar a Big Data Meets Metadata – Analyzing Large Data Sets (20)

SharePoint Saturday DC by ImageTech Systems - David Strock
SharePoint Saturday DC by ImageTech Systems - David StrockSharePoint Saturday DC by ImageTech Systems - David Strock
SharePoint Saturday DC by ImageTech Systems - David Strock
 
CASE-6 Structured Content Authoring and Publishing through Alfresco and Compo...
CASE-6 Structured Content Authoring and Publishing through Alfresco and Compo...CASE-6 Structured Content Authoring and Publishing through Alfresco and Compo...
CASE-6 Structured Content Authoring and Publishing through Alfresco and Compo...
 
10 key decisions_your_ecm_checklist
10 key decisions_your_ecm_checklist10 key decisions_your_ecm_checklist
10 key decisions_your_ecm_checklist
 
SharePoint & ERM
SharePoint & ERMSharePoint & ERM
SharePoint & ERM
 
IT Governance Portals
IT Governance   PortalsIT Governance   Portals
IT Governance Portals
 
Moss 2007 Technology Briefing
Moss 2007 Technology BriefingMoss 2007 Technology Briefing
Moss 2007 Technology Briefing
 
Innovative_ecm_with_sharepoint_2010
Innovative_ecm_with_sharepoint_2010Innovative_ecm_with_sharepoint_2010
Innovative_ecm_with_sharepoint_2010
 
Information opportunities in social, mobile, and cloud technologies
Information opportunities in social, mobile, and cloud technologiesInformation opportunities in social, mobile, and cloud technologies
Information opportunities in social, mobile, and cloud technologies
 
Nj sharepoint user group
Nj sharepoint user groupNj sharepoint user group
Nj sharepoint user group
 
Why Should Consultants and Systems Integrators Become Certified Information P...
Why Should Consultants and Systems Integrators Become Certified Information P...Why Should Consultants and Systems Integrators Become Certified Information P...
Why Should Consultants and Systems Integrators Become Certified Information P...
 
Presenting SharePoint as a service back to your organization
Presenting SharePoint as a service back to your organizationPresenting SharePoint as a service back to your organization
Presenting SharePoint as a service back to your organization
 
20100430 introduction to business objects data services
20100430 introduction to business objects data services20100430 introduction to business objects data services
20100430 introduction to business objects data services
 
94670552 alfresco-aiim-2006-05-16
94670552 alfresco-aiim-2006-05-1694670552 alfresco-aiim-2006-05-16
94670552 alfresco-aiim-2006-05-16
 
Envision IT - Designing your SharePoint Extranet to work for you
Envision IT - Designing your SharePoint Extranet to work for youEnvision IT - Designing your SharePoint Extranet to work for you
Envision IT - Designing your SharePoint Extranet to work for you
 
AIS SharePoint & BI Presentation 24th july 2012
AIS SharePoint & BI Presentation 24th july 2012AIS SharePoint & BI Presentation 24th july 2012
AIS SharePoint & BI Presentation 24th july 2012
 
Asug SAP HANA Presentation - Perceptive Technologies SAP
Asug SAP HANA Presentation - Perceptive Technologies SAPAsug SAP HANA Presentation - Perceptive Technologies SAP
Asug SAP HANA Presentation - Perceptive Technologies SAP
 
SharePoint for information Management in The Legal Profession
SharePoint for information Management in The Legal ProfessionSharePoint for information Management in The Legal Profession
SharePoint for information Management in The Legal Profession
 
Share Point Presentation Introduction To Sharepoint
Share Point Presentation    Introduction To SharepointShare Point Presentation    Introduction To Sharepoint
Share Point Presentation Introduction To Sharepoint
 
Business process-outsourcing and ECM 02-04-09
Business process-outsourcing and ECM 02-04-09Business process-outsourcing and ECM 02-04-09
Business process-outsourcing and ECM 02-04-09
 
E biz blueprint
E biz blueprintE biz blueprint
E biz blueprint
 

Más de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
Big Data Meets Metadata – Analyzing Large Data Sets

  • 1. Smartlogic™ Lucene Revolution 2012 Jeremy Bentley, CEO
  • 2. 1st degree of order Filing management • 80% of enterprise information is unstructured • Doubling every 19 months and accelerating [Gartner] • Increasing burden of compliance • Enterprise 2.0 additions • Big Data connotations
  • 3. 2nd degree of order Index management • File plans and metadata schema • Manually applied classification • Low level of consistency and quality
  • 4. 3rd degree Order [diagram] Automation of 1st & 2nd Degrees: Enterprise Search, Content Management, Portal Infrastructure, Document Management, SharePoint, Records Management, Publishing Systems, Process Management & Workflow, Digital Asset Management, eDiscovery
  • 5. A 10 year Flatline [chart] User Search Satisfaction: 50% (2001), 48% (2011) • 2001, IDC, “Quantifying Enterprise Search”: Searchers are successful in finding what they seek 50% of the time or less • 2011, MindMetre/SmartLogic: More than half (52%) cannot find the information they need using their Enterprise search system
  • 6. The explosion of information [chart] Terabytes of data: 4Tb (1993-2001), 80Tb (2001-2009), then ? • 20 times increase in information volume • Source: the National Archives
  • 7. Volume + other disruptive factors: Velocity, Variety, Complexity • Cross-organizational and cross platform information needs • Changing requirements for information over time • Copyright @ 2011 Smartlogic Semaphore Limited
  • 8. New 4th degree of order [diagram] Content Intelligence: Enterprise Search, Content Management, Portal Infrastructure, Document Management, SharePoint, Records Management, Publishing Systems, Process Management & Workflow, Digital Asset Management, eDiscovery
  • 9. Content Intelligence [diagram] Information Manufacturing Monetisation Knowledge Metadata Recovery Data Loss Prevention Risk & Compliance Content Analytics
  • 11. Metadata [diagram] Information Subject Creation Date Location Modified Date Project Author Function Format (PDF,DOC,XLS) (IT,HR,Finance) Protective Marker Expiry Publisher Expert Retention Site Process Structural
  • 12. 4th degree of order Content Intelligence: Content Intelligence Platform, FAST, SharePoint
  • 13. What is Content Intelligence: Content Intelligence is the process of IDENTIFYING, CLASSIFYING, EXTRACTING, ANALYZING and SURFACING information based on its meaning and context to make timely and informed business decisions.
  • 14. Content Intelligence Solutions: KNOWLEDGE ACQUISITION & REUSE • MICROTARGETING & DISTRIBUTION • GOVERNANCE, COMPLIANCE & RISK • WEB-BASED SELF SERVICE
  • 15. Big Data + Content Intelligence (from Gartner, 2011)
  • 16. Semaphore – Three Core Capabilities [diagram] Semantic Model • Ontology Manager • Build, Manage and Deploy Vocabularies/Libraries • Expose • Apply • SEMAPHORE • Users • Content • Classification Server • Semantic Enhancement Server • Inform • Explore data to find insights • Automate the Metadata Enrichment
  • 17. Enterprise Classification: Important requirements for Velocity/Volume: • Scalability for large volumes of content, users, metadata and systems • Easy integration with processing systems - search, content, records and document management systems as well as file shares and content migration tools • Support for all the organization's languages and data formats
  • 19. Metadata Generation Information Brand Creation Date Service Modified Date Geography Author Products Format (PDF,DOC,XLS) Expert Protective Retention Marker Publisher Expiry Site Process Structural
  • 20. Different Vocabulary and Ambiguity • Missing results (You Say / I Say): Perpetrator / Burglar, Thief; Swine Flu / Swine Influenza Virus, H1N1; Touchscreen / Touch screen, Multi-touch • Too many results (You Say / What do you mean?): Apple / A fruit? Fiona - A singer / songwriter? An electronics company?; Rights / Employment rights? Equal rights? Right of way?; Ford / Ford Motor, Forward Industrials (ticker=FORD), A shallow river crossing • © 2010
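The two problems on this slide (different words for one concept, and one word for several concepts) can be sketched in a few lines of Python. This is only an illustrative sketch, not how Semaphore works: the dictionary layout, the choice of which label counts as "preferred", the context cue words, and the function names `build_lookup`, `normalize` and `disambiguate` are all assumptions made for the example. Only the vocabulary terms themselves come from the slide.

```python
# Hypothetical vocabulary: each preferred label maps to alternative labels.
# Terms are taken from the slide; which label is "preferred" is an assumption.
VOCABULARY = {
    "Perpetrator": ["Burglar", "Thief"],
    "Swine Flu": ["Swine Influenza Virus", "H1N1"],
    "Touchscreen": ["Touch screen", "Multi-touch"],
}

# Hypothetical senses for an ambiguous term, each with invented context cues.
SENSES = {
    "apple": {
        "A fruit": {"orchard", "juice", "pie"},
        "An electronics company": {"iphone", "mac", "shares"},
    },
}


def build_lookup(vocabulary):
    """Map every label (lower-cased) to its preferred label."""
    lookup = {}
    for preferred, alternatives in vocabulary.items():
        for label in [preferred, *alternatives]:
            lookup[label.lower()] = preferred
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def normalize(term):
    """Return the preferred label for a term, or None if it is unknown."""
    return LOOKUP.get(term.strip().lower())


def disambiguate(term, context_words):
    """Pick the sense whose cue words overlap most with the context.

    Returns None when the term has no known senses or nothing overlaps.
    """
    senses = SENSES.get(term.lower(), {})
    context = {word.lower() for word in context_words}
    best = max(senses, key=lambda s: len(senses[s] & context), default=None)
    if best is None or not senses[best] & context:
        return None
    return best
```

For example, `normalize("H1N1")` and `normalize("Swine Influenza Virus")` both resolve to the same preferred label, which addresses the "missing results" case, while `disambiguate("Apple", ["fresh", "juice"])` narrows the "too many results" case to a single sense.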
  • 21. Without Accurate Metadata: Big Data has its perils. With huge data sets and fine-grained measurement, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data is that “many bits of straw look like needles.” - Trevor Hastie, Statistics Professor at Stanford University
  • 22. What Classification Must Handle (Capability / Included): • Look for all the vocabulary associated with topic/entity • Determine aboutness / avoid passing mentions • Address term ambiguity • Handle stemming errors • Determine if topics are in the same context • Split documents into components • Generate scores (so most relevant content bubbles to top) • Show dynamic summaries to users
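Two of the capabilities listed here, "determine aboutness / avoid passing mentions" and "generate scores", can be illustrated with a toy scorer. This is a minimal sketch under stated assumptions and not the product's classification method: the scoring formula (vocabulary hits divided by document length), the 0.1 threshold, and the names `TOPIC_TERMS`, `topic_score` and `classify` are all hypothetical.

```python
import re

# Hypothetical topic vocabulary; the terms echo the deck, the structure is assumed.
TOPIC_TERMS = {
    "Swine Flu": ["swine flu", "swine influenza virus", "h1n1"],
    "Touchscreen": ["touchscreen", "touch screen", "multi touch"],
}

# Assumed cut-off: below this density a topic counts as a passing mention.
ABOUTNESS_THRESHOLD = 0.1


def topic_score(text, terms):
    """Score a topic as vocabulary hits per word of the document."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    # Re-join the tokens so punctuation and hyphens do not hide a phrase.
    normalized = " ".join(words)
    hits = sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", normalized))
        for term in terms
    )
    return hits / len(words)


def classify(text, threshold=ABOUTNESS_THRESHOLD):
    """Return (topic, score) pairs at or above the threshold, best first."""
    scored = [
        (topic, topic_score(text, terms)) for topic, terms in TOPIC_TERMS.items()
    ]
    kept = [(topic, score) for topic, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

A short text that mentions H1N1 and swine flu several times clears the threshold and is tagged, while a long report with a single mention scores below it and is left untagged. A real classifier would also need the other capabilities on the slide (ambiguity handling, stemming, context checks), which this sketch leaves out.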
  • 23. Enhancing Metadata • Accurately classify content into subject areas defined in a taxonomy/ontology • Entity extraction (Text Mining) • Sentiment Analysis • Fact Extraction
  • 24. Physical Architecture [diagram] Ontology Management Services Ontology Manager Ontology Manager Desktop Ontology Manager Desktop Standalone Desktop Win 7, Vista Win 7, Vista Win7, Vista 2Gb RAM 2Gb RAM 2Gb RAM 2GHz Dual CPU 2GHz Dual CPU 2GHz Dual CPU Optional RDBMS data store Ontology Manager Server Oracle Port 8001 Port 8002 MySQL Win 7, Vista, 2003, 2008 +R2 Ontology Ontology Linux SQL Server 2005 + 2008 + Instance 1 Instance 2 2Gb RAM 2008 R2 2GHz CPU Semantic Enhancement Server Content Classification Server Search Enhancement Server Classification Server Classification Test Interface Port 5058 Search GSA Extensions Classification Internet Explorer Enhancement FAST Extensions Instance Firefox Instance Sharepoint Extensions Rule and Template Editor Windows Server 2003, 2008 (32bit/64bit) +R2 Windows Server 2003, 2008 (32bit/64bit) + R2 Win 7, Vista Linux Linux 2Gb RAM IIS/Apache HTTP Server CPU and RAM intensive. Scale to volume of content 2GHz Dual CPU RAM and disk access intensive. Scale to expected peak search throughput and number of publishing users Google Classification Handler Integration Components Dispatcher Proxy Windows Server 2003, 2008 (32bit/64bit) +R2 Scale for throughput of GSA Indexing Crawler Search Application Framework Search Application Framework Document Library Components Semaphore Document Processor Semaphore Document Processor Search Application Framework Search Web Parts Microsoft FAST ESP Microsoft Office SharePoint Google Search Appliance Server Farm SOLR Server 2007 / 2010 Server Farm
  • 28. Examples – Fact Extraction
  • 29. How Else Does Semaphore Help • Disambiguate queries • Perfectly formed filters organised by facet • Graphical drill down • Explore relationships • Supporting documents