Alfresco Search - Meetup for Government

Alfresco Search with Apache Solr. Presentation delivered by Sergio Rojas, CTO at Zaizi, for the Alfresco Meetup for Government.

Published in: Technology

  1. Work Together Effectively
  2. Alfresco Search: Tips from my experience
     Sergio Rojas (CTO at Zaizi)
  3. About myself
     • Over 10 years of experience on data and Alfresco projects as a developer and technical architect.
     • CTO at Zaizi, in charge of building technical capability and leading the technical team.
  4. Why search is key
     • Every day we create the equivalent of four Eiffel Towers' worth of Blu-ray discs filled with data (2.5 quintillion bytes).
     • That data is distributed across the whole world, in many different formats and on many different devices.
  5. Why search is key
  6. Zaizi and search
     • Zaizi has delivered search capabilities to organisations such as BEIS, MOJ, DFE, IICSA and Bristol City Council.
     • Part of Zaizi's R&D team is dedicated specifically to improving search for organisations with huge amounts of data distributed across the world.
     • Sensefy: Zaizi's solution for enterprise and semantic search.
  7. Agenda
     1. Short presentation on Alfresco Search: lessons learned (medium technical level)
     2. Round table on search: enterprise search, semantic search, search across linked data and the future of search (business level)
  8. Experience
     "The only source of knowledge is experience." (Albert Einstein)
  9. Terminology
     • Alfresco: open and scalable enterprise content management system.
     • Solr: search platform built on Apache Lucene that provides distributed indexing and searching for Alfresco.
  10. Terminology
      • Tracking: looking for new documents and folders in the repository and adding them to the index so that they become searchable.
      • Searching: executing queries to find documents or folders that match certain criteria.
  11. You must know, number 1
      Solr search in Alfresco is not transactional: document metadata and content may not be searchable immediately after the document is created in Alfresco.
  12. Challenge
      • Some B2B scenarios might require transactional search.
  13. B2B example
      1. Activiti creates a folder in Alfresco and sets its idPerson attribute to "Ana".
      2. Immediately afterwards (less than 1 second later), Activiti searches for all folders whose idPerson is "Ana".
      3. By default, Alfresco's Solr tracker looks for new elements every 5 seconds, so the index does not yet contain the folder that was just created.
      4. In real-time environments this can be a problem.
  14. Potential solution
      • Transactional search is normally required for metadata (e.g. unique identifiers).
      • Alfresco transactional metadata query: uses the database for metadata queries and Solr for content search.
  15. Alfresco transactional metadata query
      • Addresses the B2B scenario.
      • Ideal when content search is not required (no index server licence is needed).
  16. You must know, number 2
      Problems will appear as the repository grows in size.
  17. Common scenario
      • 20 million documents
      • 15 TB of content
      • 2 TB of Solr indexes
  18. Potential issues
      • Indexing delay: a long time passes between a document being created and it becoming searchable.
      • Search performance: searches take longer than expected.
      • Index corruption: some documents are not found.
  19. Look after your indexes
      • Indexes sometimes get corrupted, which produces inconsistent search results.
      • Small indexes are easy to rebuild completely.
      • This is not feasible in environments with a significant number of documents and folders.
  20. Real scenario: BEIS
      • Alfresco 4.2.4
      • Solr 1.4
      • 15 million nodes (documents + folders)
      • 12 TB of content
      • 2 TB of indexes
      • Dedicated index server (8 CPU cores, 96 GB RAM)
      • Between 2 and 3 weeks to completely rebuild the index
  21. Look after your indexes: sharding
      • Available only from Alfresco 5.1 onwards
      • Divides the index into multiple pieces called shards
      • Indexing can be done in parallel for different shards
  22. Look after your indexes: recommendations
      • Back up your indexes regularly (daily).
      • Avoid full rebuilds: use tools to maintain Solr indexes instead.
  23. Tools to maintain your indexes
      • FIX: repair unindexed transactions
        http://localhost:8080/solr4/admin/cores?action=FIX
      • PURGE: remove transactions, ACL transactions, nodes and ACLs from the index
        http://localhost:8080/solr4/admin/cores?action=PURGE&txid=1&acltxid=2&nodeid=3&aclid=4
      • REINDEX: reindex transactions, ACL transactions, nodes and ACLs
        http://localhost:8080/solr4/admin/cores?action=REINDEX&txid=1&acltxid=2&nodeid=3&aclid=4
  24. Tools to maintain your indexes
      • INDEX: create entries in the index
        http://localhost:8080/solr4/admin/cores?action=INDEX&txid=1&acltxid=2&nodeid=3&aclid=4
      • RETRY: retry indexing any node that failed to index and was skipped
        http://localhost:8080/solr4/admin/cores?action=RETRY
      • REPORT: generate a report on the status of the indexes
        http://localhost:8080/solr4/admin/cores?action=REPORT&wt=xml
  25. What is coming with Alfresco 5.2?
      • Integration with Solr 6
      • Enhanced index sharding (e.g. sharding based on date)
      • New Search public API, including support for paging, filter queries, faceting, spellcheck, highlighting, sorting and field selection
      • Numerous performance enhancements
  26. What is coming with Alfresco 5.2?
      • Term hit highlighting (name/title/description/content) in search results
      • Bulk actions on search results, e.g. copy, move, start workflow, delete
      • Live search within the context of a site
  27. Conclusions & recommendations
      • If possible, upgrade to Alfresco 5.1, or to 5.2 once it is available, to benefit from Solr sharding.
      • Bear in mind that Solr search is not transactional.
      • Keep your indexes backed up and check them regularly to make sure they are healthy.
  28. Breakout session
  29. Search in Alfresco: Tips from my experience
      Sergio Rojas (CTO at Zaizi)
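
The idea behind the transactional metadata query on slides 14 and 15 can be sketched as a simple routing rule. Metadata-only queries are answered from the database, which is always up to date. Queries that need full-text content search go to the eventually consistent Solr index. This is a hypothetical model, not Alfresco's implementation. In Alfresco, whether a query can run against the database also depends on configuration and on which query constructs it uses. `Query`, `route` and `execute` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Query:
    properties: Dict[str, str] = field(default_factory=dict)  # exact-match metadata filters
    text: Optional[str] = None                                 # optional full-text terms


def route(query: Query) -> str:
    """Simplified rule: metadata-only queries go to the database, anything else to Solr."""
    return "solr" if query.text else "database"


def execute(query: Query, database: dict, solr_index: dict) -> list:
    """Run the query against whichever store route() picks."""
    store = database if route(query) == "database" else solr_index
    hits = []
    for node_id, node in store.items():
        if all(node["props"].get(k) == v for k, v in query.properties.items()):
            if query.text is None or query.text.lower() in node["content"].lower():
                hits.append(node_id)
    return sorted(hits)


# The folder exists in the database, but the Solr tracker has not picked it up yet.
database = {"folder-1": {"props": {"idPerson": "Ana"}, "content": ""}}
solr_index = {}

print(execute(Query({"idPerson": "Ana"}), database, solr_index))                  # ['folder-1']
print(execute(Query({"idPerson": "Ana"}, text="contract"), database, solr_index))  # []
```

The metadata lookup from the B2B example succeeds straight away. A query that also needs content search still depends on the tracker having run.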
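
The figures on slides 17 and 20 can be turned into rough rates. The sketch below computes the average indexing rate implied by the BEIS full rebuild. It assumes the rebuild ran continuously at a constant pace, which is a simplification. It also computes index size as a share of content size. The function name `nodes_per_second` is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def nodes_per_second(nodes: int, weeks: float) -> float:
    """Average indexing rate implied by rebuilding `nodes` in `weeks` of continuous work."""
    return nodes / (weeks * SECONDS_PER_WEEK)


# Figures from slide 20 (BEIS): 15 million nodes, 2 to 3 weeks for a full rebuild.
for weeks in (2, 3):
    print(f"{weeks} weeks -> ~{nodes_per_second(15_000_000, weeks):.1f} nodes/s")
# 2 weeks -> ~12.4 nodes/s
# 3 weeks -> ~8.3 nodes/s

# Index size as a share of content size (slide 17: 2 TB / 15 TB, slide 20: 2 TB / 12 TB).
print(f"slide 17: {2 / 15:.0%}, slide 20: {2 / 12:.0%}")
# slide 17: 13%, slide 20: 17%
```

At roughly 8 to 12 nodes per second, a full rebuild is measured in weeks. This is why the slides recommend targeted maintenance over rebuilding.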
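
Slide 21's sharding idea can be sketched in two steps: split the nodes into shards, then index each shard independently and in parallel. The sketch makes two assumptions. Nodes are assigned to shards by node id modulo the shard count; Alfresco supports several sharding methods, and this is only one simple illustration. `index_shard` is a hypothetical placeholder for the real indexing work. Threads stand in for what would be separate Solr cores or servers.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_shards(node_ids, num_shards):
    """Assign each node to a shard using node_id % num_shards (illustrative method)."""
    shards = {s: [] for s in range(num_shards)}
    for node_id in node_ids:
        shards[node_id % num_shards].append(node_id)
    return shards


def index_shard(shard_id, node_ids):
    """Hypothetical placeholder for building one shard's index; reports how many nodes it got."""
    return shard_id, len(node_ids)


def rebuild(node_ids, num_shards):
    """Index every shard concurrently and return {shard_id: nodes_indexed}."""
    shards = assign_shards(node_ids, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        results = pool.map(lambda item: index_shard(*item), shards.items())
        return dict(results)


print(rebuild(range(10), 3))  # {0: 4, 1: 3, 2: 3}
```

Because each shard is independent, a damaged shard can in principle be rebuilt on its own. At best, a full rebuild shrinks roughly in proportion to the number of shards, and only when each shard has its own hardware.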
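
The maintenance URLs on slides 23 and 24 share one pattern: a base URL, an `action` and optional id parameters. A small helper can build them. It assumes the base URL shown on the slides (`http://localhost:8080/solr4/admin/cores`); adjust host, port and path for your installation. `admin_url` is a hypothetical helper name. It only checks the action name and passes any other parameters through unchanged.

```python
from urllib.parse import urlencode

# Base URL as shown on the slides; adjust host, port and path for your installation.
BASE_URL = "http://localhost:8080/solr4/admin/cores"

# Maintenance actions listed on slides 23-24.
ACTIONS = {"FIX", "PURGE", "REINDEX", "INDEX", "RETRY", "REPORT"}


def admin_url(action: str, **params) -> str:
    """Return the admin URL for one maintenance action.

    Extra keyword arguments (e.g. txid, acltxid, nodeid, aclid, wt as on the slides)
    become query parameters; they are not validated.
    """
    action = action.upper()
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return f"{BASE_URL}?{urlencode({'action': action, **params})}"


print(admin_url("REINDEX", nodeid=3))
# http://localhost:8080/solr4/admin/cores?action=REINDEX&nodeid=3
print(admin_url("REPORT", wt="xml"))
# http://localhost:8080/solr4/admin/cores?action=REPORT&wt=xml
```

A scheduled job could call the REPORT URL with any HTTP client as part of the regular health checks recommended on slide 27. It could then use FIX, REINDEX or RETRY on the affected items instead of a full rebuild.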
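
The 5.2 Search public API on slide 25 takes a JSON request body. The sketch below builds one that exercises several of the listed features: paging, filter queries, faceting, sorting and field selection. It rests on an assumption. The endpoint path and JSON field names reflect our understanding of the public Search API and should be checked against the API reference for your Alfresco version. The function name `build_search_body` and the facet and filter values are hypothetical. The code only builds and prints the body; it does not contact a server.

```python
import json

# Assumed endpoint for the Search public API (HTTP POST); verify for your version.
SEARCH_ENDPOINT = "/alfresco/api/-default-/public/search/versions/1/search"


def build_search_body(text: str, max_items: int = 10, skip_count: int = 0) -> dict:
    """Build an (assumed) Search API request body for a full-text AFTS query."""
    return {
        "query": {"query": text, "language": "afts"},               # AFTS query string
        "paging": {"maxItems": max_items, "skipCount": skip_count},  # paging
        "filterQueries": [{"query": 'TYPE:"cm:content"'}],           # filter query (example value)
        "facetFields": {"facets": [{"field": "creator"}]},           # faceting (example field)
        "sort": [{"type": "FIELD", "field": "cm:name", "ascending": True}],  # sorting
        "fields": ["id", "name"],                                     # field selection
    }


print(json.dumps(build_search_body("contract"), indent=2))
```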
