Angel Borroy
Tom Page
10th June 2020
(Re)Indexing Large Repositories
Agenda
(Re)Indexing Large Repositories
• Alfresco SOLR in a Nutshell
• Indexing Process
• Indexing Scenarios
• When to Re-Index
• Deployment Alternatives
• Demo time without downtime!
• Benchmark Review
• Improvements in 1.4.2
• Future Improvements
• Recap
Alfresco SOLR
Alfresco SOLR in a Nutshell
SOLR 6 is used in Alfresco to perform two main processes:
• Indexing (or tracking) metadata, permissions and content from Alfresco Repository
• Returning results from search queries supporting several syntaxes (AFTS, CMIS)
Indexing process: asynchronous
Alfresco SOLR in a Nutshell
Searching process: synchronous
• Syntax: AFTS, CMIS
• Permission checks
Eventual consistency
SOLR indexes the information after the database has committed the transaction, so there is a short period of time when not all the documents are available in the SOLR Index. We call this eventual consistency, as SOLR will eventually catch up with the Repository.
Alfresco SOLR in a Nutshell
Alfresco SOLR Storage
By default, two SOLR cores are created: one for the living documents (alfresco) and one for the removed documents (archive).
Each core includes the following storage folders:
• Default SOLR Index files in the solrhome/<core>/index folder
• Alfresco customized Content Store in the contentstore folder
  • This folder includes a cached copy of Repository content and metadata
  • Content Store will be removed in Search Services 2.0
These folders are populated by the Indexing Process.
Indexing process
● Each tracker is fired asynchronously according to a cron expression: alfresco.cron or alfresco.*.tracker.cron
● Transactions and ACL Change Sets are processed in batches of Nodes or ACLs
● Batches are split to be executed in parallel by Workers
● However, Content Tracker recovers text from content nodes one by one
● Commit Tracker writes the changes from the different Trackers to SOLR Index "eventually"
>> Cascade Tracker is not running when indexing from scratch
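The batching and parallel-worker behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Search Services code (which is Java): the names `split_into_batches`, `index_batch` and `index_in_parallel`, the batch size and the worker count are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def split_into_batches(node_ids: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of node_ids with at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(node_ids), batch_size):
        yield list(node_ids[start:start + batch_size])


def index_batch(batch: List[int]) -> int:
    """Hypothetical worker: a real tracker would fetch and index each node here."""
    return len(batch)


def index_in_parallel(node_ids: Sequence[int], batch_size: int, workers: int) -> int:
    """Process all batches with a pool of workers; return how many nodes were handled."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(index_batch, split_into_batches(node_ids, batch_size)))


if __name__ == "__main__":
    print(index_in_parallel(list(range(25)), batch_size=10, workers=4))
```

A larger batch size means fewer, heavier units of work, while the worker count bounds how many batches are in flight at once.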
Indexing scenarios
1. When updating the repository using applications or bulk ingestion processes, the transactions will include a long list of nodes to be indexed
2. When using Alfresco Share to create new content, there will be more transactions but every transaction will include a small list of nodes to be indexed
3. When setting the permission level for every node in a hierarchy manually, the ACL Change Sets will include a long list of ACLs to be indexed
4. When using the default Alfresco permissions design, the ACL Change Sets will include a small list of ACLs to be indexed
5. When using complex document formats, the Transformation Service will require additional resources
6. When using large documents, the SOLR Index will require additional storage
Indexing scenarios
Controlling what to index
• Content can be excluded from SOLR Index by configuration
solrcore.properties > alfresco.index.transformContent=false
https://docs.alfresco.com/search-community/concepts/solrcore-properties-file.html
• Some nodes can be excluded from SOLR Index by using the Index Control aspect
cm:indexControl > cm:isIndexed :: false, metadata and content are not indexed
cm:indexControl > cm:isContentIndexed :: false, content is not indexed
https://docs.alfresco.com/community/concepts/admin-indexes.html
• Some properties can be excluded from SOLR Index by design in the Content Model
<property>
  <index enabled="false"/>
</property>
https://docs.alfresco.com/community/references/dev-extension-points-content-model-define-and-deploy.html
Add this setting to the archive core by default!
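As a rough illustration of the Index Control aspect, the sketch below builds a JSON body that sets the two flags on a node. It assumes the node-update call of the Alfresco REST API accepts `aspectNames` and `properties` keys, and that `aspectNames` replaces the whole aspect list (hence the merge with existing aspects); the helper name `index_control_payload` is hypothetical and no request is actually sent.

```python
import json
from typing import Dict, Iterable, List


def index_control_payload(existing_aspects: Iterable[str],
                          is_indexed: bool,
                          is_content_indexed: bool) -> Dict[str, object]:
    """Build a node-update body that applies cm:indexControl with the given flags."""
    aspects: List[str] = list(existing_aspects)
    if "cm:indexControl" not in aspects:
        aspects.append("cm:indexControl")
    return {
        "aspectNames": aspects,
        "properties": {
            "cm:isIndexed": is_indexed,
            "cm:isContentIndexed": is_content_indexed,
        },
    }


if __name__ == "__main__":
    # Keep metadata searchable but skip full-text indexing of the content.
    body = index_control_payload(["cm:auditable"], is_indexed=True, is_content_indexed=False)
    print(json.dumps(body, indent=2))
```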
When to re-index
Re-indexing can take from a few hours to a few days in large repositories.
Full re-index
• When upgrading to a major Search Services release, like 2.0
• When the SOLR Index has been corrupted for technical reasons
• When breaking changes are introduced in common custom Content Models
Partial re-index
• This process can also take some time, depending on the number of documents to be re-indexed, but it will take less than a full re-index
• When incremental changes are introduced in a Content Model, a partial re-index can be fired by using the SOLR REST API
http://localhost:8983/solr/admin/cores?action=reindex&query=TYPE:person
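The partial re-index URL shown on this slide can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `reindex_url` is an assumption, and the request itself is deliberately not sent.

```python
from urllib.parse import urlencode


def reindex_url(base_url: str, query: str) -> str:
    """Return the admin-cores URL that requests a re-index of nodes matching query."""
    # safe=":" keeps TYPE:person readable; other special characters are still escaped.
    params = urlencode({"action": "reindex", "query": query}, safe=":")
    return f"{base_url}/admin/cores?{params}"


if __name__ == "__main__":
    print(reindex_url("http://localhost:8983/solr", "TYPE:person"))
```

Building the query string with `urlencode` avoids hand-escaping when the query contains spaces or other reserved characters.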
Deployment alternatives
https://docs.alfresco.com/sie/concepts/solr-shard-overview.html
https://docs.alfresco.com/sie/concepts/solr-replication.html
https://docs.alfresco.com/sie/tasks/solr-install.html
Installing alternatives
• Using the ZIP Distribution file
https://docs.alfresco.com/search-community/concepts/solr-install-config.html
• Using Docker or Docker Compose
https://github.com/Alfresco/SearchServices/tree/master/search-services
https://github.com/Alfresco/acs-community-deployment/tree/master/docker-compose
https://github.com/Alfresco/alfresco-docker-installer
• Using Kubernetes
https://github.com/Alfresco/acs-community-deployment/tree/master/helm/alfresco-content-services-community
Deployment for Re-Indexing
Deployment schema to minimize downtime in re-indexing processes
> When using a different SOLR version, configure the Alfresco Repository to use the new SOLR server *
> When using the same SOLR version, the INDEX folder can be used directly
* Upgrading from SOLR 4 to SOLR 6 is not allowed when using Alfresco CE 6.2.0-ga (thanks for raising this @AFaust) >> SEARCH-2289
Alfresco Repository Indexing Configuration
When configuring an Alfresco Node to perform the reindexing process, there are some services you can switch off depending on your requirements:
• Scheduled Jobs can be disabled, as they will be run by the Alfresco instance in the living service
https://docs.alfresco.com/6.2/concepts/scheduled-jobs.html
• Some ACS features can be disabled
https://docs.alfresco.com/6.2/concepts/maincomponents-disable.html
• Additional subsystems (apart from Search or Transformation) can be disabled
https://docs.alfresco.com/6.2/concepts/subsystem-categories.html
  • Activities
  • Audit
  • Email
  • …
Don't make a copy of your Alfresco Repository production configuration and press the start button!
Monitoring
Profiling
• Using VisualVM or YourKit Java Profiler for the JVMs (Repository, SOLR)
• Using the pg_stat_statements extension or some other DB tool
https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-6-profiling-with-docker/ba-p/295846
https://github.com/aborroy/alfresco-6-profiling
Monitoring
• Using Prometheus with Grafana (Repository, SOLR)
https://hub.alfresco.com/t5/alfresco-content-services-blog/monitoring-alfresco-solr-with-prometheus-and-grafana/ba-p/294157
https://github.com/aborroy/alfresco-solr-monitoring
Demo time without downtime!
Demo time without downtime!
• Living Docker Compose environment running with around 4,000 text documents indexed
• Using YourKit Java Profiler to monitor Repository performance
• Starting a new Search Services 2.0 server locally to start indexing the repository
• Once the Search Services 2.0 index has caught up, change the Solr hostname value from the Admin Web Console or modify alfresco-global.properties
Note: Search Services 2.0 is not released yet!
http://127.0.0.1:8083/solr/alfresco/select?indent=on&q=TEXT:[* TO *]&wt=json
http://127.0.0.1:8983/solr/alfresco/select?indent=on&q=TEXT:[* TO *]&wt=json
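One way to decide when the new index is ready is to compare the hit counts returned by the two select URLs above. The sketch below is a minimal illustration under that assumption: it builds the query URL and parses the `numFound` value from a Solr JSON response. The helper names are hypothetical, the HTTP calls are left out, and the sample bodies in the usage section are made-up illustrations rather than output from the demo.

```python
import json
from urllib.parse import urlencode


def count_url(solr_base: str) -> str:
    """Build the select URL used in the demo to count documents with indexed text."""
    params = urlencode({"indent": "on", "q": "TEXT:[* TO *]", "wt": "json"})
    return f"{solr_base}/alfresco/select?{params}"


def num_found(response_body: str) -> int:
    """Extract the hit count from a Solr JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


def caught_up(current_body: str, new_body: str) -> bool:
    """True when the new index reports at least as many hits as the current one."""
    return num_found(new_body) >= num_found(current_body)


if __name__ == "__main__":
    print(count_url("http://127.0.0.1:8983/solr"))
    # Illustrative response bodies only; real ones come from the two Solr servers.
    current = '{"response": {"numFound": 4000, "start": 0, "docs": []}}'
    new = '{"response": {"numFound": 3200, "start": 0, "docs": []}}'
    print(caught_up(current, new))
```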
Benchmark Review
1 Billion Documents Review (2015)
• Review of the 1 billion document benchmark (November 2015)
• 10 repository nodes (Alfresco 5.1), 20 Solr 4 nodes (Alfresco Index Server)
• Indexed 1b documents in 5 days
How Alfresco powered a 1.2 Billion document deployment on Amazon Web Services
Improvements in 1.4.2
1.2 Billion Baseline Plan (2020)
• Customer-sponsored benchmark to see performance of system with their configuration
• Want 1.2b documents indexed into Search Services
• 20 instances, each containing a single shard (DB_ID_RANGE based sharding)
Performance considerations
• Bottlenecks
  • Database (getChildAssocs)
  • Transformers (when using large documents)
  • Network (when using large metadata/content)
  • Time spent processing data for other shards
Baseline Results
• Estimated completion in 21 days
DB_ID_RANGE Sharding
• Does not require specifying total number of shards in advance
• Index can continue to grow with repository
See https://docs.alfresco.com/search-enterprise/concepts/solr-shard-approaches.html
Cascade Tracking
Time spent processing transactions for other shards
• With DB_ID_RANGE sharding we know that only a range of transactions are relevant
• Skip transactions when using DB_ID_RANGE
• To support path queries we sometimes need to update data on multiple shards from a single change
• Option to disable cascade tracking
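The idea of skipping transactions that are irrelevant to a DB_ID_RANGE shard can be illustrated with a small filter. This is only a sketch of the principle, not the tracker's actual logic: the half-open range convention, the `ShardRange` type and the function names are assumptions, and cascade updates (which can require touching several shards) are not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical shard definition: half-open range [start, end) of node DB IDs.
ShardRange = Tuple[int, int]


def owns_node(shard: ShardRange, db_id: int) -> bool:
    """True when the node's DB ID falls inside the shard's range."""
    start, end = shard
    return start <= db_id < end


def relevant_transactions(shard: ShardRange,
                          transactions: Dict[int, List[int]]) -> List[int]:
    """Return IDs of transactions that touch at least one node owned by the shard."""
    return [txn_id for txn_id, node_ids in transactions.items()
            if any(owns_node(shard, node_id) for node_id in node_ids)]


if __name__ == "__main__":
    shard = (0, 1000)
    # Transaction ID -> DB IDs of the nodes it changed (illustrative values).
    txns = {1: [5, 10], 2: [1500], 3: [999, 1000]}
    print(relevant_transactions(shard, txns))
```

Here transaction 2 would be skipped by the shard because none of its nodes fall inside the range.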
Reduce Database Access and Network Usage
• Reduce amount of data requested
• Remove unused calls to getChildAssocs
• Compress communication where appropriate
• Add option to compress content transfer
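[Diagram: instead of "Please give me all metadata for the node", the request becomes "Please give me: X, Y, Z"; instead of plain content ("Lorem ipsum dolor sit amet, consectetur adipiscing elit..."), compressed bytes are transferred ("78 9c 05 c1 81 09 c0 30 08 04 c0 ...")]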
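The effect of compressing content before transfer can be illustrated with Python's standard `zlib` module. The byte sequence shown on the slide starts with 78 9c, which matches a zlib stream header at the default compression level; whether Search Services uses exactly this format on the wire is an assumption here, and the helper names are hypothetical.

```python
import zlib


def compress_payload(payload: bytes) -> bytes:
    """Compress a payload with zlib at level 6 (the library's usual default)."""
    return zlib.compress(payload, 6)


def hex_preview(data: bytes, count: int = 2) -> str:
    """Render the first count bytes as space-separated hex pairs."""
    return " ".join(f"{byte:02x}" for byte in data[:count])


if __name__ == "__main__":
    text = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 50).encode("utf-8")
    packed = compress_payload(text)
    print(len(text), len(packed))
    print(hex_preview(packed))
    # Decompression restores the original bytes exactly.
    print(zlib.decompress(packed) == text)
```

Repetitive text such as this sample compresses very well; real documents will see smaller gains, and already-compressed formats may see almost none.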
Overview of Improvements in 1.4.2
• Search Services 1.4.2 (and Insight Engine 1.4.2)
• ACS Repository 6.2 Enterprise
• No ACS Community release containing this yet
• However, you can use an existing ACS with the jars from https://github.com/aborroy/solr-performance-services-repo
Reindex of 1.2b documents in 10 days (6 repo nodes, 20 search nodes)
Search Services 1.3.0: 150 documents/second
Search Services 1.4.2: 1200-3500 documents/second* (depending on the number of shards, size of documents, etc.)
* Depending on exact configuration
(N.B. Not yet validated on the production system)
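The figures above can be related to each other with simple arithmetic. The sketch below converts between a total re-index duration and an average rate; it assumes the quoted rates are sustained averages for the whole deployment, which the slides do not state explicitly.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(documents: int, days: float) -> float:
    """Average documents per second needed to index `documents` in `days`."""
    return documents / (days * SECONDS_PER_DAY)


def days_needed(documents: int, docs_per_second: float) -> float:
    """Days required to index `documents` at a sustained rate."""
    return documents / (docs_per_second * SECONDS_PER_DAY)


if __name__ == "__main__":
    total = 1_200_000_000
    print(round(average_rate(total, 10)))       # about 1389 documents/second
    print(round(days_needed(total, 1200), 1))   # about 11.6 days
    print(round(days_needed(total, 3500), 1))   # about 4.0 days
```

Under that assumption, 1.2b documents in 10 days corresponds to roughly 1,400 documents per second, which sits inside the 1200-3500 range quoted for Search Services 1.4.2.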
Future Improvements
Future Improvements - Coming in 2.0.0
• Schema Simplification
  • Smaller index
  • Removing Duplicate Fields
  • Smaller communication
• Improved Trackers
  • Less duplication with large transactions
  • New tracker parallelism option
• Content Store Removal
  • Reduced disk usage
  • Less duplication
  • Better usage of Solr optimisations
  • Adds potential to use other Solr features
Improved Trackers - Testing
Scenarios datasets
• 100,000 documents created with 100,000 transactions
• 100,000 documents created with 1 transaction
• Changing the path for 100,000 documents
• 200,000 ACLs created with 200,000 ACL change sets
Parameters investigated
• The existing *BatchSize parameters
• The new *MaxParallelism parameters
  • These change the number of workers assigned to the tracker. They use a ForkJoinPool, and can impact the resources available to other processes
Improved Trackers - Testing
Hotspot calculation
• Increasing the Transaction Batch Size for nodes and ACLs improves performance until the maximum for your deployment is reached. Beyond that point, increasing this batch size brings no further performance change
• Increasing the Node Batch Size can improve performance while you are below the right number for your deployment. Beyond that point, increasing this batch size penalises performance
• Increasing the maximum number of Parallel Threads improved performance until the maximum number for our deployment was reached. However, in a real-world deployment it may be useful to use a lower number to avoid impacting other processes.
[Chart axes: Duration (ms), #]
Content Store Removal
• Solr Content store removal will reduce disk usage and simplify replication
[Diagram labels: The Solr Content Store; Replication of index across Solr nodes]
Recap
When to re-index
• When upgrading to major Search Services releases
How to re-index
• Running some small tests to ensure the performance of the indexing process before running it in production
• Indexing from scratch with the upgraded Repository
• Indexing in a parallel deployment
How to measure
• Profiling
• Monitoring
Thank you!
Relevant works
https://nathanmcminn.com/2017/01/11/alfresco-and-solr-search-reindexing-and-index-cluster-size/
https://www.slideshare.net/JosePortillo26/jose-portillo-dev-con-presentation-1138
https://www.slideshare.net/angelborroy/2019-dev-con115angelborroy
https://blog.xenit.eu/blog/ethias-sharding
https://hub.alfresco.com/t5/alfresco-content-services-blog/large-repository-upgrades/ba-p/287877
https://hub.alfresco.com/t5/alfresco-content-services-blog/scaling-search-with-db-id-range/ba-p/287900
https://www.alfresco.com/technical-whitepaper/alfresco-content-services-solr-deployment-options
https://www.alfresco.com/technical-whitepaper/alfresco-content-services-solr-deployment-example-aws
https://docs.alfresco.com/6.2/concepts/upgrade-path.html

A Practical Introduction to Apache Solr
 
Docker 101 - Zaragoza Docker Meetup - Universidad de Zaragoza
Docker 101 - Zaragoza Docker Meetup - Universidad de ZaragozaDocker 101 - Zaragoza Docker Meetup - Universidad de Zaragoza
Docker 101 - Zaragoza Docker Meetup - Universidad de Zaragoza
 
How to Write Alfresco Addons that Last Forever
How to Write Alfresco Addons that Last ForeverHow to Write Alfresco Addons that Last Forever
How to Write Alfresco Addons that Last Forever
 

Último

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Último (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

(Re)Indexing Large Repositories in Alfresco

  • 1. (Re)Indexing Large Repositories. Angel Borroy, Tom Page. 10th June 2020.
  • 2. Agenda: (Re)Indexing Large Repositories
      • Alfresco SOLR in a Nutshell
      • Indexing Process
      • Indexing Scenarios
      • When to Re-Index
      • Deployment Alternatives
      • Demo time without downtime!
      • Benchmark Review
      • Improvements in 1.4.2
      • Future Improvements
      • Recap
    Alfresco SOLR
  • 3. Alfresco SOLR in a Nutshell
    SOLR 6 is used in Alfresco to perform two main processes:
      • Indexing (or tracking) metadata, permissions and content from the Alfresco Repository
      • Returning results from search queries, supporting several syntaxes (AFTS, CMIS)
    Diagram label: Indexing process (asynchronous)
  • 4. Alfresco SOLR in a Nutshell: Eventual consistency
    SOLR indexes the information after the database has committed the transaction, so there is a short period of time when not all the documents are available in the SOLR Index. We call this eventual consistency, as SOLR will catch up with the Repository eventually.
    Diagram labels: Searching process (synchronous); Syntax (AFTS, CMIS); Permission Checks
  • 5. Alfresco SOLR in a Nutshell: Alfresco SOLR Storage
    By default two SOLR cores are created, one for the living documents (alfresco) and one for the removed documents (archive). Each core includes the following storage folders:
      • Default SOLR Index files in the solrhome/<core>/index folder
      • Alfresco customized Content Store in the contentstore folder
      • This folder includes a cached copy of Repository content and metadata
      • Content Store will be removed in Search Services 2.0
    Note: these folders are populated by the Indexing Process.
  • 6. Indexing process
      ● Each tracker is fired asynchronously according to a cron expression: alfresco.cron or alfresco.*.tracker.cron
      ● Transactions and ACL Change Sets are processed in batches of Nodes or ACLs
      ● Batches are split to be executed in parallel by Workers
      ● However, Content Tracker recovers text from content nodes one by one
      ● Commit Tracker writes the changes from the different Trackers to SOLR Index "eventually"
    Note: Cascade Tracker is not running when indexing from scratch.
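The batch-and-worker pattern described in slide 6 can be illustrated with a minimal Python sketch. It is not the Search Services implementation: `split_into_batches`, `index_batch` and `index_in_parallel` are hypothetical names, and `index_batch` is a stand-in that only counts the nodes it receives.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def index_batch(batch):
    """Hypothetical stand-in for indexing one batch; returns the number of nodes handled."""
    return len(batch)


def index_in_parallel(node_ids, batch_size, max_workers):
    """Split node ids into batches and hand each batch to a pool of workers."""
    batches = split_into_batches(node_ids, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() distributes the batches across the workers and preserves order
        return sum(pool.map(index_batch, batches))
```

For example, `index_in_parallel(list(range(10)), batch_size=3, max_workers=4)` processes four batches (three of size 3 and one of size 1) on up to four workers.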
  • 7. Indexing scenarios
      1. When updating the repository using applications or bulk ingestion processes, the transactions will include a long list of nodes to be indexed
      2. When using Alfresco Share to create new content, there will be more transactions, but every transaction will include a small list of nodes to be indexed
      3. When setting the permission level for every node in a hierarchy manually, the ACL Change Sets will include a long list of ACLs to be indexed
      4. When using the default Alfresco permissions design, the ACL Change Sets will include a small list of ACLs to be indexed
      5. When using complex document formats, the Transformation Service will require additional resources
      6. When using large documents, the SOLR Index will require additional storage
  • 8. Indexing scenarios: Controlling what to index
      • Content can be excluded from the SOLR Index by configuration
        solrcore.properties > alfresco.index.transformContent=false
        https://docs.alfresco.com/search-community/concepts/solrcore-properties-file.html
      • Some nodes can be excluded from the SOLR Index by using the Index Control aspect
        cm:indexControl > cm:isIndexed :: false, metadata and content are not indexed
        cm:indexControl > cm:isContentIndexed :: false, content is not indexed
        https://docs.alfresco.com/community/concepts/admin-indexes.html
      • Some properties can be excluded from the SOLR Index by design in the Content Model
        <property> <index enabled="false"/> </property>
        https://docs.alfresco.com/community/references/dev-extension-points-content-model-define-and-deploy.html
    Callout: Add this setting to archive core by default!
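The exclusion rules in slide 8 combine as in the following Python sketch. The function name `index_decision` is hypothetical, and treating a node without the cm:indexControl aspect as fully indexed is an assumption made for illustration, not something the slides state.

```python
def index_decision(properties, transform_content=True):
    """Return which parts of a node would reach the SOLR Index.

    properties: dict of node properties, optionally containing
        "cm:isIndexed" and "cm:isContentIndexed" (from cm:indexControl).
    transform_content: mirrors alfresco.index.transformContent in solrcore.properties.
    Assumption: missing properties default to True (node is indexed normally).
    """
    is_indexed = properties.get("cm:isIndexed", True)
    is_content_indexed = properties.get("cm:isContentIndexed", True)

    # cm:isIndexed=false removes both metadata and content from the index
    metadata = bool(is_indexed)
    # content also needs cm:isContentIndexed and the core-level setting
    content = bool(is_indexed and is_content_indexed and transform_content)
    return {"metadata": metadata, "content": content}
```

With this sketch, a node carrying `cm:isContentIndexed = false` keeps its metadata searchable while its text content is skipped.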
  • 9. When to re-index
    The re-indexing process can take some time in large repositories, from a few hours to a few days.
    Full re-index
      • When upgrading to a major Search Services release, like 2.0
      • When the SOLR Index has been corrupted, due to technical reasons
      • When breaking changes are introduced in common custom Content Models
    Partial re-index
      • This process can also take some time, depending on the number of documents to be re-indexed, but it will take less than a full re-index
      • When incremental changes are introduced in a Content Model, a partial re-index can be triggered by using the SOLR REST API
        http://localhost:8983/solr/admin/cores?action=reindex&query=TYPE:person
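The partial re-index URL in slide 9 can be assembled programmatically. The sketch below only builds the URL and does not call SOLR; `reindex_url` is a hypothetical helper, and the base address is the localhost one shown in the slide.

```python
from urllib.parse import urlencode


def reindex_url(query, base="http://localhost:8983/solr"):
    """Build the admin/cores URL that asks SOLR to re-index documents matching query."""
    # safe=":" keeps the field separator readable, e.g. TYPE:person
    params = urlencode({"action": "reindex", "query": query}, safe=":")
    return f"{base}/admin/cores?{params}"


print(reindex_url("TYPE:person"))
```

Building the URL this way ensures that queries containing spaces or quotes are escaped before they are sent.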
  • 11. Installing alternatives
      • Using the ZIP Distribution file
        https://docs.alfresco.com/search-community/concepts/solr-install-config.html
      • Using Docker or Docker Compose
        https://github.com/Alfresco/SearchServices/tree/master/search-services
        https://github.com/Alfresco/acs-community-deployment/tree/master/docker-compose
        https://github.com/Alfresco/alfresco-docker-installer
      • Using Kubernetes
        https://github.com/Alfresco/acs-community-deployment/tree/master/helm/alfresco-content-services-community
  • 12. Deployment for Re-Indexing
    Deployment schema to minimize downtime in re-indexing processes
      > When using a different SOLR version, configure the Alfresco Repository to use the new SOLR server *
      > When using the same SOLR version, the INDEX folder can be used directly
    * Upgrading from SOLR 4 to SOLR 6 is not allowed when using Alfresco CE 6.2.0-ga (thanks for raising this @AFaust) >> SEARCH-2289
  • 13. Alfresco Repository Indexing Configuration
    When configuring an Alfresco Node to perform the re-indexing process, there are some services you can switch off depending on your requirements:
      • Scheduled Jobs can be disabled, as they will be run by the Alfresco instance in the living service
        https://docs.alfresco.com/6.2/concepts/scheduled-jobs.html
      • Some ACS features can be disabled
        https://docs.alfresco.com/6.2/concepts/maincomponents-disable.html
      • Additional subsystems (apart from Search or Transformation) can be disabled
        https://docs.alfresco.com/6.2/concepts/subsystem-categories.html
      • Activities
      • Audit
      • Email
      • …
    Note: Don't make a copy of your Alfresco Repository production configuration and press the start button!
  • 14. Monitoring
    Profiling
      • Using VisualVM or YourKit Java Profiler for the JVMs (Repository, SOLR)
      • Using the pg_stat_statements extension or some other DB tool
        https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-6-profiling-with-docker/ba-p/295846
        https://github.com/aborroy/alfresco-6-profiling
    Monitoring
      • Using Prometheus with Grafana (Repository, SOLR)
        https://hub.alfresco.com/t5/alfresco-content-services-blog/monitoring-alfresco-solr-with-prometheus-and-grafana/ba-p/294157
        https://github.com/aborroy/alfresco-solr-monitoring
  • 16. Demo time without downtime!
      • Living Docker Compose environment running with around 4,000 text documents indexed
      • Using YourKit Java Profiler to monitor Repository performance
      • Starting a new Search Services 2.0 server locally to start indexing the repository
      • Once the Search Services 2.0 index is up to date, change the Solr hostname value from the Admin Web Console or modify alfresco-global.properties
    Note: Search Services 2.0 is not released yet!
    http://127.0.0.1:8083/solr/alfresco/select?indent=on&q=TEXT:[* TO *]&wt=json
    http://127.0.0.1:8983/solr/alfresco/select?indent=on&q=TEXT:[* TO *]&wt=json
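One way to decide when the new server in slide 16 has caught up is to compare the `numFound` value returned by the two select queries. The sketch below parses two JSON responses that have already been fetched; the helper names and the sample numbers are hypothetical, and the demo itself may have used a different check.

```python
import json


def num_found(response_text):
    """Extract the hit count from a SOLR select response requested with wt=json."""
    return json.loads(response_text)["response"]["numFound"]


def indexing_progress(live_response, new_response):
    """Fraction (0.0 to 1.0) of the live index's documents already present in the new index."""
    live = num_found(live_response)
    new = num_found(new_response)
    if live == 0:
        return 1.0  # nothing to catch up with
    return min(new / live, 1.0)


# Illustrative responses only; real ones come from the two select URLs
live = '{"response": {"numFound": 4000, "start": 0, "docs": []}}'
new = '{"response": {"numFound": 3000, "start": 0, "docs": []}}'
print(indexing_progress(live, new))
```

Switching the Repository to the new server only after the progress reaches 1.0 avoids serving searches from a partially built index.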
  • 18. 1 Billion Documents Review (2015)
      • Review of the 1 billion document benchmark (November 2015)
      • 10 repository nodes (Alfresco 5.1), 20 Solr 4 nodes (Alfresco Index Server)
      • Indexed 1b documents in 5 days
    Reference: How Alfresco powered a 1.2 Billion document deployment on Amazon Web Services
  • 20. 1.2 Billion Baseline Plan (2020)
      • Customer-sponsored benchmark to see the performance of the system with their configuration
      • Goal: 1.2b documents indexed into Search Services
      • 20 instances, each containing a single shard (DB_ID_RANGE based sharding)
  • 21. Performance considerations
      • Bottlenecks
      • Database (getChildAssocs)
      • Transformers (when using large documents)
      • Network (when using large metadata/content)
      • Time spent processing data for other shards
  • 22. Baseline Results
      • Estimated completion in 21 days
  • 23. Baseline Results
      • Estimated completion in 21 days
  • 24. DB_ID_RANGE Sharding
      • Does not require specifying the total number of shards in advance
      • Index can continue to grow with the repository
    See https://docs.alfresco.com/search-enterprise/concepts/solr-shard-approaches.html
  • 27. Time spent processing transactions for other shards
      • With DB_ID_RANGE sharding we know that only a range of transactions is relevant
      • Skip transactions when using DB_ID_RANGE
      • To support path queries we sometimes need to update data on multiple shards from a single change
      • Option to disable cascade tracking
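The idea in slide 27 of skipping transactions that fall outside a shard's DB_ID_RANGE can be sketched as follows. The `Shard` class, the half-open range convention and the sample ids are assumptions for illustration; the slides do not describe how Search Services implements the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str
    first_db_id: int  # inclusive lower bound (assumed convention)
    last_db_id: int   # exclusive upper bound (assumed convention)

    def owns(self, db_id):
        """True when the node's database id falls inside this shard's range."""
        return self.first_db_id <= db_id < self.last_db_id


def relevant_transactions(shard, transactions):
    """Keep only the transactions touching at least one node owned by the shard.

    transactions: dict mapping transaction id -> list of node DB ids.
    """
    return [
        txn_id
        for txn_id, node_ids in transactions.items()
        if any(shard.owns(db_id) for db_id in node_ids)
    ]


shard = Shard("shard-0", 0, 100)
txns = {1: [5, 6], 2: [150], 3: [99, 100]}
print(relevant_transactions(shard, txns))
```

Here transaction 2 is skipped by `shard-0` because none of its nodes fall inside the shard's range, which is the work the slide proposes to avoid.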
  • 28. Reduce Database Access and Network Usage
      • Reduce amount of data requested
      • Remove unused calls to getChildAssocs
      • Compress communication where appropriate
      • Add option to compress content transfer
    Illustration: the request "Please give me all metadata for the node" is replaced by "Please give me: X, Y, Z"; plain content ("Lorem ipsum dolor sit amet, consectetur adipiscing elit...") is transferred as compressed bytes (78 9c 05 c1 81 09 c0 30 08 04 c0 ...).
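The effect of compressing content transfer (slide 28) can be shown with Python's standard zlib module. This is only a sketch of the general technique: the slides do not say which compression Search Services uses, although the bytes 78 9c shown on the slide match the header that zlib writes at its default level. The function names are hypothetical.

```python
import zlib


def compress_content(text):
    """Encode text as UTF-8 and compress it with zlib's default settings."""
    return zlib.compress(text.encode("utf-8"))


def decompress_content(payload):
    """Reverse compress_content: inflate the payload and decode it as UTF-8."""
    return zlib.decompress(payload).decode("utf-8")


text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 20
payload = compress_content(text)
print(len(text.encode("utf-8")), "bytes before,", len(payload), "bytes after")
```

Repetitive text such as this sample compresses well; already compressed formats gain little, which is one reason to compress only "where appropriate".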
  • 29. Overview of Improvements in 1.4.2
      • Search Services 1.4.2 (and Insight Engine 1.4.2)
      • ACS Repository 6.2 Enterprise
      • No ACS Community release containing this yet
      • However, an existing ACS can be used with the jars from https://github.com/aborroy/solr-performance-services-repo
    Reindex of 1.2b documents in 10 days (6 repo nodes, 20 search nodes)
      Search Services 1.3.0: 150 documents/second
      Search Services 1.4.2: 1200-3500 documents/second* (depending on the number of shards, size of documents, etc.)
    * Depending on exact configuration (N.B. not yet validated on the production system)
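The figures in slide 29 mix totals, durations and rates, so a small unit-conversion helper makes them easier to compare. The functions below are hypothetical and only convert units; the slides do not state whether the documents/second figures are per node or for the whole deployment, so the results are illustrative rather than a reconstruction of the benchmark.

```python
SECONDS_PER_DAY = 86_400


def documents_per_second(total_documents, days):
    """Average throughput needed to index total_documents in the given number of days."""
    return total_documents / (days * SECONDS_PER_DAY)


def days_to_index(total_documents, docs_per_second):
    """Days needed to index total_documents at a constant rate."""
    return total_documents / docs_per_second / SECONDS_PER_DAY


# 1.2 billion documents in 10 days, as quoted on the slide
print(round(documents_per_second(1_200_000_000, 10)))
# Same volume at the lower and upper 1.4.2 rates quoted on the slide
print(round(days_to_index(1_200_000_000, 1200), 1))
print(round(days_to_index(1_200_000_000, 3500), 1))
```

The same helpers can be applied to a small test run to extrapolate how long a full re-index might take before starting it in production.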
  • 31. Future Improvements - Coming in 2.0.0
      • Schema Simplification
      • Smaller index
      • Removing Duplicate Fields
      • Smaller communication
      • Improved Trackers
      • Less duplication with large transactions
      • New tracker parallelism option
      • Content Store Removal
      • Reduced disk usage
      • Less duplication
      • Better usage of Solr optimisations
      • Adds potential to use other Solr features
  • 32. Improved Trackers - Testing
    Scenario datasets
      • 100,000 documents created with 100,000 transactions
      • 100,000 documents created with 1 transaction
      • Changing the path for 100,000 documents
      • 200,000 ACLs created with 200,000 ACL change sets
    Parameters investigated
      • The existing *BatchSize parameters
      • The new *MaxParallelism parameters
      • These change the number of workers assigned to the tracker. They use a ForkJoinPool, and can impact the resources available to other processes
  • 33. Improved Trackers - Testing: Hotspot calculation
      • Increasing the Transaction Batch Size for nodes and ACLs has an impact until the maximum number for your deployment is reached. Beyond that point, increasing this batch size brings no further performance change
      • Increasing the Node Batch Size can improve performance while you are below the right number for your deployment. Beyond that point, increasing this batch size penalises performance
      • Increasing the maximum number of Parallel Threads improved performance until the maximum number for our deployment was reached. However, in a real-world deployment it may be useful to use a lower number to avoid impacting other processes.
    Chart axes: Duration (ms), #
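The hotspot described in slide 33 is the parameter value beyond which larger settings stop paying off. A minimal sketch of locating it from test measurements follows; `find_plateau`, the 5% tolerance and the sample durations are all made up for illustration and are not taken from the benchmark.

```python
def find_plateau(measurements, tolerance=0.05):
    """Return the smallest parameter value whose duration is within
    `tolerance` (as a fraction) of the best duration measured.

    measurements: dict mapping parameter value -> duration in milliseconds.
    """
    if not measurements:
        raise ValueError("measurements must not be empty")
    best = min(measurements.values())
    threshold = best * (1 + tolerance)
    for value in sorted(measurements):
        if measurements[value] <= threshold:
            return value


# Hypothetical durations (ms) for increasing batch sizes
durations = {1: 1000, 2: 600, 4: 400, 8: 390, 16: 395}
print(find_plateau(durations))
```

Choosing the smallest value on the plateau, rather than the absolute fastest one, follows the slide's advice to leave resources for other processes.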
  • 34. Content Store Removal
      • Solr Content Store removal will reduce disk usage and simplify replication
    Figure label: The Solr Content Store
  • 35. Content Store Removal
      • Solr Content Store removal will reduce disk usage and simplify replication
    Figure labels: The Solr Content Store; Replication of index across Solr nodes
  • 37. Recap
    When to re-index
      • When upgrading to major Search Services releases
    How to re-index
      • Running some small tests to ensure the performance of the indexing process before running it in production
      • Indexing from scratch with the upgraded Repository
      • Indexing in a parallel deployment
    How to measure
      • Profiling
      • Monitoring