My DevCon 2019 talk discusses how to make it easier to integrate Alfresco with other systems using an event-based approach. Two real-world examples are discussed and demonstrated. The first is about reporting against Alfresco metadata. The second is about enriching metadata by running content through a Natural Language Processing (NLP) model. Both solutions work by listening to generic events that Alfresco generates and places on an Apache Kafka topic. For the reporting example, a Spring Boot consumer subscribes to the Kafka events, fetches metadata via CMIS, and indexes it into Elasticsearch. For the NLP example, a separate Spring Boot consumer subscribes to the same events but fetches the content, extracts text using Apache Tika, runs the text through Apache OpenNLP, then writes the extracted entities back to Alfresco via CMIS. These are relatively simple examples, but they illustrate how a decoupled, asynchronous, event-based approach can make integrating Alfresco with other systems easier.
1. Moving from Actions & Behaviors to Microservices
Jeff Potts, Metaversant
@jeffpotts01
2. How do we make it easier to integrate Alfresco with other systems?
3. Learn. Connect. Collaborate.
Recurring customer requirements
“We want to be able to report against metadata in real-time.”
“When this custom property changes we need to notify this other system.”
“We want to improve how Alfresco transforms Word documents into HTML.”
“When content changes we want to run it through an NLP model.”
“Our company has an enterprise search solution that needs to index Alfresco content.”
“We want to replicate content between multiple Alfresco servers.”
Traditional approaches run in-process
• Custom Alfresco Actions
– Java, deployed to Alfresco WAR
– Triggered by rule on a folder, a UI action, or by a schedule
• Custom Alfresco Behaviors
– Java, deployed to Alfresco WAR
– Bound to a policy on a class of nodes (e.g., specific type or aspect)
• Custom Web Scripts
– Java or JavaScript, deployed to Alfresco WAR
– Triggered by a REST call
• All of these run in Alfresco’s process
Tradeoffs of the traditional approach
• Advantages
– Full access to the Alfresco API
– Runs as the authenticated user or as the system user
– Code is managed with the content model and other customizations
• Disadvantages
– Performance risk
– Requires server restart to deploy
– Requires an Alfresco developer familiar with the Alfresco API
    • Java & JavaScript are the only practical language options
– Long-running tasks may block the user interface
– Scales only as Alfresco scales, not independently
Event-based integration approach
• Alfresco can be extended to generate generic events when something happens to a node
• Interested systems
– Listen for Alfresco events
– Filter out what they don’t care about
– Fetch additional data from Alfresco and perform custom logic as needed
• Additional systems can be added without touching Alfresco
• Systems can use different frameworks & languages
• Independently scalable
• Can use the open source Alfresco Kafka add-on from Metaversant as a starting point
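The "listen, filter, fetch" steps listed above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code from the talk: the `NodeEvent` fields (`nodeRef`, `eventType`, `contentType`), the event type names, and the `EventFilter` class are all hypothetical, and a plain list stands in for the Kafka topic so the example runs without a broker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "filter out what they don't care about" step.
class EventFilter {

    // Assumed minimal event shape; the real event payload may differ.
    static final class NodeEvent {
        final String nodeRef;     // identifier of the node that changed
        final String eventType;   // assumed values: "CREATE", "UPDATE", "DELETE"
        final String contentType; // assumed field, e.g. "cm:content"

        NodeEvent(String nodeRef, String eventType, String contentType) {
            this.nodeRef = nodeRef;
            this.eventType = eventType;
            this.contentType = contentType;
        }
    }

    private final Set<String> interestingTypes;

    EventFilter(String... interestingTypes) {
        this.interestingTypes = new HashSet<>(Arrays.asList(interestingTypes));
    }

    // An event is interesting when its content type is one this consumer handles.
    boolean isInteresting(NodeEvent event) {
        return event != null && interestingTypes.contains(event.contentType);
    }

    // Returns the node refs the consumer should go on to fetch from Alfresco.
    List<String> select(List<NodeEvent> events) {
        List<String> selected = new ArrayList<>();
        for (NodeEvent event : events) {
            if (isInteresting(event)) {
                selected.add(event.nodeRef);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        EventFilter filter = new EventFilter("cm:content");
        List<NodeEvent> events = Arrays.asList(
                new NodeEvent("node-1", "CREATE", "cm:content"),
                new NodeEvent("node-2", "UPDATE", "cm:folder"));
        System.out.println(filter.select(events));
    }
}
```

Because the filtering lives in the consumer, each interested system can apply its own rules without any change on the Alfresco side.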
Move logic out of Alfresco into microservices
[Diagram: Alfresco, with the alfresco-kafka add-on and the Kafka client JAR, publishes events to Apache Kafka; multiple microservices each consume those events.]
Example: Alfresco reporting
• Customer: “We want to be able to report against metadata in real-time.”
• Solution:
– Spring Boot microservice consumes Alfresco Kafka events
– When an interesting node changes, the microservice fetches its metadata using CMIS
– Indexes metadata into Elasticsearch
– Kibana dashboard used to visualize data
• Demo: https://youtu.be/jGZVfP5L8yU
Indexer Service
• Small Spring Boot app
• Runs in a servlet container
• Listens for Alfresco Kafka events
• Fetches the Alfresco Node as JSON
• Indexes the Node JSON into Elasticsearch
• Deletes objects from Elasticsearch when DELETE events occur
[Diagram: Alfresco, with the alfresco-kafka add-on and the Kafka client JAR, publishes events to Apache Kafka; alf-es-indexer consumes the events, fetches the Node JSON from Alfresco with a CMIS GET, and sends the Node JSON to the Elasticsearch cluster.]
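The indexer's event handling described above (index on change, remove on DELETE) can be sketched as follows. This is an illustration only, not the alf-es-indexer source: the `IndexerSketch` class, the `"DELETE"` event type string, and the `onEvent` signature are assumptions, an in-memory map stands in for Elasticsearch, and a `Function` stands in for the CMIS GET.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of an event-driven indexer.
class IndexerSketch {

    // Stands in for the Elasticsearch index: node id -> node JSON.
    private final Map<String, String> index = new ConcurrentHashMap<>();

    // Stands in for the CMIS GET that returns the node as JSON.
    private final Function<String, String> fetchNodeJson;

    IndexerSketch(Function<String, String> fetchNodeJson) {
        this.fetchNodeJson = fetchNodeJson;
    }

    // Deletes remove the document; any other event re-fetches and re-indexes it.
    void onEvent(String eventType, String nodeId) {
        if ("DELETE".equals(eventType)) {
            index.remove(nodeId);
            return;
        }
        String json = fetchNodeJson.apply(nodeId);
        if (json != null) {
            index.put(nodeId, json); // same id overwrites, so updates act as upserts
        }
    }

    String indexed(String nodeId) {
        return index.get(nodeId);
    }

    public static void main(String[] args) {
        IndexerSketch indexer = new IndexerSketch(id -> "{\"id\":\"" + id + "\"}");
        indexer.onEvent("CREATE", "node-1");
        System.out.println(indexer.indexed("node-1"));
        indexer.onEvent("DELETE", "node-1");
        System.out.println(indexer.indexed("node-1"));
    }
}
```

Keying the index by node id keeps the handler idempotent: replaying the same event leaves the index in the same state.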
Example: Natural Language Processing
• Customer: “I want to be able to enrich Alfresco metadata by extracting people, places, and names from content using an NLP model”
• Solution:
– Spring Boot microservice consumes Alfresco Kafka events
– When a node with a “marker” aspect changes, the microservice fetches the content
– Fingerprints are used to avoid repeatedly processing the same content
– Text is extracted using Apache Tika
– Extracted text is run through Apache OpenNLP to extract people and places
– People and places are written to Alfresco content metadata via CMIS
• Demo: https://youtu.be/H-2TgoUijzY
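The fingerprinting step in the solution above can be sketched like this. It is a minimal sketch under assumptions, not the talk's implementation: the choice of SHA-256, the `FingerprintCache` class, and the `shouldProcess` method are hypothetical. The idea is that a consumer may see several events for the same unchanged content (plausibly including the event caused by its own metadata write-back), and comparing a content hash lets it skip the expensive NLP pass.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: skip content that was already processed.
class FingerprintCache {

    // Last fingerprint seen per node id, kept in memory for simplicity.
    private final Map<String, String> lastSeen = new ConcurrentHashMap<>();

    // Hex-encoded SHA-256 digest of the content bytes.
    static String fingerprint(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Records the new fingerprint and returns true only if it differs
    // from the one previously recorded for this node.
    boolean shouldProcess(String nodeId, byte[] content) {
        String current = fingerprint(content);
        String previous = lastSeen.put(nodeId, current);
        return !current.equals(previous);
    }

    public static void main(String[] args) {
        FingerprintCache cache = new FingerprintCache();
        byte[] text = "Jeff visited Dallas".getBytes(StandardCharsets.UTF_8);
        System.out.println(cache.shouldProcess("node-1", text)); // first time seen
        System.out.println(cache.shouldProcess("node-1", text)); // unchanged content
    }
}
```

The map here lives in memory, so fingerprints are lost on restart; a production service would likely persist them.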
NLP Enricher Service
• Small Spring Boot app
• Runs in a servlet container
• Listens for Alfresco Kafka events
• Fetches Alfresco content
• Extracts people, places, and orgs
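• Writes metadata back to Alfresco
[Diagram: Alfresco, with the alfresco-kafka add-on and the Kafka client JAR, publishes events to Apache Kafka; alf-nlp-enricher consumes the events, fetches the node from Alfresco (CMIS GET, Node JSON), and writes metadata back to Alfresco (CMIS POST).]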
Other potential uses
• Full-text search indexing into standalone search engine
• Synchronizing content with other servers
• Improved HTML transformations
• Notification/subscription service
• Chat integration
Event-based approach disadvantages
• More code/complexity than traditional approach
• User feedback/notification is not straightforward
• Potentially increases the number of “containers” in the IT shop
Event-based approach advantages
• In line with Alfresco’s stated architectural direction
• Reduces the amount of code running in Alfresco’s process
– Reduces the number of deployments required to support integrations
– Off-loads long-running and/or process-intensive integrations from Alfresco
– Scales independently of Alfresco
• Integrations are more loosely coupled to Alfresco
– Requires less Alfresco knowledge
– Frees up architectural choices for integrations (not just Java)
• Integration apps are relatively easy to containerize
• Can work alongside traditional approach
Demo Dependency Versions
• Alfresco 5.2.g CE & 5.2.3 Enterprise with
– Metaversant Alfresco Kafka open source add-on 0.0.2
• Apache Kafka 0.10.2.1 (Scala 2.12 build)
• Elasticsearch 6.3.2
• Kibana 6.3.2
• Custom Spring Boot applications
– Spring Boot 1.5.8
– Elasticsearch High-level Rest Client 6.3.2
– Tika 1.18
– OpenNLP 1.8.4
– Apache Chemistry OpenCMIS 1.0.0
See Also
• Apache ManifoldCF
– http://manifoldcf.apache.org/
– Crawler that indexes from repositories like Alfresco into Solr & Elasticsearch
• Apache Stanbol
– http://stanbol.apache.org/
– Semantic engine that can do metadata enhancement and other things
• Apache Camel
– http://camel.apache.org/
– Enterprise integration platform
About Metaversant
• Consulting firm focused on solving business problems with open source Content Management, Workflow, & Search technology
• Founded in 2010
• Clients all over the world in a variety of industries, including:
– Airlines & Aerospace
– Manufacturing
– Construction
– Financial Services
– Higher Education
– Life Sciences
– Professional Services
https://www.metaversant.com
Editor's notes
…without writing one-off integrations that must be deployed to the Alfresco server
…without adding unnecessary performance burden on Alfresco
…without requiring other teams to learn Alfresco internals
Triggers can be action-driven or change-driven or both.
These requirements are often met with actions and behaviors.
Event is kept minimal to avoid disclosing sensitive information and to keep the solution as generic as possible.
Alfresco Insight Engine is an interesting alternative, but it requires the latest Alfresco version.
For a production implementation, the past hashes (the content fingerprints) should probably be persisted somewhere instead of being kept in memory.