Alfresco WebScript Connector for Apache ManifoldCF

Apache ManifoldCF
Alfresco WebScript Repository Connector

Alfresco Meetup
Rome 2013

About me
● Open Source ECM Specialist at Sourcesense

● Author and Technical Reviewer at Packt Publishing
○ Alfresco 3 Web Services (2010)
○ GateIn Cookbook (2012)

● Alfresco Community (nickname OpenPj)
○ Alfresco Community Star
○ Alfresco Wiki Gardener
○ Top 10 supporter (English and Italian)
○ Moderator of the Italian forum

● PMC Member and Committer at the Apache Software Foundation

● JBoss Community
○ Content editor for jboss.org
○ Project Leader and Committer for PortletSwap / Blog / Wiki

Overview
● Introducing Apache ManifoldCF
○ What is ManifoldCF?
○ Why ManifoldCF?
○ Architecture
○ Who is using ManifoldCF?
○ The book
● How ManifoldCF supports Alfresco
● The goal of the new connector
○ Architecture
○ Roadmap
○ The team
● Resources

The story
The original ManifoldCF code base was granted by MetaCarta to the
Apache Software Foundation in December 2009.

The MetaCarta effort represented more than five years of successful
development and testing in multiple, challenging enterprise
environments.

The project graduated as an Apache Top Level Project in July 2012.

What is ManifoldCF?
Open Source crawler
● crawling model (add, change, delete)
● schedule jobs to create indexes
○ get content from repositories
○ push content to search servers
[Diagram: Repositories 1, 2 and 3 -> Apache ManifoldCF -> Search Servers 1, 2 and 3]
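The crawling model boils down to three kinds of events that must be replayed against an index. The sketch below assumes the index is a plain dict and uses the hypothetical names `Op`, `Event` and `apply_events`. It is not ManifoldCF's (Java) API.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    ADD = "add"
    CHANGE = "change"
    DELETE = "delete"


@dataclass(frozen=True)
class Event:
    # Hypothetical event type: one add/change/delete on one document.
    op: Op
    doc_id: str
    content: str = ""


def apply_events(index: dict[str, str], events: list[Event]) -> dict[str, str]:
    """Return a new index with the add/change/delete events applied in order."""
    updated = dict(index)
    for event in events:
        if event.op in (Op.ADD, Op.CHANGE):
            # From the search server's point of view, adds and changes are both upserts.
            updated[event.doc_id] = event.content
        elif event.op is Op.DELETE:
            # Deleting a document that was never indexed is a no-op.
            updated.pop(event.doc_id, None)
    return updated
```

Treating "add" and "change" as the same upsert keeps the output side simple. Only deletes need separate handling.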

What is ManifoldCF?

● Out of the box, it is distributed as a webapp providing:
○ REST API
○ Authority Service
○ Crawler UI

● can be embedded in any Java application

Why ManifoldCF?
● Reliability

● Incremental

● Flexible

● Multi repositories

● Security model

● Monitoring

Why ManifoldCF? - Reliability
Job scheduling and configuration are stored in a database, which
maintains the state of every execution

[Diagram: Repository -> Pull Agent Daemon -> Search Server; configuration and scheduling are kept in the Database]
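Keeping job state in a database is what lets a crawl resume after a restart. The sketch below uses SQLite with a hypothetical `job_queue` table. ManifoldCF's real schema is different and typically lives in PostgreSQL or another supported database.

```python
import sqlite3


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical job-state store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_queue ("
        " doc_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL CHECK (status IN ('pending', 'done')))"
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, doc_ids: list[str]) -> None:
    # INSERT OR IGNORE keeps the status of documents known from a previous run.
    conn.executemany(
        "INSERT OR IGNORE INTO job_queue (doc_id, status) VALUES (?, 'pending')",
        [(d,) for d in doc_ids],
    )
    conn.commit()


def mark_done(conn: sqlite3.Connection, doc_id: str) -> None:
    conn.execute("UPDATE job_queue SET status = 'done' WHERE doc_id = ?", (doc_id,))
    conn.commit()


def pending(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute(
        "SELECT doc_id FROM job_queue WHERE status = 'pending' ORDER BY doc_id"
    )
    return [r[0] for r in rows]
```

Each state change is committed right away, so a crashed process reopens the same file and sees only the work that is still pending.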

Why ManifoldCF? - Incremental
Content changesets are obtained from the repository API

[Diagram: Repository -> complete changesets -> Apache ManifoldCF]
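Incremental crawling depends on remembering how far the last run got. In the sketch below, `fetch_since` is a hypothetical stand-in for a repository API that returns `(change_id, doc_id)` pairs newer than a checkpoint.

```python
from typing import Callable, Iterable


def run_incremental(
    fetch_since: Callable[[int], Iterable[tuple[int, str]]],
    process: Callable[[str], None],
    checkpoint: int,
) -> int:
    """Process changesets newer than `checkpoint` and return the new checkpoint.

    fetch_since is hypothetical: it stands in for the repository's changeset API.
    """
    for change_id, doc_id in sorted(fetch_since(checkpoint)):
        process(doc_id)
        # Advance only after the change is processed, so a crash replays
        # at most the change that was in flight.
        checkpoint = change_id
    return checkpoint
```

Running the job again with the returned checkpoint touches only the changes that arrived in the meantime.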

Why ManifoldCF? - Flexible
If the repository can't supply all the changes, ManifoldCF can
discover them through crawling

[Diagram: Repository -> incomplete changesets -> Apache ManifoldCF (Change Discovery)]
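When changesets are incomplete, changes can be discovered by comparing what was seen in the last crawl with what is seen now. ManifoldCF connectors do this with per-document version strings. The sketch below swaps in a SHA-256 content hash as the version and uses the hypothetical helper names `fingerprint` and `discover_changes`.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    # Stand-in for a per-document version string.
    return hashlib.sha256(content).hexdigest()


def discover_changes(previous: dict[str, str], current: dict[str, bytes]):
    """Compare fingerprints from the last crawl with a fresh crawl.

    previous: doc_id -> fingerprint recorded during the last crawl
    current:  doc_id -> content seen during this crawl
    Returns (added, changed, deleted) as sorted lists of doc ids.
    """
    added, changed = [], []
    for doc_id, content in current.items():
        old = previous.get(doc_id)
        if old is None:
            added.append(doc_id)
        elif old != fingerprint(content):
            changed.append(doc_id)
    # Anything known from the last crawl but not seen this time was deleted.
    deleted = [d for d in previous if d not in current]
    return sorted(added), sorted(changed), sorted(deleted)
```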

Why ManifoldCF? - Multi repositories
Jobs can retrieve contents from the following repositories:
● CMIS-compliant
● Alfresco
● IBM FileNet
● EMC Documentum
● Microsoft SharePoint
● OpenText LiveLink
● Autonomy Meridio
● Memex Patriarch
● Windows Share/DFS
● Generic JDBC
● Generic Filesystem
● Generic RSS and Web

Why ManifoldCF? - Multi repositories
Jobs can ingest content into the following search
servers:
● Apache Solr
● ElasticSearch
● OpenSearchServer
● MetaCarta GTS

Why ManifoldCF? - Security model
[Diagram: the Pull Agent Daemon retrieves per-content ACLs from Repositories 1, 2 and 3 and sends them to the Search Server as doc access tokens; the Authority Service collects user access tokens from Authorities 1, 2 and 3; the Search Server combines both to return user-specific search results]
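At query time, the security model reduces to a set intersection between the user's access tokens and each document's access tokens. The sketch below assumes allow and deny tokens stored per document, roughly as in ManifoldCF's model. The token values and the function names `visible` and `filter_results` are hypothetical.

```python
def visible(user_tokens: set[str], allow: set[str], deny: set[str]) -> bool:
    """A document is visible if the user holds at least one allow token
    and none of the deny tokens (simplified, hypothetical rule)."""
    return bool(user_tokens & allow) and not (user_tokens & deny)


def filter_results(results, user_tokens: set[str]) -> list[str]:
    """Keep only results whose document tokens grant access to this user.

    results: iterable of (doc_id, allow_tokens, deny_tokens) triples.
    """
    return [doc_id for doc_id, allow, deny in results if visible(user_tokens, allow, deny)]
```

In a real deployment this filter is pushed into the search server query, so unauthorized documents never leave the index.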

Why ManifoldCF? - Monitoring
The Crawler UI allows you to:
● configure jobs and connectors
● monitor jobs execution
● monitor contents ingestion
○ status reports
■ document status
■ queue status
○ history reports
■ simple history
■ maximum activity
■ maximum bandwidth
■ result histogram
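A report like "maximum activity" answers the question "what was the busiest interval?". The sketch below computes the largest number of events within any window of a given length using a sliding window over sorted timestamps. It is a simplified stand-in and not ManifoldCF's actual report code. The name `max_activity` is hypothetical.

```python
from collections import deque


def max_activity(timestamps: list[float], window: float) -> int:
    """Largest number of events whose timestamps span less than `window`
    seconds. Hypothetical simplification of a "maximum activity" report;
    `window` must be positive."""
    best = 0
    recent = deque()
    for t in sorted(timestamps):
        recent.append(t)
        # Drop events that are `window` or more seconds older than t.
        while recent[0] <= t - window:
            recent.popleft()
        best = max(best, len(recent))
    return best
```

Each timestamp enters and leaves the deque at most once, so the scan is linear after sorting.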

Architecture - Job
[Diagram: Repository -> Repository Connector (retrieves content and ACLs) -> Job -> Output Connector -> Search Server; an Authority Connector handles the ACLs]

A job defines:
● the query to retrieve contents
● a verbal description
● the crawling model
● the scheduling
● the metadata mapping
● content ingestion
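The job ties a repository connector to an output connector and applies a metadata mapping on the way through. The sketch below expresses that wiring with Python protocols. The class names `RepositoryConnector`, `OutputConnector` and `Job`, and the method names `fetch` and `ingest`, are hypothetical and are not ManifoldCF's Java interfaces.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


class RepositoryConnector(Protocol):
    # Hypothetical: returns the documents matching the job's query.
    def fetch(self, query: str) -> Iterable[dict]: ...


class OutputConnector(Protocol):
    # Hypothetical: sends one mapped document to the search server.
    def ingest(self, doc: dict) -> None: ...


@dataclass
class Job:
    """Hypothetical job: a query, a metadata mapping and the two connectors."""
    query: str
    repository: RepositoryConnector
    output: OutputConnector
    metadata_mapping: dict[str, str] = field(default_factory=dict)

    def run(self) -> int:
        count = 0
        for doc in self.repository.fetch(self.query):
            # Rename repository fields to search-server fields; unmapped fields pass through.
            mapped = {self.metadata_mapping.get(k, k): v for k, v in doc.items()}
            self.output.ingest(mapped)
            count += 1
        return count
```

For example, a mapping such as `{"cm:name": "name"}` would rename Alfresco's `cm:name` property to a `name` field in the index.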

The book: ManifoldCF in Action

ManifoldCF in Action
by Karl Wright
published by Manning

Karl is the original developer and the
principal committer of Apache ManifoldCF

The book is available at http://www.manning.com/wright

How ManifoldCF supports Alfresco
● CMIS Repository Connector based on OpenCMIS

● The current Alfresco Repository Connector only supports CML
○ works on any version of Alfresco 2.x, 3.x and 4.x
○ no support for querying Solr from Alfresco
○ it will be discontinued at the end of the year
○ please see the Alfresco Roadmap

Alfresco Solr search subsystem
● Remote crawling of contents and ACLs into Solr
○ REST API for retrieving changesets from Alfresco db
● Solr server provided by Alfresco
○ based on Apache Solr 1.4.1 (uhm... really?!)
● hardcoded
● can't be used with your own Solr instance
○ customers run newer versions of Solr
■ interested in new features (SolrCloud, sharding...)
■ hundreds of improvements available in 3.x and 4.x

Alfresco Solr search subsystem

[Diagram: Alfresco (tables alf_transaction, alf_acl_*, alf_node_*) -> Transactions and ACLs -> Alfresco REST Client -> Solr 1.4.1 (provided by Alfresco) -> Indexes]
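The REST client tracks two streams coming out of the Alfresco database: transactions and ACL changesets. Both streams are paged by the last id seen.

In the sketch below, the base URL, the path segments and the `fromId`/`maxResults` parameter names are assumptions. They are loosely modeled on Alfresco's Solr tracking web scripts and should be checked against your Alfresco version. `fetch_page` is a hypothetical stand-in for the HTTP call.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

# ASSUMPTION: illustrative base URL and parameter names; verify against your
# Alfresco installation before relying on them.
BASE = "http://localhost:8080/alfresco/service/api/solr"


def tracking_url(stream: str, from_id: int, max_results: int) -> str:
    """Build a hypothetical URL for one page of a tracking stream
    (e.g. 'transactions' or 'aclchangesets')."""
    params = {"fromId": from_id, "maxResults": max_results}
    return f"{BASE}/{stream}?{urlencode(params)}"


def track(fetch_page: Callable[[int, int], list[int]], from_id: int, page_size: int) -> Iterator[int]:
    """Yield ids page by page until the stream is exhausted.

    fetch_page(from_id, page_size) is hypothetical and must return sorted ids >= from_id.
    """
    while True:
        page = fetch_page(from_id, page_size)
        if not page:
            return
        yield from page
        # The next page starts right after the last id we have seen.
        from_id = page[-1] + 1
```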

Goal - 1
Create a new connector using the Alfresco REST Client
● provided and supported by Alfresco
○ for us it is just a Maven dependency :)

● invokes the Alfresco Solr API

Goal - 2 - check feasibility
Create a real Enterprise alternative for managing indexes

● compatibility with the SearchService of Alfresco
● repository takes care only of contents
● indexes are managed externally
● no redundancy for indexes

effort needed to redirect query execution

Goal - 3 - Security
Implement an Alfresco authority connector
○ manages ACL indexing
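An authority connector's main job is to turn a user name into the set of access tokens used at query time. The sketch below resolves nested group membership from a plain dict that stands in for a call to Alfresco. The `GROUP_` prefix and `GROUP_EVERYONE` follow Alfresco's authority naming. The function `user_tokens` and the data shape are hypothetical.

```python
def user_tokens(user: str, memberships: dict[str, set[str]]) -> set[str]:
    """Resolve a user's access tokens: the user name, every group reached
    through (possibly nested) memberships, and GROUP_EVERYONE.

    memberships maps a principal (user or group) to the groups it belongs to;
    it is a hypothetical stand-in for querying Alfresco.
    """
    tokens = {user, "GROUP_EVERYONE"}
    stack = [user]
    while stack:
        principal = stack.pop()
        for group in memberships.get(principal, set()):
            # Skipping known groups makes cyclic memberships safe.
            if group not in tokens:
                tokens.add(group)
                stack.append(group)
    return tokens
```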

Goal - 4
Manage indexes using ManifoldCF against any supported search server

● Apache Solr 3.x / 4.x

● ElasticSearch

● OpenSearchServer

● MetaCarta
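Supporting several search servers means the output side must speak each server's ingestion format. The sketch below formats the same documents for Solr's JSON update handler (a plain array of documents) and for the Elasticsearch bulk API (newline-delimited action and source lines). It assumes an `id` field on every document and a recent Elasticsearch. Older releases, such as those current in 2013, also expected a `_type`. The function names are hypothetical.

```python
import json


def to_solr_update(docs: list[dict]) -> str:
    """JSON body for Solr's JSON update handler: a plain array of documents."""
    return json.dumps(docs)


def to_es_bulk(docs: list[dict], index: str) -> str:
    """NDJSON body for the Elasticsearch bulk API: an action line per document,
    then the document source, with the required trailing newline."""
    lines = []
    for doc in docs:
        # The id moves into the action metadata; the rest is the document source.
        body = {k: v for k, v in doc.items() if k != "id"}
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"
```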

Architecture

[Diagram: Alfresco (tables alf_transaction, alf_acl_*, alf_node_*) -> Alfresco REST Client -> ManifoldCF (Alfresco WebScript Repository Connector -> Output Connector) -> Search Server -> Indexes]

The team of the new connector
● Piergiorgio Lucidi (Sourcesense + ASF)

● Maurizio Pillitu (Alfresco)

● Aingaran Pillai (Zaizi) [new entry]

● Fran Alvarez (Zaizi) [new entry]

● Abraham Ayala (Zaizi) [new entry]

Join us!

● We are looking for developers

● this is a work in progress

● don't fork the project: feel free to join us

^__^

Resources

● Apache ManifoldCF
http://manifoldcf.apache.org/

● The connector hosted on github:
https://github.com/maoo/alfresco-webscript-manifold-connector

● it will be included in Apache ManifoldCF

Thank you for your attention!

http://www.open4dev.com
