This is the slide deck used in Alfresco's Tech Talk Live of May 1, 2013. It featured my Alfresco add-on, Alfresco Business Reporting. Its purpose is to explain the technical 'why' and 'how' of the add-on module: the challenges faced and the solutions designed.
5. Agenda
• Who am I
• Why business reporting?
• What is it about?
• How was it achieved?
• Demo
• Q&A
6. Why Business Reporting
The challenge:
• Alfresco does ‘not really’ support reporting
• Business has reporting needs
• Reporting needs:
– change over time
– can be specific for each business/organization/dept.
7. My solution
• Based on standard tooling
(Pentaho Report Designer)
• Scheduled execution (no UI for live configuration)
• In a language a business user understands
• Against Alfresco:
– business objects (docs, folders, sites, users, audit)
– metadata/properties
8. Agenda
• Who am I
• Why business reporting?
• What is it about?
• How was it achieved?
• Demo
• Q&A
9. What is it about
• Harvesting: business-related objects + metadata
• Execution
10. What is it about - Harvesting
# usage: key = tablename, value=Lucene query
folder=TYPE:"cm:folder" AND NOT TYPE:"st:site" AND NOT \
  TYPE:"dl:dataList" AND NOT TYPE:"bpm:package" AND NOT \
  TYPE:"cm:systemfolder" AND NOT TYPE:"fm:forum"
document=TYPE:"cm:content" AND NOT TYPE:"bpm:task" \
  AND NOT TYPE:"dl:dataListItem" AND NOT \
  TYPE:"ia:calendarEvent" AND NOT TYPE:"lnk:link" AND \
  NOT TYPE:"cm:dictionaryModel" AND NOT \
  ASPECT:"reporting:executionResult"
calendar=TYPE:"ia:calendarEvent"
forum=TYPE:"fm:forum"
link=TYPE:"lnk:link"
site=TYPE:"st:site"
#datalist=TYPE:"dl:dataList"
datalistitem=TYPE:"dl:dataListItem"
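A minimal Python sketch of how such a file maps table names to queries. It is not the module's own Java code. The function name `parse_harvest_queries` is hypothetical, and the parsing is simplified to the `key=value`, `#` comment and trailing-backslash continuation rules of Java `.properties` files.

```python
def parse_harvest_queries(text: str) -> dict[str, str]:
    """Hypothetical helper: return {table name: Lucene query} (simplified parsing)."""
    queries: dict[str, str] = {}
    buffer = ""
    for raw in text.splitlines():
        line = raw.lstrip()
        # Blank lines and comments (including disabled entries such as
        # '#datalist=...') are skipped, but only at the start of a logical line.
        if not buffer and (not line or line.startswith(("#", "!"))):
            continue
        if line.endswith("\\"):
            # A trailing backslash continues the logical line on the next line.
            buffer += line[:-1]
            continue
        key, sep, value = (buffer + line).partition("=")
        buffer = ""
        if sep:
            queries[key.strip()] = value.strip()
    return queries
```

Each resulting key (`folder`, `document`, `site`, ...) names a reporting table, and its value is the query whose results fill that table.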
11. What is it about - Execution
(Diagram: ReportingRoot, ReportingTemplate)
12. Agenda
• Who am I
• Why business reporting?
• What is it about?
• How was it achieved?
• Demo
• Q&A
13. Reporting Considerations
• The options
– NoSQL
– XML
– Other
– SQL…
• Considerations:
– Business needs to operate reporting
– Knowledge and experience need to exist in organizations
– Run in cooperation with existing reporting tooling
14. Reporting database principles
• Alfresco short QName becomes column name
– sys:node-dbid → sys_node_dbid
(':' and '-' are not allowed in column/table names)
• Multi-value properties are stored as a comma-separated concatenation.
• Fixed (though configurable) default mapping of Alfresco types onto database types.
– There are exceptions to the rule, therefore:
• Possibility to override the default mapping on a per-property basis.
– E.g. by default d:noderef=VARCHAR(400), but
– Someco_relatedProducts=VARCHAR(800)
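The naming and mapping rules can be sketched in a few lines of Python. Only `d:noderef → VARCHAR(400)` comes from the slide. The other default types, the fallback type, the `sc:relatedProducts` / `sc:linkedDocument` properties and the function names are hypothetical illustrations, not the module's actual configuration.

```python
# Hypothetical defaults; only d:noderef=VARCHAR(400) is taken from the slide.
DEFAULT_TYPE_MAP = {
    "d:noderef": "VARCHAR(400)",
    "d:text": "VARCHAR(500)",
    "d:datetime": "DATETIME",
    "d:long": "BIGINT",
}
FALLBACK_TYPE = "VARCHAR(500)"  # assumption for unmapped Alfresco types


def to_column_name(short_qname: str) -> str:
    """'sys:node-dbid' -> 'sys_node_dbid': ':' and '-' are not valid in column names."""
    return short_qname.replace(":", "_").replace("-", "_")


def column_type(short_qname: str, alfresco_type: str, overrides: dict[str, str]) -> str:
    """A per-property override (keyed by column name) wins over the per-type default."""
    column = to_column_name(short_qname)
    return overrides.get(column, DEFAULT_TYPE_MAP.get(alfresco_type, FALLBACK_TYPE))


def to_column_value(value):
    """Multi-value properties are stored as one comma-separated string."""
    if isinstance(value, (list, tuple)):
        return ",".join(str(item) for item in value)
    return value
```

For example, `column_type("sc:relatedProducts", "d:noderef", {"sc_relatedProducts": "VARCHAR(800)"})` yields the widened `VARCHAR(800)`, while every other noderef property keeps the default.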
15. Reporting database principles
• alfresco-global.properties settings:
– ‘Blacklist’ properties to hide from reporting db
– Configure to harvest WorkSpace and/or ArchiveSpace
• Module accepts config override in shared/classes/alfresco/extension
• Module harvests as the System user, so ‘all stuff is there’. Reporting people are responsible for behaving nicely.
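A Python sketch of what the two global settings do: drop blacklisted properties before they reach the reporting database, and choose which stores to harvest. The comma-separated blacklist format and the function names are assumptions. `workspace://SpacesStore` and `archive://SpacesStore` are Alfresco's standard store references.

```python
def filter_blacklisted(properties: dict, blacklist_csv: str) -> dict:
    """Hypothetical helper: drop every property whose short QName is blacklisted."""
    blacklist = {name.strip() for name in blacklist_csv.split(",") if name.strip()}
    return {qname: value for qname, value in properties.items() if qname not in blacklist}


def stores_to_harvest(workspace: bool, archive: bool) -> list[str]:
    """Hypothetical helper: translate two on/off settings into store references."""
    stores = []
    if workspace:
        stores.append("workspace://SpacesStore")
    if archive:
        stores.append("archive://SpacesStore")
    return stores
```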
16. Design decisions
• Scheduled harvesting versus real time
– Performance impact
(need policies/behaviours for everything)
– Started as JavaScript API
– Not all objects to harvest generate events (audit)
• Scheduled execution versus user interaction
– Started as JavaScript API (e.g. no UI)
– Parameterized reports are ‘recent’ development
– UI driven configuration is even more recent
– Manual configuration within UI might be possible (within limits of report template)
• Reporting != auditing
17. How it was achieved: Harvesting
• Initial principles:
– Metadata of all business objects (incl. customizations, aspects…)
– Harvest only changed objects since last successful run
– Process versions
– JavaScript API to allow flexible execution
• Expanded to
– List of categories (tree-like structure)
– Auditing framework
– Users/Groups/SiteMembers
– UI over JavaScript
18. Limited number of search results
• Problem: MaxSearchResults
# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=10000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=1000
• Solution:
– Limitation is there for a reason. Deal with it.
(although technically (Java only?) you can work around it)
– Search & sort by sys:node-dbid
– Append to query:
" AND @sys:node-dbid:[" + (last_dbid + 1) + " TO MAX]";
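The dbid-paging loop, sketched in Python. The `search` callable stands in for the repository search. It is assumed to return hits sorted ascending by `sys:node-dbid` and capped at the permission-check limit, with each hit a dict that has a `dbid` key. `harvest_all` is a hypothetical name.

```python
from typing import Callable, Iterator


def harvest_all(search: Callable[[str], list[dict]], base_query: str) -> Iterator[dict]:
    """Hypothetical helper: yield every hit of base_query, one capped page at a time."""
    last_dbid = 0
    while True:
        # Restart just after the highest dbid seen so far (note the leading space).
        page = search(base_query + " AND @sys:node-dbid:[" + str(last_dbid + 1) + " TO MAX]")
        if not page:
            return
        yield from page
        last_dbid = page[-1]["dbid"]
```

Because the lower bound moves forward on every round, the loop ends when a page comes back empty, whatever the cap is.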
19. Model & Search: Nugget
• Why is this feature hidden in the data modelling?
<includedInSuperTypeQuery>false</includedInSuperTypeQuery>
• Nice way of hiding custom sub-types from parent
queries (especially config-like types)
20. Aspects & Associations & Categories
• Strong Alfresco features: flexible & powerful
– Find business objects (by query)
– For each business object:
• Get all properties – push to array
– Respect multi-value (becomes comma separated)
– Resolve tags and categories into labels
• Get all parent-child assocs – push to array
• Get all source-target assocs – push to array
• Derive some meaningful props – push to array
(for example: site name, display path, size)
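A simplified Python sketch of flattening one business object into a single row. Tag and category label resolution is left out. The input shapes (a property dict, a list of `(association QName, target noderef)` pairs, a display path), the derived column names `displayPath` and `site`, and the function names are assumptions.

```python
from collections import defaultdict


def column(short_qname: str) -> str:
    """Same naming rule as the reporting database: ':' and '-' become '_'."""
    return short_qname.replace(":", "_").replace("-", "_")


def node_to_row(properties: dict, assocs: list[tuple[str, str]], display_path: str) -> dict:
    """Hypothetical helper: properties, associations and derived values in one row."""
    row = {}
    for qname, value in properties.items():
        # Multi-value properties become a single comma-separated value.
        row[column(qname)] = ",".join(map(str, value)) if isinstance(value, list) else value
    # Group association targets per association type: one column per type.
    targets = defaultdict(list)
    for assoc_qname, target in assocs:
        targets[assoc_qname].append(target)
    for assoc_qname, refs in targets.items():
        row[column(assoc_qname)] = ",".join(refs)
    # Derived values, e.g. the site name from '/Company Home/Sites/<site>/...'.
    parts = display_path.strip("/").split("/")
    row["displayPath"] = display_path
    row["site"] = parts[parts.index("Sites") + 1] if "Sites" in parts[:-1] else None
    return row
```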
21. Push to reporting database
• Tables: ‘create ${table} if not exists’
• Named queries from searches
– Named auditing applications
– Category name
– Predefined names for users, groups, sitegroups
• For each batch of results,
– Determine superset of columns (=properties + types)
– ‘Create ${column} if not exists’
• Insert batch into the table
– If date-modified changed, insert new row.
– Insert statement varies depending on number of aspects/assocs
– Set validFrom/validTo/isLatest on current & previous version/row
• One mechanism fits all (also users/groups/categories)
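The push step, sketched with Python's built-in `sqlite3` instead of the MySQL database used by the module. SQLite has no `ADD COLUMN IF NOT EXISTS`, so missing columns are detected through `PRAGMA table_info`. The `noderef` key column, the use of the modification date as `validFrom`, the TEXT type for all data columns and the function names are assumptions for illustration.

```python
import sqlite3


def ensure_columns(conn: sqlite3.Connection, table: str, columns) -> None:
    """'Create table/column if not exists' for one batch's superset of columns."""
    # Table and column names are interpolated: they must come from trusted config.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        "(noderef TEXT, validFrom TEXT, validTo TEXT, isLatest INTEGER)"
    )
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
    for name in columns:
        if name not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" TEXT')
            existing.add(name)


def push_row(conn: sqlite3.Connection, table: str, row: dict, modified: str) -> bool:
    """Insert a new version if the modification date changed; True if inserted."""
    ensure_columns(conn, table, row)
    latest = conn.execute(
        f'SELECT validFrom FROM "{table}" WHERE noderef = ? AND isLatest = 1',
        (row["noderef"],),
    ).fetchone()
    if latest is not None and latest[0] == modified:
        return False  # unchanged since the previous harvest
    # Close the previous version of this object, if any.
    conn.execute(
        f'UPDATE "{table}" SET validTo = ?, isLatest = 0 WHERE noderef = ? AND isLatest = 1',
        (modified, row["noderef"]),
    )
    names = list(row) + ["validFrom", "validTo", "isLatest"]
    values = list(row.values()) + [modified, None, 1]
    column_list = ", ".join(f'"{name}"' for name in names)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', values)
    conn.commit()
    return True
```

Because the same two functions work for any table name and any set of columns, documents, users, groups and categories can all go through one path.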
22. How it was achieved: Execution
• Embed an existing reporting tool in Alfresco.
• Business must be comfortable operating the reporting tool
• Scheduled execution needs no UI. Administrators
can configure, business can use.
• Report files are zip-like; sub-reports are referenced by relative path
23. Credentials
• Does anyone embed Pentaho and use its Java API?
• Username/password stored inside report
– Update all reports when migrating to another source or from dev to test to prod
• JNDI (delegate credentials to app server)
• Report is self contained (credentials/JNDI)
• JNDI is the only enterprise solution; updating each and every report is not an option…
– Requires additional config step in alfresco.xml
24. Parameterization
• Pentaho (and JasperReports) accept parameters
to drive a report.
• Current Alfresco ActionExecuter accepts up to 4
parameters per report
• Used to generate site based reports
• Used to create a generic report and make it specific
(e.g. report Sites with non-internal SiteManagers)
25. Execution Structure
• Reporting Root(s)
– Defines scope for contained containers/templates
– Defines target queries
– Enables/disables scheduled harvesting/execution
– Execute all, harvest all
• Reporting Container(s)
– Contains reports scheduled at same frequency
– Execute all Reporting Templates inside
• Reporting Template(s)
– Actual reporting templates (Pentaho’s prpt’s)
– Enable/disable for automatic execution
– Defines output path (by noderef or relative to ‘target’)
– References target object from query in Reporting Root
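A Python sketch of the Root → Container → Template hierarchy and of how an 'execute all' for one schedule could select templates. The class names, field names and frequency labels are hypothetical. Target queries and output paths are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ReportingTemplate:
    name: str
    enabled: bool = True  # enable/disable for automatic execution


@dataclass
class ReportingContainer:
    frequency: str  # all templates inside share this schedule (hypothetical label)
    templates: list[ReportingTemplate] = field(default_factory=list)


@dataclass
class ReportingRoot:
    execution_enabled: bool = True  # scheduled execution on/off for the whole scope
    containers: list[ReportingContainer] = field(default_factory=list)

    def templates_to_run(self, frequency: str) -> list[str]:
        """'Execute all' for one schedule: enabled templates in matching containers."""
        if not self.execution_enabled:
            return []
        return [
            template.name
            for container in self.containers
            if container.frequency == frequency
            for template in container.templates
            if template.enabled
        ]
```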
27. Troubles along the way
• Little knowledge available about Pentaho and
credentials/authentications using Java API
• mltext-type fields (the name ‘Data Dictionary’ is
not the same in other languages)
– Forces me into ActionHandlers to fix Explorer UI,
– Or in Share development
(needs to be done at some point in time)
• EagerContentCleaner cleans Alfresco’s temp folder. Very eagerly.
28. Troubles along the way
• Max length of sum of column sizes
(MySQL: max. 65,535 bytes per row; with UTF-8 a character can take up to 3 bytes)
– Tweak default mapping (decrease the defaults)
– Make exceptions by property QName
(increase/decrease per prop)
• Auditing framework uses a call-back mechanism different from other services
• Module started as a JavaScript API
• Documentation is ‘a lot of work’
• Finalizing a (side) project is ‘a lot of work’
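A rough Python estimate of whether a table's columns fit within MySQL's 65,535-byte row-size limit. It assumes 3 bytes per character (MySQL's `utf8`) and counts the 1- or 2-byte VARCHAR length prefix. Treating every non-VARCHAR column as 8 bytes is a simplification, and the function names are hypothetical.

```python
import re

MYSQL_MAX_ROW_BYTES = 65_535


def estimated_row_bytes(column_types: list[str], bytes_per_char: int = 3) -> int:
    """Hypothetical helper: approximate worst-case row size for SQL column types."""
    total = 0
    for sql_type in column_types:
        match = re.fullmatch(r"VARCHAR\((\d+)\)", sql_type.strip(), re.IGNORECASE)
        if match:
            max_bytes = int(match.group(1)) * bytes_per_char
            # VARCHAR stores a 1-byte length prefix up to 255 bytes, else 2 bytes.
            total += max_bytes + (1 if max_bytes <= 255 else 2)
        else:
            total += 8  # simplification for INT, BIGINT, DATETIME, ...
    return total


def fits_mysql_row(column_types: list[str]) -> bool:
    return estimated_row_bytes(column_types) <= MYSQL_MAX_ROW_BYTES
```

Under these assumptions a VARCHAR(400) column costs 1,202 bytes, so 54 such columns fit and a 55th does not. This is why lowering the defaults and widening only specific properties helps.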
29. Challenges
• How to detect changes in Categories/structures
– Currently no incremental updates
• How to detect changes in group structure and
users
– Currently no incremental updates
• If there is no property yet, there is no column
– Can be an issue creating reports
– Prepping the reporting database with empty columns
• Not always possible to configure?
30. ToDo
• Allow reporting database multi-vendor
– MyBatis integration in progress
• Allow multilingual Alfresco installs
– mltext properties bite (Explorer UI)
• UI to Share
– Harvest & Execution in Admin Panel
– Execute parameterized reports on demand?
• Make cron jobs cluster-aware
• Get rid of JavaScript history (harvesting)
– Script *not* thread-safe, run max 1 instance!
• Mavenize & include more unit tests
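A Python sketch of the 'run max 1 instance' rule. A non-blocking lock makes an overlapping trigger skip instead of running concurrently. This only guards a single process. A cluster-aware scheduler would need a lock shared between nodes, which is not shown. `run_harvest_once` is a hypothetical name.

```python
import threading

_harvest_lock = threading.Lock()


def run_harvest_once(harvest) -> bool:
    """Hypothetical helper: run harvest() unless a run is already in progress."""
    if not _harvest_lock.acquire(blocking=False):
        return False  # another harvest is still running: skip this trigger
    try:
        harvest()
        return True
    finally:
        _harvest_lock.release()
```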
33. Main report: Select Site
SELECT
`site`.`site`,
`site`.`st_siteVisibility`,
`site`.`cm_title`,
`site`.`cm_description`,
`site`.`cm_owner`
FROM
`site`
WHERE
`site`.`isLatest` = true
AND `site`.`site` = ${sitename}
35. Sub report: Users per Role
SELECT
count(*) as amount,
`siteperson`.`siteRole` as role
FROM
`siteperson`
WHERE `siteperson`.`siteName` = ${sitename}
GROUP BY `siteperson`.`siteRole`
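The Pentaho `${sitename}` parameter plays the role of a bound query parameter. Running the report once per site name gives the site-based reports. The Python sketch below runs the same query against an in-memory SQLite database, which accepts MySQL-style backtick identifiers, with `:sitename` as a named parameter. The `ORDER BY` is added for a stable result, and the table layout is a minimal assumption.

```python
import sqlite3

USERS_PER_ROLE = """
SELECT count(*) AS amount, `siteperson`.`siteRole` AS role
FROM `siteperson`
WHERE `siteperson`.`siteName` = :sitename
GROUP BY `siteperson`.`siteRole`
ORDER BY role
"""


def users_per_role(conn: sqlite3.Connection, sitename: str) -> list[tuple]:
    """Run the sub-report query for one site, binding the site name as a parameter."""
    return conn.execute(USERS_PER_ROLE, {"sitename": sitename}).fetchall()
```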
37. Sub report: Site members
SELECT DISTINCT `siteperson`.`userName`,
`person`.`cm_email`,
`person`.`cm_mobile`,
`person`.`cm_telephone`,
`person`.`cm_firstName`,
`person`.`cm_instantmsg`,
`person`.`cm_lastName`
FROM
`siteperson` INNER JOIN `person` ON `person`.`cm_userName` =
`siteperson`.`userName`
WHERE
`siteperson`.`siteName` = ${sitename}
ORDER BY
`person`.`cm_lastName` ASC,
`person`.`cm_firstName` ASC
39. My Best Practices
• Dashlets/Pages are for real-time information
– E.g. workflow progress
• Reporting is for insight that does not have to be
real-time.
• Reporting must be extensible by the customer
• Design for Reporting
– Have metadata available
– Accept redundancy and one or two additional policies/behaviours
41. I would like to publish your reporting case on the wiki.
And I have a few books to give away to ‘impressive’ contributions:
[en] [nl]
Your reporting case…