Artvaark: a community-building tool for organisations. The brainchild of the guys at Tincan, Artvaark leverages three socially oriented principles: recommend, reward and redeem. The presentation rattles through the tech details of the project, including some of the fails, speedbumps and brick walls we've hit along the way. Presentation was given at #DrupalShowAndTell in London.
8. Recommendations
Three kinds:
● User behaviour
● Curated
● Smart questioning
Sources:
● Algos
● Tastemakers
● User preferences
Drupal Show-and-Tell | May 2014
9. Recommendations engine
Algorithms that look for correlations in user behaviour relating to events
input base data - events (node IDs), users (uids)
input action data - when a known user views an event, or when a known user buys a ticket for an event
11. Recommendations admin
The Drupal admin interface echoes the PredictionIO config panel. It's important for users to be able to manage recommendations from within Drupal.
12. API example
An event node has been viewed by a registered user, so notify PredictionIO:
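The snippet itself isn't reproduced in these notes, so here is a hedged sketch of what the two calls described on this slide might look like, using the command names from the 0.6-era PredictionIO PHP SDK. The app key value and the variable names are placeholders, not the actual Artvaark code.

```php
<?php
// Sketch only: notify PredictionIO that a registered user has viewed
// an event node. Assumes the PredictionIO PHP SDK (0.6-era API) has
// been loaded; the app key is a placeholder.
use PredictionIO\PredictionIOClient;

$client = PredictionIOClient::factory(array('appkey' => 'YOUR_APP_KEY'));

// Call 1: identify the registered user for this session.
$client->identify($user->uid);

// Call 2: record the 'view' action against the event node.
$client->execute(
  $client->getCommand('record_action_on_item', array(
    'pio_action' => 'view',
    'pio_iid'    => $node->nid,
  ))
);
```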
What do Chapter ‘do’ with Drupal? It's a responsive, multi-artform, event-based site built in Drupal 7, integrated with the Patronbase ticketing/payments system using XML event data feeds and an XML-RPC API for user login integration: basically single sign-on and synchronisation of sessions.
Here are a couple of screenshots of the What’s On listing and an example Event page
Chapter screenshots of what’s on and event page
So, Artvaark then.. here’s a picture. Within the context of Chapter’s audience and their engagement with Chapter online and offline, the concepts for the experiment are a combination of recommendations, rewards and redemption (of those rewards, rather than in general). The focus is on the events programme: how the audience finds and attends events, and the other things they do in the course of their participation.
It is an experiment, from the nature of the funding through to the approach: researching, prototyping, beta testing, measuring, hopefully learning, and iterating.
Drilling down towards what Drupal’s got to do with it all (hurry up, man), let’s talk about recommendations.
Through the discovery/research phase and working with the researchers from Cardiff University, we found three kinds of recommendation sources we wanted to explore.
User behaviour - what do people actually do, look at, buy?
Curated - who do people listen to when making choices?
Smart questioning - or at this point simple questioning - what might people tell us about themselves?
We thought that if we could combine these, and measure how people interact with them, it might give us something useful: recommendations that people are actually interested in.
We would call the ‘thing’ that combines our recommendation sources a recommendations engine, and we do..but we also call the thing that does the user behaviour-based recommendations the recommendation engine, and that’s the first thing we’re going to talk about.
So - we use an open-source, algorithm-based recommendation engine called prediction.io.
It works like this: you give it base data (events, users), you give it action data (“conversion data”, e.g. browsing, purchasing, any action you can capture and report), and then you train the algorithms..
and then you ask it for recommendations either for a user, or for an event.
It uses the data to produce a number of recommendations..
in the user case it looks for correlations between the selected user’s behaviour (either event browsing or event purchasing) and other users’ behaviour (other people who looked at X also looked at Y and Z),
in the event case it uses the same action data to look for correlations between the event passed in, and other events (when this event X has been viewed, these other events have been viewed).
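The base-data / action-data / query cycle just described can be sketched with the 0.6-era PredictionIO PHP SDK roughly as follows. The engine names, IDs and app key here are placeholder values for illustration, not the actual Artvaark configuration.

```php
<?php
// Sketch of the cycle described above, assuming the PredictionIO
// PHP SDK (0.6-era command names). All IDs and names are placeholders.
use PredictionIO\PredictionIOClient;

$client = PredictionIOClient::factory(array('appkey' => 'YOUR_APP_KEY'));

// Base data: register an event node and a user.
$client->execute($client->getCommand('create_item', array(
  'pio_iid'    => '123',      // event node ID
  'pio_itypes' => 'event',
)));
$client->execute($client->getCommand('create_user', array(
  'pio_uid' => '45',          // Drupal uid
)));

// Action data: user 45 viewed event 123.
$client->identify('45');
$client->execute($client->getCommand('record_action_on_item', array(
  'pio_action' => 'view',
  'pio_iid'    => '123',
)));

// The user case: top-N recommendations for the identified user.
$rec = $client->execute($client->getCommand('itemrec_get_top_n', array(
  'pio_engine' => 'event-recommendations',  // placeholder engine name
  'pio_n'      => 5,
)));

// The event case: events similar to a given event.
$sim = $client->execute($client->getCommand('itemsim_get_top_n', array(
  'pio_engine' => 'event-similarity',       // placeholder engine name
  'pio_iid'    => '123',
  'pio_n'      => 5,
)));
```

Both queries only return useful results once the algorithms have been trained on the accumulated action data.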
So now I’ve told you how it’s supposed to work, Jason will explain what it actually does.
Drupal modules:
So PredictionIO is an open-source machine learning server that sits on Apache. We’ve built an Artvaark Components Module to manage its configuration. PredictionIO lets you set up various ‘engine types’, such as an Item Recommendation engine or an Item Similarity engine. Each engine serves its own set of recommendation results, so, for example, your Drupal site might require two components: one engine for recommending events to users based on their interests, and another for suggesting related events. For each engine, you can select pre-built algorithms, set the number of recommendations to return, and set engine priority. All of this administration is done through the Components Module.
For functionality and actual transactions, we’ve developed an Artvaark API Module. This matches activity in Drupal (e.g. firing hook_node_view on an event page) with a call to PredictionIO to log the transaction.
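As a rough illustration of that matching, a Drupal 7 hook_node_view() implementation might hand off to the API module something like this. The module prefix and the helper function are hypothetical; only the hook signature itself is standard Drupal 7.

```php
<?php
// Sketch: wire hook_node_view() to a PredictionIO logging call.
// artvaark_api_log_view() is a hypothetical helper, not the real code.
function artvaark_api_node_view($node, $view_mode, $langcode) {
  global $user;
  // Only log full views of published event nodes by registered users.
  if ($node->type == 'event' && $node->status == 1
      && $view_mode == 'full' && $user->uid > 0) {
    artvaark_api_log_view($user->uid, $node->nid);
  }
}
```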
PredictionIO components:
The Artvaark API Module has a dependency on the PredictionIO PHP SDK. Once installed, this gives you a set of classes and a REST library for making calls to the API server.
So when a user visits an event node, this diagram kind of represents the notification chain up to the machine learning server. PredictionIO runs calculations every hour, storing results in MongoDB
When we want to pull down a set of recommendations for a user, the Artvaark API Module includes a class that queries PredictionIO, passing the results as NIDs to a Views contextual filter. So we’re using Views to display recommendations.
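The Views hand-off might look roughly like this in Drupal 7. The view name, display ID and the recommendation helper are placeholders; views_embed_view() is the standard D7 way to render a view with a contextual filter argument.

```php
<?php
// Sketch: feed recommended NIDs into a View as a contextual filter.
// artvaark_api_get_recommendations() is a hypothetical helper that
// wraps the PredictionIO query and returns an array of node IDs.
$nids = artvaark_api_get_recommendations($user->uid, 5);

if (!empty($nids)) {
  // Views accepts multiple values in one contextual filter argument
  // joined with '+' (treated as OR).
  print views_embed_view('recommended_events', 'block_1', implode('+', $nids));
}
```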
Screenshot of the components module admin page. We really want site admins to be able to control the system within their Drupal site. Sometimes integrations like this suffer because admins still have to jump around between consoles to get the job done.
Simple code snippet from the Artvaark API. Once the PredictionIO SDK has been loaded, we’re firing off API engine calls.
So in this instance, a node has been viewed by a registered user. PredictionIO already knows about the existence of the node and the user, thanks to earlier calls when they were created in Drupal. So on condition that the user visits a published node of a type that we’re monitoring, we make two calls to identify the user with PredictionIO and record the view action there.
Existing solutions for personalised content in Drupal are out there. From our review:
RecommenderAPI
A Google Summer of Code project from 2009, originally written entirely in PHP, which is not the ideal language for heavy-duty, memory-intensive algorithmic calculation.
So it moved to Java and now uses Apache Mahout as the underlying computational library, with the Drupal module rewritten to use this service.
It gathered some momentum as a viable solution (integration projects with Ubercart and Commerce) but hasn’t seen any developer attention since 2012; the developer graduated and moved on to another startup.
Acquia Lift
An interesting development, launched in February this year. It’s primarily an automated A/B testing service that runs within Drupal, learning user behaviour and targeting personalised content/campaigns.
It’s not comparable as a content recommendation engine, but it demonstrates the market for personalised content managed within Drupal, not bolted on or spread across multiple services.
We wanted to note why we went with PredictionIO, and cover some performance features.
PredictionIO’s algorithms involve resource-intensive calculations. Java multithreading is more performant and overcomes memory limitations.
They run every hour and store results in MongoDB. The API queries MongoDB for recommendations, so it’s like ‘cached’ result retrieval and super-fast.
Runs on top of scalable frameworks such as Hadoop and Cascading. Ready to handle big data.
The PredictionIO PHP SDK uses the Guzzle HTTP client, which has been heavily profiled for speed (one reason for its adoption in Drupal 8 core).
To highlight an additional complication in all of this.
Events have start and end dates, so they expire as far as recommendation results are concerned. But they often remain published on the Drupal site, as the venue might want to keep an archive of past events. So alongside the functionality we’ve covered, we also have to update PredictionIO to remove past events. We do this on cron, effectively deleting these records from the engine.
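The cron cleanup described here might be sketched as a Drupal 7 hook_cron() implementation along these lines. The date field/table names, module prefix and app key are all placeholders; only hook_cron(), db_query() and the SDK's delete_item command are assumed from their respective APIs.

```php
<?php
// Sketch: on cron, delete expired events from PredictionIO so they
// stop appearing in recommendation results. Field names are placeholders.
use PredictionIO\PredictionIOClient;

function artvaark_api_cron() {
  // Find events whose end date has passed (hypothetical date field).
  $result = db_query(
    "SELECT entity_id FROM {field_data_field_event_date}
     WHERE field_event_date_value2 < :now",
    array(':now' => date('Y-m-d H:i:s'))
  );

  $client = PredictionIOClient::factory(array('appkey' => 'YOUR_APP_KEY'));
  foreach ($result as $row) {
    // Remove the expired event's record from the engine; the node
    // itself stays published in Drupal as part of the archive.
    $client->execute($client->getCommand('delete_item', array(
      'pio_iid' => $row->entity_id,
    )));
  }
}
```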
A similar scenario might apply to sold-out or unavailable Commerce products.