The Apereo Open Academic Environment is a platform that focuses on group collaboration between researchers, students and lecturers, and strongly embraces openness, creation, re-use, re-mixing and discovery of content, people and groups.
How does Apereo OAE work? OAE targets a large-scale, multi-tenant, cloud-compatible deployment model, in which a single installation can host multiple institutions at the same time.
This presentation provides an overview of the different components and technologies that are used, as well as details around deploying and configuring OAE and its associated running costs.
It also summarizes the approach used for continuous nightly performance testing and how the desired (horizontal) scalability is validated. Details around back-end and UI unit testing, code coverage and security testing are shared, and contribution models for service development and UI development are discussed as well.
7. Multi-tenancy
• Where the market is heading
• Support multiple institutions at the same time
• Multi-tenancy+
• Tenants are easily created, maintained and configured
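To make the multi-tenancy idea concrete, here is a minimal sketch of resolving a tenant from the request hostname in Node.js/Express middleware. The tenant map, its fields and the port are hypothetical illustrations, not OAE's actual data model.

    // Hypothetical sketch: one installation, many institutions, with the
    // tenant resolved per request from the hostname.
    const express = require('express');
    const app = express();

    // Hypothetical tenant registry; a real deployment would load this from storage
    const tenants = {
      'oae.cam.ac.uk': { alias: 'cam', displayName: 'University of Cambridge' },
      'oae.gatech.edu': { alias: 'gt', displayName: 'Georgia Tech' }
    };

    app.use((req, res, next) => {
      req.tenant = tenants[req.hostname];
      if (!req.tenant) return res.status(404).send('Unknown tenant');
      next();
    });

    app.get('/', (req, res) => res.send('Welcome to ' + req.tenant.displayName));
    app.listen(2000); // port chosen arbitrarily for the sketch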
13. Topics
1. Project Goals
2. Hilary System Architecture
3. Performance Testing
4. Deployment and Automation
5. UI Architecture
6. Customization and Configuration
7. Questions?
14. OAE Architecture
The Apereo OAE project is made up of 2 distinct source code platforms:
• “Hilary”
• Server-side RESTful web platform that exposes the OAE services
• Written entirely using server-side JavaScript in Node.js
• “3akai-ux”
• A client-side / browser platform that provides the HTML, JavaScript and CSS that make up the browser UI of the application
17. Application Servers
• Written in server-side JavaScript, runs in Node.js
• Node.js used by: eBay, LinkedIn, Storify, Trello
• Lightweight (~80 MB of memory) single-threaded platform that processes I/O asynchronously / non-blocking
• App servers can be configured into functional specializations:
• User Request Processor
• Activity Processor
• Search Indexer
• Preview Processor
• Specializing app servers allows for clustering different types of application processing in distinct ways (see the sketch below)
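A minimal sketch of what such role-based specialization can look like: one codebase, with each process specialized by configuration at start-up. The ROLE variable, port and stand-in handlers are hypothetical, not Hilary's actual bootstrap code.

    // Hypothetical sketch: the same app, started with e.g. ROLE=preview node app.js
    const http = require('http');

    const role = process.env.ROLE || 'user-requests';

    function startWorker(name) {
      // Stand-in for binding a task-queue consumer (see the RabbitMQ slide)
      console.log('Started ' + name + ' worker, waiting for tasks...');
    }

    if (role === 'user-requests') {
      // Only user-request processors bind the HTTP REST interface
      http.createServer((req, res) => res.end('OK')).listen(2001);
    } else if (role === 'activity') {
      startWorker('activity processor');
    } else if (role === 'search-indexer') {
      startWorker('search indexer');
    } else if (role === 'preview') {
      startWorker('preview processor');
    }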
18. Apache Cassandra
• Canonical data source
• Provides high availability and fault tolerance without trading away performance
• Gives flexibility with incremental scalability in a cloud environment
• Helps overcome unpredictable growth of multi-tenant systems
• Option for multi-datacenter deployments to localize reads and writes in geographical regions
• Can trade off consistency for availability at the query level (see the sketch below)
• Scales linearly by sharding and balancing rows across nodes, with configurable replication levels
• Used by: Netflix, eBay, Twitter
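To illustrate the query-level consistency trade-off, here is a minimal sketch using the DataStax cassandra-driver for Node.js (not necessarily the driver OAE used at the time); the keyspace, table and row key are made up.

    // Sketch: trading consistency for availability per query.
    const cassandra = require('cassandra-driver');
    const client = new cassandra.Client({
      contactPoints: ['db0', 'db1', 'db2'],
      localDataCenter: 'datacenter1',
      keyspace: 'oae' // hypothetical keyspace
    });

    async function demo() {
      // A profile read can tolerate slightly stale data: consistency ONE
      // stays fast and available even when some replicas are down.
      await client.execute(
        'SELECT * FROM principals WHERE id = ?', ['u:cam:nico'],
        { prepare: true, consistency: cassandra.types.consistencies.one });

      // A visibility change should be seen by subsequent reads: QUORUM
      // trades some latency and availability for stronger consistency.
      await client.execute(
        'UPDATE principals SET visibility = ? WHERE id = ?',
        ['private', 'u:cam:nico'],
        { prepare: true, consistency: cassandra.types.consistencies.quorum });
    }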
19. ElasticSearch
• Lucene-backed search platform
• Built for cloud-friendly incremental scaling and high availability
• Exposes HTTP RESTful APIs for indexing and querying documents
• RESTful query interface uses JSON-based Query DSL (see the sketch below)
• Scales linearly by distributing a pre-determined number of shards among nodes, and automatically rebalances when necessary
• Used by: GitHub, Foursquare, StackOverflow, WordPress
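Because the query interface is plain HTTP plus the JSON Query DSL, a search can be issued with nothing more than a POST. A minimal sketch; the host, index and field names are hypothetical.

    // Sketch: querying ElasticSearch over HTTP with the JSON Query DSL
    // (uses the global fetch available in Node 18+).
    (async () => {
      const response = await fetch('http://search0:9200/oae/_search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          query: { match: { displayName: 'physics study group' } },
          size: 10
        })
      });
      const results = await response.json();
      console.log(results.hits.total, results.hits.hits.length);
    })();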
20. RabbitMQ
• Message queue platform written in Erlang
• Used for distributing tasks to specialized application server instances (see the sketch below)
• Supports active-active queue mirroring across nodes for high availability
• Used by: Joyent
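A minimal sketch of this task-distribution pattern with the amqplib client; the queue name, broker host and payload shape are hypothetical, not OAE's actual task format.

    // Sketch: handing preview-generation tasks to specialized workers.
    const amqp = require('amqplib');

    async function produce(contentId) {
      const conn = await amqp.connect('amqp://mq0');
      const ch = await conn.createChannel();
      await ch.assertQueue('oae-preview-tasks', { durable: true });
      ch.sendToQueue('oae-preview-tasks',
        Buffer.from(JSON.stringify({ contentId })), { persistent: true });
    }

    async function consume() {
      const conn = await amqp.connect('amqp://mq0');
      const ch = await conn.createChannel();
      await ch.assertQueue('oae-preview-tasks', { durable: true });
      ch.consume('oae-preview-tasks', (msg) => {
        const task = JSON.parse(msg.content.toString());
        // ... generate the preview for task.contentId ...
        ch.ack(msg); // acknowledge only after the work is done
      });
    }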
21. Redis
• Commonly known as a cache server
• Fills a variety of roles (two are sketched below):
• Caching of basic user profiles
• Broadcast messaging (could move to RabbitMQ)
• Locking
• Holds volatile activity aggregation data
• Comes with no managed clustering solution (yet), but has slave replication for active fail-over
• Some clients manage master-slave switching and distributed reads for you
• Used by: Twitter, Instagram, StackOverflow, Flickr
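Profile caching and locking look roughly like this with the node-redis (v4) client; the key names and TTLs are made up for illustration.

    // Sketch: profile caching with a TTL, and a simple lock via SET NX.
    const { createClient } = require('redis');

    async function main() {
      const redis = createClient({ url: 'redis://cache0:6379' });
      await redis.connect();

      // Cache a basic user profile for 15 minutes
      await redis.set('oae:profile:u:cam:nico',
        JSON.stringify({ displayName: 'Nicolaas' }), { EX: 900 });

      // Simple lock: SET ... NX succeeds only if the key doesn't exist yet
      const acquired = await redis.set('oae:lock:activity:u:cam:nico', '1',
        { NX: true, EX: 30 });
      if (acquired) {
        // ... do the exclusive work, then release the lock ...
        await redis.del('oae:lock:activity:u:cam:nico');
      }
    }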
22. Etherpad
• Open-source collaborative editing application written in Node.js
• Originally developed by Google and Mozilla
• Licensed under Apache License v2
• Powers collaborative document editing in OAE
• Doesn’t cluster, but we shard for performance (see the sketch below)
• If an Etherpad server goes down, active sessions on that server are lost
• But document data is flushed to Cassandra on the fly, so large amounts of work are not lost as a result
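Since Etherpad itself doesn't cluster, pads can be pinned to a server by hashing the document id, so all editors of a pad land on the same instance. A minimal sketch of that sharding idea; the server list is hypothetical and this is not OAE's exact routing code.

    // Sketch: deterministic shard selection for collaborative documents.
    const crypto = require('crypto');

    const etherpadServers = [
      'http://etherpad0:9001',
      'http://etherpad1:9001',
      'http://etherpad2:9001' // hypothetical hosts
    ];

    function serverForPad(contentId) {
      const digest = crypto.createHash('md5').update(contentId).digest();
      const index = digest.readUInt32BE(0) % etherpadServers.length;
      return etherpadServers[index];
    }

    console.log(serverForPad('c:cam:essay-draft')); // always the same server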
23. Nginx
• HTTP and reverse-proxy server
• Used to distribute load across application servers and Etherpad servers, and to stream file downloads
• Useful rate-limiting features based on source IP
• Used by: Netflix, WordPress.com
24. Topics
1. Project Goals
2. Hilary System Architecture
3. Performance Testing
4. Deployment and Automation
5. UI Architecture
6. Customization and Configuration
25. Performance Testing: Workflow
1. Generate data with the Model Loader
2. Load data into the system with the Model Loader
3. Generate a Tsung test with a custom framework
4. Run the Tsung test
5. Analyze the results
34. So, does it scale?
• Yes. We can scale the application horizontally by adding more nodes
• Doubling the hardware roughly doubles the throughput
37. Topics
1. Project Goals
2. Hilary System Architecture
3. Performance Testing
4. Deployment and Automation
5. UI Architecture
6. Customization and Configuration
7. Questions?
38. Deployment and Automation
• As you can imagine, there are many machines to manage. Current inventory:
• 3x Cassandra
• 2x Redis
• 2x RabbitMQ
• 4x Application + Indexer
• 3x Preview Processor
• 1x Activity Processor
• 1x Nginx
• 3x Etherpad
• Performance testing with a cluster of 21 virtual machines
• Additional scalability testing and verification with ~30 virtual machines
39. Puppet
• Use Puppet to centralize machine configuration and prevent configuration drift
• Collection of “manifests” that define the state the machine should be in, based on its hostname / role:
• What files should exist? What should their contents be?
• What packages should be installed?
• What services should be running, or stopped?
• http://github.com/sakaiproject/puppet-hilary
• All 20+ machines in the cluster have Puppet installed, and each asks for its “catalog” (expected configuration state) from a single Puppet master machine
• The Puppet master knows how to determine the machine state from the manifests based on its host (e.g., db0 is a Cassandra node, so it should have Cassandra, Java, etc.)
• Use PuppetDB with “exported resources” to share machine-specific information with the other nodes in the cluster
40. MCollective
• Provides parallel execution across a number of machines at one time
• Start / Stop / Check status of services
• Install / Remove / Check version of packages
• Use Puppet resource syntax to check ad-hoc machine facts
• Apply Puppet manifests
• Each cluster node subscribes to an ActiveMQ server to receive commands. A central machine (the “client”) publishes the command and waits for replies
41. Slapchop
• Missing piece: we need to create 21 machines of different specs in a cloud service, and somehow get MCollective onto them
• A tool we lovingly call slapchop
• Define a JSON manifest that holds machine configs and instances
• Run slapchop to create the machines in the Joyent cloud, start them, and get MCollective installed
• Well, kind of...
• Now you can log in to the MCollective client and run mco puppet apply
• Well, kind of...
• Go from an empty cloud to a working 21-machine cluster in ~15 minutes
43. Security
• Infrastructure penetration tests by the University of Murcia
• No major issues found
• UI vulnerability testing performed by the SCIRT group
• Followed up on all known XSS issues
• Using the OWASP jQuery plugin for XSS-filtering user-created data (see the sketch below)
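A minimal sketch of encoding user-created data before it is injected into the DOM. The encodeForHTML call reflects our understanding of the OWASP jquery-encoder plugin's API and should be treated as an assumption.

    // Sketch: encode untrusted user input before it reaches the DOM.
    // Assumes the OWASP jquery-encoder plugin exposes $.encoder.encodeForHTML;
    // treat the exact API as an assumption.
    var userDisplayName = '<img src=x onerror=alert(1)>'; // attacker-controlled
    $('#profile-name').html($.encoder.encodeForHTML(userDisplayName));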
44. Topics
1. Project Goals
2. Hilary System Architecture
3. Performance Testing
4. Deployment and Automation
5. UI Architecture
6. Customization and Configuration
7. Questions?