The Apereo Open Academic Environment is a platform that focuses on group collaboration between researchers, students and lecturers, and strongly embraces openness, creation, re-use, re-mixing and discovery of content, people and groups.
How does Apereo OAE work? OAE targets a large-scale, multi-tenant, cloud-compatible deployment model, in which a single installation can host multiple institutions at the same time.
This presentation provides an overview of the different components and technologies that are used, as well as details around deploying and configuring OAE and its associated running costs.
It also summarizes the approach used for continuous nightly performance testing and how the desired (horizontal) scalability is validated. Details around back-end and UI unit testing, code coverage and security testing are shared, and contribution models for service development and UI development are discussed as well.
7. Multi-tenancy
• Where the market is heading
• Support multiple institutions at the same time
• Multi-tenancy+
• Tenants are easily created, maintained and configured
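To make the multi-tenancy idea concrete, here is a minimal sketch of resolving a tenant from the request hostname in Node.js/Express middleware. The tenant map, its fields and the port are hypothetical illustrations, not OAE's actual data model.

    // Hypothetical sketch: one installation, many institutions, with the
    // tenant resolved per request from the hostname.
    const express = require('express');
    const app = express();

    // Hypothetical tenant registry; a real deployment would load this from storage
    const tenants = {
      'oae.cam.ac.uk': { alias: 'cam', displayName: 'University of Cambridge' },
      'oae.gatech.edu': { alias: 'gt', displayName: 'Georgia Tech' }
    };

    app.use((req, res, next) => {
      req.tenant = tenants[req.hostname];
      if (!req.tenant) return res.status(404).send('Unknown tenant');
      next();
    });

    app.get('/', (req, res) => res.send('Welcome to ' + req.tenant.displayName));
    app.listen(2000); // port chosen arbitrarily for the sketch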
13. Topics
1. Project Goals
2. Hilary System Architecture
3. Performance Testing
4. Deployment and Automation
5. UI Architecture
6. Customization and Configuration
7. Questions?
14. OAE Architecture
The Apereo OAE project is made up of 2 distinct source code platforms:
• “Hilary”
• Server-side RESTful web platform that exposes the OAE services
• Written entirely using server-side JavaScript in Node.js
• “3akai-ux”
• A client-side / browser platform that provides the HTML, JavaScript and CSS that make up the browser UI of the application
17. Application Servers
• Written in server-side JavaScript, runs in Node.js
• Node.js used by: eBay, LinkedIn, Storify, Trello
• Lightweight (~80 MB of memory) single-threaded platform that processes I/O asynchronously / non-blocking
• App servers can be configured into functional specializations:
• User Request Processor
• Activity Processor
• Search Indexer
• Preview Processor
• Specializing app servers allows for clustering different types of application processing in distinct ways (see the sketch below)
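A minimal sketch of what such role-based specialization can look like: one codebase, with each process specialized by configuration at start-up. The ROLE variable, port and stand-in handlers are hypothetical, not Hilary's actual bootstrap code.

    // Hypothetical sketch: the same app, started with e.g. ROLE=preview node app.js
    const http = require('http');

    const role = process.env.ROLE || 'user-requests';

    function startWorker(name) {
      // Stand-in for binding a task-queue consumer (see the RabbitMQ slide)
      console.log('Started ' + name + ' worker, waiting for tasks...');
    }

    if (role === 'user-requests') {
      // Only user-request processors bind the HTTP REST interface
      http.createServer((req, res) => res.end('OK')).listen(2001);
    } else if (role === 'activity') {
      startWorker('activity processor');
    } else if (role === 'search-indexer') {
      startWorker('search indexer');
    } else if (role === 'preview') {
      startWorker('preview processor');
    }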
18. Apache Cassandra
• Canonical data source
• Provides high availability and fault tolerance without trading away performance
• Gives flexibility with incremental scalability in a cloud environment
• Helps overcome unpredictable growth of multi-tenant systems
• Option for multi-datacenter deployments to localize reads and writes in geographical regions
• Can trade off consistency for availability at the query level (see the sketch below)
• Scales linearly by sharding and balancing rows across nodes, with configurable replication levels
• Used by: Netflix, eBay, Twitter
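To illustrate the query-level consistency trade-off, here is a minimal sketch using the DataStax cassandra-driver for Node.js (not necessarily the driver OAE used at the time); the keyspace, table and row key are made up.

    // Sketch: trading consistency for availability per query.
    const cassandra = require('cassandra-driver');
    const client = new cassandra.Client({
      contactPoints: ['db0', 'db1', 'db2'],
      localDataCenter: 'datacenter1',
      keyspace: 'oae' // hypothetical keyspace
    });

    async function demo() {
      // A profile read can tolerate slightly stale data: consistency ONE
      // stays fast and available even when some replicas are down.
      await client.execute(
        'SELECT * FROM principals WHERE id = ?', ['u:cam:nico'],
        { prepare: true, consistency: cassandra.types.consistencies.one });

      // A visibility change should be seen by subsequent reads: QUORUM
      // trades some latency and availability for stronger consistency.
      await client.execute(
        'UPDATE principals SET visibility = ? WHERE id = ?',
        ['private', 'u:cam:nico'],
        { prepare: true, consistency: cassandra.types.consistencies.quorum });
    }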
19. ElasticSearch
• Lucene-backed search platform
• Built for cloud-friendly incremental scaling and high availability
• Exposes HTTP RESTful APIs for indexing and querying documents
• RESTful query interface uses JSON-based Query DSL (see the sketch below)
• Scales linearly by distributing a pre-determined number of shards among nodes, and automatically rebalances when necessary
• Used by: GitHub, Foursquare, StackOverflow, WordPress
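Because the query interface is plain HTTP plus the JSON Query DSL, a search can be issued with nothing more than a POST. A minimal sketch; the host, index and field names are hypothetical.

    // Sketch: querying ElasticSearch over HTTP with the JSON Query DSL
    // (uses the global fetch available in Node 18+).
    (async () => {
      const response = await fetch('http://search0:9200/oae/_search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          query: { match: { displayName: 'physics study group' } },
          size: 10
        })
      });
      const results = await response.json();
      console.log(results.hits.total, results.hits.hits.length);
    })();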
20. RabbitMQ
• Message queue platform written in Erlang
• Used for distributing tasks to specialized application server instances (see the sketch below)
• Supports active-active queue mirroring across nodes for high availability
• Used by: Joyent
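A minimal sketch of this task-distribution pattern with the amqplib client; the queue name, broker host and payload shape are hypothetical, not OAE's actual task format.

    // Sketch: handing preview-generation tasks to specialized workers.
    const amqp = require('amqplib');

    async function produce(contentId) {
      const conn = await amqp.connect('amqp://mq0');
      const ch = await conn.createChannel();
      await ch.assertQueue('oae-preview-tasks', { durable: true });
      ch.sendToQueue('oae-preview-tasks',
        Buffer.from(JSON.stringify({ contentId })), { persistent: true });
    }

    async function consume() {
      const conn = await amqp.connect('amqp://mq0');
      const ch = await conn.createChannel();
      await ch.assertQueue('oae-preview-tasks', { durable: true });
      ch.consume('oae-preview-tasks', (msg) => {
        const task = JSON.parse(msg.content.toString());
        // ... generate the preview for task.contentId ...
        ch.ack(msg); // acknowledge only after the work is done
      });
    }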
21. Redis
• Commonly known as a cache server
• Fills a variety of roles (two are sketched below):
• Caching of basic user profiles
• Broadcast messaging (could move to RabbitMQ)
• Locking
• Holds volatile activity aggregation data
• Comes with no managed clustering solution (yet), but has slave replication for active fail-over
• Some clients manage master-slave switching and distributed reads for you
• Used by: Twitter, Instagram, StackOverflow, Flickr
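Profile caching and locking look roughly like this with the node-redis (v4) client; the key names and TTLs are made up for illustration.

    // Sketch: profile caching with a TTL, and a simple lock via SET NX.
    const { createClient } = require('redis');

    async function main() {
      const redis = createClient({ url: 'redis://cache0:6379' });
      await redis.connect();

      // Cache a basic user profile for 15 minutes
      await redis.set('oae:profile:u:cam:nico',
        JSON.stringify({ displayName: 'Nicolaas' }), { EX: 900 });

      // Simple lock: SET ... NX succeeds only if the key doesn't exist yet
      const acquired = await redis.set('oae:lock:activity:u:cam:nico', '1',
        { NX: true, EX: 30 });
      if (acquired) {
        // ... do the exclusive work, then release the lock ...
        await redis.del('oae:lock:activity:u:cam:nico');
      }
    }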
22. Etherpad
• Open-source collaborative editing application written in Node.js
• Originally developed by Google and Mozilla
• Licensed under Apache License v2
• Powers collaborative document editing in OAE
• Doesn’t cluster, but we shard for performance (see the sketch below)
• If an Etherpad server goes down, active sessions on that server are lost
• But document data is flushed to Cassandra on the fly, so large amounts of work are not lost as a result
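Since Etherpad itself doesn't cluster, pads can be pinned to a server by hashing the document id, so all editors of a pad land on the same instance. A minimal sketch of that sharding idea; the server list is hypothetical and this is not OAE's exact routing code.

    // Sketch: deterministic shard selection for collaborative documents.
    const crypto = require('crypto');

    const etherpadServers = [
      'http://etherpad0:9001',
      'http://etherpad1:9001',
      'http://etherpad2:9001' // hypothetical hosts
    ];

    function serverForPad(contentId) {
      const digest = crypto.createHash('md5').update(contentId).digest();
      const index = digest.readUInt32BE(0) % etherpadServers.length;
      return etherpadServers[index];
    }

    console.log(serverForPad('c:cam:essay-draft')); // always the same server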
23. Nginx
• HTTP and reverse-proxy server
• Used to distribute load across application servers and Etherpad servers, and to stream file downloads
• Useful rate-limiting features based on source IP
• Used by: Netflix, WordPress.com
24. Topics
1. Project Goals
2. Hilary System Architecture
3. Performance Testing
4. Deployment and Automation
5. UI Architecture
6. Customization and Configuration
25. Performance Testing: Workflow
1. Generate data with the Model Loader
2. Load data into the system with the Model Loader
3. Generate a Tsung test with a custom framework
4. Run the Tsung test
5. Analyze the results
34. So, does it scale?
• Yes. We can scale the application horizontally by adding more nodes
• Doubling the hardware roughly doubles the throughput
37. Topics
1. Project Goals
2. Hilary System Architecture
3. Performance Testing
4. Deployment and Automation
5. UI Architecture
6. Customization and Configuration
7. Questions?
38. Deployment and Automation
• As you can imagine, there are many machines to manage. Current inventory:
• 3x Cassandra
• 2x Redis
• 2x RabbitMQ
• 4x Application + Indexer
• 3x Preview Processor
• 1x Activity Processor
• 1x Nginx
• 3x Etherpad
• Performance testing with a cluster of 21 virtual machines
• Additional scalability testing and verification with ~30 virtual machines
39. Puppet
• Use Puppet to centralize machine configuration and prevent configuration drift
• Collection of “manifests” that define the state the machine should be in, based on its hostname / role:
• What files should exist? What should their contents be?
• What packages should be installed?
• What services should be running, or stopped?
• http://github.com/sakaiproject/puppet-hilary
• All 20+ machines in the cluster have Puppet installed, and each asks for its “catalog” (expected configuration state) from a single Puppet master machine
• The Puppet master knows how to determine the machine state from the manifests based on its host (e.g., db0 is a Cassandra node, so it should have Cassandra, Java, etc.)
• Use PuppetDB with “exported resources” to share machine-specific information with the other nodes in the cluster
40. MCollective
• Provides parallel execution across a number of machines at one time
• Start / Stop / Check status of services
• Install / Remove / Check version of packages
• Use Puppet resource syntax to check ad-hoc machine facts
• Apply Puppet manifests
• Each cluster node subscribes to an ActiveMQ server to receive commands. A central machine (the “client”) publishes the command and waits for replies
41. Slapchop
• Missing piece: we need to create 21 machines of different specs in a cloud service, and somehow get MCollective onto them
• A tool we lovingly call slapchop
• Define a JSON manifest that holds machine configs and instances
• Run slapchop to create the machines in the Joyent cloud, start them, and get MCollective installed
• Well, kind of...
• Now you can log in to the MCollective client and run mco puppet apply
• Well, kind of...
• Go from an empty cloud to a working 21-machine cluster in ~15 minutes
43. Security
• Infrastructure penetration tests by the University of Murcia
• No major issues found
• UI vulnerability testing performed by the SCIRT group
• Followed up on all known XSS issues
• Using the OWASP jQuery plugin for XSS-filtering user-created data (see the sketch below)
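A minimal sketch of encoding user-created data before it is injected into the DOM. The encodeForHTML call reflects our understanding of the OWASP jquery-encoder plugin's API and should be treated as an assumption.

    // Sketch: encode untrusted user input before it reaches the DOM.
    // Assumes the OWASP jquery-encoder plugin exposes $.encoder.encodeForHTML;
    // treat the exact API as an assumption.
    var userDisplayName = '<img src=x onerror=alert(1)>'; // attacker-controlled
    $('#profile-name').html($.encoder.encodeForHTML(userDisplayName));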
44. Topics
1. Project Goals
2. Hilary System Architecture
3. Performance Testing
4. Deployment and Automation
5. UI Architecture
6. Customization and Configuration
7. Questions?