Two years ago Rackspace had a problem: how do we back up 20K network devices, in 8 datacenters, across 3 continents, with less than a 1% failure rate -- every single day? Many solutions were tried and found wanting: a pure Perl solution, a vendor solution, and then one in Ruby. None worked well enough. They were not fast enough, or they were not reliable enough, or they were not transparent enough when things went wrong. Now we all love Ruby, but good Rubyists know that it is not always the best tool for the job. After re-examining the problem we decided to rewrite the application in a mixture of Erlang and Ruby. By exploiting the strengths of both -- Erlang's astonishing support for parallelism and Ruby's strengths in web development -- the problem was solved. In this talk we'll get down and dirty with the details: the problems we faced and how we solved them. We'll cover the application architecture, how Ruby and Erlang work together, and the Erlang approach to asynchronous operations (hint: it does not involve callbacks). So come on by and find out how you can get these two great languages to work together.
55. If you know HTTP
Webmachine Is Simple
As Proven by the “Number of Types of Things”
Measurement of Complexity
56. The 3 Most Important Types of
Things In Webmachine
1. Dispatch Rules (pure data--barely a thing!)
2. Resources (composed of simple functions!)
3. Requests (simple get/set interface!)
57. Dispatch Rules
{ ["devices", server], device_resource, [] }
GET /devices/12345
Webmachine inspects the device_resource module for
defined callbacks, and binds the path segment “12345” to the
key server, readable with wrq:path_info(server, Request).
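The matching step described on this slide can be sketched in plain Erlang. This is only an illustration of the idea, not Webmachine's dispatcher: the module name `dispatch_sketch` and the functions `match/2` and `route/2` are hypothetical, and the sketch treats every atom in a pathspec as a binding, ignoring Webmachine's wildcard handling.

```erlang
-module(dispatch_sketch).
-export([match/2, route/2]).

%% Match a pathspec such as ["devices", server] against URL path segments.
%% Strings must match literally; atoms bind to whatever segment is present.
match(PathSpec, Segments) ->
    match(PathSpec, Segments, []).

match([], [], Bindings) ->
    {ok, lists:reverse(Bindings)};
match([Key | Spec], [Segment | Rest], Bindings) when is_atom(Key) ->
    match(Spec, Rest, [{Key, Segment} | Bindings]);
match([Segment | Spec], [Segment | Rest], Bindings) ->
    match(Spec, Rest, Bindings);
match(_, _, _) ->
    nomatch.

%% Try each {PathSpec, Module, Args} rule in order against a URL path.
route(_Path, []) ->
    nomatch;
route(Path, [{PathSpec, Module, Args} | Rules]) ->
    case match(PathSpec, string:lexemes(Path, "/")) of
        {ok, Bindings} -> {Module, Args, Bindings};
        nomatch -> route(Path, Rules)
    end.
```

With the rule from the slide, `dispatch_sketch:route("/devices/12345", [{["devices", server], device_resource, []}])` selects `device_resource` and binds `server` to the string `"12345"`.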
58. Resources
• POEM (Plain Old Erlang Module)
• Composed of referentially transparent functions*
• Functions are callbacks into the request lifecycle
• Approximately 30 possible callback functions, e.g.:
• resource_exists → 404 Not Found
• is_authorized → 401 Unauthorized
* mostly
59. Resource Functions
Perma-404
resource_exists(Request, Context) ->
    {false, Request, Context}.
Lucky Auth
is_authorized(Request, Context) ->
    %% Authorize on even seconds, challenge on odd ones.
    {_MegaSecs, S, _MicroSecs} = os:timestamp(),
    case S rem 2 of
        0 -> {true, Request, Context};
        1 -> {"Basic realm=lucky", Request, Context}
    end.
60. Requests
• The first argument to each resource function
• Set and read request & response data
RemoteIP = wrq:peer(Request).
wrq:set_resp_header("X-Answer", "42", Request).
* Not an Erlang tutorial (but you will see some code)
* Giving our experiences with Erlang + Ruby

* Firewalls, load balancers

* North America, Europe and Hong Kong
* Everything runs from DC in Virginia
* High latency to LON and HKG
* ~5K devices in LON

* Pass traffic like a pro
* Individual devices won’t get faster

* Back up all devices every 24 hours
* Update devices during off hours

* Newer devices have real management interfaces (give examples: Junos, BigIP) but...
* Majority of management happens via screen-scraping an SSH session
* Pixen are slow and we have a lot of them

* Lots of devices == lots of data

* 260 GB of backups in old DB
* We diff the backups to see if they have changed
* We don’t want to be too clever
  * Err on the side of keeping too much rather than too little

* Impossible to predict what combination of factors will lead to a need to search
* A certain load balancer vendor had a vulnerability in its SSH daemon
* Had to search all firewall configs to find affected devices with SSH access allowed

* Every event has different information that needs to be stored
* Lots of events per device

* Serial #, OS version, chassis details
* Information is parsed from device output
* We want to expose information useful to the business in one place

* No accepted cross-vendor backup standard
* We abstract information as “files”

* 17K devices in 2009, ~21K now

* What was in place in 2009

* Rails and Ruby scripts
* Overlapping responsibilities
* Information silos
* Difficult to change

* Concurrency with Ruby 1.8 threads
  * It is an I/O bound problem
  * Threads block on I/O
* expect.rb has performance issues
  * Matching input one character at a time

* Only one type of device
* Small number of devices per manager
* Expensive

* Fully Buzzword-compliant!

* No message queue
* Poll MongoDB for some jobs
* Schedule others in code
* Very few writers to the database

* Backend updates generally use MongoDB atomic updates
* Need transactions for cross-collection modifications, but in-document can use atomic updates

* You guys know Rails and have probably heard of MongoDB
* Take a minute to talk about Erlang

* Ericsson Computer Science Laboratory
* 1986: Joe Armstrong creates Erlang
* 1988: Erlang escapes the lab
* 1998: Erlang Released as Open Source

* Developed for BT
* 1.5 Million Lines of Erlang Code
* 2 year evaluation
* 9 9s of uptime

* Erlang doesn’t guarantee 9 9s
* Gives you the tools to make high uptime possible

* Some core concepts are different from imperative/OO languages

* Just like Ruby

* Only assign to variables once
* Allows flexible pattern matching and runtime assertions
* “=” is the pattern match operator, not assignment

* Variables are function-scoped, so single assignment is really a non-issue
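A minimal sketch of “=” as a match rather than an assignment. The module name `match_sketch` and the sample values are hypothetical; the point is that a bound variable on the left turns the match into a runtime assertion, and a failed match raises `badmatch`.

```erlang
-module(match_sketch).
-export([demo/0, parse/1]).

%% "=" matches the right-hand side against the pattern on the left.
parse(Reply) ->
    {ok, Version} = Reply,          % binds Version, or raises badmatch
    Version.

demo() ->
    Count = 3,
    Count = length([a, b, c]),      % Count is bound, so this asserts equality
    Outcome = try parse({error, timeout})
              catch error:{badmatch, Value} -> {mismatch, Value}
              end,
    {Count, parse({ok, "12.1"}), Outcome}.
```

Calling `match_sketch:demo()` returns `{3, "12.1", {mismatch, {error, timeout}}}`.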
* Changing a data structure creates a new one
* Purely Functional Data Structures - Okasaki
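As a small illustration of “changing creates a new one”, the sketch below updates a key in a list of tuples with `lists:keystore/4` from the standard library. The module name `immutable_sketch` and the device attributes are made up for the example.

```erlang
-module(immutable_sketch).
-export([demo/0]).

demo() ->
    Old = [{os, "7.2"}, {serial, "ABC123"}],
    %% lists:keystore/4 returns a new list; Old is left untouched.
    New = lists:keystore(os, 1, Old, {os, "8.0"}),
    {Old, New}.
```

After the call, `Old` still holds `{os, "7.2"}` while `New` holds `{os, "8.0"}`.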
* Concurrency built into the language, not an add-on

* Two areas of interest: jobs framework and ReSTful API

* Runner spins up multiple workers
* Runner and workers are generic
* Interesting work for each job is in callback module
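The runner/worker split can be sketched with bare processes and message passing. This is an assumption-laden illustration, not the production jobs framework: the module name `runner_sketch`, the `perform/1` callback name, and the 5-second timeout are all hypothetical, and real code would more likely build on OTP behaviours.

```erlang
-module(runner_sketch).
-export([run/2, perform/1]).

%% Spawn one worker per item, then collect each result in item order.
run(CallbackMod, Items) ->
    Runner = self(),
    Tagged = [{Item, spawn_worker(Runner, CallbackMod, Item)} || Item <- Items],
    [{Item, await(Ref)} || {Item, Ref} <- Tagged].

%% The worker is generic: it only knows how to call the callback module
%% and mail the result back to the runner, tagged with a unique reference.
spawn_worker(Runner, CallbackMod, Item) ->
    Ref = make_ref(),
    spawn(fun() -> Runner ! {Ref, CallbackMod:perform(Item)} end),
    Ref.

%% A blocking receive reads like sequential code; no callbacks are needed.
await(Ref) ->
    receive
        {Ref, Result} -> Result
    after 5000 ->
        timeout
    end.

%% Sample callback: a stand-in for the real per-device work.
perform(Device) ->
    {backed_up, Device}.
```

Here the module doubles as its own callback module, so `runner_sketch:run(runner_sketch, ["fw1", "lb1"])` runs both jobs concurrently and returns the results in the order the items were given.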
Example of how this solution made it easier to solve our problems

* App running on two VMs and one developed problems
* Moved all functionality to other node and rebuilt problematic VM
* Doubled the workers on second VM with no detectable performance degradation

* Occupies approximately the same position in your architecture as Rails controllers, but thoroughly exposes you to the details of the HTTP request/response
* When you develop apps with Webmachine, you are basically writing a custom webserver for your API
* And what are those details... <transition to next slide>

* As seen through the eyes of Webmachine
* Express API in terms of HTTP
* Use HTTP as your domain language

* A bit like Rails routing
* First element (the “pathspec”) is a list that matches the request URL
* Second element is the name of an Erlang module that exports the overridden callbacks
* Third element is the args to the init function on the callback module

* POEM - So far as I know, I coined this by accident
* Why is the purity of functions important? predictability, testability, repeatability
* You’ll usually override 4 or 5

* Override only the parts of the request cycle that interest you, and Webmachine will do something reasonable for the other cases
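One way to picture “override only what interests you” is a lookup that falls back to a default when a resource does not supply a callback. The sketch below models a resource as a map of funs purely for illustration; Webmachine itself works with exported module functions, and the module name `callback_defaults_sketch` and its `status/1` decision chain are hypothetical simplifications of the real request flow.

```erlang
-module(callback_defaults_sketch).
-export([call/2, status/1]).

%% Defaults used when a resource does not override a callback.
defaults() ->
    #{service_available => fun(_Ctx) -> true end,
      is_authorized     => fun(_Ctx) -> true end,
      resource_exists   => fun(_Ctx) -> true end}.

%% Use the override when the resource supplies one, else the default.
call(Name, Overrides) ->
    Fun = maps:get(Name, Overrides, maps:get(Name, defaults())),
    Fun(context).

%% A heavily shortened decision chain: each callback can end the request early.
status(Overrides) ->
    case call(service_available, Overrides) of
        true ->
            case call(is_authorized, Overrides) of
                true ->
                    case call(resource_exists, Overrides) of
                        true -> 200;
                        _ -> 404
                    end;
                _Challenge -> 401
            end;
        _ -> 503
    end.
```

An empty map yields 200, and overriding just `resource_exists` with a fun that returns `false` yields 404 without touching the other callbacks.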
* Request parameter data is analogous to that contained in Rack::Request

* Putting it all together
* Previous callbacks stash data in the context (in this case a device record)

* You can get far with Ruby’s “Big 3” datatypes: String, Array and Hash

* Strings in Erlang are different
* iolists are your friend
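A short sketch of both points: an Erlang string is a list of character codes, and an iolist lets you nest strings, binaries and characters without concatenating them first. The module name `iolist_sketch` and the log-line format are invented for the example; `iolist_size/1` and `iolist_to_binary/1` are standard BIFs.

```erlang
-module(iolist_sketch).
-export([line/2, demo/0]).

%% Build output by nesting pieces instead of concatenating strings.
%% Host may be a string or a binary; both are valid iolist elements.
line(Host, Status) ->
    [<<"device=">>, Host, $\s, "status=", atom_to_list(Status), $\n].

demo() ->
    IoList = [line("fw1", ok), line(<<"lb1">>, failed)],
    {iolist_size(IoList),
     iolist_to_binary(IoList),
     "abc" =:= [$a, $b, $c]}.   % a string is just a list of integers
```

The nested structure is flattened only once, at the point where it is written out or converted to a binary.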
* Proplists/records vs hashes
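For Rubyists used to Hash, a rough comparison: a proplist is a list of `{Key, Value}` tuples looked up at run time, while a record has fixed fields checked at compile time and is stored as a tuple. The module name `proplist_sketch`, the `device` record, and its fields are hypothetical.

```erlang
-module(proplist_sketch).
-export([demo/0]).

-record(device, {serial, os_version = "unknown"}).

demo() ->
    %% Proplist: flexible keys, optional default for missing ones.
    Props = [{serial, "ABC123"}, {os_version, "8.0"}],
    Serial = proplists:get_value(serial, Props),
    Missing = proplists:get_value(chassis, Props, none),
    %% Record: unset fields take the default declared above.
    Dev = #device{serial = Serial},
    {Serial, Missing, Dev#device.os_version, Dev}.
```

The last element of the result shows the record's underlying tuple, `{device, "ABC123", "unknown"}`.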
* Erlang has internal iterators for lists (like each, etc)
* No for loops, use recursion instead
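Both styles side by side, as a minimal sketch: `lists:map/2` and `lists:foldl/3` play the role of Ruby's `map` and `inject`, and a hand-written recursive function with an accumulator replaces a for loop. The module name `iteration_sketch` and the sample numbers are hypothetical.

```erlang
-module(iteration_sketch).
-export([sum/1, demo/0]).

%% Recursion in place of a for loop: add up a list with an accumulator.
sum(List) -> sum(List, 0).

sum([], Acc) -> Acc;
sum([H | T], Acc) -> sum(T, Acc + H).

demo() ->
    Sizes = [10, 20, 12],
    Doubled = lists:map(fun(N) -> N * 2 end, Sizes),
    Total = lists:foldl(fun(N, Acc) -> N + Acc end, 0, Sizes),
    {Doubled, Total, sum(Sizes)}.
```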
* if statements must always have an else
* case statements raise an error if no branches match
* pattern matching can replace some conditions
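The three points above in one hedged sketch. The module name `branch_sketch`, the size threshold, and the model-name prefixes are invented for illustration. An `if` needs a `true` guard as its catch-all, a `case` with no matching branch raises `case_clause`, and function-head patterns can stand in for a chain of conditions.

```erlang
-module(branch_sketch).
-export([size_class/1, vendor/1, describe/1, demo/0]).

size_class(Bytes) ->
    if
        Bytes > 1000000 -> large;
        true -> small                % the catch-all "else" branch
    end.

%% Any model without a matching branch raises a case_clause error.
vendor(Model) ->
    case Model of
        "pix" ++ _ -> cisco;
        "bigip" ++ _ -> f5
    end.

%% Pattern matching in function heads replaces a chain of conditions.
describe({ok, Config}) -> {changed, length(Config)};
describe({error, Reason}) -> {failed, Reason}.

demo() ->
    Unknown = try vendor("junos-mx")
              catch error:{case_clause, M} -> {no_branch, M}
              end,
    {size_class(5), vendor("pix515"), Unknown, describe({error, timeout})}.
```

Leaving out the `true` branch in `size_class/1` would raise `if_clause` for small inputs, which is why the notes describe the else as mandatory.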
* Erlang does not tolerate design mistakes as easily as Ruby
* Pure vs impure functions
* Pure is easy to test, IO is not your friend
* Referential transparency
* Dependency injection
* Mocking is possible but not a panacea

* You must understand the concurrency primitives
* In general you should be using the OTP behaviors
* If you use an ORM you must understand SQL

* Emphasis on stability over new features
* Some useful features have been in Erlang for years, but the community frowns upon them
  * Undocumented, with an uncertain future

* Standard library is very rich
* 3rd party libraries tend to be immature

* lists, proplists, binary and string
* string has duplicate functionality
  * made from merging two older modules
* lists and proplists have duplicated, overlapping functionality

* agner, faxien, sinan, etc.

* how did we get this adopted?
* how do you find Erlang programmers?
* why not Node.js, EventMachine or Ruby + sockets?