1. Thinking in Documents
(dropping ACID)
César D. Rodas
crodas@member.fsf.org
http://crodas.org/
PHP Conference 2009
São Paulo, Brazil
2. Who is this fellow?
Paraguayan
Took part in Google Summer of Code 2008
PHP Classes Innovation Award winner 2007, 2008
... and a few other things
@crodas - http://crodas.org/ - LaTeX
3. Agenda
How to scale
The Web's major bottleneck
NoSQL databases
• Redis
• Tokyo Cabinet
• Cassandra
• CouchDB
• MongoDB
Thinking in documents
• Data behavior
• Complex operations
PHP Integration (The fun part!)
Map/Reduce (Extra time)
8. How to scale
Buying more hardware (and connectivity)
Reverse (threaded) proxies
DNS round robin for your reverse proxies
Gearmand
Memcached
And... what about the data?
9. How to scale data?
11. Scaling RDBMS - Solutions
Master - Slave replication
Multi-Master replication
Data sharding
DRBD and Heartbeat (RAID-1 over the network)
13. Master-Slave replication
We need to modify our app
It is only worth it if our application is read-intensive
It doesn't spread the data across servers
Single point of failure
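To illustrate what "modify our app" can mean, here is a minimal sketch of read/write splitting. The host names and the `pick_server()` helper are hypothetical (not from the slides): writes go to the master, reads go to a randomly chosen slave.

```php
<?php
/* Hypothetical hosts, placeholders only. */
$master = "db-master.example.com";
$slaves = array("db-slave1.example.com", "db-slave2.example.com");

/* Send SELECT statements to a random slave, everything else to the master. */
function pick_server($sql, $master, array $slaves) {
    $is_read = preg_match('/^\s*SELECT\b/i', $sql) === 1;
    if ($is_read && count($slaves) > 0) {
        return $slaves[array_rand($slaves)];
    }
    return $master;
}

echo pick_server("INSERT INTO posts (uri) VALUES ('/a')", $master, $slaves), "\n";
echo pick_server("SELECT * FROM posts", $master, $slaves), "\n";
```

Note that the master still receives every write, which is why this setup only pays off for read-intensive applications and leaves a single point of failure.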
14. Scaling RDBMS - Problems
SQL
JOIN
Autoincrement
Transactions (ACID)
16. Strong Consistency, High Availability, Partition-tolerance
CAP theorem: a distributed system can guarantee at most two of these three at the same time
17. BASE
Basically Available, Soft state, Eventually Consistent
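A toy illustration of "eventually consistent", using plain PHP arrays as stand-ins for two servers. The `write()` and `replicate()` helpers are hypothetical names for this sketch only: a write is visible on the primary at once, and on the replica only after replication has run.

```php
<?php
$primary = array();   // accepts writes
$replica = array();   // serves reads, updated later
$pending = array();   // replication queue

/* Apply a write on the primary and queue it for the replica. */
function write(array &$primary, array &$pending, $key, $value) {
    $primary[$key] = $value;
    $pending[] = array($key, $value);
}

/* Drain the queue: the replica catches up with the primary. */
function replicate(array &$replica, array &$pending) {
    foreach ($pending as $op) {
        $replica[$op[0]] = $op[1];
    }
    $pending = array();
}

write($primary, $pending, "title", "Thinking in Documents");
$stale = isset($replica["title"]);      // false: the replica has not caught up yet
replicate($replica, $pending);
$fresh = $replica["title"];             // now the same value as on the primary
```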
18. Everybody is doing it
Google
Amazon
eBay
Yahoo!
Facebook
...
19. Open implementations
Cassandra
Redis
Tokyo Cabinet/Tyrant
CouchDB
MongoDB (FTW!)
...
20. Cassandra
No master (p2p)
Storage model more like BigTable
Open source
Incrementally scalable
PHP interface (with Thrift)
I have not played much with it.
22. Key-value
Fast
Similar to PHP's array
Simple
Easy to distribute across machines
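A minimal sketch of why key-value data is easy to distribute, assuming a hypothetical list of servers and simple hash-modulo placement (neither is from the slides). Each "server" is simulated with a plain PHP array.

```php
<?php
/* Hypothetical server list, placeholders only. */
$servers = array("kv1.example.com", "kv2.example.com", "kv3.example.com");
$stores  = array();   // one PHP array per simulated server

/* The key alone decides which server owns it: hash, then modulo. */
function server_for($key, array $servers) {
    return $servers[abs(crc32($key)) % count($servers)];
}

function kv_set(array &$stores, array $servers, $key, $value) {
    $stores[server_for($key, $servers)][$key] = $value;
}

function kv_get(array $stores, array $servers, $key) {
    $server = server_for($key, $servers);
    return isset($stores[$server][$key]) ? $stores[$server][$key] : null;
}

kv_set($stores, $servers, "user:1", "crodas");
kv_set($stores, $servers, "user:2", "someone else");
var_dump(kv_get($stores, $servers, "user:1"));
```

Plain modulo placement moves most keys when a server is added or removed; consistent hashing is a common way to reduce that.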
23. Memcached
It is a key-value store engine used as a cache.
No persistence (RAM only, LRU eviction)
Lightning fast
Well supported
*Everybody* is using it
Several clients for PHP [I even wrote one ;-)]
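The LRU idea mentioned above can be sketched in a few lines of PHP. This is only an illustration of the eviction policy, not how Memcached is implemented; the `LruCache` class is a hypothetical name for this example.

```php
<?php
class LruCache {
    private $capacity;
    private $items = array();   // least recently used first, most recent last

    public function __construct($capacity) {
        $this->capacity = $capacity;
    }

    public function get($key) {
        if (!array_key_exists($key, $this->items)) {
            return null;
        }
        $value = $this->items[$key];
        unset($this->items[$key]);       // re-insert to mark as most recently used
        $this->items[$key] = $value;
        return $value;
    }

    public function set($key, $value) {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) > $this->capacity) {
            reset($this->items);
            unset($this->items[key($this->items)]);   // evict the least recently used
        }
    }
}

$cache = new LruCache(2);
$cache->set("a", 1);
$cache->set("b", 2);
$cache->get("a");       // "a" becomes the most recently used
$cache->set("c", 3);    // cache is full, so "b" is evicted
```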
24. Redis
Very new
As fast as Memcached
Persistent to disk
Very simple protocol
Supports lists and sets
Replication
Operations on the key space
I loved it!
• Until I realised it is an in-memory DB
25. Tokyo Tyrant
Very similar to BerkeleyDB (dba_open())
Performs well (I've been playing a bit with it)
Actively developed
HTTP Interface (+/-)
Memcached Protocol (++)
Moving toward document-oriented (supports "tables")
34. MongoDB
Forget about what its name means in Portuguese.
Fast, Fast, Fast
JSON and BSON (Binary JSON-ish)
Asynchronous replication, autosharding
Supports indexes (FTW!)
Nested documents (FTW!)
Advanced queries (FTW!)
Native extension for PHP
37. MongoDB - Connection
<?php
/* connects to localhost:27017 */
$connection = new Mongo();
/* connect to a remote host (default port) */
$connection = new Mongo( "example.com" );
/* connect to a remote host at a given port */
$connection = new Mongo( "example.com:65432" );
/* select some DB (and create it if it doesn't exist yet) */
$db = $connection->selectDB("db_name");
?>
38. MongoDB - "Tables"
<?php
$db = $connection->selectDB("db_name");
$table = $db->getCollection("table");
?>
39. FROM SQL to MongoDB
40. MongoDB - Count
<?php
/* SELECT count(*) FROM table */
$collection->count();
/* SELECT count(*) FROM table WHERE foo = 1 */
$collection->find(array("foo" => 1))->count();
?>
41. MongoDB - Queries
<?php
/*
* SELECT * FROM table WHERE field IN (5,6,7) and enable=1
* and worth < 5
* ORDER BY timestamp DESC
*/
$collection->ensureIndex(
array('field'=>1, 'enable'=>1, 'worth'=>1, 'timestamp'=>-1)
);
$filter = array(
'field' => array('$in' => array(5,6,7)),
'enable' => 1,
'worth' => array('$lt' => 5)
);
$results = $collection->find($filter)->sort(array('timestamp' => -1));
?>
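To make the meaning of the filter concrete, here is a small in-memory sketch that applies the same conditions to plain PHP arrays. The `matches()` helper and the sample documents are hypothetical; it only covers equality, `$in` and `$lt` on scalar fields and is not a reimplementation of MongoDB's query engine.

```php
<?php
/* True when $doc satisfies every condition in $filter. */
function matches(array $doc, array $filter) {
    foreach ($filter as $field => $cond) {
        if (!array_key_exists($field, $doc)) {
            return false;
        }
        $value = $doc[$field];
        if (is_array($cond)) {
            if (isset($cond['$in']) && !in_array($value, $cond['$in'], true)) {
                return false;
            }
            if (isset($cond['$lt']) && !($value < $cond['$lt'])) {
                return false;
            }
        } elseif ($value !== $cond) {
            return false;
        }
    }
    return true;
}

$docs = array(
    array('field' => 5, 'enable' => 1, 'worth' => 3, 'timestamp' => 100),
    array('field' => 6, 'enable' => 0, 'worth' => 1, 'timestamp' => 200),
    array('field' => 7, 'enable' => 1, 'worth' => 9, 'timestamp' => 300),
    array('field' => 6, 'enable' => 1, 'worth' => 4, 'timestamp' => 400),
    array('field' => 8, 'enable' => 1, 'worth' => 1, 'timestamp' => 500),
);
$filter = array(
    'field'  => array('$in' => array(5, 6, 7)),
    'enable' => 1,
    'worth'  => array('$lt' => 5),
);

/* WHERE ... */
$found = array_values(array_filter($docs, function ($doc) use ($filter) {
    return matches($doc, $filter);
}));
/* ORDER BY timestamp DESC */
usort($found, function ($a, $b) {
    return $b['timestamp'] - $a['timestamp'];
});
```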
42. MongoDB - Pagination
<?php
/*
* SELECT * FROM table WHERE field IN (5,6,7) and enable=1
* and worth < 5
* ORDER BY timestamp DESC LIMIT $offset, 20
*/
$filter = array(
'field' => array('$in' => array(5,6,7)),
'enable' => 1,
'worth' => array('$lt' => 5)
);
$cursor = $collection->find($filter);
$cursor = $cursor->sort(array('timestamp' => -1))->skip($offset)->limit(20);
foreach ($cursor as $result) {
var_dump($result);
}
?>
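The slide leaves `$offset` undefined. A minimal sketch of how it could be derived from a page number, assuming 20 results per page and 1-based pages (the `offset_for()` helper is hypothetical), checked here against a plain array with `array_slice()`:

```php
<?php
$per_page = 20;

/* Pages are 1-based: page 1 starts at offset 0, page 2 at offset 20, ... */
function offset_for($page, $per_page) {
    return (max(1, (int) $page) - 1) * $per_page;
}

/* In-memory stand-in for an already sorted result set of 45 rows. */
$rows  = range(1, 45);
$page3 = array_slice($rows, offset_for(3, $per_page), $per_page);
```

The value returned by `offset_for()` is what would be passed to `skip()`, with `$per_page` going to `limit()`.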
46. MongoDB - Data structure
<?php
/***
* - SELECT * FROM posts WHERE uri = <uri>
* - SELECT tags.tag FROM post_has_tags
* INNER JOIN tags ON (tags_id = tags.id) WHERE post_id = <post_id>
* - SELECT * FROM comments WHERE post = <post_id>
*/
$result = $collection->find(array("uri" => "<uri>"));
?>
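One query replaces three because the post, its tags and its comments live in a single document. A sketch of what such a document might look like as a PHP array: `uri`, `tags` and `comments` with an `email` key are implied by the queries on these slides, while `title`, `text` and the sample values are assumptions for illustration.

```php
<?php
$post = array(
    "uri"   => "/2009/thinking-in-documents",
    "title" => "Thinking in Documents",
    /* the post_has_tags / tags tables collapse into a plain list */
    "tags"  => array("php", "mongodb", "nosql"),
    /* the comments table becomes a list of nested documents */
    "comments" => array(
        array("email" => "alice@example.com", "text" => "Nice talk!"),
        array("email" => "bob@example.com",   "text" => "What about JOINs?"),
    ),
);

/* Everything the three SQL queries returned is already in $post. */
$tags   = $post["tags"];
$emails = array();
foreach ($post["comments"] as $comment) {
    $emails[] = $comment["email"];
}
```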
47. MongoDB
<?php
/***
* SELECT posts.* FROM posts INNER
* JOIN comments ON (comments.post = posts.id)
* WHERE comments.email = '<email>'
*
*/
$filter = array(
"comments.email" => 'crodas@member.fsf.org',
);
$result = $collection->find($filter);
?>
48. MongoDB
<?php
/***
* SELECT * FROM posts
* WHERE id IN (SELECT posts_id FROM posts_has_tags
* INNER JOIN tags ON (tags_id = tags.id) WHERE tag = <tag>)
*
*/
$filter = array(
"tags" => '<tag>',
);
$result = $collection->find($filter);
?>
49. MongoDB
<?php
/***
* SELECT * FROM posts WHERE id IN (
* SELECT post FROM comments GROUP
* BY post HAVING count(*) > 10)
*/
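/* Caveat: $size matches an exact array length only and does not
 * accept ranges such as $gt, so this filter does not work as written.
 * Keeping a counter field and querying it is one workaround. */
$filter = array(
"comments" => array('$size' => array('$gt' => 10))
);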
$result = $collection->find($filter);
?>
50. MongoDB
<?php
/***
* SELECT * FROM posts WHERE 10 < (
* SELECT count(*) FROM comments
* WHERE post = posts.id)
*/
/* on insert a comment */
$collection->update(
array("uri" => "uri"), // select
array('$inc' => array('comments_size' => 1)) // increment
);
$filter = array(
"comments_size" => array('$gt' => 10)
);
$result = $collection->find($filter);
?>
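The counter only helps if it changes together with the comment list. A sketch of an update document that appends a comment and bumps the counter in one step, using the `$push` and `$inc` operators. The `apply_update()` helper is hypothetical and only mimics those two operators on a PHP array so the effect can be seen without a server.

```php
<?php
/* Mimic $push and $inc on a plain array (illustration only). */
function apply_update(array $doc, array $update) {
    if (isset($update['$push'])) {
        foreach ($update['$push'] as $field => $value) {
            $doc[$field][] = $value;
        }
    }
    if (isset($update['$inc'])) {
        foreach ($update['$inc'] as $field => $amount) {
            $doc[$field] = (isset($doc[$field]) ? $doc[$field] : 0) + $amount;
        }
    }
    return $doc;
}

$post    = array("uri" => "/some-post", "comments" => array(), "comments_size" => 0);
$comment = array("email" => "alice@example.com", "text" => "First!");

/* One update document: append the comment and increment the counter. */
$update = array(
    '$push' => array('comments' => $comment),
    '$inc'  => array('comments_size' => 1),
);
$post = apply_update($post, $update);
```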
51. Map/Reduce
Extra time
52. Map/Reduce -- Theory
<?php
$result = array();
for ($i = 0; $i < 50; $i++) {
    $result[$i] = pow($i, 2);
}
var_dump($result);
/***
* If pow() takes 1 second:
* 1 process = 50 seconds
* 10 processes = 5 seconds
*/
?>
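A sketch of how the 50 calls could be split into 10 independent pieces of work. The chunks are processed one after another here; the point is only that each chunk needs nothing from the others, so each could be handed to its own process. The worker count of 10 comes from the slide, the rest is an assumption.

```php
<?php
$inputs  = range(0, 49);
$workers = 10;

/* Split the inputs into one chunk per worker, keeping the original keys. */
$chunks = array_chunk($inputs, (int) ceil(count($inputs) / $workers), true);

$result = array();
foreach ($chunks as $chunk) {
    /* This loop body is what a single worker would run. */
    $partial = array_map(function ($i) {
        return pow($i, 2);
    }, $chunk);
    /* Merge the partial results; keys do not collide across chunks. */
    $result += $partial;
}
```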
53. Map/Reduce -- Theory II
<?php
$data = range(1, 1000);
$tmp  = array();
/* MAP: group every value under the key (value mod 10) */
foreach ($data as $key => $value) {
    $n_key = $value % 10;
    /* append */
    $tmp[$n_key][] = $value;
}
/* REDUCE: collapse each group into its sum */
foreach ($tmp as $key => $value) {
    $value = array_sum($value);
    print "{$key} = {$value}\n";
}
?>