Beyond PHP - it's not (just) about the code

Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.

  1. Beyond PHP : It's not (just) about the code Wim Godden Cu.be Solutions
  2. Who am I ? Wim Godden (@wimgtr) Founder of Cu.be Solutions (http://cu.be) Open Source developer since 1997 Developer of OpenX Zend Certified Engineer Zend Framework Certified Engineer MySQL Certified Developer Speaker at PHP and Open Source conferences
  3. Cu.be Solutions ? Open source consultancy PHP-centered High-speed redundant network (BGP, OSPF, VRRP) High scalability development Nginx + extensions MySQL Cluster Projects : mostly IT & Telecom companies lots of public-facing apps/sites
  4. Who are you ? Developers ? Anyone set up a MySQL master-slave ? Anyone set up a site/app on separate web and database servers ? → How much traffic between them ?
  5. The topic Things we take for granted Famous last words : "It should work just fine" Works fine today → might fail tomorrow Most common mistakes PHP code ↔ PHP ecosystem How-to & How-NOT-to
  6. It starts with... code ! First up : database
  7. Database queries – complexity SELECT DISTINCT n.nid, n.uid, n.title, n.type, e.event_start, e.event_start AS event_start_orig, e.event_end, e.event_end AS event_end_orig, e.timezone, e.has_time, e.has_end_date, tz.offset AS offset, tz.offset_dst AS offset_dst, tz.dst_region, tz.is_dst, e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND AS event_start_utc, e.event_end - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND AS event_end_utc, e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND + INTERVAL 0 SECOND AS event_start_user, e.event_end - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND + INTERVAL 0 SECOND AS event_end_user, e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND + INTERVAL 0 SECOND AS event_start_site, e.event_end - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND + INTERVAL 0 SECOND AS event_end_site, tz.name as timezone_name FROM node n INNER JOIN event e ON n.nid = e.nid INNER JOIN event_timezones tz ON tz.timezone = e.timezone INNER JOIN node_access na ON na.nid = n.nid LEFT JOIN domain_access da ON n.nid = da.nid LEFT JOIN node i18n ON n.tnid > 0 AND n.tnid = i18n.tnid AND i18n.language = 'en' WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all'))) AND ((da.realm = "domain_id" AND da.gid = 4) OR (da.realm = "domain_site" AND da.gid = 0)) AND (n.language ='en' OR n.language ='' OR n.language IS NULL OR n.language = 'is' AND i18n.nid IS NULL) AND ( n.status = 1 AND ((e.event_start >= '2010-01-31 00:00:00' AND e.event_start <= '2010-03-01 23:59:59') OR (e.event_end >= '2010-01-31 00:00:00' AND e.event_end <= '2010-03-01 23:59:59') OR (e.event_start <= '2010-01-31 00:00:00' AND e.event_end >= '2010-03-01 23:59:59')) ) GROUP BY n.nid HAVING (event_start >= '2010-02-01 00:00:00' AND event_start <= '2010-02-28 23:59:59') OR (event_end >= '2010-02-01 00:00:00' AND event_end <= '2010-02-28 23:59:59') OR (event_start <= '2010-02-01 00:00:00' AND event_end >= '2010-02-28 23:59:59') ORDER BY event_start ASC;
  8. Database - indexing 'select id from stock where status = 2 order by qty' → composite index on (status, qty) 'select id from stock where status > 2 order by qty' → composite index on (status, qty) ? → No : the range condition on status stops the index from also covering the sort on qty → separate indexes on status and qty
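
A minimal sketch of the two options above, using PDO against an assumed `stock` table (DSN, credentials and index names are illustrative):

    $db = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

    // Equality filter + sort : one composite index serves both the WHERE and the ORDER BY
    $db->exec('CREATE INDEX idx_status_qty ON stock (status, qty)');
    foreach ($db->query('EXPLAIN SELECT id FROM stock WHERE status = 2 ORDER BY qty') as $row) {
        print_r($row);   // should pick idx_status_qty and avoid a filesort
    }

    // Range filter : the composite index no longer helps the sort,
    // so separate single-column indexes are the better fit here
    $db->exec('CREATE INDEX idx_status ON stock (status)');
    $db->exec('CREATE INDEX idx_qty ON stock (qty)');
    foreach ($db->query('EXPLAIN SELECT id FROM stock WHERE status > 2 ORDER BY qty') as $row) {
        print_r($row);
    }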
  9. Database - indexing Indexes make database faster → Let's index everything ! → DON'T : Insert/update/delete → Index modification Each query → evaluation of all indexes "Relational schema design is based on data but index design is based on queries" (Bill Karwin, Percona)
  10. Databases – detecting problematic queries Slow query log → SET GLOBAL slow_query_log = ON; Queries not using indexes → In my.cnf/my.ini : 'log_queries_not_using_indexes' General query log → SET GLOBAL general_log = ON; → Turn it off quickly ! Percona Toolkit (Maatkit) pt-query-digest
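
For reference, the logging switches mentioned above can also be flipped at runtime from PHP (requires a privileged MySQL account; thresholds and credentials are illustrative):

    $db = new PDO('mysql:host=localhost', 'admin_user', 'admin_pass', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
    $db->exec('SET GLOBAL slow_query_log = ON');
    $db->exec('SET GLOBAL long_query_time = 1');                   // log anything slower than 1 second
    $db->exec('SET GLOBAL log_queries_not_using_indexes = ON');
    // the resulting slow log can then be fed to pt-query-digest on the database server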
  11. Databases - pt-query-digest # Profile # Rank Query ID Response time Calls R/Call Apdx V/M Item # ==== ================== ================ ===== ======= ==== ===== ========== # 1 0x543FB322AE4330FF 16526.2542 62.0% 1208 13.6806 1.00 0.00 SELECT output_option # 2 0xE78FEA32E3AA3221 0.8312 10.3% 6412 0.0001 1.00 0.00 SELECT poller_output poller_item # 3 0x211901BF2E1C351E 0.6811 8.4% 6416 0.0001 1.00 0.00 SELECT poller_time # 4 0xA766EE8F7AB39063 0.2805 3.5% 149 0.0019 1.00 0.00 SELECT wp_terms wp_term_taxonomy wp_term_relationships # 5 0xA3EEB63EFBA42E9B 0.1999 2.5% 51 0.0039 1.00 0.00 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary # 6 0x94350EA2AB8AAC34 0.1956 2.4% 89 0.0022 1.00 0.01 UPDATE wp_options # MISC 0xMISC 0.8137 10.0% 3853 0.0002 NS 0.0 <147 ITEMS>
  12. Databases - pt-query-digest # Query 2: 0.26 QPS, 0.00x concurrency, ID 0x92F3B1B361FB0E5B at byte 14081299 # This item is included in the report because it matches --limit. # Scores: Apdex = 1.00 [1.0], V/M = 0.00 # Query_time sparkline: | _^ | # Time range: 2011-12-28 18:42:47 to 19:03:10 # Attribute pct total min max avg 95% stddev median # ============ === ======= ======= ======= ======= ======= ======= ======= # Count 1 312 # Exec time 50 4s 5ms 25ms 13ms 20ms 4ms 12ms # Lock time 3 32ms 43us 163us 103us 131us 19us 98us # Rows sent 59 62.41k 203 231 204.82 202.40 3.99 202.40 # Rows examine 13 73.63k 238 296 241.67 246.02 10.15 234.30 # Rows affecte 0 0 0 0 0 0 0 0 # Rows read 59 62.41k 203 231 204.82 202.40 3.99 202.40 # Bytes sent 53 24.85M 46.52k 84.36k 81.56k 83.83k 7.31k 79.83k # Merge passes 0 0 0 0 0 0 0 0 # Tmp tables 0 0 0 0 0 0 0 0 # Tmp disk tbl 0 0 0 0 0 0 0 0 # Tmp tbl size 0 0 0 0 0 0 0 0 # Query size 0 21.63k 71 71 71 71 0 71 # InnoDB: # IO r bytes 0 0 0 0 0 0 0 0 # IO r ops 0 0 0 0 0 0 0 0 # IO r wait 0 0 0 0 0 0 0 0 # pages distin 40 11.77k 34 44 38.62 38.53 1.87 38.53 # queue wait 0 0 0 0 0 0 0 0 # rec lock wai 0 0 0 0 0 0 0 0 # Boolean: # Full scan 100% yes, 0% no # String: # Databases wp_blog_one (264/84%), wp_blog_tw… (36/11%)... 1 more # Hosts # InnoDB trxID 86B40B (1/0%), 86B430 (1/0%), 86B44A (1/0%)... 309 more # Last errno 0 # Users wp_blog_one (264/84%), wp_blog_two (36/11%)... 1 more # Query_time distribution # 1us # 10us # 100us # 1ms # 10ms ################################################################ # 100ms # 1s # 10s+ # Tables # SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_options'G # SHOW CREATE TABLE `wp_blog_one `.`wp_options`G # EXPLAIN /*!50100 PARTITIONS*/ SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'G
  13. Databases – pt-query-digest – Digest UI
  14. Databases – next step : explain explain <query> "How will MySQL execute the query"
  15. Databases – next step : explain +----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+ | id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | REF | ROWS | Extra | +----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+ | 1 | SIMPLE | employees | ALL | NULL | NULL | NULL | NULL | 299809 | USING WHERE | +----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+ +----+-------------+------------+-------+-------------------------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+-------------------------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | itdevice | const | PRIMARY,fk_device_devicetype1 | PRIMARY | 4 | const | 1 | | | 1 | SIMPLE | devicetype | const | PRIMARY | PRIMARY | 4 | const | 1 | | +----+-------------+------------+-------+-------------------------------+---------+---------+-------+------+-------+
  16. Databases – next step : explain explain <query> "How will MySQL execute the query" Shows : Indexes available Indexes used (do you see one ?) Number of rows scanned Type of lookup 'system', 'const' and 'ref' = good 'ALL' = bad Extra info Using index = good Using filesort = usually bad Using where = bad
  17. Databases – when to use / not to use Good at : Fetching data Storing data Searching through data Bad at : select `someField` from `bigTable` where crc32(`field`) = "something" → full table scan
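
One common way around that full table scan (an illustrative fix, not necessarily the one used here) is to precompute the checksum into its own indexed column and filter on that:

    $db = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

    // One-time schema change : store the checksum and index it (column and index names are illustrative)
    $db->exec('ALTER TABLE bigTable ADD COLUMN field_crc INT UNSIGNED NOT NULL DEFAULT 0');
    $db->exec('UPDATE bigTable SET field_crc = CRC32(field)');
    $db->exec('CREATE INDEX idx_field_crc ON bigTable (field_crc)');

    // The lookup now hits the index; the extra comparison on field guards against CRC collisions
    $stmt = $db->prepare('SELECT someField FROM bigTable WHERE field_crc = CRC32(?) AND field = ?');
    $stmt->execute(['something', 'something']);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);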
  18. For / foreach $customers = CustomerQuery::create() ->filterByState('SC') ->find(); foreach ($customers as $customer) { $contacts = ContactsQuery::create() ->filterByCustomerid($customer->getId()) ->find(); foreach ($contacts as $contact) { doSomestuffWith($contact); } }
  19. Joins $contacts = mysql_query(" select contacts.* from customer join contact on contact.customerid = customer.id where state = 'SC' "); while ($contact = mysql_fetch_array($contacts)) { doSomeStuffWith($contact); } or the ORM equivalent
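
The speaker notes flag that this slide uses the deprecated mysql extension and has no error handling; a rough modern equivalent of the same single-query join with PDO (connection details are placeholders) could look like this:

    $db = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,   // fail loudly instead of silently
    ]);
    $stmt = $db->prepare('
        select contact.*
        from customer
        join contact on contact.customerid = customer.id
        where customer.state = ?
    ');
    $stmt->execute(['SC']);
    while ($contact = $stmt->fetch(PDO::FETCH_ASSOC)) {
        doSomeStuffWith($contact);   // same processing function as on the slide
    }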
  20. Better... 10001 → 1 query Sadly : people still produce code with query loops Usually : Growth not anticipated Internal app → Public app
  21. The origins of this talk Customers : Projects we built Projects we didn't build, but got pulled into Fixes Changes Infrastructure migration 15 years of 'how to cause mayhem with a few lines of code'
  22. Client X Jobs search site Monitor job views : Daily hits Weekly hits Monthly hits Which user saw which job
  23. Client X Originally : when user viewed job details Now : when job is in search result Search for 'php' → 50 jobs = 50 jobs to be updated → 50 updates for shown_today → 50 updates for shown_week → 50 updates for shown_month → 50 inserts for shown_user
  24. Client X : the code foreach ($jobs as $job) { $db->query("insert into shown_today(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_week(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_month(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_user(jobId, userId, when) values(" . $job['id'] . ", " . $user['id'] . ", now())"); }
  25. Client X : the graph
  26. Client X : the numbers 600-1000 updates/sec (peaks up to 1600) 400-1000 updates/sec (peaks up to 2600) 16 core machine
  27. Client X : panic ! Mail : "MySQL slave is more than 5 minutes behind master" We set it up → who did they blame ? Wait a second !
  28. Client X : what's causing those peaks ?
  29. Client X : possible cause ? Code changes ? → According to developers : none Action : turn on general log, analyze with pt-query-digest → 50+-fold increase in queries → Developers : 'Oops, we did make a change' After 3 days : 2.5 days behind Every hour : 50 min extra lag
  30. Client X : But why is the slave lagging ? [Replication diagram : master binlog file (master-bin-xxxx.log) and binlog dump thread on the master ; slave I/O thread and slave SQL thread on the slave]
  31. Client X : Master
  32. Client X : Slave
  33. Client X : fix ? foreach ($jobs as $job) { $db->query("insert into shown_today(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_week(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_month(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_user(jobId, userId, when) values(" . $job['id'] . ", " . $user['id'] . ", now())"); }
  34. Client X : the code change $todayQuery = "insert into shown_today(jobId, number) values "; foreach ($jobs as $job) { $todayQuery .= "(" . $job['id'] . ", 1),"; } $todayQuery = substr($todayQuery, 0, -1); $todayQuery .= " on duplicate key update number = number + 1"; $db->query($todayQuery); Result : insert into shown_today values (5, 1), (8, 1), (12, 1), (18, 1), ... Careful : max_allowed_packet !
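
To stay under max_allowed_packet, the multi-row insert can be sent in batches; a sketch (batch size is arbitrary, $db and $jobs as on the slide):

    foreach (array_chunk($jobs, 1000) as $batch) {
        $values = [];
        foreach ($batch as $job) {
            $values[] = '(' . (int) $job['id'] . ', 1)';
        }
        $db->query('
            insert into shown_today (jobId, number)
            values ' . implode(',', $values) . '
            on duplicate key update number = number + 1
        ');
    }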
  35. Client X : the chosen solution $db->autocommit(false); foreach ($jobs as $job) { $db->query("insert into shown_today(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_week(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_month(jobId, number) values(" . $job['id'] . ", 1) on duplicate key update number = number + 1"); $db->query("insert into shown_user(jobId, userId, when) values(" . $job['id'] . ", " . $user['id'] . ", now())"); } $db->commit();
  36. Client X : conclusion For loops are bad (we already knew that) Add master/slave and it gets much worse Use transactions : they provide a huge performance increase Result : slave caught up 5 days later
  37. Database → Network Customer Y Top 10 site in Belgium Growing rapidly At peak traffic : Unexplainable latency on database Load on webservers : minimal Load on database servers : acceptable
  38. Client Y : the network
  39. Client Y : the network 60GB 700GB 700GB
  40. Client Y : network overload Cause : Drupal hooks → retrieving data that was not needed Only load data you actually need Don't know at the start ? → Use lazy loading Caching : Same story Memcached/Redis are fast But : data still needs to cross the network
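
A rough sketch of the lazy-loading idea, assuming the memcached extension; the class and key format are illustrative — the cache is only hit when the data is actually used, and at most once per request:

    $memcached = new Memcached();
    $memcached->addServer('127.0.0.1', 11211);   // placeholder cache server

    class LazyProfile
    {
        private ?array $data = null;

        public function __construct(private Memcached $cache, private int $userId)
        {
        }

        public function get(string $field)
        {
            if ($this->data === null) {
                // first real use : one round-trip to the cache, then keep the copy in-process
                $this->data = $this->cache->get('profile:' . $this->userId) ?: [];
            }
            return $this->data[$field] ?? null;
        }
    }

    $profile = new LazyProfile($memcached, 42);   // no network traffic yet
    echo $profile->get('name');                   // data is fetched only here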
  41. Network trouble : more than just traffic Customer Z 150,000 visits/day News ticker : XML feed from another site (owned by the same customer) Cached for 15 min
  42. Customer Z – fetching the feed if (filectime(APP_DIR . '/tmp/ScrambledSiteName.xml') < time() - 900) { unlink(APP_DIR . '/tmp/ScrambledSiteName.xml'); file_put_contents( APP_DIR . '/tmp/ScrambledSiteName.xml', file_get_contents('http://www.scrambledsitename.be/xml/feed.xml') ); } $xmlfeed = ParseXmlFeed(APP_DIR . '/tmp/ScrambledSiteName.xml'); What's wrong with this code ?
  43. Customer Z – no feed without the source Feed source
  44. Customer Z – no feed without the source Feed source
  45. Customer Z : timeout default_socket_timeout : 60 sec by default Each visitor : 60 sec wait time People keep hitting refresh → more load More active connections → more load Apache hits maximum connections → entire site down
  46. Customer Z : timeout fix $context = stream_context_create( array( 'http' => array( 'timeout' => 5 ) ) ); if (filectime(APP_DIR . '/tmp/ScrambledSiteName.xml') < time() - 900) { unlink(APP_DIR . '/tmp/ScrambledSiteName.xml'); file_put_contents( APP_DIR . '/tmp/ScrambledSiteName.xml', file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context) ); } $xmlfeed = ParseXmlFeed(APP_DIR . '/tmp/ScrambledSiteName.xml');
  47. Customer Z : don't delete from cache $context = stream_context_create( array( 'http' => array( 'timeout' => 5 ) ) ); if (filectime(APP_DIR . '/tmp/ScrambledSiteName.xml') < time() - 900) { unlink(APP_DIR . '/tmp/ScrambledSiteName.xml'); file_put_contents( APP_DIR . '/tmp/ScrambledSiteName.xml', file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context) ); } $xmlfeed = ParseXmlFeed(APP_DIR . '/tmp/ScrambledSiteName.xml');
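
A sketch of the "don't delete from cache" idea : fetch the new feed first and only overwrite the cached copy when the fetch succeeds, so visitors keep getting a stale feed instead of no feed (paths, timeout and helper function as on the slides):

    $cacheFile = APP_DIR . '/tmp/ScrambledSiteName.xml';
    $context = stream_context_create(array('http' => array('timeout' => 5)));
    if (!file_exists($cacheFile) || filemtime($cacheFile) < time() - 900) {
        $feed = @file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context);
        if ($feed !== false) {
            file_put_contents($cacheFile, $feed);   // replace the cache only on success
        }
    }
    $xmlfeed = ParseXmlFeed($cacheFile);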
  48. Network resources Use timeouts for all : fopen curl SOAP … Data source trusted ? → set up a webservice → let them push updates when their feed changes → less load on data source → no timeout issues Add logging → early detection
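
For the curl case, the equivalent timeouts look roughly like this (URL is a placeholder):

    $ch = curl_init('http://www.scrambledsitename.be/xml/feed.xml');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);      // max seconds to establish the connection
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);             // max seconds for the whole request
    $feed = curl_exec($ch);
    if ($feed === false) {
        error_log('Feed fetch failed : ' . curl_error($ch));   // log failures → early detection
    }
    curl_close($ch);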
  49. Logging Logging = good Logging in PHP using fopen → bad idea : locking issues → Use file_put_contents($filename, $data, FILE_APPEND) For Firefox : FirePHP (add-on for Firebug) Debug logging = bad on production Watch your logs ! Don't log on slow disks → I/O bottlenecks
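
The append-based logging mentioned above, in one call (path and message are illustrative; LOCK_EX avoids interleaved writes from concurrent requests):

    file_put_contents('/var/log/app/feed.log', date('c') . ' feed fetch failed' . PHP_EOL, FILE_APPEND | LOCK_EX);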
  50. File system : I/O bottlenecks Causes : Excessive writes (database updates, logfiles, swapping, …) Excessive reads (non-indexed database queries, swapping, small file system cache, …) How to detect ? top iostat See iowait ? Stop worrying about php, fix the I/O problem !
  51. File system Worst of all : NFS PHP files → lstat calls Templates → same Sessions → locking issues → corrupt data → store sessions in database, Memcached, Redis, ...
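
One way to move sessions off NFS, assuming the phpredis extension is installed (host and port are placeholders):

    ini_set('session.save_handler', 'redis');
    ini_set('session.save_path', 'tcp://127.0.0.1:6379');
    session_start();   // session data now lives in Redis instead of on the (NFS) filesystem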
  52. Much more than code XML feed Network User Webserver DB server
  53. Questions ?
  54. Questions ?
  55. Contact Twitter @wimgtr Web http://techblog.wimgodden.be Slides http://www.slideshare.net/wimg E-mail wim.godden@cu.be Please... Rate my talk : http://spkr8.com/t/21141
  56. Thanks ! Please... Rate my talk : http://spkr8.com/t/21141

Editor's notes

  • 5kbit/sec or 100Mbit/sec ?
  • Let's talk about code Without it : we don't exist What are the most common mistakes in the ecosystem Let's start with the database
  • time spent per query pattern how many queries of that query pattern
  • Get back to what I said Lots of people use ORM - easier - don't need to write queries - object-oriented but people start doing this Imagine 10000 customers → 10001 queries
  • Not the best code : uses the deprecated mysql extension, no error handling
  • Master : 16 CPU cores 12 cores for SQL 1 core for binlog dump rest for system Slave : 16 CPU cores 1 core for slave I/O 1 core for slave SQL
  • Grouping Works fine, but : maximum size of string ? PHP = no limit MySQL = max_allowed_packet
  • All in a single commit Note : transaction has max. size Possible : combination with previous solution
  • took a few moments to figure out No network monitoring → iptraf → 100Mbit/sec limit → packets dropped → connections dropped Customer : upgrade switch Us : why 100Mbit/sec ?
  • Databases → network What other network related issues ?
  • Server on which the feed is located : crashed Fine for a few minutes (cache) After 15 minutes : file_get_contents uses default_socket_timeout
  • Better, not perfect. What else is wrong ? Multiple visitors hit expiring cache → file delete → xml feed hit a lot
  • Better, not perfect. What else is wrong ? Multiple visitors hit expiring cache → file delete → xml feed hit a lot
  • How do you treat your data : - where do you get it - how long did you have to wait to get it - how is it transported - how is it processed minimize the amount of data : retrieved transported processed, sent to db and users
