1. Slide 1
Bigger data with PostgreSQL 9
Data warehousing in the 21st century.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
© by Numius nv
Open systems, Smarter people
2. Slide 2
The presenter
• Bert Desmet
• Consultant @ Deloitte
• System Engineer / DBA for deloitteanalytics.eu
• 'devop'?
3. Slide 3
Agenda
• Introduction
• Release the elephants!
• Impacting factors
• Divide et impera
• Basic configuration
• Passing the speed limits
• Keep your database fit
4. Slide 4
Big data?
● 44x data growth per year!
● 80% of data is unstructured
● About 35.2 zettabytes by 2020
● The volume will grow by a whopping 650% in the next 5 years
● 80% of organisations will use cloud analytics
● By 2014, 80% of enterprises will want a SaaS-based BI system
5. Slide 5
Know your limits
● DB2
● More load
● Scaling
● Speed
● Data size
● Pricing
7. Slide 7
PostgreSQL 9
● Good for big databases
● Easy maintenance
● Scales!
● Very fast
● Extendable
9. Slide 9
Highly impacting operations
• Data load
  • In bulk (ETL)
  • Row by row: up to 100k rows / minute
• Data fetch (reporting)
  • We do like joins. The more the better.
10. Slide 10
Extra problems
• A lot of I/O
• A lot of CPU power (index creation)
• A lot of locks
11. Slide 11
The solution?
• Use at least 2 servers
• Set up binary replication
• Put a lot of RAM in your servers.
14. Slide 14
Replication with Postgres
• 8.3 Warm Standby
• 9.0 Async. Binary Replication
• 9.1 Synchronous Replication
• 9.2 Cascading Replication
• 9.3 More improvements towards failover / switching masters
• 9.4 Multimaster Binary Replication?
15. Slide 15
Configure replication
• wal_level = hot_standby
• checkpoint_segments >= 32
• checkpoint_completion_target >= 0.8
• hot_standby = on
• hot_standby_feedback = on
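The recommendations above can be turned into a small sanity check. This is a minimal sketch, assuming the settings have already been read into a plain dictionary of strings; the names `RECOMMENDED` and `check_replication_settings` are hypothetical and not part of the talk. It uses the PostgreSQL 9.x parameter names from the slide (`checkpoint_segments` was replaced by `max_wal_size` in 9.5, and the `hot_standby` WAL level was renamed in 9.6).

```python
# Each entry maps a setting name to a predicate encoding the slide's advice.
RECOMMENDED = {
    "wal_level": lambda v: v == "hot_standby",
    "checkpoint_segments": lambda v: int(v) >= 32,
    "checkpoint_completion_target": lambda v: float(v) >= 0.8,
    "hot_standby": lambda v: v == "on",
    "hot_standby_feedback": lambda v: v == "on",
}


def check_replication_settings(settings):
    """Return the sorted names of settings that are missing or deviate."""
    return sorted(
        name
        for name, is_ok in RECOMMENDED.items()
        if name not in settings or not is_ok(settings[name])
    )


if __name__ == "__main__":
    # Example input: only two of the five settings are configured as advised.
    current = {"wal_level": "hot_standby", "checkpoint_segments": "16", "hot_standby": "on"}
    print(check_replication_settings(current))
```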
17. Slide 17
Keep it simple, stupid
• 2ndQuadrant is pretty awesome
• Barman for backups
• repmgr for replication management
19. Slide 19
Raise those memory limits!
• shared_buffers = 1/8 to ¼ of RAM
• work_mem = 128MB to 1GB
• maintenance_work_mem = 512MB to 1GB
• temp_buffers = 128MB to 1GB
• effective_cache_size = ¾ of RAM
• wal_buffers = 32MB
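As a rough illustration of these rules of thumb, the Python sketch below derives starting values from the total RAM of a dedicated database server. The helper names `suggest_memory_settings` and `render_conf` are hypothetical, and picking the upper end of the `shared_buffers` range and the lower end of the other ranges is an assumption made for the example, not a recommendation from the talk.

```python
def suggest_memory_settings(total_ram_mb):
    """Return starting postgresql.conf values (in MB) for a dedicated server.

    Hypothetical helper: it only encodes the rules of thumb from the slide.
    """
    return {
        "shared_buffers": total_ram_mb // 4,            # upper end of 1/8 to 1/4 of RAM
        "work_mem": 128,                                # slide range: 128MB to 1GB
        "maintenance_work_mem": 512,                    # slide range: 512MB to 1GB
        "temp_buffers": 128,                            # slide range: 128MB to 1GB
        "effective_cache_size": total_ram_mb * 3 // 4,  # 3/4 of RAM
        "wal_buffers": 32,                              # fixed value from the slide
    }


def render_conf(settings):
    """Format the settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}MB" for name, value in settings.items())


if __name__ == "__main__":
    # Example: a server with 64 GB of RAM.
    print(render_conf(suggest_memory_settings(64 * 1024)))
```

Keep in mind that `work_mem` applies per sort or hash operation, so a high value combined with many concurrent queries can use far more memory than the single number suggests.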
20. Slide 20
Tune the planner for correct planning
• random_page_cost = 3
• cpu_tuple_cost = 0.1
• constraint_exclusion = on
• from_collapse_limit >= 12
• join_collapse_limit >= 12
22. Slide 22
Use partitions
• Think about the partition key!
• Trigger-based for row-by-row inserts
• Rule-based for bulk inserts
• Make sure you add CHECK constraints, so constraint exclusion can skip partitions
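To make the constraint point concrete, here is a minimal sketch that generates the DDL for one monthly child table using the inheritance-based partitioning available in PostgreSQL 9. The function names and the example table and column (`sales`, `sale_date`) are hypothetical; identifiers are inserted as-is, so the sketch assumes trusted, unquoted names, and it does not cover the trigger or rule that routes inserts to the child table.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month after d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def monthly_partition_ddl(parent, key, month_start):
    """Build the CREATE TABLE statement for one monthly child table.

    The CHECK constraint is what lets constraint_exclusion skip this
    partition when a query filters on the partition key.
    """
    upper = next_month(month_start)
    child = f"{parent}_{month_start:%Y_%m}"
    return (
        f"CREATE TABLE {child} (\n"
        f"    CHECK ({key} >= DATE '{month_start}' AND {key} < DATE '{upper}')\n"
        f") INHERITS ({parent});"
    )


if __name__ == "__main__":
    # Example: the December 2013 partition of a hypothetical sales table.
    print(monthly_partition_ddl("sales", "sale_date", date(2013, 12, 1)))
```

The half-open range (`>=` lower bound, `<` upper bound) keeps adjacent partitions from overlapping, which is why the partition key deserves some thought up front.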
23. Slide 23
Use indexes
• Learn to read query explains
• Use http://explain.depesz.com/
• Don’t over-index
24. Slide 24
Other sane things to do
• Use unique indexes
  • Created automatically when defining a primary key
• Use clustered indexes
  • And cluster those tables regularly
25. Slide 25
Use partial indexes
• Not available in every database; PostgreSQL supports them natively.
• Really useful on big tables
• Disadvantage: no ‘moving’ indexes, e.g. an index on the current day.
27. Slide 27
Vacuum
• Disable autovacuum for data warehouses
• Vacuum once a day
• Check regularly that the vacuums do run!
• Prevents data loss (transaction ID wraparound)
• Keeps the database size from getting out of control
28. Slide 28
Analyze
• Analyze once a day
• Together with vacuum
• VACUUM ANALYZE <schema>.<table>;
• default_statistics_target >= 300
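A daily run could be scripted along these lines. This is only a sketch: the function name and the example schema and table names are hypothetical, and it merely prints the statements so they can be piped into `psql` from a scheduler. Names are wrapped in double quotes and assumed not to contain double quotes themselves.

```python
def vacuum_analyze_statements(tables):
    """Return one VACUUM ANALYZE statement per (schema, table) pair."""
    return [f'VACUUM ANALYZE "{schema}"."{table}";' for schema, table in tables]


if __name__ == "__main__":
    # Hypothetical warehouse tables; replace with your own list.
    nightly = [("dwh", "fact_sales"), ("dwh", "dim_customer")]
    print("\n".join(vacuum_analyze_statements(nightly)))
```

Each statement is emitted separately because VACUUM cannot run inside a transaction block.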
29. Slide 29
Check for bloat!
• Free space in tables
• Indexes are no longer optimal
• Use Nagios with check_postgres.pl
30. Slide 30
Prevent bloat
• Vacuum full
  • Offline!
  • Only when a PK is not available
• Repack
  • Online!
  • Orders the tables (clustered index)
  • Needs a PK on the table
• Reindex
  • Reindex regularly
31. Slide 31
Partial indexes?
• Write a script
• Use a cronjob
• Recreate your time-aware indexes every day. Will be fast.
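Such a script might look like the sketch below, which builds the statements that drop and recreate a partial index covering only today's rows. The function name, the index naming scheme and the example table and columns are hypothetical; the table name is assumed to be unqualified and trusted, and the statements are only printed so a cron job can pipe them into `psql`.

```python
from datetime import date


def daily_partial_index_sql(table, column, date_column, day):
    """Return statements that rebuild a partial index covering a single day."""
    index = f"{table}_{column}_today_idx"
    return [
        f"DROP INDEX IF EXISTS {index};",
        f"CREATE INDEX {index} ON {table} ({column}) "
        f"WHERE {date_column} = DATE '{day}';",
    ]


if __name__ == "__main__":
    # Example: index today's rows of a hypothetical sales table by customer.
    for statement in daily_partial_index_sql("sales", "customer_id", "sale_date", date.today()):
        print(statement)
```

The date is written into the predicate as a literal because an index predicate cannot depend on the current date, which is why the index has to be rebuilt each day.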
33. Slide 33
Questions?
• Postgres has an awesome community
• IRC: #postgresql @ freenode
• Check the mailing list