Big Bad “Upgraded” Postgres


                Robert Treat


                        / Presentation



Wednesday, May 23, 12
Intro




          • Robert Treat
                • xzilla.net
                • @robtreat2
                • +RobertTreat




Intro, part 2


                • COO @ OmniTI
                        • Full Stack Tech Consulting
                        • Remote Database Management and Consulting
                        • Large Scale / Mission Critical
                        • We’re Hiring!




Philosophy

                  OmniTI has a reputation for scalable web
                      applications and architectures.

                        We didn't learn this stuff overnight.

         Like many success stories, we acquired
       experience through trial and error, constant
         collaboration between development and
    operations teams, and an unwavering commitment
                      to excellence.

         But we still lean on our friends and peers to see
                 how things can be done better.
So, What Is Big Bad Postgres?




                        • Original Project
                         • Convert TB+ Sized Oracle ODS To Postgres
                         • 2005, Postgres 8.1




So, What Is Big Bad Postgres?
                        [Diagram: original architecture]
                        • OLTP Instance (drives the site): Oracle
                        • Warm Standby: Oracle
                        • Data Warehouse: Oracle
                        • Logging, bulk processing: mysql (shown twice)

So, What Is Big Bad Postgres?
                        [Diagram: architecture after the original project]
                        • OLTP Instance (drives the site): Oracle
                        • Warm Standby: Oracle
                        • Data Warehouse: Oracle crossed out (XXX), replaced by Postgres
                        • Logging, bulk processing: mysql (shown twice)

So, What Is Big Bad Postgres?



                         ProTip

                   •    Remove Oracle Licensing Costs

                   •    Remove Oracle Add-On Costs

                         BroTip

                   •    Missing features

                   •    Upgrades are a problem




So, What Is Big Bad Postgres?


                        • Missing Features (Postgres 8.1)
                         • Heterogeneous Replication (dbi-link)
                         • Autonomous Transactions (dblink)
                         • Backups (zfs)
                         • Aggregate SQL (plpgsql)
                         • Large Selects (cursors?)
                         • Upgrades (?)


So, What Is Big Bad Postgres?



                        • Save $500K in Licenses
                          • $100K in labor
                        • Took Roughly 6 Months
                          • Built lots of useful tools
                          • Learned a lot about Postgres
                          • Everybody’s happy

          http://lethargy.org/~jesus/writes/big-bad-postgresql

Big Bad “Broken” Postgres



                        • Feb 2008, disaster struck
                          • disk failures
                          • memory issues
                          • software bugs
                          • no failover box!
                          • disaster recovery ensued



Big Bad “Broken” Postgres




      Using ZFS snapshots, we were able to modify Postgres
       code, deploy / test on production copy of data, and
                 eventually get a running system.




Big Bad “Broken” Postgres



                        • Fallout:
                          • New Machines (2 of them!)
                          • Upgrade to Postgres 8.3
                             • given corrupt data, dump/restore was forced
                             • made life easier operationally
                        • Happiness restored!



So, What Is Big Bad Postgres?
                        [Diagram: architecture after the rebuild]
                        • OLTP Instance (drives the site): Oracle
                        • Warm Standby: Oracle
                        • Data Warehouse: two “identical” Postgres 8.3 machines

Big Bad “Upgraded” Postgres



                    robert@omniti.com at 12:24 on 2009-02-17
                  8.4 is approaching beta, and has some good
                  features we could take advantage of on pgods,
                  if we could get it to upgrade.




Big Bad “Upgraded” Postgres




          • Issues to handle
                • 3TB, compressed
                • Limited by hardware

                • Limited by spare time




Big Bad “Upgraded” Postgres


          • Options?
                • SLONY - Well known, but mostly unusable
                   solution (issues with partitioned tables are
                   primary culprit, but lots of other dynamic ddl as
                   well)

                • Mammoth Replicator - can replicate between
                   versions, but slave creation is a problem.

                • “8.4 dev tree has a pg_migrator script in it, for
                   in place upgrades, this should be investigated
                   in our environment.”



Big Bad “Upgraded” Postgres




          • 8.4 dev pg_migrator
               • Ran into bugs with Solaris support during dev
                        • "file.c", line 592: warning: argument #4 is incompatible with
                          prototype:


               • Spent ~ 1 month on this before it got put on the
                   back burner




Big Bad “Upgraded” Postgres




          • 8.4.0 pg_migrator (Aug 2009)
                        • packages all built without major issues

                        • 8.4.0 had a bug with indexes & plperl
                          • so, we delayed

                          • actually, we postponed



Big Bad “Upgraded” Postgres




          • 9.0 upgrade?
                        • Jan 2011, 9.0 is in development

                        • Given months of testing, we focused on 9.0




Big Bad “Upgraded” Postgres


          • Some tools we use:
                • dblink
                • plperl / dbilink
                • fuzzystring
                • freespacemap
                • pg_reorg
                • secure check postgres


Big Bad “Upgraded” Postgres

          • SNAG #1
               • can’t hard link across filesystems
                        • (aka zfs datasets)

                             Given $PGDATA = /pgsql/main

         NO:   /pgsql/main     and  /pgsql/main9
         YES:  /pgsql/main/83  and  /pgsql/main/90
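
The layout rule above comes down to whether the old and new data directories sit on the same filesystem: pg_upgrade's link mode relies on hard links, and hard links cannot span filesystems (or ZFS datasets). A minimal Python sketch of that pre-flight check follows. The function name `same_filesystem` is made up for illustration, and the paths are just the ones from the slide.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID."""
    # st_dev identifies the filesystem that holds the path; hard links
    # are only possible when both ends share that device.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical layout taken from the slide; adjust to the real directories.
    old_cluster, new_cluster = "/pgsql/main/83", "/pgsql/main/90"
    if os.path.isdir(old_cluster) and os.path.isdir(new_cluster):
        print(same_filesystem(old_cluster, new_cluster))
```

Running a check like this before the maintenance window is cheaper than discovering the problem partway through the upgrade.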

Big Bad “Upgraded” Postgres
          • SNAG #2
          • un-upgradable data types
               • left over pg_reorg cruft
               • (special table data types)




Big Bad “Upgraded” Postgres
          • SNAG #2, part b
          • un-upgradable data types
               • “NAME” type
               • check_postgres.pg_stat_activity.datname




Big Bad “Upgraded” Postgres
          • SNAG #2, part c
          • un-upgradable data types
               • “NAME” type

          -- tablename has type "name", so x.tablename inherits it
          create table x as
          select
            tablename,
            pg_relation_size(schemaname||'.'||tablename)
          from
            pg_tables;


Big Bad “Upgraded” Postgres
                              -- plan of attack --
                       swap in minimal cron / configs
                           shut down 8.3 database
                                zfs snapshot fs
                        bring up 8.3 / 9.0 databases
                               run pg_upgrade
                          verify migration complete
                           turn on snap replication
                                sanity checking
                    slowly roll more services back on-line
                               back in business

Big Bad “Upgraded” Postgres
                         -- actual attack --
                  swap in minimal cron / configs
                     shut down 8.3 database
                           zfs snapshot fs
                   bring up 8.3 / 9.0 databases
                          run pg_upgrade
                   **hit a bug in role creation**
                               discuss
                               rollback
             (rename the control file of the old cluster)
                   bring back up 8.3 database

Big Bad “Upgraded” Postgres




    CREATE ROLE asha;
    ALTER ROLE asha SET role TO 'omniti';
    -- ... sometime later ...
    CREATE ROLE omniti;
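
The failure above is an ordering problem: the dump applies a per-role `role` setting that names a role which has not been created yet. As a rough illustration only, here is a Python sketch of one possible workaround, which defers every `ALTER ROLE ... SET` statement until all other statements have run. The helper name `reorder_role_statements` is hypothetical, and this is not a description of the patch that was eventually accepted.

```python
import re

# Matches statements such as: ALTER ROLE asha SET role TO 'omniti';
ALTER_SET = re.compile(r"^\s*ALTER\s+ROLE\s+\S+\s+SET\s", re.IGNORECASE)


def reorder_role_statements(statements):
    """Move ALTER ROLE ... SET statements after all other statements.

    The relative order inside each group is preserved.
    """
    deferred = [s for s in statements if ALTER_SET.match(s)]
    others = [s for s in statements if not ALTER_SET.match(s)]
    return others + deferred


if __name__ == "__main__":
    dump = [
        "CREATE ROLE asha;",
        "ALTER ROLE asha SET role TO 'omniti';",
        "CREATE ROLE omniti;",
    ]
    for stmt in reorder_role_statements(dump):
        print(stmt)
```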




Big Bad “Upgraded” Postgres



    Added to TODO:

    ! Allow pg_dumpall to output
    restorable ALTER USER/DATABASE
    SET settings



                                      March 2011

Big Bad “Upgraded” Postgres



    We could have done something where we dropped or
    modified all roles and recreated them after upgrade, but
               this didn’t seem like the right fix.




                        eventually, we started working on a patch


Big Bad “Upgraded” Postgres




                        Meanwhile, at another client, not far away...




Big Bad “Upgraded” Postgres

          • more pg_upgrade bugs
               • Fix pg_upgrade's handling of TOAST tables
                   (april)

               • Fix pg_upgrade to preserve toast tables'
                   relfrozenxids during an upgrade from 8.3
                   (sept)

                          Not directly relevant to “Big Bad”,
                           but did take time / energy away,
                          and was not confidence inspiring



Big Bad “Upgraded” Postgres

          • back to the role bug
               • after ~ 3 months, the patch was accepted
                   (oct)

               • 9.1 is now just around the corner, so...



                                we decided to go to 9.1




Big Bad “Upgraded” Postgres

         • 9.1 released in September, 2011
               • 9.1 upgrade testing begins

                    • schema tests
                    • compatibility tests



                                       2011-10-25
                                     Judgement Day




Big Bad “Upgraded” Postgres

         • doh #1
               • non-empty tablespace
               • left over from some 9.1 testing (pg_dump /
                  pg_restore of schema only)


 Creating databases in the new cluster
 psql:/pgdata/main/pg_upgrade_dump_globals.sql:247: NOTICE: schema "ods" does not exist
 psql:/pgdata/main/pg_upgrade_dump_globals.sql:255: NOTICE: schema "check_postgres" does not exist

 psql:/pgdata/main/pg_upgrade_dump_globals.sql:303: ERROR: directory "/pgdata/alldata1/PG_9.1_201105231" already in use as a tablespace

 There were problems executing "/opt/pgsql911/bin/psql" --set ON_ERROR_STOP=on --no-psqlrc --port 5491 --username "postgres" -f "/pgdata/main/pg_upgrade_dump_globals.sql" --dbname template1 >> "/dev/null"
 Failure, exiting
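
The error above names a version-specific subdirectory (`PG_9.1_201105231`) that already existed inside the tablespace location. A small Python sketch of a pre-upgrade check for such leftovers is shown below. It assumes leftovers always look like the directory in the error message; the helper name `leftover_version_dirs` and the pattern are illustrative, not part of pg_upgrade.

```python
import os
import re

# Version-specific tablespace subdirectories look like PG_9.1_201105231.
VERSION_DIR = re.compile(r"^PG_\d+(\.\d+)?_\d+$")


def leftover_version_dirs(tablespace_location):
    """List version-named subdirectories found in a tablespace location."""
    found = []
    for entry in sorted(os.listdir(tablespace_location)):
        full = os.path.join(tablespace_location, entry)
        if VERSION_DIR.match(entry) and os.path.isdir(full):
            found.append(entry)
    return found


if __name__ == "__main__":
    location = "/pgdata/alldata1"  # path taken from the error message
    if os.path.isdir(location):
        print(leftover_version_dirs(location))
```

Anything this reports for the target version is worth inspecting by hand before the real run.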



Big Bad “Upgraded” Postgres

          • doh #2
               • new bad data types had snuck in (more
                   name, some unknown)

               • drop tables, alter data types




Big Bad “Upgraded” Postgres

          • doh #3
               • actually it worked*

               • begin bringing services back online
                        • analyze, vacuum

                        • turn on “snap job” replication

                        • turn on regular replication
                          • aka “the point of no return”

                              * don’t worry, it didn’t really work, it just looked like it did

Big Bad “Upgraded” Postgres
          • Replication was running
               • 100’s of tables
          • ETL services not yet on
               • dinner time :-)




Big Bad “Upgraded” Postgres

                             Things Were Sailing Along Fine




Big Bad “Upgraded” Postgres



             “if this were the movie titanic, this is
             the scene where you see the two
             guys up in the watch tower joking
             around before the iceberg hits.”




Big Bad “Upgraded” Postgres




             NOTICE: ERROR: value too long for type character
             varying(40) at line 96.




    normally we see this on (oracle) number -> (pgsql) integer




Big Bad “Upgraded” Postgres

          • Verified function with debug output
               • function creates a temp table
                   matching primary table
               • copies replicated data into temp
                   table
               • replaces data in actual table with
                   data from temp table
          • This was all working



Big Bad “Upgraded” Postgres




          • function with same data working on
             the other server
          • verified sql ran fine using dblink
             (so, plperl specific)




Big Bad “Upgraded” Postgres

                           Comparing Bad Data - The “Good”
    -[ RECORD 1 ]----+---------------------------------
    userid           | 78184652
    username         | 78184652
    title            |
    firstname        | ????????????????
    lastname         | ??????????????????
    middleinitial    |
    email            | user@example.com
    address          | ???????????????? 34?? ????6
    address2         |
    city             | ??????????????
    state            | ?????????????????? ??????
    zipcode          | 211573
    country          | by
    active           | 1
    subscribed       | 1
    cookieusername   | ed9cd5fed817628ca5b052ebfe11925d
    partner          | 1059105
    actual_phone     |
    dob              |
    phone            |
    regdate          | 2011-10-26 15:32:07
    ipaddress        | 21.12.21.12
    last_open_ts     |
    last_click_ts    |
    last_play_ts     |
    last_delivery_ts |




Big Bad “Upgraded” Postgres

                            Comparing Bad Data - The “Bad”
    -[ RECORD 1 ]----+--------------------------------------------------------------------------
    userid           | 78184652
    username         | 78184652
    title            |
    firstname        | ����������������
    lastname         | ������������������
    middleinitial    |
    email            | user@example.com
    address          | ���������������� 34�� ����6
    address2         |
    city             | ��������������
    state            | ������������������ ������
    zipcode          | 211573
    country          | by
    active           | 1
    subscribed       | 1
    cookieusername   | ed9cd5fed817628ca5b052ebfe11925d
    partner          | 1059105
    actual_phone     |
    dob              |
    phone            |
    regdate          | 2011-10-26 15:32:07
    ipaddress        | 21.12.21.12
    last_open_ts     |
    last_click_ts    |
    last_play_ts     |
    last_delivery_ts |




Big Bad “Upgraded” Postgres




                        I hate encoding issues




Big Bad “Upgraded” Postgres


               • Ruled out several config options
                   (lc_*, locale on the machine)
               • plperl running on different version
                   of libpq?
                        • install libpq5 built on 9.1

                        • install dbdpg built on new libpq


                                          ...no

Big Bad “Upgraded” Postgres




               • wrote perl script that used dbd:pg
                   to grab data from oracle
                        • worked!

               • so, plperl eh?




Big Bad “Upgraded” Postgres
              postgres=# show client_encoding;
               client_encoding
              -----------------
               UTF8
              (1 row)

              postgres=# select * from dbi_link.remote_select(3,'select chr(255) as bar') t (bar text);
               bar
              -----
               ÿ
              (1 row)

              postgres=# set client_encoding = latin1;
              SET
              postgres=# select * from dbi_link.remote_select(3,'select chr(255) as bar') t (bar text);
               bar
              -----
               ÿ
              (1 row)


Big Bad “Upgraded” Postgres


                        • check lang var

                         • pg user

                         • root

                         • smf init script

                        • everything checked out :-




Big Bad “Upgraded” Postgres
                “But Wait, Maybe It Is The LC Stuff”
               -bash-3.00$ sudo pargs -e 17302
               Password:
               17302: /opt/pgsql8314/bin/postgres -D /pgdata/main
               envp[0]: LC_TIME=C
               envp[1]: LC_NUMERIC=C
               envp[2]: LC_MONETARY=C
               envp[3]: LC_MESSAGES=en_US.UTF-8
               envp[4]: LC_CTYPE=en_US.UTF-8
               envp[5]: LC_COLLATE=en_US.UTF-8
               envp[6]: LD_LIBRARY_PATH=/opt/oracle/amd64
               envp[7]: ORACLE_HOME=/opt/oracle
               envp[8]: PATH=/usr/sbin:/usr/bin
               envp[9]: PERL5LIB=/data/CPAN/lib/site_perl
               envp[10]: PGDATA=/pgdata/main
               envp[11]: PGPREFIX=/opt/pgsql
               envp[12]: PGSYSCONFDIR=/opt/pgsql8314/etc
               envp[13]: PGUSER=postgres
               envp[14]: SMF_FMRI=svc:/database/postgres:default
               envp[15]: SMF_METHOD=/opt/pgsql/bin/pg_ctl -D $PGDATA start -w
               envp[16]: SMF_RESTARTER=svc:/system/svc/restarter:default
               envp[17]: TNS_ADMIN=/opt/oracle/network/admin
               envp[18]: TZ=US/Eastern


Big Bad “Upgraded” Postgres
                “But Wait, Maybe It Is The LC Stuff”

                 -bash-3.00$ sudo pargs -e `pgrep postgres` | grep LC_CTYPE
                 Password:
                 envp[4]: LC_CTYPE=en_US.UTF-8
                 envp[4]: LC_CTYPE=en_US.UTF-8
                 envp[4]: LC_CTYPE=en_US.UTF-8
                 envp[4]: LC_CTYPE=en_US.UTF-8
                 envp[4]: LC_CTYPE=en_US.UTF-8
                 envp[6]: LC_CTYPE=en_US.UTF-8
                 envp[4]: LC_CTYPE=en_US.UTF-8
                 envp[4]: LC_CTYPE=en_US.UTF-8
                 envp[4]: LC_CTYPE=en_US.UTF-8




               note: this is actually from the “broken” one

Big Bad “Upgraded” Postgres
                “But Wait, Maybe It Is The LC Stuff”




                 After more experimentation, including restart of
                 database with adjusted LANG and env vars, I was
                 able to fix the odd LC settings, but not the
                 remote data select with plperl...




                                          #DOH




Big Bad “Upgraded” Postgres

                           have you figured it out yet?




Big Bad “Upgraded” Postgres

                                pg_enable_utf8




Big Bad “Upgraded” Postgres

                                $dbh->{pg_enable_utf8} = 1;



             Force strings passed to and from plperl to be in UTF8 encoding.
             String are converted to UTF8 on the way into perl
             and to the database encoding on the way back. This
             avoids a number of observed anomalies, and ensures
             Perl a consistent view of the world.




        https://github.com/postgres/postgres/commit/50d89d422f9c68a52a6964e5468e8eb4f90b1d95




Big Bad “Upgraded” Postgres

                               $dbh->{pg_enable_utf8} = 1;


   Actually a change in 9.0
    http://www.postgresql.org/docs/9.1/static/release-9-0.html
   * Verify that PL/Perl return values are valid in the server
   encoding (Andrew Dunstan)




                             Note: Perl may otherwise make
                           assumptions that your data is Latin1
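
One plausible way the earlier "value too long for type character varying(40)" error arises: when UTF-8 bytes are treated as Latin-1, every multi-byte character turns into several single-byte characters, so the string gets longer. The Python sketch below only simulates that effect. The sample name and both helper functions are made up for illustration, and nothing here touches plperl or DBD::Pg.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being interpreted as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")


def fits_varchar(text: str, limit: int) -> bool:
    """varchar(n) limits the number of characters, not bytes."""
    return len(text) <= limit


if __name__ == "__main__":
    name = "Александр" * 3  # 27 Cyrillic characters, invented sample value
    garbled = misread_as_latin1(name)
    print(len(name), fits_varchar(name, 40))
    print(len(garbled), fits_varchar(garbled, 40))
```

With an unbounded `text` column the second check never fails, which is why the length limit was the only thing that exposed the problem.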

Big Bad “Upgraded” Postgres


                           My 2:00AM Summary...


           “In any case, this seems like a horrible backwards
           compatibility nightmare thats likely to eat peoples data; I
           only noticed it because I was pulling data from a varchar(20)
           into a varchar(20) and it complained the data size was too
           long. Had I been using text (which is what I normally do), I
           think I would have screwed myself.”




Big Bad “Upgraded” Postgres

                            One last broken job...

  SELECT
     42, cntry_abbr as country,
     date_trunc('day', h.hitdate) as rollup_day,
     pg.price_group_id,
     count(1) as hits
  FROM
     tblhits h
     join tbladvertiser_campaign sc on h.partner = sc.source_code
     join tblcountry flc on promo.perl_geo_ip_country(h.ipaddress) = flc.cntry_abbr
     left join tbladvertiser_price_groups pg on pg.campaign_id = 42
        and h.hitdate between pg.start_date and coalesce(pg.end_date, h.hitdate)
        and get_bit(decode(pg.countries::text, 'hex'), flc.country_id::integer) > 0
  WHERE
     h.hitdate >= '2011-10-26'::date and sc.campaign_id = 42
     and h.hitdate < '2011-10-31'::date + '1 day'::interval
  GROUP BY
     flc.cntry_abbr, date_trunc('day', h.hitdate), pg.price_group_id;




Big Bad “Upgraded” Postgres

                            One last broken job...

        Default output format of bytea columns changed
        (escape to hex) in Postgres 9.0, so it first bit us on 9.1
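
To see why a cast to text followed by a hex decode stops working, compare the two text forms of the same bytea value. The Python sketch below approximates both output formats. It assumes, based only on the shape of the query, that the column held ASCII hex digits; the function names and the sample value are illustrative.

```python
def bytea_escape_text(data: bytes) -> str:
    """Approximate the older 'escape' text form of a bytea value."""
    out = []
    for b in data:
        if b == 0x5C:               # a backslash is doubled
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:     # printable ASCII passes through
            out.append(chr(b))
        else:                       # other bytes become a 3-digit octal escape
            out.append("\\%03o" % b)
    return "".join(out)


def bytea_hex_text(data: bytes) -> str:
    """The 'hex' text form: a backslash-x prefix, then two hex digits per byte."""
    return "\\x" + data.hex()


if __name__ == "__main__":
    stored = b"0f3a"  # invented value: ASCII hex digits kept in a bytea column
    print(bytea_escape_text(stored))  # looks like plain hex digits
    print(bytea_hex_text(stored))     # prefixed, hex-encoded form
```

In the escape form the text is still decodable as hex; in the hex form the leading backslash and `x` are not valid hex digits, so a decode step written for the old output fails.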




Big Bad “Upgraded” Postgres

                            One last broken job...

  SELECT
     42, cntry_abbr as country,
     date_trunc('day', h.hitdate) as rollup_day,
     pg.price_group_id,
     count(1) as hits
  FROM
     tblhits h
     join tbladvertiser_campaign sc on h.partner = sc.source_code
     join tblcountry flc on promo.perl_geo_ip_country(h.ipaddress) = flc.cntry_abbr
     left join tbladvertiser_price_groups pg on pg.campaign_id = 42
        and h.hitdate between pg.start_date and coalesce(pg.end_date, h.hitdate)
        and get_bit(pg.countries, flc.country_id::integer) > 0
  WHERE
     h.hitdate >= '2011-10-26'::date and sc.campaign_id = 42
     and h.hitdate < '2011-10-31'::date + '1 day'::interval
  GROUP BY
     flc.cntry_abbr, date_trunc('day', h.hitdate), pg.price_group_id;




Aftermath



          • did utf8 issue cause data corruption?

               • extensive post-upgrade testing

               • “data diff” between servers

               • fixed dozens of problems

           NONE attributable to the upgrade or plperl issues


Aftermath



                        pg_upgrade works pretty well




All's Well That Ends Well

                             We upgraded the other box

                                 Took ~ 45 minutes

                                   Nothing broke

                                        Yet?

                                         ;-)

THE END

                              Thanks!
                               PGCon


                         omniti dba team
                         postgres hackers

                                Slides
                           http://www.xzilla.net/

                                @robtreat2


Wednesday, May 23, 12