PostgreSQL Prologue

Stay If You Want To:
Image Source: Himmelfarb et al 2002: 1526 (artist: G. Renee
Guzlas). All rights reserved ©. Available via license: CC BY-NC 3.0
- get an intro to postgres
- know the basic components of postgres
- get some idea of how postgres works
- go through the logical and physical layout of postgres
What is PostgreSQL in the first place?
- “PostgreSQL is an object-relational database management system (ORDBMS) based on
POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer
Science Department.” -- PostgreSQL Documentation, By postgresql.org.
- “PostgreSQL (pronounced Post-Gres-Q-L), or postgres for short, is an open source
object-relational-database management system.” -- Learning PostgreSQL 11 (Third
Edition), A beginner's guide to building high-performance PostgreSQL database solutions,
By Salahaldin Juba, Andrey Volkov.
If You Are Wondering...
- we have heard about relational databases; how does this differ from them?
- the definitions call postgres an “Object Relational Database Management System”; what does this imply?
- or, since we know Object Oriented Principles, does PostgreSQL adopt OOP paradigms the way an Object Oriented Language does?
Detour - Database
- in simplest words:
- organized collection of valid data where new records can be added or an existing
record can be accessed, modified or removed
Detour - Database Management System
- can be seen as the gatekeeper of a database, basically an interface that:
- offers and controls access to the database to read, insert, update or remove data
- ensures integrity by enforcing the given constraints
- manages concurrent access and transactions
- enables remote access to the database
- supports data recovery after failures
Detour - Relational DBMS
- a group of related data can be stored in tabular form, where:
- each property is a column of the table; "attribute" is the more common term
- every single instance having those properties is a row, or tuple
- the table itself is called a relation, and its structure (the set of attributes) is its schema
- two relations can be linked through a common attribute, e.g. a foreign key
- use when:
- you know your data model upfront (structured data)
- the data pattern is fixed
- all of your entities have a fixed set of attributes that is not going to change
- you need ACID-compliant transactions
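A minimal illustration of relations, tuples and a foreign key. Because this sketch has to run without a PostgreSQL server, it swaps in Python's built-in sqlite3 module as a stand-in relational engine; the tables (author, book) and their rows are hypothetical. Note that SQLite only enforces foreign keys after the foreign_keys pragma is switched on, whereas PostgreSQL enforces declared foreign keys by default.

```python
from __future__ import annotations

import sqlite3


def demo_foreign_key() -> tuple[list[tuple], bool]:
    """Create two related tables, then try to insert a row pointing at a missing parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE book ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " author_id INTEGER NOT NULL REFERENCES author(id))"  # the foreign key
    )
    conn.execute("INSERT INTO author (id, name) VALUES (1, 'Stonebraker')")
    conn.execute("INSERT INTO book (id, title, author_id) VALUES (1, 'Postgres Paper', 1)")

    rejected = False
    try:
        # author 99 does not exist, so the constraint should reject this tuple
        conn.execute("INSERT INTO book (id, title, author_id) VALUES (2, 'Orphan', 99)")
    except sqlite3.IntegrityError:
        rejected = True

    # Relate the two tables through the common attribute (author_id = id)
    rows = conn.execute(
        "SELECT b.title, a.name FROM book b JOIN author a ON a.id = b.author_id"
    ).fetchall()
    conn.close()
    return rows, rejected


if __name__ == "__main__":
    print(demo_foreign_key())
```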
Detour - Object Relational DBMS
- OOP paradigms such as objects, classes and inheritance are supported in schemas, relations and even
queries (e.g. a table can inherit the columns of another table)
- supports custom (user-defined) data types and nested/composite data types, as in OOP
- functions and operators can even be overloaded, enabling polymorphism
Meet “SLONIK”
Source: Daniel Lundin - https://wiki.postgresql.org/images/a/a4/PostgreSQL_logo.3colors.svg
PostgreSQL - Evolution
- evolved from the Ingres project of the University of California, Berkeley, led by Michael
Stonebraker
- that’s why it is sometimes called Post-Ingres
Why Use PostgreSQL?
- supports both relational and non-relational data types (e.g. JSON/JSONB, arrays, key-value hstore)
- good read/write performance
- multi-version concurrency control (MVCC)
- parallel query execution using multiple CPU cores
- non-blocking index creation (CREATE INDEX CONCURRENTLY)
- partial indexes (an index built only over the rows that match a WHERE predicate)
Commercial Break
Who Trusts PostgreSQL?
src: StackShare
PostgreSQL Components
Postgres Built-in Applications
- ships with a number of client and server applications
- uses a client/server model
- client and server can reside on different hosts and communicate over TCP/IP, or over a Unix-domain socket on the same host
- can handle multiple concurrent client connections
- the server forks a new process for each client connection
Postgres Client Applications
- frontend applications that request some database action
- psql:
- offers an interactive terminal to write queries and get responses from postgres
- queries can also be read from a file or passed as command-line arguments
- pg_config
- reports compile-time and installation settings of the installed version (directories, compiler flags, etc.)
- pgbench
- runs a benchmark by repeatedly executing the same set of transactions from a number of
simulated clients
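To make the psql and pgbench bullets concrete, here is a small sketch that only builds the command lines (it does not run anything). It relies on the standard flags `-d` (database), `-c` (run one command), `-f` (run a script file) for psql and `-c` (clients), `-t` (transactions per client) for pgbench; the helper names and the database name `mydb` are hypothetical.

```python
from __future__ import annotations

import shlex
from typing import Optional


def psql_command(dbname: str, sql: Optional[str] = None, file: Optional[str] = None) -> list[str]:
    """Build a psql call that runs one SQL string (-c) or a script file (-f)."""
    if (sql is None) == (file is None):
        raise ValueError("pass exactly one of sql= or file=")
    argv = ["psql", "-d", dbname]
    if sql is not None:
        argv += ["-c", sql]   # query passed as a command-line argument
    else:
        argv += ["-f", file]  # queries read from a file
    return argv


def pgbench_command(dbname: str, clients: int, transactions: int) -> list[str]:
    """Build a pgbench run: `clients` concurrent sessions, `transactions` transactions each."""
    return ["pgbench", "-c", str(clients), "-t", str(transactions), dbname]


if __name__ == "__main__":
    print(shlex.join(psql_command("mydb", sql="SELECT version();")))
    print(shlex.join(pgbench_command("mydb", clients=10, transactions=100)))
```

Passing either list to `subprocess.run` would execute it; pgbench expects its sample tables to exist first, which `pgbench -i mydb` creates.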
Postgres Client Applications(continued...)
- clusterdb
- re-clusters previously clustered tables in the specified databases
- createdb
- creates a new database
- nothing but a wrapper of CREATE DATABASE command
- dropdb
- removes the specified database
- nothing but a wrapper of DROP DATABASE command
Postgres Client Applications(continued...)
- createuser
- creates a new user
- just a wrapper of CREATE ROLE command
- dropuser
- removes an existing user
- just a wrapper of DROP ROLE command
- vacuumdb (garbage collector and, optionally, analyzer)
- cleans dead tuples from all (or the specified) tables of a database that the user has permission to
vacuum, and/or generates statistics about the database (--analyze)
- a wrapper of VACUUM command
- full list: "PostgreSQL Client Applications" in the PostgreSQL documentation
Postgres Server Applications
- backend applications
- postgres
- accepts connections from client applications
- resolves client requests
- manages database files
- initdb
- creates a new database cluster
Postgres Server Applications(continued...)
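- pg_ctl
- initializes, starts, stops, or otherwise controls a postgres server
- pg_upgrade
- upgrades a postgres server instance to a new major version
- pg_waldump
- displays a human-readable rendering of the WAL
- full list: "PostgreSQL Server Applications" in the PostgreSQL documentation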
PostgreSQL Internals
PostgreSQL Forked Process
PostgreSQL Forked Process(continued...)
- follows a process-per-user (per-connection) model
- one client process gets connected to exactly one server (backend) process
- the master (postmaster) process spawns a new backend server process each time a new connection
is requested
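The process-per-connection idea can be mimicked with Python's `socketserver.ForkingTCPServer` (POSIX only), which forks one child process per accepted connection. This is only an analogy for the postmaster, not PostgreSQL's code; the handler and function names are hypothetical, and each "backend" here simply reports its own PID.

```python
import os
import socket
import socketserver


class BackendHandler(socketserver.BaseRequestHandler):
    """Runs inside the forked child: one process serves exactly one connection."""

    def handle(self) -> None:
        self.request.sendall(str(os.getpid()).encode())  # report which process serves us


def backend_pids(n_connections: int) -> list:
    """Open connections one after another and collect the PID that served each one."""
    pids = []
    # The listening server plays the postmaster's role: it only accepts and forks.
    with socketserver.ForkingTCPServer(("127.0.0.1", 0), BackendHandler) as server:
        host, port = server.server_address
        for _ in range(n_connections):
            with socket.create_connection((host, port)) as client:
                server.handle_request()  # accept the pending connection and fork a child
                chunks = []
                while data := client.recv(64):  # read until the child closes the socket
                    chunks.append(data)
                pids.append(int(b"".join(chunks)))
    return pids  # leaving the with-block waits for the forked children


if __name__ == "__main__":
    print("server pid:", os.getpid(), "backend pids:", backend_pids(3))
```

Every connection is answered by a different child PID, none of which is the listener's own PID, which is the property the slide describes.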
PostgreSQL Forked Process(continued...)
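- the master process also forks several background processes at start-up
- walwriter
- periodically flushes WAL (Write Ahead Log) records from the WAL buffer to the WAL files on disk
- every change to data files (tables or indexes) is first recorded in the WAL buffer
- ensures durability and data integrity
- in case of a system crash, roll-forward (REDO) is done by replaying the log records
- checkpointer
- periodically performs checkpoints: writes all dirty buffers to the data files and records a checkpoint in the WAL sequence
- crash recovery only needs to replay the WAL from the last checkpoint onward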
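A toy model (not PostgreSQL's implementation) of the write-ahead rule, checkpoints and REDO described above. All names are hypothetical; LSNs are plain counters, every log append is treated as already flushed to disk, and a checkpoint writes the whole cached buffer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyWalStore:
    """Toy key-value store: WAL first, cached pages second, data files last."""

    data_files: dict[str, int] = field(default_factory=dict)       # survives a crash
    buffer: dict[str, int] = field(default_factory=dict)           # lost on a crash
    wal: list[tuple[int, str, int]] = field(default_factory=list)  # (lsn, key, value), survives
    checkpoint_lsn: int = 0                                        # REDO starts after this LSN

    def write(self, key: str, value: int) -> int:
        lsn = len(self.wal) + 1
        self.wal.append((lsn, key, value))  # 1. log the change first
        self.buffer[key] = value            # 2. then change the cached page
        return lsn

    def checkpoint(self) -> None:
        # Write every cached page to the data files, then note how far the log is covered.
        self.data_files.update(self.buffer)
        self.checkpoint_lsn = len(self.wal)

    def crash(self) -> None:
        self.buffer.clear()  # changes not yet written to data files are lost

    def recover(self) -> int:
        # Roll forward (REDO): replay only the records newer than the last checkpoint.
        replayed = 0
        for lsn, key, value in self.wal:
            if lsn > self.checkpoint_lsn:
                self.data_files[key] = value
                replayed += 1
        self.buffer = dict(self.data_files)
        return replayed
```

After a crash the data files still hold the pre-checkpoint state, and replaying only the two newer log records restores the latest values.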
PostgreSQL Forked Process(continued...)
- background writer
- writes some dirty (new or modified) buffers to disk ahead of time, so backends rarely have to write them themselves
- may increase overall I/O load, since a repeatedly dirtied page might otherwise be written only once per checkpoint,
whereas the background writer may write it several times
- autovacuum (launcher and worker processes)
- postgres uses a pseudo-deletion method
- if deleted or updated, a tuple is not removed from the physical storage of that table
- these obsolete (dead) tuples are only marked as deleted
PostgreSQL Forked Process(continued...)
- autovacuum workers reclaim the space consumed by dead tuples
- they also update the visibility map (_vm) and the free space map (_fsm)
- when run with ANALYZE, vacuum updates the pg_statistic catalog, which the query planner uses to choose
the most efficient execution plan
- stats collector
- collects and reports server activity
- counts accesses to tables and indexes, number of rows in a table, vacuum and analyze stats, etc.
- (removed in PostgreSQL 15, where these statistics are kept in shared memory instead)
PostgreSQL Forked Process(continued...)
- logical replication launcher
- doesn’t replicate byte by byte like physical (streaming) replication; it replicates logical row changes
- works on one database at a time and replicates only committed row changes; maintenance work such as VACUUM is not replicated
- works in a publisher-subscriber model
- unlike streaming replication, subscriber tables stay writable, so multi-master style setups are possible (conflicts must be handled manually)
- DDL is not replicated, so tables must be created manually on the subscriber
- columns are matched by name: their order does not matter, and the subscriber table may have extra columns
- before PostgreSQL 14, in-progress transactions could not be streamed, so a large transaction was only sent at commit, adding overhead
- server processes communicate with each other via semaphores and shared memory to ensure
data integrity
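The name-based column matching can be sketched as follows. This is an illustration of the rule, not the subscriber's apply code; the function name and columns are hypothetical, and `None` stands in for NULL or the column default on subscriber-only columns.

```python
from __future__ import annotations

from typing import Any, Optional


def map_published_row(row: dict[str, Any], subscriber_columns: list[str]) -> tuple[Optional[Any], ...]:
    """Arrange a published row in the subscriber table's column order, matching by name."""
    missing = sorted(set(row) - set(subscriber_columns))
    if missing:
        # Every published column must exist (by name) on the subscriber side.
        raise ValueError(f"subscriber table lacks column(s): {missing}")
    # Subscriber-only columns get None here, standing in for NULL / the column default.
    return tuple(row.get(column) for column in subscriber_columns)


if __name__ == "__main__":
    print(map_published_row({"id": 1, "name": "slonik"}, ["name", "note", "id"]))
```

Column order on the subscriber is irrelevant and extra columns are tolerated, but a published column with no same-named target is an error.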
PostgreSQL Memory Model
Memory Layout
Memory Layout(continued...)
- shared memory
- accessible to all server processes (backends and background processes)
- shared buffer, WAL buffer, CLog buffer etc
- local memory
- allocated and used by a specific process or subsystem
- vacuum buffer, temp buffer, work memory etc.
Memory Layout(continued...)
Shared Buffer
- cache of table and index pages; data is read and written here
- blocks modified here but not yet written back are called dirty blocks; once written to disk,
they are persisted in the data files
- shared memory
- can’t be resized without restarting the running postgres server instance
- config parameter:
- shared_buffers: 128MB by default
Memory Layout(continued...)
WAL Buffer
- separate buffer to keep transaction logs
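- WAL data is written to the WAL buffer before being flushed to the WAL files on disk
- shared memory
- by default 1/32 of shared_buffers (at least 64kB, at most one WAL segment, typically 16MB)
- config parameter:
- wal_buffers: -1 (automatic) by default, which gives 4MB with the default shared_buffers of 128MB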
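A sketch of how the default size is derived when wal_buffers is left at -1, following the documented rule (1/32 of shared_buffers, no less than 64kB, no more than one WAL segment). It assumes 8kB data and WAL pages and a 16MB WAL segment; the function name is hypothetical.

```python
KB = 1024
MB = 1024 * KB
XLOG_BLCKSZ = 8 * KB  # assumption: default 8kB pages for both shared buffers and WAL


def default_wal_buffers(shared_buffers: int, wal_segment_size: int = 16 * MB) -> int:
    """Bytes chosen when wal_buffers = -1 (sketch of the documented rule)."""
    pages = shared_buffers // XLOG_BLCKSZ // 32           # 1/32 of shared_buffers, in pages
    pages = min(pages, wal_segment_size // XLOG_BLCKSZ)  # at most one WAL segment
    pages = max(pages, 8)                                 # at least 64kB (8 pages)
    return pages * XLOG_BLCKSZ


if __name__ == "__main__":
    for sb in (1 * MB, 128 * MB, 1024 * MB):
        print(sb // MB, "MB shared_buffers ->", default_wal_buffers(sb) // KB, "kB wal_buffers")
```

With the default 128MB of shared_buffers this yields the 4MB quoted above; from 512MB of shared_buffers upward the value is capped at one 16MB segment.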
Memory Layout(continued...)
CLog Buffer
- contains transaction metadata
- keeps status of transactions
- can tell if a transaction is committed or not
- shared memory
Lock Space
- all locks are stored here
- shared memory
Memory Layout(continued...)
Vacuum Buffer
- local memory: used by autovacuum workers
- total size is up to autovacuum_work_mem times autovacuum_max_workers
- config parameter:
- autovacuum_max_workers: 3 by default
- autovacuum_work_mem: -1 by default, meaning maintenance_work_mem (64MB by default) is used
instead; explicitly set values are at least 1MB
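A back-of-the-envelope helper for the formula above, using the listed defaults. The function name is hypothetical, and the result is an upper bound rather than a typical figure.

```python
def autovacuum_memory_ceiling_mb(autovacuum_work_mem_mb: int = -1,
                                 autovacuum_max_workers: int = 3,
                                 maintenance_work_mem_mb: int = 64) -> int:
    """Worst-case vacuum buffer total: per-worker limit times number of workers."""
    if autovacuum_work_mem_mb == -1:
        per_worker = maintenance_work_mem_mb         # -1 means "use maintenance_work_mem"
    else:
        per_worker = max(autovacuum_work_mem_mb, 1)  # explicit values are at least 1MB
    return per_worker * autovacuum_max_workers


if __name__ == "__main__":
    print(autovacuum_memory_ceiling_mb())        # all defaults
    print(autovacuum_memory_ceiling_mb(256, 5))  # a tuned setup
```

With all defaults this gives 3 × 64MB = 192MB.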
Memory Layout(continued...)
Work Memory
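- local memory: used by the executor for query workspaces
- memory used by sort operations (e.g. ORDER BY, DISTINCT, merge joins) and hash operations
(e.g. hash joins, hash-based aggregation, hashed IN subqueries)
- each such operation may use up to work_mem, so a single query can use several times this amount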
- config parameter:
- work_mem: 4MB by default
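Because the limit applies per sort or hash operation, a rough worst case multiplies it by the number of such plan nodes and by the number of sessions. The node names below are as printed by EXPLAIN; the example plan and the helper name are hypothetical, and the sketch ignores parallel workers and hash_mem_multiplier (which, from PostgreSQL 13, lets hash nodes exceed work_mem).

```python
from __future__ import annotations

# Plan node types that each may use up to work_mem (simplified list).
MEMORY_NODES = {"Sort", "Hash", "HashAggregate"}


def work_mem_ceiling_mb(plan_nodes: list[str], work_mem_mb: int = 4, sessions: int = 1) -> int:
    """Rough worst case: every sort/hash node in every session uses its full work_mem."""
    memory_nodes = sum(1 for node in plan_nodes if node in MEMORY_NODES)
    return memory_nodes * work_mem_mb * sessions


if __name__ == "__main__":
    # A sorted hash join: the "Hash" child builds the hash table, "Sort" orders the output.
    plan = ["Sort", "Hash Join", "Seq Scan", "Hash", "Seq Scan"]
    print(work_mem_ceiling_mb(plan), work_mem_ceiling_mb(plan, sessions=100))
```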
Memory Layout(continued...)
Maintenance Memory
- local memory
- memory allocated for maintenance operations such as CREATE INDEX, VACUUM, REINDEX, or
ALTER TABLE ADD FOREIGN KEY
- config parameter:
- maintenance_work_mem: 64MB by default
Memory Layout(continued...)
Temp Buffer
- local memory: used by the executor
- session-local buffers used to cache temporary tables
- config parameter:
- temp_buffers: 8MB for each session by default
Life Of A Query In Postgres
Backend Flowchart
Query Execution Phases
- the client opens a connection to transmit a query to the server and to receive the results
- the parser stage checks the query for correct syntax and creates a query tree
- the traffic cop subsystem determines the query type
- utility queries (e.g. CREATE TABLE) are passed to the utilities subsystem
- the rewriter takes the query tree and looks for any rules (e.g. from views) to apply to it
- planner/optimizer takes the (rewritten) query tree and creates a query plan
- first creates all possible paths leading to the same result
- next the cost for the execution of each path is estimated
- finally the cheapest path is chosen
- executor recursively steps through the plan tree and retrieves rows in the way represented
by the plan
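The planner's "generate paths, estimate costs, keep the cheapest" loop can be sketched with two access paths. The constants are PostgreSQL's default cost settings (seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost), but both cost formulas are simplified stand-ins, not the planner's real cost functions; the index-scan formula in particular is a crude assumption.

```python
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages: int, rows: int) -> float:
    # Read every page sequentially and check the WHERE condition on every row.
    return pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)


def index_scan_cost(matching_rows: int) -> float:
    # Crude assumption: one random heap page fetch per matching row.
    return matching_rows * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)


def cheapest_path(pages: int, rows: int, matching_rows: int) -> str:
    # 1. enumerate candidate paths, 2. estimate each cost, 3. keep the cheapest.
    paths = {
        "Seq Scan": seq_scan_cost(pages, rows),
        "Index Scan": index_scan_cost(matching_rows),
    }
    return min(paths, key=paths.get)


if __name__ == "__main__":
    print(cheapest_path(pages=358, rows=10_000, matching_rows=10))
    print(cheapest_path(pages=358, rows=10_000, matching_rows=5_000))
```

A selective predicate favours the index path, while a predicate matching half the table makes the sequential scan cheaper.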
Where Does All That Data Go?
Logical Layout
Logical Layout(continued...)
database cluster
- collection of databases managed by a single running postgres server instance
- mainly resides in the data directory (e.g. $PGDATA - /usr/local/pgsql/data)
- multiple clusters, managed by different postgres instances, can exist on the same machine
- don’t confuse it with a cluster of physical database servers (nodes)
Logical Layout(continued...)
database object:
- a data structure used to store or reference data
- tablespaces, tables (heap), functions, views, indexes, etc. and even the database itself
- identified by an object identifier (OID), an unsigned 4-byte integer
- the OIDs and other metadata are stored in the system catalog (pg_catalog) schema
- for instance: when a new database or a new table is created, its metadata is stored in
the pg_catalog.pg_database table or the pg_catalog.pg_class table respectively, and so on
Physical Layout
Physical Layout(continued...)
Files and Directories
- pg_hba.conf
- pg_ident.conf
- PG_VERSION
- postgresql.auto.conf
- postgresql.conf
- postmaster.pid
- postmaster.opts
- base
- global
- pg_commit_ts
- pg_dynshmem
- pg_logical
- pg_stat
- pg_tblspc
- pg_wal
Physical Layout(continued...)
pg_hba.conf
- stands for host-based authentication
- created when initdb is run
- can be placed elsewhere (see the hba_file parameter); by default it is in the data directory
- configuration file that controls client authentication
Physical Layout(continued...)
pg_ident.conf:
- configuration file that controls postgres user name mapping
- used along with the pg_hba.conf file
- maps the operating-system user name of the connecting client (obtained from an external
authentication system such as ident or GSSAPI) to a postgres user
- can be placed elsewhere (see the ident_file parameter); by default it is in the data directory
PG_VERSION
- containing the major version number of PostgreSQL
Physical Layout(continued...)
postgresql.auto.conf
- settings changed with the `ALTER SYSTEM SET <confParameter> = <confValue>;` SQL command
are written here
- the entry is removed again by `ALTER SYSTEM RESET <confParameter>;`
postgresql.conf
- server configuration file
- can be located elsewhere as well (see the config_file parameter)
postmaster.opts:
- file containing command-line options used at server start time
Physical Layout(continued...)
postmaster.pid:
- records the following, each on a separate line:
- PID of the currently running postgres server (postmaster)
- path of the data directory
- server start timestamp (epoch time)
- server port number
- Unix-domain socket directory
- IP address or hostname from listen_addresses
- shared memory segment key and ID
- server status
- the file is normally absent when no server instance is running
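A small parser for the line layout listed above, as a scripting aid. The sample file contents are hypothetical (a real file lives at $PGDATA/postmaster.pid), and for simply checking whether a server is up, `pg_ctl status` is the supported route.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Line order as listed above (one value per line).
FIELDS = ["pid", "data_directory", "start_time", "port",
          "socket_directory", "listen_address", "shared_memory", "status"]


def parse_postmaster_pid(text: str) -> dict:
    lines = text.splitlines()
    info = {name: (lines[i].strip() if i < len(lines) else None)
            for i, name in enumerate(FIELDS)}
    info["pid"] = int(info["pid"])
    info["port"] = int(info["port"])
    info["start_time"] = datetime.fromtimestamp(int(info["start_time"]), tz=timezone.utc)
    if info["shared_memory"]:
        key, segment_id = info["shared_memory"].split()  # "<key> <segment id>"
        info["shared_memory"] = {"key": int(key), "id": int(segment_id)}
    return info


# Hypothetical example contents of a postmaster.pid file
SAMPLE = """12345
/usr/local/pgsql/data
1577836800
5432
/tmp
localhost
  5432001     65538
ready
"""

if __name__ == "__main__":
    print(parse_postmaster_pid(SAMPLE))
```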
Physical Layout - DB Cluster
Physical Layout - DB Cluster(continued...)
- the base directory contains the databases created in the pg_default tablespace, as subdirectories
named after the corresponding database OIDs
- tables or databases created in a different tablespace are stored under pg_tblspc instead, e.g. in the figure,
test_db_2 (OID 16412) was created with test_table_spcace (OID 16410) as its default tablespace
Physical Layout - Table Files
- when a table is created, a file named after the table's filenode is created
- max table size is 32TB (with the default 8KB page size)
- the file is divided into 1GB segments
- from the second segment on, files are named <filenode>.1, <filenode>.2
and so on
- usually the filenode is the same as the OID, but operations such as TRUNCATE, REINDEX, CLUSTER,
VACUUM FULL and some forms of ALTER TABLE assign the table a new filenode
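A sketch of how a block number maps onto a segment file and an offset inside it, assuming the default 8KB page and 1GB segment sizes; the filenode 16385 is a hypothetical example.

```python
from __future__ import annotations

BLOCK_SIZE = 8 * 1024                            # default page size
SEGMENT_SIZE = 1024 ** 3                         # default 1GB segment
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 pages per segment


def locate_block(filenode: int, block_number: int) -> tuple[str, int]:
    """Return (segment file name, byte offset inside it) for a table block."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    filename = str(filenode) if segment == 0 else f"{filenode}.{segment}"
    return filename, block_in_segment * BLOCK_SIZE


if __name__ == "__main__":
    for block in (0, 131071, 131072, 262145):
        print(block, locate_block(16385, block))
```

Block numbers are 32-bit, so 2^32 blocks of 8KB is roughly where the 32TB table limit comes from.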
Physical Layout - Table Files(continued…)
Figure: table page layout
Physical Layout - Table Files(continued…)
- each table segment contains many pages (8KB each)
- each page starts with a 24-byte page header followed by item pointers (4 bytes each); the actual
tuples (items) fill the page from its end backwards, just before the special space at the very end (empty for
ordinary tables); the gap between the item pointers and the tuples is the free space
- when a row is too large (over roughly 2KB by default), its large field values are compressed and/or moved into
an associated TOAST (The Oversized-Attribute Storage Technique) table, pg_toast.pg_toast_<table oid>, which has
its own data file
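The page arithmetic above can be checked with a short sketch. It assumes 8-byte alignment (typical on 64-bit platforms) and that `tuple_size` already includes the tuple header (23 bytes, padded to 24), so a row with a single integer column is 28 bytes; it ignores fillfactor. pd_lower and pd_upper are the page-header fields marking where the item pointers end and the tuples begin.

```python
PAGE_SIZE = 8192
PAGE_HEADER_SIZE = 24
LINE_POINTER_SIZE = 4
MAXALIGN = 8  # assumption: typical 64-bit platform alignment


def maxalign(n: int) -> int:
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def free_space(n_tuples: int, tuple_size: int, special_size: int = 0) -> int:
    """Bytes between the end of the item pointers (pd_lower) and the first tuple (pd_upper)."""
    pd_lower = PAGE_HEADER_SIZE + n_tuples * LINE_POINTER_SIZE
    pd_upper = PAGE_SIZE - special_size - n_tuples * maxalign(tuple_size)
    return pd_upper - pd_lower


def tuples_per_page(tuple_size: int, special_size: int = 0) -> int:
    per_tuple = LINE_POINTER_SIZE + maxalign(tuple_size)  # item pointer + aligned tuple
    return (PAGE_SIZE - PAGE_HEADER_SIZE - special_size) // per_tuple


if __name__ == "__main__":
    n = tuples_per_page(28)
    print(n, "tuples fit;", free_space(n, 28), "bytes left over")
```

For the single-integer row, 226 tuples fit in one page and 32 bytes remain, too few for another 4-byte pointer plus 32-byte tuple.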
Physical Layout - Table Files(continued...)
- a table may also have a <filenode>_fsm (Free Space Map) file and a <filenode>_vm (Visibility Map) file
- when updating a tuple, postgres doesn’t overwrite it; it creates a new version instead and
marks the old one as deleted
- when deleting a tuple, postgres uses a policy of pseudo-deletion: it just marks the
existing tuple as deleted
- later, vacuum finds the space used by dead tuples, recognises it as free space
and creates (if there's none) or updates the _fsm and _vm files
Physical Layout - Table Files(continued...)
- the _fsm file keeps track of free space that can be reused by new tuples
- the _vm file stores 2 bits per heap page
- the first (all-visible) bit is set only when every tuple on the page is visible to all transactions (no dead
tuples), so vacuum can skip the page and index-only scans can avoid fetching it
- the second (all-frozen) bit is set when every tuple on the page is frozen
- _vm bits are mainly set by vacuum, while any data-modifying operation on the page clears them
- index files don’t have any _vm file
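A toy bitmap with 2 bits per heap page (so 4 heap pages per byte), using bit 1 for all-visible and bit 2 for all-frozen in the same way as PostgreSQL's visibility map. The class is hypothetical and skips the real file's page headers and locking.

```python
from __future__ import annotations

ALL_VISIBLE = 0b01  # every tuple on the page is visible to all transactions
ALL_FROZEN = 0b10   # every tuple on the page is frozen
BITS_PER_PAGE = 2


class ToyVisibilityMap:
    """2 bits per heap page, i.e. 4 heap pages per byte."""

    def __init__(self, n_pages: int) -> None:
        self.bits = bytearray((n_pages * BITS_PER_PAGE + 7) // 8)

    def _locate(self, page: int) -> tuple[int, int]:
        bit = page * BITS_PER_PAGE
        return bit // 8, bit % 8  # (byte index, bit shift inside that byte)

    def set_flags(self, page: int, flags: int) -> None:
        # What vacuum does after checking the page.
        byte, shift = self._locate(page)
        self.bits[byte] |= flags << shift

    def clear(self, page: int) -> None:
        # What any INSERT/UPDATE/DELETE touching the page does.
        byte, shift = self._locate(page)
        self.bits[byte] &= ~(0b11 << shift) & 0xFF

    def flags(self, page: int) -> int:
        byte, shift = self._locate(page)
        return (self.bits[byte] >> shift) & 0b11


if __name__ == "__main__":
    vm = ToyVisibilityMap(10)
    vm.set_flags(5, ALL_VISIBLE | ALL_FROZEN)
    print(len(vm.bits), "bytes;", bin(vm.flags(5)))
```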
Physical Layout - Table Files(Example)
Physical Layout - Table Files(Example - head)
Physical Layout - Table Files(Example - tail)
Physical Layout - Tablespace
- a directory on some other storage where table files will be stored, linked through a symlink under pg_tblspc
- pg_global tablespace is used for shared system catalogs
- pg_default tablespace is the default tablespace of the template1 and template0 databases
- different tables of the same database can be kept in different tablespaces
- use cases:
- if you are running out of disk space, you can create a tablespace on a different disk and
move data to that location
- tablespaces for frequently accessed data can be placed on fast disks like SSDs, and less frequently
accessed data on slower disks like SATA HDDs
- temporary tables can be stored in a separate tablespace (temp_tablespaces parameter)
Sources Summarized:
- https://www.postgresql.org/docs/12/index.html
- https://developer.okta.com/blog/2019/07/19/mysql-vs-postgres
- https://en.wikipedia.org/wiki/PostgreSQL
- Learning PostgreSQL 11 (Third Edition), A beginner's guide to building high-performance
PostgreSQL database solutions, By Salahaldin Juba, Andrey Volkov
- https://www.izenda.com/relational-vs-non-relational-databases/
- https://medium.com/@zhenwu93/relational-vs-non-relational-databases-8336870da8bc
- https://www.ibm.com/cloud/blog/new-builders/brief-overview-database-landscape
- https://www.linuxjournal.com/content/postgresql-nosql-database
- https://stackoverflow.com/questions/33621906/difference-between-stream-replication-and-logical-replication
- https://www.postgresql.fastware.com/blog/back-to-basics-with-postgresql-memory-components
- https://severalnines.com/database-blog/architecture-and-tuning-memory-postgresql-databases
- http://www.interdb.jp/pg/pgsql02.html
- http://rachbelaid.com/introduction-to-postgres-physical-storage/
- https://www.postgresql.org/docs/current/storage-page-layout.html
- https://www.postgresql.org/docs/current/storage-file-layout.html
- http://www.interdb.jp/pg/pgsql01.html
- https://blog.dbi-services.com/using-operating-system-users-to-connect-to-postgresql/
- http://etutorials.org/SQL/Postgresql/Part+I+General+PostgreSQL+Use/Chapter+4.+Performance/How+PostgreSQL+Organizes+Data/
- https://www.postgresql.org/docs/12/limits.html
- https://pgdash.io/blog/tablespaces-postgres.html
- Chapter-3, Learning PostgreSQL 11, Third Edition, by Salahaldin Juba, Andrey Volkov
- https://www.postgresql.org/docs/12/manage-ag-tablespaces.html
- https://www.postgresql.org/docs/12/query-path.html
- https://www.postgresql.org/developer/backend/
- https://stackshare.io/postgresql
- https://www.postgresql.org/docs/12/history.html
1 de 67

Recomendados

PostgreSQL as an Alternative to MSSQL por
PostgreSQL as an Alternative to MSSQLPostgreSQL as an Alternative to MSSQL
PostgreSQL as an Alternative to MSSQLAlexei Krasner
947 vistas32 diapositivas
The Essential postgresql.conf por
The Essential postgresql.confThe Essential postgresql.conf
The Essential postgresql.confRobert Treat
7.7K vistas22 diapositivas
Postgres Vienna DB Meetup 2014 por
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Michael Renner
1.2K vistas41 diapositivas
NonStop SQL/MX DBS Explained por
NonStop SQL/MX DBS ExplainedNonStop SQL/MX DBS Explained
NonStop SQL/MX DBS ExplainedFrans Jongma
559 vistas22 diapositivas
Oracle11g notes por
Oracle11g notesOracle11g notes
Oracle11g notesManish Mudhliyar
381 vistas20 diapositivas
Gg steps por
Gg stepsGg steps
Gg stepsHari Prasath
498 vistas19 diapositivas

Más contenido relacionado

Similar a PostgreSQL Prologue

Snowflake SnowPro Certification Exam Cheat Sheet por
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetJeno Yamma
79.6K vistas7 diapositivas
Postgresql Database Administration Basic - Day1 por
Postgresql  Database Administration Basic  - Day1Postgresql  Database Administration Basic  - Day1
Postgresql Database Administration Basic - Day1PoguttuezhiniVP
118 vistas27 diapositivas
Pegasus - automate, recover, and debug scientific computations por
Pegasus - automate, recover, and debug scientific computationsPegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsRafael Ferreira da Silva
1.4K vistas34 diapositivas
Sql introduction por
Sql introductionSql introduction
Sql introductionvimal_guru
302 vistas35 diapositivas
Odoo command line interface por
Odoo command line interfaceOdoo command line interface
Odoo command line interfaceJalal Zahid
129 vistas7 diapositivas
Introduction to PostgreSQL for System Administrators por
Introduction to PostgreSQL for System AdministratorsIntroduction to PostgreSQL for System Administrators
Introduction to PostgreSQL for System AdministratorsJignesh Shah
4.4K vistas31 diapositivas

Similar a PostgreSQL Prologue(20)

Snowflake SnowPro Certification Exam Cheat Sheet por Jeno Yamma
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat Sheet
Jeno Yamma79.6K vistas
Postgresql Database Administration Basic - Day1 por PoguttuezhiniVP
Postgresql  Database Administration Basic  - Day1Postgresql  Database Administration Basic  - Day1
Postgresql Database Administration Basic - Day1
PoguttuezhiniVP118 vistas
Sql introduction por vimal_guru
Sql introductionSql introduction
Sql introduction
vimal_guru302 vistas
Odoo command line interface por Jalal Zahid
Odoo command line interfaceOdoo command line interface
Odoo command line interface
Jalal Zahid129 vistas
Introduction to PostgreSQL for System Administrators por Jignesh Shah
Introduction to PostgreSQL for System AdministratorsIntroduction to PostgreSQL for System Administrators
Introduction to PostgreSQL for System Administrators
Jignesh Shah4.4K vistas
Mastering PostgreSQL Administration por EDB
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
EDB7.2K vistas
data stage-material por Rajesh Kv
data stage-materialdata stage-material
data stage-material
Rajesh Kv7.2K vistas
Google Bigtable paper presentation por vanjakom
Google Bigtable paper presentationGoogle Bigtable paper presentation
Google Bigtable paper presentation
vanjakom2.7K vistas
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr... por Ashnikbiz
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
Ashnikbiz716 vistas
Java Developers, make the database work for you (NLJUG JFall 2010) por Lucas Jellema
Java Developers, make the database work for you (NLJUG JFall 2010)Java Developers, make the database work for you (NLJUG JFall 2010)
Java Developers, make the database work for you (NLJUG JFall 2010)
Lucas Jellema685 vistas
Oracle to Postgres Migration - part 2 por PgTraining
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2
PgTraining1.2K vistas
Oracle Database 12c "New features" por Anar Godjaev
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features"
Anar Godjaev2.2K vistas

Último

JioEngage_Presentation.pptx por
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptxadmin125455
8 vistas4 diapositivas
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx por
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptxanimuscrm
15 vistas19 diapositivas
Navigating container technology for enhanced security by Niklas Saari por
Navigating container technology for enhanced security by Niklas SaariNavigating container technology for enhanced security by Niklas Saari
Navigating container technology for enhanced security by Niklas SaariMetosin Oy
14 vistas34 diapositivas
Electronic AWB - Electronic Air Waybill por
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill Freightoscope
5 vistas1 diapositiva
FIMA 2023 Neo4j & FS - Entity Resolution.pptx por
FIMA 2023 Neo4j & FS - Entity Resolution.pptxFIMA 2023 Neo4j & FS - Entity Resolution.pptx
FIMA 2023 Neo4j & FS - Entity Resolution.pptxNeo4j
17 vistas26 diapositivas
Keep por
KeepKeep
KeepGeniusee
78 vistas10 diapositivas

Último(20)

JioEngage_Presentation.pptx por admin125455
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptx
admin1254558 vistas
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx por animuscrm
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
animuscrm15 vistas
Navigating container technology for enhanced security by Niklas Saari por Metosin Oy
Navigating container technology for enhanced security by Niklas SaariNavigating container technology for enhanced security by Niklas Saari
Navigating container technology for enhanced security by Niklas Saari
Metosin Oy14 vistas
Electronic AWB - Electronic Air Waybill por Freightoscope
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill
Freightoscope 5 vistas
FIMA 2023 Neo4j & FS - Entity Resolution.pptx por Neo4j
FIMA 2023 Neo4j & FS - Entity Resolution.pptxFIMA 2023 Neo4j & FS - Entity Resolution.pptx
FIMA 2023 Neo4j & FS - Entity Resolution.pptx
Neo4j17 vistas
ADDO_2022_CICID_Tom_Halpin.pdf por TomHalpin9
ADDO_2022_CICID_Tom_Halpin.pdfADDO_2022_CICID_Tom_Halpin.pdf
ADDO_2022_CICID_Tom_Halpin.pdf
TomHalpin95 vistas
Dapr Unleashed: Accelerating Microservice Development por Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski13 vistas
360 graden fabriek por info33492
360 graden fabriek360 graden fabriek
360 graden fabriek
info33492162 vistas
Quality Engineer: A Day in the Life por John Valentino
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the Life
John Valentino7 vistas
How Workforce Management Software Empowers SMEs | TraQSuite por TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuiteHow Workforce Management Software Empowers SMEs | TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuite
TraQSuite6 vistas
Top-5-production-devconMunich-2023-v2.pptx por Tier1 app
Top-5-production-devconMunich-2023-v2.pptxTop-5-production-devconMunich-2023-v2.pptx
Top-5-production-devconMunich-2023-v2.pptx
Tier1 app6 vistas
Airline Booking Software por SharmiMehta
Airline Booking SoftwareAirline Booking Software
Airline Booking Software
SharmiMehta9 vistas

PostgreSQL Prologue

  • 2. Stay If You Want To: Image Source: Himmelfarb et al 2002: 1526 (artist: G. Renee Guzlas). All rights reserved ©. Available via license: CC BY-NC 3.0 - have an intro to postgres - know basic components of postgres - have some idea on postgres workology - go through logical and physical layout of postgres
  • 3. What is PostgreSQL in the first place? - “PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department.” -- PostgreSQL Documentation, By postgresql.org. - “PostgreSQL (pronounced Post-Gres-Q-L), or postgres for short, is an open source object-relational-database management system.” -- Learning PostgreSQL 11 (Third Edition), A beginner's guide to building high-performance PostgreSQL database solutions, By Salahaldin Juba, Andrey Volkov.
  • 4. If You Are Wondering... - we heard about relational database, how does this differ from that? - the definitions claimed postgres to be an “Object Relational Database Management System”, what does this imply? - or, we know Object Oriented Principles, does PostgreSQL adapt OOP paradigms like an Object Oriented Language does?
  • 5. Detour - Database - in simplest words: - organized collection of valid data where new records can be added or an existing record can be accessed, modified or removed
  • 6. Detour - DataBase Management System - can be seen as gatekeeper of database, basically an interface that: - offers and controls access to database to read, update or remove data from database - ensures integrity by imposing given constraints - ensures concurrency and transactions - enables remote access to database - ensures data recovery in case of any kind of failure
  • 7. Detour - Relational DBMS - group of related data can be stored in a tabular form considering: - each property as a column in that table, attribute is more common term - every single instance having those properties is a row or tuple - relation between the properties of that set is also known as schema - relating two different schema using some common attribute is possible, eg: foreign key - use when: - you know your data model right, structured data - data pattern is fixed - all of your entities has fixed attribute and it’s not gonna change - you need immediate acid compliant transaction
  • 8. Detour - Object Relational DBMS - object, classes, inheritance etc paradigms of OOP are supported in schema, relation, even in queries - supports custom data types and nested data types like oop - even functions or operators can be overloaded to facilitate polymorphism
  • 9. Meet “SLONIK” Source: Daniel Lundin - https://wiki.postgresql.org/images/a/a4/PostgreSQL_logo.3colors.svg
  • 10. PostgreSQL - Evolution - evolved from Ingres project of University of California, Berkeley led by Michael Stonebraker - that’s why sometimes termed as Post-Ingres
  • 12. Why Use PostgreSQL? - can support both relational and non-relational data types - extensive data read/write speed - multi-versioning concurrency control - parallel query execution using multiple cores - non-blocking indexing - partial indexing available(skipping deleted tuples)
  • 16. Postgres Built-in Applications - ships with a number of client and server applications - uses server/client model - client and server can reside in different hosts and communicate via TCP/IP or linux socket - can handle multiple concurrent connection from a client - each connection to a client forks a new process
  • 17. Postgres Client Applications - frontend application that requests some database action - psql: - offers interactive terminal to write queries and get response from postgres - queries can be added from file or as command line arguments(cla) as well - pg_config - can tell different configured parameter for the installed version - pgbench - runs benchmark by executing a number of dummy transactions from a number of dummy clients
  • 18. Postgres Client Applications(continued...) - clusterdb - re-clusters previously clustered tables in the specified databases - createdb - creates a new database, - nothing but a wrapper of CREATE DATABASE command - dropdb - removes the specified database - nothing but a wrapper of DROP DATABASE command
  • 19. Postgres Client Applications(continued...) - createuser - creates a new user - just a wrapper of CREATE ROLE command - dropuser - removes a new user - just a wrapper of DROP ROLE command - vacuumdb(garbage collector and optionally analyzer) - cleans dead tuples from all(or specified) tables of a database user has permission to vacuum or generates statistics about the database - a wrapper of VACUUM command - full list here
  • 20. Postgres Server Applications(continued...) - backend application - postgres - accepts connection from client applications - resolves client requests - manages database files - initdb - creates a new pg cluster
  • 21. Postgres Server Applications(continued...) - pg_ctl - initializing, starting, stopping, controlling and etc - pg_upgrade - upgrading a postgres server instance - pg_waldump - generates human readable wal logs - full list here
  • 24. PostgreSQL Forked Process(continued...) - follows process per user method - one client process gets connected to exactly one server process - the master(postmaster) process spawns a new backend server process each time a new connection is requested
  • 26. PostgreSQL Forked Process(continued...) - master process forks other background process at start-up - walwriter - manages Write Ahead Log - any change to data files(table or index) are logged first into wal buffer - ensures data integrity - in case of system crash, roll-forward(or REDO) is done using the log records - checkpointer - keeps a checkpoint in the wal sequence - flushes data files to disk from the last checkpoint reflecting the log
• 27. PostgreSQL Forked Process(continued...)
- background writer:
  - writes dirty (new or modified) shared buffers to disk ahead of time, so that other server processes rarely have to write them themselves
  - may increase the overall I/O load significantly: a repeatedly modified page that would otherwise be written only once per checkpoint may be written several times by the background writer
- autovacuum (launcher and vacuum workers):
  - postgres uses a pseudo-deletion method: a deleted or updated tuple is not removed from the table's physical storage
  - these obsolete (dead) tuples are only marked as deleted
• 28. PostgreSQL Forked Process(continued...)
- vacuum reclaims the space occupied by dead tuples
- it also updates the visibility map (_vm)
- when run with ANALYZE, it also updates the pg_statistic catalog, which the query planner uses to choose the most efficient execution plan
- stats collector:
  - collects and reports server activity
  - counts accesses to tables and indexes, the number of rows in each table, vacuum and analyze statistics, etc.
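Pseudo-deletion and vacuum can be illustrated with a toy heap in Python. Each tuple version carries the ID of the transaction that created it (`xmin`) and the one that deleted or replaced it (`xmax`), loosely mirroring PostgreSQL's tuple header fields. The sketch simplifies visibility: it assumes every deleting transaction has committed and treats a version as reclaimable once its `xmax` is older than the oldest transaction still running. The class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TupleVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that deleted/replaced it, if any

class ToyHeap:
    """Toy heap table: updates and deletes never overwrite data in place."""

    def __init__(self) -> None:
        self.slots: list[Optional[TupleVersion]] = []

    def insert(self, value: str, xid: int) -> int:
        self.slots.append(TupleVersion(value, xmin=xid))
        return len(self.slots) - 1

    def delete(self, slot: int, xid: int) -> None:
        self.slots[slot].xmax = xid  # pseudo-deletion: only mark the tuple

    def update(self, slot: int, value: str, xid: int) -> int:
        self.delete(slot, xid)          # the old version becomes dead ...
        return self.insert(value, xid)  # ... and a new version is stored elsewhere

    def vacuum(self, oldest_active_xid: int) -> list[int]:
        """Reclaim versions deleted by transactions older than every running one."""
        freed = []
        for i, tup in enumerate(self.slots):
            if tup is not None and tup.xmax is not None and tup.xmax < oldest_active_xid:
                self.slots[i] = None  # slot is reusable now (an FSM would record this)
                freed.append(i)
        return freed
```

While transaction 101 (the updater) may still be running, `vacuum(oldest_active_xid=101)` frees nothing; once every running transaction is newer, `vacuum(oldest_active_xid=102)` reclaims the dead version.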
• 29. PostgreSQL Forked Process(continued...)
- logical replication launcher:
  - unlike physical (streaming) replication, doesn't replicate byte by byte
  - replicates one database at a time, and only committed row changes (not, e.g., vacuum activity)
  - works on a publisher-subscriber model
  - unlike streaming replication, the subscriber stays writable and can publish its own changes, so multi-master-like setups are possible
  - DDL is not replicated, so tables must be created manually on the subscriber side
  - columns are matched by name: column order may differ, and the subscriber table may have extra columns
  - changes are sent only after a transaction commits (they can't be streamed as they happen), which can add overhead for big transactions
- server processes communicate with each other via semaphores and shared memory to ensure data integrity
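Matching columns by name rather than by position can be sketched as follows. The function is hypothetical and works on plain Python values; the real apply worker receives decoded changes over the replication protocol. It reorders a published row into the subscriber's column order, fills the subscriber's extra columns from defaults (or `None`, i.e. NULL), and rejects rows whose columns don't exist on the subscriber.

```python
from typing import Any

def apply_row_change(published_row: dict[str, Any],
                     subscriber_columns: list[str],
                     defaults: dict[str, Any]) -> tuple:
    """Map a replicated row onto the subscriber table's own column order.

    Columns are matched by name, not position: the subscriber may order its
    columns differently and may have extra ones (filled from `defaults`, else
    None), but every published column must exist on the subscriber.
    """
    missing = set(published_row) - set(subscriber_columns)
    if missing:
        raise ValueError(f"subscriber table lacks columns: {sorted(missing)}")
    return tuple(published_row.get(col, defaults.get(col)) for col in subscriber_columns)
```

For example, the published row `{"id": 1, "name": "alice"}` applied to a subscriber table with columns `name, id, note` (where `note` defaults to an empty string) becomes `("alice", 1, "")`.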
• 32. Memory Layout(continued...)
- shared memory:
  - accessible by all server processes (backends and background processes)
  - shared buffers, WAL buffers, CLOG buffers, etc.
- local memory:
  - allocated and used privately by a single backend process
  - vacuum buffers, temp buffers, work memory, etc.
• 33. Memory Layout(continued...)
Shared Buffer
- where data pages are read and modified in memory
- pages that were modified here but not yet written back to disk are called dirty pages (dirty buffers); writing them out updates the data files on disk
- shared memory
- can't be resized without restarting the running postgres server instance
- config parameter:
  - shared_buffers: 128MB by default
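• 34. Memory Layout(continued...)
WAL Buffer
- separate buffer that holds transaction log (WAL) records
- WAL data is written to the WAL buffer first, then to the WAL files on disk
- shared memory
- by default sized at 1/32 of shared_buffers (at least 64kB and at most one WAL segment, typically 16MB)
- config parameter:
  - wal_buffers: -1 (automatic) by default, which gives 4MB with the default 128MB of shared_buffers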
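The automatic wal_buffers sizing is simple arithmetic, sketched here in Python. It assumes the default 8kB page size and 16MB WAL segment size; the function name is hypothetical.

```python
BLOCK_SIZE = 8 * 1024            # default page size (8kB)
WAL_SEGMENT_SIZE = 16 * 1024**2  # default WAL segment size (16MB)

def auto_wal_buffers(shared_buffers_bytes: int) -> int:
    """Bytes chosen when wal_buffers = -1: 1/32 of shared_buffers,
    capped at one WAL segment and never below 8 pages (64kB)."""
    pages = (shared_buffers_bytes // BLOCK_SIZE) // 32
    pages = min(pages, WAL_SEGMENT_SIZE // BLOCK_SIZE)  # at most one WAL segment
    pages = max(pages, 8)                               # at least 64kB
    return pages * BLOCK_SIZE
```

With the default `shared_buffers` of 128MB this yields 4MB; raising `shared_buffers` to 1GB would give 32MB, which the cap trims to 16MB.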
• 35. Memory Layout(continued...)
CLog Buffer
- holds commit log (CLOG) pages, i.e. transaction metadata
- keeps the status of each transaction
- can tell whether a transaction is in progress, committed, or aborted
- shared memory
Lock Space
- stores the locks used by all server processes
- shared memory
• 36. Memory Layout(continued...)
Vacuum Buffer
- local memory, used by each autovacuum worker
- total size can reach autovacuum_work_mem times autovacuum_max_workers
- config parameters:
  - autovacuum_max_workers: 3 by default
  - autovacuum_work_mem: -1 by default, meaning maintenance_work_mem (64MB by default) is used instead; explicitly set values are at least 1MB
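The worst-case autovacuum memory budget follows directly from these settings. This Python sketch expresses everything in kB (as the memory parameters are stored); the function name is hypothetical, and the result is an upper bound, since a worker only allocates what its table actually needs.

```python
KB_PER_MB = 1024  # memory settings below are expressed in kB

def autovacuum_memory_kb(autovacuum_work_mem_kb: int = -1,
                         maintenance_work_mem_kb: int = 64 * KB_PER_MB,
                         autovacuum_max_workers: int = 3) -> int:
    """Upper bound on memory all autovacuum workers together may use."""
    if autovacuum_work_mem_kb == -1:
        per_worker = maintenance_work_mem_kb  # -1 falls back to maintenance_work_mem
    else:
        per_worker = max(autovacuum_work_mem_kb, 1 * KB_PER_MB)  # clamped to >= 1MB
    return per_worker * autovacuum_max_workers
```

With all defaults this is 3 × 64MB = 192MB.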
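• 37. Memory Layout(continued...)
Work Memory
- local memory, used by the executor for query workspaces
- memory a single sort operation (e.g. ORDER BY, DISTINCT, merge joins) or hash operation (e.g. hash joins, hash-based aggregation, hash-based processing of IN subqueries) may use before spilling to temporary disk files
- a complex query may run several such operations at once, each allowed up to work_mem
- config parameter:
  - work_mem: 4MB by default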
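What happens when a sort exceeds work_mem can be sketched as a toy external merge sort in Python. For simplicity the budget is counted in rows rather than bytes, and in-memory lists stand in for the temporary files; PostgreSQL's real tuplesort is considerably more elaborate. When the plan is run with EXPLAIN ANALYZE, the spilling case shows up as "Sort Method: external merge".

```python
import heapq
from typing import Iterable, Iterator

def sort_with_work_mem(rows: Iterable[int], work_mem_rows: int) -> Iterator[int]:
    """Toy external sort: sort in memory if the input fits the budget, otherwise
    spill sorted runs (lists standing in for temp files) and merge them."""
    if work_mem_rows < 1:
        raise ValueError("work_mem_rows must be positive")
    runs, current = [], []
    for row in rows:
        if len(current) == work_mem_rows:  # budget full: spill a sorted run
            runs.append(sorted(current))
            current = []
        current.append(row)
    if not runs:                  # everything fit: plain in-memory sort
        return iter(sorted(current))
    runs.append(sorted(current))  # last, partially filled run
    return heapq.merge(*runs)     # k-way merge of the spilled runs
```

Whether the input fits or not, the output order is the same; only the amount of memory used and the (simulated) disk traffic differ.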
• 38. Memory Layout(continued...)
Maintenance Memory
- local memory
- memory allocated for maintenance operations such as CREATE INDEX, VACUUM, REINDEX, or ALTER TABLE ADD FOREIGN KEY
- config parameter:
  - maintenance_work_mem: 64MB by default
• 39. Memory Layout(continued...)
Temp Buffer
- local memory, used by the executor
- caches the pages of temporary tables
- config parameter:
  - temp_buffers: 8MB per session by default
  • 40. Life Of A Query In Postgres
• 42. Query Execution Phases
- connection: the client opens a connection to send a query to the server and receive the results
- parser: checks the query for correct syntax and creates a query tree
- traffic cop: determines the query type; utility queries are passed to the utilities subsystem
- rewriter: looks for rules (stored in the system catalogs) to apply to the query tree
- planner/optimizer: turns the (rewritten) query tree into a query plan
  - first creates all possible paths leading to the same result
  - next estimates the execution cost of each path
  - finally chooses the cheapest path
- executor: recursively steps through the plan tree and retrieves rows in the way the plan describes
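The planner's "enumerate paths, estimate costs, pick the cheapest" step can be sketched in Python for a single-table `WHERE col = x` query. The cost constants are PostgreSQL's documented defaults, and the sequential-scan estimate follows the same shape as the real one (pages × seq_page_cost plus per-row CPU cost). The index-scan formula, however, is a deliberately simplified assumption (one random page fetch per matching row); the real estimate also accounts for caching, correlation, and index pages.

```python
from dataclasses import dataclass

# Default planner cost constants.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025

@dataclass
class Path:
    name: str
    cost: float

def candidate_paths(pages: int, rows: int, selectivity: float) -> list[Path]:
    """Enumerate the access paths for `SELECT ... WHERE col = x` on one table."""
    matching = rows * selectivity
    # Sequential scan: read every page and evaluate the filter on every row.
    seq = pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)
    # Index scan (simplified assumption): one random heap page fetch per match.
    idx = matching * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)
    return [Path("Seq Scan", seq), Path("Index Scan", idx)]

def choose_plan(pages: int, rows: int, selectivity: float) -> Path:
    """Keep the cheapest of all candidate paths."""
    return min(candidate_paths(pages, rows, selectivity), key=lambda p: p.cost)
```

For a 1000-page, 100000-row table, a highly selective predicate (0.01% of rows) favours the index scan, whereas one matching half the rows makes the sequential scan cheaper.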
• 43. Where Does All That Data Go?
• 45. Logical Layout(continued...)
database cluster
- collection of databases managed by a single running postgres server instance
- resides mainly in the data area (e.g. $PGDATA, such as /usr/local/pgsql/data)
- multiple clusters, each managed by its own postgres instance, can exist on the same machine
- not to be confused with a cluster of physical database servers or nodes
• 46. Logical Layout(continued...)
database object:
- a data structure used to store or reference data
- tablespaces, tables (heaps), functions, views, indexes, etc., and even databases themselves
- identified by an object identifier (OID), an unsigned 4-byte integer
- objects and their OIDs are recorded in the system catalogs (pg_catalog schema)
- for instance, when a new database or a new table is created, its metadata is stored in the pg_catalog.pg_database table or the pg_catalog.pg_class table, respectively, and so on
• 48. Physical Layout(continued...)
Files:
- pg_hba.conf
- pg_ident.conf
- PG_VERSION
- postgresql.auto.conf
- postgresql.conf
- postmaster.pid
- postmaster.opts
Directories:
- base
- global
- pg_commit_ts
- pg_dynshmem
- pg_logical
- pg_stat
- pg_tblspc
- pg_wal
• 49. Physical Layout(continued...)
pg_hba.conf
- stands for host-based authentication
- created when initdb is run
- default location is the data area, but it can be placed elsewhere
- configuration file that controls client authentication
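pg_hba.conf records are examined in order, and the first record matching the connection type, database, and user decides the authentication method; if none matches, the connection is rejected. The Python sketch below models only that first-match rule. It is a simplification: it ignores client addresses, groups, regular expressions, and options, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HbaRule:
    conn_type: str  # "local" (Unix socket) or "host" (TCP/IP)
    database: str   # database name or "all"
    user: str       # role name or "all"
    method: str     # e.g. "trust", "peer", "scram-sha-256", "reject"

def pick_auth_method(rules: list[HbaRule], conn_type: str,
                     database: str, user: str) -> Optional[str]:
    """Return the method of the FIRST matching record; None means rejected."""
    for rule in rules:
        if (rule.conn_type == conn_type
                and rule.database in ("all", database)
                and rule.user in ("all", user)):
            return rule.method
    return None
```

Because matching stops at the first hit, more specific records have to come before broad catch-all ones.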
• 50. Physical Layout(continued...)
pg_ident.conf:
- configuration file that controls postgres user name mapping
- used together with the pg_hba.conf file
- maps the system user name of the client trying to connect (obtained from an external authentication system such as ident or GSSAPI) to a postgres user
- default location is the data area, but it can be placed elsewhere
PG_VERSION
- contains the major version number of PostgreSQL
• 51. Physical Layout(continued...)
postgresql.auto.conf
- system configuration changes made with the `ALTER SYSTEM SET <confParameter> = <confValue>;` SQL command are written here and override postgresql.conf
- the entry is removed again when the parameter is reset (ALTER SYSTEM RESET)
postgresql.conf
- the main server configuration file
- can be placed elsewhere as well
postmaster.opts:
- file recording the command-line options the server was last started with
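Because postgresql.auto.conf is read after postgresql.conf, a value set with ALTER SYSTEM wins. The Python sketch below shows that "last one read wins" merge. Its parser is a deliberate simplification handling only `name = value` lines and `#` comments (it would mishandle a `#` inside a quoted value, for instance); the function names and sample contents are hypothetical.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse simple `name = value` lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip("'")
    return settings

def effective_settings(postgresql_conf: str, auto_conf: str) -> dict[str, str]:
    """postgresql.auto.conf is read after postgresql.conf, so its values win."""
    merged = parse_conf(postgresql_conf)
    merged.update(parse_conf(auto_conf))
    return merged
```

For example, if postgresql.conf sets `work_mem = 4MB` and `ALTER SYSTEM SET work_mem = '64MB';` has added `work_mem = '64MB'` to postgresql.auto.conf, the effective value after a reload is 64MB.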
• 52. Physical Layout(continued...)
postmaster.pid:
- keeps track of the following, each on a separate line:
  - PID of the currently running postgres server instance
  - path of the data area
  - server start timestamp (epoch time)
  - server port number
  - Unix-domain socket directory path
  - IP address or hostname from listen_addresses
  - shared memory segment ID
  - server status
- the file is absent if no server instance is running
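Since postmaster.pid stores one field per line in a fixed order, reading it amounts to zipping lines with field names. The Python sketch below uses the order listed above; the field names and the sample file contents are made up for illustration (the shared memory line actually carries two numbers, so it is kept as a raw string).

```python
PID_FILE_FIELDS = [
    "pid", "data_directory", "start_time_epoch", "port",
    "socket_directory", "listen_address", "shared_memory", "status",
]

def parse_postmaster_pid(text: str) -> dict[str, str]:
    """Map each line of postmaster.pid to the field it holds (one field per line)."""
    lines = text.splitlines()
    return {name: value.strip() for name, value in zip(PID_FILE_FIELDS, lines)}

# Hypothetical file contents, for illustration only.
SAMPLE = (
    "12345\n/usr/local/pgsql/data\n1577836800\n5432\n"
    "/tmp\nlocalhost\n  5432001     32768\nready   \n"
)

if __name__ == "__main__":
    info = parse_postmaster_pid(SAMPLE)
    print(info["pid"], info["port"], info["status"])  # 12345 5432 ready
```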
  • 53. Physical Layout - DB Cluster
• 54. Physical Layout - DB Cluster(continued...)
- the base directory contains one subdirectory per database created in the pg_default tablespace, named after the corresponding database OID
- tables or databases created in a different tablespace are stored under pg_tblspc/<tablespace oid> instead, like test_db_2 (OID 16412) here, which was created with test_table_spcace (OID 16410) as its default tablespace
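The directory rule can be written as a small path-building function. In a live database, `SELECT pg_relation_filepath('some_table');` returns the real path; the Python helper below is a hypothetical sketch of the same layout. For non-default tablespaces an extra version directory of the form `PG_<major version>_<catalog version>` sits between the tablespace OID and the database OID; the value used in the usage note is a placeholder, not a real catalog version.

```python
import posixpath
from typing import Optional

def relation_path(db_oid: int, relfilenode: int,
                  tablespace_oid: Optional[int] = None,
                  version_dir: Optional[str] = None) -> str:
    """Path of a relation's main data file, relative to the data directory.

    pg_default tablespace: base/<database oid>/<filenode>
    other tablespaces:     pg_tblspc/<tablespace oid>/<version dir>/<database oid>/<filenode>
    """
    if tablespace_oid is None:
        return posixpath.join("base", str(db_oid), str(relfilenode))
    if version_dir is None:
        raise ValueError("version_dir is required for non-default tablespaces")
    return posixpath.join("pg_tblspc", str(tablespace_oid), version_dir,
                          str(db_oid), str(relfilenode))
```

For instance, a (hypothetical) table with filenode 16413 in test_db_2 would live at `pg_tblspc/16410/PG_12_CATVER/16412/16413`, with `PG_12_CATVER` standing in for the real version directory name.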
• 55. Physical Layout - Table Files
- when a table is created, a file named after the table's filenode is created
- max table size is 32TB (with the default 8KB page size)
- the file is divided into 1GB segments by default
- segment files after the first one are named <filenode>.1, <filenode>.2, and so on
- the filenode usually equals the table's OID, but TRUNCATE, REINDEX, CLUSTER, VACUUM FULL, and some forms of ALTER TABLE assign the table a new filenode
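Locating a page on disk is integer arithmetic over block numbers. This Python sketch assumes the defaults of 8KB pages and 1GB segments; the function name is hypothetical. The 32TB limit follows from the same numbers: block numbers are 32-bit, and 2^32 blocks × 8KB = 2^45 bytes = 32TB.

```python
BLOCK_SIZE = 8 * 1024                             # 8KB pages
SEGMENT_SIZE = 1024 ** 3                          # 1GB segment files
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE   # 131072 pages per segment
MAX_TABLE_BYTES = 2 ** 32 * BLOCK_SIZE            # 32-bit block numbers -> 32TB

def segment_file(filenode: int, block_number: int) -> tuple[str, int]:
    """Return (segment file name, byte offset inside it) for a table block."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    name = str(filenode) if segment == 0 else f"{filenode}.{segment}"
    return name, block_in_segment * BLOCK_SIZE
```

Block 131073 of a table with filenode 16385, for example, is the second page of the second segment: file `16385.1`, offset 8192.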
  • 56. Physical Layout - Table Files(continued…) Figure: table page layout(src)
• 57. Physical Layout - Table Files(continued…)
- each table segment contains many 8KB pages
- each page starts with a page header (24 bytes) followed by item pointers (4 bytes each); the actual tuples (items) are stored at the end of the page, growing backward from the special space at the very end; the space between the item pointers and the tuples is the free space
- a tuple can't span pages: when a row is too large (by default, over about 2KB), its large field values are compressed and/or moved out of line into a separate TOAST (The Oversized-Attribute Storage Technique) table, pg_toast.pg_toast_<table oid>, which has its own data file
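The page layout arithmetic can be checked with a short Python sketch. The names `lower` and `upper` mirror the page header fields pd_lower and pd_upper, which mark the end of the item pointer array and the start of the tuple area. The sketch ignores tuple alignment padding, so real free space is slightly smaller; the function name is hypothetical.

```python
PAGE_SIZE = 8192       # default page size
PAGE_HEADER_SIZE = 24  # page header
ITEM_ID_SIZE = 4       # one item pointer per tuple

def free_space(tuple_sizes: list[int], special_size: int = 0) -> int:
    """Bytes left between the item pointers (growing forward after the header)
    and the tuples (growing backward from the special space at the page end)."""
    lower = PAGE_HEADER_SIZE + ITEM_ID_SIZE * len(tuple_sizes)  # like pd_lower
    upper = PAGE_SIZE - special_size - sum(tuple_sizes)         # like pd_upper
    return upper - lower
```

An empty heap page has 8192 − 24 = 8168 bytes free; after two 100-byte tuples, 7960 bytes remain (200 bytes of tuple data plus two 4-byte item pointers).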
• 58. Physical Layout - Table Files(continued...)
- a table may also have an _fsm (Free Space Map) file and a _vm (Visibility Map) file
- when updating a tuple, postgres doesn't overwrite it; it creates a new version instead and marks the old one as deleted
- when deleting a tuple, postgres uses pseudo-deletion: it only marks the existing tuple as deleted
- the vacuum worker later finds the space used by such dead tuples, recognises it as free space, and creates (if there's none) or updates the _fsm and _vm files
• 59. Physical Layout - Table Files(continued...)
- the _fsm file keeps track of free space that can be reused by other tuples
- the _vm file stores 2 bits per table page:
  - the first (all-visible) bit is set only when every tuple on the page is visible to all transactions, i.e. the page has no dead tuples, so the next vacuum can skip it and index-only scans need not visit it
  - the second (all-frozen) bit is set when every tuple on the page is frozen
- _vm bits can only be set by vacuum, although any data-modifying operation on the page clears them
- index files don't have a _vm file
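Two bits per page means one byte of the visibility map covers four table pages. The Python sketch below models only that bit packing; the real _vm file is itself organized in pages with headers, and the class name is hypothetical.

```python
ALL_VISIBLE = 0b01  # every tuple on the page is visible to all transactions
ALL_FROZEN = 0b10   # every tuple on the page is frozen

class ToyVisibilityMap:
    """Two bits per table page, packed four pages to a byte."""

    def __init__(self, n_pages: int) -> None:
        self.bits = bytearray((n_pages + 3) // 4)

    @staticmethod
    def _locate(page: int) -> tuple[int, int]:
        return page // 4, (page % 4) * 2  # (byte index, bit shift)

    def get(self, page: int) -> int:
        byte, shift = self._locate(page)
        return (self.bits[byte] >> shift) & 0b11

    def set(self, page: int, flags: int) -> None:
        """Only vacuum sets bits."""
        byte, shift = self._locate(page)
        self.bits[byte] |= flags << shift

    def clear(self, page: int) -> None:
        """Any data-modifying operation on the page clears both bits."""
        byte, shift = self._locate(page)
        self.bits[byte] &= ~(0b11 << shift) & 0xFF
```

A 10-page table needs only 3 bytes of map; setting both bits for page 5 leaves its neighbours untouched, and a later change to page 5 clears them again.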
  • 60. Physical Layout - Table Files(Example)
  • 61. Physical Layout - Table Files(Example - head)
  • 62. Physical Layout - Table Files(Example - tail)
• 63. Physical Layout - Tablespace
- a symbolic link (in pg_tblspc) to some other storage location where table files will be stored
- the pg_global tablespace is used for shared system catalogs
- the pg_default tablespace is the default tablespace of the template1 and template0 databases
- different tables of the same database can be kept in different tablespaces
- use cases:
  - if you are running out of disk space, you can create a tablespace on a different disk and move data there
  - the tablespace for frequently accessed data can be placed on fast disks such as SSDs, and rarely accessed data can be stored on slower disks such as HDDs
  - temporary tables can be stored in a separate tablespace
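The symlink mechanism can be mimicked with temporary directories in Python (POSIX symlinks assumed). This is a simplified model of what CREATE TABLESPACE arranges: the real command also validates the location and creates a version subdirectory inside it. The function name is hypothetical; 16410 reuses the tablespace OID from the earlier example.

```python
import os
import tempfile

def create_toy_tablespace(data_dir: str, location: str, tablespace_oid: int) -> str:
    """The data lives in `location`; the cluster only keeps a symbolic link
    pg_tblspc/<tablespace oid> pointing at it."""
    os.makedirs(os.path.join(data_dir, "pg_tblspc"), exist_ok=True)
    link = os.path.join(data_dir, "pg_tblspc", str(tablespace_oid))
    os.symlink(location, link)
    return link

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir, \
         tempfile.TemporaryDirectory() as fast_disk:
        link = create_toy_tablespace(data_dir, fast_disk, 16410)
        print(os.path.islink(link), os.path.realpath(link) == os.path.realpath(fast_disk))
```

Because only a link lives inside the data directory, moving a tablespace to another disk means moving its files and repointing the link while the server is stopped.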
• 64. Summary Attempt of:
- https://www.postgresql.org/docs/12/index.html
- https://developer.okta.com/blog/2019/07/19/mysql-vs-postgres
- https://en.wikipedia.org/wiki/PostgreSQL
- Learning PostgreSQL 11 (Third Edition), A beginner's guide to building high-performance PostgreSQL database solutions, By Salahaldin Juba, Andrey Volkov
- https://www.izenda.com/relational-vs-non-relational-databases/
- https://medium.com/@zhenwu93/relational-vs-non-relational-databases-8336870da8bc
- https://www.ibm.com/cloud/blog/new-builders/brief-overview-database-landscape
• 65. Summary Attempt of:
- https://www.linuxjournal.com/content/postgresql-nosql-database
- https://stackoverflow.com/questions/33621906/difference-between-stream-replication-and-logical-replication
- https://www.postgresql.fastware.com/blog/back-to-basics-with-postgresql-memory-components
- https://severalnines.com/database-blog/architecture-and-tuning-memory-postgresql-databases
- http://www.interdb.jp/pg/pgsql02.html
• 66. Summary Attempt of:
- http://rachbelaid.com/introduction-to-postgres-physical-storage/
- https://www.postgresql.org/docs/current/storage-page-layout.html
- https://www.postgresql.org/docs/current/storage-file-layout.html
- http://www.interdb.jp/pg/pgsql01.html
- https://blog.dbi-services.com/using-operating-system-users-to-connect-to-postgresql/
- http://etutorials.org/SQL/Postgresql/Part+I+General+PostgreSQL+Use/Chapter+4.+Performance/How+PostgreSQL+Organizes+Data/
- https://www.postgresql.org/docs/12/limits.html
• 67. Summary Attempt of:
- https://pgdash.io/blog/tablespaces-postgres.html
- Chapter-3, Learning PostgreSQL 11, Third Edition, by Salahaldin Juba, Andrey Volkov
- https://www.postgresql.org/docs/12/manage-ag-tablespaces.html
- https://www.postgresql.org/docs/12/query-path.html
- https://www.postgresql.org/developer/backend/
- https://stackshare.io/postgresql
- https://www.postgresql.org/docs/12/history.html