3. Things to look at when choosing a database
blah blah blah
• Consistency, availability, and partition tolerance (CAP)
• Robustness and reliability
• Scalability
• Performance and speed
• Operational and querying capabilities
• Database integrity and constraints
• Database security
• Database vendor/system funding, stability, community, and level of establishment
• Talent pool and availability of relevant skills
• The type and structure of data being stored, and the ideal method of modeling the data
• In other words – IT DEPENDS
• And don’t let any database employee tell you otherwise
4. The SQL you use…
■ Is from 1992
– At least that’s a step up from SQL-89
– Until MSVC 2015, C was stuck on C89
■ The first SQL standard was created in 1986
– COBOL, FORTRAN, Pascal and PL/I
– I now feel old
■ Was driven by large companies
– See if you can name some
5. We are ANSI – and we like standards!
“Second, because there was no outside “buyer” to shape the content of the Core level of SQL99, it was enlarged to such an extent that to implement it all is close to impossible for all vendors except for two or three. In short, the size of core is a natural barrier to practical product development.”
Michael Gorman
Secretary of the ANSI Database Languages Committee
http://tdan.com/is-sql-a-real-standard-anymore/4923
SQL-92
SQL-99
SQL-2003
SQL-2011
6. Reporting Reporting Reporting
■ Almost all the features we’ll discuss are most useful for reporting
■ Some are syntactic sugar for (or in some cases run faster than) traditional SQL
■ MySQL doesn’t have any of this (sorry folks) – maybe WITH in 8.0
– https://dveeden.github.io/modern-sql-in-mysql/
– Go here and +1 all the feature requests!
■ PostgreSQL has it all, plus some other goodies
– But use a new version, I recommend 9.5
9. OLAP
■ Online analytical processing
■ Processes multi-dimensional analytical (MDA) queries swiftly
■ Consolidation (roll-up)
– aggregation of data that can be accumulated and computed in one or more
dimensions
■ Drill-down
– navigate through the details
■ Slicing and dicing
– take out (slicing) a specific set of data and view (dicing) the slices from different
dimensions
10. GROUPING SETS, ROLLUP, CUBE
■ GROUPING SETS ( ( e1, e2, e3, ... ), ( e1, e2 ), ( ) )
– lets you choose exactly which sets of columns you group by
■ ROLLUP ( e1, e2, e3, ... )
– is shorthand for the given list of expressions plus all prefixes of the list, including the empty list; useful for hierarchical data
■ CUBE ( e1, e2, ... )
– is shorthand for the given list and all of its possible subsets
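To see what the shorthand expands to, here is a minimal sketch that emulates ROLLUP (region, city) as the UNION ALL of GROUP BYs it stands for. SQLite (via Python) is used only as a stand-in because it lacks native GROUPING SETS; the `sales` table and its data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, city TEXT, amount INT);
INSERT INTO sales VALUES
  ('east', 'nyc', 10), ('east', 'boston', 20), ('west', 'la', 30);
""")

# ROLLUP (region, city) = GROUP BY (region, city), (region), and ():
# every prefix of the list, down to the grand total.
rows = conn.execute("""
SELECT region, city, SUM(amount) FROM sales GROUP BY region, city
UNION ALL
SELECT region, NULL, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales
""").fetchall()
for r in rows:
    print(r)  # NULLs mark the rolled-up levels
```

In a database with native support you would just write `GROUP BY ROLLUP (region, city)` and get the same six rows from one scan.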
14. Why do we care?
■ Simplify queries
■ Perform fewer queries, get more data out of the same query
■ Group information on multiple (and complex) dimensions
15. Support
■ MySQL (and MariaDB) have GROUP BY …WITH ROLLUP
– That’s it, just rollup
– And it’s kind of broken syntax too compared to other DBs
■ PostgreSQL was late to the party (9.5) but implemented ALLTHETHINGS
■ SQL Server, Oracle, DB2 have had this stuff for ages (plus a bunch of proprietary olap
features in addition!)
■ SQLite IS missing this stuff (bad SQLite, bad!)
18. WITH – organize complex queries
■ WITH query_name (column_name1, ...) AS
(SELECT ...)
SELECT ...
■ A way to organize queries
■ Also called common table expression (CTE) and sub-query factoring
■ Makes highly complex queries sane to read and understand
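A minimal sketch of the shape above, run against SQLite as a stand-in (any SQL:1999 database works); the `orders` table and its columns are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, total INT);
INSERT INTO orders VALUES ('ann', 50), ('ann', 70), ('bob', 20);
""")

# The CTE names an intermediate result (and its columns), so the
# outer query reads top-down instead of inside-out.
big_spenders = conn.execute("""
WITH customer_totals (customer, spent) AS (
    SELECT customer, SUM(total) FROM orders GROUP BY customer
)
SELECT customer FROM customer_totals WHERE spent > 100
""").fetchall()
print(big_spenders)  # only ann, who spent 120
```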
20. Why do we care?
■ If it’s easier to read it’s easier to maintain
■ Assign column names to tables
■ Hide real tables for testing – a WITH query of the same name shadows the table
21. WITH RECURSIVE
■ The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL. Using RECURSIVE, a WITH query can refer to its own output
■ WAT?
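The classic first example: generate the integers 1 through 5 with no table at all. The query literally reads its own output; SQLite is used here as a stand-in for any SQL:1999 database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
nums = conn.execute("""
WITH RECURSIVE t(n) AS (
    SELECT 1             -- non-recursive seed row
    UNION ALL
    SELECT n + 1 FROM t  -- recursive step: reads t's own output
    WHERE n < 5          -- termination condition
)
SELECT n FROM t
""").fetchall()
print([n for (n,) in nums])  # [1, 2, 3, 4, 5]
```

Swap the seed for a root row and the step for a parent/child join and you can walk trees and graphs – the thing plain SQL famously couldn’t do.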
27. LATERAL – joining by foreach
■ a LATERAL join is like a SQL foreach loop, in which the db will iterate over each row in
a result set and evaluate a subquery using that row as a parameter.
■ A lateral join can reference other tables in the query!
■ Generally lateral joins are faster (the optimizer gets to have fun)
28. Lateral join example
■ https://gist.github.com/auroraeosrose/73fb8d0779ef4c0251754f38eea228de
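To show the “foreach” shape without the gist: SQLite (used as a stand-in here) has no LATERAL keyword, but a correlated subquery demonstrates the same per-row semantics – for each row of `dept`, the inner query runs with that row’s value as a parameter. In PostgreSQL you would write the inner query as `CROSS JOIN LATERAL (SELECT ... WHERE e.dept = d.name ...)`. Table and column names below are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (name TEXT);
CREATE TABLE emp (dept TEXT, salary INT);
INSERT INTO dept VALUES ('eng'), ('sales');
INSERT INTO emp VALUES ('eng', 100), ('eng', 120), ('sales', 90);
""")

# "foreach d in dept: run the subquery with d.name as a parameter"
top = conn.execute("""
SELECT d.name,
       (SELECT MAX(e.salary) FROM emp e WHERE e.dept = d.name) AS top_salary
FROM dept d
""").fetchall()
print(top)
```

The LATERAL form generalizes this: the lateral subquery can return multiple rows and columns per outer row, not just one scalar.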
36. TO THE VM!
■ Which is faster – case or filter?
■ Why is this not the #1 request everywhere
– Look, we’ll give you a keyword that optimizes the crap out of your query and you
can do more with just 1 query
39. Window Functions
■ Define which rows are visible at each row
■ OVER() makes all rows visible at each row
■ OVER(PARTITION BY) segregates like a GROUP BY
■ OVER(ORDER BY … BETWEEN) segregates using < >
40. As a query writer I want to
■ Merge rows that have the same things
– GROUP BY
– DISTINCT
■ Aggregate data from related rows
– Requires a GROUP BY
– Uses aggregate functions
■ BUT AT THE SAME TIME
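That “at the same time” is exactly what OVER(PARTITION BY) buys you: the per-group aggregate without collapsing the rows. A sketch using SQLite (window functions need SQLite ≥ 3.25) with made-up employee data:

```python
import sqlite3  # window functions need SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (name TEXT, dept TEXT, salary INT);
INSERT INTO emp VALUES
  ('ann', 'eng', 100), ('bob', 'eng', 120), ('cid', 'sales', 90);
""")

# GROUP BY would collapse each dept to one row; the window version
# keeps every row AND carries the group average alongside it.
rows = conn.execute("""
SELECT name, salary,
       AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM emp
""").fetchall()
for r in rows:
    print(r)  # each row keeps its identity plus its dept average
```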
42. TO THE VM!
■ You can do a LOT more with Windowing
■ You can page
■ You can do ranges and between
■ You can window more than once
■ I could do a whole talk on windowing!
44. Windowing Functions
row_number() – number of the current row within its partition, counting from 1
rank() – rank of the current row, with gaps
dense_rank() – rank of the current row, without gaps
percent_rank() – relative rank of the current row: (rank - 1) / (total rows - 1)
cume_dist() – relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
ntile(num_buckets integer) – integer ranging from 1 to the argument value, dividing the partition as equally as possible
lag(value anyelement [, offset integer]) – returns value evaluated at the row that is offset rows (default 1) before the current row within the partition
lead(value anyelement [, offset integer]) – returns value evaluated at the row that is offset rows (default 1) after the current row within the partition
first_value(value any) – returns value evaluated at the first row of the window frame
last_value(value any) – returns value evaluated at the last row of the window frame
nth_value(value any, nth integer) – returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row
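The difference between the three ranking functions only shows up on ties. A sketch against SQLite (≥ 3.25 for window functions) with made-up scores of 100, 90, 90, 80:

```python
import sqlite3  # window functions need SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (player TEXT, pts INT);
INSERT INTO scores VALUES ('a', 100), ('b', 90), ('c', 90), ('d', 80);
""")

# row_number just counts; rank skips a number after the tie (..., 2, 2, 4);
# dense_rank does not (..., 2, 2, 3).
rows = conn.execute("""
SELECT player,
       row_number() OVER (ORDER BY pts DESC),
       rank()       OVER (ORDER BY pts DESC),
       dense_rank() OVER (ORDER BY pts DESC)
FROM scores
""").fetchall()
print(rows)
```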
49. Json types in Postgresql
json
■ String internal representation
■ http://rfc7159.net/rfc7159
– previously supported http://www.ietf.org/rfc/rfc4627.txt
■ Stores exact text, reparsed on each
execution
jsonb
■ Binary internal representation
■ Can have indexes on stuff inside
■ Has “shadow types”
■ http://rfc7159.net/rfc7159
■ De-duplicates and decomposes to
binary format
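PostgreSQL reaches inside these values with operators like `->` and `->>`; as a runnable stand-in, SQLite’s json1 functions do the same job with `json_extract()` (assumes a SQLite build with JSON support, which CPython’s bundled library normally has). The document and keys below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
doc = '{"name": "ada", "tags": ["db", "sql"], "address": {"city": "nyc"}}'

# json_extract with a path plays the role of PostgreSQL's
# doc->>'name', doc->'tags'->>0, and doc#>>'{address,city}'.
name, first_tag, city = conn.execute(
    "SELECT json_extract(?, '$.name'),"
    "       json_extract(?, '$.tags[0]'),"
    "       json_extract(?, '$.address.city')",
    (doc, doc, doc),
).fetchone()
print(name, first_tag, city)  # ada db nyc
```

The jsonb advantage in PostgreSQL is that these lookups can be backed by GIN indexes instead of reparsing the text every time.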
So why did I start this talk
I like SQL and was “apprenticed” to someone who was determined to teach me how to do SQL right
I had done a lot of MySQL in my day and had learned some very bad habits
I was rather amazed at all the stuff I could do with proper SQL!
We could talk a bunch on why you’d choose an RDBMS over NoSQL
But that’s not what this talk is about
This is about: you already have SQL, and you want to use it well
If you really want to see this in action, go argue with Derick
Look, lots of things go into choosing a database
Most of us in web development look for fast and cheap – that might be nice, but you might be missing out on features you need
Also I’m rather disappointed in a lot of open source databases
Interestingly enough Microsoft didn’t get into the game until 1998 – they partnered with Sybase, then eventually did a rewrite to make it NT-happy (that’s why the old PHP mssql extension also gave you Sybase calls)
IBM, Oracle and later Microsoft have been the big players – Sybase, SAP and others are also involved
The SQL standard is huge. More than 4,000 pages in its SQL:2011 incarnation. No single implementation can ever implement all features. Even in the early releases, such as SQL-92, the SQL standard defined different conformance levels so that vendors could claim conformance to a subset of the standard.
Starting with SQL:1999 all features are enumerated and flagged either mandatory or optional. As a bare minimum, conforming systems must comply with all mandatory features, which are collectively called “Core SQL”. Besides entry-level SQL-92 features, Core SQL:1999 also requires some features previously only required for intermediate or full level, as well as a few new features.
Beyond Core SQL, vendors can claim conformance on a feature-by-feature basis.
The sad part is I don’t even care about the REALLY new shiny (ahem, 5 year old) stuff, I just want 1999 support!!
In fact nothing here I’m showing you is beyond the spec for 1999
This is describing in fancy terms every reporting interface ever
I’ve done this – many, many, many times in one form or another (or even multiple queries to get it, or other evil)
Grouping sets let you do all kinds of cool complex stuff!
Guess what happens when you add a () to the end?
That, that is what I want to write
Simple, succinct
DOES THE SAME THING!!
Does anyone have any idea what this is doing? I do not!
Seriously – PostgreSQL and SQLite support this but MySQL doesn’t yet… it MIGHT in 8.0, maybe
Oooh – now it makes some sense
Caveat lector – PostgreSQL treats WITH statements as an optimization fence! This is because you can UPDATE and DELETE inside a WITH (a PostgreSQL extension)
So beware of CTEs, depending on what you want for performance!
Perl joke
How to play along if you’re so inclined: the default PostgreSQL in Ubuntu 16.04 IS 9.5, which is what you should be on
Not only for features and speed, but also because a couple of nasty jsonb and security bugs were fixed
The term "loose indexscan" is used in some other databases for the operation of using a btree index to retrieve the distinct values of a column efficiently; rather than scanning all equal values of a key, as soon as a new value is found, restart the search by looking for a larger value. This is much faster when the index has many equal keys.
Postgres does not support loose indexscans natively, but they can be emulated using a recursive CTE as follows:
Yes, you could probably rewrite this as a subquery – but it is generally going to be faster this way, especially with large amounts of data
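A minimal sketch of the emulation described, runnable against SQLite (used as a stand-in; PostgreSQL works the same way). Instead of scanning every row, each recursive step jumps straight to the next distinct value; the `logs` table is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (level TEXT);
CREATE INDEX logs_level ON logs (level);
INSERT INTO logs VALUES
  ('info'), ('info'), ('warn'), ('info'), ('error'), ('warn');
""")

# Seed with the smallest key, then each step asks the index for the
# smallest key strictly greater than the last one found ("jump ahead"
# past all the equal keys instead of scanning them).
levels = conn.execute("""
WITH RECURSIVE t(v) AS (
    SELECT MIN(level) FROM logs
    UNION ALL
    SELECT (SELECT MIN(level) FROM logs WHERE level > t.v)
    FROM t WHERE t.v IS NOT NULL
)
SELECT v FROM t WHERE v IS NOT NULL
""").fetchall()
print([v for (v,) in levels])  # ['error', 'info', 'warn']
```

With millions of rows but only a handful of distinct values, this does a few index probes instead of one full scan.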
Once again mysql is out in the cold
Who remembers what was happening in 2003? You do realize this stuff was standardized 13 years ago? No more complaining about browser stuff huh?
This is something I absolutely love, filter is amaaaazing
Also I lied – while case is a 1999 feature, filter is a 2003 feature (bite me)
The next cool thing is 2003 too, the future is coming!
With the exception of subqueries and window functions, the <condition> may contain any expression that is allowed in regular WHERE clauses
The biggest win here is that it’s simply FASTER in PostgreSQL to use FILTER – the query planner is more clever with it than with a traditional CASE statement, which can be slow
Use it for pivot tables, and for grabbing EAV data easily
But remember, as we discussed earlier there CAN be a performance penalty for queries with CASE – FILTER has better query optimizations
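A sketch of the pivot idiom: one pass over the table, one column per status. SQLite (≥ 3.30 for FILTER on aggregates) stands in for PostgreSQL here, and the `tickets` table is made up; the older spelling would be `SUM(CASE WHEN status = 'open' THEN 1 ELSE 0 END)`.

```python
import sqlite3  # FILTER on aggregates needs SQLite >= 3.30

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets (status TEXT);
INSERT INTO tickets VALUES ('open'), ('open'), ('closed'), ('open');
""")

# Each aggregate sees only the rows its FILTER lets through,
# so one query produces one column per category.
open_n, closed_n = conn.execute("""
SELECT COUNT(*) FILTER (WHERE status = 'open'),
       COUNT(*) FILTER (WHERE status = 'closed')
FROM tickets
""").fetchone()
print(open_n, closed_n)  # 3 1
```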
The whole idea behind window functions is to allow you to process several values of the result set at a time: you see through the window some peer rows and are able to compute a single output value from them, much like when using an aggregate function.
So basically we’re going to chop up the result set so we can do multiple things at a time
Using the same DDL as the lateral examples, you can see we can get our average salary AND our individual salary at the same time!
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
A window function call always contains an OVER clause directly following the window function's name and argument(s). This is what syntactically distinguishes it from a regular function or aggregate function. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.
Currently the biggest trend seems to be json
This amuses me a lot because I remember the xml push
You could argue json is a better format … but there are a BUNCH of xml standards for dbs, did you know that?
SQL Server, Oracle, and DB2 have all sorts of fancy features for XML usage
But now they’re all moving toward JSON
jsonb has “shadow types” unknown to the core SQL parser.
Operator – Description
-> (int) – Get JSON array element (indexed from zero, negative integers count from the end)
-> (text) – Get JSON object field by key
->> (int) – Get JSON array element as text
->> (text) – Get JSON object field as text
#> – Get JSON object at specified path
#>> – Get JSON object at specified path as text
@> – Does the left JSON value contain the right JSON path/value entries at the top level?
<@ – Are the left JSON path/value entries contained at the top level within the right JSON value?
? – Does the string exist as a top-level key within the JSON value?
?| – Do any of these array strings exist as top-level keys?
?& – Do all of these array strings exist as top-level keys?
|| – Concatenate two jsonb values into a new jsonb value
- (text) – Delete key/value pair or string element from left operand. Key/value pairs are matched based on their key value.
- (integer) – Delete the array element with specified index (negative integers count from the end). Throws an error if top level container is not an array.
#- – Delete the field or element with specified path (for JSON arrays, negative integers count from the end)
Note: The || operator concatenates the elements at the top level of each of its operands. It does not operate recursively. For example, if both operands are objects with a common key field name, the value of the field in the result will just be the value from the right hand operand
If you want to learn about any of these terms, just Google them – they’re fairly well documented; SQL Server actually has some fantastic documentation on all of their stuff
PostgreSQL’s coverage lives more in people’s blogs
There is SOOO much more you can do, from hooking objects to hooking the engine!