PostgreSQL has become the most popular RDBMS in the Ruby ecosystem in the last decade. It has a great set of built-in features, including a variety of versatile data types, both common and very specific.
But when we load data from the database to our application code, we're working with Ruby data types: classes from the standard library, Rails, or other gems. So while they can seem to be the same as their PostgreSQL counterparts, they are not absolutely identical, and sometimes that could lead to surprising behavior.
In this talk, I would like to explore the power of data types in PostgreSQL and Ruby and how to work with them properly to use both Ruby and PostgreSQL on 100% of their power!
4. Martian Open Source
Yabeda:Ruby application
instrumentation framework
Lefthook:git hooks manager AnyCable:Polyglot replacement for
ActionCable server
PostCSS:A tool for transforming
CSS with JavaScript
Imgproxy:Fast and secure
standalone server for resizing and
converting remote images
Logux:Client-server communication
framework based on Optimistic UI,
CRDT, and log
Overmind:Process manager for
Procfile-based applications and
tmux
Even more at
evilmartians.com/oss
7. Example of subtle yet important difference
Are they same?
…
Left: road signconforming Vienna ConventiononRoad Signs and Signals
Right: Japanese road signper “Order onRoad Sign, Road Line, and Road Surface Marking”
8. Example of subtle yet important difference
Are they same?
NO!
Both require drivers to give a way…
European sign doesn’t have a requirement to stop or even slow down!
9. Example of subtle yet important difference
Are they same?
YES!
Stop signs around the world are mostly octagonal.
Japanese signs have “STOP” word in English after 2017, but are still pretty rare.
13. Integers
Integer
Variable length
integer
2, 4, 8 bytes signed
In ActiveModel there is validation for
databases:
Name Size Range
smallint 2 -32768 to +32767
integer 4 -2147483648 to +2147483647
bigint 8 -9223372036854775808 to
+9223372036854775807
See bignum.c in Ruby sources. See Numeric types docs
1.size # => 8 (bytes)
(256**8 - 1).size # => 8 (bytes)
(256**8).size # => 9 (bytes)
(256**40 - 1).size # => 40 (bytes)
Test.create(value: 2147483648)
# ActiveModel::RangeError: 2147483648 is out of rang
# for ActiveModel::Type::Integer with limit 4 bytes
` `
` `
` `
INSERT INTO "tests" ("value") VALUES (2147483648);
-- ERROR: integer out of range
14.
15. Oh no, I have integer primary keys! What to do?
0. Don’t panic!
1. Use pghero or Heroku pg:diagnose to detect problematic primary keys.
2. Migrate to bigint or uuid if needed (use triggers, Luke!)
3. In case of emergency, remember that all integers are signed!
You always have 2 more billions of values on the dark negative side!
Example of pkey migration from int to bigint: engineering.silverfin.com/pg-zero-downtime-bigint-migration
` ` ` `
ALTER SEQUENCE tablename_id_seq MINVALUE -2147483647 RESTART WITH -2147483647;
pghero
16. Floating point numbers
Float
8 bytes (double-precision)
real — 4 bytes
double — 8 bytes
See Ruby docs for Float More fun at 0.30000000000000004.com!
IEEE 754
0.1 + 0.2 # => 0.30000000000000004
Float::MAX # => 1.7976931348623157e+308
Float::MAX + '1e+308'.to_f # => Infinity
# BUT!
Float::MAX + '0.1'.to_f
# => 1.7976931348623157e+308 🤔
Float::MAX == (Float::MAX + '0.1'.to_f)
# => true 🤯
Float::NAN == Float::NAN # => false
SELECT 0.1::float + 0.2::float; -- 0.300000000000000
SELECT 0.1 + 0.2; -- 0.3 (but it is NOT float!)
SELECT '1.7976931348623157e+308'::float + '1e+308'::
--- ERROR: value out of range: overflow
SELECT '+inf'::double precision; -- Infinity 🤷
SELECT '1.7976931348623157e+308'::float =
('1.7976931348623157e+308'::float + '0.1'::float);
-- true ¯_(ツ)_/¯
SELECT 'NaN'::float = 'NaN'::float; -- true 🤯
0.30000000000000004.com
18. Arbitrary precision numbers
BigDecimal
Variable length
numeric
Variable length
See Ruby docs for BigDecimal Use numeric to store money!
BigDecimal("0.1") + BigDecimal("0.2") # => 0.3e0
BigDecimal("NaN") == BigDecimal("NaN") # => false
BigDecimal("1.0") / BigDecimal("0.0") #=> Infinity
# To match PostgreSQL behavior:
BigDecimal.mode(BigDecimal::EXCEPTION_OVERFLOW, tru
BigDecimal("1.0") / BigDecimal("0.0")
# Computation results in 'Infinity' (FloatDomainEr
BigDecimal("0.1") + 0.2.to_d == 0.30000000000000004
# true 🤔
SELECT 0.1 + 0.2; -- 0.3 which is decimal
SELECT 'NaN'::decimal = 'NaN'::decimal; -- true
SELECT '1.0'::decimal / '0.0'::decimal;
-- ERROR: division by zero
SELECT (0.1 + 0.2) = (0.1::float + 0.2::float);
-- false
` `
19. But there is money type, isn’t it?
BigDecimal
Variable size
money
8 byte fixed-precision number.
ActiveRecord has to parse textual representation, see
connection_adapters/postgresql/oid/money.rb
Also see issue № 31457 for lots of pain.
Both output and acceptable input format depends on
session-level lc_monetary setting!
Precision is defined by lc_monetary at database creation
time and can’t be changed!
# If the database locale setting isn't `en_US`:
# Creation may fail:
Product.create!(price: 100500.42)
# ERROR: invalid input syntax for type money: "1005
# Or it can succeed, but won't be able to be parsed
Product.last.price # => 0.0
-- on production:
SELECT 100500.42::money;
-- $100,500.42
-- on dev machine:
SELECT 100500.42::money;
-- ¥ 100,500
-- 🤯 But it should be dollars, and where are my cen
` `
` `
20. Strings and texts, lyrics and prose
String
Variable size
varchar,text
variable size, max 1 GB
Read the docs: String Read the docs: 8.3. Character Types
"こんにちは地球人!".encoding
# => #<Encoding:UTF-8>
"xe3x2ex2e".encoding
# => #<Encoding:UTF-8>
"xe3x2ex2e".valid_encoding?
# => false
"これx00やばい!".valid_encoding?
# => true
SELECT 'こんにちは地球人!';
-- こんにちは地球人!
SELECT E'xe3x2ex2e');
-- ERROR: invalid byte sequence for encoding "UTF8
SELECT E'これx00やばい!';
-- ERROR: invalid byte sequence for encoding "UTF8
21. So many string types!
1. Don’t use char(n)
It is always size of n and stores unnecessary spaces at right. Mostly for
compatibility with older applications.
2. varchar and text are effectively the same, choose whatever you like.
string in migrations is varchar in PostgreSQL.
💡Did you know?
SimpleForm gem will render multi-line HTML <textarea> tag for text type and
single-line <input type="text"> for character varying.
Convention over configuration!
See SimpleForm gem README: github.com/heartcombo/simple_form
` `
` `
` ` ` `
` ` ` `
` `
` `
23. Binary data
String
Variable size
bytea
Variable size, max 1 GB
Memory and network traffic consumption: 📈 See Binary Data Types page in the docs.
data = File.binread(“meme.png”)
# => "x89PNGrnx1A…"
data.encoding # => #<Encoding:ASCII-8BIT>
data.bytesize # => 46534
Test.last.blob
# => "x89PNGrnx1A…"
Test.last.blob_before_type_cast.bytesize
# => 46534
Test.last.blob_before_type_cast
# => "x89504e470d0a1a0a"
Test.last.blob_before_type_cast.bytesize
# => 93070
SELECT 'x89504e470d0a1a0a…'::bytea;
# Note hexadecimal format ↑
24. What if 1 GB isn’t enough?
You can’t store more in a table column (hard limit)
But you can store up 4 TB in large objects table!
And there is a gem for that:
active_storage-postgresql
Beware performance implications of TOAST →
More about it in PG docs: 70.2. TOAST
Andrey Novikov
@Envek · Follow
— PostgreSQL, did you update all these megabyte-
long JSON I gave you?
— Yes
— What did it cost?
— Everything… all the IO on the server.
Let’s talk about how PostgreSQL stores JSONB
(yep, it will be a thread about TOAST)
10:47 PM · May 18, 2022
Read the full conversation on Twitter
359 Reply Copy link
Read 4 replies
25. Dates
Date date
4 bytes
Internally stores number of days since
year 4713 BC up to infinity.
The Julianday number is inelapsed days since noon
(GreenwichMeanTime) onJanuary 1, 4713 BCE(in
the Juliancalendar).
The day count is virtually the astronomical Julian
day number.
Internally stores number of days since
year 4713 BC up to year 5874897 AD.
Inthe JulianDate system, eachday has a sequential
number, starting from JD0 (whichis sometimes
called the JulianDate). JD0 corresponds to 1
January 4713 BC inthe Juliancalendar.
See the docs for Date class. See B.7. Julian Dates in PostgreSQL docs.
26. Time and timezones
Time
AS::TimeWithZone
Two UNIX timestamps inside and tzdata also
timestamp
timestamptz
8 bytes, microsecond precision
Ruby on Rails uses UTC timezone internally. Use timestamp with time zone whenever possible!
Time.now # => 2022-10-22 13:42:42 +0900
Time.current # => Sat, 22 Oct 2022 04:42:42 UTC +00:
Time.current.time_zone
# => #<ActiveSupport::TimeZone name="UTC", @tzinfo=
Time.use_zone("Asia/Tokyo") { Time.current }
# => Sat, 22 Oct 2020 13:42:42 JST +09:00
CREATE TABLE tests (t1 timestamp, t2 timestamptz);
SET SESSION timezone TO 'Etc/UTC';
INSERT INTO tests (t1, t2) VALUES (now(), now());
SET SESSION timezone TO 'Asia/Tokyo';
INSERT INTO tests (t1, t2) VALUES (now(), now());
SET SESSION timezone TO 'Europe/Lisbon';
INSERT INTO tests (t1, t2) VALUES (now(), now());
SET SESSION timezone TO 'Asia/Tokyo';
SELECT * FROM tests;
t1 | t2
---------------------+-------------------------
2022-10-22 04:42:42 │ 2022-10-02 13:42:42+09 │
2022-10-22 13:42:42 │ 2022-10-02 13:42:42+09 │
2022-10-22 05:42:42 │ 2022-10-02 13:42:42+09 │
` `
27. How to not mess up with timezones
1. Use timezone-aware methods
Use Time.current and Date.current instead of Time.now and Date.today
2. Convert timestamps to user time zone
3. Don’t use dates in SQL, use timestamps
More tips here: thoughtbot.com/blog/its-about-time-zones
` ` ` ` ` ` ` `
Time.use_zone(user.timezone) do
# Do SQL queries, render views, …
end
# or
Time.current.in_time_zone(user.timezone)
- Posts.where(published_at: Date.today...Date.tomorrow)
+ Posts.where(published_at: Time.current.beginning_of_day..Time.current.end_of_day)
28. Time zones are hard
見る
The Problem with Time & Timezones - Computerphile
The Problem with Time & Timezones - Computerphile
共有
共有
29. Void and uncertainity
NilClass NULL
nil == nil
# => true ¯_(ツ)_/¯
SELECT NULL = NULL; -- NULL 🚨
SELECT NULL IS NULL; -- true
SELECT NULL IS DISTINCT FROM NULL; -- false
SELECT 'Ruby' = NULL; -- NULL 🚨
SELECT 'Ruby' IS NULL; -- false
SELECT 'Ruby' IS DISTINCT FROM NULL; -- true
31. JSON
Hash,Array json,jsonb
Variable length, up to 1GB
Be careful with symbols as keys
Define as_json method on your classes
to serialize them to JSON automatically.
Behavior of JSON.dump and to_json in
Rails is different!
JSON saves value as is (it is just a string)
JSONB is effective but strict: no
duplicate keys, doesn’t preserve
whitespaces, etc…
Inside: string (no null-bytes!), numeric, …
{ "foo" => "bar", foo: "baz" }.to_json
# {"foo":"baz"}
` `
` ` ` `
SELECT '{"foo": "bar","foo":"baz"}'::json;
-- {"foo": "bar","foo":"baz"}
SELECT '{"foo": "bar","foo":"baz"}'::jsonb;
-- {"foo": "baz"}
32. JSON on steroids
Use store_model gem to make powerful value objects from JSON fields.
But don’t overuse!
There is performance penalty for serialization and deserialization.
store_model gem repo
33. Ranges
Range intrange,tsrange, …
intmultirange, …
Additional methods in the facets gem. https://www.postgresql.org/docs/14/rangetypes.html
5..7 or 5...8
Time.current..1.day.from_now
# endless or beginless ranges
Time.current..
..Date.yesterday
nil.. or Range.new(nil, nil)
# Beginning is always included if possible :-(
Test.pluck("intrange(1, 5, '()')").first # 2...5
Test.pluck("tstzrange(now(), now() + '1 hour', '()
# ArgumentError: The Ruby Range object does not
# support excluding the beginning of a Range.
SELECT int8range(5, 7, '[]'); -- [5,8]
SELECT int8range(5, 8); -- [5,8)
SELECT tstzrange(now(), now() + '1 day', '()');
-- ["2022-10-22 14:42:42+09","2022-10-23 14:42:42+09
-- endless or beginless ranges
SELECT tstzrange(now(), NULL);
SELECT tstzrange(NULL, NULL);
-- PG 14: Multiranges and operators
SELECT nummultirange(numrange(1, 20))
- nummultirange(numrange(4, 6));
-- {[1,4),[6,20)}
-- and many more… (exclusion constraints!)
34. UUID
String
36 bytes
uuid
16 bytes
See Rails docs for Digest::UUID See docs for pgcrypto and uuid-ossp extensions.
Also take a look at upcoming UUIDv6, v7, and v8!
datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04
# All-random UUIDv4
SecureRandom.uuid
# => “40f15398-4b38-4e16-8b3c-ff16fc960d38”
# Determined UUIDv5 (hash-based)
Digest::UUID.uuid_v5(Digest::UUID::DNS_NAMESPACE, "n
# => "9b8edca0-90f2-5031-8e5d-3f708834696c"
CREATE EXTENSION "pgcrypto";
SELECT gen_random_uuid();
-- 2cfff962-4a24-4ef3-b2f8-35351b18bf63
CREATE EXTENSION "uuid-ossp";
SELECT uuid_generate_v5(uuid_ns_dns(), 'name');
-- 9b8edca0-90f2-5031-8e5d-3f708834696c
35. IP addresses
IPAddr inet,cidr
7 or 19 bytes both
inet works with both host and network addresses.
cidr works with network addresses only.
See IPAddr docs. See Network address types and functions and operators.
ip6 = IPAddr.new "3ffe:505:2::1"
ip4 = IPAddr.new "192.168.2.0/24"
IPAddr.new("192.168.2.0/24").mask(16)
#<IPAddr: IPv4:192.168.0.0/255.255.0.0>
SELECT '::1'::inet;
SELECT '127.0.0.1/32'::inet;
SELECT set_masklen(cidr '192.168.1.0/24', 16);
-- 192.168.0.0/16
SELECT inet_merge(inet '192.168.1.5/24', inet '192.
-- 192.168.0.0/22
` `
` `
36. Durations
ActiveSupport::Duration interval
16 bytes
Disclaimer: I added it to Rails in pull request № 16919. Supported out-of-the-box in Ruby on Rails 6.1+
Time.current + 1.year
# => Thu, 18 Jun 2021 21:00:00 MSK +03:00
100500.weeks.iso8601
# => "P100500W"
1.month.to_i
# => 2629746 (30.436875 days in seconds)
SELECT now() + ‘1 year’;
-- 2021-06-18 21:00:00+03
SELECT '100500 weeks'::interval;
-- 703500 days
SELECT EXTRACT(epoch FROM '1 month'::interval);
-- 2592000 (30.0 days in seconds)
37. Enums
String Custom enum types
4 bytes
Values are human-readable in SQL
On Rails < 7 you can use activerecord-postgres_enum gem See 8.7 Enumerated Types page in PostgreSQL docs
# In migration (Rails 7+):
create_enum :status, ["draft", "published", "archive
change_table :posts do |t|
t.enum :status, enum_type: "status", default: "dra
end
# In the application code:
class Article < ApplicationRecord
enum :status, { draft: "draft", published: "publis
end
Article.last.status #=> "draft"
Article.last.draft? #=> true
Article.last.published!
# UPDATE articles SET status = 'published' WHERE id
CREATE TYPE status AS ENUM ('draft', 'published', '
ALTER TABLE posts ADD COLUMN "status" status NOT NU
INSERT INTO posts (status) VALUES ('published');
INSERT INTO posts (status) VALUES ('draft');
SELECT id, status FROM posts;
id | status
----+------------
2 | draft
1 | published
39. Example of composite type use case
What if we want to:
1. Store products with prices in different currencies in one table
2. Work with price and currency as a whole
There is a great money gem in Ruby, but how to do it in SQL?
3. And do some calculations without having to write complex SQL queries with joins.
Before: After:
User (string currency)
has many
Product (numeric price)
User (string currency) # only as a setting!
has many
Product (true_money price (string+numeric))
40. Create custom datatype
Declare composite datatype in the database:
CREATE TYPE _true_money AS (
currency varchar,
amount numeric
);
-- type with constraints to allow:
-- - either NULL value (no price, can be forbidden by NOT NULL)
-- - or value with both currency and amount specified
CREATE DOMAIN true_money AS _true_money CHECK (
value IS NULL AND
value IS DISTINCT FROM (null, null)::_true_money OR
((value).currency IS NOT NULL AND (value).amount IS NOT NULL)
);
41. Fun fact about composite datatypes
Every table defines own datatype which can be used elsewhere
But don’t use it in reality, please!
(There are limitations)
CREATE TABLE "inner" ( v1 integer, v2 text );
CREATE TABLE "outer" (v inner);
INSERT INTO "outer" (v) VALUES ((42,'Hello world!'));
SELECT * FROM "outer";
v
-------------------
(42,"Hello world!")
42. Use composite datatype
See 8.16. Composite Types in PostgreSQL docs for more advices and caveats.
ALTER TABLE tests ADD COLUMN price true_money;
INSERT INTO tests (price) VALUES (('JPY',10000.0));
INSERT INTO tests (price) VALUES ('("JPY",100.0)');
SELECT price FROM tests; -- (JPY,10000.0),(JPY,100.0)
SELECT (price).currency, (price).amount FROM tests;
currency | amount
----------+---------
JPY | 10000.0
JPY | 100.0
43. Declare it in ActiveRecord
module ActiveRecord
module ConnectionAdapters
module PostgreSQL
module OID
class TrueMoney < Type::Value
def type
:true_money
end
# Here will be (de)serialization code
end
end
end
end
end
44. Deserialization
And "(USD,4.2)" becomes #<Money fractional:420 currency:USD> in Ruby ✨
def deserialize(value)
return nil if value.nil?
currency, amount = value.match(/A("?(w+)"?,(d+(?:.d+)?))z/).captures
::Money.from_amount(BigDecimal(amount), currency)
end
` ` ` `
45. Casting user input
Add ability to assign ready object to attribute:
def cast(value)
return nil if value.nil?
case value
when ::Money then value
when String then deserialize(value)
else
raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money"
end
end
46. Deserialization and input casting at once
Replaces both deserialize and cast , also handles nil s.
def cast_value(value)
case value
when ::Money then value
when String
currency, amount = value.match(/A("?(w+)"?,(d+(?:.d+)?))z/).captures
::Money.from_amount(BigDecimal(amount), currency)
else
raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money"
end
end
end
` ` ` ` ` `
47. Serialization for the database
Reuse available serialization methods for subtypes.
def serialize(value)
return nil if value.nil? # ActiveRecord will handle NULL for us
amount_t = ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Decimal.new
currency_t = ::ActiveModel::Type::String.new
"(#{currency_t.serialize(value.currency.iso_code).inspect},#{amount_t.serialize(value.amount)})"
end
48. Register datatype in ActiveRecord
PostgreSQLAdapterWithTrueMoney = Module.new do
def initialize_type_map(m = type_map)
m.register_type "true_money" do |*_args, _sql_type|
::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney.new
end
m.alias_type "_true_money", "true_money"
super
end
end
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.prepend(PostgreSQLAdapterWithTrueMoney)
ActiveRecord::Type.register(
:true_money,
::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney,
adapter: :postgresql,
)
49. Also add it for migrations…
module SchemaStatementsWithTrueMoney
def type_to_sql(type, limit: nil, precision: nil, scale: nil, array: nil, **)
case type.to_s
when 'true_money' then 'true_money'
else super
end
end
end
ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaStatements.prepend(SchemaStatementsWithTrueMoney)
module ActiveRecord::ConnectionAdapters::PostgreSQL::ColumnMethods
def true_money(name, options = {})
column(name, :true_money, options)
end
end
50. Ready to use!
rails g model Product title price:true_money
rails db:migrate
rails console
Product.create!(title: "Something", price: Money.from_amount(100000, “USD”))
Product.last
# => #<Product id: 1, title: "Something", price: 100000.00 USD>
51. But it is not done yet!
A lot of stuff has to be done to make a
full-featured datatype in SQL…
But then you can do a lot in SQL:
After all, one might re-invent abandoned pg-currency
CREATE FUNCTION true_money_add(a true_money, b true_
BEGIN
IF (a).currency != (b).currency THEN
RAISE EXCEPTION '% can not be added to % - cur
END IF;
RETURN ((a).currency, (a).amount + (b).amount);
END;
$$ IMMUTABLE RETURNS NULL ON NULL INPUT LANGUAGE plp
CREATE OPERATOR +(leftarg=true_money, rightarg=true
CREATE FUNCTION true_money_sum(state true_money, va
BEGIN
IF value IS NULL AND state IS NULL THEN
RETURN NULL;
END IF;
IF state IS NULL THEN
RETURN value;
END IF;
RETURN state + value;
END;
$$ IMMUTABLE LANGUAGE plpgsql;
CREATE AGGREGATE sum (true_money) (sfunc = true_mon
SELECT (price).currency AS currency, sum(price) AS total
FROM products GROUP BY currency;
52. Play with it yourself!
https://gist.github.com/Envek/780b917e72a86c123776ee763b8dd986
54. Gems for datatypes
torque-postgresql — standard datatypes not (yet) supported by Rails.
activerecord-postgis-adapter — all the power of PostGIS extension in Ruby.
activerecord-postgres_enum — support enum in migrations and schema (before
Rails 7)
55. Gems for other PostgreSQL features
Because PostgreSQL is much more than datatypes.
ActiveRecordExtended — functions for datatypes and DSL for queries
fx — make schema.rb great again with triggers
scenic — add support for views
order_query — keyset-pagination for your models
postgresql_cursor — get more data from the database efficiently
And also martian pg_trunk gem to rule them all get fx , scenic , object
dependency management and more within a single gem!
` ` ` `
structure.sql no more!
Make schema.rb great again!
`
`
`
`
pg_trunk gem