Postgres:

The Nice Bits

Schemas are great


ALTER TABLE accounts ADD CONSTRAINT positive_balance
                     CHECK (balance >= 0);

Schema changes often aren't great


  mysql> ALTER TABLE widgets ADD COLUMN size INTEGER;


Zero-downtime schema changes

In Postgres, all* schema changes are transactional.

Some schema changes are O(1):

  • Adding a nullable column without a default
  • Dropping a column
  • Some trivial column type changes
  • (*) CREATE INDEX CONCURRENTLY

The usual scenario


  ALTER TABLE widgets ADD COLUMN size INTEGER DEFAULT 0 NOT NULL;

Blocks writes to the table until it's set the default on all rows.

Zero-downtime changes


ALTER TABLE widgets ADD COLUMN size INTEGER;
ALTER TABLE widgets ALTER COLUMN size DEFAULT 0;
UPDATE widgets SET size = 0;
ALTER TABLE widgets ALTER COLUMN size NOT NULL;

All these statements are O(1), except the UPDATE which doesn't block writes.

Zero-downtime changes

Completely decouple schema changes from deploys:

  1. Add the column
  2. Populate the values
  3. Add constraints
  4. Deploy code which relies on it
  5. Drop old columns

Schemaless

Schemaless column types

  • hstore (since years ago)
  • XML (since 8.3)
  • json (complete support in 9.3)

Schemaless: JSON


CREATE TABLE articles (
  id SERIAL NOT NULL PRIMARY KEY,
  slug TEXT NOT NULL UNIQUE,
  name TEXT NOT NULL,
  data JSON
);

INSERT INTO articles (slug, name, data) VALUES
  ('cats', 'Cats', '{"body": "…",
                     "type": "post",
                     "keywords": ["cat", "mammal"]
                    }'
  );

Querying JSON


SELECT name FROM articles WHERE data->>'type' = 'post';

SELECT data->>'keywords'->>0 FROM articles;

Indexing JSON fields


CREATE INDEX articles_data ON articles USING GIN(data);

or


CREATE INDEX articles_type ON articles(data->>'type');

Code in the database

Stored procedure languages

Included: pgSQL, TCL, Perl, Python

Extensions: Javascript, Ruby, Scheme, Java (!), PHP (‽), ...

Everyone loves Javascript


CREATE TABLE ratings (id int, count int, total int, average float);

CREATE FUNCTION add_rating(id int, rating int)
 RETURNS float AS
$$
 var data = plv8.execute("SELECT count, total, average FROM ratings
                          WHERE id = $1", [id])[0];
 var newTotal = data.total + rating;
 var newCount = data.count + 1;
 var newAverage = (newTotal / newCount);
 plv8.execute("UPDATE ratings SET count = $1, total = $2, average = $3
               WHERE id = $4",
 [newCount, newTotal, newAverage, id]);
 return newAverage;
$$
LANGUAGE plv8;

Example


INSERT INTO ratings VALUES (1, 0, 0, 0);

SELECT add_rating(1, 5);
 add_rating
------------
 5

SELECT add_rating(1, 4);
 add_rating
------------
 4.5

Backups

Backups: log replication

  • Based on Postgres's write-ahead logs (WAL)
  • Automated, hassle-free incremental backups
  • Lock-free, concurrent base backups
  • Always less than 5 minutes old

WAL archiving: WAL-E

  • Automates base backups
  • Saves log segments to S3 or other block stores
  • Compresses (LZO) and encrypts (GPG)
  • Automates segment retrieval

Point-in-time recovery

Someone accidentally ran an UPDATE without a WHERE clause at 17:21:30...


restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
recovery_target_time = '2014-03-01 17:21:29 UTC'

Streaming replication & hot standby

Slave servers connect back to the master and receive changes live

Synchronous replication


synchronous_standby_names = 'slave1 slave2'

Postgres waits for at least one of the slaves to write the transaction to disk before completing a commit

Sharding and clustering

  • err.

GIS and such

PostGIS is great, but it's a bit weird.

Creating a GIS table


CREATE TABLE tube_stations (
    id SERIAL NOT NULL PRIMARY KEY,
    name TEXT NOT NULL
);
SELECT AddGeometryColumn('tube_stations', 'geom', 4326, 'POINT', 2);

Add some rows


INSERT INTO tube_stations(name, geom) VALUES
    ('Holborn', ST_GeomFromText('POINT(-0.1200 51.5174)', 4326));
INSERT INTO tube_stations(name, geom) VALUES
    ('Old Street', ST_GeomFromText('POINT(0.0873 51.5263)', 4326));

Adding an index


  CREATE INDEX tube_stations_geom_gist ON tube_stations
                          USING GIST (geom);

Querying with the index


SELECT name FROM tube_stations WHERE
    geom && SetSRID('BOX(-0.121 51.512, -0.097 51.524)'::box2d, 4326);
     name
---------------
 Holborn
 Barbican
 Blackfriars
 Chancery Lane
 Farringdon
 Holborn
 St. Paul's

End



Some examples stolen with permission from Andrew Godwin
(Rapid Schema Development with Postgres at AYB13)