29
.
04
.
2024
25
.
03
.
2024
Postgresql
Backend

Table partitioning types - Postgres Stories

Jarosław Kowalewski
Ruby Developer

In this article, I'll provide a quick recap on the types of partitioning available in PostgreSQL. I will cover the basics, including creating partitioned tables from scratch, which is, let’s be honest, almost never the case. Usually partitioning comes to the table when data set is already overloaded and performance issues need to be fixed by it. I will cover such case in another article, but it is nice to have some background before diving deeper into the topic.

Different ways of partitioning

Before PostgreSQL 10 was released, there was no native support for partitioned tables.The process was managed by using inheritance and creating child tables. Triggers were used to insert data into these child tables. Version 10 introduced native support for a feature known as Declarative Partitioning, which allows partitioning by range and value (also known as list partitioning). PostgreSQL 11 expanded this feature by adding the ability to partition by hash.

Range partitioning

Range partitioning divides data into segments based on a particular range of values in a column. It is especially beneficial for time-series data or any data with a natural order. Each partition corresponds to a unique range of values, and data within that range is stored in the respective partition.A good example of this is partitioning by the created_at column - here considered for users table:

CREATE TABLE IF NOT EXISTS public.users
(
    id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
    email character varying NOT NULL,
    first_name character varying,
    last_name character varying,
    username character varying NOT NULL,
    status integer DEFAULT 0,
    settings jsonb DEFAULT '{}'::jsonb,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY RANGE(created_at);

Next, let’s create our partitions. Each partition's definition must specify bounds that match the partitioning method and partition key of the parent. To be clear - each range's bounds are inclusive at the start of range and exclusive at the end.

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE users_3 PARTITION OF users
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

Now, after adding a new user to the users table, it will be inserted into the appropriate partition according to its creation date.

If I attempt to create overlapping partitions, such as:

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES FROM ('2023-12-01') TO ('2024-02-01');

It will result in an error:

ERROR: partition "users_2" would overlap partition "users_1”

Additionally, considering the scenario where a value with a specific date doesn't fit into any range-partitioned tables, like this

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, '2025-05-01', '2025-05-01');

will return a message:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (created_at) = (2024-05-01 00:00:00).

The conclusion drawn from this type of error is that it may be beneficial to establish a cron job, such as pg_cron or another scheduler, which will periodically create new partitions for a specific time range. For range partitioning it is also possible to create default partition, which may hold values which don’t fit to any specified partition. It allows to avoid above error.

List partitioning

This type of partitioning relies on selecting specific values present in a column and dividing by them. This particular one allows to choose specific, not overlapping values instead of range of values - a good example would be partition by category or status columns.

CREATE TABLE IF NOT EXISTS users
(
    id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
    email character varying NOT NULL,
    first_name character varying,
    last_name character varying,
    username character varying NOT NULL,
    status integer DEFAULT 0,
    settings jsonb DEFAULT '{}'::jsonb,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY LIST(status);

I assumed here that I am using integer as enum in application - it is possible to add additional constraints if needed.

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES IN (0);

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES IN (1);

CREATE TABLE users_3 PARTITION OF users
    FOR VALUES IN (2);

Now, I can create our insert statement to correctly add record to specific partition:

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, NOW(), NOW());

It is worth to mention what actually happens if you try to use status which is not defined in partition, something like:

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',4, NOW(), NOW());

As a result of such query error is received:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (status) = (4).

Similar case happens when I do not specify value of status at all:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (status) = (null).

This is crucial because without partitioning, I could smoothly add new values to the enum in our codebase. Before adding new status value, you need to define a new partition. Alternatively, define a default partition or partition for null value. Choose the right approach according to your business needs.

Hash partitioning

Hash partitioning is another method of table partitioning in PostgreSQL. Data is divided into partitions based on the hash value of a specific column. This approach is useful if you want to distribute data evenly across partitions, allowing for balanced data load. For this one, I’m going to distribute data over foreign key, like user_id or customer_id.

CREATE TABLE IF NOT EXISTS orders
(
    id bigint NOT NULL DEFAULT nextval('orders_id_seq'::regclass),
    user_id bigint,
    amount numeric(10,2) NOT NULL,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT orders_pkey PRIMARY KEY (id),
    CONSTRAINT fk_rails_f868b47f6a FOREIGN KEY (user_id)
        REFERENCES users (id) MATCH SIMPLE
) PARTITION BY HASH(user_id);

Then, following partitions looks like that:

CREATE TABLE orders_1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 0);

CREATE TABLE orders_2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 1);

CREATE TABLE orders_3 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 2);

CREATE TABLE orders_4 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 3);

CREATE TABLE orders_5 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 4);

And here comes example of inserting new records for each partition:

// orders_1
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (1,10, NOW(), NOW());
// orders_2
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (2,10, NOW(), NOW());
// orders_3
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (3,10, NOW(), NOW());
// orders_4
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (4,10, NOW(), NOW());
// orders_5
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (5,10, NOW(), NOW());

To clarify, which I found misleading during my research, PostgreSQL does not directly calculate modulo from the column's value (as, for example, hash partitioning is also possible for varchar column).

PostgreSQL uses a hash function to balance partitions as evenly as possible without any particular order. This approach does not make much sense for columns with low cardinality, as demonstrated by the example above, where I obtained certain results for only 5 different values:

orders_1

User ID Amount Created At Updated At
1 10
4 10
5 10

orders_2

User ID Amount Created At Updated At
2 10
3 10

Records were distributed among only two of the five partitions, without any specific order, leaving the rest of the partitions (orders3, orders4 and orders_5) empty. Therefore, for a column with a low variety of values, applying hash partitioning would be pointless since the records won't be distributed evenly.

Hash partitioning proves useful in a select number of scenarios. It is primarily used for I/O improvements, such as parallel writing, especially when the cardinality of the partitioning column is high. Additionally, it enhances the efficiency of autovacuuming, as vacuuming smaller tables is faster and more effective.

Last but not least - if our write statement contains null value in partitioning attribute, it is assigned to partition with reminder 0 by default. Also, for hash partitioning it is not possible to create default partition.

Wrapping up

To sum up - just as careful consideration is required when deciding on table partitioning, the selection of the appropriate partitioning method should also be well-thought-out. Shorty describing best cases for each - pick range partitioning for time-series data - it very natural way for any archiving purposes. List partitioning would be best for any specific values, or group of values. Hash partitioning would be intuitively least likely choice, as it doesn’t rely specifically on data structure, but for some scenarios with I/O improvements and parallel writing it could be a good alternative. Choose wisely!

Articles in this series

Jarosław Kowalewski
Ruby Developer

Check my Twitter

Check my Linkedin

Did you like it? 

Sign up To VIsuality newsletter

READ ALSO

Ula Sołogub - SQL Injection in Ruby on Rails

The Deadly Sins in RoR security - SQL Injection

14
.
11
.
2023
Urszula Sołogub
Backend
Ruby on Rails
Software
Michal - Highlights from Ruby Unconf 2024

Highlights from Ruby Unconf 2024

14
.
11
.
2023
Michał Łęcicki
Conferences
Visuality
Cezary Kłos - Optimizing Cloud Infrastructure by $40 000 Annually

Optimizing Cloud Infrastructure by $40 000 Annually

14
.
11
.
2023
Cezary Kłos
Backend
Ruby on Rails

Smooth Concurrent Updates with Hotwire Stimulus

14
.
11
.
2023
Michał Łęcicki
Hotwire
Ruby on Rails
Software
Tutorial

Freelancers vs Software house

14
.
11
.
2023
Michał Krochecki
Visuality
Business

Table partitioning in Rails, part 2 - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Backend
Postgresql
Ruby on Rails

N+1 in Ruby on Rails

14
.
11
.
2023
Katarzyna Melon-Markowska
Ruby on Rails
Ruby
Backend

Turbo Streams and current user

29
.
11
.
2023
Mateusz Bilski
Hotwire
Ruby on Rails
Backend
Frontend

Showing progress of background jobs with Turbo

14
.
11
.
2023
Michał Łęcicki
Ruby on Rails
Ruby
Hotwire
Frontend
Backend

Table partitioning in Rails, part 1 - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Postgresql
Backend
Ruby on Rails

Table partitioning types - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Postgresql
Backend

Indexing partitioned table - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Backend
Postgresql
SQL Views in Ruby on Rails

SQL views in Ruby on Rails

14
.
11
.
2023
Jan Grela
Backend
Ruby
Ruby on Rails
Postgresql
Design your bathroom in React

Design your bathroom in React

14
.
11
.
2023
Bartosz Bazański
Frontend
React
Lazy Attributes in Ruby - Krzysztof Wawer

Lazy attributes in Ruby

14
.
11
.
2023
Krzysztof Wawer
Ruby
Software

Exporting CSV files using COPY - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Postgresql
Ruby
Ruby on Rails
Michał Łęcicki - From Celluloid to Concurrent Ruby

From Celluloid to Concurrent Ruby: Practical Examples Of Multithreading Calls

14
.
11
.
2023
Michał Łęcicki
Backend
Ruby
Ruby on Rails
Software

Super Slide Me - Game Written in React

14
.
11
.
2023
Antoni Smoliński
Frontend
React
Jarek Kowalewski - ILIKE vs LIKE/LOWER - Postgres Stories

ILIKE vs LIKE/LOWER - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Ruby
Ruby on Rails
Postgresql

A look back at Friendly.rb 2023

14
.
11
.
2023
Cezary Kłos
Conferences
Ruby

Debugging Rails - Ruby Junior Chronicles

14
.
11
.
2023
Piotr Witek
Ruby on Rails
Backend
Tutorial

GraphQL in Ruby on Rails: How to Extend Connections

14
.
11
.
2023
Cezary Kłos
Ruby on Rails
GraphQL
Backend
Tutorial

Tetris on Rails

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Backend
Frontend
Hotwire

EURUKO 2023 - here's what you've missed

14
.
11
.
2023
Michał Łęcicki
Ruby
Conferences