29
.
04
.
2024
25
.
03
.
2024
Postgresql
Backend

Table partitioning types - Postgres Stories

Jarosław Kowalewski
Ruby Developer

In this article, I'll provide a quick recap on the types of partitioning available in PostgreSQL. I will cover the basics, including creating partitioned tables from scratch, which is, let’s be honest, almost never the case. Usually partitioning comes to the table when data set is already overloaded and performance issues need to be fixed by it. I will cover such case in another article, but it is nice to have some background before diving deeper into the topic.

Different ways of partitioning

Before PostgreSQL 10 was released, there was no native support for partitioned tables.The process was managed by using inheritance and creating child tables. Triggers were used to insert data into these child tables. Version 10 introduced native support for a feature known as Declarative Partitioning, which allows partitioning by range and value (also known as list partitioning). PostgreSQL 11 expanded this feature by adding the ability to partition by hash.

Range partitioning

Range partitioning divides data into segments based on a particular range of values in a column. It is especially beneficial for time-series data or any data with a natural order. Each partition corresponds to a unique range of values, and data within that range is stored in the respective partition.A good example of this is partitioning by the created_at column - here considered for users table:

CREATE TABLE IF NOT EXISTS public.users
(
    id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
    email character varying NOT NULL,
    first_name character varying,
    last_name character varying,
    username character varying NOT NULL,
    status integer DEFAULT 0,
    settings jsonb DEFAULT '{}'::jsonb,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY RANGE(created_at);

Next, let’s create our partitions. Each partition's definition must specify bounds that match the partitioning method and partition key of the parent. To be clear - each range's bounds are inclusive at the start of range and exclusive at the end.

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE users_3 PARTITION OF users
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

Now, after adding a new user to the users table, it will be inserted into the appropriate partition according to its creation date.

If I attempt to create overlapping partitions, such as:

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES FROM ('2023-12-01') TO ('2024-02-01');

It will result in an error:

ERROR: partition "users_2" would overlap partition "users_1”

Additionally, considering the scenario where a value with a specific date doesn't fit into any range-partitioned tables, like this

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, '2025-05-01', '2025-05-01');

will return a message:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (created_at) = (2024-05-01 00:00:00).

The conclusion drawn from this type of error is that it may be beneficial to establish a cron job, such as pg_cron or another scheduler, which will periodically create new partitions for a specific time range. For range partitioning it is also possible to create default partition, which may hold values which don’t fit to any specified partition. It allows to avoid above error.

List partitioning

This type of partitioning relies on selecting specific values present in a column and dividing by them. This particular one allows to choose specific, not overlapping values instead of range of values - a good example would be partition by category or status columns.

CREATE TABLE IF NOT EXISTS users
(
    id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
    email character varying NOT NULL,
    first_name character varying,
    last_name character varying,
    username character varying NOT NULL,
    status integer DEFAULT 0,
    settings jsonb DEFAULT '{}'::jsonb,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY LIST(status);

I assumed here that I am using integer as enum in application - it is possible to add additional constraints if needed.

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES IN (0);

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES IN (1);

CREATE TABLE users_3 PARTITION OF users
    FOR VALUES IN (2);

Now, I can create our insert statement to correctly add record to specific partition:

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, NOW(), NOW());

It is worth to mention what actually happens if you try to use status which is not defined in partition, something like:

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',4, NOW(), NOW());

As a result of such query error is received:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (status) = (4).

Similar case happens when I do not specify value of status at all:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (status) = (null).

This is crucial because without partitioning, I could smoothly add new values to the enum in our codebase. Before adding new status value, you need to define a new partition. Alternatively, define a default partition or partition for null value. Choose the right approach according to your business needs.

Hash partitioning

Hash partitioning is another method of table partitioning in PostgreSQL. Data is divided into partitions based on the hash value of a specific column. This approach is useful if you want to distribute data evenly across partitions, allowing for balanced data load. For this one, I’m going to distribute data over foreign key, like user_id or customer_id.

CREATE TABLE IF NOT EXISTS orders
(
    id bigint NOT NULL DEFAULT nextval('orders_id_seq'::regclass),
    user_id bigint,
    amount numeric(10,2) NOT NULL,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT orders_pkey PRIMARY KEY (id),
    CONSTRAINT fk_rails_f868b47f6a FOREIGN KEY (user_id)
        REFERENCES users (id) MATCH SIMPLE
) PARTITION BY HASH(user_id);

Then, following partitions looks like that:

CREATE TABLE orders_1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 0);

CREATE TABLE orders_2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 1);

CREATE TABLE orders_3 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 2);

CREATE TABLE orders_4 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 3);

CREATE TABLE orders_5 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 4);

And here comes example of inserting new records for each partition:

// orders_1
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (1,10, NOW(), NOW());
// orders_2
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (2,10, NOW(), NOW());
// orders_3
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (3,10, NOW(), NOW());
// orders_4
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (4,10, NOW(), NOW());
// orders_5
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (5,10, NOW(), NOW());

To clarify, which I found misleading during my research, PostgreSQL does not directly calculate modulo from the column's value (as, for example, hash partitioning is also possible for varchar column).

PostgreSQL uses a hash function to balance partitions as evenly as possible without any particular order. This approach does not make much sense for columns with low cardinality, as demonstrated by the example above, where I obtained certain results for only 5 different values:

orders_1

User ID Amount Created At Updated At
1 10
4 10
5 10

orders_2

User ID Amount Created At Updated At
2 10
3 10

Records were distributed among only two of the five partitions, without any specific order, leaving the rest of the partitions (orders3, orders4 and orders_5) empty. Therefore, for a column with a low variety of values, applying hash partitioning would be pointless since the records won't be distributed evenly.

Hash partitioning proves useful in a select number of scenarios. It is primarily used for I/O improvements, such as parallel writing, especially when the cardinality of the partitioning column is high. Additionally, it enhances the efficiency of autovacuuming, as vacuuming smaller tables is faster and more effective.

Last but not least - if our write statement contains null value in partitioning attribute, it is assigned to partition with reminder 0 by default. Also, for hash partitioning it is not possible to create default partition.

Wrapping up

To sum up - just as careful consideration is required when deciding on table partitioning, the selection of the appropriate partitioning method should also be well-thought-out. Shorty describing best cases for each - pick range partitioning for time-series data - it very natural way for any archiving purposes. List partitioning would be best for any specific values, or group of values. Hash partitioning would be intuitively least likely choice, as it doesn’t rely specifically on data structure, but for some scenarios with I/O improvements and parallel writing it could be a good alternative. Choose wisely!

Articles in this series

Jarosław Kowalewski
Ruby Developer

Check my Twitter

Check my Linkedin

Did you like it? 

Sign up To VIsuality newsletter

READ ALSO

Writing Chrome Extensions Is (probably) Easier Than You Think

14
.
11
.
2023
Antoni Smoliński
Tutorial
Frontend
Backend

Bounded Context - DDD in Ruby on Rails

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Domain-Driven Design
Backend
Tutorial

The origin of Poltrax development - story of POLTRAX (part 2)

29
.
11
.
2023
Stanisław Zawadzki
Ruby on Rails
Startups
Business
Backend

Ruby Meetups in 2022 - Summary

14
.
11
.
2023
Michał Łęcicki
Ruby on Rails
Visuality
Conferences

Repository - DDD in Ruby on Rails

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Domain-Driven Design
Backend
Tutorial

Example Application - DDD in Ruby on Rails

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Domain-Driven Design
Backend
Tutorial

How to launch a successful startup - story of POLTRAX (part 1)

14
.
11
.
2023
Michał Piórkowski
Ruby on Rails
Startups
Business

How to use different git emails for different projects

14
.
11
.
2023
Michał Łęcicki
Backend
Tutorial

Aggregate - DDD in Ruby on Rails

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Domain-Driven Design
Backend
Tutorial

Visuality at wroc_love.rb 2022: It's back and it's good!

14
.
11
.
2023
Patryk Ptasiński
Ruby on Rails
Conferences
Ruby

Our journey to Event Storming

14
.
11
.
2023
Michał Łęcicki
Visuality
Event Storming

Should I use Active Record Callbacks?

14
.
11
.
2023
Mateusz Woźniczka
Ruby on Rails
Backend
Tutorial

How to rescue a transaction to roll back changes?

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Backend
Ruby
Tutorial

Safe navigation operator '&.' vs '.try' in Rails

14
.
11
.
2023
Mateusz Woźniczka
Ruby on Rails
Backend
Ruby
Tutorial

What does the ||= operator actually mean in Ruby?

14
.
11
.
2023
Mateusz Woźniczka
Ruby on Rails
Backend
Ruby
Tutorial

How to design an entity - DDD in Ruby on Rails

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Domain-Driven Design
Backend
Tutorial

Entity - DDD in Ruby on Rails

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Domain-Driven Design
Backend
Tutorial

Should I use instance variables in Rails views?

14
.
11
.
2023
Mateusz Woźniczka
Ruby on Rails
Frontend
Backend
Tutorial

Data Quality in Ruby on Rails

14
.
11
.
2023
Michał Łęcicki
Ruby on Rails
Backend
Software

We started using Event Storming. Here’s why!

14
.
11
.
2023
Mariusz Kozieł
Event Storming
Visuality

First Miłośnicy Ruby Warsaw Meetup

14
.
11
.
2023
Michał Łęcicki
Conferences
Visuality

Should I use Action Filters?

14
.
11
.
2023
Mateusz Woźniczka
Ruby on Rails
Backend
Tutorial

Value Object - DDD in Ruby on Rails

17
.
03
.
2024
Paweł Strzałkowski
Ruby on Rails
Domain-Driven Design
Backend
Tutorial