Hello again! In the previous part about partitioning tables in Rails, I introduced you to the basics of how partitioning works and how you can implement it on a freshly created table. Also, I promised that I would show you how to handle it for an already existing table which holds some (probably much) data. So, let’s dive into the topic.
Migrating existing table
Partitioning primarily focuses on large datasets, which is seldom a consideration at the beginning of a project. It's rare to plan table partitioning upfront.
Typically, the situation involves hitting the performance threshold and deciding to partition an existing large table. If this table is huge it likely plays a significant role in the system. Given its widespread use, continuous availability is crucial. Any downtime would be unpleasant to the end-user.
Meanwhile, we aim to migrate or modify a large dataset, a process that often involves locking or shutting down the table for longer period. That's why it's important to find a solution that minimizes potential unavailability. This is particularly significant since most PostgreSQL documentation and other tutorials focus on development of partitioning from scratch.
The basic approach is simple:
- Create a new, partitioned table.
- Create partitions.
- Use INSERT INTO/SELECT or COPY to transfer data into the partitioned table.
- Carry out any additional steps like recreating indexes/triggers, etc.
- Switch or rename the old table and the partitioned table.
- Delete the old table.
At first glance, this method seems effective, but it complicates write operations. Despite having triggers on update on the old table, maintaining complete integrity between the old and new tables is a challenge. Furthermore, this approach requires at least twice the storage for the operation.
Partitioning a large table with no down time
Let’s see alternative, better approach - I will present it for 2 different types of partitioning - range and list. More about such types you can read in separate article about partitioning types
Range partitioning
This is the first part of transforming our table into a partitioned one, which includes the following steps:
- Rename the old table
- Create a new partitioned table
- Attach the old table as the default partition for all historical data
- Create partitions for future use
BEGIN;
ALTER TABLE users RENAME TO users_temp;
CREATE TABLE users
...
PARTITION BY RANGE(created_at)
ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
// here we assume that our current data is ending before 2023-12-01
CREATE TABLE users_2023_12 PARTITION OF users
FOR VALUES FROM ('2023-12-01') TO ('2024-01-01');
CREATE TABLE users_2024_01 PARTITION OF users
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
// add indexes/triggers for new partitions
COMMIT;
The second part will involve transforming historical data into separate partitions:
The steps described below are as follows:
- Detach the default partition
- Create a new partition for a portion of the historical data
- Remove data from the historical table and insert it into the new partition
- Reattach the old table
BEGIN;
ALTER TABLE users DETACH PARTITION users_temp;
CREATE TABLE users_2023_11 PARTITION OF users
FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');
// add indexes/triggers
WITH copied_data AS (
DELETE FROM users_temp
WHERE created_at >= '2023-11-01' AND created_at < '2023-12-01'
RETURNING *
)
INSERT INTO users
SELECT * FROM copied_data;
ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;
We detach the default partition because it's impossible to add a new partition if records in the default one belong to the new one.
This example pertains to range partitioning. An additional example for list partitioning is the next one, which follows a similar process.
List partitioning
BEGIN;
ALTER TABLE users RENAME TO users_temp;
CREATE TABLE users
...
PARTITION BY LIST(status)
ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;
In this case, additional partitions aren't created by default. This is because we assume that all statuses are in use. Similarly to range partitioning, we cannot create partitions for common values.
BEGIN;
ALTER TABLE users DETACH PARTITION users_temp;
CREATE TABLE users_0 PARTITION OF users
FOR VALUES IN (0)
//add indexes/triggers
WITH copied_data AS (
DELETE FROM users_temp
WHERE status = 0
RETURNING *
)
INSERT INTO users
SELECT * FROM copied_data;
ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;
Once again, we detach the default partition, create a new one, transfer data to it, and reattach the default partition.
Using Views and Triggers to introduce Partitioning
The last approach that I found handy is to use different mechanisms of Postgres - views and triggers. Long story short - this one has following steps:
- Create partitioned table along with partitions/indexes
- Rename old table
- Create view with UNION ALL both tables
- Create proper triggers for writing operations - INSERT is executed only on partitioned table, UPDATE/DELETE are proceeded on both tables
- Move data from old table to new one in batches
- Remove view and old table
You can read more about this approach in this article. The benefit of this method is that it allows us to easily manage the size of batches we want to transfer into partitions. In the previous method, batch sizes were defined by the new partition's size (either by the range of values or by a specific value in list partitioning). This restricts our ability to control transaction duration lengths and the locks they caused.
Partitioning large tables in Rails
Now, its time to prepare migration for partitioning within Rails app. Exactly as in previous article, I am going to utilize https://github.com/rkrage/pg_party gem. In this case, I’ve chosen list partitioning by the status
field. The first migration will initialize the partitioned table and attach the old table as the default partition.
class MigrateExistingUsersTable < ActiveRecord::Migration[7.1]
def up
rename_table :users, :users_temp
create_list_partition :users, partition_key: :status, template: false do |t|
t.string :email, null: false
t.string :first_name
t.string :last_name
t.string :username, null: false
t.integer :status, default: 0
t.jsonb :settings, default: {}
t.timestamps
end
attach_default_partition :users, :users_temp
end
def down
detach_partition :users, :users_temp
drop_table :users
rename_table :users_temp, :users
end
end
The following steps outline the process I've used in SQL:
- Rename the old table as temporary
- Create a new partitioned table
- Attach the old table as default
Next, I will create the first list partition for the status
value equal to 0. To simplify data migration, I will also create a temporary model. Alternatively, if you need to, you can also persist such a model directly in a migration.
class UsersTemp < ApplicationRecord; end
Following migration:
class MigrateExistingUsersTable < ActiveRecord::Migration[7.1]
def up
detach_partition :users, :users_temp
create_list_partition_of :users_0, :users, values: 0
ActiveRecord::Base.transaction do
UsersTemp.where(status: 0).find_in_batches(batch_size: 1000) do |batch|
User.insert_all!(batch.map { |user_temp| user_temp.attributes.except("id") })
end
UsersTemp.where(status: 0).delete_all
end
attach_default_partition :users, :users_temp
end
def down
detach_partition :users, :users_temp
ActiveRecord::Base.transaction do
User.where(status: 0).find_in_batches(batch_size: 1000) do |batch|
UsersTemp.insert_all!(batch.map { |user| user.attributes.except("id") })
end
end
detach_partition :users, :users_0
drop_table :users_0
attach_default_partition :users, :users_temp
end
end
Again, I’ve tried to reproduce steps from SQL:
- Detach default partition
- Create a new partition for specific value of
status
= 0 - Migrate data from default detached partition
- Attach default partition again
Such migration should be prepared for every possible value of status
field. I assumed, that amount of status
values are limited.
In the revert migration, I have the option to detach the users_0
partition and directly copy its data into the users
partitioned table. This would automatically assign the data to the default partition, and then I could simply drop this partition.
To keep with the ActiveRecord approach, this might require creating a temporary model like Users0
or correctly assigning the table_name. Alternatively, writing raw SQL with ActiveRecord::Base.connection.execute
for this data migration could allow us to avoid avoid the need for temporary models to access our data. This approach should be considered depending on the scenario.
Summary
In general, partitioning existing tables is always a significant challenge that may demand even weeks of proper planning and choosing the right approach for the specific case. It's important to highlight a few issues that warrant concern:
- Maintaining data consistency between tables
- Minimizing lock duration and downtimes
- Keeping an eye on possible storage usage
Following these points should be a core part of preparing such a migration and should lead to a happy ending. With that, I'll wrap up this quite exciting (I trust not only for me) journey through partitioning in PostgreSQL. I’m happy that I took part in explaining you the details of this topic and I hope you enjoyed it!
Articles in this series
- Table partitioning in Rails, part 1
- Table partitioning in Rails, part 2
- Table partitioning types
- Indexing partitioned table