Rails Migrations Best Practices: Why Data Does Not Belong There
After a few years of working with Ruby on Rails, I had the opportunity to experience many different approaches to data migration. Some worked well. Others did not. At times, the experience was bittersweet — especially in my junior years, when mistakes turned into valuable lessons but occasionally struck back at the worst possible moment. Over time, my team and I kept refining our approach. Sometimes we introduced rules that only added extra work. Sometimes those same rules saved us from real trouble. In this article, I share the lessons I learned about handling data changes in Rails — and why migrations are often not the right place for them.
Database migrations in Ruby on Rails
Migrations in Ruby on Rails are meant specifically for manipulating the structure of the database — creating, altering, or dropping tables and columns, applying indexes, and so on. They can be generated with a simple console command, rails generate migration ..., which produces a file with a timestamp prefix in its name. This ensures that migration files are always in chronological order and serve as a history of changes to the database structure. When a migration is run, its timestamp is saved as a version in the schema_migrations database table, ensuring that no change is run twice or skipped. Ideally, running all migrations — no matter how old — on a freshly created, empty database should result in a fully functional structure.
Change, up and down
By default, Ruby on Rails generates a migration with a predefined structure — either with an empty change method or already populated based on the arguments passed to the generator command.
class DoSomethingWithDatabase < ActiveRecord::Migration[X.Y]
  def change
    # Apply changes here
  end
end
It is entirely up to the developer how the change method is used and what kind of modifications are introduced. This freedom can be used either wisely or recklessly — resulting in a well-designed migration or a complete disaster. Before discussing those extremes, it is worth covering the basics.
In most cases, common database operations can be easily applied and reverted when needed. This is especially important when a deployment goes wrong and the application must be rolled back to a previous version. Rollbacks are also extremely useful during development, particularly in projects with a complex or time-consuming setup — it is almost always easier to revert a migration than to recreate the database from scratch.
Simple operations such as creating tables, adding columns, or modifying indexes are fully reversible, and Rails handles them gracefully. Some operations, however, while technically reversible, may leave a permanent mark. For example, once a table or a column is dropped, its structure can be restored, but the data it held is permanently lost. Finally, any non-standard use of migrations is considered irreversible, and attempting to roll it back will cause Rails to raise an ActiveRecord::IrreversibleMigration exception.
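For instance, a minimal migration like the following hypothetical one is fully reversible — Rails can infer how to undo both operations when rails db:rollback is invoked:

```ruby
# A fully reversible migration: Rails knows that the inverse of
# add_column is remove_column, and the inverse of add_index is
# remove_index, so no up/down pair is needed.
class AddNicknameToUsers < ActiveRecord::Migration[7.1]
  def change
    add_column :users, :nickname, :string
    add_index :users, :nickname
  end
end
```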
There are valid reasons for performing uncommon operations in migrations — sometimes for performance reasons, sometimes to leverage database-specific features. One such example is executing raw SQL inside a migration. Changes like this cannot be reverted automatically. In these cases, Rails allows developers to define custom rollback behavior by implementing up and down methods instead of change. The up method contains instructions for applying the migration, while down defines how to revert it.
class DoIrreversibleChange < ActiveRecord::Migration[X.Y]
  # CREATE INDEX CONCURRENTLY cannot run inside a transaction (PostgreSQL)
  disable_ddl_transaction!

  def change
    execute "CREATE INDEX CONCURRENTLY index_users_on_email ON users(email)"
  end
end

class DoReversibleChange < ActiveRecord::Migration[X.Y]
  disable_ddl_transaction!

  def up
    execute "CREATE INDEX CONCURRENTLY index_users_on_email ON users(email)"
  end

  def down
    execute "DROP INDEX CONCURRENTLY index_users_on_email"
  end
end
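As a side note, the same pair of statements can also live in a single change method using Rails' reversible helper, which keeps the apply and revert logic next to each other — a sketch:

```ruby
# The reversible helper selects the right branch depending on whether
# the migration is being applied (dir.up) or rolled back (dir.down).
class AddEmailIndexConcurrently < ActiveRecord::Migration[7.1]
  disable_ddl_transaction! # required for CONCURRENTLY on PostgreSQL

  def change
    reversible do |dir|
      dir.up   { execute "CREATE INDEX CONCURRENTLY index_users_on_email ON users(email)" }
      dir.down { execute "DROP INDEX CONCURRENTLY index_users_on_email" }
    end
  end
end
```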
Misinterpreting the philosophy
The trouble begins when the underlying assumptions behind migrations are misunderstood. It is true that migrations are meant for database manipulation — specifically, for introducing changes to the database structure. Some developers, however, interpret them as a convenient place to handle data changes as well, and this is where things start to get dangerous.
At first glance, this approach may feel safe, as it allows data operations to be executed immediately after structural changes. In reality, it introduces several traps that can become painful in the long run. Making a change irreversible is only one of the risks — one that can, to some extent, be mitigated by using up and down methods.
One of the most naive mistakes developers make is relying on the Rails ORM inside migrations — for example, by directly calling application models. The ORM always reflects the current version of the code: relationships, validations, callbacks, and many other behaviors are defined at the model level. These aspects inevitably change over time — tables grow, new relations are added, validations evolve, and some tables are renamed or even removed entirely. Old migrations, however, are rarely revisited and updated to match the current state of the models. As a result, migrations written years ago may no longer be compatible with the application code they implicitly depend on.
Consider a simple example. Initially, there is a User table containing only a login and a password. At some point, a name column is added, and existing records need to be populated with the value of login. Later still, the model is modernized and starts requiring an email address; a validation enforcing its presence is introduced. This works correctly in all existing environments. However, from that moment on, any attempt to recreate the database from scratch — such as during local development — fails when running all migrations.
The failure occurs in the PopulateNameWithLogin migration. Although the email column does not yet exist at that point in the migration history, the model validation is already in place. As a result, every update attempt raises an exception because the email validation fails for all records.
# Migration from the past
class PopulateNameWithLogin < ActiveRecord::Migration[X.Y]
  def change
    User.find_each do |user|
      # Raises once the model gains the email presence validation
      user.update!(name: user.login)
    end
  end
end

# Newer migration
class AddEmailToUser < ActiveRecord::Migration[X.Y]
  def change
    add_column :users, :email, :string
  end
end

# The User model
class User < ApplicationRecord
  validates :email, presence: true
end
This particular example can be handled by rewriting it as a plain SQL query or by skipping ORM validations. However, that is not the only hazard of this approach. Besides being irreversible, data manipulation inside migrations carries a performance risk. An operation that appears fast in a local environment can become extremely slow in production due to the size of the data involved. Updating records one by one inside a transaction may lock an entire table for an extended period and can even result in a timeout, preventing the migration from completing successfully. A failure during a production deployment is a highly stressful event that often requires ad hoc investigation and recovery. It is far better to prevent such situations altogether than to deal with them in a moment of panic.
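As a sketch of that fix, the PopulateNameWithLogin migration from the example above could bypass the model entirely with raw SQL, which sidesteps validations and callbacks and also makes the operation explicitly revertible:

```ruby
# Rewritten without touching the User model: a single UPDATE statement
# runs no validations or callbacks and does not depend on current
# application code.
class PopulateNameWithLogin < ActiveRecord::Migration[7.1]
  def up
    execute "UPDATE users SET name = login"
  end

  def down
    execute "UPDATE users SET name = NULL"
  end
end
```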
How to handle data manipulations?
By now, it should be clear that performing data manipulations inside migrations can be dangerous — no one wants to deal with a failing migration during a production release. Such failures often occur at the least expected moment and feel almost like a betrayal, especially when everything worked smoothly in other environments. If migrations — which may feel like the most trustworthy place to perform such operations — are risky, what options are left? An easy solution to these concerns is to maintain a dedicated directory for simple rake tasks. The development team can plan to run the appropriate scripts after a release, if needed. Even if a script fails for some reason, the resulting chaos is far less severe than in a situation where a migration fails and blocks database structure changes.
post_release_scripts/
├─ move_from_table_a_to_b.rb
├─ move_x_to_y.rb
├─ populate_users_emails.rb
└─ update_products_prices.rb
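One such script might look like the following sketch — the column names and the fallback domain are invented for illustration, and it would be executed manually after the release, e.g. with rails runner:

```ruby
# post_release_scripts/populate_users_emails.rb
# Backfills missing emails in batches to avoid holding a long table lock.
# Run manually after deployment:
#   rails runner post_release_scripts/populate_users_emails.rb
User.where(email: nil).in_batches(of: 1_000) do |batch|
  # update_all issues one UPDATE per batch, skipping validations/callbacks
  batch.update_all("email = login || '@example.com'")
end
```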
As the application grows and business needs evolve, however, this approach quickly reveals its imperfections. It is manageable when only a few rake tasks are required, but in real-world scenarios — where multiple users actively work with the application — the number of such scripts can grow rapidly. If each release introduces at least one new task, after a year the directory may be full of loosely related data manipulation scripts, with no one remembering their original purpose.
Moreover, every team member must remember which scripts need to be run locally, in what order, and at what time, in order to keep the data consistent — something that is very difficult to manage in practice. At this point, introducing some form of automation becomes the intuitive next step. Automation not only helps developers maintain local databases but also reduces the risk of running the wrong task or accidentally executing the same script multiple times.
This is where solutions can easily become overly complex or overengineered. Fortunately, there is no need to reinvent the wheel — the problem has already been solved. One approach that has proven solid in practice is using the after_party gem. It is simple, yet it addresses most of the issues described above.
Each after-party task is generated via the terminal and receives a timestamp. After proper configuration, the gem creates a dedicated database table to track which tasks have already been executed, ensuring that no code is run more than once (unless explicitly intended). It can be configured to automatically execute all new tasks right after deployment, once migrations have completed.
If a task fails, it does not result in a catastrophe — the error is raised, but the remaining tasks are still executed. One important caveat follows from this: unlike migrations, after-party tasks do not form a strictly ordered chain in which each step is guaranteed to have succeeded before the next one runs. For this reason, it is unsafe to assume that one task can rely on data produced by another.
lib/
├─ tasks/
│ ├─ deployment/
│ │ ├─ 20251120123123_populate_users_emails.rb
│ │ ├─ 20251206321123_move_from_table_a_to_b.rb
│ │ ├─ 20251227001222_move_x_to_y.rb
│ │ └─ 20260131345514_update_products_prices.rb
It is also worth mentioning that moving data manipulations from migrations to rake tasks may resolve many issues, but not all of them. Poorly designed operations can still lock tables, trigger validations, or negatively impact live traffic if executed without care.
Data consistency — test coverage
Last but not least, there is one more important aspect of data manipulation that must be addressed: testing. Any code that is written should also be covered by tests. Simply running a script and concluding "it worked" is not sufficient — even if it was executed against production data and appears to have produced the desired result. At first glance, the content of an after-party task may seem difficult to test, especially when the entire logic is implemented directly inside the task file. For very simple operations — such as trivial one-liners that are unlikely to fail — full test coverage may not be critical. However, more complex operations are a completely different matter, and failures in such cases can be genuinely harmful. For non-trivial data manipulations, the best practice is to extract the logic into a separate class, thoroughly test it, and invoke it from within the after-party task. This approach keeps deployment tasks lightweight while ensuring that the core logic is well tested. As a result, safety is maximized and the risk of unintended or unexpected behavior is reduced to an absolute minimum.
namespace :after_party do
  desc 'Deployment task: perform_a_complex_operation'
  task perform_a_complex_operation: :environment do
    puts "Running deploy task 'perform_a_complex_operation'"

    record_ids = [132, 772, 1155, 1700, 2199, 2200, 2201]
    # The class below holds the actual logic and is the one to cover with tests
    DataMigrations::DoSomeComplexDatabaseManipulationOnRecords.new(record_ids).call

    AfterParty::TaskRecord
      .create version: AfterParty::TaskRecorder.new(__FILE__).timestamp
  end
end
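Following that practice, the extracted class invoked by the task above can be covered like any ordinary unit — for example with RSpec and FactoryBot. The factory name and the transformed? predicate below are assumptions for illustration only:

```ruby
# spec/lib/data_migrations/do_some_complex_database_manipulation_on_records_spec.rb
require "rails_helper"

RSpec.describe DataMigrations::DoSomeComplexDatabaseManipulationOnRecords do
  describe "#call" do
    it "transforms only the requested records" do
      record    = create(:record) # assumes a FactoryBot factory named :record
      untouched = create(:record)

      described_class.new([record.id]).call

      # transformed? is a hypothetical predicate on the model
      expect(record.reload).to be_transformed
      expect(untouched.reload).not_to be_transformed
    end
  end
end
```

With the logic isolated this way, the rake task itself stays a thin wrapper that needs no dedicated tests of its own.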
Conclusions
Although migrations may intuitively feel like the right place to manipulate database content — especially immediately after structural changes — they are not designed for that purpose. Treating them as such leads to a range of subtle but serious problems: failing migrations, blocked deployments, long-running database locks, and errors that surface only in specific environments. Keeping migrations small and strictly limited to schema changes is the safest and most sustainable approach. Data manipulations belong elsewhere, where they can be executed deliberately, monitored independently, and retried without risking the integrity of the deployment process. Separating these two responsibilities brings clear benefits: greater safety and stability, more predictable releases, and the ability to properly test data-related logic instead of treating it as a one-off side effect of a migration.