2024

Backend

Ruby

Ruby on Rails

Postgresql

SQL views in Ruby on Rails

Jan Grela

Ruby Developer

SQL view is like a virtual table, combines one or more tables (or even other views), and holds the SQL statement from which it was created. To the user, it looks like a normal table with columns and rows. Views are a way to create custom records subsets:

simple views, which allows INSERT, UPDATE, or DELETE
more complex views are read-only

The view is created using the CREATE VIEW statement, whileCREATE OR REPLACE VIEW replaces it if it already exists, andDROP VIEW removes the view.

Setup

All examples are done in Rails 7.0.4 (with Ruby 3.2.2) and PostgreSQL (15.1). To execute SQL statements from examples use any SQL client you like.

Create a new Rails app using PostgreSQL DB.

rails new blog -d postgresql

Add migrations and models

# db/migrate/20230102132134_create_authors.rb
class CreateAuthors < ActiveRecord::Migration[7.0]
  def change
    create_table :authors do |t|
      t.string :first_name
      t.string :last_name
      t.string :role
    end
  end
end

# db/migrate/20230102132140_create_posts.rb
class CreatePosts < ActiveRecord::Migration[7.0]
  def change
    create_table :posts do |t|
      t.references :author
      t.string :title
      t.text :content
      t.string :tag
      t.datetime :published_at
      t.timestamps
    end
  end
end

# app/models/author.rb
class Author < ApplicationRecord
  has_many :posts
end

# app/models/post.rb
class Post < ApplicationRecord
  belongs_to :author, optional: true
end

Create DB and run migrations:

rails db:create db:migrate

Simple view

View which queries from one table doesn’t use WITH, DISTINCT, GROUP BY, HAVING, LIMIT, OFFSET, UNION, INTERSECT, EXCEPT clauses, or selects columns without aggregate functions is considered as a simple view. Such views are updatable, so could be used to add/update/remove records to the table it uses. One of the reasons to use simple views is to restrict access to certain data. This could be done with a WHERE clause.

The following example will create a view that shows only authors with a dev role.

CREATE VIEW authors_devs AS
SELECT *
FROM authors
WHERE role = 'dev'

Because it's updatable, INSERT is possible:

INSERT INTO authors_devs (first_name, last_name, role)
VALUES ('Jane', 'Doe', 'dev'),
       ('Joe', 'Foo', 'qa'),
       ('Jack', 'Bar', 'dev');
INSERT 0 3

It adds 3 rows to the authors’ table:

SELECT * FROM authors;
 id | first_name | last_name | role
----+------------+-----------+------
  1 | Jane       | Doe       | dev
  2 | Joe        | Foo       | qa
  3 | Jack       | Bar       | dev
(3 rows)

but only two rows when querying from authors_devs:

SELECT * FROM authors_devs;
 id | first_name | last_name | role
----+------------+-----------+------
  1 | Jane       | Doe       | dev
  3 | Jack       | Bar       | dev
(2 rows)

It is possible to update the author with id 3:

UPDATE authors_devs SET first_name = 'John' WHERE id=3;
UPDATE 1

but not one with id 2:

UPDATE authors_devs SET first_name = 'Jessika' WHERE id=2;
UPDATE 0

Similarly deleting is restricted:

DELETE FROM authors_devs WHERE id=3;
DELETE 1

DELETE FROM authors_devs WHERE id=2;
DELETE 0

Complex view

When creating views that join multiple tables, group results and use aggregate functions is the complex view. It is used only to organize data. Typical usage is some kind of report or summary.

The next view example is a summary of authors and posts.

CREATE VIEW author_statistics AS
SELECT
  CONCAT(a.first_name, ' ', a.last_name) AS author,
  COUNT(DISTINCT p) AS posts_count,
  JSON_OBJECT_AGG(DISTINCT tag_counts.name, tag_counts.count) AS tags_count,
  TO_CHAR(MAX(p.published_at), 'DD.MM.YYYY HH24:MI') AS last_post_at
FROM authors a
JOIN posts p ON a.id = p.author_id
JOIN (
  SELECT
      a.id AS author_id,
    p.tag AS name,
    COUNT(p.tag) AS count
  FROM authors a
  JOIN posts p ON a.id = p.author_id
  GROUP BY a.id, p.tag
) tag_counts ON tag_counts.author_id = a.id
GROUP BY a.id
ORDER BY MAX(p.published_at) DESC

Such a complex view generates a nice table:

SELECT * FROM author_statistics;
        author         | posts_count |                     tags_count                      |   last_post_at   
-----------------------+-------------+-----------------------------------------------------+------------------
 Norbert Dare          |          19 | { "backend" : 9, "frontend" : 6, "fullstack" : 4 }  | 03.01.2023 11:43
 Granville Schmidt     |          12 | { "backend" : 3, "frontend" : 4, "fullstack" : 5 }  | 03.01.2023 11:30
 ...

Such views are not automatically updateable. To enable INSERT, UPDATE, DELETE, or TRUNCATE events on that view, an INSTEAD OF trigger could be added.

Materialized views

Normal views don’t speed up querying from the database. It’s only a layer to organize the data.

If a query takes a lot of time to execute, it is worth creating a materialized view, it will store the results of the query. This kind of view is perfect for data that is often requested but doesn't change often. A good example is a daily report, which is generated (view is refreshed) once a day.

CREATE MATERIALIZED VIEW author_statistics AS
SELECT ...

To replace the content of the materialized view:

REFRESH MATERIALIZED VIEW author_statistics

Materialized views are regular tables, therefor it is possible to create indexes on them.

Rails

SQL views are treated as a normal database table, so we can add an ActiveRecord model and use AR query language as usual. Having author_statistics view and AuthorStatistic AR class model, the following is possible:

> AuthorStatistic.all
AuthorStatistic Load (190.7ms)  SELECT "author_statistics".* FROM "author_statistics"

> AuthorStatistic.where('last_post_at > ?', 1.month.ago)
AuthorStatistic Load (202.0ms)  SELECT "author_statistics".* FROM "author_statistics" WHERE (last_post_at > '2022-12-11 14:32:39.480871')

We learned how to create SQL views by executing CREATE VIEW statements in an SQL client. It is fine for fast-checking some solutions without the need to implement them and commit to the repo. When we decide to utilize SQL views in our Rails app a way to store and manage views is needed. A gem that delivers a solution for it is scenic. Let's take a closer look at it.

Scenic gem

https://github.com/scenic-views/scenic

This is a perfect tool to create SQL views in migrations, keep view's SELECT statements in separate files and keep track of views versions when they evolve.

Scenic has two generators that help with views:

scenic:model
scenic:view

scenic:model - generates migration file, SQL statement file and model file, whereas scenic:view does the same without model file.

% rails generate scenic:model author_statistics
      create  app/models/author_statistic.rb
      create  db/views/author_statistics_v01.sql
      create  db/migrate/20230103144413_create_author_statistics.rb

# db/migrate/20230103144413_create_author_statistics

class CreateAuthorStatistics < ActiveRecord::Migration[7.0]
  def change
    create_view :author_statistics
  end
end

# app/models/author_statistic.rb

class AuthorStatistic < ApplicationRecord
end

It is possible to create a materialized view with option --materialized.

% rails generate scenic:model materialized_author_statistics --materialized
      create  app/models/materialized_author_statistic.rb
      create  db/views/materialized_author_statistics_v01.sql
      create  db/migrate/20230103150813_create_materialized_author_statistics.rb

# 20230103150813_create_materialized_author_statistics

class CreateMaterializedAuthorStatistics < ActiveRecord::Migration[7.0]
  def change
    create_view :materialized_author_statistics, materialized: true
  end
end

# app/models/materialized_author_statistic.rb

class MaterializedAuthorStatistic < ApplicationRecord
    # calls refresh on view
  # use concurrently: true to not lock view during refresh
  def self.refresh
    Scenic.database.refresh_materialized_view(table_name, concurrently: false, cascade: false)
  end
end

Under db/views/author_statistics_v01.sql and db/views/materialized_author_statistics_v01.sql put SQL code

SELECT
  CONCAT(a.first_name, ' ', a.last_name) AS author,
  COUNT(DISTINCT p) AS posts_count,
  JSON_OBJECT_AGG(DISTINCT tag_counts.name, tag_counts.count) AS tags_count,
  TO_CHAR(MAX(p.published_at), 'DD.MM.YYYY HH24:MI') AS last_post_at
FROM authors a
JOIN posts p ON a.id = p.author_id
JOIN (
  SELECT
      a.id AS author_id,
    p.tag AS name,
    COUNT(p.tag) AS count
  FROM authors a
  JOIN posts p ON a.id = p.author_id
  GROUP BY a.id, p.tag
) tag_counts ON tag_counts.author_id = a.id
GROUP BY a.id
ORDER BY MAX(p.published_at) DESC

Run migrations to create defined views (we will use them later):

% rails db:migrate

Both generators bump up the view version when using the same name:

% rails generate scenic:view author_statistics
      create  db/views/author_statistics_v02.sql
      create  db/migrate/20230103145259_update_author_statistics_to_version_2.rb

The new version file - db/views/author_statistics_v02.sql - contains previous version of SQL, ready for modifications.

# 20230103145259_update_author_statistics_to_version_2.rb

class UpdateAuthorStatisticsToVersion2 < ActiveRecord::Migration[7.0]
  def change
    update_view :author_statistics, version: 2, revert_to_version: 1
  end
end

Benchmarks

Sometimes it is worth measuring different approaches to solving problems. Here we have a fetch from created earlier SQL views and running the same query using plain SQL.

Setup

author_statistics and materialized_author_statistics SQL views exist with corresponding AR models.
Benchmarks rake tasks use the Benchmark module from the Ruby standard library to measure time and gem benchmark-memory to measure memory.
Seed DB with a significant amount of data

# db/seed.rb

# just random data using Faker
require 'faker'

1000.times do
  Author.create(
    first_name: Faker::Name.first_name,
    last_name: Faker::Name.last_name,
    role: ['dev', 'pm', 'qa'].sample
    )
end

10000.times do
  Post.create(
    title: Faker::Lorem.sentence(word_count: 3),
    content: Faker::Lorem.paragraphs.join("\n"),
    tag: ['backend', 'frontend', 'fullstack'].sample,
    author: Author.all.sample,
    published_at: rand(24*60*100).minutes.ago
  )
end

SQL execute vs SQL view vs Materialized SQL view

For benchmark purpose let's have 5 methods, delivering the same result, but in a different way.

sql_execute - execute SQL defined in string
view_model_find - use AR model to fetch data
view_sql_execute - same as above, but without using a model
materialized_view_model_find - use AR model for materialized with
materialized_view_sql_execute - query from view without model

#lib/tasks/benchmarks.rake

ITERATION_NUMBER = ENV.fetch('ITERATION_NUMBER', 100).to_i

def process_result(result)
  {
    author: result['author'],
    posts_count: result['posts_count'],
    last_post_at: result['last_post_at']
  }
end

namespace :benchmarks do
  desc 'author_stats'
  task author_stats: :environment do
    def sql_execute
      sql = <<~SQL
        SELECT
        ...
      SQL

      ITERATION_NUMBER.times do
        ActiveRecord::Base.connection.exec_query(sql)
          .map { |result| process_result(result) }
      end
    end

    def view_model_find
      ITERATION_NUMBER.times do
        AuthorStatistic.select(:author, :posts_count, :last_post_at)
          .map { |result| process_result(result) }
      end
    end

    def view_sql_execute
      ITERATION_NUMBER.times do
        ActiveRecord::Base.connection
          .exec_query('SELECT * FROM author_statistics')
          .map { |result| process_result(result) }
      end
    end

    def materialized_view_model_find
      ITERATION_NUMBER.times do
        MaterializedAuthorStatistic.select(:author, :posts_count, :last_post_at)
          .map { |result| process_result(result) }
      end
    end

    def materialized_view_sql_execute
      ITERATION_NUMBER.times do
        ActiveRecord::Base.connection
          .exec_query('SELECT * FROM materialized_author_statistics')
          .map { |result| process_result(result) }
      end
    end

        functions = [
      'sql_execute', 
      'view_model_find',  
      'view_sql_execute',
      'materialized_view_model_find',
      'materialized_view_sql_execute'
    ]

    puts "Time measures:"
    Benchmark.bmbm do |x|
      functions.each do |function|
        x.report("#{function}:") do
          send(function)
        end
      end
    end

    puts "Memmory measures:"
    Benchmark.memory do |x|
      functions.each do |function|
        x.report("#{function}:") do
          send(function)
        end
      end
    end
  end
end

Run the rake task:

rake benchmarks:author_stats

**Time measures:**

                                 user     system      total        real
sql_execute:                     0.486172   0.036185   0.522357 (  4.892079)
view_model_find:                 1.192921   0.017537   1.210458 (  4.632283)
view_sql_execute:                0.483697   0.036375   0.520072 (  4.770821)
materialized_view_model_find:    0.474464   0.006046   0.480510 (  0.547047)
materialized_view_sql_execute:   0.103913   0.008027   0.111940 (  0.197640)

**Memmory measures:**

sql_execute:                     69.763M memsize
                                 624.501k object
                                 50.000  strings
view_model_find:                 139.387M memsize
                                 1.656M objects
                                 50.000  strings
view_sql_execute:                69.516M memsize
                                 624.600k objects
                                 50.000  strings
materialized_view_model_find:    139.377M memsize
                                 1.656M objects
                                 50.000  strings
materialized_view_sql_execute:   69.522M memsize
                                 624.600k object
                                 50.000  strings

Looking at the results there few conclusions:

As expected querying from SQL view isn’t faster than executing plain SQL. View in fact executes the same SQL.
Using ActiveRecord::Base.connection.exec_query is faster than using ActiveRecord Query language
Querying from materialized view proves that it’s faster
Using ActiveRecord::Base.connection.exec_query consumes less memory (doesn’t need to initialize AR objects).

SQL execute vs Active Record query

Views are written in raw SQL. This means that these could use aggregates, mathematical functions, and operators. Active Records support some of the functions like count, sum, min, and max, but other than these you need to use Ruby code to achieve the same results as SQL functions.

In this example, we benchmark 3 different functions giving concatenated first and last name, and an array of post titles.

active_record_query - use only Ruby code
sql_query - use only SQL code
mixed_active_record_with_sql - mixture of above

#lib/tasks/benchmarks.rake

ITERATION_NUMBER = ENV.fetch('ITERATION_NUMBER', 10).to_i

namespace :benchmarks do
  desc 'author_titles'
    task author_titles: :environment do

        # active record
    def active_record_query
      ITERATION_NUMBER.times do
        Author.includes(:posts).map do |author|
          {
            name: "#{author.first_name} #{author.last_name}",
            titles: author.posts.map(&:title)
          }
        end
      end
    end

    def sql_query
      sql = <<~SQL
        SELECT
          CONCAT(a.first_name, ' ', a.last_name) AS name,
          ARRAY_AGG(p.title) AS titles
        FROM authors a
        JOIN posts p ON a.id = p.author_id
        GROUP BY a.id
      SQL

      ITERATION_NUMBER.times do
        ActiveRecord::Base.connection.exec_query(sql).map do |result|
          {
            name: result['name'],
            titles: result['titles']
          }
        end
      end
    end

    def mixed_active_record_with_sql
      ITERATION_NUMBER.times do
        Author.joins(:posts)
              .select(
                 "CONCAT(authors.first_name, ' ', authors.last_name) AS name",
                 "ARRAY_AGG(posts.title) AS titles"
              )
              .group(:id).map do |result|
                {
                  name: result.name,
                      titles: result.titles
                }
              end
      end
    end

    functions = [
      'active_record_query', 
      'sql_query',  
      'mixed_active_record_with_sql'
    ]

    puts "Time measures:"
    Benchmark.bmbm do |x|
      functions.each do |function|
        x.report("#{function}:") do
          send(function)
        end
      end
    end

    puts "Memmory measures:"
    Benchmark.memory do |x|
      functions.each do |function|
        x.report("#{function}:") do
          send(function)
        end
      end
    end
  end
end

Run the rake task:

rake benchmarks:author_titles

**Time measures:**

                                   user     system      total        real
active_record_query             2.315652   0.019930   2.335582 (  2.824983)
sql_query:                      0.008803   0.001647   0.010450 (  0.204779)
mixed_active_record_with_sql    0.134432   0.001860   0.136292 (  0.288891)

**Memory measures:**

active_record_query             250.648M memsize
                                2.458M objects
                                50.000  strings
sql_query:                      9.588M memsize
                                52.451k objects
                                50.000  strings
mixed_active_record_with_sql    48.192M memsize
                                552.790k objects
                                50.000  strings

The conclusion from the results is that using SQL is way faster and consumes less memory compared to Ruby code.

Summary

Now, you may ask why anybody needs SQL views if the same thing could be done with Active Record Query Interface. Well, there are a few reasons that could be considered:

This kind of views are easy to create and might be used as a read model
Combine many tables in one logical table
Might be used across different services, keeping some business logic in SQL statements
Materialized views are a great help when need to speed up read time
SQL aggregate functions are in general faster than doing the same in Ruby code

Jan Grela

Ruby Developer

Check my Twitter