❶ Author of How to Open Source (.dev). A book to take you from coder to contributor.

❷ Creator of CodeTriage, a free service helping developers contribute to open source.

❸ Core committer to ruby/ruby.

❹ Rust Cloud Native Buildpack author .

❺ Married to Ruby, literally.

A Tale of Slow Pagination

22 Jun 2017

When I see a query in my logs without either a limit or a count clause, alarm bells go off because it is likely a hotspot. A pagination query has alimit so it usually flies under my radar:

SELECT  "repos".*
FROM "repos"
WHERE (issues_count > 0)
ORDER BY issues_count DESC
LIMIT $1 OFFSET $2

This query came from the main page of my app CodeTriage.com, the easiest way to get started contributing to Open Source.

As you can guess there’s a problem here. It was pointed out to me by Nate Berkopec who writes a ton about Ruby performance problems. I had been staring the issue in the face for so long, I just accepted that slow query as a fact of life.

I’ll take a moment to rewind and show you how I should have found the problem.

Always Benchmark Code

If you’re not using Rack Mini Profiler in development, you really should be. Even better, you should be using it in production:

These two commits tell RMP that when someone is logged in and that person has an admin flag on their account to show them the rack-mini-profiler stats in the UI.

Here’s what my index page looks like when I’m logged in.

When I click the tab in the top left, it expands and this is what I see:

Of the 336ms page render time (on the server side), you can see that 281.6ms are being spent in:

Rendering: pages/_repos_with_pagination

You can also see that there’s a link for 2 SQL queries. When you click those this is what you see:

One SQL query takes 125ms and the other 118ms. Yikes!

Most of my other SQL queries take anywhere from 1-4ms so this is pretty bad. Over 70% of my time spent rendering my index page is spent in these two queries.

If you ever have a query that you don’t know why it’s slow, you can use EXPLAIN ANALYZE` with Postgres:

$ heroku pg:psql
issuetriage::DATABASE=> EXPLAIN ANALYZE
SELECT COUNT(*)
FROM "repos"
WHERE (issues_count > 0);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate  (cost=39294.58..39294.58 rows=1 width=0) (actual time=115.217..115.217 rows=1 loops=1)
   ->  Seq Scan on repos  (cost=0.00..39293.48 rows=2198 width=0) (actual time=0.009..114.978 rows=2184 loops=1)
         Filter: (issues_count > 0)
         Rows Removed by Filter: 511
Total runtime: 115.253 ms
(5 rows)

This confirms the total time is pretty bad 115ms. I’m doing this in production so that I have production like data.

In this case, I’m doing a simple count, so why are things so slow? When you look at the output this line should jump out at you:

->  Seq Scan on repos  (cost=0.00..39293.48 rows=2198 width=0) (actual time=0.009..114.978 rows=2184 loops=1)

The database has to loop over ALL records just to find the ones that have an issue count greater than zero.

The next query in our problem output has bind parameters in it:

SELECT
"repos".*
FROM "repos"
WHERE (issues_count > 0)
ORDER BY issues_count DESC
LIMIT $1
OFFSET $2;

We can replace these with some reasonable values, such as a limit of 20 and an offset of 0.

issuetriage::DATABASE=> EXPLAIN ANALYZE
SELECT  "repos".* FROM "repos"
WHERE (issues_count > 0)
ORDER BY issues_count DESC
LIMIT 20
OFFSET 0;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Limit  (cost=39305.18..39305.19 rows=20 width=1541) (actual time=150.109..150.113 rows=20 loops=1)
   ->  Sort  (cost=39305.18..39306.28 rows=2198 width=1541) (actual time=150.108..150.111 rows=20 loops=1)
         Sort Key: issues_count
         Sort Method: top-N heapsort  Memory: 65kB
         ->  Seq Scan on repos  (cost=0.00..39293.48 rows=2198 width=1541) (actual time=0.023..148.889 rows=2184 loops=1)
               Filter: (issues_count > 0)
               Rows Removed by Filter: 511
Total runtime: 150.160 ms

Even though we’re using a limit, we have to order all the entries in our database by that value in the order clause before we can apply the limit. This means we have to sort all the values, which is why you’re seeing that sort call. Later you’re seeing the same sequential scan since we’re also applying the same where clause (to not show issues_count-s of 0).

So what’s the fix? In this case, it’s pretty simple. We can give the database more information about the data that it needs to sort and filter by adding an index. To be efficient the database needs an index that includes not just repo_id but also index_count.

Note: Indexes aren’t magic remedies, there is a downside as they use memory in your database, which is not free. In this case, I only have about 2K repos in the DB so the index is well worth the cost. Alternatively, I could consider caching the view elements to not do those expensive lookups on the first page. I could even change the view to be a simple next button, and then I could have avoided having to count all the repos.

There are multiple fixes possible, but in my case adding an index was the easiest. Again thanks to Nate Berkopec for pointing this out. I didn’t figure this all out on my own, he did. He also has a book explaining how to debug and profile common performance issues like this that’s worth picking up.

Here’s the index:

class AddIndexToIssuesCount < ActiveRecord::Migration[5.1]
  def change
    add_index :repos, :issues_count
  end
end

What’s the end result? We can re-run our EXPLAIN ANALYZE and get some really good feedback:

issuetriage::DATABASE=> EXPLAIN ANALYZE
SELECT  "repos".*
FROM "repos"
WHERE (issues_count > 0)
ORDER BY issues_count DESC
LIMIT 20
OFFSET 0;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.06..45.61 rows=20 width=1541) (actual time=0.019..0.050 rows=20 loops=1)
   ->  Index Scan Backward using index_repos_on_issues_count on repos  (cost=0.06..4980.92 rows=2187 width=1541) (actual time=0.017..0.047 rows=20 loops=1)
         Index Cond: (issues_count > 0)
 Total runtime: 0.084 ms
(4 rows)

From over 100ms to under 1ms not bad, what about the other?

issuetriage::DATABASE=> EXPLAIN ANALYZE
SELECT COUNT(*) FROM "repos"
WHERE (issues_count > 0);
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=75.33..75.33 rows=1 width=0) (actual time=1.713..1.713 rows=1 loops=1)
   ->  Index Only Scan using index_repos_on_issues_count on repos  (cost=0.06..74.23 rows=2187 width=0) (actual time=0.034..1.578 rows=2184 loops=1)
         Index Cond: (issues_count > 0)
         Heap Fetches: 618
 Total runtime: 1.757 ms
(5 rows)

Not too shabby either. The overall page render time went from 336ms to 169ms:

That partial where we are calling these queries still accounts for 55ms, but it’s much better than the previously 281.6ms. They say that a 200ms wait time is perceptible to a human, so in theory this just saved me a few “ugh why is this so slow”-s per day. Better than that, it also means that my index page can handle slightly more throughput as it won’t be hanging around waiting for slow queries to finish.

Bottom line: Don’t neglect your pagination queries when speeding up your app, and go buy Nate’s book.

Keep Reading 🚀

Cinnamon Buns & Commit Bits: A RubyConf Story

RubyConf 2026 is coming up soon! The deadline to buy a ticket and enter the VIP raffle by buying a RubyConf ticket is July 1st. That’s only 2 days from this post’s publication. This personal essay explores why a language-specific conference in the age of AI is still important and why events are still important for bringing developers together.

Read More
RubyGems Fracture Incident Report

This is a copy of the canonical report I published on RubyCentral’s site. I recommend reading there for the most updated copy. This document attempts to give closure to the Ruby community about the events that led to the incident, September 10-18, 2025, which I’ve named “RubyGems Fracture.”

Read More
How to "Sustain Heroku"

This is a personal essay (I speak for me and my views, not for my employer) about what exactly a “Sustaining Engineering Model” is, in the context of the recent Heroku announcement, and the book “The Innovator’s Dilemma,” as seen by someone who has worked at the company for the past 14 years.

Read More
Non-Violent Comments: Calling out or Calling in?

Now that programmers are at war with the robots (Gen AI) for our jobs, we need to lean into the things that they cannot do. Today, I’m going to be talking about how to be a human and communicate with other humans in the most hostile of scenarios, “in conflict (drama).”

Read More
Disallow code usage with a custom `clippy.toml`

I recently discovered that adding a clippy.toml file to the root of a Rust project gives the ability to disallow a method or a type when running cargo clippy. This has been really useful. I want to share two quick ways that I’ve used it: Enhancing std::fs calls via fs_err and protecting CWD threadsafety in tests.

Read More

A Tale of Slow Pagination

Always Benchmark Code

Subscribe to my Newsletter 😻 🤠

Keep Reading 🚀

Cinnamon Buns & Commit Bits: A RubyConf Story

RubyGems Fracture Incident Report

How to "Sustain Heroku"

Non-Violent Comments: Calling out or Calling in?

Disallow code usage with a custom `clippy.toml`