Config: Behavior versus Credentials

An application doesn’t have one type of configuration; it has two. In Rails this is confusing, since we muddle the two together under a giant switch statement powered by RAILS_ENV. Let’s start with some definitions.

Behavior

Is caching enabled or disabled? What gems are loaded? What actions are safe to perform? When you have your app configured for “test” then it’s perfectly normal and expected to drop your database or flush your Redis instance between test runs, even though that would be catastrophic in production. In development we want hot code reloading to decrease iteration time, while in production we want to cache all code so we can run with maximum speed and throughput.

This is what I mean by behavior. How does your app behave and what actions are acceptable? In Rails this is typically controlled by setting the RAILS_ENV environment variable to development, production, or test. Different gems get loaded from the Gemfile depending on what behavior we want, and different behaviors are enabled or disabled via environment-specific files such as config/environments/production.rb.
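
For example, the generated config/environments/production.rb flips behavior switches like code caching and eager loading. An illustrative excerpt:

# config/environments/production.rb
Rails.application.configure do
  config.cache_classes = true # don't reload code between requests
  config.eager_load    = true # load the whole app up front for throughput
end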

Credentials

In addition to being able to flush a Redis instance or not, our app needs to know how to actually connect to that instance. The same goes for any external resource you use: database, email provider, payment service, APIs, etc. It makes sense that when your behavior changes, your credentials should too. Just because it’s okay to drop your database in “test” doesn’t mean that you should do so while connected to your production database. Rails saw this issue early on and “fixed” it for the database by requiring that connection information be present in database.yml under different “environments”.

test:
  database: schneems_test

production:
  url: postgres://username:password@host:port/name

Usually, any other credentials live inside a file like config/environments/production.rb or in an initializer, for example:

# config/initializers/sidekiq.rb

if Rails.env.production?
  REDIS_URL = 'redis://redis.example.com:7372/12'
else
  REDIS_URL = 'redis://localhost:6379'
end

Sidekiq.configure_server do |config|
  config.redis = { url: REDIS_URL }
end

One good question I got from DHH on this dichotomy was “Where does a CDN config fit in?” Generally, configuring a Rails app to use a CDN doesn’t involve any passwords or “secrets”; instead it’s a subdomain that is public information. I would still label this as “credentials” (naming is hard) because changing that value does not change the behavior of the application. It changes the resource that is being used to serve the application.
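
In practice the CDN host lands in the same environment config as other credentials; an illustrative sketch (the host value here is hypothetical):

# config/environments/production.rb
config.action_controller.asset_host = ENV["CDN_HOST"] # e.g. "https://assets.example.com"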

Behavior Versus Credentials - Staging

In the case of Rails, there is no distinction between behavior and credentials. You are encouraged to use one giant switch, RAILS_ENV=production, to set both at the same time, as they are coupled. This works for most cases but has some nasty side effects.

While you start with the 3 environments shipped with Rails, you might one day decide that you want another. Maybe you want to be able to show stakeholders previews through something like review apps. Maybe you have a QA team and you want them to have access to a staging environment where it’s safe for them to exercise the full depth and breadth of your app without fear of emailing thousands of users or kicking off real debit card transactions.

In that case, most people add a RAILS_ENV=staging. The problem here is one of divergence. If you expect your QA team to catch a bug before it hits production, then your staging needs to behave EXACTLY like production. You can do things like have config/environments/staging.rb load the config from config/environments/production.rb, as sketched below. But if the app used that file to specify credentials, your staging app is now accidentally connected to production credentials, oops.
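
A minimal sketch of that approach, assuming you keep credentials out of these files entirely:

# config/environments/staging.rb
# Inherit ALL production behavior so staging can't silently diverge.
require_relative "production"

Rails.application.configure do
  # staging-specific overrides only; no connection strings or secrets here
end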

It’s also common to configure behavior in other places in your app with if statements, similar to how we handled the Sidekiq credentials previously:


config.action_controller.perform_caching = Rails.env.production?

This one is a bit contrived, but things like this do happen. Then you’ll see a bug in production that can’t be reproduced in staging, and it costs hours or days of your life. One option could be to make the production? method return true for the staging environment, but then we would be accidentally connected to our production Sidekiq instance.

Another thing I see is apps that have admin features available in development for debugging but not in production for security reasons. Having these things exposed on a staging site could accidentally leak customer information and that would be bad.

This conflation between behavior and credentials is a bad one. When I started answering support tickets at Heroku for Ruby apps, most of the “impossible” behavior ended up being explained by a

$ heroku run bash
~$ echo $RAILS_ENV
staging

It became so common that I wrote an article about it, and we even emit a warning while you’re deploying. I mentioned this on the Rails mailing list and someone’s reaction was “they must be new”, but I can tell you that apps of all shapes and sizes, with developers of all skill levels, hit this problem. It’s not an issue of “knowing what you’re doing”; it’s an issue of your app behaving as expected and doing the right thing more often than not.

After the warning and the error, I still see an occasional ticket related to this but it’s no longer where I spend the bulk of my debugging time.

Imitating Behavior

In the previous example, a “staging” environment should act very close to a “production” environment. In that case, it makes sense to preserve the production behavior but only change the credentials. So “staging” isn’t a behavior, it’s a set of different credentials.

There’s another case I see where people want to debug a production issue or do some performance benchmarking with production settings. If you’re using the default behavior of hiding all your behavior and credentials behind RAILS_ENV, then running your app with RAILS_ENV=production locally puts you in dangerous territory. Maybe the endpoint that you want to debug as slow sends out mass emails based on users in your database. If you boot with RAILS_ENV=production locally and hit that endpoint, you’ve just sent off dozens or hundreds of emails, oops.

This is essentially another case of the “staging” problem. We want to reproduce or imitate behavior, but either we accidentally get the credentials too, or we have slightly different behavior based on an accidentally slightly different configuration with a custom RAILS_ENV.

The Illusion of Safety

As I mentioned, I recently had a mailing list conversation about this topic. One of the things I heard was that a developer wanted to be able to use RAILS_ENV=staging as a safety net. If they use that, then they know they are not going to affect any production data. I disagree.

What if you’re on your staging console and you’re trying to reproduce a production bug, but you can’t? You think “maybe there’s a difference in behavior with production” and try it out with RAILS_ENV=production, without thinking that even though you’re on your staging app, it has credentials to ALL your environments. The next thing you know, your production app doesn’t just have a bug, it’s down.

While any instance of your running application should be able to behave like any other instance (dev/prod/test), no instance should have the credentials to connect to all the different services.

If you can ssh into your staging server and take down your production database, that’s a problem. While you might say “I’ll never do that” or “I’m a good programmer”, we’re all bad programmers when we’re tired, or hungry, or upset. Since we’re all bad programmers on some days, we have to always plan for that case. I don’t know about you, but breaking my production app from within my staging environment sounds like a pretty bad time.

Two measures of defence

One thing we can do is safeguard against dangerous actions. I took this measure in Rails to prevent dropping production databases by accident. Again, I know that you will never need this code, because you are a good programmer. This is a thing that does happen, and if it ever happens to “bad programmer you”, then you’ll be glad there is an extra safeguard there.

This approach is extremely time intensive. It also requires enough people doing the “wrong” thing to find out which actions are the most dangerous or the most common to warrant protection. There are also plenty of cases where you can’t infer what is dangerous and what isn’t. For example, sending out emails or charging money to an account are common on a production app but shouldn’t happen in staging or development. From the framework level it would be almost impossible to detect when such an action is valid or not. That’s why we need a separate line of defence.

Separate the behavior of your application from the credentials. It’s that simple. If your app CANNOT charge a credit card because it has “development” API credentials instead of production credentials, then this action is inherently safe. You’re free to RAILS_ENV=production all you want, on any machine that you want, and you’ll only get in trouble if you’re on your production instance.
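
Concretely, the Sidekiq initializer from earlier stops switching on Rails.env and reads its credential from the environment instead; a sketch:

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  # Same behavior everywhere; only the credential changes per environment.
  config.redis = { url: ENV.fetch("REDIS_URL", "redis://localhost:6379") }
end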

This means not checking any credentials into your repo, which has a few benefits. If a contractor or intern walks away with your codebase, do they walk away with your customer data too? What if someone accidentally hits the “public” button on GitHub for your repo, and you have to roll all your credentials? By separating your behavior from your credentials and not checking in your credentials, you are protected from a wide array of threats.

Easier Said Than Done

On Heroku this means storing your credentials in environment variables. Heroku provides a secure way of setting these values per app via heroku config. Are there security implications to this method? Yes. Because env vars are global, any clients or services, such as an error reporting library, have access to all your credentials. This is mitigated in that all the good ones only whitelist environment variables to record, such as RAILS_ENV, and ignore others such as DATABASE_URL. If you are using a service that does otherwise, you need to switch. While this does mean that you need to be careful about what clients are running on your app, you need to do this anyway. Even without the ability to record your env vars, an insecure client could run arbitrary code or tar up your app and send it somewhere. Based on that, I don’t think this is a large threat.

What about if you’re not running on Heroku? If you’re deploying via something like dokku, you can already use environment variables. If you’re using something else, you can roll your own environment variable support via a .env file and either the dotenv-rails gem or the foreman gem. You might also want to back up the creds somewhere other than on your production disk; a password manager such as LastPass or 1Password with shared passwords across your admins is an option.
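
For example, with dotenv-rails a local .env file (kept out of git) can supply the same variables your platform would set; the values below are hypothetical:

# Gemfile
gem "dotenv-rails", groups: [:development, :test]

# .env — add this file to .gitignore
DATABASE_URL=postgres://localhost/myapp_development
REDIS_URL=redis://localhost:6379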

If you need a third level of lockdown protection for the services you rely on, consider isolating them via the network. You can roll your own or use Heroku Private Spaces.

At the end of the day, I’m advocating for declarative configuration. I don’t want my apps to connect to any services I didn’t explicitly configure or do anything I didn’t tell them to. The alternative is conditional configuration which is what Rails encourages by default. In that case we’ve not built one app but rather 3 separate apps that are wrapped in a giant conditional, switched by one environment variable.

Puma, Ports, and Polish

Polish is what distinguishes good software from great software. When you use an app or code that clearly cares about the edge cases and how all the pieces work together, it feels right. Unfortunately, this is the part of the software that most often gets overlooked, in favor of more features or more time on another project. Recently, I had the opportunity to work on an integration between Rails and Puma and I wanted to share that experience in the context of polish and what it takes to make open source work.

The problem

Puma has been the default web server in Rails for about a year now, and most things just work™. I talked about some of my previous problems and efforts to get this working here. When you run rails server with Rails 5+, it uses Puma through an interface called a Rack handler. Lots of different web servers, such as thin, also implement a Rack handler, allowing Rails to boot the server without having to know anything about the web server. For example, Rails passes a default port of 3000 to the server.
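
The contract is roughly this: the framework hands any compliant server a Rack app plus default options and stays ignorant of which server runs it. A minimal sketch using Puma’s handler:

require "rack/handler/puma"

app = proc { |env| [200, { "Content-Type" => "text/plain" }, ["hello"]] }

# Rails passes its default port much like this:
Rack::Handler::Puma.run(app, Port: 3000)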

Part of my previous work on getting Puma to play nice with Rails out of the box involved getting Puma’s Rack handler to read in config files by default (i.e. config/puma.rb). This way, we can generate a config file that works with Rails and Puma and doesn’t need any special knowledge of what framework it is running. One of the biggest points was that the number of threads in Puma cannot exceed the number of connections in the Active Record connection pool. This works great, but we did run into another slight issue with the config and the port. If you remember, I said Rails defaults to port 3000, but we can change this value inside our config/puma.rb using the port DSL:

port ENV.fetch("PORT") { 4000 }

So if you boot your app using rails server and you don’t specify a PORT environment variable you would expect this to connect to port 4000 but instead it connects to 3000. That’s the problem.

While this is a bug, it’s a pretty inconsequential bug. If you boot with rails server -p 4000 it works, if you boot with PORT=4000 rails server it also works, and if you use puma -C config/puma.rb, it works. Only in that one specific case does it fail. That’s what I mean by polish. The software has a bug, but it’s not mission critical. In fact, it functions very well without that bug being fixed and many people will never hit it. However, when you do hit this bug it’s very confusing.

Frustration

User frustration comes when things do not behave as you expect them to. You pull out your car key, stick it in the ignition, turn it…and nothing happens. While you might be upset that your car is dead (again), you’re also frustrated that what you predicted would happen didn’t. As humans we build up stories to simplify our lives. We don’t need to know the complex set of steps in a car’s ignition system, so instead “the key starts the car” is what we’ve come to expect. Software is no different. People develop mental models, for instance “the port configuration in the file should win”, and when that doesn’t happen, or worse, happens inconsistently, it’s painful.

I’ve previously called these types of moments papercuts. They’re not life threatening and may not even be mission critical but they are much more painful than they should be. Often these issues force you to stop what you’re doing and either investigate the root cause of the rogue behavior or at bare minimum abandon your thought process and try something new.

When we say something is “polished” it means that it is free from sharp edges, even the small ones. I view polished software as software that is mostly free from frustration. It does what you expect it to and is consistent.

We like to think that most software we write is free from bugs, but really it’s just free from the bugs we care about. Each bug that gets fixed has a cost: the time spent fixing it and the opportunity cost of other features we could be implementing. When it comes down to it, most programmers and organizations don’t, can’t, or won’t invest in polishing their product.

Puma Port Problems Put Right

Let’s go back to Puma. This bug has been known for almost a year. Between the time it was reported and fixed, nearly 4,000 tickets had been filed against Rails.

While the bug was easy to reason about, the fix was not. It involved coordination with Rails and Puma and a fairly aggressive refactoring inside of the Puma codebase of how the configuration is stored and loaded. All in all, it took me maybe about 12 hours of dev time to get everything working.

On the Puma side, there are 3 different ways configuration can be applied: directly from a user (like puma -p 4000), via a config file like we saw earlier, or via Puma’s own internal defaults. When booting a server through the Puma CLI, you always want the explicitly user-configured options to “win” over any static config in a file. But you want configuration specified in the file to “win” over any defaults.

The root of the issue is that the Rack handler has no way of communicating which values are defaults (i.e. Rails specifies 3000 as a port) versus explicit values such as rails server -p 4000. So when Puma got the value 3000 it had to assume that it was explicitly defined by the user, and even if config/puma.rb specified a different port, it had to be ignored.

The fix was to record when we receive an explicitly user-set value, storing it in an array such as user_supplied_options = [:Port]. Then in Puma, we can apply the configuration values differently depending on whether they’ve been explicitly set by a user or merely passed in as a default. While this sounds straightforward, it required a major re-tooling of how config is set and stored internally in Puma.
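
Here’s a hypothetical sketch of the precedence rule (not Puma’s actual code): explicit user options beat the config file, which beats built-in defaults.

defaults  = { Port: 3000 }      # e.g. what Rails passes through
file_conf = { Port: 4000 }      # from config/puma.rb
cli_conf  = { Port: 5000 }      # e.g. `puma -p 5000`
user_supplied_options = [:Port] # recorded while parsing CLI flags

final = defaults.merge(file_conf)
user_supplied_options.each { |key| final[key] = cli_conf[key] if cli_conf.key?(key) }
final[:Port] # => 5000; without the CLI flag, the file's 4000 would win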

I wanted to write about this fix not because it’s big and important, but because it’s small. I get asked semi-regularly about the big “show stopper” features coming in <language> or <framework>, and while these kinds of things can be exciting, they’re not the bulk of the work that goes into polished software. Even those big features are made up of dozens or hundreds of tiny bug fixes.

In many ways I want my software to be boring. I want it to harbor few surprises. I want to feel like I understand and connect with it at a deep level and that I’m not constantly being caught off guard by frustrating, time stealing papercuts.

Bundler Changed Where Your Canonical Ruby Information Lives: What You Need to Know

Heroku bumped its Bundler version to 1.13.7 almost a month ago, and since then we’ve had a large number of support tickets opened, many a variant of the following:

Your Ruby version is <X>, but your Gemfile specified <Y>

This post was originally published on the Heroku blog.

I wanted to talk about why you might get this error while deploying to Heroku, and what you can do about it, along with some bonus features provided by the new Bundler version.

First off, why are you getting this error? On Heroku, in our Ruby Version docs, we mention that you have to use a ruby directive in your Gemfile to specify a version of Ruby. For example, if you wanted 2.3.3 you would need this:

# Gemfile

ruby "2.3.3"

This is still the right way to specify a version; however, recent versions of Bundler introduced a cool new feature. To understand why this bug happens, you need to understand how the feature works.

Ruby Version Specifiers

If you have people on your team who want to use a more recent version of Ruby locally, say Ruby 2.4.0, but you don’t want to force EVERYONE to use that version, you can use a Ruby version specifier:

ruby "~> 2.3"

Note: I don’t recommend you do this, since “2.3” isn’t technically a valid version of Ruby. I recommend using full Ruby versions in the version specifier, so that if you don’t have a Ruby version in your Gemfile.lock, bundle platform --ruby will still return a valid Ruby version.

You can use multiple version declarations just like in a gem, for example: ruby '>= 2.3.3', '< 2.5'.

This says that any version of Ruby from 2.3.3 up to (but not including) 2.5 is valid. This feature came in Bundler 1.12 but wasn’t made available on Heroku until Bundler 1.13.7. In addition to the ability to specify a Ruby version specifier, Bundler also introduced locking the actual Ruby version in the Gemfile.lock:

# Gemfile.lock

RUBY VERSION
   ruby 2.3.3p222

When you run the command

$ bundle platform --ruby
ruby 2.3.3p222

You’ll get the value from your Gemfile.lock rather than the version specifier from your Gemfile. This provides you with development/production parity. To get that Ruby version in your Gemfile.lock, you have to run bundle install with the same version of Ruby locally, which means that when you deploy you’ll be using the same version of Ruby you use locally.

Sidenote: Did you know this is actually how Heroku gets your Ruby version? We run the bundle platform --ruby command against your app.

So while the version specifier tells Bundler what version ranges are “valid”, the version in the Gemfile.lock is considered canonical.

An Error By Any Other Name

So if you were using the app before with the specifier ruby "~> 2.3" and you try to run it with Ruby 1.9.3 you’ll get an error:

Your Ruby version is 1.9.3, but your Gemfile specified ~> 2.3

This is the primary intent of the Bundler feature: to prevent you from accidentally using a version of Ruby that isn’t valid for the app. However, if Heroku gets the Ruby version from bundle platform --ruby, and that comes from the Gemfile and Gemfile.lock, how could you ever be running a version of Ruby on Heroku different from the version specified in your Gemfile?

One of the reasons we didn’t support Bundler 1.12 was a bug that allowed incompatible Gemfile and Gemfile.lock Ruby versions. I reported the issue, and the Bundler team did an amazing job patching it and releasing the fix in 1.13. What I didn’t consider was that people might still be using older Bundler versions locally.

So what is happening is that people update the Ruby version specified in their Gemfile without running bundle install, so their Gemfile.lock does not get updated. Then they push to Heroku and it breaks. Or they’re using an older version of Bundler, and their Gemfile.lock is using an incompatible version of Ruby locally but isn’t raising any errors. Then they push to Heroku and it breaks.

So if you’re getting this error on Heroku run this command locally to make sure your Bundler is up to date:

$ gem install bundler
Successfully installed bundler-1.13.7
1 gem installed
Installing ri documentation for bundler-1.13.7...
Installing RDoc documentation for bundler-1.13.7...

Even if you haven’t hit this bug yet, go ahead and make sure you’re on a recent version of Bundler right now. Once you’ve done that run:

$ bundle install

If you’ve already got a Ruby version in your Gemfile.lock you’ll need to run

$ bundle update --ruby

This will insert the same version of Ruby you are using locally into your Gemfile.lock.

If you get the exception locally Your Ruby version is <X>, but your Gemfile specified <Y> it means you either need to update your Gemfile to point at your version of Ruby, or update your locally installed version of Ruby to match your Gemfile.

Once you’ve got everything working, make sure you commit it to git:

$ git add Gemfile.lock
$ git commit -m "Fix Ruby version"

Now you’re ready to git push heroku master and things should work.

When Things Go Wrong

When these types of unexpected problems creep up on customers, we try to do as much as we can to make the process easier. After seeing a few tickets come in, the information was shared internally with our support department (they’re great, by the way). Recently I added documentation to the Dev Center to document this explicit problem. I’ve also added some checks in the buildpack to give users a warning that points them to the docs. This is the best case scenario, where not only can we document the problem and the fix, but also add docs directly to the buildpack so you get them when you need them.

I also wanted to blog about it to help people wrap their minds around the fact that the Gemfile is no longer the canonical source of the exact Ruby version, but instead the Gemfile.lock is. While the Gemfile holds the Ruby version specifier that declares a range of ruby versions that are valid with your app, the Gemfile.lock holds the canonical Ruby version of your app.

As Ruby developers we have one of the best (if not the best) dependency managers in Bundler. I’m excited for more people to start using version specifiers if the need arises for their app, and I’m excited to support this feature on Heroku.

The Oldest Bug In Ruby - Why Rack::Timeout Might Hose your Server

Update: There’s a great resource for dealing with timeouts in Ruby called The ultimate guide to Ruby Timeouts, via @codefolio. Also there’s some good discussion on Reddit around the possibility of maybe using Thread.handle_interrupt in gems; read the comments.
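
For reference, Thread.handle_interrupt (a real API since Ruby 2.0) lets you defer asynchronous exceptions around critical sections; a minimal sketch of shielding cleanup from Timeout::Error:

require "timeout"

Thread.handle_interrupt(Timeout::Error => :never) do
  begin
    Thread.handle_interrupt(Timeout::Error => :on_blocking) do
      # risky work; may be interrupted while blocking
    end
  ensure
    # cleanup here can't be cut short by a Timeout::Error
  end
end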

The “bug” comes up in a few contexts. The problem occurs when an error is raised inside an ensure block by an outside source. If you don’t know how that’s possible, keep reading; otherwise skip the next section.

WTF huh? How is that possible

We are mostly familiar with exceptions:

raise "something bad happened"

If we absolutely need to clean up something in our code we can do it in an ensure block:

begin
  file_1 = make_file("file1.csv")
  file_2 = make_file("file2.csv")

  work = do_work(file_1, file_2)

  raise "something bad happened" if work.bad?

  return work
ensure
  clean_up file_1

  clean_up file_2
end

In this case, we could be writing to files that need to be deleted after every call to this method. The ensure block is guaranteed to run when the method exits and any time an exception happens in the block.

If you don’t follow, check out Exceptional Ruby by Avdi Grimm.

Unfortunately, things other than your own code can raise exceptions. For example, when you’re running a program and want to close it, Ruby will receive a signal from the operating system to let it know to clean up. In the case of an unhandled SIGTERM, it will raise a SignalException wherever the code is executing. This means it could happen here:

# ...
ensure
  clean_up file_1
  # Exception could be raised between the two calls right here <================================
  clean_up file_2
end

If that happens, Ruby will never execute clean_up file_2. Granted, this is a contrived example, and you can do things like use temp files with block syntax, but that’s not the point. The point is that exceptions can come from outside of your code, and they can happen inside of an ensure block. This means that we are never actually guaranteed to fully execute an ensure block, even if all of our code is “correct”.

For more information on Ruby’s signal behavior check out my post License to SIGKILL.

The other case is raising an exception from another thread, which you can do with Thread#raise. Here’s a contrived example:

require 'thread'

threads = []

threads << Thread.new do
  begin
    # ...
  ensure
    clean_up file_1

    clean_up file_2
  end
end

sleep rand(0..2)

threads.each {|thread| thread.raise "no one expects another thread to raise an exception!" }

Okay, so you may never do this; it looks like an awful idea. But you do use it and just don’t know about it. It turns out that’s almost exactly what’s happening with Timeout in Ruby.

Why Ruby’s Timeout is dangerous

Timeout spawns a new thread, sleeps the amount of time you want, and when it wakes up it raises a Timeout::Error in the calling thread. If you’ve ever used rack-timeout (and recently I learned about slowpoke, which also adds PostgreSQL timeouts), it uses Thread#raise to kill an entire web request running in another thread.
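
A simplified sketch of the idea (the real implementation in Ruby’s timeout library is more careful than this):

require "timeout"

def naive_timeout(sec)
  current = Thread.current
  watcher = Thread.new do
    sleep sec
    current.raise Timeout::Error, "execution expired" # raised into the caller
  end
  yield
ensure
  watcher.kill if watcher
end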

Most of the time this isn’t too awful. However, when it goes bad, it goes really bad. For example, network connections such as database connections might not be released properly. This could cause issues in your app or in your database. The exception may have happened in a place that puts the thread in an unrecoverable state. The thread isn’t dead, so the web server thinks it can handle web requests, but maybe it’s stuck and can’t actually process them.

I work at Heroku and see this happen. When an app is getting millions of requests and some of them time out, anything that can go wrong eventually will. Traditionally, Rubyists avoided this problem by throwing away a process and starting a new one. Killing a process is much safer than raising an exception in a thread. Unfortunately, it is expensive to throw away an entire process every time there is a small timeout in your web request.

So right now people have to actively choose between not timing out requests, which may cause a domino effect of web request backups, and killing long-running requests, which may leave threads unusable.

That’s the state of Ruby timeouts and thread raising. Basically, it’s really scary, but people do it anyway.

So now what?

The behavior can’t be removed; it’s useful to some. The behavior can’t be changed dramatically. However, what if we added new behavior to Ruby? I’m proposing a way to tell Ruby that we want to wait until an ensure block has finished before an exception is raised. Maybe something like:

thread.safe_raise(exception: "no one expects another thread to raise an exception")

So if your code is in the middle of an ensure block:

ensure
  clean_up file_1
  # You are here <========================
  clean_up file_2
end

Ruby waits until it is finished before raising the new error:

ensure
  clean_up file_1

  clean_up file_2
end

# Ensure is done, raise "no one expects another thread to raise an exception"  now <========================

The nice thing about this is that it guarantees our ensure blocks execute; the previous non-deterministic behavior is gone. What about the downsides? We could be deeply nested in ensure blocks, and you would have to go all the way up the stack to see if you’re fully out of them. This seems complicated, and maybe that process isn’t deterministic (thinking halting problem), but I don’t know.

The other problem is that you can do slow things in the ensure block and if you’re trying to raise a time critical exception such as shutting down a program you may actually want things to stop abruptly and not finish.

For that case, maybe add a timeout behavior to safe_raise:

thread.safe_raise(timeout: 3, exception: "no one expects another thread to raise an exception")

So we wait 3 seconds for the timeout to propagate; otherwise we raise wherever in the code we are. Now we’re back to square one in terms of non-deterministic error raising.

So if we have the same problem, why do I think this would be better? Right now we have no choice but to raise an exception and pray for the best. New APIs like these would give the developer more control over the behavior they desire.

Another option could be to allow a timeout handler to be registered; maybe if we know we’re in a bad state, we want to kill the whole process:

thread.safe_raise(timeout: 3, timeout_handler: -> { Process.kill('SIGKILL', Process.pid) }, exception: "#...")

The point is that we need more control over this behavior.

Next Steps

So what do you think? Do you like it, hate it? Would you use an API like that? If I get some good responses I’ll kick the can around and submit a feature request to Ruby.

How the F does Sprockets Load an Asset?

How does an asset get compiled? It’s less of a pipeline and more of a recursive ball of, well, assets. To understand the process we will start off with an asset with no directives (no require at the top). We’ll then walk through all the steps Sprockets goes through until a usable asset is loaded into memory. For this example we will use a js.erb file to see how a “complex” file type (i.e. one with multiple extensions) gets compiled. All examples are with Sprockets 4 (i.e. the master branch). Here’s the file:

$ cat assets/users.js.erb
var Users = {
  find: function(id) {
    var t = '<%= Time.now %>';
  }
};

When we compile this asset we get:

var Users = {
  find: function(id) {
    var t = '2016-12-13 11:01:00 -0600';
  }
};

This is with the simplest of Sprockets setups:

@env = Sprockets::Environment.new
@env.append_path(fixture_path('asset'))
@env.cache = {}

What happens first? We call

@env.find_asset("users.js")

This calls the find_asset method in Sprockets::Base. The contents are deceptively simple:

uri, _ = resolve(*args)
if uri
  load(uri)
end

The resolve method comes from sprockets/resolve.rb and the load method comes from sprockets/loader.rb. Resolve will find where the asset is on disk and give us a “uri”. We’ll skip over exactly how resolve works; its task is relatively straightforward: find an asset on disk that satisfies the requirement of resolving to a users.js file. We can go into it in detail some other time.

A “uri” in Sprockets looks like this:

"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript"

It has a scheme with the type of thing it is (in this case a file). We can tell that it is an absolute path because after the scheme file:// it starts with a slash. The full path to this file is /projects/sprockets/test/fixtures/asset/users.js.erb. Then in the query params we carry extra info; in this case we are storing the mime type, which is application/javascript. While the file itself is a .js.erb, the expected result of loading (compiling) this file is a .js file.

Internally, Sprockets mostly doesn’t care about file extensions; it cares about mime types. It only uses file extensions to generate mime types. When you register any processors, you register them via a mime type.
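
For instance, supporting a new extension means registering its mime type and then hanging processors off that type; a sketch (the transformer class is hypothetical):

env.register_mime_type("text/coffeescript", extensions: [".coffee"])
env.register_transformer("text/coffeescript", "application/javascript",
                         CoffeeTransformer) # hypothetical processor object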

The body of the load method from sprockets/loader.rb is fairly complicated. It handles a few cases:

  • Asset has an :id param, which is a fully digested hash, meaning that the asset is fully resolved and we can attempt to load it from the cache. This has two outcomes
    • Asset is in cache, use it
    • Asset is not in cache, delete the :id parameter and try to load normally.
  • Asset does not have an :id param, so we call fetch_asset_from_dependency_cache, which takes a block. This method does a lot; it has fairly comprehensive method docs, go check them out for full details. Essentially it has two modes: looking for an asset based on dependency history, or not.
    • Looking for asset based on history:
      • If all dependencies for an asset are in the cache, then we can generate an asset from the cache. Otherwise we move on.
    • Not found based on history:
      • We’ve proven at this point that the asset isn’t in cache or that one or more of its dependencies aren’t in the cache. At this point we have to load the entire asset.

We’re going to assume a fresh cache for our example. That means that we hit the fetch_asset_from_dependency_cache method and fall back to the “not found based on history” case, so we have to load it.

Loading an unloaded asset (pipeline = nil/:default)

The bulk of the work happens in the method load_from_unloaded. We’re going to start getting really technical and low level, so follow along in the code for a better understanding of what I’m talking about. We first generate a “load” path and a “logical” path:

puts load_path.inspect
# => "/projects/sprockets/test/fixtures/asset"

puts logical_path.inspect
# => "users.js.erb"

There is an edge case that is handled next: in Sprockets, foo/index.js can be resolved to foo.js, a convention in some NPM libraries. That doesn’t apply to this case. Next we generate an extname and a file_type:

puts extname.inspect
# => ".js.erb"

puts file_type.inspect
# => "application/javascript+ruby"

The file_type is the mime type for our .js.erb extension. Note the +ruby, which designates that this is an ERB file; I think this is a Sprockets convention. This mime type will be very important.

In this case the only params we have are {:type=>"application/javascript"} so we skip over the pipeline case.

We do have a :type, so we’ll run that part. The logical_path is trimmed down to remove the extension:

puts logical_path.chomp(extname)
# => "users"

Now we pull an extension based on our mime type and add it to the logical path:

puts config[:mime_types][type][:extensions].first
# => ".js"

Putting these together our new logical path is:

"users.js"

We’ll use this later. This should match the original thing we looked for when we used @env.find_asset.

Next comes a sanity check: either we’re working with the mime type we’re requesting, or we’re working with a mime type that can be converted to the one we’re requesting. We check our transformers, which are an internal concept in Sprockets; see guides/extending_sprockets.md for more info on building a transformer. They essentially allow you to convert one file into another. Sprockets mostly cares about mime types, so we check the transformers to see if it’s possible to transform the existing mime type into the desired mime type, i.e. we want to convert application/javascript+ruby to application/javascript.

Next we grab the “processors” for our mime type. These can be transformers, as mentioned earlier, or they can be processors such as the DirectiveProcessor, which is responsible for expanding directives such as //= require foo.js at the top of your file.

Into this processors_for method we also pass a “pipeline”. For now it is nil, which means that the :default pipeline is used.

A pipeline is registered like a transformer or a processor. They’re an internal concept. Here is what the default one looks like:

register_pipeline :default do |env, type, file_type|
  # TODO: Hack for to inject source map transformer
  if (type == "application/js-sourcemap+json" && file_type != "application/js-sourcemap+json") ||
      (type == "application/css-sourcemap+json" && file_type != "application/css-sourcemap+json")
    [SourceMapProcessor]
  else
    env.default_processors_for(type, file_type)
  end
end

Here, if we’re requesting a source map, we only want to run [SourceMapProcessor]; otherwise we find the “default” processors that are valid for our type (in this case application/javascript) given our file_type (in this case application/javascript+ruby). Default processors are defined here:

def default_processors_for(type, file_type)
  bundled_processors = config[:bundle_processors][type]
  if bundled_processors.any?
    bundled_processors
  else
    self_processors_for(type, file_type)
  end
end

Either we return any “bundled” processors for the type or we return “self” processors. In our case there is a bundle processor registered: Sprockets::Bundle. It was registered in sprockets.rb:

require 'sprockets/bundle'
register_bundle_processor 'application/javascript', Bundle

Now we’re back to the loader.rb file. We have our processors array which is simply [Sprockets::Bundle]. We call build_processors_uri. This generates a string like:

"processors:type=application/javascript&file_type=application/javascript+ruby"

This string gets added to the “dependencies”. This array is used for determining cache keys, so if a processor gets added or removed, the cache key will change (I think).

Now we have to call each of our processors. First we resolve! the original filename, but with a different pipeline, i.e. pipeline: :source. The resolve! method raises an error if the file cannot be found.

After we resolve the file we get a source_uri that looks like this:

"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript+ruby&pipeline=source"

Now here’s where things get complicated (I know, right?). We load the exact same file that is already being loaded, with this new pipeline=source.

Recursive asset loading is recursive (pipeline=source)

At this point we get recursive: we repeat everything in load_from_unloaded, but with pipeline=source. The results should be the same but with a different pipeline. The :source pipeline looks like this:

register_pipeline :source do |env|
  []
end

In this case, the returned processors are an empty array [].

We skip over the processor section, and instead hit this:

dependencies << build_file_digest_uri(unloaded.filename)
metadata = {
  digest: file_digest(unloaded.filename),
  length: self.stat(unloaded.filename).size,
  dependencies: dependencies
}

The file is digested to create a “digest”, and the length is added via stat. “Dependencies” are also recorded, which look like this:

#<Set: {"environment-version", "environment-paths", "processors:type=application/javascript+ruby&file_type=application/javascript+ruby&pipeline=source", "file-digest:///projects/sprockets/test/fixtures/asset/users.js.erb"}>

After this we build an asset hash:

asset = {
  uri:          unloaded.uri,
  load_path:    load_path,
  filename:     unloaded.filename,
  name:         name,
  logical_path: logical_path,
  content_type: type,
  source:       source,
  metadata:     metadata,
  dependencies_digest:
                DigestUtils.digest(resolve_dependencies(metadata[:dependencies]))
}

Which is then used to generate a Sprockets::Asset and is returned by our load method.

Jumping back up the stack (pipeline=default)

Now that we have a “source” asset, we can go back and finish running the processors for pipeline=default.

We did all that work, just to get a digest path:

source_uri, _ = resolve!(unloaded.filename, pipeline: :source)
source_asset = load(source_uri)

source_path = source_asset.digest_path
# => "users.source.js-798a333a5596e1495e1cc4870f11c7729f168350ee5972637053f9691c8dc326.erb"

Which kinda seems insane; maybe we don’t need to go all recursive to get this tiny piece of information, but whatevs. If there’s one thing I’ve learned from working on Sprockets, it’s that the code resists refactoring, and most of the seemingly “clever” code is actually a very clean way of accomplishing tasks. That is to say, I’m not going to change this without a lot more research.

Now we execute call_processors, passing in our array of processors [Sprockets::Bundle] and our asset hash:

{
  environment:  self,
  cache:        self.cache,
  uri:          unloaded.uri,
  filename:     unloaded.filename,
  load_path:    load_path,
  source_path:  source_path,
  name:         name,
  content_type: type,
  metadata: {
    dependencies: dependencies
  }
}

If we had more than one processor, this would call each of them in reverse order and merge the results before calling the next. In this case there’s only one processor. Guess it’s time to figure out what that one does.

Bundle processor (still on pipeline=default)

The bundle processor is defined in sprockets/bundle.rb; open it up to follow along. We pull out the dependencies from the hash. For now it is very simple:

#<Set: {"environment-version", "environment-paths", "processors:type=application/javascript&file_type=application/javascript+ruby"}>

The next thing we do is resolve the file (yes, again), this time using pipeline=self:

processed_uri, deps = env.resolve(input[:filename], accept: type, pipeline: :self)

puts processed_uri.inspect
# => "file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"

puts deps.inspect
# => #<Set: {"file-digest:///projects/sprockets/test/fixtures/asset/users.js.erb"}>

We merge these deps with the dependencies from earlier. The file-digest:// that was returned from the resolve method indicates that there is a dependency on the contents of the file on disk; if the contents change, the digest should change.

You ready for some more recursion? You better hold onto your butts.

The next thing that happens is we build a proc:

find_required = proc { |uri| env.load(uri).metadata[:required] }

This proc takes in a uri, loads it, then returns a set of “required” files. Sprockets uses this proc to do a depth-first search from our processed_uri (i.e. the pipeline=self uri). We can look at the dfs now:

def dfs(initial)
  nodes, seen = Set.new, Set.new
  stack = Array(initial).reverse

  while node = stack.pop
    if seen.include?(node)
      nodes.add(node)
    else
      seen.add(node)
      stack.push(node)
      stack.concat(Array(yield node).reverse)
    end
  end

  nodes
end

The purpose of this search is that we want to make sure to evaluate each file once and only once. Otherwise, if we had an a.js that required b.js, which required c.js, which required a.js, and we didn’t keep track, then we would be stuck in an infinite loop. There is more involved in making sure infinite loops don’t happen, but that’s maybe for another post.
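
To see the cycle protection in action, here’s a toy run with a hypothetical dependency graph:

graph = { "a.js" => ["b.js"], "b.js" => ["c.js"], "c.js" => ["a.js"] }

dfs("a.js") { |node| graph[node] }
# => #<Set: {"a.js", "c.js", "b.js"}> — each file exactly once, no infinite loop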

For the first iteration this creates an array with only our URI in it:

puts stack.inspect
# => ["file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"]

It then adds this uri to the “seen” set and puts it back on the stack. The next line is a little tricky:

stack.concat(Array(yield node).reverse)

Here the node is:

"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"

So we call the block with that node, remembering that our block is:

find_required = proc { |uri| env.load(uri).metadata[:required] }

So our DFS method invokes this block and passes it our pipeline=self uri, which invokes our load method again.

Load recursion kicked off from within bundle (pipeline=self)

I feel like we can’t get out of this load method; here we are again. This is what our pipeline=self looks like:

register_pipeline :self do |env, type, file_type|
  env.self_processors_for(type, file_type)
end

This method self_processors_for is non-trivial:

def self_processors_for(type, file_type)
  processors = []

  processors.concat config[:postprocessors][type]
  if type != file_type && processor = config[:transformers][file_type][type]
    processors << processor
  end
  processors.concat config[:preprocessors][file_type]

  if processors.any? || mime_type_charset_detecter(type)
    processors << FileReader
  end

  processors
end

First we grab any postprocessors that are registered for the application/javascript mime type. There are no postprocessors registered by default, so I don’t know why they exist, but you can register one using register_postprocessor.

Next up, we pull out a transformer for our file type. This returns us a Sprockets::ProcessorUtils::CompositeProcessor. This is a meta processor that contains possibly several transformers. It is generated via a call to register_transformer. In this case the full processor looks like this:

#<struct Sprockets::ProcessorUtils::CompositeProcessor
  # ...
  processors=
   [#<Sprockets::Preprocessors::DefaultSourceMap:0x007fb24d3271a0>,
    #<Sprockets::DirectiveProcessor:0x007fb24d356400
     @header_pattern=/\A(?:(?m:\s*)(?:(?:\/\/.*\n?)+|(?:\/\*(?m:.*?)\*\/)))+/>,
    Sprockets::ERBProcessor]>

It’s doing some things with source maps, and you can see that we now have our ERBProcessor in there as well as a DirectiveProcessor.

Next up, we gather any preprocessors, of which there are none. Finally, if there are any processors in our list, we add a FileReader, provided we detect that the file is not binary. Sprockets assumes a text file if the mime type has a charset defined, which is pretty standard.

So now we have our meta CompositeProcessor as well as our FileReader processor.

Now we call each of the processors in reverse order. First up is the FileReader.

class FileReader
  def self.call(input)
    env = input[:environment]
    data = env.read_file(input[:filename], input[:content_type])
    dependencies = Set.new(input[:metadata][:dependencies])
    dependencies += [env.build_file_digest_uri(input[:filename])]
    { data: data, dependencies: dependencies }
  end
end

It takes in a filename, reads that file from disk, and adds the contents to the :data key of the hash. It also adds a dependency on the file, in case there isn’t already one:

"file-digest:///projects/sprockets/test/fixtures/asset/users.js.erb"

After the file is done being read from disk, next up is the CompositeProcessor. This delegates to its own processors in reverse order, so these get called:

Sprockets::ERBProcessor
#<Sprockets::DirectiveProcessor:0x007f85b1322448 @header_pattern=/\A(?:(?m:\s*)(?:(?:\/\/.*\n?)+|(?:\/\*(?m:.*?)\*\/)))+/>
#<Sprockets::Preprocessors::DefaultSourceMap:0x007f85b12f33a0>

First up is the ERBProcessor; it takes the input[:data], which is the contents of the file, and runs it through an ERB processor. There’s a little magic in that file to detect if someone is using an ENV variable in their ERB, in which case we auto-add that as a dependency.

Next, the DirectiveProcessor looks for any directives such as //= require foo.js, of which there are none in this file. Finally we call DefaultSourceMap. This processor adds a 1-to-1 source map if one isn’t already generated. If you’re not familiar with source maps, check out guides/source_maps.md, which has some of my notes.

Now all of our processors for pipeline=self have been run, the load call completes, and we go back to where we were in our Bundle processor for pipeline=default.

Return to the Bundle processor (pipeline=default)

You may remember that we were in the middle of a depth-first search:

def dfs(initial)
  nodes, seen = Set.new, Set.new
  stack = Array(initial).reverse

  while node = stack.pop
    if seen.include?(node)
      nodes.add(node)
    else
      seen.add(node)
      stack.push(node)
      stack.concat(Array(yield node).reverse)
    end
  end

  nodes
end

ALL of that last section happened during the yield node portion of this code. The return was an array of dependencies, which are reversed and added onto the stack. In our case there are no “required” files for file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self, so that yield call returns an empty set.

The only node on the stack has already been “seen”, so it is added to our nodes set. This was the last thing on the stack, so we return that set. Our required list looks like this:

#<Set: {"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"}>

If we were using a require directive such as //= require foo.js, then we would have more things in this set. Another concept that Sprockets has is a “stubbed” list. Gonna be totally honest, I have no idea why you would need it, but it is there. From the method docs: “Allows dependency to be excluded from the asset bundle”. So there you go. To get this list we call into load AGAIN:

stubbed  = Utils.dfs(env.load(processed_uri).metadata[:stubbed], &find_required)

Though there is one thing I never mentioned: not all calls to load are created equal.

Cached Environment

Something I’ve failed to mention is that not all calls to an env are created equal. There is a Sprockets::Environment and a Sprockets::CachedEnvironment. The cached environment wraps the Sprockets::Environment and caches certain calls such as load, so in the above example env.load(processed_uri) returns a cached value and doesn’t actually call into load. That’s a relief.
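
Conceptually it’s a memoizing wrapper, something like this sketch (not the real implementation):

class CachedEnvironment
  def initialize(environment)
    @environment = environment
    @loaded = {}
  end

  # Duplicate loads of the EXACT same uri return the memoized asset.
  def load(uri)
    @loaded[uri] ||= @environment.load(uri)
  end
end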

It turns out that this whole time I was somewhat misleading you: we weren’t using the version of find_asset from Sprockets::Base, but rather the one from Sprockets::Environment:

def find_asset(*args)
  cached.find_asset(*args)
end

This call to cached creates a CachedEnvironment object:

def cached
  CachedEnvironment.new(self)
end

Now any duplicate calls to load (with the EXACT same uri) will return a cached copy. The rest of the implementation of find_asset comes from Sprockets::Base.

The first time we hit the cache in this example was with:

file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript+ruby&pipeline=source

It is first put in the cache at:

/projects/sprockets/lib/sprockets/loader.rb:149:in `load_from_unloaded'

Note: some of my line numbers might not match perfectly due to changes in the source; I’m also adding in debug statements, etc.

Which corresponds to this code:

source_uri, _ = resolve!(unloaded.filename, pipeline: :source)
source_asset = load(source_uri) # <========== THIS LINE ===========

source_path = source_asset.digest_path

When we pull it from cache we do so in the bundle processor:

/projects/sprockets/lib/sprockets/bundle.rb:35:in `block in call'

Which corresponds to this code:

(required + stubbed).each do |uri|
  dependencies.merge(env.load(uri).metadata[:dependencies]) #< === Called from cache here
end

Which brings us back to the bundle processor we were looking at before:

Finish the bundle processor (pipeline=default)

We loop through our required set (which is #<Set: {"file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self"}>) minus our stubbed set (which is empty).

For each of these we merge in dependencies. Our final dependencies set looks like this:

#<Set: {
  "environment-version",
  "environment-paths",
  "processors:type=application/javascript&file_type=application/javascript+ruby&pipeline=self",
  "file-digest:///projects/sprockets/test/fixtures/asset/users.js.erb"}>

We then look up “reducers” and get back a hash of keys and callable objects:

{:data=>
  [
    #<Proc:0x007ffef7b74460@/Users/richardschneeman/Documents/projects/sprockets/lib/sprockets.rb:129>,
    #<Proc:0x007ffef7b74398 (lambda)>
  ],
:links=>
  [
    nil,
    #<Proc:0x007ffef7b74118(&:+)>
  ],
:sources=>
  [
    #<Proc:0x007ffef8027c50@/Users/richardschneeman/Documents/projects/sprockets/lib/sprockets.rb:131>,
    #<Proc:0x007ffef7b74118(&:+)>
  ],
:map=>
  [
    #<Proc:0x007ffef8027278@/Users/richardschneeman/Documents/projects/sprockets/lib/sprockets.rb:132>,
    #<Proc:0x007ffef8027070 (lambda)>
  ]
}

A reducer can be registered like so:

register_bundle_metadata_reducer '*/*', :data, proc { String.new("") }, :concat
register_bundle_metadata_reducer 'application/javascript', :data, proc { String.new("") }, Utils.method(:concat_javascript_sources)
register_bundle_metadata_reducer '*/*', :links, :+
register_bundle_metadata_reducer '*/*', :sources, proc { [] }, :+
register_bundle_metadata_reducer '*/*', :map, proc { |input| { "version" => 3, "file" => PathUtils.split_subpath(input[:load_path], input[:filename]), "sections" => [] } }, SourceMapUtils.method(:concat_source_maps)

It acts on a key such as :data to transform or “reduce” individual keys.

If we had some “required” files due to the directive processor:

assets = required.map { |uri| env.load(uri) }

Then this last line is where they would be concatenated via our reducers:

process_bundle_reducers(input, assets, reducers).merge(dependencies: dependencies, included: assets.map(&:uri))

In this case our only “required” asset is from file:///projects/sprockets/test/fixtures/asset/users.js.erb?type=application/javascript&pipeline=self, which is important because, you’ll remember, pipeline=self is when the FileReader and ERBProcessor were run.

Finally, we can return from our original pipeline=nil/:default case in our original call to load, since all of our pipelines have been executed.

The rest of the code is just doing things like taking digests and building hashes; we’ve already covered that in a previous section.

Finally a Sprockets::Asset is generated and returned from our original @env.find_asset invocation.

Yay!

2020 Hindsite

There are a few confusing things going on here. It isn’t always clear that calls to an env are going to CachedEnvironment, and it’s even less clear whether we’re calling something that has already been cached or loading something new.

The pattern of loading files that Sprockets uses is a reactor. It stores state via pipeline=<whatever> and essentially loops with different pipeline variations until it gets its desired output. While this is very powerful, it’s also really hard to wrap your brain around. Most of the code, especially in the Bundle processor, is indecipherable if you don’t know minute details about how things work inside all of Sprockets. These two designs, the recursive-ish load reactor pattern and the CachedEnvironment, are sometimes difficult to wrap your mind around. This pattern of loading files also creates a forking backtrace, so if you’re trying to debug, it’s not always immediately clear what’s going on. Debug statements are usually output several times per method call.

The other thing that makes Sprockets hard to understand is the plugin ecosystem. Sprockets is less a library and more a framework that uses itself to build an asset processing framework. Things like transformers, preprocessors, compressors, bundle_processors, etc. make it confusing exactly where work gets done. Some of the processors are highly coupled, such as the Bundle processor and the DirectiveProcessor. Again, it’s extremely powerful and makes the library very flexible, but difficult to reason about.

Much of Sprockets resists refactoring. Many of the design decisions are very coupled to the implementation. I’ve spent hours trying to tease out CachedEnvironment into something else, but eventually gave up. One thing to consider, if you’re prone to judging code like I am: this project is 70%+ written by one person. These design decisions are all very powerful and many times very beautiful in their simplicity. If you’re the only one who works on a project, sometimes it pays to pick a powerful abstraction over one that is easier to read and understand.

I’ve got some ideas on how we could tease out some abstractions, but it’s a hard thing to do. We have to be backwards compatible and bake in room for future features and growth. We also need to be performance conscious.

There are other features that I haven’t covered in this example, such as how files get written to disk and how manifest files are generated, but how an asset gets loaded is complicated enough for now. How is your life better now that you know how the “F” Sprockets loads an asset? I have no idea, but I’m sure there’s something good about it. If you enjoyed this technical deep dive, check out my post where I live-blog writing a non-trivial Rails feature. Thanks for reading!

