A Compiler For The Web

“Compilation” - the translation of code from one language into another - is the manufacturing step of software development. During compilation, the source code, which is written with a human reader in mind and which uses human-friendly abstractions, becomes something the machine can execute. It is during this manufacturing step that a specific design (the application's source code) is realized in a form that can be delivered to users (or rather, executed by their browsers).

Historically, Javascript has had no compilation process. Design and manufacturing were a single process: the browser environment allows developers to write scripts exactly as they'll be delivered to the browser, with no intervening steps. That's a useful property: most notably, it enables the “edit, save, and reload” iteration process that's so popular and so pleasant to work with. However, Javascript's target environment has a few weaknesses that limit the scale of the project you can write this way:

The Javascript community has long been reluctant to move away from the rapid iteration process provided by the native Javascript ecosystem in the browser. In the last few years, web application development has reached a stage of maturity where those two problems have much more influence over culture and decision-making than they have in the past, so that attitude has started to change. We've since seen the rise of numerous Javascript translators (compilers, by another name), and frameworks for executing those translators in a repeatable, reproducible way.

An Aside About Metaphors

Physical manufacturing processes tend to have cost structures where the design step is, unit-wise, expensive, but happens once, while manufacturing is unit-wise quite cheap, but happens endlessly often over the life of the product. Software manufacturing processes are deeply weird by comparison. In software, the design step is, unit-wise, even more expensive, and it happens repeatedly to what is notionally the same product, over most of its life, while the manufacturing step happens a single time, for so little cost that it's rarely worth accounting for.

It's taken a long time to teach manufacturing-trained business people to stop treating development - the design step - like a manufacturing step, but we're finally getting there. Unfortunately, unlike physical manufacturing, software manufacturing is so highly automated that it produces no jobs, even though it's complex enough to support an entire ecosystem of sophisticated, high-quality tools. A software “factory,” for all intents and purposes, operates for free.

Webpack

Webpack is a compiler system for the web.

Webpack's compilation process ingests human-friendly source code in a number of languages: primarily Javascript, but in principle any language that can be run by some service the browser provides, including CSS, images, text, and markup. With the help of extensions, it can even ingest things the browser can't run directly, such as ES2015 Javascript, or Sass files. It emits, as a target, “bundles” of code which can be loaded using the native tools provided by the browser platform: script tags, stylesheet links, and so on.

It provides, out of the box, solutions to the two core problems of browser development. Webpack provides a lightweight, non-novel module system to allow developers to write applications as a system of modules with well-defined interfaces, even though the browser environment does not have a module loader. Webpack also provides a system of “loaders” which can apply transformations to the input code, which can include the replacement of novel language features with their more-complex equivalents in the browser.

Webpack differentiates itself from its predecessors in a few key ways:

Webpack is not without tradeoffs, however.

On balance, I've been very impressed with Webpack, and have found it to be a pretty effective way to work with browser applications. If you're not using something like Ember that comes with a pre-baked toolkit, then you can probably improve your week by using Webpack to build your Javascript apps.

Tiny Decisions

To give a sense of what using Webpack is like, here's my current webpack.config.js, annotated with the decisions I've made so far and some of the rationales behind them.

This setup allows me to run webpack on the CLI to compile my sources into a working app, or webpack --watch to leave Webpack running to recompile my app for me as I make changes to the sources. The application is written using the React framework, and uses both React's JSX syntax for components and many ES2015 language features that are unavailable in the browser. It also uses some APIs that are available in some browsers but not in others, and includes polyfills for those interfaces.

You can see the un-annotated file on GitHub.

'use strict'

var path = require('path')
var keys = require('lodash.keys')

I want to call this require out - I've used a similar pattern in my actual app code. Lodash, specifically, has capability bundles that are much smaller than the full Lodash codebase. Using var _ = require('lodash') grows the bundle by 500 kB or so, while this only adds about 30 kB.

var webpack = require('webpack')
var HtmlWebpackPlugin = require('html-webpack-plugin')
var ExtractTextPlugin = require("extract-text-webpack-plugin")

var thisPackage = require('./package.json')

We'll see where all of these requires get used later on.

module.exports = {
  entry: {
    app: ['app.less', 'app'],
    vendor: keys(thisPackage.dependencies),
  },

Make two bundles:

This config also invents a third bundle, below. I'll talk about that when I get there.

A lot of this bundle structure is motivated by the gargantuan size of the libraries I'm using. The vendor bundle is approximately two megabytes in my real app, and includes not just React but a number of supporting libraries. Reusing the vendor bundle between versions helps cut down on the number of times users have to download all of that code. I need to address this, but being conscious of browser caching behaviours helps for now.
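As a rough illustration of the vendor entry above: keys(thisPackage.dependencies) turns the dependency table in package.json into a list of module names for Webpack to bundle. The sketch below uses the built-in Object.keys in place of lodash.keys, and the package contents (examplePackage, its names and versions) are invented for the example.

```javascript
'use strict'

// Hypothetical package.json contents; the names and versions are made up.
var examplePackage = {
  name: 'example-app',
  dependencies: {
    'react': '^0.14.0',
    'react-dom': '^0.14.0',
    'lodash.keys': '^4.0.0',
  },
  devDependencies: {
    'webpack': '^1.12.0',
  },
}

// Build the list of entry modules for the vendor bundle. Only runtime
// dependencies are listed; build tools in devDependencies stay out.
function vendorEntry(pkg) {
  return Object.keys(pkg.dependencies || {})
}

console.log(vendorEntry(examplePackage))
```

One consequence of this approach is that adding or removing a dependency in package.json changes the vendor bundle without any edit to the Webpack config.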

  resolve: {
    root: [
      path.resolve("src"),
    ],

Some project layout:

All inputs go into a single directory, to simplify Webpack file lookups. Separating inputs by type (js, jsx, less, etc) would be consistent with other tools, but makes operating Webpack much more complicated.

    // Automatically resolve JSX modules, like JS modules.
    extensions: ["", ".webpack.js", ".web.js", ".js", ".jsx"],
  },

This is a React app, so I've added .jsx to the list of default suffixes. This allows constructs like var MyComponent = require('MyComponent') to behave as developers expect, without requiring the consuming developer to keep track of which language MyComponent was written in.

I could also have addressed this by treating all .js files as JSX sources. This felt like a worse option; the JSX preprocessing step looks safe on pure-JS sources, but why worry about it when you can be explicit about which parser to use?
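To make the suffix lookup concrete, here is a simplified sketch of the idea: try each configured extension in order and take the first file that exists. This is not Webpack's actual resolver. The resolveModule function and the files table are hypothetical stand-ins, and the real resolver handles many more cases (directories, package.json lookups, aliases).

```javascript
'use strict'

var path = require('path')

// The suffix list from the config above.
var extensions = ['', '.webpack.js', '.web.js', '.js', '.jsx']

// Hypothetical project contents, standing in for the real filesystem.
var files = {
  'src/app.js': true,
  'src/MyComponent.jsx': true,
}

function exists(candidate) {
  return Object.prototype.hasOwnProperty.call(files, candidate)
}

// Try each suffix in order and return the first candidate that exists,
// or null if nothing matches.
function resolveModule(root, request) {
  for (var i = 0; i < extensions.length; i++) {
    var candidate = path.posix.join(root, request + extensions[i])
    if (exists(candidate)) {
      return candidate
    }
  }
  return null
}

console.log(resolveModule('src', 'MyComponent'))
console.log(resolveModule('src', 'app'))
console.log(resolveModule('src', 'missing'))
```

The empty string at the head of the list is what lets a request that already includes its extension, such as require('app.less'), match as written.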

  output: {
    path: path.resolve("dist/bundle"),
    publicPath: "/bundle/",

More project layout:

I've set publicPath so that dynamically-loaded chunks (if you use require.ensure, for example) end up with the right URLs.

    filename: "[name].[chunkhash].js",

Include a stable version hash in the name of each output file, so that we can safely set Cache-Control headers to have browsers store JS and stylesheets for a long time, while maintaining the ability to redeploy the app and see our changes in a timely fashion. Setting a long cache expiry for these means that the user only pays the transfer costs (power, bandwidth) for the bundles on the first pageview after each deployment, or after their browser cache forgets the site.

For each bundle, so long as the contents of that bundle don't change, neither will the hash. Since we split vendor code into its own chunk, often the vendor bundle will end up with the same hash even in different versions of the app, further cutting down the number of times the user has to download the (again, massive) dependencies.
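The property that matters here is that the filename is a function of the bundle's contents. A minimal sketch of that idea, assuming MD5 and a 20-character truncation purely for illustration (this is not a description of how Webpack computes [chunkhash] internally, and hashedFilename is a made-up helper):

```javascript
'use strict'

var crypto = require('crypto')

// Name an output file after a hash of its contents. The hash function and
// truncation length are choices made for this sketch only.
function hashedFilename(name, contents) {
  var hash = crypto.createHash('md5').update(contents).digest('hex').slice(0, 20)
  return name + '.' + hash + '.js'
}

// Two builds in which the vendor code is identical but the app code differs.
var vendorV1 = hashedFilename('vendor', 'var React = {}')
var vendorV2 = hashedFilename('vendor', 'var React = {}')
var appV1 = hashedFilename('app', 'render("hello")')
var appV2 = hashedFilename('app', 'render("hello, world")')

console.log(vendorV1 === vendorV2) // unchanged contents, unchanged name
console.log(appV1 === appV2)       // changed contents, changed name
```

Because a changed bundle always gets a new URL, the old URL can be cached for as long as the browser is willing to keep it.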

  },

  module: {
    loaders: [
      {
        test: /\.js$/,
        exclude: /node_modules/,
        loader: "babel",
        query: {
          presets: ['es2015'],
          plugins: ['transform-object-rest-spread'],
        },
      },

You don't need this if you don't want it, but I've found ES2015 to be a fairly reasonable improvement over Javascript. Using an exclude, we treat local JS files as ES2015 files, translating them with Babel before including them in the bundle; I leave modules included from third-party dependencies alone, because I have no idea whether I should trust Babel to do the right thing with someone else's code, or whether it already did the right thing.

I've added transform-object-rest-spread because the app I'm working on makes extensive use of return {...state, modified: field} constructs, and that syntax is way easier to work with than the equivalent return Object.assign({}, state, {modified: field}).
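For comparison, here are the two forms side by side. The state object and the function names are invented for the example. Recent versions of Node run the spread syntax directly, so this sketch doesn't need Babel, though older browsers still do.

```javascript
'use strict'

// A hypothetical piece of application state.
var state = { title: 'Draft', body: 'Hello', modified: null }

// Object spread, the form the plugin enables.
function withSpread(current, field) {
  return {...current, modified: field}
}

// The equivalent written with Object.assign.
function withAssign(current, field) {
  return Object.assign({}, current, {modified: field})
}

console.log(withSpread(state, 'body'))
console.log(withAssign(state, 'body'))
```

Both forms build a new object and leave the original state untouched.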

      {
        test: /\.jsx$/,
        exclude: /node_modules/,
        loader: "babel",
        query: {
          presets: ['react', 'es2015'],
          plugins: ['transform-object-rest-spread'],
        },
      },

Do the same for local .jsx files, but additionally parse them using Babel's React driver, to translate <SomeComponent /> into appropriate React calls. Once again, leave the parsing of third-party code alone.
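As a sketch of what that translation produces: each JSX element becomes a call to React.createElement with the component, its props, and its children. The React object below is a stub that only records its arguments, so the example runs without React installed. It is not how React itself builds elements.

```javascript
'use strict'

// Stand-in for React that just records what it was called with.
var React = {
  createElement: function (type, props) {
    var children = Array.prototype.slice.call(arguments, 2)
    return { type: type, props: props, children: children }
  },
}

// A hypothetical component.
function SomeComponent() {}

// JSX source:   <SomeComponent />
var empty = React.createElement(SomeComponent, null)

// JSX source:   <SomeComponent title="Hi">text</SomeComponent>
var full = React.createElement(SomeComponent, { title: 'Hi' }, 'text')

console.log(empty.props, full.props, full.children)
```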

      {
        test: /\.less$/,
        exclude: /node_modules/,
        loader: ExtractTextPlugin.extract("css?sourceMap!less?sourceMap"),
      },

Compile .less files using less-loader and css-loader, preserving source maps. Then feed them to a plugin whose job is to generate a separate .css file, so that they can be loaded by a <link> tag in the HTML document. The other alternative, style-loader, relies on DOM manipulation at runtime to load stylesheets, which both prevents it from parallelizing with script loading and causes some additional DOM churn.

We'll see where ExtractTextPlugin actually puts the compiled stylesheets later on.
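The loader string reads right to left: loaders are separated by ! and the right-most one runs first, so less-loader compiles the source before css-loader sees it. A small sketch that unpacks a string in that format (applicationOrder is a hypothetical helper, not part of Webpack):

```javascript
'use strict'

// Split a loader string such as "css?sourceMap!less?sourceMap" into its
// loaders, listed in the order they are applied to the source file.
function applicationOrder(loaderString) {
  return loaderString
    .split('!')
    .reverse() // the right-most loader runs first
    .map(function (part) {
      var pieces = part.split('?')
      return { loader: pieces[0], query: pieces[1] || '' }
    })
}

console.log(applicationOrder('css?sourceMap!less?sourceMap'))
```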

    ],
  },

  plugins: [
    new webpack.optimize.OccurrenceOrderPlugin(/* preferEntry=*/true),

This plugin causes webpack to order bundled modules such that the most frequently used modules have the shortest identifiers (lexically; 9 is shorter than 10 but the same length as 2) in the resulting bundle. The ordering has no effect on what the bundle does, but making it predictable helps keep the contents of the vendor bundle stable from build to build.
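The idea can be sketched as a sort: count how often each module is used, then hand out identifiers starting from the most-used module. This is a simplification for illustration, not the plugin's actual algorithm. The usage counts and the assignIds helper are invented.

```javascript
'use strict'

// Hypothetical usage counts: how many other modules require each module.
var usage = { 'react': 40, 'app/routes': 1, 'lodash.keys': 12, 'app/store': 5 }

// Assign numeric identifiers so that the most-used modules get the smallest
// (and therefore shortest) numbers. Ties are broken by name, which keeps
// the result the same from one run to the next.
function assignIds(counts) {
  var names = Object.keys(counts).sort(function (a, b) {
    return counts[b] - counts[a] || (a < b ? -1 : a > b ? 1 : 0)
  })
  var ids = {}
  names.forEach(function (name, index) {
    ids[name] = index
  })
  return ids
}

console.log(assignIds(usage))
```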

    new webpack.optimize.CommonsChunkPlugin({
      name: 'vendor',
      minChunks: Infinity,
    }),

Move all the modules the vendor bundle depends on into the vendor bundle, even if they would otherwise be placed in the app bundle. (Trust me: this is a thing. Webpack's algorithm for locating modules is surprising, but consistent.)

    new webpack.optimize.CommonsChunkPlugin({
      name: 'boot',
      chunks: ['vendor'],
    }),

Hoo boy. This one's tricky to explain, and doesn't work very well regardless.

The facts:

  1. This creates the third bundle (“boot.[chunkhash].js”) I mentioned above, and makes the contents of the vendor bundle “children” of it.

  2. This plugin will also put Webpack's runtime code, which includes both the module loader (which is the same from build to build) and a table of bundle hashes (which is not, unless the bundles are the same), in the root-most bundle.

  3. I really don't want the hash of the vendor bundle changing without a good reason, because the vendor bundle is grotesquely bloated.

This code effectively moves the Webpack runtime to its own bundle, which loads quickly (it's only a couple of kilobytes long). This bundle's hash changes on nearly every build, so it doesn't get reused between releases, but by moving that change to this tiny bundle, we get to reuse the vendor bundle as-is between releases a lot more often.

Unfortunately, code changes in the app bundle can cause the vendor bundle's constituent modules to be reordered or renumbered, so it's not perfect: sometimes the vendor bundle's hash changes between versions even though its module list is identical, because the modules' identifiers have changed. So it goes: the right fix here is probably to shrink the bundle and to re-merge it into the app bundle.
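A toy model of why the extra bundle helps, under heavy simplifying assumptions: bundles are plain strings, the hash is a truncated MD5, and the table of bundle hashes is a JSON blob. None of this mirrors Webpack's internals. It only shows how moving the table out of the vendor bundle keeps the vendor hash from depending on the app code.

```javascript
'use strict'

var crypto = require('crypto')

function hash(text) {
  return crypto.createHash('md5').update(text).digest('hex').slice(0, 8)
}

// Hypothetical bundle contents for two releases: only the app code changes.
var vendorCode = 'vendor modules'
var releases = [{ app: 'app modules, v1' }, { app: 'app modules, v2' }]

// Without a boot bundle: the table of bundle hashes lives inside the
// vendor bundle, so any change to the app changes the vendor bundle too.
function vendorHashWithEmbeddedTable(release) {
  var table = JSON.stringify({ app: hash(release.app) })
  return hash(vendorCode + table)
}

// With a boot bundle: the table moves out, and the vendor bundle's hash
// depends only on the vendor code. The release argument is deliberately
// unused.
function vendorHashWithBootBundle(release) {
  return hash(vendorCode)
}

// The boot bundle carries the table, so its hash changes with every release.
function bootHash(release) {
  return hash(JSON.stringify({ app: hash(release.app), vendor: hash(vendorCode) }))
}

console.log(releases.map(vendorHashWithEmbeddedTable))
console.log(releases.map(vendorHashWithBootBundle))
console.log(releases.map(bootHash))
```

The model leaves out the renumbering problem described above, which is why the real vendor hash is less stable than this sketch suggests.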

    new ExtractTextPlugin("[name].[contenthash].css"),

Emit collected stylesheets into a separate bundle, named after the entry point. Since the only entry point with stylesheets is the app entry point, this creates app.[hash].css in the dist/bundle directory, right next to app.[hash].js.

    new HtmlWebpackPlugin({
      // put index.html outside the bundle/ subdir
      filename: '../index.html',
      template: 'src/index.html',
      chunksSortMode: 'dependency',
    }),

Generate the entry point page from a template (PROJECT/src/index.html), rather than writing it entirely by hand.

You may have noticed that all four of the bundles generated by this build have filenames that include generated chunk hashes. This plugin generates the correct <script> tags and <link> tags to load those bundles and places them in dist/index.html, so that I don't have to manually correct the index page every time I rebuild the app.
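Roughly, the plugin's job amounts to something like the sketch below: take the list of emitted files, prefix each with publicPath, and write out the matching tag. The filenames and hashes here are invented placeholders, tagFor is a hypothetical helper, and the real plugin's output may differ in attribute order and placement.

```javascript
'use strict'

var publicPath = '/bundle/'

// Hypothetical build output, already sorted so that each bundle comes after
// the bundles it depends on. The hashes are made up.
var emitted = ['boot.0a1b2c.js', 'vendor.3d4e5f.js', 'app.6a7b8c.js', 'app.9d0e1f.css']

// Produce the tag that loads one emitted file.
function tagFor(filename) {
  var url = publicPath + filename
  if (/\.css$/.test(filename)) {
    return '<link href="' + url + '" rel="stylesheet">'
  }
  return '<script src="' + url + '"></script>'
}

var tags = emitted.map(tagFor)
console.log(tags.join('\n'))
```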

  ],

  devtool: '#source-map',

Make it possible to run browser debuggers against the bundled code as if it were against the original, unbundled module sources. This generates the source maps as separate files and annotates the bundle with a link to them, so that the (bulky) source maps are only downloaded when a user actually opens the debugger. (Thanks, browser authors! That's a nice touch.)

The source maps contain the original, unmodified code, so that the browser doesn't need to have access to a source tree to make sense of them. I don't care if someone sees my sources, since the same someone can already see the code inside the webpack bundles.
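The link between a bundle and its source map is a trailing comment in the bundle. A minimal sketch of that annotation, assuming the map sits next to the bundle under the same name with .map appended (annotate is a hypothetical helper and the filename is a placeholder):

```javascript
'use strict'

// Append the comment that tells the browser where to find the source map
// for a bundle.
function annotate(bundleSource, bundleFilename) {
  return bundleSource + '\n//# sourceMappingURL=' + bundleFilename + '.map'
}

var annotated = annotate('console.log("bundled code")', 'app.6a7b8c.js')
console.log(annotated)
```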

}

Things yet to do: