Node.js Startup: Series Introduction & Measuring Startup


This blog post is part of a series to see how much I can optimize Node.js’s startup time. Startup time is something that users care about, especially for interactive tooling or for workloads with many short-lived processes. The most important step of performance analysis is measurement, so let’s start by measuring Node.js’s startup time.

I decided to measure the time to execute node -e 0, which simply evaluates the no-op expression 0. I focused on “warm startup”, i.e. when the various file system caches are already warm. This felt more realistic: when you care about startup time, it’s usually for something you’ll be executing often, so the operating system will already have the executable and shared libraries paged into memory.
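As a quick sanity check before reaching for a proper benchmarking tool, a crude version of this measurement can be sketched in plain Bash. (This is just an illustration, not what I used for the graphs below; the fallback to /bin/true only exists so the sketch runs even where node isn’t installed.)

```shell
# Crude warm-startup timing sketch. Assumes GNU date (%N for nanoseconds).
cmd="node -e 0"
command -v node >/dev/null 2>&1 || cmd=/bin/true  # fallback if node is absent
$cmd >/dev/null 2>&1  # one warmup run so the caches are warm
for i in 1 2 3; do
  start=$(date +%s%N)
  $cmd >/dev/null 2>&1
  end=$(date +%s%N)
  echo "run $i: $(( (end - start) / 1000000 )) ms"
done
```

A hand-rolled loop like this is noisy and has no statistics, which is why the actual measurements below use hyperfine.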

Here’s a boxplot of the runtimes of older versions of Node.js, along with the main branch (“main”, 4b80a7b0c404e). As a teaser for the rest of this series, I’ve also included my WIP branch (“mine”).

As you can see, I’ve got a branch where the startup is faster than it’s been in a long time (at least since 2017). It’s partially offset by a minor regression in main. There’s not much variance in runtimes, so the boxplot looks smushed.

Process startup is typically memory-intensive, so optimizing startup time will likely improve memory usage as well, and vice versa. Here’s the same graph except focusing on memory usage.

It’s not as impressive as the runtime graph, unfortunately: again, it’s fighting a regression in main. The final results bring us back to v17.9.1 (released June 2022), but still 2.3 MiB above the glory days of v15.14.0 (released April 2021).

Node.js also provides its own startup benchmarks, which we can check to verify our results.

$ node benchmark/compare.js --old ./node_main --new ./node \
    --runs 10 --filter startup misc > results.csv

$ node-benchmark-compare results.csv

                           confidence improvement
process require-builtins   ***        35.96 %
process semicolon          ***        37.63 %
worker  require-builtins   ***        34.96 %
worker  semicolon          ***        34.80 %

  0.00 false positives, when considering a 0.1% risk acceptance (***)

In addition to speeding up the no-op benchmark (semicolon), we’ll also be speeding up the overall performance of requiring Node.js’s builtin library (require-builtins).
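To get a rough feel for what the require-builtins benchmark exercises (this is an approximation, not the actual benchmark code), you can require every builtin module yourself via module.builtinModules and time it:

```shell
# Approximate the require-builtins workload: require every builtin module
# (skipping internal "_"-prefixed ones) and report the elapsed time.
# Guarded so the snippet degrades gracefully where node is unavailable.
if command -v node >/dev/null 2>&1; then
  node -e '
    const start = process.hrtime.bigint();
    const mods = require("module").builtinModules.filter(m => !m.startsWith("_"));
    // Some modules (deprecated or flag-gated ones) may warn or throw;
    // ignore failures since we only care about the aggregate cost.
    for (const name of mods) { try { require(name); } catch (e) {} }
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`required ${mods.length} builtins in ${ms.toFixed(1)} ms`);
  '
else
  echo "node not installed; skipping"
fi
```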

Check out the rest of this series to see how we’ll achieve this amazing feat!


Extra: How I made the graphs

All steps were performed on an Amazon EC2 Linux instance running Debian 10. First, I downloaded a bunch of old versions of Node.js using nodeenv.

for n in 9.11.2 11.15.0 13.14.0 15.14.0 17.9.1 19.9.0; do
  nodeenv $n -n $n &
done
wait

I also built the “main” branch as of the day I started this project (SHA 4b80a7b0c404e) as a comparison point. I did so by creating a release tarball via CUSTOMTAG=t DISTTYPE=custom make -j$(nproc) binary, and extracting it to main/.

I used hyperfine to benchmark the runtime. Executing warmup iterations via --warmup was important to avoid outliers, since Node.js’s startup is very I/O-heavy.

hyperfine --export-json timings.json \
    -L node_version 9.11.2,11.15.0,13.14.0,15.14.0,17.9.1,19.9.0,main,mine \
    --shell=none --warmup 100 './{node_version}/bin/node -e 0'

This exports a timings.json file. Although hyperfine comes with some utilities for graphing it, I prefer Vega-Lite. Vega-Lite’s builtin transform functionality is sufficient to convert hyperfine’s format into one that Vega-Lite can use for graphing:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "/assets/node-startup-runtime-data.json",
    "format": {"type": "json", "property": "results"}
  },
  "transform": [
    {"flatten": ["times"]},
    {"calculate": "1000*datum.times", "as": "times"}
  ],
  "mark": {"type": "boxplot", "extent": "min-max"},
  "encoding": {
    "x": {
      "field": "parameters.node_version",
      "title": "Node.js version",
      "type": "nominal",
      "sort": [],
      "axis": {"labelAngle": 0}
    },
    "y": {
      "field": "times",
      "type": "quantitative",
      "title": "Time (ms)"
    },
    "color": {
      "title": "Node.js version",
      "field": "parameters.node_version",
      "type": "nominal",
      "sort": []
    }
  },
  "config": {"numberFormat": ".3"},
  "title": {"text": "Startup time of Node.js over the years"},
  "width": "container", "height": 500
}

To measure memory usage, I decided to look at “unique set size” (USS). Unique set size is a measure of how much memory an individual process adds, i.e. excluding any memory shared by any other process. Resident set size (RSS) is also interesting, but it includes all the process’s memory, a lot of which will be shared (like shared libraries, the node binary itself, etc.), so it’s not as meaningful for our purposes. USS is measured by smem, but actually collecting the data required some ugly Bash:

for i in {1..30}; do
  for n in 9.11.2 11.15.0 13.14.0 15.14.0 17.9.1 19.9.0 main mine; do
    # Fire off two node processes, so that the executable and
    # shared libraries will be shared.
    $n/bin/node --expose-gc -e 'gc(),gc();while(1);' &
    $n/bin/node --expose-gc -e 'gc(),gc();while(1);' &

    # Delay until startup/GCs hopefully finish.
    sleep 1

    # Collect the memory used by all processes.
    smem | tee -a $n/smem_results

    # Kill the node processes.
    kill $(jobs -p)
  done
done
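As a cross-check on smem’s numbers, a process’s USS can also be read straight out of /proc on Linux: it is the sum of the process’s private (unshared) clean and dirty pages. This sketch reads the shell’s own USS:

```shell
# USS = private clean + private dirty pages, in kB. smaps_rollup
# (Linux 4.14+) pre-aggregates the per-mapping values; fall back to
# summing the per-mapping smaps entries directly on older kernels.
pid=$$
f=/proc/$pid/smaps_rollup
[ -r "$f" ] || f=/proc/$pid/smaps
awk '/^Private_(Clean|Dirty):/ { sum += $2 } END { print sum " kB" }' "$f"
```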

And now some extra ugly Bash to convert it into JSON. Note that each smem call gives us two node processes, so we use paste/awk to select the one which has a larger USS.

for n in 9.11.2 11.15.0 13.14.0 15.14.0 17.9.1 19.9.0 main mine; do
  printf '{"node_version":"%s","memory":[%s]},\n' $n \
      $(< $n/smem_results grep /node \
          | awk '{print $(NF-2)}' \
          | paste - - | awk '{print ($1>$2?$1:$2)}' \
          | paste -sd,)
done
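The pairwise-max trick is easy to sanity-check in isolation with synthetic USS values:

```shell
# `paste - -` joins consecutive lines into pairs; awk then keeps the
# larger value of each pair (the process paying the non-shared costs).
printf '%s\n' 31200 29800 30500 30900 \
  | paste - - | awk '{print ($1 > $2 ? $1 : $2)}'
# → 31200
# → 30900
```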

And finally, the data goes into Vega-Lite to make a pretty chart.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "/assets/node-startup-memory-data.json",
    "format": {"type": "json"}
  },
  "transform": [
    {"flatten": ["memory"]},
    {"calculate": "datum.memory/1024", "as": "memory"}
  ],
  "mark": {"type": "boxplot", "extent": "min-max"},
  "encoding": {
    "x": {
      "field": "node_version",
      "title": "Node.js version",
      "type": "nominal",
      "sort": [],
      "axis": {"labelAngle": 0}
    },
    "y": {
      "field": "memory",
      "type": "quantitative",
      "title": "Unique Set Size (MiB)"
    },
    "color": {
      "title": "Node.js version",
      "field": "node_version",
      "type": "nominal",
      "sort": []
    }
  },
  "config": {"numberFormat": ".2"},
  "title": {
    "text": "Unique set size of Node.js over the years"
  },
  "width": "container",
  "height": 500
}
Posted on 2023-05-09