Node.js Startup: Speeding up Snapshot Deserialization


In the previous post of this series, we captured a profile of Node.js’s startup. Looking at the profile, we saw that around a third of Node.js’s startup time was spent in two functions: v8::internal::Snapshot::Initialize and v8::Context::FromSnapshot. But what is a snapshot? A snapshot is a serialized copy of V8’s heap which can be deserialized later. This lets embedders like Node.js load quickly from a snapshot rather than re-running their bootstrap code on every startup. You can learn more on V8’s blog.

I also profiled d8, V8’s minimal JavaScript shell. It spent a similar third of its time deserializing the snapshot, but since d8 starts three times faster overall, its absolute snapshot deserialization time is about 3x lower. My next question was… what’s in Node.js’s snapshot and why is it so much slower?

V8 has a flag, --profile-deserialization, which outputs some timings:

$ out/Release/d8 --profile-deserialization -e 0
[Verifying snapshot checksum took 0.196 ms]
[Deserializing isolate (201600 bytes) took 1.917 ms]
[Deserializing context #0 (49616 bytes) took 0.356 ms]

whereas for Node.js:

$ ./node_main --profile-deserialization -e 0 | head -n3
[Verifying snapshot checksum took 0.827 ms]
[Deserializing isolate (1449656 bytes) took 5.886 ms]
[Deserializing context #3 (379640 bytes) took 2.526 ms]

Node.js’s serialized isolate and context are both around 7x the size of d8’s, which probably explains why deserializing them is slower. An isolate is an instance of the V8 engine with its own heap, while a context is an execution environment with its own global object inside an isolate (an isolate can actually contain multiple contexts). Next up, I was curious what was actually in the heap. Continuing the flag exploration, --serialization-statistics dumps statistics on which object types are in the snapshot. I had to pass it in Node.js’s snapshot creation tool:

diff --git a/tools/snapshot/node_mksnapshot.cc b/tools/snapshot/node_mksnapshot.cc
index d6d92ab156..226c1efa0e 100644
--- a/tools/snapshot/node_mksnapshot.cc
+++ b/tools/snapshot/node_mksnapshot.cc
@@ -53,6 +53,7 @@ int main(int argc, char* argv[]) {
 
   v8::V8::SetFlagsFromString("--random_seed=42");
   v8::V8::SetFlagsFromString("--harmony-import-assertions");
+  v8::V8::SetFlagsFromString("--serialization-statistics");
   return BuildSnapshot(argc, argv);
 }

Here is the output of that, slightly cleaned up & truncated:

StartupSerializer:
  Spaces (bytes):
 read_only_space       new_space       old_space
               0         1754464          192768
Instance types (count and bytes):
        65     956216  new_space  EXTERNAL_ONE_BYTE_STRING_TYPE
       215     291768  new_space  FIXED_ARRAY_TYPE
      1004     192768  old_space  CODE_TYPE
      2492     139552  new_space  SHARED_FUNCTION_INFO_TYPE
      1728      61776  new_space  ONE_BYTE_STRING_TYPE
       194      47904  new_space  SCOPE_INFO_TYPE
       355      42600  new_space  FUNCTION_TEMPLATE_INFO_TYPE

Certainly EXTERNAL_ONE_BYTE_STRING_TYPE looks suspicious. Why are we storing 1 MB of strings in the snapshot?

It turns out that this megabyte of external string data is Node.js’s JavaScript code. As an optimization, Node.js embeds a copy of all of its JavaScript code inside its executable via a program called js2c. It then uses a low-level V8 feature called external strings to avoid copying that code onto the JavaScript heap. (With external strings, V8 stores a pointer to the string data rather than copying the data itself.)

However, when taking a snapshot, V8 copies external string data into the snapshot. This meant the snapshot duplicated string content that was already in the executable, bloating both the snapshot and the deserialized JavaScript heap (which held full copies of the JavaScript code rather than just pointers to it).

Fortunately, V8 provides the functionality we need: it lets us register “external references” with the snapshot. When serializing, we provide a list of external references, and V8 replaces each reference with its index in that list. When deserializing, we provide the same list, and V8 replaces each index with the corresponding value.

If we register references to the external string data, V8 is smart enough to avoid copying that data into the snapshot, and it uses the external-reference machinery to properly revive the external strings during deserialization. The string data no longer needs to live in the snapshot, cutting its size significantly:

$ ./node_main --profile-deserialization -e 0 |& head -n3
[Verifying snapshot checksum took 0.827 ms]
[Deserializing isolate (1449656 bytes) took 5.886 ms]
[Deserializing context #3 (379640 bytes) took 2.526 ms]
$ ./node --profile-deserialization -e 0 |& head -n3
[Verifying snapshot checksum took 0.443 ms]
[Deserializing isolate (434168 bytes) took 4.880 ms]
[Deserializing context #3 (379640 bytes) took 2.651 ms]

Doing this speeds up startup by 1 ms (4%), and also saves 1.0 MiB of memory in each Node.js process.

There’s probably more startup performance to squeeze out by cutting unnecessary data from the snapshot. But there’s a tradeoff: you could imagine eagerly adding commonly used modules to the snapshot, which would improve startup for real-world applications that use those modules, at the cost of slowing down the minimal startup. For now, we’ll take the free-lunch win and set snapshots aside for a little while as we continue the rest of the series.

Posted on 2023-05-11