Node.js Startup: Speeding up Snapshot Deserialization
Tags: nodejs-startup, low-level
In the previous post of this series, we captured a profile of Node.js’s startup. Looking at the profile, we see that around a third of Node.js’s startup time was spent in two functions: v8::internal::Snapshot::Initialize and v8::Context::FromSnapshot. But what is a snapshot? A snapshot is a serialized version of V8’s heap which can be deserialized later. This allows embedders like Node.js to load quickly from a snapshot rather than redoing all of their bootstrapping on every startup. You can learn more at V8’s blog.
I also profiled d8, V8’s minimal Javascript shell. It spent a similar 1/3rd of its time deserializing from the snapshot, but d8 loads three times faster, so the absolute snapshot deserialization time is 3x faster. My next question was… what’s in Node.js’s snapshot and why is it so much slower?
There is a V8 flag we can pass called --profile-deserialization, which outputs some timings:
$ out/Release/d8 --profile-deserialization -e 0
[Verifying snapshot checksum took 0.196 ms]
[Deserializing isolate (201600 bytes) took 1.917 ms]
[Deserializing context #0 (49616 bytes) took 0.356 ms]
whereas for Node.js:
$ ./node_main --profile-deserialization -e 0 | head -n3
[Verifying snapshot checksum took 0.827 ms]
[Deserializing isolate (1449656 bytes) took 5.886 ms]
[Deserializing context #3 (379640 bytes) took 2.526 ms]
Node.js’s serialized isolate and context are both around 7x the size of d8’s, which probably explains why deserializing them is slower. An isolate is an instance of the V8 engine (along with its heap), while a context is a global object within an isolate (an isolate can actually have multiple contexts). Next up, I was curious what was actually in the heap. Continuing the flag exploration, --serialization-statistics can be used to dump statistics on what object types are in the snapshot. I had to pass it to Node.js’s snapshot creation script:
diff --git a/tools/snapshot/node_mksnapshot.cc b/tools/snapshot/node_mksnapshot.cc
index d6d92ab156..226c1efa0e 100644
--- a/tools/snapshot/node_mksnapshot.cc
+++ b/tools/snapshot/node_mksnapshot.cc
@@ -53,6 +53,7 @@ int main(int argc, char* argv[]) {
   v8::V8::SetFlagsFromString("--random_seed=42");
   v8::V8::SetFlagsFromString("--harmony-import-assertions");
+  v8::V8::SetFlagsFromString("--serialization-statistics");
   return BuildSnapshot(argc, argv);
 }
Here is the output of that, slightly cleaned up & truncated:
StartupSerializer:
  Spaces (bytes):
  read_only_space  new_space  old_space
                0    1754464     192768
  Instance types (count and bytes):
      65   956216  new_space  EXTERNAL_ONE_BYTE_STRING_TYPE
     215   291768  new_space  FIXED_ARRAY_TYPE
    1004   192768  old_space  CODE_TYPE
    2492   139552  new_space  SHARED_FUNCTION_INFO_TYPE
    1728    61776  new_space  ONE_BYTE_STRING_TYPE
     194    47904  new_space  SCOPE_INFO_TYPE
     355    42600  new_space  FUNCTION_TEMPLATE_INFO_TYPE
Certainly EXTERNAL_ONE_BYTE_STRING_TYPE looks suspicious. Why are we storing 1 MB of strings in the snapshot?
It turns out that this megabyte of external string data is Node.js’s Javascript code. As an optimization, Node.js includes a copy of all of its Javascript code inside its executable via a program called js2c. It then uses some low-level V8 functionality called external strings to avoid copying the Javascript code on to the Javascript heap. (When using external strings, V8 will store a pointer to the string data rather than copying the string data.)
However, when taking a snapshot, V8 copies external string data into the snapshot. This meant the snapshot duplicated string content that was already in the executable, bloating both the size of the snapshot and the size of the deserialized Javascript heap (which ended up holding a full copy of the Javascript source, rather than just a pointer to it as with external strings).
Fortunately, V8 provides functionality that lets us serialize snapshots while still supplying “external” data to them: we can register “external references” with the snapshot. When serializing, we provide a list of external references, and V8 replaces each external reference with its index in that list. When deserializing, we provide the same list, and V8 replaces each index with the corresponding value from the list.
If we register the references to the external strings, V8 is smart enough to avoid copying the external string data into the snapshot and use the external reference functionality to properly revive the external strings during deserialization. This means that we no longer need to store the string data in the snapshot, cutting its size significantly:
$ ./node_main --profile-deserialization -e 0 |& head -n3
[Verifying snapshot checksum took 0.827 ms]
[Deserializing isolate (1449656 bytes) took 5.886 ms]
[Deserializing context #3 (379640 bytes) took 2.526 ms]
$ ./node --profile-deserialization -e 0 |& head -n3
[Verifying snapshot checksum took 0.443 ms]
[Deserializing isolate (434168 bytes) took 4.880 ms]
[Deserializing context #3 (379640 bytes) took 2.651 ms]
Doing this speeds up startup by 1 ms (4%), and also saves 1.0 MiB of memory in each Node.js process.
There’s probably more startup performance to squeeze out by cutting unnecessary data from the snapshot. However, there’s a tradeoff: you could imagine eagerly adding commonly loaded modules to the snapshot, which would improve the startup performance of real-world applications that use the preloaded modules, at the cost of slowing down minimal startup. For now, we’ll take the free win and move on from snapshots for a little while as we continue the rest of the series.