Unwinding Node.js/V8 Javascript stacks in eBPF

ustackjs is a Node.js/V8 Javascript stack unwinder in eBPF. It lets you trace calls into native C++ code and view backtraces that include the Javascript frames that led there. It is available here.

To see how to use it, let’s consider the following Javascript program.

function foo() { return new Uint8Array(1024); }
function bar() { foo(); }
bar();

Allocating a Uint8Array eventually calls a C++ function known as “v8::Isolate::AdjustAmountOfExternalAllocatedMemory”. You can trace calls to this function using ustackjs. First, we will need to get the mangled name:

$ nm `which node` | grep AdjustAmountOfExternalAllocatedMemory
0000000000b9ac00 T _ZN2v87Isolate37AdjustAmountOfExternalAllocatedMemoryEl
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...

Now we can pass that to ustackjs.py to see all the callsites.

$ sudo python3 ustackjs.py --node `which node` \
        _ZN2v87Isolate37AdjustAmountOfExternalAllocatedMemoryEl

5845 5845 5845 12888.364267112: 1 event:
 561614bf9f40 v8::internal::Builtin_ArrayBufferConstructor(int, unsigned long*, v8::internal::Isolate*)+0x120 ([node])
 5616155ecd79 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit+0x39 ([node])
 56161556e7ec Builtins_JSBuiltinsConstructStub+0xec ([node])
 561615660652 Builtins_CreateTypedArray+0x892 ([node])
 5616155de2c7 Builtins_TypedArrayConstructor+0x87 ([node])
 56161556e7ec Uint8Array ([js])
 561595706add [unknown]
 561595706bb6 foo ([js])
 561595706cb6 bar ([js])
#        ...

The backtrace contains both C++ functions, like Builtins_CreateTypedArray, and Javascript functions (foo and bar).

The output is in perf-script(1) format, which lets you import it into various tools like Speedscope.
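Because the output is line-oriented, it is also easy to post-process by hand. The following is a minimal sketch, assuming the layout shown in the sample above: one unindented header line per event, followed by one indented frame per line. The names parse_stacks and fold are made up for this example and are not part of ustackjs. It folds each stack into the semicolon-separated, outermost-frame-first form that flame graph tools commonly accept.

```python
import re
from collections import Counter

# An indented frame line: address, symbol, then an optional "([module])" suffix.
FRAME_RE = re.compile(r"^\s+([0-9a-f]+)\s+(.+?)(?:\s+\(\[(\w+)\]\))?\s*$")


def parse_stacks(text):
    """Return one list of symbols per event, innermost frame first."""
    stacks, current = [], None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = FRAME_RE.match(line)
        if match is None:
            # Anything unindented is treated as the header of a new event.
            current = []
            stacks.append(current)
        elif current is not None:
            # Drop the "+0x120" style offset so identical callers group together.
            current.append(re.sub(r"\+0x[0-9a-f]+$", "", match.group(2)))
    return stacks


def fold(stacks):
    """Count identical stacks, keyed by 'outermost;...;innermost'."""
    return Counter(";".join(reversed(stack)) for stack in stacks if stack)
```

Each key in the result of fold(parse_stacks(text)) can then be written out with its count as "key count", one stack per line.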

This functionality of tracing calls to a particular native function is useful, but of course it may not be quite what you want. For example, you may want to trace system calls or capture the arguments passed to the function. Since the tool is open-source, you can modify ustackjs to capture whatever information you need.

To my knowledge, this is the first tool of its class for Node.js/V8. Similar tools exist for other interpreted languages, like Python or Ruby, but Javascript is particularly complicated because of its JIT. And while tools like perf or gdb can be used for native stack traces, neither of them offers a low-overhead way to get backtraces the way eBPF does!

How it works

I adapted the algorithm here from llnode, which can get the backtrace from a coredump. There is not too much to it, since V8 actually makes unwinding pretty easy. Essentially, the register rbp points to the saved frame pointer (i.e., the rbp of the previous function). For Javascript frames, you can traverse some nearby objects to eventually get to the name of the function. For C++ frames, you’ll notice that the Javascript unwinding fails. Either way, you can then set rbp <- old rbp and keep unwinding, until you eventually reach some maximum limit or fail to unwind.
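The frame-pointer walk itself can be sketched in a few lines. This is a simulation in plain Python, not the eBPF code from ustackjs: memory is modelled as a dict from address to 64-bit word, the return address is assumed to sit one slot above the saved rbp (the usual x86-64 layout), and the check that each caller frame lies at a higher address is an extra guard added for this example.

```python
WORD = 8  # bytes per stack slot on x86-64


def walk_frames(memory, rbp, max_depth=16):
    """Follow the saved-frame-pointer chain starting at rbp.

    memory is a dict mapping an address to the 64-bit word stored there,
    standing in for the reads an eBPF program would do against the
    traced process. Returns a list of (frame_pointer, return_address).
    """
    frames = []
    for _ in range(max_depth):
        saved_rbp = memory.get(rbp)              # the caller's rbp
        return_address = memory.get(rbp + WORD)  # sits one slot above it
        if saved_rbp is None or return_address is None:
            break  # unreadable memory: fail to unwind and stop
        frames.append((rbp, return_address))
        if saved_rbp <= rbp:
            break  # the stack grows down, so a caller's frame must be higher
        rbp = saved_rbp  # rbp <- old rbp
    return frames


# Three fake frames chained together; the outermost one saves a null rbp.
demo_memory = {
    0x1000: 0x2000, 0x1008: 0xAAA,
    0x2000: 0x3000, 0x2008: 0xBBB,
    0x3000: 0x0,    0x3008: 0xCCC,
}
for frame_pointer, return_address in walk_frames(demo_memory, 0x1000):
    print(hex(frame_pointer), hex(return_address))
```

The max_depth bound matters more in eBPF than it does here, since the verifier needs loops to have a fixed upper limit.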

Here are the gory details in a picture, showing the pointers you need to traverse to get the name of a Javascript function.

┌──────────────────────────┐
│ return address           │
├──────────────────────────┤
│ saved frame pointer    ◄─┼─ rbp
├──────────────────────────┤
│ "context"                │
├──────────────────────────┤
│ JSFunction pointer   ────┼──┐
└──────────────────────────┘  │
                              │
┌──────────────────────────┐  │
│ JSFunction map         ◄─┼──┘
├──────────────────────────┤
│                          │
│          ...             │
│                          │
├──────────────────────────┤
│ SharedFunctionInfo ptr  ─┼───┐
└──────────────────────────┘   │
                               │
┌──────────────────────────┐   │
│ SharedFunctionInfo map ◄─┼───┘
├──────────────────────────┤
│                          │
│          ...             │
├──────────────────────────┤
│ name or scope info ptr  ─┼───┐
└──────────────────────────┘   │
                               │
┌──────────────────────────┐   │
│ ScopeInfo map          ◄─┼───┘
├──────────────────────────┤
│          ...             │
├──────────────────────────┤
│ context_local_count      │
├──────────────────────────┤
│  followed by             │
│  2 * context_local_count │
│  8-byte words            │
├──────────────────────────┤
│ name pointer             ┼───┐
└──────────────────────────┘   │
                               │
┌──────────────────────────┐   │
│ String map             ◄─┼───┤
├──────────────────────────┤   │
│ length of string         │   │
├──────────────────────────┤   │
│ string data: "foo"       │   │
└──────────────────────────┘   │
                               │
┌──────────────────────────┐   │
│Root map pointer        ◄─┼───┘
├──────────────────────────┤
│instance type             │
└──────────────────────────┘
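To make the picture concrete, here is a simulated version of the pointer chase in plain Python. Everything numeric in it is an assumption: the byte offsets are hypothetical and chosen only for the demo (real ones depend on the V8 version), counts and lengths are stored as plain 64-bit integers for simplicity, and the map checks and the case where the slot holds the name directly are left out. The one V8 detail it does model is that heap-object pointers carry a tag in the low bit, which has to be stripped before dereferencing.

```python
import struct

HEAP_TAG = 1  # V8 marks heap-object pointers by setting the low bit

# Hypothetical byte offsets, chosen only for this demo.
FRAME_FUNCTION = -16    # JSFunction slot, two words below rbp as in the picture
FUNCTION_SHARED = 24    # JSFunction -> SharedFunctionInfo
SHARED_SCOPE_INFO = 8   # SharedFunctionInfo -> ScopeInfo
SCOPE_LOCAL_COUNT = 16  # ScopeInfo -> context_local_count
STRING_LENGTH = 8       # String -> length
STRING_DATA = 16        # String -> first character


class FakeMemory:
    """Sparse byte-addressable memory standing in for the traced process."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        for i, byte in enumerate(data):
            self.cells[addr + i] = byte

    def write_u64(self, addr, value):
        self.write(addr, struct.pack("<Q", value))

    def read(self, addr, size):
        try:
            return bytes(self.cells[addr + i] for i in range(size))
        except KeyError:
            return None  # any unmapped byte makes the whole read fail

    def read_u64(self, addr):
        raw = self.read(addr, 8)
        return None if raw is None else struct.unpack("<Q", raw)[0]


def follow(mem, base, offset):
    """Read the tagged pointer at base + offset; return its untagged address."""
    if base is None:
        return None
    ptr = mem.read_u64(base + offset)
    if ptr is None or not ptr & HEAP_TAG:
        return None  # unreadable, or not a heap-object pointer
    return ptr - HEAP_TAG


def js_function_name(mem, rbp, max_len=64):
    """Return the function name for the frame at rbp, or None on failure."""
    function = follow(mem, rbp, FRAME_FUNCTION)
    shared = follow(mem, function, FUNCTION_SHARED)
    scope = follow(mem, shared, SHARED_SCOPE_INFO)
    if scope is None:
        return None
    count = mem.read_u64(scope + SCOPE_LOCAL_COUNT)
    if count is None:
        return None
    # The name slot follows the 2 * count words after context_local_count.
    string = follow(mem, scope, SCOPE_LOCAL_COUNT + 8 + 2 * count * 8)
    if string is None:
        return None
    length = mem.read_u64(string + STRING_LENGTH)
    if length is None:
        return None
    data = mem.read(string + STRING_DATA, min(length, max_len))
    return None if data is None else data.decode("latin-1")


def build_demo():
    """Lay out one fake frame whose function is named "foo"."""
    mem, rbp = FakeMemory(), 0x7000
    mem.write_u64(rbp + FRAME_FUNCTION, 0x1000 | HEAP_TAG)
    mem.write_u64(0x1000 + FUNCTION_SHARED, 0x2000 | HEAP_TAG)
    mem.write_u64(0x2000 + SHARED_SCOPE_INFO, 0x3000 | HEAP_TAG)
    mem.write_u64(0x3000 + SCOPE_LOCAL_COUNT, 1)  # one context local
    mem.write_u64(0x3000 + SCOPE_LOCAL_COUNT + 24, 0x4000 | HEAP_TAG)
    mem.write_u64(0x4000 + STRING_LENGTH, 3)
    mem.write(0x4000 + STRING_DATA, b"foo")
    return mem, rbp


mem, rbp = build_demo()
print(js_function_name(mem, rbp))  # prints: foo
```

Returning None at the first failed read is what lets the unwinder treat a frame as native: for a C++ frame the slots below rbp do not hold a valid chain of tagged pointers, so the lookup gives up early.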

Is it safe to use?

I have not formally measured the overhead of this; my current guesstimate is somewhere in the double-digit microseconds per stack trace. I recommend running with a low value of --max-depth and --max-function-name-length, and turning them up carefully until you get enough information to debug whatever you’re looking at.

What’s left to do?

This was more of a proof-of-concept, although I’ve already found it surprisingly useful. Here are some things which are missing and could be nice additions:

Precisely quantify the performance impact.
Get various constants from the binary, rather than hardcoding them. llnode uses “v8dbg_*” symbols, which lets it work across different Node.js versions.
Cache some data until a GC occurs. This can avoid pointer chasing, which is probably not great for performance.
Support other types of strings. Currently we only support “one-byte seq strings”. This means that we can’t always print out the Javascript function name – for example, we’ll choke on any name containing Unicode characters that do not fit in one byte. There is a tradeoff here in that supporting more types of strings is slower.
Don’t probe 4 slots for the right string. Right now this copies a hack from llnode – it actually tries multiple offsets in ScopeInfo to find the location of the function name. We should be able to get the exact slot at the cost of some additional code complexity.
Some support for inlining. It would be nice if we could optionally detect inlining and correctly unwind it. This needs to be optional as the performance hit is likely quite high.
Support WebAssembly functions.
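As an illustration of the string item above, here is a sketch of what decoding a second flat representation might look like once the character data has been copied out. It assumes the caller already knows whether the string is one-byte or two-byte; how to derive that from the instance type is not shown, and decode_seq_string is a name made up for this example. Non-flat representations, such as strings built by concatenation, would need pointer chasing of their own and are not covered.

```python
def decode_seq_string(raw, length, two_byte):
    """Decode the character data of a flat (sequential) V8 string.

    raw is the bytes read from the string's data area, length is the number
    of characters, and two_byte says which representation the string uses.
    """
    if two_byte:
        # Two-byte strings hold UTF-16 code units, little-endian on x86-64.
        return raw[: 2 * length].decode("utf-16-le", errors="replace")
    # One-byte strings hold Latin-1, which maps each byte to one character.
    return raw[:length].decode("latin-1")
```

The two-byte case reads twice as many bytes for the same name length, which is one concrete form of the slowdown mentioned above.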