Node.js performance of Raspberry Pi 1 sucks

In several previous posts I have studied the performance of the Raspberry Pi (version 1) and Node.js to find out why the Raspberry Pi underperforms so badly when running Node.js.

The first two posts indicate that the Raspberry Pi underperforms about 10x compared to an x86/x64 machine, after compensation for clock frequency is made. The small cache size of the Raspberry Pi is often mentioned as a cause for its poor performance. In the third post I examine that, but it is not that horribly bad: about 3x worse performance for big memory needs compared to in-cache-situations. It appears the slow SDRAM of the RPi is more of a problem than the small cache itself.

The Benchmark Program
I wanted to relate the Node.js slowdown to some other scripted language. I decided Lua is nice. And I was lucky to find Mandelbrot implementations in several languages!

I modified the program(s) slightly, increasing the resolution from 80 to 160. I also made a version that did almost nothing (MAX_ITERATIONS=1) so I could measure and substract the startup cost (which is signifacant for Node.js) from the actual benchmark values.

The Numbers
Below are the average of three runs (minus the average of three 1-iteration rounds), in ms. The timing values were very stable over several runs.

 (ms)                           C/Hard   C/Soft  Node.js     Lua
=================================================================
 QNAP TS-109 500MHz ARMv5                 17513    49376   39520
 TP-Link Archer C20i 560MHz MIPS          45087    65510   82450
 RPi 700MHz ARMv6 (Raspbian)       493             14660   12130
 RPi 700MHz ARMv6 (OpenWrt)        490    11040    15010   31720
 RPi2 900MHz ARMv7 (OpenWrt)       400     9130      770   29390
 Eee701 900MHz Celeron x86         295               500    7992
 3000MHz Athlon II X2 x64           56                59    1267

Notes on Hard/Soft floats:

  • Raspbian is armhf, only allowing hard floats (-mfloat-abi=hard)
  • OpenWrt is armel, allowing both hard floats (-mfloat-abi=softfp) and soft floats (-mfloat-abi=soft).
  • The QNAP has no FPU and generates runtime error with hard floats
  • The other targets produce linkage errors with soft floats

The Node.js versions are slightly different, and so are the Lua versions. This makes no significant difference.

Findings
Calculating the Mandelbrot with the FPU is basically “free” (<0.5s). Everything else is waste and overhead.

The cost of soft float is about 10s on the RPI. The difference between Node.js on Raspbian and OpenWrt is quite small – either both use the FPU, or none of them does.

Now, the interesting thing is to compare the RPi with the QNAP. For the C-program with the soft floats, the QNAP is about 1.5x slower than the RPi. This matches well with earlier benchmarks I have made (see 1st and 3rd link at top of post). If the RPi would have been using soft floats in Node.js, it would have completed in about 30 seconds (based on the QNAP 50 seconds). The only thing (I can come up with) that explains the (unusually) large difference between QNAP and RPi in this test, is that the RPi actually utilizes the FPU (both Raspbian and OpenWrt).

OpenWrt and FPU
The poor Lua performance in OpenWrt is probably due to two things:

  1. OpenWrt is compiled with -Os rather than -O2
  2. OpenWrt by default uses -mfloat-abi=soft rather than -mfloat-abi=softfp (which is essentially like hard).

It is important to notice that -mfloat-abi=softfp not only makes programs much faster, but also quite much smaller (10%), which would be valuable in OpenWrt.

Different Node.js versions and builds
I have been building Node.js many times for Raspberry Pi and OpenWrt. The above soft/softfp setting for building node does not affect performance much, but it does affect binary size. Node.js v0.10 is faster on Raspberry Pi than v0.12 (which needs some patching to build).

Lua
Apart from the un-optimized OpenWrt Lua build, Lua is consistently 20-25x slower than native for RPi/x86/x64. It is not like the small cache of the RPi, or some other limitation of the CPU, makes it worse for interpreted languages than x86/x64.

RPi ARMv6 VFPv2
While perhaps not the best FPU in the world, the VFPv2 floating point unit of the RPi ARMv6 delivers quite decent performance (slightly worse per clock cycle) compared to x86 and x64. It does not seem like the VFPv2 is to be blamed for the poor performance of Node.js on ARM.

Conclusion and Key finding
While Node.js (V8) for x86/x64 is near-native-speed, on the ARM it is rather near-Lua-speed: just another interpreted language, mostly. This does not seem to be caused by any limitation or flaw in the (RPi) ARM cpu, but rather the V8 implementation for x86/x64 being superior to that for ARM (ARMv6 at least).

  1. Building Node.js for OpenWrt (mipsel) | TechFindings - pingback on 2015/10/11 at 19:51
  2. William Simetin-Grenon

    What about RPI 3 ? I’ve been running Node on Pi 1 and it’s seriously painful (the starting delay when you develop…) and I was thinking to buy a new RPI 3.

  3. I have not tried an RPi3. It is 1200MHz instead of 900MHz, I would expect that to reflect the performance difference. The RPi2 on the other hand is a magnitude faster than RPi1, not because of 900Mhz vs 700MHz, but because of other architectural details.

    When it comes to starting times of node, I think this “snapshot” options matters. When (if) you build Nodejs you can choose to include snapshot or not. If you do, it will speedup the startup of nodejs itself. To build with snapshot, I think you need not to use a cross compiler, so I think forget about it for OpenWRT (if that is what you use).

Leave a Comment


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Time limit is exhausted. Please reload CAPTCHA.

Trackbacks and Pingbacks: