Tag Archives: Node.js

Node.js 6 on OpenWrt

I have managed to produce a working Node.js 6 binary for OpenWrt and RPi (brcm2708/brcm2709).

Binaries

15.05.1: brcm2708 6.9.5
15.05.1: brcm2709 6.9.5
15.05.1: mvebu 6.9.5 Please test (on WRT1x00AC router) and get back to me with feedback

Note: all the binaries work with equal performance on RPi v2 (brcm2709). For practical purposes the brcm2708 may be the only binary needed.

How to build 6.9.5 brcm2708/brcm2709
The procedure is:

  1. Set PATH and STAGING_DIR
  2. Set a few compiler flags and run configure with not so few options
  3. Fix nearbyint/nearbyintf
  4. Fix config.gypi
  5. make

1. I have a little script to set my toolchain variables.

# file:  env-15.05.1-brcm2709.sh
# usage: $ source ./env-15.05.1-brcm2709.sh

PATH=/path/to/staging_dir/bin:$PATH
export PATH

STAGING_DIR=/path/to/staging_dir
export STAGING_DIR

Your path should now contain arm-openwrt-linux-uclibcgnueabi-g++ and other binaries.

2. (brcm2709 / mvebu) I have another script to run configure:

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"

export CFLAGS="-isystem${CSTOOLS_INC} -mfloat-abi=softfp"
export CPPFLAGS="-isystem${CSTOOLS_INC} -mfloat-abi=softfp"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm --without-ssl --without-intl --without-inspector

bash --norc

Please note that this script was the first one that worked. It may not be the best, and some options may not be needed. --without-intl and --without-inspector helped me avoid build errors. If you need those features you have more work to do.

2. (brcm2708)

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"

export CFLAGS="-isystem${CSTOOLS_INC} -march=armv6j -mfloat-abi=softfp"
export CPPFLAGS="-isystem${CSTOOLS_INC} -march=armv6j -mfloat-abi=softfp"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm --without-ssl --without-intl --without-inspector

bash --norc

3. Use “grep -nR nearbyint” to find and replace:

  nearbyint => round
  nearbyintf => roundf

This may not be a good idea! However, nearbyint(f) is not supported in OpenWrt, and with the above replacements Node.js builds and passes the Octane benchmark, so it is not that broken. I suppose there is a correct way to replace nearbyint(f).
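For reference, the two C functions differ only in halfway cases: round() rounds them away from zero, while nearbyint() follows the current rounding mode, which by default rounds halfway cases to the nearest even integer. The following JavaScript sketch models both behaviours so the difference can be seen; the function names are my own, and it says nothing about where V8 depends on the difference.

```javascript
// Model of C round(): halfway cases go away from zero.
function roundHalfAwayFromZero(x) {
  const t = Math.trunc(x);
  return Math.abs(x - t) >= 0.5 ? t + Math.sign(x) : t;
}

// Model of C nearbyint() in the default rounding mode:
// halfway cases go to the nearest even integer.
function roundHalfToEven(x) {
  const t = Math.trunc(x);
  const frac = Math.abs(x - t);
  if (frac > 0.5) return t + Math.sign(x);
  if (frac < 0.5) return t;
  return t % 2 === 0 ? t : t + Math.sign(x);
}

// Only the .5 inputs with an even integer part give different results.
for (const x of [0.5, 1.5, 2.5, -2.5, 2.4, 2.6]) {
  console.log(x, roundHalfAwayFromZero(x), roundHalfToEven(x));
}
```

So the replacement changes results only for values exactly halfway between two integers.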

4. Add to config.gypi:

{ 'target_defaults': {
    'cflags': [ '-D__STDC_LIMIT_MACROS' ,'-D__STDC_CONSTANT_MACROS'],
    'ldflags': [ '-Wl,-rpath,/path/to/staging_dir/lib/' ]},

These are just compilation error workarounds.

This works for me.

Dependencies
You need to install dependencies in OpenWrt:

# opkg update
# opkg install librt
# opkg install libstdcpp

Performance
My initial tests indicate that Node.js v6 is a little (~2%) slower than Node.js 4 on ARM v7 (RPi v2).

Other targets
mvebu: I will build a binary, but I need help to test
x86/x86_64: This should be easy, but I see little need/use. Let me know if you want a binary.
mpc85xx: The chip is quite capable, but the PowerPC port of Node.js will most likely never support it.

Most MIPS architectures lack FPU and are truly unsuitable for running Node.js.

std::snprintf
It seems the OpenWrt C++ std library does not support std::snprintf. You can replace it with just snprintf and add #include <stdio.h> in the file:
deps/v8_inspector/third_party/v8_inspector/platform/inspector_protocol/String16_cpp.template
However, this is not needed when --without-inspector is applied.

Node.js 7
I have failed to build Node.js 7 before, but perhaps I will give it another try now that Node.js 6 is working.

Older versions of Node.js
I have previously built and distributed Node.js 4 for OpenWrt.

Node.js 4 on OpenWrt

Update 2017-02-27: I have built Node.js 6 for OpenWRT.
Update 2017-02-20: I migrated the files from DropBox since their public shares will stop working.
Update 2017-02-20: Updated binaries for OpenWRT 15.05.1 and Node.js 4.7.3.

Node.js has merged with io.js, and after Node.js 0.12.7 came version 4.0.0.

Well, the good news is that V8 seems to be completely and officially supported on Raspberry Pi (ARMv6+VFPv2) again (it has been a little in and out).

I intend to build and benchmark Node.js for different possible (and impossible) OpenWRT targets, and share a few binaries.

Binaries

Target                   Binaries        Comments
14.07: brcm2708          4.0.0
15.05: x86               4.1.0
15.05: brcm2708          4.1.0           also works for brcm2709 Raspberry Pi 2
15.05: brcm2709          4.1.2
15.05: mvebu             4.1.2           Not Tested! Please test, run octane-benchmark, and let me know!
15.05: ramips/mt7620     0.10.40
r47168: ramips/mt7620    4.1.2           requires kernel FPU emulation (get custom built r47168)
15.05.1: brcm2708        4.4.5, 4.7.3
15.05.1: brcm2709        4.4.5, 4.7.3
15.05.1: mvebu           4.7.3           Not Tested! Please test, run octane-benchmark, and let me know!

You need to install dependencies:

# opkg update
# opkg install librt
# opkg install libstdcpp

Benchmarks
Octane (1.0.0) Benchmark:

Target        System             CPU        Score      Time
brcm2708      Raspberry Pi v1    700MHz       97.1     2496s
brcm2708      Raspberry Pi v2    900MHz     1325        198s
brcm2709      Raspberry Pi v2    900MHz     1298        198s
x86           Eee701             900MHz     2559        118s
mt7620        Archer C20i        ( 64 MB RAM not enough )

Performance has been very consistent through different versions of OpenWRT and Node.js.

Building x86
With the 15.05 toolchain, this script configured Node.js 4.1.0

#!/bin/sh -e

export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="i486-openwrt-linux-uclibc-gcc"
export CXX="i486-openwrt-linux-uclibc-g++"
export LD="i486-openwrt-linux-uclibc-ld"

export CFLAGS="-isystem${CSTOOLS_INC}"
export CPPFLAGS="-isystem${CSTOOLS_INC}"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=x86 --dest-os=linux --without-npm

bash --norc

Then just run make, and wait.

Building brcm2708 (Raspberry Pi v1)
I configured
– Node.js 4.0.0 with 14.07 toolchain,
– Node.js 4.1.0 with 15.05 toolchain,
– Node.js 4.4.5 with 15.05.1 toolchain
with the following script:

#!/bin/sh -e

export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"

export CFLAGS="-isystem${CSTOOLS_INC} -march=armv6j -mfloat-abi=softfp"
export CPPFLAGS="-isystem${CSTOOLS_INC} -march=armv6j -mfloat-abi=softfp"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm

bash --norc

Then just run make, and wait.

Building brcm2709 (Raspberry Pi v2)
I configured Node.js 4.1.2 with 15.05 toolchain and 4.4.5 with 15.05.1 toolchain with the following script:

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"

export CFLAGS="-isystem${CSTOOLS_INC} -mfloat-abi=softfp"
export CPPFLAGS="-isystem${CSTOOLS_INC} -mfloat-abi=softfp"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm

bash --norc

Building ramips/mt7620 (Archer C20i)
For Ramips mt7620, Node.js 0.10.40 runs on standard 15.05 and I have posted build instructions for 0.10.38/40 before.

For Node.js 4, you need kernel FPU emulation (which is normally disabled in OpenWRT). The following script configures Node.js 4 for trunk (r47168, to be DD).

#!/bin/sh -e

export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="mipsel-openwrt-linux-musl-gcc"
export CXX="mipsel-openwrt-linux-musl-g++"
export LD="mipsel-openwrt-linux-musl-ld"

export CFLAGS="-isystem${CSTOOLS_INC}"
export CPPFLAGS="-isystem${CSTOOLS_INC}"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=mipsel --dest-os=linux --without-npm --with-mips-float-abi=soft
bash --norc

Without FPU emulation you will get ‘Illegal Instruction’ and Node.js will not run.

ar71xx (TP-Link WDR3600)
Without a custom built FPU-emulator-enabled kernel, a WDR3600 gives:

root@wdr3600-1505-std:/tmp# ./node 
Illegal instruction

However, with FPU enabled:

root@wdr3600-1505-fpu:/tmp# ./node 
undefined:1



SyntaxError: Unexpected end of input
    at Object.parse (native)
    at Function.startup.processConfig (node.js:265:27)
    at startup (node.js:33:13)
    at node.js:963:3

Same result for 4.1.2 and 4.2.2. That is as far as I have got with ar71xx at the moment (20151115).

Building Node.js for OpenWrt (mipsel)

Update 2015-10-11: See separate post for Node version 4 for different OpenWrt targets. Information about v4 added below.

I managed to build (and run) Node.js on OpenWrt on my Archer C20i, which has a MIPS 24K little-endian CPU without FPU (target=ramips/mt7620).

Node.js v0.10.40
First edit (set to false):

deps/v8/build/common.gypi

    # Similar to vfp but on MIPS.
    'v8_can_use_fpu_instructions%': 'false',

    # Similar to the ARM hard float ABI but on MIPS.
    'v8_use_mips_abi_hardfloat%': 'false',

For 15.05 I use this script to run configure:

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="mipsel-openwrt-linux-uclibc-gcc"
export CXX="mipsel-openwrt-linux-uclibc-g++"
export LD="mipsel-openwrt-linux-uclibc-ld"

export CFLAGS="-isystem${CSTOOLS_INC}"
export CPPFLAGS="-isystem${CSTOOLS_INC}"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=mipsel --dest-os=linux --without-npm

bash --norc

Then just “make”. I have uploaded the compiled binary node to DropBox.

Compilation for (DD) trunk (with musl rather than uclibc) fails for v0.10.40.

Node.js v4.1.2
Node.js v4 does not run without a FPU. Normally Linux emulates an FPU if it is not present, but this feature is disabled in OpenWRT. I built and published r47168 with FPU emulation and Node v4.1.2.

Node.js 4.1.2 is configured like:

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="mipsel-openwrt-linux-musl-gcc"
export CXX="mipsel-openwrt-linux-musl-g++"
export LD="mipsel-openwrt-linux-musl-ld"

export CFLAGS="-isystem${CSTOOLS_INC}"
export CPPFLAGS="-isystem${CSTOOLS_INC}"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=mipsel --dest-os=linux --without-npm --with-mips-float-abi=soft

bash --norc

Dependencies
In order to run the node binary on OpenWrt you need to install:

# opkg update
# opkg install librt
# opkg install libstdcpp

Performance
The 64 MB of RAM of my Archer C20i is not sufficient to run octane-benchmark (even if the node binary and the benchmark are stored on a USB drive). However, I have a Mandelbrot benchmark that I can run. For Archer C20i, timings are:

C/Soft Float                     48s
Lua                              82s
Node.js v0.10.40 (soft float)    65s
Node.js v4.1.2 (FPU emulation)  444s (63s user, 381s kernel)

Clearly, the OpenWrt developers have a good reason to leave FPU emulation out. However, for Node.js in the future, FPU emulation seems to be the only way. My Mandelbrot benchmark is of course ridiculously dependent on FPU performance. For more normal usage, perhaps the penalty is less significant.
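To make the cost of emulation explicit, this small sketch just redoes the arithmetic on the timings in the table above (the variable names are mine):

```javascript
// Node.js v4.1.2 on the Archer C20i with kernel FPU emulation, in seconds.
const totalSeconds = 444;
const userSeconds = 63;
const kernelSeconds = 381;

// Share of the run spent in the kernel, which is where the FPU emulation happens.
const kernelShare = kernelSeconds / totalSeconds;

console.log('kernel share: ' + (kernelShare * 100).toFixed(1) + '%');
console.log('user time: ' + userSeconds + 's (v0.10.40 soft float took 65s in total)');
```

In other words, the user-space part of the run is about as fast as the soft float build of v0.10.40; nearly all of the extra time is spent in the kernel.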

Other MIPS?
The only other MIPS I have had the opportunity to try was my WDR3600, a Big Endian 74K. It does not work:

  • v0.10.38 does not build at all (big endian MIPS seems unsupported)
  • v0.12.* builds, but it does not run (floating point exceptions), even though I managed to build it for soft float.

I need to try rebuilding OpenWRT with FPU emulation for ar71xx, then perhaps Node.js v4 will work.

Node.js performance of Raspberry Pi 1 sucks

In several previous posts I have studied the performance of the Raspberry Pi (version 1) and Node.js to find out why the Raspberry Pi underperforms so badly when running Node.js.

The first two posts indicate that the Raspberry Pi underperforms about 10x compared to an x86/x64 machine, after compensation for clock frequency is made. The small cache size of the Raspberry Pi is often mentioned as a cause for its poor performance. In the third post I examine that, but it is not that horribly bad: about 3x worse performance for big memory needs compared to in-cache-situations. It appears the slow SDRAM of the RPi is more of a problem than the small cache itself.

The Benchmark Program
I wanted to relate the Node.js slowdown to some other scripted language. I decided Lua is nice. And I was lucky to find Mandelbrot implementations in several languages!

I modified the program(s) slightly, increasing the resolution from 80 to 160. I also made a version that did almost nothing (MAX_ITERATIONS=1) so I could measure and subtract the startup cost (which is significant for Node.js) from the actual benchmark values.
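For readers who have not seen such a program, here is a minimal sketch of a Mandelbrot benchmark kernel in JavaScript. It is not the program I ran: the grid bounds, the function names and the MAX_ITERATIONS value of 1000 are assumptions for illustration. Only the resolution of 160 and the MAX_ITERATIONS=1 trick come from the text above.

```javascript
// Escape-time iteration for one point c = (cr, ci).
// Returns the iteration at which |z| exceeded 2, or maxIterations if it never did.
function escapeTime(cr, ci, maxIterations) {
  let zr = 0, zi = 0;
  for (let i = 0; i < maxIterations; i++) {
    const t = zr * zr - zi * zi + cr;
    zi = 2 * zr * zi + ci;
    zr = t;
    if (zr * zr + zi * zi > 4) return i;
  }
  return maxIterations;
}

// Walk a resolution x resolution grid over [-2, 2) x [-2, 2) and count
// the points that stay bounded for maxIterations iterations.
function mandelbrot(resolution, maxIterations) {
  let inside = 0;
  for (let y = 0; y < resolution; y++) {
    for (let x = 0; x < resolution; x++) {
      const cr = -2 + (4 * x) / resolution;
      const ci = -2 + (4 * y) / resolution;
      if (escapeTime(cr, ci, maxIterations) === maxIterations) inside++;
    }
  }
  return inside;
}

const RESOLUTION = 160;
const MAX_ITERATIONS = 1000; // assumed value; set to 1 for the startup-only run
const started = Date.now();
const insideCount = mandelbrot(RESOLUTION, MAX_ITERATIONS);
console.log(insideCount + ' points inside, ' + (Date.now() - started) + ' ms');
```

The inner loop is almost pure floating point multiplication and addition, which is why this kind of benchmark is so sensitive to FPU support.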

The Numbers
Below are the averages of three runs (minus the average of three 1-iteration runs), in ms. The timing values were very stable over several runs.
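The calculation behind each number can be sketched as follows. The helper names and the sample timings are made up for illustration; they are not measurements.

```javascript
// Arithmetic mean of a list of timings.
function mean(values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

// Net benchmark time: average of the full runs minus the average of the
// 1-iteration (startup-only) runs.
function netBenchmarkTime(fullRuns, startupRuns) {
  return mean(fullRuns) - mean(startupRuns);
}

// Hypothetical timings in ms for three full runs and three 1-iteration runs.
console.log(netBenchmarkTime([15200, 15300, 15400], [280, 300, 320]));
```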

 (ms)                           C/Hard   C/Soft  Node.js     Lua
=================================================================
 QNAP TS-109 500MHz ARMv5                 17513    49376   39520
 TP-Link Archer C20i 560MHz MIPS          45087    65510   82450
 RPi 700MHz ARMv6 (Raspbian)       493             14660   12130
 RPi 700MHz ARMv6 (OpenWrt)        490    11040    15010   31720
 RPi2 900MHz ARMv7 (OpenWrt)       400     9130      770   29390
 Eee701 900MHz Celeron x86         295               500    7992
 3000MHz Athlon II X2 x64           56                59    1267

Notes on Hard/Soft floats:

  • Raspbian is armhf, only allowing hard floats (-mfloat-abi=hard)
  • OpenWrt is armel, allowing both hard floats (-mfloat-abi=softfp) and soft floats (-mfloat-abi=soft).
  • The QNAP has no FPU and generates runtime error with hard floats
  • The other targets produce linkage errors with soft floats

The Node.js versions are slightly different, and so are the Lua versions. This makes no significant difference.

Findings
Calculating the Mandelbrot with the FPU is basically “free” (<0.5s). Everything else is waste and overhead.

The cost of soft float is about 10s on the RPi. The difference between Node.js on Raspbian and OpenWrt is quite small – either both use the FPU, or neither does.

Now, the interesting thing is to compare the RPi with the QNAP. For the C-program with the soft floats, the QNAP is about 1.5x slower than the RPi. This matches well with earlier benchmarks I have made (see 1st and 3rd link at top of post). If the RPi would have been using soft floats in Node.js, it would have completed in about 30 seconds (based on the QNAP 50 seconds). The only thing (I can come up with) that explains the (unusually) large difference between QNAP and RPi in this test, is that the RPi actually utilizes the FPU (both Raspbian and OpenWrt).
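The estimate in the previous paragraph can be reproduced from the table. This sketch only redoes that arithmetic; the object and field names are mine.

```javascript
// Values in ms, taken from the table above.
const qnapTimes = { cSoft: 17513, node: 49376 };
const rpiTimes = { cSoft: 11040, node: 15010 }; // RPi 700MHz ARMv6 (OpenWrt) row

// How much slower the QNAP is than the RPi on pure soft float C code.
const softFloatSlowdown = qnapTimes.cSoft / rpiTimes.cSoft;

// If Node.js on the RPi used soft floats, it should scale the same way.
const predictedRpiNode = qnapTimes.node / softFloatSlowdown;

console.log('QNAP/RPi soft float slowdown: ' + softFloatSlowdown.toFixed(2) + 'x');
console.log('predicted RPi Node.js time with soft floats: ' + Math.round(predictedRpiNode) + ' ms');
console.log('measured RPi Node.js time: ' + rpiTimes.node + ' ms');
```

The measured time is roughly half of the soft float prediction, which is what points to the FPU being used.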

OpenWrt and FPU
The poor Lua performance in OpenWrt is probably due to two things:

  1. OpenWrt is compiled with -Os rather than -O2
  2. OpenWrt by default uses -mfloat-abi=soft rather than -mfloat-abi=softfp (which is essentially like hard).

It is important to notice that -mfloat-abi=softfp not only makes programs much faster, but also noticeably smaller (about 10%), which would be valuable in OpenWrt.

Different Node.js versions and builds
I have been building Node.js many times for Raspberry Pi and OpenWrt. The above soft/softfp setting for building node does not affect performance much, but it does affect binary size. Node.js v0.10 is faster on Raspberry Pi than v0.12 (which needs some patching to build).

Lua
Apart from the un-optimized OpenWrt Lua build, Lua is consistently 20-25x slower than native for RPi/x86/x64. It is not like the small cache of the RPi, or some other limitation of the CPU, makes it worse for interpreted languages than x86/x64.

RPi ARMv6 VFPv2
While perhaps not the best FPU in the world, the VFPv2 floating point unit of the RPi ARMv6 delivers quite decent performance (slightly worse per clock cycle) compared to x86 and x64. It does not seem like the VFPv2 is to be blamed for the poor performance of Node.js on ARM.

Conclusion and Key finding
While Node.js (V8) for x86/x64 is near-native-speed, on the ARM it is rather near-Lua-speed: just another interpreted language, mostly. This does not seem to be caused by any limitation or flaw in the (RPi) ARM cpu, but rather the V8 implementation for x86/x64 being superior to that for ARM (ARMv6 at least).
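Dividing the Node.js and Lua times by the C/Hard time of the same machine makes the comparison concrete. The sketch below only redoes that division with the numbers from the table in The Numbers; the field names are mine.

```javascript
// Times in ms from the table (only rows that have a C/Hard value).
const rows = [
  { system: 'RPi ARMv6 (Raspbian)', cHard: 493, node: 14660, lua: 12130 },
  { system: 'RPi ARMv6 (OpenWrt)',  cHard: 490, node: 15010, lua: 31720 },
  { system: 'RPi2 ARMv7 (OpenWrt)', cHard: 400, node: 770,   lua: 29390 },
  { system: 'Eee701 Celeron x86',   cHard: 295, node: 500,   lua: 7992 },
  { system: 'Athlon II X2 x64',     cHard: 56,  node: 59,    lua: 1267 }
];

// Slowdown relative to native code with hardware floats.
const ratios = rows.map(function (r) {
  return { system: r.system, nodeVsC: r.node / r.cHard, luaVsC: r.lua / r.cHard };
});

for (const r of ratios) {
  console.log(r.system + ': Node.js ' + r.nodeVsC.toFixed(1) + 'x, Lua ' + r.luaVsC.toFixed(1) + 'x');
}
```

On the ARMv6 rows Node.js lands in the same range as Lua, while on x86/x64 (and, by these numbers, on the ARMv7 RPi2) it stays within about 2x of native.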

JavaScript: switch options

Is the nicest solution also the fastest?

Here is a little thing I ran into that I found interesting enough to test it. In JavaScript, you get a parameter (from a user, perhaps a web service), and depending on the parameter value you will call a particular function.

The first solution that comes to my mind is a switch:

function test_switch(code) {
  switch ( code ) {
  case 'Alfa':
    call_alfa();
    break;
  ...
  case 'Mike':
    call_mike();
    break;
  default:
    call_default();
  }
}

That is good if you know all the labels when you write the code. A more compact solution that allows you to dynamically add functions is to let the functions just be properties of an object:

x1 = {
  Alfa:call_alfa,
  Bravo:call_bravo,
  Charlie:call_charlie,
...
  Mike:call_mike
};

function test_prop(code) {
  var f = x1[code];
  if ( f ) f();
  else call_default();
}
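One caveat with the property lookup when the code comes from a user: a plain object also inherits properties such as toString and constructor, so those strings would find a function too. A minimal sketch of one way around it, using a table without a prototype (the names plainTable, safeTable and dispatch are mine, and the functions return strings only so the result is visible):

```javascript
function call_alfa() { return 'alfa'; }
function call_bravo() { return 'bravo'; }
function call_default() { return 'default'; }

// Plain object: inherits toString, constructor, ... from Object.prototype.
const plainTable = { Alfa: call_alfa, Bravo: call_bravo };

// Prototype-less object: only the keys we put there exist.
const safeTable = Object.assign(Object.create(null), plainTable);

function dispatch(table, code) {
  const f = table[code];
  return typeof f === 'function' ? f() : call_default();
}

console.log(dispatch(plainTable, 'Alfa'));     // a known code
console.log(dispatch(plainTable, 'toString')); // finds the inherited function
console.log(dispatch(safeTable, 'toString'));  // falls back to the default
```

I have not benchmarked this variant, so the timings in this post do not cover it.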

And as a variant – not really making sense in this simple example but anyway – you could loop over the properties (functions) until you find the right one:

function test_prop_loop(code) {
  var p;
  for ( p in x1 ) {
    if ( p === code ) {
      x1[p]();
      return;
    }
  }
  call_default();
}

And, since we are into loops, this construction does not make so much sense in this simple example, but anyway:

x2 = [
  { code:'Alfa'     ,func:call_alfa    },
  { code:'Bravo'    ,func:call_bravo   },
  { code:'Charlie'  ,func:call_charlie },
...
  { code:'Mike'     ,func:call_mike    }
];

function test_array_loop(code) {
  var i, o;
  for ( i=0 ; i<x2.length ; i++ ) {
    o = x2[i];
    if ( o.code === code ) {
      o.func();
      return;
    }
  }
  call_default();
}

Alfa, Bravo…, Mike and default
I created exactly 13 options, and labeled them Alfa, Bravo, … Mike. All the test functions accept an invalid code and fall back to a default function.

The loops should clearly be worse for more options. However it is not obvious what the cost is for more options in the switch case.

I will make three test runs: 5 options (Alfa to Echo), 13 options (Alfa to Mike) and 14 options (Alfa to November) where the last one ends up in default. For each run, each of the 5/13/14 options will be equally frequent.

Benchmark Results
I am benchmarking using Node.js 0.12.2 on a Raspberry Pi 1. The startup time for Node.js is 2.35 seconds, and I have subtracted that from all benchmark times. I also ran the benchmarks on a MacBook Air with Node.js 0.10.35. All benchmarks were repeated three times and the median has been used. Iteration count: 1000000.
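A harness for this kind of test could look like the sketch below. It is an illustration, not the script I used: it times inside the process with process.hrtime.bigint() (available in current Node.js versions) rather than timing the whole process and subtracting the startup time, and the names timeRun, benchmark and sampleDispatch are made up.

```javascript
// Median of an odd number of timings.
function median(values) {
  const sorted = values.slice().sort(function (a, b) { return a - b; });
  return sorted[Math.floor(sorted.length / 2)];
}

// Call fn once per iteration, cycling through the codes so that
// each code is equally frequent. Returns elapsed milliseconds.
function timeRun(fn, codes, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn(codes[i % codes.length]);
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Repeat the run and report the median.
function benchmark(fn, codes, iterations, repeats) {
  const times = [];
  for (let r = 0; r < repeats; r++) times.push(timeRun(fn, codes, iterations));
  return median(times);
}

// Stand-in for one of the test functions (5 options, Alfa to Echo).
let hits = 0;
const sampleTable = { Alfa: 1, Bravo: 1, Charlie: 1, Delta: 1, Echo: 1 };
function sampleDispatch(code) { if (sampleTable[code]) hits++; }

const fiveCodes = ['Alfa', 'Bravo', 'Charlie', 'Delta', 'Echo'];
console.log('median ms: ' + benchmark(sampleDispatch, fiveCodes, 1000000, 3));
```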

(ms)       ======== RPi ========     ==== MacBook Air ====
              5      13      14         5      13      14
============================================================
switch     1650    1890    1930        21      28      30
prop       2240    2330    2890        22      23      37
proploop   2740    3300    3490        31      37      38
loop       2740    4740    4750        23      34      36

Conclusions
Well, most notable (and again), the RPi ARMv6 is not fast running Node.js!

Using the simple property construction seems to make sense from a performance perspective, although the good old switch is also fast. The loops have no advantages. Also, the penalty for the default case is quite heavy for the simple property case; if you know the “code” is valid the property scales very nicely.

It is however a little interesting that on the ARM the loop over properties is better than the loop over integers. On the x64 it is the other way around.

Variants of Simple Property Case
The following are essentially equally fast:

function test_prop(code) {
  var f = x1[code];   
  if ( f ) f();       
  else call_x();                        
}   

function test_prop(code) {
  var f = x1[code];   
  if ( 'function' === typeof f ) f();
  else call_x();                        
}   

function test_prop(code) {
  x1[code]();                          
}   

So, it does not cost much to have a safety test and a default case (just in case), but it is expensive to use it. This one, however:

function test_prop(code) {
  try {
    x1[code]();
  } catch(e) {
    call_x();
  }
}

comes at a cost of 5ms on the MacBook, when the catch is never used. If the catch is used (1 out of 14) the run takes a full second instead of 37ms!
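Apart from the cost, the try/catch variant has another property worth knowing: it also catches exceptions thrown inside the function that was found, and silently turns them into the default case. A small sketch with made-up handler names shows the difference:

```javascript
function call_x() { return 'default'; }

const handlers = {
  Alfa: function () { return 'alfa'; },
  Broken: function () { throw new Error('bug inside handler'); }
};

// try/catch variant: a missing property and a failing handler look the same.
function dispatchTry(code) {
  try {
    return handlers[code]();
  } catch (e) {
    return call_x();
  }
}

// typeof variant: only a missing property leads to the default case.
function dispatchChecked(code) {
  const f = handlers[code];
  return typeof f === 'function' ? f() : call_x();
}

console.log(dispatchTry('Nope'));   // missing property, default case
console.log(dispatchTry('Broken')); // the handler's error is hidden
```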

Node.js Benchmark on Raspberry Pi (v1)

I have experimented a bit with Node.js and Raspberry Pi lately, and I have found the performance… surprisingly bad. So I decided to run some standard tests: benchmark-octane (v9).

Octane is essentially run like:

$ npm install benchmark-octane
$ cd node_modules/benchmark-octane
$ node run.js

The distilled result of Octane is a total run time and a score. Here are a few results:

                         OS             Node.js                   Time    Score
QNAP TS-109 500MHz       Debian        v0.10.29 (Debian)         3350s      N/A
Raspberry Pi v1 700MHz   OpenWrt BB    v0.10.35 (self built)     2267s      140
Raspberry Pi v1 700MHz   Raspbian       v0.6.19 (Raspbian)       2083s      N/A
Raspberry Pi v1 700MHz   Raspbian       v0.12.2 (self built)     2176s      104
Eee701 Celeron 900MHz    Xubuntu       v0.10.25 (Ubuntu)          171s     1655
Athlon II X2@3GHz        Xubuntu       v0.10.25 (Ubuntu)           49s     9475
MacBook Air i5@1.4GHz    Mac OS X      v0.10.35 (pkgsrc)           47s    10896
HP 2560p i7@2.7GHz       Xubuntu       v0.10.25 (Ubuntu)           41s    15450

Score N/A means that one test failed and there was no final score.

When I first saw the RPi performance I thought I had done something wrong building (using a cross compiler) Node.js myself for RPi and OpenWRT. However Node.js with Raspbian is basically not faster, and also RPi ARMv6 with FPU is not much faster than the QNAP ARMv5 without FPU.

I think the Eee701 serves as a good baseline here. At first glance, possible reasons for the RPi underperformance relative to the Celeron are:

  • Smaller cache (16 kB of L1 cache, and L2 only available to the GPU, I read) compared to the Celeron (512 kB)
  • Bad or not well utilised FPU (but there at least is one on the RPi)
  • Node.js (V8) less optimized for ARM

I found that I have benchmarked those two CPUs against each other before. That time the Celeron was twice as fast as the RPi, and the FPU of the RPi performed decently. Blaming the small cache makes more sense to me than blaming the people who implemented ARM support in V8.

The conclusion is that Raspberry Pi (v1 at least) is extremely slow running Node.js. Other benchmarks indicate that RPi v2 is significantly faster.
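To put a number on it, the run times from the table can be compared with the clock frequency taken into account. The sketch uses the OpenWrt row for the RPi and the Eee701 as baseline; the variable names are mine.

```javascript
// Octane run time in seconds and clock frequency in MHz, from the table above.
const rpiV1 = { mhz: 700, seconds: 2267 };  // OpenWrt BB, v0.10.35
const eee701 = { mhz: 900, seconds: 171 };  // Xubuntu, v0.10.25

// Plain run time ratio.
const rawRatio = rpiV1.seconds / eee701.seconds;

// Ratio per clock cycle: scale away the difference in clock frequency.
const perClockRatio = rawRatio * rpiV1.mhz / eee701.mhz;

console.log('run time ratio: ' + rawRatio.toFixed(1) + 'x');
console.log('per clock cycle: ' + perClockRatio.toFixed(1) + 'x');
```

So even after compensating for the lower clock frequency, the RPi v1 needs roughly ten times as long as the Celeron.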

Raspberry Pi (v1), OpenWrt (14.07) and Node.js (v0.10.35 & v0.12.2)

Since I gave up running NetBSD on my Raspberry Pi, I decided it was time to try OpenWrt. And, to my surprise, I also managed to cross compile Node.js!

Install OpenWrt on Raspberry Pi (v1@700MHz)
I installed OpenWrt Barrier Breaker (the currently stable release) using the standard instructions.

After you have put the image on an SD-card with dd, it is quite easy to resize the root partition:

  1. copy the second partition to an image file using dd
  2. use fdisk to delete the second partition, and create a new, bigger one
  3. format the new partition with mkfs.ext4
  4. mount the image file using mount -o loop
  5. mount the new second partition
  6. copy all data from image file to second partition using cp -a

If you want to, you can edit /etc/config/network while you are anyway working with the OpenWrt root partition:

#config interface 'lan'
#	option ifname 'eth0'
#	option type 'bridge'
#	option proto 'static'
#	option ipaddr '192.168.1.1'
#	option netmask '255.255.255.0'
#	option ip6assign '60'
#	option gateway '?.?.?.?'
#	option dns '?.?.?.?'
config interface 'lan'
	option ifname 'eth0'
	option proto 'dhcp'
	option macaddr 'XX:XX:XX:XX:XX:XX'
	option hostname 'rpiopenwrt'

Probably you want to disable dnsmasq, odhcpd and firewall too:

.../etc/init.d/$ chmod -x dnsmasq firewall odhcpd

OR (depending on your idea of what is the right way)

.../etc/rc.d$ sudo rm S60dnsmasq S35odhcpd K85odhcpd S19firewall

Also, it is a good idea to edit config.txt (on the DOS partition):

gpu_mem=1

I don’t know if 1 is really a legal value, but it worked for me, and I had much more memory available than when gpu_mem was not set.

Node.js 4 (added 2015-10-03)
For Node.js, check Node.js 4 builds.

Building Node.js v0.12.2
I downloaded and built Node.js v0.12.2 on a Xubuntu machine with an x64 cpu. On such a machine you can download the standard OpenWrt toolchain for Raspberry Pi.

I replaced configure and cpu.cc in the standard sources with the files from This Page (they are meant for v0.12.1 but they work equally well for v0.12.2).

I then found a gist that gave me a good start. I modified it, and ended up with:

#!/bin/sh -e

export STAGING_DIR=...path to your toolchain...

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib
export ARM_TARGET_LIB=$CSTOOLS_LIB

export TARGET_ARCH="-march=armv6j"

#Define the cross compilers on your system
export AR="arm-openwrt-linux-uclibcgnueabi-ar"
export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LINK="arm-openwrt-linux-uclibcgnueabi-g++"
export CPP="arm-openwrt-linux-uclibcgnueabi-gcc -E"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"
export AS="arm-openwrt-linux-uclibcgnueabi-as"
export CCLD="arm-openwrt-linux-uclibcgnueabi-gcc ${TARGET_ARCH} ${TARGET_TUNE}"
export NM="arm-openwrt-linux-uclibcgnueabi-nm"
export STRIP="arm-openwrt-linux-uclibcgnueabi-strip"
export OBJCOPY="arm-openwrt-linux-uclibcgnueabi-objcopy"
export RANLIB="arm-openwrt-linux-uclibcgnueabi-ranlib"
export F77="arm-openwrt-linux-uclibcgnueabi-g77 ${TARGET_ARCH} ${TARGET_TUNE}"
unset LIBC

#Define flags
export CXXFLAGS="-march=armv6j"
export LDFLAGS="-L${CSTOOLS_LIB} -Wl,-rpath-link,${CSTOOLS_LIB} -Wl,-O1 -Wl,--hash-style=gnu"
export CFLAGS="-isystem${CSTOOLS_INC} -fexpensive-optimizations -frename-registers -fomit-frame-pointer -O2"
export CPPFLAGS="-isystem${CSTOOLS_INC}"
export CCFLAGS="-march=armv6j"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm

bash --norc

Run this script in the Node.js source directory. If everything goes fine it configures the Node.js build, and leaves you with a shell where you can simply run:

$ make

If compilation is fine, you find the node binary in the out/Release folder. Copy it to your OpenWrt Raspberry Pi.

Building Node.js v0.10.35
I first successfully built Node.js v0.10.35.

The (less refined) script for configuring that I used was:

#!/bin/sh -e

export STAGING_DIR=...path to your toolchain...

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib
export ARM_TARGET_LIB=$CSTOOLS_LIB
export GYP_DEFINES="armv7=0"

#Define our target device
export TARGET_ARCH="-march=armv6"
export TARGET_TUNE="-mfloat-abi=hard"

#Define the cross compilers on your system
export AR="arm-openwrt-linux-uclibcgnueabi-ar"
export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LINK="arm-openwrt-linux-uclibcgnueabi-g++"
export CPP="arm-openwrt-linux-uclibcgnueabi-gcc -E"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"
export AS="arm-openwrt-linux-uclibcgnueabi-as"
export CCLD="arm-openwrt-linux-uclibcgnueabi-gcc ${TARGET_ARCH} ${TARGET_TUNE}"
export NM="arm-openwrt-linux-uclibcgnueabi-nm"
export STRIP="arm-openwrt-linux-uclibcgnueabi-strip"
export OBJCOPY="arm-openwrt-linux-uclibcgnueabi-objcopy"
export RANLIB="arm-openwrt-linux-uclibcgnueabi-ranlib"
export F77="arm-openwrt-linux-uclibcgnueabi-g77 ${TARGET_ARCH} ${TARGET_TUNE}"
unset LIBC

#Define flags
export CXXFLAGS="-march=armv6"
export LDFLAGS="-L${CSTOOLS_LIB} -Wl,-rpath-link,${CSTOOLS_LIB} -Wl,-O1 -Wl,--hash-style=gnu"
export CFLAGS="-isystem${CSTOOLS_INC} -fexpensive-optimizations -frename-registers -fomit-frame-pointer -O2 -ggdb3"
export CPPFLAGS="-isystem${CSTOOLS_INC}"
export CCFLAGS="-march=armv6"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux
bash --norc

Running node on the Raspberry Pi
Back on the Raspberry Pi you need to install a few packages:

# ldd ./node 
	libdl.so.0 => /lib/libdl.so.0 (0xb6f60000)
	librt.so.0 => not found
	libstdc++.so.6 => not found
	libm.so.0 => /lib/libm.so.0 (0xb6f48000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb6f34000)
	libpthread.so.0 => not found
	libc.so.0 => /lib/libc.so.0 (0xb6edf000)
	ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0xb6f6c000)
# opkg update
# opkg install librt
# opkg install libstdcpp

That is all! Now you should be ready to run node. The node binary is about 13 MB (the v0.10.35 binary was 19 MB, perhaps because of -ggdb3), so it is not optimal to deploy it to other typical OpenWrt hardware.

Final comments
I ran a few small programs to test, and they were fine. I guess some more testing would be appropriate. The performance is very comparable to Node.js built and executed on Raspbian.

I think RaspberryPi+OpenWrt+Node.js is a very interesting and competitive combination for microservices!

Faking a good goto in JavaScript

There are cases where gotos are good (most possible uses of gotos are not good). I needed to write JavaScript functions (for running in NodeJS) where I wanted to call the callback function just once in the end (to make things as clear as possible). In C that would be (this is a simplified example):

void withgoto(int x, void(*callback)(int) ) {
  int r;

  if ( (r = test1(x)) )
    goto done;

  if ( (r = test2(x)) )
    goto done;
 
  if ( (r = test3(x)) )
    goto done;
 
  r = 0;
done:
  (*callback)(r);
}

I think that looks nice! I mean the way goto controls the flow, not the syntax for function pointers.

JavaScript: multiple callbacks
The most obvious way to me to write this in JavaScript was:

var with4callbacks = function(x, callback) {
  var r

  if ( r = test1(x) ) {
    callback(r)
    return
  }

  if ( r = test2(x) ) {
    callback(r)
    return
  }

  if ( r = test3(x) ) {
    callback(r)
    return
  }
  r = 0
  callback(r)
}

This works perfectly, of course. But it is not nice with callback in several places. It is annoying (bloated) to always write return after callback. And in other cases it can be a little unclear if callback is called zero times, or more than one time… which is basically catastrophic. What options are there?
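While developing, the "more than one time" case can at least be made loud with a small wrapper. This is a sketch of my own (the name once is made up) and it does not detect the case where the callback is never called.

```javascript
// Wrap a callback so that a second call throws instead of passing unnoticed.
function once(callback) {
  var called = false;
  return function () {
    if (called) throw new Error('callback called more than once');
    called = true;
    return callback.apply(this, arguments);
  };
}

// Usage: wrap the callback before handing it to the function under test.
var guarded = once(function (r) { console.log('result:', r); });
guarded(0);
```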

JavaScript: abusing exceptions
My first idea was to abuse the throw/catch-construction:

var withexceptions = function(x, callback) {
  var r

  try {
    if ( r = test1(x) )
      throw null

    if ( r = test2(x) )
      throw null

    if ( r = test3(x) )
      throw null

    r = 0
  } catch(e) {
  }
  callback(r)
}

This works just perfectly. In a more real world case you would probably put some code in the catch block. Is it good style? Maybe not.
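One thing the catch block could do, as a sketch: if a dedicated sentinel object is thrown instead of null, the catch block can tell the deliberate jump apart from a real error and rethrow the latter. The names DONE and withsentinel are made up for this example, and test1, test2 and test3 are trivial placeholders for the real test functions.

```javascript
// Hypothetical stand-ins for the test functions: return non-zero on failure.
var test1 = function(x) { return x < 0 ? 1 : 0 }
var test2 = function(x) { return x % 2 ? 2 : 0 }
var test3 = function(x) { return x > 100 ? 3 : 0 }

// Sentinel used only to jump out of the try block (an assumption of this sketch).
var DONE = {}

var withsentinel = function(x, callback) {
  var r

  try {
    if ( r = test1(x) )
      throw DONE

    if ( r = test2(x) )
      throw DONE

    if ( r = test3(x) )
      throw DONE

    r = 0
  } catch(e) {
    // Anything that is not our sentinel is a real error: pass it on.
    if ( e !== DONE )
      throw e
  }
  callback(r)
}

withsentinel(4, function(r) { console.log(r) })
```

With throw null and an empty catch block, an unexpected exception from one of the test functions would be silently swallowed; the sentinel check avoids that.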

JavaScript: an internal function
With an internal (is it called so?) function, a return does the job:

var withinternalfunc = function(x, callback) {
  var r
  var f

  f = function() {
    if ( r = test1(x) )
      return

    if ( r = test2(x) )
      return

    if ( r = test3(x) )
      return

    r = 0
  }
  f()
  callback(r)
}

Well, this looks like JavaScript, but it is not super clear.

JavaScript: an external function
You can also do it with an external function (risking that you need to pass plenty of parameters to it, but in my simple example that is not an issue):

var externalfunc = function(x) {
  var r
  if ( r = test1(x) )
    return r

  if ( r = test2(x) )
    return r

  if ( r = test3(x) )
    return r

  return 0
}

var withexternalfunc = function(x, callback) {
  callback(externalfunc(x))
}

Do you think the readability is improved compared to the goto code? I don’t think so.

JavaScript: Break out of Block
Finally (and I got help coming up with this one), it is possible to do:

var withbreakblock = function(x, callback) {
  var r

myblock:
  {
    if ( r = test1(x) )
      break myblock

    if ( r = test2(x) )
      break myblock

    if ( r = test3(x) )
      break myblock

    r = 0
  }
  callback(r)
}

Well, that is as close to the goto construction as I can come with JavaScript. Pretty nice!

JavaScript: Multiple if(done)
Using a done-variable and multiple if statements is also possible:

var with3ifs = function(x, callback) {
  var r
  var done = false

  if ( r = test1(x) )
    done = true

  if ( !done ) {
    if ( r = test2(x) )
      done = true
  }

  if ( !done ) {
    if ( r = test3(x) )
      done = true
  }

  if ( !done ) {
    r = 0
  } 
  callback(r)
}

Hardly pretty, I think. The longer the code gets (the more sequential ifs there are), the higher the penalty for the ifs will be.

Performance
Which one I choose may depend on performance, if the difference is big. They should all be fast, but:

  • It is quite unclear what the cost of throwing (an exception) is
  • The internal function, is it recompiled and what is the cost?

I measured performance as (millions of) calls to the function per second. The test functions are rather cheap, and x is an integer in this case.
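A measurement like this can be sketched roughly as below. This is not the exact harness used for the numbers in this post, only a minimal illustration: candidate and mops are names invented here, and candidate stands in for whichever variant is being measured.

```javascript
// Hypothetical function under test; any of the variants above would fit here.
var candidate = function(x, callback) {
  callback(x & 1)
}

// Call fn `calls` times and report millions of calls per second.
var mops = function(fn, calls) {
  var sink = 0
  var cb = function(r) { sink += r }   // cheap callback that keeps the result alive
  var i

  var start = process.hrtime()
  for ( i = 0 ; i < calls ; i++ )
    fn(i, cb)
  var diff = process.hrtime(start)     // [seconds, nanoseconds]

  var seconds = diff[0] + diff[1] / 1e9
  return { mops: calls / seconds / 1e6, sink: sink }
}

var result = mops(candidate, 1000000)
console.log(result.mops.toFixed(2) + ' Mops')
```

Accumulating the results in sink is there so the engine cannot simply optimize the calls away; the figure printed will of course vary with machine and Node version.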

I did three test runs:

  1. The fall through case (r=0) is relatively rare (~8%)
  2. The fall through case is very common (~92%)
  3. The fall through case is extremely common (>99.99%)

In real applications the fall through case may be the most common one, with no errors found in the input data. The benchmark environment is:

  • MacBook Air, Core i5 @ 1.4GHz
  • C Compiler: Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
  • C Flags: -O2
  • Node version: v0.10.35 (installed from pkgsrc.org, x86_64 version)

Performance was rather consistent over several runs (of 1 000 000 calls each):

Fallthrough rate           ~8%      ~92%     >99.99%
---------------------------------------------------------
     C: withgoto           66.7     76.9     83.3  Mops
NodeJS: with4callbacks     14.9     14.7     16.4  Mops
NodeJS: withexceptions      3.67     8.77    10.3  Mops
NodeJS: withinternalfunc    8.33     8.54     9.09 Mops
NodeJS: withexternalfunc   14.5     14.9     15.6  Mops
NodeJS: withbreakblock     14.9     15.4     17.5  Mops
NodeJS: with3ifs           15.2     15.6     16.9  Mops

The C code was translated line by line into the JavaScript code. The performance difference is between C/Clang and NodeJS; it has nothing to do with the goto construction itself, of course.

On Recursion
In JavaScript it is quite natural to do recursion when you deal with callbacks. So I decided to run the same benchmarks using recursion instead of a loop. Each recursion step involves three calls ( function()->callback()->next()-> ). With this setup the maximum recursion depth was about 3×5300 (perhaps close to 16535?). That may sound like a lot, but it is not enough to produce any benchmarks. Do I need to mention that C delivered 1 000 000 recursive calls at exactly the same performance as the loop?
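For anyone who wants to check the limit on their own setup, here is a small sketch that counts nested calls until the engine gives up. The name maxdepth is made up for this example, and the number it prints depends on the Node version, the stack size and how much each frame holds, so it will not match my figure exactly.

```javascript
// Count how many nested calls fit on the stack before the engine gives up.
var maxdepth = function() {
  var depth = 0
  var recurse = function() {
    depth++
    recurse()
  }

  try {
    recurse()
  } catch(e) {
    // V8 reports stack exhaustion as a RangeError; rethrow anything else.
    if ( !(e instanceof RangeError) )
      throw e
  }
  return depth
}

console.log('max depth: ' + maxdepth())
```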

Conclusion
For real code, 3.7 million exceptions per second sounds pretty far-fetched. Unless you are in a tight loop (which you probably are not, when you deal with callbacks), all solutions will perform well. However, breaking out of a block is clearly the most elegant way and also the most efficient, second only to the real goto of course. I suspect the generally higher performance in the (very) high fallthrough case is because branch prediction gets more successful.

Any better ideas?

Nodejs v0.12.0 on (unsupported) PowerPC G4

Nodejs can not be built for a G4 processor (PowerPC 7455, as found in pre-Intel Apple hardware) because of a few missing CPU instructions. IBM has made a Power/PowerPC-port of V8 (the JavaScript engine of Nodejs), but it does not work with the G4.

However, there is a quite simple workaround that can probably work for other unsupported platforms (PowerPC G3) as well, although it failed for me on ARMv5.

This solution is to emulate a supported (i386) CPU using Qemu. Qemu is capable of emulating an entire computer (qemu-system-i386) or just emulate for a single program/process (qemu-i386). That is what I do.

I am running Debian 7 on my G4 computer, which comes with an old version of Qemu. It is old enough to not support the system call ‘futex’ (system call 240). My suggestion is to simply use Debian backports to install a much more recent version of Qemu.

# Add to /etc/apt/sources.list
deb http://http.debian.net/debian wheezy-backports main

# Then run
$ sudo apt-get update
$ sudo apt-get -t wheezy-backports install qemu-user

Now you can use the command qemu-i386 to run i386 binaries. Download the i386 binary linux version of nodejs and extract it somewhere. I extracted mine in /opt and made a symlink to /opt/node for convenience. Now:

zo0ok@sleipnir:~$ qemu-i386 /opt/node/bin/node 
/lib/ld-linux.so.2: No such file or directory

Unless you want to build your own statically linked nodejs binary, you need to get a few libraries from an i386 linux machine. I put these in /opt/node/bin/lib:

zo0ok@sleipnir:/opt/node/bin/lib$ ls -l
total 3320
-rw-r--r-- 1 zo0ok zo0ok  134380 mar  3 21:02 ld-linux.so.2
-rw-r--r-- 1 zo0ok zo0ok 1754876 mar  3 21:13 libc.so.6
-rw-r--r-- 1 zo0ok zo0ok   13856 mar  3 21:06 libdl.so.2
-rw-r--r-- 1 zo0ok zo0ok  113588 mar  3 21:12 libgcc_s.so.1
-rw-r--r-- 1 zo0ok zo0ok  280108 mar  3 21:11 libm.so.6
-rw-r--r-- 1 zo0ok zo0ok  134614 mar  3 21:12 libpthread.so.0
-rw-r--r-- 1 zo0ok zo0ok   30696 mar  3 21:05 librt.so.1
-rw-r--r-- 1 zo0ok zo0ok  922096 mar  3 21:08 libstdc++.so.6

For your convenience, I packed them for you:
https://dl.dropboxusercontent.com/u/9061436/code/linux-i386-lib.tgz
These are from Xubuntu 14.04.1 i386. The original symlinks are eliminated and the files come from different lib-folders. I packed exactly what you need to run the precompiled node-v0.12.0 binary.

Now you should be able to actually run nodejs:

zo0ok@sleipnir:~$ qemu-i386 -L /opt/node/bin/ /opt/node/bin/node --version
v0.12.0

To make it 100% convenient I created /usr/local/bin/nodejs:

zo0ok@sleipnir:~$ cat /usr/local/bin/nodejs 
#!/bin/sh
qemu-i386 -L /opt/node/bin /opt/node/bin/node "$@"

Don't forget to make it executable (chmod +x).

Performance is not amazing, but good enough for my purposes. It takes a few seconds to start nodejs, but when running it seems quite fast. I may post benchmarks in the future.

Nodejs v0.12.0 on Debian ARMv5/QNAP

I have written before about building NodeJS for ARMv5 (a QNAP TS-109 running Debian). Since nodejs 0.12.0 just came out, of course I wanted to build this version – but that did not go so well.

A standard ./configure and make gave me this error after a while:

In file included from ../deps/v8/src/base/atomicops.h:146:0,
                 from ../deps/v8/src/base/once.h:55,
                 from ../deps/v8/src/base/lazy-instance.h:72,
                 from ../deps/v8/src/base/platform/mutex.h:8,
                 from ../deps/v8/src/base/platform/platform.h:29,
                 from ../deps/v8/src/assert-scope.h:9,
                 from ../deps/v8/src/v8.h:33,
                 from ../deps/v8/src/accessors.cc:5:
../deps/v8/src/base/atomicops_internals_arm_gcc.h:258:4: error: #error "Your CPU's ARM architecture is not supported yet"

This was quite expected though, since earlier versions (v0.10.25 was the last I built) did not build that easily. So I forced armv5t-architecture and tried again:

export CFLAGS='-march=armv5t'
export CXXFLAGS='-march=armv5t'
make
...
Segmentation fault
make[1]: *** [/home/kvaser/nodejs/node-v0.12.0/out/Release/obj.target/v8_snapshot/geni/snapshot.cc] Error 139
make[1]: Leaving directory `/home/kvaser/nodejs/node-v0.12.0/out'
make: *** [node] Error 2

It took almost 7 hours to get here. I stopped compiling and started reading instead.
It seems:

  • V8 is not supported on ARMv5 anymore (last supported version was 3.17.6 I think)
  • Building V8 as a shared library is not very easy
  • Even if I manage to build 3.17.6 as a shared library, there is no guarantee
    it would work with nodejs v0.12.0
  • Just replacing the v8 directory of v0.12.0 with an older version of v8 and hoping everything builds and runs perfectly seems… unlikely (but I have not tried and failed, yet)
  • The Raspberry Pi, with its ARMv6 CPU, is supposed to work with v0.12.0, but a little hack is required at this time (RPi 2 with ARMv7 seems safe)

The good thing is that nodejs (v0.10.29) can be installed in Debian 7 (wheezy) using backports. This is a rather nice and consistent way to install software not already in Debian Stable.

It is, after all, not strange that V8 is not maintained for an architecture that has no FPU. JavaScript uses 64-bit floats for all numbers, including integers.
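The float representation is easy to see from JavaScript itself. A small illustration (the variable name limit is just for this example): integers are exact only up to 2^53, which is where a 64-bit float runs out of mantissa bits.

```javascript
// All JavaScript numbers are IEEE 754 doubles, so integers are exact only up to 2^53.
var limit = Math.pow(2, 53)              // 9007199254740992

console.log(limit + 1 === limit)         // true: 2^53 + 1 is not representable
console.log((limit - 1) + 1 === limit)   // true: still exact up to the limit
console.log(0.1 + 0.2 === 0.3)           // false: ordinary binary floating point rounding
```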

Qemu Failed too
I tried running nodejs in Qemu (which works for a PowerPC G4), but this failed:

kvaser@kvaser:/opt/node-v0.12.0-linux-x86/bin$ qemu-i386 -L . ./node 
./node: error while loading shared libraries: rt.so.1: ncanoot  penrshaoed cbjeit f: No such file or directory

This is the actual result – not a copy-paste-mistake… so I believe something (byte order?) is seriously wrong.