Category Archives: Raspberry Pi

Raspberry PI performance and freezes

On a daily basis I use a Raspberry Pi v2 (4x900MHz) with Raspbian as a workstation and web server. It is connected to a big display, I edit multiple files on it, and it runs multiple Node.js instances. These Node.js processes serve HTTP and access (both read and write) local files.

I experienced regular freezes. Simple things such as listing the files in a directory, opening a file or saving a file could take 2-3 seconds.

I moved my working directory from my (high performance) SD-card to a regular spinning USB hard drive. That completely solved the problem. I experience zero freezes now, compared to plenty before.

My usual experience with Linux is that the block caching layer is highly effective: things get synced to disk when there is time to do so. I don't know if Linux handles SD-cards fundamentally differently from other drives (syncing more often) or if the SD card (or the Raspberry Pi SD card hardware) is just slower.
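One way to compare the two storage devices is to time small synchronous writes on each. The following is a minimal Python sketch, not the method used for the observations above; the function name `write_latencies` and its parameters are made up for illustration.

```python
import os
import sys
import tempfile
import time


def write_latencies(directory, rounds=20, size=4096):
    """Return the seconds taken by each small write+fsync in `directory`."""
    payload = b"x" * size
    latencies = []
    for _ in range(rounds):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.perf_counter()
            os.write(fd, payload)
            # Force the data out to the device instead of leaving it in the page cache.
            os.fsync(fd)
            latencies.append(time.perf_counter() - start)
        finally:
            os.close(fd)
            os.unlink(path)
    return latencies


if __name__ == "__main__":
    # Pass a directory on the SD card or on the USB drive as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    result = write_latencies(target)
    print("max %.3f s, mean %.3f s" % (max(result), sum(result) / len(result)))
```

Running it once against a directory on the SD card and once against a directory on the USB drive would show whether forced syncs are where the time goes. It says nothing about how often the kernel syncs on its own.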

So, for making real use of a Raspberry Pi I would clearly recommend a hard drive.

Raspberry Pi Server

The Raspberry Pi has been around for some years now and it has been used in unbelievable projects. As a budget desktop computer it has not quite had the required performance (although v2 and v3 improve the situation considerably over v1). However, for simple hobby server tasks the RPi can work very well.

A simple RPi (any version) setup typically requires:

  • RPi
  • SD Card
  • USB PSU + USB cable
  • Network Cable
  • External USB Drive + USB Cable (+power adapter)
  • A case

That is without display, mouse and keyboard, and you don't have a power button. It gets a bit messy.

The market is full of RPi cases that all do the same thing: nothing. They just contain the board. The market is full of mini/micro-towers for MiniITX. There are rather expensive NAS devices that come without hard drives. Why are there no small tower cases that come with:

  • PSU
  • Slots for 1-2 hard drives (+USB to SATA converters)
  • Cabling that makes everything tidy and neat

Powering the RPi using an external hard drive
I happened to have an external USB drive with an integrated USB hub (an Iomega Minimax that was left alone when its Mac Mini died). With some wood and glue I built a simple stand for the hard drive and the RPi:

[Photos: DSCN5193, DSCN5194, DSCN5196]

As you can see:

  • the hard drive powers the RPi, and I can even use the hard drive power switch
  • the Ethernet and USB ports are conveniently available on the back side
  • the footprint is just slightly larger (just taller) than the hard drive itself
  • the two USB cables between RPi and hard drive are nicely contained
  • heat/ventilation should be pretty good

I have experienced no problems powering the RPi from a USB drive that it itself is connected to. It may not be a supported or recommended configuration, but for practical purposes it works for me.

Performance
I mostly run Syncthing on this RPi. The bottleneck is very much the 700MHz ARMv6 CPU, not the USB2-to-SATA-overhead.

hdparm gives me:

$ sudo /sbin/hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads:  82 MB in  3.03 seconds =  27.09 MB/sec

$ sudo /sbin/hdparm -T /dev/sda
/dev/sda:
 Timing cached reads:   496 MB in  2.01 seconds = 247.36 MB/sec

Of course it sucks compared to what you can get in 2016, but it is not remarkably bad in any way. And it is not so fun to live on an SD card.

The Western Digital Kit
The other day Western Digital announced both a special 314GB hard drive and accessories to make it all nice.

Plusberry Pi
There is also the interesting Plusberry Pi project.

Best Raspberry Pi Server Linux Distribution

Since I got my first Raspberry Pi I have wondered how to turn it into a proper server. Options that I have not been entirely satisfied with:

  • Arch Linux: probably a great option if you know Arch… I have been too lazy to learn.
  • Gentoo Linux: is Gentoo still relevant? Building everything on the RPi sounds very painful (slow)
  • OpenWrt: nice, but slightly too minimal for a server
  • Raspbian: nice, but the standard installation is a little too big (perhaps it does not really matter, but every apt-get upgrade takes longer, and so on)
  • NetBSD: such a disappointment 🙁

I now found, and tested, Raspbian Unattended Netinstaller. For me, this is the shit.

It is really this simple:

  1. Format your SD-card with FAT32 (just as usual)
  2. Unpack (unzip) the raspbian-ua-netinst on your SD-card
  3. Connect the SD-card, ethernet and power to your Raspberry Pi
  4. Wait (about 25 minutes, they say, that was ok with me)
  5. SSH into your new lean Raspbian system (root/raspbian).
  6. Read under “first boot” what to do next

Clearly, you need a properly configured network (DHCP, allow fetching of packages, and you need to know what IP address it got).

The entire experience is much enhanced if you connect to your Raspberry Pi with a serial cable during the entire procedure. Jokes aside, I used a serial cable for my first installation. The second time, when I felt confident with the process, I did not bother with the serial cable.

First boot quick guide

#dpkg-reconfigure locales
#dpkg-reconfigure tzdata

/boot/config.txt: add the line
gpu_mem=16

Upgrade to jessie
For some reason, the Raspbian installation is still based on wheezy, not jessie (you don’t get the latest version of Debian). I suggest upgrading immediately:

/etc/apt/sources.list (replace wheezy with jessie, two places)

# apt-get update
# apt-get dist-upgrade

It is almost as fast as the installation itself 😉
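The sources.list edit described above is a plain search and replace. Here is a small Python sketch of it; the helper name `switch_release` and the example repository lines are hypothetical, so check what your own /etc/apt/sources.list actually contains.

```python
import re


def switch_release(sources_text, old="wheezy", new="jessie"):
    """Return (new_text, count) with whole-word occurrences of `old` replaced."""
    return re.subn(r"\b%s\b" % re.escape(old), new, sources_text)


# Hypothetical file content; the real mirror URLs will differ.
example = (
    "deb http://mirror.example/raspbian wheezy main\n"
    "deb http://archive.example/debian wheezy main\n"
)
updated, replaced = switch_release(example)
print(replaced)
print(updated, end="")
```

The returned count makes it easy to confirm that both places were changed before running apt-get update.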

Conclusion
I think, for the Raspberry Pi V1, Raspbian installed this way is the best server system you can have (perhaps Arch is better if you know it). For a Raspberry Pi V2, perhaps standard Debian is better (I have never used an RPi v2). Everything I have written applies perfectly to the RPi v2 as well.

Notes on Raspberry Pi and Serial

I experimented with my Raspberry Pi (v1 B) and a serial cable, a USB-serial identified as:

[85907.504415] usb 4-5: new full-speed USB device number 19 using ohci-pci
[85907.730850] usb 4-5: New USB device found, idVendor=0403, idProduct=6001
[85907.730863] usb 4-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[85907.730871] usb 4-5: Product: TTL232R-3V3
[85907.730877] usb 4-5: Manufacturer: FTDI
[85907.730882] usb 4-5: SerialNumber: ********
[85907.737978] ftdi_sio 4-5:1.0: FTDI USB Serial Device converter detected
[85907.738070] usb 4-5: Detected FT232RL
[85907.744057] usb 4-5: FTDI USB Serial Device converter now attached to ttyUSB1

My USB-serial-device has six cables: black-brown-red-orange-yellow-green.
Connected to the RPi from the corner pin: none-none-black-yellow-orange, then the remaining eight pins unconnected.

At this point I have no success with minicom. Screen works though:

sudo minicom -b 115200 -o -D /dev/ttyUSB1
sudo screen /dev/ttyUSB1 115200

When serial works, my procedure is:

  1. Connect everything except power
  2. Start screen
  3. Connect power
  4. Within a few seconds I get output

If I start a fresh default NOOBS (v1.4):

Uncompressing Linux... done, booting the kernel.

Welcome to the rescue system
recovery login: 

You can log in with root/raspberry, but I don’t know if you are meant to (can) install Raspbian this way.

NOTE: The Raspberry Pi itself prints nothing to the serial console. You only get output with a properly prepared SD-card inserted.

Already installed System
For an already installed Raspbian, I got a normal login prompt over serial.
For an already installed OpenWrt (14.07), I got a root prompt, no password required, over serial.

Formatting SD-card using Linux
Sometimes it is hard to produce an SD-card that the Raspberry Pi wants to boot from.
This partitioning and formatting works:

$ sudo /sbin/fdisk -l /dev/sde

Disk /dev/sde: 7,4 GiB, 7948206080 bytes, 15523840 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00055f28

Device     Boot Start      End  Sectors  Size Id Type
/dev/sde1        2048 15523839 15521792  7,4G  e W95 FAT16 (LBA)

gt@oden:~/Downloads$ sudo mkfs.vfat /dev/sde1
mkfs.fat 3.0.27 (2014-11-12)

To be on the safe side, before using fdisk:

$ sudo dd if=/dev/zero of=/dev/sde bs=1024 count=10240

Node.js performance of Raspberry Pi 1 sucks

In several previous posts I have studied the performance of the Raspberry Pi (version 1) and Node.js to find out why the Raspberry Pi underperforms so badly when running Node.js.

The first two posts indicate that the Raspberry Pi underperforms about 10x compared to an x86/x64 machine, after compensation for clock frequency is made. The small cache size of the Raspberry Pi is often mentioned as a cause for its poor performance. In the third post I examine that, but it is not that horribly bad: about 3x worse performance for big memory needs compared to in-cache-situations. It appears the slow SDRAM of the RPi is more of a problem than the small cache itself.

The Benchmark Program
I wanted to relate the Node.js slowdown to some other scripted language. I decided Lua is nice. And I was lucky to find Mandelbrot implementations in several languages!

I modified the program(s) slightly, increasing the resolution from 80 to 160. I also made a version that did almost nothing (MAX_ITERATIONS=1) so I could measure and subtract the startup cost (which is significant for Node.js) from the actual benchmark values.
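The startup-cost subtraction can be automated. This Python sketch times whole processes and subtracts the mean of the near-empty runs from the mean of the full runs. The function names are made up for illustration, and the two commands at the bottom are stand-ins, not the actual Mandelbrot programs.

```python
import subprocess
import sys
import time


def run_seconds(argv):
    """Wall-clock seconds for one complete run of a command, startup included."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def mean_seconds(argv, runs=3):
    """Average wall-clock time over a number of runs."""
    return sum(run_seconds(argv) for _ in range(runs)) / runs


def net_benchmark_seconds(full_argv, baseline_argv, runs=3):
    """Mean time of the full benchmark minus the mean time of the near-empty run."""
    return mean_seconds(full_argv, runs) - mean_seconds(baseline_argv, runs)


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; replace them with the
    # real benchmark and its MAX_ITERATIONS=1 variant.
    busy = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    idle = [sys.executable, "-c", "pass"]
    print("%.0f ms" % (1000 * net_benchmark_seconds(busy, idle)))
```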

The Numbers
Below is the average of three runs (minus the average of three 1-iteration rounds), in ms. The timing values were very stable over several runs.

 (ms)                           C/Hard   C/Soft  Node.js     Lua
=================================================================
 QNAP TS-109 500MHz ARMv5                 17513    49376   39520
 TP-Link Archer C20i 560MHz MIPS          45087    65510   82450
 RPi 700MHz ARMv6 (Raspbian)       493             14660   12130
 RPi 700MHz ARMv6 (OpenWrt)        490    11040    15010   31720
 RPi2 900MHz ARMv7 (OpenWrt)       400     9130      770   29390
 Eee701 900MHz Celeron x86         295               500    7992
 3000MHz Athlon II X2 x64           56                59    1267

Notes on Hard/Soft floats:

  • Raspbian is armhf, only allowing hard floats (-mfloat-abi=hard)
  • OpenWrt is armel, allowing both hard floats (-mfloat-abi=softfp) and soft floats (-mfloat-abi=soft).
  • The QNAP has no FPU and generates runtime error with hard floats
  • The other targets produce linkage errors with soft floats

The Node.js versions are slightly different, and so are the Lua versions. This makes no significant difference.

Findings
Calculating the Mandelbrot with the FPU is basically “free” (<0.5s). Everything else is waste and overhead.

The cost of soft float is about 10s on the RPi. The difference between Node.js on Raspbian and OpenWrt is quite small: either both use the FPU, or neither of them does.

Now, the interesting thing is to compare the RPi with the QNAP. For the C-program with the soft floats, the QNAP is about 1.5x slower than the RPi. This matches well with earlier benchmarks I have made (see 1st and 3rd link at top of post). If the RPi had been using soft floats in Node.js, it would have completed in about 30 seconds (based on the QNAP 50 seconds). The only thing (I can come up with) that explains the (unusually) large difference between QNAP and RPi in this test, is that the RPi actually utilizes the FPU (both Raspbian and OpenWrt).
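The estimate can be reproduced from the table. This short Python sketch uses the OpenWrt rows for the RPi; the variable names are only for illustration.

```python
# Times in ms, copied from the table above.
qnap_c_soft = 17513   # QNAP TS-109, C with soft floats
rpi_c_soft = 11040    # RPi 700MHz ARMv6 (OpenWrt), C with soft floats
qnap_node = 49376     # QNAP TS-109, Node.js
rpi_node = 15010      # RPi 700MHz ARMv6 (OpenWrt), Node.js

# How much slower the QNAP is when both machines do soft-float C.
slowdown = qnap_c_soft / rpi_c_soft

# If Node.js on the RPi used soft floats, the same factor should apply.
estimated_rpi_node_soft = qnap_node / slowdown

print("QNAP slowdown for soft-float C: %.2fx" % slowdown)
print("Estimated RPi Node.js time with soft floats: %.1f s" % (estimated_rpi_node_soft / 1000))
print("Measured RPi Node.js time: %.1f s" % (rpi_node / 1000))
```

The estimate lands at roughly twice the measured time, which is the gap the paragraph above attributes to the FPU.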

OpenWrt and FPU
The poor Lua performance in OpenWrt is probably due to two things:

  1. OpenWrt is compiled with -Os rather than -O2
  2. OpenWrt by default uses -mfloat-abi=soft rather than -mfloat-abi=softfp (which is essentially like hard).

It is important to notice that -mfloat-abi=softfp not only makes programs much faster, but also noticeably smaller (10%), which would be valuable in OpenWrt.

Different Node.js versions and builds
I have been building Node.js many times for Raspberry Pi and OpenWrt. The above soft/softfp setting for building node does not affect performance much, but it does affect binary size. Node.js v0.10 is faster on Raspberry Pi than v0.12 (which needs some patching to build).

Lua
Apart from the un-optimized OpenWrt Lua build, Lua is consistently 20-25x slower than native for RPi/x86/x64. It is not like the small cache of the RPi, or some other limitation of the CPU, makes it worse for interpreted languages than x86/x64.

RPi ARMv6 VFPv2
While perhaps not the best FPU in the world, the VFPv2 floating point unit of the RPi ARMv6 delivers quite decent performance (slightly worse per clock cycle) compared to x86 and x64. It does not seem like the VFPv2 is to be blamed for the poor performance of Node.js on ARM.

Conclusion and Key finding
While Node.js (V8) for x86/x64 is near-native-speed, on the ARM it is rather near-Lua-speed: just another interpreted language, mostly. This does not seem to be caused by any limitation or flaw in the (RPi) ARM cpu, but rather the V8 implementation for x86/x64 being superior to that for ARM (ARMv6 at least).
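The key finding is easier to see as ratios against the native C time with hardware floats. This Python sketch only re-uses numbers from the table above; the dictionary layout and the `ratios` helper are made up for illustration.

```python
# (C with hardware floats, Node.js, Lua) in ms, from the table above.
results = {
    "RPi (Raspbian)": (493, 14660, 12130),
    "RPi (OpenWrt)": (490, 15010, 31720),
    "RPi2 (OpenWrt)": (400, 770, 29390),
    "Eee701 Celeron": (295, 500, 7992),
    "Athlon II X2": (56, 59, 1267),
}


def ratios(native, node, lua):
    """Return how many times slower Node.js and Lua are than native code."""
    return node / native, lua / native


for name, row in results.items():
    node_ratio, lua_ratio = ratios(*row)
    print("%-16s Node.js %5.1fx   Lua %5.1fx" % (name, node_ratio, lua_ratio))
```

On the ARMv6 rows Node.js is about 30x slower than native, in the same range as Lua, while on the x86/x64 rows it stays below 2x.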

Effects of cache on performance

It is not clear to me why Node.js is so amazingly slow on a Raspberry Pi (article 1, article 2).

Is it because of the small cache (16kb+128kb)? Is Node.js emitting poor code on ARM? Well, I decided to investigate the cache issue. The 128kb cache of the Raspberry Pi is supposed to be primarily used by the GPU; is it actually effective at all?

A suitable test algorithm
To understand what I test, and for the fun of it, I wanted to implement a suitable test program. I can imagine a good test program for cache testing would:

  • be reasonably slow/fast, so measuring execution time is practical and meaningful
  • have working data sets in sizes 10kb-10Mb
  • the same problem should be solvable with different work set sizes, in a way that the theoretical execution time should be the same, but the difference is because of cache only
  • be reasonably simple to implement and understand, while not so trivial that the optimizer just gets rid of the problem entirely

Finally, I think it is fun if the program does something slightly meaningful.

I found that Bubblesort (and later Selectionsort) were good problems, if combined with a quasi twist. Original bubble sort:

Array to sort: G A F C B D H E   ( N=8 )
Sorted array:  A B C D E F G H
Theoretical cost: O(N^2) = 64/2 = 32
Actual cost: 7+6+5+4+3+2+1     = 28 (compares and conditional swaps)

I invented the following cache-optimized Bubble-Twist-Sort:

Array to sort:                G A F C B D H E
Sort halves using Bubblesort: A C F G B D E H
Now, the twist:                                 ( G>B : swap )
                              A C F B G D E H   ( D>F : swap )
                              A C D B G F E H   ( C<E : done )
Sort halves using Bubblesort: A B C D E F G H
Theoretical cost = 16/2 + 16/2 (first two Bubblesorts)
                 + 4/2         (expected number of twist-swaps)
                 + 16/2 + 16/2 (second two Bubblesorts)
                 = 34
Actual cost: 4*(3+2+1) + 2 = 26

Anyway, for larger arrays the actual costs get very close. The idea here is that I can run a Bubblesort on 1000 elements (effectively using 1000 units of memory intensively for ~500000 operations). But instead of doing that, I can replace it with 4 runs on 500 elements (4 * ~125000 operations + ~250 operations). So I am solving the same problem, using the same algorithm, but optimizing for smaller cache sizes.
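The Bubble-Twist-Sort described above can be sketched in Python as follows. This is an illustration of the idea, not the benchmark program itself: the function names are made up, and the cost counts compares in the Bubblesort passes plus swaps in the twist, which is how the 26 in the worked example is counted.

```python
def bubblesort(a, lo, hi):
    """Sort a[lo:hi] in place and return the number of compares."""
    compares = 0
    for i in range(hi, lo + 1, -1):
        for j in range(lo + 1, i):
            compares += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
    return compares


def twist(a, lo, mid, hi):
    """Swap the largest items of a[lo:mid] with the smallest of a[mid:hi].

    Both halves must already be sorted. Returns the number of swaps.
    """
    swaps = 0
    left, right = mid - 1, mid
    while left >= lo and right < hi and a[left] > a[right]:
        a[left], a[right] = a[right], a[left]
        swaps += 1
        left -= 1
        right += 1
    return swaps


def bubble_twist_sort(a):
    """Sort the list in place using two half-sized passes around a twist."""
    n = len(a)
    mid = n // 2
    cost = bubblesort(a, 0, mid) + bubblesort(a, mid, n)
    cost += twist(a, 0, mid, n)
    # After the twist every item in the left half is <= every item in the
    # right half, so sorting the halves again sorts the whole array.
    cost += bubblesort(a, 0, mid) + bubblesort(a, mid, n)
    return cost


data = list("GAFCBDHE")
print(bubble_twist_sort(data), "".join(data))  # prints: 26 ABCDEFGH
```

Each Bubblesort call only touches half of the array, which is the point: the working set per pass is halved while the total amount of work stays about the same.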

Enough of Bubblesort… you are probably either lost in details or disgusted with this horribly stupid idea of optimizing and not optimizing Bubblesort at the same time.

I made a Selectionsort option. And for a given data size I allowed it either to sort bytes or 32-bit words (which is 16 times faster, for same data size).

The test machines
I gathered 11 different test machines, with different cache sizes and instruction sets:

	QNAP	wdr3600	ac20i	Rpi	Rpi 2	wdr4900	G4	Celeron	Xeon	Athlon	i5
								~2007   ~2010   ~2013
============================================================================================
L1	32	32	32	16	?	32	64	32	32	128	32
L2				128	?	256	256	512	6M	1024	256
L3							1024				6M
Mhz	500	560	580	700	900	800	866	900	2800	3000	3100
CPU	ARMv5	Mips74K	Mips24K	ARMv6	ARMv7	PPC	PPC	x86	x64	x64	x64
OS	Debian	OpenWrt	OpenWrt	OpenWrt	OpenWrt	OpenWrt	Debian	Ubuntu	MacOSX	Ubuntu	Windows

Note that for the multi-core machines (Xeon, Athlon, i5) the L2/L3 caches may be shared or not between cores and the numbers above are a little ambiguous. The sizes should be for Data cache when separate from Instruction cache.

The benchmarks
I ran Bubblesort for sizes 1000000 bytes down to 1000000/512. For Selectionsort I just ran three rounds. For Bubblesort I also ran for 2000000 and 4000000 but those times are divided by 4 and 16 to be comparable. All times are in seconds.

Bubblesort

	QNAP	wdr3600	ac20i	rpi	rpi2	wdr4900	G4	Celeron	Xeon	Athlon	i5
============================================================================================
4000000	1248	1332	997	1120	396	833		507	120	104	93
2000000	1248	1332	994	1118	386	791	553	506	114	102	93
1000000	1274	1330	1009	1110	367	757	492	504	113	96	93
500000	1258	1194	959	1049	352	628	389	353	72	74	63
250000	1219	1116	931	911	351	445	309	276	53	61	48
125000	1174	1043	902	701	349	397	287	237	44	56	41
62500	941	853	791	573	349	373	278	218	38	52	37
31250	700	462	520	474	342	317	260	208	36	48	36
15625	697	456	507	368	340	315	258	204	35	49	35
7812	696	454	495	364	340	315	256	202	34	49	35
3906	696	455	496	364	340	315	257	203	34	47	35
1953	698	456	496	365	342	320	257	204	35	45	35

Selectionsort

	QNAP	wdr3600	ac20i	rpi	rpi2	wdr4900	G4	Celeron	Xeon	Athlon	i5
============================================================================================
1000000	1317	996	877	1056	446	468	296	255	30	45	19
31250	875	354	539	559	420	206	147	245	28	40	21
1953	874	362	520	457	422	209	149	250	30	41	23

Theoretically, all timings for a single machine should be equal. The differences can be explained much by cache sizes, but obviously there are more things happening here.

Findings
Mostly the data makes sense. The caches create plateaus and the L1 size can almost be predicted from the data. I would have expected even bigger differences between best and worst cases; now it is in the range 180%-340%. The most surprising thing (?) is the Selectionsort results. They are sometimes a lot faster (G4, i5) and sometimes significantly slower! This is strange: I have no idea.

I believe the i5's superior performance for Selectionsort 1000000 is due to cache and branch prediction.

I note that the QNAP and Archer C20i both have DDRII memory, while the RPi has SDRAM. This seems to make a difference when work sizes get bigger.

I have also made other benchmarks where the WDR4900 was faster than the G4 – not this time.

The Raspberry Pi
What did I learn about the Raspberry Pi? Well, memory is slow and branch prediction seems bad. It is typically 10-15 times slower than the modern (Xeon, Athlon, i5) CPUs. But for large selectionsort problems the difference is up to 40x. This starts getting close to the Node.js crap speed. It is not hard to imagine that Node.js benefits heavily from great branch prediction and large cache sizes – both things that the RPi lacks.

What about the 128k cache? Does it work? Well, compared to the L1-only machines, performance of RPi degrades slightly slower, perhaps. Not impressed.

Bubblesort vs Selectionsort
It really puzzles me that Bubblesort ever beats Selectionsort:

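#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint32_t */

/* Bubblesort: compares neighbours and swaps them inside the inner loop. */
void bubbelsort_uint32_t(uint32_t* array, size_t len) {
  size_t i, j, jm1;
  uint32_t tmp;
  for ( i=len ; i>1 ; i-- ) {
    for ( j=1 ; j<i ; j++ ) {
      jm1 = j-1;
      if ( array[jm1] > array[j] ) {
        tmp = array[jm1];
        array[jm1] = array[j];
        array[j] = tmp;
      }
    }
  }
}

/* Selectionsort: the inner loop only finds the smallest remaining element;
   the single swap happens once per outer iteration. */
void selectionsort_uint32_t(uint32_t* array, size_t len) {
  size_t i, j, best;
  uint32_t tmp;
  for ( i=1 ; i<len ; i++ ) {
    best = i-1;
    for ( j=i ; j<len ; j++ ) {
      if ( array[best] > array[j] ) {
        best = j;
      }
    }
    tmp = array[i-1];
    array[i-1] = array[best];
    array[best] = tmp;
  }
}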

Essentially, the difference is how the swap takes place outside the inner loop (once) instead of all the time. The Selectionsort should also be able to benefit from easier branch prediction and much fewer writes to memory. Perhaps compiling to assembly code would reveal something odd going on.
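To put a number on "much fewer writes", here is a Python sketch that mirrors the two C functions and counts array stores. It is only an illustration; the function names are hypothetical and it measures nothing about cache or branch behaviour.

```python
def bubblesort_writes(values):
    """Return (sorted list, number of array stores) for Bubblesort."""
    a = list(values)
    writes = 0
    for i in range(len(a), 1, -1):
        for j in range(1, i):
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                writes += 2  # two array elements are stored per swap
    return a, writes


def selectionsort_writes(values):
    """Return (sorted list, number of array stores) for Selectionsort."""
    a = list(values)
    writes = 0
    for i in range(1, len(a)):
        best = i - 1
        for j in range(i, len(a)):
            if a[best] > a[j]:
                best = j
        a[i - 1], a[best] = a[best], a[i - 1]
        writes += 2  # exactly one swap per outer iteration
    return a, writes


print(bubblesort_writes("GAFCBDHE")[1], selectionsort_writes("GAFCBDHE")[1])  # prints: 24 14
```

Both functions do the same number of compares, N(N-1)/2. The Selectionsort stores are fixed at 2(N-1), while the Bubblesort stores grow with the number of out-of-order pairs, on average about N^2/2 for random data.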

Power of 2 aligned data sets
I avoided using a datasize with the size an exact power of two: 1024×1024 vs 1000×1000. I did this because caches are supposed to work better this way. Perhaps I will make some 1024×1024 runs some day.

Node.js Benchmark on Raspberry Pi (v1)

I have experimented a bit with Node.js and Raspberry Pi lately, and I have found the performance… surprisingly bad. So I decided to run some standard tests: benchmark-octane (v9).

Octane is essentially run like:

$ npm install benchmark-octane
$ cd node_modules/benchmark-octane
$ node run.js

The distilled result of Octane is a total run time and a score. Here are a few results:

                         OS             Node.js                   Time    Score
QNAP TS-109 500MHz       Debian        v0.10.29 (Debian)         3350s      N/A
Raspberry Pi v1 700MHz   OpenWrt BB    v0.10.35 (self built)     2267s      140
Raspberry Pi v1 700MHz   Raspbian       v0.6.19 (Raspbian)       2083s      N/A
Raspberry Pi v1 700MHz   Raspbian       v0.12.2 (self built)     2176s      104
Eee701 Celeron 900MHz    Xubuntu       v0.10.25 (Ubuntu)          171s     1655
Athlon II X2@3GHz        Xubuntu       v0.10.25 (Ubuntu)           49s     9475
MacBook Air i5@1.4GHz    Mac OS X      v0.10.35 (pkgsrc)           47s    10896
HP 2560p i7@2.7GHz       Xubuntu       v0.10.25 (Ubuntu)           41s    15450

Score N/A means that one test failed and there was no final score.

When I first saw the RPi performance I thought I had done something wrong building (using a cross compiler) Node.js myself for RPi and OpenWrt. However, Node.js on Raspbian is basically no faster, and the RPi ARMv6 with FPU is not much faster than the QNAP ARMv5 without FPU.

I think the Eee701 serves as a good baseline here. At first glance, possible reasons for the RPi underperformance relative to the Celeron are:

  • Smaller cache (16kb of L1 cache and L2 only available to GPU, I read) compared to Celeron (512k)
  • Bad or not well utilised FPU (but there at least is one on the RPi)
  • Node.js (V8) less optimized for ARM

I found that I have benchmarked those two CPUs against each other before. That time the Celeron was twice as fast as the RPi, and the FPU of the RPi performed decently. Blaming the small cache makes more sense to me than blaming the people who implemented ARM support in V8.

The conclusion is that Raspberry Pi (v1 at least) is extremely slow running Node.js. Other benchmarks indicate that RPi v2 is significantly faster.

Raspberry Pi (v1), OpenWrt (14.07) and Node.js (v0.10.35 & v0.12.2)

Since I gave up running NetBSD on my Raspberry Pi I decided it was time to try OpenWrt. And, to my surprise, I also managed to cross compile Node.js!

Install OpenWrt on Raspberry Pi (v1@700MHz)
I installed OpenWrt Barrier Breaker (the currently stable release) using the standard instructions.

After you have put the image on an SD-card with dd, it is quite easy to resize the root partition:

  1. copy the second partition to an image file using dd
  2. use fdisk to delete the second partition, and create a new, bigger one
  3. format the new partition with mkfs.ext4
  4. mount the image file using mount -o loop
  5. mount the new second partition
  6. copy all data from image file to second partition using cp -a

If you want to, you can edit /etc/config/network while you are anyway working with the OpenWrt root partition:

#config interface 'lan'
#	option ifname 'eth0'
#	option type 'bridge'
#	option proto 'static'
#	option ipaddr '192.168.1.1'
#	option netmask '255.255.255.0'
#	option ip6assign '60'
#	option gateway '?.?.?.?'
#	option dns '?.?.?.?'
config interface 'lan'
	option ifname 'eth0'
	option proto 'dhcp'
	option macaddr 'XX:XX:XX:XX:XX:XX'
	option hostname 'rpiopenwrt'

Probably you want to disable dnsmasq, odhcpd and firewall too:

.../etc/init.d/$ chmod -x dnsmasq firewall odhcpd

OR (depending on your idea of what is the right way)

.../etc/rc.d$ sudo rm S60dnsmasq S35odhcpd K85odhcpd S19firewall

Also, it is a good idea to edit config.txt (on the DOS partition):

gpu_mem=1

I don’t know if 1 is really a legal value, but it worked for me, and I had much more memory available than when gpu_mem was not set.

Node.js 4 (added 2015-10-03)
For Node.js, check Node.js 4 builds.

Building Node.js v0.12.2
I downloaded and built Node.js v0.12.2 on a Xubuntu machine with an x64 cpu. On such a machine you can download the standard OpenWrt toolchain for Raspberry Pi.

I replaced configure and cpu.cc in the standard sources with the files from This Page (they are meant for v0.12.1 but they work equally well for v0.12.2).

I then found a gist that gave me a good start. I modified it, and ended up with:

#!/bin/sh -e

export STAGING_DIR=...path to your toolchain...

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib
export ARM_TARGET_LIB=$CSTOOLS_LIB

export TARGET_ARCH="-march=armv6j"

#Define the cross compilers on your system
export AR="arm-openwrt-linux-uclibcgnueabi-ar"
export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LINK="arm-openwrt-linux-uclibcgnueabi-g++"
export CPP="arm-openwrt-linux-uclibcgnueabi-gcc -E"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"
export AS="arm-openwrt-linux-uclibcgnueabi-as"
export CCLD="arm-openwrt-linux-uclibcgnueabi-gcc ${TARGET_ARCH} ${TARGET_TUNE}"
export NM="arm-openwrt-linux-uclibcgnueabi-nm"
export STRIP="arm-openwrt-linux-uclibcgnueabi-strip"
export OBJCOPY="arm-openwrt-linux-uclibcgnueabi-objcopy"
export RANLIB="arm-openwrt-linux-uclibcgnueabi-ranlib"
export F77="arm-openwrt-linux-uclibcgnueabi-g77 ${TARGET_ARCH} ${TARGET_TUNE}"
unset LIBC

#Define flags
export CXXFLAGS="-march=armv6j"
export LDFLAGS="-L${CSTOOLS_LIB} -Wl,-rpath-link,${CSTOOLS_LIB} -Wl,-O1 -Wl,--hash-style=gnu"
export CFLAGS="-isystem${CSTOOLS_INC} -fexpensive-optimizations -frename-registers -fomit-frame-pointer -O2"
export CPPFLAGS="-isystem${CSTOOLS_INC}"
export CCFLAGS="-march=armv6j"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm

bash --norc

Run this script in the Node.js source directory. If everything goes fine it configures the Node.js build, and leaves you with a shell where you can simply run:

$ make

If compilation is fine, you find the node binary in the out/Release folder. Copy it to your OpenWrt Raspberry Pi.

Building Node.js v0.10.35
I first successfully built Node.js v0.10.35.

The (less refined) script for configuring that I used was:

#!/bin/sh -e

export STAGING_DIR=...path to your toolchain...

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib
export ARM_TARGET_LIB=$CSTOOLS_LIB
export GYP_DEFINES="armv7=0"

#Define our target device
export TARGET_ARCH="-march=armv6"
export TARGET_TUNE="-mfloat-abi=hard"

#Define the cross compilers on your system
export AR="arm-openwrt-linux-uclibcgnueabi-ar"
export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LINK="arm-openwrt-linux-uclibcgnueabi-g++"
export CPP="arm-openwrt-linux-uclibcgnueabi-gcc -E"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"
export AS="arm-openwrt-linux-uclibcgnueabi-as"
export CCLD="arm-openwrt-linux-uclibcgnueabi-gcc ${TARGET_ARCH} ${TARGET_TUNE}"
export NM="arm-openwrt-linux-uclibcgnueabi-nm"
export STRIP="arm-openwrt-linux-uclibcgnueabi-strip"
export OBJCOPY="arm-openwrt-linux-uclibcgnueabi-objcopy"
export RANLIB="arm-openwrt-linux-uclibcgnueabi-ranlib"
export F77="arm-openwrt-linux-uclibcgnueabi-g77 ${TARGET_ARCH} ${TARGET_TUNE}"
unset LIBC

#Define flags
export CXXFLAGS="-march=armv6"
export LDFLAGS="-L${CSTOOLS_LIB} -Wl,-rpath-link,${CSTOOLS_LIB} -Wl,-O1 -Wl,--hash-style=gnu"
export CFLAGS="-isystem${CSTOOLS_INC} -fexpensive-optimizations -frename-registers -fomit-frame-pointer -O2 -ggdb3"
export CPPFLAGS="-isystem${CSTOOLS_INC}"
export CCFLAGS="-march=armv6"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux
bash --norc

Running node on the Raspberry Pi
Back on the Raspberry Pi you need to install a few packages:

# ldd ./node 
	libdl.so.0 => /lib/libdl.so.0 (0xb6f60000)
	librt.so.0 => not found
	libstdc++.so.6 => not found
	libm.so.0 => /lib/libm.so.0 (0xb6f48000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb6f34000)
	libpthread.so.0 => not found
	libc.so.0 => /lib/libc.so.0 (0xb6edf000)
	ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0xb6f6c000)
# opkg update
# opkg install librt
# opkg install libstdcpp

That is all! Now you should be ready to run node. The node binary is about 13MB (the v0.10.35 was 19MB, perhaps because of -ggdb3), so it is not optimal to deploy it to other typical OpenWrt hardware.
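The unresolved libraries in the ldd output above can also be picked out mechanically. This is a small Python sketch with a hypothetical helper name, run here on an excerpt of that output. Mapping a library file name to its opkg package name still has to be done by hand.

```python
def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


# Excerpt of the ldd output shown above.
sample = """\
    libdl.so.0 => /lib/libdl.so.0 (0xb6f60000)
    librt.so.0 => not found
    libstdc++.so.6 => not found
    libpthread.so.0 => not found
    libc.so.0 => /lib/libc.so.0 (0xb6edf000)
"""
print(missing_libraries(sample))
```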

Final comments
I ran a few small programs to test, and they were fine. I guess some more testing would be appropriate. The performance is very comparable to Node.js built and executed on Raspbian.

I think RaspberryPi+OpenWrt+Node.js is a very interesting and competitive combination for microservices!

NetBSD on a Raspberry Pi

Update 2015-12-03: According to a reader comment (below), NetBSD for RPi has matured significantly since I wrote this post. That sounds great to me! But I have not tested yet.

As a long time Linux user I have always had some kind of curiosity about the BSDs, especially NetBSD and its minimalistic approach to system design. For a while I have been thinking that perhaps NetBSD is the perfect operating system for turning a Raspberry Pi into a server.

I have read anti-BSD rants like this “BSD, the truth“, and I have also appreciated pkgsrc for Mac OS X. I felt I needed to form my own opinion. It is easy to have a romantic idea about “Old Real UNIX”, but my limited experience with IRIX and Solaris is not that positive. And BSD is another beast.

For the Raspberry Pi (Version 1, Model B) it is supposed to be possible to run both (stable) NetBSD 6.1.5 and (beta) NetBSD 7.0. It seemed, after all, that the beta 7.0 was the way to go.

At first it was fine

I followed the official instructions and installed NetBSD 7.0. I (first) used the (800MB) rpi.img. I set up my user:

# useradd zo0ok
...
# mkdir /home
# mkdir /home/zo0ok
# chown zo0ok:users /home/zo0ok
# usermod -G wheel zo0ok

Then it was time to configure pkgsrc and start installing packages.

The Disk Problem
I did a quick check to see how much available space I have, before installing stuff. To my surprise:

# df -h
Filesystem         Size       Used      Avail %Cap Mounted on
/dev/ld0a          650M       623M      -5.4M 100% /
/dev/ld0e           56M        14M        42M  24% /boot
kernfs             1.0K       1.0K         0B 100% /kern
ptyfs              1.0K       1.0K         0B 100% /dev/pts
procfs             8.0K       8.0K         0B 100% /proc
tmpfs              112M       8.0K       112M   0% /var/shm

It seemed like the filesystem had not been (automatically) expanded as it should be according to the instructions above. So I followed the manual instructions to resize my root partition, with no success whatsoever.

So I ran disklabel to see if NetBSD recognized my 8GB SD-card…

# /sbin/disklabel ld0
# /dev/rld0c:
type: SCSI
disk: STORAGE DEVICE
label: fictitious
flags: removable
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 64
sectors/cylinder: 2048
cylinders: 862
total sectors: 1766560
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

8 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   1381536    385024     4.2BSD      0     0     0  # (Cyl.    188 -    862*)
 b:    262144    122880       swap                     # (Cyl.     60 -    187)
 c:   1766560         0     unused      0     0        # (Cyl.      0 -    862*)
 d:   1766560         0     unused      0     0        # (Cyl.      0 -    862*)
 e:    114688      8192      MSDOS                     # (Cyl.      4 -     59)

Clearly, NetBSD thought this SD-card was 900MB rather than 8GB, and this is why it failed to automatically resize it.
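
As a quick sanity check of that figure, the "total sectors" and "bytes/sector" values from the disklabel output can simply be multiplied. This is a minimal sketch in POSIX shell arithmetic; the function name sectors_to_mib is made up here for illustration and is not a NetBSD tool:

```shell
#!/bin/sh
# Convert a sector count and a sector size (in bytes) to whole MiB.
# sectors_to_mib is a hypothetical helper name, not part of NetBSD.
sectors_to_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# Values copied from the disklabel output:
#   total sectors: 1766560, bytes/sector: 512
sectors_to_mib 1766560 512
```

This prints 862, that is roughly 862 MiB (about 900MB), nowhere near the 8GB the card actually has.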

The sysinst install
I was anyway not very comfortable with getting a preinstalled/preconfigured 800MB system with swap and everything, so I formatted the 8GB SD card with my digital camera (just to be sure the partition table did not contain anything weird), downloaded (6MB) rpi_inst.img and wrote it to the SD card.

NetBSD installation started properly, and I was looking forward to installing over SSH. According to the instructions I was supposed to start DHCP somehow. But DHCP seemed to be on already (the RPi got an IP address) while SSH was off, so I installed using a keyboard.

Almost immediately I was informed that NetBSD failed to recognise the “disk geometry” properly. I tried the SD card in Linux, which reluctantly reported that it had 166 heads and 30 sectors per track (it sounds like nonsense). So I gave this information to the NetBSD sysinst program and now the SD card seemed to be 7.5GB.

Then followed a long and confused period of time when I tried to be smart enough to come up with any working partition scheme that NetBSD could accept. The right procedure was:

  1. Choose entire disk
  2. Confirm to delete the (required) 56MB dos partition
  3. Partition, pretending to be unaware of the need of a dos partition
  4. Magically, in the end, it added the dos partition

I am clearly stupid. There are no words for how confused I am about the a:, c: and e: partitions (that seem to reuse the DOS naming, but for other purposes), the empty space, the disk labels, and the BSD partitions inside a (non-existent) primary partition.

Anyway, just after I gave up and then gave it a final try I convinced sysinst to install. Then came a phase of choosing download paths, which clearly was non-trivial since I installed a Beta, and I am fine with that.

Installation went on. In the end came a nice menu where I could configure stuff. I liked it! (I wish I knew how to start it later). It managed to get my network settings from DHCP (except the GW), but it failed to configure and test the network itself (even though it had downloaded everything over the network just a few minutes earlier). I configured a few other things, I restarted, the network was working and I was happy… for a while.

I configured pkgsrc, and it seems ALL other systems where pkgsrc exists have been blessed with the pkgin tool, except NetBSD where you are supposed to do all the work yourself. Well, I added the PKG_PATH to the .shrc (of my user, not root) and enjoyed pkg_add.
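
For reference, the PKG_PATH setup looks roughly like the sketch below. The URL is a placeholder that I made up for illustration: the real directory depends on the mirror, the CPU architecture and the NetBSD release, so it has to be looked up on the mirror you use.

```shell
# ~/.shrc (sketch)
# PKG_PATH tells pkg_add where to look for binary packages.
# The URL below is a placeholder, NOT a verified mirror path:
# replace the host, ARCH and RELEASE with values that exist on your mirror.
PKG_PATH="http://ftp.example.org/pub/pkgsrc/packages/NetBSD/ARCH/RELEASE/All"
export PKG_PATH
```

With the variable exported, pkg_add followed by a package name fetches that package (and its dependencies) from the given location.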

(not) Compiling NodeJS
I wanted to install node.js on my NetBSD Raspberry Pi. It is not in pkgsrc (which it is for Mac OS X, but whatever) so I had to build it myself. I am used to building node.js and I was looking forward to fixing all the broken dependencies. If only I had ever gotten that far.

I downloaded the source and started unpacking it… it is about 10000 files and 100MB of data. My SD card (a SanDisk Ultra, class 10) is not super fast: dd-ing the image to it earlier wrote at 3MB/s. The unpacking speed for node.js was roughly 1 file per second. I realised I needed a (fast) USB drive or a faster SD card, so I (literally) went out to town and bought a fast USB drive (I did not find the SD card I wanted) and a few other things. When I came back, more than 8000 files had been extracted and fewer than 2000 remained. I started reading about how to partition and format a USB drive for NetBSD, and at some point I inserted it in the Raspberry Pi. A little later I noticed my ssh sessions were dead, and the RPi had restarted. It turns out that reality was worse than the truth in “BSD, the truth”:

[…] the kernels of the BSDs are also very fault intolerant.

The best example of this is the issue with removing USBs. The problem appears when USBs are removed without unmounting them first. The result is a kernel panic. The astounding aspect of this is that this problem has been exhibited by all the major BSD variants Free, Open, Net and DragonflyBSD ever since USB support was implemented in them 5 to 6 years ago and has never ever been fixed. FreeBSD mailing lists even ban people who dare mention about it. In Linux, such things never and happen and bugs as serious as this gets fixed before a release is made.

Fact is, NetBSD 7.0 Beta for RPi, crashes, immediately, when I insert a USB drive.

This actually did not make me give up. I really restarted the system with the USB drive inserted, with the intention of treating my USB drive as a fixed disk and not inserting/removing it unless I shut the RPi down first. This was when I did give up: deleting the 16GB dos partition and creating a NetBSD filesystem was just too difficult for me. Admittedly, my patience was running out.

More on memory card performance
I found this very interesting article (linked to by the Gentoo people, of course). Without going into details: a Raspberry Pi with an SD card root filesystem clearly needs a filesystem and block device implementation that works well with actual SD cards. This is not trivial, and it means doing things very differently from rotating media.

I did the same unpacking of the node.js source on Raspbian (I installed Raspbian on exactly the same SD card as I used for NetBSD): 22 seconds (tar: 18s, sync 4s), compared to 3h for NetBSD.
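
To put those two measurements side by side, here is a rough back-of-the-envelope calculation. It only uses the approximate numbers quoted in this post (about 10000 files, about 3 hours, 22 seconds), so the result is an order of magnitude, not a benchmark:

```shell
#!/bin/sh
# Rough figures from the text: about 10000 files, about 3 hours on
# NetBSD, and 22 seconds (tar plus sync) on Raspbian.
files=10000
netbsd_secs=$((3 * 3600))
raspbian_secs=22

# Shell arithmetic is integer only, so the results are rounded down.
raspbian_files_per_sec=$((files / raspbian_secs))
slowdown=$((netbsd_secs / raspbian_secs))

echo "Raspbian: about $raspbian_files_per_sec files per second"
echo "NetBSD took about $slowdown times longer"
```

So Raspbian extracted a few hundred files per second where NetBSD managed about one, a difference of nearly 500 times on the same card.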

Conclusion
In theory, NetBSD would be a beautiful fit for the Raspberry Pi. The ARMv6 is not supported by standard Debian. Raspbian comes with a little “too much” for my taste (it is not a real problem), and it does not have the feeling of “Debian stable”, but more of some “unofficial Debian test” (sorry Raspbian people – I really appreciate your work!).

I have wondered why Noobs does not come with NetBSD… but I think I know now. And sometimes I am surprised that Linux seems to work better than Mac OS X; perhaps now I know why.

My romantic idea that NetBSD would be perfect for the RPi was just plain wrong. Installing NetBSD today made me remember installing Slackware on a Compaq laptop in 1998.

Perhaps I will give Arch a try. Or put OpenWRT on the RPi.

Owncloud client on Raspbian

I found that Raspbian comes with a very old version (1.2 something) of the Owncloud client. I found no prebuilt, more up-to-date version, so I built one myself:

$ sudo apt-get install cmake qt4-dev-tools build-essential
$ sudo apt-get install libneon27 libneon27-dev qtkeychain-dev
$ sudo apt-get install sqlite3 libsqlite3-dev libsqlite3-0
$ tar -xjf mirall-1.6.1rc1.tar.bz2
$ mkdir mirall-build
$ cd mirall-build/
$ cmake ../mirall-1.6.1rc1
$ make

The owncloud client is now in the bin folder.

Note: I took the commands above from my history, so there is a slight risk of a mistake. Also, I might have installed other packages earlier that are required for owncloud without my being aware of it. Feel free to give feedback!

It is quite useful to put a Raspberry Pi with a USB drive in someone else's home, and let it synchronize your files. That way, you have an off-site backup for worst-case scenarios.