Update 2017-12-05: I added a new test at the end that came from real code.
It is both true that functional code is slower and that Node.js v8 is tightening the gap.
Update 2017-07-17: Below I present numbers showing that functional code is slower than imperative code. It seems this has changed with newer versions of Node.js: functional code has not become faster, but imperative code has become slower. You can read a little more about it in the comments. I will look more into this. Keep in mind that the findings below may be more accurate for Node.js v4-v6 than for v8.
Functional programming is very popular with contemporary JavaScript programmers. As I have written before, Functional programming sucks and functional libraries for JavaScript also suck.
In this post I will explain more why Functional Programming sucks. I will start with the conclusion. Read on as long as you want more details.
Functional Programming practices are bad for performance
It is very popular to feed lambda functions to map(), reduce(), filter() and others. If you do this carelessly the performance loss is significant.
It is also popular to work with immutable data: you avoid functions that change (mutate) the current state (side effects) and instead produce a new state (a pure function). This puts a lot of pressure on the garbage collector, and it can destroy performance.
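A minimal sketch of the difference (using a hypothetical step function, not taken from the benchmark code below): the immutable style allocates a fresh array on every step, while the mutable style reuses one array.

```javascript
// Immutable style: every step allocates a new array that the
// garbage collector later has to reclaim.
function stepImmutable(state) {
    return state.map(function (x) { return x + 1; });
}

// Mutable style: the same array is updated in place, no allocations.
function stepMutable(state) {
    var i;
    for ( i = 0; i < state.length; i++ ) state[i] += 1;
    return state;
}

var a = [0, 0, 0];
var b = [0, 0, 0];
var n;
for ( n = 0; n < 1000; n++ ) a = stepImmutable(a); // 1000 temporary arrays
for ( n = 0; n < 1000; n++ ) stepMutable(b);       // 0 temporary arrays
```

Both loops end with the same state; the difference is only in how much garbage is produced along the way.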
The Benchmark Problem
Sometimes I entertain myself solving problems on Hackerrank.com. I particularly like the mathematical challenges in the Project Euler section (Project Euler is also an independent organisation – HackerRank uses the Project Euler challenges to create programming challenges).
This article refers to Project Euler 32. I will not go into details, but the solution is basically:
- Generate all permutations of the numbers 1,2,3,4,5,6,7,8,9 (there are 9! of them)
- For each permutation, check if it is “good” (very few are)
- Print the sum of the good instances
The first two steps give good benchmark problems. I have made different implementations of (1) and (2) and then compared the results.
Benchmark Results
I have three different permutation generators (all recursive functions):
- Pure function, immutable data (it may not be strictly pure)
- Function that mutates its own internal state, but not its input
- Function that mutates shared data (no allocation/garbage collection)
I also have three different test functions:
- Tests the original Project Euler problem
- Simplified test using reduce() and a lambda function
- Simplified test implemented with a standard loop
I benchmarked on two different systems using Node.js version 6. I have written elsewhere that Node.js performance on Raspberry Pi sucks.
| (seconds) | Euler test / Pure gen | Euler test / Semi gen | Euler test / Shared gen | Functional test / Shared gen | Imperative test / Shared gen | Imperative test / Pure gen |
|---|---|---|---|---|---|---|
| Raspberry Pi v1 (ARMv6 @ 700MHz) | 69 | 23 | 7.4 | 21 | 3.7 | 62 |
| MacBook Air (Core i5 @ 1400MHz) | 0.77 | 0.29 | 0.13 | 0.40 | 0.11 | 0.74 |
Comparing columns 1-2-3 shows the performance of different generators (for Project Euler test)
Comparing columns 4-5 shows the performance of two different test functions (using fast generator)
Comparing columns 5-6 shows the performance of two different generators (for fast simple test)
This shows that the benefit of using shared/mutable data (not stressing the garbage collector) instead of immutable data is about 5x performance on the Intel CPU, and even more on the ARM. Also, using reduce() with a lambda function costs more than 3x in overall performance on the Intel CPU, and even more on the ARM.
Whether in the test function or in the permutation generator, making either part functional (and thus slow) significantly slows down the entire program.
The conclusion of this is that unless you are quite sure your code will never be performance critical you should avoid functional programming practices. It is a lot easier to write imperative code than to later scale out your architecture when your code does not perform.
However, the pure immutable implementation of the permutation generator is arguably much simpler than the iterative (faster) counterpart. When you look at the code you may decide that the performance penalty is acceptable to you. When it comes to reduce() with a lambda function, I think the imperative version is easier to read (and much faster).
Note that if your code consists of nice, testable, replaceable parts without side effects, you can optimize later on. The functional principles are more valuable at a higher level: if you define your functions so that they behave like nice FP functions, it does not matter (for performance) that they are implemented using imperative principles.
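For example (a hypothetical function, not part of the benchmark): a function can behave like a nice FP function toward its callers (no side effects on its input, same output for the same input) while being a plain loop inside.

```javascript
// FP-like interface: does not touch its input, deterministic result.
// Imperative implementation: a plain loop instead of reduce(), for speed.
function sumOfSquares(list) {
    var r = 0;
    var i;
    for ( i = 0; i < list.length; i++ ) r += list[i] * list[i];
    return r;
}

var input = [1, 2, 3];
var result = sumOfSquares(input); // callers cannot tell this from a reduce()
```

From the outside this is indistinguishable from `list.reduce(function(acc, x) { return acc + x*x; }, 0)`, so it can be tested, replaced and composed just like the FP version.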
Generating Permutations
I used the following simple method for generating permutations. I start with two arrays and I send them to my permute-function:
head = [];
tail = [1,2,3,4];
permute(head,tail);
My permute-function checks if tail is empty; if so, it tests/evaluates head.
Otherwise it generates 4 new pairs of head and tail (one for each element in tail):
permute( [1] , [2,3,4] )
permute( [2] , [1,3,4] )
permute( [3] , [1,2,4] )
permute( [4] , [1,2,3] )
The difference in implementation is:
- The pure version generates all of the above 8 arrays as new arrays using standard array functions.
- The semi-pure version creates its own 2 arrays (head and tail) and then uses a standard loop to change the values of the arrays between the (recursive) calls to permute.
- The shared version simply creates a single head-array and 9 tail-arrays (one for each recursion step) up front, and reuses these arrays throughout the 9! iterations. (These are not global variables; they are hidden and private to the permutation generator.)
The simplified test
The simplified test checks if the array is sorted: [1,2,3,4]. Of all permutations, there is always exactly one that is sorted. It is a simple test to implement (especially with a loop).
// These functions are part of a "test-class" starting like:
function testorder1() {
    var cnt = 0;

    // Functional test
    this.test = function(p) {
        if ( false !== p.reduce(function(acc,val) {
            if ( false === acc || val < acc ) return false;
            return val;
        }, -1)) cnt++;
    };

    // Iterative test (much faster) - shown here as an alternative;
    // in the full program it lives in its own test-class
    this.test = function(p) {
        var i;
        for ( i=1 ; i<p.length ; i++ ) {
            if ( p[i] < p[i-1] ) return;
        }
        cnt++;
    };
    // ...
}
I tried to optimise the functional reduce() version by breaking out a named function. That did not help. I also tried to let the function always return the same type (it currently returns false OR a number), but that made no difference either.
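The named-function variant looked roughly like this (a reconstruction for illustration; the original attempt is not shown). The idea is that the callback is created once, instead of a fresh closure on every call to test():

```javascript
// The lambda broken out into a named function, created only once.
function orderedStep(acc, val) {
    if ( false === acc || val < acc ) return false;
    return val;
}

function testorder1b() {
    var cnt = 0;
    this.test = function(p) {
        if ( false !== p.reduce(orderedStep, -1) ) cnt++;
    };
    this.done = function() {
        return cnt;
    };
}
```

As noted above, this did not run measurably faster than the inline lambda in my benchmarks.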
All the code
For those who want to run this themselves or compare the permutation functions here is the entire program.
As mentioned above, the slowest (immutable data) permutation function is a lot smaller and easier to understand than the fastest (shared data) implementation.
'use strict';

// UTILITIES

function arrayToNum(p, s, e) {
    var r = 0;
    var m = 1;
    var i;
    for ( i=e-1 ; s<=i ; i-- ) {
        r += m * p[i];
        m *= 10;
    }
    return r;
}

function arrayWithZeros(n) {
    var i;
    var a = new Array(n);
    for ( i=0 ; i<a.length ; i++ ) a[i] = 0;
    return a;
}

// PERMUTATION ENGINES

// BASELINE: intentionally empty, generates no permutations
function permutations0(n, callback) {
}

// IMMUTABLE (SLOWEST)
function permutations1(n, callback) {
    var i;
    var numbers = [];
    for ( i=1 ; i<=n ; i++ ) numbers.push(i);
    permute1([],numbers,callback);
}

function permute1(head, tail, callback) {
    if ( 0 === tail.length ) {
        callback(head);
        return;
    }
    tail.forEach(function(t, i, a) {
        permute1( [t].concat(head),
                  a.slice(0,i).concat(a.slice(i+1)),
                  callback);
    });
}

// MUTATES ITS OWN DATA, BUT NOT ITS ARGUMENTS
function permutations2(n, callback) {
    var i;
    var numbers = [];
    for ( i=1 ; i<=n ; i++ ) numbers.push(i);
    permute2([],numbers,callback);
}

function permute2(head, tail, callback) {
    if ( 0 === tail.length ) {
        callback(head);
        return;
    }
    var h2 = [tail[0]].concat(head);
    var t2 = tail.slice(1);
    var i = 0;
    var tmp;
    while (true) {
        permute2(h2, t2, callback);
        if ( i === t2.length ) return;
        tmp = h2[0];
        h2[0] = t2[i];
        t2[i] = tmp;
        i++;
    }
}

// MUTATES ALL DATA (INTERNALLY) (FASTEST)
function permutations3(n, callback) {
    var i;
    var head = arrayWithZeros(n);
    var tails = new Array(n+1);
    for ( i=1 ; i<=n ; i++ ) {
        tails[i] = arrayWithZeros(i);
    }
    for ( i=1 ; i<=n ; i++ ) {
        tails[n][i-1] = i;
    }

    function permute3(x) {
        var j;
        var tail_this;
        var tail_next;
        var tmp;
        if ( 0 === x ) {
            callback(head);
            return;
        }
        tail_this = tails[x];
        tail_next = tails[x-1];
        for ( j=1 ; j<x ; j++ ) {
            tail_next[j-1] = tail_this[j];
        }
        j=0;
        while ( true ) {
            head[x-1] = tail_this[j];
            permute3(x-1);
            j++;
            if ( j === x ) return;
            tmp = head[x-1];
            head[x-1] = tail_next[j-1];
            tail_next[j-1] = tmp;
        }
    }

    permute3(n);
}

// TEST FUNCTIONS

function testprint() {
    this.test = function(p) {
        console.log(JSON.stringify(p));
    };
    this.done = function() {
        return 'Done';
    };
}

// CHECKS IF PERMUTATION IS ORDERED - FUNCTIONAL (SLOWEST)
function testorder1() {
    var cnt = 0;
    this.test = function(p) {
        if ( false !== p.reduce(function(acc,val) {
            if ( false === acc || val < acc ) return false;
            return val;
        }, -1)) cnt++;
    };
    this.done = function() {
        return cnt;
    };
}

// CHECKS IF PERMUTATION IS ORDERED - IMPERATIVE (FASTEST)
function testorder2() {
    var cnt = 0;
    this.test = function(p) {
        var i;
        for ( i=1 ; i<p.length ; i++ ) {
            if ( p[i] < p[i-1] ) return;
        }
        cnt++;
    };
    this.done = function() {
        return cnt;
    };
}

// TEST FUNCTION FOR PROJECT EULER 32
function testeuler() {
    var sums = {};
    this.test = function(p) {
        var w1, w2, w;
        var m1, m2, mx;
        w  = Math.floor(p.length/2);
        w1 = 1;
        w2 = p.length - w - w1;
        while ( w1 <= w2 ) {
            m1 = arrayToNum(p, 0,     w1      );
            m2 = arrayToNum(p, w1,    w1+w2   );
            mx = arrayToNum(p, w1+w2, p.length);
            if ( m1 < m2 && m1 * m2 === mx ) {
                sums['' + mx] = true;
            }
            w1++;
            w2--;
        }
    };
    this.done = function() {
        var i;
        var r = 0;
        for ( i in sums ) {
            r += +i;
        }
        return r;
    };
}

// MAIN PROGRAM BELOW

function processData(input, parg, targ) {
    var test = null;
    var perm = null;
    switch ( parg ) {
        case '0':
            perm = permutations0;
            break;
        case '1':
            perm = permutations1;
            break;
        case '2':
            perm = permutations2;
            break;
        case '3':
            perm = permutations3;
            break;
    }
    switch ( targ ) {
        case 'E':
            test = new testeuler();
            break;
        case 'O1':
            test = new testorder1();
            break;
        case 'O2':
            test = new testorder2();
            break;
        case 'P':
            test = new testprint();
            break;
    }
    perm(+input, test.test);
    console.log(test.done());
}

function main() {
    var input = '';
    var parg = '1';
    var targ = 'E';
    var i;
    for ( i=2 ; i<process.argv.length ; i++ ) {
        switch ( process.argv[i] ) {
            case '0':
            case '1':
            case '2':
            case '3':
                parg = process.argv[i];
                break;
            case 'E':
            case 'O1':
            case 'O2':
            case 'P':
                targ = process.argv[i];
                break;
        }
    }
    process.stdin.resume();
    process.stdin.setEncoding('ascii');
    process.stdin.on('data', function (s) {
        input += s;
    });
    process.stdin.on('end', function () {
        processData(input, parg, targ);
    });
}

main();
This is how I run the code (use a value lower than 9 to get fewer than 9! permutations):
### Project Euler Test: 3 different permutation generators ###
$ echo 9 | time node projecteuler32.js 3 E
45228
8.95user ...
$ echo 9 | time node projecteuler32.js 2 E
45228
25.03user ...
$ echo 9 | time node projecteuler32.js 1 E
45228
70.34user ...
### Simple check-order test, two different versions. Fastest permutations.
$ echo 9 | time node projecteuler32.js 3 O1
1
23.71user ...
$ echo 9 | time node projecteuler32.js 3 O2
1
4.72user ...
(the timings here may not exactly match the above figures)
Update 2017-12-05
Admittedly, I sometimes find map() and filter() handy, and I try to use them when they make code clearer. I came across a situation where I wanted to split a list into two lists (one with valid objects and one with invalid ones). That is a simple if/else with a push() in each branch, or two calls to filter(). Then it turned out that I wanted to split the valid objects further into two lists: good and ugly. The slightly simplified code is:
function goodBadUgly_1(list) {
    var i, c;
    var ret = {
        good : [],
        bad  : [],
        ugly : []
    };
    for ( i=0 ; i<list.length ; i++ ) {
        c = list[i];
        if ( !validateItem(c) )
            ret.bad.push(c);
        else if ( uglyItem(c) )
            ret.ugly.push(c);
        else
            ret.good.push(c);
    }
    return ret;
}

function goodBadUgly_2(list) {
    return {
        good : list.filter(function(c) {
            return validateItem(c) && !uglyItem(c);
        }),
        bad : list.filter(function(c) {
            return !validateItem(c);
        }),
        ugly : list.filter(function(c) {
            return validateItem(c) && uglyItem(c);
        })
    };
}
On my not-too-powerful x64 CPU, with a list of about 1000 items, the non-FP version took 6ms and the FP version took 16ms (second run, to allow the JIT to do its job). This was with Node 8.9.1. With Node 6.11.3 the FP version was slower, but the non-FP version ran at almost the same speed (quite consistent with my comment from 2017-07-17 at the beginning).
You may think that of course the FP code is slower: it calls validateItem twice (always) and uglyItem twice for all valid items. Yes, that is true, and that is also my point! When you do FP you avoid (storing intermediate results in) variables. This results in extra work being done a lot of the time. How would YOU implement this in FP style?
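One single-pass answer is a reduce() that pushes into an accumulator object (a sketch, with stand-in versions of validateItem and uglyItem since the real ones are not shown). Each predicate now runs at most once per item, but the accumulator is mutated on every step, so it is arguably just the imperative loop in disguise:

```javascript
// Stand-in predicates, assumptions for this example only:
function validateItem(c) { return c.valid; }
function uglyItem(c) { return c.ugly; }

// Single pass: each predicate is called at most once per item.
// Note that the accumulator object is mutated, so this is not pure FP.
function goodBadUgly_3(list) {
    return list.reduce(function(acc, c) {
        if ( !validateItem(c) )
            acc.bad.push(c);
        else if ( uglyItem(c) )
            acc.ugly.push(c);
        else
            acc.good.push(c);
        return acc;
    }, { good : [], bad : [], ugly : [] });
}
```

To keep reduce() pure you would have to return a fresh accumulator on every step, which brings back exactly the allocation cost discussed above.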
Does 10ms matter? Well, first, this was only about 1000 objects.
If you do this in a Web GUI when a user clicks a button, the user will wait 10ms longer for everything to be updated. 10ms is not a lot. But if this multiplies (because you have a longer list) or adds up (because you are doing other things in a slower-than-necessary way) the UX will suffer.
If you do this server side, 10ms is a lot. In Node.js you have just one thread, so this overhead is 1% of all available CPU time each second. If you get 10 requests per second, 10% of the CPU is wasted only because you prefer FP style.
This is one of those cases where FP has the same computational complexity but is slower by a constant factor. Sometimes it can be even worse.
All FP-sucks related articles
Functional Programming Sucks
Underscore.js sucks! Lodash sucks!
Functional Programming Sucks! (it is slow) (this one)
Lodash Performance Sucks!