Category Archives: Programming

Performance, Node.js & Sorting

I will present two findings that I find strange in this post:

  1. The performance of Node.js (V8?) has clearly gotten consistently worse with newer Node.js versions.
  2. The standard library sort (Array.prototype.sort()) is surprisingly slow, often slower than a simple textbook mergesort.

My findings in this article are based on running a simple program mergesort.js on different computers and different node versions.

You may also want to read this article about sorting in Node.js. It applies to V8 version 7.0, which should be used in Node.js V11.

The sorting algorithms

There are three sorting algorithms compared.

  1. Array.prototype.sort()
  2. mergesort(), a textbook mergesort
  3. mergesort_opt(), a mergesort that I put some effort into making faster

Note that mergesort is considered stable and not as fast as quicksort. As far as I understand from the above article, Node.js used to use quicksort (up to V10), and from V11 uses something better called Timsort.

My mergesort implementations (2) (3) are plain standard JavaScript. Nothing fancy whatsoever (I will post benchmarks using Node.js v0.12 below).
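For readers who have not seen one recently, a textbook top-down mergesort in plain JavaScript can look roughly like the sketch below. This is only an illustration and not the code from mergesort.js; the name mergesortSketch and the exact structure are assumptions.

```javascript
// Illustrative textbook mergesort (hypothetical, not the mergesort.js code).
// Returns a new sorted array; stable because ties are taken from the left half.
const mergesortSketch = (arr, cmp) => {
  if (arr.length < 2) return arr.slice();
  const mid = arr.length >> 1;
  const left = mergesortSketch(arr.slice(0, mid), cmp);
  const right = mergesortSketch(arr.slice(mid), cmp);
  const out = new Array(arr.length);
  let i = 0, j = 0, k = 0;
  while (i < left.length && j < right.length) {
    out[k++] = cmp(left[i], right[j]) <= 0 ? left[i++] : right[j++];
  }
  while (i < left.length) out[k++] = left[i++];
  while (j < right.length) out[k++] = right[j++];
  return out;
};

const sorted = mergesortSketch([5, 2, 9, 1, 5], (a, b) => a - b);
console.log(sorted); // [ 1, 2, 5, 5, 9 ]
```

The slice() calls allocate a lot of short-lived arrays, which is the kind of garbage collector load discussed later in this post.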

The data to be sorted

There are three types of data to be sorted.

  1. Numbers (Math.random()), compared with a-b;
  2. Strings (random numbers converted to strings), compared with default compare function for sort(), and for my mergesort simple a<b, a>b compares to give -1, 1 or 0
  3. Objects, containing two random numbers a=[0-9], b=[0-999999], compared with (a.a-b.a) || (a.b-b.b). In one in 10 the value of b will matter, otherwise looking at the value of a will be enough.
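The three data types and their compare functions could be generated along these lines. This is a sketch based on the description above; the helper names (makeNumbers, cmpObjects and so on) are made up for illustration and may differ from what mergesort.js actually does.

```javascript
// Hypothetical generators and compare functions for the three data types.
const makeNumbers = (n) => Array.from({ length: n }, () => Math.random());
const makeStrings = (n) => Array.from({ length: n }, () => String(Math.random()));
const makeObjects = (n) => Array.from({ length: n }, () => ({
  a: Math.floor(Math.random() * 10),       // 0-9
  b: Math.floor(Math.random() * 1000000),  // 0-999999
}));

const cmpNumbers = (a, b) => a - b;
const cmpStrings = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
// b only decides when the a values are equal.
const cmpObjects = (x, y) => (x.a - y.a) || (x.b - y.b);

console.log(cmpObjects({ a: 3, b: 7 }, { a: 3, b: 9 })); // -2
```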

Unless otherwise written the sorted set is 100 000 elements.

On Benchmarks

Well, just a standard benchmark disclaimer: I do my best to measure and report objectively. There may be other platforms, CPUs, configurations, use cases, datatypes, or array sizes that give different results. The code is available for you to run.

I have run all tests several times and reported the best value. If anything, that should benefit the standard library (quick)sort, which can suffer from bad luck.
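Reporting the best of several runs can be done with a small helper like the one below. It is a minimal sketch under my own assumptions (the name bestOf is hypothetical, and Date.now() only has millisecond resolution), not the measurement code used for the tables.

```javascript
// Hypothetical helper: run fn several times on fresh input, keep the best time.
const bestOf = (runs, makeInput, fn) => {
  let best = Infinity;
  for (let r = 0; r < runs; r++) {
    const input = makeInput();          // fresh data, not counted in the time
    const t0 = Date.now();
    fn(input);
    const elapsed = Date.now() - t0;
    if (elapsed < best) best = elapsed;
  }
  return best;
};

const ms = bestOf(
  5,
  () => Array.from({ length: 100000 }, () => Math.random()),
  (a) => a.sort((x, y) => x - y)
);
console.log(`best of 5: ${ms} ms`);
```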

Comparing algorithms

Let’s start with the algorithms. This is Node V10 on different machines.

(ms)        ===== Numbers =====  ===== Strings =====  ===== Objects =====
           sort()  merge  m-opt sort()  merge  m-opt sort()  merge  m-opt
NUC i7         82     82     61    110     81     54     95     66     50
NUC i5        113    105    100    191    130     89    149     97     72
NUC Clrn      296    209    190    335    250    196    287    189    157
RPi v3       1886   1463   1205   2218   1711   1096   1802   1370    903
RPi v2        968   1330   1073   1781   1379    904   1218   1154    703

The RPi v2 sort() result for Numbers stands out. It’s not a typo. But apart from that I think the pattern is quite clear: regardless of datatype and on different processors the standard sort() simply cannot match a textbook mergesort implemented in JavaScript.

Comparing Node Versions

Let’s compare different Node.js versions. This is on a NUC with an Intel i5 CPU (4th gen), running a 64-bit version of Ubuntu.

(ms)        ===== Numbers =====  ===== Strings =====  ===== Objects =====
           sort()  merge  m-opt sort()  merge  m-opt sort()  merge  m-opt
v11.13.0       84    107     96    143    117     90    140     97     71
v10.15.3      109    106     99    181    132     89    147     97     71
v8.9.1         85    103     96    160    133     86    122     99     70
v6.12.0        68     76     88    126     92     82     68     83     63
v4.8.6         51     66     89    133     93     83     45     77     62
v0.12.9        58     65     78    114     92     87     55     71     60

Not only is sort() getting slower, running “any” JavaScript is also getting slower. I have noticed this before. Can someone explain why this makes sense?

Comparing different array sizes

With the same NUC, Node V10, I try a few different array sizes:

(ms)        ===== Numbers =====  ===== Strings =====  ===== Objects =====
           sort()  merge  m-opt sort()  merge  m-opt sort()  merge  m-opt
10 000         10      9     11      8     12      6      4      7      4
15 000          8     15      7     13     14     11      6     22      7
25 000         15     35     12     40     27     15     11     25     18
50 000         35     56     34     66     57     37     51     52     30
100 000       115    107     97    192    138     88    164    101     72
500 000       601    714    658   1015    712    670    698    589    558

Admittedly, the smaller arrays show less difference, but it is also hard to measure small values with precision. So this is from the RPi v3 and smaller arrays:

(ms)        ===== Numbers =====  ===== Strings =====  ===== Objects =====
           sort()  merge  m-opt sort()  merge  m-opt sort()  merge  m-opt
5 000          34     57     30     46     59     33     29     52     26
10 000         75    129     64    100    130     74     63    104     58
20 000        162    318    151    401    290    166    142    241    132
40 000        378    579    337    863    623    391    344    538    316

I think again quite consistently this looks remarkably bad for standard library sort.

Testing throughput (Version 2)

I decided to measure throughput rather than time to sort (mergesort2.js). I thought perhaps the figures above are misleading when it comes to the cost of garbage collecting. So the new question is, how many shorter arrays (n=5000) can be sorted in 10s?
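A throughput measurement of this kind can be sketched as a loop that counts completed sorts until a time budget runs out. The function name countSorts and the short 200 ms budget are assumptions to keep the example quick; this is not the code from mergesort2.js.

```javascript
// Hypothetical throughput measurement: count sorts finished within a time budget.
// Note that generating the data is counted in the budget here as well.
const countSorts = (budgetMs, n, sortFn) => {
  const deadline = Date.now() + budgetMs;
  let count = 0;
  while (Date.now() < deadline) {
    const data = Array.from({ length: n }, () => Math.random());
    sortFn(data);
    count++;
  }
  return count;
};

// 200 ms to keep the example quick; the post uses 10 s.
const done = countSorts(200, 5000, (a) => a.sort((x, y) => x - y));
console.log(`${done} arrays of 5000 numbers sorted`);
```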

(count)     ===== Numbers =====  ===== Strings =====  ===== Objects =====
           sort()  merge  m-opt sort()  merge  m-opt sort()  merge  m-opt
v11.13.0     3192   2538   4744   1996   1473   2167   3791   2566   4822
v10.15.3     4733   2225   4835   1914   1524   2235   4911   2571   4811
RPi v3        282    176    300    144    126    187    309    186    330

What do we make of this? Well the collapse in performance for the new V8 Torque implementation in Node v11 is remarkable. Otherwise I notice that for Objects and Node v10, my optimized algorithm has no advantage.

I think my algorithms are heavier on the garbage collector (than standard library sort()), and this is why they perform relatively worse when run for 10s in a row.

If that is so, I’d still prefer to pay that price. When my code waits for sort() to finish there is a user waiting for a GUI update, or for an API reply. I would rather see a faster sort, and when the update/reply is complete there is usually plenty of idle time when the garbage collector can run.

Optimizing Mergesort?

I had some ideas for optimizing mergesort that I tried out.

Special handling of short arrays: clearly if you want to sort 2 elements, the entire mergesort function is heavier than a simple function that sorts two elements. The article about V8 sort indicated that they use insertion sort for arrays up to length 10 (I find this very strange). So I implemented special functions for 2-3 elements. This gave nothing. Same performance as calling the entire mergesort.

Less stress on the garbage collector: since my mergesort creates memory nodes that are discarded when sorting is complete, I thought I could keep those nodes for the next sort, to ease the load on the garbage collector. Very bad idea, performance dropped significantly.

Performance of cmp-function vs sort

The relevant sort functions all run in roughly K · (n log n) time, with a different constant K for each. It is the K that I am measuring and discussing here. The differences are, after all, quite marginal. There is clearly another constant cost: the cost of the compare function. That seems to matter more than anything else. And in all cases above “string” is just a single string of 10 characters. If you have a more expensive compare function, the significance of sort() will be even less.
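To get a feeling for how often the compare function actually runs, it can be wrapped in a counter. The wrapper name counting is hypothetical, and the exact number of calls depends on the engine and the data, so the sketch prints it next to n · log2(n) rather than promising a figure.

```javascript
// Hypothetical wrapper that counts how often a compare function is called.
const counting = (cmp) => {
  const wrapped = (a, b) => { wrapped.calls++; return cmp(a, b); };
  wrapped.calls = 0;
  return wrapped;
};

const n = 100000;
const data = Array.from({ length: n }, () => Math.random());
const cmp = counting((a, b) => a - b);
data.sort(cmp);

console.log(`compare calls: ${cmp.calls}`);
console.log(`n * log2(n):   ${Math.round(n * Math.log2(n))}`);
```

Every one of those calls pays the full price of the compare function, which is why an expensive comparison quickly dominates the cost of the sort itself.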

Nevertheless, V8 is a single threaded environment and ultimately cycles wasted in sort() will result in overall worse performance. Milliseconds count.

Conclusions

Array.prototype.sort() is a critical component of the standard library. In many applications sorting may be the most expensive thing that takes place. I find it strange that it does not perform better than a simple mergesort implementation. I do not suggest you use my code, or start looking for better sort() implementations out there right away. But I think this is something for JavaScript programmers to keep in mind. However, the compare function probably matters more in most cases.

I find it strange that Node v11, with Timsort and V8 Torque, is not more of an improvement (admittedly, I didn’t test that one very much).

And finally I find it strange that Node.js performance seems to deteriorate with every major release.

Am I doing anything seriously wrong?

JavaScript Double Linked List

JavaScript has two very powerful and flexible built-in data structures: [] and {}. You can program rather advanced JavaScript for years without needing anything else.

Nevertheless I had a conversation about possible advantages of using a linked list (instead of an array). Linked lists are not very popular; Stroustrup himself has suggested they should be avoided. But what if you mostly do push(), pop(), shift() and unshift() and never access an item by its index? Higher order functions such as map(), reduce(), filter() and sort(), as well as iterators, should be just fine.

I decided to implement a Double Linked List in JavaScript making it (mostly) compatible with Array and do some tests on it. The code both of the DoubleLinkedList itself, and the unit tests/benchmarks are available.

Disclaimer

This is a purely theoretical, academic and nerdy experiment. The DoubleLinkedList offers no advantages over the built-in Array, except for possible performance advantages in edge cases. The disadvantages compared to Array are:

  • Lower performance in some cases
  • Fewer features / limited API
  • Less tested and proven
  • An extra dependency, possibly longer application loading time

So, my official recommendation is that you read this post and perhaps look at the code for learning purposes. But I really doubt you will use my code in production (although you are welcome to).

Benchmarks

Benchmarks are tricky. In this case there are three kinds of benchmarks:

  1. Benchmarks using array[i] to get the item at an index. This is horrible for the linked list. I wrote no such benchmarks.
  2. Benchmarks testing map(), reduce() and filter(). I wrote these, but they consistently show no relevant or interesting differences between the built-in Array and my DoubleLinkedList (my code is essentially as fast as the standard library array code, which on one hand is impressive, and on the other hand is a reason not to use it).
  3. Benchmarks where my DoubleLinkedList does fine, mostly those that depend heavily on push(), pop(), shift() and unshift().

The only thing I present below is (3). I have nothing for (1), and (2) shows nothing interesting.

The machines are, in order, a Hades Canyon i7-NUC, an old i5-NUC, a newer Celeron-NUC, an Acer Chromebook R13 (with an ARMv8 CPU), a Raspberry Pi v3, and a Raspberry Pi v2. The Chromebook runs ChromeOS, the i7 runs Windows, the rest run Linux.

My benchmarks use Math.random() to create test data. That was not very smart of me because the variation between test runs is significant. The below numbers (milliseconds) are the median value of running each test 41 times. You can see for yourself that the values are quite consistent.

The tested algorithms

The push(), pop(), shift(), unshift() tests use the array/list as a queue and push 250k “messages” through it, keeping the queue at roughly 10k messages.

The mergesort() test is a mergesort built on top of the datastructures using push()/shift().

The sort() test is the standard Array.sort(), versus a mergesort implementation for DoubleLinkedList (it has less overhead than mergesort(), since it does not create new objects for every push()).
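To make the queue test concrete, here is a minimal doubly linked list with only the four end operations, used the way the push/shift test is described above. It is a sketch under my own assumptions: the class name TinyList is hypothetical and this is not the DoubleLinkedList code itself.

```javascript
// Minimal doubly linked list sketch (hypothetical; not the DoubleLinkedList code).
class TinyList {
  constructor() { this.head = null; this.tail = null; this.length = 0; }
  push(value) {                       // add at the tail
    const node = { value, prev: this.tail, next: null };
    if (this.tail) this.tail.next = node; else this.head = node;
    this.tail = node;
    return ++this.length;
  }
  unshift(value) {                    // add at the head
    const node = { value, prev: null, next: this.head };
    if (this.head) this.head.prev = node; else this.tail = node;
    this.head = node;
    return ++this.length;
  }
  pop() {                             // remove from the tail
    const node = this.tail;
    if (!node) return undefined;
    this.tail = node.prev;
    if (this.tail) this.tail.next = null; else this.head = null;
    this.length--;
    return node.value;
  }
  shift() {                           // remove from the head
    const node = this.head;
    if (!node) return undefined;
    this.head = node.next;
    if (this.head) this.head.prev = null; else this.tail = null;
    this.length--;
    return node.value;
  }
}

// Use it as a queue: 250k messages pass through, roughly 10k queued at a time.
const queue = new TinyList();
let received = 0;
for (let i = 0; i < 250000; i++) {
  queue.push(i);
  if (queue.length > 10000) { queue.shift(); received++; }
}
while (queue.length > 0) { queue.shift(); received++; }
console.log(received); // 250000
```

All four operations only touch one or two nodes, so their cost does not depend on the length of the list.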

Benchmark result

                 Node8   ================== Node 10 ==================
(ms)             NUCi7   NUCi7   NUCi5   NUC-C     R13   RPiV3   RPiV2
unshift/pop 250k
  Array            679     649    1420    1890    5216   11121    8582
  D.L.L.             8      13      10      20      40     128     165
push/shift 250k
  Array             37      31      31      49     143     388     317
  D.L.L.            10      12      10      19      44     115     179
mergesort 50k
  Array            247     190     300     466    1122    3509    3509
  D.L.L.            81      88     121     244     526    1195    1054
sort 50k
  Array             53      55      59     143     416    1093     916
  D.L.L.            35      32      42      84     209     543     463

What do we make of this?

  • For array, push/shift is clearly faster than unshift/pop!
  • It is possible to implement a faster sort() than Array.sort() of the standard library. However, this may have little to do with my linked list (I may get an even better result if I base my implementation on Array).
  • I have seen this before with other Node.js code but not published it: the RPiV2 (ARMv7 @900MHz) is faster than the RPiV3 (ARMv8 @1200MHz).
  • I would have expected my 8th generation i7 NUC (NUC8i7HVK) to outperform my older 4th generation i5 NUC (D54250WYK), but not so much difference.

More performance findings

One thing I thought could give good performance was a case like this:

x2 = x1.map(...).filter(...).reduce(...)

where every function creates a new Array just to be destroyed very soon. I implemented mapMutate and filterMutate for my DoubleLinkedList, that reuse existing List-nodes. However, this gave very little. The cost of the temporary Arrays above seems to be practically insignificant.

However for my Double linked list:

dll_1 = DoubleLinkedList.from( some 10000 elements )
dll_1.sort()
dll_2 = DoubleLinkedList.from( some 10000 elements )

Now
dll_1.map(...).filter(...).reduce(...) // slower
dll_2.map(...).filter(...).reduce(...) // faster

So I thought reusing the list nodes would be a cost saver, but it seems to produce cache misses instead.

Using the Library

If you feel like using the code you are most welcome. The tests run with Node.js: first the unit tests (quickly) and then the benchmarks (slower). As I wrote earlier, there are some Math.random() calls in the tests, and on rare occasions statistically unlikely events occur, making the tests fail (I will not make this mistake again).

The code itself is just for Node.js. There are no dependencies and it will require minimal work to adapt it to any browser environment of your choice.

The code starts with a long comment specifying what is implemented. Basically, you use it just as Array, with the exceptions/limitations listed. There are many limitations, but most reasonable uses should be fairly well covered.

Conclusion

It seems to make no sense to replace Array with a linked list in JavaScript (Stroustrup was right). If you are using Array as a queue be aware that push/shift is much faster than unshift/pop. It would surprise me very much if push/pop were not much faster than unshift/shift for a stack.

Nevertheless, if you have a (large) queue/list/stack and all you do is push, pop, shift, unshift, map, filter, reduce and sort go ahead.

There is also a concatMutate in my DoubleLinkedList. That one is very cheap, and if you for some reason do array.concat(array1, array2, array3) very often perhaps a linked list is your choice.

It should come as no surprise, but I was surprised that sort(), mergesort in my case, was so easy to implement on a linked list.

On RPiV2 vs RPiV3

I have on several occasions written about how the 900MHz ARMv7 of the RPiV2 completely outperforms the 700MHz ARMv6 of the RPiV1. It is about 15 times faster, and it is not completely clear why the difference is so big (it is that big for JavaScript, not for typical C code).

The RPiV3 is not only 300MHz faster than the RPiV2, it is also a 64-bit ARMv8 cpu compared to the 32-bit ARMv7 cpu of RPiV2. But V3 delivers worse performance than V2.

One reason could be that the RPi does not have that much RAM, and not that fast RAM either, and that the price of 64-bit is simply not worth it. For now, I have no other idea.

References

An article about sorting in V8: https://v8.dev/blog/array-sort. Very interesting read. But I tried Node version 11 that comes with V8 version 7, and the difference was… marginal at best.

Where to ‘use strict’ with Object.freeze()?

I have coded JavaScript short enough time to consider ‘use strict’ a mandatory and obvious feature of the language to use. I always use it unless I forget to.

A while ago I became aware of Object.freeze(). I have been thinking about different ways to exploit this (strict) feature for a while and I now have a very good use case: freeze input data in unit tests to ensure my tested functions don’t accidentally change it (pure functions are good, pure functions don’t change their input, and it is hard to really guarantee a function in JavaScript is pure).

Imagine I am writing a function that calculates the average and I have a test for it.

const assert = require('assert');  // describe() and it() are provided by the test runner (mocha)

const averageOfArray1 = (a) => {
  let s = 0;
  for ( let i=0 ; i<a.length ; i++ ) s+=a[i];
  return s/a.length;
};

describe('test avg', () => {
  it('should give the average value of 2', () => {
    const a = [1,2,3];
    assert.equal(2, averageOfArray1(a) );
  });
});

If averageOfArray1 mutated its input, it would be a serious bug, and the above test would not detect it. Let’s look at a different implementation:

const averageOfArray2 = (a) => {
  for ( let i=1 ; i<a.length ; i++ ) a[0] += a[i];
  return a[0]/a.length;
};

describe('test avg', () => {
  it('should give the average value of 2', () => {
    const a = [1,2,3];
    assert.equal(2, averageOfArray2(a) );
  });
});

Some genius “optimized” the function by eliminating an unnecessary variable (s), and the test still passes! However, if the tests were written:

describe('test avg', () => {
  it('should give the average value of 2', () => {
    const a = Object.freeze([1,2,3]);
    assert.equal(2, averageOfArray2(a) );
  });
});

the tests would fail! Much better. How do the tests fail? This is what I get:

1) test avg
should give the average value of 2:

AssertionError [ERR_ASSERTION]: 2 == 0.3333333333333333
+ expected - actual
-2
+0.3333333333333333

So it appears that the first element [0] of the array was never changed, thus the return value of 0.3333. But no exception was thrown. If I instead would ‘use strict’ for the entire code:

'use strict';

const assert = require('assert');

const averageOfArray2 = (a) => {
  for ( let i=1 ; i<a.length ; i++ ) a[0] += a[i];
  return a[0]/a.length;
};
describe('test avg', () => {
  it('should give the average value of 2', () => {
    const a = Object.freeze([1,2,3]);
    assert.equal(2, averageOfArray2(a));
  });
});

instead I get:

1) test avg
should give the average value of 2:
TypeError: Cannot assign to read only property '0' of object '[object Array]'
at averageOfArray2 (avg.js:12:45)
at Context.it (avg.js:20:25)

which is what I really wanted.

So it APPEARS to me that without ‘use strict’ the frozen object is not changed, but changing it just fails silently. With ‘use strict’ I get an exception right away, which leads me to the question: where can I put ‘use strict’? This is what I found:

// 'use strict'; // GOOD

const assert = require('assert');

// 'use strict'; // BAD

const averageOfArray2 = (a) => {
  // 'use strict'; // GOOD
  let i;
  // 'use strict'; // BAD
  for ( i=1 ; i<a.length ; i++ ) a[0] += a[i];
  return a[0]/a.length;
};
describe('test avg', () => {
  // 'use strict'; // BAD
  it('should give the average value of 2', () => {
    const a = Object.freeze([1,2,3]);
    assert.equal(2, averageOfArray2(a));
  });
});

That is, ‘use strict’ should be in place where the violation actually takes place. And ‘use strict’ must be placed first in whatever function it is placed, otherwise it is silently ignored! This is probably well known to everyone, but it was not to me.
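The placement rule can be demonstrated in a few lines. The sketch below builds the two functions with the Function constructor, because such functions are only strict when their own body starts with the directive; that way the demo behaves the same no matter whether the surrounding file is strict. The names strictFirst and strictLate are made up for illustration.

```javascript
// Functions built with the Function constructor are sloppy unless their own
// body starts with 'use strict', so this behaves the same in any file.
const strictFirst = new Function('a', "'use strict'; a[0] = 99; return a[0];");
const strictLate  = new Function('a', "let i; 'use strict'; a[0] = 99; return a[0];");

const frozen = Object.freeze([1, 2, 3]);

let threw = false;
try {
  strictFirst(frozen);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw);              // true: the directive is first, the write throws
console.log(strictLate(frozen)); // 1: the directive is ignored, the write fails silently
```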

Conclusion

Object.freeze() is very useful for improved unit tests. However, you should use it together with a properly placed ‘use strict’, and that is in the function being tested (not only the unit test).

And note, if you have done Object.freeze in a unit test, and someone refactors the tested function in a way that it both:

  1. Mutates the frozen object
  2. Removes or moves ‘use strict’ to an invalid place

your unit tests may still pass, even though the function is now very dangerous.
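One more thing to keep in mind when freezing test input: Object.freeze() is shallow, so objects or arrays nested inside the frozen value can still be changed. A recursive helper covers that case. The helper below (deepFreeze is a hypothetical name) is a minimal sketch that assumes plain, non-cyclic data.

```javascript
// Hypothetical helper for tests: freeze an object and everything it contains.
// Assumes plain data without cycles.
const deepFreeze = (obj) => {
  for (const v of Object.values(obj)) {
    if (v !== null && typeof v === 'object' && !Object.isFrozen(v)) deepFreeze(v);
  }
  return Object.freeze(obj);
};

const shallow = Object.freeze({ list: [1, 2, 3] });
console.log(Object.isFrozen(shallow.list)); // false: the nested array can still change

const deep = deepFreeze({ list: [1, 2, 3] });
console.log(Object.isFrozen(deep.list));    // true
```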

Best way to write compare-functions

The workhorse of many (JavaScript) programs is sort(). When you want to sort objects (or numbers, actually) you need to supply a compare-function. Those are nice functions because they are very testable and reusable, but sorting is also a bit expensive (perhaps the most expensive thing your program does) so you want them fast.

For the rest of this article I will assume we are sorting some Order objects based on status, date and time (all strings).

The naive way to write this is:

function compareOrders1(a,b) {
  if ( a.status < b.status ) return -1;
  if ( a.status > b.status ) return 1;
  if ( a.date < b.date ) return -1;
  if ( a.date > b.date ) return 1;
  if ( a.time < b.time ) return -1;
  if ( a.time > b.time ) return 1;
  return 0;
}

There are some things about this that are just not appealing: it is too verbose, there is a risk of typos, and it is not too easy to read.

Another option follows:

function cmpStrings(a,b) {
  if ( a < b ) return -1;
  if ( a > b ) return 1;
  return 0;
}

function compareOrders2(a,b) {
  return cmpStrings(a.status,b.status)
      || cmpStrings(a.date  ,b.date  )
      || cmpStrings(a.time  ,b.time  );
}

Note that the first function (cmpStrings) is highly reusable, so this is shorter code. However, there is still some repetition, so I tried:

function cmpProps(a,b,p) {
  return cmpStrings(a[p], b[p]);
}

function compareOrders3(a,b) {
  return cmpProps(a,b,'status')
      || cmpProps(a,b,'date')
      || cmpProps(a,b,'time');
}

There is something nice about not repeating status, date and time, but there is something not so appealing about quoting them as strings. If you want to go more functional you can do:

function compareOrders4(a,b) {
  function c(p) {
    return cmpStrings(a[p],b[p]);
  }
  return c('status') || c('date') || c('time');
}

To my taste, that is a bit too functional and obscure. Finally, since it comes to mind and some people may suggest it, you can concatenate strings, like:

function compareOrders5(a,b) {
  return cmpStrings(
    a.status + a.date + a.time,
    b.status + b.date + b.time
  );
}

Note that in case fields “overlap” and/or have different length, this could give unexpected results.
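A small constructed example of such an unexpected result: two orders with different status values that look identical once the fields are concatenated. The order objects are hypothetical test data.

```javascript
const cmpStrings = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Hypothetical orders whose fields have different lengths.
const o1 = { status: 'A',  date: 'B1', time: '' };
const o2 = { status: 'AB', date: '1',  time: '' };

// Field by field: 'A' < 'AB', so o1 sorts first.
const fieldwise = cmpStrings(o1.status, o2.status)
               || cmpStrings(o1.date, o2.date)
               || cmpStrings(o1.time, o2.time);

// Concatenated: both become 'AB1', so the orders look equal.
const concatenated = cmpStrings(
  o1.status + o1.date + o1.time,
  o2.status + o2.date + o2.time
);

console.log(fieldwise, concatenated); // -1 0
```

With fixed-width fields (or a separator that cannot occur in the data) the problem goes away, but then the function depends on that assumption holding.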

Benchmarks

I tried the five different compare-functions on two different machines and got this kind of results (i5 N=100000, ARM N=25000), with slightly different parameters.

In these tests I used few unique values of status and date to often hit the entire compare function.

(ms)    i5    i5    ARM
#1     293   354    507
#2     314   351    594
#3     447   506   1240
#4     509   541   1448
#5     866   958   2492

This is quite easy to understand. #2 does exactly what #1 does and the function overhead is eliminated by the JIT. #3 is trickier for the JIT since a string is used to read a property. That is true also for #4, which also requires a function to be generated. #5 needlessly builds two concatenated strings on every call, when comparing the first fields is often enough anyway.

Conclusion & Recommendation

My conclusion is that #3 may be the best choice, despite being slightly slower. I find #2 clearly preferable to #1, and I think #4 and #5 should be avoided.

Lambda Functions considered Harmful

Decades ago engineers wrote computer programs in ways that modern programmers scorn. We learn that functions were long, global variables were used frequently and changed everywhere, variable naming was poor and gotos jumped across the program in ways that were impossible to understand. It was all harmful.

Elsewhere mathematicians were improving on Lisp and functional programming was developed: pure, stateless, provable code focusing on what to do rather than how to do it. Functions became first class citizens and they could even be anonymous lambda functions.

Despite the apparent conflict between object oriented, functional and imperative programming there are some universally good things:

  • Functions that are not too long
  • Functions that do one thing well
  • Functions that have no side effects
  • Functions that can be tested, and that also are tested
  • Functions that can be reused, perhaps even being general
  • Functions and variables that are clearly named

So, how are we doing?

Comparing different styles
I read code and I talk to people who have different opinions about what is good and bad code. I decided to implement the same thing following different principles and discuss the different options. I particularly want to explore different ways to do functional programming.

My language of choice is JavaScript because it allows different styles, it requires quite little code to be written, and many people should be able to read it.

My artificial problem is that I have two arrays of N numbers. One number from each array can be added in NxN different ways. How many of these sums are prime? That is, for N=2, if I have [10,15] and [2,5] I get [12,15,17,20] of which one number (17) is prime. In all code below I decide if a number is prime in the same simple way.
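The N=2 example can be checked with a few lines that list all the sums before counting. This is only a worked example of the problem statement, using the same simple primality test as the implementations that follow.

```javascript
// Worked example for N=2, using the same primality test as the rest of the post.
const is_prime = (n) => {
  for ( let d=2 ; d*d<=n ; d++ ) if ( 0 === n % d ) return 0;
  return 1;
};

const a1 = [10,15];
const a2 = [2,5];
const sums = [];
for ( const x of a1 ) for ( const y of a2 ) sums.push(x+y);

console.log(sums);                    // [ 12, 15, 17, 20 ]
console.log(sums.filter(is_prime));   // [ 17 ]
```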

Old imperative style (imperative)
The old imperative style would use variables and loops. If I had goto in JavaScript I would use goto instead of setting a variable (p) before I break out of the inner loop. This code allows for nothing to be tested nor reused, although the function itself is testable, reusable and pure (for practical purposes and correct input, just as all the other examples).

  const primecount = (a1,a2) => {
    let i, j;
    let d, n, p;
    let retval = 0;


    for ( i=0 ; i<a1.length ; i++ ) {
      for ( j=0 ; j<a2.length ; j++ ) {
        n = a1[i] + a2[j];
        p = 1;
        for ( d=2 ; d*d<=n ; d++ ) {
          if ( 0 === n % d ) {
            p = 0;
            break;
          }
        }
        retval += p;
      }
    }
    return retval;
  }

Functional style with lambda-functions (lambda)
The functional programming equivalent would look like the below code. I have focused on avoiding declaring variables (which would lead to a mutable state) and rather using the higher order function reduce to iterate over the two lists. This code also allows for no parts to be tested or reused. In a few lines of code there are three unnamed functions, none of them trivial.

  const primecount = (a1,a2) => {
    return a1.reduce((sum1,a1val) => {
      return sum1 + a2.reduce((sum2,a2val) => {
        return sum2 + ((n) => {
          for ( let d=2 ; d*d<=n ; d++ ) if ( 0 === n % d ) return 0;
          return 1;
        })(a1val+a2val);
      }, 0);
    }, 0);
  };

Imperative style with separate test function (imperative_alt)
The imperative code can be improved by breaking out the prime test function. The advantage is clearly that the prime function can be modified in a more clean way, and it can be tested and reused. Also note that the usefulness of goto disappeared because return fulfills the same task.

  const is_prime = (n) => {
    for ( let d=2 ; d*d<=n ; d++ ) if ( 0 === n % d ) return 0;
    return 1;
  };

  const primecount = (a1,a2) => {
    let retval = 0;
    for ( let i=0 ; i<a1.length ; i++ )
      for ( let j=0 ; j<a2.length ; j++ )
        retval += is_prime(a1[i] + a2[j]);
    return retval;
  };

  const test = () => {
    if ( 1 !== is_prime(19) ) throw new Error('is_prime(19) failed');
  };

Functional style with lambda and separate test function (lambda_alt)
In the same way, the reduce+lambda-code can be improved by breaking out the prime test function. That function, but nothing else, is now testable and reusable.

  const is_prime = (n) => {
    for ( let d=2 ; d*d<=n ; d++ ) if ( 0 === n % d ) return 0;
    return 1;
  };

  const primecount = (a1,a2) => {
    return a1.reduce((sum1,a1val) => {
      return sum1 + a2.reduce((sum2,a2val) => {
        return sum2 + is_prime(a1val+a2val);
      }, 0);
    }, 0);
  };

  const test = () => {
    if ( 1 !== is_prime(19) ) throw new Error('is_prime(19) failed');
  };

I think I can do better than any of the four above examples.

Functional style with reduce and named functions (reducer)
I don’t need to feed anonymous functions to reduce: I can give it named, testable and reusable functions instead. Now a challenge with reduce is that it is not very intuitive. filter can be used with any has* or is* function that you may already have. map can be used with any x_to_y function or some get_x_from_y getter or reader function, which are also often useful. sort requires a cmpAB function. But reduce? I decided to name the functions below that are used with reduce reducer_*. It works quite nicely.

The first one, reducer_count_primes, simply counts primes in a list. That is (re)useful and testable all by itself. The next function, reducer_count_primes_for_offset, is less likely to be generally reused (with offset=1 it considers 12+1 to be prime, but 17+1 is not), but it makes sense and it can be tested. Doing the same trick one more time with reducer_count_primes_for_offset_array and we are done.

These functions may not be reused. But they can be tested and that is often a great advantage during development. You can build up your program part by part and every step is a little more potent but still completely pure and testable (I remember this from my Haskell course long ago). This is how to solve hard problems using test driven development and to have all tests in place when you are done.

  const is_prime = (n) => {
    for ( let d=2 ; d*d<=n ; d++ ) if ( 0 === n % d ) return 0;
    return 1;
  };

  const reducer_count_primes = (s,n) => {
    return s + is_prime(n);
  };

  const reducer_count_primes_for_offset = (o) => {
    return (s,n) => { return reducer_count_primes(s,o+n); };
  };

  const reducer_count_primes_for_offset_array = (a) => {
    return (s,b) => { return s + a.reduce(reducer_count_primes_for_offset(b), 0); };
  };

  const primecount = (a1,a2) => {
    return a1.reduce(reducer_count_primes_for_offset_array(a2), 0);
  };

  const test = () => {
    if ( 1 !== [12,13,14].reduce(reducer_count_primes, 0) )
      throw new Error('reducer_count_primes failed');
    if ( 1 !== [9,10,11].reduce(reducer_count_primes_for_offset(3), 0) )
      throw new Error('reducer_count_primes_for_offset failed');
    if ( 2 !== [2,5].reduce(reducer_count_primes_for_offset_array([8,15]),0) )
      throw new Error('reducer_count_primes_for_offset_array failed');
  };

Using recursion (recursive)
Personally I like recursion. I think it is easier to use than reduce, and it is great for async code. The bad thing with recursion is that your stack will eventually get full (if you don’t know what I mean, try my code – available below) for recursion depths that are far from unrealistic. My problem can be solved in the same step by step test driven way using recursion.

  const is_prime = (n) => {
    for ( let d=2 ; d*d<=n ; d++ ) if ( 0 === n % d ) return 0;
    return 1;
  };

  const primes_for_offset = (a,o,i=0) => {
    if ( i === a.length )
      return 0;
    else
      return is_prime(a[i]+o) + primes_for_offset(a,o,i+1);
  }

  const primes_for_offsets = (a,oa,i=0) => {
    if ( i === oa.length )
      return 0;
    else
      return primes_for_offset(a,oa[i]) + primes_for_offsets(a,oa,i+1);
  }

  const primecount = (a1,a2) => {
    return primes_for_offsets(a1,a2);
  };

  const test = () => {
    if ( 2 !== primes_for_offset([15,16,17],2) )
      throw new Error('primes_for_offset failed');
  };
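The stack limit mentioned above is easy to run into. The sketch below uses a hypothetical recursive sum (sum_recursive is not part of the downloadable code) that needs one stack frame per array element. With default Node.js settings I would expect the large array to end in a RangeError, but the exact limit depends on the engine and its stack size.

```javascript
// Hypothetical demo: a recursive sum that needs one stack frame per element.
const sum_recursive = (a,i=0) => {
  if ( i === a.length ) return 0;
  return a[i] + sum_recursive(a,i+1);
};

const small = new Array(1000).fill(1);
console.log(sum_recursive(small));    // 1000

const big = new Array(1000000).fill(1);
try {
  sum_recursive(big);
} catch (e) {
  // With default settings this is typically "Maximum call stack size exceeded".
  console.log(e instanceof RangeError, e.message);
}
```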

Custom Higher Order Function (custom_higher_order)
Clearly reduce is not a perfect fit for my problem since I need to nest it. What if I had a reduce-like function that produced the sum of all NxN possible pairs from two arrays, given a custom value function? Well that would be quite great and it is not particularly hard either. In my opinion this is a very functional approach (despite it being implemented with for-loops). All the functions written are independently reusable in a way not seen in the other examples. The problem with higher order functions is that they are pretty abstract, so they are hard to name, and they need to be general enough to ever be reused for practical purposes. Nevertheless, if I see it right away, I can do it. But I don’t spend time inventing generic stuff instead of solving the actual problem at hand.

  const is_prime = (n) => {
    for ( let d=2 ; d*d<=n ; d++ ) if ( 0 === n % d ) return 0;
    return 1;
  };

  const combination_is_prime = (a,b) => {
    return is_prime(a+b);
  };

  const sum_of_combinations = (a1,a2,f) => {
    let retval = 0;
    for ( let i=0 ; i<a1.length ; i++ )
      for ( let j=0 ; j<a2.length ; j++ )
        retval += f(a1[i],a2[j]);
    return retval;
  };

  const primecount = (a1,a2) => {
    return sum_of_combinations(a1,a2,combination_is_prime);
  };

  const test = () => {
    if ( 1 !== is_prime(19) )
      throw new Error('is_prime(19) failed');
    if ( 0 !== combination_is_prime(5,7) )
       throw new Error('combination_is_prime(5,7) failed');
    if ( 1 !== sum_of_combinations([5,7],[7,9],(a,b)=> { return a===b; }) )
       throw new Error('sum_of_combinations failed');
  };

Lambda Functions considered harmful?
Just as there are many bad and some good applications for goto, there are both good and bad uses for lambdas.

I actually don’t know if you, the reader, agree with me that the second example (lambda) offers no real improvement over the first example (imperative). On the contrary, it is arguably conceptually more complex to nest anonymous functions than to nest for loops. I may have done the lambda example wrong, but there is much code out there written in that style.

I think the beauty of functional programming is the testable and reusable aspects, among other things. Long, or even nested, lambda functions offer no improvement over old spaghetti code there.

All the code and performance
You can download my code and run it using any recent version of Node.js:

$ node functional-styles-1.js 1000

The argument (1000) is N, and if you double N the execution time should roughly quadruple. I did some benchmarks, and your results may vary depending on plenty of things. The figures below are from a single run with N=3000, but nesting reduce clearly comes at a cost. As always, if what you do inside reduce is quite expensive, the overhead is negligible. But using reduce (or any of the built-in higher order functions) for the innermost and tightest loop is wasteful.

 834 ms  : imperative
 874 ms  : custom_higher_order
 890 ms  : recursive
 896 ms  : imperative_alt
1015 ms  : reducer
1018 ms  : lambda_alt
1109 ms  : lambda
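
The claim that doubling N roughly quadruples the run time can be illustrated with a minimal sketch. It assumes, as in sum_of_combinations above, that every element of one array is paired with every element of the other, and that both arrays have N elements. count_pairs is a hypothetical helper that is not part of functional-styles-1.js, and real run time also depends on how expensive is_prime is for each pair.

```javascript
// Hypothetical helper: counts how many pairs a nested loop over two
// arrays of length n visits (the same loop shape as sum_of_combinations).
const count_pairs = (n) => {
  let pairs = 0;
  for ( let i=0 ; i<n ; i++ )
    for ( let j=0 ; j<n ; j++ )
      pairs++;
  return pairs;
};

const pairs_1000 = count_pairs(1000);   // 1000 * 1000 pairs
const pairs_2000 = count_pairs(2000);   // 2000 * 2000 pairs

console.log(pairs_2000 / pairs_1000);   // 4
```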

Other findings on this topic
Functional Programming Sucks


Vue components in Angular

I have an application written in AngularJS (v1) that I keep adding things to. Nowadays I prefer to write new code for Vue.js rather than AngularJS but rewriting the entire AngularJS application is out of the question.

However, when the need shows up for a new Page (controller in AngularJS) it is quite simple to write a Vue-component instead.

The AngularJS-html looks like this:

<div ng-if="page.showVue" id="{{ page.getVueId() }}"></div>

You may not have exactly “page” but if you have an AngularJS-application you know how to do this.

Your parent Angular controller needs to initiate Vue.

page.showVue = true;
var vue      = null;
var vueid    = null;

page.getVueId = function() {
    if ( !vueid ) {
        vueid = 'my_vue_component_id';
        var vueload = {
            el: '#' + vueid,
            template : '<my_vue_component />',
            data : {}
        };
        $timeout(function() {
            vue = new Vue(vueload);
        });
    }
    return vueid;
};

At some point you may navigate away from this vue page and then you can run the code:

vue.$destroy();
page.showVue = false;
vue          = null;
vueid        = null;

The way everything works is that when Angular wants to “show Vue” it sets page.showVue=true. This in turn activates the div, which needs an ID. The call to page.getVueId() will generate a Vue component (once), but initiate it only after Angular has shown the parent div with the correct id (thanks to $timeout).

You may use a router or have several different Vue-pages in your Angular-application and you obviously need to adjust my code above for your purposes (so every id is unique, and every component is initiated once).

I suppose (but I have not tried) that it is perfectly fine to have several different Vue-components mounted on different places in your Angular application. But I think you are looking for trouble if you want Vue to use (be a parent for) Angular controllers or directives (as children).

Vue.js is small enough that this will come at a quite acceptable cost for your current Angular application and it allows you to write new pages or parts in Vue in an existing AngularJS application.

Webpack: the shortest tutorial

So, you have some JavaScript that requires other JavaScript using require, and you want to pack all the files into one. Install webpack:

$ npm install webpack webpack-cli

These are my files (a main file with two dependencies):

$ cat main.js 

var libAdd = require('./libAdd.js');
var libMult = require('./libMult.js');

console.log('1+2x2=' + libAdd.calc(1, libMult.calc(2,2)));


$ cat libAdd.js 

exports.calc = (a,b) => { return a + b; };


$ cat libMult.js 

exports.calc = (a,b) => { return a * b; };

To pack this

$ ./node_modules/webpack-cli/bin/cli.js --mode=none main.js
Hash: 639616969f77db2f336a
Version: webpack 4.26.0
Time: 180ms
Built at: 11/21/2018 7:22:44 PM
  Asset      Size  Chunks             Chunk Names
main.js  3.93 KiB       0  [emitted]  main
Entrypoint main = main.js
[0] ./main.js 141 bytes {0} [built]
[1] ./libAdd.js 45 bytes {0} [built]
[2] ./libMult.js 45 bytes {0} [built]

and I have my bundle in dist/main.js. This bundle works just like the original main.js:

$ node main.js 
1+2x2=5
$ node dist/main.js 
1+2x2=5

That is all I need to know about Webpack!

Background
I like the old way of building web applications: including every script with a src-tag. However, occasionally I want to use code I don’t write myself, and more and more often it comes in a format that I cannot simply include with a src-tag. Webpack is a/the way to make it “just” a JavaScript file that I can do what I want with.

Arrow functions in JavaScript: A strategy

Arrow functions have been a part of JavaScript since ES6. They are typically supported where you run JavaScript, except in Internet Explorer. To be clear, arrow functions are:

(a,b) => a+b

instead of

function(a,b) { return a+b }

I like to make things simple, and

  1. my code sometimes runs on Internet Explorer
  2. arrow functions offer shorter and simplified syntax in some cases, but fundamentally you can write the same code with function
  3. I like to not have a build step (babel, webpack and friends) for a language that does not, and should not, need one

so, until now I have simply avoided them (and kind of banned them, along with other ES6 features) in code and software I am responsible for.

However

  1. arrow functions (as part of ES6) are here to stay
  2. they offer some advantages
  3. Internet Explorer will go away.

so, it makes sense to have a strategy for when to use arrow functions.

What I find on the Internet
The Internet is full of sources telling you how you can use arrow functions, how to write them, what are the pros, cons and pitfalls, and what they cannot do.

  • The key difference is how arrow functions work with this.
  • The syntax is shorter, especially for single-argument (needs no parentheses), single-statement (needs no return) functions.
  • Arrow functions don’t work well with object-oriented things (such as constructors and prototype functions)

In short, there are some cases where you can’t use arrow functions, some cases where they offer some real advantages, but in most cases it makes little real difference.
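
A minimal sketch of the difference in how this is resolved. The counter object and its method names are hypothetical and only serve as illustration: an arrow callback keeps the this of the enclosing method, while a function callback gets whatever this the caller (here forEach) provides.

```javascript
// Hypothetical object, only used to illustrate how `this` is resolved.
const counter = {
  count: 0,

  // Arrow callback: `this` comes from the enclosing method, so it is `counter`.
  addAllArrow: function(values) {
    values.forEach(v => { this.count += v; });
    return this.count;
  },

  // function callback: `this` is decided by the caller (forEach),
  // so inside the callback it is not `counter`.
  callbackSeesCounter: function(values) {
    const self = this;
    let same = true;
    values.forEach(function() { if ( this !== self ) same = false; });
    return same;
  }
};

const arrowResult    = counter.addAllArrow([1,2,3]);
const functionResult = counter.callbackSeesCounter([1]);

console.log(arrowResult);     // 6
console.log(functionResult);  // false
```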

Arrow functions allow you to chain sort().filter().map() in very compact ways. With simple single statement arrow functions it is quite nice. But if the arrow functions become multiple lines I think it is poor programming.
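
A short sketch of such a compact chain, with hypothetical data. Here filter() comes first so that sort(), which sorts in place, works on the new array returned by filter() and leaves the original array untouched.

```javascript
// Hypothetical data, only for illustration.
const stuffs = [
  { name: 'pear',   price: 12 },
  { name: 'apple',  price: 7  },
  { name: 'cherry', price: 30 },
  { name: 'banana', price: 5  }
];

// Each step is a single-expression arrow function.
const cheapNames = stuffs
  .filter(stuff => stuff.price < 20)   // keep the cheap ones (new array)
  .sort((a,b) => a.price - b.price)    // cheapest first
  .map(stuff => stuff.name);           // keep only the names

console.log(cheapNames);  // [ 'banana', 'apple', 'pear' ]
```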

What I don’t really find on the Internet
I don’t really find good advice on when to use arrow functions and when not to use arrow functions. I mean, when I program, I make decisions all the time:

  • Should I break this code out into a function?
  • Should this be an object (prototype style) or just data?
  • Should I break this code into its own module?
  • Should I write tests for this?
  • Should I allow a simple, slower algorithm, or should I add effort and complexity to write my code faster?
  • What should be the scope of these variables?
  • Should this be a parameter or can it be hard coded?
  • Can I make good use of map/reduce/every and friends, or is it better I just use a loop?
  • Naming everything…
  • …and so on…

Using, or not using, an arrow function is also a choice. How do I make that choice to ensure my code is good? I don’t really find very clear guidelines or style guides on this.

Lambda functions in other languages
Other languages have lambda functions. Those are special case anonymous functions. The thing I find peculiar about the use of arrow functions in JavaScript is that they are often used instead of function, when a standard function – not a lambda – would have been the obvious choice in other languages.

Intention
For practical purposes most often function and () => {} are interchangeable. And I guess you can write any JavaScript program using only arrow functions.

When you write code, it mostly does not matter what you use.
When you read code, it comes down to understanding the intention of the writer.

So I think good use of arrow functions is a way that makes the intention of the code as clear as possible. I want clear and consistent guidelines.

Using arrow functions in well defined cases shows more intention and contributes to more clear code than never using them.

I tend to read arrow functions as being a strong marker for functional programming. I find it confusing when arrow functions are used in code that breaks other good core principles of functional programming.

The strongest cases
The strongest cases for arrow functions I can see:

Minimal syntax (no () or {} required), and never worth breaking such a function out.

names = stuffs.map(stuff => stuff.name);

Callback: the arguments (error, data) are already given by openFile and the callback function cannot have a meaningful this. Also, for most practical purposes, the callback needs to use closure to access data in the parent scope, so it can not be a named function declared elsewhere.

openFile('myFile', (error, data) => {
  // ... implementation
});

When it makes little difference
For a regular function it makes no difference:

const swapNames = (a,b) => {
  let tmp = a.name;
  a.name = b.name;
  b.name = tmp;
}

The function alternative would be:

function swapNames(a,b) {

and is actually shorter. However, I can appreciate with arrows that it is completely clear from the beginning that a binding of this can never happen, that it can not be used as a constructor and that there can be no hidden arguments (accessed via arguments).
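
A small sketch of those guarantees. The names add, countArgs and owner are hypothetical and only used for illustration.

```javascript
const add = (a,b) => a + b;                            // arrow function
function countArgs(a,b) { return arguments.length; }   // regular function

// A regular function can see "hidden" extra arguments via `arguments`.
const seenArgs = countArgs(1,2,3);

// An arrow function cannot be used as a constructor: `new` throws a TypeError.
let constructorFailed = false;
try {
  new add(1,2);
} catch (e) {
  constructorFailed = e instanceof TypeError;
}

// The `this` of an arrow function cannot be rebound with call()/apply()/bind().
const owner = {
  name: 'owner',
  makeGetter: function() { return () => this.name; }
};
const getName   = owner.makeGetter();
const boundName = getName.call({ name: 'other' });

console.log(seenArgs);           // 3
console.log(constructorFailed);  // true
console.log(boundName);          // 'owner'
```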

Confused with comparison
There are cases when arrow functions can be confused with comparison.

// The intent is not clear
var x = a => 1 ? 2 : 3;
// Did the author mean this
var x = function (a) { return 1 ? 2 : 3 };
// Or this
var x = a <= 1 ? 2 : 3;

Obfuscate with higher order functions
Higher order functions (map, reduce, filter, sort) are nice and can improve your code. But, carelessly used they can be confusing and obfuscating.

These are not the fault of () => {} in itself. But it is a consequence of making higher order functions with arrow functions too popular.

I have seen for example (things like):

myArray.map(x => x.print())

map() should not have a side effect. It is outright obfuscating to feed a function that has a side effect into map(). And side effects have nothing to do with functional programming in the first place.

I have also seen reduce() and filter() being used when every(), some() or find() would have been the right choice. It is obfuscating, it is expensive, and it produces more code than necessary.

The use of arrow functions with higher order functions is only appropriate when the correct higher order function is used.
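
A sketch of what choosing the correct higher order function can look like. The items array is hypothetical. The reduce() and filter() variants give the same answers, but they always visit every element, whereas every(), some() and find() can stop as soon as the answer is known.

```javascript
// Hypothetical data, only for illustration.
const items = [
  { id: 1, ok: true  },
  { id: 2, ok: false },
  { id: 3, ok: true  }
];

// Obfuscating: reduce() used to answer a yes/no question.
const allOkReduce = items.reduce((acc, item) => acc && item.ok, true);
// Clearer: every() says what is being asked.
const allOk = items.every(item => item.ok);

// Obfuscating: filter() builds a whole array just to pick its first element.
const firstBadFilter = items.filter(item => !item.ok)[0];
// Clearer: find() returns the first match (or undefined).
const firstBad = items.find(item => !item.ok);

// some() answers "is there at least one?".
const anyBad = items.some(item => !item.ok);

console.log(allOk, allOkReduce);            // false false
console.log(firstBad.id, firstBadFilter.id); // 2 2
console.log(anyBad);                        // true
```

When a side effect per element really is the goal, forEach() (or a plain loop) states that intention, since unlike map() it returns nothing.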

The abusive cases
Anonymous functions that are non-trivial and could clearly be named and reused (and tested) are clearly bad code:

myStuff.sort((a,b) => {
  if ( a.name < b.name ) return -1;
  if ( a.name > b.name ) return  1;
  if ( a.id   < b.id   ) return -1;
  if ( a.id   > b.id   ) return  1;
  return 0;
});

especially when the code is duplicated or the parent function is large.

An arrow-friendly policy
Admittedly, after doing my research I feel happier with arrow functions than I thought I would.

I suggest using arrow functions as the default function (as long as your runtime supports them). The reason for this is that they do less. I think the standard behavior of arguments, this and of OOP-concepts (prototype and constructors) should be optional and require explicit use (of function).

Just as one-line if-statements and if-statements without {} should be used carefully (I tend to abuse it myself) I think the same applies to arrow functions.

I think this is excellent:

names = stuffs.map(stuff => stuff.name);

but apart from those common simple cases I think the full syntax should be used for clarity:

const compareItems = (a,b) => {
  if ( a.name < b.name ) return -1;
  if ( a.name > b.name ) return  1;
  if ( a.id   < b.id   ) return -1;
  if ( a.id   > b.id   ) return  1;
  return 0;
};

(don’t try to be clever by omitting (), {}, or return).

The use of function should be reserved for

  • constructors
  • prototype functions
  • functions that need the standard behavior of this
  • functions that do things with arguments
  • source files where function is used exclusively since before

Basic good functional programming practices should be especially respected when using arrow functions:

  • Don’t duplicate code: break out anonymous functions to named functions when appropriate
  • Don’t write long functions: break out anonymous functions to named functions when appropriate
  • Avoid side effects and global variables
  • Use the correct higher order function for the job

Also, obviously, take advantage of OOP and function when appropriate!

Callback functions
I think anonymous callback functions should generally be kept short.

const doStuff = () => {
  readFile('myFile', (error, data) => {
    if ( error )
      console.log('readFile failed: ' + error);
    else
      doStuffWithData(data);
  });
};

const doStuffWithData = (data) => {
  ...
};

Performance
In principle, I see no reason why arrow functions should not be at least as fast as regular function. In practice, the current state of JavaScript engines could be disappointing - I don't know.

However, a named static function is typically faster than an anonymous inline function. The JIT can typically optimize a function better the more it is run, so named and reusable functions are preferred.

I have made no benchmarks on arrow functions.

Feedback
I will start using arrow functions when I write new code and I feel enthusiastic about it. I will probably come across things I have not thought about. Do you have any thoughts on this? Let me know!

Want to be a programmer! Where to start?

Quite often I hear (read) someone who wants to become a programmer and asks where to start. Often, not always, they ask what programming language they should learn first. Sometimes they have decided for a language and they ask what operating system, tools and perhaps online services they should use. Sometimes the understanding of programming in particular and computers in general is vague.

The fascinating thing is that such questions can receive very different answers. Different working programmers have completely different ideas on how to become a programmer. Completely.

The most important thing
If you find a way to work with computers and code that keeps you entertained and thrilled, and you spend hours and days feeling curious and enthusiastic, this is a good way for you to learn! A way of learning that works perfectly for someone else, but does not make you enthusiastic at all, will probably not work well for you. Hard work and difficult things are a lot easier if it is fun and it makes sense to you!

Reading advice
When you read the rest of this text, don’t stop if there is a word you don’t understand! For a text written for beginners, it is full of words (interpreter, service, syntax and so on) that you may not be familiar with, at least not in this context. Ignore it and just read on. You can later find the meaning (in the context of computers and programming) of those words on Wikipedia.

The programming ecosystem
Let’s say that programming is the act of making computers do stuff for people.

There is a stack of expertise involved in delivering a service or a product:

  1. Computer science: data structures, algorithms, information theory
  2. Coding: reading and writing code, thinking like a computer, getting it right
  3. Programming language: syntax, keywords and tools specific to a programming language
  4. Libraries and frameworks: code you can reuse to do more with writing less code
  5. The Internet: networking, protocols, formats, security, how it all works
  6. Development environment: your computer, its OS, and the tools you use to code
  7. Production environment: where your code runs, if it is some kind of service
  8. Deployment, test, lifecycle: how to continuously release new versions
  9. Data modelling: how to turn real world information into processable computer data
  10. Requirement analysis: understanding your customer and the market
  11. Team work: different people have different skill sets and work together

Obviously you are not going to have the same high expertise in each of the above areas. Perhaps you have a lot of passion for some things while you are completely uninterested in other things. That is fine.

There is a bit of a catch 22 here. When you already have knowledge you can get involved in a project or company, and work with just a few (or all) of the things above. But when you are a beginner, all those fields of knowledge are quite abstract and useless on their own. So to produce anything that is fun or slightly meaningful you want to work with the entire list, which is obviously kind of impossible (as a beginner). So, it helps to be persistent, to like reading and details, and have quite low expectations on what is fun and meaningful!

Programming is enormously rewarding for the brain. You set out to create something, you work on it, and it works. You get dopamine! You need to find a way to work and learn so you get rewarded often. It depends on your grit, but you should usually feel rewarded and experience success several times per day, both when learning and working.

So when learning to code, you need to find small contained projects that are simple and interesting enough to allow you to succeed and feel successful.

Common advice from programmers is to try different programming languages. I am not so sure about it. I think it is also very important to iterate often and fast from idea to “product”. With time, ideas will be bigger and more complex. To do that, it makes sense to master a language, the tools and the ecosystem, rather than just learning more of them.

I will discuss a few platforms from a beginner’s perspective.

Arduino
You can buy an Arduino starter kit. It comes with everything you need (except a computer; Mac, Windows or Linux does not matter). It comes with a book with projects that take a few hours to complete. No previous knowledge is required: the Arduino is designed for non-programmers (children, artists) to create stuff. When you have completed the projects you can modify them and experiment. When you do this you will learn to write the code needed to achieve what you want.

The Arduino is a very self-contained ecosystem where you can iterate quickly. The code you will write is very basic C-code (actually C++). But you don’t need to know that or think about it.

Later when you want to write other code, not just for Arduino, most everything you have learnt on the Arduino is useful. But more complex ecosystems have many more aspects to consider.

Hackerrank.com
There are many such sites, but Hackerrank is the one I have experience with.

Hackerrank offers a wide range of “problems” to solve online in (almost) any programming language you want. It is free, requires nothing to be installed or configured on your computer, and you get (for training purposes) relevant, well defined problems and a contained environment to work with them.

Hackerrank is great for learning new languages, data structures and algorithms. You will need a reference or language tutorial elsewhere (but for relevant languages you can find it online). There are things you will not learn on Hackerrank: how to configure your own system, more advanced tools, code that interacts with the user, filesystem or network, and error handling. But it is quite fine to master a language and algorithms first.

iOS
I have no experience with iOS (or macOS) development. But if you have a Mac, an iOS device (iPhone or iPad) and you get a beginner’s book, you have everything you need to make real iOS apps that you can sell for money.

Apple also has a Swift Playground app for iPad (Swift is the preferred programming language of iOS).

It seems like a good idea to me to learn to iterate from idea to working App in such a contained (and, for good and bad: walled, protected, restricted and designed) ecosystem.

Swift may not be the most useful language outside the Apple world. But it is a modern language that has much in common with other common languages (such as Java, C#, Rust, Python).

Automation with the shell
If your objective is to automate server configuration/operation/maintenance, look at bash for Linux and PowerShell for Windows. Don’t expect to become a “real” programmer, but it is the way to get your problems solved. Be sure to be aware of the commands/utilities available in your environment (use sort/grep instead of implementing similar functionality in bash).

Python
Python is a very good language to learn. It is a simple, clean, well documented, widely used language that works equally well in macOS, Linux and Windows.

Python is suitable for simple and advanced mathematical applications and simulations. It is suitable for parsing, processing and outputting data and to interface with databases: automation and integration.

Web
The web is generally a difficult ecosystem for a beginner. The problem is that many things come into play. Let’s say we want to write a simple shopping list. Typically you need to deal with a database to store data, backend code for APIs (with authentication/security), http for transporting data and html+css+javascript for the application itself. Also, you need to think about hosting and domain registration. You end up with several programming languages (for example SQL, PHP and JavaScript) even for a simple application. Not only is the web browser (http+html+css+javascript) a quite cumbersome programming environment, you also need to consider different web browsers.

Nevertheless, the web is probably the most relevant ecosystem to develop applications for! But perhaps you should not learn programming by coding for the web.

Web: WordPress
If you need to deliver websites in the form of a blog (like a little newspaper) or perhaps a little webshop, WordPress can be amazing!

Note that WordPress is based on LAMP (Linux, Apache, MySQL and PHP) which is a rather complex mess. But if you can ignore that (find a hosted solution, or just follow instructions without thinking and questioning too much) WordPress can be very productive. You will learn PHP and JavaScript as you need to do more advanced things. These are perhaps the worst two languages out there for the purpose of learning programming, but perhaps the most productive languages when it comes to delivering content and features.

Web: Node.js
You can build web applications with Node.js. The advantage is that you can use JavaScript both on the server and in the web browser and keep your toolbox smaller. However, it is very possible to grow your toolbox enormously with npm (the package manager for Node.js). I don’t think Node.js-based web applications, or JavaScript, are suitable for beginners. But if you are a beginner and you want to program web applications, it is probably your best choice.

Desktop
Perhaps 20 years ago, programming was much about building desktop applications (programs with a graphic user interface running in Windows, macOS or Linux). This is, I would say, quite a niche field in programming nowadays (more commonly, programmers develop applications for iOS/Android, for the web, or server code for internal use).

Games are obviously a significant part of Desktop programs.

Desktop is quite complex and qualified programming. If you want to do it for macOS only, get a Mac with Xcode and get a beginner’s book. If you want to write platform independent desktop applications (Linux + macOS + Windows) have a look at Qt (which is, kind of, C++, and very nice). For Windows only, ask someone else.

If this is what you want to do, look at Hackerrank above, and stick to C, Swift (for macOS) or C# (for Windows), to first learn the fundamentals of programming. When you know more, go on experimenting with the desktop.

Android
I don’t know what is the best way for a beginner to program for Android. I would say, start coding for iOS to learn “mobile” (and you reach an equally big audience/market with iOS). When you are a proficient iOS developer, I think picking up Android is no big deal.

Very simple games
If you want to develop very simple (retro) games, have a look at PICO-8. It is a (non-free) programming environment for building simple games for a virtual game console. These games can be deployed to and played in a web browser or most computers.

The language is Lua, a very simple language that is also useful for other purposes.

Deep knowledge – computers and systems
If you want to understand computers, operating systems, security and the internet: learn C (not C++, not C#, not Objective-C, just C). To learn, I suggest you get some tutorial (like the book Learn C the Hard Way). Make sure to know C99 (C is standardised: learn that and use it consistently)! I suggest you start with exercises or problems on Hackerrank (or a similar site or tutorial) until you get rather comfortable writing C.

All major operating systems are written in C, as is a lot of the infrastructure that operates the internet. C is “unsafe” and the cause of many security issues in computer systems. This means that to understand the nature of these problems it really helps to know C. Many other languages (or strictly, their runtime/interpreter) are themselves written in C. They need to be, to talk to the operating system (which they need to do for almost anything). C is not going away and it is fundamental to almost every computer we see.

C++ is technically a superset of C (that is, C with more features and a few exceptions). So it can appear that C++ is better. But they are two languages with very different “style”. You should solve problems very differently in them. C++ has merits of its own, but for the purpose of deep understanding of computers and operating systems, go for C. C# is a language that mostly resembles Java. Objective-C is also technically a superset of C, but it is a rarely used language that you most likely can ignore.

To go even deeper you can learn Assembly language. Most likely it makes no sense for you to do it. At least not in the beginning of your learning.

Deep knowledge – math and computer science
If you are fascinated with math and you like an academic approach to things you can look into functional programming. This is where programming gets beautiful – if you have sensitivity for that kind of aesthetics. But it is not where you solve most practical problems.

Haskell is for purists. LISP mixes pragmatism with myth. But many modern programming languages (for example Java, Swift, Rust, JavaScript, C++, C#) incorporate practical aspects of functional programming.

LISP (Common LISP to be precise) has very capable built-in support for math (fractions, complex numbers, arbitrarily large numbers). If you are a mathematician you may find most other languages unsatisfying.

While C more than anything else focuses on making a computer do exactly WHAT you instruct (program) it to do, functional languages are more like programming with mathematical definitions (functions).

Conclusions
When you know programming in general, you understand how the internet and a computer work, you are familiar with established standards and you know a few programming languages, it is pretty easy to learn new languages and tools.

So what language you learn first matters not so much. What matters is that you learn to go from idea to product, and that you know how to do things properly (write clean, efficient, effective, secure and correct code).

To do that, you more than anything else need to work with things that you find challenging, interesting and fun.

Programming is so much more than programming languages: it is about attention to details, understanding the real world, understanding people, making beautiful things, keeping things simple and trying often and failing fast.

Minimalistic Services and Applications

Question: There is plenty of documentation, patterns, architectures and practices for scaling up your cloud Services and Applications solution. But how do I scale it down?

In 2015 I set up a minimalistic architecture for delivering Services and Web Applications. It was based on 15 years of experience (not only positive) of constructing and operating applications, services, servers and integrations. Now in 2018 I can say that my architecture is doing very fine. I have continuously been delivering business value for 3 years. I will share some principles and technical details.

Limitations and reservations
Not all solutions are good for everyone. Neither is mine. If you know that you want worldwide internet scalability my architecture is not for you. But often you are building applications that are internal for your organisation. Or you have a local/regional physical business that you need to support with services and applications. Then you know that there is a practical upper limit of users that is not very high.

While modern cloud services are supposed to scale more or less without limit, this does not come for free. It comes with complexity and drawbacks that you may not want to pay for, since you are not aiming for those gigantic volumes of users and data anyway.

The architecture I am presenting is designed both to perform and to scale. But within limits. Know your limits.

Microservices
Microservices is about many things. I have made a practical interpretation.

My… delivery platform… consists of microservices that communicate with each other. They should have limited responsibilities and they should not grow too big. Each service should store its own data, really. Two different services should share data via their (public) APIs, never by using a shared storage.

I ended up with a separate Authentication Service (knowing about users and credentials) and Roles Service (knowing about roles/privileges granted to a user). In hindsight perhaps this could, or should, have been just one service. On the other hand, if I want to store something like personal Settings/Preferences for each user, perhaps it is good that it does not go to a common single User service that grows more complex than necessary.

As you may know, there is another Microservice principle about each service being able to run in multiple instances and that (via Event Sourcing and CQRS) state is not immediately consistent, but eventually consistent. I normally outright break this principle saying that a single service has a single instance holding the single truth. I feel ok doing this since I know that each service is not too big, and can be optimized/rewritten if needed. I also feel ok doing this because I know I save a lot of complexity and my approach opens up for some nice optimizations (see below).

It is all about HTTP APIs
My microservices talk to each other over HTTP in the simplest possible way. The important thing is that your web applications, native (mobile) applications, external partners and IoT-devices use the same APIs.

I want it to be trivial to connect to a service using wget/curl, an Arduino, or any left behind environments my clients may be using. I also want any server platform to be capable of exposing APIs in a conforming way.

What I basically allow is:

http://host:port/ServiceName/Target/Action?token={token}&...your own parameters

Your service needs to have a name and it is in the URL. Target is something like Order or Customer. Action is something like Update or Cancel. token is something you need to obtain from the Authentication Service before making any calls. You can have extra parameters, but for more data it is preferable to POST a JSON object.
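As a minimal sketch of what this URL convention means in code: the function below splits a URL into service, target, action, token and the remaining parameters. The name parseApiUrl and the service name Shop are made up for illustration, and the restriction to letters and digits in names is my assumption here.

```javascript
'use strict';

// Hypothetical helper: splits a URL of the form
//   /ServiceName/Target/Action?token=...&your=parameters
// into its parts, and throws if the URL does not conform.
function parseApiUrl(rawUrl) {
  // The base is only used when rawUrl is relative (as it is in req.url).
  const url = new URL(rawUrl, 'http://localhost');
  const match = /^\/([A-Za-z0-9]+)\/([A-Za-z0-9]+)\/([A-Za-z0-9]+)$/.exec(url.pathname);
  if (!match) {
    throw new Error('Expected /ServiceName/Target/Action, got ' + url.pathname);
  }
  const params = Object.fromEntries(url.searchParams);
  const token = params.token;
  if (!token) {
    throw new Error('Missing token');
  }
  delete params.token; // what remains are "your own parameters"
  return { service: match[1], target: match[2], action: match[3], token, params };
}

const call = parseApiUrl('http://host:8080/Shop/Order/Cancel?token=abc123&id=42');
console.log(call.service, call.target, call.action, call.params);
```

Anything that does not have exactly three path segments and a token is rejected before any service code runs.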

I don’t want any extra headers (for authentication, cookies or whatever) but I respect Content-Type and it should be correct. Absolutely no non-standard or proprietary headers.

I only use GET and POST. It just doesn’t get clear and obvious enough if you try to be smart with PUT and DELETE.

For things like encryption (HTTPS) and compression (gz) I rely on nginx.

Reference Implementation
The above principles constitute the Architecture of a number of services together making up a virtual application and service platform. As you can see you can build this with almost any technology stack that you want. That is the entire point!

  • You may want to make API calls from known and unknown devices and systems in the future
  • You may want some legacy system to be part of this virtual delivery platform
  • You may want to build some specific service with some very specific technology (like a .NET service talking to your Active Directory)
  • You may find a better technology choice in the future and migrate some of your services from current technology

But for most purposes you can build most services and applications using a few, simple, free and powerful tools. More important than the tools themselves are established standards (HTTP, HTML, JavaScript and CSS) and principles about simplicity and minimalism.

JSON and JavaScript
After working for years with integrations, web services (SOAP), XML, SQL databases and .NET I can say that the following type of technology stack is common:

  1. Web application is written in JavaScript, works with JSON
  2. Web application communicates with server using XML
  3. Server processes data using .NET/C#
  4. Data is persisted using SQL queries and a relational database

This means that a single business object (such as an Order) has data representations in SQL, C#, XML and JSON. This means that you have several mappings or transitions in both directions. You also cannot reuse business logic written in SQL, C# or JavaScript in another layer of your application.

With Node.js you have the opportunity to do:

  1. Web application is written in JavaScript, works with JSON
  2. Web application communicates with server using JSON
  3. Server processes data using JavaScript and JSON
  4. Data is persisted in JSON format (either in files or a database like MongoDB)

This is simply superior. A lot of problems just disappear. You can argue about Java vs C#, HTTP1 vs HTTP2, Angular vs React and things like that. But you just can’t argue about this (the fundamental advantage a pure JS stack gives you – not because JavaScript is a superior language but because it is the de facto language of the web).

So my reference platform is based on Node.js and I store my data in JSON.

Binary formats
Binary formats have their advantages. It is about power and efficiency. But the differences are rarely significant. Base64-encoded data is about 33% larger than the original binary. Compiled languages are somewhat faster and use less memory. But humans can’t read binary. The compilation (or transpilation) into binary (or machine generated code) is not only an extra step requiring extra tools. It also creates a longer distance between the programmer and source code on one hand and the execution and its error messages on the other hand. Source maps are a remedy to a problem that can be avoided altogether.

I was once responsible for a .NET solution (with Reporting Services) that was so hard to change and deploy that we eventually refused to even try. I realised that if the system had been coded in the worst imaginable PHP I could have made a copy of the source (in production), modified the system (in production) and restored the system if my changes were not good.

Similar problems can appear with databases. Yes, you can make a backup and restore a database. But how confident do you feel that you can just restore the database and the system will be happy? What is IN the database backup/restore, and what is configuration outside that database that might not be trivial or obvious to restore (access rights, collation settings, indices and stored procedures, id-counters, logging settings and so on)?

So my reference platform minimises the use of binary formats, build steps and databases. I code plain JavaScript and I preferably store data in regular files. Obviously I use native file formats for things like images and fonts.

Storage
Some applications have more live data than others. I have rarely come across very large amounts of transaction or record data. I have very often come across applications with little data (less than 100MB) and a truly complex relational database. I have also seen relational databases with not too much data (as in 1GB) with severe performance problems.

So, before architecting your solution for 10-100GB+ data, ask yourself if it can happen. And perhaps, if it eventually happens it is better to deal with it then?

Before constructing a relational data model with SQL, ask yourself if it is really worth it.

Since we are using a microservice strategy and since services share data via their APIs two things happen:

  1. Most services might get away with very little (or no) data at all (while some have much data)
  2. A service that later turns out to need to deal with more data than it was first built for can be refactored/rebuilt without affecting the other services

So I suggest, if in doubt, start small. What I do (somewhat simplified) is:

  1. Start up Node.js service
  2. Load data from local files into RAM
  3. All RO-access is RAM only
  4. When data is updated, I write back to file within 10s (typically all of it every time, but I keep different kinds of data in different files).
  5. Flush data before shutting down

This has an advantage which is not obvious. JavaScript is single-threaded (but Node.js has more threads) so a single request is guaranteed to finish completely before the next request starts (unless you make some async callback for waiting or I/O). This means that you have no transaction issues to deal with – for free – which significantly simplifies a lot of your request handling code and error handling.

Another advantage is that RAM is extremely fast. It will often be faster and cheaper to “just access all the data in RAM” than to fetch a subset of the data from a database and process it.

This may sound like “reinventing the wheel”. But the truth is that the above 1-5 are very few lines of quite simple code. You can use functions like map(), reduce() and filter() directly on your data without fetching it (async) first. That will save you lines of code.
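To give an idea of how little code steps 1-5 need, here is a minimal sketch. It is not my production code: createStore, touch and flush are names made up for this example, the data is assumed to be one plain object per file, and the example writes to a temporary directory.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: keeps one kind of data in RAM, backed by one JSON file.
function createStore(file, delayMs) {
  // Step 2: load everything into RAM at startup (empty object if no file yet).
  const data = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : {};
  let timer = null;

  // Steps 4 and 5: write all of it back to the file.
  function flush() {
    if (timer) { clearTimeout(timer); timer = null; }
    fs.writeFileSync(file, JSON.stringify(data));
  }

  // Step 4: call after an update; many updates within delayMs give one write.
  function touch() {
    if (!timer) {
      timer = setTimeout(flush, delayMs);
      timer.unref(); // a pending write alone should not keep the process alive
    }
  }

  return { data, touch, flush };
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'store-'));
const file = path.join(dir, 'orders.json');
const store = createStore(file, 10000);

store.data['1001'] = { customer: 'Alice', total: 250 };
store.data['1002'] = { customer: 'Bob', total: 40 };
store.touch();

// Step 3: reads are plain synchronous JavaScript on data that is already in RAM.
const bigOrders = Object.values(store.data).filter(o => o.total > 100);
console.log(bigOrders.length);

store.flush(); // Step 5: flush before shutting down
```

Since the update and the call to touch() happen synchronously, no other request can see a half-updated object.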

Again, this may not work for all your services for all future, but it is surprisingly easy and efficient.

Code, Storage, Configuration and installation
When I check out my (single) git repository I get something like:

packages/                         -- all my source code and dependencies
tools/                            -- scripts to control my platform,
                                     and a few other things

I then copy the environment template file and run install (to make node_modules from packages):

$ cp tools/env-template.json dev.json
$ ./tools/install.sh

This config file can be edited to replace “localhost” with something better and to decide what services should run on this machine (here, in this directory) and where other services run if I use different machines. Now I start the system, and now I have:

$ node tools/run dev.json ALL     -- use dev.json, start ALL services

dev.data/                         -- all data/state
dev.json                          -- all environment configuration
node_modules/
packages/                         -- all code
tools/

I can now browse the services on localhost:8080, but to log in I first need to create an Admin user using a script in tools (that just calls an API function).

Notice how easy it is to start a new environment. There are no dependencies outside packages. You may create a dev-2.json which will then live in dev-2.data side by side with dev. To backup your state you can simply backup dev.data and move it to any other machine.

Let’s have a look at dev.data (the files for one service):

Authentication.localstorage/     -- all data for one service
Authentication.log/              -- a log file for one service (kept short)

In packages you find:

common/                          -- JavaScript packages that can be used
                                    on Node as well as web
node/                            -- Node-only-packages
services/                        -- Node-packages containing services
web/                             -- JavaScript packages that can be used
                                    on the web only

You should include tests on different levels (unit, integration) in a way that suits you. The above is somewhat simplified, but on the other hand in hindsight I would have preferred some things to be simpler than I actually implemented them.

Notice that there are no build scripts and no packaging required. All node code is executed in place and web applications load and execute files directly from packages/.

Serving files, input validation, proxy and nginx
Node.js is very capable of serving files (and APIs) just as it is. I have written a custom Node.js package that services use to handle HTTP requests. It does:

  • Validation of URL (that URLs conform to my standards)
  • Authentication/authorization
  • Is it a file or an API call
  • Files: serve index.js from common/ and web/, and www/ with all contents from all packages
  • APIs: validate target, action (so it exists), validate all URL-parameters (dates, numbers, mandatory input, and so on)

This may seem odd but there are a few good reasons for doing exactly this.

  1. Service APIs and policies are metadata-driven
  2. Consistent good logging and error messages
  3. Consistent authorization and 401 for everything questionable (both for files and APIs)
  4. The same service serves both API and www-files which eliminates all need to deal with cross-site issues (which is something of the least value-adding activity imaginable)
  5. Consistent input validation (if there is anything I don’t trust people get right every time they write a new service this is it)
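A minimal sketch of what metadata-driven validation can look like follows. The metadata format, the api object and validateCall are invented for this example and are not the format my own package uses; the point is only that a new service declares its parameters and never writes its own validation code.

```javascript
'use strict';

// Hypothetical metadata for one service: target -> action -> parameter rules.
const api = {
  Order: {
    Cancel: {
      id:     { type: 'number', mandatory: true },
      reason: { type: 'string', mandatory: false }
    }
  }
};

// URL parameters arrive as strings, so "number" means "a string that is a number".
const checks = {
  number: v => String(v).trim() !== '' && Number.isFinite(Number(v)),
  string: v => typeof v === 'string'
};

// Only look at own properties, so that names like "constructor" are not accepted.
const own = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns a list of error messages; an empty list means the call is valid.
function validateCall(target, action, params) {
  if (!own(api, target) || !own(api[target], action)) {
    return ['Unknown target/action: ' + target + '/' + action];
  }
  const rules = api[target][action];
  const errors = [];
  for (const name of Object.keys(rules)) {
    if (!own(params, name)) {
      if (rules[name].mandatory) errors.push('Missing parameter: ' + name);
    } else if (!checks[rules[name].type](params[name])) {
      errors.push('Invalid ' + rules[name].type + ': ' + name);
    }
  }
  for (const name of Object.keys(params)) {
    if (!own(rules, name)) errors.push('Unexpected parameter: ' + name);
  }
  return errors;
}

console.log(validateCall('Order', 'Cancel', { id: 'abc', debug: '1' }));
```

The same metadata can drive logging and error messages, which is what keeps them consistent across services.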

You can probably do this on top of Express, or with Express, if you prefer not to use Node.js standard functionality.

At this point, each service listens at localhost:12345 (different ports) so you need a proxy (nginx) that listens to 80 and forwards to each service (remember the service name is always in the URL).

I prefer each service to handle all its API calls. Quite often it just forwards them to another service to do the actual job (let’s say a user action of the Order service should create an entry in the Log service: the Order web UI calls Order/log/logline, which in turn calls the Log service). This can be very easily achieved: after authentication/authorization you just send the request through (standard Node.js does this easily).
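Sending a request through can be sketched with nothing but the standard http module. The services table, the port number and the function names below are assumptions made for this example; authentication/authorization is assumed to have happened before forward() is called.

```javascript
'use strict';
const http = require('http');

// Hypothetical table of where the other services listen.
const services = new Map([
  ['Log', { host: 'localhost', port: 12346 }]
]);

// Builds the options for the outgoing request: same method, the path of the
// target service, and no headers except Content-Type.
function forwardOptions(req, serviceName, path) {
  const target = services.get(serviceName);
  if (!target) throw new Error('Unknown service: ' + serviceName);
  const headers = {};
  if (req.headers['content-type']) {
    headers['content-type'] = req.headers['content-type'];
  }
  return { host: target.host, port: target.port, method: req.method, path, headers };
}

// Streams the incoming request to the other service and its answer back.
function forward(req, res, serviceName, path) {
  const out = http.request(forwardOptions(req, serviceName, path), (answer) => {
    res.writeHead(answer.statusCode, answer.headers);
    answer.pipe(res);
  });
  out.on('error', () => {
    if (!res.headersSent) res.writeHead(502);
    res.end();
  });
  req.pipe(out); // works for both GET (empty body) and POST (JSON body)
}
```

Inside a request handler this would be used as forward(req, res, 'Log', '/Log/...'), with the path being whatever the Log service expects.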

Dependencies
The web has more npm packages than anyone can possibly want. Use them when you need (if you want, read Generic vs Specific Code, Lodash and Underscore Sucks, …).

My biggest fear (really) is to one day check out the source code on a new machine and not be able to install dependencies, build it, test it, run it and deploy it. So I think you should get rid of dependencies and build, and rather focus on testing, running and deployment.

I think, when you include a dependency, place it in packages/ and push it to your repository. Then you are in control of updating the dependency when it suits you. New dev/test/prod machines will get your proven and tested versions from packages/, regardless of what the author did to the package.

This approach has both advantages and disadvantages. It is more predictable than the alternatives and I like that more than anything else.

Error handling
I take error handling seriously. Things can get strange in JavaScript. You should take the differences between numbers and strings, objects and arrays seriously (that’s why you should not use Lodash/Underscore). There are no enums to safely use with switch-statements. I often add throw new Error(…) to code paths that should not happen or when data is not what I expect.
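Two small examples of what I mean. The order states and function names are made up for illustration.

```javascript
'use strict';

// States are plain strings, not enums, so an unknown value has to be
// caught explicitly in the default branch.
function nextState(state) {
  switch (state) {
    case 'new':     return 'paid';
    case 'paid':    return 'shipped';
    case 'shipped': return 'closed';
    default:
      throw new Error('Unexpected order state: ' + JSON.stringify(state));
  }
}

// Numbers and strings are not interchangeable: '7' + 1 is '71', not 8.
function addOne(n) {
  if (typeof n !== 'number' || Number.isNaN(n)) {
    throw new Error('Expected a number, got ' + typeof n);
  }
  return n + 1;
}

console.log(nextState('new'), addOne(7));
```

A misspelled state such as 'Paid' now stops the request immediately instead of silently falling through.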

On the (Node.js) server I don’t have a big try-catch around everything to make sure the server does not crash. I also don’t restart services automatically when they fail. I write out a stack trace and let the server exit. This way I always work with a consistent, correct state. Critical errors need to be fixed, not ignored. This is the Toyota way – everyone has a red button to stop production if they see anything fishy. In effect my production system is among the most stable systems I have ever operated.

Validation, models and objects
Data validation is important. Mostly, the server needs to validate all data sent to it. But a good UX requires continuous validation of input as well.

I put effort into defining models (basically a class in an OO language). But since my data objects are regularly sent over the network or fetched from disk I don’t want to rely on prototypes and member functions. I call each object type a model, and early on I write a quite ambitious validation-function for each model.
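As an illustration, a validation function for a model could look like the sketch below. The Customer model and its properties are hypothetical, and a real validation function would typically be more ambitious than this.

```javascript
'use strict';

// Hypothetical model: a Customer is a plain object { id, name, email? }.
// Throws on the first problem and returns the object otherwise.
function validateCustomer(c) {
  if (c === null || typeof c !== 'object' || Array.isArray(c)) {
    throw new Error('Customer must be an object');
  }
  if (!Number.isInteger(c.id) || c.id < 1) {
    throw new Error('Customer.id must be a positive integer');
  }
  if (typeof c.name !== 'string' || c.name.trim() === '') {
    throw new Error('Customer.name must be a non-empty string');
  }
  if (c.email !== undefined && (typeof c.email !== 'string' || !c.email.includes('@'))) {
    throw new Error('Customer.email must be a string containing @');
  }
  for (const key of Object.keys(c)) {
    if (!['id', 'name', 'email'].includes(key)) {
      throw new Error('Customer has unknown property: ' + key);
    }
  }
  return c;
}

// The object carries no prototype or member functions, so it is still a
// valid Customer after a round trip over the network or to disk.
const copy = validateCustomer(JSON.parse(JSON.stringify({ id: 7, name: 'Alice' })));
console.log(copy);
```

Because the function only depends on the data, the same file can be used on the server and for input validation in the web UI.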

Sharing code between Node.js, web (and AngularJS)
I want my code (when relevant) to be usable on both Node.js and the Web. The Web used to mean AngularJS but I have started not using it.

This is what I do:

 /*
  * myPackage : does something
  *
  * depends on myUtil.
  */
(function() {
  'use strict';

  function myFactory(myUtil) {

    function doSomething(str) {
      ...
    }

    return {
      doSomething : doSomething
    };
  }

  if ('undefined' !== typeof angular) { // angular
    angular.module('mainApplication').factory('myPackage',
                  ['myUtil',
          function( myUtil ) {
      return myFactory(myUtil);
    }]);
  } else if ( 'undefined' !== typeof MYORG ) { // general web
    MYORG.myPackage = myFactory(MYORG.util);
  } else if ( 'undefined' === typeof window ) { // nodejs (probably)
    module.exports = myFactory( require('common/util') );
  } else {
    throw new Error('Neither angular, node or general web');
  }
})();

This way exactly the same source code can be used both on the web and in Node.js. It requires no build step. The “general web” approach relies on a global object (call it what you want) and you may prefer to do something else. You just need to make sure you can serve common/util/index.js and common/mypackage/index.js to the web.

Scaling and cloud technology
For a simple development system, perhaps for a test system or even for a production system, everything can live in a single folder. If you need more power or separation you can put each service in a Docker container. You can also run different (groups of) services as different users on different machines.

So, the minimalistic architecture easily scales to one service per machine. In practice you can run a heavy service on a single machine with 16GB RAM (or more) which will allow for quite a lot of RW-data. 16GB or more RAM is quite cheap compared to everything else.

Scaling and more data storage
There are many other possible strategies for a service that needs more storage than easily fits in RAM (or can be justified in RAM).

Some services (like a log) are almost exclusively in Write mode. You can keep just the last day (or hour) in RAM and just add new files for every day. It is still quite easy and fast to query several days of logs when needed.
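A log that keeps only the current day in RAM and one file per day on disk could be sketched like this. The file naming, the one-JSON-object-per-line format and the createLog name are assumptions for the example.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical log store: one file per (UTC) day, one JSON object per line.
function createLog(dir) {
  let today = null; // 'YYYY-MM-DD' of the lines currently held in RAM
  let lines = [];   // only the current day is kept in RAM

  const fileFor = day => path.join(dir, day + '.log');

  // Reads one day from its file (empty if there is no file for that day).
  function load(day) {
    if (!fs.existsSync(fileFor(day))) return [];
    return fs.readFileSync(fileFor(day), 'utf8')
      .split('\n')
      .filter(Boolean)
      .map(l => JSON.parse(l));
  }

  function add(message, now) {
    const time = (now || new Date()).toISOString();
    const day = time.slice(0, 10);
    if (day !== today) { // first call or a new day: replace what is in RAM
      today = day;
      lines = load(day);
    }
    const line = { time, message };
    lines.push(line);
    fs.appendFileSync(fileFor(day), JSON.stringify(line) + '\n');
  }

  // The current day is answered from RAM, older days from their files.
  function read(day) {
    return day === today ? lines.slice() : load(day);
  }

  return { add, read };
}

const logDir = fs.mkdtempSync(path.join(os.tmpdir(), 'log-'));
const log = createLog(logDir);
log.add('order created', new Date('2020-01-01T23:59:00Z'));
log.add('order paid', new Date('2020-01-02T00:01:00Z'));
console.log(log.read('2020-01-01').length, log.read('2020-01-02').length);
```

Querying several days is then just a matter of calling read() for each day and concatenating the results.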

Some services (like a customer statistics portal) have mostly RO-data that is not regularly accessed, and that lives in “islands”. Then you can have (load from other systems) a JSON-file for each customer. When the customer logs in you load that file to memory and later you can just free that memory again. Such a service can also be divided into several services: 1 main RW, 1 RO (A-L), 1 RO (M-Z).
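The A-L / M-Z split only needs a small routing function in the main service. The instance names below are made up, and routing on the first letter of the customer name is just one possible rule.

```javascript
'use strict';

// Hypothetical read-only instances of a statistics service.
const instances = [
  { name: 'Stats-AL', from: 'A', to: 'L' },
  { name: 'Stats-MZ', from: 'M', to: 'Z' }
];

// Picks the instance that holds the data for a customer. Names that do not
// start with A-Z are rejected here and would need a rule of their own.
function instanceFor(customerName) {
  const first = customerName.charAt(0).toUpperCase();
  const hit = instances.find(i => first >= i.from && first <= i.to);
  if (!hit) throw new Error('No instance for customer: ' + customerName);
  return hit.name;
}

console.log(instanceFor('acme'), instanceFor('Nordic'));
```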

Some services will do expensive processing or perhaps expensive communication/integration with other systems. Such processing or integration can be outsourced to a dedicated service, freeing up resources in the main service. If you for example generate a PDF, make sure you do it in a process outside Node.js.

In the same way a service can offload storage to another service (which could possibly be a MongoDB).

Web files (html, css, images, js) can be cached by nginx (if you accept to serve them without authentication) and served virtually for free even if your service has full control.

Things like logging can also be outsourced to a dedicated and enterprise class logging software. Nevertheless, it is good to have a simple reference Node.js logging service that can be used for development purposes locally.

Finally, GDPR indicates that you should throw away data. You can also move data from a live system to a BI-system or some Big Data tool. Perhaps your architecture does not need to support data growth for 10+ years – perhaps it is better it does not.

Scaling – conclusion
These scaling strategies may not sound too convincing. But the truth is that building your entire system in a single very powerful monolith is probably going to be less scalable. And building everything super scalable from the beginning is not easy or cheap (but if that’s what you really need to do, go ahead).

Integration testing
Notice how integration testing can be achieved locally, automated, with virtually no side effects:

  1. Generate an integration-env.json
  2. Start up services (as usual)
  3. Run tests to inject data into the services (through standard APIs)
  4. Run tests to read data, query data
  5. Shut down services
  6. Remove integration-env.json and integration-env.data/

Source control and repositories
For now, I have all code in a single git repository. It would be easy to use multiple repositories if that simplifies things (when developing multiple independent services at the same time). Linux is in a single git repository so I think my services and applications can be too.

Tooling
All developers prefer different tools and I think this should be respected. I also think coding style does not need to be completely consistent across services (although single files should be kept consistent).

But just as developers should be allowed their own tools, the artifacts of those tools should not make the repository dirty. And the next developer should not need to use the same tools as the previous to be able to keep working on the code.

Web frameworks
If I mastered direct DOM-manipulation I would probably suggest that you should too (and not use any web frameworks). However I have been productive using AngularJS (v1) for years. Since AngularJS is inevitably getting old I have started using Vue.js instead (which I think is actually a better choice than Angular, however check my post about loading vue templates).

React is also a fine framework but it requires a build process. For my minimalistic approach that is a very high and unnecessary addition of complexity. I don’t see any indications that React is fundamentally more productive or competent than Vue.js so I think you are fine with Vue.js (or jQuery or Vanilla.js if you prefer).

Performance
I have, to be honest, not had the opportunity to add very many simultaneous users to my system. On the other hand I have used it for rather mission critical services for 3 years with very few issues. So this architecture has served me well – it may or may not serve you well.

My production environment consists of a single VPS with 1 core, 2GB RAM and 20GB storage. Performance is excellent and system load minimal.

Missing Details
Obviously there are a lot of details left out in this post. You don’t have to do things exactly the way I did. I just want to outline an architecture based on minimalistic principles. The details of users, authentication, logging and naming conventions are of course up to you to decide.

Feel free to ask though! I am open to discussion.

Conclusion and final words
I wrote this post quickly and I will probably add more content in the future (and correct/clarify things that could be improved).