Category Archives: Programming

Vue.js: loading template html files

You may want to code your Vue.js application in such way that your html templates are in separate html files, but you still do not want a build/compile step. Well, the people writing Vue dont want you do do this, but it can easily be done.

VueWithHtmlLoader-library
I wrote a little library that simply does what is required in a rather simple way. I will not hold you back and I will show you by example immediately:

  • A Rock-paper-scissors Vue-app, all in 1 file: link
  • A Rock-paper-scissors Vue-app, modularised with separate html/js files: link
  • Source of VueWithHtmlLoader library: link

These are the code changes needed to use VueWithHtmlLoader:

 * 1) After including "vue.js", and
 *    before including your component javascript files,
 *    include "vuewithhtmlloader.js"
 *
 * 2) In your component javascript files
 *    replace: Vue.component(
 *       with: VueWithHtmlLoader.component(
 *
 *    replace: template: '...'
 *       with: templateurl: 'component-template.html' (replace with your url)
 *
 * 3) The call to "new Vue()" needs to be delayed, like:
 *    replace: var myVue = new Vue(...);
 *       with: var myVue;          
 *             function initVue() {
 *               myVue = new Vue(...);
 *             }
 *             VueWithHtmlLoader.done(initVue);

My intention that the very simple Rock-paper-scissors-app shall work as an example.

Disclaimer: the library is just written and tested only with this application. The application is written primarily to demonstrate the library. The focus has been clarity and simplicity. Please feel free to suggest improvements to the library or the application, but keep in mind that it was never my intention to follow all best practices. The purpose of the library is to break a Vue best practice.

What the library does:

  1. It creates a global object: VueWithHtmlLoader
  2. It provides a function: VueWithHtmlLoader.component() that you shall use instead of Vue.component() (there may be unsupported/untested cases)
  3. When using VueWithHtmlLoader.component(), you can provide templateurl:’mytemplate.html’ instead of template:’whatever Vue normally supports’
  4. The Vue()-constructor must be called after all templateurls have been downloaded. To facilitate this, place the code that calls new Vue() inside a function, and pass that function to VueWithHtmlLoader.done()
  5. The library will now load all templateurls. When an html template is successfully downloaded over the network Vue.component() is called normally.
  6. When all components are initiated, new Vue() is called via the provided function

Apart from this, you can and should use the global Vue object normally for all other purposes. There may be more things that you want to happen after new Vue() has been called.

The library has no dependencies (it uses XMLHttpRequest directly).

Background
Obviously there are people (like me) with an AngularJS (that is v1) background who are used to ng-include and like it. We see Vue as a better, smaller AngularJS for the future, but we want to keep our templates in separate files without a build step.

As I see it, there are different sizes of applications (and sizes of team and support around them).

  1. Small single-file applications: I think it is great that Vue supports simple single-file applications (with x-template if you want), implemented like my game above. This has a niche!
  2. Applications that clearly require modularization, but optimizing loading times is not an issue, and you want to use the the simplest tools available (keep html/js separate to allow standard editor support and not require a build step). AngularJS (v1) did this nicely. I intend Vue to do it nicely too with this library.
  3. Applications built by people or organizations that already use Webpack and such tools, or applications that are so demanding that such tools are required.

I fully respect and understand the Vue project does not want to support case 2 out of the box and that they prefer to keep the Vue framework small (and as fast as possible).

But i sense some kind of arrogance with articles like 7 Ways To Define A Component Template in Vue.js. I mean 1,2 are only useful for very small components. 3 is only useful for minimal applications that dont require modularization. 4 has very narrow use cases. 5 is insane for normal development (however, I can see cases where you want to output/generate it). And 6,7 requires a build step.

8. Put the damn HTML in an HTML-file and include it? Nowhere to be seen.

The official objection to 8 is obviously performance. I understand that pre-compiling your html instead of serving html that the client will compile is faster. But compared to everything else this overhead may be negligable. And that is what performance is all about, focusing on what is critical and keeping everything else simple. My experience is that loading data to my applications take much more time than loading the application itself.

The Illusion of Simplicity
AngularJS (v1) gave the illusion of simplicity. You just wrote JavaScript-files and (almost) HTML-files, the browser loaded everything and it just worked. I know this is just an illusion and a lot happens behind the scenes. But my experience is that this illusion works well, and it does not leak too much. Vue.js is so much simpler than AngularJS in so many ways. I think my library can keep my illusion alive.

Other options
There is thread on Stackoverflow about this and there are obviously other solutions. If you want to write .vue-files and load them there is already a library for that. For my solution I was inspired by the simple jquery example, but: 1) it is nice to not have a jquery dependency, 2) it is nice to keep the async stuff in one place, 3) the delayed call of new Vue() seems forgotten.

Feedback, limitations, bugs…
If you have suggestions for improvements or fixes of my library, please let me know! I am happy to make it better and I intend to use it for real applications.

I think this library suits some but not all (or even most) Vue.js applications. Lets not expect it to serve very complex requirements or applications that would actually benefit more of a Webpack treatment.

JavaScript: Sets, Objects and Arrays

JavaScript has a new (well well) fancy Set datastructure (that does not come with functions for union, intersection and the likes, but whatever). A little while ago I tested Binary Search (also not in the standard library) and I was quite impressed with the performance.

When I code JavaScript I often hesitate about using an Array or an Object. And I have not started using Set much.

I decided to make some tests. Lets say we have pseudo-random natural numbers (like 10000 of them). We then want to check if a number is among the 10000 numbers or not (if it is a member of the set). A JavaScript Set does exactly that. A JavaScript Object just requires you to do: set[314] = true and you are basically done (it gets converted to a string, though). For an Array you just push(314), sort the array, and then use binary search to see if the value is there.

Obviously, if you often add or remove value, (re)sorting the Array will be annoying and costly. But quite often this is not the case.

The test
My test consists of generating N=10000 random unique numbers (with distance 1 or 2 between them). I then insert them (in a kind of pseudo-random order) into an Array (and sorts it), into an Object, and into a Set. I measure this time as an initiation time (for each data structure).

I repeat. So now I have 2xArrays, 2xObjects, 2xSets.

This way I can test both iterating and searching with all combinations of data structures (and check that the results are the same and thus correct).

Output of a single run: 100 iterations, N=10000, on a Linux Intel i5 and Node.js 8.9.1 looks like this:

                         ====== Search Structure ======
(ms)                        Array     Object      Set
     Initiate                1338        192      282
===== Iterate =====    
        Array                 800         39       93
       Object                 853        122      170
          Set                1147         82      131

By comparing columns you can compare the cost of searching (and initiating the structure before searching it). By comparing rows you can compare the cost of iterating over the different data structures (for example, iterating over Set while searching Array took 1147ms).

These results are quite consistent on this machine.

Findings
Some findings are very clear (I guess they are quite consistent across systems):

  • Putting values in an Array, to sort it, and the search it, is much slower and makes little sense compared to using an Object (or a Set)
  • Iterating an Array is a bit faster than iterating an Object or Set, so if you are never going to search an Array is faster
  • The newer and more specialized Set offers little advantage to good old Objects

What is more unclear is why iterating over Objects is faster when searching Arrays, but iterating over Sets if faster when searching Objects or Sets. What I find is:

  • Sets seem to perform comparably to Objects on Raspberry Pi, ARMv7.
  • Sets seem to underperform more on Mac OS X

Obviusly, all this is very unclear and can vary depending on CPU-cache, Node-version, OS and other factors.

Smaller and Larger sets
These findings hold quite well for smaller N=100 and larger N=1000000. The Array, despite being O(n log n), does not get much more worse for N=1000000 than it already was for N=10000.

Conclusions and Recommendation
I think the conservative choice is to use Arrays when order is important or you know you will not look for a member based on its unique id. If members have unique IDs and are not ordered, use Object. I see no reason to use Set, especially if you target browsers (support in IE is still limited in early 2018).

The Code
Here follows the source code. Output is not quite as pretty as the table above.

var lodash = require('lodash');

function randomarray(size) {
  var a = new Array(size);
  var x = 0;
  var i, r;
  var j = 0;
  var prime = 3;

  if ( 50   < size ) prime = 31;
  if ( 500  < size ) prime = 313;
  if ( 5000 < size ) prime = 3109;

  for ( i=0 ; i<size ; i++ ) {
    r = 1 + Math.floor(2 * Math.random());
    x += r;
    a[j] = '' + x;
    j += prime;
    if ( size <= j ) j-=size;
  }
  return a;
}

var times = {
  arr : {
    make : 0,
    arr  : 0,
    obj  : 0,
    set  : 0
  },
  obj : {
    make : 0,
    arr  : 0,
    obj  : 0,
    set  : 0
  },
  set : {
    make : 0,
    arr  : 0,
    obj  : 0,
    set  : 0
  }
}

function make_array(a) {
  times.arr.make -= Date.now();
  var i;
  var r = new Array(a.length);
  for ( i=a.length-1 ; 0<=i ; i-- ) {
    r[i] = a[i];
  }
  r.sort();
  times.arr.make += Date.now();
  return r;
}

function make_object(a) {
  times.obj.make -= Date.now();
  var i;
  var r = {};
  for ( i=a.length-1 ; 0<=i ; i-- ) {
    r[a[i]] = true;
  }
  times.obj.make += Date.now();
  return r;
}

function make_set(a) {
  times.set.make -= Date.now();
  var i;
  var r = new Set();
  for ( i=a.length-1 ; 0<=i ; i-- ) {
    r.add(a[i]);
  }
  times.set.make += Date.now();
  return r;
}

function make_triplet(n) {
  var r = randomarray(n);
  return {
    arr : make_array(r),
    obj : make_object(r),
    set : make_set(r)
  };
}

function match_triplets(t1,t2) {
  var i;
  var m = [];
  m.push(match_array_array(t1.arr , t2.arr));
  m.push(match_array_object(t1.arr , t2.obj));
  m.push(match_array_set(t1.arr , t2.set));
  m.push(match_object_array(t1.obj , t2.arr));
  m.push(match_object_object(t1.obj , t2.obj));
  m.push(match_object_set(t1.obj , t2.set));
  m.push(match_set_array(t1.set , t2.arr));
  m.push(match_set_object(t1.set , t2.obj));
  m.push(match_set_set(t1.set , t2.set));
  for ( i=1 ; i<m.length ; i++ ) {
    if ( m[0] !== m[i] ) {
      console.log('m[0]=' + m[0] + ' != m[' + i + ']=' + m[i]);
    }
  }
}

function match_array_array(a1,a2) {
  times.arr.arr -= Date.now();
  var r = 0;
  var i, v;
  for ( i=a1.length-1 ; 0<=i ; i-- ) {
    v = a1[i];
    if ( v === a2[lodash.sortedIndex(a2,v)] ) r++;
  }
  times.arr.arr += Date.now();
  return r;
}

function match_array_object(a1,o2) {
  times.arr.obj -= Date.now();
  var r = 0;
  var i;
  for ( i=a1.length-1 ; 0<=i ; i-- ) {
    if ( o2[a1[i]] ) r++;
  }
  times.arr.obj += Date.now();
  return r;
}

function match_array_set(a1,s2) {
  times.arr.set -= Date.now();
  var r = 0;
  var i;
  for ( i=a1.length-1 ; 0<=i ; i-- ) {
    if ( s2.has(a1[i]) ) r++;
  }
  times.arr.set += Date.now();
  return r;
}

function match_object_array(o1,a2) {
  times.obj.arr -= Date.now();
  var r = 0;
  var v;
  for ( v in o1 ) {
    if ( v === a2[lodash.sortedIndex(a2,v)] ) r++;
  }
  times.obj.arr += Date.now();
  return r;
}

function match_object_object(o1,o2) {
  times.obj.obj -= Date.now();
  var r = 0;
  var v;
  for ( v in o1 ) {
    if ( o2[v] ) r++;
  }
  times.obj.obj += Date.now();
  return r;
}

function match_object_set(o1,s2) {
  times.obj.set -= Date.now();
  var r = 0;
  var v;
  for ( v in o1 ) {
    if ( s2.has(v) ) r++;
  }
  times.obj.set += Date.now();
  return r;
}

function match_set_array(s1,a2) {
  times.set.arr -= Date.now();
  var r = 0;
  var v;
  var iter = s1[Symbol.iterator]();
  while ( ( v = iter.next().value ) ) {
    if ( v === a2[lodash.sortedIndex(a2,v)] ) r++;
  }
  times.set.arr += Date.now();
  return r;
}

function match_set_object(s1,o2) {
  times.set.obj -= Date.now();
  var r = 0;
  var v;
  var iter = s1[Symbol.iterator]();
  while ( ( v = iter.next().value ) ) {
    if ( o2[v] ) r++;
  }
  times.set.obj += Date.now();
  return r;
}

function match_set_set(s1,s2) {
  times.set.set -= Date.now();
  var r = 0;
  var v;
  var iter = s1[Symbol.iterator]();
  while ( ( v = iter.next().value ) ) {
    if ( s2.has(v) ) r++;
  }
  times.set.set += Date.now();
  return r;
}

function main() {
  var i;
  var t1;
  var t2;

  for ( i=0 ; i<100 ; i++ ) {
    t1 = make_triplet(10000);
    t2 = make_triplet(10000);
    match_triplets(t1,t2);
    match_triplets(t2,t1);
  }

  console.log('TIME=' + JSON.stringify(times,null,4));
}

main();

When to (not) use Web Workers?

Web Workers is a mature, simple, standardised, compatible technology for allowing multithreaded JavaScript-applications in the web browser.

I am not going to write about how to use Web Worker (check the excellent MDN article). I am going to write a little about when and why to (not) use Web Worker.

First, Web Workers are about performance. And performance is typically not the best thing to think about first when you code something.

Second, when you have performance problems and you throw more cores at the problem your best speedup is x2, x4 or xN. In 2018 it is quite common with 4 cores and that means in the optimal case you can make your program 4 times faster by using Web Workers. Unfortunately, if it was not fast enough from the beginning chances are a 4x speedup is not going to help much. And the cost of 4x speedup is 4 times more heat is produced, the battery will drain faster, and perhaps other applications will be suffering. A more efficient algorithm can often produce 10-100 times speedup without making the maintainability of the program suffer too much (and there are very many ways to make a non-optimised program faster).

Let us say we have a web application. The user clicks “Show report”, the GUI locks/blocks for 10s and then the report displays. The user might accept that the GUI locks, if just for 1-2 seconds. Or the user might accept that the report takes 10s to compute, if it shows up little by little and the program does not appear hung. The way we could deal with this in JavaScript (which is single thread and asyncronous) is to break the 10s report calculation into small pieces (say 100 pieces each taking 100ms) and after calculating each piece calling window.setTimeout which allows the UI to update (among other things) before calculating another piece of the report. Perhaps a more common and practical approach is to divide the 10s job into logical parts: fetch data, make calculations, make report, but this would not much improve the locked GUI situation since some (or all) parts still take significant (blocking) time.

If we could send the entire 10s job to a Web Worker our program GUI would be completely responsive while the report is generated. Now the key limitation of a web worker (which is also what allows it to be simple and safe):

Data is copied to the Worker before it starts, and copied from the Worker when it has completed (rather than being passed by reference).

This means that if you already have a lot of data, it might be quite expensive to copy that data to the web worker, and it might actually be cheaper to just do the job where the data already is. In the same way, since there is some overhead in calling the Web Worker, you can’t send too many too small pieces of work to it, because you will occupy yourself with sending and receiving messages rather than just doing the job right away.

This leaves us with obvious candidates for web workers (you can use Google):

  • Expensive searches (like chess moves or travelling salesman solutions)
  • Encryption (but chances are you should not do it in JavaScript in the first place, for security reasons)
  • Spell and grammar checker (I don’t know much about this).
  • Background network jobs

This is not too useful in most cases. What would be useful would be to send packages of work (arrays), like streams in a functional programming way: map(), reduce(), sort(), filter().

I decided to write some Web Worker tests based on sort(). Since I can not (easily, and there are probably good reasons) write JavaScript in WordPress I wrote a separate page with the application. Check it out now:

So, for 5 seconds I try to do the following job as many times I can, while I keep track of how much the GUI is suffering:

  1. create an array of 10001 random numbers: O(n)
  2. sort it: O(n log n)
  3. get the median (array[5000]): O(1)

The expensive part is step 2, the sort (well, I actually have not measured 1 vs 2). If the ratio of amount of work done per byte being sent is high enough then it can be worth it to send the job to a Web Worker.

If you run the tests yourself I think you shall see that the first Web Worker tests that outsource all of 1-2-3 are quite ok. But this basically means giving the web worker no data at all and when it has done a significant amount of job, receiving just a few numbers. This is more Web Worker friendly than Chess where at least the board would need to be sent.

If you then run the tests that outsource just sort() you see significantly lower throughput. How suitable sort()? Well, sorting 10k ~ 2^13 elements should require each element to be compared (accessed) about 13 times. And there is no data sent that is not needed by the Web Worker. Just as a counter example: if you send an order to get back the sum of the lines most of the order data is ignored by the Web Worker, and it just needs to access each line value once; much much less suitable than sort().

Findings from tests
I find that sort(), being O(n log n), on an array of numbers is far too fast to be outsourced to a Web Worker. You need to find a much more “dense” problem to benefit of a Web Worker.

Islands of data
If you can design your application in such way that one Web Worker maintains its own full state and just shares small selected parts occationally, that could work. The good thing is that this would also be clean encapsulation of data and separation of responsibilites. The bad thing is that you probably need to design with the Web Worker in mind quite early, and this kind of premature optimization is often a bad idea.

This could be letting a Web Worker do all your I/O. But if most data that you receive is needed in your application, and most data you send comes straight from your application, the benefit is very questionable. An if most data you receive is not needed in your application, perhaps you should not receive so much data in the first place. Even if you process your incoming data quite much: validating, integrating with current state, precalculating I would not expect it to come very close to the computational intensity of my sort().

Conclusions
Unfortunately, the simplicity and safety of Web Worker is unfortunately also its biggest limitation. The primary reason for using a Web Worker should be performance and even for artificial problems it is hard to get any benefit.

Note to self: never try-catch more than necessary!

A wrote a function, and then a unittest, and the unit test was good.
Then I called the function from my real project, and it failed!

I isolated the problem and thought I had found a bug in V8 (except after many years as a programmer I have I learnt it is never the compilers fault).

This was my output:

$ node bug.js 
Test good
main: err=Not JSON

This is my simplified (faulty) code:

function callSomething(callback) {
  var rawdata = '{ "a":"1" }';
  var jsondata; 

  try {
    jsondata = JSON.parse(rawdata);
    callback(null,jsondata);
  } catch (e) {
    callback('Not JSON', null);
  }
}

function test() {
  callSomething(function(err,data) {
    if ( err ) console.log('Test bad: ' + err);
    console.log('Test good');
  });
}

function main() {
  var result = {
    outdata : {}
  };

  callSomething(function(err,data) {
    if ( err ) {
      console.log('main: err=' + err);
    } else {
      result.outata.json = data;
      console.log('main: json=' + JSON.stringify(result.outdata.json));
    }
  });
}

test();
main();

How can the test not fail when main fails?

Well, here is the correct output

$ node nodebug.js 
Test good
main: json={"a":"1"}

of the correct code main function:

function main() {
  var result = {
    outdata : {}
  };

  callSomething(function(err,data) {
    if ( err ) {
      console.log('main: err=' + err);
    } else {
//    result.outata.json = data;
      result.outdata.json = data;
      console.log('main: json=' + JSON.stringify(result.outdata.json));
    }
  });
}

The misnamed property caused an Error which was (unintentionally) caught, causing the anonymous callback function to be called once more, this time with err set, but to the wrong error.

It would have been better to write:

function callSomething(callback) {
  var rawdata = '{ "a":"1" }';
  var jsondata; 

  try {
    jsondata = JSON.parse(rawdata);
  } catch (e) {
    callback('Not JSON', null);
    return;
  }
  callback(null,jsondata);
}

and the misnamed propery error would have crashed the program in the right place.

Conclusion
Don’t ever try more things than necessary. And if you need to try several lines, consider making separate try for each.

Minification of real web Application

I have built and I maintain a reasonably large (AngularJS) web application and here follow a few notes on the effect of minification.

I start with the findings:

                            Uncompressed         GZIP     Minified    Min+GZIP

App 1:  Size        (kb)            1130         1130          843         841
        Transferred (kb)            1150          375          861         308
        Load time    (s)             2.8          1.6          2.7         1.7

App 2:  Size        (kb)             708          708          659         659
        Transferred (kb)             721          359          672         347
        Load time    (s)             4.0          3.5          3.1         3.5

Conclusions
You should always enable gzip on the server. It is faster to compress and send less data than to send the uncompressed data. The benefits of gzip are huge and there are no negative side effects.

Minification saves some bandwidth (and if unlike me you do it ahead of time, some loading time). But unless your code contains mostly comments the effects are marginal (although that might be a big saving if you use very much bandwidth or you are looking for fastest possible load times).

Also, gzip tends to be good at what minification can easily do, and while the effect of minification alone is quite significant, the effect of minification together with gzip is smaller.

Behind the figures
The figures above come from Firefox Load time over the internet.

  • App1: About 100 files are served, mostly .js (a few .html and .css)
  • App2: About 80 files are served, mostly .js (a few .html and .css)
  • App1: Angular is always pre-minified 165kb, gzipped to 67kb.
  • App2: Angular+modules is always pre-minified 298kb, gzipped to 127kb
  • App2 contains a few fonts which are neither minified nor gzipped (142kb)
  • Files served by Node.js
  • Files minified by custom Node.js code in real time
  • Files gzipped by nginx in real time
  • Not everything is initiated when Load is complete (more html-files are loaded dynamically as user navigates, and data is loaded from APIs on demand)

Implications of minification
Minification (and possibly packaging of code) has more implications than gzip. Possible negative side effects are:

  • A build process is not strictly needed for web development, but minification is often done as part of a build process, increasing complexity of development, testing and deployment.
  • Testing and development is made harder when debugging minified code (although there are tools to mitigate this).
  • More aggressive minification can have unexpected results

The minification code I run in Node.js, when I serve a file, basically just:

  • Removes all white space in the beginning and end of lines
  • Removes all comments

This nice thing about this simple minification strategy is that everything that is obviously just waste is removed at a low cost, but the code is for all practical purposes completely unchanged (even line numbers are preserved to not complicate debugging). Also, developers should feel free to write as many comments as they like in the code, yet comments should never be served in a public facing application. More powerful minification comes at higher costs, and the effects are probably mostly lost after gzip.

I guess every project and system have a sweet spot when it comes to minification and I think my simple minification strategy makes sense for my needs.

Programming paradigms rock and suck!

I wrote a few articles about why functional programming sucks (1, 2) and why some functional libraries suck (3, 4). Obviously the titles were a little click bait and I think anyone reading the articles understood that I argue that it is the mindless hype that sucks, not the (FP) paradigm itself. Anyways, writing those articles, getting some feedback on them, and programming on my projects gave me more thoughts.

Lets say we have these programming paradigms:

  • Functional Programming: Pure, testable functions, avoiding state and variables.
  • Object Oriented Programming: Encapsulating data inside objects exposing a simplified, safe interfaces
  • Imperative Programming: Explicitly instructing the computer step by step what to do and how to do it, changing state (or Procedural Programming if you like)
  • Declarative Programming: Expressing your problem as data, and something else takes that data and produce what you want
  • State Machines: Global data and distinct states: your program reacts to input, transitions between states and modifies its data

As you perhaps understand, I do not aspire to be a computer scientist. This is all very practical.

I find – and I dont know if you will find this obvious or outrageous – that all these paradigms have strengths and weaknesses, and that a reasonably sized program or system will need to use a combination of them.

Here follows some strenghts and weaknesses:

Functional Programming Strengths

  • High reusability
  • Highly testable code
  • Compact code
  • Transformation, or streaming, or raw data matches the reality of networked services

Functional Programming Weaknesses

  • Obscurity – some code can be very cryptic and hard to read
  • Over engineering – attempting to make super reusable code can complicate simple things
  • Avoiding state – sometimes you have a state and you need to deal with it (and with FP there is a risk you try to avoid reality rather than face it head on)
  • Performance – FP comes with some overhead
  • Algorithmic complexity – it can be hard to understand the algorithmic complexity, or to write an efficient implementation, and this can lead to performance problems

Object Oriented Programming Strengths

  • Encourages clear APIs
  • Encourages reusability
  • Encourages information modelling
  • Allows refactoring of implementation

Object Oriented Programming Weaknesses

  • It adds little (no) value and significant overhead to serialize/deserialize data as it is sent/received and stored/loaded
  • Inheritance is a questionable concept
  • Ideally, objects have only one-way-dependencies, but in the real world this creates difficult, artificial design problems and complexity
  • It adds code and concepts that may cost more than you practically get from it: public/private declaration, getters/setters

Imperative Programming Strengths

  • Simple and straight forward
  • Little overhead: high performance
  • Algorithms explicitly implemented: high performance
  • Works the same in many different languages

Imperative Programming Weaknesses

  • Bad testability: if you have too big procedures with too many side effects
  • Bad maintainability: if you end up writing too big, complex modules
  • You can end up writing much code, copying and pasting
  • It takes discipline

Declarative Programming Strengths

  • Can be very compact
  • Can be very easy to read (if you understand/accept the “language”)
  • Can allow for quickly adding or changing features

Declarative Programming Weaknesses

  • Performance may be bad and/or unpredictable
  • Although compact, it can be very cryptic
  • Debugging can be very hard

State Machine Strengths

  • Can deal with complex states
  • Suitable for error handling, recovery
  • Can deliver efficiency at system level

State Machine Weaknesses

  • Requires proper analysis and design upfront
  • May be hard to refactor or change
  • Complex (represents and deals with complexity rather than hides it or lets someone else take care of it)

I now imagine a simple mobile/web application (like an little online betting site). There is data storage, server side application logic, authentication, http APIs, client loading data, client side application logic, UX, and the user saving/updating data to the system.

Server Side
The server itself should be thought of as a state machine. It can be up and good, but it can also be starting or shutting down. It can have bad or missing connectivity to other services (authentication, payment, data feeds). It can be in maintenance mode or perhaps in test or debug mode. It needs to hold and renew cached data. All these things can dramatially affect how a simple API call is handled! Failing to do this in a structured way can lead to very complicated API implementions or severe performance or stability problems.

If there are adapters connecting to other systems or the storage these may very well be implemented in an Object Oriented way. They expose a simple and safe API and they hide a lot of implementation.

When it comes to server side business logic it is fed with input from the storage or cache and its output is sent over the network to clients (or the other way around). Such business logic should be designed in a Functional way: it should be clear what it does and what data it uses, and it can and should be testable.

The implementation of the business logic or the adapters may be performance critical and non trivial. Imperative programming can be used here – inside what is exposed as Functional or Object oriented code. It must not leak. But every bolt and nut in an FP function does not have to be implemented using FP principles, and every internal part of an Object need not be built on the principles of Object Orientation.

Finally, the definition of the APIs, the access rules can be Declared. Other code can execute these rules and declarations behind the scenes.

Client Side
The client also needs to be thought of as a state machine first. Is the user authenticated and logged in? Have we received all data, or are we still waiting for something? Have we received the latest updates or have there been connection problems? Does the user have edited, dirty, state that is not saved to the server? Are we having errors saving data that we are retrying? Where in the application is the user and what settings, configurations or policies applies to the user? You need to deal with all these things. If you try break it into many small modules with no mutual dependencies you will find that is very hard. Put the damn global state somewhere, and accept that it is a global state. Make it all publicly readable for anything that cares. Changing of the state is a completely different business that must only be done via specific interfaces (the entire state machine can appear like an Object).

Now Declare your user interface: the pages, buttons, colors and everything else. Write code that consumes these declarations and produce the UX. To the state machine this code may appear Object Oriented, and your UX components are probably some kind of OO objects, reusable across the pages.

Whenever you can, break stuff out into Functional, pure, testable functions.

And when it comes to getting stuff done, as long as your code is contained within Objects or Functions and they do not leak: write simple and efficient code Imperative code if that is the best.

The GUI can be described in data rather than code in a declarative way.

Conclusion
It is quite pointless to talk about (for example) Functional programming as better or worse than anything else. It simply depends on. And for programs/systems of non-trivial size and complexity, mixing programming paradigms are fine.

Fast binary search without Division

Arrays are simple datastructures but finding elements is not fast. However, if you have a sorted array, a favourite algorithm is Binary Search.

When it comes to JavaScript there is no binary search in the standard library. Standard functions like find and indexOf simply just don’t cut it for many purposes, being O(n).

Rosetta Code has two implementations of binary search for JavaScript. However, binary search is based on division (by 2, you split the search interval in half every iteration). In programming languages with integer math division by 2 is performed by doing a single shift, so it is a cheap operation (division is generally expensive). In JavaScript integer division by two looks like:

mid = Math.floor((lo + hi) / 2);

This is just… unacceptable. Perhaps V8 recognizes the pattern and replaces it with the right thing, I don’t care, floating point math has nothing to do with binary search and this is ugly.

Perhaps I can do better? Well, I actually can!

Instead of dividing the actual interval by 2 in every iteration you can use the powers of two: 32,16,8,4,2,1 and search with them. Lets say the array is 41 elements long. Start check element 32. If its too little try with 32+8 (skip 32+16 > 41), and if it is too large try with 16. That way you essentially do a binary search without a division.

Since I don’t want to do heap allocation for every search I calculate the powers of two before declaring the function. In the code below my maximum value is 2^31, so I can mostly search arrays of size 2^32-1. If you dont like that limit you can raise 32 to whatever you like (but at 52-53 you are running into new problems, and well before that you will run out of RAM).

function powers_init(s) {
  var i;
  var r = new Array(s);
  r[0] = 1;
  for ( i=1 ; i<s ; i++ ) {
    r[i] = 2*r[i-1];
  }
  return r;
}

var powers = powers_init(32);  // [1,2,4,8,16,32,...

function binary_search_divisionless(a, value) {
  var pix = 0;
  var aval;
  var offset0 = 0;
  var offset1;

  if ( a[0] === value ) return 0;
  while ( powers[pix+1] < a.length ) pix+= 1;

  while ( 0 <= pix ) {
    offset1 = offset0 + powers[pix];
    if ( offset1 < a.length ) {
      aval = a[offset1];
      if ( value === aval ) {
        return offset1;
      } else if ( aval < value ) {
        offset0 = offset1;
      }
    }
    pix--;
  }

  return -1;
}

It is perhaps not obvious why I check for a[0] before doing anything else. The function returns from the main while loop as soon as it finds what it is looking for. So a[offset0] does not contain the value in later iterations when offset>0. However, this is not guaranteed from the beginning when offset0=0, and it is not automatically being tested in the end. So I explicitly test it first.

Benchmark
Below follows relative performance in time for different array sizes.

Array Size:                  10      100     1000    10000   100000
=======================================================================
Standard Library indexOf   1.65     2.02    11.89    94.00   790.00
Rosetta Code Recursive     1.32     1.48     1.84     1.81     2.02
Rosetta Code Imperative    1.18     1.08     1.41     1.43     1.45
Divisionless               1.00     1.00     1.00     1.00     1.00
   - unrolled              0.93     0.76     0.82     0.79     0.78

I am satisfied that my code is consistently faster than the alternatives. What surprised me was that binary search clearly wins even for short arrays (10). The little loop before the big loop can be unrolled for significant performance gain:

//while ( powers[pix+ 1] < a.length ) pix+= 1;
  if ( powers[pix+16] < a.length ) pix+=16;
  if ( powers[pix+ 8] < a.length ) pix+= 8;
  if ( powers[pix+ 4] < a.length ) pix+= 4;
  if ( powers[pix+ 2] < a.length ) pix+= 2;
  if ( powers[pix+ 1] < a.length ) pix+= 1;

This is also the reason why 32 was a particularly good array size for the powers of two. This is a kind of optimization I would normally not let into my code. However, if you make much use of binary search on the Node.js server side of an application, go ahead.

It is also worth noting that the following patch doubles the execution time of my code!

//  if ( offset1 < a.length ) {
//    aval = a[offset1];
    if ( undefined !== ( aval = a[offset1] ) ) {

General Compare Function
With little modification, the algorithm can be used with a compare function for arrays of objects other than numbers (or strings).

Division
Since my algorithm only uses powers of 2, I can do division in JavaScript without using Math.floor. It is a quite easy change to eleminate the powers-array and divide by two instead. It turns out it make very little difference on performance (to do integer / 2 instead of array lookup). However, when I added a (meaningless) Math.floor() around the division, performance dropped the same as the the iterative version from Rosetta Code. So my intuition was correct to avoid it.

I checked the Lodash binary search code (sortedIndexOf) and it uses bit shift <<< to divide by two. Admittedly, the Lodash code is slightly faster than my code.

Conclusion
You should have a well implemented and tested binary search function available in your toolbox (admittedly, use Lodash for this). This is not the right place to lose milliseconds. For very small arrays you can argue that indexOf works equally well, but the cost of using binary search is insignificant.

Developer lost in Windows

Admittedly, I am not a Microsoft fan, but Windows is a quite fine operating system nowadays. This article is not about complaining with Windows.

This article is also not about native Windows development for Windows. That is, if you use Windows, Visual Studio and C# (or other languages native to Windows development) to produce software targeting Windows, this article is not for you.

I and other programmers use Linux, OS X or possibly BSD to develop software meant to be OS independent. Our core tools are perhaps bash and a terminal emulator. We use tools like grep, head, tail, curl, iconv, sed, bc, emacs, vim, ssh, nc, git or svn on a daily basis (without them we are in a foreign land not understanding the customs). Depending on programming language we use compilers and interpreters like gcc, python, perl, php, nodejs and sbcl. In Linux, OS X or BSD this all come very naturally but sometimes we find ourself using a Windows computer:

  1. We just happen to have a second Windows computer that we want to use
  2. Company policy requires us to use Windows
  3. We want to be capable of working in Windows
  4. We want our project to work fine in Windows
  5. Perhaps you are a Windows developer/user who need/want to do unix-like development without getting a separate computer.

This article is for you!

The good news is that basically everything you wish to do can be done with free software (also on Windows). What is also good is that you have plenty of choices of good stable software to choose from.

The bad news is that finding just what is right for you and making it feel as simple and smooth as you are used to can be time consuming, difficult and frustrating.

Embracing Windows
To be productive in Windows you should (at least to some degree) embrace Windows. One approach is to do as the Windows developers. This would mean using cmd.exe (the old DOS-shell) or Powershell, and learn/embrace what comes with it. You would perhaps install only node/php/python native for Windows and git. I encourage you to try this, but I will not write more about it.

This might be the best solution if you develop in JavaScript or Java AND basically all tools you use are part of a JavaScript/Java ecosystem. Perhaps git is the only thing you need apart from what you install with npm.

Avoiding Windows
One approach is to just use Windows to access (and possibly run a virtual) Linux (system). It can work better or worse depending on your situation. A full screen VM might be the best solution if Linux GUI tools are central to your development, or if you are anyway very used to working with VMs. SSH to a remote system might be the best solution if mobility is not a problem. However, some development gets more complicated when different things are on different IP addresses, and security becomes more relevant when everything is not on 127.0.0.1. I will not write more about this.

Make Windows Unix-like
So, our gool is to feel reasonably at home in Windows by installing what is missing. It is always tricky to divide things in clear categories, but I would say you will want a stack of four layers:

  1. A Terminal Emulator: Windows already comes with one, but chances are you will not find it good enough. I have very modest demands but I expect copy-paste to work nicely and I expect multiple tabs. This works perfectly in any default Linux desktop and Mac OS X, not so in Windows.
  2. Editor: Unless you prefer to use vim or emacs directly in the terminal you want to install a decent or familiar text editor (Nodepad or Wordpad are not suitable for serious programming).
  3. The standard UNIX/GNU tools: Windows does not come with bash, head, tail, grep, sed, vim, bc and most other tools you take for granted. The old (DOS) equivalents are simply inferior. The new (Powershell) equivalents are… well… lets just say it is a steep learning curve and your bash scripts will not run.
  4. Interpreter/Compiler: Windows does not come with gcc, python, perl, nodejs or php. However, they are most often available as a separate download for Windows. Such a native Windows version may be slightly different from the Unix-version you are used too (command line options, especially related to paths may be different).

The short version is that you can install things like Cygwin or Windows Subsystem for Linux and you might just be fine with it! Or not. If you are bored of reading, try it out now.

To make a more informed choice you need to consider what types of binaries you want to run (and perhaps produce).

Native Windows Binaries
A Native Windows binary can (with some luck) be copied from a Windows computer to another and run. To interact with Windows (and you) it uses the APIs of Windows (typically Win32). It probably expects paths to be formed like in Windows (c:\tmp rather than /tmp).

Native Linux Binaries
A Native Linux binary was typically built on a Linux system to run on that Linux system (like Debian). Until recently it would not run on Windows, however Microsoft put a massive effort into Windows Subsystem for Linux, which allows you to run Linux programs (including bash) directly in Windows 10 (only). This is not perfect though. A Linux filesystem is quite different from a Windows filesystem so access between the two filesystems is limited and may require some thinking. This is perhaps the best approach if you are targetting Linux (like development for Docker) but it is obviously a bad approach if you want your program to work on a Windows server or other Windows computers.

Hybrid / Cygwin
There is an old (as in proven and reliable) project called Cygwin. It is basically a DLL-file that translates all (most) Linux system calls into Native Windows calls. This means that the unmodified source code of (most) programs written for Unix can be compiled with Cygwin, for Cygwin, and run on a Windows computer that has the Cygwin DLL installed. There are some drawbacks. First, performance suffers (from nothing to quite much depending on what you do). Second, for more advanced software, especially with GUI or heavy on network (like Apache), the hybrid solution can feel like the worst of two worlds. Third, access to the entire filesystem is smooth, but when it comes to access rights it sometimes does not work perfectly (files created by Cygwin get weird, even broken, permissions).

Now, to complicate things, there is a project called MSYS2 that maintains a fork of Cygwin, very similar to Cygwin. Cygwin or MSYS2 can be included/embedded in other projects (such as cmder). If you install multiple unix-compability-suites on your system it can get confusing.

Choosing binary type
At first glance Windows Subsystem for Linux, or Cygwin, seem very attractive. But lets assume that we do web development in Python. If you go with Windows Subsystem for Linux you will need to run a webserver (apache, lighttpd) inside that subsystem. To me, configuring, starting and stopping services inside this subsystem is not attractive. What could possibly go wrong? Well, a lot of things. With Cygwin you can probably make Windows IIS invoke Cygwin Python (if you really dont care about performance), because running Cygwin Apache sounds creepy (it can be done though). If, on the other hand you install Python built for Windows you get the real thing. All Windows/Python documentation and forum information suddenly applies to you. But then you end up with a Cygwin shell where everything is just like in Linux, except Python is not (except Cygwin will come with Python too, so you can end up with two versions of Python, with different features).

I would hesitate to run apache inside Cygwin and even more so inside Windows Subsystem for Linux. But I always also hesitate to do anything with IIS. Perhaps the best thing is to install Apache and Python for Windows (not depending on Cygwin) and just find the tools you need to edit your files?

The same reasoning can apply to PHP, nodejs or whatever you do.

Most configurations can probably be made to work. But you just want your Windows computer as simple and standard as possible, and you want your usual Linux (or OS X or BSD) stuff just to work the way they usually do. This is really not a matter of right or wrong, it is more a matter of taste (what kind of worthless errors do you want to solve just because you are a foreigner in Windows doing stuff not the way it was really meant to be done in Windows?).

Maintainability and Management
One thing is to make it work. Another thing is that it works week after week, month after month. Also another thing is to keep your software up to date and being able to add new tools along the way. You will find that some of the software I mention in this post does not come with the Windows installer/uninstaller that you are used to.

There is something called Chocolatey for Windows, which is a package manager dealing with installing, upgrading and uninstalling software on Windows in a uniform way. I don’t know much about it, and I will not write more about it.

While unix programs typically have a .file with configuration (there are a few places to look though) Windows programs typically use the registry. When it comes to unix software adapted for Windows you never really know: registry, config file, both or… depends on? And if a config file, where is it?

While unix programs can usually be installed in the home directory of an unpriviliged user, or in /opt, programs in Windows often require administrative priviliges to spread their files a little bit everywhere.

The more stuff you try and throw out, the more garbage you have left on your computer, which could possibly interfere with a newer version of something you install in the future. Keep this in mind. One day you will install some exotic software that does not work as expected, and you dont know if some old garbage on your computer caused it.

Git
Git is a very popular version control system. Originally designed for Linux it is today (?) the officially preferred system also for native Windows development. Git is so popular (or demanding?) that you can get it bundled for Windows together with some of your favourite tools. This might be just enough for you!

  • posh-git : git for powershell
  • git for Windows : based on Git for Windows SDK
  • cmder : based on MSYS2
  • …more?

cmder
I have tried cmder and I dont like it. It is an ugly install of just unpacking a huge zip file. MSYS2 itself is hidden inside the cmder-folder, so I dont feel comfortable managing it on my own. There seems to be no upgrade strategy (except throwing it all away and downloading the latest version). Git is run from one shell (a traditional Windows shell with a lambda prompt) but a msys2/bash (identical to Cygwin) shell is started separately. I dont want to change console to run git: I run git all the time. But it might be perfect for you (many people like cmder).

Cygwin
Cygwin is nice becuase it comes with an installer (setup.exe) that is also a package manager. It has a lot of packages, and it is capable of installing things like apache as windows services. My experience is that I am too lazy downloading the latest setup.exe, and I am too lazy running setup.exe regularly. Sometimes you end up with old versions and upgrade problems.

My disappointment with Cygwin is that it comes with its own (compared to standard Windows) terminal Mintty that still does not have tabs. I also do nodejs development and nodejs is not a Cygwin package, so I need to use standard Windows node. This sucks a little because node in Windows behaves slightly different from node in Linux/OS X (particularly when it comes to where packages go) so the Cygwin experience is a bit broken when you start using Windows node and (perhaps particularly) npm.

Also, I like bash scripts and they tend to run significantly slower in Cygwin than in Linux (process forking is extremely cheap in Linux and rather expensive in Windows, so with the Cygwin overhead it can get rather slow for heavy scripts).

As I now try to update and configure my Cygwin environment for my Node.js project I find:

  • I use Cygwin wget to download setup.exe (so I get it where I want it, rather than Downloads) to update Cygwin. When I double click it (to run it) permissions are wrong and I cant execute it. It is an easy fix, but compared to OS X / Linux this is awkward.
  • I run node from Cygwin. I get no prompt (>). It turns out node.exe does not recognize Cygwin/bash as a terminal and I need to run node.exe -i.
  • Symlinks keep being a mystery. There are some kind of symlinks in Windows now, Cygwin seems to try to use them, but the result is not consistent.

For a Terminal with tabs, check Fatty below.

MinGW & MSYS
While the idea of Cygwin is to provide a Posix compliant environment on Windows the MinGW/MSYS project was about porting unix tools (perhaps particularly a gcc-based C/C++ build environment) to run natively on Windows. According to the Wikipedia page of MinGW this is pretty much abandoned.

Gow: GNU on Windows
Gnu on Windows is a lightweight alternative to Cygwin. It appears to not have been updated since early 2014 (when 0.8.0) came out (and the Windows Subsystem for Linux seems to be one reason it is less relevant). I will not write more about it.

MSYS2
MSYS2 is the successor to MSYS, and (surprise) it is based on Cygwin. I tried it quickly and I find that:

  • It seems safe to install side by side with Cygwin
  • Using the MSYS2 is very similar to Cygwin
  • Instead of the Cygwin GUI package installer, MSYS uses pacman from Arch (if you much prefer that, go with MSYS2)
  • MSYS2 has some emphasis on MinGW32 and MinGW64. As I understand it this is about being able to use MSYS2 to build native Windows software from C/C++ code (if you do this in Cygwin, you end up with a Cygwin dll dependency)

So, for my purposes MSYS2 seems to be quite equivalent to Cygwin. Expect the same annoyances as I mentioned for Cygwin above.

Windows Subsystem for Linux
If you try to run bash from a Windows 10 command line you will probably get something like:

-- Beta feature --
This will install Ubuntu on Windows, distributed by Canonical
and licensed under its terms available here:
https://aka.ms/uowterms
In order to use this feature you must have Developer Mode enabled.
Press any key to continue...

Note that this can be quite confusing if you have installed some other bash.exe on your system. If you unexpectedly get the above message, check your PATH and make sure you invoke the right bash executable.

Installation is very easy (activate Developer mode and run bash), after giving username+password you are actually good to go! If you are used to Debian/Ubuntu you will feel surpisingly at home.

I find my Windows files in /mnt/c (not too surprising).
I find my Linux home files in c:\Users\zo0ok\AppData\Local\lxss\home\zo0ok.
(copying files there from Windows did not make them appear in Linux though)

So, if you want to edit files using a Windows GUI editor, they need to be in Windows-land, and that is obviously not the optimal environment for you project.

In general it works very well though. My node services had no problems listening to localhost:8080 and accepting incoming http requests from a Windows web browser.

If you are not happy with Ubuntu or you want more control of your Linux environment you will need to do further research. Ideally, Windows Subsystem for Linux has most of the advantages of a virtual machine, but none of the drawbacks. However, depending on what you really do and need, it can turn out to have most of the drawbacks and few of the advantages instead.

Fatty
The Mintty terminal that comes with Cygwin is ok, but it does not support tabs. There are different alternatives, and a simple one is Fatty (it is really Mintty with tabs). Installing Fatty requires doing a git clone and compiling it yourself. If you are brave you can download fatty-1.6.exe from me.

The web page for fatty tells you how to make a desktop shortcut but it did not work for me. What works for me is to set Shortcut target: “C:\cygwin64\bin\fatty.exe -“. Simple as that. I think I will be quite fine with fatty, actually.

Making Fatty run Windows Subsystem for Linux was trickier (as in no success) though.

ConEmu
ConEmu seems to be the ultra powerful flexible console. After 5 minutes I have still not found out how to change the font size.

ConsoleZ
ConsoleZ is good. Under Edit->Settings>Tabs you can add your own shell types.

Cygwin:                       Shell = C:\cygwin64\bin\bash.exe --login
Windows Subsystem for Linux:  Shell = bash.exe

Apart from that, ConzoleZ is reasonably easy to configure and it stays out of your way (I hide toolbar, status bar, search bar).

Editors
I am fine with vim in the console. However, there are many fine editors for Windows:

  • Visual Studio Code (at no cost)
  • Atom
  • Notepad++

Conclusions
I have had Windows as a 2nd/3rd platform for many years and I can see that the game has changed a bit. Microsoft has started supporting Ubuntu on Windows and at the same time the native options (like MSYS) are fading away. There are reasons to think general development is getting more and more based on Unix.

I would say:

  • Posh-git : Powershell is your thing, and you don’t care about unix tools
  • Git for Windows : You want Git, but you don’t care much about other unix tools
  • Cygwin : You want plenty of choices of unix tools in Windows
  • MSYS2 : You like pacman (Arch) or you want to build native Windows C/C++ binaries using free software
  • Windows Subsystem for Linux : You have a Windows 10 computer but you want to keep your Linux development separate from Windows

If you use Cygwin and just want tabs, get Fatty. Otherwise ConsoleZ is good. Chocolatey is more a Windows power user tool than something you need to provide unix capabilities.

In the past I have mostly been using Cygwin (with mixed feelings). Lately, when I have heard about the options (cmder, poshgit, Git for Windows, MSYS2) I have got a feeling that it is rather hard to configure an optimal environment. Now however I have come to realize that the differences are not very big. Most options are hybrids based on Cygwin and/or with Cygwin embedded (perhaps in the name of MSYS2). For Windows developers not used to Unix it is good with things like Git for Windows that just come with the basic Unix tools with no need to think about it. For developers with a Unix background it makes more sense to run Cygwin and MSYS2 (or Windows Subsystem for Linux). The days of unix tools built natively for Windows are over, and it is probably a good thing.

What you need to think about is your compiler, interpreter and/or web server.

Peculiar Compiler Optimizations

My teacher in a High Performance Computing class once told me not to confuse the compiler. This was in the late 90s, and SGI C and Fortran compilers were supposed to replace entire blocks of code with highly optimised implementations. As long as the compiler understood your intentions.

I have never discovered this, but yesterday perhaps! Read on.

I have been playing around with LISP, solving Project Euler challenges on Hackerrank. For problems 44 and 45 I decided to do Binary Search (which afterwards turned out not to be so smart, but that is another story) and took the implementation from Rosetta Code (the iterative one).

Binary search is about finding an element in a sorted array by starting in the middle, and jumping left or right, cutting the remaning array in half each time.

In my case I decided just a part of the array was worth searching so instead of searching the entire array [0..length] I wanted to search up to the Nth element [0..N]. Searching fewer elements should be faster, so I improved the binary search function to take an additional argument: hi. For SBCL (Steel Bank Common Lisp), this surprisingly had horrible effect on performance.

Benchmark Results
The results for different algorithms, different machines and different LISP implementations follow. The RPi V1 runs Arch Linux, Clisp comes with Arch, SBCL is downloaded from SBCL webpage. The RPi V2 runs Raspbian and SBCL that comes with the distribution. The Celeron runs Ubuntu that comes with SBCL. The MacBook Air runs OS X and SBCL is downloaded separately.

 (times in seconds)             Array Size Standard Optimized Recursive    C -O2
================================================================================
RPi V1  ARMv6 700MHz  Clisp 2.49      5000    640       633       720
                      SBCL  1.3.12    5000     15.6      27        34       0.95
RPi V2  ARMv7 900MHz  SBCL  1.2.4     5000      6.2      16        17       0.31
                                     20000    110       293       300       5.8
NUC Celeron J3455     Clisp 2.49     20000    765       762       720   
                      SBCL  1.3.3    20000      8.3      16.7      18.0     1.0
MacBook Air i5        SBCL  1.2.11   20000      4.0      11.5      12.3     0.75

A very slight “optimization” turns out to have very negative impact on performance for the quite fast (compiled) SBCL. I can’t imagine any other explanation than SBCL replaces the standard binary search with optimized code rather than executing my program. For Clisp the optimization actually works quite as would be expected and the recursive code is actually the fastest. On the Celeron, Clisp and SBCL behaves completely opposite.

Comparing to C
The other week I had the feeling (SBCL) LISP was fast and decided to compare LISP to C. This time I had the feeling that LISP was rather slow so I ported my test program to C (basically line by line). Well, I found that SBCL is actually pretty fast (especially on x86/x64), and C was faster only thanks to -O2 on some systems. -O2 actually made the C-program more than 5 times faster: perhaps also the C-compiler replace the entire binary search?

The Test Program
The code follows. The only difference between Standard and Optimized is the single line that is commented out (with ; in LISP) selecting which binary search to run (the length of the function name does not explain the performance difference).

The program creates an array of length N and populates it with values by the formula n(n+1)/2. This takes little time. It then checks for values 10,20,30… if the values are found in the array (using binary search). In this program the entire array is always searched, not taking advantage of the extra parameter (although the optimized version does not need to find the length of the array every time called).

(defun binary-search (value array)                       ; Standard 2 lines
    (let ((low 0) (high (1- (length array))))            ;
        (do () ((< high low) nil)
            (let ((middle (floor (+ low high) 2)))
                (cond ((> (aref array middle) value)
                       (setf high (1- middle)))
                      ((< (aref array middle) value)
                       (setf low (1+ middle)))
                      (t (return middle)))))))

(defun binary-search-optimized (value array hi)          ; Optimized 2 lines
    (let ((low 0) (high hi))                             ;
        (do () ((< high low) nil)
            (let ((middle (floor (+ low high) 2)))
                (cond ((> (aref array middle) value)
                       (setf high (1- middle)))
                      ((< (aref array middle) value)
                       (setf low (1+ middle)))
                      (t (return middle)))))))

(defun binary-search-r (value
                        array
                        &optional (low 0)
                        (high (1- (length array))))
  (if (< high low)
      nil
      (let ((middle (floor (+ low high) 2)))
        (cond ((> (aref array middle) value)
               (binary-search-r value array low (1- middle)))
              ((< (aref array middle) value)
               (binary-search-r value array (1+ middle) high))
              (t middle)))))

(defun formula (n)
    (/ (* n (+ n 1)) 2))

(defun init-array (n)
    (let ((arr (make-array n)))
        (loop for i from 0 to (1- n) do
            (setf (aref arr i) (formula (1- i))))
        arr))

(defun solve (arr n max)
    (let ((ret 0))
        (loop for i from 10 to max by 10 do
            (if (binary-search i arr)                     ; Toggle code used
;           (if (binary-search-r i arr)                   ;
;           (if (binary-search-optimized i arr n)         ;
                (incf ret)
                Nil))
        ret))
            
(defun main ()
    (let ((n (read)))
        (let ((arr (init-array n)))
            (format T "~D~%" (solve arr (1- n) (aref arr (1- n)))))))

(main)

Since I am a very novice LISP programmer I appreciate any feedback. The code above does not solve Project Euler 44 or 45, it is much simplified to test binary search. Initially I wrote code that relied on recursion rather than loops but I exceeded the stack size and ended up with loops (according to what I read, loops rather than recursion is the preferred style of Common Lisp).

Conclusion
Well... optimization is hard, and dont make any assumptions. As I have found many times before, what makes code faster on some platforms can make it slower on others. When it comes to optimizing SBCL and compiled LISP much experience is required, and dont forget to measure!

Playing with LISP and LISP vs C

Lisp is fun! Well, since I first knew about Lisp I was fascinated, but I have found it hard to learn Lisp and to play with it in a meaningful way. A few years ago I wrote about it here and here. As usual, the first steps of learning something new can be the hardest.

Occationally I use Hackerrank to find programming challanges to solve for fun. I particularly like the Project Euler competition. I find it particularly good for trying out new languages: you get a “meaningful” challenge, a simple environment prepared in your web browser, and automated test cases. So, this time I didn’t waste my time trying to find the right Lisp implementation for me, I just started hacking on Project Euler 38 and 39 on Hackerrank.

Problem 38 was quite simple, but 39 was more interesting. When I had solved it, I found my implementation was not at all fast enough, so I started experimenting locally (the Hackerrank environment is not optimal for tweaking, optimization and debugging).

Choosing a (Common) Lisp implementation
There are quite many Common Lisp implementations out there. The one Hackerrank uses is SBCL. That is clearly the Common Lisp implementation I would recommend (based on my little experience) if it is available for your platform.

I installed SBCL with apt-get in Ubuntu. I also downloaded binaries directly for my Mac OS X computer and my Raspberry Pi (v1) running Arch linux. Installation is a bit non-standard, but you can actually run it without installing (just execute run-sbcl.sh in downloaded folder).

I also tried clisp and ecl, none of these could deal with the memory usage (stack size) of my program. For clisp I found no way to manipulate stack sizes at all. For ecl I made some progress but I could not make it run my program.

SBCL is a Lisp compiler, and it produces fast and efficient code. I later compared it to C.

Project Euler 39
Project Euler 39 is basically about finding integer solutions to Pythagoras theorem. For a given, large, perimeter, how many right triangles are there? For example:

300000^2 + 400000^2 = 500000^2

This triangle has a perimeter of 300000+400000+500000=1200000. What other values for a and b so that

a + b = 700000
a^2 + b^2 = 500000^2

are there? The Hackerrank challenge requires you to work with perimeters up to 5000000. If you implement a solution, a few things to immediately note:

  • The squares wont fit in a 32bit integer. They will fit with no loss of precision in the 53 bits of a 64 bit double and they will also fit in a 64 bit integer. (This matters not for Common Lisp)
  • If you want to do recursion (and of course you want when you code Lisp) it will be millions of recursion steps, which will be a challenge to the stack size. (This also turned out not to matter for SBCL)

The Lisp implementation
It turned out that the SBCL compiler optimized the recursion is such a way that the memory usage was quite low. SBCL successfully runs my program on RPi/Arch, Intel/Ubuntu and Intel/OSX with quite reasonable memory usage.

Since this is about learing Lisp I wanted a 100% functional programming implementation. Only pure functions. A lot of my code is about generating, modifying and testing triangles. A triangle (a b c) can obviously be represented as a Lisp list (a b c) and this was my first implementation. Then if you want to read a, b or c from a list abc, or create the list from a, b and c, you can do:

  a: (car abc)
  b: (car (cdr abc))
  c: (car (cdr (cdr abc)))

abc: (list a b c)

I found this cumbersome. It became a lot of list-to-variables and variables-to-list overhead (I didnt care so much about performance, more about my code readability). I learnt that Lisp functions can return multiple values using value and that you can bind them with multiple-value-bind and use them as arguments to a function using multiple-value-call. This felt functional and pure enough, and it made my code 25% faster than the car/cdr pattern above.:

; a (stupid) function returning a triangle as three values
(defun get-345-triangle ()
  (values 3 4 5))

; a function calculating the perimeter of triangle (from a function)
(defun triangle-perimeter-1 (tri-func)
  (multiple-value-bind (a b c) (funcall tri-func)
    (+ a b c)))

; and in this case you dont need to bind, you can use + directly
(defun triangle-perimeter-2 (tri-func)
  (multiple-value-call #'+ (funcall tri-func)))

; now this works
(triangle-perimeter-1 #'get-345-triangle)
(triangle-perimeter-2 #'get-345-triangle)

Since I am a very inexperienced Lisp programmer I appreciate suggestions for improvement.

Performance of Lisp
My final Hackerrank submission of Lisp code executes in about 4.5 seconds on my Intel i5/Ubuntu. It takes about the same time on the Hackerrank web page, which is fast enough to pass all tests. On the Raspberry Pi v1 (ARMv6 @700 MHz) it takes more than 700 seconds. My intuition told me that 4.5 seconds was very good. This made me ask two questions. How would Lisp compare to C? And why is the ARM more than 100 times slower, how would that compare in C?

The C implementation
My ambition was to rewrite Lisp to C line by line. So my C-program has exactly the same functions which take almost exactly the same arguments. All calculations are identical and performed in exactly the same order. The C-program relies entirely on recursion instead of loops (just like the Lisp program). However…

Functions in C can not return multiple variables. While Lisp had values I decided to use a reference to a struct in C:

(defun get-a-triangle()
  (values x y z))

void get_a_triangle(struct triangle *t) {
  t->a = x;
  t->b = y;
  t->c = z;
}

If the C-triangle struct is a local variable on the callers stack the difference is quite small (from a practical point of view, from a theoretic strict functional programming perspective its a different story).

Numbers in Lisp have arbitrary precision integers and floats make no difference. So, when porting to C, I had to pick numeric types. For most purposes, int32_t was good enough. But, for the purpose of calculating Pythagoras theorem higher precision was needed (as I wrote above, the 53 bits of double, or 64 bits of int64_t are good). So I ended up with 5 versions of the C-program (to compare performance):

  1. All 64-bit integers
  2. 32-bit integers, 64-bit for “triangles”
  3. 32-bit integers, double for “triangles”
  4. 32-bit integers, 64-bit only for pythagoras calc
  5. 32-bit integers, double only for pythagoras calc

(In cases 2,3 the struct triangle has int64_t/doubles properties, and all manipulations and calculations on triangles use these datatypes. In cases 4,5 everything is int32_t, except the internals of a single function, which casts to higher precision before doing its calculations.)

The C-program requires a significant stack size. The stack size can be obtain and changed like (numbers in kb, all values given with ulimit -a):

$ ulimit -s
8192

$ ulimit -s 100000

For my program, a stack size much higher than 8192 is needed (see below). It seems impossible to get large stack than 64Mb in Mac OS X, so my C program could never run there.

Benchmark findings
All C-programs are compiled with gcc -O2.

 CPU            MHZ      SBCL        64     32/64  32/double   32(64)  32(double)
==================================================================================
Time (s)
 i5-4250U 1300-2600       4.5      1.52      1.52      1.60      1,54      1.58
 ARMv6          700      ~715        85        83        45        42        39
 ARMv7          900       357        23        21        13        12        10

Max Res (MB)
 i5-4250U                  41       103       103       103       103       103
 ARMv6                     50       220       210        79       110        76
 ARMv7                     57       180       160        87        97        62

This is not too easy to interpret! The ironic thing is that the fastest thing on the x64-cpu (64-bit integers everywhere) is the slowest on the ARMv6. However, the fastest option on the ARMv6 (32-bit everywhere, and when absolutely needed, use double) is almost the worst on the i5 CPU.

When it comes to the 64-bit i5, it basically does not matter what datatypes you use.

When it comes to the ARMv6, the most important thing is to not store the triangles as int64_t. The strange thing here is the stack sizes. Why does it double (compared to x64) when triangles are stored as int64_t? And the doubles, why do they reduce stack size so much (where are all these doubles actually stored)?

The time command gives max resident memory usage. If I set ulimit -s 128 the first two programs fail (with Segmentation fault 11), and the last three ones succeed, on the ARMv6.

I have found before that the performance of the ARMv6 suffers because of its slow memory and small cache. It is quite possible that the poor performance of the ARMv6 compared to the i5 is related to its slow memory, and the recursion (and stack memory) heavy algorithm.

Finally, SBCL in x64 has very good performance even compared to C (however, an iterative C-implementation, fitting completely in cache, would probably be faster). Note that I am a novice Lisp programmer and this is a math heavy program where the generic number type of Lisp will come at a cost. On the ARMv6, Lisp performance suffers much more.

Windows stack size limit
For Windows, stack size limit is set in the binary, not in the shell. With Cygwin/GCC use the flag -Wl,–stack,1000000 for one million bytes. Note that these are options passed on to the linker.

Future investigations
And I am curious about how much faster a minimal-memory-footprint loop-based C-program would perform.

The source code
Since this code solves a problem in Hackerrank I hesitate to publish it. If you want it for any other reason than just running it on Hackerrank let me know.