Category Archives: Programming

On Grit and becoming a better programmer

I have read the book Grit by Angela Duckworth. It brought some obvious (or not) things to my attention:

1. To really get better at something you need to challenge yourself, try more difficult things, not just repeatedly do what you are already capable of. (my words, not a quote from the book)

If you think of an athlete, a high jumper, this seems very obvious (you are never going to jump 2.00m if you keep practicing 1.50m all days).

2. Mastering something is about allowing yourself to dig deeper, getting more depth and seeing more details (than a novice).

If you think of a sports commentator (making remarks about subtle technical details in figure scating or gymnastics), this also seems fairly obvious.

What are programmers told to learn
I often hear advice to programmers about how to learn and work. I would say it is mostly about trying new things:

  • Learn new programming languages
  • Learn new tools and libraries (that simplifies things)

While these are obviously good things to learn it is neither very challenging nor very much going deeper. And when it comes to tools and libraries that simplify things you perhaps trade deep understanding for easy productivity.

It is not very often I hear the advice to try to solve a hard and challenging problem using tools you already know very well. And it is also not very common that I hear the advice to go into very intricate details about anything.

Programmers seem to value, or be valued by, knowledge about things allowing them to find shortcuts or ready building blocks:

  • Libraries and frameworks – to write no code, less code or avoid writing difficult code
  • Different programming languages – to pick a language that makes it easy
  • Patterns and methodology – to avoid difficult technical analysis and design
  • …and soft skills or course

All these skills are quite easily described in a CV. But none of it is particularly difficult or challenging.

What is truly hard about programming
To implement a correct solution a programmer needs to:

  • Understand the problem or problem domain correctly (and perhaps completely)
  • Come up with a solution and software architecture that can actually solve the problem
  • Go through the labour of correctly crafting the required code
  • …and this is or course done iteratively, learning and adapting along the way (because getting everything right fom the beginning is often impossibly hard), so you need to make incorrect/insufficient decisions that lead you in the right direction for now

This can perhaps be called problem solving or seniority in a CV, but problem solving is a rather abstract cliche and seniority is often measured in years more than anything else. Also it can appear to be covered by things like requirement analysis, patterns, TDD and agile. But these things are about how to plan, facilitate and manage the difficult things. You can know a lot about TDD, without being able to correctly write test cases that describe the problem domain correctly, and without being able to implement an algorithm that solves the problem sufficiently well.

A balanced training
Back to athletes. Golfers (lets call them athletes) used to not be very fit. Then came Tiger Woods. Since then all (top) golfers go to the gym (to be able to compete). To me, this is like you can be a good programmer but if you don’t know git you are simply not very competetive.

But golfers spend most of their time mastering their swing (or in the gym, or with a shrink). They don’t also do horseback riding, pole jumping and run marathons. Or if they do, they at least don’t think it is key to becoming a better golfer. But when it comes to programmers this is often what we do: learn a new language, a new framework or a new service. Like it would significantly make us better (even though it is no challenge at all, just some time spent).

No similes (or metaphors) are perfect. Golf is not programming. Most programmers don’t aspire to be among the best in the world. But I think the question is worth asking:

Do I, as a programmer, have the right mix of hard/challenging practice and trying/learning new stuff?

Learning in the workplace
In our workplaces they don’t want us to work with things that are so challenging that we might very well fail. That is a project risk. And IT projects fail far too often for anyone to be satisfied. It is not at all strange that organisations want us to work with things that we know and otherwise they want to mitigate the risk by making things easier. But do we learn this way? And do we, 5-10 years down the road, reach our potential and develop the capabilities that would benefit us the most?

Is there a genuine conflict because making things as easy and productive as possible on one hand and improving skills on the other?

For whom do we learn?
I don’t know if programmers who challenge themselves and become masters with deep knowledge are rewarded for it. I don’t know if most organisations even want such programmers. I already hear the complaints:

  • No one else understands her code (but if the problem was THAT hard, what was the poor master going to do?)
  • She is just inventing things instead of using standard tools
  • She is not following best practices

Also, who will ever know what a great job such a programmer does? It is like:

  1. This integration routine is unstable and too slow, can you try to fix it? (lets say it is very hard)
  2. Master fixes it in 4 days
  3. Some suspicious comments: yeah sure?!
  4. After the next week no one ever remembers and its just taken for granted that it works the way it always should have

Don’t do as they say they do, do as they do!
I can’t back this up, but I have the feeling that the best programmers we know are people who challenged themselves with insane projects. But what we hear is that programmers are valued by the number of technologies they know.

I would think that smart organisations know how to identify and appreciate a master. And I think master programmers eventually find themselves in those smart organisations. But I think it happens mostly below the radar.

Example: git
Before git there was subversion (improved on svn) and a number of commercial version control systems. These were, like all tools, both appreciated and hated and using them was best practice.

Now Master Torvalds was not happy. He challenged existing technologies and designed and wrote his own system: git.

However, what I find fascinating here is that he wrote git in C. People complained about it of course. But git was fine because Torvalds

  1. deeply knew the problem domain,
  2. designed git very well,
  3. implemented it in a language he mastered.

It is like, you can’t argue for implementing a system like git in C, but in the end git could not have been better (smaller, faster, more portable) if it was implemented in any other language.

I guess for the rest of us the question is always:

  1. should we use a proven solution and take the easy path?
  2. should we invent our own solution possibly using the crude tools we master?

But if we are never arrogant enough to go for #2, how will we ever grow to be able to go for #2 when it is really needed of us?

The Hard Way
There is a (somewhat infamous) book series and online courses about learning to code the hard way. Many programmers like C/C++ perhaps partly because the fact that it is difficult and even a bit unsafe is fun. I think somehow JavaScript has the same appeal in a different way.

Many hackers seem to be struggling with the impossible even though it is hardly worth it from a rational perspective.

I sometimes entertain myself with Hackerrank.com (especially Project Euler). Some challenges are truly hard (so I have not solved them). Some I have solved after weeks of struggle (often using Lisp or C). I used to judge myself, thinking it was an absolute waste of time. On top of everything it made me a bit stressed and gave me occasional sleeping problems because I could not stop thinking about a problem. I am about to reconsider it. And perhaps it is the really hard challenges, that I fail to solve properly, that I should focus on.

Conclusion
I have left a number of unanswered questions in this post. I don’t have answers. But I think it is worth considering the question: do I reach my potential as a programmer the way I try to learn?

VIM: Disable autoindent

More and more often I find that Vim comes with auto indention enabled. I don’t want that.

Perhaps the best way to fix this annoyance is to add the following to your .vimrc file.

" Switch off all auto-indenting
set nocindent
set nosmartindent
set noautoindent
set indentexpr=
filetype indent off
filetype plugin indent off

I found these exact lines here.

Acer Chromebook R13: 3. As a Linux development workstation

I have got an Acer Chromebook R13 and I will write about it from my perspective.

1. Background
2. As a casual computer
3. As a Linux development workstation (this post)

As a Linux development workstation
I switched my Chromebook to Development mode and everything that follows depends on that.

In ChromeOS you can hit CTRL-ALT-T to get a crosh shell. If in Development mode you can run shell to get a regular “unix” shell. You now have access to all of ChromeOS. It looks like this:

crosh> shell
chronos@localhost / $ ls /
bin     dev  home  lost+found  mnt  postinst  root  sbin  tmp  var
debugd  etc  lib   media       opt  proc      run   sys   usr
chronos@localhost / $ ls ~
'Affiliation Database'          login-times
'Affiliation Database-journal'  logout-times
Bookmarks                       'Media Cache'
Cache                           'Network Action Predictor'
Cookies                         'Network Action Predictor-journal'
Cookies-journal                 'Network Persistent State'
'Current Session'               'Origin Bound Certs'
'Current Tabs'                  'Origin Bound Certs-journal'
databases                       'Platform Notifications'
data_reduction_proxy_leveldb    Preferences
DownloadMetadata                previews_opt_out.db
Downloads                       previews_opt_out.db-journal
'Download Service'              QuotaManager
'Extension Rules'               QuotaManager-journal
Extensions                      README
'Extension State'               'RLZ Data'
Favicons                        'RLZ Data.lock'
Favicons-journal                'Service Worker'
'File System'                   'Session Storage'
GCache                          Shortcuts
'GCM Store'                     Shortcuts-journal
GPUCache                        Storage
History                         'Sync App Settings'
History-journal                 'Sync Data'
'History Provider Cache'        'Sync Extension Settings'
IndexedDB                       'Sync FileSystem'
'Last Session'                  Thumbnails
'Last Tabs'                     'Top Sites'
local                           'Top Sites-journal'
'Local App Settings'            'Translate Ranker Model'
'Local Extension Settings'      TransportSecurity
'Local Storage'                 'Visited Links'
log                             'Web Data'
'Login Data'                    'Web Data-journal'
'Login Data-journal'
chronos@localhost / $ uname -a
Linux localhost 3.18.0-16387-g09d1f8eebf5f-dirty #1 SMP PREEMPT Sat Feb 24 13:27:17 PST 2018 aarch64 ARMv8 Processor rev 2 (v8l) GNU/Linux
chronos@localhost / $ df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/root                1.6G  1.4G  248M  85% /
devtmpfs                 2.0G     0  2.0G   0% /dev
tmp                      2.0G  248K  2.0G   1% /tmp
run                      2.0G  456K  2.0G   1% /run
shmfs                    2.0G   24M  1.9G   2% /dev/shm
/dev/mmcblk0p1            53G  1.3G   49G   3% /mnt/stateful_partition
/dev/mmcblk0p8            12M   28K   12M   1% /usr/share/oem
/dev/mapper/encstateful   16G   48M   16G   1% /mnt/stateful_partition/encrypted
media                    2.0G     0  2.0G   0% /media
none                     2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                    128K   12K  116K  10% /run/crw

This is quite good! But we all know that starting to install things and modifying such a system can cause trouble.

Now, there is a tool called Crouton that allows us to install a Linux system (Debian or Ubuntu) into a chroot. We can even run X if we want. So, I would say that for doing development work on your Chromebook you have (at least) 5 options:

  1. Install things directly in ChromeOS
  2. Crouton: command line tools only
  3. Crouton: xiwi – run X and (for example) XFCE inside a ChromeOS window
  4. Crouton: X – run X side by side with ChromeOS
  5. Get rid of ChromeOS and install (for example) Arch instead

I will explore some of the options.

#2. Crouton command line tools only
For the time being, I don’t really need X and a Window Manager. I am fine (I think) with the ChromeOS UI and UX. After downloading crouton I ran:

sudo sh ./crouton -n deb-cli -r stretch -t cli-extra

This gave me a Debian Stretch system without X, named deb-cli (in case I want to have other chroots in the future). Installation took a few minutes.

To access Debian I now need to

  1. CTRL-ALT-T : to get a crosh shell
  2. crosh> shell : to get a ChromeOS unix shell
  3. $ sudo startcli : to get a shell in my Debian strech system

This is clearly a sub-optimal solution to get a shell tab (and closing the shell takes 3x exit). However, it works very well. I installed Node.js (for ARMv8) and in a few minutes I had cloned my git nodejs-project, installed npm packages, run everything and even pushed some code. I ran a web server on 127.0.0.1 and I could access it from the browser just as expected (so this is much more smooth than a virtual machine).

For my purposes I think this is good enough. I am not very tempted to get X up an running side-by-side with ChromeOS. However I obviously would like things like shortcuts and virtual desktops.

Actually, I think a chroot is quite good. It does not modify the base system the way package managers for OS X tend to do. I don’t need to mess with PATH and other variables. And I get a more complete Debian system compared to just the package manager. And it is actually the real Debian packages I install.

I installed Secure Shell and Crosh Window allowing me to change some defaults parameters of the terminal (by hitting CTRL-SHIFT-P), so at least I dont need to adjust the font size for every terminal.

#4. Crouton with XFCE
Well, this is going so good that I decided to try XFCE as well.

sudo sh ./crouton -n deb-xfce -r stretch -t xfce,extensions

It takes a while to install, but when done just run:

sudo startxfce4

The result is actually pretty nice. You switch between ChromeOS and XFCE with CTRL-ALT-SHIFT-BACK/FORWARD (the buttons next to ESC). The switching is a little slow, but it gives you a (quite needed) virtual desktop. Install crouton extensions in ChromeOS to allow copy-paste. A good thing is that I can run:

sudo enter-chroot -n deb-xfce

to enter my xfce-chroot without starting X and XFCE. So, for practical purposes I can have an X-chroot but I dont need to start X if I dont want to.

screen
After a while I have uninstalled XFCE and I only use crouton with cli. The terminal (part of the Chrome browser) is a bit sub-optimal. My idea is to learn to master screen, however:

$ screen
Cannot make directory '/run/screen': Permission denied

This is easily fixed though (link):

mkdir ~/.screen
chmod 700 ~/.screen

# add to .bashrc
export SCREENDIR=$HOME/.screen

# and a vim "alias" I found handy
svim () { screen -t $1 vim $1; }

I found that I get problems when I edit UTF-8 files in VIM in screen in crouton in a crosh shell. Without screen there are also issues, but slightly less so. It seems to be a good idea to add the following line to .vimrc:

set encoding=utf8

It improves the situation, but still a few glitches.

Now at least screen works. It remains to be seen if I can master it.

lighttpd
I installed lighttpd just the normal Debian way. It does not start automatically, but the normal way works:

$ $ sudo service lighttpd start

If you close your last crouton-session without stopping lighttpd you get:

$ exit
logout
Unmounting /mnt/stateful_partition/crouton/chroots/deb-cli...
Sending SIGTERM to processes under /mnt/stateful_partition/crouton/chroots/deb-cli...

That stopped lighttpd after a few seconds, but I guess a manual stop is preferred.

Performance
I have written about NUC vs RPi before and to be honest I was worried that my ARM Chromebook would more have the poor performance of the RPi than the decent performance of the NUC. I would say this is not a problem, the Acer R13 is generally fast enough.

After a few Nodejs tests, it seems the Acer Chromebook R13 is about 5-6 times faster than an RPi V2.

A C-program (some use of 64-bit double floats, little memory footprint) puts it side-by-side with my Celeron/NUC:

                s
RPi V1        142
RPi V2         74
Acer R13       12.5
Celeron J3455  13.0
i5-4250U        7.5

Benchmarks are always tricky, but I think this gives an indication.

Vue.js: loading template html files

You may want to code your Vue.js application in such way that your html templates are in separate html files, but you still do not want a build/compile step. Well, the people writing Vue dont want you do do this, but it can easily be done.

VueWithHtmlLoader-library
I wrote a little library that simply does what is required in a rather simple way. I will not hold you back and I will show you by example immediately:

  • A Rock-paper-scissors Vue-app, all in 1 file: link
  • A Rock-paper-scissors Vue-app, modularised with separate html/js files: link
  • Source of VueWithHtmlLoader library: link

These are the code changes needed to use VueWithHtmlLoader:

 * 1) After including "vue.js", and
 *    before including your component javascript files,
 *    include "vuewithhtmlloader.js"
 *
 * 2) In your component javascript files
 *    replace: Vue.component(
 *       with: VueWithHtmlLoader.component(
 *
 *    replace: template: '...'
 *       with: templateurl: 'component-template.html' (replace with your url)
 *
 * 3) The call to "new Vue()" needs to be delayed, like:
 *    replace: var myVue = new Vue(...);
 *       with: var myVue;          
 *             function initVue() {
 *               myVue = new Vue(...);
 *             }
 *             VueWithHtmlLoader.done(initVue);

My intention that the very simple Rock-paper-scissors-app shall work as an example.

Disclaimer: the library is just written and tested only with this application. The application is written primarily to demonstrate the library. The focus has been clarity and simplicity. Please feel free to suggest improvements to the library or the application, but keep in mind that it was never my intention to follow all best practices. The purpose of the library is to break a Vue best practice.

What the library does:

  1. It creates a global object: VueWithHtmlLoader
  2. It provides a function: VueWithHtmlLoader.component() that you shall use instead of Vue.component() (there may be unsupported/untested cases)
  3. When using VueWithHtmlLoader.component(), you can provide templateurl:’mytemplate.html’ instead of template:’whatever Vue normally supports’
  4. The Vue()-constructor must be called after all templateurls have been downloaded. To facilitate this, place the code that calls new Vue() inside a function, and pass that function to VueWithHtmlLoader.done()
  5. The library will now load all templateurls. When an html template is successfully downloaded over the network Vue.component() is called normally.
  6. When all components are initiated, new Vue() is called via the provided function

Apart from this, you can and should use the global Vue object normally for all other purposes. There may be more things that you want to happen after new Vue() has been called.

The library has no dependencies (it uses XMLHttpRequest directly).

Background
Obviously there are people (like me) with an AngularJS (that is v1) background who are used to ng-include and like it. We see Vue as a better, smaller AngularJS for the future, but we want to keep our templates in separate files without a build step.

As I see it, there are different sizes of applications (and sizes of team and support around them).

  1. Small single-file applications: I think it is great that Vue supports simple single-file applications (with x-template if you want), implemented like my game above. This has a niche!
  2. Applications that clearly require modularization, but optimizing loading times is not an issue, and you want to use the the simplest tools available (keep html/js separate to allow standard editor support and not require a build step). AngularJS (v1) did this nicely. I intend Vue to do it nicely too with this library.
  3. Applications built by people or organizations that already use Webpack and such tools, or applications that are so demanding that such tools are required.

I fully respect and understand the Vue project does not want to support case 2 out of the box and that they prefer to keep the Vue framework small (and as fast as possible).

But i sense some kind of arrogance with articles like 7 Ways To Define A Component Template in Vue.js. I mean 1,2 are only useful for very small components. 3 is only useful for minimal applications that dont require modularization. 4 has very narrow use cases. 5 is insane for normal development (however, I can see cases where you want to output/generate it). And 6,7 requires a build step.

8. Put the damn HTML in an HTML-file and include it? Nowhere to be seen.

The official objection to 8 is obviously performance. I understand that pre-compiling your html instead of serving html that the client will compile is faster. But compared to everything else this overhead may be negligable. And that is what performance is all about, focusing on what is critical and keeping everything else simple. My experience is that loading data to my applications take much more time than loading the application itself.

The Illusion of Simplicity
AngularJS (v1) gave the illusion of simplicity. You just wrote JavaScript-files and (almost) HTML-files, the browser loaded everything and it just worked. I know this is just an illusion and a lot happens behind the scenes. But my experience is that this illusion works well, and it does not leak too much. Vue.js is so much simpler than AngularJS in so many ways. I think my library can keep my illusion alive.

Other options
There is thread on Stackoverflow about this and there are obviously other solutions. If you want to write .vue-files and load them there is already a library for that. For my solution I was inspired by the simple jquery example, but: 1) it is nice to not have a jquery dependency, 2) it is nice to keep the async stuff in one place, 3) the delayed call of new Vue() seems forgotten.

Feedback, limitations, bugs…
If you have suggestions for improvements or fixes of my library, please let me know! I am happy to make it better and I intend to use it for real applications.

I think this library suits some but not all (or even most) Vue.js applications. Lets not expect it to serve very complex requirements or applications that would actually benefit more of a Webpack treatment.

TODO and DONE

  • A minified version – I have not really decided on ambition/obfuscation level
  • Perhaps change loglevel if minified Vue is used? or not.
  • I had some problems with comments in html-files, but I failed to reproduce them. I think <!– comments –> should definitely be supported.

JavaScript: Sets, Objects and Arrays

JavaScript has a new (well well) fancy Set datastructure (that does not come with functions for union, intersection and the likes, but whatever). A little while ago I tested Binary Search (also not in the standard library) and I was quite impressed with the performance.

When I code JavaScript I often hesitate about using an Array or an Object. And I have not started using Set much.

I decided to make some tests. Lets say we have pseudo-random natural numbers (like 10000 of them). We then want to check if a number is among the 10000 numbers or not (if it is a member of the set). A JavaScript Set does exactly that. A JavaScript Object just requires you to do: set[314] = true and you are basically done (it gets converted to a string, though). For an Array you just push(314), sort the array, and then use binary search to see if the value is there.

Obviously, if you often add or remove value, (re)sorting the Array will be annoying and costly. But quite often this is not the case.

The test
My test consists of generating N=10000 random unique numbers (with distance 1 or 2 between them). I then insert them (in a kind of pseudo-random order) into an Array (and sorts it), into an Object, and into a Set. I measure this time as an initiation time (for each data structure).

I repeat. So now I have 2xArrays, 2xObjects, 2xSets.

This way I can test both iterating and searching with all combinations of data structures (and check that the results are the same and thus correct).

Output of a single run: 100 iterations, N=10000, on a Linux Intel i5 and Node.js 8.9.1 looks like this:

                         ====== Search Structure ======
(ms)                        Array     Object      Set
     Initiate                1338        192      282
===== Iterate =====    
        Array                 800         39       93
       Object                 853        122      170
          Set                1147         82      131

By comparing columns you can compare the cost of searching (and initiating the structure before searching it). By comparing rows you can compare the cost of iterating over the different data structures (for example, iterating over Set while searching Array took 1147ms).

These results are quite consistent on this machine.

Findings
Some findings are very clear (I guess they are quite consistent across systems):

  • Putting values in an Array, to sort it, and the search it, is much slower and makes little sense compared to using an Object (or a Set)
  • Iterating an Array is a bit faster than iterating an Object or Set, so if you are never going to search an Array is faster
  • The newer and more specialized Set offers little advantage to good old Objects

What is more unclear is why iterating over Objects is faster when searching Arrays, but iterating over Sets if faster when searching Objects or Sets. What I find is:

  • Sets seem to perform comparably to Objects on Raspberry Pi, ARMv7.
  • Sets seem to underperform more on Mac OS X

Obviusly, all this is very unclear and can vary depending on CPU-cache, Node-version, OS and other factors.

Smaller and Larger sets
These findings hold quite well for smaller N=100 and larger N=1000000. The Array, despite being O(n log n), does not get much more worse for N=1000000 than it already was for N=10000.

Conclusions and Recommendation
I think the conservative choice is to use Arrays when order is important or you know you will not look for a member based on its unique id. If members have unique IDs and are not ordered, use Object. I see no reason to use Set, especially if you target browsers (support in IE is still limited in early 2018).

The Code
Here follows the source code. Output is not quite as pretty as the table above.

var lodash = require('lodash');

function randomarray(size) {
  var a = new Array(size);
  var x = 0;
  var i, r;
  var j = 0;
  var prime = 3;

  if ( 50   < size ) prime = 31;
  if ( 500  < size ) prime = 313;
  if ( 5000 < size ) prime = 3109;

  for ( i=0 ; i<size ; i++ ) {
    r = 1 + Math.floor(2 * Math.random());
    x += r;
    a[j] = '' + x;
    j += prime;
    if ( size <= j ) j-=size;
  }
  return a;
}

var times = {
  arr : {
    make : 0,
    arr  : 0,
    obj  : 0,
    set  : 0
  },
  obj : {
    make : 0,
    arr  : 0,
    obj  : 0,
    set  : 0
  },
  set : {
    make : 0,
    arr  : 0,
    obj  : 0,
    set  : 0
  }
}

function make_array(a) {
  times.arr.make -= Date.now();
  var i;
  var r = new Array(a.length);
  for ( i=a.length-1 ; 0<=i ; i-- ) {
    r[i] = a[i];
  }
  r.sort();
  times.arr.make += Date.now();
  return r;
}

function make_object(a) {
  times.obj.make -= Date.now();
  var i;
  var r = {};
  for ( i=a.length-1 ; 0<=i ; i-- ) {
    r[a[i]] = true;
  }
  times.obj.make += Date.now();
  return r;
}

function make_set(a) {
  times.set.make -= Date.now();
  var i;
  var r = new Set();
  for ( i=a.length-1 ; 0<=i ; i-- ) {
    r.add(a[i]);
  }
  times.set.make += Date.now();
  return r;
}

function make_triplet(n) {
  var r = randomarray(n);
  return {
    arr : make_array(r),
    obj : make_object(r),
    set : make_set(r)
  };
}

function match_triplets(t1,t2) {
  var i;
  var m = [];
  m.push(match_array_array(t1.arr , t2.arr));
  m.push(match_array_object(t1.arr , t2.obj));
  m.push(match_array_set(t1.arr , t2.set));
  m.push(match_object_array(t1.obj , t2.arr));
  m.push(match_object_object(t1.obj , t2.obj));
  m.push(match_object_set(t1.obj , t2.set));
  m.push(match_set_array(t1.set , t2.arr));
  m.push(match_set_object(t1.set , t2.obj));
  m.push(match_set_set(t1.set , t2.set));
  for ( i=1 ; i<m.length ; i++ ) {
    if ( m[0] !== m[i] ) {
      console.log('m[0]=' + m[0] + ' != m[' + i + ']=' + m[i]);
    }
  }
}

function match_array_array(a1,a2) {
  times.arr.arr -= Date.now();
  var r = 0;
  var i, v;
  for ( i=a1.length-1 ; 0<=i ; i-- ) {
    v = a1[i];
    if ( v === a2[lodash.sortedIndex(a2,v)] ) r++;
  }
  times.arr.arr += Date.now();
  return r;
}

function match_array_object(a1,o2) {
  times.arr.obj -= Date.now();
  var r = 0;
  var i;
  for ( i=a1.length-1 ; 0<=i ; i-- ) {
    if ( o2[a1[i]] ) r++;
  }
  times.arr.obj += Date.now();
  return r;
}

function match_array_set(a1,s2) {
  times.arr.set -= Date.now();
  var r = 0;
  var i;
  for ( i=a1.length-1 ; 0<=i ; i-- ) {
    if ( s2.has(a1[i]) ) r++;
  }
  times.arr.set += Date.now();
  return r;
}

function match_object_array(o1,a2) {
  times.obj.arr -= Date.now();
  var r = 0;
  var v;
  for ( v in o1 ) {
    if ( v === a2[lodash.sortedIndex(a2,v)] ) r++;
  }
  times.obj.arr += Date.now();
  return r;
}

function match_object_object(o1,o2) {
  times.obj.obj -= Date.now();
  var r = 0;
  var v;
  for ( v in o1 ) {
    if ( o2[v] ) r++;
  }
  times.obj.obj += Date.now();
  return r;
}

function match_object_set(o1,s2) {
  times.obj.set -= Date.now();
  var r = 0;
  var v;
  for ( v in o1 ) {
    if ( s2.has(v) ) r++;
  }
  times.obj.set += Date.now();
  return r;
}

function match_set_array(s1,a2) {
  times.set.arr -= Date.now();
  var r = 0;
  var v;
  var iter = s1[Symbol.iterator]();
  while ( ( v = iter.next().value ) ) {
    if ( v === a2[lodash.sortedIndex(a2,v)] ) r++;
  }
  times.set.arr += Date.now();
  return r;
}

function match_set_object(s1,o2) {
  times.set.obj -= Date.now();
  var r = 0;
  var v;
  var iter = s1[Symbol.iterator]();
  while ( ( v = iter.next().value ) ) {
    if ( o2[v] ) r++;
  }
  times.set.obj += Date.now();
  return r;
}

function match_set_set(s1,s2) {
  times.set.set -= Date.now();
  var r = 0;
  var v;
  var iter = s1[Symbol.iterator]();
  while ( ( v = iter.next().value ) ) {
    if ( s2.has(v) ) r++;
  }
  times.set.set += Date.now();
  return r;
}

function main() {
  var i;
  var t1;
  var t2;

  for ( i=0 ; i<100 ; i++ ) {
    t1 = make_triplet(10000);
    t2 = make_triplet(10000);
    match_triplets(t1,t2);
    match_triplets(t2,t1);
  }

  console.log('TIME=' + JSON.stringify(times,null,4));
}

main();

When to (not) use Web Workers?

Web Workers is a mature, simple, standardised, compatible technology for allowing multithreaded JavaScript-applications in the web browser.

I am not going to write about how to use Web Worker (check the excellent MDN article). I am going to write a little about when and why to (not) use Web Worker.

First, Web Workers are about performance. And performance is typically not the best thing to think about first when you code something.

Second, when you have performance problems and you throw more cores at the problem your best speedup is x2, x4 or xN. In 2018 it is quite common with 4 cores and that means in the optimal case you can make your program 4 times faster by using Web Workers. Unfortunately, if it was not fast enough from the beginning chances are a 4x speedup is not going to help much. And the cost of 4x speedup is 4 times more heat is produced, the battery will drain faster, and perhaps other applications will be suffering. A more efficient algorithm can often produce 10-100 times speedup without making the maintainability of the program suffer too much (and there are very many ways to make a non-optimised program faster).

Let us say we have a web application. The user clicks “Show report”, the GUI locks/blocks for 10s and then the report displays. The user might accept that the GUI locks, if just for 1-2 seconds. Or the user might accept that the report takes 10s to compute, if it shows up little by little and the program does not appear hung. The way we could deal with this in JavaScript (which is single thread and asyncronous) is to break the 10s report calculation into small pieces (say 100 pieces each taking 100ms) and after calculating each piece calling window.setTimeout which allows the UI to update (among other things) before calculating another piece of the report. Perhaps a more common and practical approach is to divide the 10s job into logical parts: fetch data, make calculations, make report, but this would not much improve the locked GUI situation since some (or all) parts still take significant (blocking) time.

If we could send the entire 10s job to a Web Worker our program GUI would be completely responsive while the report is generated. Now the key limitation of a web worker (which is also what allows it to be simple and safe):

Data is copied to the Worker before it starts, and copied from the Worker when it has completed (rather than being passed by reference).

This means that if you already have a lot of data, it might be quite expensive to copy that data to the web worker, and it might actually be cheaper to just do the job where the data already is. In the same way, since there is some overhead in calling the Web Worker, you can’t send too many too small pieces of work to it, because you will occupy yourself with sending and receiving messages rather than just doing the job right away.

This leaves us with obvious candidates for web workers (you can use Google):

  • Expensive searches (like chess moves or travelling salesman solutions)
  • Encryption (but chances are you should not do it in JavaScript in the first place, for security reasons)
  • Spell and grammar checker (I don’t know much about this).
  • Background network jobs

This is not too useful in most cases. What would be useful would be to send packages of work (arrays), like streams in a functional programming way: map(), reduce(), sort(), filter().

I decided to write some Web Worker tests based on sort(). Since I can not (easily, and there are probably good reasons) write JavaScript in WordPress I wrote a separate page with the application. Check it out now:

So, for 5 seconds I try to do the following job as many times I can, while I keep track of how much the GUI is suffering:

  1. create an array of 10001 random numbers: O(n)
  2. sort it: O(n log n)
  3. get the median (array[5000]): O(1)

The expensive part is step 2, the sort (well, I actually have not measured 1 vs 2). If the ratio of amount of work done per byte being sent is high enough then it can be worth it to send the job to a Web Worker.

If you run the tests yourself I think you shall see that the first Web Worker tests that outsource all of 1-2-3 are quite ok. But this basically means giving the web worker no data at all and when it has done a significant amount of job, receiving just a few numbers. This is more Web Worker friendly than Chess where at least the board would need to be sent.

If you then run the tests that outsource just sort() you see significantly lower throughput. How suitable sort()? Well, sorting 10k ~ 2^13 elements should require each element to be compared (accessed) about 13 times. And there is no data sent that is not needed by the Web Worker. Just as a counter example: if you send an order to get back the sum of the lines most of the order data is ignored by the Web Worker, and it just needs to access each line value once; much much less suitable than sort().

Findings from tests
I find that sort(), being O(n log n), on an array of numbers is far too fast to be outsourced to a Web Worker. You need to find a much more “dense” problem to benefit of a Web Worker.

Islands of data
If you can design your application in such way that one Web Worker maintains its own full state and just shares small selected parts occationally, that could work. The good thing is that this would also be clean encapsulation of data and separation of responsibilites. The bad thing is that you probably need to design with the Web Worker in mind quite early, and this kind of premature optimization is often a bad idea.

This could be letting a Web Worker do all your I/O. But if most data that you receive is needed in your application, and most data you send comes straight from your application, the benefit is very questionable. An if most data you receive is not needed in your application, perhaps you should not receive so much data in the first place. Even if you process your incoming data quite much: validating, integrating with current state, precalculating I would not expect it to come very close to the computational intensity of my sort().

Conclusions
Unfortunately, the simplicity and safety of Web Worker is unfortunately also its biggest limitation. The primary reason for using a Web Worker should be performance and even for artificial problems it is hard to get any benefit.

Note to self: never try-catch more than necessary!

A wrote a function, and then a unittest, and the unit test was good.
Then I called the function from my real project, and it failed!

I isolated the problem and thought I had found a bug in V8 (except after many years as a programmer I have I learnt it is never the compilers fault).

This was my output:

$ node bug.js 
Test good
main: err=Not JSON

This is my simplified (faulty) code:

function callSomething(callback) {
  var rawdata = '{ "a":"1" }';
  var jsondata; 

  try {
    jsondata = JSON.parse(rawdata);
    callback(null,jsondata);
  } catch (e) {
    callback('Not JSON', null);
  }
}

function test() {
  callSomething(function(err,data) {
    if ( err ) console.log('Test bad: ' + err);
    console.log('Test good');
  });
}

function main() {
  var result = {
    outdata : {}
  };

  callSomething(function(err,data) {
    if ( err ) {
      console.log('main: err=' + err);
    } else {
      result.outata.json = data;
      console.log('main: json=' + JSON.stringify(result.outdata.json));
    }
  });
}

test();
main();

How can the test not fail when main fails?

Well, here is the correct output

$ node nodebug.js 
Test good
main: json={"a":"1"}

of the correct code main function:

function main() {
  var result = {
    outdata : {}
  };

  callSomething(function(err,data) {
    if ( err ) {
      console.log('main: err=' + err);
    } else {
//    result.outata.json = data;
      result.outdata.json = data;
      console.log('main: json=' + JSON.stringify(result.outdata.json));
    }
  });
}

The misnamed property caused an Error which was (unintentionally) caught, causing the anonymous callback function to be called once more, this time with err set, but to the wrong error.

It would have been better to write:

function callSomething(callback) {
  var rawdata = '{ "a":"1" }';
  var jsondata; 

  try {
    jsondata = JSON.parse(rawdata);
  } catch (e) {
    callback('Not JSON', null);
    return;
  }
  callback(null,jsondata);
}

and the misnamed propery error would have crashed the program in the right place.

Conclusion
Don’t ever try more things than necessary. And if you need to try several lines, consider making separate try for each.

Minification of real web Application

I have built and I maintain a reasonably large (AngularJS) web application and here follow a few notes on the effect of minification.

I start with the findings:

                            Uncompressed         GZIP     Minified    Min+GZIP

App 1:  Size        (kb)            1130         1130          843         841
        Transferred (kb)            1150          375          861         308
        Load time    (s)             2.8          1.6          2.7         1.7

App 2:  Size        (kb)             708          708          659         659
        Transferred (kb)             721          359          672         347
        Load time    (s)             4.0          3.5          3.1         3.5

Conclusions
You should always enable gzip on the server. It is faster to compress and send less data than to send the uncompressed data. The benefits of gzip are huge and there are no negative side effects.

Minification saves some bandwidth (and if unlike me you do it ahead of time, some loading time). But unless your code contains mostly comments the effects are marginal (although that might be a big saving if you use very much bandwidth or you are looking for fastest possible load times).

Also, gzip tends to be good at what minification can easily do, and while the effect of minification alone is quite significant, the effect of minification together with gzip is smaller.

Behind the figures
The figures above come from Firefox Load time over the internet.

  • App1: About 100 files are served, mostly .js (a few .html and .css)
  • App2: About 80 files are served, mostly .js (a few .html and .css)
  • App1: Angular is always pre-minified 165kb, gzipped to 67kb.
  • App2: Angular+modules is always pre-minified 298kb, gzipped to 127kb
  • App2 contains a few fonts which are neither minified nor gzipped (142kb)
  • Files served by Node.js
  • Files minified by custom Node.js code in real time
  • Files gzipped by nginx in real time
  • Not everything is initiated when Load is complete (more html-files are loaded dynamically as user navigates, and data is loaded from APIs on demand)

Implications of minification
Minification (and possibly packaging of code) has more implications than gzip. Possible negative side effects are:

  • A build process is not strictly needed for web development, but minification is often done as part of a build process, increasing complexity of development, testing and deployment.
  • Testing and development is made harder when debugging minified code (although there are tools to mitigate this).
  • More aggressive minification can have unexpected results

The minification code I run in Node.js, when I serve a file, basically just:

  • Removes all white space in the beginning and end of lines
  • Removes all comments

This nice thing about this simple minification strategy is that everything that is obviously just waste is removed at a low cost, but the code is for all practical purposes completely unchanged (even line numbers are preserved to not complicate debugging). Also, developers should feel free to write as many comments as they like in the code, yet comments should never be served in a public facing application. More powerful minification comes at higher costs, and the effects are probably mostly lost after gzip.

I guess every project and system have a sweet spot when it comes to minification and I think my simple minification strategy makes sense for my needs.

Programming paradigms rock and suck!

I wrote a few articles about why functional programming sucks (1, 2) and why some functional libraries suck (3, 4). Obviously the titles were a little click bait and I think anyone reading the articles understood that I argue that it is the mindless hype that sucks, not the (FP) paradigm itself. Anyways, writing those articles, getting some feedback on them, and programming on my projects gave me more thoughts.

Lets say we have these programming paradigms:

  • Functional Programming: Pure, testable functions, avoiding state and variables.
  • Object Oriented Programming: Encapsulating data inside objects exposing a simplified, safe interfaces
  • Imperative Programming: Explicitly instructing the computer step by step what to do and how to do it, changing state (or Procedural Programming if you like)
  • Declarative Programming: Expressing your problem as data, and something else takes that data and produce what you want
  • State Machines: Global data and distinct states: your program reacts to input, transitions between states and modifies its data

As you perhaps understand, I do not aspire to be a computer scientist. This is all very practical.

I find – and I dont know if you will find this obvious or outrageous – that all these paradigms have strengths and weaknesses, and that a reasonably sized program or system will need to use a combination of them.

Here follows some strenghts and weaknesses:

Functional Programming Strengths

  • High reusability
  • Highly testable code
  • Compact code
  • Transformation, or streaming, or raw data matches the reality of networked services

Functional Programming Weaknesses

  • Obscurity – some code can be very cryptic and hard to read
  • Over engineering – attempting to make super reusable code can complicate simple things
  • Avoiding state – sometimes you have a state and you need to deal with it (and with FP there is a risk you try to avoid reality rather than face it head on)
  • Performance – FP comes with some overhead
  • Algorithmic complexity – it can be hard to understand the algorithmic complexity, or to write an efficient implementation, and this can lead to performance problems

Object Oriented Programming Strengths

  • Encourages clear APIs
  • Encourages reusability
  • Encourages information modelling
  • Allows refactoring of implementation

Object Oriented Programming Weaknesses

  • It adds little (no) value and significant overhead to serialize/deserialize data as it is sent/received and stored/loaded
  • Inheritance is a questionable concept
  • Ideally, objects have only one-way-dependencies, but in the real world this creates difficult, artificial design problems and complexity
  • It adds code and concepts that may cost more than you practically get from it: public/private declaration, getters/setters

Imperative Programming Strengths

  • Simple and straight forward
  • Little overhead: high performance
  • Algorithms explicitly implemented: high performance
  • Works the same in many different languages

Imperative Programming Weaknesses

  • Bad testability: if you have too big procedures with too many side effects
  • Bad maintainability: if you end up writing too big, complex modules
  • You can end up writing much code, copying and pasting
  • It takes discipline

Declarative Programming Strengths

  • Can be very compact
  • Can be very easy to read (if you understand/accept the “language”)
  • Can allow for quickly adding or changing features

Declarative Programming Weaknesses

  • Performance may be bad and/or unpredictable
  • Although compact, it can be very cryptic
  • Debugging can be very hard

State Machine Strengths

  • Can deal with complex states
  • Suitable for error handling, recovery
  • Can deliver efficiency at system level

State Machine Weaknesses

  • Requires proper analysis and design upfront
  • May be hard to refactor or change
  • Complex (represents and deals with complexity rather than hides it or lets someone else take care of it)

I now imagine a simple mobile/web application (like an little online betting site). There is data storage, server side application logic, authentication, http APIs, client loading data, client side application logic, UX, and the user saving/updating data to the system.

Server Side
The server itself should be thought of as a state machine. It can be up and good, but it can also be starting or shutting down. It can have bad or missing connectivity to other services (authentication, payment, data feeds). It can be in maintenance mode or perhaps in test or debug mode. It needs to hold and renew cached data. All these things can dramatially affect how a simple API call is handled! Failing to do this in a structured way can lead to very complicated API implementions or severe performance or stability problems.

If there are adapters connecting to other systems or the storage these may very well be implemented in an Object Oriented way. They expose a simple and safe API and they hide a lot of implementation.

When it comes to server side business logic it is fed with input from the storage or cache and its output is sent over the network to clients (or the other way around). Such business logic should be designed in a Functional way: it should be clear what it does and what data it uses, and it can and should be testable.

The implementation of the business logic or the adapters may be performance critical and non trivial. Imperative programming can be used here – inside what is exposed as Functional or Object oriented code. It must not leak. But every bolt and nut in an FP function does not have to be implemented using FP principles, and every internal part of an Object need not be built on the principles of Object Orientation.

Finally, the definition of the APIs, the access rules can be Declared. Other code can execute these rules and declarations behind the scenes.

Client Side
The client also needs to be thought of as a state machine first. Is the user authenticated and logged in? Have we received all data, or are we still waiting for something? Have we received the latest updates or have there been connection problems? Does the user have edited, dirty, state that is not saved to the server? Are we having errors saving data that we are retrying? Where in the application is the user and what settings, configurations or policies applies to the user? You need to deal with all these things. If you try break it into many small modules with no mutual dependencies you will find that is very hard. Put the damn global state somewhere, and accept that it is a global state. Make it all publicly readable for anything that cares. Changing of the state is a completely different business that must only be done via specific interfaces (the entire state machine can appear like an Object).

Now Declare your user interface: the pages, buttons, colors and everything else. Write code that consumes these declarations and produce the UX. To the state machine this code may appear Object Oriented, and your UX components are probably some kind of OO objects, reusable across the pages.

Whenever you can, break stuff out into Functional, pure, testable functions.

And when it comes to getting stuff done, as long as your code is contained within Objects or Functions and they do not leak: write simple and efficient code Imperative code if that is the best.

The GUI can be described in data rather than code in a declarative way.

Conclusion
It is quite pointless to talk about (for example) Functional programming as better or worse than anything else. It simply depends on. And for programs/systems of non-trivial size and complexity, mixing programming paradigms are fine.

Fast binary search without Division

Arrays are simple datastructures but finding elements is not fast. However, if you have a sorted array, a favourite algorithm is Binary Search.

When it comes to JavaScript there is no binary search in the standard library. Standard functions like find and indexOf simply just don’t cut it for many purposes, being O(n).

Rosetta Code has two implementations of binary search for JavaScript. However, binary search is based on division (by 2, you split the search interval in half every iteration). In programming languages with integer math division by 2 is performed by doing a single shift, so it is a cheap operation (division is generally expensive). In JavaScript integer division by two looks like:

mid = Math.floor((lo + hi) / 2);

This is just… unacceptable. Perhaps V8 recognizes the pattern and replaces it with the right thing, I don’t care, floating point math has nothing to do with binary search and this is ugly.

Perhaps I can do better? Well, I actually can!

Instead of dividing the actual interval by 2 in every iteration you can use the powers of two: 32,16,8,4,2,1 and search with them. Lets say the array is 41 elements long. Start check element 32. If its too little try with 32+8 (skip 32+16 > 41), and if it is too large try with 16. That way you essentially do a binary search without a division.

Since I don’t want to do heap allocation for every search I calculate the powers of two before declaring the function. In the code below my maximum value is 2^31, so I can mostly search arrays of size 2^32-1. If you dont like that limit you can raise 32 to whatever you like (but at 52-53 you are running into new problems, and well before that you will run out of RAM).

function powers_init(s) {
  var i;
  var r = new Array(s);
  r[0] = 1;
  for ( i=1 ; i<s ; i++ ) {
    r[i] = 2*r[i-1];
  }
  return r;
}

var powers = powers_init(32);  // [1,2,4,8,16,32,...

function binary_search_divisionless(a, value) {
  var pix = 0;
  var aval;
  var offset0 = 0;
  var offset1;

  if ( a[0] === value ) return 0;
  while ( powers[pix+1] < a.length ) pix+= 1;

  while ( 0 <= pix ) {
    offset1 = offset0 + powers[pix];
    if ( offset1 < a.length ) {
      aval = a[offset1];
      if ( value === aval ) {
        return offset1;
      } else if ( aval < value ) {
        offset0 = offset1;
      }
    }
    pix--;
  }

  return -1;
}

It is perhaps not obvious why I check for a[0] before doing anything else. The function returns from the main while loop as soon as it finds what it is looking for. So a[offset0] does not contain the value in later iterations when offset>0. However, this is not guaranteed from the beginning when offset0=0, and it is not automatically being tested in the end. So I explicitly test it first.

Benchmark
Below follows relative performance in time for different array sizes.

Array Size:                  10      100     1000    10000   100000
=======================================================================
Standard Library indexOf   1.65     2.02    11.89    94.00   790.00
Rosetta Code Recursive     1.32     1.48     1.84     1.81     2.02
Rosetta Code Imperative    1.18     1.08     1.41     1.43     1.45
Divisionless               1.00     1.00     1.00     1.00     1.00
   - unrolled              0.93     0.76     0.82     0.79     0.78

I am satisfied that my code is consistently faster than the alternatives. What surprised me was that binary search clearly wins even for short arrays (10). The little loop before the big loop can be unrolled for significant performance gain:

//while ( powers[pix+ 1] < a.length ) pix+= 1;
  if ( powers[pix+16] < a.length ) pix+=16;
  if ( powers[pix+ 8] < a.length ) pix+= 8;
  if ( powers[pix+ 4] < a.length ) pix+= 4;
  if ( powers[pix+ 2] < a.length ) pix+= 2;
  if ( powers[pix+ 1] < a.length ) pix+= 1;

This is also the reason why 32 was a particularly good array size for the powers of two. This is a kind of optimization I would normally not let into my code. However, if you make much use of binary search on the Node.js server side of an application, go ahead.

It is also worth noting that the following patch doubles the execution time of my code!

//  if ( offset1 < a.length ) {
//    aval = a[offset1];
    if ( undefined !== ( aval = a[offset1] ) ) {

General Compare Function
With little modification, the algorithm can be used with a compare function for arrays of objects other than numbers (or strings).

Division
Since my algorithm only uses powers of 2, I can do division in JavaScript without using Math.floor. It is a quite easy change to eleminate the powers-array and divide by two instead. It turns out it make very little difference on performance (to do integer / 2 instead of array lookup). However, when I added a (meaningless) Math.floor() around the division, performance dropped the same as the the iterative version from Rosetta Code. So my intuition was correct to avoid it.

I checked the Lodash binary search code (sortedIndexOf) and it uses bit shift <<< to divide by two. Admittedly, the Lodash code is slightly faster than my code.

Conclusion
You should have a well implemented and tested binary search function available in your toolbox (admittedly, use Lodash for this). This is not the right place to lose milliseconds. For very small arrays you can argue that indexOf works equally well, but the cost of using binary search is insignificant.