Tag Archives: JavaScript

Sort strings without case sensitivity

In JavaScript, I wanted to sort arrays of strings without caring about case. It was more complicated than I first thought.

The background is that I present lists like this in a GUI:

  • AMD
  • Apple
  • Gigabyte
  • IBM
  • Intel
  • Microsoft
  • MSI
  • Nokia
  • Samsung
  • Sony

I want AMD and MSI (spelled in all caps) to be sorted without respect to case. Standard sort() would put MSI before Microsoft.

Obviously I am not the first one wanting to do this and I found an article on stackoverflow. It suggests the following solution:

Use toLowerCase()
You can make your own string compare function that uses toLowerCase and send it as an argument to sort():

function cmpCaseless(a,b) {
    a = a.toLowerCase();
    b = b.toLowerCase();
    if ( a < b ) return -1;
    if ( a > b ) return  1;
    return 0;
}

myStringArray.sort(cmpCaseless);

This has a number of problems. The article above mentions that it is not stable. That is probably true in some cases but I was of course worried about performance: making two String objects for each compare should make the garbage collector quite busy, not to mention the waste of copying and lowercasing potentially quite long stings when usually the first character is enought. When I started experimenting I found another more critical flaw though: in Swedish we have three extra characters in the alphabet; Å,Ä,Ö, in that order. The above cmpCaseless orders Ä,Å,Ö, which sounds like a little problem, but it is simply unacceptable.

Use localeCompare
There is a more competent (or so I thought, read on) way to compare strings in JavaScript: the localeCompare function. This one simply treats A,Å,Ä and O,Ö as the same character, which is far more unacceptable than the toLowerCase problem.

However, it also has a “locales” option (a second optional argument). If I set it to ‘sv’ I get the sort order that I want, but performance is horrible. And I still have to use toLowerCase as well as localeCompare:

function localeCompare(a,b) {
    return a.toLowerCase().localeCompare(b.toLowerCase());
}

function localeCompare_sv(a,b) {
    return a.toLowerCase().localeCompare(b.toLowerCase(), 'sv');
}

localeCompare() has an extra options argument with a “sensitivity” parameter, but it is no good for our purpuses.

Rolling my own
Of course, I ended up building my own function to do caseless string compare. The strategy is to compare one character at a time, not making any new String objects, and fallback to localeCompare if both characters are above the 127 ASCII characters:

function custom(a,b) {
    var i, al, bl, l;
    var ac, bc;
    al = a.length;
    bl = b.length;
    l = al < bl ? al : bl;
        
    for ( i=0 ; i<l ; i++ ) {
        ac = a.codePointAt(i);  // or charCodeAt() for better compability
        bc = b.codePointAt(i);
        if ( 64 < ac && ac < 91 ) ac += 32;
        if ( 64 < bc && bc < 91 ) bc += 32;
        if ( ac !== bc ) { 
            if ( 127 < ac && 127 < bc ) {
                ac = a.substr(i,1).toLowerCase();
                bc = b.substr(i,1).toLowerCase();
                if ( ac !== bc ) return ac.localeCompare(bc);
            } else {
                return ac-bc;
            }
        }
    }
    return al-bl;
}

One fascinating thing is that here I can use localeCompare() without 'sv'.

Test for yourself
I built a simple webpage where you can test everything yourself.

Conclusion
Defining a string sort order is not trivial, when you dont just have ASCII characters. If you look at the ascii table you see that non alphabetic characters are spread out:

  • SPACE, #, 1-9, and many more come before both A-Z and a-z
  • Underscore: _, and a few other characters come after A-Z but before a-z
  • Pipe: | and a few other characters come after A-Z and a-z

When it comes to characters behind ASCII 127, it just gets more complicated: how do you sort european language latin letters, greek letters and arrows and other symbols?

For this reason, I think it makes sense to define your own sorting function and clearly define the behaviour for the characters you know that you care about. If it really matters in your application.

My function above is significantly faster than the options.

Disclaimer
These results can probably be inconsistent over different web browsers.

Playing with smart.js and V7

I have been playing quite much with Node.js lately, and I have put some effort into trying to build it for typical OpenWRT hardware. It turns out that Node.js/V8 is not, and will never be, suitable for hardware without an FPU and at least 128MB RAM.

My curiousity led me to smart.js which is based on the V7 javascript engine. Among the positive details were posix compability, ECMA script 5.1 support, HTTP support (not very different from Node.js, just much less of it), and supposed to be the fastest non-compiled JavaScript engine.

smart.js
smart.js seems to be the IoT-platform, while v7 is just the JavaScript engine. smart.js had some peculiar properties:

  • The source zip I downloaded from github failed to compiled becuase of a missing .git-directory. Using git clone solved this.
  • The binary always reads and executes smart.js from the same folder. If you give more command line arguments, it executes those after smart.js.
  • I found no way to pass command line arguments.

It comes with a little set of IoT-example files, and I found this a little confusing.

v7
I found out it was possible to use v7 standalone. It is a very nice v7.c file that is simply compiled:

$ gcc -O3 -DV7_EXE -o v7 v7.c -lm

This is very promising – a lot easier than building node.

I found that neither console.log or process.argv exist and found ways to work around this. My plan was to build a little benchmark suite and test different things. I did not get so far: the below program just creates 100 random objects (TestBit) 10 times and stores in arrays.

if ( 'undefined' === typeof process ) {
  process = {
    exit : exit,
    argv : [ 'v7', 'v7bench.c' ]  // faking command line arguments
  };
}

if ( 'undefined' === typeof console ) {
  console = {
    log : print
  }
}

var TestBit = function() {
  this.n  = Math.random();
  this.s  = '' + this.n;
  this.s1 = this.s.substr(2,1);
  this.s2 = this.s.substr(3,2);
  this.s3 = this.s.substr(5,3);
  this.b  = this.s > 0.5;
};

var generateTestBits = function(n) {
  var i;
  var r = new Array(n);
  for ( i=0 ; i<n ; i++ ) {
    r[i] = new TestBit();
  }
  return r;
}

var Timer = function() {
  this.start = Date.now();
  this.last  = this.start;

  this.split = function() {
    var r;
    var n = Date.now();
    r = n - this.last;
    this.last = n;
    return r;
  }
}

var timer = new Timer();

var i = 0;
var tests = [];
var tmp;

console.log('Start');

while(i<10) {
  tests.push(generateTestBits(100));
  i++;
  console.log('' + ( i*100) + ':' + timer.split());
}

This simple program unfortunately proved that performance is much worse than I could expect. Below are timings for generating the 1000 objects, in steps of 100 at a time. Benchmarks on a RPi2 900MHz ARMv7 CPU.

There is a flag (-vo) described as “object arena size” that I decided to play with (values 1, 10, 100 gave very similar results as 1000)

       default    -vo 1000    -vo 5000    -vo 10000
 100      0.2s       0.12s        6.8s          28s                         
 200      0.4s       0.16s        8.6s
 300      1.3s       0.23s       12  s
 400      7.4s       0.24s        2.6s
 500     13  s       0.37s       15  s
 600     35  s       0.93s       32  s
 700     47  s       2.9 s       48  s
 800     79  s       4.6 s       45  s
 900    160  s       5.9 s       36  s
1000    160  s       8.0 s       37  s

Well, generating 100 random objects on a RPi2 using a simple interpreter could take 0.2 seconds (the top left value): quite reasonable I guess. As the number of generated objects grows the performance is completey ruined. With default parameter, a full 1.6 seconds is spent to generate one single “TestBit”. Memory usage of the v7 process is insignificant.

I dont know what the “object arena size” setting does, but it obviously changes something. The 400-value for the vo=5000 series is actually reproducable.

Even for an IoT-platform I find this performance (and the lack of predictability) unacceptable. V7 officially aims to be the fastest interpreted JavaScript engine out there: it must be cabable of handling what is actually small arrays (10×100 elements) of small objects.

JavaScript: switch options

Is the nicest solution also the fastest?

Here is a little thing I ran into that I found interesting enough to test it. In JavaScript, you get a parameter (from a user, perhaps a web service), and depending on the parameter value you will call a particular function.

The first solution that comes to my mind is a switch:

function test_switch(code) {
  switch ( code ) {
  case 'Alfa':
    call_alfa();
    break;
  ...
  case 'Mike':
    call_mike();
    break;
  }
  call_default();
}

That is good if you know all the labels when you write the code. A more compact solution that allows you to dynamically add functions is to let the functions just be properties of an object:

x1 = {
  Alfa:call_alfa,
  Bravo:call_bravo,
  Charlie:call_charlie,
...
  Mike:call_mike
};

function test_prop(code) {
  var f = x1[code];
  if ( f ) f();
  else call_default();
}

And as a variant – not really making sense in this simple example but anyway – you could loop over the properties (functions) until you find the right one:

function test_prop_loop(code) {
  var p;
  for ( p in x1 ) {
    if ( p === code ) {
      x1[p]();
      return;
    }
  }
  call_default();
}

And, since we are into loops, this construction does not make so much sense in this simple example, but anyway:

x2 = [
  { code:'Alfa'     ,func:call_alfa    },
  { code:'Bravo'    ,func:call_bravo   },
  { code:'Charlie'  ,func:call_charlie },
...
  { code:'Mike'     ,func:call_mike    }
];

function test_array_loop(code) {
  var i, o;
  for ( i=0 ; i<x2.length ; i++ ) {
    o = x2[i];
    if ( o.code === code ) {
      o.func();
      return;
    }
  }
  call_default();
}

Alfa, Bravo…, Mike and default
I created exactly 13 options, and labeled them Alfa, Bravo, … Mike. And all the test functions accept invalid code and falls back to a default function.

The loops should clearly be worse for more options. However it is not obvious what the cost is for more options in the switch case.

I will make three test runs: 5 options (Alfa to Echo), 13 options (Alfa to Mike) and 14 options (Alfa to November) where the last one ends up in default. For each run, each of the 5/13/14 options will be equally frequent.

Benchmark Results
I am benchmarking using Node.js 0.12.2 on a Raspberry Pi 1. The startup time for Nodejs is 2.35 seconds, and I have reduced that from all benchmark times. I also ran the benchmarks on a MacBook Air with nodejs 0.10.35. All benchmarks were repeated three times and the median has been used. Iteration count: 1000000.

(ms)       ======== RPi ========     ==== MacBook Air ====
              5      13      14         5      13      14
============================================================
switch     1650    1890    1930        21      28      30
prop       2240    2330    2890        22      23      37
proploop   2740    3300    3490        31      37      38
loop       2740    4740    4750        23      34      36

Conclusions
Well, most notable (and again), the RPi ARMv6 is not fast running Node.js!

Using the simple property construction seems to make sense from a performance perspective, although the good old switch also fast. The loops have no advantages. Also, the penalty for the default case is quite heavy for the simple property case; if you know the “code” is valid the property scales very nicely.

It is however a little interesting that on the ARM the loop over properties is better than the loop over integers. On the x64 it is the other way around.

Variants of Simple Property Case
The following are essentially equally fast:

function test_prop(code) {
  var f = x1[code];   
  if ( f ) f();       
  else call_x();                        
}   

function test_prop(code) {
  var f = x1[code];   
  if ( 'function' === typeof f ) f();
  else call_x();                        
}   

function test_prop(code) {
  x1[code]();                          
}   

So, it does not cost much to have a safety test and a default case (just in case), but it is expensive to use it. This one, however:

function test_prop(code) {
  try {
    x1[code]();
  } catch(e) {
    call_x();
  }
}

comes at a cost of 5ms on the MacBook, when the catch is never used. If the catch is used (1 out of 14) the run takes a full second instead of 37ms!

Faking a good goto in JavaScript

There are cases where gotos are good (most possible uses of gotos are not good). I needed to write JavaScript functions (for running in NodeJS) where I wanted to call the callback function just once in the end (to make things as clear as possible). In C that would be (this is a simplified example):

void withgoto(int x, void(*callback)(int) ) {
  int r;

  if ( (r = test1(x)) )
    goto done;

  if ( (r = test2(x)) )
    goto done;
 
  if ( (r = test3(x)) )
    goto done;
 
  r = 0;
done:
  (*callback)(r);
}

I think that looks nice! I mean the way goto controls the flow, not the syntax for function pointers.

JavaScript: multiple callbacks
The most obvious way to me to write this in JavaScript was:

var with4callbacks = function(x, callback) {
  var r

  if ( r = test1(x) ) {
    callback(r)
    return
  }

  if ( r = test2(x) ) {
    callback(r)
    return
  }

  if ( r = test3(x) ) {
    callback(r)
    return
  }
  r = 0
  callback(r)
}

This works perfectly, of course. But it is not nice with callback in several places. It is annoying (bloated) to always write return after callback. And in other cases it can be a little unclear if callback is called zero times, or more than one time… which is basically catastrophic. What options are there?

JavaScript: abusing exceptions
My first idea was to abuse the throw/catch-construction:

var withexceptions = function(x, callback) {
  var r

  try {
    if ( r = test1(x) )
      throw null

    if ( r = test2(x) )
      throw null

    if ( r = test3(x) )
      throw null

    r = 0
  } catch(e) {
  }
  callback(r)
}

This works just perfectly. In a more real world case you would probably put some code in the catch block. Is it good style? Maybe not.

JavaScript: an internal function
With an internal (is it called so?) function, a return does the job:

var withinternalfunc = function(x, callback) {
  var r
  var f

  f = function() {
    if ( r = test1(x) )
      return

    if ( r = test2(x) )
      return

    if ( r = test3(x) )
      return

    r = 0
  }
  f()
  callback(r)
}

Well, this looks like JavaScript, but it is not super clear.

JavaScript: an external function
You can also do with an external function (risking that you need to pass plenty of parameters to it, but in my simple example that is not an issue):

var externalfunc = function(x) {
  var r
  if ( r = test1(x) )
    return r

  if ( r = test2(x) )
    return r

  if ( r = test3(x) )
    return r

  return 0
}

var withexternalfunc = function(x, callback) {
  callback(externalfunc(x))
}

Do you think the readability is improved compared to the goto code? I don’t think so.

JavaScript: Break out of Block
Finally (and I got help coming up with this one), it is possible to do:

var withbreakblock = function(x, callback) {
  var r
  var f

myblock:
  {
    if ( r = test1(x) )
      break myblock

    if ( r = test2(x) )
      break myblock

    if ( r = test3(x) )
      break myblock

    r = 0
  }
  callback(r)
}

Well, that is at close to the goto construction I come with JavaScript. Pretty nice!

JavaScript: Multiple if(done)
Using a done-variable and multiple if statements is also possible:

var with3ifs = function(x, callback) {
  var r
  var done = false

  if ( r = test1(x) )
    done = true

  if ( !done ) {
    if ( r = test2(x) )
      done = true
  }

  if ( !done ) {
    if ( r = test3(x) )
      done = true
  }

  if ( !done ) {
    r = 0
  } 
  callback(r)
}

Hardly pretty, I think. The longer the code gets (the more sequential ifs there is), the higher the penalty for the ifs will be.

Performance
Which one I choose may depend on performance, if the difference is big. They should all be fast, but:

  • It is quite unclear what the cost of throwing (an exception) is
  • The internal function, is it recompiled and what is the cost?

I measured performance as (millions of) calls to the function per second. The test functions are rather cheap, and x is an integer in this case.

I did three test runs:

  1. The fall through case (r=0) is relatively rare (~8%)
  2. The fall through case is very common (~92%)
  3. The fall through case is extremely common (>99.99%)

In real applications fallthrough rate may be the most common case, with no error input data found. The benchmark environment is:

  • Mac Book Air Core i5@1.4GHz
  • C Compiler: Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
  • C Flags: -O2
  • Node version: v0.10.35 (installed from pkgsrc.org, x86_64 version)

Performance was rather consistent over several runs (for 1000 000 calls each):

Fallthrough Rate           ~8%      ~92      >99.99%      
---------------------------------------------------------
     C: withgoto           66.7     76.9     83.3  Mops
NodeJS: with4callbacks     14.9     14.7     16.4  Mops
NodeJS: with exceptions     3.67     8.77    10.3  Mops
NodeJS: withinternalfunc    8.33     8.54     9.09 Mops
NodeJS: withexternalfunc   14.5     14.9     15.6  Mops
NodeJS: withbreakblock     14.9     15.4     17.5  Mops
NodeJS: with3ifs           15.2     15.6     16.9  Mops

The C code was row-by-row translated into the JavaScript code. The performance difference is between C/Clang and NodeJS, not thanks to the goto construction itself of course.

On Recursion
In JavaScript it is quite natural to do recursion when you deal with callbacks. So I decided to run the same benchmarks using recursion instead of a loop. Each recursion step involves three called ( function()->callback()->next()-> ). With this setup the maximum recursion depth was about 3×5300 (perhaps close to 16535?). That may sound much, but not enough to produce any benchmarks. Do I need to mention that C delivered 1000 000 recursive calls at exactly the same performance as the loop?

Conclusion
For real code 3.7 millions exceptions per second sounds pretty far fetched. Unless you are in a tight loop (which you probably are not, when you deal with callbacks), all solutions will perform well. However, the break out of a block is clearly the most elegant way and also the most efficient, second only to the real goto of course. I suspect the generally higher performance in the (very) high fallthrough case is because branch prediction gets more successful.

Any better ideas?

Very simple REST JSON node.js server

I want to build a modern web application (perhaps using AngularJS) or some mobile application, and I need to have some working server side to get started. There are of course plenty of options; .NET WebApi, LAMP, MongoDB, NodeJS + Express, and many more. But I want it stupid simple. This is tested on Linux, but everything should apply in Windows too.

I wrote a very simple REST/JSON server for node.js, and this is about it (source code in the end).

How to run it
Presuming you have nodejs installed:

$ node simple-rest-server.js

It now listens to port 11337 on 127.0.0.1 (that is hard coded in the code).

Configure with Apache
The problem with port 11337 is that if you build a web application you will get cross site problems if the service runs on a different port than the html files. If you are running apache, you can:

# a2enmod proxy
# a2enmod proxy_http

Add to /etc/apache/sites-enabled/{your default site, or other site}
ProxyPass /nodejs http://localhost:11337
ProxyPassReverse /nodejs http://localhost:11337

# service apache2 restart

You can do this with nginx too, and probably also with IIS.

Use from command line
Assuming you have a json data file (data99.json) you can write to (POST), read from (GET) and delete from (DELETE) the server:

$ curl --data @data99.json http://localhost/nodejs/99
$ curl http://localhost/nodejs/99
$ curl -X DELETE http://localhost/nodejs/99

If you did not configure apache as proxy as suggested above, you need to us the :port instead of /nodejs. In this case 99 is the document id (a positive number). You can add any number of documents with whatever ids you like (as long as they are positive numbers, and as long as the server does not run out of memory). There is no list function in this very simple server (although it would be very easy to add).

Using from AngularJS
The command line is not so much fun, but AngularJS is. If you create your controller with $http the following works:

function myController($scope, $http) {

  // write an object named x with id
  h = $http.post('http://localhost/nodejs/' + id, x)
  h.error(function(r) {
    // your error handling (may use r.error to get error message)
  })
  h.success(function(r)) {
    // your success handling
  })

  // read object with id to variable x
  h = $http.get('http://localhost/nodejs/' + id)
  h.error(function(r) {
    // your error handling
  })
  h.success(function(r) {
    x = r.data
  })

  // delete object with id 
  h = $http['delete']('http://localhost/nodejs/' + id)
  h.error(function(r) {
    // your error handling
  })
  h.success(function(r)) {
    // your success handling
  })
}

I found that Internet Explorer can have problems with $http.delete, thus $http[‘delete’] (very pretty).

What the server also does
The server handles GET, POST and DELETE. It validates and error handles its input (correctly, I think). It stores the data to a file, so you can stop/start the server without losing information.

What the server does not do
In case you want to go from prototyping to production, or you want more features, it is rather simple to:

  1. add function to list objects
  2. add different types of objects
  3. let the server also serve files such as .html and .js files
  4. use MongoDB as backend
  5. add security and authentication

The code
The entire code follows (feel free to modify and use for your own purpose):

/*
 * A very simple JSON/REST server
 *
 * http://host:port/{id}       id is a positive number
 *
 * POST   - create/overwrite   $ curl --data @file.json http...
 * GET    - load               $ curl http...
 * DELETE - delete             $ curl -X DELETE http...
 *
 */
glHost    = { ip:'127.0.0.1', port:'11337' }
glHttp    = require('http')
glUrl     = require('url')
glFs      = require('fs')
glServer  = null
glStorage = null

/* Standard request handler - read all posted data before proceeding */
function requestHandler(req, res) {
  var pd = ""
  req.on("data", function(chunk) {
    pd += chunk
  })
  req.on("end", function() {
    requestHandlerWithData(req, res, pd)
  })
}

/* Custom request handler - posted data in a string */
function requestHandlerWithData(req, res, postdata) {
  var in_url  = glUrl.parse(req.url, true)
  var id      = in_url["pathname"].substring(1) //substring removes leading /
  var retcode = 200
  var retdata = null
  var error   = null

  if ( ! /^[1-9][0-9]*$/.test(id) ) {
    error   = "Invalid id=" + id
    retcode = 400
  }

  if ( ! error ) switch ( req.method ) {
  case "GET":
    if ( ! glStorage[id] ) {
      error = "No object stored with id=" + id
      retcode = 404
    } else {
      retdata = glStorage[id]
    }
    break; 
  case "POST":
    try {
      glStorage[id] = JSON.parse(postdata)
      writeStorage()
    } catch(e) {
      error = "Posted data was not valid JSON"
      retcode = 400
    }
    break;
  case "DELETE":
    delete glStorage[id]
    writeStorage()
    break;
  default:
    error   = "Invalid request method=" + req.method
    retcode = 400
    break;
  }

  res.writeHead(retcode, {"Server": "nodejs"})
  res.writeHead(retcode, {"Content-Type": "text/javascript;charset=utf-8"})
  res.write(JSON.stringify( { error:error, data:retdata } ))
  res.end()

  console.log("" + req.method + " id=" + id + ", " + retcode +
    ( error ? ( " Error=" + error ) : " Success" ) )
}

function writeStorage() {
  glFs.writeFile("./db.json",JSON.stringify(glStorage),function(err) {
    if (err) {
      console.log("Failed to write to db.json" + err)
    } else {
      console.log("Data written to db.json")
    }
  })
}

glFs.readFile("db.json", function(err, data) {
  if (err) {
    console.log("Failed to read data from db.json, create new empty storage")
    glStorage = new Object()
  } else {
    glStorage = JSON.parse(data)
  }
})
glServer = glHttp.createServer(requestHandler)
glServer.listen(glHost.port, glHost.ip)
console.log("Listening to http://" + glHost.ip + ":" + glHost.port + "/{id}")

Simple minification of JavaScript and HTML

With frameworks like AngularJS it is possible to write really nice web applications just relying on HTML, JavaScript and REST services. Of course you indent and comment in your HTML and JavaScript-files, but this data does not need to be served to the user. The web browser just need the functional parts.

There are many minifiers, uglifiers or obfuscates; programs that remove comments and formatting from your code to make them smaller. Sometimes they also scramble/obfuscate the code with the intention to make it harder for someone to understand (and possibly use or exploit).

Those minifiers can be Windows applications, web pages, web server plugins and they can be implemented in a wide variety of languages or platforms depending on use. What I wanted was something very simple that I could just include in a simple build script on linux system: a command line tool (that does not rely on installing a bunch of java libraries or php-packages, and that does not support hundreds of dangerous options).

For JavaScript it was easy: I found JSMin written by a Master, Crockford. JSMin comes as a single C source file – that is easy for me.

For HTML it was trickier. Probably because few people actually write big HTML files directly – most often a web server and a server framework (like PHP) delivers the code. Also, there were many web based HTML minifies, but those are annoying to automate and depend on. So I actually spent more time looking for something as simple as JSMin, than I actually spent implementing the thing myself. It was tempting to do it in C, but then it would have taken longer time to implement than I already wasted looking for a tool. I choose Python (version 3, so it is incompatible with most peoples Python interpreter). Here we go:

#!/usr/bin/python3
import sys

in_pre = False
in_comment = False

def outCommentHandler(line):
  x = line.find('<!--')
  if -1 == x:
    return line,False,''
  else:
    return line[:x],True,line[4+x:]

def inCommentHandler(line):
  x = line.find('-->')
  if -1 == x:
    return True,''
  else:
    return False,line[3+x:]

for line in sys.stdin:
  rem = line.strip()
  if in_pre:
    if rem.upper() == '</PRE>':
      in_pre = False
      print(rem)
    else:
      print(line)
  elif rem.upper() == '<PRE>':
    in_pre = True
    print(rem)
  elif '' != rem:
    while '' != rem:
      if in_comment:
        clean = ''
        in_comment,rem = inCommentHandler(rem)
      else:
        clean,in_comment,rem = outCommentHandler(rem)
      if '' != clean:
        print(clean, '')

Both jsmin and htmlmin.py are used the same way:

$ jsmin < code.js > code.min.js
$ htmlmin.py < page.html > page.min.html

Both programs are not perfect.

JSMin
I found that JSMin fails with regular expression patters like this:

var alpha=/^[a-z]+$/
var ALPHA=/^[A-Z]+$/

Adding ; to the end of the lines fix that problem.

htmlmin.py
What it does is simply:

  • Preserves <PRE> as long as the PRE-tags are on their own lines
  • Removes all comments: <!-- A comment -->
  • Removes white space in the beginning (and end) of lines
  • Removes empty lines

This is about the low hanging fruit only, but I think it is good enough for most purposes.

What is the “compression” rate?
For my test code:
HTML was compressed from 59kb to 42kb
JavaScript was compressed from 162kb to 108kb

It is possible to do better with better tools, but this is very simple, and it takes away the obvious waste from the files, with minimal risk of changing behavior. More heavy JavaScript minifiers rename variables and rewrites code.

Build Node.js on Debian ARM

Update 2015-02-15: So far, I have failed building Nodejs v0.12.0 on ARMv5

I have a QNAP TS109 running Debian (port:armel, version:7), and of course I want to run node.js on it. I don’t think there are any binaries, so building from source is the way to go.

About my environment:

$ cat /etc/debian_version
7.2
$ gcc --version | head -n 1
gcc (Debian 4.6.3-14) 4.6.3
$ uname -a
Linux kvaser 3.2.0-4-orion5x #1 Debian 3.2.51-1 armv5tel GNU/Linux
$ cat /proc/cpuinfo
Processor       : Feroceon rev 0 (v5l)
BogoMIPS        : 331.77
Features        : swp half thumb fastmult edsp
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 0

Hardware        : QNAP TS-109/TS-209
Revision        : 0000
Serial          : 0000000000000000

I downloaded the latest version of node.js: node-v0.10.25, and this is how I ended up compiling it (first writing build.sh, then executing it as root):

$ cat build.sh
#!/bin/sh
export CFLAGS='-march=armv5t'
export CXXFLAGS='-march=armv5t'
./configure
make install
$ sudo ./build.sh

That takes almost four hours.

A few notes…

make install
Naturally, make install has to be run as root. When I do that, everything is built again, from scratch. This is not what I expect of make install, and to me this seems like a bug. This is why I put the build lines into a little shell script, and ran the entire script with sudo. Compiling as root does not make sense

-march=armv4 and -march=armv4t
Compiling with -march=armv4t (or no -march at all, defaulting to armv4 I believe) results in an error:

../deps/v8/src/arm/macro-assembler-arm.cc:65:3: error:
#error "For thumb inter-working we require an architecture which supports blx"

You can workaround this by above line 65 in the above file:

#define CAN_USE_THUMB_INSTRUCTIONS 1

as I mentioned in my old article about building Node.js on Debian ARM.

-march=armv5te
I first tried building with -march=armv5te (since that seemed closest to armv5tel which is what uname tells me I have). The build completed, but the node binary generated Segmentation fault (however node -h did work, so the binary was not completely broken).

I do not know if this problem is caused by my CPU not being compatible with/capable of armv5te, or, if there is something about armv5te that is not compatible with the way Debian and its libraries are built.

Building Node.js on Debian ARM (old)

Update 20140130: I suggest you first have a look at my new article on the same topic.

I thought it was about time to extend my JavaScript curiosity to the server side and Node.js.

A first step was to install it on my web server, a QNAP TS-109 running Debian 6. I downloaded the latest version (v0.10.15), and did the usual:

$ ./configure
$ make

after hours:

../deps/v8/src/arm/macro-assembler-arm.cc:65:3: error: #error "For thumb inter-working we require an architecture which supports blx"

That is not the first time my TS 109 has been too old. However, the english translation of the above message is that you have to have an ARM cpu V5 or later, and it has to have a ‘t’ in its name (at least, this is what the source tells, see below). In my case

$ uname -a
Linux kvaser 2.6.32-5-orion5x #1 Sat May 11 02:12:29 UTC 2013 armv5tel GNU/Linux

so I should be fine. I found a workaround from which I extracted the essential part.

// We always generate arm code, never thumb code, even if V8 is compiled to
// thumb, so we require inter-working support
#if defined(__thumb__) && !defined(USE_THUMB_INTERWORK)
#error "flag -mthumb-interwork missing"
#endif

// ADD THESE THREE LINES TO macro-assembler-arm.cc

#if !defined(CAN_USE_THUMB_INSTRUCTIONS)
# define CAN_USE_THUMB_INSTRUCTIONS 1
#endif

// We do not support thumb inter-working with an arm architecture not supporting
// the blx instruction (below v5t).  If you know what CPU you are compiling for
// you can use -march=armv7 or similar.
#if defined(USE_THUMB_INTERWORK) && !defined(CAN_USE_THUMB_INSTRUCTIONS)
# error "For thumb inter-working we require an architecture which supports blx"
#endif

After adding the three lines, I just ran make again, and after a few hours more everything was fine. Next time I will try the -march or -mcpu option instead.

Golf scorecard for your Smartphone

Do you need to use your smartphone as a golf scorecard, try my Golf scorecard for Smartphones app. It is free and gratis.

For the user
Any smartphone should work (iPhone, Android, Symbian, WP). Please let me know if you have problems with a particular model. All data is stored locally on the phone, and you do not need to be online to use the scorecard (but you need to be online when loading it, as it is a web application). Data is preferably stored in cookies, but if you do not allow cookies it will try to fallback to local storage (which has not been 100% stable for me).

For curious developers
This application is built 100% using DHTMLX Touch framework and JavaScript. The style/layout is 100% default DHTMLX Touch – no CSS editing from my side. Also, no other JavaScript libraries (such as jQuery) has been used.

Simple JavaScript for old school programmer

Obviously JavaScript is a very relevant language in 2013. I have to admit I am an old school programmer. I know how to write command line programs, GUI programs, and CGI-style web pages (get/post-refresh-style). It is not that JavaScript and Dynamic HTML (as it was called years ago, when XHTML was thought to be the future) are very new technologies, but to an old school programmer it is a bit different from what I am used to. Many things are confusing about JavaScript… for example:

1. It does not really resemble other script languages a lot, and it does not really resemble Java either (less so than C or C#).

2. There are many examples on the web about how to fix cool stuff on web pages with JavaScript, but where to start if you want to understand the language itself and feel comfortable with it?

3. The “script” in the name could imply that it would be simple, perhaps a subset of Java. Wrong. It is not simple and is in many ways more powerful than Java.

4. Where to start if you want to do JavaScript without necessarily creating a web page at the same time?

5. Browser incompabilities and JQuery (and more) – how to know I learn JavaScript, not just some features of JQuery?

Anyway, I decided to do some practical programming with JavaScript, in the most simple way. I wrote simple static web applications with no server side functionality, and as little and simple HTML as possible. Just like simple command line or GUI programs with no I/O, that still do useful stuff. And, to make my applications a little bit more real and relevant, I had mobile targets in mind.

A simple calculator
My first app was a simple calculator. It uses very standard HTML, and JQuery. It is simply too painful to not use JQuery when it is needed. All I wrote is in index.html, just view source. The fascinating thing for an old school programmer is the way (global) variables work:

  1. In old web programming, any kind of state had to be maintained by passing parameters to the reloaded page – it really feels defiant that everything is still there after the user clicked a button.
  2. In old school programming, global variables were embarrassing, but here, just go on like we have not learnt anything since the BASIC era.

From this simple application I learnt how to use JavaScript at all and to use JQuery.

A link list
My second application is a simple list of links. Think bookmarks, but the oldest ones are on top, so you don’t end up wasting your time checking the same blogg twice the same afternoon. With this application I learnt to use cookies (via third party library written by Scott Hamper) to make a persistant list of bookmarks. I also learnt to manipulate HTML through JavaScript. Obviously the HTML design is a little too simple to be appealing, and I can imagine many improvements. But the result is still useful. Again, everything is in the index.html file.

A speed-distance-time calculator
Have you ever tried to calculate how long time it would take to travel 54km in 15m/s. Well, it is not hard math. Anyway, I wrote a little application, especially designed for mobile units with touch screens (but it works well on a computer too). I think my Human-computer-interaction teachers would cry if they saw it, but it works and I think it is a little cool. On the implementation:

  1. Everything is in index.html, again
  2. Uses the DHTMLX Touch framework
  3. Not a single HTML element within <body>, just the JavaScript code

I have tried it on iPhone and Symbian (as well as computers). I hope it works on Android too.

This application has very little to do with web programming, and is essentially a pure JavaScript application (using a library, of course). I think this is a fascinating way of delivering end user programs in the future.

Conclusions
Getting started with JavaScript is very simple (you just need a text editor and a web browser – I have been using Firefox and its standard functions for debugging JavaScript – very old school).

I think it was useful to avoid getting stuck in the normal challenges of Web (back end server, data, content) and focus on building a more traditional application.

I believe my applications (and similar applications) can serve as a base for more advanced JavaScript experimentation. Inevitable, to create something really useful, I need to look more into:

  1. Simple Server side, for persisting/sharing data
  2. Local storage (thinking HTML5)

I can appreciate JavaScript for its very straight forward syntax: you do not waste many bytes when you write JavaScript code.

Without being a LISP expert, I find similarities with LISP. I find myself having as many curly brackets in JavaScript as I had parenthesis in LISP – that might not be a good thing. But there is also no difference between compile time and runtime in the sense that all functions are created as your code is executed, and that functions are just data passed to the function keyword. If you are smart (?) you can write programs that rewrite themselves, kind of, I think 😉

I would say JavaScript is one of the worst possible programming language for beginners. It incorporates all advanced programming concepts imaginable, without making a deal about it at all. Shooting yourself in the foot is very easy, as is being confused by runtime errors. But, perhaps I am just old school – maybe JavaScript is very natural if you are not burdened with prejudice about that programming languages should work like PASCAL, just better.