## Finding a Color Matrix for dcraw (and libraw)

Read this post first for a little background, if you need it.

So, a color matrix in the dcraw code looks like:

```
  { "NIKON COOLPIX P6000", 0, 0,
    { 9698,-3367,-914,-4706,12584,2368,-837,968,5801 } },
```

Some cameras have four colors instead of three, and those cameras have a color matrix with twelve elements instead of nine.
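To make the layout of such an entry concrete, here is a minimal Python sketch that reshapes the flat list into rows of three. The helper name `to_matrix` is made up for illustration. The row-major order and the 1/10000 scale are assumptions based on my reading of dcraw's `adobe_coeff()`, where each row would belong to one camera color.

```python
# Coefficients copied from the dcraw table entry shown above.
P6000 = [9698, -3367, -914, -4706, 12584, 2368, -837, 968, 5801]

def to_matrix(coeffs, scale=10000.0):
    """Reshape a flat coefficient list into rows of three floats.

    Assumption: values are stored row by row and scaled by 10000.
    """
    if len(coeffs) not in (9, 12):
        raise ValueError("expected 9 (three colors) or 12 (four colors) values")
    return [[c / scale for c in coeffs[i:i + 3]]
            for i in range(0, len(coeffs), 3)]

cam_xyz = to_matrix(P6000)
for row in cam_xyz:
    print(row)
```

A twelve-element entry goes through the same helper and simply comes out with four rows instead of three.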

As you can read on the homepage of dcraw under “Why does dcraw output have a green tint and weak color?”, the color matrix can be determined (with some procedure) if you (under correct conditions) photograph a sheet with reference colors. Obviously, we have to leave the digital world to do this. As I live in a part of the world where daylight is not very likely to occur for a few months, I decided to try to find my color matrix in another way.

### My Strategy

Since I do have a computer with Nikon's ViewNX program, I decided to try this strategy:

1. Take a suitable photograph in RAW
2. Export it to 16-bit TIFF using ViewNX
3. Find a color matrix that makes dcraw output a TIFF file equivalent to the one from ViewNX

There were some challenges to overcome…

### Little disclaimer

More things than the color matrix decide the result of dcraw. In particular, white balance (-a -A) and brightness (-W -b) matter a lot to the output. Thus, I have accepted that my color matrix does not produce exactly the same result as Nikon's ViewNX.

### Picture dimensions

ViewNX (and the camera) says full size is 3648×2736 pixels. But dcraw gives you 3664×2742 pixels. ImageMagick solves this with

```
$ convert -crop 3648x2736+8+3 orig.tiff cropped.tiff
```
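The offsets `+8+3` in that command are just half of the size difference in each direction. A small Python sketch of the arithmetic, assuming the extra border is split evenly between both sides (the helper name `centered_crop` is hypothetical):

```python
def centered_crop(full, target):
    """Build an ImageMagick-style geometry string for a centered crop."""
    (full_w, full_h), (target_w, target_h) = full, target
    if target_w > full_w or target_h > full_h:
        raise ValueError("target is larger than the source image")
    x_off = (full_w - target_w) // 2
    y_off = (full_h - target_h) // 2
    return f"{target_w}x{target_h}+{x_off}+{y_off}"

geometry = centered_crop((3664, 2742), (3648, 2736))
print(geometry)  # 3648x2736+8+3
```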

### Choosing a good picture

I don't know for sure, but I guess a good picture has many different colors in a fair distribution. If one color is missing, the color matrix might fail to handle it well. I also (after initial tests) decided I wanted a slightly unsharp picture to avoid high contrasts between adjacent pixels. Pictures are noisy if you look closely at pixel level, and if that noise is the result of the Bayer filter (or something I don't know about) I don't want it to disturb my tests. I even decided to blur my picture (both reference and target of course) using ImageMagick again:

```
$ convert -blur 12x4 pic-in.tiff pic-out.tiff
```

Here is the picture I used (feel free to tell me why it is unsuitable): It was shot outdoors in something that some people would call daylight.

### Deciding if a solution is good

I found a little program that could measure the difference/error between two images:

```
$ imgcmp -f pict1.jpg -F pict2.jpg -m pae
```

It does not support TIFF, so I had to live with uncompressed 8-bit JPEGs.
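For readers unfamiliar with the metric, here is a rough Python sketch of what I understand "pae" (peak absolute error) to mean: the largest absolute difference per color channel over all pixels. This is not imgcmp's code, and the function name and tiny sample data are invented for the example.

```python
def peak_absolute_error(img_a, img_b):
    """Largest absolute per-channel difference between two pixel lists."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    peaks = [0, 0, 0]
    for pixel_a, pixel_b in zip(img_a, img_b):
        for ch in range(3):
            peaks[ch] = max(peaks[ch], abs(pixel_a[ch] - pixel_b[ch]))
    return peaks

# Two made-up 2-pixel "images" with 8-bit (R, G, B) values.
a = [(10, 20, 30), (200, 100, 50)]
b = [(12, 20, 25), (190, 101, 50)]
print(peak_absolute_error(a, b))  # [10, 1, 5]
```

Because only the single worst pixel counts, one noisy pixel dominates the score, which is part of why blurring both pictures first makes sense.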

### The mathematical solution

In the end I did what I tried to avoid first – I did the math of calculating a matrix instead of trying to search for it.

This mathematical solution requires a lot more understanding, so I couldn't start there anyway. First, two sources: Wikipedia on sRGB and a thread from someone who tried the same thing before me.

Now some bad news. RAW files use 12-14 bits of color depth. The NRW files from my camera are about 15-20 MB. When converted to 16-bit TIFF, they become 60 MB (3 colors x 2 bytes/color x 10 Mpix). Why is the RAW file so much smaller? Because each pixel is EITHER red OR green OR blue! In fact 50% are green, 25% red and 25% blue. This is called a Bayer filter, and some cameras use different but similar methods. This line in the dcraw patch tells dcraw which filter pattern the sensor uses:

```
    filters = 0x94949494;
```

This is bad for two reasons:

1. It feels like marketing bluff that a 10MPix-camera has just 2.5M sensors that read the color blue
2. The first step dcraw has to perform is interpolation to fill in the colors missing behind the Bayer filter, and it is possible (likely) that dcraw doesn't do it exactly the same way Nikon does, so the pictures will never be identical (yet equally good)
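The numbers above can be checked with a few lines of Python. The function `fc` mirrors my reading of dcraw's `FC()` macro, which decodes the `filters` value into a color index per sensor element (0 = red, 1 = green, 2 = blue). The size figures ignore headers, thumbnails and any compression, so treat them as rough estimates.

```python
FILTERS = 0x94949494  # value from the dcraw patch quoted above

def fc(row, col, filters=FILTERS):
    """Color index of the sensor element at (row, col).

    Mirrors dcraw's FC() macro; 0 = red, 1 = green, 2 = blue.
    """
    return (filters >> ((((row << 1) & 14) + (col & 1)) << 1)) & 3

# The encoding covers 8 rows by 2 columns, so one such tile is enough.
counts = {"R": 0, "G": 0, "B": 0}
for row in range(8):
    for col in range(2):
        counts["RGB"[fc(row, col)]] += 1
total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}
print(shares)

# Size arithmetic for the 3648x2736 frame.
pixels = 3648 * 2736
tiff_bytes = pixels * 3 * 2        # three colors, two bytes each
raw_bytes_12 = pixels * 12 // 8    # one 12-bit sample per pixel
raw_bytes_14 = pixels * 14 // 8    # one 14-bit sample per pixel
print(tiff_bytes, raw_bytes_12, raw_bytes_14)
```

The tile comes out as half green, a quarter red and a quarter blue, with red in the top-left corner (an RGGB layout). The TIFF estimate lands just under 60 MB and the raw estimates at roughly 15 to 17.5 MB, which fits the file sizes mentioned above.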

Next steps for dcraw are:

1. apply our Color Matrix
2. convert from RGB to sRGB
3. apply gamma

I wanted dcraw to output right after interpolation. So, I invoked dcraw (a version without a color matrix for my camera) like:

```
./dcraw -T -o 0 -W -g 1 1 -c MYFILE.RAW > source.tiff
```

I cropped the file and wrote a little C program (using libtiff) that extracted a few hundred pixel values from this file, together with the values for the same pixels from the Nikon ViewNX TIFF. (Sorry, I gave up on 16-bit TIFF… libtiff doesn't support it.)

Now some linear algebra (skip to the code below if you don't get this). I have these matrices:

• V: pixel values from the Vendor file
• R: pixel values from the Raw file (from dcraw)
• S: the sRGB-to-XYZ matrix (linear sRGB to CIE XYZ)
• C: My unknown color matrix

The values in V and R need to be divided by 255.0 to move from the [0-255] range to the [0.0-1.0] range. The values in V also have to be corrected for gamma, so that we get linear colors. Now we have

```
C x S x V = R
```

Mathematically, only three pixels are needed to determine C. I want more pixels. I tried with 9 selected and 200 “random” pixels, and I got exactly the same answer. If we remember some linear algebra from school and solve the equation for C, we get:

```
C = ( inv( (SV)*(SV)' ) * SV * R' )'
```
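For anyone who wants the missing step, here is a short sketch of where that expression comes from, assuming the usual least-squares reading of the problem. The shorthand M = SV and the pixel count n are introduced here for brevity, and ' in the formula above is read as transpose.

$$
\begin{aligned}
M &= S V \in \mathbb{R}^{3 \times n}, \qquad R \in \mathbb{R}^{3 \times n} \\
\min_{C} \; \lVert C M - R \rVert_F^2
  \;&\Longrightarrow\; C \, M M^{\top} = R \, M^{\top} \\
C &= R \, M^{\top} \left( M M^{\top} \right)^{-1} \\
C^{\top} &= \left( M M^{\top} \right)^{-1} M R^{\top}
\end{aligned}
$$

The last line uses the fact that $M M^{\top}$ is symmetric, and it is the formula above before the outer transpose. With exactly three pixels whose columns in $M$ are linearly independent, $M$ is square and invertible and the result collapses to $C = R M^{-1}$, which is why three pixels are enough in principle. With more pixels the system is overdetermined and the formula gives the best fit in the least-squares sense.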

Maybe it can be simplified, and the inverse does not need to be calculated… but I don't care now. I put everything into a little script for Scilab (Matlab for the poor, available as a package in all Linux distros I have tried).

```
VENDOR=[
215 230 254 ;
223 176 101 ;
236 140  98 ;
247  68 107 ;
168  87 156 ;
106  94 165 ;
100 200 254 ;
63 171 140 ;
124 188 126 ];

SOURCE=[
88 168 98 ;
70  78 19 ;
68  57 16 ;
63  35 13 ;
31  29 20 ;
15  26 20 ;
26  95 74 ;
14  54 21 ;
28  69 22 ];

// Maybe change this one (the sRGB standard uses 2.4 in the formula below)
gammaval=2.2;

// Don't edit below

// linear sRGB to CIE XYZ matrix (D65)
S=[
0.412453 0.357580 0.180423 ;
0.212671 0.715160 0.072169;
0.019334 0.119193 0.950227 ];

function x=srgb_gamma(y)
rc=size(y)
for r = 1:rc(1)
for c = 1:rc(2)
z = y(r,c)
if z <= 0.04045
x(r,c) = z / 12.92
else
x(r,c) = ( ( z + 0.055 ) / 1.055 ) ^ gammaval
end
end
end
endfunction

function CM=find_color_matrix()
// first move from 0-255 to 0.0-1.0 ranges, and transpose
Vtmp=(VENDOR./255.0)'
R=(SOURCE./255.0)'

// second, undo gamma in vendor colors
V=srgb_gamma(Vtmp)

SV=S*V

// Solve for CM, in  CM*S*V=R
CMtmp=( inv( (SV)*(SV)' ) * SV * R' )'
// don't talk to me about costs of inverting 3x3 matrices

// fix output for dcraw
CM=round(10000 .* CMtmp)
endfunction
```

Now, just update VENDOR and SOURCE with your values, and call find_color_matrix(). In Scilab!
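If you want to convince yourself that the formula recovers a matrix at all, a round-trip test helps: pick a known C, generate R = C x S x V from made-up linear colors, and see whether the solver gives C back. The sketch below does that in plain Python rather than Scilab. All helper names and the sample colors in `V` are invented for the test, and `V` is assumed to be already linear, so the gamma step is left out.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(row) for row in zip(*m)]

def inv3(m):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def solve_color_matrix(S, V, R):
    """Least-squares C for C*S*V = R, i.e. C = R*M'*inv(M*M') with M = S*V."""
    M = matmul(S, V)
    Mt = transpose(M)
    return matmul(matmul(R, Mt), inv3(matmul(M, Mt)))

# Same S as in the Scilab script above.
S = [[0.412453, 0.357580, 0.180423],
     [0.212671, 0.715160, 0.072169],
     [0.019334, 0.119193, 0.950227]]

# Known matrix to recover (the P6000 entry from the top of the post).
P6000 = [9698, -3367, -914, -4706, 12584, 2368, -837, 968, 5801]
C_true = [[v / 10000.0 for v in P6000[i:i + 3]] for i in range(0, 9, 3)]

# Five made-up linear colors, one per column (rows are R, G, B).
V = [[0.9, 0.1, 0.2, 0.5, 0.3],
     [0.1, 0.8, 0.1, 0.5, 0.6],
     [0.1, 0.2, 0.7, 0.5, 0.1]]

R = matmul(C_true, matmul(S, V))
C_found = solve_color_matrix(S, V, R)
recovered = [round(10000 * x) for row in C_found for x in row]
print(recovered)
```

On synthetic data the recovered list should match the original coefficients. With real pixels it will not be exact, since demosaicing, white balance and noise all get in the way.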

### Search!

My first idea was to search for a matrix. Basically:

1. Set a start matrix, calculate fitness
2. Take a "step", obtain a new matrix, calculate new fitness
3. If new is better, update best
4. Goto 2

My plan was to iterate until the error was small (like Peak Absolute Error < 2 for each color). That didn't happen. No really close solutions. I tried linear combinations of the 200+ matrices already in dcraw. I tried random steps, and steps in one or several directions.
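The loop described above is essentially a random hill climb. Here is a minimal Python sketch of that idea, not the code I actually ran: the fitness function (peak absolute error over a handful of samples), the step size and the synthetic data are all assumptions made for the illustration.

```python
import random

def fitness(matrix, inputs, targets):
    """Peak absolute error of matrix*input against target (lower is better)."""
    worst = 0.0
    for x, t in zip(inputs, targets):
        for row, want in zip(matrix, t):
            got = sum(m * v for m, v in zip(row, x))
            worst = max(worst, abs(got - want))
    return worst

def hill_climb(start, inputs, targets, steps=2000, step_size=0.05, seed=0):
    """Nudge one random element at a time and keep only improvements."""
    rng = random.Random(seed)
    best = [row[:] for row in start]
    best_err = fitness(best, inputs, targets)
    for _ in range(steps):
        cand = [row[:] for row in best]
        r, c = rng.randrange(3), rng.randrange(3)
        cand[r][c] += rng.uniform(-step_size, step_size)
        err = fitness(cand, inputs, targets)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Synthetic problem: targets are produced by a known matrix.
target = [[0.9, -0.3, -0.1], [-0.4, 1.2, 0.2], [-0.1, 0.1, 0.6]]
inputs = [(0.9, 0.1, 0.1), (0.1, 0.8, 0.2), (0.2, 0.1, 0.7), (0.5, 0.5, 0.5)]
targets = [tuple(sum(m * v for m, v in zip(row, x)) for row in target)
           for x in inputs]

start = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
start_err = fitness(start, inputs, targets)
best, best_err = hill_climb(start, inputs, targets)
print(start_err, best_err)
```

Since only improvements are accepted, the error can never get worse than the starting point, but nothing guarantees it gets close to zero. On real pictures the fitness would have to run the whole conversion for every step, which makes this kind of search slow as well.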