Reverse-engineering techniques I used with the Canon FS4000US scanner

[ Last updated: 11 August 2004 ]

Intro

Reverse engineering is an art and my techniques below will differ from other people's approaches. I hope these help someone else in their work but remember that what worked for me may not work for you, etc. Although I have a Windows box and therefore used Windows tools, similar tools probably exist on Mac and Linux. If nothing else, the mindset used below may inform someone else's approach to reverse-engineering a scanner. Comments are welcome.

Process analysis tools on Windows

High-Level

The code that drives the FS4000 is a TWAIN driver from Canon called FilmGet. Starting at a high-level, I needed to know the scope of FilmGet, i.e. what DLLs it uses, what is the environment it runs in, etc. I wanted to know how Windows sees the driver and what happens at run-time at the process-level. The two best tools (that are also free) that let me do this:

Process Explorer. This tool can tell us lots of info that might be useful:

BTW, this is just one of many tools that I used from the Sysinternals Freeware site. Any serious systems work related to this involves several tools from this site. A must-have.

Depends. This tool is part of the Microsoft Platform SDK. It used to be just pretty good, giving a static view of which DLLs that an executable binary is statically linked to and specifically what functions a DLL will call in another. The latest version though (my version = 2.1.3790 dated 24 March 2003) is massively upgraded and can watch a process dynamically to see what DLLs it loads/unloads, what functions it dynamically tries to find in each DLL, etc. Extremely valuable. Some example output:

00:00:00.875: DllMain(0x73BA0000, DLL_PROCESS_ATTACH, 0x00000000) in "STI.DLL" returned 1 (0x1).
00:00:00.875: LoadLibraryA("sti.dll") returned 0x73BA0000.
00:00:00.875: GetProcAddress(0x73BA0000 [STI.DLL], "StiCreateInstance") called from "VUESCAN.EXE" at address 0x0042E94D and returned 0x73BA1C47.

Some examples of what I found after using this tool was that Canon did not use ASPI and does not control the SCSI bus with direct calls to DeviceIOControl() either. Instead they use a 3rd-party user-mode SCSI library from STI. Also, I found that the FARE dust-removal system is entirely contained within one DLL (biccrct.dll) that exports only one function: BICImageCorrect. This is useful to note in case I want to hook this function later to see if I can make use of it for myself.

Low-level

I still didn't know much about how things actually worked. Having developed commercial software for many years, I know that developers will often leave logging code in a release build to help with supportability in the field. If it exists, the trick is figuring out how to turn it on. There are probably a million ways developers could do this but since most developers want to make their lives simple, there are three most common ways I know of:

  1. a secret environment variable
  2. a secret file that, if it exists, will have log messages written to it.
  3. a secret registry setting

To see if there was a secret environment variable, I wanted to watch when the process made any calls to the C-runtime function getenv() or the lower-level Windows functions GetEnvironmentVariable and GetEnvironmentStrings. I loaded my debugger (MS VC++) and set breakpoints on those 3 functions. I then had the debugger launch a TWAIN driver app. I decided to not use Photoshop for this because the simpler the containing app, the better my chances that any activity that happens is caused by the TWAIN driver and not the app itself. Remember, we're looking for the proverbial needle in a haystack here so anything to make the haystack smaller helps. There are many small imaging apps around that will let you import from a TWAIN device. Since I was evaluating a copy of Digital Light & Color's Picture Window Pro at the time, I decided to use that. The main problem with this approach is that it's tedious: there are lots of calls to those functions in any normal program, especially at start-up time which is the phase most likely to have the TWAIN driver call it looking for a secret switch. But, you gotta do it in case you get a "hit" that reveals something good.

When I got a hit at the breakpoint, I then had to figure out what environment variable the calling function was looking for. To do this though, you must have an understanding of stacks and how parameters are passed on them in C and C++ (which is likely what the driver was written in). An explanation of how all that works is beyond the scope of this document but if I get enough requests, I may throw some tips together. So: since I know from the API doc's what the parameters are, this wasn't too hard. Take the string's address off the stack, look at a memory dump for that chunk of memory to see what string was bring passed in. This took a lot of time and turned nothing up.

Next was the secret file technique. To see what file system activity is caused by a process, use another SysInternals tool called FileMon. Note that there is a version for Linux kernel 2.4 as well. FileMon is brilliant. I set up FileMon to watch my TWAIN driver process and then, without using a debugger, I run it. It spews a million file accesses: opens, reads, closes, stats, etc. You can filter these for ease though. Looking for strings like "canon" and "FilmGet" didn't turn anything up.

Last try: I used another SysInternals tool called RegMon. Same idea as FileMon but it logs a process's registry accesses. Again, it spewed thousands of items. I filtered for the string "FilmGet" though and an interesting item popped up: something in the process opened a key named HKLM\Software\Canon\FilmGet 1.0\ and looked for a value named "Log" which failed because that value didn't exist. Hmmmm. I opened the registry editor, went to that key (created by Canon's install software) and created a value named Log. I then reran the RegMon test. Now, after it found the value named Log, it tried to find values named LogScsiData and LogScsiRead. Hmmmm! Now we're on to something. I created those values too, reran the test and no new registry keys were looked for. Now I went back to the FileMon test and reran that, filtering for FilmGet again. Sure enough, a file named FilmGet.log was now being created. I tried to open that up in a text editor and, what do you know, it was a text file. It contained mostly names of FS4000 commands along with the SCSI transaction data that goes with them. There were also just a few debug messages about entering and exiting different function names. Bingo.

I now needed to see if the data for the Log items was meaningful, i.e. maybe different values would control log level/verbosity. I put a breakpoint on the Windows API RegQueryValue and RegQueryValueEx and reran. Using the same technique that I used to look at parameters on the stack that I did for environment variables above, I found the calls to these functions. I then stepped over the function and looked at what the calling function did with the returned data. I got lucky again: all the code did was check RegQueryValue's return code to see if it was successful. It didn't actually look at the data that was read. This meant that FilmGet only cared that the registry values existed, not what the values were.

I should point out one last tool that's too easy not to try. I ran the Unix 'strings' command on every relevant binary that I listed from the tools above. I found lots of interesting tidbits including a full list of the names of FS4000 commands, some format strings for printf, even the names of the registry keys I just found. Immediately adjacent to those registry key names was a string "K_COM_CONSOLE_WINDOW". This looked really promising but how was this being used if I knew it wasn't the name of an environment variable, file, or registry key? In the debugger, I placed a memory breakpoint on the first byte of that string so that, whenever that memory location was accessed, the debugger would stop and I could see who was using it and what they were doing with it. Unfortunately, the debugger it never broke at that point so it's still a mystery to me.

Controlling the scanner

Now that I had FilmGet's logs, I had some way to analyze how to control the scanner. I should point out that I was incredibly lucky to find this log file. If I never found it, I probably would have dropped the project long ago. But, there was a 3rd-party app named VueScan that can control the FS4000 and it made no secret of having a log to send in in case of a need for support. I checked that box in the GUI and looked for an output file. This too had all the SCSI data I needed. Again lucky: I now had 2 different apps to compare scanner control methods with.

Here's a tip to someone else who attempts all of this that I can't recommend highly enough. What I did was write a utility that read in either a FilmGet log or a VueScan log (both mainly full of hex numbers) and printed out an easier-to-read version that was formatted consistently and could therefore be diff'ed with other processed logs. As I experimented further, I improved the output of the utility with each new discovery and then reran all of the old logs through it. Re-examining those logs with diff again sometimes revealed new insights about how things worked now that I'd made some part clearer. I "primed" these utilities by looking over the SCSI-2 specification and used that to write code to interpret what commands I could from the logs. That worked extremely well and now what I had left were the non-standard commands that the FS4000 uses. The stroke of luck here was that at least I knew the name of the non-standard commands since they were written in text form in the log files. That at least gave me an idea of what the command did at a high-level.

I also built upon what others had done. By using code I downloaded for the SANE project's FS2710 back-end, I derived a few more things. Some did not fit but they were still clues that a certain feature might be available with this scanner in some other way.

I made further improvements by doing many, many scans with FilmGet and VueScan, each time only changing one option in the GUI before rescanning. I could then diff the logs from two scans and see what parameters changed. From this point on, things tilt more towards the intuitive and less towards the scientific. I looked for patterns in the data, i.e. it's probable that a set of 3 adjacent parameters control something RGB-related. I also looked for known values, hoping to derive the meaning of adjacent unknown values by context. For instance, the scanner's maximum dpi is 4000. I could see 4000 in many places in the logged data.

One useful result of the previous idea is to see when changing a GUI parameter does not change any command sent to the scanner. This indicates that that feature is implemented in the driver software. For instance, VueScan does all image processing via software from what I can see. It seems to always use the same parameters to extract image data.

Another approach I tried was to use the Unix command "tail -f" on the log files while I was messing around in the FilmGet or VueScan GUIs. I could then make a tighter association between my actions and the timing of the log contents. For instance, when FilmGet starts and calibrates the scanner, it's not clear what it's really doing. With this approach, I could see that, after doing some quick scans to calibrate shading, etc., it mainly just sat in a loop, doing a GET LAMP command to see how long the lamp had been on to make sure it was warmed up.

Another way I enhanced context to help my understanding is that I took a log file from a typical scan and edited out almost all numerical data so that just names of commands remained. I then looked for loops of commands and condensed those and indented the loop contents. When I was done, I had pseudo-code for the entire scanning process.

One thing I did which made a huge leap forward for me was to stop using other people's code and write a test app of my own. Since I was in control, I could make faster progress. I got the ASPI code to control SCSI running and then started writing simple test programs that issued commands with many different values for the parameters. Usually the scanner would return an error saying that a value was invalid or out of range. Sometimes, knowing the legal range of values for a parameter can help you derive its meaning. For example, I found several that were only 0 or 1 and they turned out to be simple Booleans.

A big remaining unknown was the format and type of the image data returned from a scan. Once I had my own code running and had a basic understanding of the commands and a basic understanding of the order to use them in (from the pseudo-code idea), I then wrote code to do a full scan and dumped the data out to disk. I used Photoshop's RAW file-type (not Camera RAW) to open it (RAW has no header, you have to tell PS every time what the dimensions are, etc.). This was not a success at all. Looking at the raw data in a hex editor, I could see that the two-byte values (the FS4000 is a 14-bit scanner) were little-endian. Surprising since all multi-byte integers in FS4000 commands are big-endian. I fixed that and then noticed that the scanner was returning more data than made sense given the dimensions of the scanning window I'd specified. I worked backwards by taking the file size and tried dividing it by the number of columns and the number of rows I'd specified. Dividing by my number of rows gave an even number of columns so I tried loading the file by specifying that new column width. There was the image. The reason for the extra columns is the FS4000 appears to put a black 40-pixel margin on the end of each scan-line, no matter how many pixels of image data are in the scan-line. It's still a mystery to me why these are here and whether or not they're meaningful. Long story short: I cropped out the 40-pixel margin, flipped the image horizontally, and applied a gamma ramp of about 2.2 and the image looked pretty good. Except that the RGB channels were offset/interleaved and the amount of that depended on the scan resolution. For example, I had to write code that for a given pixel at 4000dpi, took the red value from 8 lines before and the blue value from 8 lines ahead.

One piece of advice I'd give to anyone serious about figuring exposure-related things out is to buy a calibration target. You can get inexpensive ones for film and flatbed scanners here. I bought a set of these along with a Windows app named Profile Mechanic to create ICC scanner profiles from Digital Light & Color.

Remaining Unknowns

There are still some things not well understood about the scanner. A short list of things that seem most significant:

I still don't know how to exactly control RGB sensor gain and exposure with DEFINE SCAN MODE. If I reuse FilmGet or VueScan's parameters, I get decent results. I can get better results by doing an empirical "binary search" with different parameters and comparing resulting histograms but that's surely not generally applicable (not to mention slow).

The EXECUTE AF/AE command is somewhat mysterious. I know it can control focus since I hear the focusing motor run. I think it also has something to do with exposure since it sets three values returned by GET FILM STATUS that don't change until the next EXECUTE AF/AE.

The list of scanner commands from the 'strings' output listed some scanner commands I haven't seen used in a log yet so I don't know what their op-codes are or how to use them, let alone what they do: