HIDeous Adventures with Open Source

January 21st, 2006

I’ve been pretty busy lately, and spread thin over a wide range of projects. One of the “new to me” experiences involves supporting a custom HID device via Apple’s IOHIDLib API.

The IOHIDLib API allows plain-ol folks like me to interact with HID-conformant USB devices without installing any kernel-level drivers or extensions. Let me say, this is great for both developers and users. Fewer cooks in the kernel kitchen means fewer crashes. (Except when Apple’s in the kitchen – crashing!)

I haven’t spent so much time testing the “reboot” feature of my machine since I was a Mac OS 8 System File engineer. Without saying too much about the device I’m supporting, I’ll say that it has LED outputs, like the caps-lock light on your keyboard. I’m in the early stages, so my program basically consists of a test routine that sprays eternally at the LEDs, making them do fun yet useless things.

It was a moment of triumph and joy when I finally got my head wrapped around the HID libraries such that I could make my device “dance.” Thanks, Apple DTS, for the excellent sample code and support. However, my joy was short-lived, as I noticed that after some number of dancing iterations, the LEDs just stopped. My program had frozen. And when I say stopped, I mean stopped beyond any conceivable point of revival. Force-quit can’t quit it. GDB can’t attach to it (!). Sample can’t sample it. This bad boy is just gonna hang out in my process list and dock until I restart the machine.

After much scrutiny of my own code, I did what I should have done from the beginning. I turned on Guard Malloc. I’ve sung the praises of this tool on the mailing lists but I don’t think I’ve mentioned it here yet. This debugging aid slows your application to an absolute crawl as it intercepts all memory allocations and sticks protected memory pages between every malloc’d block. The end result? The vast majority of your “overrun the array” type bugs are caught dead in their tracks.
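To make the guard-page idea concrete, here is a minimal POSIX sketch. It is not how Guard Malloc itself is implemented; the function name `overrunFaults` and the two-page layout are made up for illustration. It places a block flush against an inaccessible page, lets a forked child write one byte past the end, and reports whether the child failed to exit cleanly.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo helper (not part of Guard Malloc): returns true if a
// one-byte overrun of an n-byte block is stopped by a protected page.
bool overrunFaults(std::size_t n) {
    const long pageLong = sysconf(_SC_PAGESIZE);
    assert(pageLong > 0);
    const std::size_t page = static_cast<std::size_t>(pageLong);
    assert(n > 0 && n <= page);

    // Reserve two pages: one for data, one to act as the guard.
    void* mem = mmap(nullptr, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED) return false;
    char* base = static_cast<char*>(mem);

    // Make the second page inaccessible, so any touch of it faults.
    if (mprotect(base + page, page, PROT_NONE) != 0) {
        munmap(mem, 2 * page);
        return false;
    }

    // Place the block so its last byte is the last byte of the data page.
    volatile char* block = base + page - n;

    pid_t child = fork();
    if (child < 0) {
        munmap(mem, 2 * page);
        return false;
    }
    if (child == 0) {
        block[n - 1] = 1;  // last legal byte: fine
        block[n] = 1;      // one past the end: lands on the guard page
        _exit(0);          // only reached if nothing stopped the overrun
    }

    int status = 0;
    waitpid(child, &status, 0);
    munmap(mem, 2 * page);

    // Anything other than a clean exit means the overrun was caught.
    return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}
```

Without the protected page, that stray write would usually land in some neighboring allocation and go unnoticed until much later, which is exactly the "crash at some undetermined time" behavior that makes these bugs miserable.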

I turned on Guard Malloc and launched my app. I started to get up for coffee, because as I said, Guard Malloc slows your app to molasses. To my surprise, the application had crashed almost instantly. Hmm – I wondered if this was the same bug that caused the freeze or just something else to look into (joy of joys!). In any case, I wasn’t going to be able to get to the bottom of the problem until I cleared all obstacles in the path. I sat back in my seat and started looking for clues.

Hmm, I’m crashing inside IOHIDLib. Surely I must be screwing something up. But what? My device exposes a handful of outputs (the LEDs). In HID-ese these are called elements. You basically ask IOHIDLib to fetch your device, then you open up all the elements you want to write to. In HID-ese writing to the device is called a “transaction.” After you’ve configured a HID device and are talking with it, the rest of your application’s life consists of setting element values (e.g. LED on/off), committing the transaction (send it all over to the device), and then clearing the transaction (so you start out with default values next time).
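The set/commit/clear cycle can be sketched with a toy model. To be clear, this is not IOHIDLib's API: `ToyDevice`, `ToyOutputTransaction`, and every method name below are hypothetical, invented only to show the shape of the lifecycle.

```cpp
#include <cassert>
#include <map>

// Toy model only: these types are hypothetical and are not IOHIDLib's API.
using ElementId = int;

struct ToyDevice {
    std::map<ElementId, int> state;  // what the "hardware" currently shows
    int commitsReceived = 0;
};

class ToyOutputTransaction {
public:
    explicit ToyOutputTransaction(ToyDevice& device) : device_(device) {}

    // Register an element (e.g. one LED) along with its default value.
    void addElement(ElementId id, int defaultValue) {
        defaults_[id] = defaultValue;
        pending_[id] = defaultValue;
    }

    // Stage a new value; nothing reaches the device yet.
    void setElementValue(ElementId id, int value) {
        assert(pending_.count(id) == 1);  // element must have been added
        pending_[id] = value;
    }

    // Send every staged value over to the device in one go.
    void commit() {
        for (const auto& entry : pending_) {
            device_.state[entry.first] = entry.second;
        }
        device_.commitsReceived++;
    }

    // Reset staged values to their defaults, ready for the next round.
    void clear() { pending_ = defaults_; }

    int pendingValue(ElementId id) const { return pending_.at(id); }

private:
    ToyDevice& device_;
    std::map<ElementId, int> defaults_;
    std::map<ElementId, int> pending_;
};
```

In this model, clearing only resets what is staged. The device keeps showing whatever was last committed.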

I whittled my test app down to a simpler case:

  1. Open a device.
  2. Configure a single output transaction on the device.
  3. Clear the transaction.

Sure enough, my simple little application crashes in the IOHIDLib with just these simple steps (when Guard Malloc is on – with Guard Malloc off you crash at some undetermined time down the road).

Crap. Crashing in Apple’s code. I’m going to have to get help from Apple, or else do some serious disassembly hacking to figure out what’s going on. Then I remembered that IOHIDLib is open source!

I downloaded IOHIDFamily-172.8. After a few tweaks (Apple’s open source projects often rely on Apple-Internal build paths and/or header files – but often you can work around it and get a working build), I was able to build a copy of the IOHIDLib.plugin bundle. With debugging symbols enabled, debugging this was going to be a breeze! I took a deep breath and copied my debugging binary over the Apple-supplied production version (after making a backup):

sudo cp ./IOHIDLib /System/Library/Extensions/IOHIDFamily.kext/Contents/PlugIns/IOHIDLib.plugin/Contents/MacOS/

Phew! I can still use my mouse! Whenever I tweak system-level stuff I’m always a tiny-bit scared that I’ll end up making some foggy mistake that leaves me “bringing my system back” via SSH or single-user mode.

With the debugging IOHIDLib installed, all I had to do was re-run my application (Guard Malloc still on!). Sure enough, it dynamically picked up the new version of the IOHIDLib, and crashed with full source code at the offending line. The source file is IOHIDOutputTransactionClass.cpp and the crashing code is:

for (int i=0; elementDataRefs[i] && i<numElements; i++)

What’s wrong with this code? If you look carefully you’ll probably figure it out. The problem is in that continuation test. “While elementDataRefs[i] is not zero, and i is less than numElements, keep doing stuff.” This is a case of mixed-up order of evaluation. In C, the “&&” operator evaluates its operands left to right, and skips the right one entirely when the left one is false. So if you’ve got a dangerous test that’s only safe when a benign test is true, you have to write it “benign && dangerous”. If benign is false, then dangerous never gets run. But here we always do the dangerous act, array indexing, before testing the index value! This code occurs in three separate places, each of which corresponds to a common IOHIDLib API for handling transactions with HID devices. No matter how many “output elements” I configure on my HID device, as long as every entry is non-zero, the IOHIDLib is going to read one slot beyond the array’s legal limit.
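Here is a small sketch of the difference between the two orderings. `ProbedArray` and the helper functions are hypothetical stand-ins, not IOHIDLib code: the array records the highest index anyone asks for, and answers out-of-range requests with null instead of touching memory, so we can watch how far each loop reaches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the element array: it records the highest index
// requested, and never actually reads out of bounds.
struct ProbedArray {
    std::vector<const void*> slots;  // valid indices: 0 .. slots.size() - 1
    int highestIndexRead = -1;

    const void* at(int i) {
        if (i > highestIndexRead) highestIndexRead = i;
        const bool inRange = i >= 0 && static_cast<std::size_t>(i) < slots.size();
        return inRange ? slots[static_cast<std::size_t>(i)] : nullptr;
    }
};

// Build an array of n non-null entries.
ProbedArray makeFilled(int n) {
    assert(n >= 0);
    static const int dummy = 0;  // any valid address will do
    ProbedArray a;
    a.slots.assign(static_cast<std::size_t>(n), &dummy);
    return a;
}

// Same shape as the crashing loop: element test first, bounds test second.
int highestIndexDangerousFirst(ProbedArray& a, int numElements) {
    for (int i = 0; a.at(i) && i < numElements; i++) {
    }
    return a.highestIndexRead;
}

// Bounds test first: && short-circuits, so at(i) is never called once
// i reaches numElements.
int highestIndexBoundsFirst(ProbedArray& a, int numElements) {
    for (int i = 0; i < numElements && a.at(i); i++) {
    }
    return a.highestIndexRead;
}
```

With three non-null entries, the first ordering asks for index 3 (one past the end) before it ever looks at the bounds, while the second stops at index 2.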

Now, in the vast majority of cases, the 4-bytes (indexing a long int) that immediately follow this array will probably not be part of a protected page. Heck, the fact that countless numbers of HID-interacting programs apparently deal with these APIs every day and don’t crash basically proves that. But it scares me. And if I ever see a random crash on my system with a similar stack crawl, I’ll know just who to blame!

It is precisely to avoid these types of bugs that I am not a huge fan of these “complicated for loops.” I would probably write this function in an elaborated, completely “too uncool for C school” form where the for loop only tests for “i<numElements” and the secondary (dangerous) test gets its own if-block inside the for loop. This removes all doubt. But for the purposes of quickly working around this crash, I replaced the three instances of the above code with the following:

// DCJ - Splashes beyond array bounds when (i == numElements)
// for (int i=0; elementDataRefs[i] && i<numElements; i++)
for (int i=0; i<numElements && elementDataRefs[i]; i++)
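For comparison, the elaborated “too uncool for C school” form described above might look something like the sketch below. The function name `countLeadingElements` and its counting body are hypothetical placeholders (the real loops in IOHIDLib do their own work per element). The point is only the structure: the for loop tests the bounds, and the element test gets its own if-block.

```cpp
#include <cassert>

// Hypothetical helper, not IOHIDLib code: counts the non-null entries at the
// front of the array, stopping at the first null or at numElements.
int countLeadingElements(const void* const* elementDataRefs, int numElements) {
    assert(elementDataRefs != nullptr || numElements == 0);

    int count = 0;
    for (int i = 0; i < numElements; i++) {
        // By the time we index, i is already known to be in bounds.
        if (elementDataRefs[i] == nullptr) {
            break;
        }
        count++;
    }
    return count;
}
```

It is a few lines longer, but there is no ordering to get wrong.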

With the fix in place, I was able to continue debugging my application. I fired up the debugger again with Guard Malloc enabled. I waited anxiously to see if my LEDs would keep flashing, or whether another crash would be uncovered.

Unfortunately, it just froze again. Yep, the nasty “can’t kill the application process in any way except rebooting” freezing. In this case, the bug I found in Apple’s open source didn’t turn out to be my bug. I’ll have to keep looking for the root cause of that. But the availability of this open source allowed me to fairly quickly get past it and move on to the next step of debugging. And that’s always a good thing. Thanks, Apple!

(IOHIDLib reading past array bounds reported as Radar 4417524)

Intel Impatience

January 18th, 2006

On the morning of January 10, I woke up expecting to waste a couple hours of my day scouring web sites and IRC channels for the latest transcriptions of Steve Jobs’ Macworld keynote speech. It didn’t really bother me that it was not being streamed live this year, because every other year when they’ve tried that, I’ve ended up squinting at a low-quality, slow-updating video and not really getting much more than I would have from the painstakingly verbatim notes I read live on IRC.

While I was expecting to throw away a couple hours, I didn’t yet know I’d be throwing away a couple thousand dollars. [“Throwing away” will seem less appropriate once I actually get the hardware!] I knew I needed/wanted to get an Intel box sometime soon, in order to start testing my own software. I’d made whatever “transition” was necessary in theory months ago, but in practice I’d yet to try the apps out in a live Intel box.

So when I saw the Intel iMac, I sighed. No Intel for me, I thought. I had subconsciously formulated an Intel fantasy that involved buying an Intel-based Mini, using it in the common area as a media center, and rebooting it into Windows XP as necessary for my work. I already use VPN for almost all of my PC needs. The ugly Dell sits underneath my desk, serving mostly as a foot warmer and sometimes as a reminder of just how good we have it in the Mac world.

When the MacBook Pro was announced, however, my ears perked up. My “portability” currently consists of a 500MHz iBook G3, stuck on 10.3 because I’ve been too lazy to figure out how to install Tiger from a hard drive (no DVD drive). This announcement gelled for me a magic combination of want and need. I need an Intel machine to test against. I want a laptop to cruise to the cafe with. OK, sold! I placed my order at 11:08AM PST. The problem is, it’s not slated to ship until February.

I’m getting impatient! Other people are getting iMacs, and I’m stuck on the waiting list. I’m considering switching my order to an iMac – perhaps I can replace my Dual G5 2.0GHz with… an iMac? If there was ever a good time to sell a powerful G5, it’s probably now, while the G5 still runs Photoshop faster than an Intel-based Mac.

I got so impatient to try out my in-house applications that I built Universal copies and posted them in a private section of my web site. I then went to my local Apple Store (after calling ahead to confirm that they got their Intel Macs on schedule) to do a little test-drive.

I walked into the Apple Store and started scanning for the word “Intel.” Nowhere to be found. There were tons of iMac G5 displays, but no Intel iMac! Damn it! They lied to me. As a last stab at success, I asked an employee, “Are there any Intel iMacs on display?” I tried to feign consumerism so he wouldn’t get suspicious about my motives. “Sure, we have one right over here. There isn’t much on it, but you can surf the web or whatever.” Something tells me Apple wouldn’t be too happy to have their employees describe the “full suite of Apple applications” as “nothing much.”

The machine he escorted me to was indeed Intel-based, despite the outdated “iMac G5 – $1299” placard next to it. I guess a side-effect of not changing the price is that it allows retail stores to be lazy. I went straight to my web page, downloaded my universal apps, and attempted to launch them from the Desktop. “You are not allowed to launch this application.” What! Oh crap, I’m going to have to come clean about my motives. I thought there was like a 10% chance the Apple store employees would take pity or be awed by my plight, and let me loose on the machine. But the 90% chance of them just saying “No” and then watching me more carefully informed my decision to just play around a bit.

In a short while I figured out how to work around the security. And that made me feel pretty bad-ass. I’ve got code to debug, damn it! I excitedly launched my applications, only to learn that they were in fact not Intel compatible. I was really expecting them to launch and run without a hitch. “Well, better to find out now than to ship and let my customers find out!” I rationalized. But I was a bit bummed. I basically do things “by the book.” Why don’t my apps run? In fact, one runs and one crashes. But the root cause appears to be the same. I’m getting some kind of NSUnarchiver-based exception. Does this look familiar to anybody?

2006-01-17 15:48:58.058 Clarion[1011] An uncaught exception was raised
2006-01-17 15:48:58.168 Clarion[1011] *** file inconsistency: read ‘I’, expecting ‘L’
2006-01-17 15:48:58.168 Clarion[1011] *** Uncaught exception: <NSArchiverArchiveInconsistency> *** file inconsistency: read ‘I’, expecting ‘L’

It’s hard to debug the problem at the Apple store, since they don’t have Developer Tools installed. So I took some notes and scp’d them up to my web server from the store. When I got home I examined the logs (and a stack trace in the crasher case) more carefully.

I’m assuming there’s some data somewhere that is supposed to look like “LIST” or something but is coming back “ILTS” (or would it be “TSIL”?) due to byte-swapping issues. A quick scan of my sources doesn’t reveal any funky archiver behavior, and I can’t find any references to this type of error alongside “Intel” in the developer list archives. I searched my Nib files for “LI” and “IL” but didn’t find anything particularly interesting. The only comments I can find related to archiving issues in Apple’s documentation have to do with archiving bitfield values. What am I doing in common in both of my apps that would cause this behavior?
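To settle the “ILTS” versus “TSIL” question, here is a small sketch of what byte swapping would do to a four-character tag like “LIST”. The tag itself is only my guess from above, and the helper names are made up for illustration. Nothing here is taken from NSArchiver.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Pack four characters into a 32-bit value, first character in the high byte.
std::uint32_t packFourChars(const std::string& s) {
    assert(s.size() == 4);
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(s[0])) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(s[1])) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(s[2])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(s[3]));
}

// Inverse of packFourChars: high byte becomes the first character.
std::string unpackFourChars(std::uint32_t v) {
    std::string s(4, '\0');
    s[0] = static_cast<char>((v >> 24) & 0xFF);
    s[1] = static_cast<char>((v >> 16) & 0xFF);
    s[2] = static_cast<char>((v >> 8) & 0xFF);
    s[3] = static_cast<char>(v & 0xFF);
    return s;
}

// Reverse all four bytes, as when a 32-bit value is read with the wrong
// endianness.
std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

// Swap the two bytes inside each 16-bit half, as when the data is treated as
// two 16-bit values read with the wrong endianness.
std::uint32_t swap16Pairs(std::uint32_t v) {
    return ((v & 0x00FF00FFu) << 8) | ((v >> 8) & 0x00FF00FFu);
}
```

So a full 32-bit swap of “LIST” gives “TSIL”, while swapping within 16-bit pairs gives “ILTS”. Either way, an ‘I’ showing up where an ‘L’ was expected would need more than one character to be out of place.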

I will probably go back down there today or tomorrow to try to get some more information, but I thought it might ring a bell with somebody.

Update: I just noticed that my applications are using “Pre-10.2 Nib” format. I guess I never updated them, or something. I wonder if this could be the problem. I notice that my “objects.nib” file in the Pre-10.2 version does have a lot of “I”s in it, while the keyedobjects.nib file from a “10.2 and later” nib has a lot of “L”s.

Documentation Sucks

January 12th, 2006

A recent mailing-list discussion has raised my hackles about documentation. So at the risk of offending all the tech-writers out there, let me restate my thesis: Documentation Sucks!

I’m not just saying that because I’ve been very slow to provide substantial documentation for my two in-house applications. The only thing worse than lack of documentation is too much documentation. It’s too often a sign of design failure in the application. Documentation is insidious. As soon as it reaches a certain size, it seems to become a black hole for “explaining away mistakes.” The new feature is completely unintuitive, but completed by the deadline? It’s OK, just stick a note in the documentation! Then you end up with a big junky piece of unusable music composition software, or a ridiculously ill-behaved spreadsheet app.

When you install a piece of software, its value can often be measured by how much you can get done without looking at the manual. I’m not saying there shouldn’t be a manual (except in rare cases), but it shouldn’t be necessary for your first conversation with the program.

Who here has read the manual for Safari? The Dock? Address Book? You haven’t? How do you get anything done?! Do they even have documentation? Sure they do, but they’re not mammoth, and they’re mostly there to feed the public’s addiction to documentation. Take Address Book. If you open its help book from the Help menu, you’ll see that the documentation is split up into four fairly discrete sections:

  1. What’s new in Address Book?
  2. Discover Address Book
  3. Solving Problems
  4. Index

The first section is sort of a “release notes” for consumers. This is valuable, but hardly needs to be in the manual. In fact, it’s probably the last place I’d look for it. For a paid product, I would expect to see it on the web in marketing information. This is the type of information that Apple gets to include in the documentation because they push their applications to us for free. We don’t have to decide that there is a true benefit to us before deciding to upgrade to them. So they have to sell us on the application after we already have it!

The second section is also essentially marketing information, but packaged as a “getting to know your new Address Book” type section. There is some helpful stuff in here if you’re looking to do advanced features like syncing, but in the “obligatory documentation” department we get items like this:

Finding a contact in Address Book

You can quickly search your Address Book for a name, email address, or any other information. In Address Book, type the text you want to find in the search field. As you type, Address Book displays the matching contacts.

“Oh that’s what the little oval box with the magnifying glass in it does!” Don’t get me wrong, users need to know this! But if every application takes responsibility for explaining every UI convention to every user, there will be a whole lot of reading and whole little of doing.

The third section: “Solving Problems” is really the holy grail for users. It’s both the only section users really need, and the only section that can’t be written without their help. Until your users present problems, you can’t succinctly elaborate on them in the documentation. This section, in addition to being the only useful one, is the one that you want to focus on eliminating as revisions to your application are produced. Your bug list and your “Solving Problems” list should be pretty well in sync.

When’s the last time you needed a manual to watch a DVD? Is there a manual? There’s a complicated (sometimes infuriating) user interface that provides access to a sometimes large number of features. The UI is in fact much less standardized than a typical computer UI. Yet there’s no manual. Why? Because the penalty for making a mistake is small and the consequences of an action are clearly communicated to the user. The only thing you need a manual for on most DVDs is to find the secret, hidden “easter egg” features. Unfortunately, many applications on the Mac treat all features like easter eggs. Want to copy text from one paragraph to another? “First, go to page 182.B of the manual. Now, while pressing control, shift, and caps-lock, turn around 3 times while fingerspelling ‘Paperclip’ with your free hand. To learn more, please turn the page.”

For that matter, when’s the last time you read a manual for a hammer? Pair of pliers? Your desk? The rug on the floor? For which tools in your life do you find high value in the manual? Is it valuable because it’s required to use the product or because it enhances the product? If it’s required then don’t you resent the manufacturer?

A great example of an extremely complicated, extremely dangerous application with a minimalist manual is the automobile. Most of us trust ourselves implicitly to step into any automobile manufactured in the past 60 years and get from point A to point B with relative ease and absolute safety. Sure, there’s a manual for your car, but have you ever looked at it? I’ll tell you what it says. It has some ridiculous stuff like “This is a key. Keys make your car start. Please don’t pump the gas while the car starts.” OK, that’s filler, but the manual is still thinner than most software manuals. If you’re lucky, it also has some really useful Solving Problems stuff like “Oh Crap. Your battery’s dead. You probably thought you’d never have to do this, but here is how you jumpstart a car.” (Some automobile manufacturers, realizing that the Solving Problems list is also a bug list, have added features to their cars like backup batteries to automatically jump start the main battery when it dies. Nice!)

Nowhere in your car’s manual does it say “vehicles approaching at 60MPH should not be collided with head-on.” We are “car literate,” so we know these things without having to be told. In fact most of what we know about driving and cars we learn by observation and experience – we are exposed to driving from a young age and become confident about our understanding of the complex system. Sure there is training, but once you learn how to use one car, you are in a good position to use any car – and most importantly, these pre-requisites are not the car manufacturer’s responsibility! This is how computer software should be. Customers who are minimally computer literate should be as comfortable stepping into your software’s trial period as they are stepping into a rental car. As our world becomes more and more computer literate, it’s becoming an insult to explain things like “This is a search box. Search boxes are for searching.”

Car companies save a lot of paper because there isn’t an expectation of a phone-book sized manual when you buy a new vehicle. The software industry isn’t so lucky. One of the biggest ways in which documentation sucks is that parts of the computer-using masses (particularly those who still feel a bit cautious about getting behind the mouse) have been conditioned to expect manuals for everything. And big manuals, at that! If we don’t get a big manual from the manufacturer, we’ll pay some book publisher for a big fatty.

Two decades of crappy software and obligatory documentation have conditioned users to expect the worst. We need to wean the world off of manuals by proving to them that we too can get them from point A to point B without getting injured.

Noooooooooo!

January 11th, 2006

It looks like my dreams of replacing my Dell with a new MacBook are (temporarily?) shattered. [Via DaringFireball.net]