I’ve been pretty busy lately, and spread thin over a wide range of projects. One of the “new to me” experiences involves supporting a custom HID device via Apple’s IOHIDLib API.
The IOHIDLib API allows plain-ol folks like me to interact with HID-conformant USB devices without installing any kernel-level drivers or extensions. Let me say, this is great for both developers and users. Fewer cooks in the kernel kitchen means fewer crashes. (Except when Apple’s in the kitchen – crashing!)
I haven’t spent so much time testing the “reboot” feature of my machine since I was a Mac OS 8 System File engineer. Without saying too much about the device I’m supporting, I’ll say that it has LED outputs, like the caps-lock light on your keyboard. I’m in the early stages, so my program basically consists of a test routine that sprays eternally at the LEDs, making them do fun yet useless things.
It was a moment of triumph and joy when I finally got my head wrapped around the HID libraries such that I could make my device “dance.” Thanks, Apple DTS, for the excellent sample code and support. However, my joy was short-lived, as I noticed that after some number of dancing iterations, the LEDs just stopped. My program had frozen. And when I say stopped, I mean stopped beyond any conceivable point of revival. Force-quit can’t quit it. GDB can’t attach to it (!). Sample can’t sample it. This bad boy is just gonna hang out in my process list and dock until I restart the machine.
After much scrutiny of my own code, I did what I should have done from the beginning. I turned on Guard Malloc. I’ve sung the praises of this tool on the mailing lists but I don’t think I’ve mentioned it here yet. This debugging aid slows your application to an absolute crawl as it intercepts all memory allocations and sticks protected memory pages between every malloc’d block. The end result? The vast majority of your “overrun the array” type bugs are caught dead in their tracks.
I turned on Guard Malloc and launched my app. I started to get up for coffee, because as I said, Guard Malloc slows your app to molasses. To my surprise, the application had crashed almost instantly. Hmm – I wondered if this was the same bug that caused the freeze or just something else to look into (joy of joys!). In any case, I wasn’t going to be able to get to the bottom of the problem until I cleared all obstacles in the path. I sat back in my seat and started looking for clues.
Hmm, I’m crashing inside IOHIDLib. Surely I must be screwing something up. But what? In HID-ese these are called elements. You basically ask IOHIDLib to fetch your device, then you open up all the elements you want to write to. In HID-ese writing to the device is called a “transaction.” After you’ve configured a HID device and are talking with it, the rest of your application’s life consists of setting element values (e.g. LED on/off), committing the transaction (send it all over to the device), and then clearing the transaction (so you start out with default values next time).
I whittled my test app down to a simpler case:
- Open a device.
- Configure a one output transaction on the device.
- Clear the transaction.
Sure enough, my simple little applications crashes in the IOHIDLib with just these simple steps (when Guard Malloc is on – with Guard Malloc off you crash at some undetermined time down the road).
Crap. Crashing in Apple’s code. I’m going to have to get help from Apple, or else do some serious disassembly hacking to figure out what’s going on. Then I remembered that IOHIDLib is open source!
I downloaded IOHIDFamily-172.8. After a few tweaks (Apple’s open source projects often rely on Apple-Internal build paths and/or header files – but often you can work around it and get a working build), I was able to build a copy of the IOHIDLib.plugin bundle. With debugging symbols enabled, debugging this was going to be a breeze! I took a deep breath and copied my debugging binary over the Apple-supplied production version (after making a backup):
Phew! I can still use my mouse! Whenever I tweak system-level stuff I’m always a tiny-bit scared that I’ll end up making some foggy mistake that leaves me “bringing my system back” via SSH or single-user mode.
With the debugging IOHIDLib installed, all I had to do was re-run my application (Guard Malloc still on!). Sure enough, it dynamically picked up the new version of the IOHIDLib, and crashed with full source code at the offending line. The source file is IOHIDOutputTransactionClass.cpp and the crashing code is:
What’s wrong with this code? If you look carefully you’ll probably figure it out. The problem is in that continuation test. “While elementDataRefs[i] is not zero, and i is less than numElements, keep doing stuff.” This is a case of mixed-up order of precedence. In C, the “&&” operator evaluates left to right. So if you’ve got a dangerous test that’s only safe when a benign test is true, you have to write it “benign && dangerous”. If benign is never true, then dangerous never gets run. But here we always do the dangerous act, array indexing, before testing the index value! This code occurs in three separate places, each of which corresponds to a common IOHIDLib API for handling transactions with HID devices. No matter how many “output elements” I configure on my HID device, the IOHidLib is always going to read beyond its legal limit.
Now, in the vast majority of cases, the 4-bytes (indexing a long int) that immediately follow this array will probably not be part of a protected page. Heck, the fact that countless numbers of HID-interacting programs apparently deal with these APIs every day and don’t crash basically proves that. But it scares me. And if I ever see a random crash on my system with a similar stack crawl, I’ll know just who to blame!
It is precisely to avoid these types of bugs that I am not a huge fan of these “complicated for loops.” I would probably write this function in an elaborated, completely “too uncool for C school” form where the for loop only tests for “i<numElements” and the secondary (dangerous) test gets its own if-block inside the for loop. This removes all doubt. But for the purposes of quickly working around this crash, I replaced the three instances of the above code with the following:
With the fix in place, I was able to continue debugging my application. I fired up the debugger again with Guard Malloc enabled. I waited anxiously to see if my LEDs would keep flashing, or whether another crash would be uncovered.
Unfortunately, it just froze again. Yep, the nasty “can’t kill the application process in any way except rebooting” freezing. In this case, the bug I found in Apple’s open source didn’t turn out to be my bug. I’ll have to keep looking for the root cause of that. But the availability of this open source allowed me to fairly quickly get past it and move on to the next step of debugging. And that’s always a good thing. Thanks, Apple!
(IOHIDLib reading past array bounds reported as Radar 4417524)