A Stroke Of Luck
April 9th, 2010
One of my customers reported a really subtle bug having to do with a keyboard shortcut for a command. I’m going to change some facts to protect the innocent, but imagine I got a report:
“Every time I press cmd-N to create a new document and start typing, instead of accepting the text, it just beeps at me.”
This seems impossible to me. Creating a new document and typing into it is fundamental to my product. Surely if it was broken I would know from my own daily testing of the application. I try diligently to reproduce it, but of course, I can’t. Create a new document. Start typing. Everything works, everybody’s happy. Except the person who reported the bug.
At some point in my tinkering I stumble upon the bug. Beep. I can’t type. There it is! What a drag. This application sucks!
Now that I’ve seen and heard the bug with my own eyes, I’m more committed than ever to fixing it. But how on earth did I do it? I try several more times and am unable to reproduce the problem.
I have only spotted the issue once in my dozens of trials, yet the customer claims it affects him every single time. What explains this? Maybe it simply doesn’t happen as often on my Mac as it does on his. My thoughts turn to the brute force of AppleScript. If this happens only once in a blue moon on my Mac, surely I can reproduce it by running a script that iterates the test conditions hundreds or thousands of times. I open AppleScript Editor and enter the following:
tell application "System Events"
    repeat 100 times
        keystroke "n" using command down
        keystroke "Testing typing"
    end repeat
end tell
Nothing! Even after running it 1000 times, nothing. How can this be? How could something so common for the customer, something that I saw myself once, be so difficult to reproduce on my Mac?
After playing around a bit more, I started to reproduce it more readily. What was happening? I couldn’t figure it out, but something “felt” a little different when I reproduced it. I thought carefully about the parameters of the bug, as exemplified (I thought) by my script:
- The command key is pressed.
- The N key is pressed.
- The N key is released.
- The command key is released.
- Typing into the new document is attempted.
I looked again at the script. Perhaps this “keystroke” command is not literal enough. For all I know it’s circumventing the normal key events that get generated when you type on the keyboard. On a lark, I modified the script to be more explicit:
tell application "System Events"
    repeat 100 times
        key down {command}
        key down "n"
        key up "n"
        key up {command}
        key down "x"
        key up "x"
    end repeat
end tell
Here we have a more literal match with my analysis of what happens. Hold down the command key, press the N key, release, then type something. In this case I just type an x character because I figure any typing is enough to trigger the bug.
I ran this 100 times as well and no luck. The bug is simply not reproducible through scripting. I guess I’ll have to figure out what conditions are making it more likely for the customer to run into the bug. Maybe he has some 3rd party software that is interfering. Or … wait a minute.
tell application "System Events"
    repeat 100 times
        key down {command}
        key down "n"
        key up {command}
        key up "n"
        key down "x"
        key up "x"
    end repeat
end tell
I ran the modified script and the bug immediately exhibited itself. So what changed? Look carefully and you’ll see that my previous script assumed that when a keyboard shortcut is pressed, there is a sort of nested symmetry to the order of pressing and releasing the keys. When I press keyboard shortcuts, I hold the modifier keys down throughout the stroke, and release only after the letter key has been released. But this customer has muscle memory that inclines him to instead release the keys in the order they were pressed. When he releases the N key, the command key has “long” since been released.
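To compare the two release orders side by side, the scripts above can be folded into one parameterized handler. This is only a sketch: the handler name pressNewDocumentShortcut and its releaseCommandFirst parameter are my own inventions for illustration, and it assumes, like the scripts above, that the application under test is frontmost when the events are sent.

```applescript
-- Hypothetical handler (not part of the original scripts): press cmd-N,
-- then release the two keys in the requested order.
on pressNewDocumentShortcut(releaseCommandFirst)
	tell application "System Events"
		key down {command}
		key down "n"
		if releaseCommandFirst then
			-- Released in the order pressed (the customer's habit).
			key up {command}
			key up "n"
		else
			-- Nested release: letter key first, then the modifier.
			key up "n"
			key up {command}
		end if
		-- Any typing is enough to show whether the new document accepts text.
		key down "x"
		key up "x"
	end tell
end pressNewDocumentShortcut

repeat 100 times
	pressNewDocumentShortcut(true) -- pass false to test the nested order
end repeat
```

Running it once with true and once with false keeps everything else identical, so any difference in behavior can be pinned on the release order alone.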
This subtle difference turns out to trigger a bug in my event monitoring code that, to make a long story short, robs first responder status from the default view in the document. So when he creates a new document, there is no first responder, and his typing just causes a bunch of beeps. When I create a new document, there’s always a first responder, so I don’t see the bug.
As difficult as this bug was to track down, it would have been nearly impossible without AppleScript and System Events as a tool for testing, disproving, and ultimately proving theories, letting me zero in on the behavior. Sometimes when the steps a customer provides to reproduce a bug seem exactly the same as what you’re doing, you’re just not trying hard enough to find the difference.