Crappy Crash LogsDecember 6th, 2007
Developers who write code inevitably write code that crashes. Luckily for our users, 99% of those crashes happen on our own development machines, and we get the opportunity to fix them before shipping them off to our unsuspecting computers.
But every once in a while even the most polished and refined software exhibits that nastiest of malfunctions: the crashing bug. In the middle of a possibly important user task, the application comes screeching to a halt and disappears, leaving the user with nothing more than a cryptic message from the operating system, along with an offer to send the crash log off to Apple.
Needless to say, crashes suck. A lot. For users, especially. If there’s one behavior of your application that you should focus on eliminating, it’s the behavior of crashing. Above all other “nuisances,” this is the one that is absolutely unacceptable.
Which is why crash logs are so important. Occasionally, when a user is sophisticated enough to understand the value of the crash log and, instead of choosing to send it to Apple, copies and pastes it into an email to you, you should be overjoyed! Finally! A crash has been captured in the wild and delivered to you for easy inspection. This artifact not only proves that a crashing bug exists, but usually includes highly detailed information about where and how the crash has happened.
Which is why it boggles the mind to think that some developers see these crash logs as relatively useless. In my own experience of reporting bugs to other developers, I have found frequently that my crash log is considered more as a statistic than as a specimen. While I expect a response like “great, I see what the bug is, and I’ll get this fixed,” I often get responses more like “sigh, I’ve seen this a few times but have no idea what’s going on!”
Sometimes I find that the developer in question simply needs to be educated about the tools at his or her disposal. I imagine the hours wasted speculating about what the crash log means, and can’t help but think if they knew how to analyze it, they could have the bug solved (in many cases) in moments. Oh – and I’m sounding high and mighty right now, so before I get too carried away, let me just say that I too need to be educated on a great many things, but making sense of crash logs is not particularly high on that list. I leave it to other bloggers to fill in the gaps of my education.
Crash Logs: The Crappy Way
So what is this big thing that developers are missing? Well, a crash log essentially shows the “path” that the little hamsters in your computer were running on when they fell off the cliff. It paints a picture of where the program got misguided and went off doing something it never should have done. Problem is, the picture is a little blurry. Below is an actual crash log submitted by an actual MarsEdit user (abbreviated and condensed to fit the blog):
Thread 0 Crashed: Foundation 0x93cc2a2e _NSAPAddPage + 94 Foundation 0x93cc3b27 __NSAutoreleaseObject + 263 Foundation 0x93cc3962 -[NSObject(NSObject) autorelease] + 18 Foundation 0x93cceaca +[NSString stringWithFormat:] + 106 marsedit 0x000637fb 0x1000 + 403451 marsedit 0x00053883 0x1000 + 338051 marsedit 0x000539ea 0x1000 + 338410 marsedit 0x00053d7b 0x1000 + 339323 marsedit 0x00053e45 0x1000 + 339525 marsedit 0x0005238b 0x1000 + 332683
For any non-programmers who are still reading (bravi!), you may have a little trouble understanding what the heck this means. This is an ordered description of the path the computer followed while trying to do something. Problem is it ended up crashing. The path is shown in reverse-order, so the crash is at the top of the list, inside Foundation (an Apple system library). Trying to make human sense of this goes something like this:
- Something in MarsEdit at address 0x0005238b…
- … called something in MarsEdit at address 0x00053e45
- … called something in MarsEdit at address 0x00053d7b
- … called something in MarsEdit at address 0x000539ea
- … called something in MarsEdit at address 0x00053883
- … called something in MarsEdit at address 0x000637fb
- … which called a standard string library method “stringWithFormat”
- which eventually crashed in the system.
You can sort of see how some developers are forced to throw up their arms in despair, right? But what if you could turn this muddled picture into something a bit sharper? The fact is, you can. The reason the picture is so blurry is because MarsEdit shipped without debugging symbols. The reasons for this are mostly to keep the shipping application as small as possible, but there are also some good reasons to hide the symbols if they reveal too much about your business strategy.
Crash Logs: The Happy Way
Many developers seem to think that gathering useful crash logs requires shipping a symbol-laden application, but aren’t willing to do so. The fact is, you can have the best of both worlds by shipping a symbol-stripped version of your application, but keeping a symbol-laden version on-hand for the developer’s convenience.
What am I getting at? Apple offers a tool called “atos”, which makes it relatively easy to map an address from a symbol-stripped application to a name in a symbol-laden version of that same application. As long as the two applications were built with the same sources and using the same build options, the addresses are bound to match up. It’s just the symbols that are “stripped” from the shipping application.
Let’s take a look at what happens when I apply my atos powers to the crash log I pasted above. From the Terminal, I invoke atos with the literal memory addresses of all the mysterious points in the “path” from the crash log. You can list the addresses in whatever order you want, but I’m listing them so that the result will be in order from top to bottom, the same way I reasoned out the crash in plain “crappy” English above:
iBook> cd MarsEdit.app/Contents/MacOS iBook> atos -arch i386 -o MarsEdit \ 0x0005238b 0x00053e45 0x00053d7b 0x000539ea 0x00053883 0x000637fb
Note also that I specified the architecture explicitly. This is because I am debugging this on a PowerPC Mac, but the crash log I received showed that the customer is running on an Intel machine. This yields a magical response with, behold, real meaningful symbols!
-[RSXMLRPCCall invoke:] (in MarsEdit) (RSXMLRPCCall.m:397) -[RSXMLRPCRequest requestText] (in MarsEdit) (RSXMLRPCRequest.m:239) -[RSXMLRPCRequest serializeParams:] (in MarsEdit) (RSXMLRPCRequest.m:216) -[RSXMLRPCRequest serializeDictionary:toString:] (in MarsEdit) (RSXMLRPCRequest.m:153) -[RSXMLRPCRequest serializeData:toString:] (in MarsEdit) (RSXMLRPCRequest.m:123) -[NSData(extras) base64StringWithLineLength:] (in MarsEdit) (NSData+extras.m:215)
Now I can apply my human sense again, but with the benefit of symbolic names instead of raw memory addresses:
- MarsEdit invoked an RPC network request
- … which while building the actual request cargo text
- … was trying to serialize the parameters
- … when it tried to serialize a dictionary object
- … whose data
- … upon being converted to a Base64 representation
- … caused a “stringWithFormat:” call that went blammo!
Not only does the plot to this story make a lot more sense now, but I’ve even got handy pointers to the line numbers in the source code. Let’s see, NSData+extras.m, around line 215:
NSString *charString = [NSString stringWithFormat:@"%c", base64EncodingTable [outbuf [i]]];
Survey says? That’s the line I’m going to zero in on now as I try to figure out why this crash is occurring. Sure, I’ll have to now speculate as to why, and under what circumstances the observed crash might occur. So the magic of “atos” doesn’t take all the work out of debugging a crash log, but it brings most crashes out of the “frustratingly impossible” into the realm of “all in a day’s work.” I can use my knowledge of the situation to ask intelligent questions of the user, and hopefully get to the bottom of it, so that no other user ever has to see the nasty crash again.
I can do more than just tell the user “thanks for the report, I have no clue what’s happening here. Sorry.”
Doing It For Yourself
I hope I’ve intrigued you by the power of what I was able to achieve with “atos”, but of course to do the same with your own project, you’ll need to make sure you can come up with a “symbol-laden version” of your release app, so you can apply the same magic. The best way I know to do this is to simply add another build configuration in xcode, that is just like your “Release” configuration, but doesn’t strip out debugging symbols. You can read more about this approach and others in Apple’s fine documentation.
If you find yourself dealing with crash logs on a regular basis, it might be a good idea to make building a “symbol-laden” version of your app part of the standard build and distribute process. For myself, I’ve found it happens rarely enough that I’m happy with a process whereby I re-checkout a tagged version of the sources from Subversion, and then rebuild a copy using the “Release With Symbols” build configuration.
Nota Bene: If you’re on Leopard and trying to replicate a symbol-laden version of an app you shipped with Tiger, you’ll probably have to tweak Xcode to use the old linker, in order to get a build whose memory addresses match up with your old Tiger-built application. Might be easier to just go back to Tiger and rebuild it there.
I know this was probably common knowledge for some of you, but I also know from experience of chatting with my colleagues that this is completely new, forehead-slapping “why didn’t I know about that?!” type material for many others. I hope it saves you hours of work. Possibly even enough hours to continue reading long articles such as this one!
December 6th, 2007 at 4:06 pm
The next step here is to build a database that holds the symbols for each released version of your applications. Stand up a web site that accepts customer crash logs, automatically resolves the symbols from the appropriate versions in the database, and then archives the crash in another database table. From there, build an internal web view that lets you browse the reported crashes, search in them by symbol frequency, etc.
December 6th, 2007 at 4:11 pm
The step after that is to integrate PayPal with your website and start charging developers to use it : )
December 6th, 2007 at 4:17 pm
That sounds really cool, but it definitely sounds like it comes some time after the “making a symbols-laden build” part of your standard deployment process :) In other words, until the burden of processing crash logs becomes heavy enough, I don’t think it’s worth going too out of my way to automate the process.
December 6th, 2007 at 4:52 pm
This is really helpful, Daniel. Thanks!
December 6th, 2007 at 5:13 pm
Thanks in advance. I feel it’s safe to say this *will* be a lifesaver.
December 6th, 2007 at 5:40 pm
Thanks for this info! Typically the class involved has been enough for me to narrow down where something is happening, but this could come in handy…
December 6th, 2007 at 5:51 pm
There is another way to symbolize a crash log, without having to distribute a symbol-laden app. Look at this tech article:
December 6th, 2007 at 5:55 pm
I’ve been doing this for a few years and it’s been helpful. I have a framework that alerts the user when an exception occurs and also if the application crashed previously. The UI has an email button so the user can email me the crash log. When my program launches, I scan the log files to see if there are any new crashes since last launch and put up the alert. I also include a look up file that matches addresses to method names, so I don’t have to run atos on the submitted crash log. I should probably open source the whole thing, but it relies on other frameworks which make it harder to include in another application.
December 6th, 2007 at 6:44 pm
There’s no need to have a separate build configuration (which is a pain to keep synchronised anyway).
We run a script at the end that copies the application at the end of the build and strips at that point.
I suspect that we could improve the process now that you have the option of using dSYM bundles with DWARF. I’d have to check to see if that leaves any unwanted symbols in the main executable.
I’d also add that we do all release builds on a build server which has the advantage of isolating potential differences in developer machines and gives us access to the exact build objects and logs for any release.
December 6th, 2007 at 7:49 pm
If you’re using DWARF, there’s an additional bit of information to tell you which version of your app crashed.
Check the bottom of the crash log:
0x1000 – 0x1ffe +a.out ??? (???) /Users/gparker/a.out
That random number is a UUID generated by the linker. Every time you change your code and recompile, the built binary gets a different number. Use the UUID to make sure you’re reading symbols from the same version and architecture that your user was running. This works best if your build generates a fully-symboled binary that you then strip and ship, or if you build to a separate .dSYM file.
% otool -lv -arch i386 a.out
uuid 0x55 0x28 0x0d 0x01 0xe7 0x09 0xb4 0x7f
0x24 0x4f 0xfb 0x9e 0x56 0xdc 0x36 0x0d
December 6th, 2007 at 7:51 pm
Of course, the crash log puts the uuid in <>, and the live preview didn’t show that.
0x1000 – 0x1ffe +a.out ??? (???) <55280d01e709b47f244ffb9e56dc360d> /Users/gparker/a.out
December 8th, 2007 at 3:03 am
Sure, it is a helpful post. Thanks, Daniel
December 8th, 2007 at 12:13 pm
This is tremendously useful– thanks for the detailed explanation of how to use this tool!
December 9th, 2007 at 5:21 am
What’s always seemed strange to me (as a non-programmer) is that while the crashlog is sent to Apple by default, it *isn’t* automatically sent to the third-party developer. For a long time, I simply assumed that the developers were informed by Apple, until Michael Tsai disabused me of that assumption in the course of a late-night troubleshooting email exchange in response to a SpamSieve crash – a crash which he was notified about without my intervention. He informed me that developers had to write their own generate-a-message hooks into Apple’s logging system. Yuck. How difficult is it to do? If Apple can generate a message to itself, presumably the engine to do it is available to any programmer, yes?
Regardless of whether developers know how to interpret Apple’s crashlogs or not, I would think that a developer would want to have whatever information Apple is collecting about his app and its problems…
December 9th, 2007 at 11:32 am
Peter: It’s true that Apple doesn’t share any of that information with developers, and it’s a real sore spot for lots of folks. As far as the difficulty of “hooking in” ourselves, it’s not trivial to cover all the bases and do it 100% correctly. So like most things, for most developers it becomes a lower priority than simply hacking away at new features.
For an application that has a rampant crashing problem I think some kind of “send the report in automatically” system is a good idea to try to reel in the problem.
December 9th, 2007 at 11:53 pm
To be clear, the log isn’t sent to Apple by default. No logs come in at all unless the user clicks “Send to Apple”. I wonder how many people do that, and how many others just close the window and ignore it.
December 11th, 2007 at 2:16 pm
That’s awesome. Thanks for the heads up on this, I foresee it being very useful.
December 18th, 2007 at 5:20 am
Eric: You’re absolutely correct, of course – the crash info doesn’t get sent to Apple without the user’s OK. Since I always send crash info on to whichever relevant party asks for it, I suppose I assume that everyone else does the same, as a matter of computing hygiene :-/