Easy Endian-ness
January 25th, 2006
A lot of my readers have noticed that this blog covers extremely geeky programming topics that often fly right over the heads of less technically-submerged people. It’s not an intelligence thing – it’s all just jargon and experience. So every once in a while I’d like to pull my head out of my assumptions about who is reading the blog, and tackle something considerably more “simple.” Topics that you must understand if you’re a developer, and may want to understand as a curious non-developer.
Tim Gaden points out on his Hawk Wings blog that the term “endian-ness” has become widely used in the Mac community, and it’s leaking out of the labs and into common conversation with customers. He astutely observes that many people haven’t got the foggiest idea what it means.
In the context you’ve probably been hearing it lately, it has to do with differences between two computer chip architectures – particularly Intel and PowerPC.
PowerPC is “big-endian” while Intel is “little-endian”.
A hexadecimal (hex, base 16) number uses the numerals 0-9 (like decimal) but adds the letters A-F, so it can represent larger numbers with fewer characters. Other than that, it’s constructed like a normal decimal number, when talked about and written by humans. For instance, here’s a hex number:
0x11ff
(The 0x at the beginning is common shorthand for “this is a hex number”).
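If you'd like to see what that hex number is worth in decimal, here is a minimal Python sketch. The variable names are mine, made up for illustration; the arithmetic is just each hex digit multiplied by a power of 16.

```python
# 0x11ff written out digit by digit: each position is a power of 16.
# The hex digit "f" is worth 15.
by_hand = 1 * 16**3 + 1 * 16**2 + 15 * 16**1 + 15 * 16**0

# Python understands the 0x shorthand directly.
literal = 0x11ff

print(by_hand)  # 4607
print(literal)  # 4607
```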
On a PowerPC system, that is exactly how the number is stored in memory. The “Big End” is the end farthest to the left. This is what you’re used to with decimal numbers. It’s why 3,000,000 dollars is a lot of money: the characters that “mean a lot” are on the left. On an Intel machine, the number is stored “with its little end in first” – so it looks backwards:
0xff11
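You can watch the two layouts appear without owning either kind of machine. This is a small sketch in Python, using the built-in int.to_bytes method to ask for each byte order explicitly; the variable names are just for illustration.

```python
value = 0x11ff

# Ask for the same two-byte number in each byte order.
big_endian = value.to_bytes(2, byteorder="big")        # PowerPC-style layout
little_endian = value.to_bytes(2, byteorder="little")  # Intel-style layout

print(big_endian.hex())     # 11ff
print(little_endian.hex())  # ff11
```

Same number, same two bytes, just stored in the opposite order.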
As a very practical example, let’s say I run a program that stores a directory of all my friends. It was written for a PowerPC computer, and as part of its data format, writes the total count of friends as a number:
0x0001
You don’t have to be an expert with hex numbers to guess that the “logical value” of this number is ONE. Now, what happens when this program is run on an Intel machine? If it just reads it verbatim from disk, it ends up looking exactly the same in memory as it did on the PowerPC, but it means something completely different. Since the “little end” goes in first on an Intel machine, the number is interpreted backwards, and becomes equivalent to this PowerPC-based value:
0x0100
So instead of 1 friend, the program running on Intel thinks I have 256 friends. Hey, I like this byte-order problem! But when the program proceeds to try reading those 255 nonexistent friends from disk, the data format is hosed, and the application either crashes or behaves very strangely.
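Here is the friend-count mix-up as a rough Python sketch. The two bytes stand in for what the hypothetical friend directory wrote to disk on a PowerPC; int.from_bytes then interprets them each way.

```python
# The two bytes exactly as the PowerPC program wrote them to disk.
on_disk = bytes([0x00, 0x01])

# Read the way a big-endian chip (PowerPC) would interpret them.
as_big_endian = int.from_bytes(on_disk, byteorder="big")

# Read the way a little-endian chip (Intel) would interpret them.
as_little_endian = int.from_bytes(on_disk, byteorder="little")

print(as_big_endian)     # 1
print(as_little_endian)  # 256
```

Nothing on disk changed. Only the interpretation did.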
The solution for most of these problems is something called “byte swapping.” This makes it the responsibility of the programmer to ensure that bytes that went to disk on whatever architecture come back into memory in the appropriate format for the current chip. A byte on a computer is exactly the amount of data that uses up two characters in a hex number. So using the above example, the bytes in question are “0x00” and “0x01”. If the bytes went to disk in big-endian format (0x0001), they need to “trade places” when they’re read back on Intel, so that they still mean ONE in little-endian (0x0100).
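For the curious, a byte swap on a two-byte number can be sketched in a few lines of Python. The function name swap16 is my own invention for this example, not something from Apple's libraries.

```python
def swap16(value):
    """Make the two bytes of a 16-bit number trade places."""
    low_byte = value & 0x00ff          # the right-hand two hex characters
    high_byte = (value >> 8) & 0x00ff  # the left-hand two hex characters
    return (low_byte << 8) | high_byte

print(hex(swap16(0x0100)))  # 0x1
print(hex(swap16(0x11ff)))  # 0xff11
```

Swapping twice gets you back where you started, which is why the same routine works in both directions.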
Apple has done a lot of the work for developers in this transition. Thanks to the growing use of highly abstracted data formats, Apple was able to handle the grunt work for things like preferences storage automatically. But for developers with custom data types, who have not planned for endian-ness issues, the announcement that Apple would be moving to Intel was a major wake-up call. They’d have to revamp their data storage and retrieval strategy so that they were always capable of “doing the right thing” regardless of the architecture. For most Mac developers, this means “assume it’s always big-endian, and byte swap if necessary.” This assumption might change over time if little-endian processors like Intel’s end up being what the Mac sticks with. But for the time being, assuming big-endian means that the data formats can be passed seamlessly between existing applications and their Intel-savvy counterparts.
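The “assume it’s always big-endian, and byte swap if necessary” strategy might look something like this in Python. This is only a sketch under my own assumptions: the function names and the two-byte friend count are hypothetical, and a real Mac application would use its own file format and Apple's swapping routines rather than this code. sys.byteorder reports the byte order of the machine the script is running on.

```python
import sys

def write_friend_count(count):
    # The file format is declared big-endian, whatever chip we run on.
    return count.to_bytes(2, byteorder="big")

def read_friend_count(data):
    # Load the bytes the way this machine would natively interpret them...
    native = int.from_bytes(data, byteorder=sys.byteorder)
    # ...and byte swap only if this machine is little-endian.
    if sys.byteorder == "little":
        native = ((native & 0x00ff) << 8) | ((native >> 8) & 0x00ff)
    return native

saved = write_friend_count(1)
print(saved.hex())               # 0001
print(read_friend_count(saved))  # 1
```

Because the file is always big-endian, the same bytes read back as ONE on either architecture.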
The length of even this “easy” overview of endian-ness proves that it’s actually a complex and difficult concept. I have barely scratched the surface but I hope this helps put things into a bit more perspective. The next time you hear geeks yammering on about endian-ness, perhaps you’ll have a quip or two to interject!