The Red Sweater Review Of Instapaper’s Archiving Of John Siracusa’s Review Of Mountain Lion 10.8
July 26th, 2012
Instapaper developer Marco Arment responded to the latest installment of John Siracusa’s famously elaborate Mac OS X reviews with an amusing, fairly elaborate review of the review itself. Each is worth reading, or at a minimum, each is worth reading later.
Siracusa’s article is particularly suitable to archiving for later perusal, as it clocks in at a whopping 24 “web pages.” Arment estimated it took him around two hours to read. Given my busy schedule and poor attention span, I suspect this will be split up into several shifts, reading a little bit when I get the chance.
Multi-page articles used to be a bugaboo for Instapaper. Faced with an article like Siracusa’s, it would happily save it to your reading list, but when you sat down to dig into the story, you’d be vexed to find you were stuck with only the first of 24 pages. Happily, in March of this year, the Instapaper bookmarklet was updated to support multi-page archival.
I have an Instapaper keyboard shortcut wired up, so when I’m looking at a page I want to read later, I just press control-p, and up comes Instapaper’s friendly “Saving” panel. When it’s saving a multi-page article, it updates the UI while it cranks through the pages.
I pressed the keyboard shortcut on Siracusa’s article, and settled in for a relatively long Instapapering lull. To my surprise, the save panel appeared and disappeared almost instantly. Uh oh, has Marco screwed up multi-page archiving for a canonical example of its usefulness? On an article that he himself has drawn additional attention to?
Nope. All the pages are there. I confirmed through Instapaper that the complete, gloriously long article would be waiting for me on my subway ride later this morning. Kudos to Instapaper!
But how did it happen so quickly? Does Marco special-case certain popular pages like this in an effort to boost perceived performance? Or perhaps one of the subtle improvements over the years has been some kind of automatic server-side de-duping of archives. This would save Marco a bunch of space on his servers while also improving performance for users.
Archiving Advice
However Instapaper did it, archiving John Siracusa’s review of Mountain Lion 10.8 with Instapaper was, ahem, instant and complete. Would archive again.
July 26th, 2012 at 10:33 am
This is all good, but what I really want to know is… Will it blend? ;-)
July 26th, 2012 at 10:54 am
When I added multipage support, I wrote it in both the bookmarklet and the server-side crawler so that people’s multipage saves would still work from API clients that don’t execute the bookmarklet code (such as Twitter and RSS clients).
Only the bookmarklet can save pages behind a login barrier, but Ars doesn’t require a login. And there’s a weird JavaScript bug in Safari that often crashes the tab when the bookmarklet tries to crawl the large number of pages in a Siracusa review on Ars. So, for Ars, server-side multipage crawling is the most reliable option.
The night before Siracusa’s review was published, I added a new, as-yet-undocumented option to Instapaper’s bodytext configurations, “always_fetch_from_server”, and set that to “yes” for Ars Technica. This forces the bookmarklet to skip its content-capturing and multipage-fetching routines and just send the URL so the server will crawl it.
Unfortunately, the bookmarklet doesn’t provide any indication to the user that this has happened. I’ll try to figure out a good solution to that.
July 26th, 2012 at 10:56 am
I think he said on Build and Analyze that he doesn’t save identical articles multiple times.
July 30th, 2012 at 6:42 am
I was waiting for someone to review Marco’s review of Siracusa’s review! Sadly, I think you could have done a more detailed analysis of Marco’s deconstruction of Siracusa’s latest work. I must conclude that you can be easily distracted by shiny new features. Perhaps another reader here can provide additional insight by reviewing my review of your review of Marco’s review of John’s review.