Find Gremlins With BBEdit

February 26th, 2008

MarsEdit has a little shortcoming that can cause a vexing situation from time to time. It takes what you type or paste into it a bit too literally. So if you flub up at the keyboard and make some wacky keystrokes, you might end up with a weird “invisible character” in your blog post. On most systems, when you then go to publish, you get a very unfortunate error message in return.


“Parse Error. Not well formed.” How lovely!

These tricky characters are especially sinister because you can almost never see them. When you end up with one of these bad boys in your post, the only hint you might get is if you are moving the cursor around with the arrow keys, you might see it “hiccup” a second while it stops on the invisible character.

I’m sure there are a lot of different names for these unwelcome guests in text, but I like the one the folks at Bare Bones use: “gremlins”. Their fine editor, BBEdit, has a dedicated tool just for rooting out these suckers and either eliminating them or making them visible: Zap Gremlins.

I’d like to add something similar to MarsEdit, so I can spare users the pain of having to figure this out when they run into an error dialog such as the one above. But in the meantime, I’ve been handling the customer support inquiries by taking the user’s example text into BBEdit myself and looking for gremlins. Then I can point out the location of the offending character to the user, they backspace it out of existence, and life goes on.

As relatively painless as BBEdit’s function makes the task, it’s not really perfectly suited to what I need. It’s more aimed at eliminating the beasts than examining them. Whereas I want as much information as I can get about them, so I can effectively communicate to the user (and also so I can catalog what types of characters users are running into trouble with).

Thanks to AppleScript support for the Zap Gremlins function, I was able to whip up a pretty handy script to streamline this operation. Find Gremlins takes the text contents of the clipboard, runs Zap Gremlins on it to find the gremlins, and then displays a summary of what it knows about them.

It gives me all the details I would normally have to work hard to figure out. What is the character? Where exactly is it? And most importantly of all, it shows the character in context, replaced by a bullet for easy visibility. Although adding the text to a BBEdit document and manually examining the gremlins wasn’t too much of a time waster, it was still a bit boring and tedious.
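For anybody without BBEdit, the same kind of report can be roughed out in a few lines of Python. This is only a sketch, not the Find Gremlins script itself: the names `is_gremlin` and `find_gremlins` are made up for illustration, and it assumes a "gremlin" is any character outside the range XML 1.0 allows, which may not match what Zap Gremlins looks for.

```python
import unicodedata


def is_gremlin(ch):
    """Assumption: a gremlin is any character XML 1.0 does not permit."""
    cp = ord(ch)
    return not (
        cp in (0x09, 0x0A, 0x0D)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_gremlins(text, context=20):
    """Return one report dict per gremlin found in text."""
    reports = []
    for offset, ch in enumerate(text):
        if not is_gremlin(ch):
            continue
        # Clamp the context window so gremlins near either end still report.
        start = max(0, offset - context)
        end = min(len(text), offset + 1 + context)
        reports.append({
            "offset": offset,
            "codepoint": "U+%04X" % ord(ch),
            # Control characters have no Unicode name, so supply a default.
            "name": unicodedata.name(ch, "<unnamed control character>"),
            # Show the gremlin as a bullet, with surrounding text for context.
            "context": text[start:offset] + "\u2022" + text[offset + 1:end],
        })
    return reports


if __name__ == "__main__":
    sample = "Hello\x08 world"  # a stray backspace character after "Hello"
    for report in find_gremlins(sample):
        print(report)
```

Each report carries the same three things described above: what the character is, where it is, and the surrounding text with the gremlin swapped for a bullet.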

I realize finding gremlins in the clipboard text will be useful to approximately 0% of you, but I thought it was a good opportunity to demonstrate how AppleScript support for the features of an application can turn out to serve incredibly particular needs. Thanks, Bare Bones! Zap Gremlins is saving me time, and helping me get my customers back to blogging in comfort.

Update: Thanks to reader Daniel Blanken, who noticed a bug in the script that prevented it from finding gremlins very close to the beginning or end of the selected text. I’ve updated the script with his fixes.

11 Responses to “Find Gremlins With BBEdit”

  1. Mike Says:

    Just curious – are the gremlins ever useful at all? Is there a reason why you’d want to keep them?

    Is it possible to explicitly disallow people to enter gremlin characters in your app, or is that more of a system-controlled thing?

    I *think* I’ve hit problems with gremlins in the past, but I’ve never been too sure what I was dealing with.

  2. Julian Grey Says:

    I usually end up with these when cutting and pasting text from a web page open in Safari to BBEdit. The funny thing is that it never happens when cutting from Firefox.

    Hmm, but I have never ended up “keyboarding” them in. It would be nice to figure out where they come from.

  3. Ölbaum Says:

    I have a hard time understanding how an invisible character can render the XML of the request not well-formed. XML uses Unicode. If the ‘gremlin’ comes from the body of the post, then it should not be a problem. The only way I could see a problem is if they appear in the XML tags, which should not happen as the tags of the request are generated by MarsEdit and not directly from user input.

  4. Daniel Jalkut Says:

    Mike: That’s a good question. I suspect they’re never useful, but one of the things I am sensitive to with MarsEdit is not “mucking with” the blogger’s content. So I’d want to be very sure before automatically stripping them out. Currently what I’m thinking is maybe I can just make them show up as very visible red warning icons in the text. Then you’d be able to see it and delete it, but also leave it if it’s for some reason purposeful.

    Ölbaum: I don’t think the XML being generated is technically invalid, but just triggers a problem on the servers it sends to (in this example, WordPress). But that raises a good question: perhaps this should also be reported as a bug to systems that don’t handle such content gracefully.

    Daniel

  5. John Gruber Says:

    The reason these gremlins render the resulting XML invalid (I think) is that they’re *bytes*. Unicode can support any character, but you can’t insert random bytes in the middle of a UTF-8 stream. BBEdit’s Show Invisibles command is useful for spotting these things, too.

  6. Daniel Jalkut Says:

    Gruber: I’m not 100% sure, but I think the invisible characters are still legal “ASCII”, which would make them a legal part of UTF-8.

  7. sairuh Says:

    I ran into this recently after pasting stuff I had copied from a log file (from Console) into a blog entry. Was I glad to have found a solution in the MarsEdit forum to fix the problem!

    Daniel, that’s a great idea to highlight or display warnings where these wacky characters appear in the content.

  8. Miraz Jordan Says:

    I’m one of that 0%. These things just creep in from copy and pasting, or from other sources, by mysterious means.

    I have BBEdit and often use Command J to edit posts there before sending to my blog. The red upside down question mark is very handy in BBEdit.

    It would be kind of useful to be allowed to display invisible characters in MarsEdit, for those of us who do lots of writing, and editing quotes from others….

  9. Cameron Hayne Says:

    In case someone finds it useful, here’s a Perl script that checks for “gremlins” in text files and prints the lines containing them.
    (I’m using the PRE and CODE tags around this script in the hope that it might make it more readable – it doesn’t seem to work in the live preview but I’m hoping that is just a bug.)


    #!/usr/bin/perl

    # announceGremlins
    # This script is intended for checking text files for the presence of gremlins.
    # (A "gremlin" is the BBEdit name for an unwanted character, often invisible,
    # that has somehow gotten into your text file.)
    # The current implementation of this script looks for characters that
    # are not in the POSIX printable character class.
    # You can supply filenames as command-line arguments, or send the text you
    # want checked via STDIN.
    # Cameron Hayne, February 2008

    use strict;
    use warnings;

    # <> reads each file named on the command line, or STDIN if none are given
    while (<>)
    {
        chomp;
        my $line = $_;
        while ($line =~ /[[:^print:]]/g)
        {
            # pos() is the 0-based offset just past the match, which equals
            # the 1-based column of the single character matched
            my $colNum = pos($line);
            my $before = $`;
            my $after = $';
            print "Non-printable character at column $colNum of line $. of file $ARGV\n";
            print "$before?$after\n";
        }
    }
    continue
    {
        # reset line numbering on each input file
        close ARGV if eof; # Not eof()! (eof with parentheses is different)
    }

  10. Charles Says:

    This has bitten me a few times too (more often when creating HTML from TextEdit using docs people have created on Windows). Tex-Edit Plus, the still-going free text editor, has a “cleanup” function which will strip out high ASCII (which these usually are to cause the problem; low ASCII such as CR/LF etc don’t trouble it).

    But I’d echo other folk saying that it would be good to have highlighting, at least for high ASCII (red?) and maybe even for low ASCII (green?). Though as you say it might turn out to be one of those functions that only 0% really need, and those who don’t know about it would still send support queries. Unless, hmm, you had such a function turned on by default – people would see the measles spots in their text and wonder what it was.

  11. dave e Says:

    this whole thing just bit me in my scripting. I now manually check for these but now that I can automate it makes it even better.

    thanks daniel for the great support !

    Dave in Anchorage
