First of all, I’m happy to have received word that my blog was added to MXNA today.
They tried to add it on Friday, but apparently their parser didn’t like my feed.
I was told it contained a “feff” tag before the XML prolog.
The feed was fine according to feedvalidator, other aggregators managed to parse it, and it worked fine in my feed reader.
I double-checked the PHP script that generates the feed and made sure it wasn’t emitting any stray garbage.
Google help me!!
“FEFF” turns out to be quite a peculiar Unicode character.
It’s used to signal the byte order of Unicode documents, and it also has the meaning “ZERO WIDTH NO-BREAK SPACE”... hmmm, which basically means it means nothing.
Since I use UTF-8 for my feed, no byte order mark is needed, and if there ever is a need for a “ZERO WIDTH NO-BREAK SPACE” I’m curious to find out what it could be.
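To see for myself what this character looks like on the wire, here is a small Python sketch (Python is my choice for illustration here, and the function name bom_bytes is made up). It encodes U+FEFF under a few encodings: in UTF-16 the two bytes swap depending on byte order, which is the whole point of the mark, while in UTF-8 it always comes out as the same three bytes.

```python
import codecs

BOM = "\ufeff"  # U+FEFF: byte order mark / ZERO WIDTH NO-BREAK SPACE


def bom_bytes():
    """Return the raw bytes U+FEFF becomes under a few common encodings."""
    return {
        "utf-16-be": BOM.encode("utf-16-be"),  # big-endian: FE FF
        "utf-16-le": BOM.encode("utf-16-le"),  # little-endian: FF FE
        "utf-8": BOM.encode("utf-8"),          # always EF BB BF
    }


if __name__ == "__main__":
    for name, raw in bom_bytes().items():
        # Print each encoding with its bytes as two-digit hex values.
        print(name, " ".join(f"{b:02x}" for b in raw))
```

The standard library also exposes these byte sequences as constants, for example codecs.BOM_UTF8.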
So what was the byte order mark (BOM) doing in my feed?
Apparently it’s because I used Notepad to make a quick edit to the document and told it to save as UTF-8.
Many apps also add the BOM to UTF-8 encoded documents, not to signify a byte order, but to signal that the document is Unicode.
But a parser might not be written to look for the BOM and strip it out if it’s found, and so it chokes because it expects the file to start with an XML prolog.
And the parser doesn’t need the BOM to find out whether the document is Unicode. It will look at the encoding you have specified in the prolog, and assume UTF-8 for a document with no encoding specified.
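For what it’s worth, making a consumer tolerant of the BOM takes only a few lines. This is a rough sketch in Python, not how MXNA or any particular aggregator works: the function names are made up, and the standard library’s ElementTree stands in for whatever parser a real aggregator uses.

```python
import codecs
import xml.etree.ElementTree as ET


def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present, else return raw unchanged."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def parse_feed(raw: bytes) -> ET.Element:
    """Parse feed bytes, ignoring a stray BOM in front of the XML prolog."""
    return ET.fromstring(strip_utf8_bom(raw))


if __name__ == "__main__":
    # A tiny made-up feed with a BOM glued in front of the prolog.
    feed = codecs.BOM_UTF8 + (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b"<rss><channel><title>demo</title></channel></rss>"
    )
    root = parse_feed(feed)
    print(root.tag, root.findtext("channel/title"))
```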
I guess that in my case the reason the BOM ended up in the feed is that I use an include to get my db connection settings.
PHP passes anything outside the PHP opening and closing tags straight through to the output, so a BOM sitting in front of the opening tag of the included file gets echoed as soon as the file is included.
So with a BOM in the include you get a corrupted feed which not all parsers can handle.
In the end I used my preferred editor, SciTE, and saved the files as “UTF-8 cookie” to ensure that there was no BOM, and the problem was solved.
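If you would rather not open every file in an editor, a small script can do the same check and cleanup. This is only a sketch under a few assumptions: the function names and the *.php pattern are my own, and it rewrites files in place, so try it on a copy first.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path) -> bool:
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as fh:
        return fh.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def find_files_with_bom(root, pattern="*.php"):
    """List files under root (recursively) that start with a UTF-8 BOM."""
    return sorted(p for p in Path(root).rglob(pattern) if has_utf8_bom(p))


def strip_utf8_bom(path) -> bool:
    """Rewrite the file without its leading BOM; return True if one was removed."""
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


if __name__ == "__main__":
    # Scan the current directory and clean every offending file.
    for php_file in find_files_with_bom("."):
        strip_utf8_bom(php_file)
        print("removed BOM from", php_file)
```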