May 10 2008

Going through some PHP code I have to clean up, I’m a bit surprised to find that it can take many lines, using everything from preg_match to addslashes and mysql_escape_string, to sanitize numeric input.
So I thought I’d share a method that I find handy:

$id = (int) $_GET["id"];

In some cases you might want to handle any attempt to input a string where a number is expected, though.
When converting a string to an integer, PHP keeps any leading digits and otherwise defaults to 0, rather than something handy like NaN.
Usually it’s not much of a problem if someone tries to hack your site and only gets directed to the contents for “id=1” when they try to input “id=1;DROP TABLE users”, or for “id=0” when the input doesn’t start with a digit at all.
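
To make the cast behaviour concrete, here is a small sketch. The variable names are made up for illustration, and the values in the comments are what PHP’s documented string-to-integer conversion should give:

```php
<?php
// Illustrative inputs only; in a real script these would come from $_GET["id"].
$plain    = (int) "42";                  // 42: a clean numeric string
$garbage  = (int) "foo";                 // 0: no leading digits at all
$injected = (int) "1;DROP TABLE users";  // 1: leading digits kept, the rest dropped

var_dump($plain, $garbage, $injected);
```

Either way, whatever reaches the query after the cast is a plain integer.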

But one would think that this should work as a check:

$id = (int) $_GET["id"];
if($_GET["id"] == $id){
	echo "ok";
} else {
	// handle the non-numeric input here
}

But even if $_GET["id"] is “foo” it will equal 0, again because PHP (at least in the versions before PHP 8) converts a string that doesn’t represent a number to 0 before comparing it with an integer.

Instead I use this to check if the input string actually evaluates to a number:

	echo "ok";
} else {
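
The snippet above is cut off, so as a stand-in here is a minimal sketch of one way such a check could be written. The helper name parse_id is hypothetical and not necessarily what the original code looked like. It casts the input to an integer, casts that back to a string, and accepts the input only if the round trip reproduces it exactly:

```php
<?php
// Hypothetical helper (not from the original post).
// Returns the integer when $raw is a plain decimal integer string, otherwise null.
function parse_id($raw)
{
    // $_GET values can also be arrays (e.g. ?id[]=1), so reject anything but strings.
    if (!is_string($raw)) {
        return null;
    }

    // Round trip: string -> int -> string. Strict comparison avoids PHP's
    // loose string/number juggling, so "foo" or "1;DROP TABLE users" fail here.
    $id = (int) $raw;
    return ((string) $id === $raw) ? $id : null;
}

// Usage sketch: the literal stands in for $_GET["id"].
$id = parse_id("1;DROP TABLE users");
if ($id !== null) {
    echo "ok";
} else {
    echo "not a number";
}
```

This is deliberately strict: inputs such as “007” or “ 42” are rejected as well. PHP’s built-in filter_var() with FILTER_VALIDATE_INT is another option worth looking at.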
Sep 19 2005

First of all, I’m happy to have received word that my blog was added to MXNA today.

They tried to add it on Friday, but apparently their parser didn’t like my feed.
I was told it contained a “feff” tag before the XML prolog.

The feed was fine according to feedvalidator, other aggregators managed to parse it and it worked fine in my feedreader.
I double checked my PHP script that generates the feed and assured myself that there was no erroneous garbage.
Google help me!!

“FEFF” turns out to be quite a peculiar Unicode character.
It’s used to signify the byte order of Unicode documents, and it also has the meaning “ZERO WIDTH NO-BREAK SPACE”… hmmm, which basically means it means nothing.
Since I use UTF-8 for my feed, no byte order mark is needed though, and if there ever is a need for a “ZERO WIDTH NO-BREAK SPACE” I’m curious to find out what it could be.

So what was the byte order mark (BOM) doing in my feed?
Apparently it’s because I used Notepad to make a quick edit to the document and told it to save as UTF-8.
Many apps also add the BOM to UTF-8 encoded documents, not to signify a byte order, but to signify that the document is Unicode.
But a parser might not be written to look for the BOM and strip it out if it’s found, and hence chokes since it’s expecting the file to start with an XML prolog.
And the parser has no need for the BOM to find out whether the document is Unicode. It will look at the encoding you have specified in the prolog, and assume UTF-8 for a document without an encoding specified.
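
For what it’s worth, a consumer that wanted to tolerate the BOM could strip it before parsing. In UTF-8 the character U+FEFF is encoded as the three bytes EF BB BF. The following is only a sketch with a hypothetical helper name, not how MXNA or any particular parser does it:

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) if present.
function strip_utf8_bom($text)
{
    $bom = "\xEF\xBB\xBF";

    // Compare only the first three bytes of the input against the BOM.
    if (strncmp($text, $bom, 3) === 0) {
        // The cast guards against substr() returning false on older PHP
        // versions when the input consists of nothing but the BOM.
        return (string) substr($text, 3);
    }

    return $text;
}

// Usage sketch: a feed that starts with a BOM before the XML prolog.
$feed = "\xEF\xBB\xBF<?xml version=\"1.0\" encoding=\"UTF-8\"?><rss/>";
echo strip_utf8_bom($feed);
```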

I guess that in my case the reason the BOM ended up in the feed is that I use an include to get my db connection settings.
PHP sends anything that sits outside the PHP tags straight to the output, so a BOM at the very top of the included file, before its opening PHP tag, gets emitted ahead of the XML prolog.
The result is a corrupted feed which not all parsers can handle.
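
To track down which file carries the BOM, something along these lines might help. It is a rough sketch: the function name and the example file name are made up, and it simply reads the first three bytes of a file and compares them with the UTF-8 BOM:

```php
<?php
// Hypothetical helper: true if the file at $path starts with a UTF-8 BOM.
function has_utf8_bom($path)
{
    $handle = @fopen($path, "rb");
    if ($handle === false) {
        return false; // unreadable file: treat as "no BOM found"
    }

    // Read at most the first three bytes; shorter files simply won't match.
    $head = fread($handle, 3);
    fclose($handle);

    return $head === "\xEF\xBB\xBF";
}

// Usage sketch: "db_settings.php" is a made-up name for the included file.
foreach (array("db_settings.php") as $file) {
    if (has_utf8_bom($file)) {
        echo $file . " starts with a BOM\n";
    }
}
```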

In the end I used my preferred editor, SciTE, and saved the files as “UTF-8 cookie” to ensure that there was no BOM, and the problem was solved.
