My day started off fairly simple. I wanted to write a function that could take a string like “True” or “Yes” and return boolean true; likewise, “False” and “No” would return boolean false. PHP already has a function that handles the basics of this, filter_var, but it does not cover every spelling I wanted: FILTER_VALIDATE_BOOLEAN recognises “true”/“false”, “yes”/“no”, “on”/“off” and “1”/“0”, but not single-letter forms such as “t” or “y”. In keeping with PHP’s strtolower and strtoupper naming, I called this function strtobool and stuck it in my utility function namespace.
I then compared its speed to the built-in filter_var. Even though filter_var cannot do everything I want at the scope I want it, it would still be a good test of whether supporting the extra spellings was even worth it. Here are the results of that test.
Original Idea, Broken Down
Disclaimer: I know there will be almost zero production use cases for this. Just bear with me. I ran these tests on PHP 5.3.2 and 5.3.6, several times each, and the figures shown are averages.
PHP Code:
<?php
/*
Jaina:Tools bob$ php strtobool1-test.php
zen\util\strtobool: 0.189sec
filter_var: 0.070sec
*/

namespace zen\util {
    // True for "true", "t", "yes" or "y" (case-insensitive), false otherwise.
    function strtobool($input) {
        return (bool)preg_match('/^(?:true|t|yes|y)$/i', $input);
    }
}

namespace {
    $start = 0;
    $a = 0;

    // Time 100,000 calls to the namespaced function.
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        zen\util\strtobool('False');
    }
    printf("zen\\util\\strtobool: %.3fsec\n", (gettimeofday(true) - $start));

    // Time 100,000 direct calls to the native function.
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        filter_var('False', FILTER_VALIDATE_BOOLEAN);
    }
    printf("filter_var: %.3fsec\n", (gettimeofday(true) - $start));
}
?>
Needless to say, at 0.189 compared to 0.070, I was a little heartbroken. I tried several other methods instead of preg_match, and preg_match was actually the fastest of everything I could come up with. Elizabeth suggested that I use filter_var for the spellings it already knows, and then only run my own test for the others if filter_var returns null (which it only does when FILTER_NULL_ON_FAILURE is passed). This sounded like a decent enough solution, so the first thing I did was move the filter_var call into my function and test that, just out of nervous habit.
PHP Code:
<?php
/*
Jaina:Tools bob$ php strtobool2-test.php
zen\util\strtobool: 0.210sec
filter_var: 0.070sec
*/

namespace zen\util {
    // Now nothing more than a wrapper around the native function.
    function strtobool($input) {
        return filter_var($input, FILTER_VALIDATE_BOOLEAN);
    }
}

namespace {
    $start = 0;
    $a = 0;

    // Time 100,000 calls to the namespaced wrapper.
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        zen\util\strtobool('False');
    }
    printf("zen\\util\\strtobool: %.3fsec\n", (gettimeofday(true) - $start));

    // Time 100,000 direct calls to the native function.
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        filter_var('False', FILTER_VALIDATE_BOOLEAN);
    }
    printf("filter_var: %.3fsec\n", (gettimeofday(true) - $start));
}
?>
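The test above only covers the first half of Elizabeth's suggestion. For completeness, here is a rough sketch of the full two-step idea, which was not benchmarked in this post. The function name strtobool_fallback is made up for illustration. Note that filter_var only returns null for input it cannot classify when FILTER_NULL_ON_FAILURE is passed; without that flag, anything it does not recognise as true simply comes back as false.

```php
<?php
// Hypothetical name; a sketch of the two-step idea, not a benchmarked solution.
function strtobool_fallback($input) {
    // With FILTER_NULL_ON_FAILURE, unrecognised input yields null instead of false.
    $result = filter_var($input, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
    if ($result !== null) {
        return $result;
    }

    // Only input that filter_var could not classify reaches the regex,
    // which is the same pattern used in the first test.
    return (bool)preg_match('/^(?:true|t|yes|y)$/i', $input);
}

var_dump(strtobool_fallback('True'));
var_dump(strtobool_fallback('y'));
var_dump(strtobool_fallback('banana'));
```

The idea is that the common spellings are answered by native code and the regex only runs for the leftovers, so the cost of preg_match is paid as rarely as possible.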
Wait a minute. Why is my function now even slower than it was before? At this point it is literally just a wrapper around filter_var. Is that much extra time seriously being spent on the function lookup? We all already know PHP is generally thought of as “slow”; it is something we accept for the flexibility it provides. Native code in PHP modules should always run faster than PHP code itself. That is just an accepted fact of any interpreted language.
Also, I had a hunch that preg_match might be faster than the filter functions, and the numbers say so right there: the preg_match version took 0.189sec against 0.210sec for the filter_var wrapper. We were not even testing for that.
So now, just for grins, I copied and pasted my function into the global namespace and compared all three functions.
PHP Code:
<?php
/*
Jaina:Tools bob$ php strtobool3-test.php
zen\util\strtobool: 0.209sec
strtobool: 0.117sec
filter_var: 0.077sec
*/

namespace zen\util {
    // Namespaced wrapper around the native function.
    function strtobool($input) {
        return filter_var($input, FILTER_VALIDATE_BOOLEAN);
    }
}

namespace {
    // Identical wrapper, declared in the global namespace.
    function strtobool($input) {
        return filter_var($input, FILTER_VALIDATE_BOOLEAN);
    }

    $start = 0;
    $a = 0;

    // Time 100,000 calls to the namespaced wrapper.
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        zen\util\strtobool('False');
    }
    printf("zen\\util\\strtobool: %.3fsec\n", (gettimeofday(true) - $start));

    // Time 100,000 calls to the global wrapper.
    $a = 0;
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        strtobool('False');
    }
    printf("strtobool: %.3fsec\n", (gettimeofday(true) - $start));

    // Time 100,000 direct calls to the native function.
    $a = 0;
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        filter_var('False', FILTER_VALIDATE_BOOLEAN);
    }
    printf("filter_var: %.3fsec\n", (gettimeofday(true) - $start));
}
?>
Ho-lee-crap. You mean to tell me that the exact same function, just placed in a namespace, takes almost twice as long to look up and run as the same function in the global namespace? I am not a PHP Internals guy because I have naturally high blood pressure, but as a developer who uses PHP every day, I cannot believe the time differences between these three functions. The difference between the namespaced version and the global version alone is staggering.
However, the difference between filter_var and the global-scope strtobool is more like what I was expecting to see the first time! This means the problem has to be in namespaces themselves, and probably in the namespace/function lookup/resolution area.
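One detail of name resolution bears on that guess. Inside a namespace, an unqualified call such as filter_var(...) is resolved at run time: PHP looks for zen\util\filter_var first and only then falls back to the global function, and unqualified constants work the same way. A leading backslash names the global symbol directly and skips the fallback. The sketch below adds a variant with the made-up name strtobool_qualified and times both spellings side by side. Whether the qualified version closes any of the gap is something to measure on your own build; the tests in this post did not establish it.

```php
<?php
namespace zen\util {
    // Unqualified names: resolved at run time, namespace first, then global.
    function strtobool($input) {
        return filter_var($input, FILTER_VALIDATE_BOOLEAN);
    }

    // Hypothetical variant: the leading backslash points straight at the
    // global function and constant, so no fallback lookup is needed.
    function strtobool_qualified($input) {
        return \filter_var($input, \FILTER_VALIDATE_BOOLEAN);
    }
}

namespace {
    // Time 100,000 calls to the wrapper that uses unqualified names.
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        zen\util\strtobool('False');
    }
    printf("unqualified: %.3fsec\n", (gettimeofday(true) - $start));

    // Time 100,000 calls to the wrapper that uses fully qualified names.
    $start = gettimeofday(true);
    for ($a = 0; $a < 100000; $a++) {
        zen\util\strtobool_qualified('False');
    }
    printf("qualified:   %.3fsec\n", (gettimeofday(true) - $start));
}
```

Both wrappers return the same results, so any difference in the two timings comes down to how the names inside the function body are resolved.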
Getting Biblical
Based on this I am going to go ahead and call it: functions in namespaces need to be re-evaluated and streamlined ASAP. Honestly, I am not interested in why my wrapper version is so slow, nor why the namespaced version is slower still. From the standpoint of an end user I could not care any less about why. What I do care about is the end result, and the end result is that it IS slow. I am thinking some of the features of namespace name resolution probably just NEED to go, forcing people to be less dumb when they write code.
“But Bob,” you might contest, “that is stupid, you would never actually do that in production code.” You might be right. In most practical situations, you would never call the same function 100,000 times in a loop. If I call both functions just once, the time spent is too small to measure because it happens so fast.
However, streamlining begins in places like this. Time saved here can have a huge impact on a full web server under load: when your hardware is near its breaking point, problems feel like they stack exponentially instead of linearly as you might expect.
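To put the loop totals in per-call terms, here is a small bit of arithmetic on the figures from the third test. The helper name per_call_microseconds is made up for this example; the timings are copied from the results above.

```php
<?php
// Hypothetical helper: convert a total loop time into microseconds per call.
function per_call_microseconds($total_seconds, $iterations) {
    return $total_seconds / $iterations * 1000000;
}

$iterations = 100000;

// Totals copied from the third test's output.
$timings = array(
    'zen\util\strtobool' => 0.209,
    'strtobool'          => 0.117,
    'filter_var'         => 0.077,
);

foreach ($timings as $name => $seconds) {
    printf("%-20s %.2f microseconds per call\n", $name, per_call_microseconds($seconds, $iterations));
}
```

That works out to roughly 2.09, 1.17 and 0.77 microseconds per call, so the namespaced wrapper costs about 0.9 microseconds more per call than the global one. As a purely hypothetical illustration, a request that makes 1,000 such calls would spend a little under a millisecond extra.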
Getting Philosophical
If this is as good as namespaces will ever perform, then I may be forced to re-evaluate whether using namespaces is even worth it. If I am contracted to write a performance-driven PHP 5.3 application, can I in good conscience use namespaced code in its current state? Should I continue developing my own namespaced framework? I have spent so much time telling people how fast my framework is, and here it seems I can literally cut the time in half just by refactoring it to get rid of the namespaces, since nearly every call is to a namespaced function. Am I just being overly picky? I do not think so.
In the end, in an actual production use case, the time differences are hard to measure under ideal conditions. It is just something to think about. Something important to think about.
tl;dr
- native functions are fast
- global functions are decent
- namespace functions are slow
