If you use arrays in PHP, one of the most common tasks you’ll find yourself doing is determining whether Item A is in Array X. The function you would probably reach for is PHP’s in_array():
[php]bool in_array ( mixed $needle , array $haystack [, bool $strict = FALSE ] )[/php]
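The optional $strict parameter changes how values are compared: the default (FALSE) uses loose `==` comparison, while TRUE uses `===`. A minimal sketch, with made-up array contents, showing where the two differ for numeric strings:

```php
<?php
// Made-up haystack of numeric strings, for illustration only.
$haystack = array("10", "20", "30");

// Loose comparison (default): numeric strings are compared as numbers,
// so "1e1" (10 in exponent notation) matches "10".
var_dump(in_array("1e1", $haystack));        // bool(true)

// Strict comparison: === requires identical strings.
var_dump(in_array("1e1", $haystack, true));  // bool(false)
var_dump(in_array("10", $haystack, true));   // bool(true)
```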
This function works well, and I recommend sticking with it when it makes sense. However, when the haystack is very large and you need to run in_array() for thousands of needles, the time adds up quickly, because each call compares the needle against the array’s values one by one. Having recently run into this situation, I set up a small experiment comparing in_array() with two alternative approaches.
The haystack in my experiment was an array of 60,000 values, each a 50-character string:
[php]$haystack = array("String1", "String2", "String3" /* , etc… */);[/php]
The needle was a string of 50 characters.
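The post doesn’t show how its test data was generated, so here is a hypothetical way to build a comparable haystack: 60,000 random 50-character strings. The helper name make_random_string() is made up for this sketch, and using the last element as the needle is an assumption (it forces in_array() to scan the whole array before finding a match); the original doesn’t say where its needle sat.

```php
<?php
// Hypothetical helper: bin2hex() turns 25 random bytes into
// 50 hexadecimal characters.
function make_random_string(): string
{
    return bin2hex(random_bytes(25));
}

$haystack = array();
for ($i = 0; $i < 60000; $i++) {
    $haystack[] = make_random_string();
}

// Assumed needle: the last element, so it is guaranteed to be present.
$needle = $haystack[count($haystack) - 1];

echo count($haystack), " strings, each ", strlen($needle), " characters long\n";
```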
Method A – Using in_array()
[php]if (in_array($needle, $haystack))
    echo("Method A : needle " . $needle . " found in haystack");[/php]
Method B – Using isset()
I restructured the haystack so that the values of the original array became keys, each mapped to the value 1.
[php]// Turn each value of the original haystack into a key mapped to 1
$new_haystack = array();
foreach ($haystack as $v)
    $new_haystack[$v] = 1;[/php]
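PHP also has built-ins that do this rearrangement in one call. As a sketch, array_fill_keys() produces exactly the same array as the foreach loop above, and array_flip() gives an array that works just as well for isset() lookups (its values are the original positions rather than 1). Note that both, like the loop, turn numeric-string values such as "123" into integer keys, and duplicate values collapse into a single key; neither affects a membership test.

```php
<?php
$haystack = array("String1", "String2", "String3");

// The foreach approach from the post.
$new_haystack = array();
foreach ($haystack as $v) {
    $new_haystack[$v] = 1;
}

// Built-in equivalent: every value of $haystack becomes a key mapped to 1.
$filled = array_fill_keys($haystack, 1);
var_dump($filled === $new_haystack); // bool(true)

// array_flip() maps each value to its original position (0, 1, 2).
// isset() still works because 0 is not null.
$flipped = array_flip($haystack);
var_dump(isset($flipped["String1"])); // bool(true)
```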
So my haystack became:
[php]$new_haystack["String1"] = 1;
$new_haystack["String2"] = 1;
$new_haystack["String3"] = 1;[/php]
Then all you need to do is check whether the needle exists as a key:
[php]if (isset($new_haystack[$needle]))
    echo("Method B : needle " . $needle . " found in haystack");[/php]
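Setting every value to 1 (rather than, say, null) is what makes isset() a correct membership test: isset() returns false for a key whose value is null. A small sketch contrasting it with array_key_exists(), which checks only whether the key is present:

```php
<?php
$new_haystack = array_fill_keys(array("String1", "String2", "String3"), 1);

// Every value is 1, so isset() answers "is this key present?".
var_dump(isset($new_haystack["String2"])); // bool(true)
var_dump(isset($new_haystack["String9"])); // bool(false)

// With a null value, isset() and array_key_exists() disagree.
$with_null = array("String1" => null);
var_dump(isset($with_null["String1"]));            // bool(false)
var_dump(array_key_exists("String1", $with_null)); // bool(true)
```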
Method C – Using array_intersect()
When all you need to know is whether the needle is in the haystack, array_intersect() can also work:
[php]if (count(array_intersect(array($needle), $haystack)) > 0)
    echo("Method C : needle " . $needle . " found in haystack");[/php]
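array_intersect() keeps the entries of its first argument that also appear in the second (comparing their string representations) and preserves the first array’s keys. That means it can also check several needles in a single call, as in this sketch. This batch variant was not part of the timings below, so no claim is made about its speed.

```php
<?php
$haystack = array("String1", "String2", "String3");
$needles  = array("String2", "String9", "String3");

// Keeps the needles found in $haystack, with their original keys (0 and 2).
$found = array_intersect($needles, $haystack);

echo implode(", ", $found), "\n"; // String2, String3
```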
With these methods in place, I ran each of them against the same $haystack and $needle, and the results were clear:
[php]Method A : 0.003180980682373 seconds
Method B : 0.0000109672546 seconds
Method C : 0.045687913894653 seconds[/php]
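The post doesn’t say how these timings were taken. If you want to reproduce the comparison, here is a hypothetical harness (the time_it() helper and the repeat count are assumptions, and it needs PHP 7.4+ for arrow functions and hrtime()). Expect the absolute numbers to differ with your machine and PHP version; what should carry over is the ordering.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $fn.
// hrtime(true) returns a monotonic timestamp in nanoseconds.
function time_it(callable $fn, int $repeats): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $repeats; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9 / $repeats;
}

// Test data: 60,000 random 50-character strings; the needle is the last one.
$haystack = array();
for ($i = 0; $i < 60000; $i++) {
    $haystack[] = bin2hex(random_bytes(25));
}
$needle = $haystack[59999];
$new_haystack = array_fill_keys($haystack, 1);

$a = time_it(fn() => in_array($needle, $haystack), 100);
$b = time_it(fn() => isset($new_haystack[$needle]), 100);
$c = time_it(fn() => count(array_intersect(array($needle), $haystack)) > 0, 100);

printf("Method A : %.9f seconds\n", $a);
printf("Method B : %.9f seconds\n", $b);
printf("Method C : %.9f seconds\n", $c);
```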
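Method B wins, by more than two orders of magnitude over in_array()! That is expected: isset() is a hash-table key lookup whose cost stays roughly constant as the array grows, while in_array() compares the needle against the values one by one. Keep in mind that this only really becomes interesting with very large data sets. For those of you wondering how long it took to rearrange the haystack for Method B, the answer is 0.025528907775879 seconds, a one-time cost roughly equal to eight in_array() calls, so it pays for itself after a handful of lookups.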
In this experiment, checking whether 100,000 strings are in the data set went from 318.098 seconds with in_array() to 1.1222 seconds using isset(), that is, 100,000 lookups at 0.0000109672546 seconds each plus the one-time 0.0255-second rearrangement. That’s pretty decent.
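To get that saving across many needles, the rearrangement should happen once, outside the lookup loop. A minimal sketch of that pattern; count_found() is a hypothetical helper name, and it assumes the needles are strings or integers (the only types that can be array keys without conversion).

```php
<?php
// Hypothetical helper: how many of $needles occur in $haystack.
// The haystack is rearranged into a key lookup table only once.
function count_found(array $needles, array $haystack): int
{
    $lookup = array_fill_keys($haystack, 1);
    $found = 0;
    foreach ($needles as $needle) {
        if (isset($lookup[$needle])) {
            $found++;
        }
    }
    return $found;
}

echo count_found(array("b", "x", "c", "b"), array("a", "b", "c")), "\n"; // 3
```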