[PHP] Faster array lookup than using in_array()

If you use arrays in PHP, one of the most common tasks you’ll find yourself doing is determining if Item A is in Array X. The function you would probably use in this case is PHP’s in_array.

[php]bool in_array ( mixed $needle , array $haystack [, bool $strict = FALSE ] )[/php]
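
The optional third argument matters more than it looks: by default in_array() compares loosely (==), so a string needle can match an integer in the haystack. Here is a minimal sketch of the difference, using illustrative values that are not part of the experiment:

[php]$haystack = array(10, 20, 30);

// Loose comparison (default): the string "10" equals the integer 10.
var_dump(in_array("10", $haystack));        // bool(true)

// Strict comparison: types must match too, so the string is not found.
var_dump(in_array("10", $haystack, true));  // bool(false)[/php]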

This function works great and I recommend sticking to it when it makes sense. However, when you’re dealing with a very large haystack and need to run in_array() on thousands of values, the cost of each call adds up quickly. Having recently run into this situation, I set up a little experiment comparing in_array() with two alternative approaches.

The haystack in my experiment was an array of 60,000 strings, each 50 characters long.

[php]$haystack = array("String1", "String2", "String3", /* etc. */);[/php]

The needle was a string of 50 characters.

Method A – Using in_array()

[php]if (in_array($needle, $haystack))
    echo("Method A : needle " . $needle . " found in haystack");[/php]

Method B – Using isset()
Basically, I reformatted the haystack so that the values of my original array became keys instead and the new value for each key was set to 1.

[php]$new_haystack = array();
foreach ($haystack as $v) {
    $new_haystack[$v] = 1; // each original value becomes a key
}[/php]
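
For what it is worth, PHP's built-in array_fill_keys() and array_flip() should produce an equivalent lookup table without the explicit loop. This is a sketch with made-up sample values, not the code used in the experiment:

[php]$haystack = array("String1", "String2", "String3");

// Every value becomes a key, and each key maps to 1.
$filled = array_fill_keys($haystack, 1);

// Values become keys; each key maps to the value's old index (0, 1, 2).
$flipped = array_flip($haystack);

// isset() only needs the key to exist with a non-null value,
// so both tables answer membership questions the same way.
var_dump(isset($filled["String2"]));   // bool(true)
var_dump(isset($flipped["String2"]));  // bool(true)
var_dump(isset($flipped["String9"]));  // bool(false)[/php]

Array keys can only be integers or strings, so this assumes the haystack holds such values. Duplicate values collapse into a single key.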

So my haystack became :

[php]$new_haystack["String1"] = 1;
$new_haystack["String2"] = 1;
$new_haystack["String3"] = 1;[/php]

Then, all you need to do is look up the key:

[php]if (isset($new_haystack[$needle]))
    echo("Method B : needle " . $needle . " found in haystack");[/php]

Method C – Using array_intersect()
When all you really need to know is if needle is in haystack, using array_intersect() can also work.

[php]if (count(array_intersect(array($needle), $haystack)) > 0)
    echo("Method C : needle " . $needle . " found in haystack");[/php]

With these different methods in place, I executed them against the same $haystack and $needle and the results were clear :

[php]Method A : 0.003180980682373 seconds
Method B : 0.0000109672546 seconds
Method C : 0.045687913894653 seconds[/php]

Method B wins! Keep in mind that this only really becomes interesting with very large data sets. For those of you wondering how long it took to re-arrange the haystack for Method B to use, the answer is 0.025528907775879 seconds.
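
For anyone who wants to reproduce this kind of measurement, a rough sketch of a timing harness built on microtime(true) might look like the following. The make_string() helper, the haystack contents, and the needle choice are assumptions for illustration, not the original test code. Absolute numbers will vary by machine and PHP version.

[php]// Hypothetical helper: builds a 50-character string that is unique per $i.
function make_string($i)
{
    return str_pad((string) $i, 50, "x", STR_PAD_LEFT);
}

// 60,000 strings of 50 characters each, as in the experiment.
$haystack = array();
for ($i = 0; $i < 60000; $i++) {
    $haystack[] = make_string($i);
}
$needle = make_string(59999); // last element: worst case for a linear scan

// Method A: linear search.
$start  = microtime(true);
$foundA = in_array($needle, $haystack);
$timeA  = microtime(true) - $start;

// One-time cost for Method B: turn values into keys.
$start = microtime(true);
$new_haystack = array();
foreach ($haystack as $v) {
    $new_haystack[$v] = 1;
}
$timeRekey = microtime(true) - $start;

// Method B: hash lookup on the re-keyed array.
$start  = microtime(true);
$foundB = isset($new_haystack[$needle]);
$timeB  = microtime(true) - $start;

// Method C: intersect a one-element array with the haystack.
$start  = microtime(true);
$foundC = count(array_intersect(array($needle), $haystack)) > 0;
$timeC  = microtime(true) - $start;

printf("Method A : %.15f seconds\n", $timeA);
printf("Method B : %.15f seconds (re-keying took %.15f)\n", $timeB, $timeRekey);
printf("Method C : %.15f seconds\n", $timeC);[/php]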

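In this experiment, determining whether 100,000 strings are in the data set went from 318.098 seconds with in_array() to 1.1222 seconds using isset(). Both figures work out to the single-lookup times above multiplied by 100,000, with the one-time 0.0255 seconds of re-arranging added to the isset() total. That’s pretty decent.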
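
To make the many-needles case concrete, here is a small sketch of the pattern: build the lookup table once, then test each needle with isset(). The sample values and the names $needles, $lookup, $found and $missing are chosen purely for illustration.

[php]$haystack = array("alpha", "bravo", "charlie", "delta");
$needles  = array("bravo", "echo", "delta", "foxtrot");

// Pay the re-keying cost once...
$lookup = array_flip($haystack);

// ...then every membership test is a single hash lookup.
$found   = array();
$missing = array();
foreach ($needles as $needle) {
    if (isset($lookup[$needle])) {
        $found[] = $needle;
    } else {
        $missing[] = $needle;
    }
}

print_r($found);   // holds "bravo" and "delta"
print_r($missing); // holds "echo" and "foxtrot"[/php]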

32 thoughts on “[PHP] Faster array lookup than using in_array()”

  1. This is lightning fast, love it!

    But I think you mixed up your variable names in Method B, here’s what I did:
    $arrA = array(); /* fill $arrA with plain values, no explicit keys */

    $arrB = array();
    foreach ( $arrA as $key ) $arrB[$key] = 1;

    foreach ( $arrA as $key ) {
        if ( !isset($arrB[$key]) ) { echo "$key not in \$arrB!"; }
    }

  2. How did you measure the times?

    I ran a similar test, and in_array() was much faster than the other approach!
    $a = [];
    $mc_default = [];
    $mc_my = [];

    $d = 'nonexisted key';

    $testcnt = 20;

    // Flips the whole haystack on every call, then does a key lookup.
    function myin_array($val, array $arr)
    {
        $newarr = array_flip($arr);
        return isset($newarr[$val]);
    }

    // Builds a random alphanumeric string of $len characters.
    function randomString($len = 10)
    {
        $key_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
        $num_chars = strlen($key_chars);
        $key = '';
        for ($i = 0; $i < $len; $i++) {
            $key .= substr($key_chars, rand(1, $num_chars) - 1, 1);
        }

        return $key;
    }

    // Haystack: 100,000 random strings of 50 characters.
    for ($i = 0; $i < 100000; $i++) {
        $len = 50;
        $a[] = randomString($len);
    }

    for ($i = 0; $i < $testcnt; $i++) {
        $start = microtime(true);
        $c = myin_array($d, $a);
        $end = microtime(true);
        $mc_my[] = ($end - $start);
    }

    for ($i = 0; $i < $testcnt; $i++) {
        $start = microtime(true);
        $c = in_array($d, $a);
        $end = microtime(true);
        $mc_default[] = ($end - $start);
    }

    $def_avg = array_sum($mc_default) / count($mc_default);
    $my_avg = array_sum($mc_my) / count($mc_my);

    echo 'Defaults (in_array)=' . implode(', ', $mc_default) . ' Avg: ' . $def_avg;
    echo 'My func=' . implode(', ', $mc_my) . ' Avg: ' . $my_avg;


    And my results look like this:
    Defaults (in_array)=0.003767, 0.00376, 0.003816, 0.003758, 0.0038010000000001, 0.003768, 0.004038, 0.003969, 0.0038659999999999, 0.003829, 0.00381, 0.003772, 0.00376, 0.003746, 0.003783, 0.00381, 0.003758, 0.0037470000000001, 0.003772, 0.004051
    Avg: 0.00381905

    My func=0.02696418762207, 0.023251056671143, 0.024040937423706, 0.023049116134644, 0.023766040802002, 0.023875951766968, 0.025014162063599, 0.02332615852356, 0.023674011230469, 0.023051977157593, 0.023787021636963, 0.023398876190186, 0.023919105529785, 0.023699045181274, 0.023690938949585, 0.023056983947754, 0.023715019226074, 0.024850130081177, 0.023258924484253, 0.023529052734375
    Avg: 0.023845934867859

    php -v
    PHP 5.5.9-1ubuntu4.4 (cli) (built: Sep 4 2014 06:57:30)
    Linux Mint 32-bit 4Gb RAM
    php as mod_apache without any optimizers and cachers

    1. Hi Donna,

      The times you recorded for the default PHP “in_array” function seem on par with what I had found. However, your custom method is very slow which leads me to believe that perhaps array_flip is doing something funky behind-the-scenes. In my test, I remember I explicitly set each key to 1 – no array_flip.

      I’ll try this again as soon as I get some free time.

      By the way, I’m pretty sure I was using PHP 5.3.10 during my experiment.

    2. Yeah, the reason your myin_array() function is slower is because you’re calling array_flip() every time inside the function.

      So basically you flip the same array 20 times in this case. If you pull the array_flip() call out of the function and run it just once, the results should be a lot faster.
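
      A sketch of that change, reusing the variable names from the test above with a tiny stand-in haystack so it runs on its own ($flipped is a name introduced here for illustration):

      [php]$a = array("one", "two", "three"); // stand-in for the 100,000 random strings
      $d = 'nonexisted key';
      $testcnt = 20;

      // Flip once, outside the timed loop.
      $flipped = array_flip($a);

      $mc_my = array();
      for ($i = 0; $i < $testcnt; $i++) {
          $start = microtime(true);
          $c = isset($flipped[$d]); // only the lookup is timed now
          $end = microtime(true);
          $mc_my[] = ($end - $start);
      }

      echo 'Flip once=' . implode(', ', $mc_my) . ' Avg: ' . (array_sum($mc_my) / count($mc_my));[/php]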

  3. Thanks, nice.

    Small correction though, “Basically, I reformatted the haystack so that the keys became values and I set the value to 1.”
    Actually you’re turning values to keys!

  4. Awesomepants. Your tip just made my script run about 50x faster. (I was regularly having to compare a list of 300,000 email addresses against a list of up to 300,000 email addresses that had already been contacted…)

  5. I regularly have to determine whether up to 200,000 “needles” are present in a “haystack” of 2 million strings. This tip (combined with the one from Sven) made my script gain 50% (2 times faster). Thanks!

  6. I’m having to work with 90k+ rows of data, and in_array was so slow the script wouldn’t execute. I used Method B and it saved the day. Thanks!

  7. Well, if you include the rearranging time in Method B, the total time becomes more than Method A. Also, in the last figure you don’t seem to have counted it for the 100,000 records.

    1. Method B, in theory, after the initial key/value flip, should be O(1), since it’s a hashmap. Method A is a sequential search I’d assume, so it’s O(n). I’m not sure about Method C, but I think it’s O(2n).
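
      Using the timings from the post, the break-even point can be estimated with a little arithmetic. This is only a back-of-the-envelope sketch, and it assumes every lookup costs the same as the single measured one:

      [php]$perLookupA = 0.003180980682373; // in_array(), from the post
      $perLookupB = 0.0000109672546;   // isset(), from the post
      $rekeyCost  = 0.025528907775879; // one-time re-keying, from the post

      // Method B is ahead once n * A > rekey + n * B, i.e. n > rekey / (A - B).
      $ratio = $rekeyCost / ($perLookupA - $perLookupB);
      $breakEven = (int) floor($ratio) + 1;

      echo "Method B pays off from lookup number " . $breakEven . " onwards"; // 9[/php]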

  8. Thanks so much, I think I’ll save tons of time now. I went from 11.5 min on 27k rows down to around 80 sec for the same script. The script does lots of DB lookups and other kinds of stuff, so thanks so much for this tip! 🙂

  9. Hi, nice approach! I would appreciate if anyone could help with the following array search:
    I have a key=>value pair array coming from MySQL db. It looks like this:

    Array1 ( [0] => Array ( [id] => 1 [name] => String A ) [1] => Array ( [id] => 2 [name] => String B) [2] => Array ( [id] => 3 [name] => String C) [3] => Array ( [id] => 4 [name] => String D) [4] => Array ( [id] => 5 [name] => String E) ……. )

    I use it to populate a Select combo. I have another array that also comes from DB and it looks like this:

    Array2 ( [0] => 3 [1] => 4 [2] => 5 ) = these are the options the user selected before.

    Now the user may view his profile, so I want to show his options as selected. For every element of Array1, I want to check whether it exists in Array2. I’m trying the following:

    foreach($Array1 as $value) {
        $find = $value['id'];
        if (array_search($find, $Array2)) {
            echo "Great, your element was found";
        }
    }

    The problem is it finds only elements 4 and 5. Element 3 is not found! Any ideas?
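
    A likely explanation: array_search() returns the key of the match, and id 3 sits at key 0 of Array2. Since 0 is falsy, the if treats that hit as a miss. Comparing the result against false with !== is the usual fix. A sketch using the sample data from the question ($selected is a name introduced here for illustration):

    [php]$Array1 = array(
        array('id' => 1, 'name' => 'String A'),
        array('id' => 2, 'name' => 'String B'),
        array('id' => 3, 'name' => 'String C'),
        array('id' => 4, 'name' => 'String D'),
        array('id' => 5, 'name' => 'String E'),
    );
    $Array2 = array(3, 4, 5);

    $selected = array();
    foreach ($Array1 as $value) {
        // array_search() returns the key (possibly 0), or false when nothing matches.
        if (array_search($value['id'], $Array2) !== false) {
            $selected[] = $value['id'];
        }
    }

    print_r($selected); // holds the ids 3, 4 and 5[/php]

    Since only a yes/no answer is needed here, in_array($value['id'], $Array2) or the isset() approach from the post should work as well.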

  10. Great tip, but only if you have to search the same haystack several times, otherwise method A is faster since rearranging the haystack takes up quite some time.

  11. Awesome, it’s like night and day with the several thousand items I need to compare to a 45,000-record array. It was timing out in 500 seconds, but now runs in under 2 seconds.

    As others have stated, this method is only necessary for huge arrays being called repeatedly.
