Programming the Number Plate Game

by Hamilton Chapman


Last Summer I went to T in the Park, a music festival in Kinross, Scotland. My trip up to Scotland was split into two parts due to me having my graduation on the first day of the festival. However, on the way back home I was able to get a lift with some friends in a car. Travelling from Scotland down to Guildford takes a good long while, about 10 hours in our case, and as you can imagine things can start to get a bit boring.

One of the games we played while driving down was new to me. It was cleverly referred to as "The Numberplate Game". It involves looking out of the front of the car and reading the last three letters on a car's numberplate, for example you'd take DCD from HM09 DCD. With these letters you're then tasked with thinking of a word that contains all of the letters in that order. The letters don't necessarily have to be adjacent nor does the first letter of the word have to be the first letter of the three from the numberplate.

Continuing on with HM09 DCD a valid answer would be DECIDE. Equally you could go with UNDECIDED. Everyone in the car enjoyed the game and there was some healthy competition between us all.

When I got home I asked the rest of my family if any of them had played the game before and none of them had. At the time I was teaching myself to do some programming using Objective-C and so I thought it would be a fun little practice app to make.

At the core of the game is the ability to determine whether or not a given answer is acceptable given the three letters from the numberplate. It's simple enough to code this up. My first attempt was something roughly like this:

bool isCharInString(NSString *charToCheck, NSString *stringToCheckForChar)
{
  return [stringToCheckForChar rangeOfString:charToCheck options:NSCaseInsensitiveSearch].location != NSNotFound;
}

bool guessContainsValidTriple(NSString *randomTriple, NSRange firstLetter, NSRange secondLetter, NSRange thirdLetter, NSString *guess)
{
 bool validGuess = false;
 NSString *firstLetterOfTriple = [randomTriple substringWithRange:firstLetter];
 NSString *secondLetterOfTriple = [randomTriple substringWithRange:secondLetter];
 NSString *thirdLetterOfTriple = [randomTriple substringWithRange:thirdLetter];

 if(isCharInString(guess, firstLetterOfTriple))
 {
   NSInteger firstLetterLocation = [guess rangeOfString:[randomTriple substringWithRange:firstLetter] options:NSCaseInsensitiveSearch].location;

   if(firstLetterLocation == [guess length] - 1)
   {
     return validGuess;
   }

   NSRange rangeForSecondLetter = {firstLetterLocation + 1, [guess length] - (firstLetterLocation + 1)};       
   NSString *stringToCheckForSecondLetter = [guess substringWithRange:rangeForSecondLetter];

   if(isCharInString(stringToCheckForSecondLetter, secondLetterOfTriple))
   {
     NSInteger secondLetterLocation = [stringToCheckForSecondLetter rangeOfString:[randomTriple substringWithRange:secondLetter] options:NSCaseInsensitiveSearch].location + firstLetterLocation + 1;
     if(secondLetterLocation == [guess length] - 1)
     {
       return validGuess;
     }

     NSRange rangeForThirdLetter = {secondLetterLocation + 1, [guess length] - (secondLetterLocation + 1)};
     NSString *stringToCheckForThirdLetter = [guess substringWithRange:rangeForThirdLetter];

     if(isCharInString(stringToCheckForThirdLetter, thirdLetterOfTriple))
     {
       validGuess = true;
     }
   }
 }
 return validGuess;
}

Not pretty. And not very fast. Not at all. Speed isn't a problem when you're just checking if one word contains a given set of three letters. However, I realised that if I wanted to make the game usable and fun to play then I'd want to have an idea of how many occurrences there are of each set of three letters in the English dictionary, if any. You don't want to ask a player to try and come up with a word that contains three letters that are never actually found in the given order in a word in the dictionary. I didn't even know if such a set of letters existed. Moreover, knowing the number of occurrences would allow me to implement some concept of difficulty.

At the same time as all of this I decided that it would be fun to be able to do it with pairs of letters as well as sets of 4 letters. Even with the pairs of letters I still wanted to know how frequently they were found in the dictionary. It's easy to guess whether words containing ES are more common than words containing ZJ, but by what magnitude?

I set up the code to loop through all of the possible combinations of sets of 2, 3, and 4 letters and had my code then cycle through all of the words in the dictionary for each set of letters and check whether the word contained the letters. I started running the code and waited. I could see my CPU at 100%. Had it hung? Was there an infinite loop? Maybe it was just taking a while? I let it run a bit longer until about half an hour had passed. CPU still at 100%.

I decided to stop it and then put in some logging to see where it had gotten up to as it ploughed through the work.

It was slow. So slow. Out of curiosity I let it run to completion and could see from the logs that it had taken just shy of 30 hours to complete. Not ideal. Although given the numbers involved it didn't seem too ridiculous. 26 letters in the alphabet, about 250,000 words in the dictionary I had. That made about 26 * 26 * 250,000 = 169,000,000 passes just for the pairs of letters. In total it meant about 118,807,000,000 runs through the loops. I needed to speed it up.

I tried bolting on some changes to the previous attempt and then chopped bits up and moved them around but it felt like it was getting too cumbersome. I decided to just rewrite it with the same basic ideas powering it, but with some tweaks here and there. Oh, and it was generalised so that it would work with sets that contained any number of letters. This is what it ended up looking like.

bool isCharInString(NSString *charToCheck, NSString *stringToCheckForChar)
{
  return [stringToCheckForChar rangeOfString:charToCheck options:NSCaseInsensitiveSearch].location != NSNotFound;
}

bool guessContainsValidSetOfLetters(NSString *guess, NSString *setOfLetters)
{
  bool validGuess = false;

  NSInteger numOfLettersInSet = setOfLetters.length - (setOfLetters.length / 2);
  NSInteger numOfCurrentLetter = 1;
  NSString *stringToCheckForChar = guess;

  while(!validGuess && numOfCurrentLetter <= data-preserve-html-node="true" numOfLettersInSet)
  {
    NSRange letterToCheck = {2 * numOfCurrentLetter - 2, 1};
    NSString *charToCheck = [setOfLetters substringWithRange:letterToCheck];
    NSInteger letterLocationInWord = [guess rangeOfString:[setOfLetters substringWithRange:letterToCheck] options:NSCaseInsensitiveSearch].location;

    if(isCharInString(charToCheck, stringToCheckForChar))
    {
      if((letterLocationInWord == [guess length] - 1) && numOfCurrentLetter < numOfLettersInSet)
      {
        return validGuess;
      }
      NSRange rangeToCheckNext = {letterLocationInWord + 1, [guess length] - (letterLocationInWord+ 1)};
      stringToCheckForChar = [guess substringWithRange:rangeToCheckNext];

      if(numOfCurrentLetter == numOfLettersInSet)
      {
        validGuess = true;
      }
    }
    else
    {
      return validGuess;
    }
    numOfCurrentLetter++;
  }
  return validGuess;
}

A definite improvement. The generalisation was great, it was easier to read (and debug), but it wasn't really much quicker. There was a small bump, but nothing to make me feel satisfied.

Maybe regular expressions would help me speed things up? It seemed like I'd be able to cut out some code, which is always useful when you're looking for a speed boost.

I'll say it now; I'm by no means a master of regular expressions. This may well be a horrible way of using them but it made sense to me at the time.

NSRegularExpression *createRegexFromLetters(NSString *setOfLetters)
{
  NSString *regexString = @".*";
  for (NSString *letter in letters(setOfLetters))
  {
    regexString = [regexString stringByAppendingString:letter];
    regexString = [regexString stringByAppendingString:@".*"];
  }
  return [NSRegularExpression regularExpressionWithPattern:(@"%@", regexString)
   options:NSRegularExpressionCaseInsensitive
   error:nil];
}

bool wordContainsLetters(NSString *word, NSString *setOfLetters, NSRegularExpression *regex)
{
  return [regex numberOfMatchesInString:word options:0 range:NSMakeRange(0, [word length])] > 0;
}

Pretty compact. Pretty easy to follow. Pretty slow.

In fact SLOWER

Back to the drawing board.

How about using a recursive method instead of nesting for loops or using a while loop? I set about writing this method, again being conscious of performance, and ended up with the following.

NSInteger numberOfLettersInSet(NSString *setOfLetters)
{
  return [letters(setOfLetters) count];
}

NSArray *letters(NSString *setOfLetters)
{
  return [setOfLetters componentsSeparatedByString:@" "];
}

bool lettersAreInWord (NSString *word, NSString *setOfLetters)
{
  if(setOfLetters.length == 1)
  {
    return [word rangeOfString:setOfLetters options:NSCaseInsensitiveSearch].location != NSNotFound;
  }

  if([word rangeOfString:[setOfLetters substringToIndex:1] options:NSCaseInsensitiveSearch].location != NSNotFound)
  {
    NSString *newWord = [word substringFromIndex:[word rangeOfString:[setOfLetters substringToIndex:1] options:NSCaseInsensitiveSearch].location + 1];
    word = [word substringFromIndex:[word rangeOfString:[setOfLetters substringToIndex:1] options:NSCaseInsensitiveSearch].location];
    return lettersAreInWord(newWord, [setOfLetters substringFromIndex:2]);
  }
  else
  {
    return false;
  }
}

Nicely separated logic in this attempt and again it's fairly straightforward to follow what's going on. How did it fare in the speed test though?

Not too badly. It was quicker than the previous non-regex attempt and also a good deal quicker than the regex attempt. I was nearly happy to call it a day because it was just about quick enough for what I needed it to do such that it wasn't an annoyance, nor a real bottleneck.

I told my Dad about it though and saw that look on his face he gets when he senses there's a challenge afoot. He got me to explain exactly what I was trying to do and then he set out on writing some code to achieve the desired outcome. He spends most of his time programming in C#, C++ or straight C. He decided to write his attempt in C. Here's what he came up with.

unichar* FindLetterInSequence(unichar letterToFind, unichar *stringToSearch)
{
  unichar* foundLocation = NULL;
  while (*stringToSearch)
  {
    if (*stringToSearch == letterToFind)
    {
      foundLocation = stringToSearch;
      break;
    }
    else
    {
      ++stringToSearch;
    }
  }
  return foundLocation;
}

bool FindLettersInSequence(unichar *lettersToFind, unichar *stringToSearch)
{
  bool found = false;

  if (lettersToFind)
  {
    unichar *startPositionInStringToSearch = stringToSearch;
    found = true;
    while (*lettersToFind)
    {
      startPositionInStringToSearch = FindLetterInSequence(*lettersToFind, startPositionInStringToSearch);
      if (startPositionInStringToSearch)
      {
        ++startPositionInStringToSearch;
        ++lettersToFind;
      }
      else
      {
        found = false;
        break;
      }
    }
  }
  return found;
}

Easy to read, easy to follow, and it also happens to be really quick. At least compared to what I'd come up with using my variety of Objective-C based attempts.

I did a crude version of a benchmark for the best versions of each of the different types of solution; the fastest regex, recursive, and vanilla C solutions. I made each one loop through an array of all of the pairs of letters and count up the number of occurrences of each pair of letters in the dictionary. The results show how large the potential speed boosts can be.

  • Regular Expression: 8 minutes 54 seconds
  • Recursive: 2 minutes 7 seconds
  • Vanilla C: 13 seconds

So there you have my best solutions (and one of my Dad's!) to what is a pretty simple problem. If you can come up with a more efficient solution then please let me know!