Optimal coverage for Wordle with Monte Carlo methods - Part III

Marton Trencseni - Sat 22 January 2022 - Machine Learning

Introduction

In the previous article, we saw that a relatively simple Monte Carlo method can find a lot of 24-letter-unique wordlists, about 1 per second running at 16x parallelism. Here I use a crucial insights from the english language to make the search more efficient, and find a 25-letter-unique Worlde wordlist. The ipython notebook is up on Github.

Vowels

The key insight came from my friend and collegaue Rameez Kakodker, who introduced me to Wordle. We were discussing the methods for solving this problems, when he said that vowels are likely to duplicate. This led me to the following idea: to get a 25-unique-solution, we need 5 words with 25 unqiue letters. But if a word has two (or more) vowels, like BRAVE, it is very unlikely that this word would be in such a wordlist, because it contains both A and E. But there are only 5+1 vowels in english: AIEUO+Y, so ignoring Y, if a word uses up 2 vowels, like BRAVE, then we won't have enough vowels left for all the 5 words.

So the idea is:

  1. Like before, prune the vowel wordlist, and keep only words which have 5 unique letters.
  2. Further prune the wordlist, and keep only words that have 1 vowel. So eg. BRAVE and ALIEN is pruned, but STICK is kept.
  3. When searching for the 5 words, always pick 5 words (from the above pruned words) which contain one of the 5 vowels AIEUO. In other words, treat the 5 word list like 5 slots, and always pick an A-word for the first slot, always pick an I-word for the second slot, and so on, eg. ['bawty', 'crimp', 'fjeld', 'vughy', 'zonks']

Step 1 and 2 in code:

vowels = sorted(['a', 'e', 'i', 'o', 'u'])

ltw = defaultdict(set) # letter-to-word index
for word in words:
    if len(set(list(word))) == len(word): # make sure all 5 letters are unique
        if len(set(vowels) & set(list(word))) == 1: # make sure there is exactly 1 vowel in the word
            for letter in word:
                ltw[letter].add(word)
len(words), sum([len(v) for _, v in ltw.items()])

For step 3, we will use the same method as before: randomly generate wordlists, and if we find one that is promising (22 or more unique letters), then stick to that and try a few variations:

def solve_wordle(num_tests=1000, num_improve_attempts=1000, random_seed=0):
    random.seed(random_seed * 131071)
    solutions = []
    for _ in range(num_tests):
        wordlist = [random.sample(ltw[vowel], 1)[0] for vowel in vowels]
        letters = set(list(''.join(wordlist)))
        if len(letters) >= 22:
            for _ in range(num_improve_attempts):
                v = random.randint(0, len(vowels) - 1) # pick out one of the vowels randomly
                word = random.sample(ltw[vowels[v]], 1)[0] # pick a new word which contains this vowel
                new_wordlist = deepcopy(wordlist)
                new_wordlist[v] = word
                new_letters = set(list(''.join(new_wordlist)))
                if len(new_letters) > len(letters):
                    wordlist, letters = new_wordlist, new_letters
            if len(letters) >= 24:
                print(len(letters), wordlist)
                solutions.append(wordlist)
    return solutions

The rest of the code is as before, using joblib to run in parallel:

def flatten_list(li):
    return [item for sublist in li for item in sublist]

def clean_solutions(solutions):
    return [t.split(',') for t in sorted({','.join(sorted(c)) for c in solutions})]

# single-threaded:
# solve_wordle(num_tests=10*1000*1000, num_improve_attempts=1*1000*1000)

# multi-threaded:
n_jobs = 16
num_tests = 10*1000*1000
num_improve_attempts = 100*1000
solutions = Parallel(n_jobs=n_jobs)(delayed(solve_wordle)
                                    (num_tests=num_tests,
                                     num_improve_attempts=num_improve_attempts,
                                     random_seed=i*random.randint(0, 100*1000))
                                    for i in range(n_jobs))
solutions = clean_solutions(flatten_list(solutions))
for wordlist in solutions:
    num_unique_letters = len(set(list(''.join(wordlist))))
    print(f'{num_unique_letters}-unique-letter solution: {wordlist}')
    if num_unique_letters == 25:
        print('*** JACKPOT! ***')
print(f'Found {len(solutions)} solutions...')

On my 12-core AMD system, after running it for 13 minutes, I got:

24-unique-letter solution: ['backs', 'flint', 'grews', 'jumpy', 'vozhd']
24-unique-letter solution: ['balky', 'cinqs', 'grump', 'vozhd', 'wheft']
...
24-unique-letter solution: ['brick', 'flews', 'jumpy', 'thanx', 'vozhd']
25-unique-letter solution: ['brick', 'glent', 'jumpy', 'vozhd', 'waqfs']
*** JACKPOT! ***
24-unique-letter solution: ['brick', 'gulpy', 'meynt', 'vozhd', 'waqfs']
24-unique-letter solution: ['bring', 'chomp', 'junky', 'veldt', 'waqfs']
...
24-unique-letter solution: ['jumby', 'knelt', 'pyric', 'vozhd', 'waqfs']
24-unique-letter solution: ['jumby', 'prong', 'veldt', 'wakfs', 'zilch']
Found 249 solutions...

Jackpot!

The following wordlist has 25 unique letters: ['brick', 'glent', 'jumpy', 'vozhd', 'waqfs']. The english alphabet has 26 letters, the one missing letter is x. Playing these 5 words in Wordle will usually reveral a lot of letters from the solutions, and probably a few positions. For example, for today's Wordle:

World

So, without any thinking, we already know the 5 letters (ICENW), and 2 positions (W**C*). The word is WINCE:

World

Words

3 of the 5 words are unknown to me, 2 of them are not really english:

  1. glent: to move quickly especially in an oblique direction
  2. vozhd: a Russian leader (the word is russian, and means "to lead")
  3. waqfs: plural of waqf (probably from arabic), a Muslim religious or charitable foundation created by an endowed trust fund

Conclusion

I find it pleasing that hard problems can be solved on a modern desktop computer by brute-forcing with some minimal insights and minor optimizations.