Optimal coverage for Wordle with Monte Carlo methods - Part II

Marton Trencseni - Fri 21 January 2022 - Machine Learning

Introduction

In the previous article, we saw that the simplest brute-force approach, when run in parallel, will yield 21-22-letter-unique solutions to Wordle. Let's look at simple ways to get 23-24-letter-unique solutions. The ipython notebook is up on Github.

Prune the dictionary

If we want to get a lot of unique letters, we need unique letters in the dictionary words. For example, HELLO is wasteful in the sense that it already has a duplicate. So let's only keep words where all 5 letters are unique. While we do this, let's a build a letter-to-word ltw index, which contains a list of words which contain each letter:

ltw = defaultdict(set) # letter-to-word index
for word in words:
    if len(set(list(word))) == len(word): # make sure all 5 letters are unique
        for letter in word:
            ltw[letter].add(word)

Two-stage Monte-Carlo

Let's take the following approach:

Select 5 unique random letters.
For each letter, take a random word to generate a 5-word candidate wordlist.
If the number of unique letters is less than 22, go to 1.
Else, this looks promising, go to stage 2, then go to 1.
Stage 2: Given the promising wordlist, try to improve it. Select a letter which is not currently in the wordlist, and replace one of the words with a word that contains this missing letter. If the new wordlist is better than the old, continue stage 2 with that. Perform this stage 2 search a million (or more) times.

In code:

def solve_wordle(num_tests, random_seed, num_loops):
    # stage 1:
    seed_solutions = generate_seed_solutions(num_tests, random_seed)
    # stage 2:
    solutions = run_improve_loops(deepcopy(seed_solutions), num_loops)
    return solutions

def generate_seed_solutions(num_tests, random_seed):
    random.seed(random_seed * 131071)
    seed_solutions = []
    for _ in range(num_tests):
        random_letters = random.sample(ltw.keys(), 5) # 5 random letters
        wordlist = sorted([random.sample(ltw[letter], 1)[0] for letter in random_letters])
        letters = set(list(''.join(wordlist)))
        if len(letters) >= 22 and wordlist not in seed_solutions:
            seed_solutions.append(wordlist)
    return seed_solutions

def run_improve_loops(seed_solutions, num_loops):
    loop_wordlist, solutions = seed_solutions, []
    for i in range(num_loops):
        additionals = []
        for j, wordlist in enumerate(loop_wordlist):
            new_wordlists, sub_solutions = generate_wordlists(wordlist)
            additionals.extend(new_wordlists)
            solutions.extend(sub_solutions)
        loop_wordlist = clean_solutions(additionals)
    return clean_solutions(solutions)

def generate_wordlists(wordlist):
    new_wordlists, solutions = [], []
    all_letters = set(list(string.ascii_lowercase))
    letters = set(list(''.join(wordlist)))
    remaining_letters = all_letters - letters
    for letter in remaining_letters:
        for new_word in ltw[letter]:
            for old_word in wordlist:
                new_wordlist = sorted(list(set(wordlist) - set([old_word]) | set([new_word])))
                new_letters = set(list(''.join(new_wordlist)))
                if len(new_letters) >= len(letters):
                    new_wordlists.append(new_wordlist)
                if len(new_letters) >= 24 and new_wordlist not in solutions:
                    solutions.append(new_wordlist)
    return new_wordlists, solutions

We can run this independently in parallel:

n_jobs = 16
num_tests = 10*1000*1000
solutions = Parallel(n_jobs=n_jobs)(delayed(solve_wordle)
                        (num_tests=num_tests, random_seed=i, num_loops=2)
                        for i in range(n_jobs))
solutions = clean_solutions(flatten_list(solutions))

Yields in 33 minutes of runtime:

24-unique-letter solution: ['ablet', 'crunk', 'jimpy', 'vozhd', 'waqfs']
24-unique-letter solution: ['absit', 'fling', 'jumpy', 'vozhd', 'wreck']
...
24-unique-letter solution: ['jumps', 'qubit', 'vozhd', 'wreck', 'xylan']
24-unique-letter solution: ['micky', 'pelfs', 'twang', 'urbex', 'vozhd']
Found 1863 solutions...

So this approach yields about one 24-letter-unique solution per second.

Conclusion

In the next article, I will show a Monte-Carlo approach which finds the jackpot, 25-unique-letter solution(s) in a few minutes.