Optimal coverage for Wordle with Monte Carlo methods - Part II

Marton Trencseni - Fri 21 January 2022 - Machine Learning

Introduction

In the previous article, we saw that the simplest brute-force approach, when run in parallel, will yield 21-22-letter-unique solutions to Wordle. Let's look at simple ways to get 23-24-letter-unique solutions. The ipython notebook is up on Github.

Prune the dictionary

If we want to get a lot of unique letters, we need unique letters in the dictionary words. For example, HELLO is wasteful in the sense that it already has a duplicate. So let's only keep words where all 5 letters are unique. While we do this, let's a build a letter-to-word ltw index, which contains a list of words which contain each letter:

ltw = defaultdict(set) # letter-to-word index
for word in words:
    if len(set(list(word))) == len(word): # make sure all 5 letters are unique
        for letter in word:
            ltw[letter].add(word)

Two-stage Monte-Carlo

Let's take the following approach:

  1. Select 5 unique random letters.
  2. For each letter, take a random word to generate a 5-word candidate wordlist.
  3. If the number of unique letters is less than 22, go to 1.
  4. Else, this looks promising, go to stage 2, then go to 1.
  5. Stage 2: Given the promising wordlist, try to improve it. Select a letter which is not currently in the wordlist, and replace one of the words with a word that contains this missing letter. If the new wordlist is better than the old, continue stage 2 with that. Perform this stage 2 search a million (or more) times.

In code:

def solve_wordle(num_tests, random_seed, num_loops):
    # stage 1:
    seed_solutions = generate_seed_solutions(num_tests, random_seed)
    # stage 2:
    solutions = run_improve_loops(deepcopy(seed_solutions), num_loops)
    return solutions

def generate_seed_solutions(num_tests, random_seed):
    random.seed(random_seed * 131071)
    seed_solutions = []
    for _ in range(num_tests):
        random_letters = random.sample(ltw.keys(), 5) # 5 random letters
        wordlist = sorted([random.sample(ltw[letter], 1)[0] for letter in random_letters])
        letters = set(list(''.join(wordlist)))
        if len(letters) >= 22 and wordlist not in seed_solutions:
            seed_solutions.append(wordlist)
    return seed_solutions

def run_improve_loops(seed_solutions, num_loops):
    loop_wordlist, solutions = seed_solutions, []
    for i in range(num_loops):
        additionals = []
        for j, wordlist in enumerate(loop_wordlist):
            new_wordlists, sub_solutions = generate_wordlists(wordlist)
            additionals.extend(new_wordlists)
            solutions.extend(sub_solutions)
        loop_wordlist = clean_solutions(additionals)
    return clean_solutions(solutions)

def generate_wordlists(wordlist):
    new_wordlists, solutions = [], []
    all_letters = set(list(string.ascii_lowercase))
    letters = set(list(''.join(wordlist)))
    remaining_letters = all_letters - letters
    for letter in remaining_letters:
        for new_word in ltw[letter]:
            for old_word in wordlist:
                new_wordlist = sorted(list(set(wordlist) - set([old_word]) | set([new_word])))
                new_letters = set(list(''.join(new_wordlist)))
                if len(new_letters) >= len(letters):
                    new_wordlists.append(new_wordlist)
                if len(new_letters) >= 24 and new_wordlist not in solutions:
                    solutions.append(new_wordlist)
    return new_wordlists, solutions

We can run this independently in parallel:

n_jobs = 16
num_tests = 10*1000*1000
solutions = Parallel(n_jobs=n_jobs)(delayed(solve_wordle)
                        (num_tests=num_tests, random_seed=i, num_loops=2)
                        for i in range(n_jobs))
solutions = clean_solutions(flatten_list(solutions))

Yields in 33 minutes of runtime:

24-unique-letter solution: ['ablet', 'crunk', 'jimpy', 'vozhd', 'waqfs']
24-unique-letter solution: ['absit', 'fling', 'jumpy', 'vozhd', 'wreck']
...
24-unique-letter solution: ['jumps', 'qubit', 'vozhd', 'wreck', 'xylan']
24-unique-letter solution: ['micky', 'pelfs', 'twang', 'urbex', 'vozhd']
Found 1863 solutions...

So this approach yields about one 24-letter-unique solution per second.

Conclusion

In the next article, I will show a Monte-Carlo approach which finds the jackpot, 25-unique-letter solution(s) in a few minutes.