Optimal coverage for Wordle with Monte Carlo methods - Part II
Marton Trencseni - Fri 21 January 2022 - Machine Learning
Introduction
In the previous article, we saw that the simplest brute-force approach, when run in parallel, will yield 21-22-letter-unique solutions to Wordle. Let's look at simple ways to get 23-24-letter-unique solutions. The ipython notebook is up on Github.
Prune the dictionary
If we want to get a lot of unique letters, we need unique letters in the dictionary words. For example, HELLO
is wasteful in the sense that it already has a duplicate. So let's only keep words where all 5 letters are unique. While we do this, let's a build a letter-to-word ltw
index, which contains a list of words which contain each letter:
ltw = defaultdict(set) # letter-to-word index
for word in words:
if len(set(list(word))) == len(word): # make sure all 5 letters are unique
for letter in word:
ltw[letter].add(word)
Two-stage Monte-Carlo
Let's take the following approach:
- Select 5 unique random letters.
- For each letter, take a random word to generate a 5-word candidate wordlist.
- If the number of unique letters is less than 22, go to 1.
- Else, this looks promising, go to stage 2, then go to 1.
- Stage 2: Given the promising wordlist, try to improve it. Select a letter which is not currently in the wordlist, and replace one of the words with a word that contains this missing letter. If the new wordlist is better than the old, continue stage 2 with that. Perform this stage 2 search a million (or more) times.
In code:
def solve_wordle(num_tests, random_seed, num_loops):
# stage 1:
seed_solutions = generate_seed_solutions(num_tests, random_seed)
# stage 2:
solutions = run_improve_loops(deepcopy(seed_solutions), num_loops)
return solutions
def generate_seed_solutions(num_tests, random_seed):
random.seed(random_seed * 131071)
seed_solutions = []
for _ in range(num_tests):
random_letters = random.sample(ltw.keys(), 5) # 5 random letters
wordlist = sorted([random.sample(ltw[letter], 1)[0] for letter in random_letters])
letters = set(list(''.join(wordlist)))
if len(letters) >= 22 and wordlist not in seed_solutions:
seed_solutions.append(wordlist)
return seed_solutions
def run_improve_loops(seed_solutions, num_loops):
loop_wordlist, solutions = seed_solutions, []
for i in range(num_loops):
additionals = []
for j, wordlist in enumerate(loop_wordlist):
new_wordlists, sub_solutions = generate_wordlists(wordlist)
additionals.extend(new_wordlists)
solutions.extend(sub_solutions)
loop_wordlist = clean_solutions(additionals)
return clean_solutions(solutions)
def generate_wordlists(wordlist):
new_wordlists, solutions = [], []
all_letters = set(list(string.ascii_lowercase))
letters = set(list(''.join(wordlist)))
remaining_letters = all_letters - letters
for letter in remaining_letters:
for new_word in ltw[letter]:
for old_word in wordlist:
new_wordlist = sorted(list(set(wordlist) - set([old_word]) | set([new_word])))
new_letters = set(list(''.join(new_wordlist)))
if len(new_letters) >= len(letters):
new_wordlists.append(new_wordlist)
if len(new_letters) >= 24 and new_wordlist not in solutions:
solutions.append(new_wordlist)
return new_wordlists, solutions
We can run this independently in parallel:
n_jobs = 16
num_tests = 10*1000*1000
solutions = Parallel(n_jobs=n_jobs)(delayed(solve_wordle)
(num_tests=num_tests, random_seed=i, num_loops=2)
for i in range(n_jobs))
solutions = clean_solutions(flatten_list(solutions))
Yields in 33 minutes of runtime:
24-unique-letter solution: ['ablet', 'crunk', 'jimpy', 'vozhd', 'waqfs']
24-unique-letter solution: ['absit', 'fling', 'jumpy', 'vozhd', 'wreck']
...
24-unique-letter solution: ['jumps', 'qubit', 'vozhd', 'wreck', 'xylan']
24-unique-letter solution: ['micky', 'pelfs', 'twang', 'urbex', 'vozhd']
Found 1863 solutions...
So this approach yields about one 24-letter-unique solution per second.
Conclusion
In the next article, I will show a Monte-Carlo approach which finds the jackpot, 25-unique-letter solution(s) in a few minutes.