GPT News Poet: silly AI poems based on today's news

Marton Trencseni - Sun 07 May 2023 - Machine Learning

Introduction

I think one of the coolest (and most useless) uses of Large Language Models is generating poetry. This resonates strongly with me, because I could never write poetry of any quality, but I do appreciate silly 4-liners. It's live here!

The code is up on Github.

News

There are multiple sites to get news from. For this toy project I did not want to scrape; I wanted a site that has a nice API and already returns structured JSON. GNews (not Google News) is exactly that. It's free up to 100 requests/day, and this project only needs 1 request/day.

The endpoint is:

https://gnews.io/api/v4/top-headlines?category={category}&lang=en&country=us&max=10&apikey={apikey}

The "articles" field of the response holds a list of JSON dictionaries that looks like this:

[
    {
        "title": "Loaf-size mission to improve hurricane forecasting is ready to launch",
        "description": "A new NASA mission called TROPICS, designed to improve hurricane forecasting, is ready to launch ahead of the June 1 arrival of the 2023 Atlantic hurricane season.",
        "content": "Sign up for CNN’s Wonder Theory science newsletter. Explore the universe with news on fascinating discoveries, scientific advancements and more.\nCNN —\nA new mission designed to improve hurricane forecasting is ready to launch, just ahead of the June ... [3592 chars]",
        "url": "https://www.cnn.com/2023/05/07/world/nasa-tropics-mission-launch-scn/index.html",
        "image": "https://media.cnn.com/api/v1/images/stellar/prod/230428155358-01-nasa-tropics-mission-042023.jpg?c=16x9&q=w_800,c_fill",
        "publishedAt": "2023-05-07T10:53:00Z",
        "source": {
            "name": "CNN",
            "url": "https://www.cnn.com"
        }
    },
    ...
]
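
As a minimal sketch of how the endpoint and the response fit together, the snippet below builds the request URL with urllib.parse.urlencode and pulls a couple of fields out of an already-decoded response. The helper names build_gnews_url and summarize_articles, the placeholder API key, and the trimmed sample payload are assumptions made for illustration; only the endpoint, its query parameters, and the field names come from the example above.

```python
import json
from urllib.parse import urlencode

GNEWS_ENDPOINT = "https://gnews.io/api/v4/top-headlines"

def build_gnews_url(apikey, category="general", max_articles=10):
    """Assemble the top-headlines URL; urlencode escapes each value."""
    params = {
        "category": category,
        "lang": "en",
        "country": "us",
        "max": max_articles,
        "apikey": apikey,
    }
    return f"{GNEWS_ENDPOINT}?{urlencode(params)}"

def summarize_articles(payload):
    """Return (title, source name) pairs from a decoded response body."""
    return [(a["title"], a["source"]["name"]) for a in payload["articles"]]

# Trimmed sample payload (illustrative), shaped like the response shown above.
sample = json.loads("""
{"articles": [{"title": "Loaf-size mission to improve hurricane forecasting is ready to launch",
               "source": {"name": "CNN", "url": "https://www.cnn.com"}}]}
""")

print(build_gnews_url("YOUR_API_KEY", category="science"))
print(summarize_articles(sample))
```

Building the query string from a dictionary keeps every parameter escaped the same way, which matters once a free-text search term is added.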

GPT-3

ChatGPT-4 generates significantly better quality poems than GPT-3, but there is no official API yet for it. So I used GPT-3 for this toy project, specifically text-davinci-003. The prompt I use is very simple and mostly works:

Write a witty 4 line poem about the following news: {articles_text}

Since I'm a lazy programmer in the age of AI, I used ChatGPT-4 to generate the Python code to access GPT-3 over the OpenAI API. I chose to save the generated poems into the JSON returned above, and then dump the whole thing to disk. Generating the HTML for the website will be a second, completely separate step.

The complete code for the first step of the pipeline, which downloads the news and generates the poems, is less than 50 LOC:

import json
import openai
import urllib.request
import urllib.parse
from datetime import datetime
import secrets  # local secrets.py holding the API keys (shadows the stdlib secrets module)

def gnews_top_news(category='general', q=None):
    apikey = secrets.gnews_apikey
    url = f"https://gnews.io/api/v4/top-headlines?category={category}&lang=en&country=us&max=10&apikey={apikey}"
    if q is not None:
        url += "&" + urllib.parse.urlencode({"q": q})
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8"))
        return data["articles"]

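def query_gpt_35(prompt, model="text-davinci-003", max_tokens=4000):
    # Uses the legacy Completions interface of the openai package (pre-1.0).
    openai.api_key = secrets.openai_apikey
    response = openai.Completion.create(
        engine=model,
        prompt=prompt,
        # len(prompt) counts characters, not tokens; for typical English text
        # this overestimates the prompt size, leaving a safety margin.
        max_tokens=max_tokens - len(prompt),
        n=1,
        stop=None,
        temperature=0.8,
    )
    generated_text = response.choices[0].text.strip()
    return generated_text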

articles = gnews_top_news()
for a in articles:
    articles_text = f"{a['title']}. {a['description']}"
    poet_prompt = f"Write a witty 4 line poem about the following news: {articles_text}"
    # Retry up to 5 times until the model returns exactly 4 lines;
    # if none of the attempts match, the last response is kept.
    for _ in range(5):
        response = query_gpt_35(poet_prompt)
        if len(response.split("\n")) == 4:
            break
    a['poem'] = response

directory = "articles/"
filename = f"articles-{datetime.now().strftime('%Y-%m-%d')}.json"
with open(directory + filename, "w") as f:
    json.dump(articles, f)
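
The retry loop above accepts a response only if it splits into exactly four lines. One possible refinement, sketched below, ignores blank lines when counting and keeps the retry logic separate from the API call, so it can be tried out without network access. The helper names build_prompt, poem_lines and generate_poem are hypothetical, and generate stands for any callable that maps a prompt to text (in the pipeline that would be query_gpt_35).

```python
def build_prompt(article):
    """Build the poem prompt from an article's title and description."""
    articles_text = f"{article['title']}. {article['description']}"
    return f"Write a witty 4 line poem about the following news: {articles_text}"

def poem_lines(text):
    """Split generated text into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def generate_poem(generate, prompt, attempts=5, wanted_lines=4):
    """Call generate(prompt) until it yields wanted_lines lines.

    Falls back to the last response if no attempt matches, mirroring
    the original loop, which keeps whatever the final attempt returned.
    """
    lines = []
    for _ in range(attempts):
        lines = poem_lines(generate(prompt))
        if len(lines) == wanted_lines:
            break
    return "\n".join(lines)

# Stand-in generator for illustration: first reply has 3 lines, second has 4.
replies = iter(["one\ntwo\nthree", "one\n\ntwo\nthree\nfour\n"])
fake_generate = lambda prompt: next(replies)

article = {"title": "Sample headline", "description": "Sample description."}
poem = generate_poem(fake_generate, build_prompt(article))
print(poem)
```

Passing the generator in as an argument is a design choice of this sketch, not something the original code does; it simply makes the four-line check easy to test in isolation.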

Frontend

At this point I just need a web front-end. Initially I was planning to write a Flask app which reads the above JSON dynamically and converts it to HTML, but then I realized it's easier to just generate static HTML with Jinja templates and serve that up. I again used ChatGPT-4 to generate skeleton HTML+JS+CSS code, which needed heavy editing to actually make it work. The Python code to generate the HTML is quite short:

import os
import re
import json
from datetime import date, datetime
from jinja2 import Environment, FileSystemLoader

def get_latest_articles(directory="/home/mtrencseni/gpt-news-poet/articles/"):
    pattern = re.compile(r'articles-\d{4}-\d{2}-\d{2}\.json')    
    files = [f for f in os.listdir(directory) if pattern.match(f)]
    files.sort(reverse=True)
    latest_file_path = os.path.join(directory, files[0])
    with open(latest_file_path, 'r') as f:
        return json.load(f)

env = Environment(loader=FileSystemLoader('templates'))
html = env.get_template('index.jinja.html').render(
    articles=get_latest_articles(),
    dt=date.today().strftime("%A, %B %d, %Y")
    )
directory = "www/archive/"
filename = datetime.now().strftime('%Y-%m-%d') + ".html"
with open(directory + filename, "w") as f:
    f.write(html)
directory = "www/"
filename = "index.html"
with open(directory + filename, "w") as f:
    f.write(html)
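
The get_latest_articles function above relies on the date-stamped filenames sorting lexicographically in chronological order. The sketch below isolates that step under a hypothetical name, latest_articles_file, matches the whole filename rather than just its prefix, and returns None instead of raising when the directory has no matching file. The temporary directories and sample filenames are for demonstration only.

```python
import re
import tempfile
from pathlib import Path

ARTICLES_PATTERN = re.compile(r"articles-\d{4}-\d{2}-\d{2}\.json")

def latest_articles_file(directory):
    """Return the newest articles-YYYY-MM-DD.json in directory, or None."""
    candidates = [
        p for p in Path(directory).iterdir()
        if ARTICLES_PATTERN.fullmatch(p.name)
    ]
    # ISO dates sort lexicographically in chronological order.
    return max(candidates, key=lambda p: p.name, default=None)

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["articles-2023-05-06.json", "articles-2023-05-07.json",
                 "articles-2023-05-07.json.bak", "notes.txt"]:
        (Path(tmp) / name).write_text("[]")
    latest_name = latest_articles_file(tmp).name
    print(latest_name)

# An empty directory yields None rather than an IndexError.
with tempfile.TemporaryDirectory() as empty:
    empty_result = latest_articles_file(empty)
```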

Other files:

Crontab

First, I wrote a quick shell script to run the above pipeline:

#!/usr/bin/env bash

venv/bin/python3 generate_articles.py
venv/bin/python3 generate_html.py

I then put a line in my crontab so the cron daemon runs the pipeline every day at 13:00 UTC, which is 9 AM NYC time during daylight saving time:

0 13 * * * cd /home/mtrencseni/gpt-news-poet && /home/mtrencseni/generate.sh
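
The shell script runs both steps unconditionally, so a failed download would still be followed by an HTML rebuild. As a sketch of a stricter alternative, the small runner below uses subprocess.run with check=True and stops at the first failing step. The function name run_pipeline and the stand-in demo commands are hypothetical; the script names in the comment are the ones from the shell script above.

```python
import subprocess
import sys

def run_pipeline(commands):
    """Run each command in order; return the number that completed.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit status, so later steps are skipped after a failure.
    """
    completed = 0
    for command in commands:
        try:
            subprocess.run(command, check=True)
        except subprocess.CalledProcessError as error:
            print(f"step failed with exit code {error.returncode}: {command}")
            break
        completed += 1
    return completed

# In the real pipeline the commands would be, for example:
#   ["venv/bin/python3", "generate_articles.py"]
#   ["venv/bin/python3", "generate_html.py"]
# Stand-in commands for illustration: the second one fails on purpose.
demo = [
    [sys.executable, "-c", "print('articles generated')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "print('never reached')"],
]
completed_steps = run_pipeline(demo)
print(completed_steps)
```

Adding set -e near the top of the shell script would give similar stop-on-failure behavior without any Python.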

Conclusion

I think it's very cool that you can generate reasonable quality poetry from a cronjob in 2023. Once ChatGPT-4 becomes available over the OpenAI API, the quality of the poems will be even better!
