How good a Data Scientist is GPT-3? - Part II

Marton Trencseni - Sat 03 September 2022 - gpt

In the previous blog post, I asked GPT-3 interview-level Data Science questions to gauge "how good a Data Scientist" it is. Here I continue this line of questioning with questions based on real-life Data Science work I've done at previous jobs. These are still posed in the style of interview questions, but they are a bit closer to how a Data Scientist might ask GPT-3 for help with real Data Science work.

GPT-3

You can also talk to GPT-3 yourself: just create an account on OpenAI and go to the sandbox!

Product metrics

Q: Imagine you're a Product Analyst or Data Scientist working with a Product Manager at a social networking company like Facebook or Twitter. What kind of product metrics would you recommend the Product Manager look at?

A: There are many potential product metrics that a Product Manager could look at when working with a social networking company. Some examples of potential metrics could include:
- Number of active users
- Number of daily/weekly/monthly active users
- User engagement levels (e.g. time spent on site, number of page views, etc.)
- Number of new users
- Number of referrals
- Churn rate
- Demographics of users (e.g. age, gender, location, etc.)

Note: this is not a bad starting list. Demographics aren't really metrics, but that's a minor issue.
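To make a few items from that list concrete, here is a minimal sketch of computing daily, weekly and monthly active users, plus churn, from a raw activity log. The (user, date) event format, the function names and the churn definition (the share of users active in the previous window who did not come back in the current one) are assumptions for illustration; in practice this would usually be SQL over an events table.

```python
from datetime import date, timedelta

def active_users(events, as_of, window_days):
    """Unique users with at least one event in the window_days days ending on as_of, inclusive."""
    first_day = as_of - timedelta(days=window_days - 1)
    return len({user for user, day in events if first_day <= day <= as_of})

def churn_rate(events, as_of, window_days=28):
    """Assumed definition: share of users active in the previous window who are
    not active in the current window (both windows window_days long)."""
    current_start = as_of - timedelta(days=window_days - 1)
    previous_start = current_start - timedelta(days=window_days)
    previous = {u for u, d in events if previous_start <= d < current_start}
    current = {u for u, d in events if current_start <= d <= as_of}
    return len(previous - current) / len(previous) if previous else 0.0

# Hypothetical activity log of (user_id, activity date) pairs
events = [
    ("alice", date(2022, 9, 1)),
    ("alice", date(2022, 9, 3)),
    ("carol", date(2022, 9, 1)),
    ("bob", date(2022, 8, 20)),
    ("bob", date(2022, 7, 20)),
    ("dave", date(2022, 7, 15)),
]
as_of = date(2022, 9, 3)
dau = active_users(events, as_of, 1)    # alice
wau = active_users(events, as_of, 7)    # alice, carol
mau = active_users(events, as_of, 28)   # alice, carol, bob
churn = churn_rate(events, as_of)       # bob came back, dave didn't -> 0.5
```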

Q: Imagine you're a Product Analyst or Data Scientist working with a Product Manager at a social networking company like Facebook or Twitter. Give some examples of values of product metrics that might indicate a healthy or unhealthy product.

A: Some examples of values of product metrics that might indicate a healthy product are:
- The number of people using the product on a daily basis
- The number of people using the product on a monthly basis
- The number of people using the product on a yearly basis

Some examples of values of product metrics that might indicate an unhealthy product are:
- The number of people who have stopped using the product
- The number of people who have uninstalled the product
- The number of people who have given the product a low rating

Note: the first list is essentially worthless, while the second one is good. Neither gives actual values, though, so let's try the same question again with a leading example:

Q: Imagine you're a Product Analyst or Data Scientist working with a Product Manager at a social networking company like Facebook or Twitter. Give some examples of values of product metrics that might indicate a healthy or unhealthy product. For example, if DAU to MAU ratio is 1 in 30, that might indicate a product that is not sticky enough.

A: A healthy product metric might be something like DAU to MAU ratio being 1 in 10. This would indicate that the product is being used frequently enough. An unhealthy product metric might be something like a DAU to MAU ratio of 1 in 30. This would indicate that the product is not sticky enough and people are not using it as frequently as they should be.

Note: the answer is worthless mumbo-jumbo; it mostly echoes the 1 in 30 example from the prompt back.
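The DAU to MAU ratio has a simple interpretation that the answer misses: averaged over a period, average DAU divided by the number of unique users in that period equals the average fraction of days on which an active user shows up. A 1 in 30 ratio means the typical user comes back about once a month, while 1 in 10 means about three days a month. A minimal sketch, assuming the same hypothetical (user, date) event log and a fixed (non-rolling) 30-day period:

```python
from collections import defaultdict
from datetime import date, timedelta

def stickiness(events, start, days):
    """Average DAU over [start, start + days) divided by the number of unique
    users active in that same period."""
    end = start + timedelta(days=days)
    daily = defaultdict(set)  # day -> set of users active that day
    for user, day in events:
        if start <= day < end:
            daily[day].add(user)
    mau = len(set().union(*daily.values()))
    if mau == 0:
        return 0.0
    avg_dau = sum(len(users) for users in daily.values()) / days
    return avg_dau / mau

start = date(2022, 6, 1)
# Two hypothetical users, each active on 3 of the 30 days
frequent = ([("a", start + timedelta(days=d)) for d in (0, 10, 20)]
            + [("b", start + timedelta(days=d)) for d in (5, 15, 25)])
# The same two users, each active on a single day
rare = [("a", start), ("b", start + timedelta(days=12))]

sticky = stickiness(frequent, start, 30)      # 0.1, i.e. 1 in 10
not_sticky = stickiness(rare, start, 30)      # 1/30, about 0.033
```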

Q: Imagine you're a Product Analyst or Data Scientist working with a Product Manager. Your team owns a mobile app, imagine something like Twitter or Facebook. You are designing the Monthly Active User or MAU metric, which you will compute on a rolling lookback window basis. In other words, every day the metric counts the number of unique users in the last X days. You have a choice of picking X, your options are 28 days, 30 days, 31 days. Which one would you pick and why?

A: 31 days. This is the longest month, so it will capture the most data.

Note: in a Product Analytics setting, the right number to pick is usually 28. Let's try the same question again, this time leading it toward that answer:

Q: Imagine you're a Product Analyst or Data Scientist working with a Product Manager. Your team owns a mobile app, imagine something like Twitter or Facebook. You are designing the Monthly Active User or MAU metric, which you will compute on a rolling lookback window basis. In other words, every day the metric counts the number of unique users in the last X days. You have a choice of picking X, your options are 28 days, 30 days, 31 days. Why would we want to pick 28 over the others?

A: There are a few reasons why we might want to pick 28 days over the other options:
1. 28 days is a multiple of 7, which means that the metric will always count the same number of weekends. This can be helpful if we want to compare MAU week over week and don't want the metric to be skewed by differences in the number of weekends.
2. 28 days is a common month length, so it will be easy for people to understand and compare the MAU metric to other metrics that are also computed on a monthly basis.
3. 31 days is the longest possible month length, so choosing 28 days instead will help to reduce noise in the data.

Note: point 1 is the expected answer, and it's the reason we usually use a 28-day MAU definition internally. Points 2 and 3 are false or mumbo-jumbo.
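Point 1 is easy to verify: a 28-day window always contains exactly four of each weekday, so a rolling 28-day MAU doesn't pick up a weekly wobble from how many weekend days happen to fall in the window. A small stdlib check, counting the weekend days in every rolling window ending in 2022 for each candidate X:

```python
from datetime import date, timedelta

def weekend_days_in_window(end, window_days):
    """Number of Saturdays and Sundays among the window_days days ending on `end`, inclusive."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    return sum((end - timedelta(days=k)).weekday() >= 5 for k in range(window_days))

# For every window end date in 2022, which weekend-day counts occur for each window length?
end_dates = [date(2022, 1, 1) + timedelta(days=i) for i in range(365)]
weekend_counts = {
    x: sorted({weekend_days_in_window(d, x) for d in end_dates})
    for x in (28, 30, 31)
}
# {28: [8], 30: [8, 9, 10], 31: [8, 9, 10]}
```

With 30 or 31 days, the extra two or three days beyond four full weeks contain anywhere from zero to two weekend days, depending on which weekday the metric is computed on.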

Checkout counters

Q: Imagine you are a Data Scientist. You want to build a model to recommend how many checkout counters should be open at a hypermarket, for upcoming days, in one hour slots. How would you go about it?

A: I would first look at the data to see if there are any patterns that can be observed. For example, are there more people shopping on certain days of the week or at certain times of the day? I would also look at the data to see how many people are typically checking out at the same time and see if there are any bottlenecks. Once I have a good understanding of the data, I would build a model that takes into account the number of people shopping and the number of people checking out at the same time.
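The "how many people are checking out at the same time" part of this answer can be made concrete from transaction logs. The sketch below assumes each checkout has a start and an end time (an assumption; some point-of-sale systems only log when the transaction closes) and counts, at minute resolution, the peak number of simultaneous checkouts in each hour:

```python
def peak_concurrency_by_hour(checkouts):
    """Peak number of simultaneous checkouts in each hour of the day.

    checkouts: (start_minute, end_minute) pairs in minutes since midnight, treated
    as half-open [start, end) so back-to-back transactions don't overlap.
    """
    active = {}  # minute of day -> number of checkouts in progress
    for start, end in checkouts:
        for minute in range(start, end):
            active[minute] = active.get(minute, 0) + 1
    peaks = {}
    for minute, count in active.items():
        hour = minute // 60
        peaks[hour] = max(peaks.get(hour, 0), count)
    return peaks

# Hypothetical transactions: three overlapping around 9:04, one on its own at 10:30
checkouts = [(9 * 60, 9 * 60 + 5), (9 * 60 + 2, 9 * 60 + 6), (9 * 60 + 4, 9 * 60 + 9),
             (10 * 60 + 30, 10 * 60 + 34)]
peaks = peak_concurrency_by_hour(checkouts)  # {9: 3, 10: 1}
```

Since simultaneous checkouts can never exceed the number of open counters, hours where the peak equals the number of counters open at the time were saturated, which is where queues most likely formed.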

Q: Imagine you are a Data Scientist. You want to build a model to recommend how many checkout counters should be open at a hypermarket, for upcoming days, in one hour slots. How would you build the model, what parts would your model pipeline consist of?

A: The model pipeline would consist of the following parts:
1. Data cleaning and preparation: This would involve cleaning the data set to remove any invalid or missing data points.
2. Data exploration: This would involve exploring the data set to identify any patterns or trends.
3. Model training: This would involve training a model to predict the number of checkout counters that should be open for upcoming days, in one hour slots.
4. Model evaluation: This would involve evaluating the trained model to see how accurate it is in predicting the number of checkout counters that should be open for upcoming days, in one hour slots.

Note: the answer is not incorrect, but it's very generic. Also, it describes the phases of a Data Scientist's work, not the stages of a model pipeline.

Q: Imagine you are a Data Scientist. You want to build a model to recommend how many checkout counters should be open at a hypermarket, for upcoming days, in one hour slots. As a first step, you need to forecast the number of future checkouts, given historic data going back to 2018. How would you do this, given that we also had Covid-19 happening starting in 2020 May?

A: There are a few ways to approach this problem. One way would be to build a time series model that takes into account the effects of Covid-19 on the number of checkouts. Another way would be to build a model that predicts the number of checkouts based on a variety of factors, including the time of day, the day of the week, and whether or not there is a Covid-19 pandemic.

Note: while not incorrect, the answer contains no useful information.

Q: Imagine you are a Data Scientist. You want to build a model to recommend how many checkout counters should be open at a hypermarket, for upcoming days, in one hour slots. As a first step, you need to forecast the number of future checkouts, given historic data going back to 2018. How would you do this, given that we also had Covid-19 happening starting in 2020 May? Specifically, assume you're using Python and Scikit Learn.

A: There are many ways to approach this problem, but one possible approach would be to use a time series forecasting model such as SARIMAX. This model can take into account seasonality and trends in the data, as well as exogenous variables such as Covid-19.
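One caveat on this answer: SARIMAX is not part of scikit-learn; it lives in statsmodels (`statsmodels.tsa.statespace.sarimax.SARIMAX`). Before reaching for it, a useful baseline is a seasonal average: forecast each hour-of-week slot as the mean of the same slot over the last few weeks, and handle Covid by only using data after a cutoff date. The cutoff handling, the 4-week lookback and the synthetic data below are illustrative assumptions, not what was used in the original work; modelling Covid as an exogenous indicator, as the answer suggests, is the other common option.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def seasonal_average_forecast(history, cutoff, forecast_start, weeks=4):
    """Forecast hourly checkouts for the 168 hours (one week) from forecast_start.

    history maps hour-aligned datetimes to checkout counts. Each hour-of-week slot
    is forecast as the mean of the same slot over the last `weeks` weeks, using
    only observations at or after `cutoff`, so the earlier regime is ignored.
    Slots with no usable history are returned as None.
    """
    earliest = max(cutoff, forecast_start - timedelta(weeks=weeks))
    slots = defaultdict(list)  # (weekday, hour) -> observed counts
    for ts, count in history.items():
        if earliest <= ts < forecast_start:
            slots[(ts.weekday(), ts.hour)].append(count)
    forecast = {}
    for h in range(168):
        ts = forecast_start + timedelta(hours=h)
        values = slots.get((ts.weekday(), ts.hour))
        forecast[ts] = sum(values) / len(values) if values else None
    return forecast

# Synthetic history: 100 checkouts/hour before the cutoff, 40/hour after it
cutoff = datetime(2020, 5, 1)
forecast_start = datetime(2020, 5, 18)
history = {}
ts = datetime(2020, 3, 1)
while ts < forecast_start:
    history[ts] = 100 if ts < cutoff else 40
    ts += timedelta(hours=1)

forecast = seasonal_average_forecast(history, cutoff, forecast_start)
# every slot is forecast at 40.0, because the pre-cutoff regime is excluded
```

A baseline like this is also what a SARIMAX model would need to beat to justify its extra complexity.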

Q: Imagine you are a Data Scientist. You want to build a model to recommend how many checkout counters should be open at a hypermarket, for upcoming days, in one hour slots. In this problem, how would you use an M/M/1 model from queueing theory?

A: If you were using an M/M/1 model from queueing theory, you would need to know the arrival rate of customers and the service rate of the checkout counters. With this information, you would be able to calculate the average number of customers in the system, the average time a customer spends in the system, and the average number of customers in the queue. From this, you would be able to recommend how many checkout counters should be open in order to minimize the average time a customer spends in the system.
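The answer names the right inputs but stops short of an actual recommendation. One way to use M/M/1 here, assuming each open counter has its own line and arriving customers are split evenly across counters, is to treat c open counters as c independent M/M/1 queues with arrival rate λ/c each, and pick the smallest c whose average wait in line meets a target. The rates and the 5-minute target below are made-up numbers; if all counters were fed from a single shared line, the M/M/c (Erlang C) model would be the better fit.

```python
from dataclasses import dataclass

@dataclass
class MM1Stats:
    utilization: float         # rho = lambda / mu
    avg_in_system: float       # L  = rho / (1 - rho)
    avg_in_queue: float        # Lq = rho^2 / (1 - rho)
    avg_time_in_system: float  # W  = 1 / (mu - lambda)
    avg_wait_in_queue: float   # Wq = rho / (mu - lambda)

def mm1(arrival_rate, service_rate):
    """Steady-state M/M/1 metrics; both rates in customers per minute."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate
    return MM1Stats(
        utilization=rho,
        avg_in_system=rho / (1 - rho),
        avg_in_queue=rho ** 2 / (1 - rho),
        avg_time_in_system=1 / (service_rate - arrival_rate),
        avg_wait_in_queue=rho / (service_rate - arrival_rate),
    )

def counters_needed(arrival_rate, service_rate, max_wait, max_counters=50):
    """Smallest number of open counters c such that each counter, modelled as an
    independent M/M/1 queue fed arrival_rate / c customers per minute, has an
    average wait in line of at most max_wait minutes."""
    for c in range(1, max_counters + 1):
        per_counter = arrival_rate / c
        if per_counter < service_rate and mm1(per_counter, service_rate).avg_wait_in_queue <= max_wait:
            return c
    raise ValueError("wait target not reachable with max_counters counters")

# Hypothetical hour: 360 customers/hour (6/min), 2-minute average checkout (0.5/min),
# target average wait in line of 5 minutes.
print(counters_needed(6.0, 0.5, 5.0))  # 17
```

Running this for each one-hour slot on the forecast arrival rates gives the recommendation. Since real customers tend to join the shortest line rather than a random one, treating the counters as independent M/M/1 queues is likely somewhat pessimistic.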

Receipt scanning

Q: Imagine you're a data scientist. We want to build a model to parse out store, date, time and total amount from a picture of a receipt. How would you go about it?

A: There are a few ways to tackle this problem:
1. Use Optical Character Recognition (OCR) to extract the text from the image. This can be a difficult problem on its own, but there are many commercial and open-source OCR libraries available.
2. Use a trained machine learning model to classify the text in the image. This would require a lot of training data, but could be more accurate than OCR.
3. Use a combination of OCR and heuristics to parse out the relevant information. For example, you could use OCR to extract all the text from the image and then use heuristics to look for patterns that match store names, dates, times, and total amounts.

Q: Imagine you're a data scientist. We want to build a model to parse out store, date, time and total amount from a picture of a receipt. Suppose you have an OCR program that can extract the text in the image. How would you use it to build this model?

A: There are a few ways to build this model. One way would be to use a regular expression to parse out the store, date, time and total amount from the text. Another way would be to use a machine learning algorithm to learn the patterns in the data and then predict the store, date, time and total amount from new data.

Note: the fact that it recommends regular expressions is very cool. When I was working on this problem, the baseline version I was improving on, built by a previous team, was using regular expressions!
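For reference, a regular expression baseline of this kind can be very short. The patterns below are illustrative assumptions (ISO or dd/mm/yyyy-style dates, HH:MM times, the store name on the first line, an amount following the word TOTAL), which is also why such baselines break on the long tail of real receipt layouts:

```python
import re

# Illustrative patterns only. "SUBTOTAL" doesn't match TOTAL_RE because of the \b.
DATE_RE = re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{2}[/.]\d{2}[/.]\d{4})\b")
TIME_RE = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d(?::[0-5]\d)?\b")
TOTAL_RE = re.compile(r"\btotal\b\D*(\d+[.,]\d{2})", re.IGNORECASE)

def parse_receipt(text):
    """Regex baseline: extract store, date, time and total from OCR'd receipt text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    date_match = DATE_RE.search(text)
    time_match = TIME_RE.search(text)
    total = None
    for line in lines:  # if several lines match, keep the last one
        m = TOTAL_RE.search(line)
        if m:
            total = float(m.group(1).replace(",", "."))
    return {
        "store": lines[0] if lines else None,  # assumption: store name is printed first
        "date": date_match.group(0) if date_match else None,
        "time": time_match.group(0) if time_match else None,
        "total": total,
    }

receipt = """ACME HYPERMARKET
2022-09-03 18:42
Milk 1L          6.25
Bread            4.50
SUBTOTAL        10.75
VAT 5%           0.54
TOTAL           11.29
"""
parsed = parse_receipt(receipt)
# {'store': 'ACME HYPERMARKET', 'date': '2022-09-03', 'time': '18:42', 'total': 11.29}
```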

Q: Imagine you're a data scientist. We want to build a model to parse out store, date, time and total amount from a picture of a receipt. Suppose you have an OCR program that can extract the text in the image. What kind of Machine Learning libraries would you use here?

A: There are many ways to approach this problem, but some common libraries for machine learning include TensorFlow, scikit-learn, and Keras.

Note: since it recommends scikit-learn, let's ask about it.

Q: Imagine you're a data scientist. We want to build a model to parse out store, date, time and total amount from a picture of a receipt. Suppose you have an OCR program that can extract the text in the image. Assuming you're using scikit-learn to solve this problem, which models would you use, and how?

A: There are a few ways to approach this problem. One way would be to use a regex model to extract the relevant information from the text. Another way would be to use a series of scikit-learn models to extract the store, date, time, and total amount separately.

Note: the answer is not specific to scikit-learn.
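A more scikit-learn-shaped answer, sketched here as an assumption rather than what was actually built, would treat each OCR'd line as a sample, compute a handful of features per line, and train a classifier (for example `LogisticRegression` or `RandomForestClassifier`) to label lines as store, date, time, total or other. The feature extraction is the interesting part; the feature names below are hypothetical, and the resulting list of dicts can be turned into a feature matrix with scikit-learn's `DictVectorizer`:

```python
import re

AMOUNT_RE = re.compile(r"\d+[.,]\d{2}\b")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}\b")
DATE_RE = re.compile(r"\b\d{1,4}[-/.]\d{1,2}[-/.]\d{2,4}\b")

def line_features(lines):
    """One feature dict per OCR line, for a classifier labelling lines as
    store / date / time / total / other (feature names are illustrative)."""
    last = max(len(lines) - 1, 1)
    feats = []
    for i, line in enumerate(lines):
        feats.append({
            "relative_position": i / last,  # 0.0 at the top of the receipt, 1.0 at the bottom
            "has_amount": int(bool(AMOUNT_RE.search(line))),
            "has_total_word": int("total" in line.lower().split()),
            "has_time": int(bool(TIME_RE.search(line))),
            "has_date": int(bool(DATE_RE.search(line))),
            "digit_fraction": sum(ch.isdigit() for ch in line) / max(len(line), 1),
            "all_caps": int(line.isupper()),
        })
    return feats

lines = ["ACME HYPERMARKET", "2022-09-03 18:42", "Bread 4.50", "TOTAL 11.29"]
features = line_features(lines)
```

At prediction time, one option is to take the highest-scoring line for each field and then apply a narrow pattern to that line only to pull out the value, which keeps the regular expressions much simpler than in a pure regex baseline.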

Q: Imagine you're a data scientist. We want to build a model to parse out store, date, time and total amount from a picture of a receipt. Suppose you have an OCR program that can extract bounding boxes and the text in each bounding box from the image. How would you use this to solve the problem?

A: There are a few different ways to solve this problem. One way would be to use the OCR program to extract the text in each bounding box, and then use a regular expression to parse out the store, date, time and total amount from the text. Another way would be to use the OCR program to extract the bounding boxes, and then use a computer vision algorithm to parse out the store, date, time and total amount from the image.

Note: the answer is not very useful. A more useful answer would talk about using the positions and alignment of the bounding boxes to predict which ones contain useful information, or about adding up the individual line-item amounts to increase confidence in the total amount.
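The layout-based idea can be sketched in a few dozen lines. This assumes a hypothetical OCR output format (one box per text fragment, with left and right edges and a vertical centre in pixels) and a simple receipt whose price column holds only line items followed by the total; subtotals, tax lines and discounts would need extra handling. Amounts are first filtered to those right-aligned in the price column, then each one is scored as the total by whether a TOTAL label sits on the same row and whether it equals the sum of the amounts above it:

```python
import re
from dataclasses import dataclass

AMOUNT_RE = re.compile(r"^\d+\.\d{2}$")

@dataclass
class Box:
    text: str
    x0: float  # left edge, pixels
    x1: float  # right edge, pixels
    y: float   # vertical centre, pixels (grows downwards)

def price_column(boxes, tol=5.0):
    """Amount boxes whose right edges line up with the most common right edge among amounts."""
    amounts = [b for b in boxes if AMOUNT_RE.match(b.text)]
    if not amounts:
        return []
    anchor = max(amounts, key=lambda b: sum(abs(o.x1 - b.x1) <= tol for o in amounts)).x1
    return sorted((b for b in amounts if abs(b.x1 - anchor) <= tol), key=lambda b: b.y)

def total_candidates(boxes, tol=5.0):
    """Score each price-column amount as the total: +1 if a TOTAL label sits on the
    same row, +1 if it equals the sum of the amounts above it. Highest score first."""
    scored, running = [], 0.0
    for b in price_column(boxes, tol):
        value = float(b.text)
        labelled = any(
            o is not b and abs(o.y - b.y) <= tol and "TOTAL" in o.text.upper().split()
            for o in boxes
        )
        adds_up = running > 0 and abs(value - running) < 0.005
        scored.append((int(labelled) + int(adds_up), value))
        running += value
    return sorted(scored, reverse=True)

boxes = [
    Box("ACME HYPERMARKET", 20, 180, 10),
    Box("Milk 1L", 20, 80, 50), Box("6.25", 150, 180, 50),
    Box("Bread", 20, 70, 70), Box("4.50", 152, 180, 70),
    Box("TOTAL", 20, 70, 100), Box("10.75", 148, 181, 100),
    Box("Points earned", 20, 85, 120), Box("3.00", 90, 110, 120),  # not in the price column
]
candidates = total_candidates(boxes)  # [(2, 10.75), (0, 6.25), (0, 4.5)]
```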

Conclusion

In this round, GPT-3 did significantly worse than on the more straightforward interview questions in the previous post. Based on these sessions, I don't think it's valuable as an assistant for Data Scientists in real-world settings.