Testing GPT-4 spatial reasoning and comprehension

Marton Trencseni - Sun 21 May 2023 - gpt

Introduction

In an earlier post, I tested GPT-3's spatial reasoning abilities.

While it did fairly well on some tasks, I suspect that was because those questions were similar to its training data. It did very poorly on the "people in a room" task, where it couldn't answer a single question correctly, so my overall conclusion was that GPT-3 had not developed a sense for spatial reasoning. Let's see how GPT-4 (ChatGPT-4) does.

People in a room

Q: There is square room.
Alice is standing in the north-west corner.
Bob is standing in the south-west corner.
Charlie is standing in the south-east corrner.
David is standing in the north-east corner.
Ed is standing in the center of the room, looking at Alice.
How many people are there in the room?

A: There are five people in the room: Alice, Bob, Charlie, David, and Ed.

Q: Who is standing to the left of Ed?

A: If Ed is facing Alice who is in the north-west corner, then his left would be towards the west direction. Therefore, Bob, who is standing in the south-west corner, is to the left of Ed.

Q: Who is standing to the right of Ed?

A: If Ed is facing Alice who is in the north-west corner, then his right would be towards the north direction. Therefore, David, who is standing in the north-east corner, is to the right of Ed.

Q: Who is standing behind Ed?

A: If Ed is facing Alice who is in the north-west corner, then behind him would be the south direction. Therefore, Charlie, who is standing in the south-east corner, is behind Ed.

GPT-4 got all of them right. Very impressive! Let's spice it up:

Q: Now everybody goes around clockwise 90 degrees. Who is now in the north-west corner?

A: If everyone in the room moves 90 degrees clockwise, they would move as follows:
• Alice, originally in the north-west corner, would move to the north-east corner.
• Bob, originally in the south-west corner, would move to the north-west corner.
• Charlie, originally in the south-east corner, would move to the south-west corner.
• David, originally in the north-east corner, would move to the south-east corner.
• Ed, originally in the center, would remain in the center (assuming he also turns 90 degrees clockwise).
Therefore, Bob is now in the north-west corner.

Correct!
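
The answers in this section can be checked mechanically. Here is a minimal Python sketch, assuming compass bearings measured clockwise from north and "clockwise" as seen from above. The names `CORNERS`, `BEARING_NAMES`, `relative_positions` and `rotate_clockwise` are made up for this illustration.

```python
# Bearings in degrees, clockwise from north (an assumed convention).
CORNERS = {"Alice": 315, "Bob": 225, "Charlie": 135, "David": 45}
BEARING_NAMES = {45: "north-east", 135: "south-east",
                 225: "south-west", 315: "north-west"}


def relative_positions(facing, corners):
    """Return who stands to the right of, behind, and to the left of
    an observer in the center who looks along the bearing `facing`."""
    offsets = {"right": 90, "behind": 180, "left": 270}
    by_bearing = {bearing: name for name, bearing in corners.items()}
    return {side: by_bearing.get((facing + offset) % 360)
            for side, offset in offsets.items()}


def rotate_clockwise(corners, degrees=90):
    """Move everybody around the room clockwise by `degrees`."""
    return {name: (bearing + degrees) % 360
            for name, bearing in corners.items()}


# Ed looks at Alice, so he faces bearing 315 (north-west).
print(relative_positions(CORNERS["Alice"], CORNERS))

# After the 90 degree clockwise move, list each person's new corner.
for name, bearing in rotate_clockwise(CORNERS).items():
    print(name, "->", BEARING_NAMES[bearing])
```

Under these assumptions the sketch agrees with GPT-4: Bob is to Ed's left, David to his right, Charlie behind him, and Bob ends up in the north-west corner after the rotation.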

Clock faces

Can GPT-4 reason about the hands of a clock?

Q: It is half past eleven. What number is the minute hand of the clock pointing at?

A: The minute hand of a clock points to "6" when it is half past the hour. This is because a traditional analog clock is divided into 12 sections, and each section corresponds to 5 minutes. Therefore, when 30 minutes have passed, the minute hand points to the 6th section.

Note: correct.

Q: I have a watch. The hour hand is pointing to 11. The minute hand is at a right angle to it, on the right side. What number is the minute hand pointing at?

A: If the hour hand is pointing to 11, and the minute hand is at a right angle to the hour hand on the right side, the minute hand is pointing to 2. This corresponds to 10 minutes past the hour. Therefore, the time would be approximately 11:10.

Note: correct.

Q: I have a watch. The hour hand is pointing to 11:30. The minute hand is at a right angle to it, on the right side. What number is the minute hand pointing at?

A: If the hour hand is pointing to 11:30, it means it is halfway between 11 and 12. If the minute hand is at a right angle to the hour hand on the right side, the minute hand is pointing to approximately 3. This corresponds to 15 minutes past the hour. Therefore, the time would be approximately 11:15.

Note: not completely correct, but close; a right angle is 3 hour marks, so the minute hand is pointing halfway between 2 and 3, which corresponds to 12.5 minutes past the hour (11:12:30).

Here GPT-3 also did fairly well on the first questions, but not on the later ones.
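
The right-angle questions reduce to modular arithmetic on the dial. A small Python sketch, assuming that "on the right side" means clockwise from the hour hand (my reading of the prompt); the function names are hypothetical.

```python
def right_angle_position(hour_hand, side="right"):
    """Dial position (in hour marks) at 90 degrees from the hour hand.

    Assumes "right" means clockwise and "left" counterclockwise.
    """
    quarter_turn = 3  # 90 degrees is a quarter of the dial: 3 hour marks
    offset = quarter_turn if side == "right" else -quarter_turn
    position = (hour_hand + offset) % 12
    return 12 if position == 0 else position


def minutes_shown(position):
    """Minutes indicated by a minute hand at the given dial position."""
    return (position % 12) * 5  # each hour mark is worth 5 minutes


for hour_hand in (11, 11.5):
    position = right_angle_position(hour_hand)
    print(hour_hand, "->", position, "->", minutes_shown(position), "minutes")
```

With the hour hand at 11 the minute hand lands on 2 (10 minutes), and with the hour hand at 11.5 it lands on 2.5 (12.5 minutes), which is why GPT-4's "approximately 3" is close but not exact.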

Triangles

Q: We have a triangle. One of the angles is 70 degrees, another is 80 degrees. What is the third angle?

A: To find the third angle of a triangle, we can use the fact that the sum of the angles in any triangle is always 180 degrees. Let's denote the third angle as x. We know that one angle is 70 degrees, another is 80 degrees, and the sum of all three angles is 180 degrees. We can set up the equation:
70 + 80 + x = 180
Simplifying the equation, we have:
150 + x = 180
To isolate x, we can subtract 150 from both sides:
x = 180 - 150
x = 30
Therefore, the third angle is 30 degrees.

Note: correct. It's impressive how GPT-4 derives the answer step by step; GPT-3 just stated the answer, so I wasn't sure whether it had seen this exact question in its training data.

Q: We have a triangle. One of the angles is 70.51 degrees, another is 80.49 degrees. What is the third angle?

A: To find the third angle of a triangle when two angles are known, we can use the fact that the sum of the angles in any triangle is always 180 degrees. Let's denote the third angle as x. We know that one angle is 70.51 degrees, another is 80.49 degrees, and the sum of all three angles is 180 degrees. We can set up the equation:
70.51 + 80.49 + x = 180
Simplifying the equation, we have:
151 + x = 180
To isolate x, we can subtract 151 from both sides:
x = 180 - 151
x = 29
Therefore, the third angle is 29 degrees.

Note: correct.

Q: We have two identical right-angled triangles. We snap them together along their longest edges. What is the resulting shape?

A: When two identical right-angled triangles are snapped together along their longest edges, the resulting shape is a rectangle.

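Note: correct, assuming one triangle is rotated by 180 degrees relative to the other; if it is mirrored across the longest edge instead, the result is a kite.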
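
The shape depends on how the second triangle is attached to the shared longest edge. The Python sketch below checks both options on a 3-4-5 triangle: a copy turned by 180 degrees about the midpoint of the hypotenuse, and a copy mirrored across the hypotenuse. The coordinates and helper names are assumptions chosen for illustration.

```python
import math


def half_turn(p, center):
    """Rotate point p by 180 degrees about center."""
    return (2 * center[0] - p[0], 2 * center[1] - p[1])


def mirror(p, a, b):
    """Reflect point p across the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    foot = (a[0] + t * dx, a[1] + t * dy)  # closest point on the line to p
    return (2 * foot[0] - p[0], 2 * foot[1] - p[1])


def interior_angles(quad):
    """Interior angles (degrees) of a convex quadrilateral in vertex order."""
    angles = []
    for i, v in enumerate(quad):
        u, w = quad[i - 1], quad[(i + 1) % 4]
        a = (u[0] - v[0], u[1] - v[1])
        b = (w[0] - v[0], w[1] - v[1])
        cos_angle = (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_angle)))))
    return angles


def is_rectangle(quad):
    return all(math.isclose(angle, 90.0, abs_tol=1e-9)
               for angle in interior_angles(quad))


# Right angle at C, hypotenuse (longest edge) from A to B.
C, A, B = (0.0, 0.0), (3.0, 0.0), (0.0, 4.0)
midpoint = ((A[0] + B[0]) / 2, (A[1] + B[1]) / 2)

rotated_copy = [C, A, half_turn(C, midpoint), B]  # second triangle turned 180 degrees
mirrored_copy = [C, A, mirror(C, A, B), B]        # second triangle flipped over AB

print("rotated copy is a rectangle:", is_rectangle(rotated_copy))
print("mirrored copy is a rectangle:", is_rectangle(mirrored_copy))
```

Only the rotated copy has four right angles; the mirrored copy has two right angles and two unequal ones, which makes it a kite.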

Q: We have a very small triangle on a very large sphere. What is the sum of its angles, in degrees?

A: The sum of the angles of a triangle is always 180 degrees, regardless of the size or shape of the triangle. This property holds true for triangles on a sphere as well. Therefore, the sum of the angles of a very small triangle on a very large sphere would still be 180 degrees.

Note: partially correct; strictly, the sum is slightly greater than 180 degrees, but if the triangle is very small, the surface of the sphere is locally flat, so we can approximate with Euclidean geometry.

Q: We have a triangle on a sphere. What is the sum of its angles, in degrees?

A: On a sphere, the sum of the angles of a triangle is greater than 180 degrees. This property is known as the spherical excess or Gauss-Bonnet theorem. The formula for calculating the spherical excess (E) of a triangle on a sphere with angles A, B, and C (measured in radians) is:
E = A + B + C - π
To convert the angles from degrees to radians, we need to multiply them by π/180.
Let's say the angles of the triangle are A, B, and C (in degrees). The sum of the angles on the sphere is:
A + B + C - 180
Note that if the angles are given in radians, the sum of the angles would be:
A + B + C - π
So, in short, the sum of the angles of a triangle on a sphere is greater than 180 degrees (or π radians).

Note: correct in its conclusion, although the answer labels the excess A + B + C - 180 as the sum of the angles; GPT-3 got this wrong.

Q: We have a triangle on a sphere. The sum of it's angles is 200 degrees. We scale up the sphere by 10x. What is the sum of the triangle's angles now?

A: When you scale up the sphere by 10x, the shape and proportions of the triangle on the sphere will also scale accordingly. However, the sum of the angles of the triangle will remain the same, regardless of the size of the sphere. Therefore, even after scaling up the sphere by 10x, the sum of the triangle's angles will still be 200 degrees.

Note: correct; GPT-3 got this wrong.

Unlike GPT-3, GPT-4 is able to correctly reason about triangles on a sphere. Very impressive!
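
The spherical answers follow from Girard's theorem: the angle sum of a spherical triangle exceeds 180 degrees by area / R^2 radians. A minimal Python sketch of the scaling question, with hypothetical function names, and assuming (as GPT-4 did) that the triangle scales together with the sphere:

```python
import math


def angle_sum_degrees(area, radius):
    """Angle sum of a spherical triangle, from Girard's theorem."""
    excess = area / radius ** 2  # spherical excess in radians
    return 180.0 + math.degrees(excess)


def triangle_area(angle_sum_deg, radius):
    """Area of a spherical triangle with the given angle sum."""
    return math.radians(angle_sum_deg - 180.0) * radius ** 2


area = triangle_area(200.0, radius=1.0)

# Triangle grows with the sphere: lengths x10, so area x100.
scaled_with_sphere = angle_sum_degrees(area * 10 ** 2, radius=10.0)

# Alternative reading: the triangle keeps its area on the larger sphere.
same_area_bigger_sphere = angle_sum_degrees(area, radius=10.0)

print(round(scaled_with_sphere, 6))
print(round(same_area_bigger_sphere, 6))
```

When the triangle scales with the sphere, area and R^2 both grow by a factor of 100, so the sum stays at 200 degrees. Under the alternative reading, where the triangle keeps its area, the excess shrinks by a factor of 100 and the sum drops to about 180.2 degrees, which is also why a very small triangle on a very large sphere is almost Euclidean.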

Areas and volumes

Q: We have a white cube, each side is 1 unit long. We paint it red using 1 liter of paint.
We now have a white cube, each side is 2 units long. How many liters of paint do we need to paint it red?

A: The amount of paint needed is proportional to the surface area of the cube. The surface area of a cube is given by 6 times the square of the edge length (since a cube has six identical square faces).
1. For the cube with an edge length of 1 unit, the surface area is 6 * (1)^2 = 6 square units.
2. For the cube with an edge length of 2 units, the surface area is 6 * (2)^2 = 24 square units.
If it takes 1 liter of paint to cover 6 square units, then it should take 24/6 = 4 liters of paint to cover 24 square units. Therefore, you'd need 4 liters of paint to paint the larger cube.

Note: correct.

Q: We have a square. We cut up the square into pieces and rearrange the pieces to form a really long rectangle. Which has bigger area, the original square, or the really long rectangle?

A: The area of a shape remains the same regardless of how its pieces are rearranged. When you cut up a square and rearrange its pieces to form a rectangle, the total area remains the same. That's because area is a measure of the amount of "space" inside the shape, and cutting and rearranging the pieces doesn't create or remove any of this space. So the original square and the really long rectangle have exactly the same area.

Note: correct; GPT-3 got this wrong.

Q: There is a cube, and a sphere inside the cube. Which has bigger volume?

A: The volume of the cube is larger than the volume of the sphere inside it, provided the sphere is inscribed in the cube (meaning the diameter of the sphere is equal to the side of the cube).

Note: correct.

Q: The area of a unit square in 2D is 1, the area of the contained circle is Pi/4.
The volume of a unit cube in 3D is 1, the volume of the contained sphere is Pi/6.
What is the volume of the sphere contained in a unit cube in 4D?

A: In 4D, a "unit cube" would be a hypercube (also known as a tesseract) with edges of length 1, and a "sphere" would be a hypersphere. The general formula for the volume of a hypersphere in n dimensions (also known as an n-sphere), with a radius r, is:
V = (pi^(n/2) / Gamma(n/2 + 1)) * r^n
where Gamma is the gamma function. The gamma function of n is equal to (n-1)! for positive integers.
For the unit hypercube (a cube in 4D), each side length is 1, so the "radius" of the inscribed hypersphere is 0.5 (since the diameter is 1).
Let's substitute n=4 and r=0.5 into the formula:
V = (pi^(4/2) / Gamma(4/2 + 1)) * (0.5)^4
This simplifies to:
V = (pi^2 / 3) * (1/16)
Further simplifying gives:
V = pi^2 / 48 ≈ 0.06544984695
So, the volume of the hypersphere inscribed in a unit hypercube in 4D is approximately 0.06544984695.

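Note: incorrect. Gamma(3) = 2! = 2, not 3, so the correct answer is $\frac{\pi^2}{32} \approx 0.308$. The final decimal is also off, since $\frac{\pi^2}{48} \approx 0.206$, not 0.065.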
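
The formula GPT-4 quoted is the right one, so the slip is in the arithmetic. A short Python check using `math.gamma` from the standard library; `ball_volume` and `inscribed_fraction` are names I made up for this sketch.

```python
import math


def ball_volume(n, r):
    """Volume of an n-dimensional ball of radius r."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n


def inscribed_fraction(n):
    """Share of the unit n-cube filled by its inscribed ball (radius 0.5)."""
    return ball_volume(n, 0.5)


# 2D and 3D reproduce the values given in the prompt (pi/4 and pi/6).
for n in range(2, 11):
    print(n, round(inscribed_fraction(n), 5))
```

For n = 4 this gives pi^2 / 32, and the printed fractions keep shrinking as n grows, which is the behaviour asked about in the next question.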

Q: In higher dimensions what is the ratio of the volumes of a cube and the sphere contained in the cube? Do not give an exact mathematical answer, give a short qualitative answer.

A: As the number of dimensions increases, the volume of the sphere inscribed in a cube occupies a smaller and smaller proportion of the volume of the cube. This happens because the corners of the cube, which are outside the sphere, contribute more volume in higher dimensions. The more dimensions you have, the more "corner" there is relative to the total volume, and so the sphere takes up less of the space. In practical terms, the higher the dimension, the less "efficient" a sphere becomes at filling the space within a cube.

Note: correct.

ASCII art

Can GPT-4 read or write ASCII art? This is the question that originally triggered my exploration into spatial comprehension. Let's see:

Q: What is the letter in the following ASCII art?

██╗░░██╗
██║░░██║
███████║
██╔══██║
██║░░██║
╚═╝░░╚═╝

A: The ASCII art you've posted represents the capital letter "A".

Note: incorrect.

Q: What is the text in the following ASCII art?

██╗░░██╗███████╗██╗░░░░░██╗░░░░░░█████╗░
██║░░██║██╔════╝██║░░░░░██║░░░░░██╔══██╗
███████║█████╗░░██║░░░░░██║░░░░░██║░░██║
██╔══██║██╔══╝░░██║░░░░░██║░░░░░██║░░██║
██║░░██║███████╗███████╗███████╗╚█████╔╝
╚═╝░░╚═╝╚══════╝╚══════╝╚══════╝░╚════╝░

A: The ASCII art you've posted represents the word "GREAT".

Note: incorrect.

Q: What is the text in the following ASCII art?

███╗░░░███╗░█████╗░██████╗░████████╗░█████╗░███╗░░██╗
████╗░████║██╔══██╗██╔══██╗╚══██╔══╝██╔══██╗████╗░██║
██╔████╔██║███████║██████╔╝░░░██║░░░██║░░██║██╔██╗██║
██║╚██╔╝██║██╔══██║██╔══██╗░░░██║░░░██║░░██║██║╚████║
██║░╚═╝░██║██║░░██║██║░░██║░░░██║░░░╚█████╔╝██║░╚███║
╚═╝░░░░░╚═╝╚═╝░░╚═╝╚═╝░░╚═╝░░░╚═╝░░░░╚════╝░╚═╝░░╚══╝

A: The ASCII art you've posted represents the word "FUTURE".

Note: incorrect.

Q: Generate ASCII art for the letter H.

A: Sure, here is a simple ASCII art representation of the letter "H":

H   H
H   H
HHHHH
H   H
H   H

Note: correct.

Q: Generate ASCII art for the word HELLO.

A: Sure, here's a simple ASCII representation of the word "HELLO":

H   H  EEEEE  L     L      OOO  
H   H  E      L     L     O   O 
HHHHH  EEEE   L     L     O   O 
H   H  E      L     L     O   O 
H   H  EEEEE  LLLLL LLLLL  OOO  

This is a simple form of ASCII art where each letter is 5 units tall and about 5 units wide (not counting spaces between letters). More complex or stylized versions could be created with different shapes or patterns.

Note: correct. Let's see if it can follow my ASCII style.

Q: Generate ASCII art for the word HELLO, in the style I used above.

A: Here's the ASCII representation of the word "HELLO" similar to the style you provided:

██╗░░██╗███████╗██╗░░░██╗███████╗
██║░░██║██╔════╝██║░░░██║██╔════╝
███████║█████╗░░╚██╗░██╔╝█████╗░░
██╔══██║██╔══╝░░░╚████╔╝░██╔══╝░░
██║░░██║███████╗░░╚██╔╝░░███████╗
╚═╝░░╚═╝╚══════╝░░░╚═╝░░░╚══════╝

Each letter is represented in a style similar to the style you provided. Please note that ASCII art can vary widely in complexity and style!

Note: incorrect, but it got the HE right!
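
For comparison, the plain-letter style that GPT-4 produced for H and HELLO is easy to generate mechanically: each letter is a fixed 5-row glyph, and the word is assembled row by row. A minimal Python sketch; the `GLYPHS` table and `render` function are hypothetical, and the glyph shapes are copied from GPT-4's output above.

```python
# 5x5 glyphs for the letters needed here (illustrative, not a full font).
GLYPHS = {
    "H": ["H   H", "H   H", "HHHHH", "H   H", "H   H"],
    "E": ["EEEEE", "E    ", "EEEE ", "E    ", "EEEEE"],
    "L": ["L    ", "L    ", "L    ", "L    ", "LLLLL"],
    "O": [" OOO ", "O   O", "O   O", "O   O", " OOO "],
}


def render(word, gap="  "):
    """Build the art one text row at a time, joining the glyph rows."""
    rows = []
    for row in range(5):
        rows.append(gap.join(GLYPHS[ch][row] for ch in word).rstrip())
    return "\n".join(rows)


print(render("HELLO"))
```

The point of the sketch is that every printed line mixes slices of all five letters, so a letter's shape only emerges across several lines of text.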

Conclusion

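Based on these experiments, GPT-4 is a lot better at spatial reasoning than GPT-3. It answered nearly all of the room, clock, triangle and area questions correctly, and struggled mainly with the 4D sphere volume, with reading ASCII art, and with reproducing my block style. If these models keep advancing at this pace, GPT-5 or GPT-6 could be flawless at these tasks.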