Identifying Snakes: AI vs. Human
Human Vision vs. Computer Vision
I saw a post on Reddit the other day about ChatGPT misidentifying a Pygmy Rattlesnake (venomous) as an Eastern Hognose (harmless to humans). This certainly isn't the first time generative AI has mistaken something dangerous for something benign, the most famous recent example being the UK family poisoned by an AI-generated foraging book. It also, unfortunately, won't be the last time AI makes a dangerous recommendation. However, I think this is an excellent example of all the ways AI doesn't work like human reasoning.
When you or I see the snake, we (hopefully) know what to look for in a venomous snake. Over the years, most of us have learned that a venomous snake often has a triangle-shaped head to make room for its venom glands, or, far more obviously in this case, a rattle. When AI looks at the snake, on the other hand, it doesn't "see" these features at all; it "sees" a giant vector of numbers that are nonsensical to a human.
Characteristics vs. Features
What happens is that, during training, the AI is shown a huge set of images labeled as containing a snake, and it encodes each one as a vector in a feature space: a long list of numbers that, again, are nonsensical to a human. As a concrete example, the SIFT algorithm describes each keypoint it finds in an image with a feature vector of 128 values.
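For the curious, here is roughly what that looks like in practice. This is a minimal sketch using OpenCV's SIFT implementation; the filename "snake.jpg" is a stand-in, and it assumes the opencv-python package is installed.

```python
# Minimal sketch: extract SIFT feature vectors from an image with OpenCV.
import cv2

# Load the image and convert it to grayscale, since SIFT works on intensity values.
image = cv2.imread("snake.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect keypoints and compute a descriptor for each one.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

# Each keypoint gets a 128-value descriptor -- a row of numbers that means
# nothing to a human eye but everything to the matching step later on.
print(descriptors.shape)  # e.g. (437, 128): 437 keypoints, 128 values each
```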
After training, the AI has learned feature vectors that define what each type of snake looks like in that feature space. When you give it a new image and ask "hey, what's this," it calculates a feature vector for your image and compares it against the vectors it learned during training, returning whichever label is the closest match: if the nearest match is Hognose, it gives back Hognose; if the nearest match is Rattlesnake, it gives back Rattlesnake.
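As a rough sketch of that comparison step, imagine the training process has boiled each species down to a single reference vector (a big simplification; real systems use classifiers or many stored vectors). The reference and query vectors below are made-up stand-ins, and the lookup is just nearest-neighbor by distance.

```python
# Simplified nearest-neighbor matching over hypothetical learned feature vectors.
import numpy as np

rng = np.random.default_rng(0)

# Made-up "learned" feature vectors, one per species (128 values each).
reference_vectors = {
    "Eastern Hognose": rng.random(128),
    "Pygmy Rattlesnake": rng.random(128),
}

# Made-up feature vector computed from the new image the user uploaded.
query_vector = rng.random(128)

def closest_label(query, references):
    # Compare the query against every reference and return the closest label.
    distances = {
        label: np.linalg.norm(query - vec) for label, vec in references.items()
    }
    return min(distances, key=distances.get)

print(closest_label(query_vector, reference_vectors))
```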
The SIFT algorithm looks for keypoints that often don't correspond to anything a human would consciously notice.
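To see for yourself how little those keypoints line up with "head shape" or "rattle," you can draw them back onto the image. This continues the OpenCV sketch above, with the filenames again being stand-ins.

```python
# Visualize SIFT keypoints: the circles mark blobs of contrast and texture,
# not the anatomical features a herpetologist would point to.
import cv2

image = cv2.imread("snake.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()
keypoints = sift.detect(gray, None)

annotated = cv2.drawKeypoints(
    image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
)
cv2.imwrite("snake_keypoints.jpg", annotated)
```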
The Takeaway
The bottom line is that, under the hood, AI does not "think" like a human. We see characteristics: head shape, length, color, and so on. It sees features: seemingly random numbers that relate to those characteristics in a way that makes sense to the machine, not to the human. This disconnect between human vision and computer vision is a core challenge for explainable AI, and it's important that we treat events like this as opportunities to teach users about the nature of AI, not as a warning to abandon image matching altogether.