Computer Vision and Coordinate Transforms

Hear me out: all of computer vision is coordinate transforms with extra steps. The image format defines a coordinate space, and the pixels are points located within this coordinate space. Here’s my case for why.

Digital Images

At its most basic, a digital image is a function whose value is determined by its position within an array.

I = f[x]

We can start by looking at a 1-dimensional digital image: a single row of black-or-white pixel values. This array can be expressed as something like:

[1 0 0 1 1 0]

However, we can quickly turn this into our very first digital image.

Our image format is binary, with each pixel being either 0 or 1. The pixel values are points in the binary space. To create our image, we simply set 1 to black and 0 to white. Binary image format, binary pixel values of 0 or 1: a 1-dimensional image in binary space.
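
To make that concrete, here's a minimal sketch of I = f[x] in Python with NumPy (my choice; the post doesn't specify a language): the "function" is just array indexing, and rendering means mapping 1 to black and 0 to white.

import numpy as np

# Our 1D binary image: I = f[x] is just array indexing.
image_1d = np.array([1, 0, 0, 1, 1, 0])

def f(x):
    # The image's value at pixel x is whatever is stored there.
    return image_1d[x]

print(f(0), f(1))  # 1 0

# Render: 1 -> black (intensity 0), 0 -> white (intensity 255).
rendered = (1 - image_1d) * 255
print(rendered)  # [  0 255 255   0   0 255]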

Making 2D Images

Let’s move on to our old friend, the 2D color image. In this case:

I = f[x,y]

Our 1D array has become a 2D array of colors, with each pixel containing an array of values that defines its color.

f[x,y] = [r,g,b]

After this, we’re left with an array of arrays. In fact, you can think of our binary case as an array of arrays; the internal pixel arrays just happened to be single values. Pretty much any abstract data structure that is an array of arrays can, for engineering purposes, be treated like a digital image. Except for one catch: pixels are discrete, and, like Romeo and Juliet, they are destined to never be together. ’Cause, you know, they had to be discreet in their rendezvous (I kill me…). Our binary image becomes:

[[0,0,0] [255,255,255] [255,255,255] [0,0,0] [0,0,0] [255,255,255]]
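
Here's the same move as a quick sketch (again assuming Python and NumPy): each binary pixel expands into an [r,g,b] triplet, keeping the 1-to-black, 0-to-white mapping from before.

import numpy as np

# Expand each binary pixel into an [r, g, b] triplet:
# 1 -> black [0, 0, 0], 0 -> white [255, 255, 255].
row = np.array([1, 0, 0, 1, 1, 0])

gray = (1 - row) * 255               # grayscale intensity per pixel
rgb = np.stack([gray] * 3, axis=-1)  # shape (6, 3): an array of arrays

print(rgb.tolist())
# [[0, 0, 0], [255, 255, 255], [255, 255, 255],
#  [0, 0, 0], [0, 0, 0], [255, 255, 255]]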

Pixels are Discrete Values

The one factor that can disqualify an array of arrays is continuous values. See (get it, “see”), each pixel contains its own information, and no information exists between pixels; never shall the two share an extrapolatable, continuous value. The best example of this is a time series: an array of arrays containing data that changes over time. From our first example, we should be able to turn this into a binary image, right?

Well, no, not really. Time is a continuous parameter: you can plug in any time you want and get a value. With a fixed-size image array, you can only “plug in” exact pixel locations. Likewise, there’s only so much display information you can shove into a pixel. Meaning…
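
A small sketch of that gap (NumPy again, my assumption): a continuous signal accepts any t you like, an image only accepts exact integer pixel locations, and the closest you can get is sampling.

import numpy as np

# Continuous: any real-valued t is a legal input.
def signal(t):
    return np.sin(t)

print(signal(1.2345))  # works fine

# Discrete: only exact pixel locations are legal.
image = np.array([1, 0, 0, 1, 1, 0])
print(image[2])    # works fine
# image[1.5]       # IndexError: there is no pixel between 1 and 2

# The best you can do is sample the signal at fixed positions,
# discarding everything in between.
t = np.linspace(0, 2 * np.pi, 6)
binary_row = (np.sin(t) > 0).astype(int)
print(binary_row)  # [0 1 1 0 0 0]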

Pixels Don’t Get Better, They Get More

Which leads to the question: how do we make better images? Well, the answer isn’t “put more data in them,” because we can’t. The best we can do (and it’s pretty good) is make more pixels, because more pixels gets us closer to continuity. In my next blog, I’ll go into detail on the hows and whys of increasing resolution.
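
As a teaser, here's a sketch of what "more pixels" looks like in its simplest form, nearest-neighbor upsampling (my example, not necessarily the method that post will cover): repeating each pixel makes the array denser without creating any new information.

import numpy as np

# Nearest-neighbor upsampling: twice the pixels, same information.
row = np.array([0, 255, 255, 0, 0, 255])
upsampled = np.repeat(row, 2)

print(upsampled)
# [  0   0 255 255 255 255   0   0   0   0 255 255]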

Fun bonus: check out the computer vision demos I have on my GitHub
