Introduction to Photometric Stereo: 1 – The Basics

Introduction

Photometric Stereo is a method for capturing the shape of a surface using the shading differences from lighting conditions, which provides pixel-resolution surface normal maps, meaning the only limit to spatial resolution is that of the camera itself. It has the added advantage that it separates the aesthetic texture (albedo) of the surface from the underlying surface shape – it can essentially see through camouflage!

How does it work?

In simple terms, we take several pictures (typically grey-scale) of the same static object, from the same static camera, but we change the lighting direction for each image. By looking at how each pixel’s intensity changes under the different conditions, we can estimate how that surface is oriented, relative to the lights.

Some Key Assumptions

Before we start, there are some import assumptions about the surface we’re imaging and the lights.

Firstly, we assume that the surface exhibits Lambertian reflectance, i.e. the amount of light reflected from a surface is directly proportional to the angle between the light direction and the surface normal.

The other key assumption is that the lights are point sources – they come from infinitely far away, so the rays are parallel and you don’t get differences in illumination across the image.

Realistically, rarely are both these assumptions held completely, and so there are certain steps we can take to improve estimation accuracy, but I’ll take about that in a later post.

Solving the Surface at a Given Pixel

The relationship between the light direction, $L = (L_x, L_y, L_z)$ , albedo, $\rho$ , and surface normal $N = (N_x, N_y , N_z)$ for a given pixel is provided below:

$I (x,y) = \rho(x,y) L . N (x,y) = \rho(x,y) \left[ \begin{array}{c} L_x\\ L_y\\ L_z \end{array}\right]^\top \left[ \begin{array}{c} N_x (x,y) \\ N_y (x,y) \\ N_z (x,y) \end{array}\right]$

We assume that $L$ is known, and the $N (x,y)$ is a Unit Vector, thus we are left with three unknowns, $\rho$ , $N_x (x,y)$ and $N_y (x,y)$ . Therefore we only need 3 differently-lit images to estimate the surface normal and albedo pair at each pixel.

By re-arranging the above equation, we can isolate the lighting directions from the surface normals and albedo, such that we first solve for $G(x,y) = \rho(x,y) N(x,y)^ \top$ , the we’ll split $G(x,y)$ back into the $\rho$ and $N$ later.

$I = G(x,y)^\top L = L^\top G(x,y)$

Now we simply solve G for a given pixel using the using normal equations as follows:

$G(x, y) = (L^\top L)^{-1} L^\top \cdot I(x, y)$

We can work out $N(x,y)$ and $\rho(x,y)$ from $G(x,y)$ trivially:

$\rho(x,y) = ||G (x,y) || = \sqrt{G_x (x,y) ^2 + G_y (x,y) ^2 + G_z (x,y) ^2}$

$N(x,y) = \frac{G(x,y)}{\rho}$

Solving for all Pixels

We could loop over every pixel and work calculate that value, but it’s not overly efficient. Instead, we can take advantage of a technique known as “broadcasting” (in Python), and also supported by Matlab. We get all three greyscale images and flatten them into column vectors, then vertically stack them into a new matrix, which we’ll call $J$ . $J$ is therefore an $3 \times n$ matrix, where $n$ is the number of pixels.

Now we can solve similarly to the above for all pixels at simultaneously, just as we do above:

$G = (L^\top L)^{-1} L^\top \cdot J$

Once we’ve calculated G, we can normalise it as above to get the albedo and the normal map. Finally, we reshape it back to the original image shape.

The surface normal map can then be visually represented as a 3-channel image, where $(R, G, B)^\top \mapsto (N_x, N_y, N_z)^ \top$ . Thus we can produce colour visualisations like the one below:

Improving Robustness

As mentioned above, three images are the minimum required to implement photometric stereo; however, heavily shadowed pixels, or those under specular reflection, offer incorrect information, which is a major source of error. It is therefore common to create an overdetermined system by including additional light-sources.

In future posts, I’ll look at how we might be able to use such an over-determined system and what effect this has on the end-result.

Code

Some basic demo code, with example data is available on my GitHub repo.

References

[1] Woodham, R.J. (1980) Photometric Method For Determining Surface Orientation From Multiple Images. Optical Engineering [online]. 19 (1), pp.191139.

[2] Jiuai Sun, Melvyn Smith, Abdul Farooq and Lyndon Smith. (2010) Counter camouflage through the removal of reflectance.IEEE the 3rd International Conference on Machine Vision. Hong Kong.

Ian Hales

Computer Vision | Augmented Reality | Software Engineering