How do NeRFs work?

Lars Vagnes
4 min read · Mar 6, 2022
Source: https://www.matthewtancik.com/nerf

Neural radiance fields, or NeRFs, have become a wildly popular area of research. They may very well be the future of computer graphics, holding the promise of real-time photorealism. But what are NeRFs and how do they work? To understand this, we first need to understand the basics of computer graphics.

Basic Computer Graphics

How do we represent a camera?

Source: https://learnopengl.com/Getting-started/Camera

In computer graphics we represent a camera's position and orientation (rotation) in 3D space with four vectors: a position vector, a direction vector (indicating where the camera is looking), a right vector, and an up vector. Together these vectors define a frame of reference for the camera, typically called view space, while the frame of reference defined by the X, Y, Z axes is the world space.

If we want to transform any given point from world space to view space, we can multiply it by the view matrix, which is built from the camera-space unit vectors and the camera position vector.

The view matrix transforms a world-space point into view space
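
To make this concrete, here is a minimal NumPy sketch (my own toy example, not code from any particular engine) that builds such a view matrix from the four camera vectors and applies it to a world-space point:

    import numpy as np

    def view_matrix(position, right, up, direction):
        """Build a 4x4 world-to-view matrix from the camera's frame vectors."""
        rotation = np.stack([right, up, -direction])   # camera axes as rows (looking down -z)
        translation = -rotation @ position             # moves the camera to the origin
        view = np.eye(4)
        view[:3, :3] = rotation
        view[:3, 3] = translation
        return view

    # A camera at z = 3 looking down the negative z-axis.
    cam_pos = np.array([0.0, 0.0, 3.0])
    right = np.array([1.0, 0.0, 0.0])
    up = np.array([0.0, 1.0, 0.0])
    direction = np.array([0.0, 0.0, -1.0])

    view = view_matrix(cam_pos, right, up, direction)
    world_point = np.array([1.0, 1.0, 0.0, 1.0])       # homogeneous coordinates
    print(view @ world_point)                          # -> [1. 1. -3. 1.], the point in view space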

In a typical computer graphics engine, several other frames of reference are used:

  • Local space / object space, one for each object in the scene.
  • World space, where the origin is at the center of the scene.
  • View space / camera space
  • Clip space, where the camera frustum has been transformed to a [-1, 1] cuboid.
  • Screen space
Source: https://developer.unigine.com/en/docs/2.15/code/fundamentals/matrices/
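
These spaces are chained together by matrix multiplications: a model matrix maps local space to world space, the view matrix maps world space to view space, and a projection matrix maps view space to clip space, followed by the perspective divide and the viewport mapping to screen space. A rough sketch of the chain, with identity matrices standing in for the real transforms:

    import numpy as np

    # Identity matrices stand in for the real transforms here; in an engine each is
    # built from the object's pose, the camera pose, and the projection parameters.
    model = np.eye(4)        # local / object space -> world space
    view = np.eye(4)         # world space          -> view / camera space
    projection = np.eye(4)   # view space           -> clip space

    local_point = np.array([0.5, 0.5, 0.0, 1.0])        # homogeneous coordinates

    clip = projection @ view @ model @ local_point      # note the right-to-left order
    ndc = clip[:3] / clip[3]                            # perspective divide -> [-1, 1]

    # Viewport mapping from normalized device coordinates to screen-space pixels.
    width, height = 800, 600
    screen_x = (ndc[0] * 0.5 + 0.5) * width
    screen_y = (1.0 - (ndc[1] * 0.5 + 0.5)) * height    # flip y for image conventions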

If computer graphics is new to you, I found this demo helpful in building my understanding.

https://www.realtimerendering.com/udacity/transforms.html

With a basic understanding of camera representation and the different reference frames we can delve deeper into the NeRF pipeline.

The key idea behind NeRFs is to have a neural network learn a scene-specific function that, for every 3D point (x, y, z) and viewing direction (theta, phi), outputs a density and an RGB color value.
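
In code, that scene function is just a small fully-connected network. Here is a deliberately simplified PyTorch sketch; the actual paper additionally applies a positional encoding to the inputs and feeds the view direction only into the color head, both of which I omit here:

    import torch
    import torch.nn as nn

    class TinyNeRF(nn.Module):
        """Maps a 3D point and a viewing direction to an RGB color and a density."""

        def __init__(self, hidden=256):
            super().__init__()
            # Input: (x, y, z) plus a 3D view direction = 6 numbers.
            self.mlp = nn.Sequential(
                nn.Linear(6, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),              # 3 color channels + 1 density
            )

        def forward(self, xyz, view_dir):
            out = self.mlp(torch.cat([xyz, view_dir], dim=-1))
            rgb = torch.sigmoid(out[..., :3])      # colors in [0, 1]
            sigma = torch.relu(out[..., 3])        # density must be non-negative
            return rgb, sigma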

Source: https://www.matthewtancik.com/nerf

The figure above visualizes the whole NeRF procedure. It consists of the following steps.

  1. Construct a grid of pixels in screen space at the camera position.
  2. Shoot a ray from each pixel position in the camera's viewing direction.
  3. Sample points along each ray and query the neural network for density and color values.
  4. Aggregate the color values of all points along each ray to get an RGB value for each pixel of the image.
  5. Compute the MSE loss by comparing the predicted image with the ground truth (GT).
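
Steps 1–3 boil down to turning pixels into rays and sampling points along them. A rough sketch, assuming a simple pinhole camera with focal length focal and a camera-to-world matrix c2w (the paper additionally jitters the sample positions, i.e. stratified sampling, which I skip here):

    import torch

    def get_rays(height, width, focal, c2w):
        """Steps 1-2: one ray per pixel, expressed in world space."""
        i, j = torch.meshgrid(torch.arange(width, dtype=torch.float32),
                              torch.arange(height, dtype=torch.float32),
                              indexing="xy")
        # Pixel grid -> camera-space ray directions (pinhole camera, -z is forward).
        dirs = torch.stack([(i - width * 0.5) / focal,
                            -(j - height * 0.5) / focal,
                            -torch.ones_like(i)], dim=-1)
        rays_d = dirs @ c2w[:3, :3].T              # rotate directions into world space
        rays_o = c2w[:3, 3].expand(rays_d.shape)   # all rays start at the camera position
        return rays_o, rays_d

    def sample_points(rays_o, rays_d, near=2.0, far=6.0, n_samples=64):
        """Step 3: evenly spaced sample points along each ray."""
        t = torch.linspace(near, far, n_samples)
        points = rays_o[..., None, :] + rays_d[..., None, :] * t[:, None]
        return points, t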

This aggregation step is essentially numerical quadrature of the volume rendering integral; the original paper derives it in detail, but the gist of it is sketched below.
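
In essence it is alpha compositing: each sample's density and spacing give an opacity, the accumulated transmittance downweights samples hidden behind dense ones, and the colors are summed with those weights. A compact sketch of step 4:

    import torch

    def composite(rgb, sigma, t):
        """Step 4: alpha-composite the per-sample colors into one color per ray.

        rgb:   (..., n_samples, 3) colors from the network
        sigma: (..., n_samples)    densities from the network
        t:     (n_samples,)        depths of the samples along the ray
        """
        delta = t[1:] - t[:-1]                                   # spacing between samples
        delta = torch.cat([delta, torch.full((1,), 1e10)])       # pad the last interval
        alpha = 1.0 - torch.exp(-sigma * delta)                  # opacity of each segment
        # Transmittance: how much light survives to reach each sample (exclusive cumprod).
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1),
            dim=-1)[..., :-1]
        weights = alpha * trans
        return (weights[..., None] * rgb).sum(dim=-2)            # final RGB per ray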

In order to better understand these concepts I implemented a very basic version of NeRF.
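
At its core, the training loop just strings the five steps together. A sketch using the helper functions above (not the exact code from the repo; H, W, focal and a dataset of posed images are assumed to exist):

    import torch

    model = TinyNeRF()
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

    for c2w, gt_image in dataset:                      # posed training images (assumed)
        rays_o, rays_d = get_rays(H, W, focal, c2w)    # steps 1-2
        points, t = sample_points(rays_o, rays_d)      # step 3
        dirs = rays_d[..., None, :].expand(points.shape)   # view direction per sample
        rgb, sigma = model(points, dirs)               # query the network
        pred = composite(rgb, sigma, t)                # step 4: image of shape (H, W, 3)
        loss = ((pred - gt_image) ** 2).mean()         # step 5: MSE against ground truth
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # In practice you would render a random batch of rays per step, not the full image.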

After just a few minutes of training we can generate some (very grainy) novel views. Moving the camera along a circular path produces the sequence below.

Novel view synthesis with minimal NeRF
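
The circular path itself is just a sequence of camera-to-world matrices whose positions lie on a circle around the scene, each looking at the origin. Something along these lines (the radius and height values are arbitrary choices for illustration):

    import numpy as np

    def circle_pose(angle, radius=4.0, height=0.5):
        """Camera-to-world matrix for a camera on a circle, looking at the origin."""
        position = np.array([radius * np.cos(angle), radius * np.sin(angle), height])
        forward = -position / np.linalg.norm(position)            # look at the scene center
        right = np.cross(forward, np.array([0.0, 0.0, 1.0]))      # world up is +z here
        right /= np.linalg.norm(right)
        up = np.cross(right, forward)
        c2w = np.eye(4)
        c2w[:3, :3] = np.stack([right, up, -forward], axis=-1)    # columns: right, up, back
        c2w[:3, 3] = position
        return c2w

    poses = [circle_pose(a) for a in np.linspace(0.0, 2 * np.pi, 60, endpoint=False)]
    # Convert each pose with torch.from_numpy(...).float() and feed it to get_rays(...)
    # above to render one frame of the orbit.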

The code can be found here:

Rapid progress toward real-time photorealism

There is so much incredible work being done in this space, and the pace of progress is absolutely mind-boggling.

The original NeRF paper by Mildenhall et al. was published two years ago, in 2020. It required 12+ hours of training and rendered images at roughly 0.033 fps on an NVIDIA V100. Less than two years later, Müller et al. at NVIDIA released Instant Neural Graphics Primitives, where they train NeRFs in seconds and render novel views in real time at over 60 fps on comparable hardware.

Check out their project page here.

And it seems like the rate of progress is only accelerating, hold on tight!
