How do NeRFs work?

Lars Vagnes
4 min read · Mar 6, 2022
Source: https://www.matthewtancik.com/nerf

Neural radiance fields, or NeRFs, have become a wildly popular area of research. They may very well be the future of computer graphics, holding the promise of real-time photorealism. But what are NeRFs and how do they work? To understand this, we first need to understand the basics of computer graphics.

Basic Computer Graphics

How do we represent a camera?

Source: https://learnopengl.com/Getting-started/Camera

In computer graphics we represent a camera's position and orientation (rotation) in 3D space with four vectors: a position vector, a direction vector (indicating where the camera is looking), a right vector, and an up vector. Together these vectors define a frame of reference for the camera, typically called view space, while the frame of reference defined by the X, Y, Z axes is the world space.

If we want to transform any given point from world space to view space, we can multiply it by the view matrix, which is built from the camera-space unit vectors and the camera position vector.

The view matrix transforms a world-space point into view space
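
To make this concrete, here is a minimal NumPy sketch (my own toy example, not code from any particular engine) that builds such a view matrix from the four camera vectors and applies it to a world-space point:

    import numpy as np

    def view_matrix(position, right, up, direction):
        """Build a 4x4 world-to-view matrix from the camera's frame vectors."""
        rotation = np.stack([right, up, -direction])   # camera axes as rows (looking down -z)
        translation = -rotation @ position             # moves the camera to the origin
        view = np.eye(4)
        view[:3, :3] = rotation
        view[:3, 3] = translation
        return view

    # A camera at z = 3 looking down the negative z-axis.
    cam_pos = np.array([0.0, 0.0, 3.0])
    right = np.array([1.0, 0.0, 0.0])
    up = np.array([0.0, 1.0, 0.0])
    direction = np.array([0.0, 0.0, -1.0])

    view = view_matrix(cam_pos, right, up, direction)
    world_point = np.array([1.0, 1.0, 0.0, 1.0])       # homogeneous coordinates
    print(view @ world_point)                          # -> [1. 1. -3. 1.], the point in view space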

In a typical computer graphics engine, several other frames of reference are used:

  • Local space / object space, one for each object in the scene.
  • World space, where the origin is at the center of the scene.
  • View space / camera space
  • Clip space, where the camera frustum has been transformed to a [-1, 1] cuboid.
  • Screen space
Source: https://developer.unigine.com/en/docs/2.15/code/fundamentals/matrices/
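
These spaces are chained together by matrix multiplications: a model matrix maps local space to world space, the view matrix maps world space to view space, and a projection matrix maps view space to clip space, followed by the perspective divide and the viewport mapping to screen space. A rough sketch of the chain, with identity matrices standing in for the real transforms:

    import numpy as np

    # Identity matrices stand in for the real transforms here; in an engine each is
    # built from the object's pose, the camera pose, and the projection parameters.
    model = np.eye(4)        # local / object space -> world space
    view = np.eye(4)         # world space          -> view / camera space
    projection = np.eye(4)   # view space           -> clip space

    local_point = np.array([0.5, 0.5, 0.0, 1.0])        # homogeneous coordinates

    clip = projection @ view @ model @ local_point      # note the right-to-left order
    ndc = clip[:3] / clip[3]                            # perspective divide -> [-1, 1]

    # Viewport mapping from normalized device coordinates to screen-space pixels.
    width, height = 800, 600
    screen_x = (ndc[0] * 0.5 + 0.5) * width
    screen_y = (1.0 - (ndc[1] * 0.5 + 0.5)) * height    # flip y for image conventions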

If computer graphics is new to you, I found this demo helpful in building my understanding.

https://www.realtimerendering.com/udacity/transforms.html

With a basic understanding of camera representation and the different reference frames we can delve deeper into the NeRF pipeline.

The key idea behind NeRFs is to have a neural network learn a scene-specific function that, for every 3D point (x, y, z) and viewing direction (theta, phi), outputs a density and an RGB color value.
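
In code, that scene function is just a small fully-connected network. Here is a deliberately simplified PyTorch sketch; the actual paper additionally applies a positional encoding to the inputs and feeds the view direction only into the color head, both of which I omit here:

    import torch
    import torch.nn as nn

    class TinyNeRF(nn.Module):
        """Maps a 3D point and a viewing direction to an RGB color and a density."""

        def __init__(self, hidden=256):
            super().__init__()
            # Input: (x, y, z) plus a 3D view direction = 6 numbers.
            self.mlp = nn.Sequential(
                nn.Linear(6, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),              # 3 color channels + 1 density
            )

        def forward(self, xyz, view_dir):
            out = self.mlp(torch.cat([xyz, view_dir], dim=-1))
            rgb = torch.sigmoid(out[..., :3])      # colors in [0, 1]
            sigma = torch.relu(out[..., 3])        # density must be non-negative
            return rgb, sigma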

Source: https://www.matthewtancik.com/nerf

The figure above visualizes the whole NeRF procedure. It consists of the following steps.

  1. Construct a grid of pixels in screen space at the camera position.
  2. Shoot a ray from each pixel position in the camera's viewing direction.
  3. Sample points along each ray and query the neural network for density and color values.
  4. Aggregate the color values of all points along each ray to get an RGB value for each pixel of the image.
  5. Compute the MSE loss by comparing the predicted image with the ground truth (GT).
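
Steps 1–3 boil down to turning pixels into rays and sampling points along them. A rough sketch, assuming a simple pinhole camera with focal length focal and a camera-to-world matrix c2w (the paper additionally jitters the sample positions, i.e. stratified sampling, which I skip here):

    import torch

    def get_rays(height, width, focal, c2w):
        """Steps 1-2: one ray per pixel, expressed in world space."""
        i, j = torch.meshgrid(torch.arange(width, dtype=torch.float32),
                              torch.arange(height, dtype=torch.float32),
                              indexing="xy")
        # Pixel grid -> camera-space ray directions (pinhole camera, -z is forward).
        dirs = torch.stack([(i - width * 0.5) / focal,
                            -(j - height * 0.5) / focal,
                            -torch.ones_like(i)], dim=-1)
        rays_d = dirs @ c2w[:3, :3].T              # rotate directions into world space
        rays_o = c2w[:3, 3].expand(rays_d.shape)   # all rays start at the camera position
        return rays_o, rays_d

    def sample_points(rays_o, rays_d, near=2.0, far=6.0, n_samples=64):
        """Step 3: evenly spaced sample points along each ray."""
        t = torch.linspace(near, far, n_samples)
        points = rays_o[..., None, :] + rays_d[..., None, :] * t[:, None]
        return points, t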

This aggregation step is essentially numerical quadrature of the volume rendering integral; the original paper derives it in detail, but the gist of it is sketched below.
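
In essence it is alpha compositing: each sample's density and spacing give an opacity, the accumulated transmittance downweights samples hidden behind dense ones, and the colors are summed with those weights. A compact sketch of step 4:

    import torch

    def composite(rgb, sigma, t):
        """Step 4: alpha-composite the per-sample colors into one color per ray.

        rgb:   (..., n_samples, 3) colors from the network
        sigma: (..., n_samples)    densities from the network
        t:     (n_samples,)        depths of the samples along the ray
        """
        delta = t[1:] - t[:-1]                                   # spacing between samples
        delta = torch.cat([delta, torch.full((1,), 1e10)])       # pad the last interval
        alpha = 1.0 - torch.exp(-sigma * delta)                  # opacity of each segment
        # Transmittance: how much light survives to reach each sample (exclusive cumprod).
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1),
            dim=-1)[..., :-1]
        weights = alpha * trans
        return (weights[..., None] * rgb).sum(dim=-2)            # final RGB per ray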

In order to better understand these concepts I implemented a very basic version of NeRF.
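
At its core, the training loop just strings the five steps together. A sketch using the helper functions above (not the exact code from the repo; H, W, focal and a dataset of posed images are assumed to exist):

    import torch

    model = TinyNeRF()
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

    for c2w, gt_image in dataset:                      # posed training images (assumed)
        rays_o, rays_d = get_rays(H, W, focal, c2w)    # steps 1-2
        points, t = sample_points(rays_o, rays_d)      # step 3
        dirs = rays_d[..., None, :].expand(points.shape)   # view direction per sample
        rgb, sigma = model(points, dirs)               # query the network
        pred = composite(rgb, sigma, t)                # step 4: image of shape (H, W, 3)
        loss = ((pred - gt_image) ** 2).mean()         # step 5: MSE against ground truth
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # In practice you would render a random batch of rays per step, not the full image.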

After just a few minutes of training we can generate some (very grainy) novel views. Moving the camera along a circular path produces the sequence below.

Novel view synthesis with minimal NeRF
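
The circular path itself is just a sequence of camera-to-world matrices whose positions lie on a circle around the scene, each looking at the origin. Something along these lines (the radius and height values are arbitrary choices for illustration):

    import numpy as np

    def circle_pose(angle, radius=4.0, height=0.5):
        """Camera-to-world matrix for a camera on a circle, looking at the origin."""
        position = np.array([radius * np.cos(angle), radius * np.sin(angle), height])
        forward = -position / np.linalg.norm(position)            # look at the scene center
        right = np.cross(forward, np.array([0.0, 0.0, 1.0]))      # world up is +z here
        right /= np.linalg.norm(right)
        up = np.cross(right, forward)
        c2w = np.eye(4)
        c2w[:3, :3] = np.stack([right, up, -forward], axis=-1)    # columns: right, up, back
        c2w[:3, 3] = position
        return c2w

    poses = [circle_pose(a) for a in np.linspace(0.0, 2 * np.pi, 60, endpoint=False)]
    # Convert each pose with torch.from_numpy(...).float() and feed it to get_rays(...)
    # above to render one frame of the orbit.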

The code can be found here:

Rapid progress toward real-time photorealism

There is so much incredible work being done in this space, and the pace of progress is absolutely mind-boggling.

The original NeRF paper by Mildenhall et al. was published two years ago, in 2020. It required 12+ hours of training and rendered images at roughly 0.033 fps on an NVIDIA V100. Less than two years later, Müller et al. at NVIDIA released Instant Neural Graphics Primitives, where they train NeRFs in seconds and render novel views in real time at over 60 fps on comparable hardware.

Check out their project page here.

And it seems like the rate of progress is only accelerating, hold on tight!
