NeRF

Representing Scenes as Neural Radiance Fields for View Synthesis

ECCV 2020 Oral

Ben Mildenhall* (UC Berkeley)
Pratul P. Srinivasan* (UC Berkeley)
Matthew Tancik* (UC Berkeley)
Jonathan T. Barron (Google Research)
Ravi Ramamoorthi (UC San Diego)
Ren Ng (UC Berkeley)

* Denotes equal contribution


Abstract & Method

We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views.

Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location.
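To make the input/output contract concrete, here is a toy-sized, randomly initialized sketch of such a field in plain Python. It is not the paper's network, which is much deeper and wider and also encodes its inputs first. The class name `TinyRadianceField`, the layer sizes, the activations (ReLU for density, sigmoid for color) and the choice to route the viewing angles only into the color branch are all illustrative assumptions.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, activation=None):
    """Compute activation(W x + b) for one fully-connected layer."""
    weights, biases = layer
    y = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in y] if activation else y


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


class TinyRadianceField:
    """Hypothetical toy field: (x, y, z, theta, phi) -> (sigma, (r, g, b))."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(3, hidden, rng)                   # position only
        self.sigma_head = init_layer(hidden, 1, rng)              # density head
        self.color_hidden = init_layer(hidden + 2, hidden, rng)   # features + angles
        self.color_head = init_layer(hidden, 3, rng)              # RGB head

    def __call__(self, x, y, z, theta, phi):
        features = dense(self.trunk, [x, y, z], relu)
        sigma = dense(self.sigma_head, features, relu)[0]      # non-negative density
        h = dense(self.color_hidden, features + [theta, phi], relu)
        rgb = dense(self.color_head, h, sigmoid)               # each channel in (0, 1)
        return sigma, tuple(rgb)
```

Usage: `sigma, rgb = TinyRadianceField()(0.1, -0.2, 0.3, 0.5, 1.0)`. Feeding the angles only to the color branch is one simple way to keep density a function of position alone while still letting the emitted color depend on viewing direction.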

We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis.
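The compositing step can be sketched with the standard discrete emission-absorption quadrature: each sample i along a ray contributes T_i * (1 - exp(-sigma_i * delta_i)) * c_i, where delta_i is the spacing to the next sample and T_i is the transmittance accumulated before it. Every operation is differentiable, which is what allows optimizing from posed images alone. The function name `render_ray` and the large `far_delta` assigned to the last sample are assumptions for illustration; sample distances are assumed sorted.

```python
import math


def render_ray(sigmas, colors, t_vals, far_delta=1e10):
    """Composite samples along one ray.

    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    T_i = exp(-sum_{j<i} sigma_j * delta_j).
    Returns (rgb, weights) so the weights can be reused elsewhere.
    """
    n = len(t_vals)
    # Spacing between adjacent samples; the last sample gets a very large delta.
    deltas = [t_vals[i + 1] - t_vals[i] for i in range(n - 1)] + [far_delta]
    transmittance = 1.0
    weights = []
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        w = transmittance * alpha               # contribution of this sample
        weights.append(w)
        for k in range(3):
            rgb[k] += w * color[k]
        transmittance *= 1.0 - alpha            # fraction of light passing through
    return rgb, weights
```

Training then compares these rendered colors against the observed pixels of the input images; the exact loss is not specified in this section.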

Synthetic Results

Here are results on our synthetic dataset of path-traced objects with realistic non-Lambertian materials. The dataset will be released soon.

View-Dependent Appearance

Here we visualize the view-dependent appearance encoded in our NeRF representation by fixing the camera viewpoint but changing the queried viewing direction.
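A sweep like this only changes the direction passed to the radiance query, never the sample positions. The sketch below shows one way to parameterize it; the angle convention (theta as polar angle from +z, phi as azimuth) and the full-turn azimuth sweep are assumptions, and it returns 3D unit vectors because that is a common way to feed a direction to the network instead of raw angles.

```python
import math


def direction_from_angles(theta, phi):
    """Map spherical angles to a unit 3D vector.

    Assumed convention: theta is the polar angle from +z,
    phi is the azimuth in the x-y plane.
    """
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def sweep_directions(theta, n_steps):
    """Directions for a fixed viewpoint, rotating the azimuth one full turn.

    Only these directions change between frames; the points sampled along
    each camera ray (and hence the rendered geometry) stay the same.
    """
    return [direction_from_angles(theta, 2.0 * math.pi * i / n_steps)
            for i in range(n_steps)]
```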

Geometry Visualization

NeRFs are able to represent detailed scene geometry with complex occlusions. Here we visualize depth maps for rendered novel views, computed as the expected termination depth of each camera ray in the encoded volume.
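Expected termination depth reuses the volume rendering weights and averages the sample distances instead of the colors, i.e. depth = sum_i w_i * t_i. A minimal sketch, with the hypothetical name `expected_depth` and the same large last-sample spacing assumed for color rendering:

```python
import math


def expected_depth(sigmas, t_vals, far_delta=1e10):
    """Expected ray termination distance: sum_i w_i * t_i, where
    w_i = T_i * (1 - exp(-sigma_i * delta_i)) are the rendering weights."""
    n = len(t_vals)
    deltas = [t_vals[i + 1] - t_vals[i] for i in range(n - 1)] + [far_delta]
    transmittance = 1.0
    depth = 0.0
    for sigma, t, delta in zip(sigmas, t_vals, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        depth += transmittance * alpha * t   # weight this sample's distance
        transmittance *= 1.0 - alpha
    return depth
```

In this sketch a ray that passes through empty space gets depth 0, because all its weights are zero; how such rays are handled in the visualizations is not stated here.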

Our estimated scene geometry is detailed enough to support mixed-reality applications such as inserting virtual objects into real-world scenes with compelling occlusion effects.
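One simple way to obtain occlusion from a rendered depth map is a hard per-pixel depth test against the virtual object's own depth. This is only an illustrative sketch: the page does not describe how insertion was implemented, and the function name and the `None` convention for uncovered pixels are assumptions.

```python
def composite_virtual_object(scene_rgb, scene_depth, obj_rgb, obj_depth):
    """Per-pixel z-test: show the virtual object only where it is closer to
    the camera than the NeRF's expected surface depth.

    All inputs are same-sized 2D lists; obj_depth holds None wherever the
    virtual object does not cover the pixel (an assumed convention).
    """
    out = []
    for row in range(len(scene_rgb)):
        out_row = []
        for col in range(len(scene_rgb[row])):
            d_obj = obj_depth[row][col]
            if d_obj is not None and d_obj < scene_depth[row][col]:
                out_row.append(obj_rgb[row][col])    # object is in front
            else:
                out_row.append(scene_rgb[row][col])  # scene occludes or no object
        out.append(out_row)
    return out
```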

We can also convert the NeRF to a mesh using marching cubes.
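Marching cubes operates on a regular grid of scalar values, so the first step is to evaluate the learned density on such a grid. The sketch below covers only that sampling step; the grid bounds and resolution are assumptions. The resulting grid could then be handed to an existing marching cubes routine, for example (assuming scikit-image and NumPy are installed) `skimage.measure.marching_cubes(numpy.asarray(grid), level=threshold)`, where the density threshold is a free choice not specified on this page.

```python
def sample_density_grid(density_fn, resolution, bound):
    """Evaluate density_fn(x, y, z) on a resolution^3 grid spanning
    [-bound, bound] on each axis (resolution must be at least 2).

    Returns nested lists indexed as grid[i][j][k] for (x_i, y_j, z_k).
    """
    step = 2.0 * bound / (resolution - 1)
    coords = [-bound + i * step for i in range(resolution)]
    return [[[density_fn(x, y, z) for z in coords] for y in coords] for x in coords]
```

Usage with a stand-in density (a solid ball) instead of a trained network: `grid = sample_density_grid(lambda x, y, z: 50.0 if x*x + y*y + z*z < 0.25 else 0.0, 64, 1.0)`.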

360° Scene Capture with Real Data

NeRFs can even represent real objects captured by a set of inward-facing views, without any background isolation or masking.

Positional Encoding

Fully-connected deep networks are biased to learn low frequencies faster, which makes it hard for them to represent high-frequency variation in geometry and appearance. Surprisingly, applying a simple mapping to the network input mitigates this issue. We explore these input mappings in our follow-up work.
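The mapping used in NeRF passes each input coordinate through sines and cosines at exponentially increasing frequencies, gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p)), applied separately to each coordinate. A minimal sketch in which the number of frequencies L (`num_freqs`) is a free parameter; the paper reports L = 10 for spatial position and L = 4 for viewing direction.

```python
import math


def positional_encoding(p, num_freqs):
    """gamma(p) for one scalar coordinate p:
    (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p))."""
    out = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        out.extend([math.sin(freq * p), math.cos(freq * p)])
    return out


def encode_point(xyz, num_freqs):
    """Encode each coordinate separately and concatenate the results."""
    return [v for c in xyz for v in positional_encoding(c, num_freqs)]
```

For a 3D position with L = 10 this yields a 60-dimensional input, so nearby points that differ only slightly can still map to noticeably different network inputs through the high-frequency terms.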

Citation

@misc{mildenhall2020nerf,
   title={NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis},
   author={Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng},
   year={2020},
   eprint={2003.08934},
   archivePrefix={arXiv},
   primaryClass={cs.CV}
}