Paper Summary: DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation

Karan Uppal
Jul 27, 2023


Park, Jeong Joon, et al. "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.


Abstract: Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data. DeepSDF, like its classical counterpart, represents a shape’s surface by a continuous volumetric field: the magnitude of a point in the field represents the distance to the surface boundary and the sign indicates whether the region is inside (-) or outside (+) of the shape. Hence our representation implicitly encodes a shape’s boundary as the zero-level-set of the learned function while explicitly representing the classification of space as being part of the shape’s interior or not. While classical SDFs, in either analytical or discretized voxel form, typically represent the surface of a single shape, DeepSDF can represent an entire class of shapes. Furthermore, we show state-of-the-art performance for learned 3D shape representation and completion while reducing the model size by an order of magnitude compared with previous work.

1. Introduction

Prior research in computer graphics and 3D computer vision has proposed numerous approaches for representing 3D shapes. Such methods are useful for storing memory-efficient representations of known shapes, generating new shapes, and fixing/reconstructing shapes based on limited or noisy data.

But wait a minute! What is 3D shape representation?

Geometry processing is largely about applying algorithms to geometric models. For any such algorithm, we can identify a characteristic set of operations that dominate the computation, and we therefore have to choose a representation that supports an efficient implementation of those operations. Common representations include spline surfaces, triangle meshes, point clouds, regular voxel grids, etc.

Directly storing a 3D shape using a point cloud, mesh, or voxels requires a lot of memory. Instead, we usually want to store an indirect representation of the shape that’s more efficient. A simple 2D example is using the parametric form (x = r cos θ, y = r sin θ) to represent a circle.

The approach we will look at is called the signed distance function (SDF). Given a spatial [x, y, z] point as input, SDFs will output the distance from that point to the nearest surface of the underlying object being represented. The sign of the SDF’s output indicates whether that spatial point is inside (negative) or outside (positive) of the object’s surface. Using this, the surface can be identified by finding the locations at which the SDF is equal to zero.
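To make the sign convention concrete, here is a minimal analytic SDF for a sphere (not a learned one): negative inside, positive outside, and zero exactly on the surface.

```python
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=1.0):
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return np.linalg.norm(np.asarray(p, dtype=float) - center) - radius

# Points inside the sphere get negative distances, points outside positive,
# and the surface is exactly the zero-level set of the function.
print(sphere_sdf([0.0, 0.0, 0.0]))   # -1.0 (center, deepest inside)
print(sphere_sdf([2.0, 0.0, 0.0]))   # 1.0 (one unit outside)
print(sphere_sdf([1.0, 0.0, 0.0]))   # 0.0 (on the surface)
```

DeepSDF replaces this closed-form function with a neural network that approximates the same mapping for arbitrary shapes.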

Deep learning can be used to represent 3D shapes. To do this, we train a neural network to output a representation of a 3D shape, allowing representations for a variety of shapes to be stored indirectly within the network’s weights. We can then query the network to produce new shapes; more precisely, such a network is a generative model.

In this paper, the authors present a novel representation approach for generative 3D modelling that is efficient, expressive, and fully continuous, using the concept of an SDF. Their contributions are as follows:

  • the formulation of generative shape-conditioned 3D modelling with a continuous implicit surface
  • a learning method for 3D shapes based on a probabilistic auto-decoder
  • the demonstration and application of this formulation to shape modelling and completion

2. Related Work

To learn compact data representations, we can use an auto-encoder: an encoder-decoder architecture that learns a latent representation of the input and then reconstructs the input from that representation. However, the authors propose using a decoder-only network, in which the latent vector assigned to each data point and the decoder weights are both optimized through backpropagation. At inference time, with the decoder parameters fixed, an optimal latent vector is searched for to match the new observation. The authors refer to this class of networks as auto-decoders.
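A minimal NumPy sketch of the auto-decoder idea, using a toy linear decoder on synthetic vectors rather than the paper's SDF network: each datum owns a trainable latent code, and codes and decoder weights are optimized jointly by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 5 "shapes", each a 6-dimensional vector generated from a
# 2-dimensional ground-truth latent space (so a 2-D code can explain them).
Z_true = rng.normal(size=(5, 2))
W_true = rng.normal(size=(6, 2))
Y = Z_true @ W_true.T

# Auto-decoder: no encoder. Each datum gets its own trainable latent code,
# optimized jointly with the decoder weights by gradient descent.
W = rng.normal(scale=0.1, size=(6, 2))   # decoder weights
Z = rng.normal(scale=0.1, size=(5, 2))   # one latent code per datum

lr = 0.02
for _ in range(5000):
    err = Z @ W.T - Y        # reconstruction error of the linear decoder
    W -= lr * (err.T @ Z)    # gradient step on the decoder weights
    Z -= lr * (err @ W)      # gradient step on the per-datum latent codes

final_err = np.abs(Z @ W.T - Y).max()
print(final_err)  # near zero: codes and decoder jointly explain the data
```

At inference on a new datum, one would freeze `W` and run the same descent on a fresh code only, which is exactly the search the authors describe.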

I found a reference for the same idea in a 1998 paper titled “An Input-Training Neural Network Approach for Gross Error Detection and Sensor Replacement”, in which auto-decoders are called Input-Training Networks.

3. Modeling SDFs with Neural Networks

The authors describe modeling shapes as the zero iso-surface decision boundaries of feed-forward networks trained to represent SDFs:

The key idea is to directly regress the continuous SDF from point samples using deep neural networks so that the resulting trained network is able to predict the SDF value of a given query point. The most direct application of this approach is to train a single deep network for a given target shape.

Given a target shape, the authors prepare a set X of pairs comprising 3D point samples and their SDF values. Using these, a multi-layer fully-connected neural network is trained to minimise the following loss function:
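The equation did not survive as text here; per the original paper, the loss is a clamped L1 distance between the predicted and true SDF values:

```latex
\mathcal{L}\big(f_\theta(x), s\big) = \big|\operatorname{clamp}(f_\theta(x), \delta) - \operatorname{clamp}(s, \delta)\big|,
\qquad \operatorname{clamp}(x, \delta) := \min\big(\delta, \max(-\delta, x)\big)
```

The clamping parameter δ controls how far from the surface the SDF is expected to be accurate, concentrating model capacity near the surface.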

One nice property of this approach is that accurate normals can be computed by calculating the derivative with respect to the query point.
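With a trained network this derivative comes from automatic differentiation; the same idea can be illustrated on the analytic sphere SDF with central finite differences, where the normalized gradient recovers the surface normal.

```python
import numpy as np

def sphere_sdf(p, radius=1.0):
    """Analytic SDF of a sphere centered at the origin."""
    return np.linalg.norm(p) - radius

def sdf_normal(sdf, p, eps=1e-5):
    """Approximate the surface normal as the normalized gradient of the SDF,
    via central finite differences (a trained network would use autograd)."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        grad[i] = (sdf(p + d) - sdf(p - d)) / (2 * eps)
    return grad / np.linalg.norm(grad)

# On a unit sphere, the normal at a surface point is the point itself.
print(sdf_normal(sphere_sdf, [1.0, 0.0, 0.0]))  # ~[1, 0, 0]
```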

4. Learning the Latent Space of Shapes

Training a specific neural network for each shape is neither feasible nor very useful. Ideally, we would want to model a variety of shapes with a single neural network. To achieve this, the authors introduce a latent vector z as a second input to the neural network, which can be thought of as encoding the desired shape. The network is now a function of a latent code and a query point, outputting the shape’s approximate SDF at that point.

How do we obtain the latent vector for a shape?

The authors use auto-decoder networks for learning a shape embedding without an encoder. Given a dataset of N shapes, the authors prepare, for each shape, a set of K sample points and their signed distance values. The latent vectors are initialized randomly from N(0, 0.01²). Using the same loss function as before, the authors maximise the joint log posterior over all training shapes with respect to the individual latent vectors and the network parameters:
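Written out as an equation (following the paper), the training objective jointly optimizes the decoder parameters θ and the per-shape codes z_i, with the second term acting as a Gaussian prior on the codes:

```latex
\arg\min_{\theta,\,\{z_i\}_{i=1}^{N}} \;\sum_{i=1}^{N} \left( \sum_{j=1}^{K} \mathcal{L}\big(f_\theta(z_i, x_j), s_j\big) + \frac{1}{\sigma^2}\,\lVert z_i \rVert_2^2 \right)
```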

At inference time, the network parameters are fixed, and the shape code for each shape can be estimated via a maximum-a-posterior estimation:
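Per the paper, this amounts to minimizing the same loss plus the code prior, but over the latent vector alone with θ frozen:

```latex
\hat{z} = \arg\min_{z} \;\sum_{(x_j,\, s_j) \in X} \mathcal{L}\big(f_\theta(z, x_j), s_j\big) + \frac{1}{\sigma^2}\,\lVert z \rVert_2^2
```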

Crucially, this formulation is valid for SDF samples of arbitrary size and distribution. This implies that DeepSDF can handle any form of partial observations such as depth maps.

5. Data Preparation

The authors make use of the ShapeNet dataset which provides complete 3D shape meshes. To prepare data, they start by normalizing each mesh to a unit sphere and sampling 500,000 spatial points, with more aggressive sampling near the surface.
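A hypothetical sketch of the surface-biased sampling step: perturb surface samples with Gaussian noise at two variances so that most training points land near the zero-level set. The function name and the variance values here are illustrative, not the paper's exact parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_near_surface(surface_points, n, variances=(0.005, 0.0005)):
    """Illustrative surface-biased sampling: jitter surface points with
    Gaussian noise at two variances (values chosen for illustration)."""
    samples = []
    for var in variances:
        idx = rng.integers(0, len(surface_points), size=n // len(variances))
        noise = rng.normal(scale=np.sqrt(var), size=(len(idx), 3))
        samples.append(surface_points[idx] + noise)
    return np.concatenate(samples)

# Surface points of a unit sphere (normalized random directions).
dirs = rng.normal(size=(1000, 3))
surface = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)

pts = sample_near_surface(surface, 10000)
print(pts.shape)  # (10000, 3)
# Most samples lie close to the surface: |distance to the sphere| is small.
print(np.abs(np.linalg.norm(pts, axis=1) - 1.0).mean())
```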

6. Results

To demonstrate DeepSDF’s generalization capability and its capacity to describe geometric details, the authors propose four experiments to test its ability to:

1. Represent training data

First, they evaluate the capacity of the model to represent known shapes (already in the training set) from only a restricted-size latent code. The metric used for comparison is Chamfer Distance (CD), which is the sum of the squared distances between nearest-neighbour correspondences of two point clouds. The authors find that their proposed model significantly outperforms existing state-of-the-art models.
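The Chamfer Distance described above can be sketched in a few lines of NumPy: for each point in one cloud, take the squared distance to its nearest neighbour in the other cloud, average, and symmetrize.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N,3) and b (M,3):
    mean squared nearest-neighbour distance, accumulated in both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(a, a))                    # 0.0: identical clouds
b = a + np.array([0.1, 0.0, 0.0])                # shift one cloud slightly
print(chamfer_distance(a, b))                    # small but nonzero
```

The brute-force pairwise matrix is fine for small clouds; real evaluations typically use a KD-tree for the nearest-neighbour search.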

2. Use learned feature representations

For encoding unknown shapes, it again significantly outperforms other models on a wide variety of shape classes (chair, plane, table, lamp, sofa). They note that other models fail to represent the fine details of the shapes but DeepSDF leads to more detailed reconstructions.

3. Apply shape priors to complete partial shapes

Shape completion amounts to solving for the latent code that best explains a partial shape observation. Given the latent vector, a complete shape can be rendered easily using the model. The authors test the completion scheme using single-view depth observations, sampling points from the depth image. They note that their model produces more visually pleasing and accurate shape reconstructions.

4. Learn smooth and complete shape embeddings

To show that the learned shape embedding is complete and continuous, the authors render the results of the decoder when a pair of shapes is interpolated in the latent vector space. They note that the results suggest the embedded continuous SDFs correspond to meaningful shapes and that their representation extracts common interpretable shape features.
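The interpolation itself is just a linear blend of two latent codes; decoding each intermediate code with the trained network produces the morph the authors render. A small sketch (the 256-dimensional code size and the zero/one stand-in codes are illustrative):

```python
import numpy as np

def interpolate_codes(z_a, z_b, steps=5):
    """Linearly interpolate between two latent shape codes; decoding each
    intermediate code yields a gradual morph between the two shapes."""
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - t) * z_a + t * z_b for t in ts])

# Stand-ins for two learned shape codes (e.g. a chair and a sofa).
z_chair, z_sofa = np.zeros(256), np.ones(256)
codes = interpolate_codes(z_chair, z_sofa, steps=5)
print(codes.shape)   # (5, 256)
print(codes[2][0])   # 0.5: the midpoint blends the two codes equally
```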

7. Conclusion

DeepSDF significantly outperforms the benchmarked methods across shape representation and completion tasks, while using significantly less memory than previous models. The authors note that while point-wise forward sampling of a shape’s SDF is efficient, shape completion tasks take considerably more time due to the need for optimization over the latent vector. They hope to improve performance by replacing Adam with more efficient methods such as Gauss-Newton. On another note, DeepSDF currently assumes models are in a canonical pose, so completion in the wild requires additional optimization.

8. Final Words

The paper is very well written; even as a beginner to 3D deep learning, I was able to follow along easily. The approach is also novel yet simple, and it produces significant results.

9. References

  • Park, Jeong Joon, et al. “DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
  • Reddy, V. N., and M. L. Mavrovouniotis. “An input-training neural network approach for gross error detection and sensor replacement.” Chemical Engineering Research and Design 76.4 (1998): 478–489.
