Learning across different 3D representations

High-resolution voxel grids are impractical in terms of memory and compute, since both grow cubically with resolution
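As a rough back-of-the-envelope check, the sketch below computes the memory of a dense cubic grid. The helper name `dense_grid_bytes` and the defaults (one float32 value per voxel) are assumptions chosen for illustration only.

```python
def dense_grid_bytes(resolution: int, channels: int = 1, bytes_per_value: int = 4) -> int:
    """Bytes needed to store a dense resolution^3 grid with `channels` values per voxel."""
    return resolution ** 3 * channels * bytes_per_value


# Doubling the resolution multiplies the memory by 8.
for r in (32, 64, 128, 256, 512):
    print(f"{r}^3 grid: {dense_grid_bytes(r) / 2**20:.3f} MiB")
```

With these assumptions a single-channel 512^3 grid already takes 512 MiB, before any feature channels or activations are counted.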

OctNet	Use Octree as input to the network
Octree Generating Networks	Have network produce an Octree

Submanifold sparse convolutions #

Only apply convolutions where there is geometry -> saves compute
Can be combined with hash-based structures such as spatial hashing to store only occupied positions -> saves memory
Together: allows processing the scene at a higher resolution
Sparse Generative Neural Networks
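A minimal sketch of the idea above, assuming a plain Python dict as the hash structure (voxel coordinate -> feature vector). The function name `submanifold_conv3d` and the weight layout are hypothetical and not taken from any particular library; real implementations are heavily optimized.

```python
import numpy as np
from itertools import product


def submanifold_conv3d(features, weights):
    """3x3x3 convolution evaluated only at active (occupied) voxels.

    features: dict mapping (x, y, z) -> array of shape (C_in,)
    weights:  dict mapping offset (dx, dy, dz) -> array of shape (C_in, C_out)
    The output has exactly the same active set as the input.
    """
    c_out = next(iter(weights.values())).shape[1]
    out = {}
    for (x, y, z) in features:                      # loop over occupied voxels only
        acc = np.zeros(c_out)
        for (dx, dy, dz), w in weights.items():
            neighbor = features.get((x + dx, y + dy, z + dz))
            if neighbor is not None:                # empty neighbors contribute nothing
                acc += neighbor @ w
        out[(x, y, z)] = acc
    return out


# Toy scene: two adjacent voxels and one isolated voxel.
features = {
    (0, 0, 0): np.array([1.0]),
    (1, 0, 0): np.array([2.0]),
    (5, 5, 5): np.array([4.0]),
}
weights = {off: np.ones((1, 1)) for off in product((-1, 0, 1), repeat=3)}
result = submanifold_conv3d(features, weights)
```

Memory scales with the number of occupied voxels (the dict size), and compute with occupied voxels times kernel size, rather than with the full grid volume.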

From multi-view #


MVCNN	Multi-view images -> classification

Render point clouds as spheres and run MVCNN on the renderings
Multi-view considerations:
1. What viewpoints to use?
2. How many viewpoints?
3. How to handle noisy/incomplete data?
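To make the first two questions concrete, here is a small sketch of one possible setup: cameras placed on a ring around the object, with per-view features merged by an element-wise max over views. The names `ring_cameras` and `view_pool` and the defaults (12 views, 30 degree elevation, radius 2) are assumptions for illustration; the per-view features are random stand-ins for CNN outputs.

```python
import numpy as np


def ring_cameras(n_views=12, elevation_deg=30.0, radius=2.0):
    """Camera positions evenly spaced in azimuth, all looking at the origin."""
    az = np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False)
    el = np.deg2rad(elevation_deg)
    return radius * np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.full(n_views, np.sin(el))],
        axis=1,
    )


def view_pool(view_features):
    """Merge per-view features of shape (n_views, C) into one descriptor of shape (C,)."""
    return view_features.max(axis=0)


cams = ring_cameras()
rng = np.random.default_rng(0)
view_features = rng.normal(size=(len(cams), 8))   # stand-in for per-view CNN features
shape_descriptor = view_pool(view_features)
```

Because the max is taken over the view axis, the descriptor does not depend on the order of the views, and a missing or noisy view can only affect the channels where it happens to be the maximum.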

Convolutions on points #

PointNet++ already introduced hierarchical, multi-resolution point processing to capture local neighborhood structure.
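The hierarchy can be sketched as "sample a few centroids, then group the points around each of them". Below is a simplified NumPy version using farthest point sampling and a radius (ball) query; the function names and the toy data are assumptions, and no learned layers are included.

```python
import numpy as np


def farthest_point_sampling(points, n_samples, start=0):
    """Pick n_samples indices so that each new point is farthest from those already chosen."""
    chosen = np.empty(n_samples, dtype=np.int64)
    min_dist = np.full(points.shape[0], np.inf)     # squared distance to nearest chosen point
    idx = start
    for i in range(n_samples):
        chosen[i] = idx
        d = np.sum((points - points[idx]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        idx = int(np.argmax(min_dist))
    return chosen


def ball_query(points, centers, radius):
    """For every center, return the indices of all points within `radius`."""
    groups = []
    for c in centers:
        d = np.linalg.norm(points - c, axis=1)
        groups.append(np.flatnonzero(d <= radius))
    return groups


# Toy cloud: four points on the x-axis.
points = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0]])
centroids = farthest_point_sampling(points, 3)
groups = ball_query(points, points[[0, 3]], radius=1.5)
```

Each group would then be fed to a small shared network to produce one feature per centroid, and the procedure is repeated on the centroids to obtain coarser levels.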

Convolutions on meshes #

Meshes are graphs
How to define convolutions over graphs?
- Should be agnostic to number of vertices / faces

Geometric Operators #

Tangent planes
Geodesic distances
Laplacian mesh operator
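As a simple instance of the last operator, the sketch below builds the uniform (combinatorial) graph Laplacian L = D - A from a triangle list. This is an assumption made for brevity: geometry-aware variants such as cotangent weights are also common on meshes. The function name `uniform_laplacian` is hypothetical.

```python
import numpy as np


def uniform_laplacian(n_vertices, faces):
    """Combinatorial Laplacian L = D - A of the mesh's vertex graph."""
    A = np.zeros((n_vertices, n_vertices))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):       # the three edges of the triangle
            A[i, j] = A[j, i] = 1.0
    D = np.diag(A.sum(axis=1))                      # vertex degrees on the diagonal
    return D - A


# Tetrahedron: every vertex is connected to the other three.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
L = uniform_laplacian(4, faces)
```

Applied to a per-vertex signal, `L @ x` measures how much each vertex differs from its neighbors, so constant signals map to zero.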

Spectral graph convolution #

Apply the convolutions in frequency domain
“convolution in spatial domain == multiplication in frequency domain”

Convert the mesh signal to the frequency domain by projecting it onto the eigenvectors of the Laplacian mesh operator.
Multiply with the kernel, then transform back.
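The transform-and-multiply steps above can be sketched with a dense eigendecomposition. The example assumes a symmetric Laplacian and a scalar signal per vertex; the function name `spectral_filter`, the ring graph and the heat-kernel filter exp(-lambda) are illustrative choices, not a specific published method.

```python
import numpy as np


def spectral_filter(L, x, g):
    """Filter a per-vertex signal x with spectral response g(eigenvalue)."""
    eigvals, U = np.linalg.eigh(L)      # columns of U form the graph "Fourier basis"
    x_hat = U.T @ x                     # forward transform
    y_hat = g(eigvals) * x_hat          # multiplication in the frequency domain
    return U @ y_hat                    # inverse transform


# Ring graph with 6 vertices as a stand-in for a mesh.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

x = np.array([1.0, 0.0, 2.0, 0.0, 3.0, 0.0])
smooth = spectral_filter(L, x, lambda lam: np.exp(-lam))   # low-pass: damps high frequencies
```

Note that the filter is defined per eigenvalue, so it depends on the eigenbasis of this particular graph and is not automatically localized around a vertex.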

No guarantee that filters will have local support on the graph
No shift invariance
No pooling

Message Passing Graph Neural Networks #

See Graph Neural Network
Vertices have features associated to them
Messages get passed from vertices to neighbors
- Messages get aggregated at vertices (sum or average)
This is run iteratively a few times over the whole graph
Optionally, edges also have features that get aggregated
See: Scan2Mesh
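One round of the scheme described above can be sketched as follows. This is a minimal NumPy version assuming an undirected edge list, vertex features only, and a ReLU update; the function name and the weight matrices `W_self` and `W_nbr` are hypothetical, and edge features are left out.

```python
import numpy as np


def message_passing_step(h, edges, W_self, W_nbr, aggregate="mean"):
    """One message-passing round on an undirected graph.

    h:     (n_vertices, C_in) vertex features
    edges: iterable of (i, j) vertex index pairs
    """
    h = np.asarray(h, dtype=float)
    agg = np.zeros_like(h)
    deg = np.zeros(h.shape[0])
    for i, j in edges:                  # each vertex sends its feature to its neighbors
        agg[j] += h[i]
        agg[i] += h[j]
        deg[i] += 1
        deg[j] += 1
    if aggregate == "mean":
        agg = agg / np.maximum(deg, 1)[:, None]
    return np.maximum(0.0, h @ W_self + agg @ W_nbr)   # ReLU update


# Path graph 0 - 1 - 2 with one feature per vertex.
h0 = np.array([[1.0], [2.0], [4.0]])
edges = [(0, 1), (1, 2)]
W = np.ones((1, 1))
h_sum = message_passing_step(h0, edges, W, W, aggregate="sum")
h_mean = message_passing_step(h0, edges, W, W, aggregate="mean")
```

Calling the step k times lets information travel k edges, and because the aggregation is a sum or mean over neighbors, the same weights work for any number of vertices.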

Attempts #

Geodesic CNN	Work with geodesic patches; no pooling
MeshCNN	Edges have features; convolutions and pooling

Combining Representations #

Leverage benefits of multiple representations
Convert the point cloud to a graph based on local neighborhoods and use graph convolutions
Joint 3D and multi-view learning to predict semantic labels in a scene
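The point-cloud-to-graph conversion mentioned above is often done with k nearest neighbors. The brute-force sketch below assumes a small cloud that fits an O(n^2) distance matrix; the function name `knn_graph` is hypothetical, and a spatial index would be used for large clouds.

```python
import numpy as np


def knn_graph(points, k):
    """Undirected edge list connecting every point to its k nearest neighbors."""
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)                 # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                    # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    edges = {
        (min(i, int(j)), max(i, int(j)))            # store each undirected edge once
        for i in range(len(points))
        for j in nbrs[i]
    }
    return sorted(edges)


# Toy cloud: four points on the x-axis with growing gaps.
cloud = np.array([[0.0, 0, 0], [1.0, 0, 0], [3.0, 0, 0], [7.0, 0, 0]])
graph_edges = knn_graph(cloud, k=1)
```

The resulting edge list can be passed directly to a graph network that aggregates features over neighbors.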