Learning across different 3D representations
- High-resolution voxel grids are impractical in terms of memory and compute: a dense grid grows cubically with resolution.
| Method | Idea |
|---|---|
| OctNet | Uses an octree as the input to the network |
| Octree Generating Networks | The network produces an octree as its output |
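A quick calculation makes the memory argument concrete. This sketch assumes one float32 value (4 bytes) per voxel. Network activations multiply this by the number of feature channels, so real costs are higher.

```python
def dense_grid_bytes(resolution: int, channels: int = 1, bytes_per_value: int = 4) -> int:
    """Memory for a dense resolution^3 voxel grid with `channels` values per voxel."""
    return resolution ** 3 * channels * bytes_per_value

for res in (64, 128, 256, 512, 1024):
    mib = dense_grid_bytes(res) / 2**20
    print(f"{res:>5}^3 voxels: {mib:8.0f} MiB")  # 1, 8, 64, 512, 4096 MiB
```

Every doubling of the resolution multiplies memory by 8. Most of those voxels are empty space, and octrees exploit exactly that.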
Submanifold sparse convolutions #
- Only apply convolutions where geometry is present (active sites). Outputs are computed only at those sites, so the set of active sites does not dilate from layer to layer -> saves compute
- Can be combined with hash-based structures such as spatial hashing that store only the occupied positions -> saves memory
- Together: allows processing the scene at a higher resolution
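Here is a minimal single-channel sketch of a 3x3x3 submanifold sparse convolution. A plain Python dict acts as the spatial hash from voxel coordinates to features. The kernel values are hypothetical. A practical implementation would vectorize this and use multi-channel weight matrices. The loop only shows the logic.

```python
from itertools import product

# Sparse voxel grid as a hash map: (x, y, z) -> feature. Only active sites are stored.
Coord = tuple[int, int, int]

def submanifold_conv3d(features: dict[Coord, float],
                       weights: dict[Coord, float]) -> dict[Coord, float]:
    """3x3x3 submanifold sparse convolution (single channel, no bias).

    An output is produced only at sites that are active in the input, so the
    active set stays the same instead of growing with every layer.
    """
    out: dict[Coord, float] = {}
    for (x, y, z) in features:                       # visit active sites only
        acc = 0.0
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbor = (x + dx, y + dy, z + dz)
            if neighbor in features:                 # empty space contributes nothing
                acc += weights[(dx, dy, dz)] * features[neighbor]
        out[(x, y, z)] = acc
    return out

# Usage: a thin line of three active voxels in an otherwise empty grid.
grid = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (2, 0, 0): 3.0}
# Hypothetical kernel: 1.0 at the center offset, 0.5 at every other offset.
kernel = {off: (1.0 if off == (0, 0, 0) else 0.5) for off in product((-1, 0, 1), repeat=3)}
result = submanifold_conv3d(grid, kernel)
print(result)  # {(0, 0, 0): 2.0, (1, 0, 0): 4.0, (2, 0, 0): 4.0}
```

The output has exactly the same keys as the input. A regular dense convolution would also write values into the 78 empty voxels that border the line.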
From multi-view #
| Method | Idea |
|---|---|
| MVCNN | Multi-view images -> classification |
- Point clouds can be rendered as spheres, and MVCNN can then be applied to the renderings
- Multi-view considerations:
  - Which viewpoints to use?
  - How many viewpoints?
  - How to handle noisy/incomplete data?
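The two design questions above can be illustrated in code. This sketch places cameras on a ring around the object and then combines per-view features with an element-wise max, which is how MVCNN pools across views. The 12 views at 30 degrees of elevation are one common choice and should be treated as an assumption. The feature vectors are made-up stand-ins for the output of a shared CNN.

```python
import math

def ring_cameras(num_views: int = 12, elevation_deg: float = 30.0, radius: float = 2.0):
    """Camera positions spaced evenly on a ring around the z (up) axis, aimed at the origin."""
    elev = math.radians(elevation_deg)
    cams = []
    for i in range(num_views):
        azim = 2 * math.pi * i / num_views           # 360 / num_views degrees apart
        cams.append((radius * math.cos(elev) * math.cos(azim),
                     radius * math.cos(elev) * math.sin(azim),
                     radius * math.sin(elev)))
    return cams

def view_pool(per_view_features: list[list[float]]) -> list[float]:
    """Element-wise max across views: merges per-view features into one descriptor."""
    return [max(values) for values in zip(*per_view_features)]

cameras = ring_cameras()
# Hypothetical 3-D feature vectors for three rendered views.
pooled = view_pool([[0.1, 0.9, 0.3], [0.7, 0.2, 0.3], [0.4, 0.5, 0.8]])
print(len(cameras), pooled)  # 12 [0.7, 0.9, 0.8]
```

Because max is order-invariant, the pooled descriptor is the same whichever view comes first. However, it still depends on which viewpoints were rendered.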
Convolutions on points #
PointNet++ already introduced hierarchical, multi-resolution processing of points to capture local neighborhood structure.
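The two grouping steps that PointNet++ uses to build local neighborhoods are sketched below. Farthest point sampling picks well-spread centroids, and a ball query collects the points around each centroid. In PointNet++, a small PointNet then summarizes each group. That part is omitted here. The starting index and the radius are arbitrary illustrative choices.

```python
import math

Point = tuple[float, float, float]

def farthest_point_sampling(points: list[Point], k: int) -> list[int]:
    """Greedily pick k >= 1 indices, each time the point farthest from all chosen so far."""
    chosen = [0]                                          # arbitrary starting point
    dist = [math.dist(p, points[0]) for p in points]      # distance to nearest chosen point
    while len(chosen) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen

def ball_query(points: list[Point], center: Point, radius: float) -> list[int]:
    """Indices of all points within `radius` of `center`: one local neighborhood."""
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius]

pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (1.1, 0.0, 0.0), (2.0, 0.0, 0.0)]
centroids = farthest_point_sampling(pts, 3)               # [0, 4, 2]
groups = [ball_query(pts, pts[c], 0.15) for c in centroids]
print(centroids, groups)  # [0, 4, 2] [[0, 1], [4], [2, 3]]
```

Repeating sample-group-summarize on the centroids with a larger radius gives the hierarchy of coarser resolutions.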
Convolutions on meshes #
- Meshes are graphs (vertices connected by edges)
- How to define convolutions over graphs?
- The operator should be agnostic to the number of vertices / faces
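The following sketch shows how a triangle mesh becomes a graph. Two vertices become neighbors if they share a triangle edge. The resulting neighborhoods differ in size from vertex to vertex, which is why a fixed-size grid kernel cannot be applied directly.

```python
from collections import defaultdict

def mesh_to_graph(faces: list[tuple[int, int, int]]) -> dict[int, set[int]]:
    """Vertex adjacency of a triangle mesh: neighbors share an edge of some face."""
    adj: dict[int, set[int]] = defaultdict(set)
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

faces = [(0, 1, 2), (0, 2, 3)]   # two triangles sharing edge 0-2 (a quad)
graph = mesh_to_graph(faces)
print(graph)  # {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
```

Vertices 0 and 2 have three neighbors, while vertices 1 and 3 have two.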
Geometric Operators #
- Tangent planes
- Geodesic distances
- Laplacian mesh operator
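One of the operators above, the geodesic distance, can be approximated cheaply by running Dijkstra's algorithm over the mesh edges, weighting each edge by its Euclidean length. This is a simplification, not an exact geodesic method. Edge paths can only follow the triangulation, so they give an upper bound on the true surface distance.

```python
import heapq
import math

def edge_path_distances(vertices, faces, source):
    """Approximate geodesic distances from `source` via shortest paths along mesh edges."""
    adj = {i: set() for i in range(len(vertices))}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            adj[u].add(v)
            adj[v].add(u)
    dist = [math.inf] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                                  # stale heap entry
        for v in adj[u]:
            nd = d + math.dist(vertices[u], vertices[v])
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
d_a = edge_path_distances(square, [(0, 1, 2), (0, 2, 3)], 0)  # diagonal 0-2 is an edge
d_b = edge_path_distances(square, [(0, 1, 3), (1, 2, 3)], 0)  # diagonal 1-3 instead
print(d_a[2], d_b[2])  # ~1.414 vs 2.0 for the same flat surface
```

The two triangulations describe the same flat square, yet the estimate for vertex 2 changes. This is a reminder that edge-based approximations depend on the mesh connectivity.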
Spectral graph convolution #
- Apply the convolutions in frequency domain
- "Convolution in the spatial domain == multiplication in the frequency domain"
- Transform per-vertex signals to the frequency domain by projecting them onto the eigenvectors of the Laplacian mesh operator
- Multiply with the kernel, then transform back
- No guarantee that filters will have local support on the graph
- No shift invariance
- No pooling
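The steps above can be sketched with NumPy, which computes y = U g(Λ) Uᵀ x. Here U holds the Laplacian eigenvectors and g(Λ) is the kernel evaluated at the eigenvalues. As a simplification, this sketch uses the combinatorial graph Laplacian L = D - A in place of a geometric (e.g. cotangent) mesh Laplacian. The kernel is passed in as a function of the eigenvalues. A learned variant would instead store one free coefficient per eigenvalue.

```python
import numpy as np

def graph_laplacian(adjacency: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A (stand-in for a mesh Laplacian)."""
    return np.diag(adjacency.sum(axis=1)) - adjacency

def spectral_conv(adjacency: np.ndarray, signal: np.ndarray, kernel) -> np.ndarray:
    """Filter a per-vertex signal in the Laplacian eigenbasis: U diag(g(lambda)) U^T x."""
    eigvals, U = np.linalg.eigh(graph_laplacian(adjacency))  # L is symmetric
    x_hat = U.T @ signal              # forward graph Fourier transform
    y_hat = kernel(eigvals) * x_hat   # convolution = multiplication in frequency domain
    return U @ y_hat                  # back to the vertex domain

# 4-cycle graph with a scalar signal on its vertices.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])
identity = spectral_conv(A, x, lambda lam: np.ones_like(lam))          # returns x
low_pass = spectral_conv(A, x, lambda lam: (lam < 1e-8).astype(float)) # keeps only lambda = 0
print(identity.round(6), low_pass.round(6))  # [1. 2. 3. 4.] [2.5 2.5 2.5 2.5]
```

The listed drawbacks show up directly in this sketch. The basis U comes from one particular graph, and an arbitrary g(Λ) can spread energy over all vertices. Nothing in the sketch guarantees local support. The dense eigendecomposition also costs O(n³), so this form only suits small graphs.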
Message Passing Graph Neural Networks #
- Vertices have features associated with them
- Messages get passed from vertices to their neighbors
- Messages get aggregated at each vertex (sum or average)
- This is repeated iteratively over the whole graph for a few rounds
- Optionally, edges also have features that get aggregated
- See: Scan2Mesh
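The scheme above is shown below as a toy, parameter-free version of message passing. Here each message is simply the neighbor's feature vector. The update is a hypothetical choice that averages a vertex's own feature with the aggregated messages. In a real network, both the message and the update would be learned MLPs, and the optional edge features are left out.

```python
def message_passing(adjacency: dict[int, list[int]],
                    features: dict[int, list[float]],
                    rounds: int = 2,
                    aggregate: str = "mean") -> dict[int, list[float]]:
    """Run `rounds` synchronous rounds of send -> aggregate (sum or mean) -> update."""
    h = {v: list(f) for v, f in features.items()}
    for _ in range(rounds):
        new_h = {}
        for v, neighbors in adjacency.items():
            agg = [0.0] * len(h[v])
            for u in neighbors:                       # message u -> v is just h[u] here
                agg = [a + m for a, m in zip(agg, h[u])]
            if aggregate == "mean" and neighbors:
                agg = [a / len(neighbors) for a in agg]
            # Hypothetical update: average the vertex's own feature with the aggregate.
            new_h[v] = [(s + a) / 2 for s, a in zip(h[v], agg)]
        h = new_h                                     # all vertices read the previous round
    return h

path = {0: [1], 1: [0, 2], 2: [1]}                    # path graph 0 - 1 - 2
feats = {0: [0.0], 1: [3.0], 2: [6.0]}
print(message_passing(path, feats, rounds=1))  # {0: [1.5], 1: [3.0], 2: [4.5]}
print(message_passing(path, feats, rounds=2))  # {0: [2.25], 1: [3.0], 2: [3.75]}
```

After k rounds, each vertex's feature depends on vertices up to k hops away. Running the loop "a few times" is what widens the receptive field. Because the code only iterates over each vertex's neighbor list, it works for any number of vertices and any degree.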
Attempts #
| Method | Approach |
|---|---|
| Geodesic CNN | Works with geodesic patches; no pooling |
| MeshCNN | Features live on edges; supports convolution and pooling (via edge collapse) |
Combining Representations #
- Leverage the benefits of multiple representations
- Convert a point cloud to a graph based on local neighborhoods and apply graph convolutions
- Joint 3D and multi-view learning to predict semantic labels in a scene
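A k-nearest-neighbor graph is one simple way to turn a point cloud into a graph based on local neighborhoods. Here k is a free hyperparameter. This brute-force version compares every pair of points, which is O(n² log n). A spatial index such as a k-d tree would be the usual choice for large clouds. The resulting directed edges could feed a graph network like the message passing described earlier.

```python
import math

Point = tuple[float, float, float]

def knn_graph(points: list[Point], k: int) -> list[tuple[int, int]]:
    """Directed edges i -> j from every point to its k nearest other points."""
    edges = []
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j) for _, j in others[:k])
    return edges

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
print(knn_graph(cloud, 1))  # [(0, 1), (1, 0), (2, 1), (3, 2)]
```

The edges are not symmetric. Point 3 links to point 2, but not the other way round. Adding the reverse edges is a common option when an undirected graph is needed.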