mesh
Table of Contents
1. OTHERS
1.1. DIFFERENT PRIMITIVE
1.1.1. FLEXIDREAMER
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
- leveraging a flexible gradient-based extraction known as FlexiCubes
- direct acquisition of the target mesh
- 1 minute, first extracting mesh then texturing
1.2. MESH HUMAN FEEDBACK
- DreamReward: Text-to-3D Generation with Human Preference
- 3D reward model
2. CAD
- BrepGen: A B-rep Generative Diffusion Model with Structured Latent Geometry
- structured latent geometry in a hierarchical tree, representing a whole CAD solid
- generating complicated geometry
3. FROM SOME SOURCE
- ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
- using 2d diffusion as prior
- PIXEL ALIGNMENT
3.1. SKETCH
- Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation
- both sketches and CADs into meshes
- without requiring any paired datasets during training
- Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation (sketch control to 3D generation)
- 3Doodle: Compact Abstraction of Objects with 3D Strokes
- abstract sketches containing 3D characteristic shapes of the objects
- Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation
- generates realistic 3D Gaussians consistent with the color style described in the textual description
3.2. SIMPLIFICATION
- Point-Cloud Completion with Pretrained Text-to-image Diffusion Models
- from incomplete point cloud to 3d model
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives
- decomposes the scene into big primitive blocks; interpretable, easy to manipulate, and suited for physics-based simulations
- FlexiCubes: Flexible Isosurface Extraction for Gradient-Based Mesh Optimization
- hierarchically-adaptive meshes, local flexible adjustments
- ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
- learning to combine multiple models
- adjust their sizes, rotation angles, and locations to create a 3D asset that matches the given image
- GetMesh: A Controllable Model for High-quality Mesh Generation and Manipulation
- mesh autoencoder; points are re-organized into a triplane representation by projecting them onto the triplane (see the sketch below)
- combining mesh parts across categories, adding/removing mesh parts
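A minimal sketch of the triplane lookup referenced above (illustrative shapes and names, not GetMesh's actual code): a 3D point is projected onto the XY/XZ/YZ feature planes, features are bilinearly sampled from each plane, and the three samples are summed before being decoded.

    import torch
    import torch.nn.functional as F

    def query_triplane(triplane: torch.Tensor, pts: torch.Tensor) -> torch.Tensor:
        # triplane: (3, C, H, W) feature planes for the XY, XZ, YZ planes
        # pts:      (N, 3) points assumed to lie in [-1, 1]^3
        # returns:  (N, C) aggregated (summed) per-point features
        coords = torch.stack([pts[:, [0, 1]],   # projection onto the XY plane
                              pts[:, [0, 2]],   # projection onto the XZ plane
                              pts[:, [1, 2]]])  # projection onto the YZ plane -> (3, N, 2)
        grid = coords.unsqueeze(2)              # grid_sample expects (B, H_out, W_out, 2)
        feats = F.grid_sample(triplane, grid, mode="bilinear",
                              align_corners=False)            # (3, C, N, 1)
        return feats.squeeze(-1).sum(dim=0).transpose(0, 1)   # (N, C)

    # toy usage: three 8-channel 64x64 planes, 100 query points
    planes = torch.randn(3, 8, 64, 64)
    points = torch.rand(100, 3) * 2 - 1
    features = query_triplane(planes, points)   # (100, 8)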
3.3. IMAGE TO 3D
- SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections (landscapes, scenarios)
- Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
- image + text + shape prior = mesh
- One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization
- uses SD to force consistency of the diffusion generation
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
=best=
- uses both priors in both stages; coarse geometry first, then details
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model (generates nerf)
- ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation
=best=
- Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior
- subject-specific, multi-modal diffusion prior; produces a NeRF
- then enhances the coarse texture
- DREAMDISTRIBUTION
- HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D
- ZeroShape: Regression-based Zero-shot Shape Reconstruction (just the shape)
- trained to directly regress the object shape; computationally efficient
- FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
- extract 3D geometric features from the 2D input
- incorporating attention to fuse images from different viewpoints
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
- encodes the image, depth, and normal into latent space, fed into U-Net to generate
- T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image
- coarse-to-fine to refine mesh details
- Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion
- refines coarse generative results, high consistency
3.3.1. FAST
3.3.2. ZERO-1-TO-3
- Zero-1-to-3: Zero-shot One Image to 3D Object (Zero123Plus is the follow-up model)
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
=best=
- image to guide the geometry sculpting and texture boosting
- imbuing it with 3D knowledge of the scene being optimized = view-consistent guidance for the scene
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion
=best=
- geometry-aware normal fusion algorithm that merges the generated 2D representations
- PERFLOW
- iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views
- repurposing Zero123 for camera pose estimation
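A minimal sketch of the pose-by-inversion idea behind iFusion (illustrative only: noise_pred is a hypothetical handle to a pretrained Zero123-style noise predictor, and the noise schedule is a placeholder): the relative camera pose is optimized so that the view-conditioned diffusion model denoises the target view as well as possible.

    import torch

    def estimate_relative_pose(noise_pred, src_img, tgt_latent, steps=300, lr=1e-2):
        # pose = (elevation, azimuth, radius offset), initialized at zero
        pose = torch.zeros(3, requires_grad=True)
        opt = torch.optim.Adam([pose], lr=lr)
        for _ in range(steps):
            t = torch.randint(0, 1000, (1,))                       # random timestep
            noise = torch.randn_like(tgt_latent)
            alpha_bar = torch.cos(t / 1000.0 * torch.pi / 2) ** 2  # placeholder schedule
            noisy = alpha_bar.sqrt() * tgt_latent + (1 - alpha_bar).sqrt() * noise
            # hypothetical call: predicted noise for the target view, conditioned
            # on the source image and the candidate relative pose
            loss = ((noise_pred(noisy, t, src_img, pose) - noise) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return pose.detach()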
- ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
- auto-regressive method, previously generated views as context for the next view generation
3.3.2.1. ISOTROPIC3D
- Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding
- fine-tune a text-to-3D diffusion model by substituting its text encoder with an image encoder
- a single-image CLIP embedding is used to generate multi-view images
3.3.2.2. HYPERDREAMER
- HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image
- interactively select regions via clicks and edit the texture with text guidance
- modeling region-aware materials with high-resolution textures and enabling user-friendly editing
- guidance: semantic segmentation and data-driven priors (albedo, roughness, specular properties)
3.3.2.3. ESCHERNET
- EscherNet: A Generative Model for Scalable View Synthesis
- self-attention with M target views to ensure target consistency, more consistent across frames
3.3.2.4. TRIPOSR
3.3.2.5. FDGAUSSIAN
- FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
- extract 3D geometric features from the 2D input
3.3.2.6. MVD-FUSION
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
- output to multiple 3D representations such as point clouds, textured meshes, and Gaussian splats
4. MESH EDITING
- SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
- 3D editing directly by a feed-forward network, one second per edit
- Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
- visibility-aware adaptive repainting (view-consistent)
- IMAGE SCULPTING
=best=
- STABLEIDENTITY inserting identity
- threefiner: interface for text-guided mesh refinement
4.1. INPAINTING
- SC-Diff: 3D Shape Completion with Latent Diffusion Models
- realistic shape completions at superior resolutions
4.2. MESH TO MESH
- Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
- MVEdit as 3D counterpart of SDEdit
- 3D Adapter lifts 2D; also for texture generation
4.3. MOUSE EDITING
- MagicClay: Sculpting Meshes With Generative Neural Fields
- maintains consistency between both a mesh and a Signed Distance Field (SDF) representation
4.3.1. DRAGTEX
- DragTex: Generative Point-Based Texture Editing on 3D Mesh
- dragging the texture
- diffusion model to blend locally inconsistent textures between different views
- enabling locally consistent texture editing
5. MULTIVIEW DIFFUSION
- 1.1.1
- MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion, panorama
- MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single to Sparse-view 3D Object Reconstruction
- 2D latent features learn 3D consistency
- Zero123++: a Single Image to Consistent Multi-view Diffusion
- ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image
- single-image novel view synthesis; multi-object scenes with complex backgrounds, indoor and outdoor
- camera conditioning parameterization and normalization scheme
- LIFTING 3.3.2
- HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
5.1. FROM VIDEO
5.2. FROM SD
- CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
- starts from SD checkpoints
- integrates 3D novel view synthesis into the object customization process
=best=
- with flexible background control
- DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
=best=
- a view-aware 3D lifting module obtains 3D representations, which are injected into the 2D model
- 7.3.1.1
5.2.1. MVDREAM
- MVDream: Multi-view Diffusion for 3D Generation
=best one=
- generates a NeRF without the Janus problem, from a normal 2D diffusion model
- sd + multi-view dataset rendered from 3D assets
- 2D diffusion + consistency of 3D data
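A minimal sketch (illustrative shapes, not the released code) of the mechanism MVDream-style models rely on: the per-view self-attention of the 2D diffusion U-Net is inflated so that tokens from all V views attend to each other, which is where the cross-view consistency comes from.

    import torch
    import torch.nn as nn

    def multiview_self_attention(x: torch.Tensor, attn: nn.MultiheadAttention) -> torch.Tensor:
        # x: (B, V, L, C) token features: V views, L tokens per view, C channels
        B, V, L, C = x.shape
        tokens = x.reshape(B, V * L, C)        # one joint sequence over all views
        out, _ = attn(tokens, tokens, tokens)  # self-attention runs across views
        return out.reshape(B, V, L, C)

    # toy usage: 4 views, a 16x16 latent (256 tokens per view), 64 channels
    attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
    x = torch.randn(2, 4, 256, 64)
    y = multiview_self_attention(x, attn)      # (2, 4, 256, 64)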
- SPAD: Spatially Aware Multiview Diffusers
- utilizes Plücker coordinates derived from camera rays, injected as positional encoding (see the sketch below)
- resolves the Janus issue
- leverages multi-view Score Distillation Sampling (SDS) for 3D asset generation
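A minimal sketch of the Plücker-ray positional encoding mentioned above (standard construction, assumed camera conventions): each pixel's ray is encoded as the 6-vector (d, o x d) built from the ray direction d and the camera center o.

    import torch

    def plucker_embedding(K: torch.Tensor, c2w: torch.Tensor, H: int, W: int) -> torch.Tensor:
        # K: (3, 3) intrinsics, c2w: (4, 4) camera-to-world matrix; returns (H, W, 6)
        ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                                torch.arange(W, dtype=torch.float32), indexing="ij")
        pix = torch.stack([xs + 0.5, ys + 0.5, torch.ones_like(xs)], dim=-1)  # pixel centers
        dirs = pix @ torch.linalg.inv(K).T          # camera-space ray directions
        dirs = dirs @ c2w[:3, :3].T                 # rotate into world space
        dirs = dirs / dirs.norm(dim=-1, keepdim=True)
        origin = c2w[:3, 3].expand_as(dirs)         # camera center, one copy per pixel
        moment = torch.cross(origin, dirs, dim=-1)  # o x d
        return torch.cat([dirs, moment], dim=-1)    # per-pixel Plücker coordinates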
5.3. NOVEL VIEW
- GeNVS: Generative Novel View Synthesis with 3D-Aware Diffusion Models (1 image to 3d video)
- MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion
- generates new consistent perspective views
- SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
- multiview diffusion model that models the joint probability distribution of multiview images
- 3D-aware feature attention
- Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
- Multi-view Reconstruction Consistency (MRC) metric
- POSE - POSITION
5.3.1. GAUSSIAN NOVEL VIEW
5.3.2. DREAMDISTRIBUTION
- DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
=best=
- learns a prompt distribution from reference images, then uses it to generate new 2D/3D instances (sketch below)
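A minimal sketch of the prompt-distribution idea (illustrative parameterization, not the paper's exact one): instead of a single learnable soft prompt, a mean and scale per prompt token are learned and a fresh prompt embedding is drawn by reparameterized sampling at every step.

    import torch
    import torch.nn as nn

    class PromptDistribution(nn.Module):
        # learnable Gaussian over prompt-token embeddings
        # (n_tokens x dim chosen to match the text encoder's embedding size)
        def __init__(self, n_tokens: int = 8, dim: int = 768):
            super().__init__()
            self.mu = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
            self.log_sigma = nn.Parameter(torch.full((n_tokens, dim), -3.0))

        def sample(self) -> torch.Tensor:
            # reparameterization keeps sampling differentiable, so the
            # distribution can be fit with the usual diffusion loss
            eps = torch.randn_like(self.mu)
            return self.mu + self.log_sigma.exp() * eps

    prompts = PromptDistribution()
    e1, e2 = prompts.sample(), prompts.sample()  # two different soft prompts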
5.3.3. 3D-AWARE IMAGE EDITING
- Reference-Based 3D-Aware Image Editing with Triplane
- leveraging 3D-aware triplanes, edits are versatile and can be rendered from various viewpoints
6. TEXTURES
- TEXTURE
- Learning Disentangled Avatars with Hybrid 3D Representations
- hair, face, body and clothing can be learned then disentangled yet jointly rendered
- Paint Anything 3D with Lighting-Less Texture Diffusion Models (Tencent)
=best=
- lighting-less textures; image painted over input model, generate texture from text
6.1. SCENE TEXTURES
- SCENE SYNTHESIS
- Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models (textures)
- DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation (vr)
- coarse-to-fine panoramic texture generation which both considers geometry and texture cues
- imagines a panorama, then propagates it with inpainting
6.2. MATERIAL
- ControlMat: A Controlled Generative Approach to Material Capture; diffusion
- takes a photo under uncontrolled illumination as input; produces a variety of materials that could correspond to the input image
- TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
- synthesize textures for given 3D geometries
- aggregate the different denoising predictions on a shared latent texture map
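A minimal sketch of the aggregation step described above (illustrative inputs, not TexFusion's implementation): per-view denoised latent pixels are scattered back to their texel indices on one shared latent texture map and averaged, so every texel ends up with a single prediction that all views agree on.

    import torch

    def aggregate_to_texture(view_latents, texel_ids, tex_size, channels):
        # view_latents: list of (P_i, C) denoised latent pixels, one entry per view
        # texel_ids:    list of (P_i,) flat texel index per pixel (assumed to come
        #               from a precomputed UV rasterization of the mesh)
        tex = torch.zeros(tex_size * tex_size, channels)
        count = torch.zeros(tex_size * tex_size, 1)
        for lat, ids in zip(view_latents, texel_ids):
            tex.index_add_(0, ids, lat)
            count.index_add_(0, ids, torch.ones(len(ids), 1))
        return (tex / count.clamp(min=1)).view(tex_size, tex_size, channels)

    # toy usage: two views writing into a 64x64, 4-channel latent texture
    v1, v2 = torch.randn(500, 4), torch.randn(300, 4)
    ids1, ids2 = torch.randint(0, 64 * 64, (500,)), torch.randint(0, 64 * 64, (300,))
    texture = aggregate_to_texture([v1, v2], [ids1, ids2], 64, 4)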
- Alchemist: Parametric Control of Material Properties with Diffusion Models
- control material attributes of objects like roughness, metallic, albedo, and transparency in real images
- edit material properties in real-world images while preserving all other attributes
- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
- relightable textures from a small number of input images to target 3D shapes
- UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
- removes lighting effects so the avatar can then be rendered under various lighting conditions
- Holo-Gen: by Unity, generate physically-based rendering (PBR) material properties for 3D objects
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
- texturing an input 3D mesh with prompt, high-quality and relightable textures
- based on the ControlNet architecture
- 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models
=best=
- given input mesh, use 2d diffusion to generate coherent and relightable textures exploiting depth maps
- 4.2
- SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
- denoising in multiple instance spaces
- 3D Mesh Texturing, gaussian splat texturing, depth to panorama
- MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment
- sd as a prior to texture a 3D model
6.2.1. FURTHER ENHANCEMENT
- Enhancing Texture Generation with High-Fidelity Using Advanced Texture Priors
- takes a rough texture as the initial input, enhances it, eliminating noise and "point gaps"
6.2.2. HAIR
- Neural Haircut: Prior-Guided Strand-Based Hair Reconstruction, realism hair modeling, personalization
- CT2Hair: High-Fidelity 3D Hair Modeling using Computed Tomography
- real-world hair wigs as input, populates dense strands
- HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
- latent diffusion model that operates in a common hairstyle UV space
6.2.3. CLOTH
- parent: diffusion
- DANCING
- Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models
- clothed human mesh diffusion with text-guided generation
- TryOnDiffusion: A Tale of Two UNets (dress one with clothes from another, pose-body change)
- DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
- X-MDPT: Cross-view Masked Diffusion Transformers for Person Image Synthesis
- changes the pose while keeping the clothes; employs masked diffusion transformers on latent patches
- OOTDiffusion: A highly controllable open source tool for virtual clothing try-on
=best one=
- GALA: Generating Animatable Layered Assets from a Single Scan
- from a single scan, obtains all the clothes as independent, separated, segmented parts
- DressCode: Autoregressively Sewing and Generating Garments from Text Guidance
- generate sewing patterns with text guidance, also for editing
6.2.3.1. GEN MESH
- Garment3DGen: 3D Garment Stylization and Texture Generation
- synthesize 3D garment assets from a base mesh given a single input image as guidance
- Design2Cloth: 3D Cloth Generation from 2D Masks
- synthesizes the decomposed 3D meshes from a single image
6.3. FACE
- GAUSSIAN FACE NERF FACE
- HEAD POSE face-style swap, GET HEAD POSE
- HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation
- BlendFields: Few-Shot Example-Driven Facial Modeling
6.3.1. AVATAR
- TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
- a diffusion model imagines the clothed human, then outputs geometry and texture
- Relightable and Animatable Neural Avatar from Sparse-View Video
- relightable neural avatars from sparse/monocular inputs; under arbitrary poses, computes surface intersection and light visibility
- SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance
- geometry is constrained to human-body prior; high-quality meshes and textures
- AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation
- GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
- SDF-based implicit mesh learning; primitive-based 3D Gaussian representation to facilitate animation
- Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute
- intermediate codebook features
- AniDress: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model
- generating animatable human avatars in loose clothes from sparse multi-view videos
- M3Face: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing
- multi-modal framework for face generation and editing
- One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation
=best=
- 3D animatable photo-realistic head avatar as prior, personalizable with few shot (one image)
- MAGICMIRROR: style and subject transformations
=best=
6.3.1.1. EXPRESSION
- Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
- to generate a talking portrait video, estimating expression and head pose
- Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance
- annotated emotional and style labels, auto-encoder decoupling expressions and identities
- Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters
=best=
- automatically animate virtual human faces
- AUDIO TO VIDEO
- EMO: Emote Portrait Alive
=best=
- audio to video (directly)
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
- driven by audio and a reference portrait image
6.3.2. FACE DIFFUSION
- parent: diffusion
- Relightify: Relightable 3D Faces from a Single Image via Diffusion Models
- creates, from a photo, a set of textures (BRDF) to generate a realistic 3D face
- SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder
- mitigates identity leakage by masking facial regions and utilizing disentangled identity and non-identity features
7. MESH DIFFUSION
- diffusion GAUSSIAN
- MeshDiffusion: Score-based Generative 3D Mesh Modeling
- LDM3D: Latent Diffusion Model for 3D; normal Stable Diffusion with depth-map generation
=best=
- LUCIDDREAMER
7.1. TEXT-TO-3D
- Text-to-3D with classifier score distillation
=best=
- guidance alone (the classifier-free guidance term) is enough for effective text-to-3D generation (see the gradient below)
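For reference, the score-distillation gradient these text-to-3D methods build on, with the classifier-free-guidance split that the classifier-score-distillation paper argues can drive generation on its own (standard notation, not copied from any single paper); x = g(theta) is the rendered image, y the prompt, s the guidance scale:

    \nabla_\theta \mathcal{L}_{\mathrm{SDS}}
      = \mathbb{E}_{t,\epsilon}\Big[ w(t)\,\big(\hat\epsilon_\phi(x_t; y, t) - \epsilon\big)\,\tfrac{\partial x}{\partial \theta} \Big],
    \qquad
    \hat\epsilon_\phi(x_t; y, t)
      = \epsilon_\phi(x_t; t) + s\,\big(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; t)\big)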
- StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D
=best quality=
- (Gaussians) image-space diffusion for geometric precision, latent-space diffusion for vivid color
- high-fidelity (high-quality) 3D models
- Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
=best=
- both a 3D and a 2D diffusion process (priors), bidirectional guidance (20 minutes)
- UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
- albedo-normal aligned multi-view diffusion (to enable relighting)
- PolyDiff: Generating 3D Polygonal Meshes with Diffusion Models
- operates natively on the polygonal mesh, trained to restore the original mesh structure
- 3.3.1.1
- RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
- trained with extra image-to-depth and image-normal priors; maps diffused together
- HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation (7 seconds)
- AToM: Amortized Text-to-Mesh using 2D Diffusion
- high-quality textured meshes, 1 second inference, 10 times reduction in training cost, unseen prompts
- L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
- compose a desired object via trial-and-error within the 3D simulation environment
- Instant Text-to-3D Mesh with PeRFlow-T2I + TripoSR
7.1.1. VOLUME DIFFUSION
- WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space
- trained without direct multi-view or 3D supervision; does not require poses or camera distributions
- the autoencoder captures the images' underlying 3D structure (unlike previous GAN approaches)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
- 3D latent representation, seconds to minutes
7.2. CHARACTER
- AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose
- text descriptions and pose guidance
- uses mlp nerf
- HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
- Fast Registration of Photorealistic Avatars for VR Facial Animation
- Synthesizing Moving People with 3D Control
7.3. FROM 2D DIFFUSION
- MULTIVIEW DIFFUSION
- Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
- 3D-aware Image Generation using 2D Diffusion Models
- Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping
- integration into existing 3D generative models, enhancing synthesis of the texture
- uses a Latent Diffusion Model (LDM) for image generation
- CONTROLNET FOR 3D
=best=
- 3.3.2.2 DEPTH DIFFUSION 4.2
7.3.1. 3D PRIOR
- GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation
- multi-view diffusion serves as native 3D geometric priors
- disentangling 2D and 3D priors allows us to refine 3D geometric priors further
- DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors
- diffusion guides the 3D generator finetuning with informative direction
- CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
- Convolutional Reconstruction Model (CRM), feed-forward single image-to-3D generative
- triplane exhibits spatial correspondence of six orthographic images
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
- modulate the output of the 2D diffusion model to the correct patterns of the template views
- ISOTROPIC3D
- Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
- encodes 3D models into a compact triplane latent space; 7 seconds diffusion
7.3.1.1. 3D ADDED TO 2D PRIOR
- Retrieval-Augmented Score Distillation for Text-to-3D Generation
=best=
- adapt the diffusion model’s 2D prior toward view consistency
- added controllability and negligible training cost
7.3.1.2. VIEWDIFF
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
- autoregressive 3D-consistent images at any viewpoint
- integrates 3D volume-rendering and cross-frame-attention layers into each block of SD
- could the prior instead be an animation prior, since it puts attention across the batch?
- 3D-consistent space
7.3.2. FAST
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
=best, fast=
- generates 4 views and then regresses a NeRF from them
- Instant3D: Instant Text-to-3D Generation
- One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
- multi-view image generation, then to 3D using multi-view 3D native diffusion models
- MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture (2 stages, within 20 minutes)
8. MESH NERF
- parent: nerf
- A Unified Framework for Surface Reconstruction, 3d from nerf
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures; latent nerf
- AutoDecoding Latent 3D Diffusion Models, view-consistent appearance and geometry
- SweetDreamer: aligns geometric priors in 2D diffusion to lift its outputs into consistent 3D, fixing geometry-related issues
- SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
- regularization term that encourages the gaussians to align well with the surface of the scene
- and binds them which enables easy editing, sculpting, rigging, animating, compositing and relighting
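A minimal sketch in this spirit (a generic flatness prior on the Gaussians, not SuGaR's actual SDF-based regularization term): penalizing each Gaussian's smallest scale pushes the splats toward thin, disc-like shapes that can hug a surface, which makes the later mesh extraction cleaner.

    import torch

    def flatness_regularizer(log_scales: torch.Tensor) -> torch.Tensor:
        # log_scales: (N, 3) per-Gaussian log scale parameters
        # penalize the smallest axis so each Gaussian collapses toward a disc
        return log_scales.exp().min(dim=-1).values.mean()

    # usage inside a training step (lambda_reg is a tunable weight; rendering_loss
    # and gaussians.log_scales are placeholders for the host training code):
    # loss = rendering_loss + lambda_reg * flatness_regularizer(gaussians.log_scales)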
- NERF FROM TEXT LUCIDDREAMER