domain
Table of Contents
1. ARCHITECTURE
1.1. PHYSICS - CHEMICAL - PARTICLES
- Complex Physics with Graph Networks https://arxiv.org/pdf/2002.09405
- PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields
- Scaling Spherical CNNs: compared against graph neural networks on molecular tasks
- better than the spectral domain through the convolution theorem
1.1.1. POINT CLOUD
1.1.1.1. PIXEL ALIGNMENT
1.1.1.2. POINT CLOUD DIFFUSION
- 3D molecule generation by denoising voxel grids
- diffusion model applied to atom point clouds
- DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
- divides the noisy point clouds into irregular patches and predicts target points based on input images
- PointInfinity: Resolution-Invariant Point Diffusion Models
- efficient training on low-resolution point clouds, while allowing high-resolution point clouds to be generated during inference
- transformer-based architecture with a fixed-size, resolution-invariant latent representation
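
The fixed-size latent idea noted above can be sketched as a single cross-attention read: a fixed number of latent vectors attend to however many points the cloud contains, so the output shape does not depend on resolution. This is a minimal, standard-library-only sketch under assumptions. The function name `read_points_into_latent`, the sizes `M` and `D`, and the random projection weights are hypothetical and are not taken from the PointInfinity paper.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def read_points_into_latent(points, latent, w_q, w_k, w_v):
    """Cross-attention: latent (M x D) queries attend to points (N x 3).

    Returns an M x D matrix for any number of points N.
    """
    q = matmul(latent, w_q)                       # (M, D)
    k = matmul(points, w_k)                       # (N, D)
    v = matmul(points, w_v)                       # (N, D)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # (D, N)
    scores = matmul(q, k_t)                       # (M, N)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)                     # (M, D)

def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned weights or sampled points."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
M, D = 4, 8                                       # assumed latent size
latent = random_matrix(rng, M, D)
w_q = random_matrix(rng, D, D)
w_k = random_matrix(rng, 3, D)
w_v = random_matrix(rng, 3, D)

low_res = random_matrix(rng, 32, 3)               # "training" resolution
high_res = random_matrix(rng, 256, 3)             # "inference" resolution
z_low = read_points_into_latent(low_res, latent, w_q, w_k, w_v)
z_high = read_points_into_latent(high_res, latent, w_q, w_k, w_v)
print(len(z_low), len(z_low[0]), len(z_high), len(z_high[0]))  # prints: 4 8 4 8
```

Both calls return a latent of the same shape, which is the property that would let the heavy processing run on the latent rather than on the raw points.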
1.2. BIOLOGY - ANALOG
- biology-inspired AI: genes or local-context-evolution https://youtu.be/vf18FLdKkY4 CPPN algorithm
- next paper: hyper http://axon.cs.byu.edu/~dan/778/papers/NeuroEvolution/stanley3**.pdf
- continuation: http://eplex.cs.ucf.edu/ESHyperNEAT/
- neat algorithm: https://youtu.be/3nbvrrdymF0
- The AI Epiphany: NeuroEvolution of Augmenting Topologies (NEAT) and Compositional Pattern Producing Networks (CPPN)
- Forward-Forward (vs Backpropagation) analog computers
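
As a rough illustration of the CPPN idea referenced above, the sketch below maps each pixel coordinate through a composition of simple functions (sine, Gaussian, tanh, sigmoid) to an intensity. It is a hand-wired toy, not an evolved network: in NEAT-style approaches the topology and weights would be found by evolution, whereas every constant and the wiring in `cppn` here are hypothetical choices made for illustration.

```python
import math

def gaussian(x):
    return math.exp(-x * x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cppn(x, y):
    """Toy CPPN: coordinates in, intensity in (0, 1) out."""
    d = math.sqrt(x * x + y * y)      # distance from the center as an extra input
    h1 = math.sin(3.0 * x)            # periodic node: repetition along x
    h2 = gaussian(2.0 * y)            # even node: symmetry about y = 0
    h3 = math.tanh(1.5 * d - 1.0)     # radial node
    return sigmoid(1.2 * h1 + 2.0 * h2 - 1.0 * h3)

def render(size=16):
    """Query the CPPN on a size x size grid over [-1, 1] x [-1, 1]."""
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    return [[cppn(x, y) for x in coords] for y in coords]

image = render()
shades = " .:-=+*#"
for row in image:
    # Map each intensity to one of eight ASCII shades.
    print("".join(shades[min(7, int(v * 8))] for v in row))
```

Because the pattern is a function of coordinates rather than a stored bitmap, it can be re-rendered at any `size` without changing the network.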
1.3. ENERGY BASED
- JEPA - https://youtu.be/jSdHmImyUjk
- Self-Supervised Learning, Energy-Based Models, and hierarchical predictive
- the encoder ignoring useless information
- https://openreview.net/forum?id=BZ5a1r-kVsf
- Energy Transformer more efficient, electric
- transformers without skip connections or normalisation layers https://arxiv.org/pdf/2302.10322.pdf
- Conformers: local and global attention
1.4. OTHER ALTERNATIVES
2. APPLICATION
- NeRF-like representations for video and images
- supervision: time that objects spend in a zone
- e.g. employees at their desks, cars in a parking lot
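
The dwell-time supervision noted above could be computed from tracked detections roughly as follows. This is a minimal sketch under assumptions: the record layout `(timestamp_seconds, object_id, x, y)`, the axis-aligned rectangular zone, and the names `in_zone` and `dwell_times` are all hypothetical, and a real system would get the tracks from an upstream detector and tracker.

```python
from collections import defaultdict

def in_zone(x, y, zone):
    """zone is (x_min, y_min, x_max, y_max), an assumed rectangular region."""
    x_min, y_min, x_max, y_max = zone
    return x_min <= x <= x_max and y_min <= y <= y_max

def dwell_times(detections, zone):
    """Sum, per object, the time between consecutive in-zone observations.

    detections: iterable of (timestamp_seconds, object_id, x, y), sorted by time.
    Objects never seen inside the zone twice in a row are omitted from the result.
    """
    last_seen_inside = {}                 # object_id -> time of last in-zone sighting
    totals = defaultdict(float)
    for t, obj, x, y in detections:
        if in_zone(x, y, zone):
            if obj in last_seen_inside:
                totals[obj] += t - last_seen_inside[obj]
            last_seen_inside[obj] = t
        else:
            # Leaving the zone ends the current stay.
            last_seen_inside.pop(obj, None)
    return dict(totals)

zone = (0.0, 0.0, 10.0, 10.0)
detections = [
    (0.0, "car_1", 5.0, 5.0),
    (0.0, "car_2", 20.0, 20.0),   # never enters the zone
    (1.0, "car_1", 6.0, 5.0),
    (2.0, "car_1", 12.0, 5.0),    # leaves the zone
    (3.0, "car_1", 7.0, 5.0),     # re-enters
    (4.0, "car_1", 7.0, 6.0),
]
print(dwell_times(detections, zone))  # prints: {'car_1': 2.0}
```

Only intervals where the object is observed inside the zone at both ends are counted, so the result is a conservative estimate when frames are dropped.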
2.1. UI
- Interactive Garment Recommendation with User in the Loop (algorithm)
- ingests user feedback to improve its recommendations and maximize user satisfaction
2.1.1. OS CONTROL
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
- strong generalization to unseen applications via accumulated skills from previous tasks
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- accomplish complex computer tasks with minimal human intervention
- multimodal agents
2.1.1.1. WEBSITE CONTROL
- open-source Rabbit
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
- interacting with real-world websites
2.1.2. DRAGGING UI
- DragGAN, DRAG, DAG
- DRAGTEX mesh
- DRAGANYTHING motion control in video
2.1.3. HAPTIC
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation
- tracking and reconstruction of novel objects for in-hand manipulation
- MACS: Mass Conditioned 3D Hand and Object Motion Synthesis
- improve naturalness of the synthesized 3D hand object motions
- generalize to unseen masses
- CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
- simulated human demonstrations for real-world tasks
2.1.4. STORYTELLING
- SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
- takes a graph representing plot points and creates bridges between plot points
- OPEN-VOCABULARY STORYTELLING CAPTIONING
- STORYTELLER DIFFUSION
- CARTOON INTO MANGA COLORIZATION
- LAYOUT LLM LAYOUT DIFFUSION
2.2. FACE
2.2.1. FACE RECOGNITION
- PAM: A Parallel Attention Network for Cattle Face Recognition
- focuses on local and global features
- for animal husbandry and behavioral research
2.2.2. FACE SWAP
- Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms
- novel loss equation for the training of face swapping models
2.2.3. EMOTIONS
- A Unified and Interpretable Emotion Representation and Expression Generation
- compound emotions
2.3. SPEECH RECOGNITION
- CAPTIONING
- Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
- unpaired text-only data used to enhance paired audio-text data
- to detect turns
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
2.4. GEOGRAPHY
- CityDreamer: Compositional Generative Model of Unbounded 3D Cities (imagines the city map layout)
- GENERATE BLENDER
2.4.1. NEURAL MAPPING
- Active Neural Mapping: scene reconstruction, gaining knowledge of the environment
- Doppelgangers: Learning to Disambiguate Images of Similar Structures
- can distinguish illusory matches in difficult cases, using the spatial distribution of local keypoints