domain

1. ARCHITECTURE
2. APPLICATION

1. ARCHITECTURE

1.1. PHYSICS - CHEMICAL - PARTICLES

Complex Physics with Graph Networks https://arxiv.org/pdf/2002.09405
PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields
Scaling Spherical CNNs: compared against graph neural networks on molecular benchmarks
- computes spherical convolutions in the spectral domain through the convolution theorem
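The convolution theorem mentioned above can be illustrated on a simpler domain. This is a minimal sketch on the circle (1D periodic signals) using the FFT, not the spherical harmonic transform that spherical CNNs rely on; the function names are made up for illustration.

```python
import numpy as np

def circular_conv_direct(f, g):
    """Circular convolution computed directly from its definition."""
    n = len(f)
    return np.array(
        [sum(f[k] * g[(i - k) % n] for k in range(n)) for i in range(n)]
    )

def circular_conv_spectral(f, g):
    """Same convolution via the convolution theorem: multiply the spectra."""
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

rng = np.random.default_rng(0)
f = rng.normal(size=16)
g = rng.normal(size=16)

direct = circular_conv_direct(f, g)
spectral = circular_conv_spectral(f, g)
print(np.allclose(direct, spectral))
```

The direct version costs O(n^2) operations and the spectral one O(n log n), which is the usual motivation for working in the spectral domain.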

1.1.1. POINT CLOUD

SUPERPRIMITIVE

1.1.1.1. PIXEL ALIGNMENT

DUSt3R: Geometric 3D Vision Made Easy
- globally aligns pixels from sparse views; no camera poses required

1.1.1.2. POINT CLOUD DIFFUSION

3D molecule generation by denoising voxel grids
- denoises voxelized molecules, unlike diffusion models applied to atom point clouds
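To make the voxel-grid representation concrete, here is a rough sketch of how atom coordinates could be rendered as Gaussian densities on a regular grid and then corrupted with noise, giving the (noisy, clean) pair a denoiser would be trained on. The grid size, extent, `sigma`, and noise scale are assumptions chosen for illustration, not values from the paper, and the denoising network itself is omitted.

```python
import numpy as np

def voxelize(coords, grid_size=16, extent=4.0, sigma=0.5):
    """Render atom centers as Gaussian densities on a regular 3D grid.

    coords: (n_atoms, 3) array with values roughly in [-extent, extent].
    """
    axis = np.linspace(-extent, extent, grid_size)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid_points = np.stack([gx, gy, gz], axis=-1)  # (G, G, G, 3)
    density = np.zeros((grid_size, grid_size, grid_size))
    for center in coords:
        sq_dist = np.sum((grid_points - center) ** 2, axis=-1)
        density += np.exp(-sq_dist / (2.0 * sigma**2))
    return density

# Two hypothetical atoms 1.2 units apart.
atoms = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
clean = voxelize(atoms)

# Noisy input for a denoiser; the clean grid is the regression target.
rng = np.random.default_rng(0)
noisy = clean + rng.normal(scale=0.1, size=clean.shape)
```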
DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
- divides the noisy point clouds into irregular patches and predicts target points conditioned on input images
PointInfinity: Resolution-Invariant Point Diffusion Models
- trains efficiently on low-resolution point clouds while allowing high-resolution generation at inference
- transformer-based architecture with a fixed-size, resolution-invariant latent representation
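One way to see why a fixed-size latent is resolution-invariant: if the latent reads the points through cross-attention, its shape does not depend on how many points come in. The sketch below assumes a single attention head with random weights; the names `read_into_latent`, `w_q`, `w_k`, `w_v` are hypothetical and this is not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def read_into_latent(latent, points, w_q, w_k, w_v):
    """Cross-attention: latent tokens are queries, points are keys/values."""
    q = latent @ w_q                                   # (L, d)
    k = points @ w_k                                   # (N, d)
    v = points @ w_v                                   # (N, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))     # (L, N)
    return attn @ v                                    # (L, d), independent of N

rng = np.random.default_rng(0)
d = 8
latent = rng.normal(size=(4, d))   # 4 latent tokens (assumed size)
w_q = rng.normal(size=(d, d))
w_k = rng.normal(size=(3, d))      # points are xyz coordinates
w_v = rng.normal(size=(3, d))

low = read_into_latent(latent, rng.normal(size=(64, 3)), w_q, w_k, w_v)
high = read_into_latent(latent, rng.normal(size=(1024, 3)), w_q, w_k, w_v)
print(low.shape, high.shape)  # (4, 8) (4, 8)
```

Because the heavy processing happens on the latent, the cost of the main network stays the same whether 64 or 1024 points are fed in.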

1.2. BIOLOGY - ANALOG

biology-inspired AI: genes or local-context evolution https://youtu.be/vf18FLdKkY4 (CPPN algorithm)
- next paper: HyperNEAT http://axon.cs.byu.edu/~dan/778/papers/NeuroEvolution/stanley3**.pdf
  - continuation: http://eplex.cs.ucf.edu/ESHyperNEAT/
- NEAT algorithm: https://youtu.be/3nbvrrdymF0
The AI Epiphany: NeuroEvolution of Augmenting Topologies (NEAT) and Compositional Pattern Producing Networks (CPPN)
Forward-Forward (vs backpropagation): suited to analog computers
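As a reminder of what a CPPN does: it maps coordinates to a value by composing simple functions (sine for repetition, Gaussian for symmetry, and so on). The sketch below is hand-wired with arbitrary constants; in NEAT-style neuroevolution the topology and weights would be evolved rather than written by hand.

```python
import numpy as np

def cppn(x, y):
    """Hand-wired CPPN: compose simple functions of the input coordinates."""
    r = np.sqrt(x**2 + y**2)      # distance from center gives radial structure
    h1 = np.sin(3.0 * x)          # periodic function gives repetition along x
    h2 = np.exp(-(y**2))          # Gaussian gives symmetry about y = 0
    return np.tanh(h1 * h2 + np.cos(2.0 * r))

# Query the network at every pixel coordinate to render a pattern.
axis = np.linspace(-1.0, 1.0, 32)
xs, ys = np.meshgrid(axis, axis, indexing="xy")
image = cppn(xs, ys)
```

Since the pattern is a function of coordinates, it can be sampled at any resolution by changing the grid.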

1.3. ENERGY BASED

JEPA - https://youtu.be/jSdHmImyUjk
- self-supervised learning, energy-based models, and hierarchical predictive architectures
  - the encoder learns to ignore irrelevant information
- https://openreview.net/forum?id=BZ5a1r-kVsf
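A toy sketch of the energy idea: the energy is the prediction error measured in embedding space rather than input space, so anything the target encoder discards cannot affect it. All weights here are random placeholders, and zeroing one encoder row is an assumption standing in for what training would have to learn.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_emb = 6, 3

# Hypothetical linear encoders and predictor (random, untrained).
enc_x = rng.normal(size=(d_in, d_emb))
enc_y = rng.normal(size=(d_in, d_emb))
predictor = rng.normal(size=(d_emb, d_emb))

# Assumption for illustration: the target encoder ignores the last input
# dimension, as if it carried only unpredictable detail.
enc_y[-1, :] = 0.0

def energy(x, y):
    """Squared distance between the predicted and actual target embeddings."""
    s_x = np.tanh(x @ enc_x)
    s_y = np.tanh(y @ enc_y)
    return float(np.sum((s_x @ predictor - s_y) ** 2))

x = rng.normal(size=d_in)
y = rng.normal(size=d_in)

# Perturb only the ignored dimension of y: the energy does not change.
y_perturbed = y.copy()
y_perturbed[-1] += 5.0
print(np.isclose(energy(x, y), energy(x, y_perturbed)))
```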
Energy Transformer more efficient, electric
- transformers without skip connections or normalisation layers https://arxiv.org/pdf/2302.10322.pdf
- Conformers: convolution for local context, attention for global context

1.4. OTHER ALTERNATIVES

2. APPLICATION

NeRF-like representations for video and images
supervision: time objects spend in a zone
- e.g. employees at their desks, cars in a parking lot
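The dwell-time idea can be sketched as follows, assuming an upstream tracker already provides per-object detections with timestamps and positions. The tuple layout and the rectangular zone are assumptions made for this example.

```python
from collections import defaultdict

def dwell_times(detections, zone):
    """Seconds each tracked object spent inside a rectangular zone.

    detections: iterable of (timestamp_s, object_id, x, y), time-ordered.
    zone: (x_min, y_min, x_max, y_max).
    Time is counted between consecutive detections of the same object
    only when both detections fall inside the zone.
    """
    x_min, y_min, x_max, y_max = zone
    last_seen = {}                 # object_id -> (timestamp, was_inside)
    totals = defaultdict(float)
    for t, obj, x, y in detections:
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        prev = last_seen.get(obj)
        if prev is not None and prev[1] and inside:
            totals[obj] += t - prev[0]
        last_seen[obj] = (t, inside)
    return dict(totals)

detections = [
    (0.0, "car", 1.0, 1.0),
    (0.0, "emp", 2.0, 2.0),
    (3.0, "emp", 2.0, 2.0),
    (5.0, "car", 1.0, 1.0),
    (10.0, "car", 9.0, 9.0),   # the car has left the zone
]
result = dwell_times(detections, zone=(0.0, 0.0, 4.0, 4.0))
print(result)  # {'emp': 3.0, 'car': 5.0}
```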

2.1. UI

Interactive Garment Recommendation with User in the Loop (algorithm)
- ingests user feedback to improve its recommendations and maximize user satisfaction

2.1.1. OS CONTROL

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
- strong generalization to unseen applications via accumulated skills from previous tasks
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- accomplish complex computer tasks with minimal human intervention
- multimodal agents

2.1.1.1. WEBSITE CONTROL

open-source Rabbit
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
- interacting with real-world websites

2.1.2. DRAGGING UI

DragGAN, DRAG, DAG
DragTex: mesh
DragAnything: motion control in video

2.1.3. HAPTIC

Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation
- tracking and reconstruction of novel objects for in-hand manipulation
MACS: Mass Conditioned 3D Hand and Object Motion Synthesis
- improves the naturalness of the synthesized 3D hand-object motions
- generalizes to unseen masses
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
- simulated human demonstrations for real-world tasks

2.1.4. STORYTELLING

SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
- takes a graph representing plot points and creates bridges between plot points
- OPEN-VOCABULARY STORYTELLING CAPTIONING
- STORYTELLER DIFFUSION
- CARTOON INTO MANGA COLORIZATION
- LAYOUT LLM LAYOUT DIFFUSION
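The "bridges between plot points" note for SCO-VIST can be pictured as path finding in an event graph. The sketch below uses breadth-first search over a tiny hand-made graph; the graph contents, the function names, and the choice of BFS are all assumptions for illustration and may differ from what the paper actually does.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def bridge_plot_points(graph, plot_points):
    """Connect consecutive plot points with intermediate events."""
    story = [plot_points[0]]
    for a, b in zip(plot_points, plot_points[1:]):
        path = shortest_path(graph, a, b)
        if path is None:
            story.append(b)          # no bridge found: jump directly
        else:
            story.extend(path[1:])   # skip `a`, already in the story
    return story

# Hypothetical commonsense event graph.
graph = {
    "arrive at party": ["greet friends"],
    "greet friends": ["cut cake", "dance"],
    "cut cake": ["eat cake"],
    "dance": ["go home"],
    "eat cake": ["go home"],
}
story = bridge_plot_points(graph, ["arrive at party", "cut cake", "go home"])
print(story)
```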

2.2. FACE

2.2.1. FACE RECOGNITION

PAM: A Parallel Attention Network for Cattle Face Recognition
- focuses on local and global features
- for animal husbandry and behavioral research

2.2.2. FACE SWAP

Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms
- novel loss equation for the training of face swapping models

2.2.3. EMOTIONS

A Unified and Interpretable Emotion Representation and Expression Generation
- compound emotions

2.3. SPEECH RECOGNITION

CAPTIONING
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
- unpaired text-only data used to enhance paired audio-text data
- to detect turns
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

2.4. GEOGRAPHY

CityDreamer: Compositional Generative Model of Unbounded 3D Cities (imagining map layout city)
GENERATE BLENDER

2.4.1. NEURAL MAPPING

Active Neural Mapping: scene reconstruction that actively gains knowledge of the environment
Doppelgangers: Learning to Disambiguate Images of Similar Structures
- distinguishes illusory matches in difficult cases from the spatial distribution of local keypoints
