domain

Table of Contents

1. ARCHITECTURE

1.1. PHYSICS - CHEMICAL - PARTICLES

  • Complex Physics with Graph Networks https://arxiv.org/pdf/2002.09405
  • PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields
  • Scaling Spherical CNNs: compared against graph neural networks on molecular tasks
    • spherical convolutions are computed in the spectral domain through the convolution theorem
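
The convolution theorem named in the last note can be checked on a small case. The sketch below is not the spherical version from the paper; it assumes the simpler 1D circular case and a naive DFT, and only illustrates that convolving directly and multiplying in the spectral domain give the same result. All function names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_conv_direct(f, g):
    """Circular convolution computed directly in the signal domain."""
    n = len(f)
    return [sum(f[m] * g[(i - m) % n] for m in range(n)) for i in range(n)]

def circular_conv_spectral(f, g):
    """Same convolution via the convolution theorem: multiply the spectra, transform back."""
    spectrum = [a * b for a, b in zip(dft(f), dft(g))]
    return [v.real for v in dft(spectrum, inverse=True)]

f = [1.0, 2.0, 3.0, 4.0]
g = [0.5, 0.0, -0.5, 0.0]
direct = circular_conv_direct(f, g)
spectral = circular_conv_spectral(f, g)
# Both routes agree up to floating-point error.
print(direct, [round(v, 9) for v in spectral])
```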

1.1.1. POINT CLOUD

1.1.1.1. PIXEL ALIGNMENT
  • DUSt3R: Geometric 3D Vision Made Easy
    • global alignment of pixels from sparse views; no camera poses required
1.1.1.2. POINT CLOUD DIFFUSION
  • 3D molecule generation by denoising voxel grids
    • denoises voxelized molecules; an alternative to diffusion models applied to atom point clouds
  • DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
    • divides the noisy point clouds into irregular patches; predicts target points conditioned on the input images
  • PointInfinity: Resolution-Invariant Point Diffusion Models
    • trains efficiently on low-resolution point clouds while generating high-resolution point clouds at inference
    • transformer-based architecture with a fixed-size, resolution-invariant latent representation
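
Why a fixed-size latent is resolution-invariant can be sketched with a single cross-attention "read". This is an assumption about the general mechanism, not the paper's actual architecture: there are no learned projections here, and the latent count, dimensions and function name are hypothetical. The point is only that the output size follows the number of latents, not the number of points.

```python
import math
import random

def cross_attention_read(latents, points):
    """Each latent (query) attends over all points (used as keys and values).

    Returns one vector per latent, so the output size depends only on
    len(latents), never on len(points).
    """
    dim = len(latents[0])
    out = []
    for q in latents:
        scores = [sum(a * b for a, b in zip(q, p)) / math.sqrt(dim) for p in points]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        out.append([sum(w * p[d] for w, p in zip(weights, points)) / norm for d in range(dim)])
    return out

rng = random.Random(0)
latents = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]      # 4 latent tokens (hypothetical size)
low_res = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(16)]     # 16-point cloud
high_res = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(256)]   # 256-point cloud

z_low = cross_attention_read(latents, low_res)
z_high = cross_attention_read(latents, high_res)
print(len(z_low), len(z_high))  # same latent size for both resolutions
```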

1.2. BIOLOGY - ANALOG

1.3. ENERGY BASED

1.4. OTHER ALTERNATIVES

2. APPLICATION

  • NeRF-like representations for video and images
  • supervision: time objects spend in a zone
    • e.g. employees at their desks, cars in a parking lot
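
The zone-supervision idea could be prototyped as a dwell-time computation over tracked detections. This is a minimal sketch under stated assumptions: an upstream tracker (not shown) already yields `(timestamp, object_id, x, y)` tuples, the zone is an axis-aligned rectangle, and every name below is hypothetical.

```python
from collections import defaultdict

def in_zone(x, y, zone):
    """True if the point lies inside the rectangle (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = zone
    return x_min <= x <= x_max and y_min <= y <= y_max

def dwell_times(track_points, zone):
    """Per object, sum the time between consecutive observations that are both inside the zone."""
    by_object = defaultdict(list)
    for t, obj_id, x, y in track_points:
        by_object[obj_id].append((t, x, y))
    totals = {}
    for obj_id, points in by_object.items():
        points.sort()  # order observations by timestamp
        total = 0.0
        for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
            if in_zone(x0, y0, zone) and in_zone(x1, y1, zone):
                total += t1 - t0
        totals[obj_id] = total
    return totals

observations = [
    (0.0, "car_1", 5, 5),
    (1.0, "car_1", 6, 5),
    (2.0, "car_1", 20, 5),      # car_1 has left the zone
    (0.0, "employee_7", 1, 1),
    (3.0, "employee_7", 2, 2),
]
parking_zone = (0, 0, 10, 10)
result = dwell_times(observations, parking_zone)
print(result)
```

Counting only intervals whose two endpoints are inside the zone undercounts slightly at entry and exit; interpolating the crossing time would be a refinement.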

2.1. UI

  • Interactive Garment Recommendation with User in the Loop (algorithm)
    • ingests user feedback to improve its recommendations and maximize user satisfaction

2.1.1. OS CONTROL

  • OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
    • strong generalization to unseen applications via accumulated skills from previous tasks
  • OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
    • accomplish complex computer tasks with minimal human intervention
    • multimodal agents
2.1.1.1. WEBSITE CONTROL
  • open-source Rabbit
  • WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
    • interacting with real-world websites

2.1.2. DRAGGING UI

2.1.3. HAPTIC

  • Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation
    • tracking and reconstruction of novel objects for in-hand manipulation
  • MACS: Mass Conditioned 3D Hand and Object Motion Synthesis
    • improve naturalness of the synthesized 3D hand object motions
    • generalize to unseen masses
  • CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
    • simulated human demonstrations for real-world tasks

2.1.4. STORYTELLING

2.2. FACE

2.2.1. FACE RECOGNITION

  • PAM: A Parallel Attention Network for Cattle Face Recognition
    • focuses on local and global features
    • for animal husbandry and behavioral research

2.2.2. FACE SWAP

  • Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms
    • novel loss equation for the training of face swapping models

2.2.3. EMOTIONS

  • A Unified and Interpretable Emotion Representation and Expression Generation
    • compound emotions

2.3. SPEECH RECOGNITION

  • CAPTIONING
  • Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
    • unpaired text-only data used to enhance paired audio-text data
    • to detect turns
  • VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

2.4. GEOGRAPHY

2.4.1. NEURAL MAPPING

  • Active Neural Mapping
    • scene reconstruction while gaining knowledge of the environment
  • Doppelgangers: Learning to Disambiguate Images of Similar Structures
    • distinguishes illusory matches in difficult cases using the spatial distribution of local keypoints and matches

Author: Tekakutli

Created: 2024-04-13 Sat 04:35