segmentation

1. TARGETING

  • Materialistic: Selecting Similar Materials in Images
  • Background Prompting for Improved Object Depth
    • learned background prompt, thus focusing on the object
  • LISA: Reasoning Segmentation via Large Language Model
    • Language Instructed Segmentation Assistant, speak to it and it segments
  • SegGPT: Segmenting Everything In Context
    • Painter & SegGPT Series: Vision Foundation Models from BAAI (radiography components, top of box)
  • Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
    • CLIP can perform zero-shot open-vocabulary segmentation; probability-like experience
  • CartoonSegmentation: Instance-guided Cartoon Editing with a Large-scale Dataset (anime fine details) =best=
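The GEM observation above — CLIP patch tokens already localize — reduces, at its simplest, to assigning each patch its nearest text embedding. A minimal sketch with random arrays standing in for CLIP's vision and text features (function name and shapes are illustrative, not CLIP's API):

```python
import numpy as np

def zero_shot_segment(patch_feats, text_feats):
    """Label each image patch with its most similar text prompt.

    patch_feats: (H, W, D) array, stand-in for CLIP vision tokens
    text_feats:  (C, D) array, one embedding per class prompt
    returns:     (H, W) integer label map
    """
    # L2-normalise so dot products become cosine similarities
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    sims = p @ t.T                 # (H, W, C) patch-vs-prompt similarity
    return sims.argmax(axis=-1)    # per-patch class index

rng = np.random.default_rng(0)
labels = zero_shot_segment(rng.normal(size=(14, 14, 64)),
                           rng.normal(size=(3, 64)))
print(labels.shape)  # (14, 14)
```

Real open-vocabulary segmentation then upsamples this coarse patch map to pixel resolution.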

1.1. OBJECT DETECTION

  • Tracking Any Object Amodally
    • comprehend complete objects from partial visibility; boxes for occluded objects

1.1.1. CUTLER

  • CutLER: object detection and segmentation
    • detecting censor bars with deep learning and computer vision; gives their location (to later inpaint over them)
  • U2Seg: Unsupervised Universal Image Segmentation (vs CutLER) =best=
    • clustering of pseudo semantic labels
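U2Seg's pseudo semantic labels come from clustering features; as a stand-in for its actual pipeline, a toy k-means over per-pixel features (shapes and hyper-parameters made up):

```python
import numpy as np

def pseudo_semantic_labels(feats, k=4, iters=10, seed=0):
    """Toy k-means: cluster per-pixel features into k pseudo labels."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        # assign every feature vector to its nearest center
        dists = np.linalg.norm(feats[:, None] - centers[None], axis=-1)
        assign = dists.argmin(axis=1)
        # move each center to the mean of its members (skip empty clusters)
        for c in range(k):
            if (assign == c).any():
                centers[c] = feats[assign == c].mean(axis=0)
    return assign

feats = np.random.default_rng(1).normal(size=(256, 8))  # 256 "pixels"
labels = pseudo_semantic_labels(feats)
print(labels.shape)  # (256,)
```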

1.1.2. CONTROLNET FOR 3D

  • 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
    • finetunes a 2D diffusion model (ControlNet-style) to perform novel view synthesis from a single image (using an epipolar warp operator) =best=
    • 3D detection and identifying cross-view point correspondences

1.1.3. NERF SEGMENTATION

  • NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
    • indoor 3D detection (and depth) with images as input; generalizes to unseen scenes without requiring per-scene optimization
  • EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
    • captures scene geometry, appearance, and motion; represents highly dynamic scenes via self-supervision
  • SAGA: Segment Any 3D Gaussians
    • multi-granularity segmentation, instantaneous (unlike SA3D)
  • GARField: Group Anything with Radiance Fields
    • uses SAM 2D masks, coarse-to-fine hierarchy

2. SAM

2.1. FASTER

  • Fast Segment Anything: 40 ms per image; on PyPI
  • EfficientSAM: 20x fewer parameters and 20x faster runtime
  • SlimSam: 0.1% Data Makes Segment Anything Slim
    • 0.9% (5.7M) of the parameters, 0.1% of the data
  • TinySAM: Pushing the Envelope for Efficient Segment Anything Model
    • knowledge distillation to distill a lightweight student model
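The TinySAM entry above relies on knowledge distillation; the generic soft-target recipe (not TinySAM's exact objective) is the KL divergence between temperature-softened teacher and student distributions:

```python
import numpy as np

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over temperature-softened softmax outputs."""
    def soft(x):
        # numerically stable softmax at temperature T
        e = np.exp((x - x.max(axis=-1, keepdims=True)) / T)
        return e / e.sum(axis=-1, keepdims=True)
    p_t, p_s = soft(teacher_logits), soft(student_logits)
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)))

z = np.array([[2.0, 0.5, -1.0]])
print(distill_loss(z, z))  # 0.0 — identical logits, nothing left to learn
```

The student is trained to drive this loss toward zero, matching the teacher's soft predictions rather than hard labels.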

2.2. VIDEOS

  • segment videos: https://github.com/gaomingqi/Track-Anything
  • Tracking Anything with Decoupled Video Segmentation
  • Video Instance Matting
    • estimates an alpha matte for each instance at each frame of a video sequence
  • UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
    • unify four reference-based object segmentation tasks with a single architecture (box, area from prompt)
  • Lester: rotoscope animation through video object segmentation and tracking
    • mask and track across frames

2.3. USE CASES

2.3.1. UNDERSTANDING

  • RelateAnything: see relationships between segmented objects
  • Osprey: Pixel Understanding with Visual Instruction Tuning; understand everything for SAM
    • click on a cluster of pixels and get a description of it

2.3.2. FOLLOW AREA

  • Segment Anything Meets Point Tracking, follow pixels, OPTICAL FLOW
  • DreamTeacher: Pretraining Image Backbones with Deep Generative Models
    • distills generative features into image backbones, endowing them with 3D-aware understanding

3. DIFFUSION SEGMENTATION

  • parent: stablediffusion
  • SLiMe: Segment Like Me
  • Diffusion Models as Masked Autoencoders
  • ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
  • Diffusion Models for Zero-Shot Open-Vocabulary Segmentation (considers the contextual background)
  • MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
    • generates synthetic labeled data for rare and novel categories, then uses it to train segmentation
  • FIND: Interface Foundation Models’ Embeddings
    • segment and correlate to prompt token
  • SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process =best=
    • improves segmentation accuracy by denoising the mask (exceedingly fine details)
  • EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
    • identifies correspondences between pixels and latent space features
  • FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models
    • through a diffusion model and an image captioner model
      • both frozen
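SegRefiner casts mask refinement as a discrete denoising process; as a crude stand-in (a 3×3 majority vote, not the paper's learned diffusion), each step pulls every pixel toward its neighbourhood:

```python
import numpy as np

def refine_mask(mask, steps=5):
    """Toy denoising of a binary mask: repeated 3x3 majority vote."""
    m = mask.astype(float)
    for _ in range(steps):
        padded = np.pad(m, 1, mode="edge")
        # sum each pixel's 3x3 neighbourhood by shifting the padded array
        votes = sum(padded[i:i + m.shape[0], j:j + m.shape[1]]
                    for i in range(3) for j in range(3))
        m = (votes >= 5).astype(float)  # majority of the 9 votes wins
    return m.astype(int)

noisy = np.zeros((16, 16), int)
noisy[4:12, 4:12] = 1   # square object
noisy[0, 0] = 1         # speckle outside
noisy[8, 8] = 0         # hole inside
clean = refine_mask(noisy)
print(clean[0, 0], clean[8, 8])  # 0 1 — speckle removed, hole filled
```

The learned diffusion model plays the role of the majority vote here, which is what lets it recover fine structure instead of merely smoothing.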

3.1. 3D SD SEG

  • 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
    • novel view synthesis conditioned on a single image, using an epipolar warp operator
    • 3D-aware features for 3D detection, identifying cross-view point correspondences

4. AUDIO

  • AudioSep: Separate Anything You Describe, Separate Anything Audio Model

5. 3D SEGMENTATION

  • 3.1 1.1.3 LIFT3D
  • Segment Anything in 3D with NeRFs (SA3D)
    • SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
  • SAD is able to perform 3D segmentation (segment out any 3D object) with RGBD inputs
  • VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking (convnext)
    • predict objects directly upon sparse voxel features
      • no sparse-to-dense conversion, anchors, or center proxies needed anymore
    • use: 2D segmentation mask into 3D boxes: code
  • EgoLifter: Open-world 3D Segmentation for Egocentric Perception
    • segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects
  • iSeg: Interactive 3D Segmentation via Interactive Attention
    • based on clicking, positive and negative clicks directly on the shape’s surface
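VoxelNeXt's "fully sparse" idea — operate only on occupied voxels, never a dense grid — starts from sparse voxelisation. A minimal numpy sketch (mean-pooled point features per voxel; sizes and values made up):

```python
import numpy as np

def voxelize_sparse(points, voxel_size=0.5):
    """Return (coords, feats): occupied voxel coordinates and the mean of
    the points falling into each one. No dense grid is ever allocated."""
    coords = np.floor(points / voxel_size).astype(int)
    uniq, inv = np.unique(coords, axis=0, return_inverse=True)
    inv = inv.ravel()  # guard against numpy-version shape differences
    counts = np.bincount(inv, minlength=len(uniq))
    feats = np.zeros((len(uniq), points.shape[1]))
    for d in range(points.shape[1]):
        feats[:, d] = np.bincount(inv, weights=points[:, d]) / counts
    return uniq, feats

pts = np.array([[0.1, 0.2, 0.0],
                [0.2, 0.1, 0.1],   # lands in the same voxel as the first
                [3.0, 3.0, 3.0]])
coords, feats = voxelize_sparse(pts)
print(len(coords))  # 2 occupied voxels out of a huge implicit grid
```

A detector in this style then predicts objects directly from `feats`, skipping the sparse-to-dense conversion the bullet above mentions.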

5.1. SUPERPRIMITIVE

  • into point cloud:
    • SuperPrimitive: Scene Reconstruction at a Primitive Level
      • splitting images into semantically correlated local regions, then enhancing with normals
      • for tasks: depth completion (per pixel), few-view structure from motion, and monocular dense visual odometry (recovering POV angles)

5.2. GAUSSIAN

  • LangSplat: 3D Language Gaussian Splatting
    • ground CLIP features into 3D language Gaussians, faster than LERF
  • SA-GS: Segment Anything in 3D Gaussians
    • without any training process or learned parameters

6. OPTICAL FLOW

  • RAFT: Recurrent All-Pairs Field Transforms for Optical Flow (video optical flow)
    • OmniMotion: Tracking Everything Everywhere All at Once (following pixels, optical flow)
    • INVE: Interactive Neural Video Editing; painting pixels, then following them
  • Tracking Anything in High Quality
    • a pretrained mask refiner (MR) model is employed to refine the tracking result
  • CoTracker: models correlation of the points in time, using attention
    • can track every pixel or selected
  • generate rainbow visualizations from a set of point tracks
  • SpatialTracker: Tracking Any 2D Pixels in 3D Space
    • deals with occlusions and discontinuities by tracking 2D pixels in 3D, mitigating the issues caused by image projection
      • using monocular depth estimators
  • 2.3.2
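The SpatialTracker entry hinges on lifting tracked pixels into 3D using monocular depth; the lifting step is just pinhole backprojection (the intrinsics below are made-up values):

```python
import numpy as np

def backproject(uv, depth, fx, fy, cx, cy):
    """Lift 2D pixel tracks into 3D camera coordinates.

    uv: (N, 2) pixel positions; depth: (N,) metric depth per pixel.
    """
    x = (uv[:, 0] - cx) / fx * depth
    y = (uv[:, 1] - cy) / fy * depth
    return np.stack([x, y, depth], axis=1)

uv = np.array([[320.0, 240.0], [400.0, 240.0]])   # two tracked pixels
pts3d = backproject(uv, np.array([2.0, 2.0]),
                    fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(pts3d[0])  # [0. 0. 2.] — the principal-point pixel lies on the optical axis
```

Tracking in this 3D space is what sidesteps the occlusion and discontinuity artifacts that image projection introduces.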

6.1. DIFFUSION OPTICAL FLOW

  • parent: diffusion
  • The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

7. FINETUNING

  • ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
    • freezes model parameters, fine-tuning a small set of prompt embeddings
      • addresses both catastrophic forgetting and plasticity
        • significantly reduces the trainable parameters
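The ECLIPSE recipe — frozen weights, trainable prompts — fits in a few lines. A linear toy model (not the paper's panoptic architecture) where gradient steps touch only the prompt vector:

```python
import numpy as np

# Toy visual prompt tuning: the backbone weight W stays frozen; only the
# prompt vector p, added to the input, receives gradient updates.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))     # frozen backbone weights
x = rng.normal(size=4)          # input features
y = rng.normal(size=4)          # target for the new task
p = np.zeros(4)                 # trainable prompt embedding

loss = lambda p: 0.5 * np.sum((W @ (x + p) - y) ** 2)
W0, l0 = W.copy(), loss(p)
for _ in range(200):
    err = W @ (x + p) - y       # residual of the frozen forward pass
    p -= 0.01 * (W.T @ err)     # gradient step on the prompt only

print(np.allclose(W, W0), loss(p) < l0)  # True True
```

Because only `p` moves, the old task's behaviour (encoded in `W`) cannot be catastrophically overwritten, yet the model still adapts — the forgetting/plasticity trade-off the bullet describes.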

Author: Tekakutli

Created: 2024-04-13 Sat 04:35