diffusion train

Table of Contents

1. RLCM

  • RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
    • fine-tunes consistency models via RL to optimize task-specific rewards while keeping training and inference fast
    • Reinforcement Learning for Consistency Models (RLCM)
    • targets objectives that are hard to reach with prompting, like image compressibility and human feedback

2. IDEAS

2.1. REMEMBER

  • LP-DiF: Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning
    • continuously learn new classes without forgetting old ones
  • S2-DMs: Skip-Step Diffusion Models
    • a new training method, Lskip, designed to reintegrate the information omitted during the selective sampling phase

2.2. BEFORE-AFTER

  • Switch EMA: A Free Lunch for Better Flatness and Sharpness
    • copies the EMA parameters back into the original model after each epoch, dubbed Switch EMA (SEMA); see the sketch after this list
    • a free lunch: boosts convergence speed at no extra cost
  • Rolling Diffusion Model (VIDEO)
    • a sliding-window denoising process
    • assigns more noise to frames that appear later in the sequence
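
A minimal sketch of the SEMA idea as I read it (the switching interval and whether buffers are included are choices I have not verified against the paper): keep a normal EMA copy during training, then load the EMA weights back into the online model at the end of every epoch.

  import copy
  import torch

  @torch.no_grad()
  def ema_update(ema_model, model, decay=0.999):
      # standard EMA of the online weights
      for p_ema, p in zip(ema_model.parameters(), model.parameters()):
          p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

  def train_with_sema(model, loader, optimizer, loss_fn, epochs, decay=0.999):
      ema_model = copy.deepcopy(model)
      for epoch in range(epochs):
          for batch in loader:
              optimizer.zero_grad()
              loss_fn(model, batch).backward()
              optimizer.step()
              ema_update(ema_model, model, decay)
          # the SEMA step: copy the EMA weights back into the online model each epoch
          model.load_state_dict(ema_model.state_dict())
      return model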

2.3. ONLY ONCE

  • Fixed Point Diffusion Models
    • reallocates computation across timesteps and reuses fixed point solutions between timesteps; see the sketch after this list
    • 87% fewer parameters, consumes 60% less training memory
  • Analyzing and Improving the Training Dynamics of Diffusion Models
    • redesigns the network, yielding better results at equal computational complexity
    • post-hoc, precise tuning of the EMA length without the cost of performing several training runs
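
A rough sketch of the "reuse fixed point solutions between timesteps" idea, with placeholder f / denoise_step callables and a naive iteration solver (the paper's implicit layer and solver are more involved):

  import torch

  def solve_fixed_point(f, x_t, h_init, n_iters=8):
      # naive fixed-point iteration h <- f(h, x_t); a stand-in for the paper's implicit solver
      h = h_init
      for _ in range(n_iters):
          h = f(h, x_t)
      return h

  def sample_with_reuse(f, denoise_step, x_T, timesteps, h_dim):
      x_t = x_T
      h = torch.zeros(x_T.shape[0], h_dim)   # cold start only at the first timestep
      for t in timesteps:
          # warm start: reuse the previous timestep's solution, so fewer iterations suffice
          h = solve_fixed_point(f, x_t, h_init=h, n_iters=4)
          x_t = denoise_step(x_t, h, t)
      return x_t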

2.4. CONTEXT

  • ConPreDiff: Improving Diffusion-Based Image Synthesis with Context Prediction (better zero-shot)
  • Any-Shift Prompting for Generalization over Distributions
    • encodes distribution information and the relationships between distributions
      • to guide the generalization of the CLIP vision-language model from the training distribution to any test distribution

3. PRIORS

3.1. STRUCTURE

  • Structure Preserving Diffusion Models (see also 10.1)
    • result: if you rotate the input, the output rotates the same way; the model learns structures (equivariance)
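
A quick check of the claimed property for any denoiser (hypothetical denoiser(x, t) callable, NCHW tensors, 90-degree rotation used only as an example of the preserved structure):

  import torch

  def check_rotation_equivariance(denoiser, x, t, atol=1e-4):
      # equivariance: rotating the input should rotate the output in the same way
      out_then_rot = torch.rot90(denoiser(x, t), k=1, dims=(2, 3))
      rot_then_out = denoiser(torch.rot90(x, k=1, dims=(2, 3)), t)
      return torch.allclose(out_then_rot, rot_then_out, atol=atol)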

3.2. VAE TRAINING

  • Deconstructing Denoising Diffusion Models for Self-Supervised Learning
    • gradually transforms a Denoising Diffusion Model (DDM) into a classical Denoising Autoencoder (DAE) (VAE)
  • FLAWED: the VAE used for Stable Diffusion 1.x/2.x and other models (KL-F8) has a critical flaw, probably due to bad training; it needs a new one trained from scratch, like SDXL's =best=
    • the encoder has to do a lot of extra work to get around the bad latent space

3.3. 3D INCORPORATED

4. DISTRIBUTED TRAINING

5. DIFFUSION QUANTIZATION

  • 4- and 8-bit models; Q-Diffusion insight, Reddit post on quantization
    • Memory-Efficient Personalization using Quantized Diffusion Model (enhancing it)
  • Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models
    • aligns outputs of the quantized model and the full-precision model at different network granularities; see the sketch after this list
  • QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
    • finetuning the quantized model to better adapt to the activation distribution (mitigation)
  • Task-Oriented Diffusion Model Compression
    • satisfactory output quality with 39.2% and 56.4% reduction in model footprint and 81.4% and 68.7%
    • applying it to InstructPix2Pix and StableSR
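
A generic sketch of the output-alignment idea behind these post-training quantization papers, not the exact EDA-DM or QuEST procedure: fake-quantize a linear layer's weights with a learnable scale and tune that scale so the quantized layer's outputs match the full-precision layer's outputs on calibration data.

  import torch

  def fake_quant(w, scale, n_bits=8):
      # symmetric uniform fake quantization; straight-through estimator on round()
      qmax = 2 ** (n_bits - 1) - 1
      q = w / scale
      q = q + (torch.round(q) - q).detach()
      q = torch.clamp(q, -qmax, qmax)
      return q * scale

  def align_linear(weight, calib_inputs, n_bits=8, steps=100, lr=1e-3):
      # per-tensor scale initialized from the weight range, then tuned on calibration data
      qmax = 2 ** (n_bits - 1) - 1
      scale = (weight.abs().max() / qmax).detach().clone().requires_grad_(True)
      opt = torch.optim.Adam([scale], lr=lr)
      for _ in range(steps):
          for x in calib_inputs:
              out_fp = torch.nn.functional.linear(x, weight)                      # full-precision reference
              out_q = torch.nn.functional.linear(x, fake_quant(weight, scale, n_bits))
              loss = torch.nn.functional.mse_loss(out_q, out_fp)
              opt.zero_grad(); loss.backward(); opt.step()
      return scale.detach()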

6. ACADEMIC

  • GIT RE-BASIN: MERGING MODELS MODULO PERMUTATION SYMMETRIES
    • transfer knowledge from a teacher to a student model
  • Idempotent Generative Network
    • f(f(z)) = f(z); can generate an output in one step
    • a step towards a “global projector”: projecting any input onto a target data distribution
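
The idempotency objective f(f(z)) = f(z), sketched very loosely: the actual paper also uses a reconstruction term and a "tightening" term and is careful about which copy of f receives gradients, all of which is simplified away here.

  import torch
  import torch.nn.functional as F

  def idempotency_loss(f, z):
      # push f to act as a projection onto the data manifold: f(f(z)) ≈ f(z)
      fz = f(z)
      # assumption: detach the first pass so only the outer application gets gradients
      return F.mse_loss(f(fz.detach()), fz.detach())

  # one-step generation is then simply x = f(z) for fresh noise z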

7. CLIP RELATED

  • uform: CLIP not required, trained in a day
  • cloneofsimo: learning from CLIP
    • wanna perform affordable kernel regression on L2-normalized data?
      • get yourself Spherical Random Features for Polynomial Kernels
      • relevant if you are aiming for large-scale non-parametric regression on CLIP-projected feature spaces
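
For context, the exact baseline this points at, polynomial-kernel ridge regression on L2-normalized embeddings, is a few lines with scikit-learn; spherical random features approximate this kernel once the dataset is too large for the exact O(n^2) computation. The data below is a random stand-in for CLIP embeddings.

  import numpy as np
  from sklearn.kernel_ridge import KernelRidge

  # placeholder data: pretend these are CLIP image embeddings and scalar targets
  feats = np.random.randn(1000, 512)
  feats /= np.linalg.norm(feats, axis=1, keepdims=True)   # L2-normalize, as the note assumes
  targets = np.random.randn(1000)

  # exact polynomial-kernel ridge regression; memory grows as n^2, hence random features at scale
  model = KernelRidge(kernel="poly", degree=3, alpha=1e-2)
  model.fit(feats, targets)
  preds = model.predict(feats[:5])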

8. CHEAPER TRAINING

  • Efficient Diffusion Training via Min-SNR Weighting Strategy
    • addresses slow convergence caused by conflicting optimization directions between timesteps; 3.4x faster convergence (see the sketch after this list)
  • Imagen suggests that scaling the text encoder is much more impactful than scaling the UNet
    • at least for diffusion models
  • MosaicML: custom $50k Stable Diffusion training, Reddit post
  • Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
  • compressed-stable-diffusion: 36% reduction in parameters and latency
  • Wuerstchen: Efficient Pretraining of Text-to-Image Models
    • 16x faster to train, 2x faster inference, only 9,200 GPU hours (42x compression ratio vs SD's 8x)
  • DREAM: Diffusion Rectification and Estimation-Adaptive Models (requiring minimal code changes)
    • 2 to 3 times faster training convergence
  • PERCEPTUAL LOSS =best=
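
A sketch of how the Min-SNR weight is commonly applied for epsilon-prediction (gamma = 5 is the paper's default; the exact form differs for other prediction targets):

  import torch

  def min_snr_weight(alphas_cumprod, t, gamma=5.0):
      # SNR(t) = alpha_bar_t / (1 - alpha_bar_t); cap its influence at gamma
      snr = alphas_cumprod[t] / (1.0 - alphas_cumprod[t])
      # for epsilon-prediction the common form is min(SNR, gamma) / SNR
      return snr.clamp(max=gamma) / snr

  def weighted_diffusion_loss(eps_pred, eps_true, alphas_cumprod, t, gamma=5.0):
      per_sample = ((eps_pred - eps_true) ** 2).flatten(1).mean(dim=1)
      return (min_snr_weight(alphas_cumprod, t, gamma) * per_sample).mean()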

8.1. FASTER TOO

  • LCM =best=
  • UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
    • integrates diffusion with a GAN objective to enable one-step generation

9. DIFFERENT ARCHITECTURE

10. DATASET MANIPULATION

  • Shifted Diffusion =Corgi= for Text-to-Image Generation: from CLIP straight to diffusion, =only 1.7% of the images required captions=
  • Object Detection: CutLER
  • D3S: Invariant Learning via Diffusion Dreamed Distribution Shifts, separating foreground and background
    • disentangles foreground from background by cutting and pasting them in the synthetic training dataset
    • like SVDiff
  • A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
    • automatic captioning beats crawled low-quality captions
    • CapsFusion: Rethinking Image-Text Data at Scale
      • progress is hindered by simplistic captioners; consolidates and refines information
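
The recaptioning recipe both papers rely on is mechanically simple; a sketch using an off-the-shelf captioner through the Hugging Face pipeline API (the BLIP checkpoint here is just an example, the papers use stronger captioners):

  from pathlib import Path
  from transformers import pipeline

  # example captioner; the papers use stronger, sometimes LLM-assisted, captioners
  captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

  def recaption_folder(image_dir):
      new_captions = {}
      for path in Path(image_dir).glob("*.jpg"):
          out = captioner(str(path))                 # [{'generated_text': '...'}]
          new_captions[path.name] = out[0]["generated_text"]
      return new_captions  # use these in place of the crawled alt-text captions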

10.1. BATCH STRUCTURE

  • Structure-Guided Adversarial Training of Diffusion Models
    • compels the model to learn manifold structures between samples in each training batch

10.1.1. ATLAS

10.2. MASKS

11. MATHEMATICAL (COPY PASTED COMMENT YET TO ANALYZE)

I have recently written a paper on understanding transformer learning via the lens of coinduction & Hopf algebra. https://arxiv.org/abs/2302.01834

The learning mechanism of transformer models was poorly understood; however, it turns out that a transformer is like a circuit with feedback.

I argue that autodiff can be replaced with what I call in the paper Hopf coherence which happens within the single layer as opposed to across the whole graph.

Furthermore, if we view transformers as Hopf algebras, one can bring convolutional models, diffusion models and transformers under a single umbrella.

I’m working on a next gen Hopf algebra based machine learning framework.

Join my discord if you want to discuss this further https://discord.gg/mr9TAhpyBW

Author: Tekakutli

Created: 2024-04-13 Sat 04:35