diffusion train

1. RLCM
2. IDEAS
3. PRIORS
4. DISTRIBUTED TRAINING
5. DIFFUSION QUANTIZATION
6. ACADEMIC
7. CLIP RELATED
8. CHEAPER TRAINING
- 8.1. FASTER TOO
9. DIFFERENT ARCHITECTURE
10. DATASET MANIPULATION
- 10.1. BATCH STRUCTURE
  - 10.1.1. ATLAS
- 10.2. MASKS
11. MATHEMATICAL (COPY PASTED COMMENT YET TO ANALYZE)

parent: stable_diffusion train
BETTER DECODER blue noise: NOISE CONTROL
400x (and use vae leafing to make big)
Diffusers Compatible SDXL Unet Rewrite (520 lines)
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
- scaling the coefficients of the long skip connections (LSCs, which connect distant blocks) in the UNet improves its training stability
Cas-DM: Bring Metric Functions into Diffusion Models (incorporating additional metric functions, objectives)
Quantum Denoising Diffusion Models
- explores integrating variational quantum circuits to augment efficacy of diffusion
MPI: Masked Pre-trained Model Enables Universal Zero-shot Denoiser
- a model pre-trained with masking spontaneously acquires strong image denoising ability
Simplified Diffusion Schrödinger Bridge
- simplification of the Diffusion Schrödinger Bridge (DSB) that facilitates its unification with Score-based Generative Models (SGMs)

1. RLCM

RL for Consistency Models, Faster Reward Guided Text-to-Image Generation
- fine-tunes consistency models via RL to optimize task-specific rewards while keeping training and inference fast
- Reinforcement Learning for Consistency Models (RLCM)
- handles objectives that are hard to express through prompting, such as image compressibility and human feedback

2. IDEAS

2.1. REMEMBER

LP-DiF: Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning
- continuously learn new classes without forgetting old ones
S2-DMs: Skip-Step Diffusion Models
- new training method, Lskip, designed to reintegrate omitted info during the selective sampling phase

2.2. BEFORE-AFTER

Switch EMA: A Free Lunch for Better Flatness and Sharpness
- switching the EMA parameters to the original model after each epoch, dubbed as Switch EMA (SEMA)
- free lunch by boosting convergence speeds
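A minimal sketch of the Switch EMA idea noted above, assuming plain gradient descent on a list of floats. The names train_with_switch_ema and grads_fn are hypothetical, and this is not the paper's implementation, only the "copy EMA back into the model after each epoch" step.

```python
def ema_update(ema, params, decay):
    # Standard exponential moving average of the parameters.
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


def train_with_switch_ema(params, grads_fn, epochs, steps_per_epoch,
                          lr=0.1, decay=0.9):
    # grads_fn is a hypothetical callable returning one gradient per parameter.
    ema = list(params)
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            grads = grads_fn(params)
            params = [p - lr * g for p, g in zip(params, grads)]
            ema = ema_update(ema, params, decay)
        # The "switch": after each epoch the model restarts from the EMA weights.
        params = list(ema)
    return params, ema
```

With regular EMA the last line of the epoch loop is simply absent; the switch is the only difference.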
Rolling Diffusion Model (VIDEO)
- a sliding window denoising process
- more noise to frames that appear later in a sequence
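A rough sketch of "more noise to frames that appear later", assuming a linear per-frame schedule inside the sliding window. The function name and the exact formula are assumptions for illustration and may differ from the paper's parameterization.

```python
def window_noise_levels(window_size, t):
    # t in [0, 1) is the progress of the current window shift (assumed).
    # Frame k in the window gets noise level (k + t) / window_size,
    # clipped to [0, 1], so later frames are always noisier.
    return [min(1.0, max(0.0, (k + t) / window_size))
            for k in range(window_size)]
```

As t grows toward 1, every frame gets noisier; read in reverse (t going from 1 to 0) the first frame becomes fully clean, and the window can then slide forward by one frame.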

2.3. ONLY ONCE

Fixed Point Diffusion Models
- reallocating computation across timesteps and reusing fixed point solutions between timesteps
- 87% fewer parameters, consumes 60% less training memory
Analyzing and Improving the Training Dynamics of Diffusion Models
- redesigned network layers, giving better networks at equal computational complexity
- precise tuning of EMA length without the cost of performing several training runs

2.4. CONTEXT

ConPreDiff: Improving Diffusion-Based Image Synthesis with Context Prediction (better zeroshot)
Any-Shift Prompting for Generalization over Distributions
- encode the distribution information and their relationships
  - guide the generalization of the CLIP image-language model from training to any test distribution

3. PRIORS

3.1. STRUCTURE

see 10.1. BATCH STRUCTURE
- Structure Preserving Diffusion Models
  - result: if you rotate the input, the output also rotates unharmed; learn structures
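The rotation property above (equivariance) can be checked numerically. A small sketch on nested lists, where neighbor_sum and shift_right are toy stand-ins for a model (assumptions, not from the paper):

```python
def rot90(img):
    # Rotate a 2D list 90 degrees clockwise.
    return [list(row) for row in zip(*img[::-1])]


def neighbor_sum(img):
    # Sum over the in-bounds 3x3 neighborhood: symmetric, so rotation-equivariant.
    h, w = len(img), len(img[0])
    return [[sum(img[a][b]
                 for a in range(max(0, i - 1), min(h, i + 2))
                 for b in range(max(0, j - 1), min(w, j + 2)))
             for j in range(w)] for i in range(h)]


def shift_right(img):
    # Each pixel takes its left neighbor's value: has a preferred direction,
    # so it is NOT rotation-equivariant.
    return [[0] + row[:-1] for row in img]


def is_rot_equivariant(f, img):
    # Equivariance: rotating then applying f equals applying f then rotating.
    return f(rot90(img)) == rot90(f(img))
```

For example, is_rot_equivariant(neighbor_sum, img) holds for any integer grid img, while shift_right fails the check on most inputs.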

3.2. VAE TRAINING

Deconstructing Denoising Diffusion Models for Self-Supervised Learning
- gradually transforming a Denoising Diffusion Model (DDM) into a classical Denoising Autoencoder (DAE) (VAE)
FLAWED: the VAE used for Stable Diffusion 1.x/2.x and other models (KL-F8) has a critical flaw, probably due to bad training; it needs a new VAE trained from scratch, like SDXL's =best=
- the encoder has to do a lot of extra work to get around the bad latent space

3.3. 3D INCORPORATED

GIBR

4. DISTRIBUTED TRAINING

distributed-diffusion using hivemind (distributed training) vs DeepSpeed
COMPOSITIONAL DIFFUSION
SiT discrete transformers

5. DIFFUSION QUANTIZATION

4, 8 bit models, Q-Diffusion insight reddit quantization
- Memory-Efficient Personalization using Quantized Diffusion Model (enhancing it)
Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models
- align outputs of the quantized model and the full-precision model at different network granularity
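To make "aligning the quantized and full-precision outputs" concrete, here is a toy sketch using symmetric uniform fake-quantization of a weight vector. It is a generic illustration, not the method of any paper listed here; all function names are hypothetical, and bits >= 2 is assumed.

```python
def fake_quantize(values, bits):
    # Symmetric uniform quantization: snap each value to a grid of
    # 2**(bits-1) - 1 positive levels, then map back to floats.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) * scale for v in values]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def output_gap(weights, q_weights, x):
    # Gap between full-precision and quantized outputs of one linear unit;
    # post-training quantization methods try to shrink gaps like this.
    full = sum(w * v for w, v in zip(weights, x))
    quant = sum(w * v for w, v in zip(q_weights, x))
    return abs(full - quant)
```

Comparing fake_quantize(w, 8) and fake_quantize(w, 4) on the same weights shows how the gap to the full-precision output grows as the bit width drops.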
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
- finetuning the quantized model to better adapt to the activation distribution (mitigation)
Task-Oriented Diffusion Model Compression
- satisfactory output quality with 39.2% and 56.4% reduction in model footprint and 81.4% and 68.7%
- applying it to InstructPix2Pix and StableSR

6. ACADEMIC

GIT RE-BASIN: MERGING MODELS MODULO PERMUTATION SYMMETRIES
- transfer knowledge from a teacher to a student model
Idempotent Generative Network
- f(f(z))=f(z), can generate an output in one step
- step towards a “global projector” = projecting any input into a target data distribution
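The condition f(f(z))=f(z) can be measured directly. A toy sketch, where clamp_unit stands in for a learned projector (an assumption for illustration; the real network is trained, not hand-written):

```python
def idempotence_gap(f, z):
    # Squared distance between f(f(z)) and f(z); zero means f is idempotent at z.
    once = f(z)
    twice = f(once)
    return sum((a - b) ** 2 for a, b in zip(twice, once))


def clamp_unit(z):
    # Projects every coordinate onto [-1, 1]; applying it twice changes nothing.
    return [max(-1.0, min(1.0, v)) for v in z]


def halve(z):
    # Not idempotent: every application shrinks the input again.
    return [0.5 * v for v in z]
```

A projector such as clamp_unit has zero gap on every input, which is the property the network is pushed towards.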

7. CLIP RELATED

uform: clip not required, trained in a day
cloneofsimo: learning from the clip
- wanna perform affordable kernel regression on l2-normalized data?
  - get yourself Spherical Random Features for Polynomial Kernels
  - relevant if you are aiming for large scale non-parametric regression on CLIP projected feature spaces

8. CHEAPER TRAINING

Efficient Diffusion Training via Min-SNR Weighting Strategy
- attributes slow convergence to conflicting optimization directions between timesteps; the weighting converges 3.4 times faster
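A short sketch of the Min-SNR weighting for a noise-prediction loss: the per-timestep weight is min(SNR, gamma) / SNR, with gamma = 5 as the commonly used default. The helper names are hypothetical, and alpha_bar denotes the cumulative signal coefficient of the noise schedule (assumed notation).

```python
def snr_from_alpha_bar(alpha_bar):
    # Signal-to-noise ratio of x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.
    return alpha_bar / (1.0 - alpha_bar)


def min_snr_weight(snr, gamma=5.0):
    # Clamp the SNR at gamma so low-noise (high-SNR) timesteps do not dominate.
    return min(snr, gamma) / snr


def weighted_loss(per_timestep_mse, alpha_bars, gamma=5.0):
    # Average of the per-sample MSE losses, each scaled by its Min-SNR weight.
    weights = [min_snr_weight(snr_from_alpha_bar(a), gamma) for a in alpha_bars]
    return sum(w * l for w, l in zip(weights, per_timestep_mse)) / len(weights)
```

Timesteps with SNR below gamma keep weight 1; those above are scaled down in proportion to their SNR.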
Imagen suggests that scaling the text encoder is much more impactful than scaling the UNet
- at least for diffusion models
MosaicML: custom $50k stable diffusion training, reddit post
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
compressed-stable-diffusion 36% reduced parameters and latency
Wuerstchen: Efficient Pretraining of Text-to-Image Models
- 16 times faster to train, 2 times faster inference, only 9200 GPU hours (42x compression rate vs 8x for SD)
DREAM: Diffusion Rectification and Estimation-Adaptive Models (requiring minimal code changes)
- 2 to 3 times faster training convergence
PERCEPTUAL LOSS =best=

8.1. FASTER TOO

LCM =best=
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
- integrating diffusion with a GAN objective, one step

9. DIFFERENT ARCHITECTURE

faster using electric flow-charges https://www.assemblyai.com/blog/an-introduction-to-poisson-flow-generative-models/
- better than inference: https://twitter.com/_akhaliq/status/1620958983639924736 https://arxiv.org/pdf/2302.00482.pdf
Spectral Diffusion: slim Standard Diffusion, 20 times smaller in size
- Wavelet diffusion code
  - Wavelet Diffusion Models are fast and scalable Image Generators
Score-Based Diffusion Models as Principled Priors for Inverse Imaging (more complex priors)
COMPOSITIONAL DIFFUSION DIFFUSION TRANSFORMER

10. DATASET MANIPULATION

Shifted Diffusion =Corgi= for Text-to-image Generation: from clip straight to diffusion, =only 1.7% of the images required captions=
Object Detection: CutLER
D3S: Invariant Learning via Diffusion Dreamed Distribution Shifts, separating foreground-background
- disentangling foreground from background by chopping-pasting them out in the synthetic training dataset
- like SVDiff
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
- automatic captioning is better than crawled low quality captions
- CapsFusion: Rethinking Image-Text Data at Scale
  - hindered by simplistic captioners, consolidate and refine information

10.1. BATCH STRUCTURE

Structure-Guided Adversarial Training of Diffusion Models
- compel the model to learn manifold structures between samples in each training batch

10.1.1. ATLAS

IMAGE CLUSTERING
Neural Congealing: Aligning Images to a Joint Semantic Atlas
- zero-shot learning of concept shapes
- ASIC: Aligning Sparse in-the-wild Image Collections
Ablating Concepts in Text-to-Image Diffusion Models (adobe)

10.2. MASKS

masking to accelerate learning VQ-Diffusion https://arxiv.org/pdf/2111.14822.pdf
DeepMIM: Deep Supervision for Masked Image Modeling
- pre-trains a Vision Transformer (ViT) via a mask-and-predict scheme.
MDT: Masked Diffusion Transformer (3 times faster)
Predicting masked tokens in stochastic locations improves masked image modeling
- learning features that are more robust to location uncertainties; Masked Image Modeling (MIM)
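The mask-and-predict schemes in this section all start by hiding a random subset of tokens. A minimal sketch of that step, assuming uniform random masking over token indices (the function name, ratio, and seed are illustrative, not taken from any of the papers):

```python
import random


def random_token_mask(num_tokens, mask_ratio, seed=0):
    # Pick int(num_tokens * mask_ratio) token indices to hide.
    rng = random.Random(seed)
    num_masked = int(num_tokens * mask_ratio)
    masked = sorted(rng.sample(range(num_tokens), num_masked))
    masked_set = set(masked)
    # The encoder only sees the visible tokens; the model predicts the masked ones.
    visible = [i for i in range(num_tokens) if i not in masked_set]
    return visible, masked
```

Since only the visible tokens are processed, a high mask ratio cuts the per-step compute, which is where the training speedup comes from.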

11. MATHEMATICAL (COPY PASTED COMMENT YET TO ANALYZE)

I have recently written a paper on understanding transformer learning via the lens of coinduction & Hopf algebra. https://arxiv.org/abs/2302.01834

The learning mechanism of transformer models was poorly understood; however, it turns out that a transformer is like a circuit with feedback.

I argue that autodiff can be replaced with what I call in the paper Hopf coherence which happens within the single layer as opposed to across the whole graph.

Furthermore, if we view transformers as Hopf algebras, one can bring convolutional models, diffusion models and transformers under a single umbrella.

I’m working on a next gen Hopf algebra based machine learning framework.

Join my discord if you want to discuss this further https://discord.gg/mr9TAhpyBW