software
Table of Contents
1. IMAGE GENERATION
- parent: stablediffusion
- upscale model database: https://openmodeldb.info/ (replaces the old model database site)
1.1. WORKFLOWS
- https://comfyworkflows.com/
- others: cookingrepertoire
- reposer plus: comfyui workflow for pose + face(ip-adapter) + clothing
- https://learn.thinkdiffusion.com/a-list-of-the-best-comfyui-workflows/
- Stable Cascade Canny ControlNet https://github.com/ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO
1.1.1. KEYWORDS
- sdxl artists-repertoire: keywords
- nai artists with examples
1.2. SD GUIDES
- entry guide (old): https://imgur.com/a/VjFi5uM
- all the links about stable diffusion categorized https://rentry.co/RentrySD
- https://gitgud.io/gayshit/makesomefuckingporn (lora index)
- https://rentry.org/sdg-link (index of everything)
- https://rentry.org/hdgrecipes (model merging index)
- block merging (unet) https://rentry.org/BlockMergeExplained
- neuralnomicon node: cookingrepertoire
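Block merging (the BlockMergeExplained link above) interpolates two checkpoints with a separate weight per UNet block instead of a single global ratio. A minimal sketch of the idea, with plain floats standing in for tensors; the block prefixes and weights here are illustrative, not the real key names:

```python
# Sketch of per-block model merging (hypothetical block names/weights).
# Real tools operate on torch state_dicts; plain floats stand in for tensors.

def block_of(key: str) -> str:
    """Map a parameter key to its UNet block prefix (simplified)."""
    for prefix in ("input_blocks", "middle_block", "output_blocks"):
        if key.startswith(prefix):
            return prefix
    return "other"

def block_merge(model_a: dict, model_b: dict, weights: dict) -> dict:
    """Interpolate A and B per block: out = (1 - w) * A + w * B."""
    merged = {}
    for key, a_val in model_a.items():
        w = weights.get(block_of(key), 0.5)
        merged[key] = (1 - w) * a_val + w * model_b[key]
    return merged

a = {"input_blocks.0.w": 0.0, "output_blocks.0.w": 0.0}
b = {"input_blocks.0.w": 1.0, "output_blocks.0.w": 1.0}
w = {"input_blocks": 0.25, "output_blocks": 0.75}
print(block_merge(a, b, w))  # input block leans toward A, output toward B
```

The point of per-block weights is that early (input) blocks mostly shape composition while later (output) blocks shape texture, so merging them at different ratios gives finer control than a single slider.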
1.2.1. LORA
- https://rentry.org/lora_train (how to lora)
- CharFramework: explained framework for creating character loras
- charturner: several faces of one character
1.3. SIDE TOOLS
- generate Age of Empires-style sprites
- real time painting(diffusing while painting): https://github.com/houseofsecrets/SdPaint
- SVGcode: convert color bitmap images to color SVG vector graphics
- read the stable diffusion stored(the prompts) metadata from images: prompt-reader
- inpainter (watermark remover) https://github.com/advimman/lama huggingface
- DiffusionToolkit: Metadata-indexer and Viewer for generated images
- RatioScope: bucketing images for dataset
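Aspect-ratio bucketing (what RatioScope does for datasets) assigns each image to the bucket resolution whose aspect ratio is closest, so every training batch shares one shape. A toy sketch; the bucket list is illustrative:

```python
# Sketch of aspect-ratio bucketing; bucket resolutions are illustrative.
BUCKETS = [(512, 512), (512, 768), (768, 512), (640, 448)]

def nearest_bucket(width: int, height: int) -> tuple:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))

print(nearest_bucket(1000, 1500))  # portrait image -> (512, 768)
print(nearest_bucket(800, 800))    # square image -> (512, 512)
```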
1.3.1. FACE GENERATION - SWAP
- loraless
- FaceChain: two input images now suffice for any angle; sd-webui support
- FaceSwapLab: stable diffusion, webui
- an IP Adapter face model
- FaceFusion: face swapper and enhancer for video
=best=
- Roop: High-performance face swapper (Opal)
- 1.8.4.12.2
1.4. HELPERS
- DeepBump: generate normal & height maps from single pictures
- ProPainter: object removal for videos, using MetaCLIP and SAM
- VCHITECT
- supervision: ready-to-use people/object trackers (reusable computer vision tools), e.g. counting people in a zone
1.4.1. 2SKETCH, TO SKETCH
- Anime2Sketch: generate sketch from anime image
- Stylized Face Sketch Extraction via Generative Prior with Limited Data
- generates a sketch from an image, using an example sketch to set the style
1.4.2. STORYTELLING CAPTIONING
- The Manga Whisperer: Automatically Generating Transcriptions for Comics (magi) storytelling, storyboard
- generate a transcript, ocr, order panels, cluster characters
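The panel-ordering step can be crudely approximated with a sort (Magi itself learns the ordering); a toy sketch assuming axis-aligned panel boxes and right-to-left manga reading order:

```python
def order_panels(panels: list) -> list:
    """Naive manga reading order: top-to-bottom, then right-to-left.
    Each panel is an (x, y, w, h) box; real systems handle overlaps/rows properly."""
    return sorted(panels, key=lambda box: (box[1], -box[0]))

panels = [(0, 0, 100, 100), (120, 0, 100, 100), (0, 120, 220, 100)]
print(order_panels(panels))  # top-right panel first, then top-left, then bottom
```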
1.4.3. OPENPOSE EDITOR
- online 3d openpose editor: https://zhuyu1997.github.io/open-pose-editor/
- PMX model - MMD(mikumikudance): https://civitai.com/models/28916/openpose-pmx-model-mmd
- blender https://toyxyz.gumroad.com/l/ciojz
- OpenPose Man lora: https://civitai.com/models/76718
- Stable SegMap: segmentation-map editor, Unity on the web
1.4.4. TAGGER
1.4.4.1. WD TAGGER
1.4.4.2. CAPTIONING MODELS
- CogVLM and Moonshot2 both are insanely good at captioning
- Qwen-VL-Max #1, THUDM/cogagent-vqa-hf #2, liuhaotian/llava-v1.6-vicuna-13b #3.
- taggui for cog - https://github.com/jhc13/taggui/releases
- For llava 1.6 - https://github.com/DEVAIEXP/image-interrogator
- Qwen-VL-Max - https://huggingface.co/spaces/Qwen/Qwen-VL-Max
- 1.8.4.12.3
1.4.5. DETECTORS (COMPUTER VISION)
1.4.6. ANTI GLOW
- Nightshade Antidote: remove poison pill from image
1.5. UIs
1.5.1. FRONT-ENDS
- ComfyUI
- comfyui but from python-emacs: https://codeberg.org/tekakutli/apapach
- StableSwarmUI: making comfyui easily accessible
- Auto1111 Webui
- Fooocus
- Focus on prompting and generating, manual tweaking is not needed
- Refocus: Alternative ui for Fooocus
- ENFUGUE: Stable Diffusion web app
- Chibi: comfyui gui in Vue
1.5.1.1. MAKE YOUR GUI
1.5.1.2. CODE
- Diffusers (python pipelines): https://huggingface.co/docs/diffusers/index
- https://github.com/ddPn08/Radiata
- Stable diffusion webui based on diffusers
- nodejs: https://github.com/dakenf/stable-diffusion-nodejs
- fastembed: lightweight Python library for embedding generation
1.5.1.3. CPU
- ggml: inference in pure c/c++ (interoperability, no python dependency hell)
- https://github.com/leejet/stable-diffusion.cpp
- Running Stable Diffusion XL 1.0 in 298MB of RAM (Raspberry Pi Zero 2)
- OnnxStream consumes 55x less memory than OnnxRuntime while being only 0.5-2x slower
- FastSD CPU: Faster version of stable diffusion running on CPU
- FastSD CPU beta 16 release with 2 steps fast inference
1.5.1.4. FASTER
- Stable-Fast: on NVIDIA GPU
- ComfyUI-AIT: faster inference using cpp/cuda.
1.6. TRAINING
- LoRA training extension for Web-UI
- kohya training scripts: https://github.com/kohya-ss/sd-scripts
- SCEPTER: training, fine-tuning, and inference with generative models
- OneTrainer: one-stop solution for all your stable diffusion training needs
1.6.1. FINETUNING
- SimpleTuner: fine-tuning kit geared toward Stable Diffusion 2.1 and SDXL
- StableTuner: 1.5
- EveryDream2trainer
- Sensorial System’s Stable Diffusion
- automate all the steps of finetuning Stable Diffusion models.
1.7. MODELS
- Taiyi-Stable-Diffusion: finetuned in chinese
1.7.1. CONTROLNET
- 1.8.4.10.2
- face landmarks: get landmarks from face
- MasaCtrl: change pose by changing prompt of input image, optionally with controlnet
- Würstchen: more controlnets
- Freecontrol: wireframe, rag doll, lidar, face mesh
1.7.1.1. SDXL-1.0
- openpose t2i-adapter: https://huggingface.co/TencentARC/T2I-Adapter/tree/main/models_XL
- list of them all: https://six-loganberry-ba7.notion.site/23-08-23-SDXL-ControlNet-619fdd7fff954df2ae918c69e2814fe1
- TTPLanetSDXLControlnetTileRealisticV1
- adds feature details
- SMOL
- controlnet-loras instead: https://huggingface.co/stabilityai/control-lora
- seems to extract the difference between the model and ControlNet with svd
- controlnet-lllite (for now only sdxl) by kohya
- controlnet as a hypernetwork.
- comfyui node: https://github.com/kohya-ss/ControlNet-LLLite-ComfyUI
=what's the difference between them?=
- CONTROLNET CANNY
1.8. COMFY
- krita plugin
- AP Workflow: complex workflow with everything and organized, including interoperability with oobabooga
1.8.1. INSTALLATION SNIPPET
1.8.2. NEGATIVE LORAS
- feeding Stable Diffusion XL examples of bad images it itself generated, packaged as a lora, makes SDXL follow the spirit of the prompt much better
1.8.3. PROGRAMMATIC
1.8.3.1. PYTHON
- https://github.com/pydn/ComfyUI-to-Python-Extension
- python code remotely
- ComfyScript: a Python front end for ComfyUI workflows
- Comfy Runner: Automatically install ComfyUI nodes and models and use it as a backend (like diffusers)
1.8.3.2. CUSHY
- programmatic pipelines using typescript
- https://github.com/rvion/CushyStudio
1.8.4. COMFY NODES
- Math nodes
- diffdiff: Differential Diffusion
- Core ML models: leverage Apple Silicon
- bundled nodes, lower node count (like highresfix)
- LCMSampler-ComfyUI: sampler to take advantage of high-speed generation with LCM loras
- Alternative
- comfyui-tcd-scheduler: eta defaults to 0.3; use a higher eta with more inference steps
- AnyText: text generation on the image
- InstructIR: image restoration, watermark removal, fuzziness removal
- StableSR: superresolution
- ComfyUIVLMnodes: querying(llava, kosmos), captioning(joytag)
- ComfyUIFaceAnalysis: evaluate the similarity between two faces
- 3D Text, Comfyroll Studio
- dynamicprompts: combinatorial prompts, prompt enhancement
- ComfyUI-DragAnything
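The combinatorial mode in dynamicprompts expands every `{a|b}` group into all possible variants. A minimal sketch of the idea with simplified syntax (not the node's actual implementation):

```python
import itertools
import re

def expand(prompt: str) -> list:
    """Expand every {a|b|c} group into all combinations (simplified syntax)."""
    # Split on brace groups, keeping them via the capturing group.
    parts = re.split(r"(\{[^{}]*\})", prompt)
    choices = [p[1:-1].split("|") if p.startswith("{") else [p] for p in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]

print(expand("a {red|blue} cat with {long|short} fur"))  # 2 x 2 = 4 prompts
```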
1.8.4.1. AUDIO
- ComfyUI-VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
1.8.4.2. REGIONAL EDITING
- RMBG: background removal
- Inpaint Nodes: better inpaint
- Prompt-To-Prompt: change words
- OOTDiffusion: integrates OOTDiffusion (virtual outfit try-on)
- BrushNet: better inpainting
1.8.4.3. NATIVE OFFSET NOISE
- Vectorscope CC: Offset Noise natively (control over light, contrast, shadows)
1.8.4.4. UI MANAGER
- comfy-browser: An image/video/workflow browser and manager for ComfyUI
- AIGODLIKE-ComfyUI-Studio: loading models more intuitive, create model thumbnails
- ComfyUI-N-Sidebar: for fav nodes
1.8.4.5. UPSCALE
- upscale
- SuperResolution
- ComfyUI-CCSR
- SUPIR
- clarity-upscaler
- ComfyUI-APISR: anime upscaler
1.8.4.6. OPTIMIZATION
- comfy-todo: Token Downsampling for Efficient Generation of High-Resolution Images
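Token downsampling cuts attention cost by pooling neighbouring spatial tokens (in the actual ToDo method only the attention keys/values are downsampled, not the queries). A toy sketch with scalars standing in for token embedding vectors:

```python
def downsample_tokens(grid: list, factor: int = 2) -> list:
    """Average-pool a 2D token grid (scalars stand in for embedding vectors)."""
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            patch = [grid[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(patch) / len(patch))
        out.append(row)
    return out

tokens = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
print(downsample_tokens(tokens))  # 4x4 grid -> 2x2 grid of patch means
```

Halving each spatial side quarters the token count, so self-attention (quadratic in tokens) gets roughly 16x cheaper over those tokens, which is why this helps at high resolutions.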
1.8.4.7. 3D
- ComfyUI-3D-Pack: process 3D inputs (Mesh & UV Texture, etc) using cutting edge algorithms (3DGS, NeRF, etc)
- ComfyTextures: Unreal Engine ⚔️ ComfyUI - Automatic texturing using generative diffusion models
- ComfyUI-Flowty-CRM: generate meshes, one image to 3D
1.8.4.8. STYLE
- VisualStylePrompting: style from example image
=best=
- ComfyUI-PixelArt-Detector: Generate, downscale, change palettes and restore pixel art images
- TARGET STYLE-SUBJECT
- StyleAligned: consistent style to all images in a batch
- face swap
- ComfyUI Portrait Master: generates prompts for skin color, expression, shape, light direction
1.8.4.9. PLUGIN LISTS
1.8.4.10. IMAGE PROCESSING
- MASKING
- ComfyUI-BiRefNet: best
- EDITING MASK
- ComfyUI-KJNodes: RGB to mask, grow mask with blur
- IMPACT PACK
- inpainting, masking, sam, automasking face
- select face minus hair or the inverse
- https://github.com/ltdrdata/ComfyUI-Impact-Pack
- hair restyling
- PER-INSTANCE MASK
- YOLO-World + EfficientSAM for ComfyUI
- GENERATE WITH TRANSPARENCY
- Layer Diffusion custom nodes
- PREPROCESSORS
- 1.7.1
- turn image into canny, openpose, etc
- pose editor
- it also has an inpaint node
- includes both the old and the new preprocessors
- HandRefiner Support
- DepthFM: monocular depth estimation
- LIGHTING
- comfyUITJNormalLighting: Custom Node for comfyUI for virtual lighting based on normal map
- relighting, based on AnimateDiff
- Line2Normalmap
- ComfyUI-DiffusionLight: method of creating light probes
1.8.4.11. TEXT
- LLM
- ComfyUILLMNode: deployment of models like T5, GPT-2
- Tara - ComfyUI Node for LLM Integration
- ComfyUI-Gemini: Gemini in ComfyUI
- TEXT ENCODERS
- prompt control: example: a [large::0.1] [cat|dog:0.05] [<lora:somelora:0.5:0.6>::0.5]
- ComfyUI-ScenarioPrompt: prompt creation helper
- ComfyUIELLA: an LLM as text encoder instead of CLIP
- PROMPT ENHANCE
- Plush-for-ComfyUI: prompt enhancing using llm
- Comfyui-Superprompt-Unofficial: make dull prompts detailed
- AUTO1111 TOKENS ON COMFY
- A1111's token normalization and weighting in ComfyUI. This means you can reproduce the same images generated on stable-diffusion-webui in ComfyUI.
- ADVANCED TOKEN WEIGHTS
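A1111-style weighting attaches a multiplier to spans like `(word:1.2)`, and reproducing images across UIs depends on parsing and normalizing these identically. A simplified parser handling only the explicit `(text:weight)` form, not nested parens or `[...]` de-emphasis:

```python
import re

def parse_weights(prompt: str) -> list:
    """Split a prompt into (text, weight) pairs; handles only (text:1.2) spans."""
    tokens, pos = [], 0
    for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt):
        if m.start() > pos:
            tokens.append((prompt[pos:m.start()], 1.0))  # unweighted text
        tokens.append((m.group(1), float(m.group(2))))   # weighted span
        pos = m.end()
    if pos < len(prompt):
        tokens.append((prompt[pos:], 1.0))
    return tokens

print(parse_weights("a (red:1.3) cat"))
# [('a ', 1.0), ('red', 1.3), (' cat', 1.0)]
```

The weights are then applied to the corresponding CLIP embedding vectors; the normalization step (rescaling so the mean magnitude is unchanged) is what differs between A1111 and stock ComfyUI.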
1.8.4.12. IMAGE ENCODING
- clip-vision model
- SEECODERS Comfy
- ComfyUI-InstantID: no more lora per subject, just one picture is enough
- ComfyUI PhotoMaker Plus
- VectorSculptorComfyUI
- Gather similar vectors within the CLIP weights and use them to redirect the original weights
- IP ADAPTER
- IP-Adapter
- you need clip-vision model
- comfyui examples
- ip-composition-adapter: general composition of an image while ignoring the style and content
- like controlnet but less accurate
- ComfyUIIPAdapterplus
- IP-Adapter
- COMFYUI INSTANTID
- old: ComfyUI InstantID Faceswapper
- native: ComfyUIInstantID
- VISION MODEL, IMAGE TO PROMPT
- VLM nodes examples
- comfyui-moondream: tiny vision language model; image to prompt
- Comfyuiimage2prompt: image to prompt by vikhyatk/moondream1
- ComfyUIDanTagGen: LLM designed for generating Danbooru tags from provided information, trained on Danbooru datasets
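The "gather similar vectors" idea behind VectorSculptor above reduces to ranking embedding vectors by cosine similarity to a target token. A toy sketch on made-up 2-D embeddings (real CLIP embeddings are hundreds of dimensions):

```python
import math

def cosine(u: list, v: list) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def gather_similar(target: list, vocab: dict, top_k: int = 2) -> list:
    """Rank vocabulary embeddings by similarity to the target (toy data)."""
    ranked = sorted(vocab.items(), key=lambda kv: cosine(target, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

vocab = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.0, 1.0]}
print(gather_similar([1.0, 0.0], vocab))  # -> ['cat', 'dog']
```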
1.8.4.13. VIDEO
- Comfy-SVDTools
- Jovimetrix: Nodes for procedural masking, live composition and video manipulation
- DynamiCrafter: diffusion priors
- Font to Animation
- DANCING
- AnimateAnyone: dancing
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
- incorporates depth, normal maps, semantic maps from SMPL sequences, skeleton-based motion guidance
- MOTION
- MotionCtrl: Flexible Motion Controller for Video Generation
- DragNUWA: manipulate backgrounds or objects motions
- LightGlue(required)
- LiveDirector: use reference video to drive motion
- NOT JUST IMAGES
- AnimateDiff
- Steerable Motion: for steering videos with batches of images
- Stable Zero123 for ComfyUI
- fastblend: smooth out video frames
- ComfyUIcspnodes: ZeroScope nodes
1.9. ONLINE SERVICES
- main registry of loras: https://www.civitai.com/
- comfy pipelines https://comfyworkflows.com/
1.9.1. THE HORDE
- https://www.stablehorde.net
- distributed cluster, built on top of ComfyUI, you can use any lora on CivitAI
- https://dbzer0.com/blog/state-of-the-ai-horde-july-2023/
- clients: Lucid Creations, ArtBot
2. OTHERS VISUAL
- PALLAIDIUM: Generative AI for the Blender VSE (Video Sequence Editor)
- Text, video or image to video, image and audio
- blender-stable-diffusion-render: addon for using Stable Diffusion to render texture bakes for objects
2.1. SEGMENTATION
- https://github.com/ltdrdata/ComfyUI-Impact-Pack
- https://github.com/facebookresearch/segment-anything/blob/main/notebooks/predictor_example.ipynb
- hd finetune https://github.com/SysCV/sam-hq
- with text prompt https://github.com/IDEA-Research/Grounded-Segment-Anything
- merged segment-anything and grounding-dino
- grounding = get bounding box(or mask) from text prompt
- https://github.com/biegert/ComfyUI-CLIPSeg
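Grounding returns a bounding box or mask for a text prompt (as in Grounded-Segment-Anything above), and converting a mask to its box is the basic glue between the two representations. A pure-Python helper for binary masks:

```python
def mask_to_bbox(mask: list):
    """Return (x_min, y_min, x_max, y_max) covering all nonzero mask pixels,
    or None for an empty mask."""
    ys = [i for i, row in enumerate(mask) if any(row)]
    xs = [j for row in mask for j, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
print(mask_to_bbox(mask))  # (1, 1, 2, 2)
```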
2.2. 3D
2.2.1. MESH GENERATION
- threestudio: A unified framework for 3D content generation
- ProlificDreamer, DreamFusion, Magic3D, SJC, Latent-NeRF, Fantasia3D, TextMesh, Zero-1-to-3, Magic123, InstructNeRF2NeRF, and Control4D are all implemented in this framework.
- GSGEN: Text-to-3D using Gaussian Splatting
- 3DTopia: Two-stage text-to-3D generation model (5 minutes)
2.2.2. NERF
2.2.3. GAUSSIAN
- JavaScript Gaussian Splatting library
2.3. ANIMATEDIFF
3. TEXT
- aphrodite: chat bots, roleplay (by the horde)
- LLaMA-Factory: Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
3.1. INFERENCE
- https://github.com/simonw/llm (cli)
- online and local
- ollama: golang, uses llama.cpp, local models
- llama.cpp
- intel-enhanced llama.cpp
- ChatGLM.cpp (chinese, llama.cpp derived)
- https://github.com/s-kostyaev/ellama (emacs)
- LLM-Bash: Wrapper for llm & Ollama to be used by your code editor
- llama.cpp
- fast-llm git
3.1.1. UI
- oobabooga: main, supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models
- koboldcpp-rocm: various GGML models with KoboldAI’s UI with AMD ROCm offloading
- ChuanhuChatGPT: webui, gradio
3.2. CODE
- Open Interpreter, an open-source Code Interpreter
- create and edit photos, summarize pdfs, control your browser, plot and analyze large datasets
- DeepSeek Coder: several models
3.3. DATASET
- lewd roleplay dataset: https://huggingface.co/datasets/lemonilia/LimaRP
4. VOICE GENERATION
- parent: voice
4.1. REALTIME
- Realtime Voice Changer
- https://github.com/w-okada/voice-changer/tree/master
- AI covers (music): INSTANTLY make AI covers with ANY voice https://www.youtube.com/watch?v=pdlhk4vVHQk