software
Table of Contents
1. IMAGE GENERATION
- parent: stablediffusion
- upscale model database: https://openmodeldb.info/ (replaces the old model database site)
1.1. WORKFLOWS
- https://comfyworkflows.com/
- others: cookingrepertoire
- reposer plus: comfyui workflow for pose + face(ip-adapter) + clothing
- https://learn.thinkdiffusion.com/a-list-of-the-best-comfyui-workflows/
- Stable Cascade Canny ControlNet https://github.com/ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO
1.1.1. KEYWORDS
- sdxl artists-repertoire: keywords
- nai artists with examples
1.2. SD GUIDES
- entry guide (old): https://imgur.com/a/VjFi5uM
- all the links about stable diffusion categorized https://rentry.co/RentrySD
- https://gitgud.io/gayshit/makesomefuckingporn (lora index)
- https://rentry.org/sdg-link (index of everything)
- https://rentry.org/hdgrecipes (model merging index)
- block merging (unet) https://rentry.org/BlockMergeExplained
- neuralnomicon node: cookingrepertoire
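Block merging (the BlockMergeExplained link above) interpolates two checkpoints with a separate weight per UNet block instead of a single global ratio. A minimal sketch of the idea, with plain floats standing in for tensors; the block prefixes and weights here are illustrative, not the real key names:

```python
# Sketch of per-block model merging (hypothetical block names/weights).
# Real tools operate on torch state_dicts; plain floats stand in for tensors.

def block_of(key: str) -> str:
    """Map a parameter key to its UNet block prefix (simplified)."""
    for prefix in ("input_blocks", "middle_block", "output_blocks"):
        if key.startswith(prefix):
            return prefix
    return "other"

def block_merge(model_a: dict, model_b: dict, weights: dict) -> dict:
    """Interpolate A and B per block: out = (1 - w) * A + w * B."""
    merged = {}
    for key, a_val in model_a.items():
        w = weights.get(block_of(key), 0.5)
        merged[key] = (1 - w) * a_val + w * model_b[key]
    return merged

a = {"input_blocks.0.w": 0.0, "output_blocks.0.w": 0.0}
b = {"input_blocks.0.w": 1.0, "output_blocks.0.w": 1.0}
w = {"input_blocks": 0.25, "output_blocks": 0.75}
print(block_merge(a, b, w))  # input block leans toward A, output toward B
```

The point of per-block weights is that early (input) blocks mostly shape composition while later (output) blocks shape texture, so merging them at different ratios gives finer control than a single slider.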
1.2.1. LORA
- https://rentry.org/lora_train (how to lora)
- CharFramework: explained framework for creating character loras
- charturner: several faces of one character
1.3. SIDE TOOLS
- generate Age of Empires-style sprites
- real time painting(diffusing while painting): https://github.com/houseofsecrets/SdPaint
- SVGcode: convert color bitmap images to color SVG vector graphics
- read the stable diffusion stored(the prompts) metadata from images: prompt-reader
- inpainter (watermark remover) https://github.com/advimman/lama huggingface
- DiffusionToolkit: Metadata-indexer and Viewer for generated images
- RatioScope: bucketing images for dataset
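Aspect-ratio bucketing (what RatioScope does for datasets) assigns each image to the bucket resolution whose aspect ratio is closest, so every training batch shares one shape. A toy sketch; the bucket list is illustrative:

```python
# Sketch of aspect-ratio bucketing; bucket resolutions are illustrative.
BUCKETS = [(512, 512), (512, 768), (768, 512), (640, 448)]

def nearest_bucket(width: int, height: int) -> tuple:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))

print(nearest_bucket(1000, 1500))  # portrait image -> (512, 768)
print(nearest_bucket(800, 800))    # square image -> (512, 512)
```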
1.3.1. FACE GENERATION - SWAP
- loraless
- FaceChain: two input images now suffice for any angle; sd-webui support
- FaceSwapLab: stable diffusion, webui
- an IP Adapter face model
- FaceFusion: face swapper and enhancer for video
=best=
- Roop: High-performance face swapper (Opal)
- 1.8.4.12.2
1.4. HELPERS
- DeepBump: generate normal & height maps from single pictures
- ProPainter: object removal for videos, using MetaCLIP and SAM
- VCHITECT
- supervision: ready-to-use people/object trackers (reusable computer vision tools), e.g. counting people in a zone
1.4.1. 2SKETCH, TO SKETCH
- Anime2Sketch: generate sketch from anime image
- Stylized Face Sketch Extraction via Generative Prior with Limited Data
- generates a sketch from an image, using an example sketch to set the style
1.4.2. STORYTELLING CAPTIONING
- The Manga Whisperer: Automatically Generating Transcriptions for Comics (magi) storytelling, storyboard
- generate a transcript, ocr, order panels, cluster characters
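The panel-ordering step can be crudely approximated with a sort (Magi itself learns the ordering); a toy sketch assuming axis-aligned panel boxes and right-to-left manga reading order:

```python
def order_panels(panels: list) -> list:
    """Naive manga reading order: top-to-bottom, then right-to-left.
    Each panel is an (x, y, w, h) box; real systems handle overlaps/rows properly."""
    return sorted(panels, key=lambda box: (box[1], -box[0]))

panels = [(0, 0, 100, 100), (120, 0, 100, 100), (0, 120, 220, 100)]
print(order_panels(panels))  # top-right panel first, then top-left, then bottom
```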
1.4.3. OPENPOSE EDITOR
- online 3d openpose editor: https://zhuyu1997.github.io/open-pose-editor/
- PMX model - MMD(mikumikudance): https://civitai.com/models/28916/openpose-pmx-model-mmd
- blender https://toyxyz.gumroad.com/l/ciojz
- OpenPose Man lora: https://civitai.com/models/76718
- Stable SegMap: segmentation-map editor, Unity on the web
1.4.4. TAGGER
1.4.4.1. WD TAGGER
1.4.4.2. CAPTIONING MODELS
- CogVLM and Moonshot2 both are insanely good at captioning
- Qwen-VL-Max #1, THUDM/cogagent-vqa-hf #2, liuhaotian/llava-v1.6-vicuna-13b #3.
- taggui for cog - https://github.com/jhc13/taggui/releases
- For llava 1.6 - https://github.com/DEVAIEXP/image-interrogator
- Qwen-VL-Max - https://huggingface.co/spaces/Qwen/Qwen-VL-Max
- 1.8.4.12.3
1.4.5. DETECTORS (COMPUTER VISION)
1.4.6. ANTI GLOW
- Nightshade Antidote: remove poison pill from image
1.5. UIs
1.5.1. FRONT-ENDS
- ComfyUI
- comfyui but from python-emacs: https://codeberg.org/tekakutli/apapach
- StableSwarmUI: making comfyui easily accessible
- Auto1111 Webui
- Fooocus
- Focus on prompting and generating, manual tweaking is not needed
- Refocus: Alternative ui for Fooocus
- ENFUGUE: Stable Diffusion web app
- Chibi: comfyui gui in Vue
1.5.1.1. MAKE YOUR GUI
1.5.1.2. CODE
- Diffusers (python pipelines): https://huggingface.co/docs/diffusers/index
- https://github.com/ddPn08/Radiata
- Stable diffusion webui based on diffusers
- nodejs: https://github.com/dakenf/stable-diffusion-nodejs
- fastembed: lightweight Python library for embedding generation
1.5.1.3. CPU
- ggml: inference in pure c/c++ (interoperability, no python dependency hell)
- https://github.com/leejet/stable-diffusion.cpp
- Running Stable Diffusion XL 1.0 in 298MB of RAM (Raspberry Pi Zero 2)
- OnnxStream consumes 55x less memory than OnnxRuntime while being only 0.5-2x slower
- FastSD CPU: Faster version of stable diffusion running on CPU
- FastSD CPU beta 16 release with 2 steps fast inference
1.5.1.4. FASTER
- Stable-Fast: on NVIDIA GPU
- ComfyUI-AIT: faster inference using cpp/cuda.
1.6. TRAINING
- LoRA training extension for Web-UI
- kohya training scripts: https://github.com/kohya-ss/sd-scripts
- SCEPTER: training, fine-tuning, and inference with generative models
- OneTrainer: one-stop solution for all your stable diffusion training needs
1.6.1. FINETUNING
- SimpleTuner: fine-tuning kit geared toward Stable Diffusion 2.1 and SDXL
- StableTuner: 1.5
- EveryDream2trainer
- Sensorial System’s Stable Diffusion
- automate all the steps of finetuning Stable Diffusion models.
1.7. MODELS
- Taiyi-Stable-Diffusion: finetuned in chinese
1.7.1. CONTROLNET
- 1.8.4.10.2
- face landmarks: get landmarks from face
- MasaCtrl: change pose by changing prompt of input image, optionally with controlnet
- Würstchen: more controlnets
- Freecontrol: wireframe, rag doll, lidar, face mesh
1.7.1.1. SDXL-1.0
- openpose t2i-adapter: https://huggingface.co/TencentARC/T2I-Adapter/tree/main/models_XL
- list of them all: https://six-loganberry-ba7.notion.site/23-08-23-SDXL-ControlNet-619fdd7fff954df2ae918c69e2814fe1
- TTPLanetSDXLControlnetTileRealisticV1
- adds feature details
- SMOL
- controlnet-loras instead: https://huggingface.co/stabilityai/control-lora
- seems to extract the difference between the model and ControlNet with svd
- controlnet-lllite (for now only sdxl) by kohya
- controlnet as a hypernetwork.
- comfyui node: https://github.com/kohya-ss/ControlNet-LLLite-ComfyUI
=what's the difference between them?=
- CONTROLNET CANNY
1.8. COMFY
- krita plugin
- AP Workflow: complex workflow with everything and organized, including interoperability with oobabooga
1.8.1. INSTALLATION SNIPPET
1.8.2. NEGATIVE LORAS
- feeding Stable Diffusion XL examples of bad images it itself generated, packaged as a lora, makes SDXL follow the spirit of the prompt much better
1.8.3. PROGRAMMATIC
1.8.3.1. PYTHON
- https://github.com/pydn/ComfyUI-to-Python-Extension
- python code remotely
- ComfyScript: a Python front end for ComfyUI workflows
- Comfy Runner: Automatically install ComfyUI nodes and models and use it as a backend (like diffusers)
1.8.3.2. CUSHY
- programmatic pipelines using typescript
- https://github.com/rvion/CushyStudio
1.8.4. COMFY NODES
- Math nodes
- diffdiff: Differential Diffusion
- Core ML models: leverage Apple Silicon
- bundled nodes, lower node count (like highresfix)
- LCMSampler-ComfyUI: sampler to take advantage of high-speed generation with LCM loras
- Alternative
- comfyui-tcd-scheduler: eta defaults to 0.3; use a higher eta with more inference steps
- AnyText: text generation on the image
- InstructIR: image restoration, watermark removal, fuzziness removal
- StableSR: superresolution
- ComfyUIVLMnodes: querying(llava, kosmos), captioning(joytag)
- ComfyUIFaceAnalysis: evaluate the similarity between two faces
- 3D Text, Comfyroll Studio
- dynamicprompts: combinatorial prompts, prompt enhancement
- ComfyUI-DragAnything
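The combinatorial mode in dynamicprompts expands every `{a|b}` group into all possible variants. A minimal sketch of the idea with simplified syntax (not the node's actual implementation):

```python
import itertools
import re

def expand(prompt: str) -> list:
    """Expand every {a|b|c} group into all combinations (simplified syntax)."""
    # Split on brace groups, keeping them via the capturing group.
    parts = re.split(r"(\{[^{}]*\})", prompt)
    choices = [p[1:-1].split("|") if p.startswith("{") else [p] for p in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]

print(expand("a {red|blue} cat with {long|short} fur"))  # 2 x 2 = 4 prompts
```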
1.8.4.1. AUDIO
- ComfyUI-VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
1.8.4.2. REGIONAL EDITING
- RMBG: background removal
- Inpaint Nodes: better inpaint
- Prompt-To-Prompt: change words
- OOTDiffusion: integrates OOTDiffusion (virtual outfit try-on)
- BrushNet: better inpainting
1.8.4.3. NATIVE OFFSET NOISE
- Vectorscope CC: Offset Noise natively (control over light, contrast, shadows)
1.8.4.4. UI MANAGER
- comfy-browser: An image/video/workflow browser and manager for ComfyUI
- AIGODLIKE-ComfyUI-Studio: loading models more intuitive, create model thumbnails
- ComfyUI-N-Sidebar: for fav nodes
1.8.4.5. UPSCALE
- upscale
- SuperResolution
- ComfyUI-CCSR
- SUPIR
- clarity-upscaler
- ComfyUI-APISR: anime upscaler
1.8.4.6. OPTIMIZATION
- comfy-todo: Token Downsampling for Efficient Generation of High-Resolution Images
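Token downsampling cuts attention cost by pooling neighbouring spatial tokens (in the actual ToDo method only the attention keys/values are downsampled, not the queries). A toy sketch with scalars standing in for token embedding vectors:

```python
def downsample_tokens(grid: list, factor: int = 2) -> list:
    """Average-pool a 2D token grid (scalars stand in for embedding vectors)."""
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            patch = [grid[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(patch) / len(patch))
        out.append(row)
    return out

tokens = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
print(downsample_tokens(tokens))  # 4x4 grid -> 2x2 grid of patch means
```

Halving each spatial side quarters the token count, so self-attention (quadratic in tokens) gets roughly 16x cheaper over those tokens, which is why this helps at high resolutions.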
1.8.4.7. 3D
- ComfyUI-3D-Pack: process 3D inputs (Mesh & UV Texture, etc) using cutting edge algorithms (3DGS, NeRF, etc)
- ComfyTextures: Unreal Engine ⚔️ ComfyUI - Automatic texturing using generative diffusion models
- ComfyUI-Flowty-CRM: generate meshes, one image to 3D
1.8.4.8. STYLE
- VisualStylePrompting: style from example image
=best=
- ComfyUI-PixelArt-Detector: Generate, downscale, change palettes and restore pixel art images
- TARGET STYLE-SUBJECT
- StyleAligned: consistent style to all images in a batch
- face swap
- ComfyUI Portrait Master: generates prompts for skin color, expression, shape, light direction
1.8.4.9. PLUGIN LISTS
1.8.4.10. IMAGE PROCESSING
- MASKING
- ComfyUI-BiRefNet: best
- EDITING MASK
- ComfyUI-KJNodes: RGB to mask, grow mask with blur
- IMPACT PACK
- inpainting, masking, sam, automasking face
- select face minus hair or the inverse
- https://github.com/ltdrdata/ComfyUI-Impact-Pack
- hair restyling
- PER-INSTANCE MASK
- YOLO-World + EfficientSAM for ComfyUI
- GENERATE WITH TRANSPARENCY
- Layer Diffusion custom nodes
- PREPROCESSORS
- 1.7.1
- turn image into canny, openpose, etc
- pose editor
- it also has an inpaint node
- includes both the old and the new preprocessors
- HandRefiner Support
- DepthFM: monocular depth estimation
- LIGHTING
- comfyUITJNormalLighting: Custom Node for comfyUI for virtual lighting based on normal map
- relighting, based on AnimateDiff
- Line2Normalmap
- ComfyUI-DiffusionLight: method of creating light probes
1.8.4.11. TEXT
- LLM
- ComfyUILLMNode: deployment of models like T5, GPT-2
- Tara - ComfyUI Node for LLM Integration
- ComfyUI-Gemini: Gemini in ComfyUI
- TEXT ENCODERS
- prompt control: example: a [large::0.1] [cat|dog:0.05] [<lora:somelora:0.5:0.6>::0.5]
- ComfyUI-ScenarioPrompt: prompt creation helper
- ComfyUIELLA: an LLM as text encoder instead of CLIP
- PROMPT ENHANCE
- Plush-for-ComfyUI: prompt enhancing using llm
- Comfyui-Superprompt-Unofficial: make dull prompts detailed
- AUTO1111 TOKENS ON COMFY
- A1111's token normalization and weighting in ComfyUI. This means you can reproduce the same images generated on stable-diffusion-webui in ComfyUI.
- ADVANCED TOKEN WEIGHTS
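A1111-style weighting attaches a multiplier to spans like `(word:1.2)`, and reproducing images across UIs depends on parsing and normalizing these identically. A simplified parser handling only the explicit `(text:weight)` form, not nested parens or `[...]` de-emphasis:

```python
import re

def parse_weights(prompt: str) -> list:
    """Split a prompt into (text, weight) pairs; handles only (text:1.2) spans."""
    tokens, pos = [], 0
    for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt):
        if m.start() > pos:
            tokens.append((prompt[pos:m.start()], 1.0))  # unweighted text
        tokens.append((m.group(1), float(m.group(2))))   # weighted span
        pos = m.end()
    if pos < len(prompt):
        tokens.append((prompt[pos:], 1.0))
    return tokens

print(parse_weights("a (red:1.3) cat"))
# [('a ', 1.0), ('red', 1.3), (' cat', 1.0)]
```

The weights are then applied to the corresponding CLIP embedding vectors; the normalization step (rescaling so the mean magnitude is unchanged) is what differs between A1111 and stock ComfyUI.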
1.8.4.12. IMAGE ENCODING
- clip-vision model
- SEECODERS Comfy
- ComfyUI-InstantID: no more lora per subject, just one picture is enough
- ComfyUI PhotoMaker Plus
- VectorSculptorComfyUI
- Gather similar vectors within the CLIP weights and use them to redirect the original weights
- IP ADAPTER
- IP-Adapter
- you need clip-vision model
- comfyui examples
- ip-composition-adapter: general composition of an image while ignoring the style and content
- like controlnet but less accurate
- ComfyUIIPAdapterplus
- IP-Adapter
- COMFYUI INSTANTID
- old: ComfyUI InstantID Faceswapper
- native: ComfyUIInstantID
- VISION MODEL, IMAGE TO PROMPT
- VLM nodes examples
- comfyui-moondream: tiny vision language model; image to prompt
- Comfyuiimage2prompt: image to prompt by vikhyatk/moondream1
- ComfyUIDanTagGen: LLM designed for generating Danbooru tags from provided information, trained on Danbooru datasets
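The "gather similar vectors" idea behind VectorSculptor above reduces to ranking embedding vectors by cosine similarity to a target token. A toy sketch on made-up 2-D embeddings (real CLIP embeddings are hundreds of dimensions):

```python
import math

def cosine(u: list, v: list) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def gather_similar(target: list, vocab: dict, top_k: int = 2) -> list:
    """Rank vocabulary embeddings by similarity to the target (toy data)."""
    ranked = sorted(vocab.items(), key=lambda kv: cosine(target, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

vocab = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.0, 1.0]}
print(gather_similar([1.0, 0.0], vocab))  # -> ['cat', 'dog']
```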
1.8.4.13. VIDEO
- Comfy-SVDTools
- Jovimetrix: Nodes for procedural masking, live composition and video manipulation
- DynamiCrafter: diffusion priors
- Font to Animation
- DANCING
- AnimateAnyone: dancing
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
- incorporates depth, normal maps, semantic maps from SMPL sequences, skeleton-based motion guidance
- MOTION
- MotionCtrl: Flexible Motion Controller for Video Generation
- DragNUWA: manipulate backgrounds or objects motions
- LightGlue(required)
- LiveDirector: use reference video to drive motion
- NOT JUST IMAGES
- AnimateDiff
- Steerable Motion: for steering videos with batches of images
- Stable Zero123 for ComfyUI
- fastblend: smooth out video frames
- ComfyUIcspnodes: ZeroScope nodes
1.9. ONLINE SERVICES
- main registry of loras: https://www.civitai.com/
- comfy pipelines https://comfyworkflows.com/
1.9.1. THE HORDE
- https://www.stablehorde.net
- distributed cluster, built on top of ComfyUI, you can use any lora on CivitAI
- https://dbzer0.com/blog/state-of-the-ai-horde-july-2023/
- clients: Lucid Creations, ArtBot
2. OTHERS VISUAL
- PALLAIDIUM: Generative AI for the Blender VSE (Video Sequence Editor)
- Text, video or image to video, image and audio
- blender-stable-diffusion-render: addon for using Stable Diffusion to render texture bakes for objects
2.1. SEGMENTATION
- https://github.com/ltdrdata/ComfyUI-Impact-Pack
- https://github.com/facebookresearch/segment-anything/blob/main/notebooks/predictor_example.ipynb
- hd finetune https://github.com/SysCV/sam-hq
- with text prompt https://github.com/IDEA-Research/Grounded-Segment-Anything
- merged segment-anything and grounding-dino
- grounding = get bounding box(or mask) from text prompt
- https://github.com/biegert/ComfyUI-CLIPSeg
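Grounding returns a bounding box or mask for a text prompt (as in Grounded-Segment-Anything above), and converting a mask to its box is the basic glue between the two representations. A pure-Python helper for binary masks:

```python
def mask_to_bbox(mask: list):
    """Return (x_min, y_min, x_max, y_max) covering all nonzero mask pixels,
    or None for an empty mask."""
    ys = [i for i, row in enumerate(mask) if any(row)]
    xs = [j for row in mask for j, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
print(mask_to_bbox(mask))  # (1, 1, 2, 2)
```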
2.2. 3D
2.2.1. MESH GENERATION
- threestudio: A unified framework for 3D content generation
- ProlificDreamer, DreamFusion, Magic3D, SJC, Latent-NeRF, Fantasia3D, TextMesh, Zero-1-to-3, Magic123, InstructNeRF2NeRF, and Control4D are all implemented in this framework.
- GSGEN: Text-to-3D using Gaussian Splatting
- 3DTopia: Two-stage text-to-3D generation model (5 minutes)
2.2.2. NERF
2.2.3. GAUSSIAN
- JavaScript Gaussian Splatting library
2.3. ANIMATEDIFF
3. TEXT
- aphrodite: chat bots, roleplay (by the horde)
- LLaMA-Factory: Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
3.1. INFERENCE
- https://github.com/simonw/llm (cli)
- online and local
- ollama: golang, uses llama.cpp, local models
- llama.cpp
- intel-enhanced llama.cpp
- ChatGLM.cpp (chinese, llama.cpp derived)
- https://github.com/s-kostyaev/ellama (emacs)
- LLM-Bash: Wrapper for llm & Ollama to be used by your code editor
- llama.cpp
- fast-llm git
3.1.1. UI
- oobabooga: main, supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models
- koboldcpp-rocm: various GGML models with KoboldAI’s UI with AMD ROCm offloading
- ChuanhuChatGPT: webui, gradio
3.2. CODE
- Open Interpreter, an open-source Code Interpreter
- create and edit photos, summarize pdfs, control your browser, plot and analyze large datasets
- DeepSeek Coder: several models
3.3. DATASET
- lewd roleplay dataset: https://huggingface.co/datasets/lemonilia/LimaRP
4. VOICE GENERATION
- parent: voice
4.1. REALTIME
- Realtime Voice Changer
- https://github.com/w-okada/voice-changer/tree/master
- AI covers (music): INSTANTLY make AI covers with ANY voice https://www.youtube.com/watch?v=pdlhk4vVHQk