glow
Table of Contents
- ANTI REGULATION - GLOWS
- JGAAP: de-anonymizing posts via stylometry
- counter: strip all punctuation and lowercase everything
- counter: introduce deliberate grammatical and spelling errors (to mislead geographic/linguistic profiling)
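The two counters above amount to a text-normalization pass; a minimal sketch (`normalize_style` is an illustrative name, not part of JGAAP):

```python
import re
import string

def normalize_style(text: str) -> str:
    """Strip punctuation and casing, two features stylometry tools lean on."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

print(normalize_style("Hello, World!  It's me."))  # hello world its me
```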
- Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
- categorizes candidate prompts by attack style and scores how likely each is to be unsafe
- Towards Implicit Prompt For Text-To-Image Models
- implicit prompts: hint at a target without explicitly mentioning it
- censorship can be bypassed with implicit prompts
1. DETECTING HUMAN
- DensePose From WiFi: estimates human body pose from the signals of three WiFi routers
- SoundCam: A Dataset for Finding Humans Using Room Acoustics
2. SPOOFING
- Transparency Attacks: How Imperceptible Image Layers Can Fool AI Perception
- dataset poisoning: a hidden grayscale background layer causes the collection to be mislabeled
- use cases: evading facial recognition and surveillance, digital watermarking, content filtering, dataset curation, automotive and drone autonomy, forensic evidence tampering, and retail product misclassification
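A minimal sketch of the layer trick, assuming the victim pipeline simply drops the alpha channel when loading images (all names here are illustrative, not from the paper):

```python
import numpy as np

h, w = 8, 8
# hidden payload lives in the RGB channels; alpha = 0 hides it from humans
payload = np.zeros((h, w, 3))                  # a black patch (the "hidden" content)
rgba = np.dstack([payload, np.zeros((h, w))])  # fully transparent layer

def human_view(img, bg=1.0):
    """Alpha-composite over a white background, as an image viewer does."""
    a = img[..., 3:4]
    return img[..., :3] * a + bg * (1 - a)

def naive_model_view(img):
    """A pipeline that discards alpha sees the raw RGB payload instead."""
    return img[..., :3]

assert human_view(rgba).mean() == 1.0        # a human sees a white image
assert naive_model_view(rgba).mean() == 0.0  # the model sees a black image
```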
2.1. VIDEO GLOW
- PRIME: Protect Your Videos From Malicious Editing
3. DIFFUSION CENSOR
- parent: stablediffusion
- Ambient Diffusion: trains diffusion models given only corrupted images as input (the model never needs clean, potentially copyrighted images)
- Seeing the World through Your Eyes (getting image from reflection of the eyes)
- LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
- successfully undo the safety training using lora
- Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors
- MMP-Attack: confuses the model into adding a target object to the image while simultaneously removing the original object
3.1. DETECTING AI GENERATED
- AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
- does not require any training
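The core signal can be sketched with a toy autoencoder standing in for the LDM's AE: generated images reconstruct with low error, real ones with high error (illustrative only; `toy_ae` and the threshold are assumptions):

```python
import numpy as np

def reconstruction_error(image, autoencoder):
    """AEROBLADE's signal: LDM outputs reconstruct well through the LDM's own AE."""
    return np.mean(np.abs(image - autoencoder(image)))

# toy autoencoder: keeps only the mean, so it reconstructs smooth images well
toy_ae = lambda img: np.full_like(img, img.mean())

smooth = np.full((8, 8), 0.5)        # stands in for an AE-generated image
rng = np.random.default_rng(0)
textured = rng.random((8, 8))        # stands in for a real photo

threshold = 0.1
assert reconstruction_error(smooth, toy_ae) < threshold    # flagged as generated
assert reconstruction_error(textured, toy_ae) > threshold  # treated as real
```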
- Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?
3.2. ERASING CONCEPTS
- erasing concepts https://note.com/gcem156/n/n9f74d7d1417c
- Using stable diffusion eraser to replace a concept in one model with the same concept from another
- Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
- All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
- erases the target concept surgically while preserving the model's other concepts
- ORES: Open-vocabulary Responsible Visual Synthesis
- synthesizes images that avoid forbidden concepts while following the query as closely as possible
- uses an LLM
- One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
- erases or edits concepts in diffusion models (DMs) with only 0.5% extra parameters on top of the DM
- EraseDiff: Erasing Data Influence in Diffusion Models
- SepME: Separable Multi-Concept Erasure from Diffusion Models
- avoid unlearning substantial information
- MACE: Mass Concept Erasure in Diffusion Models
- successfully scaling the erasure scope up to 100 concepts and balancing generality and specificity
- Robust Concept Erasure Using Task Vectors
- concept erasure that uses Task Vectors (TV) is more robust to unexpected user inputs
- diverse inversion: used to estimate the required strength of the TV edit
- applies the TV edit to only a subset of the model weights
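On toy weights, the TV edit amounts to subtracting a scaled concept direction, optionally masked to a weight subset (a sketch; in the paper `alpha` would be estimated via diverse inversion):

```python
import numpy as np

base = np.array([0.2, -0.5, 1.0])       # pretrained weights (toy)
finetuned = np.array([0.6, -0.1, 1.0])  # weights after fine-tuning on the concept

task_vector = finetuned - base  # direction that encodes the concept
alpha = 1.0                     # edit strength (estimated via diverse inversion)
mask = np.array([1.0, 1.0, 0.0])  # apply the edit to only a subset of weights

erased = base - alpha * mask * task_vector
assert erased[2] == base[2]  # the weight outside the mask is untouched
```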
3.2.1. LLM
- TOFU: A Task of Fictitious Unlearning for LLMs
- so that the model truly behaves as if it was never trained on the forgotten data
- Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
- retrains the trimmed model through an optimization process
- seeks parameters that preserve information on the retained data while discarding information related to the forget data
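The trim step can be sketched on toy weights, scoring each connection by the SNIP-style sensitivity |w * dL/dw| computed on the forget set (illustrative; the repair/retrain step is only noted in a comment):

```python
import numpy as np

# toy: weights and gradients of the loss on the forget set
weights = np.array([0.5, -2.0, 0.1, 1.5])
grads_forget = np.array([0.1, 0.8, 0.05, -0.01])

# connection sensitivity: |w * dL/dw| on the forget data
sensitivity = np.abs(weights * grads_forget)

# trim the k most forget-sensitive connections; the paper then repairs the
# trimmed model by optimizing on the retained data
k = 1
trim_idx = np.argsort(sensitivity)[-k:]
trimmed = weights.copy()
trimmed[trim_idx] = 0.0
```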
3.3. FINGERPRINTING AND WATERMARKING
- CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields
- replacing the original color representation in NeRF with a watermarked color representation
- Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust
- patterns hidden in Fourier space
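A toy version of hiding a pattern in Fourier space, using numpy's FFT (the real scheme embeds the key in the initial diffusion noise and detects it via inversion; ring radii and key value here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))

# embed a fixed "key" value on a ring in the (shifted) Fourier spectrum
F = np.fft.fftshift(np.fft.fft2(noise))
yy, xx = np.mgrid[-32:32, -32:32]
ring = (np.hypot(yy, xx) > 4) & (np.hypot(yy, xx) < 8)  # ring-shaped mask
key = 10.0
F[ring] = key
watermarked = np.real(np.fft.ifft2(np.fft.ifftshift(F)))

# detection: go back to Fourier space and compare the ring region to the key
F2 = np.fft.fftshift(np.fft.fft2(watermarked))
score = np.abs(F2[ring] - key).mean()
assert score < 1e-6  # the ring survives the round trip
```

The ring is symmetric about the zero frequency, so writing a real key keeps the spectrum Hermitian and the watermarked image real.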
- WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
- model fingerprinting that assigns responsibility for the generated images
- FLIRT: Feedback Loop In-context Red Teaming
- automatic framework that exposes unsafe or inappropriate content generation and model vulnerabilities
- ZoDiac: Robust Image Watermarking using Stable Diffusion
- injects a watermark into the trainable latent space, where it can be reliably detected in the latent vector
- RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees
3.4. ANTI-GLOW
- MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
- purification technique that approximates the clean image as the diffusion model's input
- DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
- detects poisoned input noise; 100% detection rate for trojan triggers
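DisDet's actual detector is more involved, but the underlying observation, that a trojan trigger perturbs the input noise away from a standard Gaussian, can be sketched as a simple moment check (function name and tolerance are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.standard_normal(4096)  # diffusion input noise should be N(0, 1)
trigger = clean + 0.5              # a trojan trigger shifts the distribution

def looks_poisoned(noise, tol=0.2):
    """Flag input noise whose first two moments deviate from N(0, 1)."""
    return abs(noise.mean()) > tol or abs(noise.std() - 1.0) > tol

assert not looks_poisoned(clean)
assert looks_poisoned(trigger)
```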