glow
Table of Contents
- ANTI REGULATION - GLOWS
- JGAAP: de-anonymizing posts via stylometry
- counter: strip all punctuation and lowercase everything
- counter: introduce deliberate grammatical and spelling errors (to mislead geographic/linguistic profiling)
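The two counters above amount to a text-normalization pass; a minimal sketch (`normalize_style` is an illustrative name, not part of JGAAP):

```python
import re
import string

def normalize_style(text: str) -> str:
    """Strip punctuation and casing, two features stylometry tools lean on."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

print(normalize_style("Hello, World!  It's me."))  # hello world its me
```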
- Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
- categorizes candidate prompts by attack style and scores how likely each is to be unsafe
- Towards Implicit Prompt For Text-To-Image Models
- implicit prompts: hint at a target without explicitly mentioning it
- censorship can be bypassed with implicit prompts
1. DETECTING HUMAN
- DensePose From WiFi: estimates human body pose from the signals of three WiFi routers
- SoundCam: A Dataset for Finding Humans Using Room Acoustics
2. SPOOFING
- Transparency Attacks: How Imperceptible Image Layers Can Fool AI Perception
- dataset poisoning: a hidden grayscale background layer causes the collection to be mislabeled
- use cases: evading facial recognition and surveillance, digital watermarking, content filtering, dataset curation, automotive and drone autonomy, forensic evidence tampering, and retail product misclassification
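A minimal sketch of the layer trick, assuming the victim pipeline simply drops the alpha channel when loading images (all names here are illustrative, not from the paper):

```python
import numpy as np

h, w = 8, 8
# hidden payload lives in the RGB channels; alpha = 0 hides it from humans
payload = np.zeros((h, w, 3))                  # a black patch (the "hidden" content)
rgba = np.dstack([payload, np.zeros((h, w))])  # fully transparent layer

def human_view(img, bg=1.0):
    """Alpha-composite over a white background, as an image viewer does."""
    a = img[..., 3:4]
    return img[..., :3] * a + bg * (1 - a)

def naive_model_view(img):
    """A pipeline that discards alpha sees the raw RGB payload instead."""
    return img[..., :3]

assert human_view(rgba).mean() == 1.0        # a human sees a white image
assert naive_model_view(rgba).mean() == 0.0  # the model sees a black image
```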
2.1. VIDEO GLOW
- PRIME: Protect Your Videos From Malicious Editing
3. DIFFUSION CENSOR
- parent: stablediffusion
- Ambient Diffusion: trains diffusion models given only corrupted images as input (the model never needs clean, potentially copyrighted images)
- Seeing the World through Your Eyes (getting image from reflection of the eyes)
- LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
- successfully undo the safety training using lora
- Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with Multi-Modal Priors
- MMP-Attack: confuses the model into adding a target object to the image while simultaneously removing the original object
3.1. DETECTING AI GENERATED
- AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
- does not require any training
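The core signal can be sketched with a toy autoencoder standing in for the LDM's AE: generated images reconstruct with low error, real ones with high error (illustrative only; `toy_ae` and the threshold are assumptions):

```python
import numpy as np

def reconstruction_error(image, autoencoder):
    """AEROBLADE's signal: LDM outputs reconstruct well through the LDM's own AE."""
    return np.mean(np.abs(image - autoencoder(image)))

# toy autoencoder: keeps only the mean, so it reconstructs smooth images well
toy_ae = lambda img: np.full_like(img, img.mean())

smooth = np.full((8, 8), 0.5)        # stands in for an AE-generated image
rng = np.random.default_rng(0)
textured = rng.random((8, 8))        # stands in for a real photo

threshold = 0.1
assert reconstruction_error(smooth, toy_ae) < threshold    # flagged as generated
assert reconstruction_error(textured, toy_ae) > threshold  # treated as real
```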
- Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?
3.2. ERASING CONCEPTS
- erasing concepts https://note.com/gcem156/n/n9f74d7d1417c
- Using stable diffusion eraser to replace a concept in one model with the same concept from another
- Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
- All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
- erases the target concept surgically while preserving the model's other concepts
- ORES: Open-vocabulary Responsible Visual Synthesis
- synthesizes images that avoid forbidden concepts while following the query as closely as possible
- uses an LLM
- One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
- erases or edits concepts in diffusion models (DMs) with only 0.5% extra parameters on top of the DM
- EraseDiff: Erasing Data Influence in Diffusion Models
- SepME: Separable Multi-Concept Erasure from Diffusion Models
- avoid unlearning substantial information
- MACE: Mass Concept Erasure in Diffusion Models
- successfully scaling the erasure scope up to 100 concepts and balancing generality and specificity
- Robust Concept Erasure Using Task Vectors
- concept erasure that uses Task Vectors (TV) is more robust to unexpected user inputs
- diverse inversion: used to estimate the required strength of the TV edit
- applies the TV edit to only a subset of the model weights
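On toy weights, the TV edit amounts to subtracting a scaled concept direction, optionally masked to a weight subset (a sketch; in the paper `alpha` would be estimated via diverse inversion):

```python
import numpy as np

base = np.array([0.2, -0.5, 1.0])       # pretrained weights (toy)
finetuned = np.array([0.6, -0.1, 1.0])  # weights after fine-tuning on the concept

task_vector = finetuned - base  # direction that encodes the concept
alpha = 1.0                     # edit strength (estimated via diverse inversion)
mask = np.array([1.0, 1.0, 0.0])  # apply the edit to only a subset of weights

erased = base - alpha * mask * task_vector
assert erased[2] == base[2]  # the weight outside the mask is untouched
```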
3.2.1. LLM
- TOFU: A Task of Fictitious Unlearning for LLMs
- so that the model truly behaves as if it was never trained on the forgotten data
- Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
- retrains the trimmed model through an optimization process
- seeks parameters that preserve information on the retained data while discarding information related to the forget data
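The trim step can be sketched on toy weights, scoring each connection by the SNIP-style sensitivity |w * dL/dw| computed on the forget set (illustrative; the repair/retrain step is only noted in a comment):

```python
import numpy as np

# toy: weights and gradients of the loss on the forget set
weights = np.array([0.5, -2.0, 0.1, 1.5])
grads_forget = np.array([0.1, 0.8, 0.05, -0.01])

# connection sensitivity: |w * dL/dw| on the forget data
sensitivity = np.abs(weights * grads_forget)

# trim the k most forget-sensitive connections; the paper then repairs the
# trimmed model by optimizing on the retained data
k = 1
trim_idx = np.argsort(sensitivity)[-k:]
trimmed = weights.copy()
trimmed[trim_idx] = 0.0
```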
3.3. FINGERPRINTING AND WATERMARKING
- CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields
- replacing the original color representation in NeRF with a watermarked color representation
- Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust
- patterns hidden in Fourier space
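A toy version of hiding a pattern in Fourier space, using numpy's FFT (the real scheme embeds the key in the initial diffusion noise and detects it via inversion; ring radii and key value here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))

# embed a fixed "key" value on a ring in the (shifted) Fourier spectrum
F = np.fft.fftshift(np.fft.fft2(noise))
yy, xx = np.mgrid[-32:32, -32:32]
ring = (np.hypot(yy, xx) > 4) & (np.hypot(yy, xx) < 8)  # ring-shaped mask
key = 10.0
F[ring] = key
watermarked = np.real(np.fft.ifft2(np.fft.ifftshift(F)))

# detection: go back to Fourier space and compare the ring region to the key
F2 = np.fft.fftshift(np.fft.fft2(watermarked))
score = np.abs(F2[ring] - key).mean()
assert score < 1e-6  # the ring survives the round trip
```

The ring is symmetric about the zero frequency, so writing a real key keeps the spectrum Hermitian and the watermarked image real.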
- WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
- model fingerprinting that assigns responsibility for the generated images
- FLIRT: Feedback Loop In-context Red Teaming
- automatic framework that exposes unsafe or inappropriate content generation and model vulnerabilities
- ZoDiac: Robust Image Watermarking using Stable Diffusion
- injects a watermark into the trainable latent space, where it can be reliably detected in the latent vector
- RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees
3.4. ANTI-GLOW
- MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
- purification technique that approximates the clean image as the diffusion model's input
- DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
- detects poisoned input noise; 100% detection rate for trojan triggers
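DisDet's actual detector is more involved, but the underlying observation, that a trojan trigger perturbs the input noise away from a standard Gaussian, can be sketched as a simple moment check (function name and tolerance are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.standard_normal(4096)  # diffusion input noise should be N(0, 1)
trigger = clean + 0.5              # a trojan trigger shifts the distribution

def looks_poisoned(noise, tol=0.2):
    """Flag input noise whose first two moments deviate from N(0, 1)."""
    return abs(noise.mean()) > tol or abs(noise.std() - 1.0) > tol

assert not looks_poisoned(clean)
assert looks_poisoned(trigger)
```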