train

1. RESEARCH

  • Deep neural networks are robust to weight binarization and other non-linear distortions
    • 0.68 effective bits per weight (below 1 bit models)
      • points to the idea that a stochastic memory element can be used

2. SOFTWARE WISE

3. WITH REWARD

  • feedback: human feedback as the training target; instruction following
  • AlignProp: Aligning Text-to-Image Diffusion Models with Reward Backpropagation
    • aligns to reward functions
  • CPL: Contrastive Preference Learning: Learning from Human Feedback without RL
    • learning optimal policies from preferences without learning reward functions
    • regret-based model of human preferences instead of reward
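
  A minimal sketch of a CPL-style preference loss, assuming segments with per-step policy log-probabilities; the discount, temperature, and toy data are illustrative assumptions, not the paper's implementation.

    import torch

    def segment_score(logp, gamma=0.99, alpha=0.1):
        """Temperature-scaled discounted sum of log pi(a_t | s_t) over one segment."""
        discounts = gamma ** torch.arange(logp.shape[0], dtype=logp.dtype)
        return alpha * (discounts * logp).sum()

    def cpl_loss(logp_preferred, logp_rejected):
        """Bradley-Terry style contrastive loss over a preferred/rejected segment pair:
        raise the policy's (discounted) log-likelihood of the preferred segment."""
        s_pos = segment_score(logp_preferred)
        s_neg = segment_score(logp_rejected)
        return -torch.nn.functional.logsigmoid(s_pos - s_neg)

    # toy usage: log-probs the current policy assigns to two human-compared segments
    logp_pos = torch.log(torch.rand(10))
    logp_neg = torch.log(torch.rand(10))
    print(cpl_loss(logp_pos, logp_neg))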

3.1. CLIP AS REWARD

  • Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
    • hand-designing a reward function is often infeasible; learning a reward model from human feedback is often very expensive
    • VLMs (e.g. CLIP) as reward models: a single-sentence text prompt describes the desired task
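
  A minimal sketch of the VLM-as-reward idea, assuming the open_clip package; the model/checkpoint names, the cosine-similarity-as-reward choice, and env.render_to_pil are illustrative assumptions.

    import torch
    import open_clip
    from PIL import Image

    # any CLIP checkpoint works; these names are just an example
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    model.eval()

    @torch.no_grad()
    def clip_reward(frame: Image.Image, task_prompt: str) -> float:
        """Reward = cosine similarity between the rendered observation and a
        single-sentence description of the desired task."""
        img = preprocess(frame).unsqueeze(0)
        txt = tokenizer([task_prompt])
        img_feat = model.encode_image(img)
        txt_feat = model.encode_text(txt)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        return (img_feat @ txt_feat.T).item()

    # inside an RL loop (hypothetical environment with a PIL render method):
    # r_t = clip_reward(env.render_to_pil(), "a humanoid robot standing upright")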

3.2. REINFORCEMENT LEARNING

  • TD-MPC2: Scalable, Robust World Models for Continuous Control
    • a single agent performs 80 tasks across multiple task domains, embodiments, and action spaces
    • performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model
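
  A rough sketch of decoder-free latent planning in this spirit: random-shooting trajectory optimization through a learned latent dynamics and reward model. The tiny networks, horizon, and sample count are placeholders, not the paper's architecture (TD-MPC2 also uses a value function and iterative MPPI-style refinement).

    import torch
    import torch.nn as nn

    latent_dim, action_dim, obs_dim = 32, 4, 17
    horizon, n_samples = 5, 256

    # learned components of an implicit (decoder-free) world model
    encode   = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, latent_dim))
    dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 64), nn.ELU(),
                             nn.Linear(64, latent_dim))
    reward   = nn.Sequential(nn.Linear(latent_dim + action_dim, 64), nn.ELU(),
                             nn.Linear(64, 1))

    @torch.no_grad()
    def plan(obs):
        """Local trajectory optimization in latent space: sample action sequences,
        roll them out through the latent dynamics, execute the best first action."""
        z = encode(obs).repeat(n_samples, 1)
        actions = torch.randn(n_samples, horizon, action_dim)
        returns = torch.zeros(n_samples)
        for t in range(horizon):
            za = torch.cat([z, actions[:, t]], dim=-1)
            returns += reward(za).squeeze(-1)
            z = dynamics(za)
        return actions[returns.argmax(), 0]

    print(plan(torch.randn(1, obs_dim)))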

3.3. LLM AS REWARD

  • Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning
    • automates the generation of dense reward functions using an LLM
  • Eureka: Human-Level Reward Design via Coding Large Language Models
    • generates reward functions that outperform expert human-engineered rewards
      • enables acquiring complex skills via reinforcement learning by optimizing over the generated reward
        • useful for sequential decision-making tasks
      • in-context RLHF to incorporate human feedback and steer/align the reward function
    • outer loop: an inference-only LLM instructs a learnable neural network to refine the reward function
      • inner loop: reinforcement learning trains a controller
    • pen spinning
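
  A rough sketch of the Eureka-style loop described above: the outer loop asks an LLM for executable reward code and feeds training statistics back in-context; the inner loop trains a policy with RL against each candidate. query_llm and run_rl_training are hypothetical placeholders, not the paper's API.

    def query_llm(prompt: str) -> str:
        """Placeholder for an inference-only LLM call (e.g. an API request)."""
        raise NotImplementedError

    def run_rl_training(reward_code: str) -> dict:
        """Placeholder: compile the reward code, train a controller with RL, return stats."""
        raise NotImplementedError

    def eureka_loop(task_description, env_source_code, iterations=5, candidates=4):
        feedback = ""
        best = {"code": None, "score": float("-inf")}
        for _ in range(iterations):
            prompt = (f"Task: {task_description}\n"
                      f"Environment source:\n{env_source_code}\n"
                      f"Feedback from previous round:\n{feedback}\n"
                      "Write a Python reward function compute_reward(state, action).")
            for _ in range(candidates):
                reward_code = query_llm(prompt)        # outer loop: LLM proposes reward code
                stats = run_rl_training(reward_code)   # inner loop: RL trains a controller
                if stats["task_score"] > best["score"]:
                    best = {"code": reward_code, "score": stats["task_score"]}
            # in-context feedback: reflect training results back into the next prompt
            feedback = f"best task score so far: {best['score']}"
        return best["code"]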

4. STRUCTURE

  • LoRA: low-rank adaptation for parameter-efficient finetuning (see the sketch after this list)
  • ConvNeXt (vs ViT, for image classification)
  • CNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
    • replacing linear recurrence with a special temporal convolutional network
      • permits a larger receptive field with shallower networks
      • reduces the computational complexity to O(L)
  • PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
    • a shortcut is used to enhance the model's nonlinearity; ~10% inference speed-up
    • such nonlinearity is common in convolutional networks for vision tasks
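
  A minimal LoRA sketch (referenced from the list above): the frozen pretrained weight gets a trainable low-rank update (alpha/r) * B @ A; rank, scaling, and layer sizes here are illustrative.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen pretrained linear layer plus a trainable low-rank correction."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                     # freeze pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(512, 512))
    print(layer(torch.randn(2, 512)).shape)                 # only A and B receive gradients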

4.1. HYPERPARAMETER

  • muP proposes the “right way to scale”: an effective weight-initialization scheme under which optimal hyperparameters found on small models transfer to larger ones
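
  A toy sketch of the idea, not the actual muP rules: rescale initialization and per-layer learning rates with width so a learning rate tuned at a small width can be reused at a larger one; the exact scalings below are simplified assumptions, and the authors' mup package implements the real parametrization.

    import torch
    import torch.nn as nn

    def width_scaled_mlp(width, in_dim=32, out_dim=10, base_width=64, base_lr=1e-3):
        """Widen an MLP while rescaling init std and learning rates with width."""
        mult = width / base_width
        hidden = nn.Linear(in_dim, width)
        out = nn.Linear(width, out_dim, bias=False)
        nn.init.normal_(hidden.weight, std=in_dim ** -0.5)  # fan_in-based init
        nn.init.normal_(out.weight, std=1.0 / width)         # output layer scaled down with width
        model = nn.Sequential(hidden, nn.ReLU(), out)
        opt = torch.optim.Adam([
            {"params": hidden.parameters(), "lr": base_lr},
            {"params": out.parameters(), "lr": base_lr / mult},   # width-dependent lr
        ])
        return model, opt

    # tune base_lr once at width=64, then reuse the same value at width=1024
    small, _ = width_scaled_mlp(64)
    large, _ = width_scaled_mlp(1024)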

5. CLASSIFIER

5.1. GZIP VS GPT

  • are LLMs just text compression algorithms?
    • LLMZip: Lossless Text Compression using Large Language Models
    • gzip instead of parameters for classification
      • “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
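
  A minimal sketch of the compressor-based classifier from the “Low-Resource” paper: gzip's compressed lengths give a normalized compression distance (NCD), and a k-nearest-neighbor vote replaces trained parameters entirely. The toy corpus is made up.

    import gzip
    import numpy as np

    def ncd(x: str, y: str) -> float:
        """Normalized compression distance with gzip standing in for a learned model."""
        cx = len(gzip.compress(x.encode()))
        cy = len(gzip.compress(y.encode()))
        cxy = len(gzip.compress((x + " " + y).encode()))
        return (cxy - min(cx, cy)) / max(cx, cy)

    def classify(test_text, train_texts, train_labels, k=3):
        """Parameter-free classification: k-NN vote over compression distances."""
        distances = [ncd(test_text, t) for t in train_texts]
        top_k = np.argsort(distances)[:k]
        votes = [train_labels[i] for i in top_k]
        return max(set(votes), key=votes.count)

    train = ["the match ended with a late goal", "the central bank raised rates",
             "the striker scored twice", "inflation slowed last quarter"]
    labels = ["sports", "finance", "sports", "finance"]
    print(classify("a goal in stoppage time won the game", train, labels))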

6. SMALLER

6.1. COMPRESSION

  • Knowledge Translation: A New Pathway for Model Compression
    • teacher-student model that receives parameters and generates compressed ones

6.2. QUANTIZATION

  • DIFFUSION QUANTIZATION
  • AdaLoRA: adaptively allocates the parameter budget among weight matrices according to their importance (adaptive LoRA)
  • FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
    • mixed-precision quantization, eliminates the need for retraining
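
  For context on what quantization does to a weight tensor, a minimal symmetric int8 round-trip; FLIQS additionally searches a mixed precision (a float/int format per layer), which is not shown here.

    import torch

    def quantize_int8(w: torch.Tensor):
        """Symmetric per-tensor int8 quantization: int8 values plus one fp scale."""
        scale = w.abs().max() / 127.0
        q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.float() * scale

    w = torch.randn(256, 256)
    q, s = quantize_int8(w)
    print((w - dequantize(q, s)).abs().max())   # reconstruction error stays small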

7. OPTIMIZER

  • Lion: an optimizer reported to outperform Adam (see the sketch after this list)
  • Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
    • approximates the Kronecker-factored gradient covariance with the Frequent Directions sketch
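
  A minimal sketch of the Lion update referenced above: the step direction is the sign of an interpolation between the momentum and the current gradient, with decoupled weight decay; the hyperparameter values here are illustrative defaults and should be checked against the paper.

    import torch

    @torch.no_grad()
    def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
        """One Lion update on a single tensor; momentum is an EMA of past gradients."""
        update = torch.sign(beta1 * momentum + (1 - beta1) * grad)
        param -= lr * (update + wd * param)          # sign step + decoupled weight decay
        momentum.mul_(beta2).add_(grad, alpha=1 - beta2)

    p, m = torch.randn(10), torch.zeros(10)
    lion_step(p, torch.randn(10), m)
    print(p)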

8. CHEAPNESS

  • One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
  • Optimized Network Architectures for Large Language Model Training with Billions of Parameters
    • only small subgroups of GPUs require high-bandwidth any-to-any communication within them

9. DATASET

  • CAPTIONING
  • dimensionality reduction algorithms
    • t-SNE and UMAP had long been the favorites
    • “Deep TDA” combines self-supervised learning and Topological Data Analysis (TDA)
      • unlock new insights from complex datasets
      • more robust to noise and outliers in the data
  • Gen2Det: Generate to Detect
    • directly generating scene-centric images (synthetic)
    • improves the performance on rare categories
  • Image classification network enhancement methods based on knowledge injection
    • a knowledge-injection dataset improves the interpretability and classification performance of hidden layers
  • MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
    • generates a script and the corresponding video to use as a dataset

9.1. MISTAKES

  • In-Context Principle Learning from Mistakes
    • induce the model to make mistakes, then reflect on them and learn explicit task-specific “principles” that help avoid similar mistakes
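
  A rough sketch of the procedure as described above: collect the model's wrong answers on a few examples, ask it to articulate task-specific principles from those mistakes, then prepend the principles in-context at test time. query_llm is a hypothetical placeholder for whatever LLM API is in use.

    def query_llm(prompt: str) -> str:
        """Placeholder for an LLM call."""
        raise NotImplementedError

    def learn_principles(examples):
        """examples: list of (question, gold_answer) pairs used to induce mistakes."""
        mistakes = []
        for question, gold in examples:
            answer = query_llm(f"Answer concisely:\n{question}")
            if answer.strip() != gold.strip():
                mistakes.append((question, answer, gold))
        reflection = "For each mistake below, state a general task-specific principle:\n"
        reflection += "\n".join(f"Q: {q}\nWrong: {a}\nCorrect: {g}" for q, a, g in mistakes)
        return query_llm(reflection)

    def answer_with_principles(question, principles):
        """At inference, the learned principles are simply prepended in-context."""
        return query_llm(f"Principles:\n{principles}\n\nAnswer concisely:\n{question}")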

9.2. ACTUAL DATASET

  • MatSynth: Physically Based Rendering (PBR) materials dataset (4,000 ultra-high-resolution materials)
  • FindingEmo: An Image Dataset for Emotion Recognition in the Wild
    • annotated dimensions include: valence, arousal and emotion
  • English public domain books

9.2.1. HANDS DATASET

  • Annotated Hands for Generative Models
    • adds three channels that annotate the hands in the image, providing additional structure

9.3. ENHANCEMENT

  • AUDIO VISION
  • Learning to Identify Critical States for Reinforcement Learning from Videos
    • mask-based sensitivity analysis to identify important/critical states
    • recognizes relevant states, actions, and rewards from untagged videos
  • Let’s Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
    • uses an LLM to extrapolate the errors made by a small model trained on the synthesized dataset
  • GeNIe: Generative Hard Negative Images Through Diffusion (synthetic enhanced dataset)
    • generate challenging samples for the target category
  • DistDiff: Distribution-Aware Data Expansion with Diffusion Models
    • dataset expansion framework based on the distribution-aware diffusion model
    • hierarchical prototypes to approximate the real data distribution

9.4. SIMULATION

  • madrona-engine: ECS-based game engine that runs 10,000s of environments in parallel on a single GPU
  • V-IRL: Grounding Virtual Intelligence in Real Life
    • test foundation models in virtual real world cities, geospatial data and street view imagery

10. FINETUNING

  • Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
    • surrogate network to finetune a pretrained model with substantially reduced memory consumption
    • comparable performance to conventional finetuning but with significantly less memory usage
  • Data-Free Generalized Zero-Shot Learning (using only its CLIP features)
  • Gradient Correlation Subspace Learning against Catastrophic Forgetting
    • detects a subspace of the weights least affected by previous tasks and trains the new task within that subspace (see the sketch after this list)
  • Evolutionary Optimization of Model Merging Recipes
    • facilitates cross-domain merging and automated model composition
  • The Unreasonable Ineffectiveness of the Deeper Layers
    • identifies the optimal block of layers to prune by considering similarity across layers
      • then, to “heal” the damage, we perform a small amount of finetuning
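
  A rough sketch of the subspace idea referenced above (based on the note's description, not the paper's exact algorithm): estimate the directions along which old-task gradients vary most, then keep new-task updates inside the orthogonal complement so they minimally disturb previous tasks. Shapes and the number of retained directions are illustrative.

    import torch

    def low_interference_basis(old_task_grads: torch.Tensor, k: int) -> torch.Tensor:
        """old_task_grads: [N, D] flattened gradients of one weight matrix on old tasks.
        Returns k directions least used by the old tasks (smallest singular values)."""
        _, _, Vt = torch.linalg.svd(old_task_grads, full_matrices=True)
        return Vt[-k:]                          # [k, D]

    def project_update(grad: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
        """Keep only the component of a new-task gradient inside that subspace."""
        return basis.T @ (basis @ grad)

    # toy usage
    D, N, k = 64, 200, 16
    old_grads = torch.randn(N, D)
    basis = low_interference_basis(old_grads, k)
    safe_grad = project_update(torch.randn(D), basis)
    print(safe_grad.shape)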

11. GAN ALTERNATIVE

Author: Tekakutli

Created: 2024-04-13 Sat 04:35