LLM Safety

Alignment (Training Phase)
Goal: Train model to follow human values
Techniques:
RLHF
DPO
SFT
Constitutional AI

Robustness (Attack Resistance)
Goal: Maintain safety under adversarial input
Techniques:
Red Teaming
Jailbreak Testing
Adversarial Attacks
Prompt Injection

Monitoring (Deployment Phase)
Goal: Catch failures in production
Techniques:
Guardrails
Logging & Auditing
Content Filtering
Rate Limiting
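
The monitoring techniques named in this diagram (guardrails, logging and auditing, content filtering, rate limiting) can be combined in one wrapper around a model call. The following is only a minimal sketch under stated assumptions, not a production design: `BLOCKED_TERMS`, `RateLimiter`, `is_flagged`, and `guarded_call` are hypothetical names invented for illustration, the keyword blocklist stands in for whatever filter a real deployment would use, and `model` is assumed to be any callable that maps a prompt string to a reply string.

```python
import logging
import time
from collections import defaultdict, deque

logger = logging.getLogger("llm_monitor")

# Hypothetical blocklist (assumption); a real filter would likely be a classifier.
BLOCKED_TERMS = {"build a bomb", "credit card dump"}


class RateLimiter:
    """Sliding-window limiter: at most max_calls per window_s seconds per user."""

    def __init__(self, max_calls, window_s):
        self.max_calls = max_calls
        self.window_s = window_s
        self._calls = defaultdict(deque)  # user_id -> timestamps of allowed calls

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        calls = self._calls[user_id]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


def is_flagged(text):
    """Content filter: case-insensitive substring match against the blocklist."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_call(model, user_id, prompt, limiter, now=None):
    """Guardrail: rate-limit, filter input, call the model, filter output, log."""
    if not limiter.allow(user_id, now):
        logger.warning("rate_limited user=%s", user_id)
        return "[rate limited]"
    if is_flagged(prompt):
        logger.warning("blocked_input user=%s", user_id)
        return "[blocked]"
    reply = model(prompt)
    if is_flagged(reply):
        logger.warning("blocked_output user=%s", user_id)
        return "[blocked]"
    # Audit log records metadata only, not the prompt text itself.
    logger.info("ok user=%s prompt_chars=%d", user_id, len(prompt))
    return reply
```

In this sketch the rate limit is checked first, so a blocked prompt still counts against the user's quota; whether that is desirable is a policy choice. The optional `now` argument exists only so the time-dependent behavior can be exercised deterministically.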