Excalidraw Data

Text Elements

LLM Safety

Alignment (Training Phase)

Robustness (Attack Resistance)

Monitoring (Deployment Phase)

Techniques:

RLHF

DPO

SFT

Constitutional AI

Techniques:

Red Teaming

Jailbreak Testing

Adversarial Attacks

Prompt Injection

Techniques:

Guardrails

Logging & Auditing

Content Filtering

Rate Limiting

Goal: Train model to follow human values

Goal: Maintain safety under adversarial input

Goal: Catch failures in production