Excalidraw Data
Text Elements
Layered Defense Architecture
Multi-Layer LLM Safety Pipeline
User Input
INPUT FILTER
• Blocklist Matching
• Classifier Detection
• Embedding Similarity
Catches obvious attacks
Low latency (~10ms)
Brittle to novel attacks
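
A minimal sketch of how the input filter's first two checks might be chained, cheapest first. The name `make_input_filter`, the `classifier` callback, and the 0.5 threshold are illustrative assumptions, not part of the diagram; the default classifier is a stand-in that always returns 0.0.

```python
import re
from typing import Callable, Iterable


def make_input_filter(
    blocklist: Iterable[str],
    classifier: Callable[[str], float] = lambda text: 0.0,  # hypothetical stand-in
    threshold: float = 0.5,  # assumed cut-off, tune per deployment
) -> Callable[[str], bool]:
    """Build a function that returns True when an input should be blocked."""
    # Escape each phrase so it is matched literally, ignoring case.
    patterns = [re.compile(re.escape(p), re.IGNORECASE) for p in blocklist]

    def should_block(text: str) -> bool:
        # Stage 1: cheap blocklist match.
        if any(p.search(text) for p in patterns):
            return True
        # Stage 2: classifier score, only reached if the blocklist misses.
        return classifier(text) >= threshold

    return should_block


should_block = make_input_filter(["ignore previous instructions"])
print(should_block("Please IGNORE previous instructions"))  # True
print(should_block("Ign0re previous instructions"))         # False
```

The second call shows the brittleness noted above: a one-character substitution slips past the blocklist, so anything the classifier also misses reaches the main LLM.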
MAIN LLM
Aligned via:
• RLHF (Reward Model + PPO)
• DPO (Direct Preference Optimization)
• SFT on safety data
• Constitutional AI
Primary defense layer
Most compute-intensive
Can be jailbroken with sophisticated prompts
OUTPUT FILTER
• Toxicity Classification
• PII Detection & Masking
• Format Validation
Last line of defense
Catches LLM failures
Low latency (~20ms)
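
A rough sketch of the PII masking step, assuming regex-based detection of email addresses and US-style phone numbers only. The patterns and the `mask_pii` name are illustrative; a production filter would likely cover more PII types and use a dedicated detector.

```python
import re

# Deliberately simple patterns; they are assumptions, not an exhaustive PII spec.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace detected emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(mask_pii("Reach me at jane.doe@example.com or 555-123-4567."))
# Reach me at [EMAIL] or [PHONE].
```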
User Output
DEFENSE LAYERS
Each layer catches what the previous layer missed.
No single layer is sufficient alone!
Trade-off: More layers = ↑ Safety, ↑ Latency
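
The three layers and the latency trade-off can be sketched as a small pipeline that times each stage. Everything here is a toy under stated assumptions: `run_pipeline`, the stub filters, and the stub LLM are hypothetical placeholders, not the components in the diagram.

```python
import time
from typing import Callable, List, Tuple

REFUSAL = "Sorry, I can't help with that."


def run_pipeline(
    prompt: str,
    input_ok: Callable[[str], bool],
    llm: Callable[[str], str],
    output_ok: Callable[[str], bool],
) -> Tuple[str, List[Tuple[str, float]]]:
    """Run input filter -> LLM -> output filter, recording seconds per layer."""
    timings: List[Tuple[str, float]] = []

    def timed(name, fn, arg):
        start = time.perf_counter()
        result = fn(arg)
        timings.append((name, time.perf_counter() - start))
        return result

    if not timed("input_filter", input_ok, prompt):
        return REFUSAL, timings  # blocked before the LLM is ever called
    response = timed("main_llm", llm, prompt)
    if not timed("output_filter", output_ok, response):
        return REFUSAL, timings  # LLM failure caught by the last layer
    return response, timings


# Stub layers (assumptions for illustration only).
input_ok = lambda p: "attack" not in p.lower()
llm = lambda p: "SECRET" if "trick" in p else f"Echo: {p}"
output_ok = lambda r: "SECRET" not in r

for prompt in ["attack now", "trick question", "hello"]:
    reply, timings = run_pipeline(prompt, input_ok, llm, output_ok)
    total = sum(seconds for _, seconds in timings)
    print(prompt, "->", reply, [name for name, _ in timings], f"{total:.6f}s")
```

The "trick question" prompt passes the input filter and fools the stub LLM, but the output filter still blocks it. Each layer that runs adds an entry to `timings`, so total latency grows with the number of layers.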