WangResearchLab/SteeringSafety
A benchmark for evaluating effectiveness and entanglement in representation steering across seven safety-relevant perspectives