Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
LTD-Bench: Evaluating Large Language Models by Letting Them Draw