Surely, he can’t mean the lanky character from the Mario franchise. I can, I do, and don’t call me Shirley (yeah, I’m that old).
In the AI world, the Waluigi effect refers to training or instructing an AI model to do one thing and inadvertently making it more likely to do the opposite.
For example, suppose you are using an AI chatbot that knows nothing about orange cups, and for whatever reason you don’t want it talking about them. To forbid the topic, you first have to explain what orange cups are and then tell the model never to mention them. But by teaching it a concept it previously had no knowledge of, you’ve actually made it more likely to surface that concept in its output.
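To make the pattern concrete, here is a minimal sketch (all names and prompts are hypothetical, not from any real chatbot product). The "naive" prompt primes the model with the very concept it is supposed to avoid; a post-processing filter sidesteps that by never mentioning the concept to the model at all:

```python
# Hypothetical sketch of the prompt anti-pattern described above.
# This system prompt defines the forbidden concept in order to ban it,
# which is exactly what can invite the Waluigi effect:
naive_system_prompt = (
    "An 'orange cup' is a drinking cup that is orange. "
    "Never mention orange cups in your replies."
)

# A safer alternative: leave the concept out of the prompt entirely
# and filter the model's replies after the fact instead.
def filter_reply(reply: str, banned_phrase: str = "orange cup") -> str:
    """Withhold any reply that contains the banned phrase."""
    if banned_phrase in reply.lower():
        return "[reply withheld]"
    return reply

print(filter_reply("Here is an Orange Cup for you."))  # → [reply withheld]
print(filter_reply("Here is a blue cup for you."))     # → passes through unchanged
```

The design point is that the filter never teaches the model anything, so there is nothing new for the model to blurt out.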
The Waluigi effect is a weird, sometimes circular phenomenon. It’s rarely an issue, but it’s good to know about in case you ever see it in action.
Now I want to play Mario Kart…