Alignment Faking Replication and Chain-of-Thought Monitoring Extensions — LessWrong
Summary
In this post, I present a replication and extension of the alignment faking model organism (code on GitHub): …