latterframe

Posts

Take SCIFs, it’s dangerous to go alone

Comments

Open Thread Spring 2024

latterframe

Hey everyone! I work on quantifying and demonstrating AI cybersecurity impacts at Palisade Research with @Jeffrey Ladish.

We have a bunch of exciting work in the pipeline, including:

demos of well-known safety issues like agent jailbreaks or voice cloning
replications of prior work on self-replication and hacking capabilities
modelling of the economic impact of the above capabilities
novel evaluations and tools

Most of my posts here will probably detail technical research or announce new evaluation benchmarks and tools. I also think a lot about responsible release, offence/defence balance, and general governance to flesh out my work's theory of change; some of that might also slip in.

See you around 🙃
