Comments

Hey everyone! I work on quantifying and demonstrating AI cybersecurity impacts at Palisade Research with @Jeffrey Ladish.

We have a bunch of exciting work in the pipeline, including:

  • demos of well-known safety issues such as agent jailbreaks and voice cloning
  • replications of prior work on self-replication and hacking capabilities
  • modelling of the economic impact of the above capabilities
  • novel evaluations and tools

Most of my posts here will probably detail technical research or announce new evaluation benchmarks and tools. I also think a lot about responsible release, offence/defence balance, and general governance to flesh out my work's theory of change; some of that might also slip in.

See you around 🙃