AI Governance & Strategy

As AI systems grow more capable, can we develop policies, standards, and frameworks to guide the ethical development, deployment, and regulation of AI technologies, with a focus on ensuring safety and societal benefit?

Mentors

David Krueger
Assistant Professor, University of Cambridge KASL

Assistant Professor in Deep Learning and AI Safety, pivoting to focus more on outreach, advocacy, activism, governance, and/or strategy.

  • I am an Assistant Professor of Machine Learning. My work focuses on reducing the risk of human extinction from artificial intelligence (AI x-risk) through technical research as well as education, outreach, policy, and advocacy. My research spans many areas of Deep Learning, AI Alignment, AI Safety, and AI Ethics, including alignment failure modes, algorithmic manipulation, interpretability, robustness, and understanding how AI systems learn and generalize. I've been featured in media outlets including ITV's Good Morning Britain, Al Jazeera's Inside Story, France 24, New Scientist, and the Associated Press. I completed my graduate studies at the University of Montreal and Mila, working with Yoshua Bengio, Roland Memisevic, and Aaron Courville, and I am a research affiliate of Mila, UC Berkeley's Center for Human-Compatible AI (CHAI), and the Center for the Study of Existential Risk (CSER) at the University of Cambridge.

  • A variety of loosely specified projects, mostly not technical research, but also a few technical research projects.

  • I'm broadly interested in any work that seems to reduce AI x-risk, including outreach, advocacy, activism, governance, and strategy.

    Additionally, I believe that work grounded in present-day issues and harms is neglected by the AI x-safety community, and I am interested in exploring relevant directions.

    In terms of technical topics, I'm most interested in work that aims for publication in Machine Learning conferences. This includes established areas of research (e.g. robustness and security problems, broadly construed), and I'm currently enthusiastic about work that highlights (potential) flaws in existing methodologies. It also includes work that introduces safety concerns that are novel to the ML community, e.g. goal misgeneralization.

    Of particular interest to me are the following (unordered):

    • Work advocating for an indefinite pause on AI and analyzing how this might be achieved.

    • A policy-facing document modelled after Bowman's "Eight Things to Know about Large Language Models" listing challenges for assuring AI safety that most technical ML researchers would expect to persist for the indefinite future.

    • A strategic analysis of the roles that evaluations might play in different scenarios, a technical analysis of their suitability for those roles, and (optionally) a research agenda highlighting technical questions.

    • A project socializing and popularizing existential risks that arise from handing over power to (e.g. imperfectly aligned) AI systems, for example due to competitive pressures (cf. "Meditations on Moloch", "What Failure Looks Like", "Two Types of AI Existential Risk: Decisive and Accumulative").

    • Work outlining ways in which safety and performance are likely to trade off, and arguing that these trade-offs are likely to be extreme (e.g. due to competitive pressures to deploy AI decision-making at superhuman speed and scale).

    • Developing generic methods for producing "AI Savants" that are competitive with general-purpose models (e.g. LLMs) in narrow application domains.

    • A generic, learning-based jailbreaking method exploiting the full attack surface: unlike with adversarial examples, jailbreaking permits using multiple queries and combining results in arbitrary ways (cf. "Adversaries Can Misuse Combinations of Safe Models", "LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?").

    • Work on learned "computation editors" that translate descriptions of modifications to models' behaviors into changes to their forward-pass computation (a la steering vectors); a minimal sketch of the underlying steering-vector idea appears after this list.

    I'm happy to hear from researchers pursuing any of these topics (for potential collaboration, feedback, or even just an "FYI, we did this"), even if we do not end up working together through MATS.
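
    For readers less familiar with steering vectors, here is a minimal sketch of the activation-addition idea that a "computation editor" would generalize: derive a direction from two contrasting prompts and add it to the residual stream during generation. The choice of GPT-2, the layer index, and the scaling factor are illustrative assumptions, not part of the proposed project.

    ```python
    # Minimal sketch of activation addition ("steering vectors"): compute a direction
    # from two contrasting prompts and add it to the residual stream during generation.
    # Model, layer, and scale are arbitrary choices for illustration.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # illustrative; any causal LM with accessible blocks works
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    layer = 6  # which block's output to steer (a hyperparameter)

    def residual_at_layer(text: str) -> torch.Tensor:
        """Mean residual-stream activation at the output of block `layer`."""
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            # hidden_states[layer + 1] is the output of block index `layer`
            hidden = model(**ids, output_hidden_states=True).hidden_states[layer + 1]
        return hidden.mean(dim=1).squeeze(0)

    # Steering vector = difference between activations for two contrasting prompts.
    steer = residual_at_layer("I love talking about weddings.") - \
            residual_at_layer("I hate talking about weddings.")

    def add_steering(module, inputs, output):
        # GPT-2 blocks return a tuple whose first element is the hidden states.
        hidden = output[0] + 4.0 * steer  # scale factor chosen by hand
        return (hidden,) + output[1:]

    handle = model.transformer.h[layer].register_forward_hook(add_steering)
    ids = tok("I think that", return_tensors="pt")
    print(tok.decode(model.generate(**ids, max_new_tokens=30)[0]))
    handle.remove()
    ```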

  • The key requirement is a high degree of motivation, independence, and alignment of research interests.

    I will be most interested in supervising researchers who independently have a vision similar to mine for one of the ideas listed above, or who propose ideas I find as exciting as (or more exciting than!) these.

    I may co-supervise projects with some of my group members (most likely candidates are Dima and Usman); this is something for us to discuss and decide on together. In this case, our weekly meetings may be shorter and you may interact more with them. Usman is interested in understanding in-context learning, and Dima is interested in implicit meta-learning & out-of-context learning.

    Coding and ML skills are important for the technical projects. Strong communication skills are required for all of them.

Daniel Kokotajlo
Executive Director, AI Futures Project

Daniel is working on forecasting detailed AI scenarios with Eli Lifland and Thomas Larsen.

  • Daniel is working on forecasting detailed AI scenarios with Eli Lifland and Thomas Larsen. He previously worked at OpenAI on dangerous capability evaluations, scenario forecasting, and chain-of-thought faithfulness.

  • This stream's mentors will be Eli Lifland and Daniel Kokotajlo, with Eli Lifland being the primary mentor.

    We are interested in mentoring projects in AI forecasting and governance. We are currently working on a detailed mainline scenario forecast, and this work would build on that to either do more scenario forecasting or explore how to positively affect key decision points, informed by our scenario.

  • We are interested in mentoring projects in AI forecasting and governance. We are currently working on a detailed mainline scenario forecast, and this work would build on that. Potential projects include:

    1. Scenario forecasting: Writing scenarios that branch off of our mainline scenario, exploring what might happen if important variables are changed. This could include detailed concrete threat modeling or detailed theories of victory.

    2. Projects related to a potential government-supported AGI project (aka "The Project"): Mapping out what it could look like, proposals for making it go well, and navigating tradeoffs e.g. around transparency and security.

    3. Emergency preparedness: Building upon our detailed scenarios regarding opportunities and risks around advanced AI, making recommendations about what we should do now to prepare.

    4. Memos: Writing concise memos or short papers on various important AI forecasting and governance topics, e.g. AI timelines, takeoff speeds, and threat models.

  • The most important characteristics include:

    1. Strong reasoning and writing abilities

    2. Excitement about AI forecasting/governance research

    3. Autonomous research skills

    4. Ability to learn quickly

    Characteristics that are also important, but not required, include:

    1. Significant background knowledge in at least one area relevant to AI forecasting and governance (e.g. government experience, technical background, etc.)

    2. Significant background knowledge in AGI and existential risks

Eli Lifland
Researcher, AI Futures Project

Eli is working on forecasting detailed AI scenarios with Daniel Kokotajlo and Thomas Larsen.

  • Eli is working on forecasting detailed AI scenarios with Daniel Kokotajlo and Thomas Larsen. He advises Sage, an organization he cofounded that works on AI explainers and forecasting tools. He previously worked at Ought on the AI-powered research assistant Elicit.

  • This stream's mentors will be Eli Lifland and Daniel Kokotajlo, with Eli Lifland being the primary mentor.

    We are interested in mentoring projects in AI forecasting and governance. We are currently working on a detailed mainline scenario forecast, and this work would build on that to either do more scenario forecasting or explore how to positively affect key decision points, informed by our scenario.

  • We are interested in mentoring projects in AI forecasting and governance. We are currently working on a detailed mainline scenario forecast, and this work would build on that. Potential projects include:

    1. Scenario forecasting: Writing scenarios that branch off of our mainline scenario, exploring what might happen if important variables are changed. This could include detailed concrete threat modeling or detailed theories of victory.

    2. Projects related to a potential government-supported AGI project (aka "The Project"): Mapping out what it could look like, proposals for making it go well, and navigating tradeoffs e.g. around transparency and security.

    3. Emergency preparedness: Building upon our detailed scenarios regarding opportunities and risks around advanced AI, making recommendations about what we should do now to prepare.

    4. Memos: Writing concise memos or short papers on various important AI forecasting and governance topics, e.g. AI timelines, takeoff speeds, and threat models.

  • The most important characteristics include:

    1. Strong reasoning and writing abilities

    2. Excitement about AI forecasting/governance research

    3. Autonomous research skills

    4. Ability to learn quickly

    Characteristics that are also important, but not required, include:

    1. Significant background knowledge in at least one area relevant to AI forecasting and governance (e.g. government experience, technical background, etc.)

    2. Significant background knowledge in AGI and existential risks

Lisa Thiergart
Research Lead of Technical Governance Team, MIRI

Lisa Thiergart's work aims to bridge technical safety research with evolving policy demands by designing robust safety standards and regulatory frameworks that anticipate the challenges posed by advances toward smarter-than-human AI.

  • Lisa is the Research Lead of the Technical Governance Team at MIRI. Lisa has a BSc in Computer Science from TU Munich and specialized in ML research at Georgia Tech, including some research on RLHF for robotic control. Lisa also studied partial degrees in neuroscience and medicine and holds a graduate degree in technology management. Lisa was a MATS 3.0 and 3.1 scholar mentored by Alex Turner, where she worked on model internals and contributed to the new technique of activation additions. After MATS, Lisa joined MIRI, where she has been working on research management, strategy, and new research directions over the last year. Lisa is also a fellow and AI safety grant advisor at the Foresight Institute, where she explores whole brain emulation and its potential for AI safety. After focusing primarily on technical alignment research, this spring Lisa opened a new team at MIRI in response to rapid developments in the AI policy space and the need for further technical governance research bridging the gap between current safety standards and possible future government regulation that could provide strong safety cases. The team is recruiting this year, and strong scholars could potentially be hired to continue at MIRI after MATS.

  • This stream will do technical research in service of governance goals, including, for example, research on verification mechanisms for international coordination, enabling technologies for 3rd party verification, privacy-preserving mechanisms, and designing components for more robust safety standards.

  • My team at MIRI focuses on researching and designing technical aspects of regulation and policy that could lead to safer AI, giving special consideration to which methods can (continue to) function as we move closer toward smarter-than-human AI. This involves research on verification mechanisms (e.g. compute governance), off-switch design, and 3rd party verification, as well as new research on the technical components necessary for regulation that could provide strong safety cases. At MATS, there is room for scholars to propose projects they are interested in, and scholars may work closely with other MIRI team members.

    MATS projects I’d be most interested in supervising would likely be in the areas of:

    • Technical criteria for an off-switch: contributions to a MIRI project on the concrete implementation of various levels of an off-switch.

    • Developing new safety cases

    • Research on Verification Mechanisms for International Coordination

    • Enabling technologies for 3rd party verification (e.g. privacy-preserving mechanisms)

    • Creating research that contributes to MIRI inputs into regulations and requests for comment by policy bodies (e.g. NIST/US AISI, EU, UN)

    • Creating research that helps communicate with policymakers and governance organizations (e.g. concrete stories of how various AI outcomes may influence the future)

  • I'm looking for technical scholars with an interest in policy. Scholars should have a strong background in CS, electrical engineering, or physics (or be a strong candidate from another field who can apply some of the tools of these domains well), as well as:

    - a strong understanding of AI alignment strategy

    - strong written communication

    A hardware background and previous experience working in policy are each a plus.

Janet Egan
Senior Fellow, Center for a New American Security

Janet's work examines compute and how technological developments could make it a more or less effective governance node for AI, with a focus on translating technical insights into advice that is salient to policymakers.

  • Janet Egan is a Senior Fellow with the Technology and National Security Program at the Center for a New American Security (CNAS). Her research focuses on the national security implications of AI and other emerging technologies, including how compute can be leveraged for the governance of advanced AI systems. Janet brings a policy lens to tech issues, translating technical research into insights that are salient to policymakers.

    Prior to joining CNAS, Janet was a Director in the Australian Government Department of the Prime Minister and Cabinet. She has applied experience working on policy at the intersection of national security, economics, and international relations, on issues spanning 5G security, cyber security, countering foreign interference, foreign investment and trade, and critical infrastructure regulations. Janet has an MPP from the Harvard Kennedy School and a BA (political philosophy) from Monash University in Australia.

  • Mentees would explore the future of data centers and compute capabilities, with a focus on implications for AI governance. Additionally or alternatively, mentees could develop empirical methods to measure compute, supporting the establishment and application of compute thresholds for effective AI governance.

  • What will the future of compute look like, and how will that shape AI governance?

    Current compute governance policies for AI have been developed in the context of today's compute ecosystem and operating practices. However, with record levels of investment flowing into optimizing existing data center capabilities and building new ones, it will be important to consider likely future developments (both hardware and software; operational and technical) that could shape how we use compute to govern AI. Can we analyze which innovations, and which characteristics of new capabilities coming online, may be relevant here? How are compute providers innovating to improve the efficiency and availability of AI compute? And what are the security and governance implications? This work could look at both opportunities and challenges for compute governance, and could help inform current and future policy development.

    Empirically measuring compute

    Compute thresholds are being used by governments to identify which AI systems may warrant greater oversight and risk management. Yet the process for measuring compute (compute accounting) is not standardized across organizations and countries. It would be technically possible to measure compute empirically, using metadata from observations of a cluster's hardware-level characteristics. Could we design a process for this, and undertake a proof of concept, to demonstrate how empirical compute accounting could assist with the uniform application of regulatory thresholds?
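
    As a rough illustration of the kind of empirical compute accounting described above, the sketch below estimates a training run's total operations from hypothetical cluster metadata (chip count, rated peak throughput, measured utilization, and wall-clock duration) and compares it against a reporting threshold. The field names and figures are assumptions for illustration, not a proposed standard.

    ```python
    # Minimal sketch of empirical compute accounting from cluster metadata. Field names
    # and the utilization proxy are illustrative assumptions, not a standard; real
    # accounting would need agreed definitions of each quantity.
    from dataclasses import dataclass

    @dataclass
    class ClusterObservation:
        n_accelerators: int     # chips visible in the cluster's inventory metadata
        peak_flop_per_s: float  # vendor-rated dense peak per chip (e.g. FP16/BF16)
        avg_utilization: float  # measured fraction of peak actually sustained (0-1)
        duration_s: float       # wall-clock time the workload ran

    REPORTING_THRESHOLD_OPS = 1e26  # e.g. the U.S. AI Executive Order training threshold

    def estimate_total_ops(obs: ClusterObservation) -> float:
        """Estimate total operations as chips * peak * utilization * time."""
        return obs.n_accelerators * obs.peak_flop_per_s * obs.avg_utilization * obs.duration_s

    def exceeds_threshold(obs: ClusterObservation) -> bool:
        return estimate_total_ops(obs) >= REPORTING_THRESHOLD_OPS

    # Example: ~25k H100-class chips at ~40% sustained utilization for 90 days.
    obs = ClusterObservation(
        n_accelerators=25_000,
        peak_flop_per_s=1e15,   # ~1 PFLOP/s per chip, order-of-magnitude figure
        avg_utilization=0.4,
        duration_s=90 * 24 * 3600,
    )
    print(f"{estimate_total_ops(obs):.2e} ops, exceeds threshold: {exceeds_threshold(obs)}")
    ```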

  • Proactive, motivated individuals with experience getting deep on tech issues. An interest in hardware/data centers is a bonus.

Timothy Fist
Senior Technology Fellow, Institute for Progress; Adjunct Senior Fellow, CNAS

Timothy is investigating methods to classify and monitor AI workloads in compute environments, developing techniques to ensure robustness against adversarial obfuscation, and addressing regulatory compliance through innovative AI governance strategies.

  • Tim Fist is a Senior Technology Fellow with the Institute for Progress, and a Senior Adjunct Fellow with the Technology and National Security Program at the Center for a New American Security (CNAS). His work focuses on policy for AI compute and differential technological development. Previously, Tim was a Fellow at CNAS and worked as the Head of Strategy & Governance at Fathom Radiant, an artificial intelligence hardware company. Before that, Tim held several senior machine learning engineering positions and was the Head of Product at Eliiza, a machine learning consultancy. His writing has been featured in Foreign Policy and ChinaTalk, and his work has been cited in numerous outlets, including Wired, Reuters, Inside AI Policy, The Wire China, and the Wall Street Journal.

  • Frontier AI workload classification

    It will likely be useful to move toward a governance regime where compute providers (e.g. large cloud firms) provide an independent layer of verification on the activities of their customers. For example, it would be useful if compute providers submitted independent reports to the U.S. government whenever one of their customers ran a large-scale pre-training workload exceeding 10^26 operations (mirroring the current requirements imposed by the U.S. AI Executive Order on model developers). However, compute providers will typically (by design) have limited information about the workloads run by their customers. Can we use the information they do have to develop and implement techniques that reliably classify workloads run on large clusters?
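
    As a toy illustration of the classification task described above, the sketch below trains an off-the-shelf classifier on synthetic, coarse per-job telemetry of the kind a compute provider might plausibly observe without inspecting customer code or data. The feature set, data, and labels are invented for illustration; a real study would need telemetry from actual clusters.

    ```python
    # Minimal sketch of classifying customer workloads (e.g. "large-scale pre-training"
    # vs. other jobs) from coarse telemetry a compute provider could plausibly observe.
    # Features and labels are synthetic, illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)
    n = 2000

    # Hypothetical per-job telemetry features: log10 accelerator-hours, mean power
    # utilization, inter-node traffic, fraction of time in collective-communication
    # (all-reduce-like) patterns.
    X = np.column_stack([
        rng.normal(5.0, 1.5, n),    # log10 accelerator-hours
        rng.uniform(0.2, 1.0, n),   # mean power utilization
        rng.gamma(2.0, 5.0, n),     # inter-node traffic (arbitrary units)
        rng.uniform(0.0, 1.0, n),   # fraction of time in collective-comm patterns
    ])
    # Synthetic labels: "pre-training-like" jobs are big, hot, and communication-heavy.
    y = ((X[:, 0] > 5.5) & (X[:, 1] > 0.6) & (X[:, 3] > 0.5)).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))
    ```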

    Making workload classification techniques robust to distributional shift

    Changes in the hardware, software packages, and specific algorithms used to run frontier AI workloads will likely degrade the accuracy of workload classification techniques over time, perhaps in sudden and unexpected ways. Can we design an overall process for developing, implementing, and monitoring the efficacy of workload classification techniques that is robust to this kind of degradation?

    This is a follow-up project to "Frontier AI workload classification".
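
    One generic ingredient of such a monitoring process might be a statistical drift check on the classifier's outputs, as in the sketch below: compare the distribution of recent prediction scores against a reference window and alert on divergence. The test choice, synthetic data, and thresholds are illustrative assumptions rather than a recommendation.

    ```python
    # Minimal sketch of monitoring a deployed workload classifier for degradation:
    # compare recent prediction-score distributions against a reference window with a
    # two-sample test and alert on divergence. Data and thresholds are illustrative.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)

    # Classifier confidence scores from a validation window collected at deployment time.
    reference_scores = rng.beta(8, 2, size=5000)

    # Scores observed this week; new hardware/software in the fleet shifts the distribution.
    recent_scores = rng.beta(4, 3, size=1000)

    stat, p_value = ks_2samp(reference_scores, recent_scores)
    ALERT_P = 0.01  # illustrative significance level
    if p_value < ALERT_P:
        print(f"Drift alert: KS statistic {stat:.3f}, p={p_value:.2e}; "
              "re-validate the classifier before trusting its reports.")
    else:
        print("No significant score drift detected.")
    ```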

    Making workload classification techniques robust to adversarial gaming

    Workload classification becomes much harder in cases where the customer is actively trying to obfuscate their activities, by introducing deliberate noise in the code and/or data run as part of a workload. Can we design workload classification approaches that are robust to this kind of gaming (to within certain cost-efficiency bounds from the customer's perspective), or that are able to detect when this kind of gaming is occurring?

    This is a follow-up project to "Frontier AI workload classification", and could possibly be combined with "Making workload classification techniques robust to distributional shift".

    Tackling "compute structuring"

    Compute-based reporting thresholds for training runs (e.g. the requirements in the U.S. AI Executive Order) could be a useful way to ensure model developers are complying with best practices for AI safety. However, such thresholds are likely to be highly "gameable". One relevant case study comes from finance: in the United States, a law was introduced requiring banks to report all transactions above a particular threshold to the government, in order to monitor money laundering, tax evasion, and other nefarious financial activities. In response, "transaction structuring" became a widespread practice, in which bank customers eager to avoid scrutiny broke their transactions up into multiple smaller transactions, each below the reporting threshold. There are two clear ways this could be done in the compute world:

    1. Breaking up a large workload into multiple sequential workloads, where each workload is run using a different cloud account or provider, each workload is below the compute threshold, and each workload takes as input partially-trained model weights from the previous workload.

    2. Distributing a large workload across multiple cloud accounts/providers, where each account/provider serves as a single data-parallel worker within the larger distributed training run, periodically broadcasting/receiving weight updates from other workers.

    What other forms of compute structuring might be possible? What are some effective technical solutions to both detect and prevent different forms of compute structuring? For example, tackling (1) might involve having compute providers report all workloads above a particular throughput (in terms of operations per second), and then aggregating those reports in order to identify (perhaps also informed by shared identity information based on KYC) likely sequential training runs. Tackling (2) might involve imposing technical boundaries on off-cluster latency/bandwidth in order to make such a scheme prohibitive from a cost-efficiency perspective.
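
    To make the detection side of (1) concrete, the sketch below aggregates hypothetical workload reports across providers by linked customer identity (e.g. via shared KYC information) and flags entities whose summed compute crosses the threshold even though no single reported workload does. The data and identifiers are invented for illustration.

    ```python
    # Minimal sketch of detecting the first form of compute structuring: aggregate
    # workload reports across accounts/providers by linked customer identity and flag
    # entities whose summed compute crosses the threshold even though every individual
    # workload stays below it. All data here is hypothetical.
    from collections import defaultdict

    THRESHOLD_OPS = 1e26

    # (kyc_identity, provider, total_ops) tuples, as might appear in provider reports.
    reports = [
        ("acme-labs", "cloud-a", 4e25),
        ("acme-labs", "cloud-b", 3e25),
        ("acme-labs", "cloud-c", 5e25),
        ("tiny-startup", "cloud-a", 2e24),
    ]

    def flag_structuring(reports, threshold=THRESHOLD_OPS):
        """Return identities whose aggregate compute exceeds the threshold while no
        single reported workload does (the signature of threshold structuring)."""
        by_identity = defaultdict(list)
        for identity, _provider, ops in reports:
            by_identity[identity].append(ops)
        return [
            identity for identity, ops_list in by_identity.items()
            if sum(ops_list) >= threshold and max(ops_list) < threshold
        ]

    print(flag_structuring(reports))  # -> ['acme-labs'] (4e25 + 3e25 + 5e25 = 1.2e26)
    ```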

    A system for secure chip IDs

    Being able to assign un-spoofable IDs to hardware devices in data centers will be extremely useful for tracking and verifying compute usage. Such IDs could also be useful for linking AI agents back to the physical infrastructure on which they are deployed, in order to provide an additional layer of oversight. What are some promising technical approaches for implementing secure IDs that are sufficiently robust against hardware and software-level attacks? How can ID creation, provisioning, and tracking be handled in a way that makes downstream policy measures maximally easy (e.g. chip tracking for anti-smuggling)?
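
    As a rough illustration of what secure chip IDs could enable, the sketch below shows the shape of a challenge-response attestation protocol: each device holds a private key provisioned at manufacture, and a verifier checks a signature over a fresh nonce against a registry of public keys. A real system would need the key anchored in a hardware root of trust and protections against physical extraction; the device names and registry here are hypothetical.

    ```python
    # Minimal sketch of challenge-response attestation built on per-device keys. A real
    # system would anchor the key in a hardware root of trust; this shows only the
    # protocol shape, with hypothetical device IDs.
    import os
    from cryptography.hazmat.primitives.asymmetric import ed25519
    from cryptography.exceptions import InvalidSignature

    # Provisioning: the manufacturer generates a keypair per device and registers the
    # public key under the device's ID.
    device_key = ed25519.Ed25519PrivateKey.generate()
    registry = {"gpu-0001": device_key.public_key()}

    def device_respond(private_key, challenge: bytes) -> bytes:
        """Device side: sign the verifier's fresh challenge."""
        return private_key.sign(challenge)

    def verify_device(device_id: str, challenge: bytes, signature: bytes) -> bool:
        """Verifier side: check the signature against the registered public key."""
        try:
            registry[device_id].verify(signature, challenge)
            return True
        except (KeyError, InvalidSignature):
            return False

    challenge = os.urandom(32)                            # fresh nonce per verification
    sig = device_respond(device_key, challenge)
    print(verify_device("gpu-0001", challenge, sig))      # True
    print(verify_device("gpu-0001", os.urandom(32), sig)) # False: replayed signature
    ```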

  • Technical familiarity with the compute supply chain, and with how large-scale AI training works from a hardware/parallelization perspective. At least a basic level of programming/ML ability will also be useful, though this will vary depending on the project.