AI Governance

As AI systems grow more capable, can we develop policies, standards, and frameworks to guide the ethical development, deployment, and regulation of AI technologies, ensuring safety and societal benefit?

Mentors

Nico Miailhe
CEO, PRISM Eval; Founder, The Future Society

Nico is developing a comprehensive cognitive profiling benchmark for LLMs to evaluate their higher-order cognitive abilities, with a focus on governance and policy implementation for AI safety.

  • Co-founder and CEO of PRISM Eval, a new Paris-based evals organization I am starting with two MATS alumni. I am also the Founder and President (now non-executive) of the independent AI policy think tank The Future Society (TFS), which I originally started in 2014 at HKS and which is active in Brussels, SF, DC, Paris, London, Montreal, and more. I am also an appointed expert to the OECD, UNESCO, and GPAI. I have been working on AI governance since 2011.

  • AI policy institutional innovation focused on building a global governance regime for AI, in particular around the race to AGI, and on implementation of the EU AI Act. This includes the design, development, and "policy injection" of blueprints for key AI alignment institutions and governance mechanisms, such as metrology, evals and certification protocols, AI Safety Institutes, content provenance protocols, major-accident prevention regimes, and more.

    In particular, I would like to work with mentees on the governance aspects (what, how, who, when, etc.) of the following project: developing a comprehensive Cognitive Profiling Benchmark for LLMs. The goal is to create a robust, comprehensive benchmark that evaluates the higher-order cognitive abilities of LLMs in a more realistic and unbiased manner than existing benchmarks and compute governance thresholds. Such a benchmark would focus on assessing higher-order cognitive capabilities, such as reasoning, theory of mind, and world modeling, rather than just knowledge or task-specific performance. By establishing a set of cognitive ability levels, it could inform safety considerations and guide the governance of responsible development and deployment of LLMs. (A minimal illustrative sketch of how such levels might be represented and scored appears after the requirements list below.)

    • At least 2-3 years of experience in AI governance (more preferred)

    • Good understanding of AI Eval landscape and challenges is a plus.

    • Good understanding of info hazard dynamics

    • Robust safety mindset

    • Basic understanding of the simulator theory

    • Basic understanding of the new LLM Psychology research agenda
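
    To make the governance discussion concrete, here is a minimal sketch (not PRISM Eval's actual methodology; the ability names, difficulty levels, and grading scheme are hypothetical) of how benchmark items tagged by cognitive ability and level could be aggregated into a cognitive profile:

```python
# Hypothetical sketch: benchmark items tagged by cognitive ability and difficulty
# level, aggregated into a per-ability profile of pass rates.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class BenchmarkItem:
    ability: str    # e.g. "reasoning", "theory_of_mind", "world_modeling"
    level: int      # hypothetical difficulty tier, 1 (basic) to 5 (advanced)
    prompt: str
    passed: bool    # whether the model's response met the grading rubric

def cognitive_profile(items: list[BenchmarkItem]) -> dict[str, dict[int, float]]:
    """Aggregate pass rates per (ability, level) into a cognitive profile."""
    buckets: dict[tuple[str, int], list[bool]] = defaultdict(list)
    for item in items:
        buckets[(item.ability, item.level)].append(item.passed)
    profile: dict[str, dict[int, float]] = defaultdict(dict)
    for (ability, level), results in buckets.items():
        profile[ability][level] = sum(results) / len(results)
    return dict(profile)

# Example: a model that passes basic reasoning but fails an advanced theory-of-mind item
items = [
    BenchmarkItem("reasoning", 1, "…", True),
    BenchmarkItem("reasoning", 1, "…", True),
    BenchmarkItem("theory_of_mind", 4, "…", False),
]
print(cognitive_profile(items))
```

    The governance questions this immediately raises include who defines the levels and grading rubrics, and which profile thresholds should trigger reporting or deployment restrictions.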

Mauricio Baker
Technology and Security Policy Fellow, RAND

Mauricio specializes in the technical aspects of AI governance, focusing on compute governance and ensuring compliance with AI regulations. His research integrates technical insights with policy frameworks to manage and verify AI system operations effectively.

  • Mauricio Baker is a Technology and Security Policy Fellow at RAND. His research focuses on technical aspects of AI governance, including compute governance and verification of compliance with rules on AI (on which he previously published some non-technical work). Before RAND, Mauricio contracted with OpenAI’s Policy Research team and completed a master’s in Computer Science at Stanford, with an AI specialization.

  • Projects would aim to bring technical insights to AI governance, including compute governance. Some potential projects (also open to your ideas!):

    Non-AI uses of GPU clusters: What does the landscape of non-AI uses of GPU clusters look like? More precisely, for various types of non-AI workloads, roughly how much compute is used for them in aggregate, what are the typical specs of a cluster used for this workload, roughly how economically concentrated is this market, and so on?

    • Theory of change: Through a better understanding of non-AI uses of GPU clusters, AI regulations will be better able to avoid both false negatives (dangerous loopholes involving “non-AI” GPU clusters) and false positives (unnecessarily burdening genuinely irrelevant GPU clusters, sparking unnecessary backlash). This research would help identify how risks from non-AI GPU clusters can be addressed with minimal downsides, if needed.

    • Hour-to-hour tasks for this research may include: reading about technical and non-technical aspects of GPU workloads, compiling data, interviewing experts, and doing writing + simple data visualization about your results.
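
    As a toy illustration of the "compiling data" and "simple data visualization" tasks (all numbers below are made up), one might aggregate estimates of non-AI GPU-cluster usage by workload type and compute a rough market-concentration measure:

```python
# Toy sketch with made-up numbers: aggregating hypothetical estimates of non-AI
# GPU-cluster usage and computing a rough concentration measure per workload type.
import pandas as pd

# Hypothetical rows: one per (workload type, provider), aggregate GPU-hours per year
data = pd.DataFrame({
    "workload": ["scientific_sim", "scientific_sim", "rendering", "rendering", "crypto"],
    "provider": ["A", "B", "A", "C", "D"],
    "gpu_hours": [2.0e6, 1.0e6, 0.5e6, 1.5e6, 3.0e6],
})

# Aggregate compute per workload type
per_workload = data.groupby("workload")["gpu_hours"].sum()
print(per_workload)

# Herfindahl-Hirschman index over providers, as a rough concentration measure
def hhi(gpu_hours: pd.Series) -> float:
    shares = gpu_hours / gpu_hours.sum()
    return float((shares ** 2).sum())

print(data.groupby("workload")["gpu_hours"].apply(hhi))

# Simple visualization (requires matplotlib)
per_workload.plot.bar(ylabel="GPU-hours per year")
```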

    Temporarily disabling compute clusters in emergencies: If compute owners or governments found it necessary to temporarily disable some (potentially hacked) AI compute clusters in an emergency, what exactly would be a good way to do so? (Maybe cutting power or connectivity—but how, specifically?)

    • Theory of change: perhaps compute owners or governments will need emergency measures for temporarily disabling AI compute clusters. If so, applied research on how to do so may help ensure it is done effectively and without excess downsides.

    • Hour-to-hour tasks for this research may include: reading about data center, electrical, and internet infrastructure; tactful expert interviews; and writing up findings.

    Compute and algorithmic progress: How does compute affect algorithmic progress? More precisely, how does the amount of compute used for running ML experiments affect the rate of algorithmic progress in LLMs?

    • Theory of change: Insofar as compute is important to algorithmic progress, this suggests that compute offers a lever over the rate and proliferation of algorithmic progress (which might by default lead to excess proliferation of the ability to carry out globally harmful AI activities).

    • (Note it would be good to sync with researchers at Epoch and IAPS on whether they are doing similar work.)

    • Hour-to-hour tasks for this research may include: reading ML papers, interviewing ML experts, building a theoretical model, running ML experiments.
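
    As one toy starting point for the "theoretical model" task (the data below are synthetic and the power-law functional form is only an assumption), one could fit a simple relationship between annual experiment compute and measured algorithmic efficiency gains:

```python
# Toy sketch with synthetic data: fitting a power-law relationship between
# experiment compute and algorithmic efficiency gains in log-log space.
import numpy as np

# Hypothetical observations: experiment compute (FLOP/year) vs. the efficiency
# multiplier achieved that year (same loss reached with X times less compute)
experiment_compute = np.array([1e22, 1e23, 1e24, 1e25, 1e26])
efficiency_gain = np.array([1.3, 1.7, 2.2, 2.9, 3.8])

# Fit efficiency_gain ≈ a * compute^b via least squares on the logs
b, log_a = np.polyfit(np.log(experiment_compute), np.log(efficiency_gain), 1)
print(f"fitted exponent b ≈ {b:.2f}")  # b > 0 would suggest compute drives progress

# Extrapolate (with heavy caveats) to a larger hypothetical experiment budget
predicted = np.exp(log_a) * 1e27 ** b
print(f"predicted efficiency gain at 1e27 FLOP/year: {predicted:.1f}x")
```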

  • Must-have qualifications:

    • Basic computer science knowledge (one introductory class or equivalent).

    • Strong analytical reasoning and writing skills.

  • Ideal qualifications:

    • Significant expertise on ML, software, hardware, or other relevant topics.

    • Prior research experience, ideally in a related area.

  • Mentorship looks like:

    • Weekly 30-minute meetings (or one hour if that seems productive, possibly in a small group if that seems most productive, in which case Mauricio would also make himself available for additional 1:1 calls)

    • Detailed feedback on drafts

    • Slack/Signal/email response time typically ≤ 48 hours

    • (If you’d like) Support via suggesting project ideas, introducing you to some others in the field, and career advising

    • Mauricio’s mentorship experience: he currently mentors two students on part-time research projects and has done some management as part of a summer job (back in high school) and university groups.

Lisa Thiergart
Research Manager, MIRI

Lisa Thiergart's work aims to bridge technical safety research with evolving policy demands by designing robust safety standards and regulatory frameworks that anticipate the challenges posed by advances toward smarter-than-human AI.

  • Lisa is the research manager at MIRI and leads the newly formed technical governance team. Lisa has a BSc in Computer Science from TU Munich and specialized in ML research at Georgia Tech, including some research on RLHF for robotic control. Lisa also completed partial degrees in neuroscience and medicine and holds a graduate degree in technology management. Lisa was a MATS 3.0 and 3.1 scholar mentored by Alex Turner, where she worked on model internals and contributed to the new technique of activation additions. After MATS, Lisa joined MIRI, where over the last year she has worked on research management, strategy, and new research directions. Lisa is also a fellow and AI safety grant advisor at the Foresight Institute, where she explores whole brain emulation and its potential for AI safety. After focusing primarily on technical alignment research, this spring Lisa opened a new team at MIRI in response to rapid developments in the AI policy space and the need for further technical governance research bridging the gap between current safety standards and possible future government regulation that could provide strong safety cases. The team is recruiting this year, and strong scholars could potentially be hired to continue at MIRI after MATS.

  • My team at MIRI focuses on researching and designing technical aspects of regulation and policy that could lead to safer AI, giving special consideration to which methods can (continue to) function as we move closer toward smarter-than-human AI. This involves threat modeling, analysis of current proposals for safety standards, and new research on the technical components regulation would need in order to provide strong safety cases. At MATS there is room for scholars to propose projects they are interested in, and scholars may work closely with other MIRI team members.

    MATS projects I’d be most interested in supervising would likely be in the areas of:

    • Analyzing limitations of current proposals such as Responsible Scaling Policies.

    • Creating research that contributes to MIRI inputs into regulations and requests for comment by policy bodies (e.g., NIST/US AISI, EU, UN)

    • (most likely) Researching and designing alternative Safety Standards, or amendments to existing proposals

    • Creating research that helps communicate with policymakers and governance organizations

  • I'm looking for technical scholars with an interest in contributing to AI policy. Ideally, scholars would have a strong background in CS, math, or philosophy (although more exotic candidates who can apply the tools of these domains well could be a good fit too), as well as:

    • a strong overview of the technical alignment space and challenges in AI Safety

    • strong written and verbal communication skills

    • excitement about interfacing with a fast moving team that produces research that is relevant to current policy developments

    • conceptual familiarity with and interest in RSPs and Safety Standards

    • previous experience with Policy or Evals is a bonus but not a requirement.

Timothy Fist
Senior Technology Fellow, Institute for Progress
Adjunct Senior Fellow, CNAS

Timothy is investigating methods to classify and monitor AI workloads in compute environments, developing techniques to ensure robustness against adversarial obfuscation, and addressing regulatory compliance through innovative AI governance strategies.

  • Tim Fist is a Senior Technology Fellow with the Institute for Progress and an Adjunct Senior Fellow with the Technology and National Security Program at the Center for a New American Security (CNAS). His work focuses on policy for AI compute and differential technological development. Previously, Tim was a Fellow at CNAS and worked as the Head of Strategy & Governance at Fathom Radiant, an artificial intelligence hardware company. Prior to that, Fist held several senior machine learning engineering positions and was the Head of Product at Eliiza, a machine learning consultancy. His writing has been featured in Foreign Policy and ChinaTalk, and his work has been cited in numerous outlets, including Wired, Reuters, Inside AI Policy, The Wire China, and the Wall Street Journal.

    With a background in machine learning and emerging AI hardware technologies, Fist brings a technical lens to problems in national security and AI policy. His career has ranged from developing and deploying AI systems for Fortune 500 companies to writing code currently being used on a satellite in orbit. He has built software in a wide range of domains, including agriculture, finance, telecommunications, and space navigation. Originally from Australia, Fist holds a B.A. (Honors) in Aerospace Engineering and a B.A. in Political Science from Monash University.

  • Frontier AI workload classification

    It will likely be useful to move toward a governance regime where compute providers (e.g. large cloud firms) provide an independent layer of verification on the activities of their customers. For example, it would be useful if compute providers submitted independent reports to the U.S. government whenever one of their customers ran a large-scale pre-training workload exceeding 10^26 operations (mirroring the current requirements imposed by the U.S. AI Executive Order on model developers). However, compute providers will typically (by design) have limited information about the workloads run by their customers. Can we use this limited information to develop and implement techniques that reliably classify workloads run on large clusters? The classes of interest will likely be "not deep learning", "pre-training", "fine-tuning", and "inference".
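
    A toy sketch of what such a classifier could look like is below; the telemetry features and labels are entirely hypothetical stand-ins, and a real project would first need to establish what providers can actually observe:

```python
# Toy sketch (hypothetical features, random stand-in labels): classifying
# workloads from coarse telemetry a compute provider might see, without
# inspecting customer code or data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

CLASSES = ["not_deep_learning", "pre_training", "fine_tuning", "inference"]

# Hypothetical telemetry features per workload:
# [num_accelerators, mean_gpu_utilization, inter_node_traffic_gbps, duration_hours]
rng = np.random.default_rng(0)
X = rng.random((400, 4)) * [10_000, 1.0, 100.0, 1_000.0]
y = rng.integers(0, len(CLASSES), size=400)  # random labels, stand-in for real audits

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test),
                            labels=list(range(len(CLASSES))),
                            target_names=CLASSES, zero_division=0))
```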

    Making workload classification techniques robust to distributional shift

    Changes in the hardware, software packages, and specific algorithms used to run frontier AI workloads will likely degrade the accuracy of workload classification techniques over time, perhaps in sudden and unexpected ways. Can we design an overall process for developing, implementing, and monitoring the efficacy of workload classification techniques that is robust to this kind of degradation?

    This is a follow-up project to "Frontier AI workload classification".
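
    A minimal sketch of one monitoring approach, assuming periodic hand-labeled audits of a sample of workloads, is to track rolling agreement between the classifier and the audits and flag degradation:

```python
# Minimal sketch: track rolling agreement between classifier predictions and
# periodically audited (hand-labeled) workloads, and flag possible degradation.
from collections import deque

class DriftMonitor:
    def __init__(self, window: int = 200, alert_threshold: float = 0.85):
        self.results = deque(maxlen=window)  # 1 if prediction matched the audit label
        self.alert_threshold = alert_threshold

    def record(self, predicted: str, audited: str) -> None:
        self.results.append(int(predicted == audited))

    def degraded(self) -> bool:
        """Flag once the window is full and rolling accuracy falls below threshold."""
        if len(self.results) < self.results.maxlen:
            return False
        return sum(self.results) / len(self.results) < self.alert_threshold

# Usage: after each audited workload, record the comparison and check for drift
monitor = DriftMonitor()
monitor.record(predicted="pre_training", audited="fine_tuning")
if monitor.degraded():
    print("Classifier accuracy degraded; retrain or escalate for manual review.")
```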

    Making workload classification techniques robust to adversarial gaming

    Workload classification becomes much harder in cases where the customer is actively trying to obfuscate their activities, by introducing deliberate noise in the code and/or data run as part of a workload. Can we design workload classification approaches that are robust to this kind of gaming (to within certain cost-efficiency bounds from the customer's perspective), or that are able to detect when this kind of gaming is occurring?

    This is a follow-up project to "Frontier AI workload classification", and could possibly be combined with "Making workload classification techniques robust to distributional shift".

    Tackling "compute structuring"

    Compute-based reporting thresholds for training runs (e.g. the requirements in the U.S. AI Executive Order) could be a useful way to ensure model developers are complying with best practices for AI safety. However, such thresholds are likely to be highly "gameable". One relevant case study comes from finance: in the United States, a law was introduced requiring banks to report all transactions above a particular threshold to the government, in order to monitor money laundering, tax evasion, and other nefarious financial activities. In response, "transaction structuring" became a widespread practice, where bank customers eager to avoid scrutiny broke up their transactions into multiple transactions, each below the reporting threshold. There are two clear ways this could be done in the compute world:

    1. Breaking up a large workload into multiple sequential workloads, where each workload is run using a different cloud account or provider, each workload is below the compute threshold, and each workload takes as input partially-trained model weights from the previous workload.

    2. Distributing a large workload across multiple cloud accounts/providers, where each account/provider serves as a single data-parallel worker within the larger distributed training run, periodically broadcasting/receiving weight updates from other workers.

    What other forms of compute structuring might be possible? What are some effective technical solutions to both detect and prevent different forms of compute structuring? For example, tackling (1) might involve having compute providers report all workloads above a particular throughput (in terms of operations per second), and then aggregating those reports in order to identify (perhaps also informed by shared identity information based on KYC) likely sequential training runs. Tackling (2) might involve imposing technical boundaries on off-cluster latency/bandwidth in order to make such a scheme prohibitive from a cost-efficiency perspective.
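
    As a minimal sketch of the detection side of (1) (the report format and KYC linkage are hypothetical simplifications), a regulator receiving sub-threshold workload reports could aggregate them by customer identity and flag accounts whose cumulative compute crosses the reporting threshold:

```python
# Illustrative sketch: flag possible compute structuring by aggregating reported
# sub-threshold workloads per (hypothetical) KYC-linked customer identity.
from collections import defaultdict

REPORT_THRESHOLD_OPS = 1e26  # mirrors the Executive Order's reporting threshold

def flag_structured_runs(reports: list[dict]) -> set[str]:
    """Each report is {"kyc_id": str, "ops": float}. Returns customer identities
    whose individually sub-threshold workloads exceed the threshold in aggregate."""
    totals: dict[str, float] = defaultdict(float)
    for report in reports:
        if report["ops"] < REPORT_THRESHOLD_OPS:  # individually unreportable workloads
            totals[report["kyc_id"]] += report["ops"]
    return {cid for cid, total in totals.items() if total >= REPORT_THRESHOLD_OPS}

# Example: three 0.4e26-op workloads split across providers but tied to one identity
reports = [
    {"kyc_id": "acme-labs", "ops": 0.4e26},
    {"kyc_id": "acme-labs", "ops": 0.4e26},
    {"kyc_id": "acme-labs", "ops": 0.4e26},
]
print(flag_structured_runs(reports))  # {'acme-labs'}
```

    In practice one would also need a time window, tolerance for error in reported operation counts, and a way to handle identities deliberately split across shell entities.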

    A system for secure chip IDs

    Being able to assign un-spoofable IDs to hardware devices in data centers will be extremely useful for tracking and verifying compute usage. Such IDs could also be useful for linking AI agents back to the physical infrastructure on which they are deployed, in order to provide an additional layer of oversight. What are some promising technical approaches for implementing secure IDs that are sufficiently robust against hardware and software-level attacks? How can ID creation, provisioning, and tracking be handled in a way that makes downstream policy measures maximally easy (e.g. chip tracking for anti-smuggling)?
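
    One familiar protocol shape for such IDs is challenge-response attestation with a per-device keypair; the sketch below (using Ed25519 signatures) only illustrates the protocol logic, while the hard open problems are keeping the private key inside a hardware root of trust and handling provisioning, tracking, and revocation at fleet scale:

```python
# Minimal sketch of challenge-response device attestation: the device's public
# key is registered at manufacture, and the device later proves its identity by
# signing a fresh nonce. Robustness against hardware-level key extraction is
# exactly the open problem and is not addressed here.
import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# At manufacture: generate a keypair on the device and register the public key
device_key = Ed25519PrivateKey.generate()
registry = {"chip-0001": device_key.public_key()}  # verifier's ID -> public-key map

# At verification time: the verifier sends a fresh random challenge
challenge = os.urandom(32)

# The device responds with its claimed ID and a signature over the challenge
claimed_id, signature = "chip-0001", device_key.sign(challenge)

# The verifier checks the signature against the registered public key
try:
    registry[claimed_id].verify(signature, challenge)
    print(f"{claimed_id}: identity verified")
except (KeyError, InvalidSignature):
    print(f"{claimed_id}: verification failed")
```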

  • Timothy Fist has worked with a variety of professionals spanning multiple fields.