Introduction
Every year in the United States, ~50,000 graduating medical students (doctors, for short) are placed into the residencies where they will begin their careers by the National Resident Matching Program (or The Match). To do so, the program finds a stable matching: an assignment of doctors to medical programs (hospitals, for short) such that no doctor and hospital would each prefer to be assigned to the other over the match they ended up with. Each candidate doctor ranks their preferred hospitals, each hospital ranks its preferred candidates, and the matching proceeds algorithmically. One of the original problems in the field of algorithmic mechanism design, The Match uses the Roth-Peranson algorithm (1999), a modification of the earlier Gale-Shapley algorithm (1962); these contributions in large part earned Roth and Shapley the 2012 Nobel Prize in Economics.

The Match is a classic example of a computation whose transparency and accountability can benefit from private verifiable computation: at present, doctors receive no direct assurance that an algorithmic decision that will shape their careers (and their career earnings) is being executed correctly. However, The Match cannot simply publish a transcript of the matching, as doing so would harm the privacy of the doctors in particular (“Alice said she wanted to come back home for residency, but she didn’t rank any local hospitals” or “when we interviewed Bob he said fifteen years ago he ranked us highly in The Match, but he only had us seventh”). Anonymizing the doctors offers only the illusion of privacy: modern re-identification techniques routinely pierce de-identified data.

Our goal is to create a system that provides complete transparency about the correctness of the matching algorithm while preserving the privacy of all preference rankings.
Specifically, we want to generate a cryptographic proof that demonstrates the Gale-Shapley algorithm was executed correctly and produced a stable matching, without revealing any individual’s private preferences. The Nexus zkVM enables us to achieve exactly this balance between accountability and privacy.

Implementation
To start, we will follow along with the SDK Quick Start. After installing the zkVM, to initialize the host and guest programs, we run:

Guest Program
We begin with the guest program. This is the program to be proven, which will need to implement the matching itself. In our case, we will just use the simpler Gale-Shapley approach, which runs in quadratic time. We will not explain how the following implementation works in detail, as there are plenty of existing explainers of the algorithm, its runtime, and its correctness.

matching/src/guest/src/main.rs
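As a self-contained sketch of what such a function could look like (the name stable_match, the argument names, and the doctor-proposing orientation are our assumptions here, not necessarily what the original listing uses):

```rust
/// Sketch of a one-to-one, doctor-proposing Gale-Shapley matching.
/// candidate_prefs[i] lists hospitals in doctor i's order of preference;
/// employer_prefs[j] lists doctors in hospital j's order of preference.
/// Returns hires, where hires[j] = i means hospital j hires doctor i.
fn stable_match(candidate_prefs: &[Vec<usize>], employer_prefs: &[Vec<usize>]) -> Vec<usize> {
    let n = employer_prefs.len();
    // rank[j][i]: position of doctor i in hospital j's list (lower is better).
    let mut rank = vec![vec![0usize; n]; n];
    for j in 0..n {
        for (pos, &i) in employer_prefs[j].iter().enumerate() {
            rank[j][i] = pos;
        }
    }
    let mut hires: Vec<Option<usize>> = vec![None; n]; // hospital j -> doctor
    let mut next = vec![0usize; n]; // next hospital doctor i will propose to
    let mut free: Vec<usize> = (0..n).collect(); // currently unmatched doctors
    while let Some(i) = free.pop() {
        let j = candidate_prefs[i][next[i]];
        next[i] += 1;
        match hires[j] {
            // Hospital j is unfilled: provisionally accept doctor i.
            None => hires[j] = Some(i),
            // Hospital j prefers doctor i: bump the current hire.
            Some(cur) if rank[j][i] < rank[j][cur] => {
                hires[j] = Some(i);
                free.push(cur);
            }
            // Hospital j rejects doctor i; they stay free.
            _ => free.push(i),
        }
    }
    hires.into_iter().map(|h| h.unwrap()).collect()
}
```

Each doctor proposes at most once to each hospital, giving the quadratic running time mentioned above.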
The function takes the preference matrices as input (row i of the employer_prefs matrix encodes the ranking for the i-th doctor), and a parameter n = employers.len() that counts the number of required matches to be made. It then returns a vector of length n, where hires[j] = i means the j-th hospital will hire the i-th doctor.
To use this function, we will also need a main function that handles the input and output of the zkVM, especially as we will want the preference matrices themselves to be kept private.
matching/src/guest/src/main.rs
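As a sketch (assuming the attribute-style nexus_rt API from the SDK quick start; the exact macro names, tuple type, and the use of the return value as the public output are our assumptions, and the snippet compiles only against the nexus-rt crate), the guest entry point could look like:

```rust
// Sketch only: requires the nexus-rt crate from the Nexus SDK.
#[nexus_rt::main]
#[nexus_rt::private_input(prefs)]
fn main(prefs: (Vec<Vec<usize>>, Vec<Vec<usize>>)) -> Vec<usize> {
    let (candidate_prefs, employer_prefs) = prefs;
    // The private preference matrices never leave the guest; only the
    // resulting hires vector is exposed as public output.
    stable_match(&candidate_prefs, &employer_prefs)
}
```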
Note that in the above, we do not actually need to write #[nexus_rt::private_input(prefs)], as all inputs default to private unless they are explicitly declared to be public.

Host Program
To prove an execution of this guest program, we use the host program. Our host program compiles the guest program and runs it using the preferences provided by both the candidates and hospitals. Since the host program will not be made public, it is perfectly fine for us to include private information like the preference rankings directly within it.

matching/src/main.rs
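In outline (assuming the Nova prover interface from the Nexus SDK quick start; the module paths, method names, and the demo_preferences helper below are our assumptions, and the snippet compiles only against the nexus-sdk crate), the host compiles the guest, proves it on the private inputs, and checks the proof before publishing:

```rust
// Sketch only: requires the nexus-sdk crate; names are illustrative.
use nexus_sdk::{compile::CompileOpts, nova::seq::{Nova, PP}, Local, Prover, Verifiable};

fn main() {
    let pp: PP = PP::generate().expect("failed to generate public parameters");
    let opts = CompileOpts::new("guest");
    let prover: Nova<Local> = Nova::compile(&opts).expect("failed to compile guest");

    // The demo preference matrices live only inside the host program.
    let prefs: (Vec<Vec<usize>>, Vec<Vec<usize>>) = demo_preferences();

    let proof = prover.prove_with_input(&pp, &prefs).expect("failed to prove");
    proof.verify(&pp).expect("failed to verify"); // sanity check before publishing
}
```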
The specific ranking matrices we use here were randomly generated, for demonstration purposes.
Verification
Once the program and its public input and output (the view) and the proof are published, any interested party — a participating doctor or hospital, or a third party such as a medical school, journalist, or regulator — can verify the correctness of the execution without needing access to the preferences.
If the published output (the hires vector) has been tampered with to match candidates to hospitals in violation of the Gale-Shapley guarantees, this verification will provably fail.
One limitation of the guest program used above is that a verifier cannot directly confirm that the preferences themselves have not been tampered with. The guarantee of the zkVM is only existential over private inputs: there exists some private input for which the execution is correct, not necessarily that the private input is what it should be in the context of the broader application. With a bit more effort, however, we can mitigate even this concern.
At a high level, we’d extend the system so that when submitting their preferences, a concerned candidate would also supply The Match with a locally-generated, random passphrase known only to them. These passphrases would also be provided to the guest program as private input. The guest program would then go row by row through the preference matrix and compute hashes of each candidate’s preferences salted with their passphrase, and append those records to the public output. The candidate could then go and check the correctness of their hash to verify that their preferences were input honestly. The hash (as a random oracle) also prevents anyone else without a candidate’s passphrase from determining that candidate’s provided preferences.
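The per-candidate commitment could be sketched as follows. Every name here is our own, and std's DefaultHasher stands in for a cryptographic hash (such as SHA-256) purely so the sketch is self-contained; a real deployment must use a collision-resistant hash to get the random-oracle-style hiding described above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash each candidate's preference row, salted with their secret
/// passphrase, producing one public commitment per candidate.
/// NOTE: DefaultHasher is NOT cryptographic; it is a stand-in here
/// so the sketch runs with the standard library alone.
fn preference_commitments(candidate_prefs: &[Vec<usize>], passphrases: &[String]) -> Vec<u64> {
    candidate_prefs
        .iter()
        .zip(passphrases)
        .map(|(row, pass)| {
            let mut h = DefaultHasher::new();
            pass.hash(&mut h); // salt with the candidate's passphrase first
            row.hash(&mut h);  // then absorb the preference row itself
            h.finish()
        })
        .collect()
}
```

The guest would append these commitments to the public output; a candidate who knows their own passphrase can recompute their entry and confirm their preferences were input honestly, while everyone else learns nothing from the salted hash.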
Although in this case we focus on stable matching, this design of using the zkVM for transparency and accountability applies to any form of algorithmic decision-making, including decisions produced by more complex decision models such as those created through machine learning.