Stable Marriage in Q/KDB

A friend of mine is applying to Medical Residency programs, which motivated me to investigate the matching algorithm used to match residents with hospitals. NRMP the organization behind the matching process has a dedicated video explaining how the algorithm works. The simplest version of the problem can be thought to consist of n guys and n gals who are looking to pair off. Each gal has a ranked list of guys and each guy a ranked list of gals. We want to find a pairing such that no guy and gal can pair off and be better off. We call this stable, since any gal who wants to leave can’t find a guy who isn’t currently happier with someone else. It takes two to tango, or in this case disrupt a stable pairing.

Gale-Shapley published a result on this problem in 1962 showed a few amazing properties:

  1. A stable pairing always exists
  2. There is an algorithm to get to this pairing in O(n^2)
  3. It is truthful from the proposing side, i.e. if men are proposing then they can’t misrepresent their preferences and achieve a better match
  4. The proposing side gets the best stable solution and reviewing side gets the worst stable solution
  5. The reviewing side could misrepresent their preferences and achieve a better outcome

Knuth wrote about this problem pretty extensively in a French manuscript. The University of Glasgow has become a hub for this problem, Professor Emeritus Robert Irving wrote the book on the topic and his student David Manlove has written another. New variations and approaches keep coming out, so in a way this problem is still an active area of research. For this post I want to look at only the simplest version and describe a vectorized implementation.

First let’s walk through an example by hand (note: lowercase letters are men)

Men ranking women

d| `D`B`C
b| `B`C`D
c| `C`D`B

Women ranking men

D| `b`c`d
B| `b`c`d
C| `d`b`c

First we pair off d,D
Then we pair off b,B
Finally we pair c,C

If we had started from the women:

First we pair D,b
Then we pair B,b and drop D to be unpaired, because b prefers B to D
Then we pair D,c
Finally we pair C,D

So we see there are two stable pairings. B always pairs with b, but either the men (d,c) or the women (D,C) will get their worst choice.

The algorithm:

Propose a pairing and engage the pair:

  • If the reviewing party (woman) prefers the current proposing party to her current engagement:
    1. Jilt the current party (A jilted party goes back into the pile of unengaged proposing parties, with all the parties he has proposed to so far crossed off)
    2. Become engaged to the proposing party (A reviewing party always prefers any partner to being alone)
  • If the current proposal did not work:
    1. Continue down the ranked list of the proposing party until they find a match, i.e. go back to ste

Here is how we can implement the naive version of this algorithm in Q/KDB:

Here is how we can implement the naive version of this algorithm in Q/KDB:
i:3 /number of each party
c:0N /number of ranked individuals on each list 
     /(null takes a random permutation, convenient since we want everyone to rank everyone) we can also use neg[i]
m:neg[i]?`1 /choose i random symbols with 1 char to represent each possible man
w:upper m /assign same symbols but upper case as list of all possible women
dm:m!(#[i;c]?\:w) /create a dictionary of male preferences
       / each man chooses a random permutation of women
dw:w!#[i;c]?\:m /each women chooses a random permutation of men
t:1!([]w:w;m:`;s:0W) /our initial pairing table, each women is unpaired, 
	          /so her current score is infinity, 
	          /the table is keyed on w, so that a women can only have 1 match
ifin:{$[count[x]>c:x?y;c;0W]} 
 /ifin returns the position in the list like find(?), 
 /but returns infinity if not found instead of length of list
 /example 
 /ifin[`a`b`c;`b] --> 1 
 /ifin[`a`b`c;`d] --> 0W 
The algorithm
f:{[t]
 if[null cm:first key[dm] except (0!t)`m;:t];
 im:first where t[([]w:k);`s]>s:dw[k:dm cm]iifin\: cm;
 t upsert (k im;cm;s im)};

f takes a pairing table of the current pairings. It then looks for the first unmatched man. We call this (c)urrent (m)an, if current man is null, we are done all men are matched, return t, our pairings.

Otherwise, find the first woman on his list of ordered rankings whose happiness (s)core he can improve. We are using iifin to see where his rankings score from the women’s side, and by choosing the first such match, we are preferencing his choice. We then insert this pair into the pairing (t)able, along with the happiness (s)core.
f over t /repeat the process of adding pairs into the table until all men are matched.

Before we look at the vectorized (batched) version, there are some simple optimizations we can make. For example, since we notice that we will be calculating how much a woman likes a man back when he chooses her, we can pre-tabulate all pairs in advance.

Here is the code that creates a table that will have every man and woman pairing with both his and her score next to it:

/first convert the dictionary to a table that has each man and his rankings
/along with a column which ranks the match, 
tm:{([]m:key x;w:value x;rm:(til count ::)each value x)} dm
/looks like 
m w     rm   
-------------
m E M P 0 1 2
e P M E 0 1 2
p M E P 0 1 2
/then ungroup looks like this:
tm:ungroup tm 

m w rm
------
m E 0 
m M 1 
m P 2 
e P 0 
e M 1 
e E 2 
p M 0 
p E 1 
p P 2 

repeat the procedure with women's table. 

tw:ungroup {([]w:key x; m:value x;rw:(til count ::) each value x)} dw

/now left join each match 
ps:tm lj `w`m xkey tw
m w rm rw
---------
m E 0  2 
m M 1  0 
m P 2  1 
e P 0  2 
e M 1  2 
e E 2  1 
p M 0  1 
p E 1  0 
p P 2  0 

Now that we have pre-computed all the scores we can look at the vectorized version:

First, we notice that we can technically make proposals from all the men in one batch. There is one caveat, if two or more men rank one woman as their top choice during a matching batch the tie is broken by the woman. So we can grab each man’s current first choice and break all the ties with that woman’s ranking.

Second, we notice that any woman who is currently engaged will stay engaged unless a man she desires more becomes available. This means we can eliminate all women who are already engaged to someone they prefer more from each man’s ranking. In other words, if Alice is currently engaged to Bob, and she prefers only Carl to Bob, Sam doesn’t stand a chance and we can eliminate Alice from Sam’s ranking completely.

These two tricks are all it takes to vectorize(batch) the solution.

Now let’s look at the code:

Here is the algorithim in one block:

imin:{x?min x} /get the index of min element
smVec:{[ps;t]ms:exec last s by w from t;
 update check:(check+1)|min rm by m from ps where not m in (0!t)`m,rw<ms[w];
 p:select m:m imin rw, s:min rw by w from ps where rm=check;t upsert p}

Let’s break it down statement by statement:

smVec:{[ps;t]
/
/ smVec takes a data table which are our (p)recomputed (s)tatistics 
/ it also takes the pairs table t, which is the same as above
/
 ms:exec last s by w from t;
/
/ms is our current best score for each woman who is currently engaged
/
 update check:(check+1)|min rm by m from ps where not m in (0!t)`m,rw<ms[w];
/ For each man we keep a counter of who we should check next, 
/ we increment this each time he is not yet engaged (not m in (0!t)`m) 
/ so we look at his next viable candidate, 
/ also we remove any candidates that he cannot satisfy (rw<ms[w]). 
/ He needs a woman who prefers him to her current match.
/ So we take the max of either the next candidate 
/ or the next one whose score would go up if he matched.
 p:select m:m imin rw, s:min rw by w from ps where rm=check;
/ now we simply select all the ones to check and break ties by min rw,
/ ie the woman's rank.
 t upsert p}
/lastly update/insert current matches into the pairings table

We need a stopping condition so that we can stop iterating:

Originally, I thought we need a condition that said something like “as long as there are unmatched men, keep iterating. {not all key[dm] in (0!x)`m}”

pairings:smVec[update check:-1 from ps]/[{not all key[dm] in (0!x)m};t]

As it turns out, once we remove all the clutter and always append only viable matches this turns out to be unnecessary. We can simply iterate until convergence which is the default behavior of over(/). So running the function looks like this:

pairings:smVec[update check:-1 from `ps] over t

We are done, but we can extend this one step further by allowing for incomplete preference rankings. The canonical way to treat incomplete ranking is to assume that a person would rather be a widower than be married to someone he didn’t rank.


To do this, we simply add the person matching with himself at the end of every ranking and add a shadow woman which is just the man ranking himself first. This means that once we get to the end of his ranked list he will automatically match with himself. With this small addition to the data, the complete setup function looks like this:

setup:{[i;c]c:neg c;m:neg[i]?`4;w:upper m;
 dm:m!(#[i;c]?\:w),'m;dw:w!#[i;c]?\:m;
 dw,:(m!0N 1#m);t:1!([]w:w,m;m:`;s:0W);
 tm:{([]m:key x;w:value x;rm:(til count ::)each value x)} dm;
 tw:{([]w:key x; m:value x;rw:(til count ::) each value x)} dw;
 ps:ej[`m`w;ungroup tm;ungroup tw];
 `dm`dw`t`ps set' (dm;dw;t;ps)}
setup[1000;0N] /ON means rank every member. 
pairings:smVec[update check:-1 from `ps] over t

Now we can verify that we have the right solution by comparing to the official python implementation

/write out some test data for python
setup[1000;0N]
pairings:smVec[update check:-1 from `ps] over t
`:m.csv 0: "," 0: ungroup {([]m:key x;w:-1 _' value x)}dm
`:w.csv 0: "," 0: ungroup {([]w:key x;m:value x)}count[dm]#dw

/python solve: (we need to increase the recursion limit in python because the library can't solve large problems without it, also I wasn't able to run problems bigger than 2000 on my machine without a segfault)
/
import pandas as pd
from matching.games import StableMarriage
import sys
sys.setrecursionlimit(50000)  
m=pd.read_csv("m.csv") 
w=pd.read_csv("w.csv") 
md=m.groupby("m")['w'].apply(list).to_dict()  
wd=w.groupby("w")['m'].apply(list).to_dict()  
game=StableMarriage.create_from_dictionaries(md,wd)
s=game.solve()
pd.DataFrame([(k, v) for k, v in s.items()], columns=["m","w"]).to_csv("mwsol.csv", index=False)

\
pysol:("SS";enlist ",") 0: `mwsol.csv
pysol:pysol lj update mm:m from pairings
0=count select from pysol where mm<>m

If we join on the women and our men match the python men, then we have the same answer.

Lastly, a note on performance: we get a 3x performance boost from vectorization
The naive version solves the
\t pairs:f over t
5000 -> 185771 (3 min)
2000 -> 15352 (15 sec)
1000 -> 2569 (2.5 sec)
\t smVec[update check:-1 from `ps] over t
5000 -> 59580 (1 min)
2000 -> 4748 (5 sec)
1000 -> 1033 (1 sec)

The python version takes 20 seconds when n is 1000 and I couldn’t test higher.
Which is about 7x slower than naive in q, and 20x slower than vectorized.

Further the python library doesn’t allow for incomplete preference lists. (we can create complete lists that have sentinels values where the list ends, but the memory usage is prohibitively expensive for large problems). Lastly, the python library runs into the recursion limit, which we avoided by iterating over a state data structure instead of using the stack.

Meanwhile in practice there are about 40k med students a year and each only ranks less than 100 choices. Which means we can solve that problem really quickly.
setup[100000;100]
\t pairings:smVec[update check:-1 from `ps] over t
1465 (1.5 seconds)

In the next post, I will describe how to extend this solution to apply the stable marriage problem to resident hospital matching problem where each hospital may have more than 1 slot.