I’m working on code to resolve committee members, to improve the organization records in openstates/people. I need to convert people’s names to their OpenStates OCD IDs. However, I know that in many cases, names are not written the same way in the committee memberships as they are for the people themselves. Anyone who’s worked on scrapers is familiar with the problem, in this context, or in recording roll call votes, or other contexts.
What is current best practice for fuzzy matching of legislator names? I’m hoping there’s a piece of code somewhere that, given a name “Doe” and a list of legislator names, returns one of:
- it’s definitely Jane Doe
- it’s probably Jane Doe
- it’s probably missing from the list
- it’s ambiguous; there are legislators Jane Doe and John Doe, and I can’t tell which.
If anyone’s intrigued from a research perspective, a Google search for
“edit distance” “proper name”
yields some papers.