NCRN Virtual Seminar - Blocking by Locality Sensitive Hashing for Entity Resolution

Speaker: Rebecca Steorts (Carnegie Mellon University)

Title: Blocking by Locality Sensitive Hashing for Entity Resolution

Abstract: Record linkage for monitoring ongoing human-rights abuses faces unique challenges: it must handle extremely noisy data, it must process large volumes of data rapidly, and it must compare names with high accuracy. We propose blocking schemes based on novel locality sensitive hashing techniques that can be integrated with any entity resolution (record linkage) procedure. We evaluate our methods via an application to the Syrian civil war. Using a database of over 250,000 death records, we partition these individuals into blocks and assess the recall of our method using over 11,000 record pairs labeled as matches and non-matches by the Human Rights Data Analysis Group. We discuss challenges of this work that are specific to this application involving Arabic names.
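
For readers unfamiliar with the general idea, the following is a minimal sketch of blocking by locality sensitive hashing and of measuring blocking recall against labeled pairs. It is not the speaker's method: it substitutes standard MinHash with banding over character bigrams for the novel hashing techniques mentioned in the abstract, and every function name, parameter, and record in it is a hypothetical chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(name, k=2):
    """Return the set of character k-grams of a lowercased name."""
    s = name.lower().strip()
    return {s[i:i + k] for i in range(max(len(s) - k + 1, 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_blocks(records, bands=4, rows=2):
    """Group record ids into blocks keyed by (band index, band signature)."""
    blocks = defaultdict(set)
    for rec_id, name in records.items():
        sig = minhash_signature(shingles(name), bands * rows)
        for b in range(bands):
            blocks[(b, sig[b * rows:(b + 1) * rows])].add(rec_id)
    return blocks


def candidate_pairs(blocks):
    """All unordered record pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def blocking_recall(pairs, labeled_matches):
    """Fraction of labeled matching pairs that survive blocking."""
    matches = {tuple(sorted(p)) for p in labeled_matches}
    return len(matches & pairs) / len(matches) if matches else 0.0


# Hypothetical toy records (id -> name); not drawn from any real database.
records = {1: "Ahmad Khalil", 2: "Ahmad Khalil", 3: "Zzzz Qqqq"}
pairs = candidate_pairs(lsh_blocks(records))
print(blocking_recall(pairs, [(2, 1)]))
```

In this sketch, more bands make it more likely that two similar names share a block, which tends to raise recall at the cost of larger candidate sets, while more rows per band have the opposite effect.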

º Carnegie Mellon: contact William Eddy
º Census Bureau headquarters: Room TBD, contact Dan Weinberg
º Cornell University, Ithaca campus: Ives 381
º Duke University: contact Alan Karr
º University of Missouri: contact Scott Holan
º University of Nebraska-Lincoln: Room TBD, contact Allan McCutcheon
º Northwestern University: contact Zach Seeskin

  • Live video conference. To participate by video conference, please contact Lars Vilhuber by Monday, March 3, 2014.
Mar 05, 2014, 3:00pm to 4:30pm EST
