The faces of African American women were falsely identified more often in the kinds of searches used by police investigators, potentially increasing their chances of being falsely accused or arrested for a crime.
Algorithms developed in the U.S. also showed high error rates for “one-to-one” searches of Asians, African Americans, Native Americans and Pacific Islanders. Such searches are critical to functions including cellphone sign-ons and airport boarding schemes, and errors could make it easier for impostors to gain access to those systems.
Women were more likely to be falsely identified than men, and the elderly and children were more likely to be misidentified than other age groups, the study found. Middle-aged white men generally benefited from the highest accuracy rates.
The National Institute of Standards and Technology, the federal laboratory known as NIST that develops standards for new technology, found “empirical evidence” that most of the facial-recognition algorithms exhibit “demographic differentials” that can worsen their accuracy based on a person’s age, gender or race.
The study could fundamentally shake one of American law enforcement’s fastest-growing tools for identifying criminal suspects and witnesses, which privacy advocates have argued is ushering in a dangerous new wave of government surveillance tools.
The FBI alone has logged more than 390,000 facial-recognition searches of state driver’s license records and other federal and local databases since 2011, federal records show. Members of Congress this year have voiced anger over the technology’s lack of regulation and its potential for discrimination and abuse.
The federal report confirms previous studies from researchers who found similarly staggering error rates. Companies such as Amazon had criticized those studies, saying they reviewed outdated algorithms or used the systems improperly.
One of those researchers, Joy Buolamwini, said the study was a “comprehensive rebuttal” to skeptics of what researchers call “algorithmic bias.”
“Differential performance with a factor of up to 100?!?” she told The Washington Post in an email Thursday. The study, she added, is “a sobering reminder that facial recognition technology has consequential technical limitations alongside posing threats to civil rights and liberties.”
Investigators said they did not know what caused the gap but hoped the findings would, as NIST computer scientist Patrick Grother said in a statement, prove “valuable to policymakers, developers and end users in thinking about the limitations and appropriate use of these algorithms.”
NIST’s test examined most of the industry’s leading systems, including 189 algorithms voluntarily submitted by 99 companies, academic institutions and other developers. The algorithms form the central building blocks for most of the facial-recognition systems around the world.
The algorithms came from a range of major tech companies and surveillance contractors, including Idemia, Intel, Microsoft, Panasonic, SenseTime and Vigilant Solutions. Notably absent from the list was Amazon, which develops its own software, Rekognition, for sale to local police and federal investigators to help track down suspects.
NIST said Amazon did not submit its algorithm for testing. The company did not immediately offer comment but has said previously that its cloud-based service cannot be easily examined by NIST’s test. Amazon founder and chief executive Jeff Bezos also owns The Washington Post.
Grother, the NIST lead researcher, said other companies with cloud-based systems had been able to submit their algorithms, including Microsoft, which he said “sent us very capable and very reliable software.” Of Amazon, he added: “Our test remains open if they elect to participate.”
The NIST team tested the systems with roughly 18 million photos of more than 8 million people, all of which came from databases run by the State Department, the Department of Homeland Security and the FBI. No photos were taken from social media, video surveillance or the open Internet, they said.
The test studied how algorithms perform on both “one-to-one” matching, used for unlocking a phone or verifying a passport, and “one-to-many” matching, used by police to scan for a suspect’s face across a vast set of driver’s license photos. Investigators measured both false negatives, in which the system fails to recognize that two images show the same person, and false positives, in which the system identifies two different faces as being the same. The latter is a dangerous failure for police, who could end up arresting an innocent person.
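To make the two error types concrete, here is a minimal sketch of how false-positive and false-negative rates could be tallied separately for each demographic group in a one-to-one test. It is an illustration only, not NIST's methodology: the function name, the group labels, the similarity scores and the 0.7 match threshold are all made up for the example.

```python
from collections import defaultdict


def error_rates_by_group(comparisons, threshold):
    """Tally one-to-one matching errors per group.

    comparisons: iterable of (group, same_person, score) tuples, where
    same_person is True when both photos show the same individual and
    score is a hypothetical similarity score from a face matcher.
    """
    counts = defaultdict(lambda: {"fp": 0, "impostor": 0, "fn": 0, "genuine": 0})
    for group, same_person, score in comparisons:
        matched = score >= threshold  # the system declares a match
        c = counts[group]
        if same_person:
            c["genuine"] += 1
            if not matched:
                c["fn"] += 1  # false negative: same person, no match
        else:
            c["impostor"] += 1
            if matched:
                c["fp"] += 1  # false positive: different people, match
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "false_positive_rate": c["fp"] / c["impostor"] if c["impostor"] else None,
            "false_negative_rate": c["fn"] / c["genuine"] if c["genuine"] else None,
        }
    return rates


# Invented scores for two hypothetical groups; not real test data.
SAMPLE = [
    ("group_a", True, 0.91),
    ("group_a", True, 0.55),
    ("group_a", False, 0.20),
    ("group_a", False, 0.30),
    ("group_b", True, 0.88),
    ("group_b", True, 0.79),
    ("group_b", False, 0.72),
    ("group_b", False, 0.10),
]

rates = error_rates_by_group(SAMPLE, threshold=0.7)
for group, r in sorted(rates.items()):
    print(group, r)
```

Comparing the per-group rates at a single fixed threshold is one simple way to expose the kind of gap the study describes. A real evaluation would involve far more comparisons and a carefully chosen threshold.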
Some algorithms produced few errors, but the disparity in accuracy between different systems could be enormous. There is no national regulation or standard for facial-recognition algorithms, and local law-enforcement agencies rely on a wide range of contractors and systems with different capabilities and levels of accuracy. The algorithms themselves (with names such as “anyvision-004” and “didiglobalface-001”) are almost entirely unknown to anyone outside the industry.
Algorithms developed in Asian countries had smaller differences in error rates between white and Asian faces, suggesting a relationship “between an algorithm’s performance and the data used to train it,” the researchers said.
“You need to know your algorithm, know your data and know your use case,” said Craig Watson, a manager at NIST. “Because that matters.”