When tech companies developed the facial recognition systems that are rapidly reshaping government surveillance and eroding privacy, they may have received help from an unexpected source: your face.
Corporations, universities, and government laboratories have used millions of images obtained from a variety of online sources to develop the technology. Now researchers have created an online tool called Exposing.AI that allows users to search many of these collections of images for their old photos.
The tool, which matches images from the Flickr online photo-sharing service, provides a glimpse into the vast amounts of data required to build a wide variety of AI technologies, from facial recognition to online chatbots.
“People need to realize that some of their most intimate moments have been weaponized,” said one of its creators, Liz O’Sullivan, technology director for the Surveillance Technology Oversight Project, a privacy and civil rights group. She helped create Exposing.AI with Adam Harvey, a researcher and artist in Berlin.
Artificial intelligence systems don’t magically become intelligent. They learn by finding patterns in human-generated data – photos, voice recordings, books, Wikipedia articles, and all sorts of other material. The technology keeps getting better, but it can also learn human biases against women and minorities.
People may not know that they are contributing to the training of AI. For some, that’s a curiosity. For others, it’s deeply unsettling. And it can be against the law. A 2008 Illinois law, the Biometric Information Privacy Act, imposes financial penalties if the face scans of residents are used without their consent.
In 2006, Brett Gaylor, a documentary filmmaker from Victoria, British Columbia, uploaded his honeymoon photos to Flickr, a service popular at the time. Almost 15 years later, using an early version of Mr. Harvey’s Exposing.AI, he discovered that hundreds of these photos had made their way into multiple data sets that may have been used to train facial recognition systems around the world.
Flickr, which has been bought and sold by many companies over the years and is now owned by the photo-sharing service SmugMug, allowed users to share their photos under what is known as a Creative Commons license. This license, common on internet sites, meant that others could use the photos with certain restrictions, although those restrictions may have been ignored. In 2014, Yahoo, which owned Flickr at the time, used many of these photos in a data set meant to help with work on computer vision.
Mr. Gaylor, 43, wondered how his photos could have jumped from place to place. He was then told that the photos may have contributed to surveillance systems in the US and other countries, and that one of those systems was used to track the Uighur population in China.
“My curiosity turned to horror,” he said.
How honeymoon photos helped build surveillance systems in China is, in some ways, a story of unintended or unexpected consequences.
Years ago, AI researchers at leading universities and technology companies began collecting digital photos from a variety of sources, including photo-sharing services, social networks, dating sites like OkCupid, and even cameras installed on college quads. They shared these photos with other organizations.
That was just the norm for researchers. They all needed data to feed into their new AI systems, so they shared what they had. It was usually legal.
One example was MegaFace, a data set created by professors at the University of Washington in 2015. They built it without the knowledge or consent of the people whose pictures they folded into its huge pool of photos. The professors put it on the internet for others to download.
MegaFace has been downloaded more than 6,000 times by corporations and government agencies around the world, according to a New York Times public records request. These included the US defense contractor Northrop Grumman; In-Q-Tel, the investment arm of the Central Intelligence Agency; ByteDance, the parent company of the Chinese social media app TikTok; and the Chinese surveillance company Megvii.
The researchers built MegaFace for use in an academic competition to advance the development of facial recognition systems. It was not intended for commercial use. But only a small percentage of those who downloaded MegaFace have publicly entered the competition.
“We are unable to discuss third-party projects,” said Victor Balta, a spokesman for the University of Washington. “MegaFace has been taken out of service and MegaFace data is no longer distributed.”
Some of those who downloaded the data have deployed facial recognition systems. Megvii was blacklisted by the US Commerce Department last year after the Chinese government used its technology to monitor the country’s Uighur population.
The University of Washington took MegaFace offline in May, and other organizations have removed other data sets. However, copies of these files could be anywhere, and they are likely to be feeding new research.
Ms. O’Sullivan and Mr. Harvey spent years trying to develop a tool that would expose how all this data was being used. It was more difficult than expected.
They wanted to accept someone’s photo and use facial recognition to instantly tell that person how many times their face appeared in one of these data sets. However, they feared that such a tool could be misused – by stalkers or by corporations and nation states.
“The potential for harm seemed too great,” said Ms. O’Sullivan, who is also vice president of responsible AI at Arthur, a New York company that helps companies control the behavior of AI technologies.
In the end, they had to limit how users could search the tool and what results it produced. The tool as it works today is not as effective as they would like it to be. But the researchers feared they could not expose the full breadth of the problem without making it worse.
Exposing.AI itself does not use facial recognition. It locates photos only if you already have a way of pointing to them online, for example with an internet address. Users can search only for photos that were posted to Flickr, and they need a Flickr username, tag, or web address that identifies those photos. (According to the researchers, this provides the right level of security and privacy protection.)
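To make the distinction concrete, here is a minimal sketch of what an identifier-based lookup of this kind could look like. It is not Exposing.AI's actual code, which the article does not describe. The index records, the field names, the `ExampleSet` data set, and the convention that a leading `#` marks a tag are all assumptions made for illustration. The only point is that a query is matched against text identifiers and no image is ever analyzed.

```python
from urllib.parse import urlparse

# Hypothetical index: one record per photo known to appear in a data set.
# Field names and sample values are invented for illustration.
INDEX = [
    {"photo_id": "1001", "username": "alice", "tags": {"honeymoon", "beach"},
     "datasets": ["MegaFace"]},
    {"photo_id": "1002", "username": "alice", "tags": {"wedding"},
     "datasets": ["MegaFace", "ExampleSet"]},
    {"photo_id": "2001", "username": "bob", "tags": {"beach"},
     "datasets": ["ExampleSet"]},
]


def photo_id_from_url(url):
    """Return the ID from a URL shaped like flickr.com/photos/<user>/<id>/, else None."""
    parsed = urlparse(url)
    if parsed.netloc not in ("flickr.com", "www.flickr.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 3 and parts[0] == "photos" and parts[2].isdigit():
        return parts[2]
    return None


def search(query):
    """Match a query by web address, tag, or username. No face matching is involved."""
    query = query.strip()
    if query.startswith(("http://", "https://")):
        photo_id = photo_id_from_url(query)
        return [r for r in INDEX if r["photo_id"] == photo_id]
    if query.startswith("#"):  # assumed convention for tag queries
        tag = query[1:].lower()
        return [r for r in INDEX if tag in r["tags"]]
    return [r for r in INDEX if r["username"] == query.lower()]
```

A design like this cannot answer the question "where does this face appear?" for an arbitrary photo, which is the capability the researchers said they chose not to offer.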
While this limits the utility of the tool, it is still an eye-opener. Flickr images make up a significant portion of the facial recognition data sets that have been circulated across the internet, including MegaFace.
It’s not difficult to find photos that a person has some personal connection to. By simply searching old emails for Flickr links, The Times found photos that, according to Exposing.AI, were used in MegaFace and other facial recognition data sets.
Some belonged to Parisa Tabriz, a well-known security researcher at Google. She did not respond to a request for comment.
Mr. Gaylor is particularly concerned about what he discovered through the tool because he once believed that the free flow of information on the Internet was largely positive. He used Flickr because it gave others the right to use his photos under the Creative Commons license.
“I am now living the consequences,” he said.
His hope – and the hope of Ms. O’Sullivan and Mr. Harvey – is that companies and governments will develop new standards, policies, and laws that prevent the bulk collection of personal information. He is making a documentary about the long, winding, and occasionally disturbing journey of his honeymoon photos to shed light on the problem.
Mr. Harvey firmly believes that something has to change. “We have to get rid of these as quickly as possible – before they cause more damage,” he said.