July 2, 2024
https://www.hrw.org/news/2024/07/02/australia-childrens-personal-photos-...
(Sydney) – Personal photos of Australian children are being used to create powerful artificial intelligence (AI) tools without the knowledge or consent of the children or their families, Human Rights Watch said today. These photos are scraped off the web into a large data set that companies then use to train their AI tools. In turn, others use these tools to create malicious deepfakes that put even more children at risk of exploitation and harm.
“Children should not have to live in fear that their photos might be stolen and weaponized against them,” said Hye Jung Han, children’s rights and technology researcher and advocate at Human Rights Watch. “The Australian government should urgently adopt laws to protect children’s data from AI-fueled misuse.”
Analysis by Human Rights Watch found that LAION-5B, a data set used to train popular AI tools and built by scraping most of the internet, contains links to identifiable photos of Australian children. Some children’s names are listed in the accompanying caption or the URL where the image is stored. In many cases, their identities are easily traceable, along with information on when and where the child was at the time the photo was taken.
One such photo features two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural. The accompanying caption reveals both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia. Information about these children does not appear to exist anywhere else on the internet.
Human Rights Watch found 190 photos of children from all of Australia’s states and territories. This is likely to be a significant undercount of the amount of children’s personal data in LAION-5B, as Human Rights Watch reviewed fewer than 0.0001 percent of the 5.85 billion images and captions contained in the data set.
The photos Human Rights Watch reviewed span the entirety of childhood. They capture intimate moments of babies being born into the gloved hands of doctors and still connected to their mother through their umbilical cord; young children blowing bubbles or playing instruments in preschools; children dressed as their favorite characters for Book Week; and girls in swimsuits at their school swimming carnival.
The photos also capture First Nations children, including those identified in captions as being from the Anangu, Arrernte, Pitjantjatjara, Pintupi, Tiwi, and Warlpiri peoples. These photos include toddlers dancing to a song in their Indigenous language; a girl proudly holding a sand goanna lizard by its tail; and three young boys with traditional body paint and their arms around each other.
Many of these photos were originally seen by few people and previously had a measure of privacy. They do not appear to be findable through an online search. Some photos were posted by children or their family on personal blogs and photo- and video-sharing sites. Other photos were uploaded by schools, or by photographers hired by families to capture personal moments and portraits. Some of these photos cannot be found on the publicly accessible versions of these websites. Some were uploaded years or even a decade before LAION-5B was created.
Human Rights Watch found that LAION-5B also contained photos from sources that had taken steps to protect children’s privacy. One such photo is a close-up of two boys making funny faces, captured from a video posted on YouTube of teenagers celebrating Schoolies week after their final exams. The video’s creator took precautions to protect the privacy of those featured in the video: Its privacy settings are set to “unlisted,” and the video does not show up in YouTube’s search results.
YouTube’s terms of service prohibit scraping or harvesting information that might identify a person, including images of their faces, except under certain circumstances; this instance appears to violate these policies. YouTube did not respond to our request for comment.
Once their data is swept up and fed into AI systems, these children face further threats to their privacy due to flaws in the technology. AI models, including those trained on LAION-5B, are notorious for leaking private information; they can reproduce identical copies of the material they were trained on, including medical records and photos of real people. Guardrails set by some companies to prevent the leakage of sensitive data have been repeatedly broken.
Moreover, current AI models cannot forget data they were trained on, even if the data was later removed from the training data set. This perpetuity risks harming Indigenous Australians in particular, as many First Nations peoples restrict the reproduction of photos of deceased people during periods of mourning.
These privacy risks pave the way for further harm, Human Rights Watch said. Training on photos of real children enables AI models to create convincing clones of any child, based on a handful of photos or even a single image. Malicious actors have used LAION-trained AI tools to generate explicit imagery of children using innocuous photos, as well as explicit imagery of child survivors whose images of sexual abuse were scraped into LAION-5B.
Likewise, the presence of Australian children in LAION-5B contributes to the ability of AI models trained on this data set to produce realistic imagery of Australian children. This substantially amplifies the existing risk children face that someone will steal their likeness from photos or videos of themselves posted online and use AI to manipulate them into saying or doing things that they never said or did.
In June 2024, about 50 girls from Melbourne reported that photos from their social media profiles were taken and manipulated using AI to create sexually explicit deepfakes of them, which were then circulated online.
Fabricated media have always existed, but they previously required time, resources, and expertise to create, and were largely unrealistic. Current AI tools create lifelike outputs in seconds, are often free, and are easy to use, risking the proliferation of nonconsensual deepfakes that could recirculate online forever and inflict lasting harm.
LAION, the German nonprofit organization that manages LAION-5B, confirmed on June 1 that the data set contained the children’s personal photos found by Human Rights Watch, and pledged to remove them. It disputed that AI models trained on LAION-5B could reproduce personal data verbatim. LAION also said that children and their guardians were responsible for removing children’s personal photos from the internet, which it contended was the most effective protection against misuse.
Mark Dreyfus, Australia’s attorney general, recently introduced a bill in parliament banning the nonconsensual creation or sharing of sexually explicit deepfakes of adults, noting that such imagery of children would continue to be treated as child abuse material under the Criminal Code. However, Human Rights Watch said that this approach misses the deeper problem that children’s personal data remains unprotected from misuse, including the nonconsensual manipulation of real children’s likenesses into any kind of deepfake.
In August, Australia’s government is set to introduce reforms to the Privacy Act, including drafting Australia’s first child data protection law, known as the Children’s Online Privacy Code. This Code should protect the best interests of the child, as recognized in the United Nations Convention on the Rights of the Child, and their full range of rights in the collection, processing, use, and retention of children’s personal data, Human Rights Watch said.
The Children’s Online Privacy Code should prohibit scraping children’s personal data into AI systems. It should also prohibit the nonconsensual digital replication or manipulation of children’s likenesses. And it should provide children who experience harm with mechanisms to seek meaningful justice and remedy.
Australia’s government should also ensure that any proposed AI regulations incorporate data privacy protections for everyone, and especially for children.
“Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable,” Han said. “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”