Photos of children from Brazil have been used to train AI models without permission, a report has found. In some cases, photos spanning a child's entire childhood were used without their knowledge.
Human Rights Watch (HRW) reported that the data had been used to train AI image generators such as Stable Diffusion, which could cause harm if the children can be recognised and traced.
An HRW researcher, Hye Jung Han, found links to photos of 170 children from more than ten Brazilian states in a sample of the dataset. Most of the images came from family photos uploaded to personal and parenting blogs. Others were stills taken from YouTube videos with low view counts, which were presumably meant to be shared only with family and friends.
Although the links to these images have since been removed from the dataset, Han told Wired that this may not fully resolve the problem. HRW's report warned that the removed links are "likely to be a significant undercount of the total amount of children's personal data that exists," and Han fears that the dataset may still reference personal photos of kids "from all over the world."
According to HRW, many of the children were easy to trace because their names and locations appeared in the image captions included in the dataset.
For many parents, this will raise alarm bells: photos of their children that were meant to be seen by only a select few family members and friends may instead be training AI models. It is a huge violation of privacy, and it will certainly make me think twice about posting photographs of my son online.
[via Ars Technica]