Deep Learning for Training AI Algorithms Using Images of Young Subjects

Massive Artificial Intelligence Database LAION

The massive artificial intelligence database LAION contains over 3,200 images suspected of depicting child sexual abuse. The database has been used to train leading AI image generators such as Stable Diffusion. Working with the Canadian Centre for Child Protection and other anti-abuse organizations, the Stanford Internet Observatory, a university watchdog group, detected the illegal material and reported the links to the original photos to the authorities. It said that roughly 1,000 of the images it found had been independently verified.

Malicious Content Creation

The reaction was swift. Ahead of the release of the Stanford Internet Observatory's study, LAION told The Associated Press that it was temporarily withdrawing its datasets. In a statement, the nonprofit, whose name stands for Large-scale Artificial Intelligence Open Network, said it "has a zero-tolerance policy for illegal content, and in an abundance of caution, we have taken down the LAION datasets to ensure they are safe before republishing them." A prominent LAION user that helped shape the dataset's development is Stability AI, the London-based maker of the Stable Diffusion text-to-image models. While recent versions of Stable Diffusion make it significantly harder to generate malicious content, an earlier version from last year, which Stability AI denies releasing, is still embedded in various applications and tools and remains, according to the Stanford report, "the most popular model for generating explicit imagery."

The Stanford Internet Observatory's Recommendations

Because the data is difficult to clean up retroactively, the Stanford Internet Observatory has called for more drastic measures. One option, for anyone who has built training sets on LAION-5B, which is named for the more than 5 billion image-text pairs it contains, is to "delete them or work with intermediaries to clean the material." Another is to make the earlier version of Stable Diffusion effectively disappear from all but the most obscure corners of the internet. The report's author, David Thiel, criticized CivitAI, a platform popular with creators of AI-generated pornography, for what he perceived as a lack of safeguards against generating images of minors. The study also urges Hugging Face, an AI startup that distributes training data for models, to improve its reporting and removal processes.

Citing the protections of the federal Children's Online Privacy Protection Act, the Stanford paper also questions whether any images of children, no matter how innocent, should be fed into AI systems without their families' permission. To identify and remove child abuse content, tech corporations and child protection organizations already assign videos and photographs a "hash," a unique digital fingerprint. According to Portnoff, the same concept could be applied to AI models that are being abused.
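To illustrate the hash-matching idea in its simplest form, the sketch below computes a cryptographic fingerprint of a file's bytes and checks it against a blocklist of known hashes. This is a minimal illustration only: the `KNOWN_HASHES` set and function names are hypothetical, and real systems (e.g., PhotoDNA or PDQ) use perceptual hashes that survive resizing and re-encoding, whereas a cryptographic hash like SHA-256 matches exact bytes only.

```python
import hashlib

# Hypothetical blocklist of fingerprints of known abusive files,
# of the kind a clearinghouse might distribute. The entry below is
# simply the SHA-256 digest of the bytes b"test", for demonstration.
KNOWN_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def file_fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()

def is_known_match(data: bytes) -> bool:
    """Flag content whose exact bytes match a blocklisted hash."""
    return file_fingerprint(data) in KNOWN_HASHES

print(is_known_match(b"test"))   # matches the blocklist entry
print(is_known_match(b"other"))  # no match
```

The same fingerprint-and-compare pattern is what underlies proposals to flag known-bad AI model weights: hash the artifact once, then check uploads against the registry rather than inspecting content each time.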