AI Dataset Re-Launched After Removal of Child Sexual Abuse Material

LAION collaborates with safety organizations to produce the cleaned Re-LAION-5B dataset, setting new safety standards.

Overview

The Re-LAION-5B dataset is a cleaned version of LAION-5B with 2,236 links to suspected CSAM removed.
Partnerships with the Internet Watch Foundation and the Canadian Centre for Child Protection were crucial to the cleanup.
The dataset's removal and re-release follow the Stanford Internet Observatory's finding of CSAM in the original dataset.
Experts urge further improvements and regulation to protect against harmful AI-generated content.
LAION recommends that research labs transition to the cleaned dataset to ensure safer AI training.