Overview
- The dataset contains 10,318 images of 1,981 participants from 81 countries, with stronger representation from Africa, Asia and Oceania.
- Participants gave informed consent, were compensated and can opt out at any time; the terms of use bar law-enforcement, military, arms and surveillance applications.
- Contributors supplied their own demographic labels, such as age, ancestry, location and pronouns, reducing reliance on error-prone algorithmic inference.
- FHIBE (the Fair Human-Centric Image Benchmark) is intended to benchmark the accuracy of computer-vision and generative-image systems rather than to serve as a large training corpus.
- Nature reports that collecting the data cost less than US$1 million and says the example could inform regulators and ongoing disputes over web-scraped data.