Particle.news

Sony Releases FHIBE, a Consent-Built Image Dataset to Benchmark AI

Nature highlights the project as proof that a diverse, responsibly sourced image dataset can be created at modest cost.

Overview

  • The dataset contains 10,318 images of 1,981 participants from 81 countries, with stronger representation from Africa, Asia and Oceania.
  • Participants gave informed consent, were compensated and can opt out at any time; the terms of use bar law-enforcement, military, arms and surveillance applications.
  • Contributors self-reported demographic labels such as age, ancestry, location and pronouns, so the labels do not depend on error-prone algorithmic inference.
  • FHIBE is intended to benchmark the accuracy of computer-vision and generative-image systems rather than serve as a large training corpus.
  • Nature notes the collection cost was under US$1 million and says the example could inform regulators and ongoing disputes over web-scraped data.