What will you find in this repository?

  • 15 datasets & datasheets from different stages of the criminal justice system.
  • Each datasheet describes how the dataset was collected, what it contains, its potential uses and limitations, and much more.
  • The index can be used to easily compare the datasets.
  • The paper contains a more in-depth discussion of the use of criminal justice datasets in machine learning, their limitations, and gaps in the data landscape.

Who is this repository for?

  1. Interested: Want to work with criminal justice data?
    • Find an appropriate dataset on the website.
  2. Familiar: Want to grow this repository or share your work?
    • Contribute by adding or editing datasets and datasheets, and link to your work that uses any of the datasets on this website.
  3. Allies: Interested in social and ethical implications of using AI in the criminal justice system?
    • Use the information on this website in your own research, and get in touch so we can help publicize your work.

See the “Intended use” page for further information.

Why create this repository?

  • Predictive tools are widely used by police, courts, and prison systems.
  • A few benchmarks, such as COMPAS, have received significant attention, but often without proper regard for domain context.
  • There was no resource describing the available datasets and their context, nor a place for researchers working with these data to exchange information they had gained independently.

See the paper for detailed discussion.


The information on this website is correct to the best of our knowledge, based on available dataset documentation. We recommend that researchers consult the official documentation when using the datasets. We are not the creators of the datasets, and do not provide endorsement or guarantees.


Please cite the accompanying paper.

  title={A Survey and Datasheet Repository of Publicly Available {US} Criminal Justice Datasets},
  author={Miri Zilka and Bradley Butcher and Adrian Weller},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},