National Incident-based Reporting System (NIBRS)
Datasheet

Motivation

For what purpose was the dataset created?

NIBRS was created to improve the overall quality of crime data collected by law enforcement. It aims to provide useful statistics to promote constructive discussion, measured planning, and informed policing, giving context to specific crime problems such as drug/narcotics and sex offenses, as well as issues like animal cruelty, identity theft, and computer hacking. It intends to provide a nationwide view of crime based on the submission of crime information by law enforcement agencies throughout the country, offering law enforcement and the academic community more comprehensive data than ever before available for management, training, planning, and research.

Who created the dataset?
Is it an official law enforcement or government body? An academic research team? Other?

NIBRS is collected and managed by the Federal Bureau of Investigation (FBI). Data is submitted by participating agencies.

Was there a specific task in mind, or gap that needed to be filled?

NIBRS is an extensive dataset, collecting information on all Group A police incidents from across the United States. Including:

As such, its potential uses are multifaceted. It was not created with a specific task in mind, but as a national centralized repository of police incident data.

The FBI has been reporting aggregated crime statistics through the Uniform Crime Reporting (UCR) Summary Reporting System (SRS) since 1930. NIBRS aims to improve on UCR by reporting detailed information at the incident level, allowing for more detailed analysis.

Composition

What do the instances that comprise the dataset represent?
For example: crimes, offenders, court cases, police officers

In NIBRS, instances are recorded crime incidents. An incident is defined as a set of offenses committed by one individual or a group of individuals at the same time and place.

Are there multiple types of instances?
For example: offenders, victims, and the relationship between them.

Incidents are the ‘base’ unit of NIBRS. Each incident is linked to an agency and may be linked to one or more offenses, offenders, victims, and properties.

How many instances are there in total?
Of each type, if appropriate.

In 2019, there were just under 7.7 million incidents recorded in NIBRS.

Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?
For example, if it is traffic stops from a territory, is it all traffic stops conducted within that territory within a specific time? If not, is it a representative sample of all stops? Describe how representativeness was validated/verified. If it is not representative, please describe why.

NIBRS contains all incidents recorded by participating agencies. Incidents recorded by non-participating agencies are not included. Additionally, this is not a record of all crime. Only a subset of crimes are ever encountered by police, and only a subset of those are recorded as incidents.

NIBRS contains population coverage information, so it can be determined how representative the recorded incidents are of the jurisdiction in which the agency operates.

What data does each instance consist of?
If there is a large number of variables, please provide a broad description of what is included.

Each instance contains the following information:

Is there a target label associated with each instance?
Please include labels that are likely to be used as target labels, e.g. recidivism.

There is no set target label, though a few of interest may be: whether or not an arrest was made, the type of arrest, and exceptional clearance.

There is no official split. However, some points to consider:

When splitting data into multiple sets, be aware that the data is a single database that has been compiled from many agencies. If one wishes to test a predictive model, it may be reasonable to split along agency lines, assessing performance on unseen agencies.
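A minimal sketch of an agency-level split, assuming incidents have been loaded as a list of dictionaries. The field names `agency_id` and `incident_id` and the helper `split_by_agency` are hypothetical, chosen for illustration, and are not the NIBRS column names.

```python
import random


def split_by_agency(incidents, test_fraction=0.2, seed=0):
    """Split incident records so that no agency appears in both sets.

    `agency_id` is a hypothetical field name used for illustration.
    """
    agencies = sorted({inc["agency_id"] for inc in incidents})
    rng = random.Random(seed)
    rng.shuffle(agencies)
    # Hold out at least one agency for testing.
    n_test = max(1, round(len(agencies) * test_fraction))
    test_agencies = set(agencies[:n_test])
    train = [inc for inc in incidents if inc["agency_id"] not in test_agencies]
    test = [inc for inc in incidents if inc["agency_id"] in test_agencies]
    return train, test


# Toy data: 20 incidents spread evenly over 5 agencies.
incidents = [{"incident_id": i, "agency_id": f"A{i % 5}"} for i in range(20)]
train, test = split_by_agency(incidents)
```

Splitting on the agency rather than on the individual incident keeps all records from a held-out agency out of training, so the test set measures performance on agencies the model has not seen.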

If a temporal model is being used, to predict future offense numbers for example, the above is not applicable. Instead, it would make sense to have the same agencies across each split, with each split containing a different time segment.
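A minimal sketch of such a temporal split, assuming each record carries a date and an agency identifier. The field names `incident_date` and `agency_id`, the helper `split_by_time`, and the cutoff date are hypothetical and only illustrate the idea.

```python
from datetime import date


def split_by_time(incidents, cutoff):
    """Train on incidents before `cutoff` and test on those on or after it,
    keeping only agencies that appear in both periods.

    Field names are hypothetical, for illustration only.
    """
    before = [i for i in incidents if i["incident_date"] < cutoff]
    after = [i for i in incidents if i["incident_date"] >= cutoff]
    shared = {i["agency_id"] for i in before} & {i["agency_id"] for i in after}
    train = [i for i in before if i["agency_id"] in shared]
    test = [i for i in after if i["agency_id"] in shared]
    return train, test


incidents = [
    {"agency_id": "A1", "incident_date": date(2018, 3, 1)},
    {"agency_id": "A1", "incident_date": date(2019, 6, 15)},
    {"agency_id": "A2", "incident_date": date(2018, 7, 4)},   # reports only before the cutoff
    {"agency_id": "A3", "incident_date": date(2019, 1, 20)},  # reports only after the cutoff
]
train, test = split_by_time(incidents, cutoff=date(2019, 1, 1))
```

Dropping agencies that report in only one period is one way to avoid confusing a change in agency participation with a change in offense numbers.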

Does the dataset contain data on race and ethnicity?
If so, is it based on the individual’s self-description, or based on officer’s impression? Was it collected or derived in post-processing? For example, by name analysis.

Yes. In principle, race and ethnicity are entered based on the officer’s impression. In practice, in some instances the individual may be asked about their race or ethnicity. These instances cannot be distinguished. In addition, the ethnicity field is not used by all agencies.

Are there any known errors, sources of noise, bias or missing data, or variables collected for only part of the datasets?
If so, please provide a description.

A number of fields are officer estimates, and thus error-prone: race, ethnicity, value of property, and drug amount.

In addition, value of property and drug amount sometimes appear to be filled with standardized amounts (1, 10, etc.). The policy for filling in those variables may differ between agencies.
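One way to gauge how often an agency uses such standardized amounts is to compute the share of its records whose value falls in a small set of suspected placeholder values. The sketch below assumes records are dictionaries with hypothetical fields `agency_id` and `property_value`. The set `PLACEHOLDER_VALUES` is an assumption that extends the "1, 10, etc." pattern and should be adjusted after inspecting the data.

```python
from collections import defaultdict

# Assumed placeholder values; only 1 and 10 are mentioned in the text above.
PLACEHOLDER_VALUES = {1, 10, 100, 1000}


def placeholder_share_by_agency(records, field="property_value"):
    """Return, per agency, the fraction of non-missing values that look like placeholders."""
    counts = defaultdict(lambda: [0, 0])  # agency -> [placeholder count, total count]
    for rec in records:
        value = rec.get(field)
        if value is None:
            continue  # missing values are ignored here
        tally = counts[rec["agency_id"]]
        tally[1] += 1
        if value in PLACEHOLDER_VALUES:
            tally[0] += 1
    return {agency: hits / total for agency, (hits, total) in counts.items()}


records = [
    {"agency_id": "A1", "property_value": 1},
    {"agency_id": "A1", "property_value": 10},
    {"agency_id": "A1", "property_value": 250},
    {"agency_id": "A1", "property_value": 10},
    {"agency_id": "A2", "property_value": 375},
    {"agency_id": "A2", "property_value": 42},
    {"agency_id": "A2", "property_value": None},
]
shares = placeholder_share_by_agency(records)
```

An agency with a high share is not necessarily misreporting, since some items genuinely cost 10 or 100, but a large gap between agencies may point to different recording policies.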

The data is self-contained.

The data contains records of crimes, some of which are violent. However, descriptions are minimal. Demographic information is recorded on both offender and victim. Additionally, the data identifies whether the offense committed was a hate crime against any marginalised group, including LGBTQ+.

Is it possible to identify individuals (i.e., one or more natural persons), either directly or indirectly (i.e., in combination with other data) from the dataset?
If so, please describe how.

No.

Uses

Has the dataset been used for any tasks already?
If so, please provide a description.

The dataset has been used in many studies. Including, but not limited to:

among many others.

The Inter-university Consortium for Political and Social Research (ICPSR) provides a non-exhaustive repository of publications using NIBRS data at:
https://www.icpsr.umich.edu/web/ICPSR/series/128/publications

What (other) tasks could the dataset be used for?

This dataset can be used for investigating crime where detailed time, location, and offense information is required. It is a highly flexible dataset that can answer many research questions when used correctly.

NIBRS is a collection of incident records, recorded and provided by thousands of police agencies. While NIBRS attempts to enforce standardisation, each agency will have its own idiosyncrasies in recording. Some agencies do not record ethnicity, or use different units for recording drug quantities, among other differences. It is important to control for these differences when performing analysis on NIBRS.
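As a first check for such differences, one could measure how often each agency leaves a field empty and restrict an analysis to agencies that actually use it. The sketch below assumes records are dictionaries. The field names `agency_id` and `ethnicity`, the helper `missing_rate_by_agency`, and the 0.5 threshold are hypothetical choices for illustration.

```python
from collections import Counter


def missing_rate_by_agency(records, field):
    """Return, per agency, the fraction of records where `field` is missing or empty."""
    total = Counter()
    missing = Counter()
    for rec in records:
        agency = rec["agency_id"]
        total[agency] += 1
        if rec.get(field) in (None, ""):
            missing[agency] += 1
    return {agency: missing[agency] / total[agency] for agency in total}


records = [
    {"agency_id": "A1", "ethnicity": "H"},
    {"agency_id": "A1", "ethnicity": "N"},
    {"agency_id": "A1", "ethnicity": None},
    {"agency_id": "A2", "ethnicity": None},
    {"agency_id": "A2", "ethnicity": ""},
]
rates = missing_rate_by_agency(records, "ethnicity")

# Keep agencies that fill in the field for at least half of their records
# (the 0.5 threshold is an arbitrary illustrative choice).
usable_agencies = {agency for agency, rate in rates.items() if rate < 0.5}
```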

Incidents that are related in real life cannot be connected within NIBRS. For example, a crime that occurred at the same time and place, with two offenders who committed the same offense but one of whom committed an additional offense, will be recorded as separate incidents in NIBRS. There is no direct way to connect these, so it is possible to count the same incident multiple times if care is not taken. In addition, there are no unique identifiers for offenders or victims. Two offenses committed by the same offender at different times will not appear connected.
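Although records cannot be linked directly, a rough heuristic is to flag incidents from the same agency that share a date, hour, and location type as candidates for manual review. The sketch below illustrates this under stated assumptions and is not an official linking method. All field names and the helper `candidate_related_incidents` are hypothetical, and unrelated incidents can share these attributes by coincidence.

```python
from collections import defaultdict


def candidate_related_incidents(
    incidents,
    keys=("agency_id", "incident_date", "incident_hour", "location_type"),
):
    """Group incident ids that share all `keys`; return groups with more than one id.

    This is a heuristic: a shared key does not prove the incidents are related.
    """
    groups = defaultdict(list)
    for inc in incidents:
        groups[tuple(inc[k] for k in keys)].append(inc["incident_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


incidents = [
    {"incident_id": 1, "agency_id": "A1", "incident_date": "2019-05-02",
     "incident_hour": 23, "location_type": "bar"},
    {"incident_id": 2, "agency_id": "A1", "incident_date": "2019-05-02",
     "incident_hour": 23, "location_type": "bar"},
    {"incident_id": 3, "agency_id": "A1", "incident_date": "2019-05-03",
     "incident_hour": 9, "location_type": "residence"},
]
candidates = candidate_related_incidents(incidents)
```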

Collection Process

How was the data associated with each instance acquired?
e.g. the data was collected by survey, or the raw data is routinely collected by the courts.

Incident information is collected and updated by each police agency, using its own systems, as the events occur. Once a year, incidents recorded by a participating agency are converted from the agency’s format to the NIBRS format, with help from the state UCR program. This data is then reported to NIBRS.

Was the information self-reported?
If the data was self-reported, was the data validated/verified? If so, please describe how.

No. The data is recorded by police officers. However, some crimes may be recorded on the basis of victims’ reports.

The data is quality controlled and validated twice: once by state UCR programs, and again on receipt by the NIBRS program.

Who was involved in the data collection process?
Was this done as part of their other duties? If not, were they compensated?

Local police agencies. Data is recorded as part of routine police work.

Over what timeframe was the data collected? Does this timeframe match the creation timeframe of the data associated with the instances (e.g., recent crawl of old news articles)?
If not, please describe the timeframe in which the data associated with the instances was created. If the collection was not continuous within the timeframe, please specify the intervals, for example, annually, every 4 years, irregularly.

The data has been collected continuously since 1988. However, the level of agency participation has changed over the years. For some states, data is available from 1998 onwards.

Unknown.

Individuals may have known that data was being recorded. However, consent was not granted, as individuals do not have the option to opt out.

Unknown.

Pre-processing, cleaning, labeling

Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, removal of instances, processing of missing values)?
If so, please provide a description and reference to the documentation. If not, you may skip the remaining questions in this section.

Police agencies will have local records that make up the raw data sent to the NIBRS program, but these cannot be accessed.

No.

Distribution

Is the data publicly available? How and where can it be accessed (e.g., website, GitHub)?
Does the dataset have a digital object identifier (DOI)?

The dataset is available for download on the FBI’s Crime Data Explorer website:
https://crime-data-explorer.fr.cloud.gov/pages/downloads

The dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Maintenance

Is the dataset maintained? Who is supporting/hosting/maintaining the dataset?

The FBI.

How can the owner/curator/manager of the dataset be contacted (e.g., email address)?

The owners can be contacted at: UCR-NIBRS@fbi.gov

Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)? If so, please describe how often, by whom, and how updates will be communicated to dataset consumers (e.g., mailing list, GitHub)?

The dataset is published annually. Occasionally UCR will publish blocks of years, e.g. 2000-2010.

Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)?

New data is released annually.

Will older versions of the dataset continue to be supported/hosted/maintained?

Yes. Data from previous years remains available for download from the FBI’s website.

If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so?
If so, please provide a description.

No.