Law Enforcement Management and Administrative Statistics (LEMAS)
Datasheet

Motivation

For what purpose was the dataset created?

Conducted periodically since 1987, the LEMAS survey was designed to collect data on police agencies, including agency responsibilities, operating expenditures, job functions of sworn and civilian employees, officer salaries and special pay, demographic characteristics of officers, weapons and armor policies, education and training requirements, computers and information systems, vehicles, special units, and community policing activities.

Who created the dataset?
Is it an official law enforcement or government body? An academic research team? Other?

The Bureau of Justice Statistics (BJS), a statistical agency of the U.S. Department of Justice. The relevant data expert at BJS is Elizabeth Davis.

Was there a specific task in mind, or gap that needed to be filled?

No extensive nationwide police agency survey existed before LEMAS.

Composition

What do the instances that comprise the dataset represent?
For example: crimes, offenders, court cases, police officers

Each instance of the dataset is a survey response from a different police agency.

Are there multiple types of instances?
For example: offenders, victims, and the relationship between them.

No.

How many instances are there in total?
Of each type, if appropriate.

A total of 3,471 police agencies were sent the survey: 2,612 local police departments, 810 sheriffs’ offices, and 49 state agencies. A total of 2,779 agencies responded, a response rate of 80%. The final dataset includes responses from 2,135 local police departments, 600 sheriffs’ offices, and 49 state agencies.
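The survey totals above can be checked with a few lines of arithmetic. This sketch uses only the figures stated in this answer; the variable names are illustrative.

```python
# Agencies sent the survey, by type (figures from this datasheet).
sent = {"local police": 2_612, "sheriffs' offices": 810, "state agencies": 49}
responded = 2_779  # total agencies that returned the survey

total_sent = sum(sent.values())
response_rate = responded / total_sent

print(f"Agencies surveyed: {total_sent:,}")   # Agencies surveyed: 3,471
print(f"Response rate: {response_rate:.1%}")  # Response rate: 80.1%
```

The computed rate of 80.1% rounds to the 80% reported above.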

Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?
For example, if it is traffic stops from a territory, is it all traffic stops conducted within that territory within a specific time? If not, is it a representative sample of all stops? Describe how representativeness was validated/verified. If it is not representative, please describe why.

The dataset is a sample. All police agencies with 100 or more sworn personnel were included, and smaller agencies were selected by stratified random sampling based on the number of sworn officers and the type of agency. A total of 28 local police departments were determined to be out of scope for the survey because they were special jurisdiction agencies, had closed, had outsourced their operations, or were operating on a part-time basis.
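A design in which large agencies are always included and smaller ones are drawn within strata can be sketched as follows. This is a minimal illustration under stated assumptions, not BJS's actual procedure: the size bands, the per-type sampling fractions, and the synthetic agency frame are all hypothetical. Each sampled agency also receives a design weight equal to the inverse of its inclusion probability, which is standard practice for making estimates from a stratified sample represent the full frame.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

CERTAINTY_THRESHOLD = 100  # agencies with at least this many sworn officers are always included


@dataclass(frozen=True)
class Agency:
    agency_id: int
    agency_type: str  # e.g. "local", "sheriff", "state"
    sworn: int        # number of sworn officers


def size_class(sworn: int) -> str:
    """Hypothetical size bands used to stratify agencies below the threshold."""
    if sworn < 10:
        return "1-9"
    if sworn < 50:
        return "10-49"
    return "50-99"


def draw_sample(frame: list[Agency], fractions: dict[str, float], seed: int = 0):
    """Return (sample, weights), where weights maps agency_id to a design weight.

    Agencies at or above CERTAINTY_THRESHOLD are taken with certainty (weight 1).
    Smaller agencies are grouped into (type, size band) strata, and a simple
    random sample is drawn from each stratum at the hypothetical fraction for
    that agency type (default 1.0, i.e. take all, for types not listed).
    """
    rng = random.Random(seed)
    sample: list[Agency] = []
    weights: dict[int, float] = {}
    strata = defaultdict(list)

    for agency in frame:
        if agency.sworn >= CERTAINTY_THRESHOLD:
            sample.append(agency)
            weights[agency.agency_id] = 1.0
        else:
            strata[(agency.agency_type, size_class(agency.sworn))].append(agency)

    for (agency_type, _band), members in strata.items():
        fraction = fractions.get(agency_type, 1.0)
        n = max(1, round(fraction * len(members)))  # keep at least one per stratum
        for agency in rng.sample(members, n):
            sample.append(agency)
            weights[agency.agency_id] = len(members) / n  # inverse inclusion probability

    return sample, weights


# Synthetic frame for illustration only.
frame = [
    Agency(i, agency_type, sworn)
    for i, (agency_type, sworn) in enumerate(
        [("local", s) for s in (5, 8, 12, 30, 45, 60, 75, 150, 400)]
        + [("sheriff", s) for s in (6, 20, 40, 90, 120)]
        + [("state", 900)]
    )
]
sample, weights = draw_sample(frame, {"local": 0.5, "sheriff": 0.5}, seed=1)
```

Because each stratum's weights add back up to the stratum size, the weights over the whole sample sum to the number of agencies in the frame (15 in this toy example), whichever agencies the random draw happens to select.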

What data does each instance consist of?
If there is a large number of variables, please provide a broad description of what is included.

Each instance consists of:

Agency responsibilities.
Agency expenditures.
Officer salaries.
Officer demographics.
Agency policies.
Officer requirements.
Technology used by the agency.
Agency vehicles.
Agency community policing practices.

Is there a target label associated with each instance?
Please include labels that are likely to be used as target labels, e.g. recidivism.

No.

Are there recommended data splits (e.g., training, development/validation, testing)?
If so, please provide a description of these splits, explaining the rationale behind them.

No.

Does the dataset contain data on race and ethnicity?
If so, is it based on the individual’s self-description, or based on officer’s impression? Was it collected or derived in post-processing? For example, by name analysis.

Yes. The dataset contains information about the demographic composition of the officers within each agency, reported in aggregate rather than per officer. This is likely, though not confirmed, to be based on self-description.
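Because demographics are reported as counts for each agency rather than per officer, analyses typically convert them into agency-level shares. The sketch below uses a hypothetical record type, hypothetical group labels, and made-up counts; the actual LEMAS variable names should be taken from the codebook for the relevant survey year.

```python
from dataclasses import dataclass


@dataclass
class AgencyDemographics:
    """Hypothetical agency-level record; field names are illustrative, not LEMAS variables."""
    agency_id: str
    officers_by_group: dict[str, int]  # group label -> number of sworn officers


def group_shares(record: AgencyDemographics) -> dict[str, float]:
    """Convert an agency's aggregate counts into proportions of its sworn officers."""
    total = sum(record.officers_by_group.values())
    if total == 0:
        return {group: 0.0 for group in record.officers_by_group}
    return {group: count / total for group, count in record.officers_by_group.items()}


# Made-up counts for illustration only.
example = AgencyDemographics("A001", {"group_a": 60, "group_b": 30, "group_c": 10})
shares = group_shares(example)
print(shares)  # {'group_a': 0.6, 'group_b': 0.3, 'group_c': 0.1}
```

Only these agency totals are available, so individual officers cannot be recovered from them.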

Are there any known errors, sources of noise, bias or missing data, or variables collected for only part of the datasets?
If so, please provide a description.

No.

Does the dataset contain data that might be considered confidential?
For example: data that is protected by legal privilege or by doctor–patient confidentiality, data that includes the content of individuals’ nonpublic communications. If so, please provide a description.

No.

Is it possible to identify individuals (i.e., one or more natural persons), either directly or indirectly (i.e., in combination with other data) from the dataset?
If so, please describe how.

No. Demographics are not reported on an individual level.

Uses

What type of tasks, if any, has the dataset been used for?
If so, please provide examples and include citations.

The dataset has been used for a number of tasks, including:

Investigating the effect of community policing practices.
Investigating the changing demographics of police agencies.
Investigating technology being used by police agencies, including crime analysis tools.

Is there a repository that links to any or all papers or systems that use the dataset?
If so, please provide a link or other access point.

Yes. Please see here:
https://www.icpsr.umich.edu/web/NACJD/series/92/publications

What (other) tasks could the dataset be used for?
For example: testing predictive policing systems, predicting recidivism.

The dataset could be used for any task that involves agency-level data on:

Responsibilities.
Expenditures.
Salaries.
Officer demographics.
Policies.
Officer requirements.
Technology used.
Vehicles.
Community policing activities.

Is there anything about the composition of the dataset or the way it was collected and preprocessed/cleaned/labeled that might impact future uses?
For example, is there anything that a dataset consumer might need to know to avoid uses that could result in unfair treatment of individuals or groups (e.g., stereotyping, quality of service issues) or other risks or harms (e.g., legal risks, financial harms)? If so, please provide a description. Is there anything a dataset consumer could do to mitigate these risks or harms?

No.

Collection Process

How was the data associated with each instance acquired?
e.g., the data was collected via a survey, or the raw data is routinely collected by the courts.

The data was collected via survey.

Was the information self-reported?
If the data was self-reported, was the data validated/verified? If so, please describe how.

Yes. The information was reported directly by the agencies themselves; organizations, not individuals, were surveyed.

Who was involved in the data collection process?
Was this done as part of their other duties? If not, were they compensated?

Police agencies completed the survey; no compensation was given.

Over what timeframe was the data collected? Does this timeframe match the creation timeframe of the data associated with the instances (e.g., recent crawl of old news articles)?
If not, please describe the timeframe in which the data associated with the instances was created. If the collection was not continuous within the timeframe, please specify the intervals, for example, annually, every 4 years, irregularly.

The data is collected on an irregular basis. For example, the last collection was in 2016, and the one before that was in 2013. When a collection takes place, it is carried out over the specified year. The earliest available data is from 1987.

Were any ethical review processes conducted (e.g., by an institutional review board)?
If so, please provide a description of these review processes, including the outcomes, as well as a link or other access point to any supporting documentation.

Unknown.

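Were the individuals in question notified about the data collection? Did they give their consent?

N/A. The data is not on an individual level.

If consent was obtained, were the consenting individuals provided with a mechanism to revoke their consent in the future or for certain uses?
If so, please provide a description, as well as a link or other access point to the mechanism (if appropriate).

N/A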

Has an analysis of the potential impact of the dataset and its use on data subjects (e.g., a data protection impact analysis) been conducted?
If so, please provide a description of this analysis, including the outcomes, as well as a link or other access point to any supporting documentation.

N/A

Pre-processing, cleaning, labeling

Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, removal of instances, processing of missing values)?
If so, please provide a description and reference to the documentation. If not, you may skip the remaining questions in this section.

The codebook does not specify any pre-processing.

Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated future uses)?
If so, please provide a link or other access point to the “raw” data.

Skip.

Is the software that was used to preprocess/clean/label the data available?
If so, please provide a link or other access point.

Skip.

Distribution

Is the data publicly available? How and where can it be accessed (e.g., website, GitHub)?
Does the dataset have a digital object identifier (DOI)?

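The dataset is distributed freely at: https://www.icpsr.umich.edu/web/NACJD/series/92

Each survey year is listed as a separate ICPSR study, and the DOI assigned by ICPSR is shown on that study's page.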

Is the dataset distributed under a copyright or other intellectual property (IP) license, and/or under applicable terms of use (ToU)?
If so, please describe this license and/or ToU, and provide a link or other access point to, or otherwise reproduce, any relevant licensing terms or ToU, as well as any fees associated with these restrictions.

The data-use statement says the following:

Citation Requirement:
Publications based on ICPSR data collections should acknowledge those sources by means of bibliographic citations. To ensure that such source attributions are captured for social science bibliographic utilities, citations must appear in footnotes or in the reference section of publications.

Deposit Requirement:
To provide funding agencies with essential information about use of archival resources and to facilitate the exchange of information about ICPSR participants’ research activities, users of ICPSR data are requested to send to ICPSR bibliographic citations for each completed manuscript or thesis abstract. Visit the ICPSR Web site for more information on submitting.

Maintenance

Who will be supporting/hosting/maintaining the dataset?

The Bureau of Justice Statistics.

How can the owner/curator/manager of the dataset be contacted (e.g., email address)?

The BJS can be contacted at: askbjs@usdoj.gov

Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)?

Unknown.

If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were the individuals in question told that their data would be retained for a fixed period of time and then deleted)?
If so, please describe these limits and explain how they will be enforced.

The dataset does not contain individual-level data about people.

Will older versions of the dataset continue to be supported/hosted/maintained?

Yes. Data from previous years continue to be hosted and are available for download.

If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so?
If so, please provide a description.

No.
