Law Enforcement Management and Administrative Statistics (LEMAS)


For what purpose was the dataset created?

Conducted periodically since 1987, the LEMAS survey was designed to collect data on police agencies, including agency responsibilities, operating expenditures, job functions of sworn and civilian employees, officer salaries and special pay, demographic characteristics of officers, weapons and armor policies, education and training requirements, computers and information systems, vehicles, special units, and community policing activities.

Who created the dataset?
Is it an official law enforcement or government body? An academic research team? Other?

The Bureau of Justice Statistics (BJS). The relevant data expert at BJS is Elizabeth Davis.

Was there a specific task in mind, or gap that needed to be filled?

No extensive nationwide police agency survey existed before LEMAS.


What do the instances that comprise the dataset represent?
For example: crimes, offenders, court cases, police officers

Each instance of the dataset is a survey response from a different police agency.

Are there multiple types of instances?
For example: offenders, victims, and the relationship between them.


How many instances are there in total?
Of each type, if appropriate.

A total of 3,471 police agencies were sent the survey: 2,612 local police departments, 810 sheriffs’ offices, and 49 state agencies. A total of 2,779 agencies responded to the LEMAS survey, a response rate of 80%. The final dataset includes responses from 2,135 local police departments, 600 sheriffs’ offices, and 49 state agencies.
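
The response-rate arithmetic can be checked with a short sketch. It uses only the counts quoted in this answer; the dictionary keys and the function name `response_rates` are illustrative choices, not part of any LEMAS file. The per-type response counts as quoted sum to 2,784 rather than the stated 2,779, so the overall rate below uses the stated total.

```python
# Counts quoted in the answer above: agencies surveyed and responding, by type.
surveyed = {"local police": 2612, "sheriff": 810, "state": 49}
responded = {"local police": 2135, "sheriff": 600, "state": 49}

# Total number of responding agencies as stated in the text.
STATED_RESPONDENTS = 2779


def response_rates(surveyed, responded):
    """Return the response rate (a fraction between 0 and 1) per agency type."""
    return {kind: responded[kind] / surveyed[kind] for kind in surveyed}


rates = response_rates(surveyed, responded)
for kind, rate in rates.items():
    print(f"{kind}: {rate:.1%}")

total_surveyed = sum(surveyed.values())  # 2,612 + 810 + 49 = 3,471
overall_rate = STATED_RESPONDENTS / total_surveyed
print(f"overall (stated total): {overall_rate:.1%}")
```

Breaking the rate down by agency type shows that response was not uniform across the three groups, which matters when comparing local police departments with sheriffs’ offices.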

Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?
For example, if it is traffic stops from a territory, is it all traffic stops conducted within that territory within a specific time? If not, is it a representative sample of all stops? Describe how representativeness was validated/verified. If it is not representative, please describe why.

All police agencies with 100 or more sworn personnel were included, and smaller agencies were selected by stratified random sampling based on the number of sworn officers and the type of agency. A total of 28 local police departments were determined to be out-of-scope for the survey because they were special jurisdiction agencies, had closed, had outsourced their operations, or were operating on a part-time basis.
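
The sampling design described above can be illustrated with a minimal sketch. This is not the BJS procedure: the 100-officer threshold comes from the text, while the size bands, the 25% sampling fraction, the synthetic agency frame, and the names `size_band` and `draw_sample` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def size_band(sworn):
    """Hypothetical size bands for agencies below the certainty threshold."""
    if sworn >= 50:
        return "50-99"
    if sworn >= 10:
        return "10-49"
    return "0-9"


def draw_sample(agencies, threshold=100, fraction=0.25, seed=0):
    """Keep every agency at or above `threshold`; sample the rest per stratum.

    Strata are defined by (agency type, size band). The sampling fraction
    is an assumed value, not one taken from the LEMAS documentation.
    """
    rng = random.Random(seed)
    certainty = [a for a in agencies if a["sworn"] >= threshold]

    strata = defaultdict(list)
    for a in agencies:
        if a["sworn"] < threshold:
            strata[(a["type"], size_band(a["sworn"]))].append(a)

    sampled = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sampled.extend(rng.sample(members, k))
    return certainty + sampled


# Synthetic frame of agencies, for demonstration only.
frame_rng = random.Random(1)
frame = [
    {"id": i,
     "type": frame_rng.choice(["local police", "sheriff"]),
     "sworn": frame_rng.randint(1, 300)}
    for i in range(500)
]
sample = draw_sample(frame)
print(len(sample), "of", len(frame), "agencies selected")
```

Because large agencies are taken with certainty and small ones are sampled, analyses that aim to describe all agencies would normally need to weight the sampled records; the sketch does not compute such weights.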

What data does each instance consist of?
If there is a large number of variables, please provide a broad description of what is included.

Each instance consists of:

Is there a target label associated with each instance?
Please include labels that are likely to be used as target labels, e.g. recidivism.



Does the dataset contain data on race and ethnicity?
If so, is it based on the individual’s self-description, or on an officer’s impression? Was it collected directly or derived in post-processing (for example, by name analysis)?

Yes. The dataset contains information about the demographic composition of the officers within each agency. This is (likely) based on self-description.

Are there any known errors, sources of noise, bias or missing data, or variables collected for only part of the datasets?
If so, please provide a description.


Does the dataset contain data on criminal history or other data that might be considered confidential or sensitive in any way?
For example: sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers. If so, please provide a description.

No. Demographics are not reported on an individual level.


Is it possible to identify individuals (i.e., one or more natural persons), either directly or indirectly (i.e., in combination with other data) from the dataset?
If so, please describe how.



What type of tasks, if any, has the dataset been used for?
If so, please provide examples and include citations.

The dataset has been used for a number of tasks, including, but not limited to:

Yes. Please see here:

What (other) tasks could the dataset be used for?
For example: testing predictive policing systems, predicting recidivism.

The dataset could be used for any tasks which involve agency:


Collection Process

How was the data associated with each instance acquired?
For example: the data was collected via survey, or the raw data is routinely collected by the courts.

The data was collected via survey.

Was the information self-reported?
If the data was self-reported, was the data validated/verified? If so, please describe how.

The information was reported directly by the agencies. However, it was organizations, not individuals, that were surveyed.

Who was involved in the data collection process?
Was this done as part of their other duties? If not, were they compensated?

Police agencies. No compensation was given.

Over what timeframe was the data collected? Does this timeframe match the creation timeframe of the data associated with the instances (e.g., recent crawl of old news articles)?
If not, please describe the timeframe in which the data associated with the instances was created. If the collection was not continuous within the timeframe, please specify the intervals, for example, annually, every 4 years, irregularly.

The data is collected on an irregular basis. For example, the most recent collection was in 2016, and the one before that was in 2013. When a collection takes place, it is carried out over the specified year. The earliest available data is from 1987.


N/A. The data is not on an individual level.



Pre-processing, cleaning, labeling

Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, removal of instances, processing of missing values)?
If so, please provide a description and reference to the documentation. If not, you may skip the remaining questions in this section.

The codebook does not specify any pre-processing.




Is the data publicly available? How and where can it be accessed (e.g., website, GitHub)?
Does the dataset have a digital object identifier (DOI)?

The dataset is distributed freely at:

The data-use statement says the following:

Citation Requirement:
Publications based on ICPSR data collections should acknowledge those sources by means of bibliographic citations. To ensure that such source attributions are captured for social science bibliographic utilities, citations must appear in footnotes or in the reference section of publications.

Deposit Requirement:
To provide funding agencies with essential information about use of archival resources and to facilitate the exchange of information about ICPSR participants’ research activities, users of ICPSR data are requested to send to ICPSR bibliographic citations for each completed manuscript or thesis abstract. Visit the ICPSR Web site for more information on submitting.


Who will be supporting/hosting/maintaining the dataset?

The Bureau of Justice Statistics.

How can the owner/curator/manager of the dataset be contacted (e.g., email address)?

The BJS can be contacted at:

Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)?


If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were the individuals in question told that their data would be retained for a fixed period of time and then deleted)?
If so, please describe these limits and explain how they will be enforced.

The dataset does not relate to people.

Will older versions of the dataset continue to be supported/hosted/maintained?

Yes. Data from previous years continue to be hosted and are available for download.

If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so?
If so, please provide a description.