Section 5.2 Obtaining and Cleaning the Data

Here, we will find reputable sources for data on anti-trans bills and get the data into a form we can analyze.

Subsection 5.2.1 Importing and Merging the Data

Exercises

Finding Data Online.
Our first goal is to find data. Note that any data found online may change web addresses, stop updating, go offline, or become unreliable over time. Your instructor will be able to tell you which sources of data to use if the ones below are no longer reliable.
Go to the Track Trans Legislation website 90, which in turn obtains its data from LegiScan 91, and browse around. What are two things that you notice about the site? Two things that surprise you? Two things that aren't on the site that you wonder about?
This data covers 2021 through April 21, 2023. To track bills from before 2021, we use data from the ACLU's “Past Legislation Affecting LGBT Rights Across the Country” pages for 2018, 2019, and 2020 92.
Go to the ACLU "Past Legislation Affecting LGBT Rights Across the Country 2020" 93  webpage and skim that page. What similarities do you notice with the data found on the Track Trans Legislation website? What differences do you notice? Click on the "View 2019" and "View 2018 Session bills" links and do the same for those years.
We'd like to get a sense of how anti-trans legislation in the U.S. changed over time, so we're going to try to merge the Track Trans Legislation (TTL) data with the ACLU data. Since the ACLU data has different bill "Status" categories than TTL, we'll need to figure out how to classify each ACLU bill into one of the TTL categories.
Use the Terminology 94  page on the TTL website to answer the following question. Which of the TTL categories would you classify "Referred to committee" into? "Hearing scheduled"? "Withdrawn"? You may want to click on the bill numbers on the ACLU site to see how the website LegiScan, a constantly-updated bill tracker, classifies each bill.
Note that the 2020 ACLU page was last updated on March 20, 2020, since many state legislatures were suspended or closed during the first year of the COVID-19 pandemic; the ACLU page promised to “update the tracker as major new developments occur[red].” The ACLU data uses different variable names and a different organization than the Track Trans Legislation data, so we also modify the ACLU data to match the Track Trans Legislation data as closely as possible.
For example, bills that were withdrawn, were not passed by the end of a given legislative session, were explicitly listed as “Dead”, or were recommended against by a committee and did not proceed in the legislature were relabeled “Dead” (at least for that year). The exception was when the bill description was specifically listed as “hearing scheduled” or “referred to committee”, indicated that the bill was carried over from another year, or otherwise made clear that the bill was still under consideration; in those cases, “Introduced” or “Crossed Over” (depending on whether the bill had been passed by at least one chamber) was used.
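The relabeling rule above can be sketched in a few lines of Python. This is only an illustration, not the program we actually used: the function name, the phrase lists, and the “Needs review” fallback are all assumptions, and the rule about bills that did not pass by the end of a session cannot be read off a status description, so it is left out.

```python
# Hypothetical helper: the phrase lists and names are illustrative assumptions,
# not the program used to prepare the textbook's dataset.
STILL_ACTIVE_PHRASES = ("hearing scheduled", "referred to committee", "carried over")
DEAD_PHRASES = ("withdrawn", "dead", "recommended against")


def classify_aclu_status(description, passed_one_chamber=False):
    """Map a free-text ACLU status description to a TTL-style category."""
    text = description.lower()
    # The exception is checked first: bills that are still under consideration.
    if any(phrase in text for phrase in STILL_ACTIVE_PHRASES):
        return "Crossed Over" if passed_one_chamber else "Introduced"
    if any(phrase in text for phrase in DEAD_PHRASES):
        return "Dead"
    # Anything else needs a human look (for example, on LegiScan).
    return "Needs review"
```

Checking the “still active” phrases before the “dead” phrases mirrors the exception described above; a description that matches neither list is flagged rather than guessed at.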
We only include bills in categories tracked by both data sources; this leaves out, for example, bills preventing localities from passing anti-discrimination ordinances within a state. We use a broad reading of the “religious freedom” category to include bills that allow people with “sincerely-held religious beliefs” in that state to challenge nondiscrimination laws, discriminate against LGBTQ+ people, refuse to provide healthcare to LGBTQ+ people, refuse to provide adoption services to LGBTQ+ people, discriminate against married LGBTQ+ people, and receive funding for discriminatory student groups at public universities, among others.
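Keeping only the categories tracked by both sources amounts to a set intersection followed by a filter. The sketch below uses made-up category labels and made-up bill rows purely for illustration; the real labels come from the two websites.

```python
# Hypothetical category labels; the real ones come from TTL and the ACLU pages.
ttl_categories = {"Healthcare", "Sports", "Religious Freedom"}
aclu_categories = {"Healthcare", "Religious Freedom", "Local Preemption"}

# Categories tracked by both sources.
shared_categories = ttl_categories & aclu_categories

# Made-up rows, not real bills.
bills = [
    {"state": "AZ", "bill": "EXAMPLE-1", "category": "Healthcare"},
    {"state": "AZ", "bill": "EXAMPLE-2", "category": "Local Preemption"},
]

# Keep a bill only if its category appears in both sources.
kept_bills = [bill for bill in bills if bill["category"] in shared_categories]
```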

Subsection 5.2.2 Cleaning the Data

First, the 2018-2020 ACLU datasets code state names by their two-letter abbreviations (e.g., “AZ” instead of “Arizona”), while Track Trans Legislation uses full names, so we use a program (that you won't have to worry about!) to convert full names to abbreviations throughout the dataset. Some bills also appear more than once; for example, the 2021 dataset includes some bills passed in January 2022. We therefore eliminate duplicate bills.
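For the curious, here is a minimal sketch of what such a program might look like. It is not the program we used: the lookup table is deliberately partial, the rows are made up, and treating two rows as duplicates when they share a state, bill number, and title is an assumption.

```python
# Partial lookup table for illustration; a real one would cover every state.
STATE_ABBREVIATIONS = {"Arizona": "AZ", "Kentucky": "KY"}


def to_abbreviation(state):
    """Return the two-letter code, leaving existing abbreviations unchanged."""
    return STATE_ABBREVIATIONS.get(state, state)


def drop_duplicates(bills):
    """Keep the first row for each (state, bill, title) key (an assumed key)."""
    seen = set()
    unique = []
    for bill in bills:
        key = (to_abbreviation(bill["state"]), bill["bill"], bill["title"])
        if key not in seen:
            seen.add(key)
            # Store the abbreviated state name in the cleaned row.
            unique.append({**bill, "state": key[0]})
    return unique


# Made-up rows: the same bill appears in two yearly files.
rows = [
    {"state": "Arizona", "bill": "EXAMPLE-1", "title": "Sample title", "year": 2021},
    {"state": "AZ", "bill": "EXAMPLE-1", "title": "Sample title", "year": 2022},
    {"state": "Kentucky", "bill": "EXAMPLE-2", "title": "Other title", "year": 2022},
]
cleaned = drop_duplicates(rows)
```

Converting state names before comparing rows matters here: without it, “Arizona” and “AZ” would look like different states and the duplicate would survive.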
Eliminating duplicates decreases the number of bills in our dataset from 927 to 893. Next, note that the bills whose status TTL labels Introduced* are those that failed to meet their state's “crossover deadline”, the date by which a bill must pass out of the chamber in which it was introduced and move to the other chamber (e.g., from the State House to the State Senate). According to Track Trans Legislation 95, a bill that is not passed in its initial chamber by the crossover deadline “faces high procedural hurdles in order to move forward.” Thus, we classify these bills (at least for the current session) as “Dead/Failed”.
Moreover, one bill, Kentucky's HB132 in 2020, has its status listed as “Posted”. Research on LegiScan 96 reveals that this bill died in committee, so we update its status to “Dead/Failed”.
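The two status corrections above can be expressed as a small recoding function. This is a sketch under assumptions: the field names (`state`, `bill`, `year`, `status`) are hypothetical, and the “Posted” correction is applied only to the one bill we researched rather than to every bill with that status.

```python
def recode_status(bill):
    """Return the cleaned status for one bill (a dict with assumed field names)."""
    # Bills that missed the crossover deadline are treated as dead for the session.
    if bill["status"] == "Introduced*":
        return "Dead/Failed"
    # One-off correction for Kentucky's HB132 in 2020, which died in committee.
    is_ky_hb132 = (bill["state"], bill["bill"], bill["year"]) == ("KY", "HB132", 2020)
    if is_ky_hb132 and bill["status"] == "Posted":
        return "Dead/Failed"
    # Every other status is left as it was.
    return bill["status"]
```

For example, `recode_status({"state": "KY", "bill": "HB132", "year": 2020, "status": "Posted"})` returns `"Dead/Failed"`, while a bill whose status is `"Introduced"` (without the asterisk) is returned unchanged.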