JanakiShah/Machine_learning_model

Machine_learning_test

This project investigated the association between the availability of abortion providers and crisis pregnancy centers (CPCs) over time, particularly in the context of increasingly restrictive state-level abortion access in the United States. We hypothesized that as abortion access becomes more restricted, pregnant individuals may have greater access to CPCs at the expense of abortion services, potentially harming maternal health outcomes. We aimed to identify CPCs by location using the Data Axle nationwide database, which documents businesses using Standard Industrial Classification (SIC) and North American Industry Classification System (NAICS) business codes.

1. Data Querying: We began by querying the 2017-2021 Data Axle nationwide database to identify establishments providing abortion services. For this query we shortlisted the SIC and NAICS codes most commonly associated with abortion services and related counseling.

2. Investigation of alternative CPC data sources: We aggregated names and addresses of CPCs from an online tool titled Crisis Pregnancy Center Map and cross-referenced them with additional resources such as Care Net, Heartbeat International, and others. We matched these locations against Data Axle records to confirm SIC/NAICS code patterns indicative of CPCs and to identify gaps in existing databases.

3. Refinement of Data Extraction: Over multiple iterations, we refined the data extraction process, adjusting the SIC and NAICS codes based on patterns observed among identified CPCs.

4. Manual coding of training sets: Due to the complexity of these data, we manually verified CPCs and abortion providers within the 2021 dataset by visiting clinic websites and validating whether abortion services were provided. We repeated this process for the 2019 dataset to ensure our models could effectively predict pre- and post-Dobbs listing information. We used this hand-coded data to evaluate patterns in SIC and NAICS code listings for CPCs and abortion providers. We found that CPCs often identify as “abortion alternative organizations” (SIC), “family planning information centers” (SIC), and “other social advocacy organizations” (NAICS).

5. Automated classification of CPCs: We used a support vector machine (SVM) classifier to identify CPCs within our training set. This classifier used business metadata, including employee size and business status code (e.g. whether a location is a branch, headquarters, or subsidiary), as well as binary indicators constructed from the SIC and NAICS code patterns observed in step (4). This simple model predicted 2021 CPC locations with 76% accuracy (recall = 0.74, precision = 0.97). On the 2019 data we achieved only 50% accuracy (recall = 0.41, precision = 0.89), indicating that self-identification of CPCs may have changed post-Dobbs.

6. Application of classifier to full 2021 dataset and evaluation of sociodemographic correlates: Using the 2021 classifier, we created a complete dataset of estimated 2021 CPCs. We merged this dataset with a county-level social determinants of health dataset created for the Sharecare project and examined associations between the concentration of CPCs across counties and the sociodemographic characteristics of those counties. Preliminary results indicate that there are proportionally more CPCs in places with less rental expenditure, more rental housing, less rental and language diversity, lower educational attainment, older populations, less labor force participation, and fewer medical resources.

7. Challenges Faced: One significant challenge was investigating CPCs and abortion providers that were missing or defunct in the extracted datasets. Further investigation revealed that some CPCs may have selected codes indicating "clinics" or "physicians and surgeons," so additional scrutiny is required to ensure comprehensive inclusion.

8. Next Steps: We plan to improve our classification models for 2021 and 2019 and investigate why performance changed across years. We will then apply these classifiers to both the 2021 and 2019 datasets to evaluate changes in CPC availability and sociodemographic correlates. These results will support applications for additional funding for the creation of additional datasets and more extensive epidemiological analysis of changes in CPC access and their consequences.
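As a rough illustration of the Data Querying and Refinement of Data Extraction steps, the sketch below filters business records down to those whose SIC or NAICS code appears on a shortlist. The column names (`name`, `sic_code`, `naics_code`, `year`), the sample rows, and the code values are all hypothetical placeholders; the actual shortlist and the Data Axle export layout are not documented here.

```python
import csv
import io

# Placeholder code values (assumption): the real SIC/NAICS shortlist is not given here.
TARGET_SIC = {"SIC_ABORTION_ALTERNATIVES", "SIC_FAMILY_PLANNING_INFO"}
TARGET_NAICS = {"NAICS_OTHER_SOCIAL_ADVOCACY"}

# Tiny made-up extract standing in for a business-listing export (column names assumed).
SAMPLE_CSV = """name,sic_code,naics_code,year
Center A,SIC_ABORTION_ALTERNATIVES,NAICS_OTHER_SOCIAL_ADVOCACY,2021
Bakery B,SIC_BAKERY,NAICS_RETAIL_BAKERY,2021
Center C,SIC_CLINICS,NAICS_OTHER_SOCIAL_ADVOCACY,2019
"""


def filter_candidates(rows, sic_codes, naics_codes):
    """Keep rows whose SIC code or NAICS code is on the shortlist."""
    return [
        row
        for row in rows
        if row["sic_code"] in sic_codes or row["naics_code"] in naics_codes
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
candidates = filter_candidates(rows, TARGET_SIC, TARGET_NAICS)
print([row["name"] for row in candidates])
```

Refining the extraction then amounts to editing the two code sets and re-running the filter, which keeps each iteration easy to compare against the hand-verified list.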
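The Automated classification of CPCs step could look roughly like the following scikit-learn sketch. It is not the project's actual code: the data are synthetic, the feature names are invented for illustration, and the kernel, `C` value, scaling, and train/test split are assumptions, since the README does not state them.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the hand-coded training set (all values are made up).
is_cpc = rng.integers(0, 2, size=n)

# Binary indicators derived from code patterns; here they are noisy flags
# that are more likely to be 1 for CPCs (hypothetical feature names).
sic_abortion_alt = (rng.random(n) < np.where(is_cpc == 1, 0.8, 0.1)).astype(int)
naics_social_adv = (rng.random(n) < np.where(is_cpc == 1, 0.7, 0.2)).astype(int)

# Business metadata: employee size and a categorical status code
# (0 = branch, 1 = headquarters, 2 = subsidiary), one-hot encoded.
employees = rng.integers(1, 50, size=n)
status = rng.integers(0, 3, size=n)
status_onehot = np.eye(3)[status]

X = np.column_stack([sic_abortion_alt, naics_social_adv, employees, status_onehot])

X_train, X_test, y_train, y_test = train_test_split(
    X, is_cpc, test_size=0.25, random_state=0, stratify=is_cpc
)

# Scale features, then fit an SVM (RBF kernel and C=1.0 are assumptions).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
}
print(metrics)
```

Reporting precision and recall alongside accuracy matters here: a high precision with a lower recall means the locations flagged as CPCs are mostly correct, but a share of true CPCs is missed.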
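For the county-level analysis, a minimal pandas sketch of the merge and a simple correlation is shown below. The county identifiers, column names (`county_id`, `population`, `pct_bachelors`), and all values are invented for illustration; the Sharecare dataset's real schema and the project's actual concentration measure are not described in this README, so CPCs per 100,000 residents is an assumption.

```python
import pandas as pd

# Made-up classifier output: one row per estimated CPC location.
cpcs = pd.DataFrame(
    {
        "county_id": ["C001", "C001", "C002", "C003"],
        "name": ["Center A", "Center B", "Center C", "Center D"],
    }
)

# Made-up county-level sociodemographic table (hypothetical columns).
counties = pd.DataFrame(
    {
        "county_id": ["C001", "C002", "C003", "C004"],
        "population": [50_000, 200_000, 25_000, 20_000],
        "pct_bachelors": [18.0, 35.0, 15.0, 22.0],
    }
)

# Count CPCs per county, then left-join so counties with no CPCs are kept.
counts = cpcs.groupby("county_id").size().rename("n_cpcs").reset_index()
merged = counties.merge(counts, on="county_id", how="left")
merged["n_cpcs"] = merged["n_cpcs"].fillna(0).astype(int)

# Concentration measure (assumption): CPCs per 100,000 residents.
merged["cpcs_per_100k"] = merged["n_cpcs"] * 100_000 / merged["population"]

# Pearson correlation between CPC concentration and one county characteristic.
corr = merged["cpcs_per_100k"].corr(merged["pct_bachelors"])
print(merged)
print(corr)
```

The left join is the important design choice: an inner join would silently drop counties with zero estimated CPCs and bias any association toward counties that already have at least one.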
