Methods for improving confounding control in comparative effectiveness research using electronic healthcare databases / Richie Wyss.
- Language:
- English
- Physical Description:
- 1 online resource (66 pages)
- Place of Publication:
- Washington, DC : Patient-Centered Outcomes Research Institute, 2019.
- Summary:
- BACKGROUND: Patient-centered outcomes research (PCOR) can succeed only with valid analytics. The routine operation of the US health care system produces an abundance of electronically stored data that capture patient care as it is provided outside of controlled research environments. The potential for using these data to inform future treatment choices and improve patient care and outcomes in the very system that generates the data is widely acknowledged. Given these properties of secondary data and the abundance of electronic health care databases covering millions of patients, it is critical to strengthen the rigor of causal inferences drawn from such data. Two innovative analytic approaches based on defined algorithms have recently been developed: the high-dimensional propensity score (HDPS) and collaborative targeted maximum likelihood estimation (CTMLE). Both (1) are grounded in epidemiological principles of causal inference and (2) aim to maximize confounding adjustment in a given data source. Their performance is not well understood in many relevant settings. We will evaluate such methods in empirical studies and complex simulations based on empirical data structures.
OBJECTIVES: Implement, adapt, and compare novel algorithmic approaches for improved confounding control in comparative effectiveness research (CER) using available health care databases. Using simulation studies, we will characterize and optimize the performance of these algorithms, then disseminate insights concerning these methods through publications and symposia/conferences and provide an interactive webpage with free software and result libraries.
METHODS: We evaluated the performance of data-adaptive algorithms for variable selection and propensity score (PS) estimation using both simulations and empirical examples that reflect a range of settings common to large electronic health care databases. The algorithms included the HDPS, a combination of super learner (SL) prediction modeling and HDPS, a modified version of CTMLE that is scalable to large health care databases, and many traditional machine learning algorithms. We based simulations on the plasmode framework, in which empirical data are incorporated into the simulation process to more accurately reflect the complex relations that occur among baseline covariates in practice.
RESULTS: Overall, the basic heuristic of variable reduction in the HDPS adjustment performed well in diverse settings. However, the HDPS can be sensitive to the number of variables included for adjustment, and severe overfitting of the PS model can negatively affect the properties of effect estimates. Combining the HDPS with the modified version of CTMLE performed well in many of the scenarios considered but was sensitive to parameter specifications within the modified algorithm. Combining the HDPS with SL was the most consistent selection strategy and may be promising for semiautomated data-adaptive PS estimation and confounding control in high-dimensional covariate data sets.
CONCLUSIONS: This project provides guidance on the optimal use and advantages of novel data-adaptive methods for variable selection and PS estimation for CER. This project is the first to adapt, test, and improve novel approaches based on the combination of SL, CTMLE, and HDPS for variable selection and confounding control in CER using routine care data. We found that combining the HDPS with SL prediction modeling is promising for data-adaptive PS estimation in large health care databases. We provided free software with instructions and guidance to enhance the utility of the proposed methods.
LIMITATIONS AND SUBPOPULATION CONSIDERATIONS: The application of data-adaptive algorithms in electronic health care data is promising, but no single method was optimal across all data sets and scenarios. While plasmode simulations and empirical examples allow investigators to evaluate methods in settings that reflect real-world practice, they also make it difficult to identify the reasons for observed differences in performance across methods. This project provides strong evidence for the utility of data-adaptive algorithms in electronic health care data, in particular the combination of SL with the HDPS, and provides guidance and software for implementing the recommended tools. However, more research is needed to identify the specific factors that influence the performance of the discussed methods.
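The plasmode idea mentioned in the summary (real covariate data reused inside a simulation) can be illustrated with a minimal Python sketch. This is not the report's software: the function name `plasmode_sample`, the toy records, the coefficient values, and the injected effect are all hypothetical and chosen only for illustration. In practice the outcome model would typically be estimated from the real data rather than set by hand.

```python
import math
import random

def plasmode_sample(rows, outcome_coefs, intercept, true_log_or, n, seed=0):
    """Draw one plasmode-style data set (illustrative sketch only).

    rows: observed records, each a dict with a 'treated' flag (0/1) and
          covariate values; these are resampled, never simulated.
    outcome_coefs: assumed covariate coefficients of the outcome model.
    true_log_or: treatment effect (log odds ratio) injected by the analyst.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # Resample a whole record with replacement so the empirical joint
        # distribution of covariates and treatment is preserved.
        row = dict(rng.choice(rows))
        # Linear predictor of the known, analyst-chosen outcome model.
        lp = intercept + true_log_or * row["treated"]
        lp += sum(beta * row[name] for name, beta in outcome_coefs.items())
        p = 1.0 / (1.0 + math.exp(-lp))
        # Only the outcome is simulated.
        row["outcome"] = 1 if rng.random() < p else 0
        sample.append(row)
    return sample

# Hypothetical toy records standing in for a real health care data extract.
observed = [
    {"treated": 1, "age_over_65": 1, "diabetes": 1},
    {"treated": 1, "age_over_65": 0, "diabetes": 1},
    {"treated": 0, "age_over_65": 1, "diabetes": 0},
    {"treated": 0, "age_over_65": 0, "diabetes": 0},
    {"treated": 1, "age_over_65": 1, "diabetes": 0},
    {"treated": 0, "age_over_65": 0, "diabetes": 1},
]

simulated = plasmode_sample(
    observed,
    outcome_coefs={"age_over_65": 0.8, "diabetes": 0.5},
    intercept=-2.0,
    true_log_or=math.log(1.5),
    n=1000,
    seed=42,
)
```

Because the true effect is known by construction, an estimator applied to many such data sets can be judged by how closely it recovers `true_log_or`, while the covariate and treatment patterns still come from observed records.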
- Notes:
- Description based on publisher supplied metadata and other sources.