May 20, 2020
Digitalization Will Improve Clinical Trial Data
A perspective from biostatistician Victoria Cooley
Research Biostatistician at Weill Cornell Medicine
In honor of Clinical Trials Day 2020, guest blogger and biostatistician Victoria Cooley shares her perspective on clinical trial data quality and the opportunities for improvement with digitalization.
From my first undergraduate course in biostatistics, I knew I wanted to pursue a career as a biostatistician. Eager to combine my passions for math and medicine, I enrolled in a graduate program for biostatistics at Columbia University. One of my first courses was an introduction to randomized clinical trials.
Learning about the design and statistical aspects of clinical trials fascinated me. After all, this particular study design is considered the “gold standard” in research: if a trial is properly designed, balance on baseline covariates can be achieved, eliminating the need to adjust for such confounders in the multivariable modeling phase or to explore other adjustment techniques.
We even designed a mock phase II trial during the course. Looking back, I realize I should have included a section on proper data collection and randomization techniques. However, no additional lectures or classroom time could have prepared me for the design and analysis complexities that I would encounter in the real world as a biostatistician.
Since starting my career, I have worked on the planning and analysis of a variety of observational studies as well as some clinical trials. One trial in particular stands out to me, for it challenged several of the points that I had learned in my graduate course.
When the data were first sent to me, I was notified that I would need to manually exclude several patients because they were later identified as violating the randomization parameters. Some patients were randomized to a study group but violated the study protocol, while others were screened but never randomized.
Discarding Valuable Data from the Primary Analysis
From a statistician’s point of view, I could not believe that I had to discard valuable patient data from the primary analysis. As a patient myself, I would be disheartened if I found out that months or years of collecting my data had produced information that could not be used in the way it was intended.
As I learned more about the trial, it seemed several people had been working on the data collection and much of the data were collected on paper forms. Many papers were misplaced, and the data were collected in a scattered, disorganized manner. In order to pull the necessary data into a format for me to analyze, the investigator spent several weeks manually creating a database with all the necessary information, alongside performing their regular clinical duties.
Even then, when the data were sent to me, I spent many hours cleaning and transforming the data into a format that I could work with. Despite the fact that this study was documented as a randomized clinical trial, the investigator and I still had to consider the adjustment of certain baseline characteristics in the multivariable models as we were doubtful that balance was achieved between the study groups.
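As a rough illustration of what doubting balance can mean in practice, one common check is the standardized mean difference (SMD) of a baseline covariate between study groups. The sketch below is not the analysis used in this trial: the ages are invented, and the 0.1 cutoff is a widely used rule of thumb rather than a fixed standard.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """SMD for a continuous covariate: difference in means over the pooled SD."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        raise ValueError("Both groups have zero variance; SMD is undefined.")
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical baseline ages, invented purely for illustration.
ages_treatment = [54, 61, 47, 66, 58, 63]
ages_control = [52, 49, 45, 57, 50, 48]

smd = standardized_mean_difference(ages_treatment, ages_control)

# Assumed rule of thumb: an absolute SMD above 0.1 suggests the covariate
# may need to be adjusted for in the multivariable models.
needs_adjustment = abs(smd) > 0.1
print(f"SMD = {smd:.2f}, consider adjustment: {needs_adjustment}")
```

A covariate flagged this way is a candidate for adjustment, which is the kind of decision the investigator and I had to weigh.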
To complicate matters further, we were faced with a considerable amount of missing data, largely due to paper forms being misplaced or values not being entered as desired. While missing data are practically inevitable, the reason for them in this case perplexed me. If the data had been recorded in a format other than paper, perhaps digitally via computer or smartphone, it is likely that far less would have been missing.
Opportunity for Going Digital
This is not to say that the missing data could have been eliminated completely (some variables are inherently prone to missingness or lack of subject response), but recording these data digitally could have helped greatly. Essentially, less of the data would have been “lost in translation” from paper to computer. Moreover, when data are captured automatically or recorded in real time, there are fewer opportunities for human data entry errors (such as those that arise when reading and entering information from paper forms).
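To make the idea concrete, here is a minimal sketch of how a digital form might catch missing or implausible values at the moment of entry, rather than weeks later during cleaning. The field names and plausible ranges are hypothetical and are not taken from the trial described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    """A hypothetical rule for one numeric field on a digital case report form."""
    name: str
    required: bool = True
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate_entry(entry, rules):
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    for rule in rules:
        value = entry.get(rule.name)
        if value is None or value == "":
            if rule.required:
                problems.append(f"{rule.name}: missing")
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{rule.name}: not a number")
            continue
        if rule.minimum is not None and number < rule.minimum:
            problems.append(f"{rule.name}: below {rule.minimum}")
        if rule.maximum is not None and number > rule.maximum:
            problems.append(f"{rule.name}: above {rule.maximum}")
    return problems


# Assumed fields and ranges, chosen only for illustration.
RULES = [
    FieldRule("age_years", minimum=18, maximum=110),
    FieldRule("systolic_bp_mmhg", minimum=50, maximum=300),
    FieldRule("weight_kg", required=False, minimum=20, maximum=400),
]

problems = validate_entry({"age_years": "17", "systolic_bp_mmhg": ""}, RULES)
print(problems)
```

Because the person entering the data sees these problems immediately, a skipped or mistyped value can be corrected while the patient or source record is still at hand.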
The accessibility of the study protocol is another factor that could have greatly improved the success of this trial. Given that so many different people were at one point or another working on this trial, I can imagine that not everyone was completely familiar with the protocol. If the protocol were housed in some central, digital area, would the randomization parameters still have been violated? While I cannot answer this question with 100% confidence (there is always some form of doubt in statistics), I do believe that many more patients would have been randomized correctly.
In case you were wondering, the trial did produce some interesting results, and after several hours of careful consideration, analysis, and discussion, our abstract was accepted at a well-regarded conference. Overall, I can say my experience working on this trial was extremely eye-opening and thought-provoking. While I am grateful for this learning experience of working on a challenging study, I do hope that clinical trials continue to become increasingly digitized in the years to come. The amount of time, effort, and money that could be saved will be worthwhile, as this will allow for more focus to be placed on saving and improving the lives of millions of people worldwide.
You can connect with Victoria on LinkedIn.
Victoria Cooley is a biostatistician at Weill Cornell Medicine in NYC. She consults with investigators, analyzes data, prepares reports for grants and manuscripts, and teaches biostatistics. She has an M.S. in biostatistics from Columbia and previously worked on mitochondrial disease analysis at Columbia University Medical Center.