Poster Presentation - Joint Biostatistics Symposium at Ohio State University (2014)

One of the main talks highlighted our novel text mining of free-text responses describing prediagnostic symptoms among ovarian cancer survivors. To trace each respondent's diagnostic path, we adopted a modified version of the vector space model (VSM) to support our text-mining techniques. This work was later published in the Journal of Cancer Informatics: Design and Implementation of a Comprehensive Web-based Survey for Ovarian Cancer Survivorship with Analysis of Pre-diagnostic Symptoms via Text Mining
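For readers unfamiliar with the VSM, here is a minimal sketch of the standard, unmodified model: each free-text response becomes a TF-IDF weighted term vector, and responses are scored against a symptom query by cosine similarity. The modifications we made for the survey are not reproduced here. The tokenizer, the smoothed IDF formula, the function names (`tfidf_vectors`, `score_responses`), and the sample responses are all illustrative assumptions, not the study's pipeline or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many responses contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def score_responses(responses, query):
    """Score each response against a query; unseen query terms get weight 0."""
    vectors, idf = tfidf_vectors(responses)
    qvec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    return [cosine(v, qvec) for v in vectors]


if __name__ == "__main__":
    # Hypothetical responses invented for illustration only.
    responses = [
        "I had bloating and pelvic pain for months",
        "Constant fatigue and back pain",
        "Feeling full quickly, bloating after meals",
    ]
    for text, score in zip(responses, score_responses(responses, "bloating")):
        print(f"{score:.3f}  {text}")
```

Responses that never mention a query term score zero, while rarer shared terms pull a response closer to the query than common ones do. That weighting is what makes the VSM useful for grouping symptom descriptions written in patients' own words.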


“Ovarian cancer is the most lethal gynecologic disease in the United States, with more women dying from this cancer than all gynecological cancers combined. Ovarian cancer has been termed the “silent killer” because many patients do not show clear symptoms at an early stage. Currently, there is no approved early diagnostic tool; effective treatments for late-stage patients are limited. However, more than 80% of ovarian cancer patients actually showed symptoms, even when the disease was still limited to the ovaries. Some late-stage patients do live a long time. To research and find some possible reasons or factors that contribute to the long-term survivorship of ovarian cancer survivors, we designed and conducted an online comprehensive Ovarian Cancer Survivorship Survey from 2009 to 2013, hosted at the Women Cancer Network, wcn.org. The survey included 1502 fields grouped into 15 groups of questions encompassing all stages of ovarian cancer management from the patient or her caregiver’s perspective, including initial symptoms that led to diagnosis, biomarkers, medical history, environment, treatments and pre- and post-diagnosis lifestyle. In this talk, we showcase our valuable survey, present methodological challenges in studying such resulting data and its connection to Patient-Reported Outcome Measurement Information System (PROMIS)–related research, provide a glimpse of our OVA-CRADLE (Clinical Research Analytics and Data Lifecycle Environment), and then focus on our analyses and findings in the pre-diagnostic symptoms, using a combination of text mining and statistics. This work is a testimony of the collaboration by a team of researchers from multiple scientific areas and stakeholders.”

About the Symposium

This was a nerve-wracking conference for me, because one of the main speakers was my advisor, and she was presenting a large portion of my part of the project: the text mining and data visualization. My classmate Jang presented the image-data program that several of us had been working on together, which he has since taken the lead on.


Carter, R.R., DiFeo, A., Bogie, K., Zhang, G.Q., & Sun, J. (April, 2014). Crowdsourcing Awareness: A Nationwide Ovarian Cancer Survey with Amazon Mechanical Turks. Poster presentation at the Joint Biostatistics Symposium at Ohio State University in Columbus, OH.

Cho, J.I., Xu, Y., Carter, R.R., Matthiesen, J., Wang, X., & Sun, J. (April, 2014). LAAR: A New Technique for Analyzing Large Sequences of Image Data. Poster presentation at the Joint Biostatistics Symposium at Ohio State University in Columbus, OH.