Data collection for reportable diseases and epidemics has always been a focus for local, state, and federal health agencies in the US, and of great interest to health science librarians. In recent years, a key government initiative has been to “put public health data to work” and make it as transparently available as possible to any interested entity or individual, with the larger goal of providing free and easy access to this vast reservoir of data in order to improve the nation’s health. Following this trend, on February 9, 2012, the Public Health Practice Committee of the International Society for Disease Surveillance (ISDS) hosted the webinar “Public Health Surveillance in the Internet Cloud: The BioSense 2.0 Experience,” featuring two key presenters. In the initial segment of the webinar, speaker Jeff Barr from Amazon introduced basic principles of cloud computing and related security issues. The second speaker, Mike Alletto, a member of the BioSense 2.0 Redesign Team, provided a schematic of the BioSense 2.0 environment, including details regarding data storage and transmission. The presentation, including the electronic slides, was recorded and archived for viewing.
Jeff Barr began by comparing cloud computing to a utility service: it is available on demand, and the user pays only for the resources used. He highlighted the Amazon Web Services (AWS) product and the Simple Storage Service (S3), which has been available since 2006. Since then, the number of objects stored on the service has increased immensely. Security for these systems is applied at several levels, including physical measures such as housing the cloud servers in nondescript buildings with strictly controlled access. The physical servers are located throughout the world, in eight AWS Regions, four of which are in the US. This gives the user control over where data is stored and where information is processed. The AWS GovCloud is a special region designated for US government agencies and contractors. Over 100 agencies currently use this cloud.
Mike Alletto then continued the presentation with an introduction to BioSense, a program of the Centers for Disease Control and Prevention (CDC) that tracks health problems as they evolve and provides public health officials with the data, information, and tools needed to better prepare for and coordinate responses to safeguard and improve the health of the American people. Originally mandated by the Public Health Security and Bioterrorism Preparedness and Response Act of 2002, the CDC BioSense Program was launched in 2003 to establish an integrated national public health surveillance system for early detection and rapid assessment of potential bioterrorism-related illness; it was later expanded to include disease and syndrome data. In the initial version of BioSense, hospitals and health information exchanges sent data to CDC host computers in Atlanta. After a hardware infrastructure upgrade, individual states obtained the ability to “own” their data. The system was moved to a cloud computing environment, using Amazon’s GovCloud server rather than CDC computers. After these enhancements, BioSense received security accreditation and government approval to operate, effective November 14, 2011.
The release of BioSense 2.0 transitioned away from the model of CDC owning the data. Hospitals, health exchanges, and other entities began sending data to the GovCloud server after gaining permission from state authorities. The data is then stored in a locker owned by the state. Individual states also have the option of collecting data locally and then transmitting it to their state locker. In an effort to reduce costs, another option for data storage and transmission is being piloted, in which individual state health departments use cloud computers to collect data from hospitals, health exchanges, etc., and the data is subsequently transmitted to the state locker. This approach gives state health administrators more control over their data. Once approval for sharing is granted, the data is housed in the BioSense warehouse and made available through the BioSense front end, allowing public health professionals to study trends or perform other analyses involving single or multiple jurisdictions.
BioSense operates through an open architecture that facilitates open data sharing through application programming interfaces (APIs) once state authorities grant permission. Data is stored in a shared access area and made available through the BioSense front end software in response to requests from state health authorities. One early example of a data sharing project was a collaboration between BioSense and the Tarrant County (TX) Public Health (TCPH) department, which visualized TCPH health data collected by BioSense using Google Fusion Table technology and made that visualization publicly available. The data consists of the number of patient visits to hospital emergency departments associated with TCPH that had illness categories of Gastro Intestinal, Heat Related, or Upper Respiratory, divided by all emergency department visits that occurred in the same time period and at the geographic granularity for which the calculation was made.
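The calculation described above, category visits divided by all emergency department visits for the same period and area, can be illustrated with a minimal Python sketch. This is not the BioSense or TCPH implementation. The record layout, the function name `category_proportions`, and the sample records are assumptions made up for illustration; only the three category names come from the text.

```python
from collections import Counter

# Illness categories named in the text.
CATEGORIES_OF_INTEREST = {"Gastro Intestinal", "Heat Related", "Upper Respiratory"}


def category_proportions(visits):
    """Return {(period, area, category): share of all ED visits}.

    `visits` is an iterable of (period, area, category) tuples, one per
    emergency department visit. This record layout is an assumption.
    """
    totals = Counter()   # all ED visits per (period, area): the denominator
    matches = Counter()  # visits in a category of interest: the numerator
    for period, area, category in visits:
        totals[(period, area)] += 1
        if category in CATEGORIES_OF_INTEREST:
            matches[(period, area, category)] += 1
    # Divide each category count by the total for the same period and area.
    return {key: count / totals[key[:2]] for key, count in matches.items()}


# Made-up sample records, for illustration only.
sample_visits = [
    ("2012-W05", "area-A", "Upper Respiratory"),
    ("2012-W05", "area-A", "Upper Respiratory"),
    ("2012-W05", "area-A", "Gastro Intestinal"),
    ("2012-W05", "area-A", "Other"),
    ("2012-W05", "area-B", "Heat Related"),
    ("2012-W05", "area-B", "Other"),
]
proportions = category_proportions(sample_visits)
```

Keeping the denominator keyed by period and area means the same function works at any geographic granularity, as long as every record carries the area label at that level.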
Some data comes to BioSense with ICD-9 codes. This data is normalized into syndrome and sub-syndrome levels. Data is stored in both raw and normalized formats for access through the BioSense front end. Access to the secure data is currently limited to authorized state and local health administrators, as well as CDC users. Since the value of this massive data aggregation lies in its widespread availability, future plans for BioSense include providing data to universities and other public users. As BioSense continues to evolve to meet the needs of its users, feedback and recommendations are being sought from public health officials and others, in order to develop essential requirements for the new BioSense system, and inform the overall BioSense program. For more information, please visit the BioSense Program Redesign web site.
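The normalization step mentioned above, in which ICD-9 coded records are mapped to syndrome and sub-syndrome levels while the raw form is retained, might look roughly like the following Python sketch. The lookup table, field names, and sub-syndrome labels are hypothetical and are not BioSense's actual mapping; the ICD-9 codes are used only as examples.

```python
# Hypothetical lookup table: ICD-9 code -> (syndrome, sub-syndrome).
# The groupings are illustrative and are not BioSense's actual mapping.
ICD9_TO_SYNDROME = {
    "465.9": ("Upper Respiratory", "Acute upper respiratory infection"),
    "558.9": ("Gastro Intestinal", "Gastroenteritis"),
    "992.0": ("Heat Related", "Heat stroke"),
}


def normalize(record):
    """Return a copy of the raw record with syndrome fields added.

    The raw record is left untouched so both forms stay available.
    Codes missing from the table are labeled "Unmapped".
    """
    syndrome, sub_syndrome = ICD9_TO_SYNDROME.get(
        record["icd9"].strip(), ("Unmapped", "Unmapped")
    )
    return {**record, "syndrome": syndrome, "sub_syndrome": sub_syndrome}


# Made-up record, for illustration only.
raw = {"visit_id": 1, "icd9": " 992.0 "}
normalized = normalize(raw)
```

Returning a new dictionary instead of modifying the input mirrors the idea of storing data in both raw and normalized formats.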