Data Science, Analytics and Collaboration for a Biosurveillance Ecosystem

Bibliographic Details
Published in Online Journal of Public Health Informatics Vol. 11; no. 1
Main Authors Stark, Karen; Shah, Amol; Borgman, Jacob; Somborac, Miko; Carson, Jeremy; Hauser, Lauren; Kola, Krishna; Virkar, Hermant
Format Journal Article
Language English
Published University of Illinois at Chicago Library 30.05.2019
ISSN 1947-2579
DOI 10.5210/ojphi.v11i1.9702

Abstract

Objective
While there is a growing torrent of data that disease surveillance could leverage, few effective tools exist to help public health professionals make sense of these data or to provide secure work-sharing and communication. Meanwhile, our ever more connected world provides an increasingly receptive environment for diseases to emerge and spread rapidly, making early warning and collaborative decision-making essential to saving lives and reducing the impact of outbreaks. Digital Infuzion's previous work on the Defense Threat Reduction Agency (DTRA)'s Biosurveillance Ecosystem (BSVE) built a cloud-based platform that ingests big data and applies analytics to give users a robust surveillance environment. We next enhanced the BSVE data sources and analytics to support an integrated One Health paradigm. The resulting BSVE and Digital Infuzion's HARBINGER platform include: 1) identification and ingestion of data sources that span global human, animal and crop health; 2) inclusion of non-health data such as travel, weather, and infrastructure; 3) data science tools, analytics and visualizations that make these data useful; and 4) a fully featured Collaboration Center for secure work-sharing and communication across agencies.

Introduction
After the 2009 H1N1 pandemic, the Assistant Secretary of Defense for Nuclear, Chemical and Biological Defense indicated “biodefense” would include emerging infectious disease. In response, DTRA launched an initiative for an innovative, rapidly emerging capability to enable real-time biosurveillance for early warning and course of action analysis. Through competitive prototyping, DTRA selected Digital Infuzion to develop the platform and next-generation analytics. This work was extended to enhance collaboration capabilities and to harness data science and advanced analytics for multi-disciplinary surveillance including climate, crop, and animal as well as human data.
New analysis tools ensure the BSVE supports a One Health paradigm to best inform public health action. Digital Infuzion and DTRA first introduced the BSVE to the ISDS community at the 2013 annual conference SWAP Meet. Digital Infuzion is pleased to present the mature platform to this community again, as it is now a fully developed capability undergoing FedRAMP certification with the Department of Homeland Security’s National Biosurveillance Integration Center and is the basis for Digital Infuzion's HARBINGER ecosystem for biosurveillance.

Methods
We integrated over 170 global One Health data sources using cloud-based automated data ingestion workflows that provide unified access with data provenance. We used modular automated workflows to implement data science including Natural Language Processing (NLP), machine learning, anomaly detection, and expert systems for extraction of concepts from unstructured text. A first-of-its-kind ontology for biosurveillance permits linking of data across sources. This ontology allows users to rapidly find all relevant data by looking at semantic relationships within and across data sets of varying quality, types, and usages to understand the best, most complete indicators of impending threats.

We applied the following principles to the development of data science tools: 1) mathematics should be fully automated and operate 'under the hood' without need for user intervention; 2) 'At-a-Glance' visualizations should summarize information, draw attention to key aspects and permit drill-down into underlying data; 3) data science analytics and tools need to be validated with real-world data and by disease surveillance experts; and 4) secure collaboration capabilities are essential to biosurveillance activities.

This was a highly complex effort. We worked closely with surveillance analysts from multiple agencies and organizations to continuously guide the development of capabilities.
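The ontology-based linking described in the Methods can be illustrated with a minimal sketch. This is not the BSVE implementation: the ontology entries, source names and record fields below are hypothetical, and a real biosurveillance ontology would carry far richer semantic relationships than a term-to-concept lookup.

```python
from collections import defaultdict

# Hypothetical mini-ontology: maps surface terms to a shared concept identifier.
ONTOLOGY = {
    "h1n1": "influenza_a",
    "swine flu": "influenza_a",
    "wheat rust": "puccinia_infection",
}


def link_by_concept(records):
    """Group the sources of records under the concept their term maps to."""
    linked = defaultdict(list)
    for record in records:
        concept = ONTOLOGY.get(record["term"].lower())
        if concept is not None:  # terms outside the ontology are skipped
            linked[concept].append(record["source"])
    return dict(linked)


# Hypothetical records from four different feeds.
records = [
    {"source": "human_health_feed", "term": "H1N1"},
    {"source": "animal_health_feed", "term": "Swine flu"},
    {"source": "crop_feed", "term": "Wheat rust"},
    {"source": "weather_feed", "term": "heat wave"},
]
linked = link_by_concept(records)
```

Here the human and animal feeds use different terms but end up under the same concept, which is the kind of cross-source link the abstract attributes to the ontology.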
We drew upon subject matter expertise in public health, machine learning, social media, NLP, semantics, big data integration, computational science, and visualization. We applied a high level of automation, security and immediacy of data to support rapid identification and investigation of potential outbreaks.

Results
The platform now provisions integrated One Health information. Data sources were harmonized and expanded, along with historical information, to better predict and understand biothreats. These include global social media, human, plant, animal, and weather data. An Analyst Workbench delivers logical, intuitive and interactive visualizations enabling disease surveillance professionals to identify critical, predictive information without extensive manual research. Over 700 approved users currently have access to the prototype.

Biosurveillance activities can be performed collaboratively among governmental agencies, public health officials, and the general public using the Collaboration Center and its sharing and messaging systems. Data sharing is HIPAA-compliant and distinguishes public from private data using carefully controlled and approved role- and attribute-based access for security.

To speed disease surveillance workflows, the workbench generates suggestions to the user on their current work. Anomaly detection, which alerts users to potentially developing disease events, employs fully automated analytics to conduct over 43 million calculations daily for more than 500 diseases in over 170 data sources, distilling the results into a table that ranks the most significant anomalous increases that may indicate an outbreak and warrant investigation.

A predictive disease modeling tool based on current and historical data uses fuzzy logic to identify the likeliest outcome, even early in an outbreak when there is much uncertainty about the disease and its characteristics.
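The ranking of anomalous increases described in the Results can be sketched as follows. The abstract does not state which statistic the platform uses, so this example assumes a simple z-score of the latest count against a historical baseline; the disease names, source names, counts and threshold are hypothetical.

```python
from statistics import mean, stdev


def anomaly_score(history, latest):
    """Z-score of the latest count relative to the historical baseline."""
    spread = stdev(history)
    if spread == 0:
        return 0.0  # flat baseline: this simple sketch assigns no score
    return (latest - mean(history)) / spread


def rank_increases(series, threshold=2.0):
    """Return (key, score) pairs above the threshold, largest increase first."""
    scored = []
    for key, counts in series.items():
        # The last value is the current observation; the rest form the baseline.
        score = anomaly_score(counts[:-1], counts[-1])
        if score > threshold:
            scored.append((key, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical daily counts keyed by (disease, data source).
series = {
    ("influenza", "source_a"): [10, 12, 11, 9, 10, 30],
    ("measles", "source_b"): [2, 3, 2, 3, 2, 3],
    ("cholera", "source_c"): [5, 4, 6, 5, 5, 12],
}
ranked = rank_increases(series)
```

Only increases are ranked, matching the abstract's focus on rises that may indicate an outbreak; a production system would also need to handle seasonality and sparse counts, which this sketch ignores.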
A complex automated workflow identifies health-related topics that are trending on Twitter and evaluates their severity using novel lexicons and new reactive sentiment analysis. Searches use the ontology to gather all relevant information and are supported by advanced NLP with custom surveillance rules to provide succinctly extracted information. This alleviates the need for extensive reading by identifying exactly which data are needed and extracting key concepts from them. Intuitive methods of visual representation, interactive displays, and drill-down capabilities were leveraged in all analytics for rapid understanding of results.

Finally, we added a software development kit to enable third-party developers to continuously enhance the platform capabilities by adding new data sources and new analytic apps. This allows the platform to be adapted for specific needs and to keep pace with new scientific and technical discoveries, and has resulted in over 50 analytic apps.

Conclusions
The addition of One Health data and analytics, and the integration of health data with unconventional data sources and modern approaches to data science and complex workflows, resulted in enhanced situational awareness and decision-making capabilities for users. The expanded Collaboration Center within the workbench enables users to partner and collaborate with other agencies and biosurveillance professionals both nationally and internationally to maximize the rapidity of responses to serious disease outbreaks.
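The trending-topic workflow mentioned in the Results (finding health topics that are trending and scoring their severity with a lexicon) can be sketched in a few lines. The lexicon weights, topic list, posts and the averaging rule are all hypothetical stand-ins; the authors' lexicons and reactive sentiment analysis are not described in the abstract.

```python
from collections import Counter

# Hypothetical severity lexicon (term -> weight); not the authors' lexicon.
SEVERITY_LEXICON = {"fever": 1.0, "hospitalized": 2.0, "outbreak": 2.5, "died": 3.0}
# Hypothetical list of health topics to track.
HEALTH_TOPICS = {"flu", "measles", "cholera"}


def trending_topics(posts, min_mentions=2):
    """Return topics mentioned in at least min_mentions posts."""
    counts = Counter()
    for post in posts:
        words = set(post.lower().split())
        counts.update(words & HEALTH_TOPICS)  # each post counts once per topic
    return {topic for topic, n in counts.items() if n >= min_mentions}


def severity(posts, topic):
    """Average lexicon weight per post mentioning the topic.

    Assumes at least one post mentions the topic (true for trending topics).
    """
    relevant = [p.lower().split() for p in posts if topic in p.lower().split()]
    total = sum(SEVERITY_LEXICON.get(w, 0.0) for words in relevant for w in words)
    return total / len(relevant)


posts = [
    "flu outbreak at the school",
    "my cousin was hospitalized with flu",
    "measles shot today",
    "flu season again",
]
trending = trending_topics(posts)
flu_severity = severity(posts, "flu")
```

Whitespace tokenization keeps the sketch short; real social media text would need proper tokenization and handling of hashtags, punctuation and negation.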
Copyright 2019 the author(s). ISDS Annual Conference Proceedings 2019.
License This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.
URI https://pubmed.ncbi.nlm.nih.gov/PMC6606281