Health data sources are now abundant on the Web, including patient-generated content on social networks and open government data. Healthcare is shifting from the traditional authoritative, provider-centric model toward collaborative, patient-oriented care. In this data-rich environment, stakeholders such as medical providers, patients, and government officials need to analyze these data streams both for individual patient care and for population-level trend analysis. The aim of this dissertation is to develop and apply semantic web, machine learning, and network analysis techniques to address the research challenges in social health data analytics. The specific objectives are:

(1) Semantic integration of health data: Open health data range from structured to highly unstructured. Without integration, an information seeker must visit many, possibly irrelevant, websites, then select the relevant information and assemble it into a coherent mental model. These streaming data sources therefore need to be integrated. For this purpose, a semantic framework called “Social InfoButtons” was developed to support retrieving and linking the data, reasoning over it, and analyzing health patterns.
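The abstract does not specify how Social InfoButtons represents or links its records. As a minimal sketch, assuming a simple subject-predicate-object triple representation and a hand-built term lexicon (both hypothetical, not the framework's actual design), a structured case report and an unstructured social media post can be linked through a shared disease concept. The names `LEXICON`, `to_triples`, and `linked_by_concept` are illustrative only, and no ontology reasoning is performed here.

```python
from collections import defaultdict

# Hypothetical sample records (illustrative only, not real data).
STRUCTURED_ROWS = [
    {"region": "NJ", "disease": "Influenza", "cases": 120},
    {"region": "NY", "disease": "Measles", "cases": 8},
]
UNSTRUCTURED_POSTS = [
    "Half my office is out with the flu this week",
    "Worried about measles at my kid's school",
]

# Assumed lexicon mapping surface terms to shared concept identifiers.
LEXICON = {
    "flu": "disease:Influenza",
    "influenza": "disease:Influenza",
    "measles": "disease:Measles",
}


def to_triples():
    """Convert both sources into (subject, predicate, object) triples."""
    triples = []
    for i, row in enumerate(STRUCTURED_ROWS):
        subj = f"report:{i}"
        triples.append((subj, "about", f"disease:{row['disease']}"))
        triples.append((subj, "region", row["region"]))
        triples.append((subj, "cases", row["cases"]))
    for i, text in enumerate(UNSTRUCTURED_POSTS):
        subj = f"post:{i}"
        for token in text.lower().split():
            # Strip surrounding punctuation before the lexicon lookup.
            concept = LEXICON.get(token.strip(".,!?'\""))
            if concept:
                triples.append((subj, "mentions", concept))
    return triples


def linked_by_concept(triples):
    """Group every subject under the disease concept it refers to."""
    index = defaultdict(set)
    for subj, pred, obj in triples:
        if pred in ("about", "mentions"):
            index[obj].add(subj)
    return index
```

Calling `linked_by_concept(to_triples())` groups `report:0` with `post:0` under `disease:Influenza`, so an information seeker reaches both the official count and the patient-generated post from one concept. A production system would more likely use RDF and an ontology-aware store instead of plain tuples.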

(2) Sentiment mining in social media: Tools for quantifying health-related sentiment in social media are provided to stakeholders to supplement current sentiment surveillance systems. Existing methods, such as questionnaires and clinical tests, reach only a limited number of people, and their results often arrive with long delays. A sentiment mining method based on machine learning and natural language mining techniques was developed to gauge the Degree of Concern (DOC) expressed in tweets about contagious diseases. Discovering public health concerns can help governments make timely decisions to refute rumors and prevent potential panic reactions.
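The abstract does not name the classifier behind the DOC measure. As a minimal sketch, assuming a binary "concerned" versus "unconcerned" labeling and substituting a multinomial Naive Bayes classifier with add-one smoothing (a swapped-in technique, not necessarily the author's), tweets can be classified and a hypothetical DOC score computed as the fraction of tweets predicted as concerned. The toy training tweets, the labels, and the names `NaiveBayesConcern` and `degree_of_concern` are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesConcern:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def degree_of_concern(model, tweets, concerned_label="concerned"):
    """Hypothetical DOC score: share of tweets predicted as concerned."""
    hits = sum(model.predict(t) == concerned_label for t in tweets)
    return hits / len(tweets)


# Toy, hand-labeled training tweets (illustrative only, not real data).
TRAIN_TEXTS = [
    "so scared of ebola spreading here",
    "terrified my family will catch it",
    "worried sick about the outbreak",
    "ebola vaccine trial news article",
    "reading a report on ebola history",
    "interesting documentary about the outbreak",
]
TRAIN_LABELS = ["concerned"] * 3 + ["unconcerned"] * 3

model = NaiveBayesConcern().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

With this toy model, `model.predict("really scared and worried about ebola")` should return `"concerned"`, while a news-style tweet leans `"unconcerned"`. A real DOC system would need a much larger labeled corpus and probably a graded rather than binary notion of concern.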

(3) Monitoring developing epidemics with social media: The Epidemics Outbreak and Spread Detection System (EOSDS) prototype was developed to make use of real-time information mined from Twitter. EOSDS includes interactive visual analytics components so that users who are not technically savvy can use the system easily. EOSDS provides three visualization methods for spreading epidemics (a static map, a distribution map, and a filter map) to investigate public health threats in space and time.
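The abstract does not describe how EOSDS computes its maps. One plausible backing computation for a filter map, sketched here under stated assumptions, counts keyword-matching tweets per region inside a chosen time window, and a companion function produces per-day counts for viewing spread over time. The `GeoTweet` record, the assumption that each tweet has already been resolved to a region code, the plain substring matching, and the sample tweets are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class GeoTweet:
    text: str
    region: str           # assumed: region code already resolved from geotags
    created_at: datetime


def filter_map_counts(tweets, keyword, start, end):
    """Count tweets containing `keyword` per region within [start, end)."""
    kw = keyword.lower()
    counts = Counter()
    for t in tweets:
        # Plain substring match; "flu" also matches "influenza".
        if start <= t.created_at < end and kw in t.text.lower():
            counts[t.region] += 1
    return counts


def daily_region_counts(tweets, keyword):
    """Count keyword-matching tweets per (calendar day, region) pair."""
    kw = keyword.lower()
    return Counter(
        (t.created_at.date(), t.region)
        for t in tweets
        if kw in t.text.lower()
    )


# Hypothetical sample tweets (illustrative only, not real data).
TWEETS = [
    GeoTweet("Home sick with the flu", "TX", datetime(2015, 1, 10, 9)),
    GeoTweet("Flu season is brutal this year", "TX", datetime(2015, 1, 11, 20)),
    GeoTweet("Got my flu shot today", "NY", datetime(2015, 1, 11, 8)),
    GeoTweet("Great weather for a run", "NY", datetime(2015, 1, 11, 8)),
    GeoTweet("Flu going around the office", "NJ", datetime(2015, 2, 2, 12)),
]
```

Restricting the window to January 10 through 11, 2015 yields two matching tweets for TX and one for NY, while the February NJ tweet falls outside the filter. A map layer could then shade each region by its count.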

The problems, technical challenges, solutions, and applications are summarized by the following figure:

Copyright 2011-2015

(973) 596-300. Maintained by Xiang JI