Methods in Measuring Surveillance Disease Data Quality, Somalia

Russell, Steven (2015)

Introduction: The ongoing conflict (2011-2015) and famine (2011-2012) in Somalia have made surveillance disease data difficult to collect. To inform effective intervention decisions in humanitarian emergencies, the data analyst needs a way to assess data quality from the database alone. The purpose of this study is to develop methods for that assessment and to combine them into a standard index of data quality.

Methods: We scored each facility from 0 to 1 on each of 10 data quality attributes: proportion of missing weeks, proportion of weeks where counts were all zeros, results of digit preference tests, results of sex ratio tests, proportion of weeks with reporting mistakes, proportion of weeks that did not sum correctly, proportion of duplicate weeks, proportion of weeks with duplicate case counts, and results for 2 different methods of outlier detection. The attribute scores were summed to give each facility an overall score out of 10. A one-way analysis of variance (ANOVA) was run to test for differences in data quality scores among the 4 zones of Somalia.
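To make the scoring concrete, the following Python sketch computes four of the ten attributes for a single facility and sums them. It is only an illustration under stated assumptions: the `WeeklyReport` fields, the `attribute_scores` function, the week numbering, and the denominators are all hypothetical, since the abstract does not specify how each proportion was calculated. Higher values mean worse quality, in line with the reported results.

```python
from dataclasses import dataclass


@dataclass
class WeeklyReport:
    """One facility-week of counts; every field name here is hypothetical."""
    week: int
    male_cases: int
    female_cases: int
    total_cases: int


def attribute_scores(reports, expected_weeks):
    """Score four of the ten attributes as proportions in [0, 1].

    Assumptions (not taken from the thesis): weeks are numbered
    1..expected_weeks, and the denominators below are plausible guesses.
    """
    if expected_weeks <= 0:
        raise ValueError("expected_weeks must be positive")

    n = len(reports)
    weeks_seen = {r.week for r in reports}
    missing = max(0, expected_weeks - len(weeks_seen)) / expected_weeks
    if n == 0:
        # With no reports at all, only missingness can be scored.
        return {"missing_weeks": missing, "all_zero_weeks": 0.0,
                "bad_sum_weeks": 0.0, "duplicate_weeks": 0.0}

    all_zero = sum(
        1 for r in reports
        if r.male_cases == 0 and r.female_cases == 0 and r.total_cases == 0
    )
    bad_sum = sum(
        1 for r in reports
        if r.male_cases + r.female_cases != r.total_cases
    )
    duplicates = n - len(weeks_seen)  # extra rows for an already-seen week

    return {
        "missing_weeks": missing,
        "all_zero_weeks": all_zero / n,
        "bad_sum_weeks": bad_sum / n,
        "duplicate_weeks": duplicates / n,
    }


# Made-up data for one facility that should have reported 5 weeks.
example_reports = [
    WeeklyReport(week=1, male_cases=3, female_cases=2, total_cases=5),
    WeeklyReport(week=2, male_cases=0, female_cases=0, total_cases=0),
    WeeklyReport(week=2, male_cases=0, female_cases=0, total_cases=0),
    WeeklyReport(week=4, male_cases=4, female_cases=1, total_cases=6),
]
scores = attribute_scores(example_reports, expected_weeks=5)
overall = sum(scores.values())  # partial index: 4 of the 10 attributes
print(scores, overall)
```

A full index would add the remaining six attribute scores in the same way, so the overall score would lie between 0 and 10.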

Results: The overall data quality score for each of our 198 facilities, as well as the facility's score on each quality attribute, was calculated and summarized. Across all facilities, the data quality scores ranged from 0 to 1.418 (mean = 0.241, median = 0.130, SD = 0.270), with higher scores indicating more severe data quality issues. On average, facilities in the Southern Zone had the worst data quality scores (mean = 0.273), while facilities in Somaliland had the best scores (mean = 0.220). We found no significant differences in data quality scores between zones (F = 0.38, p = 0.77).
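The zone comparison above rests on a one-way ANOVA F statistic. The sketch below shows how that statistic is formed from between-group and within-group variation, using only the Python standard library. The function name `one_way_anova_f` and the sample numbers are illustrative and are not the thesis data. If each of the 198 facilities contributes one overall score across the 4 zones, the degrees of freedom would be 3 and 194.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more values than groups")

    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three hypothetical groups, for illustration only.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

The p-value is omitted here because it requires the F distribution; a statistics library such as SciPy (`scipy.stats.f_oneway`) returns both the statistic and the p-value.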

Discussion: In the future, we hope to further refine the current methods using other surveillance disease datasets. We feel that our data quality index can be an extremely valuable tool in evaluating and improving surveillance systems in humanitarian emergencies. It will allow the data analyst to pick up on unusual patterns that otherwise may have remained undetected and will improve the ability of those on the ground to reduce mortality and morbidity.

Table of Contents

1. Introduction

1.1 Complex Humanitarian Emergencies

1.2 Background for Somalia

1.3 Surveillance

2. Methods

2.1 Completeness of the Data

2.2 Internal Consistency

2.3 Duplication

2.4 Outliers

2.5 Testing Differences Between Zones

3. Results

3.1 Southern Zone

3.2 Central Zone

3.3 Puntland Zone

3.4 Somaliland Zone

3.5 Overall Comparisons

4. Discussion

5. Conclusion

6. Figures and Tables

7. References

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Subfield / Discipline
  • English