Estimating the changing prevalence of tuberculosis infection in the United States, 1971–2015

Haddad, Maryam B. (Spring 2019)


This dissertation examines the U.S. prevalence of latent tuberculosis infection (LTBI), and the relationship of LTBI with diabetes, during 1971–2015. The only LTBI test for which we have longitudinal results is the tuberculin skin test (TST), which was part of the National Health and Nutrition Examination Survey (NHANES) in 1971–1972, 1999–2000, and 2011–2012.

Based on NHANES 1971–1972, approximately 14% of the noninstitutionalized civilian adult population would have had a positive result if administered a TST, more than twice the currently estimated 3%–6%. Over the same period, the prevalence of diabetes increased sharply, and diabetes is now more prevalent than LTBI.

NHANES is designed to provide accurate and stable estimates of conditions with prevalence of ≥10%. Because NHANES samples approximately 30 counties in each 2-year cycle, a single cycle may be inadequate for uncommon health conditions with geographic variation. Additionally, TST results are missing for 1 in 5 eligible participants across all 3 cycles. We assessed several potential sources of bias in NHANES-based estimates of LTBI prevalence. We also scrutinized LTBI’s relationship with diabetes in 2011–2012. 

Back-calculating from genotyping results in the National Tuberculosis Surveillance System in 2011–2015, we derived a non-NHANES estimate of LTBI prevalence for all 3,143 U.S. counties (or equivalents). Similar to the conventional NHANES 2011–2012 estimate, our overall estimate is that 8.9 million people (uncertainty limits = 6.3–14.8 million) in the United States have LTBI.
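
The general idea of a back-calculation can be sketched as follows. This is a minimal illustration only, not the dissertation's formula (which is given in Aim 1 appendix 2): it assumes that TB cases not attributed to recent transmission arise from the infected population at a roughly constant annual reactivation rate. The function names, the county inputs, and the three rate values are hypothetical placeholders chosen for illustration.

```python
def ltbi_count(cases_per_year, annual_reactivation_rate):
    """Back-calculate the number of infected people from reactivation cases.

    Assumption: cases_per_year ~= infected_people * annual_reactivation_rate,
    so infected_people ~= cases_per_year / annual_reactivation_rate.
    """
    if annual_reactivation_rate <= 0:
        raise ValueError("annual_reactivation_rate must be positive")
    return cases_per_year / annual_reactivation_rate


def ltbi_prevalence(cases_per_year, population, annual_reactivation_rate):
    """Express the back-calculated count as a proportion of the population."""
    return ltbi_count(cases_per_year, annual_reactivation_rate) / population


# Hypothetical county: inputs are made up for illustration.
cases = 12.0          # average annual cases not attributed to recent transmission
population = 250_000  # county population

# Assumed low / central / high annual reactivation rates (illustrative values).
rates = {"low": 0.0006, "central": 0.0010, "high": 0.0014}

# A higher assumed rate implies fewer infected people for the same case count,
# so the low rate yields the upper uncertainty limit and vice versa.
estimates = {name: ltbi_prevalence(cases, population, r) for name, r in rates.items()}

for name, value in estimates.items():
    print(f"{name} rate: estimated LTBI prevalence = {value:.1%}")
```

Because the estimate scales inversely with the assumed rate, uncertainty limits obtained this way are asymmetric around the central estimate, which is the pattern seen in the limits reported above.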

We found no evidence that county sampling biased NHANES-based estimates of LTBI prevalence. Estimates changed little in an analysis that accounted for the selection of multiple participants from the same household, reclassified borderline-positive TST results, adjusted for TST item nonresponse, and considered non-U.S. birth distributions.
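
One way to see how an item nonresponse adjustment can shift a weighted prevalence estimate is sketched below. This is a simplified weighting-class adjustment used as a stand-in; the dissertation's table of contents refers to imputation, and its actual code is in SAS and SUDAAN. The records, weights, and cell labels are hypothetical, and variance estimation under the complex survey design is not addressed.

```python
from collections import defaultdict


def weighted_prevalence(pairs):
    """pairs: iterable of (weight, is_positive). Returns the weighted proportion positive."""
    pairs = list(pairs)
    total = sum(w for w, _ in pairs)
    positive = sum(w for w, is_pos in pairs if is_pos)
    return positive / total


def adjust_for_item_nonresponse(records):
    """Inflate respondents' weights within each cell to cover nonrespondents.

    records: list of dicts with 'weight', 'cell', and 'tst' (True, False, or None
    when the TST result is missing). Assumes results are missing at random
    within each cell.
    """
    cell_total = defaultdict(float)
    cell_responding = defaultdict(float)
    for r in records:
        cell_total[r["cell"]] += r["weight"]
        if r["tst"] is not None:
            cell_responding[r["cell"]] += r["weight"]

    adjusted = []
    for r in records:
        if r["tst"] is None:
            continue
        if cell_responding[r["cell"]] == 0:
            raise ValueError(f"no respondents in cell {r['cell']!r}")
        factor = cell_total[r["cell"]] / cell_responding[r["cell"]]
        adjusted.append((r["weight"] * factor, r["tst"]))
    return adjusted


# Hypothetical participants; nonresponse is concentrated in one cell.
records = [
    {"weight": 1000, "cell": "US-born", "tst": False},
    {"weight": 1000, "cell": "US-born", "tst": False},
    {"weight": 1000, "cell": "US-born", "tst": True},
    {"weight": 1000, "cell": "US-born", "tst": False},
    {"weight": 500, "cell": "non-US-born", "tst": True},
    {"weight": 500, "cell": "non-US-born", "tst": None},
    {"weight": 500, "cell": "non-US-born", "tst": False},
    {"weight": 500, "cell": "non-US-born", "tst": None},
]

complete_case = weighted_prevalence(
    (r["weight"], r["tst"]) for r in records if r["tst"] is not None
)
adjusted = weighted_prevalence(adjust_for_item_nonresponse(records))

print(f"complete-case estimate: {complete_case:.3f}")
print(f"nonresponse-adjusted estimate: {adjusted:.3f}")
```

In this toy example the two estimates differ because nonresponse is concentrated in the cell with the higher positive proportion; when nonresponse is spread evenly across cells, the adjustment changes little.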

We concluded that a conventional analysis for examining LTBI in previous NHANES cycles appears robust. On the other hand, an analysis limited to the overall association between diabetes and a positive TST in 2011–2012 would miss that the association was driven by results among Hispanic and Asian NHANES participants and thus might not generalize to the entire U.S. population.

Table of Contents

CHAPTER 1 ― Introduction

Dissertation motivation and overview of aims


Anticipated contributions and overview of this research


CHAPTER 2 ― Literature review for tuberculosis (TB) infection    

Conceptualization of TB infection             

Origins of modern understanding of latent TB infection (LTBI)     

Development of the tuberculin skin test for detecting TB infection            

Development of interferon-gamma release assays for detecting TB infection        

Typical diagnostic dichotomy between “latent” infection and “active” disease     

Dynamic state of “latent” TB infection   

Basic public health strategies for TB control and prevention        

Biologic mechanisms for relationship between TB risk and other conditions          

Consideration of risk factors in prioritizing LTBI treatment            

Recent infection as the strongest risk factor for progression        

Youth as a strong risk factor for progression        

HIV infection as a risk factor for progression and possibly for infection    

Diabetes as a risk factor for progression and possibly for initial infection

Genetic susceptibility and race as potential risk factors for either infection or progression             

Older age cohort as a risk factor for infection      

Being underweight, tall, and malnourished as risk factors for progression              

Chronic kidney disease and dialysis as risk factors for progression             

Treatment for autoimmune diseases as a risk factor for progression         

Vitamin D deficiency as an uncertain risk factor for TB     

Tobacco and other substance use as risk factors for infection and progression     

Estimating the national prevalence of TB infection           

No ongoing population-based national surveillance for LTBI         

First LTBI prevalence surveys in United States: Framingham in 1917, Minnesota in early 1930s      

LTBI prevalence among nursing students in 1943–1949: geographic variability first noted

LTBI prevalence in 1949–1951: overall declines but persistent geographic variability         

LTBI prevalence among Navy recruits in 1958–1969, 1990: differences by national origin, race     


CHAPTER 3 ― National Health and Nutrition Examination Survey (NHANES)         

NHANES overview          

Three NHANES cycles with a tuberculin skin test (TST)    

NHANES applicability for rare conditions               

Sampling design parameters: major strata and primary sampling units (PSUs)      

Unmasked versus masked design parameters    

NHANES participant weights      

American Community Survey (ACS) and Current Population Survey (CPS) population distributions              

Oversampling for the NHANES subdomains of interest   

Target population and sampling frame for the 35 PSUs selected in 1971–1972      

LTBI component of NHANES 1971–1972: primarily white U.S.-born cohort born before TB treatment available

Target population and sampling frame for the 27 PSUs selected in 1999–2000      

LTBI component of NHANES 1999–2000: a more diverse cohort born at any time during 1900s     

Target population and sampling frame for the 28 PSUs selected in 2011–2012      

LTBI component of NHANES 2011–2012: widening disparity between U.S.- and non-U.S.-born     


CHAPTER 4 ― NHANES missing data       

Unit versus item nonresponse  

NHANES design strategies to improve participation: increasing amounts of remuneration and preferential selection of households with multiple eligible participants    

Differential nonparticipation (unit nonresponse) across race/ethnic subdomains

Differential nonparticipation (unit nonresponse) based on income level 

Patterns of item nonresponse: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR)              

Missing TST results (21% item nonresponse) in NHANES 1971–1972          

Missing TST results (16% item nonresponse) in NHANES 1999–2000          

Missing TST results (22% item nonresponse) in NHANES 2011–2012          


CHAPTER 5 ― Development of external validation dataset for use in aim 1            

3,143 county equivalents in the United States    

County population size in 1970  

County population size in 2000  

County population size in 2010  

U.S. Department of Agriculture Rural-Urban Continuum Codes   

U.S. Census Bureau Small Area Income and Poverty Estimates    

National TB Surveillance System (NTSS)

Method for estimating LTBI prevalence 

County-level LTBI prevalence estimates

National map of county-level LTBI prevalence estimates.

Selected characteristics of the 1,976 rural and the 1,167 metropolitan U.S. counties by estimated prevalence of latent Mycobacterium tuberculosis infection — United States, 2011–2015.         

Conclusions about county-level LTBI prevalence estimates           



Influence of geography on national estimates of M. tuberculosis infection prevalence     

Aim 1 Abstract 

Aim 1 Background          

Aim 1 Methods

Creation of dataset with non-NHANES variables for each county

Access to restricted data

Masked analysis of the TB experience in selected versus not selected counties   

Comparison of non-NHANES genotyping-derived LTBI prevalence estimates with NHANES TST results       

Aim 1 Results    

Aim 1 Discussion             



Robustness of NHANES estimates of the U.S. prevalence of a positive tuberculin skin test

Aim 2 Abstract 

Aim 2 Background          

Aim 2 Methods

Data sources and target populations      

Outcome of interest and frequency of item nonresponse             

Statistical approach        

Conventional analysis and analysis using additional sampling design parameters

Influence of preferential selection of multiple participants from same household

Record-level reclassification to address digit preference for borderline-positive TST results           

TST item nonresponse patterns

Participant profiles based on NHANES analytic subdomains          

Aim 2 Table 1. Unweighted participation in tuberculin skin test (TST) component of the National Health and Nutrition Examination Survey (NHANES) Examination, by response to interview questions — 1999–2000 and 2011–2012.

Aim 2 Table 2. Stratified participant profiles created for this NHANES analysis, showing weighted TST results, including effects of reclassification of borderline-positive and adjustments for missing TST results, 1999–2000 and 2011–2012.

Imputation for TST item nonresponse    

Birth distributions among Hispanic and Asian participants             

Aim 2 Results    

Similar estimates with masked and unmasked datasets, and with additional design parameters   

No evidence of bias due to preferential selection of households

Patterns of TST item nonresponse as a potential source of bias   

Modest effect from reclassification of borderline-positive TST results      

Modest effects of imputation for TST item nonresponse

No need to reweight for non-U.S. birth among Hispanic and Asian participants

Aim 2 Figure. Pooled 95% confidence intervals and point estimates for prevalence of tuberculin skin test (TST) ≥10 mm in overall U.S. noninstitutionalized civilian population based on NHANES in 1971–1972, 1999–2000, and 2011–2012.               

Aim 2 Discussion             



Aim 3 Background          

Aim 3 Objective

Aim 3 Methods

Conceptual approach    

Data source and study population            

Definition of exposures and outcomes of interest             

Statistical approach        

Determine unweighted and weighted prevalence of diabetes, prediabetes, and a positive TST      

Arrange participants by age group, diabetes status, and TST results — overall and stratified by race/ethnicity

Consider influence of birthplace on diabetes/LTBI association     

Determine odds ratios for association between diabetes/prediabetes and a positive TST 

Rationale for adjusting for age when examining diabetes/LTBI association in NHANES

Aim 3 Results

Unweighted and weighted prevalence of diabetes and prediabetes, and of a positive TST

Association between diabetes/prediabetes and LTBI only apparent among Hispanic and Asian participants

Aim 3 Discussion             



Summary of findings and public health implications         

U.S. population changes and NHANES changes, yet persistent inequities in LTBI distribution          

Contribution to the field and future directions   

An example of quantitative bias analysis applied to measures of prevalence         

Consideration of race/ethnicity when examining diabetes/LTBI association           

Awareness of caveats to consider when relying on NHANES data

Alternative to NHANES for estimating LTBI prevalence for any given jurisdiction  

TB need not remain “a manifestation of social misery”   



References first mentioned in chapter 1

References first mentioned in chapter 2

References first mentioned in chapter 3

References first mentioned in chapter 4

References first mentioned in chapter 5

References first mentioned in chapter 6

References first mentioned in chapter 7

References first mentioned in chapter 8

References first mentioned in chapter 9



Literature review appendix 1. Positive TST prevalence among nursing students by home state — 1943.     

Literature review appendix 2. Positive TST prevalence among Naval recruits by home county — 1958–1965.

Literature review appendix 3. Positive TST prevalence among men aged 17–21 as they entered the U.S. Navy, stratified by their home county and race — 1958–1961.     

NHANES appendix 1. Overview of the 3 NHANES cycles with a TB component.     

Overview of design changes.      

76 analytic subdomains for NHANES 1999–2000.                

97 analytic subdomains for NHANES 2011–2012.                

Major data elements and participation overview across all 3 NHANES cycles.        

NHANES appendix 2. Data dictionary for public-use and restricted variables.        

NHANES public-use dataset variables.    

NHANES restricted variables available only in the NCHS Research Data Center.    

NHANES appendix 3. NHANES sampling frame details.    

NHANES instructions for how to collapse and recode 1971–1972 study locations to enable accurate variance estimation.        

NHANES sampling frame for 2011–2012.

Map showing the 14 major strata used in the design of NHANES 2011–2012.         

Missing data and NHANES 1971–1972 appendix.

TST completeness by exam date and study location, as recorded in the NHANES 1971–1972 public-use dataset.

Demographic characteristics of NHANES participants with and without TST results, 1971–1972.

Overview of TST results in 1971–1972.    

Overview of diabetes screening results in 1971–1972.     

TST results among NHANES 1971–1972 adult participants by diabetes status and race.    

Aim 1 appendix 1. County-level variables developed for Aim 1 in the NCHS Research Data Center.              

Aim 1 appendix 2. Formula and examples of method for back-calculating an estimate for the prevalence of latent Mycobacterium tuberculosis infection.  

Aim 1 appendix 3. Demographic characteristics pertinent to TB, and historic, recent, and modern TB disease incidence, among all 3,143 counties during NHANES 1971–1972, 1999–2000, and 2011–2012.

Aim 1 appendix 4. Median TB disease incidence, by NHANES 2011–2012 major strata, PSUs, and counties.

Aim 2 appendix 1. Unweighted TB infection test results by TB history — as recorded in NHANES public-use datasets, 1999–2000 and 2011–2012.        

Aim 2 appendix 2. Influence of using different NHANES sampling design parameters on the estimated population prevalence of a TST result ≥10 mm.

Aim 2 appendix 3. SAS and SUDAAN code for replicating Aim 2 analysis.

Aim 3 appendix 1. Effects of different criteria to define diabetes and prediabetes using NHANES data.      

Number of NHANES participants (all ages) classified as having diabetes, prediabetes, or neither for aim 3.              

Aim 3 Table 1. Demonstration of sensitivity to various definitions of diabetes and prediabetes among NHANES participants aged ≥20 years, 1999–2000 and 2011–2012.               

Aim 3 Table 2. Weighted prevalence of diabetes and prediabetes among NHANES participants aged ≥20 years, 1999–2000 and 2011–2012.     

Aim 3 Table 3. Positive TST prevalence among NHANES participants aged ≥20 years, 1999–2000 and 2011–2012, based on TST results in public-use datasets and after record-level reclassification of 39 borderline-positive TST results in 2011–2012.        

Aim 3 stratified tables — positive TST and IGRA results stratified by diabetes status and race/ethnicity among NHANES 1999–2000 and 2011–2012 participants aged ≥20 years.

Stratified tables for black non-Hispanic NHANES participants in 1999–2000 and 2011–2012.

Stratified tables for Hispanic NHANES participants in 1999–2000 and 2011–2012.

Stratified tables for Asian NHANES participants in 2011–2012.

Stratified tables for white non-Hispanic NHANES participants in 1999–2000 and 2011–2012.

Aim 3 Table 4. Odds ratios for association between diabetes and a positive TST among NHANES participants aged ≥20 years, 1999–2000 and 2011–2012.

Aim 3 Table 5. Odds ratios for association between prediabetes and a positive TST among NHANES participants aged ≥20 years, 1999–2000 and 2011–2012.

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
  • English