Methods: United States Prevalence Outcome Platform




José A. Bartolomei Díaz, PhD


The Outcome Blogs on our website usually center on two main topics: (1) outcomes and (2) methodological tips. In the first type of post, “Outcomes”, we present outcome measures that we have calculated on a variety of important issues such as health, the economy, and unemployment. These measures are meant both to give the reader a sense of the kinds of outcome measures that can be calculated from different types of data and to stimulate conversation on a range of important issues. The second type of post, “Methodological tips”, provides guidelines on how to better understand statistical reports and shares information on other methodological concepts. All of these posts will be relatively short and easy to read. Our intent is that the knowledge you accumulate from them will help guide you on your path toward Believing using Evidence. The present post deals specifically with the methodology that we implement in our United States Prevalence Outcome Platform.

Data Source

Several types of data sources can be used to estimate the prevalence of an outcome, including systematically or routinely collected survey information, such as nationally administered health surveys. For example, you can find a list of data sets to estimate health-related outcomes here. The data source that we refer to in this post is the Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS, administered and supported by the Behavioral Surveillance Branch of the Centers for Disease Control and Prevention (CDC) since 1984, is an ongoing cross-sectional health survey system used to track health conditions and risk behaviors (Mokdad, 2009). The BRFSS is a random-digit-dial telephone survey with a complex sample design. This design includes stratification, clustering, and unequal weighting to achieve a representative sample of the adult population (18 years of age or older) living in households in the US states, the District of Columbia, and the territories (Mokdad, 2009).

Survey Questions

Individuals surveyed as part of the BRFSS are asked to self-report on topics such as demographic factors, health-related quality of life, and chronic conditions, among others. Responses to those questions are then used to calculate statistical estimates, including prevalence measures. As an example, below are some questions included in the BRFSS questionnaire that address health indicators of chronic diseases:

Has a doctor, nurse, or other health professional EVER told you that you had:

  • Heart attack: a heart attack also called a myocardial infarction?
  • Coronary heart disease: angina or coronary heart disease?
  • Stroke: a stroke?
  • Lifetime asthma: asthma?
  • Diabetes: diabetes?
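As a minimal sketch of how answers to questions like these become the case indicator behind a prevalence estimate, the Python snippet below recodes raw responses into 1 (case), 0 (non-case), or missing. The numeric codes (1 = yes, 2 = no, 7 = don't know, 9 = refused) and the function name to_indicator are assumptions made for illustration; the actual coding should be taken from the BRFSS codebook for the survey year being analyzed.

```python
from typing import Optional

# Assumed response codes (hypothetical; check the codebook for each question and year).
YES, NO = 1, 2


def to_indicator(code: Optional[int]) -> Optional[int]:
    """Map a raw response code to 1 (case), 0 (non-case), or None (missing)."""
    if code == YES:
        return 1
    if code == NO:
        return 0
    # "Don't know" (7), "refused" (9), blanks, and unexpected codes are treated as missing.
    return None


# Hypothetical raw answers to one "ever told you had ..." question.
responses = [1, 2, 2, 7, None, 1, 9, 2]
indicators = [to_indicator(code) for code in responses]

# Only informative answers enter the denominator of a prevalence estimate.
valid = [y for y in indicators if y is not None]
cases = sum(valid)
print(f"{cases} cases among {len(valid)} valid responses")
```

Treating "don't know" and "refused" as missing, rather than as "no", is a design choice: counting them as non-cases would inflate the denominator and pull the prevalence estimate down.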

Statistical Analysis

When we use BRFSS data to perform analyses, we usually calculate (although we are not restricted to) the following statistical estimates: the number of cases, the prevalence, the 95% confidence interval of the prevalence, the adjusted odds ratio, and the standard error and p-value of the adjusted odds ratio. For our analyses, we use the R environment for statistical computing and pertinent R packages (R Core Team, 2017). The complex sampling scheme of the BRFSS is taken into account using the survey package (Lumley, 2014).
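To illustrate what taking the sampling scheme into account means, here is a rough sketch, written in Python rather than with the R survey package that we actually use, of a weighted prevalence with a linearized standard error and a normal-approximation 95% confidence interval. The record layout (stratum, sampling unit, weight, outcome), the function name, and the toy numbers are all hypothetical. The variance formula assumes that primary sampling units (PSUs) are drawn with replacement within strata, and the sketch is not meant to reproduce the survey package's output.

```python
import math
from collections import defaultdict


def weighted_prevalence(records):
    """Estimate prevalence from (stratum, psu, weight, y) records, y in {0, 1}.

    Returns (prevalence, standard_error, (ci_low, ci_high)).
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p = sum(w * y for _, _, w, y in records) / total_w

    # Linearized contribution of each respondent, summed within each PSU.
    psu_totals = defaultdict(float)
    for stratum, psu, w, y in records:
        psu_totals[(stratum, psu)] += w * (y - p) / total_w

    by_stratum = defaultdict(list)
    for (stratum, _), z in psu_totals.items():
        by_stratum[stratum].append(z)

    # Between-PSU variability within each stratum, added up over strata.
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_z = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_z) ** 2 for z in zs)

    se = math.sqrt(variance)
    ci = (max(0.0, p - 1.96 * se), min(1.0, p + 1.96 * se))
    return p, se, ci


# Toy data: two strata, two PSUs each; respondents in stratum "B" carry larger weights.
sample = [
    ("A", 1, 100, 1), ("A", 1, 100, 0), ("A", 2, 100, 0), ("A", 2, 100, 0),
    ("B", 1, 300, 1), ("B", 1, 300, 1), ("B", 2, 300, 0), ("B", 2, 300, 1),
]
p, se, ci = weighted_prevalence(sample)
unweighted = sum(y for *_, y in sample) / len(sample)
print(f"weighted = {p:.3f}, unweighted = {unweighted:.3f}")
```

In this toy sample the unweighted proportion is 0.5, while the weighted prevalence is 0.625, because the respondents with the outcome represent more people in the population. Ignoring the weights, strata, or clusters can therefore distort both the estimate and its confidence interval.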

As this report is intended for a wide variety of audiences, we will define some of the statistical and epidemiological terms used throughout our work and discussions. The following definitions were obtained from A Dictionary of Epidemiology (Porta, Greenland, Hernán, Silva, & Last, 2014):

  • Prevalence: A measure of disease occurrence that quantifies the proportion of individuals in a population who have an outcome (e.g., a disease or condition). Mathematically, it is the total number of individuals who have an attribute at a particular time divided by the population at risk of having the attribute at that time or midway through the period. It describes the situation of the disease or condition at a specified point in time.
  • Odds: The ratio of the probability of occurrence of an event to that of nonoccurrence, or the ratio of the probability that something is one way to the probability that it is another way.
  • Odds Ratio: The ratio of two odds. The ratio of the odds in favor of disease among the exposed to the odds in favor of disease among the unexposed.
  • Statistical Significance: The probability of the observed or a larger value of a test statistic under the null hypothesis. Often equivalent to the probability of the observed or a larger degree of association under the null hypothesis.
  • Clinical Significance (Meaningfulness): A difference in effect size considered to be important in medical decisions, regardless of the degree of statistical significance.
  • Confounding: The distortion of a measure of the effect of an exposure on an outcome due to the association of the exposure with other factors that influence the occurrence of the outcome. Confounding occurs when all or part of the apparent association between the exposure and outcome is in fact accounted for by other variables that affect the outcome and are not themselves affected by exposure.
  • Confidence Interval: The conventional form of an interval estimate, computed in statistical analyses, based on the theory of frequency probability. If the underlying statistical model is correct and there is no bias, a confidence interval derived from a valid analysis will, over unlimited repetitions of the study, contain the true parameter with a frequency no less than its confidence level.

Simplified, prevalence indicates the number (or proportion) of individuals who match a case definition, such as a disease, at a particular time within a defined population. A 95% confidence interval is often reported along with the prevalence estimate to provide a range of values that conveys the precision of the estimate. In short, prevalence estimates are useful because knowing the number (or proportion) of individuals with the outcome of interest helps in understanding the size of the problem at hand and in allocating resources.

To better understand the outcome under observation, it is common practice to try to identify sub-groups that are more likely to have the measured outcome. Prevalence estimates can therefore also be calculated by sub-group. These sub-groups might include populations defined by distinct demographics, health-related quality of life factors, or other risk factors. Regression modeling strategies are often used to assess the risk in these sub-groups, and hypothesis testing is then employed to assess the statistical significance of the associations found.

Sometimes additional information about how the outcome is related to other health, risk, or demographic factors is desired when evaluating a population. Odds ratios are useful for describing how strongly a factor A is associated with a factor B in the population of interest, furthering one’s knowledge beyond the prevalence of an event.

Odds ratios are adjusted using multiple logistic regression models to control for the possibility of confounding. Adjustments to the odds ratio are made using socio-demographic variables, health-related quality of life factors, and other risk factors. The selection of variables used as controls in the model is not explicitly stated here.
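The definitions of odds, odds ratio, and confounding given above can be made concrete with a small worked example in Python. The counts below are hypothetical, and the adjustment shown is a simple stratified (Mantel-Haenszel) estimate, used here only because it fits in a few lines; it is a stand-in for, not a description of, the multiple logistic regression models we use in our analyses.

```python
def odds(cases: int, non_cases: int) -> float:
    """Odds of the outcome: cases divided by non-cases."""
    return cases / non_cases


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    return odds(a, b) / odds(c, d)


def mantel_haenszel_or(tables) -> float:
    """Stratum-adjusted odds ratio pooled over several 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts (a, b, c, d), split by a confounder such as age group.
strata = [
    (10, 90, 20, 180),   # younger respondents
    (80, 120, 40, 60),   # older respondents
]

# Ignoring the confounder: add the tables together cell by cell.
pooled = tuple(sum(cells) for cells in zip(*strata))  # (90, 210, 60, 240)

crude_or = odds_ratio(*pooled)
adjusted_or = mantel_haenszel_or(strata)
print(f"crude OR = {crude_or:.2f}, adjusted OR = {adjusted_or:.2f}")
# prints: crude OR = 1.71, adjusted OR = 1.00
```

Within each age group the odds of the outcome are the same for exposed and unexposed respondents (odds ratio of 1), yet the crude odds ratio is about 1.71. The apparent association arises only because exposed respondents in this made-up data are more often older, and older respondents have the outcome more often, which is the kind of distortion that adjustment is meant to remove.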

When describing or evaluating a population, several types of variables inevitably come in handy as grouping or description tools: social, demographic and health-related quality of life variables.



Socio-demographic variables that may be present in our analysis are: age categories (18-24, 25-34, 35-44, 45-54, 55-64, 65-74, and 75 years or older); gender; educational attainment (not a high school graduate, high school graduate, some years of college, and college graduate); yearly household income (<15,000, 15,000 to <25,000, 25,000 to <35,000, 35,000 to <50,000, and 50,000 or more US dollars); marital status (divorced, married, never married, separated, unmarried couple, widowed); and employment status (employed for wages, homemaker, out of work, retired, self-employed, student, unable to work).
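Grouping continuous values into categories like these is a simple lookup. The Python sketch below shows one way to do it, assuming that the oldest age group starts at 75 and that each income band includes its lower bound and excludes its upper bound. The function names and labels are hypothetical, and in practice a survey may already supply such categories as pre-computed variables.

```python
from bisect import bisect_right

# Lower bounds of every group after the first one.
AGE_BREAKS = [25, 35, 45, 55, 65, 75]
AGE_LABELS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75+"]

INCOME_BREAKS = [15_000, 25_000, 35_000, 50_000]
INCOME_LABELS = [
    "<15,000",
    "15,000 to <25,000",
    "25,000 to <35,000",
    "35,000 to <50,000",
    "50,000 or more",
]


def age_group(age: int) -> str:
    """Return the age category for an adult respondent."""
    if age < 18:
        raise ValueError("the survey covers adults aged 18 or older")
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


def income_group(income: float) -> str:
    """Return the yearly household income category (US dollars)."""
    if income < 0:
        raise ValueError("income cannot be negative")
    return INCOME_LABELS[bisect_right(INCOME_BREAKS, income)]


print(age_group(74), "|", age_group(75))
print(income_group(24_999), "|", income_group(25_000))
```

Defining the categories with non-overlapping boundaries matters: a respondent aged exactly 74, or a household earning exactly 25,000 dollars, must fall into one and only one group for the sub-group prevalence estimates to add up.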

Final Remarks

With our outcome and methodological blogs, we strive to emphasize the importance of being well informed when using data to make important inferences and decisions on any topic. Outcome Project firmly believes that having precise and factual information transforms businesses, institutions, and even the world by strengthening their capacity to make accurate decisions. You too can be part of this endeavor, and we welcome you to join us at


Ford, E. S., Mannino, D. M., Homa, D. M., Gwynn, C., Redd, S. C., Moriarty, D. G., & Mokdad, A. H. (2003). Self-reported asthma and health-related quality of life: Findings from the Behavioral Risk Factor Surveillance System. CHEST Journal, 123(1), 119–127.

Hagerty, M. R., Cummins, R. A., Ferriss, A. L., Land, K., Michalos, A. C., Peterson, M., … Vogel, J. (2001). Quality of life indexes for national policy: Review and agenda for research. Social Indicators Research, 55(1), 1–96.

Hennessy, C. H., Moriarty, D. G., Zack, M. M., Scherr, P. A., & Brackbill, R. (1994). Measuring health-related quality of life for public health surveillance. Public Health Reports, 109(5), 665.

Lumley, T. (2014). Survey: Analysis of complex survey samples. R package version 3.28-2.

Mokdad, A. H. (2009). The Behavioral Risk Factors Surveillance System: Past, present, and future. Annual Review of Public Health, 30, 43–54.

Moriarty, D. G., Zack, M. M., & Kobau, R. (2003). The Centers for Disease Control and Prevention’s Healthy Days Measures – Population tracking of perceived physical and mental health over time. Health and Quality of Life Outcomes, 1(1), 37.

Porta, M., Greenland, S., Hernán, M., Silva, I. D. S., & Last, J. M. (2014). A dictionary of epidemiology. Oxford University Press.

R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from

Strine, T. W., Chapman, D. P., Balluz, L. S., Moriarty, D. G., & Mokdad, A. H. (2008). The associations between life satisfaction and health-related quality of life, chronic illness, and health behaviors among US community-dwelling adults. Journal of Community Health, 33(1), 40–50.
