Basic Survey Design Specifications
The sampling unit during sample selection refers to the entities that are selected for the survey. In this survey, the sampling units are randomly generated MPNs. For the NCD Mobile Phone Survey, stratification cannot occur until after the sample is selected. In such cases, two-phase sampling can be used to control the demographic composition of the final sample.
If sample design Option 1 is chosen, the NCD Mobile Phone Survey sample design will reflect a two-phase sample of mobile phone users from the sampling frame of possible MPNs with stratified sampling disproportionate to the MP user population in the second phase. Within each stratum, each mobile phone user linked to a single MPN will have the same probability of being selected if an equal-probability selection method (e.g., simple random sampling) is used to select the MPN in the first phase. However, the selection probabilities for users with access to multiple MPNs will be directly related to the number of mobile phones they can access (i.e., resulting in a multiplicity adjustment). Selection probabilities will also be inversely related to the number of users when multiple mobile phone users are linked to a single MPN (i.e., an MPN cluster) and only one of them is invited to be interviewed in the survey. These selection probabilities, along with the probability of selecting MPNs in the first phase, will be used in calculating sampling weights.
In the first phase for Option 1, a sample of MPNs is randomly selected from the range of possible MPNs. Respondents selected for the NCD Mobile Phone Survey are screened and assigned to one of several strata categorized by age group and sex. In the second phase, NCD data are collected on respondents within age/sex strata until individual stratum sample sizes are achieved or until the data collection period determined by countries has expired. Once the sample size is met in a specific stratum, respondents in that stratum are asked the demographic questions, but not the NCD questions, until the targeted respondent sample sizes for all strata are attained.
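The second-phase enrollment rule described above can be sketched as follows. This is a minimal illustration only: the stratum labels, target sizes, function names, and the simplification that every enrolled respondent completes the NCD questions are all assumptions made for the example, not part of the survey protocol.

```python
from collections import Counter

def modules_for_respondent(stratum, completed, targets):
    """Decide which questions a screened respondent receives.

    Simplifying assumption: a respondent enrolled in the NCD questions
    always completes them, so the stratum count is incremented immediately.
    """
    if completed[stratum] < targets[stratum]:
        completed[stratum] += 1
        return ("Demographic", "NCD")
    # Stratum already filled: collect demographic questions only.
    return ("Demographic",)

def all_strata_filled(completed, targets):
    """True once every stratum has reached its target respondent count."""
    return all(completed[s] >= n for s, n in targets.items())

# Hypothetical targets for two age/sex strata.
targets = {("18-29", "F"): 2, ("18-29", "M"): 1}
completed = Counter()

# Hypothetical stream of screened respondents, in order of contact.
stream = [("18-29", "F"), ("18-29", "M"), ("18-29", "M"),
          ("18-29", "F"), ("18-29", "F")]
assignments = [modules_for_respondent(s, completed, targets) for s in stream]
```

In a real deployment the count would be incremented only after a completed interview, and data collection would also stop when the country's data collection period expires.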
As noted in the Introduction, certain requirements and recommendations should be followed to maximize the comparability of the results between countries that are conducting the NCD Mobile Phone Survey. However, each country has the option of introducing design enhancements that allow it to increase the usability of the results from this survey (e.g., selecting the sample to ensure precise estimates by region). In this section, we present some of the basic survey design requirements. Any design enhancement that a country wants to introduce will generally be acceptable provided it does not interfere with these basic requirements and, thus, contribute to a loss of intercountry comparability of survey estimates.
For the purposes of clarity in discussion of sampling features, including sampling weights and adjustments, the demographic questions and questions targeting multiplicity and clustering will be referred to as the Demographic Module and the NCD questions will be referred to as the NCD Module.
Sample Design Features
Requirements related to the sample design and sampling weights include the following:
- Simple random selection (SRS), without replacement, should be used so that every member of the explicit or implicit sampling frame of mobile phone users has a computable, nonzero chance of being selected into the sample.
- Sample strata are constructed based on the known distribution of the general population from official population data (e.g., recent census) because the distribution of the MP user population is not known. Once data are collected, the MP user population distribution will be estimated by age and sex. This estimated MP user population distribution and the known general population distribution will be used to adjust the sampling weights.
- Randomly selected mobile phone users should be enrolled into both the Demographic Module and the NCD Module until the appropriate respondent sample size for a given stratum is met or the data collection period has expired.
- Once a stratum respondent sample size is met, members of the fulfilled stratum should continue to contribute to the Demographic Module until all strata are filled. These data on age and sex will be necessary for the post-data-collection adjustment of the sampling weights. Final recruitment status throughout data collection should therefore be retained for later use.
- Survey nonparticipation because of ineligibility and nonresponse should be tracked to properly compute response rates in the NCD Mobile Phone Survey. For instance, MPNs may not be active, a mobile phone user may be under 18 years of age and ineligible for the survey, an eligible mobile phone user may refuse to participate, or an active MPN may lose connection to the mobile platform system during an interview. Each MPN selected for the survey will be assigned a disposition code from the list of applicable codes to document eligibility and nonresponse. If local definitions of these codes are used, they should be standardized to the codes provided in this manual. Conversion rules between the local and standard definitions should also be provided.
- Clustering and multiplicity associated with mobile phone usage should also be tracked. Clustering occurs when a single MPN is shared among multiple people. Multiplicity occurs when a respondent has access to multiple MPNs. This information is used to compute/adjust the base weight. The adjusted base weight is the product of the inverse of the MPN’s selection probability, the reciprocal of the responding user’s multiplicity, and the cluster size for the MPN.
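The adjusted base weight described in the last item above can be illustrated with a short sketch. The function name, argument names, and example values are hypothetical. The calculation follows the stated rule: the inverse of the MPN's selection probability, times the reciprocal of the user's multiplicity, times the MPN's cluster size.

```python
def adjusted_base_weight(selection_prob, multiplicity, cluster_size):
    """Base weight adjusted for multiplicity and clustering.

    selection_prob: first-phase probability of selecting the MPN (0 < p <= 1)
    multiplicity:   number of MPNs the responding user can access (>= 1)
    cluster_size:   number of users who share the selected MPN (>= 1)
    """
    if not 0 < selection_prob <= 1:
        raise ValueError("selection_prob must be in (0, 1]")
    if multiplicity < 1 or cluster_size < 1:
        raise ValueError("multiplicity and cluster_size must be at least 1")
    return (1.0 / selection_prob) * (1.0 / multiplicity) * cluster_size

# Hypothetical respondent: MPN selected with probability 1 in 1,000,
# the user has access to 2 MPNs, and 3 people share the selected MPN.
w = adjusted_base_weight(selection_prob=0.001, multiplicity=2, cluster_size=3)
```

A user with a single, unshared MPN keeps the plain inverse-probability weight, since both adjustment factors equal 1.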
Sample Sizes and Expected Precision
Requirements and recommendations related to respondent sample size are based on the following indicators of statistical quality that were established for the NCD Mobile Phone Survey:
- The survey should be designed to produce estimates that meet the following precision requirements: estimates computed at the national level by age, by sex, and by the cross of sex and age should have a 95% confidence interval with a margin of error of 5 percentage points or less for NCD risk factor rates of 50%.
- The design effect (Deffo) associated with any particular estimate from a survey is defined as the multiplicative factor increase in the variance of survey estimates because of complex survey design features, such as unequal weighting and clustering. The multiplicative effect as a result of variable weights, defined here as MeffWts, is multiplied by the comparable effect for cluster sampling (MeffCS) to produce the overall value of Deffo. By definition, Deffo is the ratio of the variance of an estimate based on the complex survey design relative to the corresponding variance of the same sample size using simple random sampling. While it is theoretically possible to observe Deffo < 1.00, in practice, the complex design features of a survey nearly always have a detrimental effect on precision of the estimates. Therefore, for survey studies with complex designs, the design effects will typically be greater than 1.00, and Deffo can be much greater than 1.00.
- For the NCD Mobile Phone Survey, sample weights will be variable mostly because of sample variation in second-phase stratum sampling rates. This variability among final sample weights increases the variance of survey estimates by a factor of MeffWts, which may be approximated by $1 + CV_{Wts}^{2}$, where CVWts is the coefficient of variation among all sample weights. Previous work (Leo, 2015) suggests that MeffWts in mobile phone surveys can be sizable, with observed values of 1.8 in Zimbabwe, with about 80% mobile phone market penetration; 5.2 in Mozambique, with about 40% mobile phone penetration; 6.3 in Afghanistan, with about 60% penetration; and 11.6 in Ethiopia, with about 17% mobile phone penetration. Not surprisingly, MeffWts and mobile phone market penetration rates are inversely related because lower penetration requires more disproportionate sampling or calibration adjustments to the mobile phone user population to match relevant demographics of the NCD Mobile Phone Survey target population, thereby leading to more variable final weights.
- Most quantitative indicators of the statistical quality of estimates from sample surveys are mathematically related to the variance of estimates among all possible outcomes of the survey design. When limiting our attention to health-related estimates of some proportion (P) (informed by a country’s specific health policy and goals) in the population of those at risk of adverse health outcomes, we have that the variance of the estimator ($\hat{P}$) among all possible outcomes of the sample design is approximately
$$\mathrm{Var}(\hat{P}) \approx \mathrm{Deffo} \cdot \frac{P(1-P)}{n} \qquad (1)$$
where Deffo = MeffWts*MeffCS is the overall design effect (Gabler et al., 1999), MeffCS is the multiplicative increase in variance due to the use of cluster sampling, and n is the number of sample respondents used to produce the estimate of P. From Eq. (1) we see that statistical quality under all three sample design options would be influenced by n, the actual size of P, and the value of MeffWts associated with the particular design option. Experience (Gabler, 1999) indicates that MeffWts could be quite large in Options 1 or 2, particularly in countries with lower MP penetration. In this instance, highly disproportionate sampling or major calibration adjustments are needed for the weighted respondent sample to match the target population for subgroup comparisons, and both MeffCS and MeffWts will be more moderate in size.
- The design of the sample should correctly reflect anticipated levels of nonresponse and ineligibility in determining how many MPNs should be selected to yield the recommended number of respondents. For example, a person selected for interview may refuse to participate (nonresponse). Similarly, a selected MPN may prove to not be in service or selected persons may indicate they are less than 18 years old and are therefore ineligible. Rates of eligibility and response among eligible mobile phone users are multiplied to determine the overall attrition rate in samples. These components of attrition during sample recruitment should be estimated as accurately as possible from recent relevant survey experience, regardless of which sample design option is proposed.
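The precision and design-effect relationships in the list above can be explored numerically. The sketch below rests on two stated assumptions: the common Kish approximation MeffWts ≈ 1 + CV² for the effect of variable weights, and the variance approximation Var ≈ Deffo · P(1 − P)/n for an estimated proportion. The function names and the normal-approximation multiplier z = 1.96 for a 95% confidence interval are illustrative choices, not prescriptions from this manual.

```python
import math
from statistics import mean, pstdev

def meff_wts(weights):
    """Approximate variance inflation from unequal weights: 1 + CV^2 (assumed Kish form)."""
    cv = pstdev(weights) / mean(weights)  # coefficient of variation of the weights
    return 1.0 + cv ** 2

def margin_of_error(p, n, deff, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(deff * p * (1 - p) / n)

def required_n(p, moe, deff, z=1.96):
    """Smallest respondent count whose margin of error does not exceed `moe`."""
    return math.ceil(deff * z ** 2 * p * (1 - p) / moe ** 2)

# Respondents needed per reporting domain for a 5-point margin at P = 50%,
# first with no design effect and then with a hypothetical Deffo of 1.8.
n_srs = required_n(p=0.5, moe=0.05, deff=1.0)
n_complex = required_n(p=0.5, moe=0.05, deff=1.8)
```

Because the required sample size scales linearly with Deffo, a realistic design-effect assumption matters as much as the margin-of-error target itself.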
For example, suppose the survey in a country following Option 1 is designed to produce nf female respondents and nm male respondents, and it expects to observe the parameters shown in Table 1.
Table 1. Example parameters for sample size computation
Parameter || Description
Active Number Rate (ANR) || Accounts for MPNs selected via random digit dialing (RDD) but determined to be non-active numbers
Eligibility Rate (ER) || Accounts for respondents who are interviewed for the survey and later determined to be ineligible (e.g., they are younger than 18 years old)
Response Rate (RR) || Accounts for eligible respondents who are selected but do not complete the Demographic Module and at least one question in the NCD Module
The actual values assumed for the active number rate, eligibility rate, and response rate should be informed by the results of the pretest and by information about the population age and sex distribution obtained from certified sources (e.g., recent census, United Nations projections).
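The formula for calculating the effective sample size is as follows:
$$n_{\text{effective}} = \frac{n_f + n_m}{ER \times RR}$$
In addition, to achieve this effective sample size, we further need to determine how many MPNs to dial (nMPN) by adjusting for the active number rate (ANR), as follows:
$$n_{MPN} = \frac{n_{\text{effective}}}{ANR} = \frac{n_f + n_m}{ER \times RR \times ANR}$$
where ER, RR, and ANR are defined in Table 1.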
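As a worked illustration, the sketch below inflates the target respondent counts for expected attrition. It assumes that the effective sample size is the total target respondent count divided by ER × RR, and that the number of MPNs to dial divides that result by ANR. This form, the function name, and all input values are assumptions made for the example; actual rates should come from the pretest and other country-specific sources.

```python
import math

def mpns_to_dial(n_f, n_m, er, rr, anr):
    """Return (effective sample size, number of MPNs to dial), rounded up.

    Assumed form: n_effective = (n_f + n_m) / (ER * RR)
                  n_MPN       = n_effective / ANR
    """
    for name, rate in (("er", er), ("rr", rr), ("anr", anr)):
        if not 0 < rate <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    n_effective = (n_f + n_m) / (er * rr)
    n_mpn = n_effective / anr
    return math.ceil(n_effective), math.ceil(n_mpn)

# Hypothetical inputs: 1,500 female and 1,500 male respondents,
# 75% eligibility, 50% response, and 50% active numbers.
n_effective, n_mpn = mpns_to_dial(n_f=1500, n_m=1500, er=0.75, rr=0.5, anr=0.5)
```

Under these hypothetical rates, each completed respondent requires several dialed numbers, which is why the attrition components should be estimated as accurately as possible before fielding.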
Additional guidelines for determining an appropriate sample size at each step of the NCD Mobile Phone Survey sample design are provided in Section 8.