# Can you explain the theory?

**1: How does SUDAAN handle singleton clusters for variance estimation using the Taylor linearization approach and resampling methods?**

SUDAAN handles a single cluster within a stratum by assuming that another cluster was in the sample but all of its data were missing. This is adequate when only a few strata have single clusters. However, this method, like any other method for handling singleton clusters when they occur in most strata, depends heavily on certain assumptions, and one should not use such procedures unless one is willing to accept those assumptions. We have found that a better approach is to collapse strata into pseudo strata so that each stratum has at least two clusters. The collapsing can be based on subjective judgement about the similarity of clusters; for example, in household samples one may use geographic proximity and urban/rural character.
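The collapsing idea can be sketched in a few lines of Python. This is only an illustration and not SUDAAN code: the function name `collapse_singletons` and the input layout are hypothetical. It assumes the strata have already been ordered so that neighbouring entries are similar (for example by geography and urban/rural character), which is the subjective step described above.

```python
def collapse_singletons(strata):
    """Merge singleton strata with a neighbouring stratum.

    strata: list of (stratum_id, [psu_ids]) pairs, assumed to be ordered so
    that adjacent strata are similar. Returns a list of
    ([original_stratum_ids], [psu_ids]) pseudo strata.
    """
    collapsed = []
    for stratum_id, psus in strata:
        if collapsed and len(collapsed[-1][1]) < 2:
            # The previous pseudo stratum is still a singleton: absorb this one.
            prev_ids, prev_psus = collapsed[-1]
            collapsed[-1] = (prev_ids + [stratum_id], prev_psus + list(psus))
        else:
            collapsed.append(([stratum_id], list(psus)))

    # A singleton left at the very end is merged backwards.
    if len(collapsed) > 1 and len(collapsed[-1][1]) < 2:
        last_ids, last_psus = collapsed.pop()
        prev_ids, prev_psus = collapsed[-1]
        collapsed[-1] = (prev_ids + last_ids, prev_psus + last_psus)
    return collapsed


# Hypothetical sample: strata s1 and s4 each contain a single PSU.
sample = [("s1", ["a"]), ("s2", ["b", "c"]), ("s3", ["d", "e"]), ("s4", ["f"])]
pseudo = collapse_singletons(sample)
```

In this example `s1` is merged with `s2` and `s4` with `s3`, so every pseudo stratum has at least two PSUs. The pseudo stratum identifier would then be used in place of the original stratum variable when specifying the design.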

SUDAAN calculates the variance contribution for each stage of the design from the squared differences between each unit's value and the mean of all the units within the stage. When a stage contains only one sample unit, that unit's value is itself the mean, so the variance contribution cannot be calculated in this manner, and SUDAAN will typically halt with an error message.
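A small Python sketch shows why a single unit is a problem. It is not SUDAAN's implementation: the function name is hypothetical, and the `n / (n - 1)` scaling is the usual with-replacement between-PSU factor, assumed here because the text does not state the exact formula.

```python
def stratum_variance_contribution(psu_totals):
    """Between-PSU variance contribution for one stratum.

    psu_totals: list of (weighted) PSU-level values in the stratum.
    Assumed formula: n / (n - 1) * sum((z_i - mean) ** 2).
    """
    n = len(psu_totals)
    if n < 2:
        # With one PSU the deviation from the stratum mean is zero and
        # n - 1 is zero, so the contribution is undefined.
        raise ValueError("stratum has only one PSU; variance contribution is undefined")
    mean = sum(psu_totals) / n
    sum_sq = sum((z - mean) ** 2 for z in psu_totals)
    return n / (n - 1) * sum_sq


two_psu_result = stratum_variance_contribution([10.0, 14.0])  # 2/1 * (4 + 4) = 16.0
```

Calling the function with a single value, such as `stratum_variance_contribution([5.0])`, raises `ValueError`, which mirrors the error condition described above.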

However, if you specify the MISSUNIT option on the NEST statement, then when only one sample unit is encountered in a stage, SUDAAN will estimate the variance contribution of that unit using the difference between that unit's value and the overall mean value for the population. For example, suppose you have a two-stage design and have specified a stratum variable and a primary sampling unit (PSU) variable on the NEST statement. Without MISSUNIT, SUDAAN will abort with an error message if any stratum contains only one PSU. With MISSUNIT, SUDAAN will calculate the mean for the entire file and base the variance contribution for that unit on the difference between the unit's value and the overall mean.
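The difference between the two behaviours can be illustrated with a Python sketch. It is a simplification under stated assumptions and not SUDAAN's actual computation: the function and its `missunit` flag are hypothetical, the overall mean is taken as the unweighted mean of all PSU values in the file, and the singleton's deviation is squared without further scaling to match the ordinary case.

```python
def variance_contributions(strata, missunit=False):
    """Per-stratum variance contributions with an optional singleton fallback.

    strata: dict mapping stratum id -> list of PSU-level values.
    missunit: if True, a singleton stratum uses the overall mean instead of
    raising an error (an assumed analogue of the MISSUNIT behaviour).
    """
    all_values = [v for values in strata.values() for v in values]
    overall_mean = sum(all_values) / len(all_values)

    contributions = {}
    for stratum_id, values in strata.items():
        n = len(values)
        if n >= 2:
            # Ordinary case: squared deviations from the stratum mean.
            mean = sum(values) / n
            contributions[stratum_id] = n / (n - 1) * sum((v - mean) ** 2 for v in values)
        elif missunit:
            # Singleton: deviation from the overall mean (scaling is an assumption).
            contributions[stratum_id] = (values[0] - overall_mean) ** 2
        else:
            raise ValueError(f"stratum {stratum_id!r} contains only one PSU")
    return contributions


# Hypothetical data: stratum "B" is a singleton.
design = {"A": [10.0, 14.0], "B": [20.0]}
with_missunit = variance_contributions(design, missunit=True)
```

With `missunit=False` the same call raises `ValueError` for stratum `"B"`. Because the fallback borrows the overall mean, the result depends on how similar the singleton is to the rest of the file, which is why it is best suited to designs with only a few singleton strata.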