Are you having difficulties installing SUDAAN? Get answers to common problems.
This message indicates that your default SASV9.CFG file has not been updated properly during the installation process. The complete path to the directory containing the SUDAAN .DLLs (Windows) or program files (Solaris) must be included in the –PATH statement at the end of the SASV9.CFG file. Note that if you have previous installations of SUDAAN, this reference should precede any older references, or SAS will continue to pick up the older version.
-PATH (
  "other path"
  "path to latest SUDAAN program files"
  "path to older SUDAAN program files"
)
Not sure where your default SASV9.CFG file is located?
Many users have more than one SASV9.CFG file. Windows users: If you execute SAS from a shortcut, right-click on the shortcut, choose Properties, and then select the Shortcut tab. In the Target edit box you will find the command line used to execute SAS. The -config parameter gives the location of the default configuration file. If you execute SAS from the Run command, examine the command line you use to determine the location of the default configuration file. Solaris users: check with your system manager to obtain the location of this file.
SUDAAN cannot find the SUDAAN.EXP file that should be located in the same directory with the SUDAAN program files. First determine whether this file has in fact been created in the SUDAAN program directory. If it is not present, you may need to rerun SUDRENEW. If it is present, then make sure that the SUDLIB environment variable has been correctly set in the line in the .cshrc or .login file:

C shell users:

Add the following line to the .cshrc or .login file:
setenv SUDLIB /chosen_path/SUDAAN_9/bin

Bourne or Korn shell users:

Add the following lines to the .profile file:
SUDLIB=/chosen_path/SUDAAN_9/bin
PATH=$PATH:$HOME/bin:/chosen_path/SUDAAN_9/bin
export SUDLIB PATH

NOTE: You must log out and then log in again for this setting to take effect.
The Unix environment variable SUDLIB is not set correctly. In cases where the Unix environment is set up via scripts, these scripts override the variable setting performed during SUDAAN installation.
Modify the scripts you are executing so that the environment variable SUDLIB, required for the SAS-Callable version, is set correctly.
When you are asked to select your SUDAAN.EXP, open Programs, then SUDAAN, then Release 903, and one of the following folders should be in Release 903.
Directory Location of SUDAAN.EXP  SUDAAN Version 
Program Files\SUDAAN\Release90x\DOS  For DOS Version 
Program Files\SUDAAN\Release90x\WinInd  For Standalone Individual User Version 
Program Files\SUDAAN\Release90x\WinNet  For Standalone LAN User Version 
Program Files\SUDAAN\Release90x\SAS8Ind  For SAS8 Callable Individual User Version 
Program Files\SUDAAN\Release90x\SAS8Net  For SAS8 Callable LAN User Version 
Program Files\SUDAAN\Release90x\SAS9Ind  For SAS9 Callable Individual User Version 
Program Files\SUDAAN\Release90x\SAS9Net  For SAS9 Callable LAN User Version 
Select the appropriate folder that corresponds with the version of SUDAAN you are renewing. When you open that folder, your SUDAAN.EXP should be on the right. Click on it, and the OK button will be highlighted. You should then be able to continue.
SUDAAN is compatible with SAS 9.2. If you upgrade SAS after SUDAAN has been installed on your computer, you will need to remove and reinstall SUDAAN to update the SASV9.CFG file with the correct path. If you previously had 32-bit SAS and have upgraded to 64-bit SAS, you will need a 64-bit SUDAAN license to remain compatible: SAS and SUDAAN must both be 32-bit or both be 64-bit. The 32-bit and 64-bit SUDAAN Product Keys are not interchangeable, so you will need to transfer your SUDAAN license to the appropriate platform. There is no charge associated with this transfer for Annual Licenses.
Browse examples of SUDAAN programs using NCHS data, replicate weight designs, Taylor series designs and more.
SUDAAN is commonly used to analyze data from such NCHS studies as the National Health Interview Survey (NHIS), National Health and Nutrition Examination Survey (NHANES), National Survey of Family Growth (NSFG) and many more. The NCHS web site contains many examples. Below are some links to useful information:
Variance Estimation for Person Data Using the NHIS Public Use Person Data Tape, 1995. (pdf)
Sample Design, Sampling Weights, Imputation, and Variance Estimation in the 1995 National Survey of Family Growth (NSFG). This is a PDF document that includes several example programs using SUDAAN to analyze NSFG data. (Use Adobe Acrobat Reader to view.)
You can also search for further information and examples about using SUDAAN with NCHS and CDC data by using either the NCHS Web Search or the CDC Web Search sites. Enter SUDAAN as the search word and click SEARCH.
Click here to see our collection of SUDAAN examples. Each example is packaged as a downloadable self-extracting executable. Each executable contains both standalone and SAS-Callable versions of the program, data and output for the procedure.
SUDAAN uses a robust variance estimator that implicitly accounts for any number of stages of nesting. This approach is used in all SUDAAN procedures and for all variance estimates, so hypothesis tests and confidence intervals also account for all stages of nesting. The critical assumption is that the primary, or first-stage, clusters are independent of each other. If this is true, then any number of nesting stages can occur within the primary clusters and SUDAAN will yield valid inferences.
In the above situation, use the DESIGN=WR option on the PROC statement. With this option, you only need to identify the primary clusters. On the NEST statement, list any strata or blocks used to collect the data, followed by the primary clusters. For example, consider a dental experiment where measurements are taken on each tooth surface from all teeth from a set of patients. Here there are three stages of nesting: tooth surfaces within teeth within patients. Patients are the primary clusters, which are independent of each other. Nested within each patient are teeth, and surfaces within teeth. For such a study, you would usually use DESIGN=WR with "NEST _ONE_ PATIENT;". The SUDAAN keyword _ONE_ indicates that there was no stratification or blocking, so all patients are in one stratum. The variable PATIENT identifies each patient (the PSU) by taking on a common value for all of the observations from a single patient. The data must be sorted by PATIENT. This is all that SUDAAN needs to know in order to calculate valid variance estimates and hypothesis tests.
Other examples are:

Teratology studies with pups nested within litters.
 NEST _ONE_ LITTERID;

Educational studies with students nested within classrooms nested within schools. The schools were stratified by region of the country.
 NEST REGION SCHOOLID;

A clinical study with longitudinal repeated measurements nested within body sites (e.g., eyes), nested within patients.
 NEST _ONE_ PATIENT;
Yes! Beginning with Release 8.0 you can analyze data with Jackknife weights already computed on the main (or auxiliary) data set. Specify DESIGN=JACKKNIFE on the PROC statement and the Jackknife weights on the JACKWGTS statement. See Chapter 3 of the SUDAAN User’s Manual for details on using this new feature.
Get examples and information on using SAS data in Standalone SUDAAN
To read SAS data within Standalone SUDAAN you must create a SAS XPORT data set. This contains exactly the same information as the original data set, just stored in a different way. Here are the steps:
Sample SAS code for creating a dataset in XPORT format:
libname out xport "c:\sasfiles\terata.xpt";
libname in "c:\sasfiles";
data out.terata;
set in.terata;
run;
This will create a data set TERATA.XPT in XPORT format.
If you have a format library, you can create a format catalog, also in XPORT format as the following sample SAS code shows:
libname library "c:\sasfiles";
libname out "c:\sasfiles";
proc format library=library cntlout=out.fmtlib;
run;
libname xout xport "c:\sasfiles\teratalev.xpt";
data xout.teratalev;
set out.fmtlib;
run;
You can then use these files in SUDAAN as the following example SUDAAN code shows:
proc logistic data="c:\sasfiles\terata.xpt" filetype=sasxport levfile="c:\sasfiles\teratalev.xpt";
format dose_5 dose_5.;
...Additional logistic statements ...
Beginning with Release 7.5.4 for PCs and Release 7.5.5 for Solaris, you can write your Standalone SUDAAN output in a SASXPORT data set, saving all of the format labels in a companion SAS FORMAT CATALOG in SASXPORT format, and then import these into SAS for further analysis. The following example shows how to do this in Standalone SUDAAN:
proc logistic data="c:\sasfiles\terata" filetype=sasxport levfile="c:\sasfiles\terlev.xpt" noprint;
nest _one_ dam;
weight _one_;
subgroup dose_5;
format dose_5 dose_5.;
levels 5;
reflevel dose_5=1;
model dead=dose_5;
output / predicted=all filetype=sasxport filename="c:\sasfiles\terpred" levfile="c:\sasfiles\terprlev" replace;
The following SAS statements import the format catalog created by SUDAAN, as well as the output data set terpred, and use them in an analysis.
libname xin xport "c:\sasfiles\terprlev.stx";
libname out "c:\sasfiles";
data out.terprlev;
set xin.terprlev;
proc format cntlin=out.terprlev;
run;
libname xdin xport "c:\sasfiles\terpred.stx";
proc descript data=xdin.terpred filetype=sasxport design=wr;
nest _one_ dam;
weight _one_;
var expected;
print nsum wsum mean semean;
run;
SAS has two export file types: XPORT (older) and CPORT. SUDAAN can only read the XPORT format, which is in the public domain. The CPORT format is a SAS proprietary format to which we do not have access. Are you sure you are using the XPORT engine to create the output file? The following sample SAS code shows how to convert a SAS dataset to XPORT format:
libname out xport "c:\sasfiles\testdat.xpt";
libname in "c:\sasfiles";
data out.testdat;
set in.testdat;
run;
Beginning with Release 7.5.4 of standalone SUDAAN for PCs and release 7.5.5 of Standalone SUDAAN for Sun/Solaris, SUDAAN can directly read data files created by SAS in XPORT format, and can also write files in the SAS XPORT format. For both reading and writing these files simply use "FILETYPE=SASXPORT" on the PROC or OUTPUT statements. Note that the SASXPORT file type is the XPORT file type supported by SAS, which has been recently adopted by the FDA as a standard for data submissions. The structure of this file type is fully documented at SAS Institute's web site. This is an important new feature for users of Standalone SUDAAN who wish to use data from SAS or to do further analysis of SUDAAN results within SAS, since SAS no longer exports version 6.04 SAS data sets.
The SAS XPORT file format does not support the longer variable names and labels that are available in SAS beginning with Version 7. Within SAS you cannot write variables with long names and/or labels to an XPORT file. In SUDAAN, however, you can, because SUDAAN automatically creates a second file, also in XPORT format, that links the long and short forms of variable names and labels. On output to a SASXPORT file, SUDAAN truncates long variable names and labels in a unique way so that different variables always have distinct names within the limit of 8 characters. Whenever there is any truncation, SUDAAN saves a separate data set, also in SASXPORT format, whose records link the old and new variable names and labels. In addition, whether or not there are any long variable names or labels, SUDAAN preserves any labels associated with the variables in a separate SASXPORT file. This file has the same structure as a SAS format catalog that has been saved in XPORT format. This means that SAS users can easily import their labels from SUDAAN into SAS along with their data.
To support these changes, two new options are now available on both the PROC and OUTPUT statements when FILETYPE=SASXPORT is used:
NAMEFILE=filename  
On the PROC statement use this parameter to specify the name of a file containing records associating long variable names and labels with their shortened versions on an input file in SASXPORT format. If this parameter is not supplied, SUDAAN will look for a file named . If SUDAAN cannot find a name file, whether the name is supplied explicitly or by default, or if the (long) variables used within the SUDAAN program are not on the given name file, then SUDAAN will not recognize them when reading the main data file in SASXPORT format.
On the OUTPUT statement use this parameter to specify the name of a file in SASXPORT format to contain any records associating long variable names and labels with their shortened versions on the main output file. If this parameter is not supplied, and there are one or more variable names or labels which have to be truncated, SUDAAN will use the name NAMEFILE.STX. Note, however, that in no case will SUDAAN overwrite a file already using the specified filename unless the REPLACE option is present on the OUTPUT statement. 

LEVFILE=filename  
On the PROC statement use this parameter to specify the name of a file containing records with SAS-style format information. If this parameter is not supplied, SUDAAN will look for a file named LEVFILE.STX. If SUDAAN cannot find a level file, whether the name is supplied explicitly or by default, then SUDAAN will not be able to use formats named on FORMAT statements within your SUDAAN program.
On the OUTPUT statement use this parameter to specify the name of a file in SASXPORT format to contain SAS-style formats to be associated with variables on the main data set. If this parameter is not supplied, and there are one or more variables with associated level labels, SUDAAN will use the name LEVFILE.STX. Note, however, that in no case will SUDAAN overwrite a file already using the specified filename unless the REPLACE option is present on the OUTPUT statement. SUDAAN will create a name for each format that is at most 8 characters in length, contains only '_' and alphanumeric characters, and does not begin with a numeric digit. This name will be as close to the name of the variable as possible.

There are two common extensions for files in SAS XPORT format: '.stx' and '.xpt'. To make sure SUDAAN reads or creates the file with the extension you intend, include it in the file specification in any of the following parameters on the PROC statement:
DATA PSUDATA REPDATA NAMEFILE LEVFILE 

and include it in the file specifications in the following parameters on the OUTPUT statement:  
FILENAME NAMEFILE LEVFILE 
If you do not specify an extension, SUDAAN will use the extension '.stx' by default on input and on output. On input, if SUDAAN supplies '.stx' and does not find an input data set by the given name, it will try supplying the extension '.xpt' before giving up.
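The extension rule above can be sketched in a few lines of Python. This is only an illustration of the lookup order as described in the text, not SUDAAN's own code; the function names resolve_input and resolve_output are hypothetical, and the exists argument is a stand-in for a real file-system check.

```python
import os

def resolve_input(path, exists=os.path.exists):
    """Return the input file name that would be opened, or None if not found.

    An explicit extension is used as given. With no extension, '.stx' is
    tried first and '.xpt' second, following the rule described above.
    """
    _, ext = os.path.splitext(path)
    if ext:
        return path if exists(path) else None
    for candidate in (path + ".stx", path + ".xpt"):
        if exists(candidate):
            return candidate
    return None

def resolve_output(path):
    """Return the output file name: '.stx' is appended when no extension is given."""
    _, ext = os.path.splitext(path)
    return path if ext else path + ".stx"

# Hypothetical directory that contains only terata.xpt
present = {"terata.xpt"}
print(resolve_input("terata", exists=present.__contains__))  # terata.xpt
print(resolve_output("terpred"))                             # terpred.stx
```

In this sketch, an input name given without an extension still finds terata.xpt, while an output name given without an extension becomes terpred.stx, so a later SAS libname statement must point to the '.stx' file.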
Get examples and information on using SAS data in SAS-Callable SUDAAN
SAS users will want to get the SAS-Callable version of SUDAAN, which is installed as an add-on to SAS on your Windows or Sun/Solaris computer system. Thus, you execute SUDAAN procedures within your SAS programs just like any SAS procedures. SUDAAN can read and write any data file that your SAS system can read or write. In addition, you can use the SAS Macro Facility, and formats created by SAS PROC FORMAT, within your SUDAAN procedures. This makes all SUDAAN procedures as easy to use as SAS procedures. However, all versions of SUDAAN are developed and sold by the Research Triangle Institute. Please see Order to obtain SUDAAN ordering information. SAS is a product of the SAS Institute, Inc.
SAS-Callable SUDAAN cannot currently accept the SAS ODS options for output. This is because support for this option is not currently available in the SAS Toolkit software we use to produce the SAS-Callable version of SUDAAN.
SAS has a procedure named LOGISTIC, so there is a name conflict. To use SUDAAN's LOGISTIC procedure in SAS you must use the procedure alias, RLOGIST. There are other syntax changes you must make when you use SAS-Callable SUDAAN. There is a complete list in the appendix of the SUDAAN manual.
Get examples and information on using SPSS data in Standalone SUDAAN
Yes! The Standalone version of SUDAAN Release 7.5 and above on PCs and Solaris can read SPSS data files. Thus, you can use SPSS to manage your data files and easily use SUDAAN to analyze your SPSS data sets. With SUDAAN, you can analyze your SPSS data sets while properly accounting for the complex sampling plan used to collect your data. SUDAAN allows you to apply survey data analysis methods, generalized estimating equations (GEE), linear regression, multinomial logistic regression, log-linear regression for count data, survival analysis (Cox regression) and descriptive statistics, all for cluster-correlated or longitudinal data, which are not available directly from SPSS.
Beginning with Release 8.0.0 of SUDAAN, Standalone PC versions of SUDAAN can write SPSS data sets. Simply specify FILETYPE=SPSS on the OUTPUT statement. Thus you can directly import your SUDAAN results into SPSS for further analysis. Standalone Solaris users of SUDAAN will need to create a text output file (FILETYPE=ASCII) and then import this into SPSS.
Standalone SUDAAN (versions 7.5 and later) can read data sets from SPSS Windows Versions 5.x, and above. Beginning with Release 8.0, standalone PC versions of SUDAAN can also write SPSS Windows data sets. No documentation files are required, although SUDAAN allows you to supply a LEVEL file for labeling the levels of categorical variables. Indicate to SUDAAN that your input and/or output data are an SPSS data set by using the FILETYPE=SPSS parameter on the PROC and OUTPUT statements.
SUDAAN depends on the SPSS-supplied SPSSIO32.DLL executable file and SPSSIO32.LIB library file for the routines that read and write SPSS data files. The versions of these files linked with Release 10 of SUDAAN support reading and writing of SPSS data through Version 17 of SPSS. They may or may not support reading of higher-numbered versions of SPSS data in the future. If your release of SUDAAN is unable to read your SPSS data set, try saving it from SPSS as an earlier version file type before using it in SUDAAN.
Are you getting a warning or error message you don’t understand? Get more information.
SAS has a procedure named LOGISTIC. To use SAS-Callable SUDAAN's LOGISTIC procedure you must use the procedure alias, RLOGIST.
There are some models and data for which one or more of the parameters are logically infinite. Correspondingly, the probabilities for some or all of the observations become 0 or 1. The process cannot converge in these cases, and may produce floating point divide errors, exponential overflow errors, or other unpredictable results. SUDAAN now analyzes the data before fitting the model in PROCs LOGISTIC and MULTILOG and removes records that are logically associated with infinite betas. In some cases where the number of observations in a cell in the table created by crossing the response variable with the independent effect is very small but nonzero, even removing these records may not completely alleviate the problem. This is the reason for this warning. We recommend that in cases such as this you first run PROC CROSSTAB with a table of the form DEPVAR*(independent effects), and print NSUM and WSUM for each cell. Consider removing observations associated with near-zero cells in the table, and/or removing these terms from the model.
Example:
Suppose you have the following statements in PROC LOGISTIC, and you are getting the warning message above.

PROC LOGISTIC DATA=mydat FILETYPE=SAS DESIGN=WR;
NEST STR PSU;
WEIGHT WGT;
SUBGROUPS A B C;
LEVELS 2 3 4;
MODEL Y = A B A*C;
Execute the following PROC CROSSTAB:

PROC CROSSTAB DATA=mydat FILETYPE=SAS DESIGN=WR;
NEST STR PSU;
WEIGHT WGT;
RECODE Y = (0 1);
SUBGROUPS A B C Y;
LEVELS 2 3 4 2;
TABLES Y*(A B A*C);
PRINT NSUM WSUM;
This message is printed during execution of the modeling procedures when the rank of the estimation matrix (MODELCOV in REGRESS, LOGISTIC, LOGLINK, and MULTILOG) is less than the maximum number of estimable parameters for the model. Consider eliminating one variable at a time from the model to determine which variable(s) are causing the problem.
During the processing of your data, SUDAAN encountered a stratum that contains only one unique PSU. In this example, SUDAAN determined that the stratum coded as STRVAR=2 contained only one PSU, coded PSUVAR=17.
Two or more PSUs are needed in each stratum in order for SUDAAN to be able to estimate variance. You can verify whether this is the case by looking at a crosstabulation of your strata and PSU variables.
If each of your strata contains two or more unique PSUs, then you may need to sort your data in the order given by the variables on the NEST statement. This sorting needs to be done prior to running SUDAAN.
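As a quick check outside of SUDAAN, the two conditions above (at least two PSUs per stratum, and data sorted by the NEST variables) can be sketched in Python. This is a minimal illustration under stated assumptions: each record is reduced to a hypothetical (stratum, PSU) pair, and the function names are invented for the example.

```python
from collections import defaultdict

def single_psu_strata(records):
    """Return {stratum: psu} for every stratum that has only one distinct PSU."""
    psus = defaultdict(set)
    for stratum, psu in records:
        psus[stratum].add(psu)
    return {s: next(iter(p)) for s, p in psus.items() if len(p) == 1}

def is_sorted_by_nest(records):
    """True if the records are in nondecreasing (stratum, PSU) order."""
    return all(a <= b for a, b in zip(records, records[1:]))

# Hypothetical data: stratum 2 contains only PSU 17
records = [(1, 11), (1, 12), (2, 17), (2, 17), (3, 21), (3, 22)]
print(single_psu_strata(records))  # {2: 17}
print(is_sorted_by_nest(records))  # True
```

If the first function returns an empty result but SUDAAN still reports a single-PSU stratum, the sort order is the next thing to check.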
If one of your strata does contain only one PSU, then the following are possible solutions:

1. If you subsetted the data in order to obtain estimates for a specific domain, you should instead try using SUDAAN's SUBPOPN statement to specify the domain. You should not subset your data prior to running SUDAAN since this can result in a loss from the data file of part of the design.
2. Manually combine or collapse the stratum that contains only one PSU with another stratum. This should be done prior to running SUDAAN. You may want to consult with a statistician prior to combining strata.
3. Use the MISSUNIT option on the NEST statement. This option causes the variance contribution of the PSU to be estimated as the difference between the PSU's value and the overall mean value for the population. Chapter 3 of the SUDAAN User's Manual gives further information about this option. You may want to consult with a statistician prior to using this option.
This warning indicates that for your data there is a possible problem with the inversion of the X'X matrix. The model that you specify yields an X matrix that is not numerically stable.
The estimate of the variance-covariance matrix for all 40 parameters is likely to be inaccurate and misleading. The true variance-covariance matrix will have rank=40, whereas the estimated matrix will have at most rank=32. Your data do not support such a large model.
The message "No Data on File" usually indicates that SUDAAN has not found any valid data on file. Remember that SUDAAN rejects records on which the WEIGHT variable is nonpositive. In addition, in the modeling procedures SUDAAN rejects all records on which any model variable (left- or right-hand side) is missing. SUBGROUP variables outside the range of 1…LEVEL are considered missing.
Suppose you have the following program and you are getting the "No Data on File" Message:

PROC MULTILOG DATA=in.test FILETYPE=SAS DESIGN=WR;
NEST STRATUM PSU;
WEIGHT WGT;
SUBGROUP SEX AGEGRP EDUC OVERWT;
LEVELS 2 6 5 2;
MODEL OVERWT = SEX AGEGRP EDUC;
PRINT BETA SEBETA;
You can check to see whether you have these sorts of problems by executing a program similar to the following RECORDS procedure:
 PROC RECORDS DATA=in.test FILETYPE=SAS CONTENTS COUNTREC;
/* SUBPOPN statement selects only records which SUDAAN can use in the model */
 SUBPOPN WGT>0 & SEX>0 & SEX<=2 & AGEGRP>0 & AGEGRP<=6 & EDUC>0 & EDUC<=5 & OVERWT>0 & OVERWT<=2;
/* Print the first few valid records */
 PRINT / MAXREC=20;
If printing the data does not clarify the problem, then send the program and data (zipped!) to sudaan@rti.org so that we can help you.
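The same screening can be sketched outside of SUDAAN to see which variable is eliminating records. The Python below is a rough illustration of the rejection rules stated above (nonpositive weight, SUBGROUP values outside 1…LEVEL); the record layout, the LEVELS dictionary, and the function names are assumptions made for the example, using the variables from the MULTILOG program.

```python
from collections import Counter

# Number of levels for each SUBGROUP variable in the MULTILOG example
LEVELS = {"SEX": 2, "AGEGRP": 6, "EDUC": 5, "OVERWT": 2}

def failed_checks(record, levels=LEVELS, weight="WGT"):
    """List the variables that would make this record unusable in the model."""
    failed = []
    w = record.get(weight)
    if w is None or w <= 0:
        failed.append(weight)
    for var, n_levels in levels.items():
        value = record.get(var)
        if value is None or not 1 <= value <= n_levels:
            failed.append(var)
    return failed

def summarize(records):
    """Return (number of usable records, {variable: number of records it rejects})."""
    usable, counts = 0, Counter()
    for record in records:
        failed = failed_checks(record)
        if failed:
            counts.update(failed)
        else:
            usable += 1
    return usable, dict(counts)

# Hypothetical records: the first has OVERWT coded 0, the third has a zero weight
records = [
    {"WGT": 1.5, "SEX": 1, "AGEGRP": 3, "EDUC": 2, "OVERWT": 0},
    {"WGT": 2.0, "SEX": 2, "AGEGRP": 6, "EDUC": 5, "OVERWT": 1},
    {"WGT": 0.0, "SEX": 1, "AGEGRP": 1, "EDUC": 1, "OVERWT": 2},
]
print(summarize(records))  # (1, {'OVERWT': 1, 'WGT': 1})
```

A variable that rejects every record, such as a 0-1 coded variable listed on the SUBGROUP statement, is a common reason for the message.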
Are your results not what you expected? Get more information.

If you are analyzing data from a complex sample survey, you will likely get different results in SUDAAN than in other packages. First, if you cannot use the survey sampling weights in the other package, the point estimates will be different, and point estimates are generally biased if the survey sampling weights cannot be used. Some packages allow a WEIGHT statement, and that will ensure that the point estimates agree between SUDAAN and most other packages. Even when weights are used, however, the variances, standard errors, tests of hypotheses, and p-values will still be different. This is because SUDAAN allows the user to specify the sampling design and thereby computes a robust variance estimate, yielding valid inferences. If another package does not allow for specification of the complex sampling design (stratification, clustering, etc.), then its variance estimates, and hence its test statistics and p-values, will be wrong. Usually, this results in variances that are too small and false-positive tests of hypotheses.

In some procedures different estimates as well as different standard errors may be due to different tolerances for matrix inversion. Try changing the value of the TOL parameter on the PROC statement.

In the iterative regression procedures, different estimates may be due to a different number of iterations. Try changing the values of MAXITER, EPSILON and / or P_EPSILON on the PROC statement.
The ******* indicates that the default field width is not large enough for the result. Suppose, for instance, you find **** in the output from one of the descriptive procedures where you requested WSUM . You can add something similar to the following to your PRINT statement after the slash:
where w is the overall field width you desire and d is the number of decimal places. You should choose w large enough to accommodate the number of decimal places d, the decimal point, and enough digits to the left of the decimal to contain the largest value.
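One way to pick w is to format the largest value you expect with d decimal places and count the characters. The Python sketch below does only that arithmetic; the function name min_field_width is hypothetical, and the result is a lower bound on w, not a SUDAAN default.

```python
def min_field_width(largest_value, d):
    """Smallest overall width w that can show largest_value with d decimals.

    The count includes the digits to the left of the decimal point, the
    decimal point itself (when d > 0), the d decimal places, and a leading
    minus sign if the value is negative.
    """
    return len(f"{largest_value:.{d}f}")

print(min_field_width(12345678.9, 2))  # 11
print(min_field_width(999.999, 2))     # 7, because the value rounds to 1000.00
```

Because rounding can add a digit, as in the second call, it is safer to choose w one or two characters wider than the minimum.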
The large number of records on your data set may be the cause of the problem. The large size reduces the precision of sums of squares and cross products, which are accumulated in order to estimate the parameters. In this case, the roundoff errors may be larger than the default tolerance for matrix inversion (TOL=1e-6). We suggest that you supply a larger tolerance on the PROC statement (TOL=1e-5, for example) and rerun the job.
SUDAAN is unable to estimate any quantile that is less than or equal to the percentage of data accounted for by the 0 values. This will happen for any variable where the smallest value of that variable has ties.
It makes a difference any time parts of the sampling design (e.g., an entire PSU) are lost after subsetting the data. SUDAAN needs the entire design present in order to estimate variances correctly. In most cases, it will make a difference. The difference shows up in the variance estimation and hypothesis testing.
Here is how the SUBPOPN statement works. Imagine a new variable named ELIGIBLE that is equal to 1 if the observation is to be included in the analysis through SUBPOPN, and equal to 2 if it is not. If this variable is used on the SUBGROUP statement with the corresponding LEVELS equal to 2, and is also crossed with every term on a TABLES statement, then SUDAAN will produce results for both levels, ELIGIBLE=1 and ELIGIBLE=2. Using SUBPOPN ELIGIBLE=1 produces results that are identical to the results for the cell "ELIGIBLE=1" when both levels are analyzed.
If you instead subset the population outside of SUDAAN and then analyze the data using SUDAAN, the results of the two analyses may differ. One case in which the results will be the same is when "DESIGN=WR" and the subset contains at least one observation (with positive weight) in each of the original PSUs.
In conclusion, the safe (therefore preferred) approach is to use SUBPOPN, and not subset the data prior to using SUDAAN.
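To see why losing a PSU matters, consider the textbook with-replacement, between-PSU variance estimator for a weighted total. The Python sketch below is not SUDAAN's code and uses made-up PSU totals; it only shows that dropping a PSU with no eligible observations changes the variance estimate, whereas keeping it with a domain total of zero (which is effectively what SUBPOPN does) retains the full design.

```python
def wr_variance_of_total(psu_totals_by_stratum):
    """Between-PSU, with-replacement variance estimate of a weighted total.

    psu_totals_by_stratum maps each stratum to the list of weighted totals
    of its PSUs. Each stratum contributes n/(n-1) * sum((t - mean)**2).
    """
    variance = 0.0
    for totals in psu_totals_by_stratum.values():
        n = len(totals)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean = sum(totals) / n
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals)
    return variance

# Hypothetical stratum with three PSUs; the third has no eligible observations.
full_design = {1: [10.0, 20.0, 0.0]}  # domain totals with all PSUs kept
subsetted = {1: [10.0, 20.0]}         # the third PSU vanished when the file was subset

print(wr_variance_of_total(full_design))  # 300.0
print(wr_variance_of_total(subsetted))    # 100.0
```

When every original PSU still has at least one observation after subsetting, the two inputs are the same and so are the results, which matches the DESIGN=WR case noted above.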
You can use "-2 log likelihood" to evaluate the relative fit of two models, but you cannot use it to obtain an absolute p-value for testing a hypothesis, since we do not know the distribution of the likelihood for complex samples.
Both methods are good large sample approximations. Here "large" refers to the number of PSUs (Primary Sampling Units), not the number of observations. Which to use is a matter of preference. There is no evidence that one method is superior to the other in general.
You can use either SEMETHOD=ZEGER or SEMETHOD=BINDER to obtain the robust variance. BINDER is most often used in complex sample surveys. ZEGER is most often used in randomized experiments and nonsurvey applications. In many cases, ZEGER and BINDER are identical.
Use SEMETHOD=MODEL to obtain the model-based or "naive" variance estimate. This estimate assumes that exchangeable intracluster correlations (R=EXCHANGEABLE) are correct. This is the most efficient variance estimate when the "working" correlation assumption (R=EXCHANGEABLE or R=INDEPENDENT) is correct. SEMETHOD=MODEL is most often used with randomized experiments and other nonsurvey applications.
Do not list independent variables that are coded as 0-1 on the SUBGROUP statement. Values of 0 are treated as missing for variables that are listed on the SUBGROUP statement and will be excluded from your analysis. Independent variables coded 0-1 may be placed on the CLASS statement if you wish to treat them as categorical, or you can enter them into the model as is.
First, you can use the t-tests that are printed by SUDAAN to test H0: Beta=0. Tests of the betas=0 are equivalent to testing for differences in LSMEANS. These t-tests automatically compare each level of the categorical covariates to the reference cells. You can also use the EFFECTS statement to compare other specific levels of the categorical covariates, and that is also equivalent to comparing LSMEANS.
I calculated percentiles for a duration variable (number of minutes walked) using:
var walkdur;
tables _one_;
percentile 10 25 50 75 90;
Then, using a cutoff of 30 minutes, I created a dichotomous variable and used proc crosstab to calculate the proportion who walked for 30 or more minutes.
Percentile results were:
29.1 min.  25th
34.5 min.  50th
59.3 min.  75th
77.2 min.  90th
I calculated an estimate of 79% walking for 30 minutes or more using a dichotomous variable. However, from the percentiles, one would expect that less than 75% walked for 30 min. or more.
If you have a large percentage of ties, then when SUDAAN interpolates between successive values, you may see the type of behavior that you have indicated.
SUDAAN is interpolating between 30 and the value just below it in order to estimate the 25th percentile. The value closest to but less than 30 that occurs for the walkdur variable is 29. Values at or below 29 account for roughly 22% of the data. Since the value 30 accounts for roughly 28% of the data, SUDAAN assumes this percentage is distributed between 29 and 30. So 29 is the 22nd percentile and 30 is the 22+28=50th percentile. To find where between 29 and 30 the 25th percentile falls, we need the amount x to add to 29. Solve this equation for x:
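The equation itself is not reproduced in this text. Assuming simple linear interpolation between the two points described (29 at the 22nd percentile and 30 at the 50th), which is consistent with the 29.1 minutes reported for the 25th percentile, the calculation can be sketched in Python as follows. The function name is hypothetical and this is not necessarily the exact formula SUDAAN uses.

```python
def interpolate_percentile(p, lower_value, lower_pct, upper_value, upper_pct):
    """Linearly interpolate the value at percentile p between two known points."""
    fraction = (p - lower_pct) / (upper_pct - lower_pct)
    return lower_value + fraction * (upper_value - lower_value)

# x / (30 - 29) = (25 - 22) / (50 - 22), so x = 3/28
p25 = interpolate_percentile(25, 29, 22, 30, 50)
x = p25 - 29
print(round(x, 3))    # 0.107
print(round(p25, 1))  # 29.1
```

Under this assumption the 25th percentile lands just above 29 even though about 78% of the respondents walked 30 minutes or more, which is how the percentile and the dichotomous estimate can appear to disagree.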
Get answers to common "how to" questions.
You can include any number of additional identification variables on the PREDICTED output data set by using the IDVAR statement in your procedure. These extra variables can be used to uniquely connect the predicted values to the original data set.
You can place the variable on a CLASS statement with the options SORT=INTERNAL and DIR=DESCENDING. For example, to convert the 0-1 variable YESNO to a 1-2 variable with the 0's flipped to 2's, use the following:
CLASS YESNO / SORT=INTERNAL DIR=DESCENDING;
You can create a new variable whose value on each record is the product of the values of the two continuous variables (e.g., AB=A*B) and then use this variable in your MODEL statement. Currently, you must do this data manipulation outside of SUDAAN.
Here is one possible way to accomplish this in SUDAAN. Create a new variable A with values 1, 2, 3, and 4 to designate the particular subpopulation to which the observation belongs. Suppose you wish to compare the means for the variable Y. The following SUDAAN program will do this:
PROC REGRESS DATA=<data set> DESIGN=<design>;
NEST <nest variables>;
WEIGHT <weight variable>;
SUBGROUP A;
LEVELS 4;
MODEL Y=A;
The test of hypothesis for the effect of A is the same as the test of equality of subgroup means.
SUDAAN does not directly implement backward or stepwise regression. To run a backward regression using SUDAAN you must sequentially remove one variable from your model, and rerun your job. To run a forward regression, successively add one variable to your model, and rerun your job.
You can effectively perform an ANOVA by using the linear regression (REGRESS) model in SUDAAN. It works very much like the GLM procedure in SAS. You specify the categorical covariates (coded 1,2,3,...) on the SUBGROUP and LEVELS statements, the dependent variable on the left-hand side of the MODEL statement, and all independent variables (categorical and continuous) on the right-hand side of the MODEL statement.
SUBGROUP X;
LEVELS 4;
MODEL Y=X Z;
Here X is categorical with 4 levels (coded 1,2,3,4), Y is the dependent variable, and Z is a continuous covariate. X will be modeled using dummy variables (one for each level of X), and Z will be modeled with one regression coefficient (the slope of Z). By default, the last level of each of the categorical covariates is used as the reference cell for the covariate. You can change the reference cell of any categorical covariate by using the REFLEVEL statement.
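The dummy-variable coding described above can be pictured with a small Python sketch. This is not SUDAAN code. The function names `dummy_code` and `design_row` are hypothetical, and dropping the reference level's column is one common way to represent a coefficient that is fixed at zero.

```python
def dummy_code(value, n_levels, ref_level=None):
    """Return indicator columns for a categorical value coded 1..n_levels.

    The reference level gets no column of its own. By assumption it
    defaults to the last level, mirroring the default described above.
    """
    if ref_level is None:
        ref_level = n_levels
    if not 1 <= value <= n_levels:
        raise ValueError("value must be coded 1..n_levels")
    return [1 if value == level else 0
            for level in range(1, n_levels + 1) if level != ref_level]


def design_row(x, z, n_levels=4, ref_level=None):
    """Build one design-matrix row for MODEL Y = X Z: intercept, dummies, slope term."""
    return [1] + dummy_code(x, n_levels, ref_level) + [z]


# X = 2 with the default reference (level 4), Z = 5.0
row_default = design_row(2, 5.0)
# Same record with level 1 as the reference, as a REFLEVEL-style change would imply
row_ref1 = design_row(2, 5.0, ref_level=1)
```

A record at the reference level has zeros in every dummy column, so its fitted value depends only on the intercept and Z.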
Beginning with Release 9.0.0, in LOGISTIC, SUDAAN computes the following Hosmer-Lemeshow-type statistics:
 A Wald F test with numerator degrees of freedom equal to the rank of the variance-covariance matrix (usually G-1) and denominator degrees of freedom equal to (number of PSUs - number of strata) for Taylor series and Delete-1 jackknife designs, and (number of replicates) for BRR and replicate-weight jackknife designs.
 A simple Chi-square test, which is a weighted analog of the original Hosmer-Lemeshow test. The degrees of freedom are G-2.
 For Taylor series designs, the Satterthwaite-adjusted F-test, degrees of freedom, and p-value are also provided.
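As a rough illustration of the degrees-of-freedom rules listed above, here is a small Python sketch. The function names are hypothetical, G is taken to be the number of residual groups, and the numerator is assumed to equal G-1 (the "usual" case in which the variance-covariance matrix has full rank).

```python
def hl_wald_f_df(n_groups, n_psus=None, n_strata=None, n_replicates=None):
    """Return (numerator, denominator) degrees of freedom for the Wald F test.

    Pass n_replicates for BRR or replicate-weight jackknife designs;
    otherwise pass n_psus and n_strata (Taylor series, Delete-1 jackknife).
    """
    numerator = n_groups - 1  # assumes a full-rank variance-covariance matrix
    if n_replicates is not None:
        denominator = n_replicates
    elif n_psus is not None and n_strata is not None:
        denominator = n_psus - n_strata
    else:
        raise ValueError("supply n_replicates, or both n_psus and n_strata")
    return numerator, denominator


def hl_chisq_df(n_groups):
    """Degrees of freedom for the simple Chi-square version of the test."""
    return n_groups - 2


# Made-up design: 10 groups, 60 PSUs in 30 strata
taylor_df = hl_wald_f_df(10, n_psus=60, n_strata=30)
```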
There are two new options, HLGROUPS=count and HLVAR=variable, on the MODEL statement. HLGROUPS allows you to specify the number of groups of residuals to form. By default LOGISTIC forms 10 groups. Note, however, that LOGISTIC will not form more groups than are supported by the data. HLVAR permits you to specify a variable on the input data set that gives the group number to associate with each residual. You may use at most one of these options.
LOGISTIC has two new output groups. HLGROUPS contains information on the residual groups formed. HLTEST contains all of the available test statistics. See your SUDAAN Language Manual, Chapter 10 for details.
The concept of "mean square error" is defined only for the case of simple random samples. There is no equivalent definition for complex survey data. For computing the variance of a predicted value for given X:
PREDICTED Y = B'X;
VARIANCE(PREDICTED Y) = X' {V(B)} X;
SUDAAN prints out the variance-covariance matrix V(B) of the estimated coefficients B. You can use this to compute the variance of any predicted value.
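The two formulas above can be worked through outside SUDAAN once you have B and V(B) from the output. The Python sketch below shows the arithmetic only. The coefficient and covariance values are made up for illustration, and the function names are hypothetical.

```python
import math


def predicted_value(b, x):
    """Compute B'X for coefficient vector b and covariate vector x."""
    if len(b) != len(x):
        raise ValueError("b and x must have the same length")
    return sum(bi * xi for bi, xi in zip(b, x))


def predicted_variance(v, x):
    """Compute X' V(B) X, where v is the variance-covariance matrix of b."""
    n = len(x)
    if len(v) != n or any(len(row) != n for row in v):
        raise ValueError("v must be a square matrix matching the length of x")
    return sum(x[i] * v[i][j] * x[j] for i in range(n) for j in range(n))


# Made-up example: intercept and one slope
b = [2.0, 0.5]
v = [[0.04, -0.01],
     [-0.01, 0.09]]
x = [1.0, 3.0]  # 1 for the intercept, 3.0 for the covariate

y_hat = predicted_value(b, x)
var_y_hat = predicted_variance(v, x)
se_y_hat = math.sqrt(var_y_hat)
```

The off-diagonal elements of V(B) enter the sum twice, so the covariance between coefficients must not be ignored when computing the variance by hand.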
You can get the equivalent of "adjusted means" for logistic regression and other nonlinear models using the PREDMARG and CONDMARG statements. You can test hypotheses and form general linear contrasts among the marginals using the PRED_EFF and COND_EFF statements. See the LOGISTIC chapter of the SUDAAN User's Manual for more details.
SUDAAN 9.0.0 and above provides confidence intervals for proportions in the CROSSTAB procedure. These are printed by default as part of the TABLECELL group. The confidence limits for row percent (ROWPER) are LOWROW and UPROW. The confidence limits for column percent (COLPER) are LOWCOL and UPCOL. The confidence limits for total percent (TOTPER) are LOWTOT and UPTOT.
Some work must be done before running SURVIVAL in order to use time-dependent covariates. The method assumed for handling time-varying covariates follows the work of Andersen and Gill (1982, "Cox's Regression Model for Counting Processes: A Large Sample Study," Annals of Statistics, vol. 10, pp. 1100-1120), who developed the counting-process style of input.
Suppose you have data with time-dependent covariates in a survival analysis where you are observing how long people survive, and the predictor (independent) variables may be time-varying. For the sake of discussion, suppose you follow people from birth to death and have recorded the weight of each person at varying points in time, say every year of their life on December 31st.
The counting-process style of input requires that each person have a record every time the value of the independent variable changes. Here is a "pseudo" case. Note that time1 and time2 are the left and right endpoints of the time interval over which WEIGHT (the time-dependent covariate) was constant.
PersonID  Weight  time1  time2  Survive? (Indicates survival)
1         7       0      1      1
1         15      1      2      1
1         25      2      3      1
1         35      3      4      1
1         40      4      5      1
1         42      5      6      1
If you have more than one time-dependent covariate, constructing the multiple records gets trickier.
With time-dependent covariates in general, you must take the interval of time over which each person is followed and break it up into periods during which all of the time-dependent covariates are constant.
In the above example, suppose in each year that a person's height changed midyear (it was constant on the first half of the year and constant on the second half but not the same value in both halves). Then, you would have to take each record given above and split it into two records, one where the height represents the value in the first half of the year and one where the height represents the value in the second half of the year. Below we show this for the first record given above:
PersonID  Weight  time1  time2  Survive?  Height
1         7       0      .5     1         18 inches
1         7       .5     1      1         20 inches
This is how you would handle the case of two time-dependent covariates.
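Because this restructuring happens outside SUDAAN, it can be scripted in whatever tool you prefer. The following Python sketch shows one way to split a follow-up interval into periods where every covariate is constant. The function name `constant_intervals` and the input layout (a list of change times and values per covariate) are assumptions made for illustration, and the sketch does not handle the survival indicator.

```python
def constant_intervals(start, end, histories):
    """Split [start, end] into records on which all covariates are constant.

    histories maps a covariate name to a list of (time, value) pairs sorted
    by time, where each value holds from that time until the next change.
    """
    # Every change strictly inside the interval becomes a cut point.
    cuts = {start, end}
    for changes in histories.values():
        for t, _ in changes:
            if start < t < end:
                cuts.add(t)
    cuts = sorted(cuts)

    rows = []
    for t1, t2 in zip(cuts, cuts[1:]):
        row = {"time1": t1, "time2": t2}
        for name, changes in histories.items():
            # The value in force is the last one recorded at or before t1.
            in_force = [value for t, value in changes if t <= t1]
            if not in_force:
                raise ValueError(f"no value of {name!r} known at time {t1}")
            row[name] = in_force[-1]
        rows.append(row)
    return rows


# First year of life from the example: weight is constant, height changes mid-year
rows = constant_intervals(0, 1, {
    "weight": [(0, 7), (1, 15)],
    "height": [(0, 18), (0.5, 20)],
})
```

For the first year this yields two records, (0, 0.5) and (0.5, 1), matching the split shown in the table above.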
Once the data is set up outside of SUDAAN, you treat the data in SURVIVAL as if the covariates are NOT time-varying, like so (but still using the counting-process style of input):
Model time1 time2 = weight height;
All you specify are the two variables that record the time points of the left and right endpoints of each record's "time interval".
Get answers to your questions about the theory behind SUDAAN’s calculations.
SUDAAN's handling of a single cluster within a stratum is based on the assumption that another cluster was in the sample but all of its data were missing. This is adequate when only a few strata have single clusters. However, this method, or any other method for handling singleton clusters in most strata, depends heavily on certain assumptions, and unless you are willing to accept those assumptions you should not use such procedures. We have found that a better approach is to collapse strata to create pseudo-strata so that each stratum has at least two clusters. The collapsing can be based on subjective judgement about the similarity of clusters; for example, in household samples one may use geographic proximity and urban/rural character.
SUDAAN calculates the variance contribution for each stage of the design as the square of the difference between each unit's value and the mean of all the units within the stage. When only one sample unit is encountered within a stage, SUDAAN cannot calculate the variance contribution in this manner and will typically halt with an error message.
However, if you specify the MISSUNIT option on the NEST statement, then when only one sample unit is encountered in a stage, SUDAAN will estimate the variance contribution of that unit using the difference between that unit's value and the overall mean value for the population. For example, if you have a two-stage design and have specified a stratum and a primary sampling unit (PSU) variable on the NEST statement, then SUDAAN will abort with an error message if you have a stratum that contains only one PSU. If you specify MISSUNIT, SUDAAN will calculate the mean for the entire file and calculate the variance contribution for that unit from the difference between that unit's value and the overall mean.
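The difference between the two behaviors can be pictured with a deliberately simplified Python sketch. It is not SUDAAN's actual computation: it ignores weights and any scaling factors, it takes the "overall mean" to be the unweighted mean of all PSU values, and the function name is hypothetical.

```python
def squared_deviations_by_stratum(psu_values, missunit=False):
    """Sum of squared deviations of PSU values within each stratum.

    psu_values maps a stratum label to the list of its PSU-level values.
    A single-PSU stratum raises an error unless missunit is True, in which
    case its deviation is taken from the overall mean instead.
    """
    all_values = [v for values in psu_values.values() for v in values]
    overall_mean = sum(all_values) / len(all_values)

    result = {}
    for stratum, values in psu_values.items():
        if len(values) > 1:
            center = sum(values) / len(values)  # usual case: stratum mean
        elif missunit:
            center = overall_mean               # MISSUNIT-style fallback
        else:
            raise ValueError(f"stratum {stratum!r} contains only one PSU")
        result[stratum] = sum((v - center) ** 2 for v in values)
    return result


# Made-up data: stratum "B" has a single PSU
data = {"A": [10.0, 14.0], "B": [20.0]}
contributions = squared_deviations_by_stratum(data, missunit=True)
```

Calling the function on the same data with `missunit=False` raises an error, which mirrors the default behavior described above.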