FAQs

Are you having difficulties installing SUDAAN? Get answers to common problems.

1: I installed SAS-Callable SUDAAN, and now I am trying to execute a SAS program, but I get the message "PROC CROSSTAB not found."

This message indicates that your default SASV9.CFG file was not updated properly during the installation process. The complete path to the directory containing the SUDAAN .DLLs (Windows) or program files (Solaris) must be included in the -PATH statement at the end of the SASV9.CFG file. Note that if you have previous installations of SUDAAN, this reference must precede any older references, or SAS will continue to pick up the older version.

	-PATH (
		other path
		path to latest SUDAAN program files
		path to older SUDAAN program files
	)
	

Not sure where your default SASV9.CFG file is located?

Many users have more than one SASV9.CFG file. Windows users: If you execute SAS from a shortcut, right-click on the shortcut, choose Properties, and then select the Shortcut tab. In the Target edit box you will find the command line used to execute SAS. The parameter

	-config 
	

gives the location of the default configuration file. If you execute SAS from the Run command, then directly examine the command line you use to determine the location of the default configuration file. Solaris users: check with the system manager to obtain the location of this file.
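
If the Target string is long, the lookup described above can be sketched in a few lines of Python run outside SAS. This is only an illustration: the function name and the example command line are hypothetical, and your own paths will differ.

```python
import re

def config_path_from_target(target):
    """Return the path that follows -config in a SAS command line, or None."""
    # Accept either a quoted path (may contain spaces) or an unquoted one.
    match = re.search(r'-config\s+(?:"([^"]+)"|(\S+))', target, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1) or match.group(2)

# Hypothetical shortcut target, for illustration only.
target = r'"C:\SAS\sas.exe" -CONFIG "C:\SAS\nls\en\SASV9.CFG"'
print(config_path_from_target(target))
```

If the function returns None, the command line has no -config parameter and SAS is using a configuration file found some other way.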


2: (Solaris SAS-Callable) I installed SUDAAN just fine, and executed SUDRENEW with no problems, but now I get a message "You are using an unlicensed copy of SUDAAN" when I try to run other procedures.

SUDAAN cannot find the SUDAAN.EXP file that should be located in the same directory as the SUDAAN program files. First determine whether this file has in fact been created in the SUDAAN program directory. If it is not present, you may need to rerun SUDRENEW. If it is present, then make sure that the SUDLIB environment variable is set correctly in your shell startup file:

CShell users:
Add the following line in the .cshrc or .login file:

	setenv SUDLIB /chosen_path/SUDAAN_9/bin

Korn/Bourne shell users:
Add the following lines in the .profile file:

	SUDLIB=/chosen_path/SUDAAN_9/bin
	PATH=$PATH:$HOME/bin:/chosen_path/SUDAAN_9/bin
	export SUDLIB PATH

NOTE: You must log out and then log in again for this setting to take effect.


3: (Solaris SAS-Callable only) I have installed SUDAAN and have set the environment variable SUDLIB correctly as described in the Installation Guide. I have also modified the SAS config file, sasv9.cfg, with the new SUDAAN path. However, when I execute a program that calls a SUDAAN procedure I get an error message about the wrong message file and the program halts.

The Unix environment variable SUDLIB is not set correctly. In cases where the Unix environment is set up via scripts, these scripts override the variable setting performed during SUDAAN installation.

Modify the scripts you are executing so that the environment variable SUDLIB, required for the SAS-Callable version, is set correctly.


4: I can’t select my SUDAAN.EXP to renew my SUDAAN Release 9 license.

When you are asked to select your SUDAAN.EXP, open Programs, then SUDAAN, then Release 903. One of the following folders should be in Release 903.

Directory Location of SUDAAN.EXP          SUDAAN Version
Program Files\SUDAAN\Release90x\DOS       For DOS Version
Program Files\SUDAAN\Release90x\WinInd    For Standalone Individual User Version
Program Files\SUDAAN\Release90x\WinNet    For Standalone LAN User Version
Program Files\SUDAAN\Release90x\SAS8Ind   For SAS8 Callable Individual User Version
Program Files\SUDAAN\Release90x\SAS8Net   For SAS8 Callable LAN User Version
Program Files\SUDAAN\Release90x\SAS9Ind   For SAS9 Callable Individual User Version
Program Files\SUDAAN\Release90x\SAS9Net   For SAS9 Callable LAN User Version

Select the appropriate folder that corresponds with the version of SUDAAN you are renewing. When you open that folder, your SUDAAN.EXP should be on the right. Click on it, and the OK button will be highlighted. You should then be able to continue.


5: I upgraded my SAS, and now my SUDAAN doesn't work.

SUDAAN is compatible with SAS 9.2. If you upgrade your SAS after SUDAAN has already been installed on your computer, you will need to remove and reinstall SUDAAN to update the SASV9.CFG file with the correct path. If you previously had 32-bit SAS and have upgraded to 64-bit SAS, you will need to get a 64-bit SUDAAN license to remain compatible. The SAS version and SUDAAN must match. The 32-bit and 64-bit SUDAAN Product Keys are not interchangeable, so you will need to transfer your SUDAAN license to the appropriate platform. There is no charge associated with this transfer for Annual Licenses.

Browse examples of SUDAAN programs using NCHS data, replicate weight designs, Taylor series designs and more.

1: Can I use SUDAAN to analyze data from National Center for Health Statistics (NCHS)?

SUDAAN is commonly used to analyze data from such NCHS studies as the National Health Interview Survey (NHIS), National Health and Nutrition Examination Survey (NHANES), National Survey of Family Growth (NSFG) and many more. The NCHS web site contains many examples. Below are some links to useful information:

Variance Estimation for Person Data Using SUDAAN and the National Health Interview Survey (NHIS) Public-Use Person Data Files, 1987-94

Variance Estimation for Person Data Using the NHIS Public Use Person Data Tape, 1995. (pdf)

Sample Design, Sampling Weights, Imputation, and Variance Estimation in the 1995 National Survey of Family Growth (NSFG). This is a PDF document that includes several example programs using SUDAAN to analyze NSFG data. (Use Adobe Acrobat Reader to view.)

You can also search for further information and examples about using SUDAAN with NCHS and CDC data by using either the NCHS Web Search or the CDC Web Search sites. Enter SUDAAN as the search word and click SEARCH.


2: Are there SUDAAN examples that I can look at?

Click here to see our collection of SUDAAN examples. Each example is packaged as a downloadable self-extracting executable. Each executable contains both standalone and SAS-callable versions of the program, data, and output for the procedure.


3: I have an experimental study with 2 (or more) stages of nesting. Can SUDAAN properly analyze my data?

SUDAAN uses a robust variance estimator that properly accounts implicitly for any number of stages of nesting. This approach is used in all of the SUDAAN procedures and all variance estimates. Therefore, hypothesis tests and confidence intervals also account for all stages of nesting. The critical assumption is that the primary, or first-stage, clusters are independent of each other. If this is true, then any number of nesting stages can occur within the primary clusters and SUDAAN will yield valid inferences.

In the above situation, you use the DESIGN=WR option on the PROC statement. With this option, you only need to identify the primary clusters. On the NEST statement, list any strata or blocks used to collect the data, followed by the primary clusters. For example, consider a dental experiment where measurements are taken on each tooth surface from all teeth from a set of patients. Here, there are 3 stages of nesting -- tooth surfaces within teeth within patients. Patients are the primary clusters, which are independent of each other. Nested within each patient are teeth and surfaces within teeth. For such a study, you would usually use DESIGN=WR with "NEST _ONE_ PATIENT;". The SUDAAN keyword _ONE_ indicates that there was no stratification or blocks and all patients are in one stratum. Also, the variable PATIENT identifies each patient (the PSU) by taking on a common value for all of the observations from a single patient. The data must be sorted by PATIENT. This is all that SUDAAN needs to know in order to calculate valid variance estimates and hypothesis tests.

Other examples are:

  1. Teratology studies with pups nested within litters.
    • NEST _ONE_ LITTERID;  
  2. Educational studies with students nested within classrooms nested within schools. The schools were stratified by region of the country.
    • NEST REGION SCHOOLID;
  3. A clinical study with longitudinal repeated measurements nested within body sites (e.g., eyes), nested within patients.
    • NEST _ONE_ PATIENT;

4: I have a data set with Jackknife weights already on it. Can I use the JACKKNIFE design in SUDAAN to analyze this data?

Yes! Beginning with Release 8.0 you can analyze data with Jackknife weights already computed on the main (or auxiliary) data set. Specify DESIGN=JACKKNIFE on the PROC statement and the Jackknife weights on the JACKWGTS statement. See Chapter 3 of the SUDAAN User’s Manual for details on using this new feature.

Get examples and information on using SAS data in Standalone SUDAAN

1: How can I read my SAS data in Standalone SUDAAN?

To read SAS data within Standalone SUDAAN you must create a SAS XPORT data set. This contains exactly the same information as the original data set, just stored in a different way. Here are the steps:

Sample SAS code for creating a dataset in XPORT format:

libname out xport "c:\sasfiles\terata.xpt";
libname in "c:\sasfiles";
data out.terata;
set in.terata;
run;

This will create the file terata.xpt, which contains the data set TERATA in XPORT format.

If you have a format library, you can create a format catalog, also in XPORT format as the following sample SAS code shows:

libname library "c:\sasfiles";
libname out "c:\sasfiles";
proc format library=library cntlout=out.fmtlib;
run;

libname xout xport "c:\sasfiles\teratalev.xpt";
data xout.teratalev;
set out.fmtlib;
run;

You can then use these files in SUDAAN as the following example SUDAAN code shows:

proc logistic data="c:\sasfiles\terata.xpt" filetype=sasxport levfile="c:\sasfiles\teratalev.xpt";
format dose_5 dose_5.;

...Additional logistic statements ...


2: Is there any way to write my Standalone SUDAAN output in a format easily read by SAS?

Beginning with Release 7.5.4 for PCs and Release 7.5.5 for Solaris, you can write your Standalone SUDAAN output in a SASXPORT data set, saving all of the format labels in a companion SAS FORMAT CATALOG in SASXPORT format, and then import these into SAS for further analysis. The following example shows how to do this in Standalone SUDAAN:

proc logistic data="c:\sasfiles\terata" filetype=sasxport levfile="c:\sasfiles\terlev.xpt" noprint;
nest _one_ dam;
weight _one_;
subgroup dose_5;
format dose_5 dose_5.;
levels 5;
reflevel dose_5=1;
model dead=dose_5;
output / predicted=all filetype=sasxport filename="c:\sasfiles\terpred" levfile="c:\sasfiles\terprlev" replace;

The following SAS statements import the format catalog created by SUDAAN, as well as the output data set, terpred, and use them in an analysis.

libname xin xport "c:\sasfiles\terprlev.stx";
libname out "c:\sasfiles";
data out.terprlev;
set xin.terprlev;
proc format cntlin=out.terprlev;
run;
libname xdin xport "c:\sasfiles\terpred.stx";
proc descript data=xdin.terpred filetype=sasxport design=wr;
nest _one_ dam;
weight _one_;
var expected;
print nsum wsum mean semean;
run;


3: I created a SAS export data set, but SUDAAN is telling me it isn’t a valid XPORT data set.

SAS has two export file types: XPORT (older) and CPORT. SUDAAN can only read the XPORT format, which is in the public domain. The CPORT format is a SAS proprietary format to which we do not have access. Are you sure you are using the XPORT engine to create the output file? The following sample SAS code shows how to convert a SAS dataset to XPORT format:

libname out xport "c:\sasfiles\testdat.xpt";
libname in "c:\sasfiles";
data out.testdat;
set in.testdat;
run;


4: Can I use long variable names and labels with SASXPORT data?

Beginning with Release 7.5.4 of standalone SUDAAN for PCs and release 7.5.5 of Standalone SUDAAN for Sun/Solaris, SUDAAN can directly read data files created by SAS in XPORT format, and can also write files in the SAS XPORT format. For both reading and writing these files simply use "FILETYPE=SASXPORT" on the PROC or OUTPUT statements. Note that the SASXPORT file type is the XPORT file type supported by SAS, which has been recently adopted by the FDA as a standard for data submissions. The structure of this file type is fully documented at SAS Institute's web site. This is an important new feature for users of Standalone SUDAAN who wish to use data from SAS or to do further analysis of SUDAAN results within SAS, since SAS no longer exports version 6.04 SAS data sets.

The SAS XPORT file format does not support the longer variable names and labels that are available in SAS beginning with Version 7. Within SAS you cannot write variables with long names and/or labels to an XPORT file. However, in SUDAAN you can do this, since SUDAAN automatically creates a second file, also in XPORT format, which links long and short forms of variable names and labels. On output to a SASXPORT file, SUDAAN truncates long variable names and labels in a unique way so that different variables always have distinct names up to the limit of 8 characters. Whenever there is any truncation, SUDAAN saves a separate data set, also in SASXPORT format, with records whose data link old and new variable names and labels. In addition, whether or not there are any long variable names or labels, SUDAAN preserves any labels associated with the variables in a separate SASXPORT file. This file has the same structure as a SAS format catalog which has been saved in XPORT format. This means that SAS users can easily import their labels from SUDAAN into SAS along with their data.
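
SUDAAN's actual truncation rule is not documented here. As a rough illustration of what keeping names distinct within 8 characters involves, the following Python sketch uses one possible scheme: truncate, then replace trailing characters with a counter when two names collide. The function name, the scheme, and the variable names are assumptions made for illustration only.

```python
def shorten_names(names, limit=8):
    """Map each long name to a distinct short name of at most `limit` characters.

    Hypothetical scheme, not SUDAAN's: truncate first, then on a collision
    replace the tail of the truncated name with a running counter.
    """
    mapping, used = {}, set()
    for name in names:
        short = name[:limit].upper()
        counter = 0
        while short in used:
            counter += 1
            suffix = str(counter)
            short = name[:limit - len(suffix)].upper() + suffix
        used.add(short)
        mapping[name] = short
    return mapping

# Made-up variable names.
links = shorten_names(["SYSTOLIC_BP_VISIT1", "SYSTOLIC_BP_VISIT2", "AGE"])
for long_name, short_name in links.items():
    print(short_name, "<-", long_name)
```

The returned mapping plays the role of the companion name file: it is the only place where the link between the long and short names survives.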

To support these changes, two new options are now available on both the PROC and OUTPUT statements when FILETYPE=SASXPORT is used:

NAMEFILE=filename
 

On the PROC statement use this parameter to specify the name of a file containing records associating long variable names and labels with their shortened versions on an input file in SASXPORT format. If this parameter is not supplied, SUDAAN will look for a file named . If SUDAAN cannot find a name file, whether the name is supplied explicitly or by default, or if the (long) variables used within the SUDAAN program are not on the given name file, then SUDAAN will not recognize them when reading the main data file in SASXPORT format.

On the OUTPUT statement use this parameter to specify the name of a file in SASXPORT format to contain any records associating long variable names and labels with their shortened versions on the main output file. If this parameter is not supplied, and there are one or more variable names or labels which have to be truncated, SUDAAN will use the name NAMEFILE.STX. Note, however, that in no case will SUDAAN overwrite a file already using the specified filename unless the REPLACE option is present on the OUTPUT statement.

LEVFILE=filename
 

On the PROC statement use this parameter to specify the name of a file containing records with SAS-style format information. If this parameter is not supplied, SUDAAN will look for a file named LEVFILE.STX. If SUDAAN cannot find a level file, whether the name is supplied explicitly or by default, then SUDAAN will not be able to use formats named on FORMAT statements within your SUDAAN program.

On the OUTPUT statement use this parameter to specify the name of a file in SASXPORT format to contain SAS-style formats to be associated with variables on the main data set. If this parameter is not supplied, and there are one or more variables with associated level labels, SUDAAN will use the name LEVFILE.STX. Note, however, that in no case will SUDAAN overwrite a file already using the specified filename unless the REPLACE option is present on the OUTPUT statement. SUDAAN will create a name for each format which is at most 8 characters in length, contains only '_', and alphanumeric characters, and does not begin with a numeric digit. This name will be as close to the name of the variable as possible.

There are two common extensions for files in SAS XPORT format: '.stx' and '.xpt'. To make sure SUDAAN reads or creates the file with the extension you intend, include it in the file specification in any of the following parameters on the PROC statement:

  DATA
  PSUDATA
  REPDATA
  NAMEFILE
  LEVFILE

and include it in the file specifications in the following parameters on the OUTPUT statement:

  FILENAME
  NAMEFILE
  LEVFILE

If you do not specify an extension, SUDAAN will use the extension '.stx' by default on input and on output. On input, if SUDAAN supplies '.stx' and does not find an input data set by the given name, it will try supplying the extension '.xpt' before giving up.
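
The input lookup order described above can be sketched in Python as follows. This is an illustration of the rule as stated, not SUDAAN's own code: the function name is hypothetical, and the `exists` argument is only there so the logic can be tried without real files.

```python
import os
from pathlib import PureWindowsPath

def resolve_input(spec, exists=os.path.isfile):
    """Return the file name that would be read for an input specification.

    An explicit extension is kept as given. Otherwise '.stx' is tried first
    and '.xpt' second; None means neither file was found.
    """
    if PureWindowsPath(spec).suffix:
        return spec
    for ext in (".stx", ".xpt"):
        candidate = spec + ext
        if exists(candidate):
            return candidate
    return None

# Pretend only one file is present (hypothetical path).
present = {"c:\\sasfiles\\terpred.xpt"}
print(resolve_input("c:\\sasfiles\\terpred", exists=present.__contains__))
```

Because '.stx' is tried first, a stale '.stx' file with the same base name would be picked up ahead of a newer '.xpt' file, which is one more reason to spell out the extension.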

Get examples and information on using SAS data in SAS-Callable SUDAAN

1: What is SAS-Callable SUDAAN?

SAS users will want to get the SAS-Callable version of SUDAAN, which is installed as an add-on to SAS on your Windows or Sun/Solaris computer system. Thus, you execute SUDAAN procedures within your SAS programs just like any SAS procedures. SUDAAN can read and write any data file that your SAS system can read or write. In addition, you can use the SAS Macro Facility, and formats created by SAS PROC FORMAT within your SUDAAN procedures. This makes all SUDAAN procedures as easy to use as SAS procedures. However, all versions of SUDAAN are developed and sold by the Research Triangle Institute. Please see Order to obtain SUDAAN ordering information. SAS is a product of the SAS Institute, Inc.


2: Does SAS-Callable SUDAAN accept the SAS ODS (Output Delivery System) option?

SAS-Callable SUDAAN cannot currently accept the SAS ODS options for output. This is because support for this option is not currently available in SAS’s Toolkit software we use to produce the SAS-Callable version of SUDAAN.


3: Why am I getting error messages in SAS-Callable SUDAAN when I try to use SUDAAN's PROC LOGISTIC?

SAS has a procedure named LOGISTIC, so there is a name conflict. To use SUDAAN's LOGISTIC procedure in SAS you must use the procedure alias, RLOGIST. There are other syntax changes you must make when you use SAS-Callable SUDAAN. There is a complete list in the appendix of the SUDAAN manual.

Get examples and information on using SPSS data in Standalone SUDAAN

1: Can I use SUDAAN to analyze my SPSS data?

Yes! The Standalone version of SUDAAN Release 7.5 and above on PCs and Solaris can read SPSS data files. Thus, you can use SPSS to manage your data files and easily use SUDAAN to analyze your SPSS data sets. With SUDAAN, you can analyze your SPSS data sets while properly accounting for the complex sampling plan used to collect your data. SUDAAN allows you to apply survey data analysis methods, generalized estimating equations (GEE), linear regression, multinomial logistic regression, log-linear regression for count data, survival analysis (Cox regression) and descriptive statistics, all for cluster-correlated or longitudinal data, which are not available directly from SPSS.


2: How can I get my results back into SPSS for further analysis?

Beginning with Release 8.0.0 of SUDAAN, Standalone PC versions of SUDAAN can write SPSS data sets. Simply specify FILETYPE=SPSS on the OUTPUT statement. Thus you can directly import your SUDAAN results into SPSS for further analysis. Standalone Solaris users of SUDAAN will need to create a text output file (FILETYPE=ASCII) and then import this into SPSS.


3: What versions of SPSS will work with SUDAAN?

Standalone SUDAAN (versions 7.5 and later) can read data sets from SPSS Windows Versions 5.x, and above. Beginning with Release 8.0, standalone PC versions of SUDAAN can also write SPSS Windows data sets. No documentation files are required, although SUDAAN allows you to supply a LEVEL file for labeling the levels of categorical variables. Indicate to SUDAAN that your input and/or output data are an SPSS data set by using the FILETYPE=SPSS parameter on the PROC and OUTPUT statements.

SUDAAN is dependent on the SPSS-supplied SPSSIO32.DLL executable file and SPSSIO32.LIB library file for the routines which read and write the SPSS data files. The versions of these files linked with Release 10 of SUDAAN support reading and writing of SPSS data through Version 17 of SPSS. They may or may not support reading of higher numbered versions of SPSS data in the future. If your release of SUDAAN is unable to read your SPSS data set, try saving it from SPSS as an earlier version file type before using it in SUDAAN.

Are you getting a warning or error message you don’t understand? Get more information.

1: Why am I getting error messages in SAS-Callable SUDAAN when I try to use SUDAAN's PROC LOGISTIC?

SAS has a procedure named LOGISTIC. To use SAS-Callable SUDAAN’s LOGISTIC procedure you must use the procedure alias, RLOGIST.


2: In MULTILOG and LOGISTIC what does the following message mean? "WARNING: One or more parameters are approaching infinity. The data may have singularities for the model you are trying to fit."

There are some models and data for which one or more of the parameters are logically infinite. Correspondingly, the probabilities for some or all of the observations become 0 or 1. The process cannot converge in these cases, and may produce floating point divide errors, exponential overflow errors, or other unpredictable results. SUDAAN now analyzes the data before fitting the model in PROCs LOGISTIC and MULTILOG and removes records which are logically associated with infinite betas. In some cases where the number of observations in a cell in the table created by crossing the response variable with the independent effect is very small but nonzero, even removing these records may not completely alleviate the problem. This is the reason for this warning. We recommend that in cases such as this you first run PROC CROSSTAB with a table of the form DEPVAR*(independent effects), and print NSUM and WSUM for each cell. Consider removing observations associated with near-zero cells in the table, and/or removing these terms from the model.

Example:

Suppose you have the following statements in PROC LOGISTIC, and you are getting the warning message above.

PROC LOGISTIC DATA=mydat FILETYPE=SAS DESIGN=WR;
NEST STR PSU;
WEIGHT WGT;
SUBGROUPS A B C;
LEVELS 2 3 4;
MODEL Y = A B A*C;

Execute the following PROC CROSSTAB:

PROC CROSSTAB DATA=mydat FILETYPE=SAS DESIGN=WR;
NEST STR PSU;
WEIGHT WGT;
RECODE Y = (0 1);
SUBGROUPS A B C Y;
LEVELS 2 3 4 2;
TABLES Y*(A B A*C);
PRINT NSUM WSUM;
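
The idea behind this check, looking for empty or nearly empty cells when the response is crossed with an effect, can also be sketched outside SUDAAN on unweighted counts. The Python below is only an illustration: the function name and data are made up, and the threshold of 5 is an arbitrary assumption, not a SUDAAN rule.

```python
from collections import Counter
from itertools import product

def sparse_cells(rows, y_levels, x_levels, min_count=5):
    """Return {(y, x): count} for every cell with fewer than min_count records.

    Cells that never occur are included with a count of 0.
    """
    counts = Counter(rows)
    return {cell: counts[cell]
            for cell in product(y_levels, x_levels)
            if counts[cell] < min_count}

# Made-up (Y, A) pairs: the cell Y=1, A=2 has only two records.
rows = [(0, 1)] * 40 + [(1, 1)] * 35 + [(0, 2)] * 50 + [(1, 2)] * 2
print(sparse_cells(rows, y_levels=(0, 1), x_levels=(1, 2)))
```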

3: In the modeling procedures, what does this message mean? "DATA WARNING: The matrix for estimable parameters is singular. The model may be overspecified. You should reduce the number of variables on the right-hand side and refit the model before attempting to draw any conclusions."

This message is printed during execution of the modeling procedures when the rank of the estimation matrix (MODELCOV in REGRESS, LOGISTIC, LOGLINK, and MULTILOG) is less than the maximum number of estimable parameters for the model. Consider eliminating one variable at a time from the model to determine which variable(s) are causing the problem.


4: What does this message mean? "There is a problem with nest variable STRVAR=2 in record 100. It has only one PSUVAR whose value is 17."

During the processing of your data, SUDAAN encountered a stratum that contains only one unique PSU. In this example, SUDAAN determined that the stratum coded as STRVAR=2 contained only one PSU, coded PSUVAR=17.

Two or more PSUs are needed in each stratum in order for SUDAAN to be able to estimate variance. You can verify whether this is the case by looking at a cross-tabulation of your strata and PSU variables.

If each of your strata contains two or more unique PSUs, then you may need to sort your data in the order given by the variables on the NEST statement. This sorting needs to be done prior to running SUDAAN.
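
Both checks, counting distinct PSUs per stratum and confirming the sort order, can be sketched in Python before the data ever reach SUDAAN. The function names and the (STRVAR, PSUVAR) pairs below are hypothetical.

```python
from collections import defaultdict

def psus_per_stratum(records):
    """Map each stratum code to the set of distinct PSU codes found in it."""
    psus = defaultdict(set)
    for stratum, psu in records:
        psus[stratum].add(psu)
    return psus

def is_sorted_by_nest(records):
    """True when records are in ascending (stratum, PSU) order."""
    return all(a <= b for a, b in zip(records, records[1:]))

# Made-up (STRVAR, PSUVAR) pairs, one per record.
records = [(1, 11), (1, 12), (2, 17), (2, 17), (3, 21), (3, 22)]
single_psu = sorted(s for s, p in psus_per_stratum(records).items() if len(p) < 2)
print("strata with one PSU:", single_psu)
print("sorted by NEST variables:", is_sorted_by_nest(records))
```

Here stratum 2 is reported because both of its records belong to PSU 17, which matches the situation in the example message.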

If one of your strata does contain only one PSU, then the following are possible solutions:

1. If you subsetted the data in order to obtain estimates for a specific domain, you should instead try using SUDAAN's SUBPOPN statement to specify the domain. You should not subset your data prior to running SUDAAN since this can result in a loss from the data file of part of the design.

2. Manually combine or collapse the stratum that contains only one PSU with another stratum. This should be done prior to running SUDAAN. You may want to consult with a statistician prior to combining strata.

3. Use the MISSUNIT option on the NEST statement. This option causes the variance contribution of the PSU to be estimated as the difference between the PSU's value and the overall mean value for the population. Chapter 3 of the SUDAAN User's Manual gives further information about this option. You may want to consult with a statistician prior to using this option.

5: What does this message mean? "WARNING: MAX relative difference during matrix inversion was 0.001."

This warning indicates that for your data there is a possible problem with the inversion of the X'X matrix. The model that you specify yields an X matrix that is not numerically stable.


6: We got the following warning message while running PROC LOGISTIC: "WARNING: DDF (32) < maximum number of independent parameters in the model (40)" Can we ignore it?

The estimate of the variance-covariance matrix for all 40 parameters is likely to be inaccurate and misleading. The true variance-covariance matrix will have rank=40, whereas the estimated matrix will have at most rank=32. Your data do not support such a large model.
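
As a rough arithmetic check, the comparison behind this warning can be written out in Python. The sketch assumes that DDF is the number of PSUs minus the number of strata, which is the usual convention for a stratified WR design but is not stated above. The counts are hypothetical and chosen only to reproduce the numbers in the message.

```python
def design_ddf(n_psus, n_strata):
    """Denominator degrees of freedom under the assumed PSUs-minus-strata rule."""
    return n_psus - n_strata

# Hypothetical design: 32 strata with 2 PSUs each.
ddf = design_ddf(n_psus=64, n_strata=32)
n_parameters = 40
model_too_large = ddf < n_parameters
print(ddf, n_parameters, model_too_large)
```

Under this assumption, the warning goes away only by fitting fewer parameters or by having more PSUs relative to strata.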


7: Why am I getting the error message "No Data on File"?

The message "No Data on File" usually indicates that SUDAAN has not found any valid data on file. Remember that SUDAAN rejects records on which the WEIGHT variable is non-positive. In addition, in the modeling procedures SUDAAN rejects all records on which any model variable (left or right-hand side) is missing. SUBGROUP variables outside the range of 1…LEVEL are considered missing.

Suppose you have the following program and you are getting the "No Data on File" Message:

PROC MULTILOG DATA=in.test FILETYPE=SAS DESIGN=WR;
NEST STRATUM PSU;
WEIGHT WGT;
SUBGROUP SEX AGEGRP EDUC OVERWT;
LEVELS 2 6 5 2;
MODEL OVERWT = SEX AGEGRP EDUC;
PRINT BETA SEBETA;

You can check to see whether you have these sorts of problems by executing a program similar to the following RECORDS procedure:

PROC RECORDS DATA=in.test FILETYPE=SAS CONTENTS COUNTREC;

/* SUBPOPN statement selects only records which SUDAAN can use in the model */

SUBPOPN WGT>0 & SEX>0 & SEX<=2 & AGEGRP>0 & AGEGRP<=6 & EDUC>0 & EDUC<=5 & OVERWT>0 & OVERWT<=2;

/* Print the first few valid records */

PRINT / MAXREC=20;

If printing the data does not clarify the problem, then send the program and data (zipped!) to sudaan@rti.org so that we can help you.
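
The rejection rules described above (non-positive weight, or a SUBGROUP variable outside 1 to LEVEL) can also be tallied outside SUDAAN to see which rule is removing records. The Python below is a minimal sketch using the variable names from the example program; the function name and the three records are made up.

```python
from collections import Counter

# Number of levels for each SUBGROUP variable in the example program.
LEVELS = {"SEX": 2, "AGEGRP": 6, "EDUC": 5, "OVERWT": 2}

def rejection_reason(record, weight="WGT", levels=LEVELS):
    """Return why a record would be dropped, or None if it is usable."""
    w = record.get(weight)
    if w is None or w <= 0:
        return "missing or non-positive weight"
    for var, n_levels in levels.items():
        value = record.get(var)
        if value is None or not 1 <= value <= n_levels:
            return f"{var} missing or outside 1..{n_levels}"
    return None

records = [
    {"WGT": 120.5, "SEX": 1, "AGEGRP": 3, "EDUC": 2, "OVERWT": 0},  # 0/1 coding
    {"WGT": 0.0, "SEX": 2, "AGEGRP": 1, "EDUC": 5, "OVERWT": 1},
    {"WGT": 98.0, "SEX": 2, "AGEGRP": 6, "EDUC": 4, "OVERWT": 2},
]
tally = Counter(rejection_reason(r) or "usable" for r in records)
for reason, count in tally.items():
    print(count, reason)
```

A tally dominated by a single reason, such as a response coded 0/1 instead of 1/2, usually points straight at the cause of the message.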

Are your results not what you expected? Get more information.

1: My SUDAAN estimates, standard errors, and/or tests of hypothesis are not the same as the ones I get in other packages. Why is this?

  • If you are analyzing data from a complex sample survey, you will likely get different results in SUDAAN vs. other packages. First, if you cannot use the survey sampling weights in other packages, the point estimates will be different. Some packages allow a WEIGHT statement, and that will ensure that the point estimates are the same between SUDAAN and most other packages. Point estimates are generally biased if the survey sampling weights cannot be utilized. In addition, the variances, standard errors, tests of hypotheses, and p-values will still be different, even when weights are utilized. This is because SUDAAN allows the user to specify the sampling design and thereby compute a robust variance estimate, yielding valid inferences. If another package does not allow for specification of the complex sampling design (stratification, clustering, etc.), then variance estimation, and hence test statistics and p-values, will be wrong. Usually, this results in variances that are too small and false-positive tests of hypothesis.

  • In some procedures different estimates as well as different standard errors may be due to different tolerances for matrix inversion. Try changing the value of the TOL parameter on the PROC statement.

  • In the iterative regression procedures, different estimates may be due to a different number of iterations. Try changing the values of MAXITER, EPSILON and/or P_EPSILON on the PROC statement.


2: Why am I getting ******** in the output instead of results?

The asterisks indicate that the default field width is not large enough for the result. Suppose, for instance, you find asterisks in the output from one of the descriptive procedures where you requested WSUM. You can add something similar to the following to your PRINT statement after the slash:

PRINT / WSUMFMT=Fw.d;

where w is the overall field width you desire and d is the number of decimal places. You should choose w large enough to accommodate the number of decimal places d, the decimal point, and enough digits to the left of the decimal to contain the largest value.
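
The arithmetic for choosing w can be sketched in Python: format the largest values with d decimal places and take the longest result. The function name and the sample values are hypothetical; the sketch only illustrates the counting rule described above.

```python
def min_field_width(values, decimals):
    """Smallest w such that every value fits in Fw.d with d = decimals.

    The formatted length counts the digits left of the decimal point,
    the decimal point, the decimal places, and a minus sign if present.
    """
    return max(len(f"{value:.{decimals}f}") for value in values)

# Made-up weighted sums.
wsums = [12345678.9, 42.0]
d = 2
w = min_field_width(wsums, d)
print(f"PRINT / WSUMFMT=F{w}.{d};")
```

Allowing a column or two beyond this minimum keeps adjacent fields from running together.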


3: I am getting non-zero parameter estimates in LOGISTIC for the reference group. What is wrong?

The large number of records on your data set may be the cause of the problem. The large size reduces the precision of sums of squares and cross products, which are accumulated in order to estimate the parameters. In this case, the round-off errors may be larger than the default tolerance for matrix inversion (TOL=1e-6). We suggest that you supply a larger tolerance on the PROC statement (TOL=1e-5 for example) and rerun the job.


4: I am trying to estimate quantiles for a variable with a large percentage of 0 values. I am getting missing values for the quantile and for the SEs and upper and lower confidence limits. Is there anything I can do?

SUDAAN is unable to estimate any quantile that is less than or equal to the percentage of data accounted for by the 0 values. This will happen for any variable where the smallest value of that variable has ties.
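
To see in advance which quantiles are affected, you can compute the share of the data sitting at the smallest value. The Python sketch below assumes that the relevant share is the weighted one, which is not stated above; the function name, values, and weights are made up.

```python
def share_at_minimum(values, weights):
    """Weighted proportion of the data tied at the smallest value."""
    smallest = min(values)
    at_min = sum(w for v, w in zip(values, weights) if v == smallest)
    return at_min / sum(weights)

# Made-up data: three records with value 0 carry 40 of the 100 weight units.
values = [0, 0, 0, 4, 9]
weights = [10, 10, 20, 30, 30]
share = share_at_minimum(values, weights)
for p in (0.25, 0.50, 0.75):
    status = "estimable" if p > share else "not estimable"
    print(f"quantile {p:.2f}: {status}")
```

With these numbers the 25th percentile falls inside the block of zeros, while the median and the 75th percentile lie above it.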


5: Does it matter whether I use SUBPOPN or subset my data outside of SUDAAN before analyzing?

It makes a difference any time parts of the sampling design (e.g., an entire PSU) are lost after subsetting the data. SUDAAN needs the entire design present in order to estimate variances correctly. In most cases, it will make a difference. The difference shows up in the variance estimation and hypothesis testing.

Here is how the SUBPOPN statement works. Imagine a new variable named ELIGIBLE which is equal to 1 if the observation is to be included in the analysis through SUBPOPN, and equal to 2 if it is not included in the analysis. If this variable is used on the SUBGROUP statement with the corresponding LEVELS equal to 2, and also crossed with every term on a TABLES statement, then it will produce results for both levels ELIGIBLE=1 and ELIGIBLE=2. The use of SUBPOPN ELIGIBLE=1 will produce results that are identical to the results for the cell for "ELIGIBLE=1" when both levels are analyzed.

If you instead subset the population outside of SUDAAN and then analyze the data using SUDAAN, the results may be different in the two analyses. One case for which the results will be the same is when "DESIGN=WR" and the subset contains at least one observation (with positive weight) in each of the original PSUs.

In conclusion, the safe (therefore preferred) approach is to use SUBPOPN, and not subset the data prior to using SUDAAN.
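
A quick way to see whether subsetting would lose part of the design is to compare the PSUs present before and after the subset. The Python below is a minimal sketch with made-up PSU codes and a hypothetical in-domain flag.

```python
# Made-up records: (PSU code, whether the record belongs to the domain).
records = [("A", True), ("A", False), ("B", False), ("B", False), ("C", True)]

all_psus = {psu for psu, _ in records}
kept_psus = {psu for psu, in_domain in records if in_domain}
lost_psus = sorted(all_psus - kept_psus)

print("PSUs in full file:", sorted(all_psus))
print("PSUs lost by subsetting:", lost_psus)
```

Here PSU B has no records in the domain, so a subset file would no longer show that B was ever sampled, whereas SUBPOPN keeps it in the design.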


6: Can I use "-2 log likelihood" to evaluate the relative fit of two models?

You can use "-2 log likelihood" to evaluate the relative fit of two models, but not the absolute p-value to test a hypothesis, since we do not know the distribution of the likelihood for complex samples.


7: I ran the same procedure using both WR and Delete-1 Jackknife designs. Results were very similar, but the Jackknife design takes much longer to execute. Which method do you recommend?

Both methods are good large sample approximations. Here "large" refers to the number of PSUs (Primary Sampling Units), not the number of observations. Which to use is a matter of preference. There is no evidence that one method is superior to the other in general.


8: Which SEMETHOD should I use with R=EXCHANGEABLE?

You can use either SEMETHOD=ZEGER or SEMETHOD=BINDER to obtain the robust variance. BINDER is most often used in complex sample surveys. ZEGER is most often used in randomized experiments and non-survey applications. In many cases, ZEGER and BINDER are identical.

Use SEMETHOD=MODEL to obtain the model-based or "naive" variance estimate. This estimate assumes that exchangeable intracluster correlations (R=EXCHANGEABLE) are correct. This is the most efficient variance estimate when the "working" correlation assumption (R=EXCHANGEABLE or R=INDEPENDENT) is correct. SEMETHOD=MODEL is most often used with randomized experiments and other non-survey applications.


9: My regression model contains independent variables that are coded as 0-1 indicator variables, and I listed these variables on the SUBGROUP statement. SUDAAN seems to be deleting a lot of observations from my analysis, and the regression coefficients don't look correct to me. What could be the problem?

Do not list independent variables that are coded as 0-1 on the SUBGROUP statement. Values of 0 are treated as missing for variables that are listed on the SUBGROUP statement and will be excluded from your analysis. Independent variables coded 0-1 may be placed on the CLASS statement if you wish to treat them as categorical, or you can enter them into the model as is.


10: I have used LSMEANS in PROC REGRESS to estimate means by race and income level controlling for body weight, sex, race, and income level. I have a set of LSMEANS for race and income. How can I test for differences between those LSMEANS?

First, you can use the t-tests that are printed by SUDAAN to test H0: Beta=0. Tests of the betas=0 are equivalent to testing for differences in LSMEANS. These t-tests automatically compare each level of the categorical covariates to the reference cells. You can also use the EFFECTS statement to compare other specific levels of the categorical covariates, and that is also equivalent to comparing LSMEANS.


11: Is there something about the way percentiles are calculated that would make the percentile estimates appear incompatible with proportions from a dichotomous variable?

I calculated percentiles for a duration variable (number of minutes walked) using:

proc descript ...;
var walkdur;
tables _one_;
percentile 10 25 50 75 90;

Then, using a cut-off of 30 minutes, I created a dichotomous variable and used PROC CROSSTAB to calculate the proportion who walked for 30 or more minutes.

Percentile results were:

18.4 min. - 10th percentile
29.1 min. - 25th
34.5 min. - 50th
59.3 min. - 75th
77.2 min. - 90th

I calculated an estimate of 79% walking for 30 minutes or more using a dichotomous variable. However, from the percentiles, one would expect that less than 75% walked for 30 min. or more.

If you have a large percentage of ties, then when SUDAAN interpolates between successive values, you may see the type of behavior that you have indicated.

SUDAAN is interpolating between 30 and the value just below it in order to estimate the 25th percentile. The value closest to but less than 30 that occurs for the walkdur variable is 29. Values at or below 29 account for roughly 22% of the data. Since the value 30 accounts for roughly 28% of the data, SUDAAN assumes this percentage is distributed evenly between 29 and 30. So 29 is the 22nd percentile and 30 is the 22+28=50th percentile. To find where between 29 and 30 the 25th percentile falls, solve this equation for x, the amount to add to 29:

x(28)=(25-22), which yields x of approximately .11. So the 25th percentile is at 29+.11=29.11. The dichotomous estimate is consistent with these figures: if roughly 22% of the values are at or below 29, then roughly 78% are 30 or more.
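The interpolation above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, using the rounded percentages quoted in the answer. It is not SUDAAN's internal algorithm, and the function and argument names are hypothetical.

```python
def interpolate_percentile(p, lower_value, upper_value, cum_pct_lower, pct_at_upper):
    """Linearly interpolate the p-th percentile between two adjacent data values.

    cum_pct_lower: percent of the data at or below lower_value
    pct_at_upper:  percent of the data tied at upper_value
    """
    fraction = (p - cum_pct_lower) / pct_at_upper
    return lower_value + fraction * (upper_value - lower_value)


# 22% of the data are at or below 29, and 28% of the data are tied at 30.
p25 = interpolate_percentile(25, 29, 30, 22, 28)
print(round(p25, 2))  # 29.11
```

With the same inputs, any percentile from the 22nd through the 50th lands between 29 and 30, which is why the 25th percentile falls below 30 even though more than 75% of respondents walked 30 minutes or more.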

Get answers to common "how to" questions.

1: How do I select predicted values for the observations used in the analysis from an output file of expected values?

You can include any number of additional identification variables on the PREDICTED output data set by using the IDVAR statement in your procedure. These extra variables can be used to uniquely connect the predicted values to the original data set.


2: How do I recode a 0-1 variable so that 0 recodes to 2 and 1 remains 1?

You can place the variable on a CLASS statement with the options SORT=INTERNAL and DIR=DESCENDING. Thus for example, to convert the 0-1 variable YESNO to a 1-2 variable with the 0's flipped to 2's, use the following:

CLASS YESNO / SORT=INTERNAL DIR=DESCENDING;

3: How can I incorporate an interaction between two continuous variables in a MODEL?

You can create a new variable whose value on each record is the product of the values of the two continuous variables (e.g., AB=A*B), and then use this variable in your MODEL statement. Currently, you must do this data manipulation outside of SUDAAN.
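As one illustration of doing this step outside of SUDAAN, the sketch below adds the product variable to each record in Python before the file is written out for analysis. The record layout, the helper name, and the choice to leave the product missing when either input is missing are all assumptions made for this example.

```python
def add_interaction(records, first, second, product_name):
    """Add product_name = first * second to every record (a list of dicts).

    If either input is missing (None), the product is left missing as well.
    """
    for row in records:
        a, b = row.get(first), row.get(second)
        row[product_name] = None if a is None or b is None else a * b
    return records


# Hypothetical records with two continuous variables A and B.
data = [
    {"A": 2.0, "B": 3.5},
    {"A": 1.5, "B": 4.0},
    {"A": None, "B": 2.0},
]
add_interaction(data, "A", "B", "AB")
print([row["AB"] for row in data])  # [7.0, 6.0, None]
```

The new variable AB can then be listed on the right-hand side of the MODEL statement alongside A and B.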


4: How can I test the equality of means between four sub-populations?

Here is one possible way to accomplish this in SUDAAN. Create a new variable A with values 1, 2, 3, and 4 to designate the particular subpopulation to which the observation belongs. Suppose you wish to compare the means for the variable Y. The following SUDAAN program will do this:

PROC REGRESS DATA=<data set> DESIGN=<design>;
NEST <nest variables>;
WEIGHT <weight variable>;
SUBGROUP A;
LEVELS 4;
MODEL Y=A;

The test of hypothesis for the effect of A is the same as the test of equality of subgroup means.


5: How can I run backward or stepwise regressions in SUDAAN?

SUDAAN does not directly implement backward or stepwise regression. To run a backward regression using SUDAAN you must sequentially remove one variable from your model, and rerun your job. To run a forward regression, successively add one variable to your model, and rerun your job.


6: How do I perform an ANOVA in SUDAAN?

You can effectively perform an ANOVA by using the linear regression (REGRESS) model in SUDAAN. It works very much like the GLM procedure in SAS. You specify the categorical covariates (coded 1,2,3,...) on the SUBGROUP and LEVELS statements, the dependent variable on the left-hand side of the MODEL statement, and all independent variables (categorical and continuous) on the right-hand side of the MODEL statement.

SUBGROUP X;
LEVELS 4;
MODEL Y=X Z;

Here X is categorical with 4 levels (coded 1,2,3,4), Y is the dependent variable, and Z is a continuous covariate. X will be modeled using dummy variables (one for each level of X), and Z will be modeled with one regression coefficient (the slope of Z). By default, the last level of each of the categorical covariates is used as the reference cell for the covariate. You can change the reference cell of any categorical covariate by using the REFLEVEL statement.


7: How do I apply Hosmer-Lemeshow goodness-of-fit measures to my SUDAAN output in LOGISTIC?

Beginning with Release 9.0.0, in LOGISTIC, SUDAAN computes the following Hosmer-Lemeshow type statistics:

  • A Wald F test with numerator degrees of freedom equal to the rank of the variance-covariance matrix (usually G-1) and denominator degrees of freedom equal to the (Number of PSUs - number of strata) for Taylor series and Delete-1 jackknife designs, and (Number of replicates) for BRR and Replicate weight jackknife designs.
  • A simple Chi-square test, which is a weighted analog of the original Hosmer-Lemeshow test. The degrees of freedom are G-2.
  • For Taylor series designs, the Satterthwaite adjusted F-test, degrees of freedom and p-value are also provided.

There are two new options, HLGROUPS=count and HLVAR=variable, on the MODEL statement. HLGROUPS allows you to specify the number of groups of residuals to form. By default, LOGISTIC forms 10 groups. Note, however, that LOGISTIC will not form more groups than are supported by the data. HLVAR permits you to specify a variable on the input data set which gives the group number to associate with each residual. You may use at most one of these options.

LOGISTIC has two new output groups. HLGROUPS contains information on the residual groups formed. HLTEST contains all of the available test statistics. See your SUDAAN Language Manual, Chapter 10 for details.


8: How do I compute the Mean Square Error from my PROC REGRESS output?

The concept of "mean square error" is defined only for the case of simple random samples. There is no equivalent definition for complex survey data. For computing the variance of a predicted value for given X:

PREDICTED Y = B'X;
VARIANCE(PREDICTED Y) = X' {V(B)} X;

SUDAAN prints out the variance-covariance matrix V(B) of the estimated coefficients B. You can use this to compute the variance of any predicted value.
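The two formulas above amount to a dot product and a quadratic form. The sketch below works through them in plain Python. The coefficient vector, covariance matrix, and covariate values are hypothetical numbers, standing in for the B and V(B) you would copy from your own REGRESS output.

```python
def predicted_value(b, x):
    """B'X: the predicted Y for covariate vector x."""
    return sum(bi * xi for bi, xi in zip(b, x))


def predicted_variance(v, x):
    """X' V(B) X for covariance matrix v (a list of rows) and covariate vector x."""
    n = len(x)
    return sum(x[i] * v[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical estimates: an intercept and one slope.
b = [2.0, 0.5]
v = [[0.04, -0.01],
     [-0.01, 0.09]]
x = [1.0, 3.0]  # 1 for the intercept, 3 for the covariate

y_hat = predicted_value(b, x)       # 2.0 + 0.5*3 = 3.5
var_hat = predicted_variance(v, x)  # 0.04 - 0.03 - 0.03 + 0.81 = 0.79
print(y_hat, round(var_hat, 4))
```

The square root of the variance is the standard error of the predicted value at that X.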


9: Is there a way to compute the probability of the response variable for a given set of covariates in LOGISTIC?

You can get the equivalent of "adjusted means" for logistic regression and other nonlinear models using the PREDMARG and CONDMARG statements. You can test hypotheses and form general linear contrasts among the marginals using the PRED_EFF and COND_EFF statements. See the LOGISTIC chapter of the SUDAAN User's Manual for more details.


10: I've tried to find a method in PROC CROSSTAB that can produce a 95% confidence limit for a binomial proportion (this is the proportion of observations in the first variable level that appears in the output). Is there a model in SUDAAN that can compute this CI?

SUDAAN 9.0.0 and above provides confidence intervals for proportions in the CROSSTAB procedure. These are printed by default as part of the TABLECELL group. The confidence limits for row percent (ROWPER) are LOWROW and UPROW. The confidence limits for column percent (COLPER) are LOWCOL and UPCOL. The confidence limits for total percent (TOTPER) are LOWTOT and UPTOT.


11: How do I use time-dependent covariates in SURVIVAL via the counting process approach?

Some work needs to be done before running SURVIVAL in order to use time-dependent covariates. The method assumed here for handling time-varying covariates follows the work of Andersen and Gill (1982, "Cox's Regression Model for Counting Processes: A Large Sample Study," Annals of Statistics, vol. 10, pp. 1100-1120), who developed the counting process style of input.

Suppose you are doing a survival analysis of people, studying how long they survive over time, and the predictor (independent) variables may be time-varying. For the sake of discussion, suppose you follow people from birth to death and you have recorded the weight of each person at varying points in time, say every year of their life on December 31st.

The counting process style of input requires that each person have a record every time the value of the independent variable changes. Here is a "pseudo" case. Note that time1 and time2 are the left and right endpoints of the time interval over which the WEIGHT (time-dependent covariate) was constant.

PersonID  Weight  time1  time2  Survive? (indicates survival)
1         7       0      1      1
1         15      1      2      1
1         25      2      3      1
1         35      3      4      1
1         40      4      5      1
1         42      5      6      1

If you have more than one time-dependent covariate, the construction of the multiplicities of records gets trickier.

With time-dependent covariates in general, what must be done is to take the interval of time over which each person is followed and break up the interval into periods of time where the time-dependent covariates are all CONSTANT.

In the above example, suppose in each year that a person's height changed mid-year (it was constant on the first half of the year and constant on the second half but not the same value in both halves). Then, you would have to take each record given above and split it into two records, one where the height represents the value in the first half of the year and one where the height represents the value in the second half of the year. Below we show this for the first record given above:

PersonID  Weight  time1  time2  Survive?  Height
1         7       0      .5     1         18 inches
1         7       .5     1      1         20 inches

This is how you would handle the case of two time-dependent covariates.
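The record splitting described above can be sketched in Python as follows. This is only an illustration of the data preparation step, which happens outside of SUDAAN. The function name and the dictionary layout are assumptions made for this example.

```python
def split_interval(record, cut_points, values, name):
    """Split one (time1, time2) record at the given cut points.

    values holds the value of covariate `name` on each resulting piece, so
    len(values) must equal len(cut_points) + 1. All other fields are copied
    unchanged onto every piece.
    """
    if len(values) != len(cut_points) + 1:
        raise ValueError("need exactly one value per resulting piece")
    edges = [record["time1"], *cut_points, record["time2"]]
    pieces = []
    for start, stop, value in zip(edges[:-1], edges[1:], values):
        piece = dict(record, time1=start, time2=stop)
        piece[name] = value
        pieces.append(piece)
    return pieces


# First record from the weight example; height changes at mid-year.
first_year = {"PersonID": 1, "Weight": 7, "time1": 0, "time2": 1, "Survive": 1}
for row in split_interval(first_year, [0.5], [18, 20], "Height"):
    print(row)
```

Applying the same function to each yearly record yields a file in which both covariates are constant within every interval.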

Once the data is set up outside of SUDAAN, you treat the data in SURVIVAL as if the covariates are NOT time-varying, like so (but still using the counting process style of input):

Model time1 time2 = weight height;

All you specify are the two variables that record the time points of the left and right endpoints of each record's "time interval".

Get answers to your questions about the theory behind SUDAAN’s calculations.

1: How does SUDAAN handle singleton clusters for variance estimation using the Taylor linearization approach and resampling methods?

SUDAAN's handling of a single cluster within a stratum is based on the assumption that another cluster was in the sample but all data were missing. This is adequate for the case where only a few strata have single clusters. However, this method or any other method for handling singleton clusters in most strata depends heavily on certain assumptions, and unless one is willing to accept those assumptions, one should not use such procedures. We have found that a better approach is to collapse strata to create pseudo strata so that each stratum has at least two clusters. The creation can be based on subjective judgement about similarity of clusters; for example, in household samples one may use geographic proximity and urban/rural character.

SUDAAN calculates the variance contribution for each stage of the design as the square of the difference between each unit's value and the mean of all the units within the stage. When only one sample unit is encountered within a stage, SUDAAN cannot calculate the variance contribution in this manner and will typically halt with an error message.

However, if you specify the MISSUNIT option on the NEST statement, then when only one sample unit is encountered in a stage, SUDAAN will estimate the variance contribution of that unit using the difference between that unit's value and the overall mean value for the population. For example, if you have a two-stage design and have specified a stratum and a primary sampling unit (PSU) variable on the NEST statement, then SUDAAN will abort with an error message if you have a stratum that contains only one PSU. If you specify MISSUNIT, SUDAAN will instead calculate the mean for the entire file and base the variance contribution for that unit on the difference between that unit's value and the overall mean.