SDA 3.5 Documentation for HARC


NAME

HARC - HTML archive specification file

DESCRIPTION

The HTML archive specification file (HARC file) provides information on the data files and procedures available in a data archive. This information is used by the HSDA program to construct option screens for the user of SDA on the Web.

OVERVIEW

The HTML archive specification file (HARC file) lists the codebooks, data files, analysis programs, and data export procedures that are available for online access from an SDA archive, and it indicates where those data files, programs, and other types of files are located. The HARC file also defines labels for the study datasets and files.

Stratum and cluster variables can be specified for a study in the HARC file, to enable the MEANS program to calculate complex standard errors. Weight variables for a study can be specified, so that users can select a weight from a drop-down menu, instead of having to enter the name of a weight variable on the option screen.

The HARC file is read by the HSDA program and is used by that program to construct the HTML forms that will be displayed to users of the archive. Only those programs and options specified in the HARC file are accessible to the Web user.

Note that it is possible to have more than one archive on the same server. Each archive is defined by its own HARC file, and each can offer different study datasets, analysis capabilities, or languages. Each time the HSDA program is executed, the name of the appropriate HARC file is given as an argument.


HARC FILE LAYOUT

The HARC file is an ASCII file that is laid out in various sections. Each section begins with a section title in square brackets ([ ]). Within each section, specifications are usually given in the form "keyword = something" with one keyword per line. Lines beginning with a pound sign (#) are interpreted as comments. Blank lines are ignored.

There are six possible section headings in the HARC file, and the first three are REQUIRED:

[GENERAL]
Location of help files and temp files; specification of overall options

[PROGRAMS]
Location of analysis programs; list of the programs to be made available

[DATASETS]
Datasets available; names, locations, options

[LABELS]
Modify the labels for programs and other procedures available to be selected

[HEADER]
Text for top of selection screen (old interface only)

[FOOTER]
Text for bottom of selection screen (old interface only)

The general layout is as follows:

[GENERAL]
keyword = something
keyword = something

[PROGRAMS]
keyword = something
keyword = something

[DATASETS]
keyword = something
keyword = something
*
keyword = something
keyword = something

The names of sections and the keywords can be given in either upper or lower case, but they may not be abbreviated. The first section should be the [GENERAL] section. The other sections can be given in any order. However, it is a good idea to put the [DATASETS] section last, to facilitate adding datasets to the HARC file.

Keywords within a section can be given in any order. However, within the [DATASETS] section the keywords applicable to a specific study must be grouped together and be separated by an asterisk from the specifications for another study.
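
As a rough illustration of these layout rules, the following fragment shows comment lines, blank lines, a keyword given in lower case, and an asterisk separating two studies. All host names, paths, and study names are hypothetical placeholders, not values from a real archive.

```
# Comment lines begin with a pound sign; blank lines are ignored.
[GENERAL]
helpdocs = http://www.example.org/sdadocs/

[PROGRAMS]
SDAPATH = /usr/local/sda
SDAPROGS = tables means

# The [DATASETS] section comes last, so that studies are easy to append.
[DATASETS]
DATASET = study1
DATALABEL = First example study
CODEBOOK = http://www.example.org/sdadocs/study1/study1.htm
SDADATA = /data/study1/
*
DATASET = study2
DATALABEL = Second example study
CODEBOOK = http://www.example.org/sdadocs/study2/study2.htm
SDADATA = /data/study2/
```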


SPECIFICATIONS WITHIN EACH SECTION

Each section of the HARC file contains specifications appropriate to that section. Most sections contain specifications in the form of "keyword=something," where "something" is either an option specification or the name of a PATH or a Uniform Resource Locator (URL).

If the specification is a PATH, it must be a full pathname on the server computer such as: /bravo2/bravo3/sda

If the specification is a URL, it must be a complete one such as: http://socrates.berkeley.edu/mydata.html
In principle, a URL can refer to a location on any World Wide Web server. However, the checking for valid URLs done by the HSDA program in debugging mode will only work within the same domain as the local server running the HSDA program.

A slash (/) at the end of a PATH or a URL can be used if the referenced location is a directory (and not a specific file). However, this use of a final slash is optional.
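
A small sketch of the two kinds of values, assuming a hypothetical server 'www.example.org' and hypothetical directories. The first value uses the optional final slash on a directory URL; the second omits it on a directory PATH.

```
[GENERAL]
# URL of a directory, written with the optional final slash
HELPDOCS = http://www.example.org/sdadocs/
# Full PATH of a directory on the server, written without the final slash
BATCHSAVEDIR = /var/sda/batchsave
```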

The [HEADER] and [FOOTER] sections contain text, possibly including HTML formatting tags.


[GENERAL] Section Keywords

Possible GENERAL keywords are grouped into the following sections: Basic Keywords, Subsetting, and Charts.


Basic Keywords


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

HELPDOCS=     URL for directory with          REQUIRED
                online help files

LOGFILE=      PATH of file for SDA log        REQUIRED to create log
               (The file should exist and be    of SDA usage
                writable by the CGI process.
                The log file can be analyzed
                with the ’sdalog’ program.)


DEBUG=        YES                             Debug mode disabled

URLCHECK=     NO                              Check URLs when in
               (turns off URL checking           debugging mode
                because some are in
                domains different from
                the local SDA server;
                you must verify manually
                that the ’tree_items.js’
                file for a study is
                available for the new
                interface)


BGCOLOR=      Color code for background       Use standard browser
                color of screens                background color

BACKGROUND=   Name of background file         Use standard browser
                for background of screens       background

BATCHSAVEDIR= PATH of directory into which    No batch command files
                to copy the batch command       saved
                files for the analysis
                programs before they are
                deleted

DUMMYGENMAX=  A number between 1 and 100      Max of 25 dummy vars can be
               (max dummy vars for REGRESS      generated by the "m:" syntax
                and LOGIT)                      for a single categorical var

INTERFACE=    CLASSIC                         Use the new interface
               (applies to all the studies
                defined in this HARC file)

JSCRIPTURL=   URL for JavaScript code         Look in the ‘jscript’
               (needed by the new SDA           subdirectory of the
                interface)                      URL defined by the
                                                ‘HELPDOCS=’ keyword

LANGUAGE=     PATH to directory with          Use built-in English
                alternate language files        messages and menus

MAXLISTCASE=  Maximum number of CASES         Limit lists to 500
                to list (for ’listcase’)         cases

MAXLISTVARS=  Maximum number of VARIABLES     Limit lists to 500
                to list in an OUTSTUDY          variables
                before a warning

SEARCHURL=    URL of the SDA search servlet   SDA search not enabled

XMEANS=       YES (to get special output--    No special output in
                average differences)            MEANS program


Subsetting


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

SUBTMPDIR=    PATH to temporary or            REQUIRED for subsetting
                work directory
               (writable by CGI process)

SUBTMPURL=    URL to same temporary or        REQUIRED for subsetting
                work directory

SUBMAXVARS=   Maximum number of variables     Limit subsets to 1000
                to allow in a subset            variables

OLDSUBSET=    YES (use old version of         Use new version of SUBSET
                the subset procedure)


Charts


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

CHARTGENURL=  URL of the chart generator      No charts for TABLES output

CHARTSTMPDIR= PATH to temporary or            Same PATH as defined for
                work directory for charts       subsetting (SUBTMPDIR=),
               (writable by CGI process         or the generic temp directory
                and by chart servlet)           for the operating system

MAXCHARTS=    Maximum number of charts per    Maximum of 25 charts
                tables or means request
               (A number between 1 and 100)

TABLESCHARTS= Chart list and descriptions     All available charts
                for TABLES

               (Required only to limit the types of charts available,
                to change the order of charts on the drop-down menu,
                or to change the label for each type of chart)

                Default order and labels are:
                TABLESCHARTS = stackedbar (Stacked Bar Chart)
                TABLESCHARTS = bar (Bar Chart)
                TABLESCHARTS = pie (Pie Chart(s))
                TABLESCHARTS = line (Line Chart)
                TABLESCHARTS = none ((No Chart))

               Note that the ’TABLESCHARTS=’ keyword can be repeated,
                and the chart label can include parentheses.
                (See Example 7 below.)

MEANSCHARTS=  Chart list and descriptions     All available charts
                for MEANS

               (Required only to limit the types of charts available,
                to change the order of charts on the drop-down menu,
                or to change the label for each type of chart)

                Default order and labels are:
                MEANSCHARTS = bar (Bar Chart)
                MEANSCHARTS = line (Line Chart)
                MEANSCHARTS = none ((No Chart))

               Note that the ’MEANSCHARTS=’ keyword can be repeated,
                and the chart label can include parentheses.

At least the ’CHARTGENURL=’ keyword must be specified, in order to generate charts. That URL must correspond to the chart generation servlet on the server computer. For more details, see the SDA Archive Developer’s Guide.
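
As a sketch, the following [GENERAL] entries would enable charts, lower the limit on charts per request, and offer only three chart types for TABLES, in a changed order and with one changed label. The servlet URL and the directory are hypothetical.

```
[GENERAL]
CHARTGENURL = http://www.example.org/chartgen
CHARTSTMPDIR = /var/tmp/sdacharts
MAXCHARTS = 10

# Only these chart types appear on the drop-down menu, in this order
TABLESCHARTS = bar (Vertical bars)
TABLESCHARTS = stackedbar (Stacked Bar Chart)
TABLESCHARTS = none ((No Chart))
```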


[PROGRAMS] Section Keywords

The following keywords indicate which SDA analysis programs are to be made available for the datasets specified in this HARC file. The currently available programs are: tables, means, correl, corrtab, regress, logit, listcase, recode, compute, listvars, and listvars(delete). Note the difference between specifying ’listvars’ and ’listvars(delete)’. If you only specify ’listvars’, the user will be able to list the newly created variables but will not be able to delete them. This will protect the created variables from being deleted, but it will also prevent users from deleting variables that were created erroneously.

For information on interactive usage of each program, see the online help file for analysis programs or the online help file for new variables. For information on the batch command files for each program, see the index to these help documents.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

SDAPATH=      Directory on server in which    REQUIRED
                SDA programs are located

SDAPROGS=     Analysis programs to provide    REQUIRED
                (tables, means, etc.)

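Note that in order to enable users to create new variables for a dataset, it is necessary but not sufficient to mention ’recode’ and ’compute’ in the list of programs. Each dataset for which new variables can be created must also specify where the new variables are to be stored, in one of two ways:

  1. Using the ’OUTSTUDY=’ keyword in the datasets section of the HARC file for that particular dataset;

  2. Invoking the HSDA program with the name of an SDA dataset directory given as the third argument.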

Also, the availability of the ’subset’ procedure for a particular dataset does not depend on this list of SDA programs. Rather, that availability depends on the following:

  1. Using the ’SUBTMPDIR=’ and ’SUBTMPURL=’ keywords in the general section of the HARC file, to define the location of the directory for temporary work files;

  2. Using the ’SUBGRPINFO=’ keyword in the datasets/subset section of the HARC file for that particular dataset (optional but highly recommended);

  3. If OLDSUBSET=YES, using the ’SUBDATA=’ and ’SUBDDL=’ keywords in the datasets/subset section of the HARC file for that particular dataset.

Note that if SUBTMPDIR and SUBTMPURL are both defined in the GENERAL section of the HARC file, the new subset procedure will be available for all datasets defined in that HARC file, unless you specify ’SUBSET=NO’ for a particular dataset in the datasets section of the HARC file.


[DATASETS] Section Keywords

The following keywords are repeated for each study to be made available for online browsing, analysis, subsetting or downloading. Keywords for each study are grouped together and separated from other studies by an asterisk (*) on a line by itself.

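Possible keywords for each dataset are grouped into the following sections: Basic Keywords, Multiple Codebooks, Weights, Selection Filters, Standard Errors, Subsetting, and Downloading.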


Basic Keywords


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

DATASET=      Name of the study (one word,    REQUIRED
                only letters or numbers)

DATALABEL=    Label of study to appear on     REQUIRED
                menus (one line)

CODEBOOK=     URL of homepage for HTML        REQUIRED
                codebook or documentation
                (may be repeated)

SDADATA=      PATH of SDA dataset             REQUIRED
                directory
                (may be repeated)

OUTSTUDY=     PATH of SDA dataset for         No recodes or computed
                newly created variables         variables can be stored
                                                (but see below)

CHARTFONT=    Name of font to use in charts   SanSerif

LISTVARS=     PATH of SDA dataset (label)     Enable listing only of a
                (provide option to list         dataset defined as an
                 variables in this dataset)     OUTSTUDY

VARCASE=      LOWER or UPPER                  Variable names entered on
                (names of variables entered     option screens must match
                 on option screens will be      the case of the variables
                 converted automatically to     stored in the dataset
                 the specified case)

VARIABLES=    varname (Label for variable)    User must enter var names
                (names of variables to be       (old interface only)
                 included on drop-down list
                 of variable names for
                 tables, means, and corrtab.
                 Repeat this keyword for
                 each variable to include.
                 See example 8 below.)
                 THIS KEYWORD APPLIES ONLY TO THE OLDER SDA INTERFACE.

If more than one SDA dataset is specified for a study, the SDA analysis programs will look for variables in each SDA dataset in the order they are specified in the HARC file. This is an important consideration when one of the specified SDA datasets is also specified as the ‘OUTSTUDY’ dataset for storing newly created variables. Ordinarily you will want to specify the main dataset first, so that the original variables are always accessible. For example:

SDADATA=/sda/study/origdata
SDADATA=/sda/newvars
OUTSTUDY=/sda/newvars

If you list the SDA dataset with the new variables first, you can create variables with the same name as an original variable, and effectively "hide" the original variable. For example:

SDADATA=/sda/newvars
SDADATA=/sda/study/origdata
OUTSTUDY=/sda/newvars

If the ‘OUTSTUDY’ option is given as a study-level option, all users who access the specified study will store recodes and computed variables in the same SDA dataset directory. If there are many users who try to do this, there may be some conflicts in naming the new variables. One user will be able to overwrite a variable by creating a new one with the same name.

An alternative method of enabling the creation of new variables is to invoke the HSDA program with the name of an SDA dataset directory given as the third argument. Different users can then share the main dataset for a study but store new variables in their own accounts (assuming that they have such an account on the server computer). See the HSDA document for further discussion and an example.

The ‘VARCASE’ option is designed to simplify the entry of variable names on the option screens, by converting the names of variables automatically to the correct case. This assumes that the variables in the SDA dataset are all either in upper or in lower case (except for the CASEID variable, which is always in upper case). The VARCASE specification also applies to the names of variables generated by RECODE and COMPUTE; those programs will only generate new variables in the appropriate case.

This ‘VARCASE’ specification will apply to all of the SDA datasets listed for a study, if it is given as a study-level option. If you have variables in more than one SDA dataset for the same study, you can set this option separately for each one by using the form:

SDADATA = PATH(varcase=lower).
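
For instance, a study whose original variables are stored in upper case while its derived variables are stored in lower case could be described as follows. The paths are hypothetical; only the '(varcase=...)' form comes from the description above.

```
SDADATA = /data/study1/origdata(varcase=upper)
SDADATA = /data/study1/newvars(varcase=lower)
```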

The ‘LISTVARS’ option allows users to see a list of variables in an SDA dataset. The PATH of the dataset is given after the equal sign. The option screen for selecting an action will then include an option to see a list of the variables in the specified dataset directory. If a label for the directory is given in parentheses after the name of the dataset directory, that label will appear on the option screen, instead of the directory pathname. In order to enable this option, it is also necessary that the ‘listvars’ or ‘listvars(delete)’ program be included in the [PROGRAMS] section as one of the available programs.

This ‘LISTVARS’ option is useful for allowing users to see what is in a dataset that contains derived variables that are not in the main archive dataset for the study. Users can then run analyses which include those variables, provided that the directory has also been specified with an ‘SDADATA=’ command in the HARC file.
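
Putting these pieces together, a minimal sketch of a setup that lets users create, list, and delete derived variables might look like the following. The directory names, the URL, and the label are illustrative assumptions. The main dataset is listed first so that the original variables always remain accessible.

```
[PROGRAMS]
SDAPATH = /usr/local/sda
SDAPROGS = tables means recode compute listvars(delete)

[DATASETS]
DATASET = study1
DATALABEL = Example study
CODEBOOK = http://www.example.org/sdadocs/study1/study1.htm
SDADATA = /data/study1/origdata
SDADATA = /data/study1/newvars
OUTSTUDY = /data/study1/newvars
LISTVARS = /data/study1/newvars (Derived variables)
```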


Multiple Codebooks

If there is only one HTML codebook for a study, use the basic ‘CODEBOOK=’ keyword described above.

The new SDA interface allows multiple HTML codebooks. For example, a codebook stratified by year or region could be set up, in addition to the basic unstratified codebook.

For each codebook, provide the URL of the main codebook HTML file, together with an appropriate label for the codebook (in parentheses). There can be as many codebooks as you wish. The user can select one of the codebooks to view at a time.

If no label is provided for a codebook, the label for the first codebook will be ‘Default’, and the label for the others will be ‘Alternative’. Those labels are not very helpful, so it is much better to include more descriptive labels.

See an example of multiple codebook definitions in example 3.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

CODEBOOK=     URL for Codebook #1 (label)     Required for codebook
CODEBOOK=     URL for Codebook #2 (label)     Required for codebook
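
A sketch of two labeled codebooks for one study, assuming hypothetical URLs and labels:

```
CODEBOOK = http://www.example.org/sdadocs/study1/study1.htm (Standard codebook)
CODEBOOK = http://www.example.org/sdadocs/study1/byyear.htm (Codebook by year)
```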


Weights

Appropriate weight variables, and a label for each one, can be specified for a study. If no weights are specified, the user can still enter the name of a weight variable on the option screen.

If more than one weight variable is specified, they are presented to the user as a drop-down list on the option screen for each analysis program. The first one listed in the HARC file is the default weight; but the user may select one of the other available weights from the drop-down list.

One of the weight options listed in the HARC file can be the option NOT to use a weight. This is specified as ‘##none’. An optional label can be given for this option; for example ‘##none(Do not use a weight)’. The default label is ‘(No weight)’.

A set of weight variables is specified in the dataset definition in example 1.

Note that the user can be forced to use a specific weight on every analysis run. If only one weight variable is specified in the HARC file for a study, and if the ‘##none’ option is not provided, the specified weight is used automatically on every analysis run.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

WEIGHT=       wtvar1 (label) wtvar2 (label)   Required for drop-down weights
WEIGHT=       ##none (label)

Multiple weight variables and labels can be defined on a single line. Alternatively the ‘WEIGHT=’ keyword can be repeated for additional specifications of weight variables and labels.
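
To illustrate the forced-weight case described above, a study entry that lists exactly one weight and omits the '##none' option leaves the user no choice. The variable name and label are hypothetical.

```
# Only one weight and no ##none entry: this weight is used on every run
WEIGHT = finalwt(Final weight)
```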


Selection Filters

It is possible to pre-set one or more selection filters, so that the choices appear as a drop-down list on the option form for each program for this particular dataset. The regular Selection Filter window will appear beneath the pre-set selection choices.

For each specification of filter variables and codes, a label can be provided which describes (on the drop-down list) that particular group of cases to be included in the analysis. If a label is not provided, the filter variable(s) and codes are shown on the drop-down list.

One of the pre-set filter options listed in the HARC file should usually be the option NOT to use this selection filter. This option is specified as ‘##none’. An optional label can be given for this option.
For example: ##none : Include all ages
The default label for the ‘##none’ specification is ‘(All respondents)’.

(See example 6 for pre-set filters with one and two filter variables.)


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

FILTER=       Label for pre-set selection filter  REQUIRED
GROUP=        ##none : Label for ignored filter
GROUP=        var1(codes), var2(codes) : Label for this filter group
GROUP=        var1(codes), var2(codes) : Label for this filter group
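
A sketch of a pre-set filter on age, with one group that combines two filter variables, might look like this. The variable names, the codes, and the use of a hyphen for a range of codes are assumptions made for illustration; check them against the codebook and the selection-filter help for your own data.

```
FILTER = Age group
GROUP = ##none : Include all ages
GROUP = age(18-29) : Ages 18 to 29
GROUP = age(30-64) : Ages 30 to 64
GROUP = age(65-99), gender(2) : Women aged 65 and over
```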


Standard Errors

If calculations of complex standard errors are to be enabled for a study, the stratum and/or cluster variables must be specified. This is done with a ‘design=’ keyword. The method of calculating standard errors depends on whether a stratum variable only, a cluster variable only, or both variables are specified.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

DESIGN=       STRATUM(var1) CLUSTER(var2)     Both variables are defined

DESIGN=       CLUSTER(var2)                   Clusters will be paired into strata

DESIGN=       STRATUM($1)   CLUSTER(var2)     Clusters all in one stratum

DESIGN=       STRATUM(var1)                   Only strata, no clusters

DESIGN=       STRATUM(var1) XREGRESSION       Default is SRS for REGRESS and LOGIT

If only a cluster variable is defined, the default procedure is to combine pairs of consecutive clusters (by cluster number) into strata, for purposes of calculating standard errors. (See example 3.) Alternatively, you can force all the clusters to remain in a single stratum by specifying the name of the stratum variable as ’$1’.

See the document on calculating standard errors for more details.

Complex standard errors are computed by default for each analysis using the TABLES, MEANS, REGRESS, and LOGIT programs if a stratum and/or a cluster variable is defined for the dataset. The user, however, may force the calculation of SRS standard errors, effectively assuming that the sample is a simple random sample (SRS), by selecting that option on each program option page.

The calculation of complex standard errors can require a substantial amount of computer time when analyzing a large dataset using REGRESS and especially LOGIT. Therefore, the archive can override the usual default for those programs and make SRS the default for REGRESS and LOGIT. To do that, add ’XREGRESSION’ to the specifications after ’DESIGN=’ in the HARC file. The last ’DESIGN=’ specification in the table above shows how this is done. Note that users will still be able to request complex standard errors if they wish, but they should not be surprised by delays in receiving results if they do so.

(See example 4.)
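
For example, a study with both a stratum and a cluster variable, for which the archive prefers SRS standard errors by default in REGRESS and LOGIT, could be specified as sketched below. The variable names are hypothetical, and combining XREGRESSION with both STRATUM and CLUSTER is an assumption based on the description above.

```
DESIGN = STRATUM(stratvar) CLUSTER(psuvar) XREGRESSION
```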


Subsetting

The next keywords are used to enhance or to suppress the customized subsetting of variables and/or cases. The file with information on groups of variables is optional, but it enables the user to select entire groups of variables, instead of having to specify all desired variables one by one. That group-information file is generated automatically whenever the XCODEBK program produces HTML codebook files. It has the name ‘Xsub.txt’, where ‘X’ is the root name of the HTML codebook files.

The ’SUBSET=NO’ specification, on the other hand, will suppress the option for creating a customized subset for this particular dataset.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

SUBGRPINFO=   PATH of file with info on       No subsetting of
                groups of variables             groups of vars
               (from codebook program)

SUBSET=       NO (to suppress ’subset’ from
                the user interface for this
                dataset)

The following two keywords are meaningful only if "OLDSUBSET=YES" is specified in the GENERAL section of the HARC file. The new subset procedure (in SDA version 3.3 and later) ignores these two keywords, if they are present, because it obtains data directly from the binary SDA data files, not from the original data and DDL text files.

Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

SUBDATA=      PATH of ASCII DATA file         REQUIRED for OLD subset

SUBDDL=       PATH of matching DDL file       REQUIRED for OLD subset

Note that if SUBTMPDIR and SUBTMPURL are both defined in the general section of the HARC file, the new subset procedure will be available for all datasets defined in that HARC file, unless you specify ’SUBSET=NO’ for a particular dataset.

Also, if you want to be absolutely sure that the subset procedure is not available for a particular dataset, you should set up a disclosure file for that dataset and include the specification "subset=no" in that disclosure file. Otherwise, even though the subset option may not be presented to the user, it remains possible to run the subset procedure in batch mode.
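
The following sketch shows how these pieces might fit together: the work directory is defined once in the [GENERAL] section, one study points to its group-information file, and a second study opts out of subsetting. All host names, paths, and study names are hypothetical; the file name 'study1sub.txt' assumes that the root name of the codebook files is 'study1'.

```
[GENERAL]
HELPDOCS = http://www.example.org/sdadocs/
SUBTMPDIR = /var/www/sdatmp
SUBTMPURL = http://www.example.org/sdatmp

[PROGRAMS]
SDAPATH = /usr/local/sda
SDAPROGS = tables means

[DATASETS]
DATASET = study1
DATALABEL = Study with subsetting by groups of variables
CODEBOOK = http://www.example.org/sdadocs/study1/study1.htm
SDADATA = /data/study1/
SUBGRPINFO = /var/www/sdadocs/study1/study1sub.txt
*
DATASET = study2
DATALABEL = Study without the subset option
CODEBOOK = http://www.example.org/sdadocs/study2/study2.htm
SDADATA = /data/study2/
SUBSET = NO
```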


Downloading

The next two keywords are used to specify data and documentation files that have been created ahead of time (that is, they are not custom-made) and are available for downloading. These keywords are usually used in pairs -- a heading, followed by the URL of a file available for downloading. The URL itself can also be followed by a label in parentheses; that label will appear on the selection screen next to the file name. (See example 4.)

Note that the file given as a URL should preferably have a suffix of ’.txt’, if users are to be able to view the file as well as to save it.


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

DLFILE=       URL of file to download         No file available
                (any kind of file)

DLHEADING=    Heading for a file or group     No heading
                of files

(To get a blank line in the list of files and headings:)
DLHEADING =    
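
A sketch of a download list for one study, with two headings separated by a blank line. The URLs and labels are hypothetical; the documentation file uses a '.txt' suffix so that users can view it as well as save it.

```
DLHEADING = Data files
DLFILE = http://www.example.org/download/study1.dat (ASCII data file)
DLFILE = http://www.example.org/download/study1.ddl.txt (DDL file)
# An empty heading produces a blank line in the list
DLHEADING =
DLHEADING = Documentation
DLFILE = http://www.example.org/download/codebook.txt (Plain-text codebook)
```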


[LABELS] Section Text

Specifications in this section are used to customize the descriptions or labels for the various actions that the user can select.

This option is particularly relevant for preparing a user interface in a language other than English. If non-English text is specified, note that upper-ASCII characters such as letters with accent marks can be entered directly into the HARC file with an appropriate editor. It is also possible to enter upper ASCII characters with special codes. For details, see the language document.

The format is ‘keyword = Label for each action’. The following list shows each keyword and its default label.


Keyword             Default Label (if keyword not specified)
_____________________________________________________________________

LAB_cbwindow =    Extra Codebook Window

LAB_study =       Study:
LAB_sselect =     Select a study:
LAB_aselect =     Select an action:

LAB_codebook =    Browse codebook in this window
LAB_tables =      Frequencies or crosstabulation
LAB_means =       Comparison of means

LAB_correl =      Correlation matrix
LAB_corrtab =     Comparison of correlations
LAB_regress =     Multiple regression
LAB_logit =       Logit/Probit regression

LAB_listcase =    List values of individual cases

LAB_recode =      Recode variables
LAB_compute =     Compute a new variable
LAB_listvars =    List derived variables
LAB_delvars =     List/delete derived variables

LAB_download =    Download existing dataset and documentation
LAB_subset =      Download a customized subset of variables/cases

LAB_start =       Start
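
For instance, an archive might reword a few of the action labels as sketched below; keywords that are not listed keep their default labels. The replacement wording is purely illustrative.

```
[LABELS]
LAB_tables = Crosstabs and frequencies
LAB_means = Compare means across groups
LAB_download = Download the full dataset
LAB_start = Go
```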


CUSTOMIZING THE SELECTION SCREEN (Classic Interface Only)

The next two sections of the HARC file allow archivists to customize the appearance of the selection screen in the original SDA interface (prior to version 3.0) by adding their own header and footer to the selection screen. This is the screen that displays the study selected and lists the various actions or procedures that the user can select.

The text specified in the Header and Footer sections of the HARC file will be passed as is to the user’s HTML browser. This means that the browser will fill and break lines in order to fit the text into the width of the available window. Archivists, however, may insert into the text their own HTML tags for the browser to interpret when the text is displayed.

To specify non-English text for the Header and Footer sections, see the relevant section of the Interface document for important information.


[HEADER] Section Text

Text given after the [HEADER] specification will be displayed on the selection screen before the list of actions available. Unless some text is specified in this section, no header text is displayed on the selection screen before the list of actions.

[FOOTER] Section Text

Text given after the [FOOTER] specification will be displayed on the selection screen after the "start" or "run" button. Unless some text is specified in this section, no footer text is displayed on the selection screen after the list of actions.
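
Because the header and footer text is passed to the browser as is, HTML tags can be used to control its layout. A minimal sketch, with hypothetical wording and a hypothetical contact address:

```
[HEADER]
<h2>Example Data Archive</h2>
<p>Select a study and an action, then press "Start".</p>

[FOOTER]
<hr>
<p>Questions? Write to <a href="mailto:archive@example.org">archive@example.org</a>.</p>
```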

EXAMPLES OF HARC FILES

  1. Codebooks and Basic Analysis (New Interface)
  2. Codebooks and Basic Analysis (Classic Interface)
  3. Codebooks, Analysis, and Subsetting
  4. Codebooks, Analysis, and Downloading
  5. Allow the Creation and Deletion of New Variables
  6. Pre-set Selection Filters
  7. Change the Order and/or the Names of Charts
  8. Drop-Down Variable List (Classic Interface)
  9. Different Options for Different Datasets

EXAMPLE 1: CODEBOOKS AND BASIC ANALYSIS (New interface)


[GENERAL]

LOGFILE = /bravo3/sda/sdalog

# URL for directory containing HTML help documents
HELPDOCS = http://socrates.berkeley.edu/sdadocs/

# URL for search servlet
SEARCHURL = http://socrates.berkeley.edu/sdasearch

# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

# Note that the [HEADER] and [FOOTER] fields do not apply
# when using the new SDA version 3.x interface.

[PROGRAMS]

SDAPATH = /bravo3/psdawk/sda
SDAPROGS = tables means correl corrtab regress logit listcase

[DATASETS]

DATASET = nes92c
DATALABEL = NES 1952-1992 Cumulative Datafile

# For the new SDA interface to function, the codebook must be
# re-created with version 3.x of the ’xcodebk’ program.
CODEBOOK = http://socrates.berkeley.edu/sdadocs/NES92C/n92c.htm

SDADATA = /bravo3/NES/nes52-92.cum/

WEIGHT = sampwt(Sampling wt)
WEIGHT = pswt(Post-stratification wt)
WEIGHT = ##none

DESIGN = STRATUM(stratvar) CLUSTER(psuvar)

*
DATASET = capums1
DATALABEL = 1990 Census - California 1% Sample


# For the new SDA interface to function, the codebook must be
# re-created with version 3.x of the 'xcodebk' program.
CODEBOOK = http://socrates.berkeley.edu/sdadocs/CENSUS/pums.htm

SDADATA = /bravo3/capums1/
WEIGHT = houswgt(Household weight) pwgt1(Person weight) ##none(No weight)
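
Example 1 gives the WEIGHT options in two ways: one option per line (for nes92c) and several options on one line (for capums1). The following Python sketch is not part of SDA; it only illustrates how a script might read both forms into a list of (variable, label) pairs. The name parse_weights, the use of None to stand for ##none, and the tolerance for lowercase keywords are assumptions made for illustration. The sketch also assumes that labels contain no nested parentheses.

```python
import re

# One "name(label)" item; the "(label)" part is optional.
# Assumption: labels contain no nested parentheses.
WEIGHT_ITEM = re.compile(r"(##none|\w+)\s*(?:\(([^()]*)\))?")


def parse_weights(lines):
    """Return (variable, label) pairs from the WEIGHT lines of one dataset.

    The special name ##none is returned as None (no weight variable).
    """
    weights = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comment line, or not a "keyword = value" line
        keyword, _, value = line.partition("=")
        if keyword.strip().upper() != "WEIGHT":
            continue
        for item in WEIGHT_ITEM.finditer(value):
            name, label = item.group(1), item.group(2) or ""
            weights.append((None if name == "##none" else name, label))
    return weights


if __name__ == "__main__":
    block = [
        "WEIGHT = sampwt(Sampling wt)",
        "WEIGHT = pswt(Post-stratification wt)",
        "WEIGHT = ##none",
    ]
    for variable, label in parse_weights(block):
        print(variable, "|", label)
```

Both layouts yield the same kind of list, so a caller would not need to know which form the HARC file used.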


EXAMPLE 2: CODEBOOKS AND BASIC ANALYSIS (Classic interface)


[GENERAL]

HELPDOCS = http://socrates.berkeley.edu/HELPDOCS
LOGFILE = /bravo3/sda/sdalog

# URL for directory containing HTML help documents
HELPDOCS = http://socrates.berkeley.edu/sdadocs/


# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

[PROGRAMS]
SDAPATH = /bravo3/psdawk/sda
SDAPROGS = tables means correl corrtab regress logit listcase

[HEADER]
The studies available here are intended to illustrate
how various types of data files can be set up
for Web access using SDA tools. Just select a study
and an action, then press "Start".

[FOOTER]

Hint for running analysis programs:

If your browser allows you to open more than one window at a time,
open one window for the codebook and another window for the analysis
programs. Then you can switch easily between the windows, in order to
specify correctly the names of the variables you want to analyze.

[DATASETS]

DATASET = nes92c
DATALABEL = NES 1952-1992 Cumulative Datafile
CODEBOOK = http://socrates.berkeley.edu/sdadocs/NES92C/n92c.htm

SDADATA = /bravo3/NES/nes52-92.cum/
WEIGHT = sampwt(Sampling weight) finalwt(Final weight) ##none(No weight)
DESIGN = STRATUM(stratvar) CLUSTER(psuvar)

*
DATASET = capums1
DATALABEL = 1990 Census - California 1% Sample
CODEBOOK = http://socrates.berkeley.edu/sdadocs/CENSUS/pums.htm

SDADATA = /bravo3/capums1/
WEIGHT = houswgt(Household weight) pwgt1(Person weight) ##none(No weight)
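
The layout of Example 2 (bracketed section titles, "keyword = value" lines, comment lines beginning with #, and dataset descriptions separated by a line containing only an asterisk) can be read mechanically. The Python sketch below is an illustration only and is not how HSDA itself reads the file. The function names split_harc and dataset_blocks are hypothetical, and the sketch assumes that the text in the [HEADER] and [FOOTER] sections is kept verbatim, including blank lines.

```python
def split_harc(text):
    """Group the lines of a HARC file by section title.

    Returns a dict mapping the upper-cased section name to its lines.
    Comment and blank lines are dropped, except in HEADER and FOOTER,
    whose free text is kept as written (an assumption of this sketch).
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().upper()
            sections.setdefault(current, [])
        elif current is not None:
            if current in ("HEADER", "FOOTER"):
                sections[current].append(raw)
            elif line and not line.startswith("#"):
                sections[current].append(line)
    return sections


def dataset_blocks(dataset_lines):
    """Split the [DATASETS] lines into one list per dataset, at each '*'."""
    blocks, block = [], []
    for line in dataset_lines:
        if line == "*":
            if block:
                blocks.append(block)
            block = []
        else:
            block.append(line)
    if block:
        blocks.append(block)
    return blocks


if __name__ == "__main__":
    sample = (
        "[GENERAL]\n"
        "# a comment\n"
        "LOGFILE = /bravo3/sda/sdalog\n"
        "[DATASETS]\n"
        "DATASET = nes92c\n"
        "*\n"
        "DATASET = capums1\n"
    )
    parsed = split_harc(sample)
    print(parsed["GENERAL"])
    print(dataset_blocks(parsed["DATASETS"]))
```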


EXAMPLE 3: MULTIPLE CODEBOOKS, ANALYSIS, AND SUBSETTING


[GENERAL]

LOGFILE = /bravo3/sda/sdalog
HELPDOCS = http://socrates.berkeley.edu/HELPDOCS

# URL for search servlet
SEARCHURL = http://socrates.berkeley.edu/sdasearch

# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

# Note that the [HEADER] and [FOOTER] fields do not apply
# when using the new SDA version 3.x interface.


# If customized subsetting is to be enabled, specify:
#   the PATHNAME for the directory to store temporary files, and
#   a URL for that same directory.
#   Note that this temporary directory must be in a location
#     accessible to Web browsers.

SUBTMPDIR = /bravo3/docs/TMPDIR
SUBTMPURL = http://socrates.berkeley.edu/TMPDIR
# Limit subsets to 100 variables
SUBMAXVARS = 100

[PROGRAMS]

SDAPATH = /bravo3/sda
SDAPROGS = tables means correl corrtab regress logit listcase

[DATASETS]

# If customized subsetting is to be enabled, include the following:
#
#   SUBGRPINFO= FULL PATHNAME of file with info on groups of variables
#                (produced by codebook program, if there are headings;
#                 use is optional, but recommended)
#
# (The following two specifications are required only if the old
#  subsetting procedure is being used -- that is, only if
#  OLDSUBSET=YES in the GENERAL section of the HARC file.)
#
#   SUBDDL=     FULL PATHNAME of DDL file
#   SUBDATA=    FULL PATHNAME of ASCII data file
#
# For multiple codebooks, include a URL and (label) for each:
#
#   CODEBOOK=   URL for codebook #1 (label for codebook #1)
#   CODEBOOK=   URL for codebook #2 (label for codebook #2)
#

DATASET = gss
DATALABEL = GSS 1972-2004 Cumulative Datafile
SDADATA = /bravo3/GSS/sda
# Define a cluster variable for this dataset, to calculate
#  complex standard errors
DESIGN = cluster(sampcode)

CODEBOOK = http://sda.berkeley.edu/GSS/Doc/GSS.htm (Standard Codebook)
CODEBOOK = http://sda.berkeley.edu/GSS/Docyr/GSYR.htm (Codebook by Year)

SUBGRPINFO= /bravo3/GSS/Doc/GSSsub.txt

*
DATASET = multi
DATALABEL = 1994 Multi-Investigator Study
CODEBOOK = http://socrates.berkeley.edu/Multi/Doc/mult.htm
SDADATA = /bravo3/Multi/sda

SUBGRPINFO= /bravo3/Multi/Doc/multsub.txt



EXAMPLE 4: CODEBOOKS, ANALYSIS, AND DOWNLOADING


[GENERAL]

LOGFILE = /bravo3/sda/sdalog
HELPDOCS = http://socrates.berkeley.edu/csadocs/

# URL for search servlet
SEARCHURL = http://socrates.berkeley.edu/sdasearch

# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

# Note that the [HEADER] and [FOOTER] fields do not apply
# when using the new SDA version 3.x interface.

[PROGRAMS]

SDAPATH = /bravo3/sda
SDAPROGS = tables means correl corrtab regress logit listcase

[DATASETS]

# If files are to be available for downloading, include the following:
#
#   DLHEADING=  Heading for a file or group of files
#
#   DLFILE=     URL of file to download, followed by an optional
#                  label (given in parentheses)
#                Note that the URL should preferably have a suffix
#                  of '.txt', if users are to be able to view
#                  the file as well as to save it.
#
#               (Many of these headings and URLs can be given.)

DATASET = nes2004c
DATALABEL = NES 1952-2004 Cumulative Datafile
CODEBOOK = http://sda.berkeley.edu/NES2004C/n04c.htm
SDADATA = /bravo3/NES2004C/nes52-04.cum/
# Define stratum and cluster variables for this dataset, but SRS is the
#  default for the regression and logit/probit programs
DESIGN = stratum(stratcode) cluster(psucode) xregression

# The following keywords specify files available for downloading.
# Notice the optional labels in parentheses after some URLs.

DLHEADING = DATA FILES
DLFILE = http://socrates.berkeley.edu/DL/NESdat.txt (Plain ASCII file)
DLFILE = http://socrates.berkeley.edu/DL/NESdat.zip (Zipped file for PCs)

# To get a blank line in the list, use the following heading:
DLHEADING =  

DLHEADING = SAS definition file
DLFILE = http://socrates.berkeley.edu/DL/NESsas.txt
DLHEADING = SPSS definition file
DLFILE = http://socrates.berkeley.edu/DL/NESspss.txt
DLHEADING = DDL file
DLFILE = http://socrates.berkeley.edu/DL/NESddl.txt (Plain ASCII file)

# To get a blank line in the list, use the following heading:
DLHEADING =  

DLHEADING = Microsoft Word Codebook ready to be printed
DLFILE = http://socrates.berkeley.edu/DL/NEScdbk.doc
DLHEADING = Set of HTML codebook files
DLFILE = http://socrates.berkeley.edu/DL/NEShtml.zip (Zip file)
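
The DLHEADING and DLFILE lines of Example 4 form an ordered list: each heading is followed by the files that belong under it, and an empty heading produces a blank line. The Python sketch below, which is an illustration and not part of SDA, groups such lines into (heading, files) pairs. The name parse_downloads is hypothetical, and the sketch assumes that whitespace separates a URL from its optional label.

```python
import re

# A URL followed by an optional "(label)".
# Assumption: whitespace separates the URL from the label.
DLFILE_VALUE = re.compile(r"^(\S+)\s*(?:\((.*)\))?\s*$")


def parse_downloads(lines):
    """Return a list of (heading, [(url, label), ...]) in file order."""
    groups = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keyword, _, value = line.partition("=")
        keyword, value = keyword.strip().upper(), value.strip()
        if keyword == "DLHEADING":
            # An empty heading is kept; it stands for a blank line.
            groups.append((value, []))
        elif keyword == "DLFILE":
            match = DLFILE_VALUE.match(value)
            if match is None:
                continue
            if not groups:
                groups.append(("", []))  # file listed before any heading
            groups[-1][1].append((match.group(1), match.group(2) or ""))
    return groups


if __name__ == "__main__":
    demo = [
        "DLHEADING = DATA FILES",
        "DLFILE = http://socrates.berkeley.edu/DL/NESdat.txt (Plain ASCII file)",
        "DLHEADING =  ",
        "DLHEADING = SAS definition file",
        "DLFILE = http://socrates.berkeley.edu/DL/NESsas.txt",
    ]
    for heading, files in parse_downloads(demo):
        print(repr(heading), files)
```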


EXAMPLE 5: ALLOW THE CREATION AND DELETION OF NEW VARIABLES


[GENERAL]

LOGFILE = /bravo3/sda/sdalog
HELPDOCS = http://socrates.berkeley.edu/sdadocs/

# URL for search servlet
SEARCHURL = http://socrates.berkeley.edu/sdasearch

# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

# Note that the [HEADER] and [FOOTER] fields do not apply
# when using the new SDA version 3.x interface.

[PROGRAMS]

SDAPATH = /bravo3/psdawk/sda
SDAPROGS = tables means correl corrtab regress logit listcase

SDAPROGS = recode, compute, listvars(delete)
# To allow variables to be created but not deleted, specify:
# SDAPROGS = recode, compute, listvars

[DATASETS]

DATASET = nes92c
DATALABEL = NES 1952-1992 Cumulative Datafile
CODEBOOK = http://socrates.berkeley.edu/sdadocs/NES92C/n92c.htm
WEIGHT = sampwt(Sampling weight) finalwt(Final weight) ##none(No weight)
DESIGN = STRATUM(stratvar) CLUSTER(psuvar)
SDADATA = /bravo3/NES/nes52-92.cum/

OUTSTUDY = /bravo3/NES/nes52-92.cum/newvars
SDADATA = /bravo3/NES/nes52-92.cum/newvars

*

DATASET = capums1
DATALABEL = 1990 Census - California 1% Sample
CODEBOOK = http://socrates.berkeley.edu/sdadocs/CENSUS/pums.htm
WEIGHT = houswgt(Household weight) pwgt1(Person weight) ##none(No weight)
SDADATA = /bravo3/capums1/

OUTSTUDY = /bravo3/capums1/newvars
SDADATA = /bravo3/capums1/newvars


EXAMPLE 6: PRE-SET SELECTION FILTERS


[GENERAL]

LOGFILE = /bravo3/sda/sdalog
HELPDOCS = http://socrates.berkeley.edu/sdadocs/

# URL for search servlet
SEARCHURL = http://socrates.berkeley.edu/sdasearch

# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

# Note that the [HEADER] and [FOOTER] fields do not apply
# when using the new SDA version 3.x interface.

[PROGRAMS]

SDAPATH = /bravo3/psdawk/sda
SDAPROGS = tables means correl corrtab regress logit listcase
SDAPROGS = recode, compute, listvars(delete)

[DATASETS]

DATASET = nes92c
DATALABEL = NES 1952-1992 Cumulative Datafile
SDADATA = /bravo3/NES/nes52-92.cum/
CODEBOOK = http://socrates.berkeley.edu/sdadocs/NES92C/n92c.htm
WEIGHT = sampwt(Sampling weight) finalwt(Final weight) ##none(No weight)

FILTER = Limit analysis to one region
GROUP = ##none: (Include all regions)
GROUP = region(1): Northeast
GROUP = region(2): Midwest
GROUP = region(3): South
GROUP = region(4): West

FILTER = Limit analysis to age/sex subgroup
GROUP = ##none: (Include all)
GROUP = age(18-40), gender(1) : Younger males
GROUP = age(18-40), gender(2) : Younger females
GROUP = age(41-*), gender(1)  : Older males
GROUP = age(41-*), gender(2)  : Older females
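
In Example 6 each FILTER line starts a new filter, and each following GROUP line gives one choice in the form "conditions : label", where the conditions are ##none or a comma-separated list of variable(values) items. The Python sketch below is an illustration only, not SDA code. It splits these lines into a simple structure and leaves value specifications such as 18-40 or 41-* as uninterpreted strings. The name parse_filters and the dictionary layout are hypothetical.

```python
import re

# One "variable(values)" condition; the values are kept as a raw string.
CONDITION = re.compile(r"(\w+)\s*\(([^()]*)\)")


def parse_filters(lines):
    """Return a list of filters, each with a title and its groups.

    A group with an empty condition list corresponds to ##none
    (no restriction on the cases analyzed).
    """
    filters = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keyword, _, value = line.partition("=")
        keyword, value = keyword.strip().upper(), value.strip()
        if keyword == "FILTER":
            filters.append({"title": value, "groups": []})
        elif keyword == "GROUP" and filters:
            spec, _, label = value.partition(":")
            conditions = [
                (m.group(1), m.group(2).strip())
                for m in CONDITION.finditer(spec)
            ]
            filters[-1]["groups"].append(
                {"label": label.strip(), "conditions": conditions}
            )
    return filters


if __name__ == "__main__":
    demo = [
        "FILTER = Limit analysis to age/sex subgroup",
        "GROUP = ##none: (Include all)",
        "GROUP = age(41-*), gender(2)  : Older females",
    ]
    for flt in parse_filters(demo):
        print(flt["title"])
        for group in flt["groups"]:
            print("  ", group["label"], group["conditions"])
```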


EXAMPLE 7: CHANGE THE ORDER AND/OR THE NAMES OF CHARTS


[GENERAL]

LOGFILE = /bravo3/sda/sdalog
HELPDOCS = http://socrates.berkeley.edu/sdadocs/

# URL for search servlet
SEARCHURL = http://socrates.berkeley.edu/sdasearch

# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

TABLESCHARTS = bar(Side-by-Side Bar Chart) stackedbar (Stacked Bar Chart)
TABLESCHARTS = line (Line Chart) pie(Multiple Pie Chart(s))
TABLESCHARTS = none (No Chart)

[PROGRAMS]

SDAPATH = /bravo3/psdawk/sda
SDAPROGS = tables, means, corrtab, recode, compute, listvars

# Note that the [HEADER] and [FOOTER] fields do not apply
# when using the new SDA version 3.x interface.

[DATASETS]

DATASET = nes92c
DATALABEL = NES 1952-1992 Cumulative Datafile
SDADATA = /bravo3/NES/nes52-92.cum/
CODEBOOK = http://socrates.berkeley.edu/sdadocs/NES92C/n92c.htm
WEIGHT = sampwt(Sampling weight) finalwt(Final weight) ##none(No weight)
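
The TABLESCHARTS lines in Example 7 list chart types in the order they are to appear, each with a label in parentheses, and one label, "Multiple Pie Chart(s)", itself contains parentheses. The Python sketch below is an illustration and not part of SDA. It reads such lines with a small scanner that counts parenthesis depth instead of using a simple pattern. The names parse_chart_specs and chart_menu are hypothetical, and the sketch assumes that successive TABLESCHARTS lines are read in order as one list.

```python
def parse_chart_specs(value):
    """Split 'name(label) name (label) ...' into (name, label) pairs.

    Labels may contain nested parentheses, so the closing parenthesis
    is found by tracking depth.
    """
    items = []
    i, n = 0, len(value)
    while i < n:
        while i < n and value[i] in " \t,":
            i += 1  # skip separators between items
        if i >= n:
            break
        start = i
        while i < n and value[i] not in " \t,(":
            i += 1
        name = value[start:i]
        while i < n and value[i] in " \t":
            i += 1  # allow a space between the name and its label
        label = ""
        if i < n and value[i] == "(":
            depth = 0
            label_start = i + 1
            while i < n:
                if value[i] == "(":
                    depth += 1
                elif value[i] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            label = value[label_start:i]
            i += 1  # step past the closing parenthesis
        items.append((name, label))
    return items


def chart_menu(lines):
    """Collect the chart list from all TABLESCHARTS lines, in order."""
    menu = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        keyword, _, value = line.partition("=")
        if keyword.strip().upper() == "TABLESCHARTS":
            menu.extend(parse_chart_specs(value))
    return menu


if __name__ == "__main__":
    print(chart_menu(["TABLESCHARTS = line (Line Chart) pie(Multiple Pie Chart(s))"]))
```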

EXAMPLE 8: DROP-DOWN VARIABLE LIST (Classic interface only)


[GENERAL]

LOGFILE = /bravo3/sda/sdalog
HELPDOCS = http://socrates.berkeley.edu/sdadocs/

# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

[PROGRAMS]

SDAPATH = /bravo3/psdawk/sda
SDAPROGS = tables means correl corrtab regress logit listcase
SDAPROGS = recode, compute, listvars(delete)

[HEADER]
For this dataset you can select row, column, and control
variables from a drop-down list.

(These specifications are ignored if you run SDA with the new
interface available in version 3.x.)

[FOOTER]

Hint: Open the extra window for viewing the online codebook.

[DATASETS]

DATASET = nes92c
DATALABEL = NES 1952-1992 Cumulative Datafile
SDADATA = /bravo3/NES/nes52-92.cum/
CODEBOOK = http://socrates.berkeley.edu/sdadocs/NES92C/n92c.htm
WEIGHT = sampwt(Sampling weight) finalwt(Final weight) ##none(No weight)

VARIABLES = sex (Sex of the person)
VARIABLES = age (Age of the person)
VARIABLES = educ (Education of the person)


EXAMPLE 9: DIFFERENT OPTIONS FOR DIFFERENT DATASETS


[GENERAL]

LOGFILE = /bravo3/sda/sdalog
HELPDOCS = http://socrates.berkeley.edu/HELPDOCS

# URL for search servlet
SEARCHURL = http://socrates.berkeley.edu/sdasearch

# Directory with Javascript code for variable selection tree
#  (if not located in a subdirectory of HELPDOCS)
JSCRIPTURL = http://sda.berkeley.edu/jscript

# Chart servlet container and temp directory
CHARTGENURL = http://socrates.berkeley.edu/chartgen
CHARTSTMPDIR = /bravo3/7502docs/TMPDIR

# Note that the [HEADER] and [FOOTER] fields do not apply
# when using the new SDA version 3.x interface.


SUBTMPDIR = /bravo3/docs/TMPDIR
SUBTMPURL = http://socrates.berkeley.edu/TMPDIR

[PROGRAMS]

SDAPATH = /bravo3/docs/sda
SDAPROGS = tables means correl corrtab regress logit listcase
SDAPROGS = recode compute listvars(delete)

[DATASETS]

# For this dataset, enable browsing of the codebook, online analysis,
#   and creation of new variables.
# No subsetting or downloading is allowed.

DATASET = gss04
DATALABEL = GSS 1972-2004 Cumulative Datafile
CODEBOOK = http://socrates.berkeley.edu/GSS/HTMLBOOK/gss.htm
SDADATA = /bravo3/docs/GSS

OUTSTUDY = /bravo3/docs/GSS/newvars
SDADATA = /bravo3/docs/GSS/newvars
*

# For this dataset, allow browsing of the codebook,
#   online analysis and downloading.
#   No subsetting is allowed, because the
#   'SUBSET=NO' keyword is included.

DATASET = multi
DATALABEL = 1994 Multi-Investigator Study
CODEBOOK = http://socrates.berkeley.edu/Multi/Doc/mult.htm
SDADATA = /bravo3/docs/GSS
SUBSET = NO

DLHEADING = ALL OF THE FOLLOWING FILES ARE PLAIN ASCII FILES
DLHEADING = Data file (616 K)
DLFILE= http://socrates.berkeley.edu/Multi/DL/multidat.txt
DLHEADING = SAS definition file
DLFILE= http://socrates.berkeley.edu/Multi/DL/multisas.txt
DLHEADING = SPSS definition file
DLFILE= http://socrates.berkeley.edu/Multi/DL/multisps.txt
DLHEADING = DDL definition file
DLFILE= http://socrates.berkeley.edu/Multi/DL/multiddl.txt

*

# For this dataset, enable all options:
#  codebook, online analysis with complex standard errors,
#  creation of new variables, pre-defined weights and filters,
#  customized subsetting, and downloading of pre-existing files.

DATASET = natlrace
DATALABEL = 1991 Race and Politics Survey

CODEBOOK = http://socrates.berkeley.edu/Natlrace/Doc/race.htm

SDADATA = /bravo3/docs/Natlrace
DESIGN = stratum(stratvar) cluster(psunum)

OUTSTUDY = /bravo3/docs/Natlrace/newvars
SDADATA = /bravo3/docs/Natlrace/newvars

WEIGHT = sampwt(Sampling wt)
WEIGHT = pswt(Post-stratification wt)
WEIGHT = ##none

FILTER = Limit analysis to certain regions
GROUP = ##none: (Include whole country)
GROUP = region(1-4) : Eastern U.S.
GROUP = region(5-9) : Western U.S.

SUBGRPINFO= /bravo3/docs/Natlrace/Doc/racesub.txt

DLHEADING = ALL OF THE FOLLOWING FILES ARE PLAIN ASCII FILES
DLHEADING = Data file (936 K)
DLFILE= http://socrates.berkeley.edu/Natlrace/DL/racedat.txt
DLHEADING = SAS definition file
DLFILE= http://socrates.berkeley.edu/Natlrace/DL/racesas.txt
DLHEADING = SPSS definition file
DLFILE= http://socrates.berkeley.edu/Natlrace/DL/racespss.txt
DLHEADING = DDL definition file
DLFILE= http://socrates.berkeley.edu/Natlrace/DL/raceddl.txt

SEE ALSO

DDL        Data Description Language
hsda       Create HTML links to SDA Programs
language   Creating a Non-English User Interface
sdalog     Generate a Report of SDA Usage


CSM, UC Berkeley
August 23, 2011