SAS Data Sets


* A SAS data set is a SAS file that holds data
* Data must be in the form of a SAS data set before SAS can process it
* Many data processing tasks access data in the form of a SAS data set and then analyze, manage, or present that data
* A SAS data set can also point to one or more indexes, which enable SAS to locate records in the data set more efficiently
Rules for SAS Data Set Names:
SAS data set names:
* Can be 1 to 32 characters long
* Must begin with a letter (A–Z, either uppercase or lowercase) or an underscore (_)
* Can continue with any combination of numbers, letters, or underscores
These are examples of valid data set names:
* Payroll
* LABDATA1995_1997
* _EstimatedTaxPayments3

A SAS data set consists of two parts:
* Descriptor portion
* Data portion


Descriptor Portion:
The descriptor portion of a SAS data set contains information about the data set, including:
* The name of the data set
* The date and time that the data set was created
* The number of observations
* The number of variables

Example: Descriptor portion of the data set Clinic.Insure

Data Set Name: CLINIC.INSURE
Member Type: DATA
Engine: V8
Created: 10:05 Tuesday, March 30, 1999
Observations: 21
Variables: 7
Indexes: 0
Observation Length: 64
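
One common way to display this information is PROC CONTENTS, which prints the descriptor portion of a data set. The sketch below assumes that the Clinic library is stored at a hypothetical path; replace it with the actual location before running the step.

```sas
/* Assumption: hypothetical path. Point the libref at the folder */
/* that actually holds the Clinic library.                       */
libname clinic 'C:\sasdata\clinic';

/* Print the descriptor portion of Clinic.Insure. */
proc contents data=clinic.insure;
run;
```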


Data Portion:
* The data portion is a collection of data values that are arranged in a rectangular table
Example:
In the table below, Jones is a data value, the weight 158.3 is a data value, and so on


Name     Sex  Age  Weight
Jones    M    48   128.6
Leverne  M    58   158.3
Jaffe    F    .    115.5
Wilson   M    28   170.1
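
As a minimal sketch, a table like this could be built with a DATA step that reads the four rows using list input. The data set name work.patients is an assumption chosen for illustration; it does not appear in the original material.

```sas
/* Build a small data set from the rows shown in the table. */
/* work.patients is a hypothetical name.                    */
data work.patients;
   /* $ marks Name and Sex as character; Age and Weight are numeric. */
   input Name $ Sex $ Age Weight;
   datalines;
Jones M 48 128.6
Leverne M 58 158.3
Jaffe F . 115.5
Wilson M 28 170.1
;
run;

/* List the data portion. */
proc print data=work.patients;
run;
```

The period in the third data line is read as a missing numeric value for Age.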


Observations:
* Rows are called observations in SAS
* An observation is a collection of data values that usually relate to a single object
* The values Jones, M, 48, and 128.6 constitute a single observation in the data set shown below


Name     Sex  Age  Weight
Jones    M    48   128.6
Leverne  M    58   158.3
Jaffe    F    .    115.5
Wilson   M    28   170.1



Variables:
* Columns are called variables in SAS
* A variable is a collection of values that describe a particular characteristic
* The values Jones, Leverne, Jaffe, and Wilson constitute the variable Name in the data set shown below


Name     Sex  Age  Weight
Jones    M    48   128.6
Leverne  M    58   158.3
Jaffe    F    .    115.5
Wilson   M    28   170.1


Missing Values:
If a data value is unknown for a particular observation, a missing value is recorded:
* "." (a period) indicates a missing value of a numeric variable
* " " (a blank) indicates a missing value of a character variable
In the data set below, the Age value for Jaffe is missing.




Name     Sex  Age  Weight
Jones    M    48   128.6
Leverne  M    58   158.3
Jaffe    F    .    115.5
Wilson   M    28   170.1
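
A short sketch of how the two kinds of missing values can be written and tested in a DATA step. The variable names SexIsMissing and AgeIsMissing are made up for this illustration.

```sas
data _null_;
   Sex = ' ';     /* missing character value: a blank */
   Age = .;       /* missing numeric value: a period  */

   /* MISSING returns 1 when its argument is missing, otherwise 0. */
   /* It accepts both character and numeric arguments.             */
   SexIsMissing = missing(Sex);
   AgeIsMissing = missing(Age);

   /* Write both flags to the SAS log. */
   put SexIsMissing= AgeIsMissing=;
run;
```

Both flags are written to the log as 1.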

Variable Attributes:
* In addition to general information about the data set, the descriptor portion contains information about the attributes of each variable in the data set
* The attribute information includes the variable's:
   * Name
   * Type
   * Length
   * Format
   * Informat
   * Label

Example: Listing of the attribute information in the descriptor portion of the SAS data set Clinic.Insure

Variable  Type  Length  Format     Informat  Label
Policy    Num   8                            Policy Number
Total     Num   8       DOLLAR8.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name

Name:
* Each variable has a name that conforms to SAS naming conventions
* Variable names follow the same rules as SAS data set names. Like data set names, variable names:
   * Can be 1 to 32 characters long
   * Must begin with a letter (A–Z, either uppercase or lowercase) or an underscore (_)
   * Can continue with any combination of numbers, letters, or underscores

Variable  Type  Length  Format     Informat  Label
Policy    Num   8                            Policy Number
Total     Num   8       DOLLAR8.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name

Type:
* A variable's type is either character or numeric
* Character variables, such as Name (shown below), can contain any values
* Numeric variables, such as Policy and Total (shown below), can contain only numeric values (the digits 0 through 9, +, -, ., and E for scientific notation)

Variable  Type  Length  Format     Informat  Label
Policy    Num   8                            Policy Number
Total     Num   8       DOLLAR8.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name



Length:
* A variable's length (the number of bytes used to store it) is related to its type
* Character variables can be up to 32,767 bytes long
* In the example below, Name has a length of 20 characters and uses 20 bytes of storage
* All numeric variables have a default length of 8. Numeric values (no matter how many digits they contain) are stored as floating-point numbers in 8 bytes of storage, unless you specify a different length

Variable  Type  Length  Format     Informat  Label
Policy    Num   8                            Policy Number
Total     Num   8       DOLLAR8.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name
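
Lengths like those in the listing can be set explicitly with a LENGTH statement. The following is a sketch only: the data set name work.length_demo and the assigned values are assumptions made for illustration.

```sas
/* work.length_demo is a hypothetical data set name. */
data work.length_demo;
   /* Declare lengths before the variables are first used.  */
   /* $ 20 gives Name 20 bytes; Policy keeps the default 8. */
   length Name $ 20 Policy 8;
   Name   = 'Jones';
   Policy = 1001;      /* hypothetical policy number */
run;

/* The Len column of the PROC CONTENTS listing shows 20 and 8. */
proc contents data=work.length_demo;
run;
```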






Format:
* A format is an instruction that SAS uses to write data values
* Formats control the written appearance of data values or, in some cases, group data values together for analysis
* SAS software offers a variety of character, numeric, and date and time formats
* You can also create and store your own formats
* You can permanently assign a format to a variable in a SAS data set, or temporarily specify a format in a PROC step to determine how the data values appear in the output

Variable  Type  Length  Format     Informat  Label
Policy    Num   8                            Policy Number
Total     Num   8       DOLLAR8.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name
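
The difference between a permanent and a temporary format assignment can be sketched as follows. The data set name work.balances and the data values are hypothetical; DOLLAR8.2 is taken from the listing above, and COMMA10.2 is simply another built-in format chosen for contrast.

```sas
/* work.balances and its values are made up for illustration. */
data work.balances;
   input Policy Total;
   /* Permanent: a FORMAT statement in a DATA step stores the */
   /* format in the descriptor portion of the data set.       */
   format Total dollar8.2;
   datalines;
1001 1250.5
1002 98.75
;
run;

/* Uses the stored DOLLAR8.2 format. */
proc print data=work.balances;
run;

/* Temporary: this FORMAT statement applies to this PROC step only. */
proc print data=work.balances;
   format Total comma10.2;
run;
```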

Informat:
* An informat is an instruction that SAS uses to read data values in certain forms into standard SAS values
* It determines how data values are read into a SAS data set
* Informats are typically used to read numeric values that contain letters or other special characters

Variable  Type  Length  Format     Informat  Label
Policy    Num   8                            Policy Number
Total     Num   8       DOLLAR8.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name
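
As an illustration, the COMMA10. informat from the listing can read values that contain dollar signs and commas. This is a sketch under assumptions: the data set name work.informat_demo and the two input values are invented for the example.

```sas
/* work.informat_demo and the raw values are hypothetical. */
data work.informat_demo;
   /* The colon modifier reads up to the next blank and applies */
   /* the COMMA10. informat, which strips $ and commas.         */
   input Total :comma10.;
   datalines;
$12,345.00
$1,250.50
;
run;

proc print data=work.informat_demo;
run;
```

The values are stored as the plain numbers 12345 and 1250.5; how they are displayed afterwards depends on the format, not the informat.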


Label:
* A variable can have a label consisting of descriptive text up to 256 characters long
* By default, many reports identify variables by their names
* To display more descriptive information about a variable, assign a label to it
Example:
Label Policy as Policy Number, Total as Total Balance, and Name as Patient Name to display these labels in reports


Variable  Type  Length  Format     Informat  Label
Policy    Num   8                            Policy Number
Total     Num   8       DOLLAR8.2  COMMA10.  Total Balance
Name      Char  20                           Patient Name
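
The labels in this example could be assigned with a LABEL statement, as in the sketch below. The data set name work.insure_labeled and the data values are assumptions for illustration; only the variable names and label text come from the listing.

```sas
/* work.insure_labeled and its values are hypothetical. */
data work.insure_labeled;
   length Name $ 20;
   Policy = 1001;
   Total  = 1250.5;
   Name   = 'Jones';
   /* Store a descriptive label for each variable in the descriptor portion. */
   label Policy = 'Policy Number'
         Total  = 'Total Balance'
         Name   = 'Patient Name';
run;

/* PROC PRINT shows labels as column headings only when LABEL is specified. */
proc print data=work.insure_labeled label;
run;
```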