SAS Data Sets
A SAS data set is a SAS file that holds data
Data must be in the form of a SAS data set before SAS can process it
Many data processing tasks access data in the form of a SAS data set and then analyze,
manage, or present the data
A SAS data set can also point to one or more indexes, which enable SAS to locate records in the
data set more efficiently
Rules for SAS Data Set Names:
SAS data set names:
can be 1 to 32 characters long
must begin with a letter (A–Z, either uppercase or lowercase) or an underscore (_)
can continue with any combination of numbers, letters, or underscores.
These are examples of valid data set names:
Payroll
LABDATA1995_1997
_EstimatedTaxPayments3
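As a minimal sketch, the valid names above can be used directly in DATA statements. The variable Amount and its value are placeholders introduced only for illustration.

```sas
/* Each DATA statement uses one of the valid names listed above.   */
/* Amount is a placeholder variable, not part of any real data set. */
data Payroll;
   Amount = 100;
run;

data LABDATA1995_1997;
   Amount = 100;
run;

data _EstimatedTaxPayments3;
   Amount = 100;
run;
```

A name such as 1995LabData would break the second rule, because it begins with a digit.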
A SAS data set has two parts:
Descriptor portion
Data portion
Descriptor Portion:
The descriptor portion of a SAS data set contains information about the data set, including:
The name of the data set
The date and time that the data set was created
The number of observations
The number of variables.
Example: Descriptor portion of the data set Clinic.Insure
Data Set Name: CLINIC.INSURE
Member Type: DATA
Engine: V8
Created: 10:05 Tuesday, March 30, 1999
Observations: 21
Variables: 7
Indexes: 0
Observation Length: 64
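A listing like the one above is typically produced with PROC CONTENTS. The sketch below assumes that the data set Clinic.Insure exists; the path in the LIBNAME statement is a placeholder that must be replaced with the real location.

```sas
/* Placeholder path: point the Clinic libref at the folder that holds Insure. */
libname Clinic 'your-data-folder';

/* Print the descriptor portion of Clinic.Insure. */
proc contents data=Clinic.Insure;
run;
```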
Data Portion:
A collection of data values arranged in a rectangular table
Example:
In the table below, Jones is a data value, the weight 158.3 is a data value, and so on
Name | Sex | Age | Weight |
Jones | M | 48 | 128.6 |
Leverne | M | 58 | 158.3 |
Jaffe | F | . | 115.5 |
Wilson | M | 28 | 170.1 |
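One way to build this table is a DATA step with in-stream data, sketched below. The data set name work.patients is a hypothetical name chosen for the example.

```sas
/* Build the four-observation table shown above in the WORK library. */
data work.patients;
   /* $ marks Name and Sex as character; Age and Weight are numeric. */
   input Name $ Sex $ Age Weight;
   datalines;
Jones M 48 128.6
Leverne M 58 158.3
Jaffe F . 115.5
Wilson M 28 170.1
;
run;

/* List the data portion. */
proc print data=work.patients;
run;
```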
Observations:
Rows are called observations in SAS
An observation is a collection of data values that usually relate to a single object
The values Jones, M, 48, and 128.6 constitute a single observation in the data set shown
below
Name | Sex | Age | Weight |
Jones | M | 48 | 128.6 |
Leverne | M | 58 | 158.3 |
Jaffe | F | . | 115.5 |
Wilson | M | 28 | 170.1 |
Variables:
Columns are called variables in SAS
A variable is a collection of values that describe a particular characteristic
The values Jones, Leverne, Jaffe, and Wilson constitute the variable Name in the data set
shown below
Name | Sex | Age | Weight |
Jones | M | 48 | 128.6 |
Leverne | M | 58 | 158.3 |
Jaffe | F | . | 115.5 |
Wilson | M | 28 | 170.1 |
Missing Values:
If a data value is unknown for a particular observation, a missing value is recorded
A period (.) indicates a missing value for a numeric variable
A blank indicates a missing value for a character variable
In the data set shown below, the Age value for Jaffe is missing
Name | Sex | Age | Weight |
Jones | M | 48 | 128.6 |
Leverne | M | 58 | 158.3 |
Jaffe | F | . | 115.5 |
Wilson | M | 28 | 170.1 |
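The following sketch shows one way to detect missing numeric values. The data set name work.missing_demo and the variable AgeMissing are hypothetical names used only here.

```sas
data work.missing_demo;
   input Name $ Sex $ Age Weight;
   /* MISSING() returns 1 when the argument is missing, 0 otherwise. */
   AgeMissing = missing(Age);
   datalines;
Jaffe F . 115.5
Wilson M 28 170.1
;
run;

/* Keep only the observations whose Age is missing. */
proc print data=work.missing_demo;
   where Age is missing;
run;
```

The condition Age = . in the WHERE statement would select the same observations.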
Variable Attributes:
In addition to general information about the data set, the descriptor portion contains information
about the attributes of each variable in the data set
The attribute information includes the variable's:
Name
Type
Length
Format
Informat
Label
Example: Listing of the attribute information in the descriptor portion of the SAS data set
Clinic.Insure
Variable | Type | Length | Format | Informat | Label |
Policy | Num | 8 | | | Policy Number |
Total | Num | 8 | DOLLAR8.2 | COMMA10. | Total Balance |
Name | Char | 20 | | | Patient Name |
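As a hedged illustration, the attributes in this listing could be declared with an ATTRIB statement. The data set work.insure and the sample values are invented for the example and are not the contents of Clinic.Insure.

```sas
data work.insure;
   /* ATTRIB sets several attributes for each variable in one statement. */
   attrib Policy length=8   label='Policy Number'
          Total  length=8   format=dollar8.2 informat=comma10.
                            label='Total Balance'
          Name   length=$20 label='Patient Name';
   /* Sample values are invented for illustration. */
   Policy = 1001;
   Total  = 1234.5;
   Name   = 'Sample Patient';
run;

/* The variable listing in the output shows the attributes set above. */
proc contents data=work.insure;
run;
```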
Name:
Each variable has a name that conforms to SAS naming conventions
Variable names follow exactly the same rules as SAS data set names
Like data set names, variable names:
Can be 1 to 32 characters long
Must begin with a letter (A–Z, either uppercase or lowercase) or an underscore (_)
Can continue with any combination of numbers, letters, or underscores.
Variable | Type | Length | Format | Informat | Label |
Policy | Num | 8 | | | Policy Number |
Total | Num | 8 | DOLLAR8.2 | COMMA10. | Total Balance |
Name | Char | 20 | | | Patient Name |
Type:
A variable's type is either character or numeric
Character variables, such as Name (shown below), can contain any values
Numeric variables, such as Policy and Total (shown below), can contain only numeric values
(the digits 0 through 9, +, -, ., and E for scientific notation)
Variable | Type | Length | Format | Informat | Label |
Policy | Num | 8 | | | Policy Number |
Total | Num | 8 | DOLLAR8.2 | COMMA10. | Total Balance |
Name | Char | 20 | | | Patient Name |
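A small sketch of how the two types arise in a DATA step is given below. The values, and the helper variables PolicyType and NameType, are hypothetical.

```sas
data _null_;
   Policy = 1001;       /* numeric: assigned a number          */
   Name   = 'Sample';   /* character: assigned a quoted string */
   /* VTYPE returns N for a numeric variable and C for a character variable. */
   PolicyType = vtype(Policy);
   NameType   = vtype(Name);
   /* Write both results to the SAS log. */
   put PolicyType= NameType=;
run;
```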
Length:
A variable's length (the number of bytes used to store it) is related to its type
Character variables can be up to 32,767 bytes long
In the example below, Name has a length of 20 characters and uses 20 bytes of storage.
All numeric variables have a default length of 8
Numeric values (no matter how many digits they contain) are stored as floating-point numbers in
8 bytes of storage, unless you specify a different length.
Variable | Type | Length | Format | Informat | Label |
Policy | Num | 8 | | | Policy Number |
Total | Num | 8 | DOLLAR8.2 | COMMA10. | Total Balance |
Name | Char | 20 | | | Patient Name |
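Lengths can be set explicitly with a LENGTH statement, as in the sketch below. The data set name work.length_demo, the sample values, and the variables NameLen and PolicyLen are assumptions made for the example.

```sas
data work.length_demo;
   /* A character length follows $; a numeric length is given without it. */
   length Name $ 20 Policy 8;
   Name   = 'Sample Patient';
   Policy = 1001;
   /* VLENGTH returns the declared length of a variable in bytes. */
   NameLen   = vlength(Name);
   PolicyLen = vlength(Policy);
   /* Write both lengths to the SAS log. */
   put NameLen= PolicyLen=;
run;
```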
Format:
* A format is an instruction that SAS uses to write data values
* A format controls the written appearance of data values or, in some cases, groups data values together for analysis
* SAS software offers a variety of character, numeric, and date and time formats
* You can also create and store your own formats
* You can permanently assign a format to a variable in a SAS data set, or temporarily specify a format in a PROC step to determine how the data values appear in the output
Variable | Type | Length | Format | Informat | Label |
Policy | Num | 8 | | | Policy Number |
Total | Num | 8 | DOLLAR8.2 | COMMA10. | Total Balance |
Name | Char | 20 | | | Patient Name |
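The difference between a permanent and a temporary format assignment can be sketched as follows. The data set work.format_demo and the value of Total are invented for illustration.

```sas
data work.format_demo;
   Total = 1234.5;   /* invented value */
   /* Permanent assignment: stored in the descriptor portion of the data set. */
   format Total dollar8.2;
run;

/* Temporary assignment: overrides the stored format for this PROC step only. */
proc print data=work.format_demo;
   format Total comma10.2;
run;
```

A later PROC PRINT without a FORMAT statement would use the stored DOLLAR8.2 format again.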
Informat:
* An informat is an instruction that SAS uses to read data values in certain forms into standard SAS values
* It determines how data values are read into a SAS data set
* Informats are used, for example, to read numeric values that contain letters or other special characters
Variable | Type | Length | Format | Informat | Label |
Policy | Num | 8 | | | Policy Number |
Total | Num | 8 | DOLLAR8.2 | COMMA10. | Total Balance |
Name | Char | 20 | | | Patient Name |
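As a minimal sketch, the COMMA10. informat from the listing can read a value that contains a dollar sign and a comma. The data set work.informat_demo and the raw value are made up for the example.

```sas
data work.informat_demo;
   /* The colon modifier applies the informat while still using list input. */
   /* COMMA10. strips the dollar sign and the comma before storing the value. */
   input Total : comma10.;
   datalines;
$1,234.50
;
run;

/* Total is stored as an ordinary numeric value. */
proc print data=work.informat_demo;
run;
```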
Label:
* A variable can have a label consisting of descriptive text up to 256 characters long
* By default, many reports identify variables by their names
* To display more descriptive information about a variable, assign a label to that variable
Example:
Label Policy as Policy Number, Total as Total Balance, and Name as Patient Name to
display these labels in reports
Variable | Type | Length | Format | Informat | Label |
Policy | Num | 8 | | | Policy Number |
Total | Num | 8 | DOLLAR8.2 | COMMA10. | Total Balance |
Name | Char | 20 | | | Patient Name |
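The labels in this example could be assigned with a LABEL statement, as sketched below. The data set work.label_demo and the sample values are hypothetical.

```sas
data work.label_demo;
   length Name $ 20;
   /* Sample values are invented for illustration. */
   Policy = 1001;
   Total  = 1234.5;
   Name   = 'Sample Patient';
   /* Store the labels in the descriptor portion. */
   label Policy = 'Policy Number'
         Total  = 'Total Balance'
         Name   = 'Patient Name';
run;

/* The LABEL option makes PROC PRINT use labels as column headings. */
proc print data=work.label_demo label;
run;
```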