IF—THEN ELSE
Suppose you are asked to exclude some of the observations in a SAS data set from an analysis that you are generating. For example, you want to exclude all IDs whose values are greater than 100.To accomplish this task, we can use IF, IF-THEN DELETE.
Comparison Operators.
Symbolic | Mnemonic | Meaning | Example |
= | EQ | equals | IF gender = ‘M’; or IF gender EQ ‘M’; |
^= or ~= | NE | not equal | IF salary NE . ; |
> | GT | greater than | IF salary GT 4500; |
< | LT | less than | IF salary LT 4500; |
>= | GE | greater than or equal | IF salary GE 4500; |
<= | LE | less than or equal | IF salary LE 4500; |
in | IN | selecting multiple values | IF country IN(‘US’ ’IN’); |
1. IF statement
IF (condition is true) => It means subsetting a dataset.
Data readin;
Input ID Q1-Q3;
cards;
85 1 2 3
90 3 4 6
95 5 5 6
100 6 6 4
105 5 5 6
110 6 6 5
;
Data readin1;
Set readin;
IF ID LE 100;
run;
IF ID LE 100 => This would tell SAS to retain only those IDs whose values are less than or equal to 100. In other words, you are removing IDs whose values are greater than or equal to 100.
This can also be done using the IF-THEN DELETE statement.
2. IF-THEN DELETE
IF (condition is true) THEN (delete the selected observations);
Data readin;
Input ID Q1-Q3;
cards;
85 1 2 3
90 3 4 6
95 5 5 6
100 6 6 4
105 5 5 6
110 6 6 5
;
Data readin1;
Set readin;
IF ID GT 100 THEN DELETE;
run;
IF ID GT 100 THEN DELETE => This would tell SAS to remove all the IDs whose values are greater than 100.
Task 2: Suppose you want to set a tag on all the IDs. The condition is :
If value of ID is less than or equal to 100 set “Old” tag otherwise set “New” tag.
IF (condition is true) THEN (perform this action);
ELSE (perform the action that is set when condition is false);
Data readin;
Input ID Q1-Q3;
cards;
85 1 2 3
90 3 4 6
95 5 5 6
100 6 6 4
105 5 5 6
110 6 6 5
;
Data readin1;
Set readin;
IF ID LE 100 THEN TAG =”Old”;
ELSE TAG =”New”;
run;
OUTPUT:
Task 3: Suppose you are asked to update the TAG column.
The conditions for tagging are as follows :
- If value of ID is less than 75 then TAG = “Old”
- If value of ID is greater than or equal to 75 and less than 100 then TAG = “New”
- If value of ID is greater than or equal to 100 then TAG = “Unchecked”
IF (condition is true) THEN (perform this action);
ELSE IF (perform the action when second condition is true);
ELSE IF (perform the action when third condition is true);
Data readin1;
Set readin;
length TAG $20;
IF ID < 75 THEN TAG ="Old";
ELSE IF 75 <= ID < 100 THEN TAG = "New";
ELSE IF ID >= 100 THEN TAG ="Unchecked";
run;
The output is shown below :
LOGICAL OPERATORS
Symbolic | Mnemonic | Meaning | Example |
& | AND | Both conditions true | IF gender =’M’ and age =1; |
| | OR | Either condition true | IF gender =’M’ or age =1; |
~ or ^ | NOT | Reverse the statement | IF country not IN(‘US’,’IN’); |
Task 4: Suppose you want to generate an analysis for Q1 including only responses that are valid (non-missing) and less than 3.
Data readin;
Input ID Q1-Q3;
cards;
85 1 2 3
90 . 4 6
95 2 5 6
100 6 6 4
105 . 5 6
110 6 6 5
;
Data readin1;
Set readin;
IF (Q1 Le 3) AND (Q1 NE .);
run;
IF (Q1 LE 3) AND (Q1 NE .) => Since missing values are smaller than any other value, we need to give SAS an additional command to separate out missing values.
The output is shown below:
Selecting Multiple Observations :
Suppose you want to set tag “Incorrect” to the specified IDs 1,5,45,76
For this case, the logical statement would look like any one of the following statements. It can be written in three ways shown below.
IN Operator
IN operator is used to select multiple values of a variable. It is an awesome alternative to OR operator.
Where Statement and Dataset Options
The WHERE statement is an alternative to the IF statement for subsetting (filtering) a data set.
Syntax of WHERE Statement :
WHERE condition ;
For example : where Age > 25;
means selecting rows wherein the value of “age” variable is greater than 25.
Comparison Operators :
Symbolic | Mnemonic | Meaning | Example |
= | EQ | equals | WHERE gender = ‘M’; or WHERE gender EQ ‘M’; |
^= or ~= | NE | not equal | WHERE salary NE . ; |
> | GT | greater than | WHERE salary GT 4500; |
< | LT | less than | WHERE salary LT 4500; |
>= | GE | greater than or equal | WHERE salary GE 4500; |
<= | LE | less than or equal | WHERE salary LE 4500; |
in | IN | selecting multiple values | WHERE country IN(‘US’ ’IN’); |
Example 1 : Suppose you want to select only section A students. You know the variable Section contains information for students’ sections.
data readin;
input name $ Section $ Score;
cards;
Tom A 84
Raj A 80
Ram B 71
Atul A 77
Priya B 45
Sandy A 67
Sam A 57
David B 39
Wolf B 34
Rahul A 95
Sahul C 84
Lahul C 44
;
run;
data readin1;
set readin;
where Section EQ “A”;
run;
The output is shown below
Logical Operators :
Symbolic | Mnemonic | Meaning | Example |
& | AND | Both conditions true | WHERE gender =’M’ and age =1; |
| | OR | Either condition true | WHERE gender =’M’ or age =1; |
~ or ^ | NOT | Reverse the statement | WHERE country not IN(‘US’,’IN’); |
Example 2 : Suppose you want to select section A and B students. You know the variable Section contains information for students’ sections.
data readin1;
set readin;
where Section IN (“A” “B”);
run;
where section IN (“A” “B”) => This would tell SAS to select section A and B values.
However, you can also write …
BETWEEN-AND Operator : Between Two Numbers
Example 3 : Suppose you want to select scores whose values are greater than or equal to 50 and less than or equal to 75.
data readin1;
set readin;
where Score between 50 and 75;
run;
IS MISSING Operator : Selecting Missing Values
Example 4 : Suppose you want to select only those observations in which students did not fill their section information.
The dataset is modified to include missing values in SECTION variable.
Where section is missing => This would tell SAS to select missing values for variable SECTION.
IS NOT MISSING Operator : Selecting Non-Missing Values
Example 5 : Suppose you want to select only those observations in which students filled their section information.
data readin;
input name $ Section $ Score;
cards;
Tom A 84
Raj A 80
Ram B 71
Atul . 77
Priya . 45
Sandy A 67
Sam A 57
David B 39
Wolf B 34
Rahul . 95
Sahul C 84
Lahul C 44
;
run;
data readin1;
set readin;
where section is not missing;
run;
data readin1;
set readin;
where section is not missing;
run;
Where section is not missing => This would tell SAS to select non-missing values.
The NOT operator can be used within WHERE statement in many ways :
1. where section is missing and score is not missing;
2. where not (score in (34,44,84));
3. where not (Score between 50 and 75);
4. where NOT(Section EQ “A”);
CONTAINS Operator : Searching specific character
Example 6 : Suppose you want to select only those observations in which students’ name contain ‘hul’.
data readin;
input name $ Section $ Score;
cards;
Tom A 84
Raj A 80
Ram B 71
Atul . 77
Priya . 45
Sandy A 67
Sam A 57
David B 39
Wolf B 34
Rahul . 95
Sahul C 84
Lahul C 44
;
run;
data readin1;
set readin;
where name contains ‘hul’;
run;
where name contains 'hul';
=> This would tell SAS to select observations having the values Rahul, Sahul and Lahul for the variable NAME.
Note : The CONTAINS operator is case sensitive.
The output is shown below :
Since the CONTAINS operator is case sensitive, where Name contains ‘HUL’ would not select any observation. The log for this statement is shown below :
LIKE Operator : Pattern Matching
The LIKE operator selects observations by comparing the values of a character variable to a specified pattern. It is case sensitive.
Example 7 : To select all students with a name that starts with the letter S.
There are two special characters available for specifying a pattern:
1. percent sign (%) – Wildcard Character
2. underscore ( _ ) – Fill in the blanks
data readin;
input name $ Section $ Score;
cards;
Tom A 84
Raj A 80
Ram B 71
Atul . 77
Priya . 45
Sandy A 67
Sam A 57
David B 39
Wolf B 34
Rahul . 95
Sahul C 84
Lahul C 44
Pahulk A 81
;
run;
data readin1;
set readin;
where name like ‘S%’;
run;
where name like ‘S%’;
OR
where name like ‘Sa%’;
In this dataset, the above statements would produce the same result :
Examples :
1. where name like ‘_am’;
You can also write this statement like : where name like ‘%am’;
2. where name like ‘_ahu_’;
This would not select PAHULK from the variable NAME.
3. where name like ‘_ahu__’;
This would select PAHULK as double underscore (__) is stated.