If..then ..Else and Where Statement:

IF—THEN ELSE

Suppose you are asked to exclude some of the observations in a SAS data set from an analysis that you are generating. For example, you want to exclude all IDs whose values are greater than 100.To accomplish this task, we can use IF, IF-THEN DELETE.

Comparison Operators.

SymbolicMnemonicMeaningExample
=EQequalsIF gender = ‘M’; or

IF gender EQ ‘M’;
^= or ~=NEnot equalIF salary NE . ;
GTgreater thanIF salary GT 4500;
LTless thanIF salary LT 4500;
>=GEgreater than or equalIF salary GE 4500;
<=LEless than or equalIF salary LE 4500;
inINselecting multiple valuesIF country IN(‘US’ ’IN’);

1. IF statement
IF (condition is true) => It means subsetting a dataset.

Data readin;

Input ID Q1-Q3;

cards;

85 1 2 3

90 3 4 6

95 5 5 6

100 6 6 4

105 5 5 6

110 6 6 5

;

Data readin1;

Set readin;

IF ID LE 100;

run;

IF ID LE 100 => This would tell SAS to retain only those IDs whose values are less than or equal to 100. In other words, you are removing IDs whose values are greater than or equal to 100.

This can also be done using the IF-THEN DELETE statement.

2. IF-THEN DELETE

IF (condition is true) THEN (delete the selected observations);

Data readin;

Input ID Q1-Q3;

cards;

85 1 2 3

90 3 4 6

95 5 5 6

100 6 6 4

105 5 5 6

110 6 6 5

;

Data readin1;

Set readin;

IF ID GT 100 THEN DELETE;

run;

IF ID GT 100 THEN DELETE => This would tell SAS to remove all the IDs whose values are greater than 100.

Task 2: Suppose you want to set a tag on all the IDs. The condition is :

If value of ID is less than or equal to 100 set “Old” tag otherwise set “New” tag.

IF (condition is true) THEN (perform this action);
ELSE (perform the action that is set when condition is false);


Data readin;

Input ID Q1-Q3;

cards;

85 1 2 3

90 3 4 6

95 5 5 6

100 6 6 4

105 5 5 6

110 6 6 5

;

Data readin1;

Set readin;

IF ID LE 100 THEN TAG =”Old”;

ELSE TAG =”New”;

run;

OUTPUT:

Task 3: Suppose you are asked to update the TAG column.

The conditions for tagging are as follows :

  • If value of ID is less than 75 then TAG = “Old”
  • If value of ID is greater than or equal to 75 and less than 100 then TAG = “New”
  • If value of ID is greater than or equal to 100 then TAG = “Unchecked”

IF (condition is true) THEN (perform this action);

ELSE IF (perform the action when second condition is true);
ELSE IF (perform the action when third condition is true);

Data readin1;
Set readin;
length TAG $20;
IF ID < 75 THEN TAG ="Old";
ELSE IF 75 <= ID < 100 THEN TAG = "New"; 
ELSE IF ID >= 100 THEN TAG ="Unchecked";
run; 

The output is shown below :

LOGICAL OPERATORS


SymbolicMnemonicMeaningExample
&ANDBoth conditions trueIF gender =’M’ and age =1;
|OREither condition trueIF gender =’M’ or age =1;
~ or ^NOTReverse the statementIF country not IN(‘US’,’IN’);

Task 4: Suppose you want to generate an analysis for Q1 including only responses that are valid (non-missing) and less than 3.

Data readin;
Input ID Q1-Q3;
cards;
85 1 2 3
90 . 4 6
95 2 5 6
100 6 6 4
105 . 5 6
110 6 6 5
;
Data readin1;
Set readin;
IF (Q1 Le 3) AND (Q1 NE .);
run; 

IF (Q1 LE 3) AND (Q1 NE .) => Since missing values are smaller than any other value, we need to give SAS an additional command to separate out missing values. 

The output is shown below:

Selecting Multiple Observations :

Suppose you want to set tag “Incorrect” to the specified IDs 1,5,45,76

For this case, the logical statement would look like any one of the following statements. It can be written in three ways shown below.

IN Operator

IN operator is used to select multiple values of a variable. It is an awesome alternative to OR operator.

Where Statement and Dataset Options

The WHERE statement is an alternative to the IF statement for subsetting (filtering) a data set.

Syntax of WHERE Statement :

WHERE condition ;

For example : where Age > 25; means selecting rows wherein the value of “age” variable is greater than 25.

Comparison Operators :

SymbolicMnemonicMeaningExample
=EQequalsWHERE gender = ‘M’; or WHERE gender EQ ‘M’;
^= or ~=NEnot equalWHERE salary NE . ;
GTgreater thanWHERE salary GT 4500;
LTless thanWHERE salary LT 4500;
>=GEgreater than or equalWHERE salary GE 4500;
<=LEless than or equalWHERE salary LE 4500;
inINselecting multiple valuesWHERE country IN(‘US’ ’IN’);

Example 1 : Suppose you want to select only section A students. You know the variable Section contains information for students’ sections.

data readin;

input name $ Section $ Score;

cards;

Tom  A 84

Raj  A 80

Ram  B 71

Atul A 77

Priya B 45

Sandy A 67

Sam  A 57

David B 39

Wolf B 34

Rahul A 95

Sahul C 84

Lahul C 44

;

run;

data readin1;

set readin;

where Section EQ “A”;

run;

The output is shown below 

Logical Operators :

SymbolicMnemonicMeaningExample
&ANDBoth conditions trueWHERE gender =’M’ and age =1;
|OREither condition trueWHERE gender =’M’ or age =1;
~ or ^NOTReverse the statementWHERE country not IN(‘US’,’IN’);

Example 2 : Suppose you want to select section A and B students. You know the variable Section contains information for students’ sections.

data readin1;

set readin;

where Section IN (“A” “B”);

run;

where section IN (“A” “B”) => This would tell SAS to select section A and B values.

However, you can also write …

BETWEEN-AND Operator : Between Two Numbers

Example 3 : Suppose you want to select scores whose values are greater than or equal to 50 and less than or equal to 75.

data readin1;

set readin;

where Score between 50 and 75;

run;

IS MISSING Operator : Selecting Missing Values

Example 4 : Suppose you want to select only those observations in which students did not fill their section information.

The dataset is modified to include missing values in SECTION variable.

Where section is missing => This would tell SAS to select missing values for variable SECTION.

IS NOT MISSING Operator : Selecting Non-Missing Values

Example 5 : Suppose you want to select only those observations in which students filled their section information.

data readin;

input name $ Section $ Score;

cards;

Tom  A 84

Raj  A 80

Ram  B 71

Atul . 77

Priya . 45

Sandy A 67

Sam  A 57

David B 39

Wolf B 34

Rahul . 95

Sahul C 84

Lahul C 44

;

run;

data readin1;

set readin;

where section is not missing;

run;

data readin1;

set readin;

where section is not missing;

run;

Where section is not missing => This would tell SAS to select non-missing values.

The NOT operator can be used within WHERE statement in many ways :

1. where section is missing and score is not missing;

2. where not (score in (34,44,84));

3. where not (Score between 50 and 75);

4. where NOT(Section EQ “A”);

CONTAINS Operator : Searching specific character

Example 6 : Suppose you want to select only those observations in which students’ name contain ‘hul’.

data readin;

input name $ Section $ Score;

cards;

Tom  A 84

Raj  A 80

Ram  B 71

Atul . 77

Priya . 45

Sandy A 67

Sam  A 57

David B 39

Wolf B 34

Rahul . 95

Sahul C 84

Lahul C 44

;

run;

data readin1;

set readin;

where name contains ‘hul’;

run;

where name contains 'hul'; => This would tell SAS to select observations having the values Rahul, Sahul and Lahul for the variable NAME.

Note : The CONTAINS operator is case sensitive.

The output is shown below :

Since the CONTAINS operator is case sensitive, where Name contains ‘HUL’ would not select any observation. The log for this statement is shown below :

LIKE Operator : Pattern Matching

The LIKE operator selects observations by comparing the values of a character variable to a specified pattern. It is case sensitive.

Example 7 : To select all students with a name that starts with the letter S.

There are two special characters available for specifying a pattern:

1. percent sign (%) – Wildcard Character

2. underscore ( _ ) – Fill in the blanks

data readin;

input name $ Section $ Score;

cards;

Tom  A 84

Raj  A 80

Ram  B 71

Atul . 77

Priya . 45

Sandy A 67

Sam  A 57

David B 39

Wolf B 34

Rahul . 95

Sahul C 84

Lahul C 44

Pahulk A 81

;

run;

data readin1;

set readin;

where name like ‘S%’;

run;

where name like ‘S%’;

OR

where name like ‘Sa%’;

In this dataset, the above statements would produce the same result :

Examples :

1. where name like ‘_am’;

You can also write this statement like : where name like ‘%am’;

2. where name like ‘_ahu_’;

This would not select PAHULK from the variable NAME.

3. where name like ‘_ahu__’;

This would select PAHULK as double underscore (__) is stated.