The syntax of PROC SQL is as follows:
PROC SQL;
SELECT column(s)
FROM table(s) | view(s)
WHERE expression
GROUP BY column(s)
HAVING expression
ORDER BY column(s);
QUIT;
The SQL statements must be specified in the following order:
- SELECT : Specify the columns (variables) to be selected.
- FROM : Specify the table (dataset) to be queried.
- WHERE : Filters the data based on a condition.
- GROUP BY : Classifies data into groups based on the specified columns.
- HAVING : Filters data with the GROUP BY clause.
- ORDER BY : Sorts the rows (observations) by the specified columns.
Note : SELECT FROM clauses are required. All the other clauses are optional.
PROC SQL statement calls the SQL procedure and QUIT statement ends the procedure.
To memorize the order of SQL queries, you can use the mnemonic “SFWGHO“.

SAS vs. PROC SQL

PROC SQL STATEMENTS
1. How to Select All Variables from Dataset
proc sql;
select *
from outdata;
Quit;
To display the list of columns to the SAS log, use FEEDBACK option in the PROC SQL statement.
proc sql feedback;
select *
from outdata;
Quit;
2. How to Select Specific Variables from Dataset
In the SELECT clause, multiple columns are separated by commas.
proc sql;
select weight,married
from outdata;
Quit;
3. How to limit the number of rows
Suppose you want to limit the number of rows that PROC SQL displays, use the OUTOBS= option in the PROC SQL statement.
proc sql outobs=50;
select weight,married
from outdata;
Quit;
4. How to Rename a Variable
Suppose you want to rename a variable, use the column alias AS option in the PROC SQL statement.
options nolabel;
proc sql;
select weight,married as marriage
from outdata;
Quit;
The variable name has been renamed from married to marriage. options nolabel tells SAS not to use variable labels in SAS procedures. I used it so that you can see variable name has been changed to marriage.
5. How to Create a New Variable
Suppose you want to create a new variable that contains calculation.
proc sql;
select weight, (weight*0.5) as newweight
from outdata;
Quit;
A new variable has been created and named newweight which is calculated on the basis of the existing variable weight.
6. How to refer to a previously calculated variable
The keyword CALCULATED is used to refer a previously calculated variable.
proc sql;
select weight, (weight*0.5) as newweight,
CALCULATED newweight*0.25 as revweight
from outdata;
Quit;
7. How to Remove Duplicate Rows
The keyword DISTINCT is used to eliminate duplicate rows from your query results.
In the following program, we are asking SAS to remove all those cases where in duplicates exist on combination of both the variables – weight and married.
proc sql;
select DISTINCT weight, married
from outdata;
quit;
The DISTINCT * implies cases having same values in all the variables as a whole would be removed.
proc sql;
select DISTINCT *
from outdata;
quit;
8. How to Label and Format Variables
SAS-defined formats can be used to improve the appearance of the body of a report. You can also label the variables using LABEL keyword.
options label;
proc sql;
select weight FORMAT= 8.2, married Label =” Married People”
from outdata;
Quit;
9. How to Sort Data
The ORDER BY clause returns the data in sorted order.
ASC option is used to sort the data in ascending order. It is the default option. DESC option is used to sort the data in descending order.
proc sql;
select MoMAge, eight, married
from outdata
ORDER BY weight ASC, married DESC;
Quit;
10. How to Filter Data with WHERE clause
Use the WHERE clause with any valid SAS expression to subset data.
List of conditional operators :
1. BETWEEN-AND
The BETWEEN-AND operator selects within an inclusive range of values.
Example : where salary between 4500 and 6000;
2. CONTAINS or ?
The CONTAINS or ? operator selects observations by searching for a specified set of characters within the values of a character variable.
Example : where firstname contains ‘DE’;
OR
where firstname ? ‘DE’;
3. IN
The IN operator selects from a list of fixed values.
Example : where state = ‘NC’ or state = ‘TX’;
The easier way to write the above statement would be to use the IN operator.
where state IN (‘NC’,’TX’);
4. IS MISSING or IS NULL
The IS MISSING or IS NULL operator selects missing values.
Example : where dateofbirth is missing
OR where dateofbirth is null
5. LIKE
The LIKE Operator is used to select a pattern.
Important Point :
The WHERE clause can reference a previously calculated variable in two ways-
1. Use CALCULATED keyword.
2. Repeat the calculation in the WHERE clause.
Method I :
PROC SQL;
SELECT momage,
(WEIGHT * .01) AS NEWWEIGHT
FROM outdata
WHERE CALCULATED NEWWEIGHT > 5;
QUIT;
Method II :
PROC SQL;
SELECT momage, (WEIGHT * .01) AS NEWWEIGHT
FROM outdata
WHERE (WEIGHT * .01) > 5;
QUIT;
11. How to Write Multiple Conditions/Criteria in PROC SQL
The CASE WHEN statement is used in SQL to perform conditional logic and return different values based on specified conditions. The END statement is required when using the CASE WHEN statement.
PROC SQL;
SELECT WEIGHT,
CASE
WHEN WEIGHT BETWEEN 0 AND 2000 THEN ‘LOW’
WHEN WEIGHT BETWEEN 2001 AND 3000 THEN ‘MEDIUM’
WHEN WEIGHT BETWEEN 3001 AND 4000 THEN ‘HIGH’
ELSE ‘VERY HIGH’
END AS NEWWEIGHT
FROM outdata;
QUIT;
The conditions within the CASE statement are as follows:
- If the weight is between 0 and 2000 (inclusive), it is categorized as ‘LOW’.
- If the weight is between 2001 and 3000 (inclusive), it is categorized as ‘MEDIUM’.
- If the weight is between 3001 and 4000 (inclusive), it is categorized as ‘HIGH’.
- If the weight does not fall into any of the above ranges, it is categorized as ‘VERY HIGH’.
12. How to Summarize Data
Use GROUP BY clause to summarize or aggregate data. Summary functions are used on the SELECT statement to produce summary for each of the analysis variables.
proc sql;
select momage, COUNT(married) AS marriage
from outdata
GROUP BY momage;
Quit;
The summary functions available are listed below:
- AVG/MEAN
- COUNT/FREQ/N
- SUM
- MAX
- MIN
- NMISS
- STD
- VAR
- T (t value)
- USS (Uncorrelated Sum of Square)
- CSS (Correlated Sum of Square)
- RANGE
13. How to Filter Data within Groups
In order to subset data when grouping is in effect, the HAVING clause must be used.The variable specified in having clause must contain summary statistics.
proc sql;
select momage, weight, COUNT(married) AS marriage
from outdata
GROUP BY momage, weight
HAVING marriage > 2;
Quit;
Important Point –
The WHERE clause cannot be used to subset aggregated data. To subset data with the GROUP BY clause you must use HAVING clause.
14. How to Create a New Table
The CREATE TABLE statement can be used to create a new data set as output instead of a report produced in output window.
proc sql;
create table health AS
select weight, married
from outdata
ORDER BY weight ASC, married DESC;
Quit;
15. How to limit the number of rows in newly created dataset?
Suppose you want to limit the number of rows that PROC SQL produces in the data set, use the INOBS= option in the PROC SQL statement.
proc sql INOBS=50;
create table health AS
select weight,married
from outdata;
Quit;
Difference between INOBS= and OUTOBS=
INOBS controls how many records are read from the dataset and OUTOBS controls how many records are written. Run the following program and see the difference. Both returns different results.
OUTOBS=Example*/
proc sql outobs=2;
select age, count(*) as tot
from sashelp.class
group by age;
quit;
INOBS= Example */
proc sql inobs=4;
select age, count(*) as tot
from sashelp.class
group by age;
quit;
16. How to count unique values by a grouping variable
Suppose you are asked to calculate the unique number of age values by Sex columns using SASHELP.CLASS dataset.
You can use PROC SQL with COUNT(DISTINCT variable_name) to determine the number of unique values for a column.
PROC SQL;
CREATE TABLE TEST1 as
SELECT Sex,
Count(distinct Age) AS Unique_count
FROM sashelp.class
GROUP BY Sex;
QUIT;
17. How to count the number of missing values
You can use NMISS() function to compute the number of missing values in a variable. The COUNT() function returns the number of non-missing values in a variable.
data temp;
input id;
cards;
1
2
.
4
5
.
;
run;
proc sql;
select nmiss(id) as N_missings,
count(id) as N,
calculated N_missings + calculated N as total
from temp;
quit;
How to refer to a calculated variable
The keyword CALCULATED is used to refer to a newly created variable for further calculation. In this case, we have used CALCULATED to sum ‘N_MISSINGS’ and ‘N’ variables.
18. KEEP and DROP some variables
Suppose you need to keep all the variables from SASHELP.CARS except variables ‘MODEL’ and ‘MAKE’. The DROP= option is used to drop these two variables. Similarly, we can use KEEP= option to keep specific variables. These DROP= and KEEP= Options are not native SQL language. It only works in SAS.
proc sql;
create table saslearning (drop= make model) as
select * from sashelp.cars;
quit;
19. How to delete rows from a table
You can use DELETE FROM statement to remove records (rows) from a dataset.
proc sql;
delete from outdata
where momage > 0;
quit;
In this case, we are deleting all records having momage greater than 0 from outdata dataset. Log shows ‘478 rows were deleted from outdata’.
20. How to use sub query in PROC SQL?
Suppose you need to find out employee IDs having records in the table named ‘file1’ but not in table ‘file2’. In the code below, we are querying multiple tables (datasets).
data file1;
input ID age;
cards;
1 24
2 34
3 45
4 67
;
run;
data file2;
input ID age;
cards;
1 25
3 46
4 62
;
run;
Proc SQL;
Select ID from file1
Where ID not in (select ID from file2);
Quit;