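In PROC SORT, the NODUP and NODUPKEY options remove duplicate observations from a dataset.
The NODUPKEY option removes duplicates based only on the variable(s) listed in the BY statement, keeping one observation per BY group. The NODUP option (an alias for NODUPRECS) removes an observation only when the values of all its variables match those of the preceding observation in the sorted output, so identical rows that do not end up next to each other after sorting can remain.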
In the code below, sorting by the “product” column with NODUPKEY keeps one observation for each product (A, B and C) and deletes the other 7 of the 10 observations.
Example:
data mydata;
input product $ transactions sale;
datalines;
A 84 158
A 75 118
A 64 421
A 12 592
B 75 206
B 17 855
B 46 360
C 87 650
C 96 922
C 40 860
;
run;
proc sort data=mydata nodupkey out=newdata;
by product;
run;
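To check which rows NODUPKEY dropped, the deleted observations can be written to a separate dataset with the DUPOUT= option. The following is a minimal sketch that reuses mydata from the example; the dataset name dropped is a hypothetical name chosen for illustration.

```sas
/* Same sort as above, but also save the deleted observations.
   "dropped" is a hypothetical dataset name used for illustration. */
proc sort data=mydata nodupkey out=newdata dupout=dropped;
by product;
run;

/* List the observations that were removed as duplicates of a key */
proc print data=dropped;
run;
```

With the sample data, newdata should hold 3 observations (one per product) and dropped should hold the 7 deleted ones.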
In the code below, no observations are deleted because no two rows in the dataset have identical values for all variables.
proc sort data=mydata nodup out=newdata;
by product;
run;
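NODUP only has something to remove when the data contains rows that match on every variable. The sketch below assumes a small hypothetical dataset, dupdata, in which one row is repeated exactly; the names dupdata, nodup_result and removed_rows are made up for illustration. Sorting BY _ALL_ is one way to make sure identical rows end up next to each other so that NODUP can detect them.

```sas
/* Hypothetical data: the third row is an exact copy of the first */
data dupdata;
input product $ transactions sale;
datalines;
A 84 158
B 75 206
A 84 158
C 87 650
;
run;

/* Sorting by all variables places identical rows side by side;
   nodup then drops the repeat and dupout= saves the dropped row */
proc sort data=dupdata nodup out=nodup_result dupout=removed_rows;
by _all_;
run;
```

With this input, nodup_result should contain 3 observations and removed_rows the 1 repeated row.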