Character Functions in SAS:

 This tutorial covers the most frequently used SAS character functions with examples. Dealing with character strings can be a little tricky compared to numeric values. Therefore, it is necessary to understand the practical usage of character functions.*/

 */

 1. COMPBL Function*/

 The COMPBL function compresses multiple blanks to a single blank.*/

 */

 In the example below, the Name variable contains a record “Sandy   David”. It has multiple spaces between the first and last name.*/

 */

 Create a dummy data*/

 character functions*/

Data char;

Input Name $ 1-50;

Cards;

    Sandy     David  

   Annie Watson 

  Hello ladies and gentlemen 

Hi, I am good

;

Run;

1. Use COMPBL Function*/

  set – we use for already created table */

Data char1;

Set char;

name1=compbl(Name);

run;

proc print;

run;

 2. STRIP Function*/

 The STRIP function removes leading and trailing spaces.*/

     1234    */

  1234 */

Data char1;

Set char;

Name1 = strip(Name);

run;

proc print;

run;

 3. COMPRESS Function*/

 The COMPRESS function removes leading, between and trailing spaces.*/

 SYNTAX*/

 COMPRESS(String, characters to be removed, Modifier)*/

Data char1;

Set char;

Name1 = compress(Name, ‘AB’);

run;

proc print;

run;

 compress alone is equal to compbl and strip*/

 Answer = not 100%*/

 _null_ – if we don’t want to create a table then we use _null_;*/

data abc;

x=’ABCDEF-!1.234′;

string=compress(x,’!4′);

run;

data abc1;

x=’ABCDEF-!1.234′;

string=compress(x,”,’a’);

put string=;

run;

 In SAS 9.1.3, the additional parameter called MODIFIER was added to the function.*/

 The following keywords can be used as modifiers-*/

 a – Remove all upper and lower case characters from String.*/

 ak – Keep only alphabets from String.*/

 kd – Keeps only numeric values*/

 d – Remove numerical values from String.*/

 i – Remove specified characters both upper and lower case from String.*/

 k – keeps the specified characters in the string instead of removing them.*/

 l – Remove lowercase characters from String.*/

 p – Remove Punctuation characters from String.*/

 s – Remove spaces from String. This is default.*/

 u – Remove uppercase characters from String.*/

 Example 1 : Keep only alphabets from alphanumeric values*/

data _null_;

x=’ABCDEF-!1234′;

string=compress(x,”,’ak’);

put string=;

run;

 It returns ABCDEF*/

 Example 2 : Keep only numeric from alphanumeric*/

data _null_;

x=’ABCDEF-!1234′;

string=compress(x,”,’kd’);

put string=;

run;

 It returns 1234*/

 Example 3 : Remove all punctuation from string*/

data _null_;

x=’ABCDEF-!1234′;

string=compress(x,”,’p’);

put string=;

run;

 It returns ABCDEF1234*/

 Example 4 : Keep Integer Values from String*/

data abc2;

x=’ABCDEF-!1.234′;

string=compress(x,’AB21′,’k’);

run;

 It returns 1.234*/

 4. LEFT Function*/

 The LEFT function moves leading blanks to the end of the value. The length of the string does not change.*/

 ‘   Ajay Arora    ‘*/

Data char1;

x = ‘   Ajay Arora    ‘;

char = left(x);

run;

 5. TRIM Function*/

 The TRIM function removes trailing spaces.*/

Data char1;

x = ‘   Ajay Arora    ‘;

char = trim(x);

run;

 6. TRIM(LEFT(string)) = strip(string)*/

Data char1;

Set char;

char1 = left(trim(Name));

char2=strip(Name);

run;

 It is equivalent to STRIP function. It first removes leading spaces and then trailing spaces.*/

 */

 ‘Sumit’ */

 ‘Dhingra’*/

 */

 ‘Sumit Dhingra’*/

 7. CAT Function*/

 The CAT function concatenates character strings. It is equivalent to || sign.*/

data new;

a = ‘Ajay’;

b = ‘Arora’;

z = a ||b;

  d = cat(a,’ ‘,b); */

run;

 Both c and d returns “abcxyz”.*/

CATS function:

 Concatenate String and Numeric Value*/

data new2;

x = “Temp”;

y = 22;

z = x||y;

z1 = cats(x,y);

z2= cat(x,y);

z3 = catx(‘,’,x,y);

 put z = z1= z2 =;*/

run;

proc print;

run;

 z = Temp   22*/

 z1=Temp22*/

 z2=Temp 22*/

 */

 Note -*/

 The || keyword inserts multiple spaces when numeric and text values are concatenated.*/

 CATS strips both leading and trailing blanks, and does not insert separators.*/

 CATX strips both leading and trailing blanks, and inserts separators. The first argument to CATX specifies the separator.*/

 8. SCAN Function*/

 The SCAN Function extracts words within a value that is marked by delimiters.*/

 SCAN( text, nth word, <delimiters>)*/

 For example :*/

 We wish to extract first word in a sentence ‘Hi, How are you doing?’. In this case, delimiter is a blank.*/

data new2;

Input Name $ 1-50 ;

Cards;

Sandy,David,John

Annie,Watson,Debbie

ladies,gentlemen,Women

Hi,how,are

;

run;

 first_word returns ‘Hi,’ since it is the first word in the above sentence using blank as a delimiter.*/

 We wish to extract last word in a sentence ‘Hi, How are you doing?’. In this case, delimiter is a blank.*/

data new3;

set new2;

  first_word=scan(Name,1,’,’); */

second_word=scan(Name,2,’,’);

run;

 last_word returns ‘doing?’ since it is the last word in the above sentence.*/

 Suppose, delimiter is a character instead of blank or special sign.*/

data temp;

string=’Hello SAS community people’;

beginning= scan( string, 1, ‘S’ ); ** returns “Hello “;

middle = scan( string, 2, ‘S’ ); ** returns “A”;

end= scan( string, 3, ‘S’ ); **returns ” community people”;

run;

data temp3;

address = “Noida UtterPradesh 522001”;

city = scan(address,1,’ ‘);

state = scan(address,2,’ ‘);

pincode = scan(address,3,’ ‘);

run;

 9. SUBSTR Function*/

 The SUBSTR function extracts strings based on character position and length. It is equivalent to MS Excel’s MID Function.*/

 */

 = substr(old_var, starting_position, number of characters to keep);*/

 Examples :*/

data temp4;

set sashelp.demographics;

run;

data temp5;

set temp4;

first_three = substr(isoname,1,3);

run;

data new2;

t=”AFHood Analytics Group”;

new_var=substr(t,8);

run;

=substr(name,length(name)-3,3)

substr(name,length(name)-1,2);

data temp;

input name$ 40.;

cards;

sumit,Dhingra

Priyanka,Mishra

Aman,Kumar

Rahul,Arora

Ajay,Chauhan

;

run;

data temp2;

set temp;

last_three=substr(name,length(name)-2,3);

run;

  extract last name out of full name using substr fucntion? */;

data temp;

set sashelp.class;

run;

 =substr(name,starting_pos, tot_length)*/

data new2;

set sashelp.class;

last_three_alpha=substr(name,length(name) – 2);

run;

proc print;

run;

 Result: new_var=Analytics*/

 10. SUBSTR(Left of =) Function*/

 It replaces a portion of string with new string*/

data new3;

string=’new_dataable’;

substr(string,1,8) = “new_data”;

run;

 new_dataable*/

 Result: string=new_dataable*/

 In this case, SUBSTR replaces the first 8 characters of string with the value “new_data”. After this line executes, the value of string will be ‘new_dataable’*/

 11. LOWCASE, UPCASE and PROPCASE*/

 LOWCASE converts the character string to lowercase.*/

 UPCASE converts the character string to uppercase.*/

 PROPCASE returns the word having uppercase in the first letter and lowercase in the rest of the letter (sentence format).*/

data new4;

  name = ‘Hello world’;

  name_upper = upcase(name);

  name_lower = lowcase(name);

  name_proper = propcase(name); 

run;

proc print;

run;

 Output:*/

 name_upper=HELLO WORLD*/

 name_lower=hello world*/

 name_proper=Hello World*/

 12. INDEX Function*/

 The INDEX function finds characters or words in a character variable*/

data new4;

string=’hi,Hi how are you doing?’;

x = index(string, ” “);

x1 = index(string,”hi”);

  if index(name,’ian’) > 0; */

y1 = find(string, “h”);

run;

data temp;

string=’hi,Hi how are you doing?’;

if index(string,’h’) > 0;

run;

  index(name,”Mr”) = 1 */

  name */

   */

  Amandeep Mr Kumar    – 10 */

  Sumit Mr – 7 */

  Mr Ram  – 1   –     */

  Master Aman – 0 – False */

  Mrs Priya */

  Mr Ajay */

  Mr Rman */

  Mrs Priyanka */

  Master Rahul */

data temp;

set sashelp.class;

run;

data temp2;

set temp;

  if index(name,’a’) >=5; */

if name =: ‘H’;

run;

proc print;

run;

 x returns 4 as “How” starts from 4th character.*/

 To select all the records having ‘ian’ in their character.*/

 if index(name,’ian’) > 0;*/

 To select all the records having first letter ‘H’*/

 if name =: ‘H’;*/

 13. FIND Function*/

 The FIND function locates a substring within a string*/

 FIND (character-value, find-string <,’modifiers’> <,start>)*/

 */

 STRING1 = “Hello hello goodbye”*/

 */

 Examples :*/

 */

 */

 1. FIND(STRING1, “hello”) returns 7*/

 */

 2. FIND(“abcxyzabc”,”abc”,4) 7*/

data temp;

STRING1 = “hello hi goodbye”;

x=FIND(STRING1, “h”);

y = find(string1,”h”,x+1);

run;

  string */

   */

  hi, how are – 1 */

  sumit my is, hi how – 14 */

  I am from Delhi, hi hoow – 13 */

data temp2;

set temp;

y=FIND(STRING1, “hello”,x+1);

run;

  Can we combine above two queries? */

 14. TRANWRD Function*/

 The TRANWRD function replaces all occurrences of a word in a character string. It doesn’t replace full phrase (entire value content).*/

 TRANWRD (variable name, find what , replace with)*/

 */

 Example : name : Mrs. Joan Smith*/

 */

data new3;

name = ‘Mrs. Joan Smith’;

name1=tranwrd(name, “Mrs.”, “Ms.”);

run;

proc print;

run;

 name=tranwrd(name, “Mrs.”, “Ms.”);*/

 */

 Result : Ms. Joan Smith*/

 15. TRANSLATE Function*/

 The TRANSLATE function replaces specific characters in a character expression*/

 TRANSLATE(source, replace with, find what)*/

 Examples:*/

data temp3;

x = translate(‘XYZW’,’AB’,’VW’);

run;

 */

 Result : “XYZB”*/

data temp;

set sashelp.cars;

new_model=translate(model,’Z’,’X’);

run;

proc print;

run;

 Difference between TRANWRD and TRANSLATE Functions*/

 The TRANSLATE function converts every occurrence of a user-supplied character to another character. TRANSLATE can scan for more than one character in a single call. In doing this, however, TRANSLATE searches for every occurrence of any of the individual characters within a string. That is, if any letter (or character) in the target string is found in the source string, it is replaced with the corresponding letter (or character) in the replacement string.*/

 The TRANWRD function differs from TRANSLATE in that it scans for words (or patterns of characters) and replaces those words with a second word (or pattern of characters).*/

 16. INPUT and PUT Function*/

 The INPUT Function is used to convert character variable to numeric.*/

 new_num=input(character-variable, 4.);*/

 Example -*/

data temp;

x = ‘12345’;

new_x = input(x,5.);

run;

proc print;

run;

  Accounts */

   */

  123         123 */

  234         234 */

  345 */

  456 */

  567 */

  123   123  – False */

  345   345 */

  567   567 */

  678 */

data temp;

x = 12345;

new_x = put(x,5.);

run;

 In the above example, the variable x is a character variable as it is defined in quotes ‘12345’. The newly created variable new_x is in numeric format.*/

 */

 The PUT Function is used to convert numeric variable to character.*/

 new_char=put(numeric,4.0);*/

data temp;

x = 12345;

new_x = put(x,5.);

run;

 In this example, the variable x is originally in numeric format and later converted to character in the new variable new_x.*/

 18. LENGTH*/

 The LENGTH function returns length of a string.*/

data xyz;

set sashelp.cars;

len_origin = length(origin);

run;

 It returns 13.*/

 If you need to calculate the number of digits in a numeric variable -*/

 First, we need to convert our numeric variable to character to count the number of digits as LENGTH function works only for character variable.*/

data _null_;

x = 12345;

cnt = length(strip(put(x,12.)));

put cnt=;

run;

 In the above nested function, we first converted the variable x to character and then remove spaces by using STRIP function and then count number of digits by using LENGTH() function.*/

 Another Method -*/

 19. IF THEN*/

 IF THEN replaces the entire phrase in a string.*/

data mydata;

input names $30.;

cards;

Raj Gates

Allen Lee

Dave Sandy

William Gates

Jon Jedi

;

run;

data mydata2;

set mydata;

length new_names $30.;

if find(names, “Raj”) then new_names = “Raj Kumar”;

else new_names = names;

run;

 In the above SAS program we are checking if the string “Raj” is found within the variable “names”. The FIND function is used to perform this search. If the string “Raj” is found, the if block is executed, and the value “Raj Kumar” is assigned to the “new_names” variable in the mydata2 dataset. If the string “Raj” is not found in the names variable, the else block is executed, and the value of the “names” variable is assigned to the “new_names” variable.*/

 19. COUNT Function*/

 The COUNT function counts the number of times that a specified substring appears within a character string.*/

data xyz;

name = “Aman”;

  x = count(name,”a”); */

x1 = count(name,”a”,”i”);

run;

 Result : x=2 as there are 2 lower case ‘a’s in the variable name. x1=3 as there are 3 ‘a’s in the variable name (The ‘i’ modifier ignores case sensitivity)*/

 21. COUNTW Function*/

 The COUNTW function counts the number of words in a character string.*/

data readin;

input name$15.;

cards;

Trait Jhonson   2

3+3=6

;

run;

data out;

set readin;

  x = countw(name); */

x1 = countw(name,’ ‘);

run;

proc print;

run;