Clin Infotech

SAS Interview Questions for Freshers

1. Why choose SAS over other data analytical tools?

Listed below are a few reasons to choose SAS over other data analysis tools:  

2. What are the essential features of SAS?

SAS has the following essential features:

3. Write down some capabilities of SAS Framework.

SAS Framework has the following four capabilities:


4. What is the use of Retain in SAS?

At the start of each iteration of the DATA step, SAS resets to missing every variable in the program data vector (a logical area of memory) that is assigned a value by an INPUT statement or an assignment statement. A RETAIN statement overrides this default: it instructs SAS not to reset the named variables to missing when moving from one iteration of the DATA step to the next, so their values are retained.


Syntax: RETAIN variable1 variable2 … variablen;

There is no limit on the number of variables you can list. If you specify no variable names, SAS retains the values of all variables created by INPUT or assignment statements.
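As a minimal sketch of RETAIN in use, the step below keeps a running total across iterations. The data set names sales and running and the variables amount and total are hypothetical, chosen only for illustration.

```sas
/* Hypothetical input: five sales amounts. */
data sales;
   input amount;
   datalines;
10
20
30
40
50
;
run;

data running;
   set sales;
   retain total 0;          /* keep total across iterations, starting at 0 */
   total = total + amount;  /* running sum of amount */
run;

proc print data=running;
run;
```

Without the RETAIN statement, total would be reset to missing at the top of every iteration, and total + amount would be missing on every row.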

5. What is PDV (Program Data Vector)?

The program data vector (PDV) is a logical area of memory where SAS builds a data set one observation at a time. When a program executes, SAS reads data values from the input buffer, or creates them by executing SAS language statements, and assigns them to the corresponding variables in the PDV. The PDV also contains two automatic variables, _N_ and _ERROR_.

6. State difference between Missover and Truncover in SAS.

Example: An external file with variable-length records contains the following records:

1
22
333
4444
55555

The following DATA step creates a SAS data set from these data. It uses the numeric informat 5., and only one input record is as long as the informatted length of the variable NUM.

data readin;
   infile 'external-file' missover;
   input NUM 5.;
run;

proc print data=readin;
run;

Obs    NUM

1      . 

2      . 

3      . 

4      . 

5      55555

Those values that were read from input records that were too short have been set to missing. This problem can be corrected by using the TRUNCOVER option in the INFILE statement:


The following DATA step reads the same records with the numeric informat 5., this time with the TRUNCOVER option.

data readin;
   infile 'external-file' truncover;
   input NUM 5.;
run;

proc print data=readin;
run;

Obs    NUM

1      1 

2      22 

3      333 

4      4444 

5      55555

Those values that were read from input records that were too short are not set to missing.

7. What do you mean by the Scan function in SAS and write its usage?

The Scan() function is typically used to extract words from a value marked by delimiters (characters or special signs that separate words in a text string). The SCAN function selects individual words from text or variables containing text and stores them in new variables. 



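Syntax: SCAN(string, count, delimiters)

In this case, string is the text to scan, count is the position of the word to extract (a negative count counts from the end of the string), and delimiters lists the characters that separate words.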


Consider that we would like to extract the first word from a sentence ‘Hello, Welcome to Scaler!’. In this case, the delimiter used is a blank. 

data _null_;
   string = "Hello, Welcome to Scaler!";
   first_word = scan(string, 1, ' ');
   put first_word=;
run;

first_word is 'Hello,' (including the comma, because the only delimiter is a blank), since it is the first word in the sentence. Now consider extracting the last word from the same sentence, again with a blank as the delimiter.

data _null_;
   string = "Hello, Welcome to Scaler!";
   last_word = scan(string, -1, ' ');
   put last_word=;
run;

last_word is 'Scaler!', since it is the last word in the sentence.

8. Consider the following expression stored in the variable address: 9/4 Infantry Marg Mhow CITY, MP, 453441

In the following scenario, what would the scan function return?  


In the above program, we have used the scan function to read the 3rd word in the address string. The following output will be returned by the scan function:


9. Explain what is first and last in SAS?

In a DATA step, a BY statement used together with a SET statement processes the data in groups defined by the BY variables (the data must be sorted or indexed by those variables). For each BY variable, SAS automatically creates two temporary variables, FIRST.variable and LAST.variable, which identify the first and last observations of each BY group. These variables are always 1 or 0: FIRST.variable is 1 for the first observation in a BY group and 0 otherwise, and LAST.variable is 1 for the last observation in a BY group and 0 otherwise.

Essentially, SAS stores FIRST.variable and LAST.variable in a program data vector (PDV). As a result, they become available for DATA step processing. However, SAS will not add them to the output data set since they are temporary. 
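A small sketch may make the flags easier to see. Because FIRST. and LAST. variables are not written to the output data set, the step below copies them into ordinary variables. The data set scores and the variables is_first and is_last are hypothetical names used only for this illustration.

```sas
/* Hypothetical data: test scores for three students. */
data scores;
   input id score;
   datalines;
1 45
1 74
2 54
3 87
3 92
3 61
;
run;

proc sort data=scores;
   by id;                 /* BY-group processing needs sorted data */
run;

data flagged;
   set scores;
   by id;
   is_first = first.id;   /* 1 on the first row of each id, else 0 */
   is_last  = last.id;    /* 1 on the last row of each id, else 0 */
run;

proc print data=flagged;
run;
```

For id 2, which has a single row, both is_first and is_last are 1.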

10. What is the meaning of STOP and OUTPUT statements in SAS?

The STOP statement causes SAS to stop processing the current DATA step immediately.

Syntax: STOP;

Example: As demonstrated in this example, STOP is used to avoid an infinite loop when using a random access method within a DATA step: 

data sample;
   do developerobs = 1 to engineeringobs by 10;
      set master.research point=developerobs nobs=engineeringobs;
      output;
   end;
   stop;   /* POINT= access never reaches end of file, so STOP ends the step */
run;





Syntax: OUTPUT <data-set-name(s)>;

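The OUTPUT statement writes the current observation to a SAS data set immediately, rather than at the end of the DATA step.

Example: Each line of input data can be used to create two or more observations. As given below, for each observation in the data set Scaler, three observations are created in the SAS data set Result.

data Result(drop=time4-time6);
   set Scaler;
   time = time4;   /* time is an assumed variable name */
   output;
   time = time5;
   output;
   time = time6;
   output;
run;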








11. State the difference between using the drop = data set option in the set statement and data statement.

In SAS, the drop= option is used to exclude variables from processing or from the output data set. This option tells SAS which variables you wish to remove from a data set.

Syntax: DROP=variable(s);

In this case, variable(s) lists one or more names of variables. Variables can be listed in any format SAS supports. 

Example: Consider the following data set: 

DATA outdata;
   INPUT gender $ section $ score1 score2;
   DATALINES;
F   A  17  20
F   B  25  17
F   C  12  15
M   D  21  25
;
RUN;

proc print;
run;


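The following DROP= data set option instructs SAS to drop the variables score1 and score2. Because the option is on the SET statement, the variables are dropped as the data set is read, so they are not available to the sum function and totalsum is missing. On the DATA statement, DROP= would instead apply when the output data set is written, so totalsum would be computed before the variables are dropped.

data readin;
   set outdata (drop = score1 score2);
   totalsum = sum(score1, score2);
run;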



Gender  Section    score1     score2    totalsum 

 F       A          .          .          . 

 F       B          .          .          .    

 F       C          .          .          .       

 M       D          .          .          .      

12. Name different data types that SAS support.

SAS supports two data types, i.e., Character and Numeric. Dates are stored as numeric values (the number of days since January 1, 1960), which is why date functions and arithmetic can be applied to them.

13. What do you mean by the “+” operator and sum function?

In SAS, summation or addition is performed either with the “sum” function or by using the “+” operator. Function “Sum” returns the sum of arguments that are present (non-missing arguments), whereas “+” operator returns a missing value if one or more arguments are not present or missing. 

Example: Consider a data set containing three variables a, b, and c. 

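data variabledata;
   input a b c;
   datalines;
1      2     3
34     3     4
.      3     2
53     .     3
54     4     .
45     4     2
;
run;

Each variable has missing values, and we wish to compute the sum of the three variables.

data sumofvariables;
   set variabledata;
   x = sum(a, b, c);   /* ignores missing arguments */
   y = a + b + c;      /* missing if any operand is missing */
run;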





x        y 

6        6 

41       41 

5        . 

56       . 

58       . 

51       51

The value of y is missing for the 3rd, 4th, and 5th observations in the output. 

14. Explain _N_ and _ERROR_ in SAS.

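In a SAS DATA step, two variables are created automatically, namely the _N_ variable and the _ERROR_ variable. _N_ counts the number of times the DATA step has begun to iterate. _ERROR_ is 0 by default and is set to 1 when an error, such as an input data error or a conversion error, occurs.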

15. What are different ways to exclude or include specific variables in a dataset?

DROP and KEEP statements can be used to exclude or include specific variables from a data set. 

Example: Consider the following data set: 

DATA outdata;
   INPUT gender $ section $ score1 score2;
   DATALINES;
F   A  17  20
F   B  25  17
F   C  12  15
M   D  21  25
;
RUN;

proc print;
run;


The following DROP statement instructs SAS to drop variables score1 and score2. 

data readin;
   set outdata;
   totalsum = sum(score1, score2);
   drop score1 score2;
run;



Gender  Section   totalsum

F          A        37

F          B        42

F          C        27

M          D        46

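The following KEEP statement instructs SAS to keep gender, section, score1 and totalsum in the data set, so only score2 is excluded.

data readin1;
   set outdata;
   totalsum = sum(score1, score2);
   keep gender section score1 totalsum;
run;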



Gender  Section  score1    totalsum 

F         A       17         37 

F         B       25         42 

F         C       12         27 

M         D       21         46

16. What are some common mistakes that people make while writing programs in SAS?

The following are some of the most common programming errors in SAS:  

SAS Interview Questions for Experienced

17. What do you mean by SAS Macros and why to use them?

Macro is a group of SAS statements (program) that automates repetitive tasks. With SAS’s Macros feature, we can avoid repeating sections of code and use them again and again when needed without having to type them again and it increases readability also. Automation makes your work faster because you don’t have to write the same lines of code every day. %MACRO and %MEND are the start and end statements of a macro program. These can be reused multiple times. The SAS program declares them at the beginning and then calls them out during the body of the program when needed.

Macro variables contain a value that can be used over and over again by SAS programs. A macro variable can hold up to 65,534 characters, and macro variables are among SAS’s most powerful tools. They can be either global or local in scope. A local macro variable (declared with %LOCAL, or created inside a macro) can be accessed only inside the macro program in which it is defined. A global macro variable (declared with %GLOBAL, or defined in open code outside any macro program) can be accessed anywhere in the SAS session.

Syntax: A macro program is declared and called as follows. The parameters are declared as a comma-separated list on the %MACRO statement, the macro statements follow, and the %MEND statement ends the definition. The macro is then called by name with values for the parameters.

/* Creating a macro program. */

%MACRO macro-name(Param1, Param2, … Paramn);
   Macro Statements;
%MEND macro-name;

/* Calling a macro program. */

%macro-name(Value1, Value2, … Valuen);

18. What are the different ways to create macro variables in SAS programming?

The following are some ways to create macro variables: 

19. Explain how %Let and macro parameters can be used to create macro variables in SAS programming?

%LET: %Let is generally used to create macro variables and assign values to them. You can use it inside or outside a macro.

Syntax: %LET macro-variable-name = value;

Any number, text or date can be entered in the Value field, depending on what the program requires.  

How to use the Macro Variable?

To reference a macro variable, use an ampersand (&) followed by the macro variable name, as shown below:

&macro-variable-name

Macro Parameters: Macros have variables called parameters whose values you set when you invoke the macro. Parameters are added to a macro by naming them in parentheses in the %MACRO statement.

Syntax:

%MACRO macro-name (parameter-1= , parameter-2= , …… parameter-n= );
   Macro Statements;
%MEND macro-name;

How to call a Macro?

To call a macro, we use % followed by the macro name and then pass values for the parameters.

20. Name some SAS system options that are used to debug SAS Macros.

There are a number of SAS System options that users can use to troubleshoot macro problems and issues. Macro-option results are automatically shown in the SAS Log.  
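MPRINT, MLOGIC and SYMBOLGEN are three such system options. The sketch below turns them on, runs a small macro so that the extra log lines appear, and turns them off again. The macro print_rows and its parameter n are hypothetical, written only to produce log output.

```sas
options mprint mlogic symbolgen;   /* turn on macro debugging output */

/* Hypothetical macro used only to generate log output. */
%macro print_rows(n);
   proc print data=sashelp.class(obs=&n);
   run;
%mend print_rows;

%print_rows(3)

options nomprint nomlogic nosymbolgen;   /* turn the options off again */
```

MPRINT shows the SAS statements the macro generates, MLOGIC traces the macro's execution, and SYMBOLGEN shows how each macro variable reference resolves.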

21. State the difference between PROC MEANS and PROC SUMMARY.

PROC SUMMARY and PROC MEANS are essentially the same procedure for calculating descriptive statistics such as mean, count, sum and median. Both can also calculate other metrics such as percentiles, quartiles, variances, standard deviations, and t-tests. N, MIN, MAX, MEAN, and STD DEV are the default statistics produced by PROC MEANS.

22. What do you mean by functions and procedures in SAS?

SAS Procedures: They process data in SAS data sets to create statistics, tables, reports, charts, and plots, as well as to perform other analyses and operations on the data. All types of statistical analysis can be performed using SAS procedures. Execution of a procedure is triggered by the keyword PROC, which starts the step. Here are some SAS PROCs: 

SAS Functions: There are many built-in functions in SAS that aid in the analysis and processing of data. You use them in DATA step statements. Different functions take different numbers of arguments. Here is a list of SAS functions:

23. Identify the error in the following code.

proc mixed data=SASHELP.IRIS plots=all;
   model petallength= /;
   class species;
run;

It is a syntax error: in PROC MIXED, the CLASS statement must appear before the MODEL statement.

24. Explain what you mean by SYMGET and SYMPUT.

In a DATA step, SYMGET returns a macro variable’s value. Conversely, the primary function of SYMPUT is to store a value from the DATA step in a macro variable.

Syntax of Symput: 

CALL SYMPUT(macro-variable, value);

Syntax of SYMGET:

SYMGET(argument)

Example: In the following program, CALL SYMPUT stores the value of name from the first observation in a macro variable called avar, and SYMGET then retrieves the macro variable's value in another DATA step.

* Create a macro variable;

data dataset;
   set sashelp.class;
   if _N_ = 1 then do;
      call symput('avar', name);
   end;
run;

%put &avar;

* Get macro variable value in a dataset;

data needit;
   set sashelp.class;             /* assumed input; the original body is not shown */
   avar_value = symget('avar');   /* avar_value is an assumed variable name */
run;



25. What is the importance of the Tranwrd function in SAS?

TRANWRD, when applied to a character string, replaces or removes all occurrences of a substring. By using TRANWRD, you can search for a word (or pattern of characters) and replace it with a second word (or pattern of characters).

Syntax: TRANWRD(source, target, replacement)

Example:

name : Mrs. Johny Lever
name=tranwrd(name, "Mrs.", "Ms.");
Result : Ms. Johny Lever

26. How do you specify the number of iterations and specific conditions within a single do loop?

The code below illustrates how to specify the number of iterations and specific conditions within a single do loop. The iterative DO statement executes the DO loop until the Sum is greater than or equal to 50000, or until the DO loop has executed 50 times, whichever comes first.

data Scaler;
   do i=1 to 50 until (Sum>=50000);
      Sum + 5000;   /* assumed increment; the original loop body is not shown */
      output;
   end;
run;






27. Explain the usage of trailing @@.

Occasionally, multiple observations need to be created from a single record of raw data. In order to specify how SAS will read such a record, you can use the double trailing at-sign (@@ or “double trailing @”).  By using a double trailing @@, SAS is told to “hold the line more strongly”. A double trailing sign (@@) directs SAS not to advance to another input record, but to hold the current input record for the next input statement. 

Note that a single trailing @ does not hold an input record across iterations of the DATA step. A single trailing @ holds the input record only for the current iteration (until processing returns to the top of the DATA step), or until the next INPUT statement without a trailing @ executes.

28. Explain different ways to remove duplicate values in SAS.

Below are two ways to delete duplicate values in SAS: 

The NODUPRECS (or NODUPREC or NODUP) option of PROC SORT identifies observations with identical values for all columns and removes them from the output data set.

Proc sort data=SAS-Dataset nodups;
   By varname;
run;

PROC SQL can be used to remove duplicates. The DISTINCT keyword is used in the select clause to remove duplicate observations.

proc sql;
   create table New_dataset as select distinct * from Old_dataset;
quit;


29. What do you mean by NODUP and NODUPKEY options and write difference between them?

PROC SORT in SAS removes duplicate values from a table primarily through two options, NODUP and NODUPKEY:

NODUP compares every variable in the data set, and removes duplicate observations in which the same values are repeated across all variables.
Syntax: PROC SORT DATA=readin NODUP; BY varname; RUN;

NODUPKEY compares only the variables listed in the BY statement, and removes duplicate observations in which the values of the variables listed in the BY statement are the same.
Syntax: PROC SORT DATA=readin NODUPKEY; BY varname; RUN;

30. Name the command used for sorting in SAS programs?

PROC SORT is used to sort data in SAS, and it can sort by multiple variables. With the OUT= option it writes the sorted data to a new data set and leaves the original unchanged; without OUT=, it replaces the original data set.

Syntax:

PROC SORT DATA=original OUT=Sorted;
   BY variable_name;
RUN;


31. Explain what is INPUT and INFILE Statement.

In SAS programming, using an INFILE statement identifies an external file containing the data, whereas using an INPUT statement describes the variables used. 

Syntax of INFILE: INFILE ‘filename’;

Syntax of INPUT: INPUT varname1 varname2;

Example:

DATA readin;
   INFILE 'filename';
   INPUT ID Gender Score;
RUN;


32. What do you mean by %Include and %Eval?

%Include: If you run a program containing the %INCLUDE statement, the SAS System executes any statements or data lines that you bring into the program. Statements are executed immediately. 


%INCLUDE source(s)

</<SOURCE2> <S2=length> <option-list> >;


%Eval: Integer arithmetic is used to evaluate arithmetic or logical expressions. %EVAL accepts only integers as operands in arithmetic expressions. Operands with floating-point values cannot be used in %EVAL arithmetic calculations. %SYSEVALF can be used in these cases.  

Syntax: %EVAL(arithmetic/logical-expression)


%let d=%eval(13+23);


Have you been preparing for a SAS interview and wondering how you can succeed?  This useful guide can help you prepare for it. We’ve compiled a list of the top 30+ SAS interview questions and answers that you’re likely to be asked during your interviews. The questions have been specifically designed to familiarize you with the type of questions you might encounter during the interview.

SAS MCQ Questions


Which of the following PROC statements is correct? 




All of the above


Is there a way to limit the variables written to output dataset in DATA STEP?




Both A and B


When there is a missing value in SAS, it should be coded as __.

Semi-colon (;)

Period (.)

Dollar sign ($)

Comma (,)


SAS stands for ___ .

Subordinate Audit/Services

Subordinate Account Services

Statistical Analysis System

None of the above


The other name for the Data Preparation stage of Knowledge Discovery Process is ___.

CRISP-DM (Cross Industry Standard Process for Data Mining)

SAS (Statistical Analysis System)

SEMMA (Sample, Explore, Modify, Model, Assess)

ETL (Extract, Transform, Load)


Reports and graphs are typically generated using which of the following steps?






Which of the following is not a SAS system option that is used to debug SAS Macros?






Which of the following PROC statements is used to remove duplicate values from a data set?




None of the above


Which of the following is not a way to create macro variables in SAS Programming?


Call Symput


None of the above


Which of the following is used to store the value of the data set in a macro variable?




None of the above

Difference between INPUT and INFILE

The INFILE statement is used to identify an external file, while the INPUT statement is used to describe your variables.







Note : The variable name, followed by $ (dollar sign), identifies the variable type as character. In the example shown above, ID and SEX are numeric variables and Name is a character variable.

2. Difference between Informat and Format

Informats read the data while Formats write the data. Informat – To tell SAS that a number should be read in a particular format. For example: the informat mmddyy6. tells SAS to read the number 121713 as the date December 17, 2013. Format – To tell SAS how to print the variables.
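As a brief sketch of the two roles, the step below reads the text 121713 with the mmddyy6. informat and then writes the resulting date value with two different formats. The variable name d is hypothetical.

```sas
data _null_;
   d = input('121713', mmddyy6.);   /* informat: read the text as a SAS date */
   put d=;                          /* no format: the stored number of days */
   put d= date9.;                   /* format: write the same value as a date */
   put d= mmddyy10.;                /* another format for the same value */
run;
```

The stored value never changes; only the format applied when writing it does.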

3. Difference between Missover and Truncover

Missover: When the MISSOVER option is used on the INFILE statement, the INPUT statement does not jump to the next line when reading a short line. Instead, MISSOVER sets variables to missing.

Truncover: It assigns the raw data value to the variable even if the value is shorter than the length that is expected by the INPUT statement.

The following is an example of an external file that contains data:

1
22
333
4444





This DATA step uses the numeric informat 4. to read a single field in each record of raw data and to assign values to the variable ID.

data readin;
   infile 'external-file' missover;
   input ID 4.;
run;

proc print data=readin;
run;


The output is shown below :

Obs    ID

 1          .

 2          .

 3          .

 4      4444


data readin;
   infile 'external-file' truncover;
   input ID 4.;
run;

proc print data=readin;
run;


The output is shown below :

Obs    ID

 1      1

 2      22

 3      333

 4      4444

4. Purpose of double trailing @@ in Input Statement ?

The double trailing at-sign (@@) tells SAS to hold the current input record for the execution of the next INPUT statement, even across iterations of the DATA step, rather than advancing to a new record.

DATA Readin;
   Input Name $ Score @@;
   cards;
Sam 25 David 30 Ram 35
Deeps 20 Daniel 47 Pars 84
;
run;

This DATA step creates six observations, one for each Name and Score pair.

5. How to include or exclude specific variables in a data set?

– DROP, KEEP Statements and Data set Options
DROP, KEEP Statement

The DROP statement specifies the names of the variables that you want to remove from the data set.

data readin1;
   set readin;
   drop score;
run;

The KEEP statement specifies the names of the variables that you want to retain from the data set.

data readin1;
   set readin;
   keep var1;
run;

DROP, KEEP Data set Options

The main difference between the DROP/KEEP statements and the DROP=/KEEP= data set options is that you cannot use the DROP/KEEP statements in procedures.

data readin1 (drop=score);
   set readin;
run;

data readin1 (keep=var1);
   set readin;
run;


6. How to print observations 5 through 10 from a data set?

The FIRSTOBS= and OBS= data set options tell SAS to print observations 5 through 10 from the data set READIN.

proc print data = readin (firstobs=5 obs=10);
run;


7. What are the default statistics that PROC MEANS produces?

PROC MEANS produces the “default” statistics of N, MIN, MAX, MEAN and STD DEV.

8. Name and describe functions that you have used for data cleaning?
SAS Character Functions

Tutorial : Character Functions

9. Difference between FUNCTION and PROC

Example : MEAN function and PROC MEANS

The MEAN function is an average of the value of several variables in one observation.

The average that is calculated using PROC MEANS is the sum of all of the values of a variable divided by the number of observations in the variable.

In other words, the MEAN function averages across a row, whereas PROC MEANS averages down a column.

MEAN Function

AVG=MEAN(of Q1-Q3);

10. Differences between WHERE and IF statement?

  1. WHERE statement can be used in procedures to subset data while IF statement cannot be used in procedures.
  2. WHERE can be used as a data set option while IF cannot be used as a data set option.
  3. WHERE statement is more efficient than IF statement. It tells SAS not to read all observations from the data set.
  4. WHERE statement can be used to search for all similar character values that sound alike while IF statement cannot be used.
  5. WHERE statement cannot be used when reading data using INPUT statement whereas IF statement can be used.
  6. Multiple IF statements can be used to execute multiple conditional statements.
  7. When it is required to use newly created variables, use IF statement as it doesn’t require variables to exist in the READIN data set.
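A few of the points in this list can be sketched with the SASHELP.CLASS sample data set. The data set name tall and the variable height_cm are hypothetical.

```sas
/* WHERE statement subsetting inside a procedure. */
proc print data=sashelp.class;
   where age > 13;
run;

/* WHERE= used as a data set option. */
proc means data=sashelp.class(where=(sex = 'F'));
   var height;
run;

/* A subsetting IF can test a variable created in the same DATA step;
   a WHERE statement could not, because height_cm is not in the input. */
data tall;
   set sashelp.class;
   height_cm = height * 2.54;
   if height_cm > 160;
run;
```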

11. What is Program Data Vector (PDV)?

PDV is a logical area in the memory.

How is the PDV created?
SAS creates a data set one observation at a time. An input buffer is created at compile time to hold a record from an external file. The PDV is created after the input buffer. SAS builds the data set in the PDV area of memory.

12. What is DATA _NULL_?

DATA _NULL_ is mainly used to create macro variables. It can also be used to write output without creating a data set. The idea of “null” here is that we have a DATA step that doesn’t actually create a data set.
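Both uses can be sketched in one short step, here with the SASHELP.CLASS sample data set. The macro variable name nrows is hypothetical.

```sas
/* Count the rows of SASHELP.CLASS without creating a data set. */
data _null_;
   set sashelp.class end=last;
   if last then do;
      call symputx('nrows', _n_);   /* on the last row, _n_ equals the row count */
      put 'Rows read: ' _n_;        /* written to the log */
   end;
run;

%put Number of rows = &nrows;
```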

13. What is the difference between ‘+’ operator and SUM function?

SUM function returns the sum of non-missing arguments whereas “+” operator returns a missing value if any of the arguments are missing.

Suppose we have a data set containing three variables – X, Y and Z. They all have missing values. We wish to compute sum of all the variables.

data mydata2;
   set mydata;
   p = x + y + z;      /* missing whenever any of x, y, z is missing */
   q = sum(x, y, z);   /* q is an assumed name; sums the non-missing arguments */
run;

In the output, the value of p is missing for the 4th, 5th and 6th observations.

14. How to identify and remove unique and duplicate values?

1. Use PROC SORT with NODUPKEY and NODUP Options.
2. Use First. and Last. Variables – Detailed Explanation

The detailed explanation is shown below :



Create this data set in SAS

data readin;
   input ID Name $ Score;
   cards;
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;


There are several ways to identify and remove unique and duplicate values:

In PROC SORT, there are two options by which we can remove duplicates.

1. NODUPKEY Option 2. NODUP Option

The NODUPKEY option removes duplicate observations where value of a variable listed in BY statement is repeated while NODUP option removes duplicate observations where values in all the variables are repeated (identical observations).
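The two options can be compared with a sketch like the following, which rebuilds the same twelve records and sorts them by ID once with each option. The OUT= data set names readin_nodupkey and readin_nodup are hypothetical.

```sas
/* The same twelve records, read several per line with @@. */
data readin;
   input ID Name $ Score @@;
   datalines;
1 David 45  1 David 74  2 Sam 45  2 Ram 54
3 Bane 87  3 Mary 92  3 Bane 87  4 Dane 23
5 Jenny 87  5 Ken 87  6 Simran 63  8 Priya 72
;
run;

/* Keep one observation per value of ID. */
proc sort data=readin out=readin_nodupkey nodupkey;
   by id;
run;

/* Remove adjacent observations that are identical in every variable. */
proc sort data=readin out=readin_nodup nodup;
   by id;
run;
```

The SAS log reports how many observations each PROC SORT step deleted.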








The NODUPKEY has deleted 5 observations with duplicate values whereas NODUP has not deleted any observations.

Why no value has been deleted when NODUP option is used?
Although ID 3 has two identical records (See observation 5 and 7), NODUP option has not removed them. It is because they are not next to one another in the dataset and SAS only looks at one record back.

To fix this issue, sort on all the variables in the dataset READIN.
To sort by all the variables without having to list them all in the program, you can use the keyword _ALL_ in the BY statement (see below).

proc sort data=readin nodup;
   by _all_;
run;

15. Difference between NODUP and NODUPKEY Options?

The NODUPKEY option removes duplicate observations where value of a variable listed in BY statement is repeated while NODUP option removes duplicate observations where values in all the variables are repeated (identical observations).

See the detailed explanation for this question above (Q14).

16. What are _numeric_ and _character_ and what do they do?

1. _NUMERIC_ specifies all numeric variables that are already defined in the current DATA step.
2. _CHARACTER_ specifies all character variables that are currently defined in the current DATA step.
3. _ALL_ specifies all variables that are currently defined in the current DATA step.

Example : To include all the numeric variables in PROC MEANS

proc means;
   var _numeric_;
run;

17. How to sort in descending order?

Use DESCENDING keyword in PROC SORT code. The example below shows the use of the descending keyword.
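A minimal sketch, using the SASHELP.CLASS sample data set; the output data set name class_desc is hypothetical. The DESCENDING keyword goes before each BY variable it applies to.

```sas
/* Sort by age from highest to lowest; names stay in ascending order
   within each age because DESCENDING applies only to age. */
proc sort data=sashelp.class out=class_desc;
   by descending age name;
run;

proc print data=class_desc;
run;
```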


18. Under what circumstances would you code a SELECT construct instead of IF statements?

When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using IF-THEN or IF-THEN-ELSE statements because CPU time is reduced.

The syntax for SELECT WHEN is as follows :
SELECT (expression);
WHEN (1) x=x;
WHEN (2) x=x*2;
OTHERWISE;
END;

Example :
SELECT (str);
WHEN ('Sun') wage=wage*1.5;
WHEN ('Sat') wage=wage*1.3;
OTHERWISE;
END;

19. How to convert a numeric variable to a character variable?

You must create a differently-named variable using the PUT function.

The example below shows the use of the PUT function.
charvar=put(numvar, 7.) ;

20. How to convert a character variable to a numeric variable?

You must create a differently-named variable using the INPUT function.

The example below shows the use of the INPUT function.
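A minimal sketch of the conversion; the variable names charvar and numvar are hypothetical, mirroring the PUT example.

```sas
data _null_;
   charvar = '12345';
   numvar = input(charvar, 8.);   /* read the text with the numeric informat 8. */
   put numvar=;
run;
```

The second argument of INPUT is an informat, whereas the second argument of PUT is a format.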

21. What’s the difference between VAR A1 - A3 and VAR A1 -- A3?

Single dash : It is used to specify consecutively numbered variables. A1-A3 implies A1, A2 and A3.
Double dash : It is used to specify variables based on the order of the variables as they appear in the file, regardless of the name of the variable. A1--A3 implies all the variables from A1 to A3 in the order they appear in the data set.
Example : The order of variables in a data set : ID Name A1 A2 C1 A3
So using A1-A3 would return A1 A2 A3. A1--A3 would return A1 A2 C1 A3.

22. Difference between PROC MEANS and PROC SUMMARY?

1. Proc MEANS by default produces printed output in the OUTPUT window whereas Proc SUMMARY does not. Inclusion of the PRINT option on the Proc SUMMARY statement will output results to the output window.
2. Omitting the VAR statement in PROC MEANS analyses all the numeric variables, whereas omitting the VAR statement in PROC SUMMARY produces a simple count of observations.

How to produce output in the OUTPUT window using PROC SUMMARY?
Use PRINT option.

proc summary data=retail print;
   class services;
   var investment;
run;


23. Can PROC MEANS analyze ONLY the character variables?

No, Proc Means requires at least one numeric variable.

24. How SUBSTR function works?

The SUBSTR function is used to extract a substring from a character variable.
The SUBSTR function has three arguments:
SUBSTR (character variable, starting point to begin reading the variable, number of characters to read from the starting point)
There are two basic applications of the SUBSTR function:

1. On the right side of an assignment, it extracts part of a variable.

data _null_ ;
   phone='(312) 555-1212' ;
   area_cd=substr(phone, 2, 3) ;
   put area_cd= ;
run ;

Result : In the log window, it writes area_cd=312 .

2. On the left side of an assignment, it changes just a few characters of a variable.

data _null_ ;
   phone='(312) 555-1212' ;
   substr(phone, 2, 3)='773' ;
   put phone= ;
run ;

Result : The variable PHONE has been changed from (312) 555-1212 to (773) 555-1212.

25. Difference between CEIL and FLOOR functions?

The ceil function returns the smallest integer greater than/equal to the argument whereas the floor returns the greatest integer less than/equal to the argument.
For example : ceil(4.4) returns 5 whereas floor(4.4) returns 4.

26. Difference between SET and MERGE?

SET concatenates the data sets whereas MERGE matches the observations of the data sets.

27. How to do Matched Merge and output only consisting of observations from both files?

Use IN=variable in MERGE statements. It is used for matched merge to track and select which observations in the data set from the merge statement will go to a new data set.

data readin;
   merge file1(in=infile1) file2(in=infile2);
   by id;
   if infile1=infile2;
run;

28. How to do Matched Merge and output consisting of observations in file1 but not in file2, or in file2 but not in file1?

data readin;
merge file1(in=infile1) file2(in=infile2);
by id;
if infile1 ne infile2;
run;



29. How to do Matched Merge and output consisting of observations from only file1?

data readin;
merge file1(in=infile1) file2(in=infile2);
by id;
if infile1;
run;

30. How do I create a data set with observations=100, mean 0 and standard deviation 1?

data readin;
do i=1 to 100;
     temp=0 + rannor(1) * 1;
     output;
end;
run;

proc means data=readin mean stddev;
var temp;
run;


31. How to label values and use it in PROC FREQ?

Use PROC FORMAT to set up a format.

proc format;
value score 0 - 100='100-'
            101 - 200='101+';
run;

proc freq data=readin;
tables outdata;
format outdata score.;
run;


Tutorial : PROC FREQ Detailed Explanation

32. How to use arrays to recode set of variables?

Recode the set of questions: Q1,Q2,Q3…Q20 in the same way: if the variable has a value of 6 recode it to SAS missing.

data readin;
set outdata;
array Q(20) Q1-Q20;
do i=1 to 20;
if Q(i)=6 then Q(i)=.;
end;
run;



SAS Arrays and Do Loops Made Easy

33. How to use arrays to recode all the numeric variables?

Use the _NUMERIC_ variable list and the DIM function in the array.

data readin;
set outdata;
array Q(*) _numeric_;
do i=1 to dim(Q);
if Q(i)=6 then Q(i)=.;
end;
run;



Note : DIM returns a total count of the number of elements in array dimension Q.

34. How to calculate mean for a variable by group?

Suppose Q1 is a numeric variable and Age a grouping variable. You wish to compute mean for Q1 by Age.
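
One common way to do this is PROC MEANS with a CLASS statement. A minimal sketch, assuming the data sits in a dataset named readin (a hypothetical name, reused from the earlier examples):

```sas
/* Mean of Q1 within each level of Age; readin is an assumed dataset name */
proc means data=readin mean;
   class age;
   var q1;
run;
```

CLASS does not require sorted input. A BY statement would also work, but the data would first have to be sorted by Age.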





35. How to generate cross tabulation?

Use PROC FREQ code.
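
A minimal sketch of such a step, assuming two variables A and B in a dataset named readin (the dataset name is a hypothetical placeholder):

```sas
/* Two-way frequency table: rows are values of A, columns are values of B */
proc freq data=readin;
   tables a*b;
run;
```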




SAS will produce table of A by B.

36. How to generate detailed summary statistics?




 VAR Q1;


Note : Q1 is a numeric variable and Age a grouping variable.
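
One procedure commonly used for detailed summary statistics (quantiles, extreme values, moments) is PROC UNIVARIATE. A minimal sketch built around the VAR Q1 statement above, assuming a dataset named readin (hypothetical) and Age as the grouping variable:

```sas
/* Detailed statistics for Q1, reported separately for each Age group */
proc univariate data=readin;
   class age;
   var q1;
run;
```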

37. How to count missing values for numeric variables?

Use PROC MEANS with the NMISS option.
Types of Missing Values in SAS
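
A minimal sketch of the NMISS approach, assuming a dataset named readin (a hypothetical name):

```sas
/* N = count of non-missing values, NMISS = count of missing values */
proc means data=readin n nmiss;
   var _numeric_;
run;
```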

38. How to count missing values for all variables?

proc format;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt .='Missing' other='Not Missing';
run;

proc freq data=one;
format _CHAR_ $missfmt.;
tables _CHAR_ / missing missprint nocum nopercent;
format _NUMERIC_ missfmt.;
tables _NUMERIC_ / missing missprint nocum nopercent;
run;


39. Describe the ways in which you can create macro variables

There are 5 ways to create macro variables:

  1. %LET statement
  2. Iterative %DO statement
  3. CALL SYMPUT
  4. PROC SQL INTO clause
  5. Macro parameters

Detailed Tutorial : SAS Macros Made Easy

40. Use of CALL SYMPUT

CALL SYMPUT puts the value from a dataset into a macro variable.

proc means data=test;
var x;
output out=testmean mean=xbar;
run;

data _null_;
set testmean;
call symput("xbarmac",xbar);
run;

%put mean of x is &xbarmac;

41. What are SYMGET and SYMPUT?

SYMPUT puts the value from a dataset into a macro variable, whereas SYMGET gets the value from a macro variable into the dataset.

Tutorial – Difference between SYMGET and SYMPUT
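
A minimal sketch of SYMGET inside a DATA step. The macro variable cutoff and the variables limit and over are illustrative assumptions:

```sas
%let cutoff = 13;

data flagged;
   set sashelp.class;
   /* SYMGET returns the macro value as character text, so convert it */
   limit = input(symget('cutoff'), best12.);
   over  = (age > limit);   /* 1 if Age exceeds the cutoff, else 0 */
run;
```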

42. Which date function advances a date, time or datetime value by a given interval?

INTNX function advances a date, time, or datetime value by a given interval, and returns a date, time, or datetime value. Ex: INTNX(interval,start-from,number-of-increments,alignment).
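
A minimal sketch of how the alignment argument changes the result (the variable names are illustrative assumptions):

```sas
data _null_;
   start      = '15JAN2024'd;
   next_start = intnx('month', start, 1);          /* default alignment: beginning of the month */
   next_same  = intnx('month', start, 1, 'same');  /* same day of the next month */
   put next_start= date9. next_same= date9.;       /* 01FEB2024 and 15FEB2024 */
run;
```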

Tutorial : INTNX Function with Examples

43. How to count the number of intervals between two given SAS dates?

INTCK(interval,start-of-period,end-of-period) is an interval function that counts the number of interval boundaries between two given SAS dates, times or datetimes.

Tutorial : INTCK Function Explained
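
By default INTCK counts interval boundaries crossed rather than full elapsed intervals, which the following sketch illustrates (variable names are illustrative assumptions):

```sas
data _null_;
   d1 = '31JAN2024'd;
   d2 = '01FEB2024'd;
   months = intck('month', d1, d2);  /* 1: one month boundary is crossed */
   days   = intck('day', d1, d2);    /* 1: the dates are one day apart   */
   put months= days=;
run;
```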

44. Difference between SCAN and SUBSTR?

SCAN extracts words within a value that is marked by delimiters. SUBSTR extracts a portion of the value by stating the specific location. It is best used when we know the exact position of the sub string to extract from a character value.
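
A minimal sketch that pulls the same word out of a string both ways (the variable names and the sample value are illustrative assumptions):

```sas
data _null_;
   length by_word by_pos $8;
   text    = 'Tom Dick Harry';
   by_word = scan(text, 2, ' ');   /* second space-delimited word      */
   by_pos  = substr(text, 5, 4);   /* 4 characters starting at column 5 */
   put by_word= by_pos=;           /* both are Dick */
run;
```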

45. The following data step executes:

Data strings;




What will the value of the variable Text be?

* DONALD DUCK (leading blanks are displayed using an asterisk *)

46. For what purpose would you use the RETAIN statement?

A RETAIN statement tells SAS not to set variables to missing when going from the current iteration of the DATA step to the next. Instead, SAS retains the values.
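
A minimal sketch of a running total built with RETAIN. The output dataset name running and the variable total are illustrative assumptions:

```sas
data running;
   set sashelp.class;
   retain total 0;            /* keep TOTAL across iterations, starting at 0 */
   total = total + weight;    /* running sum of Weight */
run;
```

Without the RETAIN statement, total would be reset to missing at the start of every iteration and the sum would never accumulate.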

Tutorial : RETAIN Statement

47. When grouping is in effect, can the WHERE clause be used in PROC SQL to subset data?

Not for conditions on group summaries. WHERE filters individual rows before grouping, so it cannot reference summary statistics. To subset grouped results, use the HAVING clause, which is evaluated after GROUP BY and typically references a summary statistic.
PROC SQL Made Easy

48. How to use IF THEN ELSE in PROC SQL?
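
PROC SQL has no IF-THEN-ELSE statement; conditional logic is usually written with a CASE expression. A minimal sketch using sashelp.class, where the cut-offs and the column name age_group are illustrative assumptions:

```sas
proc sql;
   select name, age,
          case
             when age < 13 then 'Child'
             when age < 16 then 'Teen'
             else 'Older'
          end as age_group
   from sashelp.class;
quit;
```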










49. How to remove duplicates using PROC SQL?

Proc SQL noprint;
Create Table inter.Merged1 as
Select distinct * from inter.readin ;
quit;

50. How to count unique values by a grouping variable?

You can use PROC SQL with COUNT(DISTINCT variable_name) to determine the number of unique values for a column.
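
A minimal sketch using sashelp.class, counting the distinct ages within each sex (the column alias n_unique_age is an illustrative assumption):

```sas
proc sql;
   select sex, count(distinct age) as n_unique_age
   from sashelp.class
   group by sex;
quit;
```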

51. How to merge two data sets using PROC SQL?

PROC SQL Merging

52. Difference between %EVAL and %SYSEVALF

%EVAL performs integer arithmetic only, so it cannot evaluate operands that have floating point values. For those, use the %SYSEVALF function.

%let last=%eval(4.5+3.2);       /* fails: %EVAL cannot handle 4.5 and 3.2 */
%let last2=%sysevalf(4.5+3.2);
%put &last2;

53. How to debug SAS Macros

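The MPRINT, MLOGIC and SYMBOLGEN system options can be used to debug SAS macros. MPRINT writes the SAS code generated by the macro to the log, MLOGIC traces macro execution (including %IF conditions), and SYMBOLGEN shows how each macro variable resolves.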
Detailed Tutorial : SAS Macros Made Easy


54. Difference between &x&n , &&x&n , &&&x&n ?

%let x=temp;
%let n=3;
%let x3=result;
%let temp3=result2;

Solution : Multiple Ampersand Macro Variables

55. How to save log in an external file


proc printto log="C:\Users\Deepanshu\Downloads\LOG2.txt" new;
run;


56. How Data Step Merge and PROC SQL handle many-to-many relationship?

Data Step MERGE does not create a Cartesian product in case of a many-to-many relationship, whereas PROC SQL does.

SAS : Many-to-Many Merge

57. What is the use of ‘BY statement’ in Data Step Merge?

Without ‘BY’ statement, Data Step Merge performs merging without matching. In other words, the records are combined based on their relative position in the data set. The second data set gets placed to the “right” of the first data set (no matching based on the unique identifier – if data is not sorted based on unique identifier, wrong records can be merged).

When you use ‘BY’ statement, it matches observations according to the values of the BY variables that you specify.

58. Use of Multiple SET Statements

SAS : Use of Multiple SET Statements

59. How to combine tables vertically with PROC SQL

PROC SQL : Combine tables vertically

60. Two ways to reverse order of data

Reverse order of data

61. Which is faster - Data Step or PROC SQL?

PROC SQL tends to perform better with smaller datasets (less than approx. 100 MB), whereas the DATA step tends to perform better with larger ones (more than approx. 100 MB).
The DATA step handles each record sequentially, so it never uses a lot of memory, but processing one record at a time takes time. With a smaller dataset, the DATA step therefore tends to take longer.
PROC SQL loads everything into memory at once, so it can process small datasets quickly. With larger datasets, memory can get bogged down, which makes PROC SQL a little slower than the DATA step.

If you need to connect directly to a database and pull tables from there, then use PROC SQL.

Advanced SAS Interview Questions and Answers

1. Two ways to select every second row in a data set

data example;
set sashelp.class;
if mod(_n_,2) eq 0;
run;

The MOD function returns the remainder from the division of the first argument by the second argument. _N_ corresponds to each row. For the second row, mod(2,2) returns a remainder of zero, so the row is kept.

data example1;
do i = 2 to nobs by 2;
set sashelp.class point=i nobs=nobs;
output;
end;
stop;
run;

2. How to select every second row of a group

proc sort data = sashelp.class out = class;
by sex;
run;
data example2 (drop = N);
set class;
by sex;
if first.sex then N = 1;
else N + 1;
if N = 2 then output;
run;

Tutorial : First. and Last. Variables

3. How to calculate cumulative sum by group

Create Sample Data

data abcd;
input x y;
cards;
1 25
1 28
1 27
2 23
2 35
2 34
3 25
3 29
;
run;

Cumulative Sum by X

data example3;
set abcd;
by x;
if first.x then z1 = y;
else z1 + y;
run;

Tutorial : Uses of RETAIN Statement

4. Can both WHERE and IF statements be used for subsetting on a newly derived variable?

No. Only IF statement can be used for subsetting when it is based on a newly derived variable. WHERE statement would return an error “newly derived variable is not on file”.

Please note that the WHERE= data set option on the output data set can be used for subsetting on a newly created variable.

data example4 (where =(z <=50));
set abcd;
z = x*y;
run;

5. Select the Second Highest Score with PROC SQL

data example5;
input Name $ Score;
cards;
sam 75
dave 84
sachin 92
ram 91
;
run;

proc sql;
select *
from example5
where score in (select max(score) from example5 where score not in (select max(score) from example5));
quit;

Tutorial : Learn PROC SQL with 20 Examples

6. Two ways to create a macro variable that counts the number of observations in a dataset

data _NULL_;
if 0 then set sashelp.class nobs=n;
call symputx('totalrows',n);
stop;
run;
%put nobs=&totalrows.;

proc sql;
select count(*) into :nrows from sashelp.class;
quit;
%put nobs=%left(&nrows.);

7. Suppose you have data for employees. It comprises each employee's name, ID and manager ID. You need to find the manager name for each employee ID.

Create Sample Data

data example2;
input Name $ ID ManagerID;
cards;
Smith 123 456
Robert 456  .
William 222 456
Daniel 777 222
Cook 383 222
;
run;

SQL Self Join

proc sql;
create table want as
select a.*, b.Name as Manager
from example2 as a left join example2 as b
on a.managerid = b.id;
quit;

Data Step : Self Join 

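proc sort data=example2 out=x;
by ManagerID;
run;
/* keep only Name and ID so the lookup table does not overwrite the employee's own ID */
proc sort data=example2 (keep=Name ID) out=y (rename=(Name=Manager ID=ManagerID));
by ID;
run;
data want;
merge x (in= a) y (in=b);
by managerid;
if a;
run;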

8.  Create a macro variable and store TomDick&Harry

Issue : 
When the value is assigned to the macro variable, the ampersand placed after TomDick may cause SAS to interpret it as a macro trigger, and a warning message would be issued.

%let x = %NRSTR(TomDick&Harry);
%PUT &x.;

%NRSTR function is a macro quoting function which is used to hide the normal meaning of special tokens and other comparison and logical operators so that they appear as constant text as well as to mask the macro triggers ( %, &).

Tutorial : 
SAS Macro Programming

9. Difference between %STR and %NRSTR

Both %STR and %NRSTR functions are macro quoting functions which are used to hide the normal meaning of special tokens and other comparison and logical operators so that they appear as constant text. The only difference is %NRSTR can mask the macro triggers ( %, &) whereas %STR cannot.

10. How to pass unmatched single or double quotations text in a macro variable

%let eg  = %str(%'x);
%let eg2 = %str(x%");
%put &eg;
%put &eg2;

If the argument to %STR or %NRSTR contains an unmatched single or double quotation mark or an unmatched open or close parenthesis, precede each of these characters with a % sign.

11. How can we use COUNTW function in a macro

%let cntvar = %sysfunc(countw(&nvar));

Several useful Base SAS functions are not directly available in the macro language; %SYSFUNC makes them callable from macro code.


12. What would %put &&x&n; and %put &&&x&n; return?

%let x=temp;
%let n=3;
%let x3=result;
%let temp3 = result2;

  1. &&x&n : The two ampersands (&&) resolve to one ampersand (&), the scanner continues and &n resolves to 3, and then &x3 resolves to result.
  2. &&&x&n : The first two ampersands (&&) resolve to &, then &x resolves to temp and &n resolves to 3. Finally, &temp3 resolves to result2.

13. How to reference a macro variable in selection criteria

Use double quotes to reference a macro variable in a selection criteria. Single quotes would not work.
SAS : Reference Macro Variable
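
A minimal sketch of the point about quoting. The macro variable target and the dataset name subset are illustrative assumptions:

```sas
%let target = Alfred;

data subset;
   set sashelp.class;
   /* Double quotes let &target resolve; '&target' would be compared as literal text */
   where name = "&target";
run;
```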

14. How to debug %IF %THEN statements in a macro code

MLOGIC option will display how the macro variable resolved each time in the LOG file as TRUE or FALSE for %IF %THEN.

15. Difference between %EVAL and %SYSEVALF functions 

Both %EVAL and %SYSEVALF are used to perform mathematical and logical operations with macro variables. %let last = %eval (4.5+3.2); returns an error because %EVAL cannot perform arithmetic calculations with operands that have floating point values. In that case, use the %SYSEVALF function.

%let last2 = %sysevalf(4.5+3.2);
%put &last2;

16. What would be the value of i after the code below completes

data test;
set temp;
array nvars {3} x1-x3;
do i = 1 to 3;
if nvars{i} > 3 then nvars{i} =.;
end;
run;

Answer is 4. The first time the loop processes, the value of i is 1; the second time, 2; and the third time, 3. At the beginning of the fourth iteration, the value of i is 4, which is greater than the stop value of 3, so the loop stops. The value of i is therefore 4, not 3, when the loop ends.

17. How to compare two tables with PROC SQL

The EXCEPT operator returns rows from the first query that are not part of the second query.

proc sql;
select * from newfile
except
select * from oldfile;
quit;

18. Selecting Random Samples with PROC SQL

The RANUNI function and the OUTOBS= option can be used for selecting random samples. The RANUNI function generates random numbers, and OUTOBS= limits the number of rows written to the output.

proc sql outobs = 10;
create table tt as
select * from sashelp.class
order by ranuni(1234);
quit;

19. How to use NODUPKEY kind of operation with PROC SQL

In PROC SORT, NODUPKEY option is used to remove duplicates based on a variable. In SQL, we can do it like this :

proc sql noprint;
create table tt (drop = row_num) as
select *, monotonic() as row_num
from readin
group by name
having row_num = min(row_num)
order by ID;
quit;

20. How to make SAS stop macro processing on Error

Check out this link – Stop SAS Macro on Error

21. Count the number of variables assigned in a macro variable

%macro nvars (ivars);
%let n=%sysfunc(countw(&ivars));
%put &n;
%mend nvars;
%nvars (X1 X2 X3 X4);

22. Two ways to assign incremental value by group


Prepare Input Data

data xyz;
input x $;

Data Step Code

data example22;
set xyz;
by x;
if first.x then N+1;
run;
proc print;
run;


PROC SQL Code

proc sql;
select a.x, b.N from xyz a
inner join
(select x, monotonic() as N
from (
select distinct x
from xyz)) b
on a.x=b.x;
quit;

23. Prepare a Dynamic Macro with %DO loop

Check out this link – Dynamic SAS Macro

24. Write a SAS Macro to extract Variable Names from a Dataset

*Selecting all the variables;
proc sql noprint;
select name into : vars separated by " "
from dictionary.columns
where LIBNAME = upcase("work")
and MEMNAME = upcase("predata");
quit;

The DICTIONARY.COLUMNS contains information such as name, type, length, and format, about all columns in the table. LIBNAME : Library Name, MEMNAME : Dataset Name

%put variables = &vars.;

25. How would DATA STEP MERGE and PROC SQL JOIN work on datasets with a many-to-many relationship?

The DATA step does not handle many-to-many matching very well. When we perform a many-to-many merge, the result should be a Cartesian (cross) product. For example, if three records from one contributing data set match two records from the other, the resulting data set should have 3 × 2 = 6 records. The DATA step MERGE does not produce this, whereas PROC SQL creates a Cartesian product in case of a many-to-many relationship.

Detailed Explanation – Many to Many Merge

26. Two ways to create a blank table

Copy structure of existing table


Enforce FALSE condition in Selection Criteria

WHERE 1=0;
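
The two approaches named above can be sketched as follows. This is a minimal sketch, assuming sashelp.class as the existing table; the output names blank1 and blank2 are hypothetical:

```sas
proc sql;
   /* 1. Copy the structure of an existing table */
   create table blank1 like sashelp.class;

   /* 2. Enforce a FALSE condition so that no rows are selected */
   create table blank2 as
   select * from sashelp.class
   where 1=0;
quit;
```

Both tables end up with the columns of sashelp.class and zero rows.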

27. How to insert rows in a table with PROC SQL

Tutorial : Insert Rows in Table

28. Difference between %LOCAL and %GLOBAL

%LOCAL is used to create a local macro variable during macro execution. It gets removed when macro finishes its processing.

%GLOBAL is used to create a global macro variable and would remain accessible till the end of a session . It gets removed when session ends.
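
A minimal sketch of the scoping difference. The macro name scope_demo and the variable names inner and outer are illustrative assumptions:

```sas
%macro scope_demo;
   %local inner;
   %global outer;
   %let inner = visible only inside the macro;
   %let outer = visible for the whole session;
   %put Inside the macro: &inner and &outer;
%mend scope_demo;

%scope_demo

/* OUTER still resolves here; referencing INNER at this point would produce a warning */
%put Outside the macro: &outer;
```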

29. Write a macro with CALL EXECUTE

Detailed Explanation of CALL EXECUTE

30. Write a macro to split data into N number of datasets

Suppose you are asked to write a macro to split large data into 2 parts (not static 2). In the macro, user should have flexibility to change the number of datasets to be created.

%macro split(inputdata=, noofsplits=2);
data %do i = 1 %to &noofsplits.;
split&i. %end;;
retain x;
set &inputdata. nobs=nobs;
if _n_ eq 1 then do;
if mod(nobs,&noofsplits.) eq 0
then x=int(nobs/&noofsplits.);
else x=int(nobs/&noofsplits.)+1;
end;
if _n_ le x then output split1;
%do i = 2 %to &noofsplits.;
else if _n_ le (&i.*x)
then output split&i.;
%end;
run;
%mend split;
%split(inputdata=temp, noofsplits=2);

31. Store value in each row of a variable into macro variables

data _null_;
set sashelp.class ;
call symput(cats('x',_n_),Name);
run;
%put &x1. &x2. &x3.;

The CATS function concatenates 'x' with _N_ (the row index number) and removes leading and trailing spaces from the result.

32. How PROC TRANSPOSE works?

Tutorial : PROC TRANSPOSE Explained

33. How to check if SAS dataset is empty?

Tutorial : 
Check number of observations

End Note

The above list of SAS interview questions is designed especially for experienced SAS programmers and analysts. These are real-world examples with explanations. Most tough SAS interviews include SAS SQL and macro questions. Before going for an interview, brush up on your SAS programming concepts and review them while practicing the questions above.