SAS Clinical Interview Questions and Answers

1. What is SAS? What functions does it perform?

SAS stands for Statistical Analysis System, an integrated set of software products. Its main functions include:

• Information retrieval and data management
• Writing reports and graphics
• Statistical analysis, econometrics and data mining
• Business planning, forecasting and decision support
• Operations research and project management
• Quality Improvement
• Data Warehousing
• Application Development

2. What do you mean by CALL PRXFREE Routine?

The CALL PRXFREE routine belongs to the character string matching (Perl regular expression) routines. It frees the memory that was allocated for a Perl regular expression, for example one compiled with PRXPARSE.
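As a minimal sketch of where CALL PRXFREE fits, the DATA step below compiles a pattern with PRXPARSE, uses it with PRXMATCH, and then releases it. The pattern and the sample text are made up for illustration.

```sas
data _null_;
   /* Compile the pattern; PRXPARSE returns a numeric pattern ID */
   pattern_id = prxparse('/headache/i');

   /* Hypothetical sample text */
   term = 'Severe Headache';

   /* PRXMATCH returns the start position of the match, or 0 if none */
   position = prxmatch(pattern_id, term);
   put position=;          /* writes position=8 to the log */

   /* Release the memory held by the compiled pattern */
   call prxfree(pattern_id);

   /* After a successful CALL PRXFREE the ID is set to missing */
   put pattern_id=;
run;
```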

3. Which therapeutic areas have you worked in earlier?

A pharmaceutical company can work in many different therapeutic areas. A few of them are anti-viral (HIV), Alzheimer’s, respiratory, oncology, metabolic disorders (anti-diabetic), neurological, and cardiovascular.

4. What is the basic structure of SAS programming?

A SAS program is built from DATA steps and PROC steps. The basic windows of the SAS programming environment are:

• Program Editor
• Explorer Window
• Log Window

5. What are Macro libraries?

Macro libraries are libraries that store all the macros required for developing the tables, listings, and graphs (TLGs) of a clinical trial. They are necessary for controlling and managing the macros. With a %INCLUDE statement, the macro source stored in the library can be brought into a program; with the autocall facility (the MAUTOSOURCE and SASAUTOS= system options), the stored macros can be called automatically.
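A minimal sketch of loading a stored macro with %INCLUDE follows. To keep it self-contained, a temporary file stands in for a macro library member; the fileref `mlib` and the macro `%hello` are hypothetical names.

```sas
/* Temporary file standing in for a macro library member (assumption) */
filename mlib temp;

/* Write the macro source to the file; single quotes keep the
   macro statements from being resolved inside the DATA step */
data _null_;
   file mlib;
   put '%macro hello;';
   put '   %put NOTE: hello from the macro library;';
   put '%mend hello;';
run;

/* Bring the stored macro definition into the current session */
%include mlib;

/* Call the macro that was just compiled */
%hello
```

In a real project the macros would usually live in a shared directory, and the autocall facility (OPTIONS MAUTOSOURCE with SASAUTOS= pointing at that directory) removes the need for an explicit %INCLUDE per macro.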

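6. What is RUN-group processing?

RUN-group processing lets you submit a group of statements in a PROC step with a RUN statement without ending the procedure. The procedure stays active until a QUIT statement or the start of another step ends it.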
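As a small illustration, PROC DATASETS is one of the procedures that support RUN groups. The sketch below assumes a copy of SASHELP.CLASS named `work.class_copy` (a hypothetical name).

```sas
/* Working copy so the shipped sample data is not modified */
data work.class_copy;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_copy;
      label age = 'Age in years';
   run;                      /* first RUN group; the procedure stays active */

   contents data=class_copy;
   run;                      /* second RUN group */
quit;                        /* ends the procedure */
```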

7. Describe the phases of clinical trials?

Clinical trials have the following four phases:
Phase 1: A new drug or treatment is tested in a small group of people (20-80) to evaluate its safety.
Phase 2: The experimental drug or treatment is given to a larger group of people (100-300) to see whether it is effective for that treatment.
Phase 3: The experimental drug or treatment is given to a large group of people (1000-3000) to confirm its effectiveness, monitor side effects, and compare it to commonly used treatments.
Phase 4: Phase 4 covers post-marketing studies, which gather further information on the drug’s risks, benefits, etc.

8. How do you generate statistics using PROC SQL?

Statistics such as N, MEAN, MEDIAN, MAX, MIN, STD, and SUM can be generated with PROC SQL by using summary functions in the SELECT clause. Unlike PROC MEANS, PROC SQL does not produce any of these statistics by default; each one has to be requested explicitly.
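A minimal sketch using the SASHELP.CLASS sample data set follows. Each statistic is requested explicitly with a summary function. MEDIAN is left out here because its behavior as a summary function in PROC SQL may depend on the SAS release.

```sas
proc sql;
   select sex,
          count(height) as n,            /* number of non-missing values */
          mean(height)  as mean_height,
          min(height)   as min_height,
          max(height)   as max_height,
          std(height)   as std_height,
          sum(height)   as sum_height
   from sashelp.class
   group by sex;                         /* one row of statistics per sex */
quit;
```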

9. What is the PDV?

The Program Data Vector (PDV) is the area of memory where SAS builds a data set, one observation at a time. When a DATA step reads raw data, an input buffer holds each record, and the data values are then assigned to their respective variables in the PDV.

10. What is your involvement with CDISC standards? What is meant by CDISC, and where do you use it?

CDISC (Clinical Data Interchange Standards Consortium) is an organization that develops industry standards that pharmaceutical companies use to submit clinical data to the FDA.

Using CDISC standards has several advantages: reduced time for regulatory submissions, more efficient regulatory review of submissions, and savings in time and money on data transfers between businesses.

11. Describe the validation procedure. How would you perform validation for TLGs as well as for analysis data sets?

Validation checks the output of the SAS program written by the source programmer. In this process, a validator independently writes a program and generates the output. If this output matches the output generated by the source programmer, the program is considered valid. For TLGs, validation can be performed by checking the output manually; for analysis data sets, it can be done using PROC COMPARE.
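A minimal sketch of validating an analysis data set with PROC COMPARE follows. The data set names `work.prod_adsl` and `work.qc_adsl`, the variables, and the values are all hypothetical.

```sas
/* Hypothetical production data set from the source programmer */
data work.prod_adsl;
   input usubjid $ age;
   datalines;
S001 34
S002 51
;
run;

/* Hypothetical independent version from the validator */
data work.qc_adsl;
   input usubjid $ age;
   datalines;
S001 34
S002 52
;
run;

/* Match observations on USUBJID and report every difference */
proc compare base=work.prod_adsl compare=work.qc_adsl listall;
   id usubjid;
run;
```

Here the report flags the AGE difference for subject S002; an empty difference report is what a validator hopes to see.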

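12. What is the difference between the STRATA and BY statements in PROC LIFETEST?

We specify a BY statement with PROC LIFETEST to obtain completely separate analyses of the observations in the groups defined by the BY variables. The STRATA statement also produces survival estimates for each group, but in addition it tests the homogeneity of the survival functions across the strata (for example, with the log-rank and Wilcoxon tests).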
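The two statements can be contrasted with a small sketch. The data set `work.surv` and its values are invented for illustration; `censor=1` is assumed to mark a censored time.

```sas
/* Hypothetical survival data: treatment, time in days, censoring flag */
data work.surv;
   input trt $ days censor;
   datalines;
A 5 0
A 8 1
A 12 0
B 3 0
B 6 0
B 9 1
;
run;

/* STRATA: estimates per treatment plus tests across the strata */
proc lifetest data=work.surv;
   time days*censor(1);
   strata trt;
run;

/* BY: an independent analysis per treatment; input must be sorted */
proc sort data=work.surv;
   by trt;
run;

proc lifetest data=work.surv;
   time days*censor(1);
   by trt;
run;
```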

13. What do you mean by treatment-emergent adverse events and treatment-emergent serious adverse events?

Treatment-emergent adverse events and treatment-emergent serious adverse events are adverse events and serious adverse events that start after drug administration, or that were already present before drug administration and get worse after it.
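As a rough sketch, the onset-date part of this definition can be derived as a flag. The variable names (`aestdt`, `trtsdt`, `aeser`, `trtemfl`, `tesaefl`) and the data are assumptions, and the rule below does not cover events that were present before dosing and worsened afterwards; the exact rule comes from the study's analysis plan.

```sas
data work.ae;
   input usubjid $ aestdt :date9. trtsdt :date9. aeser $;
   format aestdt trtsdt date9.;

   /* Assumed rule: onset on or after the first dose date */
   if aestdt >= trtsdt then trtemfl = 'Y';
   else trtemfl = 'N';

   /* Treatment-emergent and serious */
   if trtemfl = 'Y' and aeser = 'Y' then tesaefl = 'Y';
   else tesaefl = 'N';

   datalines;
S001 05JAN2020 10JAN2020 N
S001 15JAN2020 10JAN2020 Y
S002 12JAN2020 10JAN2020 N
;
run;
```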

14. What is CRT?

CRT stands for Case Report Tabulation. Whenever a pharmaceutical company submits an NDA (New Drug Application), it has to send the CRTs to the FDA.


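15. What is the difference between the NODUPKEY and NODUP options?

Both are PROC SORT options. The NODUP option checks for observations that are identical in all variables and removes the duplicates. The NODUPKEY option checks only the BY variable values and, when several observations share the same BY values, keeps the first one and removes the rest.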
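The difference is easiest to see on a small example. The data set `work.visits` and its values are made up; NODUPRECS is the full name of the NODUP option.

```sas
/* Hypothetical data: rows 1 and 2 are identical in every variable */
data work.visits;
   input subjid visit $ value;
   datalines;
101 V1 5
101 V1 5
101 V2 7
102 V1 6
;
run;

/* Removes the fully identical record: 3 observations remain */
proc sort data=work.visits out=work.no_dup_recs noduprecs;
   by subjid;
run;

/* Keeps the first record per SUBJID: 2 observations remain */
proc sort data=work.visits out=work.no_dup_keys nodupkey;
   by subjid;
run;
```

NODUPRECS compares each observation only with the previous one written, so identical records that are not adjacent after sorting can survive; sorting by all variables avoids this.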

16. Can you use PROC COMPARE to validate listings? Why?

Yes. If a listing has many entries (pages), it is not practical to check it manually. In this situation we use PROC COMPARE to compare the data set behind the listing with the data set produced independently by the validator.

17. How did you create analysis data sets?

Analysis datasets are the datasets used for the statistical analysis of the data. They contain the raw data and the variables derived from the raw data. The derived variables are used to produce the TLGs of the clinical study. The safety and efficacy endpoints (parameters) dictate the type of datasets the clinical study requires for generating the statistical reports of the TLGs. Analysis datasets sometimes also contain variables that are not needed for the statistical reports but may be required for ad-hoc reports.

18. What is MedDRA?

The Medical Dictionary for Regulatory Activities (MedDRA) has been developed as a pragmatic, clinically validated medical terminology with an emphasis on ease-of-use data entry, retrieval, analysis, and display, with a suitable balance between sensitivity and specificity, within the regulatory environment. MedDRA is applicable to all phases of drug development and the health effects of devices. By providing one source of medical terminology, MedDRA improves the effectiveness and transparency of medical product regulation worldwide.

MedDRA is used to report adverse event data from clinical trials, as well as post-marketing and pharmacovigilance.

19. What are PROC PRINT and PROC CONTENTS used for?

PROC PRINT outputs a listing of the values of some or all of the variables in a SAS data set. PROC CONTENTS describes the structure of the data set (variable names, types, lengths, and so on) rather than the data values.
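A short example with the SASHELP.CLASS sample data set shows the two side by side:

```sas
/* List the values of selected variables for the first 5 observations */
proc print data=sashelp.class (obs=5);
   var name sex age;
run;

/* Describe the data set: variables, types, lengths, observation count */
proc contents data=sashelp.class;
run;
```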

20. What is the Program Data Vector (PDV) and what are its functions?

• The PDV is a logical area in memory.
• SAS creates a data set one observation at a time.
• An input buffer is created at compilation time to hold a record from an external file.
• The PDV is created after the input buffer.
• SAS builds the data set in the PDV area of memory.
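One way to watch the PDV at work is to write its contents to the log with PUT _ALL_. The sketch below uses made-up inline data; the log shows the variables reset to missing at the start of each iteration and filled after INPUT, together with the automatic variables _N_ and _ERROR_.

```sas
data work.pdv_demo;
   /* PDV at the start of the iteration: ID and SCORE are missing */
   put 'Before INPUT: ' _all_;

   /* INPUT copies values from the input buffer into the PDV */
   input id score;

   /* PDV now holds the values of the current record */
   put 'After INPUT:  ' _all_;

   datalines;
1 90
2 85
;
run;
```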

21. Which PROCs have you used in your experience?

I have used many procedures, such as PROC REPORT, PROC SORT, and PROC FORMAT. I have used PROC REPORT to generate a list report; in this procedure I used subjid as the order variable and trt_grp, sbd, and dbd as display variables.
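A list report of that kind could look like the sketch below. The variable names come from the answer above; the data set `work.bp`, its values, and the column labels are hypothetical.

```sas
/* Hypothetical input data */
data work.bp;
   input subjid $ trt_grp $ sbd dbd;
   datalines;
101 A 120 80
101 A 118 79
102 B 130 85
;
run;

proc report data=work.bp nowd;
   column subjid trt_grp sbd dbd;
   define subjid  / order   'Subject';          /* sorts rows, suppresses repeats */
   define trt_grp / display 'Treatment group';
   define sbd     / display 'SBD';
   define dbd     / display 'DBD';
run;
```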

How would you submit the documents to the FDA? Who will submit them?

We can submit the documents to the FDA by e-submission, using the define.pdf or define.xml formats. This document contains the documentation about the macros and programs, as well as the e-records. The statistician or project manager submits it to the FDA.

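22. What is PROC GPLOT used for?

PROC GPLOT produces plots of one variable against another. The DATA= option in the PROC GPLOT statement identifies the data set that contains the plot variables. Compared with PROC PLOT, it has more options and can therefore create more colorful and fancier graphics.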
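A minimal scatter plot with the SASHELP.CLASS sample data set, assuming SAS/GRAPH is licensed, might look like this:

```sas
/* Plot symbol: blue dots */
symbol1 value=dot color=blue;

proc gplot data=sashelp.class;
   plot height*weight;      /* vertical variable * horizontal variable */
run;
quit;
```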

23. How do I create a SAS data set with compressed observations?

To create a compressed SAS data set, use the COMPRESS=YES option as an output data set option or in an OPTIONS statement. Compressing a data set reduces its size by reducing repeated consecutive characters or numbers to 2-byte or 3-byte representations. To uncompress observations, you must use a DATA step to copy the data set and use the option COMPRESS=NO for the new data set.

The advantages of using a compressed SAS data set are reduced storage requirements for the data set and fewer input/output operations necessary to read from and write to the data set during processing. The disadvantages include not being able to use the SAS observation number to access an observation. The CPU time required to prepare compressed observations for input/output operations is also increased because of the overhead of compressing and expanding the observations.
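Both directions can be sketched with the SASHELP.CLASS sample data set; the output data set names are hypothetical.

```sas
/* Compress a single data set with the data set option */
data work.class_compressed (compress=yes);
   set sashelp.class;
run;

/* Alternatively, compress every data set created from here on */
options compress=yes;

/* Uncompress by copying into a new data set with COMPRESS=NO */
data work.class_uncompressed (compress=no);
   set work.class_compressed;
run;

/* Restore the system option */
options compress=no;
```

The SAS log reports how much the compression changed the data set size; for a small data set like this one, compression can even increase it.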