I have spent a significant part of my career as a data visualization guy. I am very particular about the formatting and presentation of reports. So, when I started using SAS, I faced a few challenges in changing formats of numbers and characters, especially when dates were involved. Not so surprisingly, both Kunal and I receive a fair number of queries on this topic.
In SAS, there are various options to enhance the reporting layouts. In this article, we will particularly discuss about methods to play with format of data values. It should be noted that these changes are only applied while displaying the results. Changing format of output does not change the way, the data gets stored at the back end.
By default, SAS provides various built in formats to deal with various formats, but they are not sufficient to meet custom requirements your data might have. For example, we may have coded Male and Female as M and F (or 0 and 1), but while printing we would want to display the field as MALE and FEMALE only. Another common example is to display area codes in 10 digit telephone numbers (e.g. 123-3456-789).
P.S. This is a long article compared to what I usually write, so feel free to digest this in bits and pieces.
The article is divided in 2 broad parts: First we understand the pre-defined formats & user defined formats in SAS. Next, we look at various applications and examples of these concepts.
Built In SAS Formats
Below is a sample data containing agent performance details. We will use this dataset in examples and discussion in this article. Here DOJ is in numeric format, as SAS defines by default, starting from 01-Jan-1960.
Let’s first look at the syntac of the FORMAT statement:-
Syntax :- FORMAT Variable(s) Format;
Let’s do some exercises with format statement:
Example 1: Display DOJ in DDMMYYYY format:
Code:
Proc print Data=Sales;
Format DOJ Date9. ;
Run;
Output:
Here is the list of predefined date formats available to change the output format of variables:
Example 2: Display the Salesamount in Dollars with one decimal position
Code:
Proc print Data=Sales;
Format Salesamount Dollar9.1 ;
Run;
Output:
List of predefined date formats available to change the output format of variables:
Example 3: Display Salesamount in Dollar with two decimal position with comma and DOJ in Weekdate format.
Code:
Proc Print Data=Sales;
Format Salesamount Dollar12.2 DOJ WEEKDATE.;
Run;
Output:
Need for user defined Formats:
Till now we have seen, how to change the format of numbers and Dates with in built SAS formats. But there can be many occasions when SAS built-in formats do not suffice our needs. Like in current dataset, we want to:
- Display “MALE” and “FEMALE” instead of “M” and “F”
- Re-define categories A, B, C, D and E as Ultra, Super, Average, Low and Poor
- Display the frequency of Salesamount in three categories “< 8000”,”8000-12000” and “>=12000”
These all things can be done with Data step (or PROC SQL) with If-Else statement (or switch case), but it will require a new variable that will store these formatted values. Whereas, if we only want to only change the display (and not the values in the data set), then creating user defined format using PROC FORMAT is a more efficient way to make these changes.
User Defined Formats Using Proc FORMAT
PROC FORMAT is a procedure that creates a mapping of data values into data labels. The user defined FORMAT mapping is independent of a SAS DATASET and variables and must be explicitly assigned in a subsequent DATASTEP and/or PROC.
Syntax of PROC FORMAT: –
Rules for defining FORMAT NAME:
- For character values, first character must be a dollar sign ($), and a letter or underscore as the second character. For numeric values, name must have a letter or underscore as the first character
- Name cannot end with a number
- Cannot be the name of an existing SAS format
- Should not end with a period in the VALUE statement
Let’s solve the above discussed problems using PROC Format:
Problem 1 (& 2): Display M and F as Male and Female.
Code:
Proc Format;
Value $Genderfmt ‘M’=’Male’
‘F’ =’Female’;
Run;
Proc Print Data = Sales ;
Format Gender $Genderfmt. ;
Run;
Output:
In a similar way, we can solve it for the problem 2. Define the category A, B, C, D and E as Ultra, Super, Average, Low and Poor.
Problem 3: Want to display the frequency of Salesamount in three categories “< 8000”,”8000-12000” and “>=12000”
Code:
Proc format;
Value Salegrp low-8000 = ‘<8000’
8000 – < 12000 = ‘8000-12000’
12000-High = ‘>=12000’;
Run;
Proc Freq Data=Sales;
Format Salesamount Salegrp.;
Table Salesamount;
Run;
Output:
Above, I have used ranges to define the format. They can be used for both – character and numeric values.
Special keywords used to define the ranges are:-
a) Ranges can be multiple values separated by commas.
i) ‘A’, ’B’, ’C’
ii) 22, 44, 67
b) Ranges can include or exclude the bounding values, depending on the use of various keywords like:
i) 24 – 45: It includes values from 24 to 45 including 24 and 45.
ii) 24 < – 45: It includes values between 24 to 45 including 45 and excluding 24.
iii) 24 – < 45: It includes values between 24 to 45 including 24 and excluding 45.
iv) 24 < – < 45: It includes values between 24 to 45 excluding 24 and 45.
c) LOW refers to least available number, HIGH refers to the highest available number and OTHER includes all numbers not specified. Others also include missing values if it is not specified.
Few Important points / Applications to note about PROC FORMAT
-
PROC Format returns the original value if we have not included all data values while defining the format.
-
We can create multiple formats in a single PROC FORMAT statement, by specifying multiple values under PROC Format
Code:
Proc Format;
Value Salegrp low-8000 = ‘<8000’
8000 – < 12000 = ‘8000-12000’
12000-High = ‘>=12000’;
Value $Genderfmt ‘M’=’Male’
‘F’ =’Female’;
Run;
Proc Print Data=Sales;
Format Salesamount Salegrp. Gender $Genderfmt.;
Run;
Output:
-
PROC FORMAT can be used for creating efficient Data merges
In PROC FORMAT, we have an option CNTLIN, that allows us to create a format from a data set rather than a VALUE statement. Before using this option we first look at the guidelines below:-
a) Input dataset must contain three variables required by PROC FORMAT – START, LABEL and FMTNAME.
b) Here START is the key field between these two tables and it must be unique in the input dataset (Dataset, we are using in CNTLIN option).
c) LABEL variable is the value we want to map it to another dataset.
d) The FMTNAME variable is the name of the format and it must be passed in single quotes.
e) After defining format, we can use put function to create variable in dataset based on key field and the format we have defined.
Example:
Let us say we want to map agent DOB from dataset Agent_DOB (Sample Data) to above dataset
Here we want to create a format based on Agent_DOB and apply it to Sales. In Agent_DOB, we would consider AGT_ID (key field, between both datasets) as START, DOB as Label and FMTNAME =’ $AGENT’. Now look at the statements (below).
Data Agent_DOB_Fmt;
Rename AGT_ID=START;
Set Agent_DOB;
Label=DOB;
FMTNAME=’$AGENT’;
Run;
Proc Format CNTLIN=Agent_DOB_Fmt;
Run;
Above code has generated format using dataset Agent_DOB_Fmt and after that to merge with SALES dataset have written data steps (below).
Data Sales_DOB;
Set Sales;
Birth_Date=Put(Agt_ID, $Agent.);
Run;
Limitation of PROC FORMAT as Merging
This is the best method when we want to merge one variable from another data set, but if we want to add five or multiple variables, then we have to repeat the PROC FORMAT statement that many times along with the multiple PUT function in data step. So, I prefer to use MERGE or PROC SQL in such cases.
-
We can save user defined SAS format for future use.
All SAS formats are stored in a catalog (collection of formats). When we create a Format, it gets stored in the catalog. If we don’t specify the catalog, then SAS stores formats in the WORK library in a catalog called FORMATS. Like other datasets of WORK library, they also get deleted at the end of the session.
Now to save User defined formats, we need to specify where to store the catalog and what to call it. This can be achieved by storing formats in a library other than WORK.
Step -1 First of all we have to define a library (Here I am using the SAS University edition)
Syntax: – LIBNAME Library_Name “Path”
LIBNAME STATDATA “/folders/myfolders/ECSTAT0”;
Step-2 Use library option in PROC format and provide a library name with the format file name. The file name must be a valid SAS dataset name.
Syntax: – PROC FORMAT LIBRARY=Library_Name.MYFILENAME;
Proc Format library=STATDATA.Gender_Fmt;
Value $Genderfmt ‘M’=’Male’
‘F’ =’Female’;
Run;
Above program has created a file called Gender_Fmt.sas7bcat in your directory location.
- How to Use stored Format
Now, whenever we want to use stored format, we have to tell SAS to look for formats in that catalog file. This is done with the fmtsearch option. So before using it, we need to write a statement.
Syntax: – Options fmtsearch = (Library_Name.MYFILENAME)
Options fmtsearch = (STATDATA.Gender_Fmt); /*After this we can use Gender_Fmt*/
Proc Print data=Sales;
Format Gender $Genderfmt.;
Run;
-
SAS PICTURE FORMAT
SAS picture format creates templates in which we define how the numbers are displayed. With use of PICTURE FORMAT, we can get over multiple display issues of numbers like:-
1. Decimal and Comma Placement
2. Embedding Characters with Numbers
3. Prefixes
Example 1: – I want to display the sales amount ending with a % sign and preceded with $ sign.
Proc Format;
Picture New_fmt low-High=’000000%’ (Prefix=’$’);
Run;
Proc print data=Sales;
Format Salesamount New_fmt.;
Run;
Output:
Example 2: – Let’s say, I have a 10 digit telephone number and I want to display it as 123-3456-789.
Data Telephone;
Input Contact;
datalines;
1111111111
222222222
3333333333
;
Run;
Proc Format;
Picture Tele low-High=’000-0000-000′;
Run;
Proc print data=Telephone;
Format Contact Tele.;
Run;
End Notes:-
In this article, we looked at various methods to display format of data values using built in and user defined formats. We have also looked at various techniques to define formats like ranges, picture, handle missing values and unmatched values using OTHER. We have also discussed efficient merging technique using PROC FORMAT. This should be all what you need to be a pro with SAS Formats.
We have not covered In-format while reading non-standard dataset. We will discuss that in one of our future post.
Hope you found this article useful? We have simplified this topic and have tried to present it in a very simple and lucid manner. If you need any more help with SAS formatting, please feel free to ask your questions through comments below.