Blog: Challenges migrating SAS solutions to SQL Server – Missing Values versus NULL

Managers for a well-known accounting and tax-preparation company contacted DB Best looking for an alternative solution to a big problem. They were long term customers of SAS but were now facing two key challenges: high SAS license costs and the fact that SAS developers could not keep up with business demands and provide for effective support. The DB Best team analyzed our customers’ initial SAS solution and determined that the best possible option was to migrate their workloads to Microsoft SQL Server. In this post, we will talk about one of the challenges when migrating SAS DATA step modules to the Microsoft SQL Server — SAS missing values versus SQL Server NULL.

Specifically, SAS stores missing values differently based on the data type. Keep in mind, Anthony Barr developed the SAS language starting in 1966 to run on an IBM System/360 mainframe. Back then, developers used Hollerith Card Codes with each line of code on a separate punch card to process their SAS programs.

With punch cards, there was no way to directly represent enter NULL values for missing data. That’s why SAS uses a single space for character values and a period symbol for numeric values.

And, yes, it’s hard to find SAS developers given the development innovations that have taken place since 1966.

Overcoming the gap between SAS missing values and SQL Server’s use of NULL

One of the biggest challenges of migration from SAS to the Microsoft SQL Server is that SAS has very specific processing mechanics that cannot be directly converted to SQL. You need a deep analysis of each case for the merge operator, modify operator, and conditional processing using SAS Base variables for intermediate calculations.

We’d like to share with you how we overcame the challenges of dealing with SAS missing values when migrating our client’s SAS DATA step modules to SQL Server.

For example, the following SAS code produces 2 tables in one code statement with a check for a null numeric value in table1.column2.

/****************************************************************
Code example 1 - SAS code that has two output tables as part of
                 the DATA statement.
****************************************************************/

DATA table2             /* table2 will have data from table1 where
                           column2 has a missing value */

     table3;            /* table3 will have data from table1 where
                           column2 has a value*/

set table1;
if column2 = . then     /* Note: SAS allows the use of . followed by a-z
                           for categorizing missing data! for example .a */

    output table2;
else
    output table3;
RUN;
/*
Editor's note: A better way of testing column2 for
missing values is to use the missing() function
instead of the original SAS (.) syntax.
Ex.  if missing(column2) then;
See http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000509996.htm
*/

And here is an example of how we translated this code using T-SQL:

/****************************************************************
Code example 2 - T-SQL code that shows the translation of the SAS
multi-table DATA step code in code example 1.
****************************************************************/


-- Copy the data from table1 into table2 where t1.column2 is null
SELECT * INTO table2 FROM table1 AS t1
WHERE t1.column2 IS NULL;

-- Then, copy the data from table1 into table2 where t1.column1 is not null
SELECT * INTO table3 FROM table1 AS t1
WHERE t1.column2 IS NOT NULL;
GO

SAS includes missing values when evaluating conditional statements

SAS treats a missing value as meeting a condition by default. For example, we have a piece of SAS code that adds to table2 all values that are less than 20 including the missing values. According to SQL logic, the corresponding code in SQL won’t include the null values and thus will produce very different results.

/****************************************************************
Code example 3 - SAS code that tests an expression that includes
missing values. In this case, append the values from table1 that
meet the condition to table2.
****************************************************************/

DATA table2;
    set table1;
    if column2 < 20;  /* This expression includes missing values */
RUN;

If the SAS code doesn’t explicitly check for a missing value, then you need to assume that you should include missing values when migrating the functionality to SQL Server.

To achieve the same results in T-SQL we added the additional condition or column2 is null to the SQL query.

/****************************************************************
Code example 4 - T-SQL append example that shows the translation
of example 3 that includes NULL / missing values for the tested
expression.
****************************************************************/


INSERT INTO table2
    SELECT * FROM table1
    WHERE (column2 < 20) OR (column2 IS NULL);
GO
/*
    This works well for numeric data types. If you migrated
    string values from SAS, you would need to use the
    following expression using the ISNULL() function and
    two single quote characters to test for blank values.
    Ex.  or (ISNULL(column2, '') = ''))
    See ISNULL() - https://docs.microsoft.com/en-us/sql/t-sql/functions/isnull-transact-sql
*/

Missing metadata that count impact the way SAS treats missing values

For the cases when business stores data in several external sources, missing metadata is a common problem. In a particular case, our customer supplied CSV files that do not provide for storing metadata. Thus, our developers had to supply the missing data based on context.

For example, consider the following week5_scores.csv file example with headings for the first line.

Name,Score,Team,Division
Joseph,76,"Red Racers, Washington",AAA
Mitchel,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,74,"Green Gazelles, Atlanta",AA

In this example, the metadata is fairly easy. However, unless you have the code for loading the data, you may make an incorrect assumption on the column type in SQL Server.

If we had the input definition like this, we would know right away how the data should be formatted.

/****************************************************************
Code example 6 - SAS code example that includes the metadata
for the columns in the csv file example.
****************************************************************/

DATA team_results;
    infile 'week5_scores.csv' dsd truncover firstobs=2;
    length name $50 score 3.1 team $100 div $4;
    input name -- div;
RUN;

The key challenge here is that you cannot guarantee proper testing without metadata. Since developers supply it on their own, they need to double-check to determine if they were right and this approach is very time-consuming.

Here is another example where the metadata code for library dw and the value of the ty variable were embedded in %include files.

/****************************************************************
Code example 7 - SAS code example where needed migration details
are buried in include files.
****************************************************************/


%include '/common/db_conn.sas';
%include '/common/env.sas';
DATA sales_ty;
    set dw.sales;
    if dw.date_year = &ty;
RUN;

In this case, we might want to consider using a stored procedure and passing the value for the ty variable as a parameter.

/****************************************************************
Code example 8 - T-SQL stored procedure example of passing in
value for date_year condition for the append query in example 7.
****************************************************************/


CREATE PROCEDURE dw.Append_Sales_to_Sales_ty
    @ty INT
AS
    SET NOCOUNT ON;
    INSERT INTO sales_ty
        SELECT * FROM dw.sales
        WHERE dw.date_year = @ty;
GO

-- Execute the stored procedure with the parameter value.
EXEC dw.Append_Sales_to_Sales_ty @ty = 2019;
GO

Conclusion

The challenges of migration from SAS split into two groups: common challenges and challenges that refer to initial SAS solution architecture. In both cases, to obtain good results you need a professional team and a reliable migration plan that considers the peculiarities of the existing SAS solution.

If you are considering moving from SAS to a more comfortable solution, contact us to learn how to take advantage of our SAS and SQL Server experience for your SAS migration projects.

Share this...
Share on Facebook
Facebook
Tweet about this on Twitter
Twitter
Share on LinkedIn
Linkedin