Thursday, 1 February 2018

How to Implement Dynamic Data Masking in Azure SQL Database and SQL Server

In this article, we will discuss how we can use these built in functions to mask the data based on different data patterns, like social security number, credit card number, date of birth, email address, name, etc.

Data Masking Functions


SQL Server provides four built in functions to mask data in SQL tables. These functions are as follows:

1. partial()
2. default()
3. email()
4. random()

We learned about these functions in a previous article. In this article, we will discuss how we can use these functions to achieve data masking.

Test Data Preparation


First, we will create test tables and insert test data in these tables. Later, we will apply data masking functions and see how the data looks like after data masking.

         Create test tables:

Create table TestDDM
              (ID Int,
              PersonName varchar (100),
              EmailAddress varchar(120),
              CreditCardNumber varchar(19),
              SocialSecurityNumber varchar(11))

CREATE TABLE TestPerson
             ( ID INT,
             Name VARCHAR(100),
             DOB DATETIME,
             INCOME NUMERIC)

         Insert data in test tables:

      INSERT INTO TestDDM  Values (1, 'Anoop Kumar','abcdefgh@hotmail.com','1234-5678-4321-8765','123-45-6789')
      INSERT INTO TestPerson VALUES (1, 'John Smith','06-07-1986',10000)

         Retrieve table data:

SELECT * FROM TestDDM

Azure SQL Database and SQL Server, Oracle Database Tutorials and  Materials, Oracle Database Certifications


Figure 1 TestDDM table with Data

SELECT * FROM TestPerson

Azure SQL Database and SQL Server, Oracle Database Tutorials and  Materials, Oracle Database Certifications

Figure 2 TestPerson table with Data

Currently, no data masking is implemented so data can be visible as it is inserted in tables.

Data Masking Implementation

In this section, we will use all four built in data masking functions and learn the behavior of these functions.

partial() - The partial () function gives the option to customize the masking. The function exposes the first and last letters and adds a custom padding string in the middle.

We have used partial() function to mask the social security number. We masked the initial five digits of the social security number and exposed last four digits.

ALTER TABLE TestDDM
ALTER COLUMN [SocialSecurityNumber] ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');

email(): The email() function masks the email address.

The command below masks the column EmailAddress in the TestDDM table using the email() function. The function masks the whole string of the email address except the first letter of the email, the special character of the address (@), and the constant suffix of the email address (.com).

ALTER TABLE TestDDM ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()')

default(): The default() function masks the full value of a column.

This SQL command below masks the CreditCardNumber column in the TestDDM table. The function masks the complete number string and replaces the masked value with ‘X’.

ALTER TABLE TestDDM
ALTER COLUMN CreditCardNumber varchar(19) MASKED WITH (FUNCTION = 'default()') NULL

Review Results

Now, we will review the behavior of data masking implementation. We will run a Select query and review the data.

SELECT * FROM TestDDM;

Azure SQL Database and SQL Server, Oracle Database Tutorials and  Materials, Oracle Database Certifications

Figure 3 TestDDM table Data - Unmasked

We can see that data is still visible as inserted. There is no change in data behavior and the data doesn’t mask. The reason for this behavior is user permission. In the current scenario, my ID has db_owner permission and has full access to the data.

To understand the behavior of mask functions and masked data, we will create a new database user TestMaskUser (without login) and will grant select permission on the TestDDM table to the newly created database user.

CREATE USER TestMaskUser WITHOUT LOGIN;
GRANT SELECT ON TestDDM TO TestMaskUser;

Now, we will change the context of the query execution and review the TestDDM data table.

EXECUTE AS USER = 'TestMaskUser';
SELECT * FROM TestDDM;
REVERT;

Azure SQL Database and SQL Server, Oracle Database Tutorials and  Materials, Oracle Database Certifications

Figure 4 TestDDM table Data - Masked

We can see that after changing the user context, data in the TestDDM table has been masked and complete values are not visible to user TestMaskUser as expected.

a. SocialSecurityNumber is showing last 4 digits
b. EmailAddress is visible with first character, special character (@) and suffix (.com) only
c. CreditCardNumber has masked completely

Now, you can understand how easy it is to mask the data available in your database. In the next example, we will see the use of the random() function and how to mask the date of birth data.

random() - function is used on any numeric type to mask the original value with a random value within a specified range.

This SQL command below uses the random() function to mask the Income column in the TestPerson table. This function will replace the original value with a random number within a given range in the random() function.

ALTER TABLE TestPerson
ALTER COLUMN INCOME NUMERIC MASKED WITH (FUNCTION = 'random(1,100000)') NULL

This SQL command masks the DOB column in the TestPerson table with the default() function. This function will replace the actual DOB with a default value 1900-01-01 00:00:00:000.

ALTER TABLE TestPerson
ALTER COLUMN DOB  DATETIME MASKED WITH (FUNCTION = 'default()') NULL

Now, we will grant select permission to TestMaskUser and change the context of execution to review table TestPerson data.

GRANT SELECT ON TestPerson TO TestMaskUser
EXECUTE AS USER = 'TestMaskUser';
SELECT * FROM TestPerson;
REVERT;

Azure SQL Database and SQL Server, Oracle Database Tutorials and  Materials, Oracle Database Certifications

Figure 5 TestPerson table Data - Masked

We can see that after applying the random() function the value of income was changed and the date of birth has been replace with the default value 1900-01-01 00:00:00:000.

List of Masked Columns

At any point of time, we can find what columns have been masked and what mask functions have been used to implement the data masking in a database.

The SQL query below can help to identify the list of masked columns in a database.

SELECT a.name, b.name as table_name, a.is_masked, a.masking_function  
FROM sys.masked_columns AS a  
JOIN sys.tables AS b   
    ON a.[object_id] = b.[object_id];  

Azure SQL Database and SQL Server, Oracle Database Tutorials and  Materials, Oracle Database Certifications

Figure 6 List of Masked Columns

Grant and Revoke UNMASK Permission

We know that data masking is a feature used to secure data that should not be visible to unprivileged users.However, there are methods to control the access of data. We can use the query below to grant permission to UNMASK the user. After this, the unmasked user can see the actual stored data in a table rather than masked data.

GRANT UNMASK TO TestMaskUser;

Azure SQL Database and SQL Server, Oracle Database Tutorials and  Materials, Oracle Database Certifications

Figure 7 Grant UNMASK to DB User

Once we Grant UNMASK to TestMaskUser, the user can see actual data rather than masked data.

Similarly, if we Revoke UNMASK permission to TestMastUser, the user can see only masked data.

REVOKE UNMASK TO TestMaskUser;

Azure SQL Database and SQL Server, Oracle Database Tutorials and  Materials, Oracle Database Certifications

Figure 8 Revoke UNMSK to DB User