Optimize SQL queries using subqueries for lookups in Azure SQL Databases

I like to work with Azure SQL Databases and use data from the cloud without any administrative effort. When working with apps, App Insights helps to detect anomalies, and to analyze bottle necks. Although SQL Azure can scale very well, we discovered performance issues in some legacy apps. Read here how we solved that by optimizing TSQL data queries.

The scenario

There is an older web application here that is connected to an SQL database that is running in an SQL elastic pool. App Insights showed the reason (the SQL queries) for the bad performance. Diving deeper, the elastic pool and the database / Query Performance showed performance issues in detail with certain queries which had very long runtimes. As we see in the following screenshot, there are two long running queries. Query 164 has been executed almost 274,000 times in the past 24 hours, consuming more than 6:40 hours. We started to investigate this issue.

Analyzing one specific TSQL query

Clicking on the query 164 exposes the underlying SQL query. So, we used SQL Management Studio to test the query and to optimize that behavior.

This is a view with a table that is joined with other tables. The background is that there is a master table with persons and another table with (multiple) functions of a person. So, e.g. *Stefan *(master table) can have role67, role71 and role72 (functions table) assigned as shown below. We see column Aemter (German for function) that contains none, one or more functions for a person.

 SELECT ID, Vorname, Aemter FROM view1 ORDER BY Aemter DESC

The specialty is that view1 is using a scalar values function to lookup the multiple functions and to transform that data into a field. As we see, the result is almost 11,000 rows and the query took 9 seconds to deliver the result. The problem here is, that this query is a) executed almost 274,000 times per day and b) very slow.

Why the query is slow

Well, we cannot change a) the number of calls without adapting the app code. But, we can optimize the TSQL query b). Here it is (simplified):

 CREATE VIEW [dbo].[view1]  
AS  
SELECT dbo.Mitglieder.ID,  
        dbo.Mitglieder.Vorname,  
        <font style="background-color: rgb(255, 255, 0);">dbo.fnGetMitgliederAemter(dbo.Mitglieder.ID) AS Aemter</font>  
FROM   dbo.Mitglieder -- plus eventually some JOINs

When we run that query, the database server already cached data, but it still runs 8 seconds.
When we run that query without the line “dbo.fnGetMitgliederAemter(dbo.Mitglieder.ID) AS Aemter”, the database server shows the 11,000 rows in … 0 seconds.

So we see, the bad performance is coming from a scalar values function fnGetMitgliederAemter(). that does the lookup of the functions for a person. With about 11,000 persons, this is time consuming.

If we check the function, we see that this works in principle, but there is room for improvement. There may be little sample data available when this feature was created many years ago.

 -- The OLD function  
CREATE FUNCTION [dbo].[<font style="background-color: rgb(255, 255, 0);">fnGetMitgliederAemter</font>](  
     @personid int  
) Returns Varchar(MAX)  
BEGIN  
declare @out VARCHAR(MAX)  
select @out = COALESCE(@out + ', ','') + b from   
(select Distinct isNull([NameAmt],'') as b from [dbo].[MitgliederAemterView]   
where ( FK_MitgliederID = @personid ) as a

 if len(@out) > 0   
  Begin  
   if left(@out,1)=','  
   Begin  
    set @out= right(@out, len(@out)-1)  
   End  
  End

RETURN @out  
END

The function is generating a new string @out that appends every role from the table and returns the output. Now that is no longer practical.

Tip: Avoid using database functions when working with large data sets.

How to optimize

We needed a replacement for the function. The web app shall be unchanged, so we must deliver column Aemter in a different way. After my SQL times are long gone, Google helped.

How to use COALESCE with multiple rows and without preceding comma? with an answer from Aaron Bertrand on dba.stackexchange.com described a cool solution for our problem. There are some workarounds included: Using a sub query with FOR XML PATH(’’)) to retrieve formal results of a SQL query as XML, and the STUFF() function to format the collected result. The STUFF() function deletes a part of a string and then inserts another part into the string, starting at a specified position: STUFF(string_to_be_modified, start, length, new_string)

 SELECT STUFF('SQL helps', 1, 3, 'Stuff');  
returns  
Stuff helps

In our sample, the optimized query view2 below is using a Sub select and the helper functions instead of the scalar function. The person´s functions are selected and returned as XML, as here.

 SELECT [NameAmt] FROM [MitgliederAemterView] FOR XML PATH ('')  
<NameAmt>role67</NameAmt>  
<NameAmt>role71</NameAmt>  
<NameAmt>role72</NameAmt> …

The STUFF() function adds each value with the two separator characters at the beginning of the output in a clever way and finally removing the leading ‘, ’ separator. See also Aaron´s answer here.

 -- The NEW method  
CREATE VIEW [dbo].[view2]  
AS  
SELECT dbo.Mitglieder.ID,  
        dbo.Mitglieder.Vorname,  
        <font style="background-color: rgb(255, 255, 0);">(STUFF((SELECT N', ' + [NameAmt]  
                FROM   [MitgliederAemterView]  
                WHERE  ( FK_MitgliederID = dbo.Mitglieder.ID )  
                FOR    XML PATH (''), TYPE ).value(N'.[1]', N'nvarchar(max)'), 1, 2, N'')) AS Aemter</font> FROM   dbo.Mitglieder -- plus eventually some JOINs

Now, we run a new query using view2:


SELECT ID, Vorname, Aemter FROM view2  
where Aemter is not NULL

This time, we get the 11,000 rows in less than 1 second. Winking smile

Tip: Use the query sample above to replace database functions as shown here.

Summary

In many cases, we can optimize applications on multiple levels. The Azure platform provides a bunch of tools to analyze performance and anomalies. In our sample, App Insights helped to identify the issue, and the database query optimization lead to a much better performance in the application. Now, in the app, the list of people is returned within a second instead of waiting at least 10 seconds. This is a significant optimization that has only been achieved through improved logic.

We hope these quick tips provide ideas on how to optimize your database operations, in on-premises SQL servers and in the cloud.

The scenario#

Analyzing one specific TSQL query#

Why the query is slow#

How to optimize#

Summary#

The scenario

Analyzing one specific TSQL query

Why the query is slow

How to optimize

Summary