Blog: Collecting business-critical transaction throughput for analysis using Power BI

In my blog post Using Power BI to analyze Windows Perfmon data for SQL Server, I introduced a concept that I call “business-critical transactions” (BCT) to understand the overall throughput of your SQL Server applications. When I worked on the SQL Server team, the SQL Performance Engineering team used a set of customer applications with server-side replay scripts that were executed before development teams checked in their code. If a development team’s build showed a significant performance regression, they could not check in their code. Organizations that practice DevOps use a similar process, but need to consider ways to minimize the time required to detect performance regressions.

How to identify meaningful business-critical transactions

So, how do you determine which transactions you should use for measuring SQL Server performance when optimizing deployments on AWS EC2 or any other platform?

As a fan of Dr. Goldratt’s Theory of Constraints, I’m always thinking of ways to improve the overall throughput of systems and to identify the key constraints that limit them. Sure, you can look at all the queries that your applications execute over a given period of time. However, which of those queries really matter?

For example, on January 13, 2020, NASDAQ recorded 1,127,485,803 equity trades for the United States during the first 10 hours of the full trading day. That comes out to approximately 31,319 trades per second! One might guess that the transaction for executing trades would be a potential constraint.
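As a quick sanity check on that arithmetic (the trade count is the figure quoted above; the only assumption is a 10-hour window):

```python
# Back-of-the-envelope check of the trades-per-second figure.
trades = 1_127_485_803          # US equity trades, first 10 hours of Jan 13, 2020
seconds = 10 * 60 * 60          # 10 hours expressed in seconds
rate = round(trades / seconds)  # approximately 31,319 trades per second
print(rate)
```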

The idea is to identify the high-volume queries that add the most value to your business. For Amazon, it might be new orders processed and time to return search results. Amazon runs countless queries, but the two examples I provided represent business experiences that its customers actually have.

These are your performance BCTs.

We use HammerDB with the TPC-C and TPC-H benchmarks as a proxy against different AWS EC2 instances and configurations to provide baselines that help our customers make data-based decisions when moving their SQL Server instances to AWS. For the TPC-C benchmark, which measures OLTP performance, the New Orders Per Minute (NOPM) metric is considered the BCT. If you want to read more about this metric, check out the blog post Why both TPM and NOPM Performance Metrics?

As a SQL DBA, I’m sure you have a SQL Server instance floating around that you can use to collect this data. If not, you can always write the data to CSV files on a common share and process them in Power BI. The important thing is to include the event date and time so that you can correlate the results with the performance counter data when optimizing the overall system on SQL Server and AWS EC2 instances.

Making HammerDB log files Power BI friendly

Obviously, you’ll want to take into consideration how you want to format your BCT data to align with any performance data you are collecting at the same time. However, extracting data from the HammerDB TPC-C log files takes some pre-processing work. In this section, I’m going to walk through some PowerShell scripts that we use as part of our capacity planning performance testing for SQL Server in the cloud.

HammerDB log file format

There are two basic log file formats available for HammerDB results. The Time Profile provides detailed information every 10 seconds for one of the virtual users; the HammerDB site has a great article on How to Graph HammerDB Response Times. I like to use the simpler combination of the Log Output to Temp + Use Unique Log Name + Log Timestamps options, since I know it has very little impact on the server that I’m testing. For each virtual user run, HammerDB creates a log file in the Windows temp directory. I’ve included all 9 log files in the GitHub directory for this project. Here is an example of a 3-virtual-user result file from the performance run that generated the first series of performance counters.

Hammerdb Log @ Tue Jan 07 12:01:32 PST 2020
Timestamp 1 @ Tue Jan 07 12:01:32 PST 2020
Vuser 1:Beginning rampup time of 2 minutes
Timestamp 2 @ Tue Jan 07 12:01:33 PST 2020
Vuser 2:Processing 1000000 transactions with output suppressed...
Timestamp 3 @ Tue Jan 07 12:01:33 PST 2020
Vuser 3:Processing 1000000 transactions with output suppressed...
Timestamp 4 @ Tue Jan 07 12:01:34 PST 2020
Vuser 4:Processing 1000000 transactions with output suppressed...
Timestamp 1 @ Tue Jan 07 12:02:34 PST 2020
Vuser 1:Rampup 1 minutes complete ...
Timestamp 1 @ Tue Jan 07 12:03:35 PST 2020
Vuser 1:Rampup 2 minutes complete ...
Timestamp 1 @ Tue Jan 07 12:03:36 PST 2020
Vuser 1:Rampup complete, Taking start Transaction Count.
Timestamp 1 @ Tue Jan 07 12:03:37 PST 2020
Vuser 1:Timing test period of 3 in minutes
Timestamp 1 @ Tue Jan 07 12:04:38 PST 2020
Vuser 1:1 ...,
Timestamp 1 @ Tue Jan 07 12:05:39 PST 2020
Vuser 1:2 ...,
Timestamp 1 @ Tue Jan 07 12:06:39 PST 2020
Vuser 1:3 ...,
Timestamp 1 @ Tue Jan 07 12:06:41 PST 2020
Vuser 1:Test complete, Taking end Transaction Count.
Timestamp 1 @ Tue Jan 07 12:06:45 PST 2020
Vuser 1:3 Active Virtual Users configured
Timestamp 1 @ Tue Jan 07 12:06:46 PST 2020
Vuser 1:TEST RESULT : System achieved 53319 SQL Server TPM at 11591 NOPM

Here are the key lines that we need to extract from this file:

  • Line 3 – “Beginning rampup time” marks the start of the ramp-up; the ramp-up start time is the timestamp on line 2
  • Line 15 – “Rampup complete, Taking start Transaction Count.” marks the start of the timed run; the transaction start time is the timestamp on line 14
  • Line 27 – “Active Virtual Users configured” gives the number of virtual users; the transaction end time is the timestamp on line 26
  • Line 29 – “TEST RESULT” contains the TPM and NOPM results, where the NOPM value is what we’ll use as the BCT for this benchmark

What we want to do is shape the data contained in the 9 log files generated by HammerDB into one CSV file:

"Task","Mode","Rampup Start Time","Transaction Start Time","Transaction End Time","Virtual Users","TPM","NOPM"
"Blog-B1","TPCC","1/7/2020 8:01:32 PM","1/7/2020 8:03:36 PM","1/7/2020 8:06:45 PM","3","53319","11591"
"Blog-B1","TPCC","1/7/2020 8:10:48 PM","1/7/2020 8:12:51 PM","1/7/2020 8:15:59 PM","5","61684","13366"
"Blog-B1","TPCC","1/7/2020 8:20:05 PM","1/7/2020 8:22:08 PM","1/7/2020 8:25:14 PM","8","66804","14516"
"Blog-B1","TPCC","1/7/2020 8:29:19 PM","1/7/2020 8:31:23 PM","1/7/2020 8:34:28 PM","13","68388","14857"
"Blog-B1","TPCC","1/7/2020 8:38:39 PM","1/7/2020 8:40:42 PM","1/7/2020 8:43:50 PM","21","67997","14799"
"Blog-B1","TPCC","1/7/2020 8:48:05 PM","1/7/2020 8:50:08 PM","1/7/2020 8:53:16 PM","34","69278","15088"
"Blog-B1","TPCC","1/7/2020 8:57:34 PM","1/7/2020 8:59:37 PM","1/7/2020 9:02:55 PM","55","68069","14754"
"Blog-B1","TPCC","1/7/2020 9:07:12 PM","1/7/2020 9:09:15 PM","1/7/2020 9:12:46 PM","89","66863","14609"
"Blog-B1","TPCC","1/7/2020 9:17:06 PM","1/7/2020 9:19:08 PM","1/7/2020 9:23:31 PM","144","78844","16637"

You can find the source file on our GitHub project at FinalResult-TPCC-Blog-B1.csv.
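Our production scripts are PowerShell, but the extraction logic is straightforward to sketch. Here is the same pattern-matching idea in a short Python sketch (the function name and field names are mine, chosen to match the CSV columns above): each “Timestamp N @ …” line is remembered, and when one of the key marker lines appears, the most recent timestamp becomes that event’s time.

```python
import re

def parse_hammerdb_log(text):
    """Reduce one HammerDB log file to the fields needed for the results CSV."""
    last_ts = None  # most recent "Timestamp N @ ..." line seen
    result = {}
    for line in text.splitlines():
        ts = re.match(r"Timestamp \d+ @ (.+)$", line)
        if ts:
            last_ts = ts.group(1)
            continue
        if "Beginning rampup time" in line:
            result["Rampup Start Time"] = last_ts
        elif "Rampup complete" in line:
            result["Transaction Start Time"] = last_ts
        elif m := re.search(r"(\d+) Active Virtual Users configured", line):
            result["Transaction End Time"] = last_ts
            result["Virtual Users"] = int(m.group(1))
        elif m := re.search(r"System achieved (\d+) SQL Server TPM at (\d+) NOPM", line):
            result["TPM"], result["NOPM"] = int(m.group(1)), int(m.group(2))
    return result
```

Run against the sample log above, this yields the values in the first CSV row (before the timestamps are converted to the local date format).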

Using PowerShell to preprocess files into a CSV file for Power BI

Power BI has awesome tools for pre-processing data from multiple files. For example, in my original Power BI report, I used Power Query Editor to load the log files for a test run via Get Data with the File > Folder option. The problem is that Power BI gets slower and slower as it processes more and more files.

By using PowerShell to shape the files, we can minimize the processing needed in Power BI. In addition, having a ready-to-go CSV file makes importing the data into SQL Server easy; the PowerShell script can be extended to load the data directly into a table. That’s for another blog.

