So, our solid-state implementation did not live up to expectations. What happened? All the vendor trials went great, the hookup was flawless, and SQL Server started right up. What did we miss?
The DBA Takes Another Crack at It
After six years of working the ABCs across over 100 instances, 2,000 databases, and a couple dozen clusters, I wanted to focus on performance. The hybrid data warehouse was still the big beast in the SQL Server portfolio. Let’s see what we can do…
Let’s look at the performance counters in a totally different light. Let’s not only capture them on the host side where SQL Server is running, but also see how they line up with what the SAN array sees on its side.
During a typical week, I captured PerfMon data on the 4 SSD LUNs presented to us, looking at…
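Scripting the capture keeps it repeatable. Here’s a minimal sketch, assuming Python and Windows’ built-in typeperf utility (the command-line front end to PerfMon); the drive letter, sample interval, and output path are placeholders, not our production values:

import subprocess

# LUN drive letters under test; J: matches the volume in the Crystal
# output further down, any others would be placeholders.
luns = ["J:"]

# The LogicalDisk counters behind latency, IOPS, and throughput.
counters = [
    rf"\LogicalDisk({lun})\{name}"
    for lun in luns
    for name in (
        "Avg. Disk sec/Read",    # read latency
        "Avg. Disk sec/Write",   # write latency
        "Disk Transfers/sec",    # IOPS
        "Disk Bytes/sec",        # throughput
    )
]

# One sample every 15 seconds for a week, written to CSV for later analysis.
samples = 7 * 24 * 60 * 60 // 15
subprocess.run(
    ["typeperf", *counters,
     "-si", "15",            # sample interval, in seconds
     "-sc", str(samples),    # total number of samples
     "-f", "CSV",
     "-o", r"C:\perflogs\lun-baseline.csv"],
    check=True,
)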
The results were pretty much what we’d seen before: IOPS in the 2k–8k range with peaks over 10k, but nowhere near the 300,000 quoted by the SAN vendor. Latency was high, frequently climbing past 100 ms. Throughput was meh at around 400 MB/s.
Going one step further, I asked the SAN team to send me a chart of what they see on the LUNs they present to database hosts. So they sent me a sample…
This was confusing. The IOPS and throughput numbers lined up, but their latency numbers stayed in the single digits, like they’re supposed to for solid-state devices. If the array was answering in single-digit milliseconds while the host was waiting 100 ms, the extra time had to be spent somewhere in between: host-side queueing, the HBAs, or the fabric. Why was this host seeing 100 ms latency under a relatively light load?
Stress-Test the Storage
A quick test of our storage was in order. We wanted to see how fast the storage could actually go, independent of our customers’ query load. How do you generate enough load to stress-test the system?
The quickest way to stress-test the storage is to use a utility like CrystalDiskMark. Under the hood, Crystal uses Microsoft’s DiskSpd (the successor to SQLIO) to run a series of tests comparable to common file system operations. While it’s not a perfect SQL Server emulator, it puts enough load on the system to see roughly how far you can push it.
So we installed Crystal and configured it to run five rounds of tests with a 4 GB test file. Crystal exercises both sequential and random access patterns. The latest versions let you configure the number of threads and outstanding I/Os; even an older version, like the one seen here, gives you enough data for a good primer on how your disk subsystem performs.
-----------------------------------------------------------------------
CrystalDiskMark 3.0.4 x64 (C) 2007-2015 hiyohiyo
Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]
Sequential Read : 437.499 MB/s
Sequential Write : 270.043 MB/s
Random Read 512KB : 193.892 MB/s
Random Write 512KB : 220.735 MB/s
Random Read 4KB (QD=1) : 5.580 MB/s [ 1362.4 IOPS]
Random Write 4KB (QD=1) : 6.317 MB/s [ 1542.3 IOPS]
Random Read 4KB (QD=32) : 157.154 MB/s [ 38367.7 IOPS]
Random Write 4KB (QD=32) : 151.624 MB/s [ 37017.6 IOPS]
Test : 4000 MB [J: 53.2% (1088.9/2047.9 GB)] (x5)
Date : 2017/12/03 0:37:38
OS : Windows Server 2008 R2 Enterprise (Full installation) SP1 [6.1 Build 7601] (x64)
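If you’d rather script the same kind of test matrix than click through Crystal’s GUI, you can drive DiskSpd directly. A rough sketch, assuming Python and a copy of diskspd.exe on the host; the paths, durations, and thread counts are illustrative, not a benchmarking standard:

import subprocess

DISKSPD = r"C:\Tools\diskspd.exe"     # path to diskspd.exe (placeholder)
TARGET  = r"J:\diskspd-test.dat"      # test file on the LUN under test

# A rough analogue of Crystal's matrix: (block size, queue depth, random?).
tests = [
    ("4K",    1, True),    # small random, QD1  -> worst-case latency
    ("4K",   32, True),    # small random, QD32 -> peak IOPS
    ("512K",  1, True),    # large random
    ("1M",    8, False),   # large sequential   -> peak throughput
]

for block, qd, random_io in tests:
    for write_pct in (0, 100):          # pure read pass, then pure write pass
        args = [
            DISKSPD,
            f"-b{block}",               # I/O block size
            f"-o{qd}",                  # outstanding I/Os per thread
            "-t4",                      # worker threads
            f"-w{write_pct}",           # percentage of I/Os that are writes
            "-d30",                     # run each test for 30 seconds
            "-c4G",                     # create a 4 GB test file
            "-Sh",                      # bypass software and hardware caching
            "-L",                       # collect latency statistics
        ]
        if random_io:
            args.append("-r")           # random access (sequential is default)
        args.append(TARGET)
        subprocess.run(args, check=True)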
The results were startling: our problem was not IOPS, but throughput. We discovered we could only push about 430 MB/second between our host and the SAN. For a data warehouse, that’s quite low; a common rule of thumb for SQL Server is that a single core can consume up to 200 MB/second when scanning a fact table to process a cube. If we’re only pushing around 430 MB/second, only two cores are going to be maxed out. But if I paid for 32 cores and I’m now wasting 30 of them, and each of those 30 cost $7,000 to license, that’s $210,000 in SQL Server licensing sitting unused.
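The back-of-the-envelope math, as a quick sanity check using the figures above:

# Rule of thumb: one core can consume ~200 MB/s during a fact-table scan.
throughput_mb_s = 430        # measured host-to-SAN throughput
mb_s_per_core   = 200        # scan rate a single core can keep up with
licensed_cores  = 32
cost_per_core   = 7_000      # approximate per-core license cost, USD

cores_fed  = throughput_mb_s / mb_s_per_core       # ~2.15 cores kept busy
cores_idle = licensed_cores - round(cores_fed)     # ~30 cores starved for I/O
wasted     = cores_idle * cost_per_core            # ~$210,000
print(f"{cores_fed:.1f} cores fed, {cores_idle} idle, ${wasted:,} unused")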
Ouch.