SQL server performance

From PARS
Jump to navigationJump to search

This guide is used if you have run the TASC Infrastructure test and your server has failed or was only a borderline pass. Several known issues and likely fixes are listed in this document. It is important to note this document, while useful, is not intended as a comprehensive list of all factors affecting SQL performance.

Firstly you might try re-running the Speed Test at a quieter time; a fail could be a one-off due to a long running SIMS process. If for example the average is 1500 over 5 runs with one run of 900, the higher average rating is likely to be accurate and your server would pass.

Archiving SIMS Attendance Data

The first thing you should check is whether the attendance data in SIMS has been archived recently. Capita recommend that you archive your attendance data every year. If this has not been done for many years, it can cause performance issues.

If you have archived SIMS attendance data, re-run the TASC Infrastructure Test to see whether this has made a difference. If your server still fails the test, move on to the next section.

CPU, Memory and Disk Issues

SQL bottlenecks can be caused by excessive CPU, Memory or Disk usage. Use Task Manager, Resource Manager and/or Performance Monitor to check whether this is the case.

Excessive CPU Usage

If your CPU graph is maxed out nearly all the time then the CPU is probably the limiter. Increasing the number of cores or upgrading the existing cores will solve this issue. Note that VMs may be restricted by their host.

Use the Number of CPUs and Cores statement to find out how many CPUs and cores are available to the server. The CPU Utilisation History statement will show you the percentage use of the CPU over time, and how that use is divided between SQL and other processes.

Excessive Memory Usage

SQL will always use a large portion of the memory that is available so high memory usage does not necessarily indicate that more memory is required. You must use your own judgement to decide whether the SQL server has an adequate amount of memory.

If you believe the server has adequate memory available, you should only consider increasing the available memory after all other options have been exhausted, unless there is other evidence of a shortage of memory.

Excessive Disk Usage

If the bottleneck is not the CPU then the issue may be drive latency. You can check the drive latency using the Drive Latency and IO waits SQL statement.

If there is a high level of drive latency, disk access is probably limiting the speed of the server. Check that:

  • Your SIMS database is running at the correct compatibility level. To test, use the Large LDF or bad compatibility mode SQL statement.
  • Your log files are not being overused. Use the Large LDF or bad compatibility mode SQL statement to find the size of the LDF file and shrink it if required.
  • There is enough free hard drive space and your server is not reporting IO warnings (IO warnings in particular are a significant issue). Use the IO Warnings SQL statement to check IO availability.
  • The Auto_Close and Auto_Shrink fields of the SIMS database are not enabled (they should be disabled).

If there is enough disk space, the log files are a reasonable size and the database is configured correctly, then the drives themselves could be the problem. See the Other Common SQL Server Issues section next.

Other Common SQL Server Issues

This is not a definitive list of potential issues on the SQL server, but many of the issues listed below have been observed as the cause of performance issues in live school environments.

  • Running antivirus on the SIMS MDF and LDF data files. These files should be excluded from the Antivirus
  • Power Management settings on the VM Host limiting the Host CPU to save power. The CPU should not be limited
  • Adding more cores to VM Guests than the Host has physically available
  • Using the default Microsoft NIC driver on a VM rather than the correct manufacturer's driver
  • Limiting the SQL server to run on a single CPU with Processor Affinities. Use the SSMS Server Properties Processor to prevent this happening
  • Having the SIMS MDF and LDF files on the OS volume
  • Not routinely re-indexing the SIMS database
  • Running Third Party tools constantly against the SIMS database or during peak times
  • Unusually large tables in SQL. This may be a symptom of a larger problem - you should investigate if you think any tables are unusually large
  • Having SQL and and other hard-used services on the same physical server

Additionally the following may apply to you depending on your server setup:

  • If you have hosted the SIMS MDF and LDF files on a fast SSD make sure that the frequently used TempDB has not been left on a slow mechanical drive
  • If you are using RAID make sure that it is set to WRITE THROUGH and not WRITE BACK

The following configurations may positively affect the performance of the SQL server:

  • Host the SIMS ldf and mdf files on separate physical drives (not on separate partitions of the same physical drive)
  • Use RAID 10 to maintain the SIMS ldf file over two separate drives. This requires specialist hardware and may be costly

SQL Statements

IO Warnings

If you see any 15 second warnings, the required IO is not being made available by the drives. Ideally you should not see anything returned by this statement.

If you are seeing 15 second warnings you may find the following links useful:

https://blogs.msdn.microsoft.com/sqlsakthi/2011/02/09/troubleshooting-sql-server-io-requests-taking-longer-than-15-seconds-io-stalls-disk-latency/

https://blogs.msdn.microsoft.com/karthick_pk/2012/06/26/io-requests-taking-longer-than-15-seconds-to-complete-on-file/

CREATE TABLE #IOWarningResults(LogDate datetime, ProcessInfo sysname, LogText nvarchar(1000));
Insert INTO #IOWarningResults 
EXEC xp_readerrorlog 0, 1, N'taking longer than 15 seconds';
Insert INTO #IOWarningResults 
EXEC xp_readerrorlog 1, 1, N'taking longer than 15 seconds';
Insert INTO #IOWarningResults 
EXEC xp_readerrorlog 2, 1, N'taking longer than 15 seconds';
Insert INTO #IOWarningResults 
EXEC xp_readerrorlog 3, 1, N'taking longer than 15 seconds';
Insert INTO #IOWarningResults 
EXEC xp_readerrorlog 4, 1, N'taking longer than 15 seconds';
Select  LogDate, ProcessInfo, LogText FROM #IOWarningResults ORDER BY LogDate DESC;
DROP TABLE #IOWarningResults;


Drive Latency and IO Waits

This statement will show you the drive-level latency for reads and writes, in milliseconds. Latency above 20-25ms is usually a problem.

If your server latency is high you may find the following link useful:

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/9d746f83-43c8-426c-9269-586bd0dd44d3/io-and-drive-latency?forum=sqlgetstarted

SELECT [Drive],
CASE WHEN num_of_reads = 0 THEN 0
ELSE (io_stall_read_ms/num_of_reads) END AS [Read Latency],
CASE WHEN io_stall_write_ms = 0 THEN 0
ELSE (io_stall_write_ms/num_of_writes) END AS [Write Latency],
CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
ELSE (io_stall/(num_of_reads + num_of_writes)) END AS [Overall Latency],
CASE WHEN num_of_reads = 0 THEN 0
ELSE (num_of_bytes_read/num_of_reads) END AS [Avg Bytes/Read],
CASE WHEN io_stall_write_ms = 0 THEN 0
ELSE (num_of_bytes_written/num_of_writes) END AS [Avg Bytes/Write],
CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes)) END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
GROUP BY LEFT(UPPER(mf.physical_name), 2)) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);


Number of CPUs and Cores

This statement will show you how many CPUs and cores the server has physically available, and how many virtual CPUs exist.

DECLARE @xp_msver TABLE (
[idx] [int] NULL 
,[c_name] [varchar](100) NULL 
,[int_val] [float] NULL 
,[c_val] [varchar](128) NULL 
) 
              
INSERT INTO @xp_msver 
EXEC ('[master]..[xp_msver]');
             
WITH [ProcessorInfo] 
AS (SELECT ([cpu_count] / cast([hyperthread_ratio] as FLOAT)) AS [number_of_physical_cpus],
CASE WHEN hyperthread_ratio = cpu_count THEN cpu_count
ELSE (([cpu_count] - cast([hyperthread_ratio] as FLOAT)) / ([cpu_count] / cast([hyperthread_ratio] as FLOAT))) END AS [number_of_cores_per_cpu],
CASE WHEN hyperthread_ratio = cpu_count THEN cpu_count
ELSE ([cpu_count] / cast([hyperthread_ratio] as FLOAT)) * (([cpu_count] - [hyperthread_ratio]) / ([cpu_count] / cast([hyperthread_ratio] as FLOAT)))
END AS [total_number_of_cores], [cpu_count] AS [number_of_virtual_cpus], (
SELECT [c_val] FROM @xp_msver WHERE [c_name] = 'Platform') AS [cpu_category] 
FROM [sys].[dm_os_sys_info]) 
SELECT [number_of_physical_cpus],[number_of_cores_per_cpu],[total_number_of_cores],[number_of_virtual_cpus],LTRIM(RIGHT([cpu_category],CHARINDEX('x', [cpu_category]) - 1)) AS [cpu_category] 
FROM [ProcessorInfo]


CPU Utilisation History

This statement will show you the CPU usage of the server for every minute over the last few hours. Look at the trend over the entire period.

Does SQL account for the majority of CPU usage or are other processes limiting what SQL is able to do? Are there are points during the day where CPU is especially high, and if so why?

DECLARE @ts_now bigint = (SELECT cpu_ticks/(cpu_ticks/ms_ticks) FROM sys.dm_os_sys_info WITH (NOLOCK)); 
SELECT TOP(256) SQLProcessUtilization AS [SQL Server Process CPU Utilization], SystemIdle AS [System Idle Process], 100 - SystemIdle - SQLProcessUtilization AS [Other Process CPU Utilization], DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS [Event Time] 
FROM (SELECT record.value('(./Record/@id)[1]', 'int') AS record_id, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle], record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [SQLProcessUtilization], [timestamp] 
FROM (SELECT [timestamp], CONVERT(xml, record) AS [record] 
FROM sys.dm_os_ring_buffers WITH (NOLOCK)
WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' 
AND record LIKE N'%<SystemHealth>%') AS x) AS y 
ORDER BY record_id  OPTION (RECOMPILE);


Large LDF or incorrect compatibility mode

The following statement will give you information about the databases hosted on this instance of SQL. You should consider the following questions:

  • How many databases are on the instance? Is it possible that other databases are affecting the performance of SIMS
  • How full are the transaction logs? Check your Recovery Model and backup schedule
  • What compatibility level are the databases on? Check with SIMS Support to find the best level for your server version
  • What is the Page Verify Option? It should be CHECKSUM
  • Is Auto Update Statistics Asynchronously enabled? We recommend setting it back to the default (disabled) unless you’ve got evidence that delays caused by statistics updates are causing you problems

SELECT db.name AS [Database Name], db.recovery_model_desc AS [Recovery Model], db.state_desc, 
db.log_reuse_wait_desc AS [Log Reuse Wait Description], CONVERT(DECIMAL(18, 2), ls.cntr_value / 1024.0) AS [Log Size (MB)], 
CONVERT(DECIMAL(18, 2), lu.cntr_value / 1024.0) AS [Log Used (MB)], CAST(CAST(lu.cntr_value AS FLOAT) 
/ CAST(ls.cntr_value AS FLOAT) AS DECIMAL(18, 2)) * 100 AS [Log Used %], db.compatibility_level AS [DB Compatibility Level], 
db.page_verify_option_desc AS [Page Verify Option], db.is_auto_create_stats_on, db.is_auto_update_stats_on, 
db.is_auto_update_stats_async_on, db.is_parameterization_forced, db.snapshot_isolation_state_desc, 
db.is_read_committed_snapshot_on, db.is_auto_close_on, db.is_auto_shrink_on, db.target_recovery_time_in_seconds, 
db.is_cdc_enabled
FROM sys.databases AS db WITH (NOLOCK) INNER JOIN
sys.dm_os_performance_counters AS lu WITH (NOLOCK) ON db.name = lu.instance_name INNER JOIN
sys.dm_os_performance_counters AS ls WITH (NOLOCK) ON db.name = ls.instance_name
WHERE (lu.counter_name LIKE N'Log File(s) Used Size (KB)%') AND (ls.counter_name LIKE N'Log File(s) Size (KB)%') AND 
(ls.cntr_value > 0) OPTION (RECOMPILE)


Is the server virtual or physical?

This statement will tell you whether or not the machine is physical or virtual

SELECT dosi.virtual_machine_type_desc
,Server_type = CASE 
WHEN dosi.virtual_machine_type = 1
THEN 'Virtual' 
ELSE 'Physical'
END
FROM sys.dm_os_sys_info dosi


If the machine is virtual, this statement will tell you the physical host name and the virtual machine name

CREATE TABLE #GetPort
(
token varchar(100),
value varchar(20))

INSERT #GetPort
EXEC master..xp_regread
'HKEY_LOCAL_MACHINE',
'Software\Microsoft\Virtual Machine\Guest\Parameters',
'PhysicalHostName'

INSERT #GetPort
EXEC master..xp_regread
'HKEY_LOCAL_MACHINE',
'Software\Microsoft\Virtual Machine\Guest\Parameters',
'VirtualMachineName'
SELECT * FROM #GetPort
DROP TABLE #GetPort