Collect Performance Data
Server is sometimes slow and you want to know why? Here is a lightweight performance logging kit which will give you the answer. It supports SQL Server and SQL Azure Databases, but perfmon part can be used on any other Windows machine to log perormance data, even without SQL Server. This lightweight logging is already in use by many productions with mission-critical workload. Advantage is – you are NOT installing any new tools: we exploit native, already existing features of windows and SQL, by configuring them to capture performance-relevant information.
There are two parts to install:
1. Perfmon counters– they log CPU, RAM, disk, number of TCP connections. As well as some SQL instance-level counters like lock wait time, memory, plan cache, batch requests, etc. SQL counters are added for every auto-detected sql instance on the machine. Instally by running powershell script “sqlcollector.ps1“. Can be used on any windows machine even without SQL Server, eg on Web server.
2. Extended Events – on SQL (Azure or classic) they log slow queries, blocked/deadlocked queries, and SQL errors.
- LongRunningStmt.sql – logs slow queries (eg, duration > 10s)
- BlockedProcessReport.sql – logs blocked queries (eg. blocked > 10s)
- ErrorEvents.sql – logs SQL errors
Examine the scripts before running, and adjust paths on disk. Make sure disk has enough space (>2GB).
Supports SQL2012 SP4+ till SQL2019. Older SQL2008R2 is not supported, but you could comment-out parts that throw error, because older SQL versions do not have some of events/attributes used in the script.
- if powershell fails, check this.
Creates Extended Events (XE) session that logs slow queries, blocks&deadlocks, and SQL errors:
- Set destination path before executing! Make sure SQL has permission to write there.
- Make sure you have supported version of SQL. If your version is below supported, upgrade to latest Service Pack and Cumulative Update before installing this XE.
- SQL2012 fails if dot (.) is in the name of the file, so remove dot from the script
Impact on production
This is super-lightweight logging. Overhead is zero, or close to. It is safe to leave it running even after performance tuning is finished. Because app code changes, and when new issue arrives, you are ready with logged event. The only thing to keep eye on is disk space. Because perfmon log creates around 40MB per day (30GB per 2 years) so make sure you have space or delete old logs every few years. Extended event logs are limited to 10 files, will roll-over over the oldest and therefore will not grow. They can take eg. around 2GB. One might want to adjust the thresholds for extended events to a higher value (parameters at beginning of each script) to capture less events and have longer recorded history.
Why is impact so low? Because we do not install anything new. We just configure existing stuff. Perfmon is already a part of Windows. We just use powershell to configure which counters to track every 30 seconds (eg CPU%). Also, Extended Events are existing part of SQL Server – we do not install a new tool here either. Extended events are lightweight by nature. We configure them to capture slow queries, blocked queries and errors. Those events are rare by nature, making this logging even more lightweight, with almost zero overhead. Runs on many production environments non-stop, because it is so light that stopping it would make no difference, and the data it logs is valuable. Even environments with super-heavy workload handle it flawlessly without a single “overhead” issue found so far.
2020-02-22 Version 2.4
- Fix: Perfmon – more succesfull install. Now asks for credentials, instead of failing miserably. Also fixes execution policy if needed. Added instructions.
- Fix: LongRunningStmt now logs abortied queries independent of “slowness threshold” value. Every aborted query (timeout) is logged, not just ones above threshold.
- Improved: BlockedProcessReport analysis script now can decode waited resource names and has accurate local time. Deadlock analysis added.
- Improved: ErrorEvents now logs tsql_stack so we can find who caused the error. Analysis part is improved, extracting error per hour, statement, object.
- Improved: LongRunningStmt now logs client process ID so we can identify exact process on the client machine that called a slow statement. Analysis is also improved with a summary row at the top so we can see how much each query participates in total.
2019-08-29 Version 2.2
- Improved: Perfmon now has Page writes/sec, and removed 3 useless counters around transaction log (info visible anyway from other counters).
2019-01-04 Version 2.0
- New: SQL Azure Database support, XE sessions adjusted to work on Azure SQL Databases
- New: DBA.ErrorEvents – captures SQL errors for later analysis
- Improved: blockage XE by adding deadlocks
- Improved: Configurable XE session names
2018-09-08 Version 1.3
- New: Added BlockedProcessReport.sql
- Improved: LongRunningStmt
2018-05-25 Version 1.2