Page Life Expectancy and 300 sec limit
People blogged about it, including excellent Jonathan’s post here (highly recommended to read), but I will give a little more insight here. Basically, Microsoft says if PLE is below 300 seconds, we should worry about memory pressure (an old recommendation from 2006. still appearing today here and there). Jonathan shows how 300 sec threshold is obsolete, and that it should be per every 4GB of RAM. That makes a big difference on current servers which have much more RAM than 4GB. I will split it down to the NUMA node level, because today’s servers not only have plenty of RAM, but also plenty of CPU’s which are grouped into NUMA nodes, and so is the memory divided into memory nodes.
That means the PLE limit is better calculated per memory node. If we calculate it globally, it might hide the fact that a single memory node falls way below the treshold, because other numas are in good standing, healthy. If we look only global PLE we will not notice that one numa node’s PLE is not ok.
The Jonathan’s formula for the PLE treshold to worry about is “(DataCacheSizeInGB/4GB)*300”. That means PLE of your system should normally be higher that that number. If you have 4GB of cached data, your treshold is 300 secs. If you have server with a lot of RAM, e.g. 40GB is cached, your PLE treshold should be 3 000 seconds. Basically, PLE should be 300 sec per every 4GB of data you have cached in the buffer pool.
This is the query of the PLE per NUMA node, normalized to 4GB, baked ready for you:
-- ple_per_4gb of every NUMA node should be > 300 sec !
SELECT numa_node = ISNULL(NULLIF(ple.instance_name, ''), 'ALL'),
ple_sec = ple.cntr_value, db_node_mem_GB = dnm.cntr_value*8/1048576,
ple_per_4gb = CASE WHEN ple.instance_name = ''
THEN (SUM(ple.cntr_value) OVER()-ple.cntr_value) * 4194304 / (dnm.cntr_value*8)
ELSE ple.cntr_value * 4194304 / (dnm.cntr_value*8)
FROM sys.dm_os_performance_counters ple join sys.dm_os_performance_counters dnm
on ple.instance_name = dnm.instance_name
and ple.counter_name='Page life expectancy' -- PLE per NUMA node
and dnm.counter_name='Database pages' -- buffer pool size (in pages) per NUMA node
This server has more than 64GB of RAM, and a lot of CPU’s (around 80 cores) divided into 4 hardware NUMA nodes. The query result is below. All NUMAs have good PLE (ple_per_4GB is > 300 sec):
We can see that the values are not equal. ple_per_4gb of the best numa node is more than 2x larger than the worst numa, so global PLE might not be detailed enough in some situations and you may want to go to the NUMA level.
This server has a bit less CPUs than in previous example, but still a bunch of them, so they are divided into 2 hardware NUMA nodes. Server has a lot more memory, around 800 GB, quite a beast! One would expect that would be enough to have a good PLE. But surprisingly, the ple_per_4GB is below 300 sec:
This server has a problem. If memory is not reused efficiently, the performance will rely on the disk which is much slower than memory, and will generate a lot of disk IO and stress the storage. So, by looking at the PLE normalized to 4GB and per numa node, we can spot the opportunity for optimization and investigate further (which objects are using the most of buffer cache memory, the distribution of “hotness” of it’s indexes and partitions etc).
Use the query above to get per NUMA, 4GB normalized values of PLE. Then you can use the fixed treshold of 300 to raise an alarm or start the investigation.
UPDATE: 28.2.2018 Corrected “all” row in query