Page Life Expectancy and 300 sec limit

People blogged about it, including excellent Jonathan’s post here (highly recommended to read), but I will give a little more insight here. Basically, Microsoft says if PLE is below 300 seconds, we should worry about memory pressure (an old recommendation from 2006. still appearing today here and there). Jonathan shows how 300 sec threshold is obsolete, and that it should be per every 4GB of RAM. That makes a big difference on current servers which have much more RAM than 4GB. I will split it down to the NUMA node level, because today’s servers not only have plenty of RAM, but also plenty of CPU’s which are grouped into NUMA nodes, and so is the memory divided into memory nodes.

That means the PLE limit is better calculated per memory node. If we calculate it globally, it might hide the fact that a single memory node falls way below the treshold, because other numas are in good standing, healthy. If we look only global PLE we will not notice that one numa node’s PLE is not ok.

The Formula

The Jonathan’s formula for the PLE treshold to worry about is “(DataCacheSizeInGB/4GB)*300”. That means PLE of your system should normally be higher that that number. If you have 4GB of cached data, your treshold is 300 secs. If you have server with a lot of RAM, e.g. 40GB is cached, your PLE treshold should be 3 000 seconds. Basically, PLE should be 300 sec per every 4GB of data you have cached in the buffer pool.

This is the query of the PLE per NUMA node, normalized to 4GB, baked ready for you:

Example1

This server has more than 64GB of RAM, and a lot of CPU’s (around 80 cores) divided into 4 hardware NUMA nodes. The query result is below. All NUMAs have good PLE (ple_per_4GB is > 300 sec):

ple_per_4GB_ok

We can see that the values are not equal. ple_per_4gb of the best numa node is more than 2x larger than the worst numa, so global PLE might not be detailed enough in some situations and you may want to go to the NUMA level.

Example2

This server has a bit less CPUs than in previous example, but still a bunch of them, so they are divided into 2 hardware NUMA nodes. Server has a lot more memory, around 800 GB, quite a beast! One would expect that would be enough to have a good PLE. But surprisingly, the ple_per_4GB is below 300 sec:

ple_per_4GB_notok

This server has a problem. If memory is not reused efficiently, the performance will rely on the disk which is much slower than memory, and will generate a lot of disk IO and stress the storage. So, by looking at the PLE normalized to 4GB and per numa node, we can spot the opportunity for optimization and investigate further (which objects are using the most of buffer cache memory, the distribution of “hotness” of it’s indexes and partitions etc).

Summary

Use the query above to get per NUMA, 4GB normalized values of PLE. Then you can use the fixed treshold of 300 to raise an alarm or start the investigation.

4 comments on “Page Life Expectancy and 300 sec limit
  1. JRStern says:

    “This server has a problem”.

    OK, but with 400gb per node WHY does that server have a problem?

    We can guess: something is happening to flush the cache every few minutes. So it is possible that memory is plentiful and efficiently used, except for these events.

    You never say otherwise, but much discussion about PLE is in the context of “memory pressure”, and I just like to point out that while PLE is an excellent indicator, what it indicates may not really be memory pressure at all.

    Thanks.

    • Vedran Vedran says:

      Good thinking. PLE tells us how long average page stays in buffer pool. If we do only inserts (ETL into data warehouse), and no user is doing selects, PLE will be related to the speed at we are loading the data. If we have 800GB of RAM like this server, and PLE of 14 000 sec, that could be achieved by loading the data at speed 57 MB/sec (800GB / 14000 sec). Quite achievable. To detect memory pressure just PLE is not enough. But it is an indicator we should look at together with other to get a real picture of the situation.

  2. Van says:

    This is what i was looking for. Excellent and thank you!
    One quick question, I used your query both in SQL 2012 and 2008 with multiple numa node servers. I am getting result set only on the SQL 2012 servers not in the 2008 server. I am not sure why and trying to find how I can use the same query for SQL 2008 to get the PLE value for each numa node.
    Thank you in advance for any suggestion.

    • Vedran Vedran says:

      It is because the counter “Database Node Memory (KB)” does not exist in SQL2008R2. I adjusted the query to use “Database pages” instead, so it shows results even on versions prior to SQL2012, plus a little bonus: “ALL” nodes row.
      Thank you Van!

Leave a Reply

Your email address will not be published. Required fields are marked *

*