Frits Hoogland: All about pg_stat_statements

Almost everyone using PostgreSQL, as well as the users of YugabyteDB, are using pg_stat_statements for looking at historical execution. However, there are some things that lots of people keep on wondering about, which is what this blogpost is about.

First about pg_stat_statements itself. Pg_stat_statements is an extension, and not part of the core database. However, it is a so-called ‘contrib extension’, which comes with the postgres source. The functionality of pg_stat_statements sits in a library called pg_stat_statements.so, and postgres has to be configured to load the library via the setting shared_preload_libraries.

The way extensions in postgres work is very interesting, although beyond the scope of the article to dig in too deep. The postgres executable code contains ‘hooks’, which allow extensions such as pg_stat_statements to attach itself to, which means that the extension functionality gets triggered/executed when the postgres execution triggers such an hook. A great in-depth presentation can be found here. The hooks that pg_stat_statements uses be seen here.

configuration

pg_stat_statements allocates a fixed amount of shared memory at server startup time to allocate a state struct and a hash table. This can be a fixed size allocation because pg_stat_statements stores the SQL text in a file, and stores the offset in the hash table.

The amount of statements pg_stat_statements can store is set with pg_stat_statements.max and by default is set to 5000.

By default, pg_stat_statements saves the statements data into a file, so the information is kept over server stop and start. This behaviour is set with pg_stat_statements.save, which is on by default.

pg_stat_statements can be configured to track statements in the following ways by setting pg_stat_statements.track:

  • none: no statements are tracked.
  • top: (default) only ‘top level statements’ are tracked. This means SQL inside functions and procedures are not tracked.
  • all: this tracks all statements, including ‘nested’ statements.

If you are in the process of tuning or investigating a database that uses functions and/or procedures, setting pg_stat_statements.track to all can be very useful to see the SQL that is executed inside the functions and procedures.

pg_stat_statements by default also tracks so-called ‘utility’ statements, which are statements other than SELECT, INSERT, UPDATE and DELETE. This can be changed by setting pg_stat_statements.track_utility to off.

the SQL

The way pg_stat_statements deals with the SQL text is something that needs to be understood to make use of pg_stat_statements.

First of all: if an execution does not finish successfully, the execution is NOT recorded in pg_stat_statements, despite potentially a lot of work done.

When a statement is put in the pg_stat_statements hash table, it tries to group equivalent queries/statements. For that, pg_stat_statements uses a hash key:

typedef struct pgssHashKey
{
    Oid         userid;         /* user OID */
    Oid         dbid;           /* database OID */
    uint64      queryid;        /* query identifier */
} pgssHashKey;

This reveals that any statement is uniquely identified by user OID and database OID additional to a query ID. So a truly identical SQL executed using another user will generate its own statement in pg_stat_statements. The statement is identified by a query identifier.

This query identifier is generated in an interesting way, for which it is useful to have an understanding of how that is computed. In short, the query is normalised, which postgres internally refers to as ‘query jumble’, which means it parses the query tree nodes for essential fields and generates a 64-bit hash, which is stored as query identifier. Because a query tree stores the OIDs, identical queries in text, but with different OIDs, will generate a different query identifier.

If a new pg_stat_statements entry is created, which is when the generated hash key is not found, pg_stat_statements takes the query text and creates a ‘representative query’ of it, which means constants are replaced with parameter symbols (‘$n’). This means the layout, and any extra comments in the SQL text are kept and stored at pg_stat_statements entry creation time. Any query that generates the same hash key that is successfully executed after the pg_stat_statements entry is created will change (add to) the statistics, but does not change the query text.

The query text in pg_stat_statements not changing can potentially be confusing in cases where for example the first execution includes a hint that later is changed: the hint does not take part in the query jumble that generates the query id, and thus changed hints will generate the same query id, and thus execution in that situation will add to the pg_stat_statements query that shows the original hints when the query was added to pg_stat_statements. Obviously this requires the postgres pg_hint_plan extension to be in use, core postgres doesn’t support hints.

linking pg_stat_activity.query with pg_stat_statements.query

The view pg_stat_activity shows current activity, including the original/unaltered query text (up to 1024 bytes). The original query text is changed to a ‘representative query’ query text when the query is added to pg_stat_statements.

This means that the query text cannot be used to map a query in pg_stat_activity to a query in pg_stat_statements. In fact, up to postgres version 13 there is no way to get absolute proof of a query being the same between pg_stat_activity and pg_stat_statements. With version 14 pg_stat_activity is improved with the query_id column, which allows postgres to compute the query id, which is the only way that I am aware of to link a query between pg_stat_activity and pg_stat_statements.

the query time

In PostgreSQL versions below version 13, pg_stat_statements keeps ‘total_time’, and statistical derivations of it such as min/max/avg/stddev time. This is (sadly) NOT the actual query total time. This time is the time spent in the execution phase of a postgres statement, and thus excludes the parse, rewrite and plan phases.

In PostgreSQL versions 13 and higher, pg_stat_statements is more clear about what the timing includes, and calls the timing ‘total_plan_time’ and ‘total_exec_time’, which are more obvious. Please mind this means it is still not keeping track of the full query time, which includes the parse and rewrite phases.

reset pg_stat_statements

Using the pg_stat_statements_reset() function, which can be called using select pg_stat_statements_reset();, the pg_stat_statements view can emptied/reset.

PostgreSQL