Kirk Roybal: PostgreSQL 14: PostgreSQL Is Still a Tweeny

This article was a bit harder to write than it would seem on the surface. Amid all the hustle and bustle of the current development cycle, it was hard to find a particular focus of what this effort was all about. Then it finally dawned on me. There wasn’t one.

That may sound a bit harsh for the developers working on it, but please let me explain. PostgreSQL is caught in the middle of a maelstrom. As it continually tries to grow up, the meaning of being a grown up keeps changing.

Over the last 20 years, we have seen NoSQL, data warehousing, reporting, and telemetry applications completely change the playing field. We have also seen advancements in indexing and retrieval, file systems, and memory management. The hardware underneath PostgreSQL has changed several times, not to mention virtualization and cloud operation. Of course, this isn’t an exhaustive list, but you get the idea.

The rate of change has been so astounding that it beggars the imagination that the team has been able to keep up with it for so long. And it’s not slowing down any in the near future. There is quite a long laundry list of things that the community keeps asking about.

The PostgreSQL development group has responded to all those changes as fast as possible. They created better monitoring, better management, more locking for less blocking, and revamped processes to be more cooperative and generally increased performance to keep up with the demand.

The PostgreSQL 14 release is a continuation of that effort. I was hoping for a super shocking title like “PostgreSQL no longer needs VACUUM!”. That would have been awesome for search results and response. What I really got was “a lot of people made a huge effort to improve most of the things that needed improving.” Not quite the shocker that drives searches, but not to be underestimated in value.

Like any other tweeny, these improvements are in search of what PostgreSQL is going to be someday. Improvement because this is the stage of life PostgreSQL is in, not because it is particularly driving anywhere.

 Performance Improvements in PostgreSQL 14

The biggest single feature to point out is the connection handling at peak capacities. PostgreSQL has dramatically improved for connection counts above 5000. The pgAnalyze guys have a nice article about that.

Things that now take advantage of background processes to do it in parallel:

  • REFRESH MATERIALIZED VIEW
  • RETURN QUERY
  • Queries to foreign data wrappers

Improvements to CONCURRENTLY:

  • ALTER TABLE DETACH PARTITION CONCURRENTLY FINALIZE
  • Allow index commands using CONCURRENTLY to avoid waiting for the completion of other operations using CONCURRENTLY
  • CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY no longer limit the dead row removal of other relations.

I would be remiss not to mention VACUUM improvements. These improvements are all under the hood, but this is one of the areas of improvement most needed by the community.

 Visibility Improvements

The next largest block of changes has to do with getting visibility on running processes. Here’s a list of all of the things you couldn’t see before, but you can see now.

  • Status of a COPY operation in pg_stat_progress_copy
  • Status of the write ahead log is in pg_stat_wal
  • Status of replication is now in pg_stat_replication_slots
  • Status of the backend query handling is in pg_backend_memory_contexts, which you may also force into the log using pg_log_backend_memory_contexts
  • New columns in pg_stat_database
  • Added the archiver to pg_stat_activity
  • Added lock wait start time to pg_locks
  • Added a view in information_schema called routine_column_usage to track columns referenced by function and procedure default expressions.
  • Added %P to log_line_prefix to see the group leader PID
  • Added server parameter log_recovery_conflict_waits to report long recovery conflict wait times
  • Added function pg_get_wal_replay_pause_state() to report the recovery state
  • Added function pg_xact_commit_timestamp_origin() to return the commit timestamp and replication origin of the specified transaction
  • Added the replication origin to the record returned by pg_last_committed_xact()
  • pg_stat_statements tracks top and nested statements separately
  • Added cumulative –verbose (-vvv) flags to several utilities for screen logging

 Other Improvements in PostgreSQL 14

PostgreSQL also grew quite a few functions, operators, data types and expressions. Some of these are convenience functions like string_to_table(). Most of them bring us closer to SQL standard compliance.

It sprouted a few global configuration parameters (most notably compute_query_id) and improved tuning for huge pages.

And it improved a few hundred other things. The release notes are quite extensive, well beyond repeating for the sake of this article.

What Will PostgreSQL Be When it Grows Up?

That is a very hard question to answer, and it gets harder to answer all the time. Let’s not make the mistake of looking at the future based on how the environment looks today. That is the mistake that the relational database community made when NoSQL disrupted our thinking.

Let’s also not make the mistake of thinking that what the market wants makes any sense.  Ultimately, the “scalable” products are just SQL products that don’t use relations and enforce a narrow architectural design. We didn’t really need additional products for that, but that’s what it took to change the mindset of the relational crowd.

What we’ve learned as a community from not paying attention is that the project can either be ahead of the public demand or behind it. Behind it is probably a safer place for the project as a whole to be. Predicting the demand is hard and time consuming. It costs global development group effort that gets thrown away when the fad ends. It provides services that nobody needed if they actually knew how the technology stack worked (e.g. caching layers to help with inefficiencies caused by SQL hate). It divides the community into “us” and “them”, which are arbitrary distinctions used to feel comfortable in ivory towers.

Waiting until the newfangled thing becomes mature and the uses of it are well-defined makes a lot more sense for the broader PostgreSQL community. This allows the development of incremental features with very little code bloat. It allows the users to keep the existing features, solutions and investment. It allows the concept to fail if it wasn’t worth doing to begin with. Everybody gets to participate in the conversation, even if the idea is ultimately scrapped. Focusing the resources on the ideas that survive is ultimately good for everyone involved in the project. Let’s face it, development resources are not infinite and wasted time means lower ability to adapt in the future.

So how do we predict the future of the project? We wait for it to happen and deal with the result after the winner picks itself. This is a philosophy that is only possible because the product is open source. We don’t have any sales projections to meet. There are no investors and backers to satisfy this quarter. The users are the contributors and the developers are the reviewers. There is no reason to take a big risk of investment when the reward is a pat on the back, but the risk is that your project dies and is never heard from again.

There are a few obvious improvements to be done to PostgreSQL. Of course, the holy grail is to make VACUUM suckless and prevent transaction wraparound. But those are hardly revolutionary new concepts, just some hard work.

Some larger conceptual things that I think could stand the test of time are sharding and mesh replication. After that, the list gets a lot murkier. Maybe cloud computing support would land here, but that’s really just a catalyst for fixing things so they don’t have to run as a superuser.

Mostly the future of PostgreSQL is secured by the adaptability of the development group and the attitude of providing a mature, consistent product over time. Notice that the previous statement didn’t need to say anything about money, trends, marketing or even specific features.

So how do we say what a grown up PostgreSQL will look like? We don’t. Not any more than a tweeny could say what they will choose to be in the future. We can only decide what quality of character we will have and what our outlook will be. And then be fine with who we are because we made the right choices.

The post PostgreSQL 14: PostgreSQL Is Still a Tweeny appeared first on Instaclustr.

PostgreSQL