Bruce Momjian: Clustering a Table

Having written over 600 blog entries, I thought I would have already covered the complexities of the
CLUSTER command, but it seems I have not, so let’s do that now.

CLUSTER is an unusual SQL command because, like non-unique CREATE INDEX, it only affects performance. In fact,
CLUSTER requires the existence of an index. So, what does CLUSTER do? Well, what does CREATE INDEX do? Let’s look at how storage works in Postgres.

User data rows are stored in heap files in the file system, and those rows are stored in an indeterminate order. Even if the table is initially
loaded in INSERT/COPY order, later inserts, updates, and deletes will cause rows to be added in unpredictable order in the heap files. CREATE INDEX creates a secondary file with entries
pointing to heap rows, and index entries are ordered to match the values in the columns specified in the CREATE INDEX command. Because the index is ordered, desired values can be found in it quickly, and
its pointers can then be followed to the matching heap rows.
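
To make the heap/index relationship concrete, here is a minimal Python sketch. It is only a toy model under loud assumptions: the names heap, build_index, and index_lookup are hypothetical, the "heap" is a plain list in arrival order, and the "index" is a sorted list of (key, heap position) pairs. Postgres itself uses on-disk pages and B-tree structures, so this shows the idea, not the implementation.

```python
import bisect

# Toy heap: rows sit in arrival order, not sorted by any column.
# (Hypothetical data, purely for illustration.)
heap = [(42, "delta"), (7, "alpha"), (99, "echo"), (15, "bravo"), (23, "charlie")]


def build_index(heap, column=0):
    """Return sorted (key, heap_position) pairs, a stand-in for CREATE INDEX."""
    return sorted((row[column], pos) for pos, row in enumerate(heap))


def index_lookup(index, heap, key):
    """Binary-search the ordered index, then follow pointers into the heap."""
    # (key, -1) sorts before every real entry for that key, because
    # heap positions are never negative.
    i = bisect.bisect_left(index, (key, -1))
    matches = []
    while i < len(index) and index[i][0] == key:
        matches.append(heap[index[i][1]])  # follow the pointer to the heap row
        i += 1
    return matches


index = build_index(heap)
print(index)                          # entries are in key order
print(index_lookup(index, heap, 15))  # the heap row(s) whose key is 15
```

In this sketch the keys in the index are ordered, but the heap positions they point to jump around, which mirrors the point above: the index is sorted while the heap is not.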

