PostgreSQL 9.6 Parallel Aggregation

PostgreSQL 9.6 adds support for parallel aggregation.

With 9.6, PostgreSQL introduces initial support for parallel execution
of large queries. Only strictly read-only queries where the driving
table is accessed via a sequential scan can be parallelized. Hash
joins and nested loops can be performed in parallel, as can
aggregation (for supported aggregates). Much remains to be done, but
this is already a useful set of features.

>Which aggregates are the "supported aggregates" mentioned above?
>Are there any special considerations when designing an aggregate function so that it can be executed in parallel?

The PostgreSQL 9.6 User-defined Aggregates documentation now mentions parallel aggregation:

35.10.4. Partial Aggregation

Optionally, an aggregate function can support partial aggregation. The
idea of partial aggregation is to run the aggregate’s state transition
function over different subsets of the input data independently, and
then to combine the state values resulting from those subsets to
produce the same state value that would have resulted from scanning
all the input in a single operation. This mode can be used for
parallel aggregation by having different worker processes scan
different portions of a table. Each worker produces a partial state
value, and at the end those state values are combined to produce a
final state value. (In the future this mode might also be used for
purposes such as combining aggregations over local and remote tables;
but that is not implemented yet.)
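
The idea can be illustrated outside the database. The following is a minimal Python sketch, not PostgreSQL code: it models an average whose state is a (sum, count) pair, lets three simulated workers aggregate different portions of the input, and then merges the partial states. The function names (avg_transition, avg_combine, avg_final) and the way the rows are split are assumptions made purely for illustration.

```python
from functools import reduce

# State for an average: (running sum, row count).
AVG_INITIAL = (0, 0)

def avg_transition(state, value):
    """State transition function: fold one input row into the state."""
    total, count = state
    return (total + value, count + 1)

def avg_combine(left, right):
    """Merge two partial states into the state for the union of their rows."""
    return (left[0] + right[0], left[1] + right[1])

def avg_final(state):
    """Final function: turn the state into the aggregate result."""
    total, count = state
    return total / count if count else None

def aggregate(rows):
    """Run the transition function over one subset of the input."""
    return reduce(avg_transition, rows, AVG_INITIAL)

rows = list(range(1, 101))

# Simulate three workers, each scanning a different portion of the table.
chunks = [rows[:40], rows[40:75], rows[75:]]
partial_states = [aggregate(chunk) for chunk in chunks]

# Merge the partial states, then compare with a single serial pass.
combined_state = reduce(avg_combine, partial_states, AVG_INITIAL)
serial_state = aggregate(rows)

print(combined_state == serial_state)  # True
print(avg_final(combined_state))       # 50.5
```

The check that matters is the first one: merging the partial states yields exactly the state a single scan would have produced, which is the property the documentation describes.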

To support partial aggregation, the aggregate definition must provide a combine function, which takes two values of the aggregate’s state type (representing the results of aggregating over two subsets of the input rows) and produces a new value of the state type, representing what the state would have been after aggregating over the combination of those sets of rows. It is unspecified what the relative order of the input rows from the two sets would have been. This means that it’s usually impossible to define a useful combine function for aggregates that are sensitive to input row order.
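
To see why order sensitivity is a problem, the following Python sketch (again an illustration, not PostgreSQL code) merges the same partial states in every possible order and checks whether the result changes. The helper name order_independent and the sample partial states are hypothetical. Merging partial states in different orders is used here as a simple stand-in for the unspecified row order between subsets.

```python
from functools import reduce
from itertools import permutations

def combine_sum(left, right):
    """Combine function for a sum: addition does not care about order."""
    return left + right

def combine_concat(left, right):
    """Combine function for string concatenation: order changes the result."""
    return left + right

def order_independent(combine_fn, partial_states):
    """Return True if every merge order produces the same final state."""
    results = {
        reduce(combine_fn, ordering)
        for ordering in permutations(partial_states)
    }
    return len(results) == 1

sum_partials = [10, 25, 7]             # partial sums from three workers
concat_partials = ["ab", "cd", "ef"]   # partial strings from three workers

print(order_independent(combine_sum, sum_partials))        # True
print(order_independent(combine_concat, concat_partials))  # False
```

A sum gives the same answer however the partial states arrive, so it combines cleanly. A concatenation gives a different string for each arrival order, so no combine function can reproduce the result of a single ordered scan.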
