Instructions for PostgreSQL SQL HINT - HINT, instructions, PostgreSQL, SQL

This article comes from: http://www.023dns.com/Database_mssql/5974.html

The PostgreSQL optimizer is cost-based (CBO), (of course, if GEQO is turned on After the number of association tables exceeds a certain threshold, GEQO will be used. This is mainly because in the case of too many association tables, the exhaustive method may bring huge PLAN overhead, so the execution plan output by GEQO is not necessarily optimal )

　　 What this article is going to talk about has nothing to do with GEQO, it is mainly related to CBO.

　　 When PostgreSQL uses CBO, can it always output the best execution plan every time?

　　1. First, let’s take a look at what factors CBO has examined and how does it calculate the cost?

　　 Cost and scanning methods, association methods, operators, cost factors, data sets, etc. All related, the specific calculation method can refer to the following code:

　　src/backend/optimizer/path/costsize.c

　　 Let’s briefly list here, which factors will affect the cost calculation For the results, see costsize.c for the specific algorithm:

　　– How many records are in the table, and the cost of CPU processing records that affect the full table scan.

　　– How many data are in the table Block, which affects the cost of scanning data blocks; for example, full table scans, index scans, all need to scan data blocks.

　　– Cost factor, which affects the calculation result of the cost; for example, continuous or random scanning of a single data block Cost factor, the cost factor of the CPU processing a record from the HEAP block, the cost factor of processing an index record from the INDEX block, and the cost factor of executing an operator or function.

　　– Data storage physical order and index The dispersion of the order affects the calculation cost of the index scan.

　　– The memory size, affects the calculation cost of the index scan.

　　– Column statistics (column width, null ratio , Unique value ratio, high frequency value and its ratio, bucket, the dispersion of physical order and index order, array statistics, etc.), affect selectivity, that is, the number of rows in the result set, and ultimately affect the index scan Calculate the cost.

　　– The cost set when creating a function or operator.

　　2. Then let’s see which factors CBO did not take into account, and which factors CBO took into account , But it may change at any time.

　　 Whether PostgreSQL can be dynamic Keep up with these changes?

　　2.1 PostgreSQL enables automatic analysis, and the factors that can be updated in time are as follows:

　　– How many data blocks the table has, the number of records, update pg_class.relpages, pg_class. reltuples

　　– column statistics, data storage physical order and index order dispersion, update pg_statistic

　　2.2 static configuration factors:

　　– Actually available Memory used as a cache, because other programs may be running in the operating system where the database is located, the memory that can be used as a cache may change. Even if no other programs are running, when work_mem is heavily used in the database session, it will This causes the memory that can be used as a cache to change.

　　– The cost set when a function or operator is created. When the function changes due to internal SQL or processing logic, it may cause the processing time of the function itself to change.

/p>

　　2.3 Unconsidered factors:

　　– Pre-reading of block devices. Generally, 128KB of data will be pre-read when reading once.

# blockdev –getra /dev/sda

　　256

　　 What effect does this have? If the data you want to read is in a continuous 128KB data block, you only need it once Block device IO. For the database, the cost is calculated regardless of how many data blocks are scanned when scanning data. Therefore, for different block device pre-reading configurations, or for different block devices (such as mechanical disks and SSDs), The scanning cost may be different. The performance of the PostgreSQL block device is reflected in the cost calculation, which is seq_page_cost, random_page_cost.

　　 These two parameters can be set for the table space, that is, for different table spaces, you can Set different values. For example, we have tablespaces created on SSDs and tablespaces created on ordinary mechanical disks. Of course, we need to set different seq_page_cost and random_page_cost values.

　　 But for pre-reading, If there is a change, it will have a slight impact on the actual performance. Generally, it should not change the read ahead of the block device all the time.

　　2.4 generic plan cache, that is, the execution plan cache.

　　PostgreSQL chooses to re-plan the execution plan through choose_custom_plan Planning or using the cached execution plan. When the cost of the cached plan is greater than the average cost of custom, the custom plan will be selected, so when the statistical information is correct, the problem of the cached execution plan can be found in time and a new execution plan can be planned in time. /p>

For details of 　　, please see: src/backend/utils/cache/plancache.c

　　2.5 The sampling accuracy parameter default_statistics_target affects the number of buckets and the accuracy of sampling.

After some analysis, PostgreSQL uses CBO, will it be able to output the optimal execution plan “every time”?

　　1. The first thing to do is to ensure that the artificial cost factor is accurate, and it is also necessary to turn on the automatic analyze (Update column statistics, blocks, dispersion, etc.),

　　2. There are also some factors that affect the cost that are statically configured: such as the memory that can be used as BUFFER, the cost of the function.

　　3. There is nothing to consider: pre-reading (very small).

　　 In most cases, if we set a reasonable configuration, then it is rarely necessary to use hints. Except for the above The two points mentioned in 2,3.

At the same time, the hint also has serious drawbacks. If the hint is written in the program code, once the execution plan needs to be changed, the program code needs to be changed, which is not flexible.

　　 Of course, we do not rule out another starting point for using HINT, such as debugging. I want to see if the execution efficiency under different execution plans is the same as imagined.

　　 (we can also Use switches to control the execution plan, but isn’t it more straightforward to have HINT)

　　 In the long run, if only from a performance point of view, it is more reliable to continuously improve the optimizer of the database itself. But for requirements such as debugging, it is more convenient to have HINT.

　　Entering the topic, most Oracle users will ask if PG has SQL hints after they come into contact with PostgreSQL?

　　In order to allow the database to output the execution plan according to the user’s ideas, PostgreSQL generally provides some switches, such as turning off the full table scan and letting it go to the index.

　　 Turning off the index scan and letting it go to the bitmap or Scan the whole table, close the nested loop, and let him go hash join or merge join, etc.

　　 But only with these switches, it is not very easy to use, so is there a direct HINT?

　　There is a plug-in that can solve your problem: pg_hint_plan.

　　pg_hint_plan uses the hook interface opened by PostgreSQL, so the HINT injection function is implemented without changing the PG code.

/*

　　* Module load callbacks

　　void

　　_PG_init(void)

{

　　…

　　}

　　 Due to different PostgreSQL versions, the code in the plan part may be inconsistent, so pg_hint_plan is also the source code released by version.

　　 For example, I want to test this tool in PostgreSQL 9.4.1.

　　 Next test:

　　Install

　　# wget http: //iij.dl.sourceforge.jp/pghintplan/62456/pg_hint_plan94-1.1.3.tar.gz

　　# tar -zxvf pg_hint_plan94-1.1.3.tar.gz

# cd pg_hint_plan94-1.1.3

　　[[emailprotected] pg_hint_plan94-1.1.3]# export PATH=/opt/pgsql/bin:$PATH

　　[[emailprotected] pg_hint_plan94-1.1.3]# which psql

　　/opt/pgsql/bin/psql

　　[[email Protected] pg_hint_plan94-1.1.3]# psql -V

　　psql (PostgreSQL) 9.4.1

　　# gmake clean

　　# gmake

　　# gmake install

　　[[emailprotected ] pg_hint_plan94-1.1.3]# ll -rt /opt/pgsql/lib|tail -n 1

　　-rwxr-xr-x 1 root root 78K Feb 18 09:31 pg_hint_plan.so

　　[[email protected] pg_hint_plan94-1.1.3]# su-postgres

　　$ vi $PGDATA/postgresql.conf

　　shared_preload_libraries =’pg_hint_plan’

　　pg_hint_plan.enable_hint = on

　　pg_hint_plan.debug_print = on

　　pg_hint_plan.log

>　　$ pg_ctl restart -m fast

　　[email protected]> psql

　　psql (9.4.1)

　　Type “help” for help.

　　postgres=# create extension pg_hint_plan;

　　CREATE EXTENSION

　　 Usage example:

　　postgres=# create table a(id int primary key, info text, crt_time timestamp);

　　CREATE TABLE

　　postgres=# create table b(id int primary key, info text, crt_time timestamp);

　　CREATE TABLE

　　postgres=# insert into a select generate_series(1,100000),’a_’||md5(random()::text), clock_timestamp();

　　INSERT 0 100000

　　postgres=# insert into b select generate_series(1,100000),’b_’||md5(random()::text), clock_timestamp();

INSERT 0 100000

　　postgres=# analyze a;

　　ANALYZE

　　postgres=# analyze b;

　　ANALYZE

postgres=# explain select a.*,b.* from a,b where a.id=b.id and a.id<10;< p="">

　　QUERY PLAN

　　——————————————— ————————–

Nested Loop (cost=0.58..83.35 rows=9 width=94)

　　-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)

Index Cond: (id <10)

-> Index Scan using b_pkey on b (cost=0.29..8.31 rows=1 width=47)

　　Index Cond: (id = a.id)

　　(5 rows)

　　When there is no pg_hint_plan, we need to use a switch to change the execution plan of PostgreSQL

　　postgres=# set enable_nestloop=off;

　　SET

postgres=# explain select a.*,b.* from a,b where a.id=b.id and a.id<10;< p="">

　　QUERY PLAN

　　——————————————— ——————————–

　　Hash Join (cost=8.56..1616.65 rows= 9 width=94)

　　Hash Cond: (b.id = a.id)

　　-> Seq Scan on b (cost=0.00..1233.00 rows=100000 width=47)

　　-> Hash (cost=8.45..8.45 rows=9 width=47)

　　- > Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)

Index Cond: (id <10)

　　(6 rows)

　　postgres=# set enable_nestloop=on;

　　SET

postgres=# explain select a.*,b.* from a,b where a.id=b. id and a.id<10;< p="">

　　QUERY PLAN

　　——————– ————————————————– –

　　Nested Loop (cost=0.58..83.35 rows=9 width=94)

　　-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width =47)

Index Cond: (id <10)

　　-> Index Scan using b_pkey on b (cost=0.29..8.31 rows=1 width=47)

　　Index Cond: (id = a.id)

　　(5 rows)

　　 Use pg_hint_plan to change the PostgreSQL execution plan, as shown below:

postgres=# /*+

　　HashJoin(ab)

　　SeqScan(b)

*/ explain select a.*,b.* from a ,b where a.id=b.id and a.id<10;< p="">

　　QUERY PLAN

　　———- ————————————————– —————–

　　Hash Join (cost=8.56..1616.65 rows=9 width=94)

　　Hash Cond: (b.id = a.id)

　　-> Seq Scan on b (cost=0.00..1233.00 rows=100000 width=47)

　　-> Hash (cost=8.45 ..8.45 rows=9 width=47)

　　-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)

Index Cond: ( id <10)

　　(6 rows)

postgres=# /*+ SeqScan(a) */ explain select * from a where id<10;< p="">

　　QUERY PLAN

　　———————————- ——————–

　　Seq Scan on a (cost=0.00..1483.00 rows=10 width=47)

Filter: (id <10)

　　(2 rows)

postgres=# /*+ BitmapScan(a) */ explain select * from a where id<10;< p="">

　　QUERY PLAN

　　—————————– —————————————-

　　Bitmap Heap Scan on a (cost=4.36..35.17 rows=9 width=47)

Recheck Cond: (id <10)

　　-> Bitmap Index Scan on a_pkey (cos t=0.00..4.36 rows=9 width=0)

Index Cond: (id <10)

　　(4 rows)

　　 currently supported by pg_hint_plan HINT

　　http://pghintplan.sourceforge.jp/hint_list.html

　　The available hints are listed below.

　　 Share a picture ?

　　 share picture ?

　　[Reference]

　　1. http://pghintplan.sourceforge. jp/pg_hint_plan-en.html

　　2. http://pghintplan.sourceforge.jp/pg_hint_plan.html

　　3. http://pghintplan.sourceforge.jp/hint_list. html

　　4. http://pghintplan.sourceforge.jp/

　　5. src/backend/optimizer/path/costsize.c

　　6. src /backend/utils/cache/plancache.c

Leave a Comment Cancel reply