PostgreSQL – Postgres Query Optimization

Below are two almost identical Postgres queries that produce very different query plans and execution times. I assume the first query is fast because form_id = 'W40' matches only 196 form_instances records, while form_id = 'W30L' matches about 7000. But why does the jump from 200 records to 7000 records (which seems relatively small to me) cause such a dramatic increase in query time? I have tried indexing the data in various ways to speed it up, but I'm basically stuck. How can I speed it up? (Note that the schemas of both tables are included at the bottom.)

explain analyze select form_id,form_instance_id,answer,field_id
from form_instances,field_instances
where workflow_state ='DRqueued' and form_instance_id = form_instances.id
and field_id in ('Book_EstimatedDueDate','H_SubmittedDate','H_Ccode','miscarriage','miscarriage_of_multiple','stillbirth','AP_IUFD_of_multiple','maternal_death','birth_includes_transport','newborn_death','H_Pid','H_Mid1','H_Mid2','H_Mid3')
and (form_id ='W40');

QUERY PLAN
Nested Loop  (cost=0.00..70736.14 rows=4646 width=29) (actual time=0.000..20.000 rows=2399 loops=1)
  ->  Index Scan using form_id_and_workflow_state on form_instances  (cost=0.00..1041.42 rows=507 width=8) (actual time=0.000..0.000 rows=196 loops=1)
        Index Cond: (((form_id)::text = 'W40'::text) AND ((workflow_state)::text = 'DRqueued'::text))
  ->  Index Scan using index_field_instances_on_form_instance_id on field_instances  (cost=0.00..137.25 rows=17 width=25) (actual time=0.000..0.102 rows=12 loops=196)
        Index Cond: (field_instances.form_instance_id = form_instances.id)
        Filter: ((field_instances.field_id)::text = ANY ('{Book_EstimatedDueDate,H_SubmittedDate,H_Ccode,miscarriage,miscarriage_of_multiple,stillbirth,AP_IUFD_of_multiple,maternal_death,birth_includes_transport,newborn_death,H_Pid,H_Mid1,H_Mid2,H_Mid3}'::text[]))
Total runtime: 30.000 ms
(7 rows)

explain analyze select form_id,form_instance_id,answer,field_id
from form_instances,field_instances
where workflow_state ='DRqueued' and form_instance_id = form_instances.id
and field_id in ('Book_EstimatedDueDate','H_SubmittedDate','H_Ccode','miscarriage','miscarriage_of_multiple','stillbirth','AP_IUFD_of_multiple','maternal_death','birth_includes_transport','newborn_death','H_Pid','H_Mid1','H_Mid2','H_Mid3')
and (form_id ='W30L');

QUERY PLAN
Hash Join  (cost=34300.46..160865.40 rows=31045 width=29) (actual time=65670.000..74960.000 rows=102777 loops=1)
  Hash Cond: (field_instances.form_instance_id = form_instances.id)
  ->  Bitmap Heap Scan on field_instances  (cost=29232.57..152163.82 rows=531718 width=25) (actual time=64660.000..72800.000 rows=526842 loops=1)
        Recheck Cond: ((field_id)::text = ANY ('{Book_EstimatedDueDate,H_SubmittedDate,H_Ccode,miscarriage,miscarriage_of_multiple,stillbirth,AP_IUFD_of_multiple,maternal_death,birth_includes_transport,newborn_death,H_Pid,H_Mid1,H_Mid2,H_Mid3}'::text[]))
        ->  Bitmap Index Scan on index_field_instances_on_field_id  (cost=0.00..29099.64 rows=531718 width=0) (actual time=64630.000..64630.000 rows=594515 loops=1)
              Index Cond: ((field_id)::text = ANY ('{Book_EstimatedDueDate,H_SubmittedDate,H_Ccode,miscarriage,miscarriage_of_multiple,stillbirth,AP_IUFD_of_multiple,maternal_death,birth_includes_transport,newborn_death,H_Pid,H_Mid1,H_Mid2,H_Mid3}'::text[]))
  ->  Hash  (cost=5025.54..5025.54 rows=3388 width=8) (actual time=980.000..980.000 rows=10457 loops=1)
        ->  Bitmap Heap Scan on form_instances  (cost=90.99..5025.54 rows=3388 width=8) (actual time=10.000..950.000 rows=10457 loops=1)
              Recheck Cond: (((form_id)::text = 'W30L'::text) AND ((workflow_state)::text = 'DRqueued'::text))
              ->  Bitmap Index Scan on form_id_and_workflow_state  (cost=0.00..90.14 rows=3388 width=0) (actual time=0.000..0.000 rows=10457 loops=1)
                    Index Cond: (((form_id)::text = 'W30L'::text) AND ((workflow_state)::text = 'DRqueued'::text))
Total runtime: 75080.000 ms
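A rough reading of the cost figures in the two plans shows why the plan flips. This is a back-of-envelope sketch, not the planner's exact formula. It ignores per-tuple CPU charges. It also assumes that the inner index scan for W30L would cost the same 137.25 per loop as it does for W40.

$$
\begin{aligned}
C_{\text{nested loop}} &\approx C_{\text{outer}} + N_{\text{outer}} \times C_{\text{inner per loop}} \\
\text{W40:}\quad &1041.42 + 507 \times 137.25 \approx 70{,}627 \quad (\text{reported: } 70{,}736.14) \\
\text{W30L:}\quad &5025.54 + 3388 \times 137.25 \approx 470{,}029 \;>\; 160{,}865.40 \quad (\text{Hash Join})
\end{aligned}
$$

The estimated nested-loop cost grows in proportion to the estimated number of matching form_instances rows. With the W30L estimate, the planner therefore prefers the hash join. That plan has to read about 527,000 field_instances rows through the bitmap scan before the join discards most of them, and that is where most of the 75 seconds goes.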

# \d form_instances
Table "public.form_instances"
Column | Type | Modifiers
-----------------+-----------------------------+-------------------------------------------------------------
id | integer | not null default nextval('form_instances_id_seq'::regclass)
form_id | character varying(255) |
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
created_by_id | integer |
updated_by_id | integer |
workflow | character varying(255) |
workflow_state | character varying(255) |
validation_data | text |
Indexes:
"form_instances_pkey" PRIMARY KEY, btree (id)
"form_id_and_workflow_state" btree (form_id, workflow_state)
"index_form_instances_on_form_id" btree (form_id)
"index_form_instances_on_workflow_state" btree (workflow_state)

# \d field_instances
Table "public.field_instances"
Column | Type | Modifiers
------------------+-----------------------------+-------------------------------------------------------------
id | integer | not null default nextval('field_instances_id_seq'::regclass)
form_instance_id | integer |
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
created_by_id | integer |
updated_by_id | integer |
field_id | character varying(255) |
answer | text |
state | character varying(255) |
explanation | text |
idx | integer | not null default 0
Indexes:
"field_instances_pkey" PRIMARY KEY, btree (id)
"field_instances__lower_answer" btree (lower(answer))
"index_field_instances_on_answer" btree (answer)
"index_field_instances_on_field_id" btree (field_id)
"index_field_instances_on_field_id_and_answer" btree (field_id, answer)
"index_field_instances_on_form_instance_id" btree (form_instance_id)
"index_field_instances_on_idx" btree (idx)

This started as a comment, but since it seems to have solved the problem, I'm posting it as an actual answer.

The planner's estimates of how many rows each step will return are off. In the second query, it estimated 3388 rows for the bitmap index scan on form_id_and_workflow_state, but it actually got 10457 rows. That suggests the table statistics may be out of date.
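To check whether the statistics really are stale, compare what ANALYZE last recorded with the real row count. The following is a diagnostic sketch using the standard pg_stat_user_tables and pg_stats system views. The table and column names are taken from the question.

```sql
-- When were these tables last vacuumed / analyzed (manually or by autovacuum)?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('form_instances', 'field_instances');

-- What the planner currently believes about the two filtered columns
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'form_instances'
  AND attname IN ('form_id', 'workflow_state');

-- The real number of matching rows (the plan above reports 10457)
SELECT count(*)
FROM form_instances
WHERE form_id = 'W30L' AND workflow_state = 'DRqueued';
```

If last_analyze and last_autoanalyze are old or empty, or the most-common-value frequencies do not match the real counts, refreshing the statistics is the first thing to try.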

So you may want to run VACUUM FULL ANALYZE;

Other commands that can help a great deal include REINDEX and/or CLUSTER.
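Here is a sketch of what that could look like for the two tables in the question. The CLUSTER step, and its choice of index_field_instances_on_form_instance_id, is an assumption for illustration and not part of the original answer. VACUUM FULL and CLUSTER take an ACCESS EXCLUSIVE lock on the table, and REINDEX blocks writes, so run these during a maintenance window.

```sql
-- Rewrite the tables to reclaim dead space and refresh planner statistics.
-- Note: VACUUM cannot be run inside a transaction block.
VACUUM FULL ANALYZE form_instances;
VACUUM FULL ANALYZE field_instances;

-- Rebuild every index on each table (removes index bloat).
REINDEX TABLE form_instances;
REINDEX TABLE field_instances;

-- Optional, hypothetical choice of index: store field_instances rows in
-- form_instance_id order, so the fields of one form sit in adjacent pages.
-- (CLUSTER ... USING is the PostgreSQL 8.3+ syntax.)
CLUSTER field_instances USING index_field_instances_on_form_instance_id;

-- CLUSTER changes the physical row order, so refresh the statistics again.
ANALYZE field_instances;
```

CLUSTER is a one-time reordering: rows inserted later are not kept in index order, so it has to be repeated periodically if it turns out to help.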

The OP reported that VACUUM did not help, but reindexing did.