PostgreSQL ORDER BY query optimization

I encountered a small problem here.

SELECT DISTINCT ON ("reporting_processedamazonsnapshot"."offer_id") *
FROM "reporting_processedamazonsnapshot" INNER JOIN
"offers_boooffer"
ON ("reporting_processedamazonsnapshot"."offer_id" =
"offers_boooffer"."id") INNER JOIN
"offers_offersettings"
ON ("offers_boooffer"."id" = "offers_offersettings"."offer_id")
WHERE "offers_offersettings"."account_id" = 20
ORDER BY "reporting_processedamazonsnapshot"."offer_id" ASC,
"reporting_processedamazonsnapshot"."scraping_date" DESC

I have an index called latest_scraping on (offer_id ASC, scraping_date DESC), but for some reason PostgreSQL still sorts after using the index, which causes huge performance problems.
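For reference, the actual definition of the index can be read from the pg_indexes system view. The CREATE INDEX statement in the comment is only an assumption about what that definition looks like, based on the index name and column order given above.

```sql
-- Show how the index is really defined.
SELECT indexdef
FROM pg_indexes
WHERE tablename = 'reporting_processedamazonsnapshot'
  AND indexname = 'latest_scraping';

-- Assumed shape of the result (not confirmed by the question):
-- CREATE INDEX latest_scraping
--     ON reporting_processedamazonsnapshot (offer_id ASC, scraping_date DESC);
```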

I don’t understand why it doesn’t use the sorted data instead of redoing the sorting. Is my index wrong? Or should I try to query in another way?

This is the EXPLAIN (ANALYZE) output with the actual data:

Unique  (cost=21260.47..21263.06 rows=519 width=1288) (actual time=38053.685..38177.348 rows=1783 loops=1)
  ->  Sort  (cost=21260.47..21261.76 rows=519 width=1288) (actual time=38053.683..38161.478 rows=153095 loops=1)
        Sort Key: reporting_processedamazonsnapshot.offer_id, reporting_processedamazonsnapshot.scraping_date DESC
        Sort Method: external merge  Disk: 162088kB
        ->  Nested Loop  (cost=41.90..21237.06 rows=519 width=1288) (actual time=70.874..36148.348 rows=153095 loops=1)
              ->  Nested Loop  (cost=41.47..17547.90 rows=1627 width=8) (actual time=54.287..126.740 rows=1784 loops=1)
                    ->  Bitmap Heap Scan on offers_offersettings  (cost=41.04..4823.48 rows=1627 width=4) (actual time=52.532..84.102 rows=1784 loops=1)
                          Recheck Cond: (account_id = 20)
                          Heap Blocks: exact=38
                          ->  Bitmap Index Scan on offers_offersettings_account_id_fff7a8c0  (cost=0.00..40.63 rows=1627 width=0) (actual time=49.886..49.886 rows=4132 loops=1)
                                Index Cond: (account_id = 20)
                    ->  Index Only Scan using offers_boooffer_pkey on offers_boooffer  (cost=0.43..7.81 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1784)
                          Index Cond: (id = offers_offersettings.offer_id)
                          Heap Fetches: 1784
              ->  Index Scan using latest_scraping on reporting_processedamazonsnapshot  (cost=0.43..1.69 rows=58 width=1288) (actual time=0.526..20.146 rows=86 loops=1784)
                    Index Cond: (offer_id = offers_boooffer.id)
Planning time: 187.133 ms
Execution time: 38195.266 ms

To use the index to avoid the sort, PostgreSQL would first have to scan all of "reporting_processedamazonsnapshot" in index order, then join all of "offers_boooffer" with a nested loop join (to preserve the order), and then join all of "offers_offersettings", again with a nested loop join.

Only at the very end would all rows that do not match the condition "offers_offersettings"."account_id" = 20 be discarded.

PostgreSQL thinks (correctly, in my opinion) that it is more efficient to first use the condition to reduce the number of rows as much as possible, then join the tables with the most efficient join method, and only then sort for the DISTINCT clause.
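One way to see this trade-off for yourself is to discourage explicit sorts for a single transaction and compare the resulting plan and timing with the one above. This is only a diagnostic sketch: enable_sort is a standard planner setting, but it merely discourages sorts, so the planner may still sort if it finds no alternative. The table aliases are introduced here for brevity, and note that EXPLAIN (ANALYZE) really executes the query, so this can take a while.

```sql
BEGIN;

-- Discourage explicit sort steps; SET LOCAL lasts only until the transaction ends.
SET LOCAL enable_sort = off;

-- Same query as in the question, written with aliases.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (r.offer_id) *
FROM reporting_processedamazonsnapshot r
JOIN offers_boooffer bo ON r.offer_id = bo.id
JOIN offers_offersettings ofs ON bo.id = ofs.offer_id
WHERE ofs.account_id = 20
ORDER BY r.offer_id ASC, r.scraping_date DESC;

ROLLBACK;
```

If the plan without the Sort node turns out slower, that supports the planner's original choice.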

I want to know if the following query might be faster:

SELECT DISTINCT ON (q.offer_id) *
FROM offers_offersettings ofs
JOIN offers_boooffer bo ON bo.id = ofs.offer_id
CROSS JOIN LATERAL
(SELECT *
FROM reporting_processedamazonsnapshot r
WHERE r.offer_id = bo.id
ORDER BY r.scraping_date DESC
LIMIT 1) q
WHERE ofs.account_id = 20
ORDER BY q.offer_id ASC, q.scraping_date DESC;

The execution plan will be similar, except that far fewer rows have to be fetched through the index, which should reduce the execution time where it hurts most.

If you want to speed up the sort, increase work_mem for the query to about 500MB (if you can afford it).
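A minimal sketch of raising work_mem for just this query rather than for the whole server, assuming the session is allowed to use that much memory. The plan above shows the sort spilling about 162 MB to disk ("external merge"); an in-memory sort usually needs more than the on-disk size, which is why a value around 500MB is suggested. The aliases are again only for brevity.

```sql
BEGIN;

-- Affects only this transaction; other sessions keep the configured default.
SET LOCAL work_mem = '500MB';

-- If the sort now fits in memory, the plan should report an in-memory
-- sort method (for example quicksort) instead of "external merge  Disk: ...".
EXPLAIN (ANALYZE)
SELECT DISTINCT ON (r.offer_id) *
FROM reporting_processedamazonsnapshot r
JOIN offers_boooffer bo ON r.offer_id = bo.id
JOIN offers_offersettings ofs ON bo.id = ofs.offer_id
WHERE ofs.account_id = 20
ORDER BY r.offer_id ASC, r.scraping_date DESC;

COMMIT;
```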
