Performing an hours-of-operation query in PostgreSQL

I am on a RoR stack and had to write some actual SQL for a query that returns all records that are currently "open", meaning the current time falls within the specified hours of operation. In the hours_of_operations table, two integer columns opens_on and closes_on store a weekday, and two time fields opens_at and closes_at store the respective time of day.

I wrote a query that compares the current date and time to the stored values, but I am wondering whether there is a way to cast to some sort of date type and have PostgreSQL do the rest?

The meat of the query is:

WHERE (
(
/* Opens in Future */
(opens_on > 5 OR (opens_on = 5 AND opens_at::time > '2014-03-01 00:27:25.851655'))
AND (
(closes_on 5)
OR ((closes_on = opens_on)
AND (closes_at::time '2014-03-01 00:27:25.851655'))
OR ((closes_on = 5)
AND (closes_at::time > '2014-03-01 00:27:25.851655' AND closes_at::time OR

/* Opens in Past */
(opens_on < 5 OR (opens_on = 5 AND opens_at::time < '2014-03-01 00:27:25.851655'))
AND
(closes_on > 5)
OR
((closes_on = 5)
AND (closes_at::time > '2014-03-01 00:27:25.851655'))
OR (closes_on OR ((closes_on = opens_on)
AND (closes_at::time )
)
)

The reason for the intense complexity is that hours of operation may wrap around the end of the week, for example, starting at noon on Sunday and running through 6 a.m. on Monday. Since I store values in UTC, there are many cases in which the user's local time wraps in a very strange way. The query above ensures that you can enter any two times of the week and we compensate for the wrapping.

Table layout

Re-design the table and store opening hours (hours of operation) as a set of tsrange (range of timestamp without time zone) values. Requires Postgres 9.2 or later.

Pick an arbitrary week to stage your opening hours. I like the week:
1996-01-01 (Monday) to 1996-01-07 (Sunday)
1996 is a leap year in which January 1 conveniently happens to be a Monday. But it can be any week for this case. Just be consistent.
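A quick check, offered as a sketch, that a candidate staging week really starts on a Monday. This matters because date_trunc('week', ...), which the normalization function further down relies on, truncates to Monday 00:00 (ISO weeks):

```sql
-- isodow numbers Monday as 1 and Sunday as 7
SELECT extract(isodow FROM date '1996-01-01') AS isodow_start  -- 1 (Monday)
     , extract(isodow FROM date '1996-01-07') AS isodow_end    -- 7 (Sunday)
     , date_trunc('week', timestamp '1996-01-01 00:00')
       = timestamp '1996-01-01 00:00'         AS starts_week;  -- true
```

If you pick a different week, the same three checks should hold for its first and last day.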

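Install the additional module btree_gist first. It provides the GiST operator classes needed to combine the plain integer column shop_id (compared with =) and the range column hours in a single exclusion constraint: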

CREATE EXTENSION btree_gist;

Create the table like this:

CREATE TABLE hoo (
   hoo_id  serial PRIMARY KEY
 , shop_id int NOT NULL REFERENCES shop(shop_id)  -- reference to shop
 , hours   tsrange NOT NULL
 , CONSTRAINT hoo_no_overlap EXCLUDE USING gist (shop_id WITH =, hours WITH &&)
 , CONSTRAINT hoo_bounds_inclusive CHECK (lower_inc(hours) AND upper_inc(hours))
 , CONSTRAINT hoo_standard_week CHECK (hours <@ tsrange '[1996-01-01 0:0, 1996-01-08 0:0]')
);

The single column hours replaces all of your columns:

opens_on, closes_on, opens_at, closes_at

For instance, hours of operation from Wednesday, 18:30 to Thursday, 05:00 UTC are entered as:

'[1996-01-03 18:30, 1996-01-04 05:00]'
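As a small sketch, the literal can be cast to tsrange and inspected with the built-in range functions and the "contains element" operator @>. The column aliases are only illustrative:

```sql
SELECT lower(r)                           AS opens         -- 1996-01-03 18:30:00
     , upper(r)                           AS closes        -- 1996-01-04 05:00:00
     , r @> timestamp '1996-01-03 23:00'  AS open_wed_23   -- true
     , r @> timestamp '1996-01-04 06:00'  AS open_thu_06   -- false
FROM  (SELECT tsrange '[1996-01-03 18:30, 1996-01-04 05:00]') t(r);
```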

The exclusion constraint hoo_no_overlap prevents overlapping entries per shop. It is implemented with a GiST index, which also happens to support our queries. See the chapter "Index and performance" below for a discussion of indexing strategies.

The check constraint hoo_bounds_inclusive enforces inclusive boundaries for your ranges, with two noteworthy consequences:

> A point in time falling exactly on the lower or upper boundary is always included.
> Adjacent entries for the same shop are effectively disallowed. With inclusive bounds, those would "overlap" and the exclusion constraint would raise an exception. Adjacent entries must be merged into a single row instead. The exception is entries that wrap around Sunday midnight, which must be split into two rows. See the function f_hoo_hours() below.
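The second consequence can be seen with the overlap operator && alone. A minimal sketch with two made-up adjacent ranges, once with inclusive bounds and once with an exclusive upper bound:

```sql
SELECT tsrange('1996-01-03 09:00', '1996-01-03 12:00', '[]')
    && tsrange('1996-01-03 12:00', '1996-01-03 18:00', '[]') AS inclusive_overlap  -- true
     , tsrange('1996-01-03 09:00', '1996-01-03 12:00', '[)')
    && tsrange('1996-01-03 12:00', '1996-01-03 18:00', '[)') AS half_open_overlap; -- false
```

With '[]' both ranges contain 12:00, so the exclusion constraint would reject the second row for the same shop.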

The check constraint hoo_standard_week enforces the outer bounds of the staging week using the "range is contained by" operator <@.

With inclusive bounds, you have to observe a corner case where the time wraps around at Sunday midnight:

'1996-01-01 00:00+0' = '1996-01-08 00:00+0'
Mon 00:00 = Sun 24:00 (= next Mon 00:00)

You have to search for both timestamps at once. Here is a related case with an exclusive upper bound that does not exhibit this shortcoming:

> Preventing adjacent/overlapping entries with EXCLUDE in PostgreSQL
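To illustrate why both timestamps are needed, here is a sketch with a made-up entry that runs from Sunday noon until Sunday 24:00, which is stored as the upper end of the staging week:

```sql
SELECT r @> timestamp '1996-01-08 00:00' AS found_as_sun_24  -- true
     , r @> timestamp '1996-01-01 00:00' AS found_as_mon_00  -- false
FROM  (SELECT tsrange '[1996-01-07 12:00, 1996-01-08 00:00]') t(r);
```

Both values denote the same moment of the week, but only one of them lies inside the stored range. A lookup for exactly Monday 00:00 therefore has to probe both ends of the staging week.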

Function f_hoo_time(timestamptz)

To "normalize" any given timestamp with time zone:

CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz)
  RETURNS timestamp AS
$func$
SELECT date '1996-01-01'
     + ($1 AT TIME ZONE 'UTC' - date_trunc('week', $1 AT TIME ZONE 'UTC'))
$func$ LANGUAGE sql IMMUTABLE;

The function takes timestamptz and returns timestamp. It adds the elapsed interval of the respective week ($1 - date_trunc('week', $1)) in UTC time (!) to the starting point of our staging week. (date + interval produces timestamp.)
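Two sample calls, as a sketch with arbitrarily chosen input values. 2016-01-13 is a Wednesday; the second input is Monday 00:00 at offset +04, which is still Sunday 20:00 in UTC:

```sql
SELECT f_hoo_time(timestamptz '2016-01-13 18:30+00') AS wed_evening   -- 1996-01-03 18:30:00
     , f_hoo_time(timestamptz '2016-01-11 00:00+04') AS sun_evening;  -- 1996-01-07 20:00:00
```

Because the function converts to UTC explicitly, the result does not depend on the time zone setting of the session.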

Function f_hoo_hours(timestamptz, timestamptz)

To normalize ranges and split those crossing Monday 00:00. This function takes any interval (as two timestamptz) and produces one or two normalized tsrange values. It covers any legal input and disallows the rest:

CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz, _to timestamptz)
  RETURNS TABLE (hoo_hours tsrange) AS
$func$
DECLARE
   ts_from timestamp := f_hoo_time(_from);
   ts_to   timestamp := f_hoo_time(_to);
BEGIN
   -- test input for sanity (optional)
   IF _to <= _from THEN
      RAISE EXCEPTION '%', '_to must be later than _from!';
   ELSIF _to > _from + interval '1 week' THEN
      RAISE EXCEPTION '%', 'Interval cannot span more than a week!';
   END IF;

   IF ts_from > ts_to THEN  -- split range at Mon 00:00
      RETURN QUERY
      VALUES (tsrange('1996-01-01 0:0', ts_to, '[]'))
           , (tsrange(ts_from, '1996-01-08 0:0', '[]'));
   ELSE                     -- simple case: range in standard week
      hoo_hours := tsrange(ts_from, ts_to, '[]');
      RETURN NEXT;
   END IF;

   RETURN;
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 1000 ROWS 1;

To INSERT a single input row:

INSERT INTO hoo(shop_id, hours)
SELECT 123, f_hoo_hours('2016-01-11 00:00+04', '2016-01-11 08:00+04');

This results in two rows if the range needs splitting at Monday 00:00 UTC.
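To see the split without inserting anything, the function can be called on its own. A sketch using the example from the question (Sunday noon to Monday 6 a.m., here with made-up dates in UTC):

```sql
-- 2016-01-10 is a Sunday, 2016-01-11 a Monday
SELECT * FROM f_hoo_hours('2016-01-10 12:00+00', '2016-01-11 06:00+00');
-- ["1996-01-01 00:00:00","1996-01-01 06:00:00"]
-- ["1996-01-07 12:00:00","1996-01-08 00:00:00"]
```

The Monday morning part lands at the start of the staging week and the Sunday part at its end.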

To insert multiple input rows:

INSERT INTO hoo(shop_id, hours)
SELECT id, hours
FROM  (
   VALUES (7, timestamp '2016-01-11 00:00', timestamp '2016-01-11 08:00')
        , (8, '2016-01-11 00:00', '2016-01-11 08:00')
   ) t(id, f, t), f_hoo_hours(f, t) hours;  -- LATERAL join

About the implicit LATERAL join:

> What is the difference between LATERAL and a subquery in PostgreSQL?

Query

With the adjusted design, your whole big, complex, expensive query can be replaced with this:

SELECT *
FROM   hoo
WHERE  hours @> f_hoo_time(now());

The query is backed by said GiST index and is fast, even for big tables.
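The same predicate also answers narrower questions. A sketch that checks whether one particular shop is open right now, and which shops were open at a given moment; the shop id and the timestamp are made-up values:

```sql
-- Is shop 123 (hypothetical id) open right now?
SELECT EXISTS (
   SELECT 1
   FROM   hoo
   WHERE  shop_id = 123
   AND    hours @> f_hoo_time(now())
   ) AS is_open;

-- Which shops were open on Wednesday 2016-01-13 at 20:00 UTC?
SELECT DISTINCT shop_id
FROM   hoo
WHERE  hours @> f_hoo_time(timestamptz '2016-01-13 20:00+00');
```

Neither query handles the Monday 00:00 corner case described above; for that exact moment both ends of the staging week would have to be probed.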

SQL Fiddle (with more examples).

If you want to calculate total opening hours (per shop), here is a recipe:

> Calculate working hours between 2 dates in PostgreSQL

Index and performance

The containment operator for range types can be supported with a GiST or SP-GiST index. Either can be used to implement an exclusion constraint, but only GiST supports multicolumn indexes:

Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.

And the order of index columns matters:

A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.

So we have conflicting interests here. For big tables, there will be many more distinct values for shop_id than for hours.

> A GiST index with leading shop_id is faster to write and to enforce the exclusion constraint.
> But we are searching on the hours column in our query. Having that column first would be better.
> If we need to look up shop_id in other queries, a plain btree index is much faster for that.
> To top it off, I found an SP-GiST index on just hours to be fastest for the query.

Benchmark

My script to generate dummy data:

INSERT INTO hoo(shop_id, hours)
SELECT id, hours
FROM   generate_series(1, 30000) id, generate_series(0, 6) d
     , f_hoo_hours(((date '1996-01-01' + d) + interval '4h' + interval '15 min' * trunc(32 * random())) AT TIME ZONE 'UTC'
                 , ((date '1996-01-01' + d) + interval '12h' + interval '15 min' * trunc(64 * random() * random())) AT TIME ZONE 'UTC') AS hours
WHERE  random() > .33;

This results in 141k randomly generated rows, 30k distinct shop_id, 12k distinct hours. (Typically the difference will be bigger.) Table size is 8 MB.

I dropped and recreated the exclusion constraint:

ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap
   EXCLUDE USING gist (shop_id WITH =, hours WITH &&);  -- 4.4 sec !!

ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap
   EXCLUDE USING gist (hours WITH &&, shop_id WITH =);  -- 16.4 sec

Putting shop_id first is ~4x faster.

In addition, I tested two more indexes for read performance:

CREATE INDEX hoo_hours_gist_idx   ON hoo USING gist (hours);
CREATE INDEX hoo_hours_spgist_idx ON hoo USING spgist (hours);  -- !!

After VACUUM FULL ANALYZE hoo; I ran two queries:

> Q1: late at night, finding only 53 rows
> Q2: in the afternoon, finding 2423 rows

Results

I got an index-only scan for each (except for "no index", of course):

index                   idx size   Q1         Q2
--------------------------------------------------------
no index                           41.24 ms   41.2 ms
gist (shop_id, hours)     8MB      14.71 ms   33.3 ms
gist (hours, shop_id)    12MB       0.37 ms    8.2 ms
gist (hours)             11MB       0.34 ms    5.1 ms
spgist (hours)            9MB       0.29 ms    2.0 ms  -- !!

> SP-GiST and GiST are on par for queries finding few results (GiST is even faster for very few).
> SP-GiST scales better with a growing number of results, and is smaller, too.

If you read a lot more than you write (the typical use case), keep the exclusion constraint as suggested at the outset and create an additional SP-GiST index to optimize read performance.
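Put together, that recommendation might look like the sketch below. It assumes a Postgres version where SP-GiST supports range types and where CREATE INDEX accepts IF NOT EXISTS; drop that clause on older versions. The index name is the one used in the benchmark:

```sql
-- The exclusion constraint hoo_no_overlap (GiST on shop_id, hours) stays in place.
-- Add a separate SP-GiST index on just the range column for reads.
CREATE INDEX IF NOT EXISTS hoo_hours_spgist_idx ON hoo USING spgist (hours);

-- Check which index the planner actually picks for the "open now" query
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   hoo
WHERE  hours @> f_hoo_time(now());
```

Timings will differ from the table above depending on data distribution and hardware, so it is worth repeating the comparison on your own data.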
