Group of time and space proximity trajectories in R or PostgreSQL

I am using R and PostgreSQL to perform some trajectory analysis. In order to form a group of trajectory segments whose continuous positions are close in time and space, I created the following table. What I still lack is the column group_id, which This is where my problem lies.

bike_id1 datetime bike_id2 near group_id
1 2016-05-28 11:00:00 2 TRUE 1
1 2016-05-28 11:00:05 2 TRUE 1
1 2016-05-28 11:00:10 2 FALSE NA
[...]
2 2016-05- 28 11:00:05 3 TRUE 1
2 2016-05-28 11:00:10 3 TRUE 1

This is the combination of each track and other tracks (all combinations without repetition) The result of multiple comparisons between and the internal connection of the date and time (always sampling on multiples of 5 seconds). It shows that for some positions, bicycles 1 and 2 are sampled at the same time and are close in space (some arbitrary threshold) .

Now I want to give a unique ID to the section (group_id) where two bicycles are close in time. This is where I am stuck: I want group_id to respect groups with multiple tracks The method of assigning group_id should realize that if bicycles 1 and 2 are in a group at 2016-05-28 11:00:05, then if they are close to 2 at the same timestamp, then 3 belongs to the same group (2016-05-28 11 :00:05).

Is there a tool in R or PostgreSQL that can help me with this task? Running a loop in a table seems to be the wrong way.

Edit:
As @wildplasser pointed out, this seems to be a gap and island problem traditionally solved with SQL. He kindly made it Some sample data I have slightly expanded and will be included in the question.

CREATE TABLE nearness
- (seq SERIAL NOT NULL UNIQUE - surrogate for conveniance
(bike1 INTEGER NOT NULL
, bike2 INTEGER NOT NULL
, stamp timestamp NOT NULL
, near boolean
, PRIMARY KEY(bike1,bike2,stamp)
);
INSERT INTO nearness( bike1,bike2,stamp,near) VALUES
(1,2, '2016-05-28 11:00:00', TRUE)
, (1,2, '2016-05-28 11:00:05', TRUE)
,(1,2, '2016-05-28 11:00:10', TRUE)
, (1,2, '2016-05-28 11:00:20', TRUE) - <<-- gap here
,(1,2, '2016-05-28 11:00:25' , TRUE)
,(1,2, '2016-05-28 11:00:30', FALSE)
,(4,5, '2016-05-28 11:00:00' , FALSE)
,(4,5, '2016-05-28 11:00:05', FALSE)
,(4,5, '2016-05-28 11:00:10' , TRUE)
,(4,5, '2016-05-28 11:00:15', TRUE)
,(4,5, '2016-05-28 11:00:20' , TRUE)
,(2,3, '2016-05-28 11:00:05', TRUE)- -<<-- bike 1, 2, 3 are in one grp @ 11:00:05
,(2,3, '2016-05-28 11:00:10', TRUE) - << - no group here
,(6,7, '2016-05-28 11:00:00', FALSE)
,(6,7, '2016-05-28 11:00: 05', FALSE)
;

Update: [After understanding the real problem ; –] Finding the equivalence group of bicycles (set, bike_set) is actually a relational division problem. Finding the beginning and end of a cluster (clust) in a group of bicycles is basically the same as in the first attempt.

>The cluster is stored in an array: (I believe the cluster will not become too large)
>The array is constructed by a recursive query: each pair of bicycles sharing a member of the current cluster will be merged into it .
>Finally, the array contains all the bike_ids that happen to be within reach at a specific time.
>(plus some middle rows that need to be suppressed by uniq CTE later)
>The rest is in the time series or More or less standard gap detection.

Note: code trust (bike2> bike1). This needs to keep the array sorted to be standardized. The actual content is not guaranteed to be standardized, because there is no guarantee that the recursive query will be added Sequence. This may require some extra work.

CREATE TABLE nearness
(bike1 INTEGER NOT NULL
, bike2 INTEGER NOT NULL
, stamp timestamp NOT NULL
, near boolean
, PRIMARY KEY(bike1,bike2,stamp)
);
INSERT INTO nearness( bike1,bike2,stamp,near) VALUES
(1,2, '2016-05-28 11:00:00', TRUE)
,(1,2, '2016-05-28 11: 00:05', TRUE)
,(1,2, '2016-05-28 11:00:10', TRUE)
,(1,2, '2016-05-28 11: 00:20', TRUE) - <<-- gap here
,(1,2, '2016-05-28 11:00:25', TRUE)
,(1,2, '2016-05-28 11:00:30', FALSE) - <<-- these False-records serve no pupose
,(4,5, '2016-05-28 11:00:00' , FALSE) - <<-- result would be the same without them
,(4,5, '2016-05-28 11:00:05', FALSE)
,(4,5 , '2016-05-28 11:00:10', TRUE)
,(4,5, '2016-05-28 11:00:15', TRUE)
,(4,5 , '2016-05-28 11:00:20', TRUE)
,(2,3, '2016-05-28 11:00:05', TRUE) - <<-- bike 1, 2, 3 are in one grp @ 11:00:05
,(2,3, '2016-05-28 11:00:10', TRUE) - <<-- no group here
,(6,7, '2016-05-28 11:00:00', FALSE)
,(6,7, '2016-05-28 11:00:05', FALSE)
;


- Recursive union-find to glue together sets of bike_ids
- ,occuring at the same moment.
- Sets are represented as { ordered,unique} arrays here
WITH RECURSIVE wood AS (
WITH omg A S (
SELECT bike1 ,bike2,stamp
, row_number() OVER(ORDER BY bike1,bike2,stamp) AS seq
, ARRAY[bike1,bike2]::integer[] AS arr
FROM nearness n WHERE near = True
)
- Find all existing combinations of bikes
SELECT o1.stamp, o1.seq
, ARRAY[o1.bike1 ,o1.bike2]::integer[] AS arr
FROM omg o1
UNION ALL
SELECT o2.stamp, o2.seq - avoid duplicates inside the array
, CASE when o2.bike1 = ANY(w.arr) THEN w.arr || o2.bike2
ELSE w.arr || o2.bike1 END AS arr
FROM omg o2
JOIN wood w
ON o2.stamp = w.stamp AND o2.seq> w.seq
AND (o2.bike1 = ANY(w.arr) OR o2.bike2 = ANY(w.arr))
AND NOT (o2.bike1 = ANY(w.arr) AND o2.bike2 = ANY(w.arr))
)
, uniq AS (- suppress partial sets caused by the recursive union-find buildup
SELECT * FROM wood w
WHERE NOT EXISTS (SELECT * FROM wood nx
WHERE nx.stamp = w.stamp
AND nx. arr @> w.arr AND nx.arr <> w.arr - contains but not equal
)
)
, xsets AS (- make unique sets of bikes
SELECT DISTINCT arr
-, MIN(seq) AS grp
FROM uniq
GROUP BY arr
)
, sets AS (- enumerate the sets of bikes< br /> SELECT arr
, row_number() OVER () AS setnum
FROM xsets
)
, drag AS (- Detect beginning and end of segments of consecutive observations
SELECT u.* - within a constant set of bike_ids
- Edge-detection begin of group
, NOT EXISTS (SELECT * FROM uniq nx
WHERE nx.arr = u. arr
AND nx.stamp AND nx.stamp >= u.stamp-'5 sec'::interval
) AS is_first
- Edge-detection end of group
, NOT EXISTS (SELECT * FROM uniq nx< br /> WHERE nx.arr = u.arr
AND nx.stamp> u.stamp
AND nx.stamp <= u.stamp + '5 sec'::interval
) AS is_last
, row_number() OVER(ORDER BY arr,stamp) AS nseq
FROM uniq u
)
, top AS (- id and groupnum for the start of a group
SELECT nseq
, row_number() OVER () AS clust
FROM drag
WHERE is_first
)
, bot AS (- id and groupnum for the end of a group
SELECT nseq
, row_number() OVER () AS clust
FROM drag
WHERE is_last
)
SELECT w.seq as orgseq - results, please ...
, w.stamp
, g0.clust AS clust
, row_number() OV ER(www) AS rn
, s.setnum, s.arr AS bike_set
FROM drag w
JOIN sets s ON s.arr = w.arr
JOIN top g0 ON g0.nseq <= w.seq
JOIN bot g1 ON g1.nseq >= w.seq AND g1.clust = g0.clust
WINDOW www AS (PARTITION BY g1.clust ORDER BY w.stamp )
ORDER BY g1.clust, w.stamp
;

Result:

orgseq | stamp | clust | rn | setnum | bike_set 
--------+---------------------+-------+---- +--------+----------
1 | 2016-05-28 11:00:00 | 1 | 1 | 1 | {1,2}
4 | 2016-05-28 11:00:20 | 3 | 1 | 1 | {1,2}
5 | 2016-05-28 11:00:25 | 3 | 2 | 1 | { 1,2}
6 | 2016-05-28 11:00:05 | 4 | 1 | 3 | {1,2,3}
7 | 2016-05-28 11:00:10 | 4 | 2 | 3 | {1,2,3}
8 | 2016-05-28 11:00:10 | 4 | 3 | 2 | {4,5}
(6 rows)

I am using R and PostgreSQL to perform some trajectory analysis. In order to form a group of trajectory segments whose continuous positions are close in time and space, I Created the following table. What I am still missing is the column group_id, which is my problem.

bike_id1 datetime bike_id2 near group_id
1 2016-05 -28 11:00:00 2 TRUE 1
1 2016-05-28 11:00:05 2 TRUE 1
1 2016-05-28 11:00:10 2 FALSE NA
[...]
2 2016-05-28 11:00:05 3 TRUE 1
2 2016-05-28 11:00:10 3 TRUE 1

This is The result of multiple comparisons between each trajectory and other trajectories (all combinations without repetition) and the internal connection of the date and time (always sampled on multiples of 5 seconds). It shows that for some positions, bikes 1 and 2 Sampled at the same time and close in space (some arbitrary threshold).

Now I want to give a unique ID to the segment (group_id) where two bicycles are close in time. This is how I am stuck. The place: I want group_id to respect groups with multiple trajectories. The method of assigning group_id should be aware that if bikes 1 and 2 are in a group at 2016-05-28 11:00:05, then if they are close at the same timestamp 2, then 3 belong to the same group (2016-05-28 11:00:05).

Is there a tool in R or PostgreSQL that can help me accomplish this task? Running a loop in a table seems to be the wrong way.

Edit:
As @wildplasser pointed out, this seems to be a gap and island problem traditionally solved with SQL. He kindly made it Some sample data I have slightly expanded and will be included in the question.

CREATE TABLE nearness
- (seq SERIAL NOT NULL UNIQUE - surrogate for conveniance
(bike1 INTEGER NOT NULL
, bike2 INTEGER NOT NULL
, stamp timestamp NOT NULL
, near boolean
, PRIMARY KEY(bike1,bike2,stamp)
);
INSERT INTO nearness( bike1,bike2,stamp,near) VALUES
(1,2, '2016-05-28 11:00:00', TRUE)
, (1,2, '2016-05-28 11:00:05', TRUE)
,(1,2, '2016-05-28 11:00:10', TRUE)
, (1,2, '2016-05-28 11:00:20', TRUE) - <<-- gap here
,(1,2, '2016-05-28 11:00:25' , TRUE)
,(1,2, '2016-05-28 11:00:30', FALSE)
,(4,5, '2016-05-28 11:00:00' , FALSE)
,(4,5, '2016-05-28 11:00:05', FALSE)
,(4,5, '2016-05-28 11:00:10' , TRUE)
,(4,5, '2016-05-28 11:00:15', TRUE)
,(4,5, '2016-05-28 11:00:20' , TRUE)
,(2,3, '2016-05-28 11:00:05', TRUE) - < <-- bike 1, 2, 3 are in one grp @ 11:00:05
,(2,3, '2016-05-28 11:00:10', TRUE) - <<-- no group here
,(6,7, '2016-05-28 11:00:00', FALSE)
,(6,7, '2016-05-28 11:00:05' , FALSE)
;

Update: [After understanding the real problem; –] Find the equivalence group of bicycles (set, bike_set) It is actually a relational division problem. Finding the beginning and end of a cluster in a group of bicycles is basically the same as in the first attempt.

>The cluster is stored in an array: (I believe the cluster will not become too big)
>The array is constructed by a recursive query: every pair of bicycles that share a member with the current cluster will be merged into it.
>Finally, the array contains tentacles that happen to be at a specific time All bike_id available.
>(plus some middle rows that need to be suppressed by uniq CTE later)
>The rest is more or less standard gap detection in the time series.

Note: code trust (bike2> bike1). This needs to keep the array sorted to be standardized. The actual content is not guaranteed to be standardized, because the order of addition in the recursive query cannot be guaranteed. This may require some extra work.

CREATE TABLE nearness
(bike1 INTEGER NOT NULL
, bike2 INTEGER NOT NULL
, stamp timestamp NOT NULL
, near boolean
>, PRIMARY KEY(bike1,bike2,stamp)
);
INSERT INTO nearness( bike1,bike2,stamp,near) VALUES
(1,2, '2016-05-28 11 :00:00', TRUE)
,(1,2, '2016-05-28 11:00:05', TRUE)
,(1,2, '2016-05-28 11 :00:10', TRUE)
,(1,2, '2016-05-28 11:00:20', TRUE) - <<-- gap here
,(1,2, '2016-05-28 11:00 :25', TRUE)
,(1,2, '2016-05-28 11:00:30', FALSE) - <<-- these False-records serve no pupose
,( 4,5, '2016-05-28 11:00:00', FALSE) - <<-- result would be the same without them
,(4,5, '2016-05-28 11: 00:05', FALSE)
,(4,5, '2016-05-28 11:00:10', TRUE)
,(4,5, '2016-05-28 11: 00:15', TRUE)
,(4,5, '2016-05-28 11:00:20', TRUE)
,(2,3, '2016-05-28 11: 00:05', TRUE) - <<-- bike 1, 2, 3 are in one grp @ 11:00:05
,(2,3, '2016-05-28 11:00:10 ', TRUE) - <<-- no group here
,(6,7, '2016-05-28 11:00:00', FALSE)
,(6,7, '2016 -05-28 11:00:05', FALSE)
;


- Recursive union-find to glue together sets of bike_ids
- ,occuring at the same moment.
- Sets are represented as {ordered,unique} arrays here
WITH RECURSIVE wood AS (
WITH omg AS (
SELECT bike1 ,bike2,stamp< br /> , row_number() OVER(ORDER BY bike1,bike2,stamp) AS seq
, ARRAY[bike1,bike2]::integer[] AS arr
FROM nearness n WHERE near = True
)
- Find all existing combinations of bikes
SELECT o1.stamp, o1.seq
, ARRAY[o1.bike1,o1.bike2]::integer[] AS arr
FROM omg o1
UNION ALL
SELECT o2.stamp, o2.seq - avoid duplicates inside the array
, CASE when o2.bike1 = ANY(w.arr) THEN w.arr | | o2.bike2
ELSE w.arr || o2.bike1 END AS arr
FROM omg o2
JOIN wood w
ON o2.stamp = w.stamp AND o2.seq > w.seq
AND (o2.bike1 = ANY(w.arr) OR o2.bike2 = ANY(w.arr))
AND NOT (o2.bike1 = ANY(w.arr) AND o2.bike2 = ANY(w.arr))
)
, uniq AS (- suppress partial sets caused by the recursive union-find buildup
SELECT * FROM wood w
WHERE NOT EXISTS (SELECT * FROM wood nx
WHERE nx.stamp = w.stamp
AND nx.arr @> w.arr AND nx.arr <> w .arr - contains but not equal
)
)
, xsets AS (- make unique sets of bikes
SELECT DISTINCT arr
-, MIN(seq ) AS grp
FROM uniq
GROUP BY arr
)
, sets AS (- enumerate the sets of bikes
SELECT arr
, row_number() OVER () AS setnum
FROM xsets
)
, drag AS (- Detect beginning and end of segments of consecutive observations
SELECT u.* - within a constant set of bike_ids
- Edge-detection begin of group
, NOT EXISTS (SELECT * FROM uniq nx
WHERE nx.arr = u.arr
AND nx.stamp AND nx.stamp >= u.stamp-'5 sec'::interval
) AS is_first
- Edge-detection end of group
, NOT EXISTS (SELECT * FROM uniq nx
WHERE nx.arr = u.arr
AND nx .stamp> u.stamp
AND nx.stamp <= u.stamp + '5 sec'::interval
) AS is_last
, row_number() OVER(ORDER BY arr,stamp) AS nseq
FROM uniq u
)
, top AS (- id and groupnum for the start of a group
SELECT nseq
, row_number() OVER () AS clust
FROM drag
WHERE is_first
)
, bot AS (- id and groupnum for the end of a group
SELECT nseq
, row_number () OVER () AS clust
FROM drag
WHERE is_last
)
SELECT w.seq as orgseq - results, please ...
, w.stamp
, g0.clust AS clust
, row_number() OVER(www) AS rn
, s.setnum, s.arr AS bike_set FROM drag w
JOIN sets s ON s.arr = w.arr
JOIN top g0 ON g0.nseq <= w.seq
JOIN bot g1 ON g1.nseq >= w.seq AND g1.clust = g0.clust
WINDOW www AS (PARTITION BY g1.clust ORDER BY w.stamp)
ORDER BY g1.clust, w.stamp
;

Result:

orgseq | stamp | clust | rn | setnum | bike_set 
--------+----- ----------------+-------+----+--------+----------< br /> 1 | 2016-05-28 11:00:00 | 1 | 1 | 1 | {1,2}
4 | 2016-05-28 11:00:20 | 3 | 1 | 1 | {1,2}
5 | 2016-05-28 11:00:25 | 3 | 2 | 1 | {1,2}
6 | 2016-05-28 11:00:05 | 4 | 1 | 3 | {1,2,3}
7 | 2016-05-28 11:00:10 | 4 | 2 | 3 | {1,2,3}
8 | 2016 -05-28 11:00:10 | 4 | 3 | 2 | {4,5}
(6 rows)

Leave a Comment

Your email address will not be published.