Looking for the build forest (with recursive, postgreSQL 9.5)

I have a table of identities (ie aliases) for any number of people. Each row has a previous name and a new name. In production, there are approximately 1M rows. For example:

id, old, new
---
1,'Albert','Bob'
2,'Bob ','Charles'
3,'Mary','Nancy'
4,'Charles','Albert'
5,'Lydia','Nancy'
6, 'Zoe','Zoe'

What I want is to generate a list of users and refer to their respective identities. This is similar to finding all the nodes in each graph of the connection ID, or finding the generation forest:< /p>

User 1: Albert, Bob, Charles (identities: 1,2,4)
User 2: Mary, Nancy, Lydia (identities: 3,5)< br />User 3: Zoe (identities: 6)

I have been patching the WITH RECURSIVE of PostgreSQL, but it generates sets and subsets for each. For example:

1,2,4 <-- spanning tree: good
2 <-- subset: discard
3,5 <-- spanning tree: good
4 <-- subset: discard
5 <-- subset: discard
6 <-- spanning tree: good

What do I need to do to generate a complete set of identities for each user (i.e. generate Tree)?

SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4 My latest attempt. Here is the code:

WITH RECURSIVE search_graph AS (
SELECT id
, id AS min_id
, ARRAY[id] AS path
, ARRAY[old,new] AS emails
FROM identities

UNION

SELECT identities.id
, LEAST(identities.id, sg.min_id)
, (sg.path || identities.id)
, (sg.emails || identities.old || identities.new)

FROM search_graph sg
JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
WHERE identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph, unnest(path) p GROUP BY min_id;

The results are as follows:

1,2,4
2
3,5
4< br />5
6

I just wrote an answer to a similar question: How to find all connected subgraphs of an undirected graph. In that question, I used SQL Server. For a detailed explanation of the intermediate CTE, please refer to this answer. I adapted the query to Postgres. < p>

Use the Postgres array function to write it more efficiently instead of concatenating paths to text columns.

WITH RECURSIVE
CTE_Idents
AS
(
SELECT old AS Ident
FROM identities

UNION

SELECT new AS Ident
FROM identities
)
,CTE_Pairs
AS
(
SELECT old AS Ident1, new AS Ident2
FROM identities
WHERE old <> new< br />
UNION

SELECT new AS Ident1, old AS Ident2
FROM identities
WHERE old <> new
)
,CTE_Recursive
AS
(
SELECT
CTE_Idents.Ident AS AnchorIdent
, Ident1
, Ident2
,',' || Ident1 || ' ,'|| Ident2 ||',' AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1
< br /> UNION ALL

SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs .Ident2
, CTE_Recursive.IdentPath || CTE_Pairs.Ident2 ||',' AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE ('%,' || CTE_Pairs.Ident2 ||',%')
)
, CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult

UNION

SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
,CTE_Groups
AS
(
SELECT
CTE_Idents.Ident
,array_agg(COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)
ORDER BY COALESCE (CTE_CleanResult.Ident, CTE_Idents.Ident)) AS AllIdents
FROM
CTE_Idents
LEFT JOIN CTE_CleanResult ON CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
GROUP BY CTE_Idents.Ident
)
SELECT AllIdents
FROM CTE_Groups
GROUP BY AllIdents
;

< p>I added a row (7,X,Y) to the sample data.

SQL Fiddle

Result

| allidents |
|--------------------|
| Lydia,Mary,Nancy |
| Albert,Bob,Charles |
| X,Y |
| Zoe |

I have an identity table (i.e. aliases) for any number of people. Each row has A previous name and a new name. In production, there are about 1M lines. For example:

id, old, new
---< br />1,'Albert','Bob'
2,'Bob','Charles'
3,'Mary','Nancy'
4,'Charles','Albert '
5,'Lydia','Nancy'
6,'Zoe','Zoe'

What I want is to generate a list of users and quote their respective identities. This It is similar to finding all nodes in each graph of the connection ID, or finding the spawning forest:

User 1: Albert, Bob, Charles (identities: 1,2,4) 
User 2: Mary, Nancy, Lydia (identities: 3,5)
User 3: Zoe (identities: 6)

I have been patching PostgreSQL WITH RECURSIVE, but it Sets and subsets are generated for each. For example:

1,2,4 <-- spanning tree: good
2 <-- su bset: discard
3,5 <-- spanning tree: good
4 <-- subset: discard
5 <-- subset: discard
6 <-- spanning tree: good

What do I need to do to generate a complete identity set (ie spanning tree) for each user?

SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4 My latest attempt. Here is the code:

WITH RECURSIVE search_graph AS (
SELECT id
, id AS min_id
, ARRAY[id] AS path
, ARRAY[old,new] AS emails
FROM identities

UNION

SELECT identities.id
, LEAST(identities.id, sg.min_id)
, (sg.path || identities.id)
, (sg.emails || identities.old || identities.new)

FROM search_graph sg
JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
WHERE identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph, unnest(path) p GROUP BY min_id;

The results are as follows:

1,2,4
2
3,5
4< br />5
6

I just wrote an answer to a similar question: How to find all connected subgraphs of an undirected graph. In that In the question, I used SQL Server. For a detailed description of the intermediate CTE, please refer to this answer. I adapted the query to Postgres.

Use the Postgres array function to write it more efficiently, Instead of concatenating the path to the text column.

WITH RECURSIVE
CTE_Idents
AS
(
SELECT old AS Ident
FROM identities

UNION

SELECT new AS Ident
FROM identities
)
,CTE_Pairs
AS
(
SELECT old AS Ident1, new AS Ident2
FROM identities
WHERE old <> new

UNION

SELECT new AS Ident1, old AS Ident2
FROM identities
WHERE old <> new
)
,CTE_Recursive
AS
(
SELECT
CTE_Idents.Ident AS AnchorIdent
, Ident1
, Ident2
,',' || Ident1 ||',' || Ident2 ||',' AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents .Ident = CTE_Pairs.Ident1

UNION ALL

SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CTE_Recursive.IdentPath || CTE_Pairs.I dent2 ||',' AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE ('%,' || CTE_Pairs.Ident2 ||',%')
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult

UNION

SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
,CTE_Groups
AS
(
SELECT
CTE_Idents.Ident
,array_agg(COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)
ORDER BY COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)) AS AllIdents
FROM
CTE_Idents
LEFT JOIN CTE_CleanResult ON CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
GROUP B Y CTE_Idents.Ident
)
SELECT AllIdents
FROM CTE_Groups
GROUP BY AllIdents
;

I added a row (7 ,X,Y).

SQL Fiddle

Result

| allidents |
|----- ---------------|
| Lydia,Mary,Nancy |
| Albert,Bob,Charles |
| X,Y |
| Zoe |

Leave a Comment

Your email address will not be published.