Recursive CTE: The Elegance of Simplicity in SQL – 3
Introduction
As we have already seen in other Articles, Common Table Expressions (CTE) in SQL on IBM iSeries (AS/400), but also present in many other versions on other non-IBM systems, represent a powerful and flexible tool for creating temporary queries and reusable, simplifying and improving the readability of complex SQL queries.
Below are the articles in the CTE series (including this current one):
In particular, Recursive Common Table Expressions (CTE) are a powerful tool in SQL that allows you to perform operations which, in the absence of this functionality, would require the development of specific programs or the use of complex loops.
In this Article we will see a further extension of the example CTE Ricorsive: L’Eleganza della Semplicità in SQL – 1 and we will see how a SQL command using Recursive CTE can give results that would normally require many lines of code if solved with a traditional programmatic approach.
It must be said that the use of recursive CTE is not trivial and must be approached carefully but if it is understood the approach can provide concise and elegant solutions to solve even complex problems.
Description Scenario and objective
Let’s imagine a network diagram organized in a tree structure, where both the nodes and the dependency relationships that exist between them are illustrated.
These nodes can symbolize various elements, such as the departments of a company, the employees of an organization, or the nodes of an electronic circuit, where the dependency relationships between different processors are delineated, and so on.
Let’s assume we want to compile a list of all the unique paths that start from the root and reach the leaves in the node tree, also including those ‘isolated’ nodes, i.e. those without dependencies or connections with other nodes.
Below is a diagram of the tree structure:
In this structure there are ‘isolated’ nodes, i.e. without ties, and nodes with one or more ties.
The result we want to obtain is the list of all unique paths (that are not included in other paths) that start from a root (node without a ‘father’) and reach the terminal leaves (nodes without ‘children’) in the tree of nodes, also including those ‘isolated’ nodes, i.e. without dependencies or connections with other nodes.
Query
WITH -- starting data TreeStructure (NodeID, Description, ParentNodeID) AS ( VALUES ('I011', 'Description_011', NULL) , ('I012', 'Description_012', NULL) , ('I013', 'Description_013', NULL) , ('I021', 'Description_021', 'I013') , ('I022', 'Description_022', 'I013'), ('I023', 'Description_023', 'I013') , ('I024', 'Description_024', 'I013'), ('I031', 'Description_031', 'I022') , ('I032', 'Description_032', 'I022'), ('I041', 'Description_041', 'I031') , ('I042', 'Description_042', 'I031'), ('I043', 'Description_043', 'I031') , ('I044', 'Description_044', 'I032'), ('I045', 'Description_045', 'I032') , ('I046', 'Description_046', 'I032'), ('I051', 'Description_051', 'I042') , ('I052', 'Description_052', 'I046'), ('I061', 'Description_061', 'I051') , ('I062', 'Description_062', 'I051'), ('I063', 'Description_063', 'I051') , ('I064', 'Description_064', 'I052'), ('I065', 'Description_065', 'I052') , ('I066', 'Description_066', 'I052'), ('I033', 'Description_033', 'I023') , ('I034', 'Description_034', 'I024'), ('I035', 'Description_035', 'I024') ) , Routes(NodeID, Path) AS ( -- Parte ancorata: seleziona i nodi radice SELECT NodeID, CAST(NodeID AS VARCHAR(1024)) FROM TreeStructure WHERE ParentNodeID IS NULL UNION ALL -- Parte ricorsiva: aggiungi il nodo corrente al Path SELECT nd.NodeID, p.Path concat '->' concat nd.NodeID FROM TreeStructure nd JOIN Routes p ON nd.ParentNodeID = p.NodeID ) , Routes2(NodeID, Path) AS ( SELECT NodeID, SUBSTR(Path, 1, length(Path)-6) FROM Routes WHERE LENGTH(Path) > 6 UNION ALL SELECT NodeID, Path FROM Routes WHERE LENGTH(Path) <= 6 ) , Routes3(NodeID, Path) AS ( SELECT * FROM Routes P1 WHERE NOT EXISTS ( SELECT 1 from Routes2 P2 WHERE P2.Path = P1.Path) ORDER BY 2 ) -- SELECT * FROM Routes3; SELECT * FROM Routes3 UNION ALL SELECT NodeID, NodeID FROM TreeStructure N1 WHERE NOT EXISTS ( SELECT 1 FROM TreeStructure N2 WHERE N1.NodeID = N2.ParentNodeID) AND N1.ParentNodeID IS NULL ORDER BY 2;
Query result
NODEID | PATH |
I011 | I011 |
I012 | I012 |
I021 | I013->I021 |
I041 | I013->I022->I031->I041 |
I061 | I013->I022->I031->I042->I051->I061 |
I062 | I013->I022->I031->I042->I051->I062 |
I063 | I013->I022->I031->I042->I051->I063 |
I043 | I013->I022->I031->I043 |
I044 | I013->I022->I032->I044 |
I045 | I013->I022->I032->I045 |
I064 | I013->I022->I032->I046->I052->I064 |
I065 | I013->I022->I032->I046->I052->I065 |
I066 | I013->I022->I032->I046->I052->I066 |
I033 | I013->I023->I033 |
I034 | I013->I024->I034 |
I035 | I013->I024->I035 |
Detailed description of the query
The SQL query described is an example of a recursive Common Table Expression (CTE) used to process a tree structure. The query is split into multiple parts, each of which builds on the previous one to arrive at the final result.
1) Definition of the Tree Structure:
- This initial CTE, TreeStructure, defines the starting data, representing the tree of nodes. Each node is defined by a unique NodeID, a Description, and a ParentNodeID indicating the parent node (if it exists; if NULL, the node is a root).
2) Generation of Routes:
- The Routes CTE starts with the ‘anchored’ part that selects all root nodes (nodes without a parent node).
- Then, through the ‘recursive’ part, the query expands by progressively adding the child nodes to the paths, concatenating the current NodeID to the Path of the parent node.
3) Cleaning of Routes (Routes2)
- The Routes2 CTE is an intermediate table that is used to identify any routes that are included in other routes (sub-routes). In practice, in order to identify the sub-routes, this intermediate table is created as a copy of the Routes table in which the Path column is modified by removing the last node (thus creating sub-routes to compare them).
4) Selection of Single Routes (Routes3)
- Routes3 selects the final routes by excluding those that are sub-routes of other existing routes, ensuring that each route is unique and is not included in another longer route.
5) Final results
- Finally, the query selects all unique routes from the latest CTE (Routes3).
- It also adds ‘isolated’ nodes, which are root nodes with no children, using a UNION ALL to join these nodes to the path list. Isolated nodes are identified by comparing TreeStructure with itself to find nodes that are not parents of other nodes (i.e., have no children) and that do not have a parent node (they are root nodes).
- The final part of the query sorts all the results based on the Path column to present the list in a readable order.
In summary, the query is a sophisticated way of transforming a tree structure represented in a database table into a list of unique paths from every root node to every leaf node, including isolated nodes.
This technique is often used in databases that support recursive CTE to manipulate hierarchical data or derive information from complex tree structures.
To run the script you need to make sure that the specific syntax (concat, SUBSTR, LENGTH) is supported by the DBMS.
IBM iSeries uses a slightly different syntax than other SQL systems for some functions. As always, keep in mind that actual implementation and functionality may vary depending on the version of the iSeries operating system and DB2 database. It is always good practice to refer to the official IBM documentation for more precise and detailed information.