Recursive CTE: The Elegance of Simplicity in SQL – 3

Introduction

As we have already seen in other Articles, Common Table Expressions (CTE) in SQL on IBM iSeries (AS/400), but also present in many other versions on other non-IBM systems, represent a powerful and flexible tool for creating temporary queries and reusable, simplifying and improving the readability of complex SQL queries.

Below are the articles in the CTE series (including this current one):

In particular, Recursive Common Table Expressions (CTE) are a powerful tool in SQL that allows you to perform operations which, in the absence of this functionality, would require the development of specific programs or the use of complex loops.

In this Article we will see a further extension of the example CTE Ricorsive: L’Eleganza della Semplicità in SQL – 1 and we will see how a SQL command using Recursive CTE can give results that would normally require many lines of code if solved with a traditional programmatic approach.

It must be said that the use of recursive CTE is not trivial and must be approached carefully but if it is understood the approach can provide concise and elegant solutions to solve even complex problems.

Description Scenario and objective

Let’s imagine a network diagram organized in a tree structure, where both the nodes and the dependency relationships that exist between them are illustrated.

These nodes can symbolize various elements, such as the departments of a company, the employees of an organization, or the nodes of an electronic circuit, where the dependency relationships between different processors are delineated, and so on.

Let’s assume we want to compile a list of all the unique paths that start from the root and reach the leaves in the node tree, also including those ‘isolated’ nodes, i.e. those without dependencies or connections with other nodes.

Below is a diagram of the tree structure:

In this structure there are ‘isolated’ nodes, i.e. without ties, and nodes with one or more ties.

The result we want to obtain is the list of all unique paths (that are not included in other paths) that start from a root (node ​​without a ‘father’) and reach the terminal leaves (nodes without ‘children’) in the tree of nodes, also including those ‘isolated’ nodes, i.e. without dependencies or connections with other nodes.

Query

WITH -- starting data
     TreeStructure (NodeID, Description, ParentNodeID) AS (
         VALUES  ('I011', 'Description_011',  NULL) , ('I012', 'Description_012',  NULL)
               , ('I013', 'Description_013',  NULL) , ('I021', 'Description_021', 'I013')
               , ('I022', 'Description_022', 'I013'), ('I023', 'Description_023', 'I013')
               , ('I024', 'Description_024', 'I013'), ('I031', 'Description_031', 'I022')
               , ('I032', 'Description_032', 'I022'), ('I041', 'Description_041', 'I031')
               , ('I042', 'Description_042', 'I031'), ('I043', 'Description_043', 'I031')
               , ('I044', 'Description_044', 'I032'), ('I045', 'Description_045', 'I032')
               , ('I046', 'Description_046', 'I032'), ('I051', 'Description_051', 'I042')
               , ('I052', 'Description_052', 'I046'), ('I061', 'Description_061', 'I051')
               , ('I062', 'Description_062', 'I051'), ('I063', 'Description_063', 'I051')
               , ('I064', 'Description_064', 'I052'), ('I065', 'Description_065', 'I052')
               , ('I066', 'Description_066', 'I052'), ('I033', 'Description_033', 'I023')
               , ('I034', 'Description_034', 'I024'), ('I035', 'Description_035', 'I024')
)
, Routes(NodeID, Path) AS (
    -- Parte ancorata: seleziona i nodi radice
    SELECT NodeID, CAST(NodeID AS VARCHAR(1024))
    FROM TreeStructure
    WHERE ParentNodeID IS NULL
    UNION ALL
    -- Parte ricorsiva: aggiungi il nodo corrente al Path
    SELECT nd.NodeID, p.Path concat '->' concat nd.NodeID
    FROM TreeStructure nd
    JOIN Routes p ON nd.ParentNodeID = p.NodeID
)
, Routes2(NodeID, Path) AS (
    SELECT NodeID, SUBSTR(Path, 1, length(Path)-6)
      FROM Routes
      WHERE LENGTH(Path) > 6
    UNION ALL 
    SELECT NodeID, Path
      FROM Routes
      WHERE LENGTH(Path) <= 6
)
, Routes3(NodeID, Path) AS (
    SELECT * FROM Routes P1
     WHERE NOT EXISTS (
           SELECT 1 from Routes2 P2
            WHERE P2.Path = P1.Path)
      ORDER BY 2
) --  SELECT * FROM Routes3;
 SELECT * FROM Routes3
  UNION ALL
 SELECT NodeID, NodeID
   FROM TreeStructure N1
  WHERE NOT EXISTS (
            SELECT 1 FROM TreeStructure N2
              WHERE N1.NodeID = N2.ParentNodeID)
    AND N1.ParentNodeID IS NULL
   ORDER BY 2;

Query result

NODEIDPATH
I011I011
I012I012
I021I013->I021
I041I013->I022->I031->I041
I061I013->I022->I031->I042->I051->I061
I062I013->I022->I031->I042->I051->I062
I063I013->I022->I031->I042->I051->I063
I043I013->I022->I031->I043
I044I013->I022->I032->I044
I045I013->I022->I032->I045
I064I013->I022->I032->I046->I052->I064
I065I013->I022->I032->I046->I052->I065
I066I013->I022->I032->I046->I052->I066
I033I013->I023->I033
I034I013->I024->I034
I035I013->I024->I035

Detailed description of the query

The SQL query described is an example of a recursive Common Table Expression (CTE) used to process a tree structure. The query is split into multiple parts, each of which builds on the previous one to arrive at the final result.

1) Definition of the Tree Structure:

  • This initial CTE, TreeStructure, defines the starting data, representing the tree of nodes. Each node is defined by a unique NodeID, a Description, and a ParentNodeID indicating the parent node (if it exists; if NULL, the node is a root).

2) Generation of Routes:

  • The Routes CTE starts with the ‘anchored’ part that selects all root nodes (nodes without a parent node).
  • Then, through the ‘recursive’ part, the query expands by progressively adding the child nodes to the paths, concatenating the current NodeID to the Path of the parent node.

3) Cleaning of Routes (Routes2)

  • The Routes2 CTE is an intermediate table that is used to identify any routes that are included in other routes (sub-routes). In practice, in order to identify the sub-routes, this intermediate table is created as a copy of the Routes table in which the Path column is modified by removing the last node (thus creating sub-routes to compare them).

4) Selection of Single Routes (Routes3)

  • Routes3 selects the final routes by excluding those that are sub-routes of other existing routes, ensuring that each route is unique and is not included in another longer route.

5) Final results

  • Finally, the query selects all unique routes from the latest CTE (Routes3).
  • It also adds ‘isolated’ nodes, which are root nodes with no children, using a UNION ALL to join these nodes to the path list. Isolated nodes are identified by comparing TreeStructure with itself to find nodes that are not parents of other nodes (i.e., have no children) and that do not have a parent node (they are root nodes).
  • The final part of the query sorts all the results based on the Path column to present the list in a readable order.

In summary, the query is a sophisticated way of transforming a tree structure represented in a database table into a list of unique paths from every root node to every leaf node, including isolated nodes.

This technique is often used in databases that support recursive CTE to manipulate hierarchical data or derive information from complex tree structures.

To run the script you need to make sure that the specific syntax (concat, SUBSTR, LENGTH) is supported by the DBMS.

IBM iSeries uses a slightly different syntax than other SQL systems for some functions. As always, keep in mind that actual implementation and functionality may vary depending on the version of the iSeries operating system and DB2 database. It is always good practice to refer to the official IBM documentation for more precise and detailed information.

Similar Posts