"Connect by" is a feature which can be used in order to get full path over tables.
Ex: Suppose you have a table made of arcs, you want to have all the paths inside the table.
In Oracle exists a feature called "connect by" which makes all the computation for you.
Probably I will not be able to make a generic code like in Oracle (like simply call a function).
Anyway, I have got some good result on a simple test case.
Example and Theory documentation
create table arc (
node_father int,
node_child int
);
insert into arc values(1, 2);
insert into arc values(1, 3);
insert into arc values(1, 4);
insert into arc values(2, 5);
insert into arc values(2, 6);
insert into arc values(6, 8);
insert into arc values(6, 9);
insert into arc values(4, 7);
The graph looks like this:
1
/|\
/ | \
2 3 4
/\ \
/ \ \
5 6 7
/\
/ \
8 9
Set of Path starting from the root:
1-3
1-4-7
1-2-5
1-2-6-8
1-2-6-9
Set of Sub path
2-5
6-8
6-9
4-7
2-6-8
2-6-9
create table paths(
node_father int,
node_child int,
length int,
path varchar(100),
root int
);
Expected result:
F = father
C = Child
L = Length (of the path)
R = Root
Path = Path
The first loop is simply moving the arc table in the path.
All the other step join the current paths p to the ars a on the condition
p.child = a.father and p.length = current_iteration - 1
The values for each path are given like:
father = a.father
child = a.child
length = p.length + 1 (or current_iteration number)
root = p.root
path = p.path - a.node_child
F C L R Path
--------------------
1 2 0 1 1-2 1st Loop (length 1)
1 3 0 1 1-3
1 4 0 1 1-4
2 5 0 2 2-5
2 6 0 2 2-6
6 8 0 6 6-8
6 9 0 6 6-9
4 7 0 4 4-7
-----
2 5 1 1 1-2-5 2nd Loop (length 2)
2 6 1 1 1-2-6
4 7 1 1 1-4-7
6 8 1 1 2-6-8
6 9 1 1 2-6-9
-----
6 8 2 1 1-2-6-8 3rd Loop (length 3)
6 9 2 1 1-2-6-9
To remove the sub path, remove each entry where of the previous iteration which matched in the current iteration.
This at the end of each loop.
This at the end of each loop.
Following this variation then it will look like:
F C L R Path
--------------------
1 2 0 1 1-2 --> deleted after iteration 2
1 3 0 1 1-3
1 4 0 1 1-4 --> deleted after iteration 2
2 5 0 2 2-5 --> deleted after iteration 2
2 6 0 2 2-6 --> deleted after iteration 2
6 8 0 6 6-8 --> deleted after iteration 2
6 9 0 6 6-9 --> deleted after iteration 2
4 7 0 4 4-7 --> deleted after iteration 2
--------------------
2 5 1 1 1-2-5
2 6 1 1 1-2-6 --> deleted after iteration 3
4 7 1 1 1-4-7
6 8 1 1 2-6-8 --> deleted after iteration 3
6 9 1 1 2-6-9 --> deleted after iteration 3
--------------------
6 8 2 1 1-2-6-8
6 9 2 1 1-2-6-9
WATCH OUT!!!! Oracle decided to remove the sub patch using the concept of STARTING!
I used a different approach! The algorithm try to create the longest path and remove the path included in another which is longer.
This works only if you are using trees or generically acyclic graph (an by definition a tree is a direct acyclic graph)!!!
Code:
create procedure connect_by_test(in p_mode varchar(10) default 'no_sub') as
v_length integer;
v_count integer;
v_boolean varchar(5);
c_procedure varchar(30) := 'connect_by_test';
v_sql varchar(5000);
begin
-- put a clean table on the beginning
v_sql := 'delete from paths';
write_debug_log(c_procedure, 'truncate table paths', v_sql);
execute immediate v_sql;
-- initialization of the control variables used for the loop
v_boolean := 'true';
v_length := 1;
while v_boolean = 'true' do
-- it is needed to distinguish the first iteration with the n-th generic one
if (v_length = 1) then
-- simply insert all the arcs into the final table
v_sql := 'insert into paths(node_father, node_child, length, root, path)
select a.node_father, a.node_child, ' || v_length || ' , a.node_father, a.node_father || '' - '' || a.node_child
from arc a';
write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);
execute immediate v_sql;
else
-- join the arcs with the path known till now
v_sql := 'insert into paths(node_father, node_child, length, root, path)
select a.node_father, a.node_child, ' || v_length || ' , p.root, p.path || '' - '' || a.node_child
from arc a join paths p on ( a.node_father = p.node_child )
where p.length = ' || v_length || '-1' ;
write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);
execute immediate v_sql;
if p_mode = 'no_sub' then
-- delete the known path joined one more time (I want to keep only the full paths (no subpath)
v_sql := 'delete from paths where (node_father, node_child, length) in ( ' ||
-- this first query is to get the arcs joined in front of the SUB path
' select a.node_father, a.node_child, ' || v_length - 1 || '
from arc a join paths p on ( a.node_father = p.node_child )
where p.length = ' || v_length || '-1
union all ' ||
-- this second query is to get the arcs joined after the SUB path
' select p.node_father, p.node_child, ' || v_length - 1 || '
from arc a join paths p on ( a.node_father = p.node_child )
where p.length = ' || v_length || '-1 )' ;
write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);
execute immediate v_sql;
/**/
end if;
end if;
-- check how many records have I inserted in the last iteration
write_debug_log(c_procedure, 'perform the count for length: ' || v_length, 'select count(*) from paths where length = ' || v_length);
select count(*) into v_count from paths where length = v_length;
write_debug_log(c_procedure, 'performed count for length: ' || v_length, v_count);
-- exit condition: if in the last iteration I did not generated any entry, then I have found already every possible path.
if v_count = 0 then
-- no other loop, change the while condition
v_boolean := 'false';
else
--there will be another loop
--increment the level to be used for the next loop
v_length := v_length + 1;
end if;
-- this is just an emergency break: after 10 iteration just stop.
if (v_length > 10) then
v_boolean := 'false';
end if;
write_debug_log(c_procedure, 'variable loop status: v_length = ' || v_length || '; v_boolean = ' || v_boolean, '');
end while;
end;
After performing a test, here is the output example:
(SUB mode = with sub path)
NODE_FATHER;NODE_CHILD;LENGTH;ROOT;PATH
1;2;1;1;1 - 2
1;3;1;1;1 - 3
1;4;1;1;1 - 4
2;5;1;2;2 - 5
2;6;1;2;2 - 6
4;7;1;4;4 - 7
6;8;1;6;6 - 8
6;9;1;6;6 - 9
2;5;2;1;1 - 2 - 5
2;6;2;1;1 - 2 - 6
4;7;2;1;1 - 4 - 7
6;8;2;2;2 - 6 - 8
6;9;2;2;2 - 6 - 9
6;8;3;1;1 - 2 - 6 - 8
6;9;3;1;1 - 2 - 6 - 9
(No SUB mode = without sub path)
NODE_FATHER;NODE_CHILD;LENGTH;ROOT;PATH
1;3;1;1;1 - 3
2;5;2;1;1 - 2 - 5
4;7;2;1;1 - 4 - 7
6;8;3;1;1 - 2 - 6 - 8
6;9;3;1;1 - 2 - 6 - 9
Cool! Very nice! Now I understood!
But what does it means? Of course I am working here! I am not playing with the concept of table Arc and Paths... What can I do now in order to use on my real table?
I am a lazy guy! I want to do just a copy and paste!
Since you need to have totally dynamic code, Hana is having some issue with retrieving the result from a dynamic query.
You need to create first a table:
table.schemaName = "My Schema";
table.tableType = COLUMNSTORE;
table.columns = [
{name = "ID"; sqlType = INTEGER; nullable = false;},
{name = "SQL_TEXT"; sqlType = NVARCHAR; nullable = true; length = 5000;},
{name = "NUM_RESULT"; sqlType = INTEGER; nullable = true; },
{name = "CHR_RESULT"; sqlType = NVARCHAR; nullable = true; length = 5000;}
];
table.primaryKey.pkcolumns = ["ID"];
Which simply generate the following DDL:
CREATE COLUMN TABLE TEMP_RESULTS (
"ID" INTEGER CS_INT NOT NULL ,
"SQL_TEXT" NVARCHAR(5000),
"NUM_RESULT" INTEGER CS_INT,
"CHR_RESULT" NVARCHAR(5000),
PRIMARY KEY ("ID")
) ;
create sequence temp_result_id starting with 1 ;
Ok, then I can create the following code:
PROCEDURE dynamic_connect_by (
in p_source_schema_name varchar(100),
in p_source_table_name varchar(100),
in p_paths_schema_name varchar(100),
in p_paths_table_name varchar(100),
in p_field_father varchar(100),
in p_field_child varchar(100),
in p_list_of_fields varchar(1000),
in p_where varchar(1000),
in p_sub_path varchar(10),
in p_start_with varchar(100)
)
LANGUAGE SQLSCRIPT AS
v_length integer;
v_count integer;
v_boolean varchar(5);
c_procedure varchar(30) := 'connect_by_test_dynamic';
v_sql varchar(5000);
v_id integer;
v_list_of_fields varchar(1000);
v_alias_list_of_fields varchar(1000);
begin
call write_debug_log(c_procedure, 'start the procedure', '');
-- put a clean table on the beginning
v_sql := 'delete from ' || p_paths_schema_name || '.' || p_paths_table_name;
call write_debug_log(c_procedure, 'truncate table paths', v_sql);
execute immediate v_sql;
v_list_of_fields := replace(p_list_of_fields, ' ', '');
if v_list_of_fields != '' then
v_list_of_fields := v_list_of_fields || ',';
v_alias_list_of_fields := 'a.' || replace(p_list_of_fields, ',', ',a.') || ',';
else
v_list_of_fields := '';
v_alias_list_of_fields := '';
end if;
-- initialization of the control variables used for the loop
v_boolean := 'true';
v_length := 1;
while v_boolean = 'true' do
-- it is needed to distinguish the first iteration with the n-th generic one
if (v_length = 1) then
-- simply insert all the arcs into the final table
v_sql := 'insert into ' || p_paths_schema_name || '.' || p_paths_table_name || ' (' || v_list_of_fields || p_field_father || ', ' || p_field_child || ', length, root, path)
select ' || v_alias_list_of_fields || ' a.' || p_field_father || ', a.' || p_field_child || ', ' || v_length || ' , a.' || p_field_father || ', a.' || p_field_father || ' || '' - '' || a.' || p_field_child || '
from ' || p_source_schema_name || '.' || p_source_table_name || ' a';
if p_start_with != '' then
v_sql := v_sql || ' where ' || p_field_father || ' = ''' || p_start_with || '''';
if P_WHERE != '' then
V_SQL := V_SQL || ' and ' || P_WHERE;
end if;
else
if P_WHERE != '' then
V_SQL := V_SQL || ' where ' || P_WHERE;
end if;
end if;
call write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);
execute immediate v_sql;
else
-- join the arcs with the path known till now
v_sql := 'insert into ' || p_paths_schema_name || '.' || p_paths_table_name || '(' || v_list_of_fields || p_field_father || ', ' || p_field_child || ', length, root, path)
select ' || v_alias_list_of_fields || 'a.' || p_field_father || ', a.' || p_field_child || ', ' || v_length || ' , p.root, p.path || '' - '' || a.' || p_field_child || '
from ' || p_source_schema_name || '.' || p_source_table_name || ' a join ' || p_paths_schema_name || '.' || p_paths_table_name || ' p on ( a.' || p_field_father || ' = p.' || p_field_child || ' )
where p.length = ' || v_length || '-1' ;
call write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);
execute immediate v_sql;
if p_sub_path = 'false' then
/**/
-- delete the known path joined one more time (I want to keep only the full paths (no subpath)
v_sql := 'delete from ' || p_paths_schema_name || '.' || p_paths_table_name || ' where (' || p_field_father || ', ' || p_field_child || ', length) in ( ' ||
-- this first query is to get the arcs joined in front of the SUB path
' select a.' || p_field_father || ', a.' || p_field_child || ', ' || v_length - 1 || '
from ' || p_source_schema_name || '.' || p_source_table_name || ' a join ' || p_paths_schema_name || '.' || p_paths_table_name || ' p on ( a.' || p_field_father || ' = p.' || p_field_child || ' )
where p.length = ' || v_length || '-1
union all ' ||
-- this second query is to get the arcs joined after the SUB path
' select p.' || p_field_father || ', p.' || p_field_child || ', ' || v_length - 1 || '
from ' || p_source_schema_name || '.' || p_source_table_name || ' a join ' || p_paths_schema_name || '.' || p_paths_table_name || ' p on ( a.' || p_field_father || ' = p.' || p_field_child || ' )
where p.length = ' || v_length || '-1 )' ;
call write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);
execute immediate v_sql;
/**/
end if;
end if;
-- check how many records have I inserted in the last iteration
select temp_result_id.nextval into v_id from dummy;
v_sql := 'insert into TEMP_RESULTS (id, num_result) select ' || v_id || ', count(*) from ' || p_paths_schema_name || '.' || p_paths_table_name || ' where length = ' || v_length;
call write_debug_log(c_procedure, 'perform the count for length: ' || v_length, v_sql);
execute immediate v_sql;
select num_result into v_count from TEMP_RESULTS where id = v_id;
call write_debug_log(c_procedure, 'performed count for length: ' || v_length, v_count);
-- exit condition: if in the last iteration I did not generated any entry, then I have found already every possible path.
if v_count = 0 then
-- no other loop, change the while condition
v_boolean := 'false';
else
--there will be another loop
--increment the level to be used for the next loop
v_length := v_length + 1;
end if;
-- this is just an emergency break: after 10 iteration just stop.
if (v_length > 10) then
v_boolean := 'false';
end if;
call write_debug_log(c_procedure, 'variable loop status: v_length = ' || v_length || '; v_boolean = ' || v_boolean, '');
end while;
end;
It is almost impossible to read.
If you want to understand the code, just remain on the basic approach I put previously...
For your generic usage, you can just create this function and then invoke it in the following way:
-- be sure you have a proper table:
create table ['Schema name where the output table is'].['Output table name'] (
['Additional fields in the Select statement'],
['Father Field'],
['Chield Field'],
"ROOT_PROF" VARCHAR(36) ,
"HIERACHY" NUMBER,
"LENGTH" NUMBER,
"PATH" VARCHAR(3000),
"ROOT" VARCHAR(3000)
)
call dynamic_connect_by(
'Schema name where the input table is',
'Input table name',
'Schema name where the output table is',
'Output table name',
'Father Field',
'Child Field',
'Additional fields in the Select statement',
'condition for the where statement',
'TRUE' if you want also the sub-paths, 'FALSE' if you want only the complete paths
'' if you want to comunicate which is the starting point (make the algorithm faster)
);
The equivalent in Oracle is:
SELECT employee_id, last_name, manager_id
FROM employees
CONNECT BY PRIOR employee_id = manager_id;
...
SELECT ['Additional fields in the Select statement'], ['Father Field'], ['Chield Field']
FROM ['Schema name where the output table is'].['Output table name']
where ['condition for the where statement']
CONNECT BY PRIOR employee_id = manager_id;