Visualizzazione post con etichetta Algorithm. Mostra tutti i post
Visualizzazione post con etichetta Algorithm. Mostra tutti i post

mercoledì 19 febbraio 2014

Connect By in SAP HANA



As you know SAP HANA does not offer the feature: CONNECT BY.
"Connect by" is a feature which can be used in order to get full path over tables. 
Ex: Suppose you have a table made of arcs, you want to have all the paths inside the table. 

In Oracle exists a feature called "connect by" which makes all the computation for you. 
Probably I will not be able to make a generic code like in Oracle (like simply call a function). 
Anyway, I have got some good result on a simple test case. 


Example and Theory documentation
 create table arc (  
    node_father int,   
    node_child int  
 );   
   
 insert into arc values(1, 2);  
 insert into arc values(1, 3);  
 insert into arc values(1, 4);  
 insert into arc values(2, 5);  
 insert into arc values(2, 6);  
 insert into arc values(6, 8);  
 insert into arc values(6, 9);  
 insert into arc values(4, 7);  


The graph looks like this: 
                
              1
             /|\
           /  |  \
         2   3    4
        /\            \
      /    \            \
     5      6          7
             /\
           /    \
         8      9     


Set of Path starting from the root: 
1-3
1-4-7
1-2-5
1-2-6-8
1-2-6-9

Set of Sub path
2-5
6-8
6-9
4-7
2-6-8
2-6-9


 create table paths(  
    node_father int,   
    node_child int,   
    length int,   
    path varchar(100),   
    root int  
 );  


Expected result: 
F    = father
C    = Child
L    = Length (of the path)
R    = Root
Path = Path


The first loop is simply moving the arc table in the path. 
All the other step join the current paths p to the ars a on the condition 
     p.child = a.father and p.length = current_iteration - 1

The values for each path are given like: 
     father   = a.father
     child    = a.child
     length  = p.length + 1 (or current_iteration number)
     root     = p.root
     path     = p.path - a.node_child



F C L R   Path
--------------------
1 2 0 1   1-2         1st Loop (length 1)
1 3 0 1   1-3
1 4 0 1   1-4
2 5 0 2   2-5
2 6 0 2   2-6
6 8 0 6   6-8
6 9 0 6   6-9
4 7 0 4   4-7
-----
2 5 1 1   1-2-5       2nd Loop (length 2)
2 6 1 1   1-2-6
4 7 1 1   1-4-7
6 8 1 1   2-6-8
6 9 1 1   2-6-9
-----
6 8 2 1   1-2-6-8     3rd Loop (length 3)
6 9 2 1   1-2-6-9



To remove the sub path, remove each entry where of the previous iteration which matched in the current iteration.

This at the end of each loop. 
Following this variation then it will look like: 

F C L R   Path
--------------------
1 2 0 1   1-2    --> deleted after iteration 2     
1 3 0 1   1-3
1 4 0 1   1-4    --> deleted after iteration 2
2 5 0 2   2-5    --> deleted after iteration 2
2 6 0 2   2-6    --> deleted after iteration 2
6 8 0 6   6-8    --> deleted after iteration 2
6 9 0 6   6-9    --> deleted after iteration 2
4 7 0 4   4-7    --> deleted after iteration 2
--------------------
2 5 1 1   1-2-5       
2 6 1 1   1-2-6  --> deleted after iteration 3
4 7 1 1   1-4-7
6 8 1 1   2-6-8  --> deleted after iteration 3
6 9 1 1   2-6-9  --> deleted after iteration 3
--------------------
6 8 2 1   1-2-6-8     
6 9 2 1   1-2-6-9



WATCH OUT!!!! Oracle decided to remove the sub patch using the concept of STARTING!
I used a different approach! The algorithm try to create the longest path and remove the path included in another which is longer.
This works only if you are using trees or generically acyclic graph (an by definition a tree is a direct acyclic graph)!!!


Code: 
 create procedure connect_by_test(in p_mode varchar(10) default 'no_sub') as  
   v_length integer;  
   v_count integer;  
   v_boolean varchar(5);  
   c_procedure varchar(30) := 'connect_by_test';  
   v_sql varchar(5000);  
   
 begin  
   
   -- put a clean table on the beginning  
   v_sql := 'delete from paths';  
   write_debug_log(c_procedure, 'truncate table paths', v_sql);  
   execute immediate v_sql;  
   
   
   -- initialization of the control variables used for the loop  
   v_boolean := 'true';  
   v_length := 1;  
   
   
   while v_boolean = 'true' do  
     -- it is needed to distinguish the first iteration with the n-th generic one  
     if (v_length = 1) then  
       -- simply insert all the arcs into the final table  
       v_sql := 'insert into paths(node_father,  node_child,       length,    root,                   path)  
        select a.node_father, a.node_child, ' || v_length || ' , a.node_father, a.node_father || '' - '' || a.node_child  
        from  arc a';  
       write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);  
       execute immediate v_sql;  
   
     else  
       -- join the arcs with the path known till now  
       v_sql := 'insert into paths(node_father,  node_child,       length,  root,               path)  
        select a.node_father, a.node_child, ' || v_length || ' , p.root, p.path || '' - '' || a.node_child  
        from  arc a join paths p on ( a.node_father = p.node_child )  
        where  p.length = ' || v_length || '-1' ;  
       write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);  
       execute immediate v_sql;  
   
       if p_mode = 'no_sub' then  
         -- delete the known path joined one more time (I want to keep only the full paths (no subpath)  
         v_sql := 'delete from paths where (node_father, node_child, length) in ( ' ||  
          -- this first query is to get the arcs joined in front of the SUB path  
           ' select a.node_father, a.node_child, ' || v_length - 1 || '  
          from  arc a join paths p on ( a.node_father = p.node_child )  
          where  p.length = ' || v_length || '-1  
          union all ' ||  
           -- this second query is to get the arcs joined after the SUB path  
         ' select p.node_father, p.node_child, ' || v_length - 1 || '  
          from  arc a join paths p on ( a.node_father = p.node_child )  
          where  p.length = ' || v_length || '-1 )' ;  
         write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);  
         execute immediate v_sql;  
         /**/  
       end if;  
     end if;  
   
     -- check how many records have I inserted in the last iteration  
     write_debug_log(c_procedure, 'perform the count for length: ' || v_length, 'select count(*) from paths where length = ' || v_length);  
     select count(*) into v_count from paths where length = v_length;  
     write_debug_log(c_procedure, 'performed count for length: ' || v_length, v_count);  
   
     -- exit condition: if in the last iteration I did not generated any entry, then I have found already every possible path.  
     if v_count = 0 then  
       -- no other loop, change the while condition  
       v_boolean := 'false';  
     else  
       --there will be another loop  
       --increment the level to be used for the next loop  
       v_length  := v_length + 1;  
     end if;  
   
   
   
     -- this is just an emergency break: after 10 iteration just stop.  
     if (v_length > 10) then  
       v_boolean := 'false';  
     end if;  
   
   
     write_debug_log(c_procedure, 'variable loop status: v_length = ' || v_length || '; v_boolean = ' || v_boolean, '');  
   end while;  
   
   end;  
   



After performing a test, here is the output example: 

(SUB mode = with sub path)
NODE_FATHER;NODE_CHILD;LENGTH;ROOT;PATH
1;2;1;1;1 - 2
1;3;1;1;1 - 3
1;4;1;1;1 - 4
2;5;1;2;2 - 5
2;6;1;2;2 - 6
4;7;1;4;4 - 7
6;8;1;6;6 - 8
6;9;1;6;6 - 9
2;5;2;1;1 - 2 - 5
2;6;2;1;1 - 2 - 6
4;7;2;1;1 - 4 - 7
6;8;2;2;2 - 6 - 8
6;9;2;2;2 - 6 - 9
6;8;3;1;1 - 2 - 6 - 8
6;9;3;1;1 - 2 - 6 - 9


(No SUB mode = without sub path)
NODE_FATHER;NODE_CHILD;LENGTH;ROOT;PATH
1;3;1;1;1 - 3
2;5;2;1;1 - 2 - 5
4;7;2;1;1 - 4 - 7
6;8;3;1;1 - 2 - 6 - 8
6;9;3;1;1 - 2 - 6 - 9



Cool! Very nice! Now I understood!
But what does it means? Of course I am working here! I am not playing with the concept of table Arc and Paths... What can I do now in order to use on my real table?
I am a lazy guy! I want to do just a copy and paste!

Since you need to have totally dynamic code, Hana is having some issue with retrieving the result from a dynamic query.

You need to create first a table:

 table.schemaName = "My Schema";  
   
 table.tableType = COLUMNSTORE;  
 table.columns = [  
   {name = "ID"; sqlType = INTEGER; nullable = false;},  
   {name = "SQL_TEXT"; sqlType = NVARCHAR; nullable = true; length = 5000;},  
   {name = "NUM_RESULT"; sqlType = INTEGER; nullable = true; },  
   {name = "CHR_RESULT"; sqlType = NVARCHAR; nullable = true; length = 5000;}  
 ];  
 table.primaryKey.pkcolumns = ["ID"];  
   

Which simply generate the following DDL:


 CREATE COLUMN TABLE TEMP_RESULTS (  
           "ID" INTEGER CS_INT NOT NULL ,   
           "SQL_TEXT" NVARCHAR(5000),   
           "NUM_RESULT" INTEGER CS_INT,   
           "CHR_RESULT" NVARCHAR(5000),   
           PRIMARY KEY ("ID")  
 ) ;  

create sequence temp_result_id starting with 1 ;


Ok, then I can create the following code:


 PROCEDURE dynamic_connect_by (   
      in p_source_schema_name varchar(100),   
      in p_source_table_name varchar(100),   
      in p_paths_schema_name varchar(100),   
      in p_paths_table_name varchar(100),   
      in p_field_father varchar(100),   
      in p_field_child varchar(100),   
      in p_list_of_fields varchar(1000),  
      in p_where varchar(1000),  
      in p_sub_path varchar(10),   
      in p_start_with varchar(100)       
 )   
 LANGUAGE SQLSCRIPT AS  
   
      v_length integer;  
      v_count integer;  
      v_boolean varchar(5);  
      c_procedure varchar(30) := 'connect_by_test_dynamic';  
      v_sql varchar(5000);  
      v_id integer;   
      v_list_of_fields varchar(1000);  
      v_alias_list_of_fields varchar(1000);  
        
 begin   
        
      call write_debug_log(c_procedure, 'start the procedure', '');  
   
      -- put a clean table on the beginning  
      v_sql := 'delete from ' || p_paths_schema_name || '.' || p_paths_table_name;  
      call write_debug_log(c_procedure, 'truncate table paths', v_sql);       
      execute immediate v_sql;  
        
      v_list_of_fields := replace(p_list_of_fields, ' ', '');
      if v_list_of_fields != '' then 
 v_list_of_fields := v_list_of_fields || ','; 
 v_alias_list_of_fields := 'a.' || replace(p_list_of_fields,   ',',   ',a.') || ',';
      else
 v_list_of_fields := '';
 v_alias_list_of_fields := ''; 
      end if;
        

      -- initialization of the control variables used for the loop  
      v_boolean := 'true';  
      v_length := 1;  
        
      while v_boolean = 'true' do  
             
           -- it is needed to distinguish the first iteration with the n-th generic one  
           if (v_length = 1) then   
                     -- simply insert all the arcs into the final table  
                     v_sql := 'insert into ' || p_paths_schema_name || '.' || p_paths_table_name || ' (' || v_list_of_fields || p_field_father || ',  ' || p_field_child || ',       length,         root,                   path)   
                                          select ' || v_alias_list_of_fields || ' a.' || p_field_father || ', a.' || p_field_child || ', ' || v_length || ' , a.' || p_field_father || ', a.' || p_field_father || ' || '' - '' || a.' || p_field_child || '  
                                          from  ' || p_source_schema_name || '.' || p_source_table_name || ' a';  
                       
                     if p_start_with != '' then   
                           v_sql := v_sql || ' where ' || p_field_father || ' = ''' || p_start_with || '''';  
                                 
                     if P_WHERE != '' then   
                      V_SQL := V_SQL || ' and ' || P_WHERE;  
                     end if;  
             
               else  
             
                     if P_WHERE != '' then   
                      V_SQL := V_SQL || ' where ' || P_WHERE;  
                     end if;  
                 
               end if;   
                 
                     call write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);       
                     execute immediate v_sql;  
   
           else   
                     -- join the arcs with the path known till now  
                     v_sql := 'insert into ' || p_paths_schema_name || '.' || p_paths_table_name || '(' || v_list_of_fields || p_field_father || ',  ' || p_field_child || ',       length,  root,               path)  
                                           select ' || v_alias_list_of_fields || 'a.' || p_field_father || ', a.' || p_field_child || ', ' || v_length || ' , p.root, p.path || '' - '' || a.' || p_field_child || '  
                                          from  ' || p_source_schema_name || '.' || p_source_table_name || ' a join ' || p_paths_schema_name || '.' || p_paths_table_name || ' p on ( a.' || p_field_father || ' = p.' || p_field_child || ' )  
                                          where  p.length = ' || v_length || '-1' ;  
                     call write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);       
                     execute immediate v_sql;  
   
                     if p_sub_path = 'false' then   
                          /**/  
                          -- delete the known path joined one more time (I want to keep only the full paths (no subpath)  
                          v_sql := 'delete from ' || p_paths_schema_name || '.' || p_paths_table_name || ' where (' || p_field_father || ', ' || p_field_child || ', length) in ( ' ||  
                                               -- this first query is to get the arcs joined in front of the SUB path  
                                           ' select a.' || p_field_father || ', a.' || p_field_child || ', ' || v_length - 1 || '  
                                               from  ' || p_source_schema_name || '.' || p_source_table_name || ' a join ' || p_paths_schema_name || '.' || p_paths_table_name || ' p on ( a.' || p_field_father || ' = p.' || p_field_child || ' )  
                                               where  p.length = ' || v_length || '-1   
                                                    union all ' ||  
                                                -- this second query is to get the arcs joined after the SUB path  
                                              ' select p.' || p_field_father || ', p.' || p_field_child || ', ' || v_length - 1 || '  
                                               from  ' || p_source_schema_name || '.' || p_source_table_name || ' a join ' || p_paths_schema_name || '.' || p_paths_table_name || ' p on ( a.' || p_field_father || ' = p.' || p_field_child || ' )  
                                               where  p.length = ' || v_length || '-1 )' ;  
                          call write_debug_log(c_procedure, 'perform insert nr: ' || v_length, v_sql);       
                          execute immediate v_sql;  
                          /**/  
                     end if;  
           end if;  
             
           -- check how many records have I inserted in the last iteration  
           select temp_result_id.nextval into v_id from dummy;   
           v_sql := 'insert into TEMP_RESULTS (id, num_result) select ' || v_id || ', count(*) from ' || p_paths_schema_name || '.' || p_paths_table_name || ' where length = ' || v_length;  
           call write_debug_log(c_procedure, 'perform the count for length: ' || v_length, v_sql);       
           execute immediate v_sql;  
             
           select num_result into v_count from TEMP_RESULTS where id = v_id;  
           call write_debug_log(c_procedure, 'performed count for length: ' || v_length, v_count);       
             
   
   
                  
           -- exit condition: if in the last iteration I did not generated any entry, then I have found already every possible path.   
           if v_count = 0 then   
                -- no other loop, change the while condition  
                v_boolean := 'false';  
           else   
                --there will be another loop  
                --increment the level to be used for the next loop  
                v_length  := v_length + 1;       
           end if;  
   
   
   
           -- this is just an emergency break: after 10 iteration just stop.   
           if (v_length > 10) then   
       v_boolean := 'false';  
           end if;  
             
           call write_debug_log(c_procedure, 'variable loop status: v_length = ' || v_length || '; v_boolean = ' || v_boolean, '');       
       end while;  
   
 end;  


It is almost impossible to read.
If you want to understand the code, just remain on the basic approach I put previously...
For your generic usage, you can just create this function and then invoke it in the following way:


 -- be sure you have a proper table:  
 create table ['Schema name where the output table is'].['Output table name'] (   
                ['Additional fields in the Select statement'],   
                ['Father Field'],   
                ['Chield Field'],   
                "ROOT_PROF" VARCHAR(36) ,   
                "HIERACHY" NUMBER,   
                "LENGTH" NUMBER,   
                "PATH" VARCHAR(3000),   
                "ROOT" VARCHAR(3000)   
           )  
   
   
      call dynamic_connect_by(  
           'Schema name where the input table is',   
           'Input table name',   
           'Schema name where the output table is',   
           'Output table name',   
           'Father Field',   
           'Child Field',   
           'Additional fields in the Select statement',   
           'condition for the where statement',   
           'TRUE' if you want also the sub-paths, 'FALSE' if you want only the complete paths   
           '' if you want to comunicate which is the starting point (make the algorithm faster)  
      );  
   



The equivalent in Oracle is:
 SELECT employee_id, last_name, manager_id  
 FROM employees  
 CONNECT BY PRIOR employee_id = manager_id;  
   
 ...  
   
 SELECT ['Additional fields in the Select statement'], ['Father Field'], ['Chield Field']  
 FROM ['Schema name where the output table is'].['Output table name']  
 where ['condition for the where statement']  
 CONNECT BY PRIOR employee_id = manager_id;