Marco Andolfi's Blog

sabato 15 marzo 2014

Start crash course on Octave

Octave Cheat Sheet:

Octave is an easy programming language used for solving machine learning problems.

I am transcripting here the lesson I am following on Standford from Andrew Ng.
I think that this script is a perfect crash course:

 GNU Octave, version 3.2.4  
 Copyright (C) 2009 John W. Eaton and others.  
 This is free software; see the source code for copying conditions.  
 There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or  
 FITNESS FOR A PARTICULAR PURPOSE. For details, type `warranty'.  
 Octave was configured for "i686-pc-mingw32".  
 Additional information about Octave is available at http://www.octave.org.  
 Please contribute if you find this software useful.  
 For more information, visit http://www.octave.org/help-wanted.html  
 Report bugs to <bug@octave.org> (but first, please read  
 http://www.octave.org/bugs.html to learn how to write a helpful report).  
 For information about changes from previous versions, type `news'.  
 octave-3.2.4.exe:1> % here I put comment;  
 octave-3.2.4.exe:1> %first remove the line starter...  
 octave-3.2.4.exe:1> PS1('>> ');  
 >> PS1(' -- ');  
  -- % I will go back to the >> line starter  
  -- PS1('>> ');  
 >>   
 >>   
 >> % basic operators  
 >> 10+3  
 ans = 13  
 >> 4-2  
 ans = 2  
 >> 5*10  
 ans = 50  
 >> 50/5  
 ans = 10  
 >> 2^3  
 ans = 8  
 >>   
 >>   
 >> % logic operators  
 >> 1==2  
 ans = 0  
 >> % 0 is false, 1 is true  
 >> 1==1  
 ans = 1  
 >> 1!=2  
 ans = 1  
 >> 1~=2  
 ans = 1  
 >> 1~=1  
 ans = 0  
 >> 1 && 1  
 ans = 1  
 >> 1 && 1 % and operator  
 ans = 1  
 >> 1 || 1 % or operator  
 ans = 1  
 >> 0 || 0  
 ans = 0  
 >> 1 || 2  
 ans = 1  
 >> 1 && 2  
 ans = 1  
 >> % better than to say:  
 >> % 0 is false; != 0 is true  
 >> 1 && -1  
 ans = 1  
 >> % 0 is false; != 0 is true, by default is represented by 1  
 >>  
 >>  
 > % variable assignment  
 >> a=3  
 a = 3  
 >> % suppress the output return line  
 >> a=3;  
 >>  
 >>  
 >> % display...  
 >> a = pi;  
 >> a  
 a = 3.1416  
 >> format long  
 >> a  
 a = 3.14159265358979  
 >> format short  
 >> a  
 a = 3.1416  
 >> disp(a);  
  3.1416  
 >> format long  
 >> disp(a);  
  3.14159265358979  
 >> disp(sprintf('only 2 decimals: %0.2f', a))  
 only 2 decimals: 3.14  
 >> disp(sprintf('only 5 decimals: %0.5f', a))  
 only 5 decimals: 3.14159  
 >> disp(sprintf('only 10 decimals: %0.10f', a))  
 only 10 decimals: 3.1415926536  
 >>  
 >>  
 >>  
 >>  
 >> % matrix  
 >>  
 >> A= [1 2; 3 4; 5 6]  
 A =  
   1  2  
   3  4  
   5  6  
 >> v = [1 2 3 ]  
 v =  
   1  2  3  
 >> % v is an array  
 >> % v has dimension 1 x 3  
 >> % let  
 >> % let's see now a vector, so a v having dimensions 3 x 1  
 >> v = [1;2;3]  
 v =  
   1  
   2  
   3  
 >> % now interval range arrays  
 >> v = 1:0.1:2  
 v =  
  Columns 1 through 3:  
   1.00000000000000  1.10000000000000  1.20000000000000  
  Columns 4 through 6:  
   1.30000000000000  1.40000000000000  1.50000000000000  
  Columns 7 through 9:  
   1.60000000000000  1.70000000000000  1.80000000000000  
  Columns 10 and 11:  
   1.90000000000000  2.00000000000000  
 >> %meta syntax is:  
 >> % vector = STARTING_AT:DELTA:ENDING_AT  
 >> v = 1:6  
 v =  
   1  2  3  4  5  6  
 >> % by default the delta is 1  
 >> %known matrixes  
 >>  
 >> w = ones(1,3)  
 w =  
   1  1  1  
 >> w = zeros(1,3)  
 w =  
   0  0  0  
 >> %random  
 >> r = rand(3,3)  
 r =  
   0.567402857515203  0.301184329285262  0.830303110034579  
   0.703828527889656  0.800885232762887  0.876086413236985  
   0.340103401896943  0.675342891105335  0.242842578966916  
 >> r = rand(3,3)  
 r =  
   0.4699204825787972  0.4124515864382686  0.9827575401589607  
   0.0118059148057477  0.4695877000100873  0.6805667208195990  
   0.9936553157929338  0.9727665315772230  0.1624513014665614  
 >> r = rand(3,3)  
 r =  
   0.17186568773859429  0.22565708518331024  0.28727929514480505  
   0.91982345102655927  0.77380397661275846  0.00322060012954977  
   0.42991791381680561  0.25561768241919586  0.62438590998652621  
 >> % some more complex stuff  
 >>  
 >>  
 >>  
 >>  
 >> g = -6 + sqrt(10)  
 g = -2.83772233983162  
 >> rand(1,1000)  
 >> g = -6 + sqrt(10)*randn(1,10000)  
 >> hist(g)  
 >> % here comes out a histogram window...  
 >> hist(g,50)  
 >> % here it comes out a histogram window with 50 columns  
 >>  
 >>  
 >>  
 >> % generate the identity matrix  
 >> I = eye(3);  
 >> I  
 I =  
 Diagonal Matrix  
   1  0  0  
   0  1  0  
   0  0  1  
 >>  
 >>  
 >> % helper  
 >> help eye  
 >> % here it comes out a helper description of the function...  
 >>  
 >>  
 >> A = [1,2;3,4;5,6]  
 A =  
   1  2  
   3  4  
   5  6  
 >> sz = size(A) %get the dimensions of the matrix  
 sz =  
   3  2  
 >> size(sz)  
 ans =  
   1  2  
 >> size(A,1) %get the nr of rows of the matrix  
 ans = 3  
 >> size(A,2) %get the nr of cols of the matrix  
 ans = 2  
 >> v = [1,2,3,4]  
 v =  
   1  2  3  4  
 >> length(v) %get the max dimension  
 ans = 4  
 >> length(A)  
 ans = 3  
 >> who %results below come from a different session.... values probably wrong... it just gives an idea  
 Variables in the current scope:  
 A  ans sz  v  
 >> whos %results below come from a different session.... values probably wrong... it just gives an idea  
 Variables in the current scope:  
  Attr Name    Size  
  ==== ====    ====  
     A      3x2  
     ans     9x1  
     sz     1x2  
     v      1x4  
 Total is 41 elements using 328 bytes  
 >>  
 >> A(3,2) % gives you the element in position: row 3, col 2  
 ans = 6  
 >> A(3,:) % gives you the elements in position: row 3, complete row  
 ans =  
   5  6  
 >> A(:,2) % gives you the elements in position: col 2, complete column  
 ans =  
   2  
   4  
   6  
 >> A([1 3], :) % gives you the elements in position: row 1 and 3, complete row  
 ans =  
   1  2  
   5  6  
 >> A(:, 2) = [10;11;12] % assign complete column  
 A =  
   1  10  
   3  11  
   5  12  
 >> A = [A, [100; 101; 102]] % extend the matrix, adding a new column  
 A =  
    1  10  100  
    3  11  101  
    5  12  102  
 >> size(A)  
 ans =  
   3  3  
 >> A(:) % transform a matrix into a vector  
 ans =  
    1  
    3  
    5  
   10  
   11  
   12  
   100  
   101  
   102  
 >>  
 >>  
 >> A = [1,2;3,4;5,6]  
 A =  
   1  2  
   3  4  
   5  6  
 >> B= [11 12; 13 14; 15 16]  
 B =  
   11  12  
   13  14  
   15  16  
 >> C = [A B] % put two matrix one beside the other  
 C =  
   1  2  11  12  
   3  4  13  14  
   5  6  15  16  
 >> C = [A; B] % put two matrix one on top of the other  
 C =  
   1  2  
   3  4  
   5  6  
   11  12  
   13  14  
   15  16  
 >>

martedì 11 marzo 2014

When you perform a massive insert or update generating errors, it is very hard to recognize which are the records which are generating the errors.

The topic in this post is about performing a DML statement and storing the wrong error in an additional table rather than crashing the statement itself.

Suppose you run:

INSERT INTO dest
SELECT *
FROM   source;

SELECT *
       *
ERROR at line 2:
ORA-01400: cannot insert NULL into ("TEST"."DEST"."CODE")


SQL>

But you would like:

INSERT INTO dest
SELECT *
FROM   source
LOG ERRORS INTO err$_dest ('INSERT') REJECT LIMIT UNLIMITED;

99998 rows created.

SQL>

How to do this?
First build the "Error table", which is a clone of your destination table, having no format for columns (every column is varchar).

LOG ERRORS [INTO [schema.]table] [('simple_expression')] [REJECT LIMIT integer|UNLIMITED]

Example:

-- Create the error logging table.
BEGIN
  DBMS_ERRLOG.create_error_log (dml_table_name => 'dest');
END;
/

PL/SQL procedure successfully completed.

SQL>

Which generate:

SQL> DESC err$_dest
 Name                              Null?    Type
 --------------------------------- -------- --------------
 ORA_ERR_NUMBER$                            NUMBER
 ORA_ERR_MESG$                              VARCHAR2(2000)
 ORA_ERR_ROWID$                             ROWID
 ORA_ERR_OPTYP$                             VARCHAR2(2)
 ORA_ERR_TAG$                               VARCHAR2(2000)
 ID                                         VARCHAR2(4000)
 CODE                                       VARCHAR2(4000)
 DESCRIPTION                                VARCHAR2(4000)

SQL>

At this point you can invoke the insert in the way above.
Update and merge and code for a complete example are in the link below (source of this post).

Source:
http://www.oracle-base.com/articles/10g/dml-error-logging-10gr2.php

sabato 8 marzo 2014

Product point of view

Ok, I could not resist...
Probably you have already seen, but I think it is really funny...

venerdì 7 marzo 2014

Kill session in Sap Hana

In Oracle an easy trick to kill a session is to run the following query:

 SELECT 'exec sys.kill_session(' || a.sid || ',' || a.serial# || ');'  
    ,osuser  
    ,terminal  
    ,machine  
    ,program  
    ,a.*  
  FROM v$session a  
 WHERE status = 'ACTIVE' AND username = USER;

Of course you have to be careful on the way you use it... You should be sure about what is running in the correspondent session.
You can use for example the tool offered by SQL Developer (you need to have the rights on the table v$session).

What about SAP HANA? The basic command you need is:

   ALTER SYSTEM CANCEL SESSION 'conn_id';

Where conn_id is in the table M_CONNECTIONS, column CONNECTION_ID.
You can retrieve the complete command with the following query:

 SELECT   
    'alter system cancel session ''' || C.CONNECTION_ID || ''';' command,   
    C.CONNECTION_ID,   
    PS.STATEMENT_STRING  
 FROM M_CONNECTIONS C JOIN M_PREPARED_STATEMENTS PS  
     ON C.CONNECTION_ID = PS.CONNECTION_ID   
     AND C.CURRENT_STATEMENT_ID = PS.STATEMENT_ID  
 WHERE C.CONNECTION_STATUS = 'RUNNING'   
  AND C.CONNECTION_TYPE = 'Remote';

giovedì 6 marzo 2014

Execution plan in SAP HANA

Do you have a slow query in Hana?
Do you want to improve the performances of a query?

First step is to retrieve the execution plan:

 DELETE FROM explain_plan_table WHERE statement_name = 'marco test nr 1';  
 DELETE FROM explain_plan_table WHERE statement_name = 'marco test nr 1';  
 EXPLAIN PLAN SET STATEMENT_NAME = 'marco test nr 1' FOR  
 select * from dummy; --put here your query!!!  
   
   
   
 SELECT 
   table_size,   
   to_number(output_size, 10, 2) as output_size,   
   to_number(subtree_cost, 10, 2) cost,   
   operator_name,   
   operator_details,   
   table_name  
 FROM explain_plan_table  
 WHERE statement_name = 'marco test nr 1'  
 order by operator_id;

The output of the last select will look like this in your SAP HANA STUDIO:

In order to read the execution plan, I believe that the most important column is the "Operator Name".
This is the column telling you how the DBMS is approaching the problem.
Here you will see if it is doing something stupid.
The question is: what is the meaning of the content of this column? Here is the set of possible values:

Operator Name	Description
ROW SEARCH	Starting position of row engine operators. OPERATOR_DETAILS lists projected columns.
LIMIT	Operator for limiting number of output rows
ORDER BY	Operator for sorting output rows
HAVING	Operator for filtering with predicates on top of grouping and aggregation
GROUP BY	Operator for grouping and aggregation
MERGE AGGREGATION	Operator for merging the results of multiple parallel grouping and aggregations
DISTINCT	Operator for duplicate elimination
FILTER	Operator for filtering with predicates
UNION ALL	Operator for producing union-all of input relations
MATERIALIZED UNION ALL	Operator for producing union-all of input relations with intermediate result materialization
BTREE INDEX JOIN	Operator for joining input relations through B-tree index searches. Join type suffix can be added. For example, B-tree index join for left outer join is shown as BTREE INDEX JOIN (LEFT OUTER). Join without join type suffix means inner join.
CPBTREE INDEX JOIN	Operator for joining input relations through CPB-tree index searches. Join type suffix can be added.
HASH JOIN	Operator for joining input relations through probing hash table built on the fly. Join type suffix can be added.
NESTED LOOP JOIN	Operator for joining input relations through nested looping. Join type suffix can be added.
MIXED INVERTED INDEX JOIN	Operator for joining an input relation of row store format with a column table without format conversion using an inverted index of the column table. Join type suffix can be added.
BTREE INDEX SEARCH	Table access through B-tree index search
CPBTREE INDEX SEARCH	Table access through CPB-tree index search
TABLE SCAN	Table access through scanning
AGGR TABLE	Operator for aggregating base table directly
MONITOR SEARCH	Monitoring view access through search
MONITOR SCAN	Monitoring view access through scanning

Source of information:
https://help.sap.com/hana/html/sql_explain_plan.html

mercoledì 5 marzo 2014

Oracle: easy access the file system from PLSQL

How to easily access the file system from PLSQL ?
The goal is to perform an simple command like:
- DOS: dir
- LINUX: ls

and get the result as select.

Very good tool is offered by:
http://plsqlexecoscomm.sourceforge.net/

You need to download the PLSQL package from the web site above, install the PLSQL script and then perform the simple select:

 select *   
 from table(file_pkg.get_file_list(file_pkg.get_file('/')))

martedì 4 marzo 2014

Oracle: Measure disk space used

You need to know how much space your schema is using:

 -- estimated bytes per table used:  
 select   
   table_name,   
   avg_row_len*num_rows/1024/1024 size_mbytes   
 from user_tables   
 order by 2 desc;

 --top table-segments for bytes used:  
 select   
      owner,   
      segment_name,   
      partition_name,   
      segment_type,   
      round(bytes/1024/1024) size_MB   
 from dba_segments   
 where owner like '%MY_FAVOURITE_NAME%'  
 order by bytes desc;

 --top table-segments for bytes used: (per Segment_name)  
 select   
      owner,   
      segment_name,   
      sum(size_MB) size_MB   
 from (  
  select   
       owner,   
       segment_name,   
       partition_name,   
       segment_type,   
       round(bytes/1024/1024) size_MB   
  from   
       dba_segments   
 )  
 where owner like '%MY_FAVOURITE_NAME%'  
 group by owner, segment_name  
 order by size_MB desc;

 -- top owner for bytes used:  
 select   
      owner,   
      round(sum(bytes/1024/1024)) size_MB   
 from dba_segments   
 where owner like '%MY_FAVOURITE_NAME%'  
 group by owner   
 order by 2 desc;

-- disk free  
 select   
      tablespace_name,   
      round(sum(bytes)/(1024*1024*1024),0) GB_free   
 from dba_free_space   
 group by tablespace_name   
 order by 2 desc;