domenica 27 aprile 2014

Basic SQL Operation in R



I want to have in R the equivalent of most of the basic operations normally performed in SQL.
In this post it will follow a sniplet in SQL and immediately after the correspondent in R.

Topics Covered:
- Distinct
- Where
- Inner / outer joins
- Group by


Before starting with the Pure R syntax, just keep in mind that R is providing a very useful package called SQLDF. Through this package it is possible to perform a simple SQL query over tables / data frames.

 # installs everything you need to use sqldf with SQLite  
 # including SQLite itself  
 install.packages("sqldf")  
 # shows built in data frames  
 data()   
 # load sqldf into workspace  
 library(sqldf)  
 sqldf("select * from iris limit 5")  
 sqldf("select count(*) from iris")  
 sqldf("select Species, count(*) from iris group by Species")  
 # create a data frame  
 DF <- data.frame(a = 1:5, b = letters[1:5])  
 sqldf("select * from DF")  
 sqldf("select avg(a) mean, variance(a) var from DF") # see example 15  

Source: http://code.google.com/p/sqldf/



WHERE


 SELECT *   
 FROM df1   
 WHERE product = "Toaster"  


In R:
 df1 = data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3))) ;  
 df <- df1[df1$Product=="Toaster",];  




DISTINCT

the select distinct in SQL:

 select distinct x  
 from my_table;  

The equivalent in R is:

 > x <- list(a=c(1,2,3), b = c(2,3,4), c=c(4,5,6))  
 > xx <- unlist(x)  
 > xx  
 a1 a2 a3 b1 b2 b3 c1 c2 c3   
  1 2 3 2 3 4 4 5 6   
 > unique(xx)  
 [1] 1 2 3 4 5 6  




INNER / OUTER JOINS

Having in SQL the following query:

 select *   
 from product [left] [right] [outer] join countries  
     on (product.customer_id = countries.customer_id)  


In R:
 df1 = data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3)))  
 df2 = data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1)))  
 > df1  
  CustomerId Product  
       1 Toaster  
       2 Toaster  
       3 Toaster  
       4  Radio  
       5  Radio  
       6  Radio  
 > df2  
  CustomerId  State  
       2 Alabama  
       4 Alabama  
       6  Ohio  
 #Outer join:   
 merge(x = df1, y = df2, by = "CustomerId", all = TRUE)  
 #Left outer:   
 merge(x = df1, y = df2, by = "CustomerId", all.x=TRUE)  
 #Right outer:   
 merge(x = df1, y = df2, by = "CustomerId", all.y=TRUE)  
 #Cross join:   
 merge(x = df1, y = df2, by = NULL)  

Source:
http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right


GROUP BY


For the Group By function there are many options.
Let's start with the most basic one:

Having in SQL the following snipplet:
 CREATE TABLE my_table (  
  a varchar2(10 char),   
  b varchar2(10 char),   
  c number  
 );  
 SELECT a, b, mean(c)  
 FROM my_table  
 GROUP BY a, b  


In R:
 grouped_data <- aggregate(my_table, by=list(my_table$a, my_table$b, FUN=mean);  

Alternatively:
 > mydf  
  A B  
 1 1 2  
 2 1 3  
 3 2 3  
 4 3 5  
 5 3 6  
 > aggregate(B ~ A, mydf, sum)  
  A B  
 1 1 5  
 2 2 3  
 3 3 11  



If your data are large, I would also recommend looking into the "data.table" package.

  
 > library(data.table)  
 > DT <- data.table(mydf)  
 > DT[, sum(B), by = A]  
   A V1  
 1: 1 5  
 2: 2 3  
 3: 3 11  



And finally the most recommended ddply function:
 > DF <- data.frame(A = c("1", "1", "2", "3", "3"), B = c(2, 3, 3, 5, 6))  
 > library(plyr)  
 > DF.sum <- ddply(DF, c("A"), summarize, B = sum(B))  
 > DF.sum  
  A B  
 1 1 5  
 2 2 3  
 3 3 11  

Source:
http://stackoverflow.com/questions/18799901/data-frame-group-by-column

venerdì 25 aprile 2014

Boss Vs. Leader

I think it is a bit old, but I would like to have it stamped it on my blog...
I do not have so much time these days :/ this is the most I can do...



domenica 13 aprile 2014

ORACLE: Analytical Functions


The concept of analytical query is something that can highly speed up the development and the execution of your queries.
In particular because they are automatically optimized by oracle itself.

Here there are reported in a veeeeery small nutshell:


Count (member of elements in the same group)
SELECT empno, deptno, 
COUNT(*) OVER (PARTITION BY 
deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

     EMPNO     DEPTNO DEPT_COUNT
---------- ---------- ----------
      7369         20          5
      7566         20          5
      7788         20          5
      7902         20          5
      7876         20          5
      7499         30          6
      7900         30          6
      7844         30          6
      7698         30          6
      7654         30          6
      7521         30          6

11 rows selected.



Row Number (id of the entry within the group)
SELECT empno, deptno, hiredate,
ROW_NUMBER( ) OVER (PARTITION BY deptno ORDER BY hiredate NULLS LAST) SRLNO
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, SRLNO;

EMPNO  DEPTNO HIREDATE       SRLNO
------ ------- --------- ----------
  7782      10 09-JUN-81          1
  7839      10 17-NOV-81          2
  7934      10 23-JAN-82          3
  7369      20 17-DEC-80          1
  7566      20 02-APR-81          2
  7902      20 03-DEC-81          3
  7788      20 09-DEC-82          4
  7876      20 12-JAN-83          5

8 rows selected.


Rank & Dense Rank (member of elements in the same group)
SELECT empno, deptno, sal,
RANK() OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) RANK,
DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) DENSE_RANK
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, RANK;

EMPNO  DEPTNO   SAL  RANK DENSE_RANK
------ ------- ----- ----- ----------
  7839      10  5000     1          1
  7782      10  2450     2          2
  7934      10  1300     3          3
  7788      20  3000     1          1
  7902      20  3000     1          1
  7566      20  2975     3          2
  7876      20  1100     4          3
  7369      20   800     5          4

8 rows selected.


Lead & Lag (next / previous member of the group respect the current element)
SELECT deptno, empno, sal,
LEAD(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,
LAG(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, sal DESC;

 DEPTNO  EMPNO   SAL NEXT_LOWER_SAL PREV_HIGHER_SAL
------- ------ ----- -------------- ---------------
     10   7839  5000           2450               0
     10   7782  2450           1300            5000
     10   7934  1300              0            2450
     20   7788  3000           3000               0
     20   7902  3000           2975            3000
     20   7566  2975           1100            3000
     20   7876  1100            800            2975
     20   7369   800              0            1100

8 rows selected.


First Value & Last Value
-- How many days after the first hire of each department were the next
-- employees hired?

SELECT empno, deptno, hiredate ? FIRST_VALUE(hiredate)
OVER (PARTITION BY deptno ORDER BY hiredate) DAY_GAP
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, DAY_GAP;

     EMPNO     DEPTNO    DAY_GAP
---------- ---------- ----------
      7369         20          0
      7566         20        106
      7902         20        351
      7788         20        722
      7876         20        756
      7499         30          0
      7521         30          2
      7698         30         70
      7844         30        200
      7654         30        220
      7900         30        286

11 rows selected.



Source:
http://www.orafaq.com/node/55



mercoledì 9 aprile 2014

File system access on Oracle




It may sound easy, but accessing the file system from oracle can be painful.
I am not talking about read / write a file. I am talking about making a ls or dir command, crete folders, move files, etc.
In this post I would like to recall an easy system about making ls.

Actually the solution is already very well explained in this web page:
http://plsqlexecoscomm.sourceforge.net/


The solution is mainly based on a java package installed in the Oracle DB, which is accessing the file system and arranging the data in a proper way.

First of all it is needed to install the package (available on the link above) and then perform a simple query like the one below:

select * 
from table(
    file_pkg.get_file_list(file_pkg.get_file('/'))
)

And here you are: you get the result of a ls command executed on the root accessible as a simple select.

domenica 30 marzo 2014

SAP HANA: exception Handling




I think that the exception handling in SAP Hana is a bit "non-intuitive" (to be nice…). 


I invest a bit of time in order to understand how does it exactly works. 
I prepared a small tutorial and summerized here below. 



Theory: 

In java
 public class myClass() {  
      ...  
      public void myMethod() {  
           ...  
           try {  
                //code to be executed causing exception  
           }  
           catch(Exception e) {  
                //log the exception  
           }  
           ...  
      }  
      ...  
 }  



Equivalent in SQLScript
 CREATE PROCEDURE myproc AS  
 BEGIN  
      ...  
   DECLARE EXIT HANDLER FOR SQL_ERROR_CODE MY_SQL_ERROR_CODE  
   BEGIN  
           -- log the exception  
   END;  
      -- code to be executed causing exception  
 end;  




Practical example

copy and paste on SAP HANA Studio. First prepare something: 
 CREATE TABLE MYTAB (I INTEGER PRIMARY KEY);  
 drop PROCEDURE myproc;  


Then create this procedure: 
 CREATE PROCEDURE myproc AS  
 BEGIN  
      declare myvar int;  
   DECLARE EXIT HANDLER FOR SQL_ERROR_CODE 1299  
     BEGIN  
         write_debug_log('MARCO TEST',   
                                  'Handler of the NO_DATA_FOUND exception of the select below (after the following begin end block)',   
                                  'SQL_ERROR_CODE = ' || ::SQL_ERROR_CODE || '; SQL_ERROR_MESSAGE = ' || ::SQL_ERROR_MESSAGE);   
     END;  
           begin   
                declare my_test int;  
                --CASE GENERIC FOR ANY POSSIBLE EXCEPTION  
                  DECLARE EXIT HANDLER FOR SQLEXCEPTION  
                  begin   
                             write_debug_log('MARCO TEST',   
                                                 'separate handler for division by 0',   
                                                 'SQL_ERROR_CODE = ' || ::SQL_ERROR_CODE || '; SQL_ERROR_MESSAGE = ' || ::SQL_ERROR_MESSAGE);   
                end;  
                my_test := 1/0;  
           end;  
   SELECT I INTO myVar FROM MYTAB; --NO_DATA_FOUND exception  
   SELECT 'NeverReached_noContinueOnErrorSemantics' FROM DUMMY;  
 END;  


Please, notice that I am using my classing logging procedure.
Just substitute this procedure with any other suitable to your environment for logging...



Then let it run: 
 --execute the code and raise the exceptions  
 call myproc;  
 --check the situation in the log table  
 select * from log   
 order by id desc;  

Again, here is the code for accessing my log table...
Just use the one of your environment.



sabato 15 marzo 2014

Octave Cheat Sheet

Octave Cheat Sheet:



I found this very interesting cheat sheet about Octave here:
Source http://altons.github.io/octave/2013/05/05/octave-commands-cheat-sheet/

I think it is very great, but there are couple of commands missing (like PS1, which in my opinion makes your life better!).
So I have taken it and extendend with some commands.

Management & Help

Task
Command
exits software
quit or exit
list variables in session
who
list variables in session with info about type
whos
deallocate variable(s)
clear varname
displays search path
path
Adds path to search
addpath
clear screen
clc
list .m files in dir
what
search for keyword
lookfor
displays help for function
help funname
Parse and execute the contents of FILE
source(".octaverc")
List installed packages
pkg list
Install packages
pkg install [-auto or -noauto] pkg_name.tar.gz
Describe packages
pkg describe pkg_name
Load/Unload package
pkg load/unload pkg_name
Uninstall package
pkg uninstall pkg_name



Shell Commands

Task
Command
Change Linde starter
PS1(‘desired Starter’); %PS1(‘>> ‘);
change working directory to dir
cd dirname
print working directory
Pwd
print directory listing
ls
return value of named environment variable
getenv (string)
execute arbitrary shell command string
system (cmd)
Load the file
 
You will load a file and make it available with a
Varible having same name of the file without extension
Load FILENAME_IN_PWD
Load(‘FILENAME_IN_PWD’);
Save variable into file
 
You will load a file and make it available with a
Varible having same name of the file without extension
save  FILENAME_IN_PWD VARIABLE_NAME;



Special Operators

Definition
Operator
Example
comment
%
% Initialising
wildcard
*
clear p*
array
[ ]
[1 2; 3 4]
Cell Arrays
{ }
p = { 10 , 15 , "rectangle" }; p{1} =10
ranges
start:step:stop (default step=1)
1:10 or 1:10:100
variable assignment
=
A=magic(3);
do not display
;
a=1;



Workflow

Task
Command
Suspend the execution of the program.
pause; or pause(5)
print string to the stout
fprintf("Hello World")



Vectors & Matrices

Rules

· Square brackets delimit literal matrices.
· Commas separate elements on the same row.
· Semicolons separate rows.
· Commas may be replaced by spaces, and
· semicolons may be replaced by one or more newlines.
· Elements of a matrix may be arbitrary expressions, provided that all the dimensions agree.
· Optional: Matrices are denoted by uppercase letters A,B,X etc while vectors by lowercase letters a,b,y etc.
 

Example
Expression
enter a row vector
[ x, y, ... ]
enter a column vector
[ x; y; ... ]
enter a 2×2 matrix
[ w, x; y, z ]
Return a row vector with N linearly spaced elements between BASE and LIMIT
linspace (BASE, LIMIT, N)
Similar to linspace except that the values are logarithmically spaced
logspace(BASE, LIMIT, N)
Higher Dimensional Arrays
B(z,y,z)=1
Sorting arrays
sort(A); or sort(A, 'descend');
Return true if any element in array is non-zero
any(A)
Return true if all element are non-zero
all(A)
Return a vector of indices of a matrix
[i , j] = find(A<0.5)
 
 

 

Array Operations

Task
Function
select elements of a vector
A(i)
select elements of a matrix
A(i,j)
select rows (columns) corresponding to the elements of a range
A(1:2,1); A(1:2,1:5);
select all rows (columns)
A(:,1); A(3,:);
Delete a row/column of a matrix
A(2,:)= [] or A(:,5) = []

 



Linear Algebra

Task
Function
dimensions of the array
size
returns larger dimension
length
allocates array with 0’s
zeros
allocates array with 1’s
ones
transpose array
'
Compute the determinant of A.
det(A)
inverse of matrix
inv
pseudo inverse of matrix
pinv
Calculate Eigenvalues / Eigenvectors of matrix A
eig(A) or [V,L] = eig(A)
Identity matrix
eye
Compute the rank of a Matrix
rank(A)
returns the lower triangular part of A
tril(A)
returns the upper triangular part of A
triu(A)
Create an N-by-N magic square
magic
Compute the dot product of two vectors
dot(A,B)
Compute the vector cross product of two 3-dimensional vectors X and Y
cross(X,Y)
interval range arrays
v = 1:0.1:2
%vector = STARTING_AT:DELTA:ENDING_AT



Plotting
Task
Command
Example
2-D plot
plot
plot(x,y); plot(x, a+b+c);
Surface plot
surf
surf(theta0, theta1, J);
Contour plot
contour
contour(theta0, theta1, J, logspace(-2, 3, 20))
Specify x-, y-, or z-axis labels for the current axis
[x,y,z]label
xlabel('k lags')
Set a plot window to plot window N
figure
figure;plot(x, a);
close figure window(s)
close
close [***(N),all,all hidden***]
new graphic objects are added to the plot
hold
hold on;plot(x, b);
Clear plot and restore default graphics settings
hold
hold off;
Display a legend for the axes
legend
plot(X,Y);legend('Temperature', 'Gas Demand');
give a title
title
title('myTitle');
export the chart
print
Print –dpng ‘filename.png’



Math Functions

Type
Function
Examples
Sum of elements along dimension DIM
sum
sum([1 2 3 4])
Product of elements along dimension DIM
prod
prod([1 2 3 4)]
Trigonometric
sin, cos, tan
floor((1+tan(1.2)) / 1.2)
Inverse Trigonometric
asin, acos, atan
Natural Logarithm
log
Base 10 Logarithm
log10
log10(100)/log10(10)
Exponentiation
exp
exp(0)
Absolute value
abs
abs(-3)
Square Root
sqrt
sqrt(3^2 + 4^2)
X raised to the Y power
power(X,Y)
power(3,2)
Real part of complex number
real
real(3+5I)
Imaginary part of complex number
imag
imag(3+5I)
Evaluate polynomials
polyval
polyval([2 10.1 0 6],0)
Write formatted polynomial
polyout
polyout([2 -3 1],"x")
Return the largest integer not greater than X
floor
floor(1.9)
Return the smallest integer not less than X
ceil
ceil(3.7)
Return the integer nearest to X
round
round(1.9)
Truncate fractional portion of X and return the integer portion
fix
fix(pi)



Stats Functions

Task
Example
Function
Uniform random numbers btw 0 and 1 (both excluded)
rand
rand(3,5)
Normal(0,1) random numbers
randn
randn(1,3)
Gamma distribution
randg
randg
Exponential distribution
rande
rande(1,10)
Poisson distribution
randp
randp(0.5,3,3)
Min value by column
min
min(A)
Max value by column
max
max(A)



Constants

Name
Expression
Default Variable
ans
Pi
pi
Euler's number
e
Imaginary number
i, j and I
Infinity
inf
Not a Number
NaN
machine precision
eps
true/false
logical 1/0



Logical Operators

Expression
Operator
is greater than
> 
is less than
< 
is greater than or equal to
>=
is less than or equal to
<=
is equal to
==
is not equal to
= or !=
AND with short circuiting
&&
with short circuiting
||
AND
&
OR
|
NOT



Auxiliary Functions

Task
Function
Check a scalar
isscalar(a)
Check a vector
isvector(b)
Check a matrix
ismatrix(b)
is func available
is TAB TAB
Type info
typeinfo(b)



String Functions

Task
Function
Example
Compare 2 strings
strcmp
strcmp("hello","Hello")



Import & Export Data

Task
Command
Example
Read the matrix DATA from a text file
dlmread (FILE, SEP, R0, C0)
dmlread("virus.dat",",",1,1);
Write data to a text file
dlmwrite (FILE, M, DELIM, R, C)
dlmwrite("out.txt",yhat,";",1,1);
Read the csv files
csvread (FILENAME, DLM_OPTS)
csvread("input.csv");
Write csv files
csvwrite (FILENAME, X, DLM_OPTS)
csvwrite("output.csv", yhat);

Defining Functions

Simplest Form

  function name
      body
  end

Example:

  function wakeup
        printf ("\a");
    end

Passing Params

  function name (arg-list)
        body
    end

Example:

  function wakeup (message)
        printf ("\a%s\n", message);
    end
 
  wakeup ("Rise and shine!");

Return Single Value

  function ret-var = name (arg-list)
        body
    end

Example:

  function retval = avg (v)
        retval = sum (v) / length (v);  
    end

Return Multiple Values

  function [ret-var1,ret-var2,…,ret-varn] = name (arg-list)
        body
    end

Example:

  function [mu,sigma] = basicStat(X)
    mu = mean(X);
    sigma = std(X);
  end



Statements

IF Statement

  if (condition)
       then-body
    elseif (condition)
       elseif-body
    else
       else-body
    end

Example:

   if (rem (x, 2) == 0)
       printf ("x is even\n");
     elseif (rem (x, 3) == 0)
       printf ("x is odd and divisible by 3\n");
     else
       printf ("x is odd\n");
     end

Note that the elseif keyword must not be spelled else if, as is allowed in Fortran. If it is, the space between the else and if will tell Octave to treat this as a new if statement within another if statement's else clause

SWITCH Statement

    switch (X)
       case 1
         do_something ();
       case 2
         do_something_else ();
       otherwise
         do_something_completely_different ();
     end

Example:

     A = 7;
     switch A
       case { 6, 7 }
         printf ("variable is either 6 or 7\n");
       otherwise
         printf ("variable is neither 6 nor 7\n");
     end

One advantage of using the switch statement compared to using if statements is that the labels can be strings

  switch (X)
       case "a string"
         do_something
       ...
     endswitch

WHILE Statement

  while (condition)
       body
     end

Example:

     fib = ones (1, 10);
     i = 3;
     while (i <= 10)
       fib (i) = fib (i-1) + fib (i-2);
       i++;
     end

DO-UNTIL Statement

     do
       body
     until (condition)

Example:

     fib = ones (1, 10);
     i = 2;
     do
       i++;
       fib (i) = fib (i-1) + fib (i-2);
     until (i == 10)

FOR Statement

     for var = expression
       body
     end

Example:

  fib = ones (1, 10);
     for i = 3:10
       fib (i) = fib (i-1) + fib (i-2);
     end