adjOutlyingness          package:robustbase          R Documentation

_C_o_m_p_u_t_e _S_k_e_w_n_e_s_s-_a_d_j_u_s_t_e_d _M_u_l_t_i_v_a_r_i_a_t_e _O_u_t_l_y_i_n_g_n_e_s_s

_D_e_s_c_r_i_p_t_i_o_n:

     For an n * p data matrix (or data frame) 'x', compute the
     "_outlyingness_" of all n observations. Outlyingness here is a
     generalization of the Donoho-Stahel outlyingness measure, where
     skewness is taken into account via the medcouple, 'mc()'.

_U_s_a_g_e:

     adjOutlyingness(x, ndir = 250, clower = 3, cupper = 4,
                     alpha.cutoff = 0.75, coef = 1.5, qr.tol = 1e-12)

_A_r_g_u_m_e_n_t_s:

       x: a numeric 'matrix' or 'data.frame'.

    ndir: positive integer specifying the number of directions that
          should be searched.

clower, cupper: the constants to be used for the lower and upper tails,
          respectively, in order to transform the data towards symmetry.

alpha.cutoff: number in (0,1) specifying the quantiles (alpha, 1-alpha)
          which determine the "outlier" cutoff.

    coef: positive number specifying the factor with which the
          interquartile range ('IQR') is multiplied to determine
          'boxplot hinges'-like upper and lower bounds.

  qr.tol: positive tolerance to be used for 'qr' and 'solve.qr' for
          determining the 'ndir' directions each determined by a random
          sample of p (out of n) observations.

_D_e_t_a_i_l_s:

     *FIXME*:  Details are given in the comments of the Matlab code and
     in the reference(s).

     The method as described can be useful as a preprocessing step in
     FASTICA (<URL: http://www.cis.hut.fi/projects/ica/fastica/>).
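
     As background for the Description, the classical (unadjusted)
     Donoho-Stahel outlyingness of an observation is its largest
     robustly standardized distance from the median over a set of
     projection directions. The following base R sketch illustrates
     only that classical measure. It is not the algorithm used by
     'adjOutlyingness()': it applies no medcouple-based skewness
     adjustment, and it draws random unit directions instead of
     directions determined by samples of p observations. The function
     name 'sdOutlyingness' is hypothetical.

```r
## Sketch: classical Stahel-Donoho outlyingness over random directions.
## 'sdOutlyingness' is a hypothetical helper, not part of robustbase.
sdOutlyingness <- function(x, ndir = 250) {
  x <- as.matrix(x)
  p <- ncol(x)
  ## one random direction per column, scaled to unit length
  A <- matrix(rnorm(p * ndir), nrow = p)
  A <- sweep(A, 2, sqrt(colSums(A^2)), "/")
  Y <- x %*% A                      # n x ndir matrix of projections
  med <- apply(Y, 2, median)        # robust center per direction
  s   <- apply(Y, 2, mad)           # robust scale per direction
  ## absolute deviation from the median, standardized by the MAD
  Z <- sweep(abs(sweep(Y, 2, med, "-")), 2, s, "/")
  apply(Z, 1, max)                  # worst case over all directions
}

set.seed(1)
## 100 bivariate standard normal points plus one gross outlier
x <- rbind(matrix(rnorm(200), ncol = 2), c(10, 10))
out <- sdOutlyingness(x)
which.max(out)  # index of the most outlying observation
```

     For skewed data this symmetric standardization tends to flag
     regular points in the long tail, which is the situation the
     medcouple-based adjustment is meant to address.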

_V_a_l_u_e:

     a list with components 

  adjout: numeric vector of length n giving the adjusted outlyingness
          of each observation.

  cutoff: cutoff for "outlier" with respect to the adjusted
          outlyingnesses, and depending on 'alpha.cutoff'.

  nonOut: logical vector of length n, 'TRUE' when the corresponding
          observation is *non*-outlying with respect to the cutoff and
          the adjusted outlyingnesses.

_A_u_t_h_o_r(_s):

     Guy Brys; help page and improvements by Martin Maechler

_R_e_f_e_r_e_n_c_e_s:

     Brys, G., Hubert, M., and Rousseeuw, P.J. (2005) A Robustification
     of Independent Component Analysis; _Journal of Chemometrics_,
     *19*, 1-12.

     For the up-to-date reference, please consult <URL:
     http://wis.kuleuven.be/stat/robust.html>

_S_e_e _A_l_s_o:

     the adjusted boxplot, 'adjbox' and the medcouple, 'mc'.

_E_x_a_m_p_l_e_s:

     ## An Example with bad condition number and "border case" outliers

     if(FALSE) {## Not yet ok, because of bug in adjOutl
       dim(longley)
       set.seed(1) ## result is random 
       ao1 <- adjOutlyingness(longley)
       ## which are not outlying ?
       table(ao1$nonOut)  ## all of them
       stopifnot(all(ao1$nonOut))
     }

     ## An Example with outliers :

     dim(hbk)
     set.seed(1)
     ao.hbk <- adjOutlyingness(hbk)
     str(ao.hbk)
     hist(ao.hbk$adjout)  ## really two groups
     table(ao.hbk$nonOut) ## 14 outliers, 61 non-outliers:
     ## outliers are :
     which(! ao.hbk$nonOut) # 1 .. 14   --- but not for all random seeds!

     ## here, they are the same as found by (much faster) MCD:
     cc <- covMcd(hbk)
     stopifnot(all(cc$mcd.wt == ao.hbk$nonOut))

     ## This is revealing (about 1--2 cases where the outliers are *not* == 1:14),
     ##  but needs almost 1 [sec] per call:
     if(interactive()) {
       for(i in 1:30) {
         print(system.time(ao.hbk <- adjOutlyingness(hbk)))
         if(!identical(iout <- which(!ao.hbk$nonOut), 1:14)) {
              cat("Outliers:\n"); print(iout)
         }
       }
     }

