|
Great news for those who have wanted to have the ability to
perform basic matrix operations within a SAS data step or
within SAS procedures that allow programming statements (like
the NLMIXED procedure, the experimental MCMC procedure, and
several others). With version 9.2, there is some basic
matrix operation functionality available.
Let me try to describe how to use this functionality. It is
a bit convoluted since the functionality depends on whether
you are invoking the matrix operators from within a procedure
or within a data step.
Procedures which allow programming statements allow ARRAY
statements that appear very much the same as data step
ARRAY statements. There are a few differences, though,
between procedural arrays and data step arrays. I can't say
that I fully understand the differences between procedural
arrays and data step arrays, but one of the differences would
appear to be that procedural arrays are always temporary
arrays. If you try to invoke the matrix operators in a data
step employing an array which is not a temporary array, then
the matrix operations will fail.
There are other differences between data step and procedural
arrays. I don't know whether it is due to those other
differences or not, but the matrix operators that are
directly available to a SAS procedure are not directly
available to the data step. In order to make the matrix
operators available to a data step, one has to write a
subroutine using the FCMP procedure which takes input from
a data step array and passes that input on in the form
required for the matrix functions.
Below are examples. I will demonstrate a matrix inversion
operator. Note that in order to use the matrix operators
at all, one must specify the option
options cmplib=<library>;
I have specifically invoked this as:
options cmplib=sasuser.funcs;
Only after that option has been specified are the matrix
operators available.
Now, suppose that we are writing code for a PROCedure and
that we want to invert a covariance matrix. The covariance
is a square matrix, which we place in a two-dimensional
array which has common number of elements in both dimensions.
Thus, we must have first defined an array and assigned values
to the elements of the array. For an AR(1) covariance
structure and a covariance matrix of size 4, we might have
the following statements:
array cov [4, 4];
do i=1 to 4;
do j=1 to 4;
cov[i,j] = rho**(abs(i-j))*V;
end;
end;
Again, please note that this code is intended for a
procedure, not a data step. The ARRAY statement does
not specify that we have a _TEMPORARY_ array.
Now, we can invert the covariance matrix using the INV()
subroutine. We might like to place the inverse in another
square matrix, so we will first define another array and
then invoke the INV() subroutine:
array inv_cov [4,4];
call inv(cov, inv_cov);
The INV function takes two arguments: 1) the name of the
matrix which is to be inverted, and 2) the name of the
resultant matrix.
If we try this code in a data step, it will fail with a
message that the INV subroutine is unknown. Note, too,
that there is a note in the code below of an illegal array
reference when the statement CALL INV(V,V); is encountered.
data _null_;
array v [3,3] _temporary_;
v[1,1]=1.00; v[1,2]=0.50; v[1,3]=0.25;
v[2,1]=0.50; v[2,2]=1.00; v[2,3]=0.50;
v[3,1]=0.25; v[3,2]=0.50; v[3,3]=1.00;
call inv(v,v);
put v[1,1]= v[1,2]= v[1,3]= /
v[2,1]= v[2,2]= v[2,3]= /
v[3,1]= v[3,2]= v[3,3]= /;
run;
In order to make the INV function available in a data
step, one must write a subroutine using the FCMP
procedure. The subroutine is essentially a pass-thru
routine. We construct a subroutine INVERT which accepts
a matrix as input, invokes the INV subroutine, and returns
the second argument to the INV routine as output.
proc fcmp outlib=sasuser.funcs.mat;
subroutine invert(Mat[*,*], InvMat[*,*]);
outargs InvMat;
call inv(Mat, InvMat);
endsub;
run;
quit;
Now, the subroutine INVERT is available to a data step.
data _null_;
array v [3,3] _temporary_;
v[1,1]=1.00; v[1,2]=0.50; v[1,3]=0.25;
v[2,1]=0.50; v[2,2]=1.00; v[2,3]=0.50;
v[3,1]=0.25; v[3,2]=0.50; v[3,3]=1.00;
call invert(v,v);
put v[1,1]= v[1,2]= v[1,3]= /
v[2,1]= v[2,2]= v[2,3]= /
v[3,1]= v[3,2]= v[3,3]= /;
run;
(Try the above code without the _TEMPORARY_ specification and
see what happens. In the DATA step, _TEMPORARY_ is required!)
So, now it may be of interest to see whether the matrix
inversion routine produces a generalized inverse. Suppose
that we have a covariance matrix which has a row and column
which are zero. Will the INVERT (invoking the INV() routine)
generate a generalized inverse or will the matrix inversion
fail?
data _null_;
array v [3,3] _temporary_;
v[1,1]=1.00; v[1,2]=0.50; v[1,3]=0;
v[2,1]=0.50; v[2,2]=1.00; v[2,3]=0;
v[3,1]=0; v[3,2]=0; v[3,3]=0;
call invert(v,v);
put v[1,1]= v[1,2]= v[1,3]= /
v[2,1]= v[2,2]= v[2,3]= /
v[3,1]= v[3,2]= v[3,3]= /;
run;
It is seen from above that the matrix inversion algorithm
is not a generalized inverse algorithm. If different
subjects had different sized covariance matrices that we
wanted to invert, then we would need to define arrays of
every size that one might encounter in the data, query
how many observations the subject has, write the correct
sized matrix, and invert the correct sized matrix.
The code below shows the basic idea where there are two
different sized matrices. If the data are stored first
in the largest possible matrix and if that matrix is too
large to fill completely, then we must fill a smaller matrix
that can be filled completely and invert the smaller matrix.
data _null_;
array v3_3 [3,3] _temporary_;
v3_3[1,1]=1.00; v3_3[1,2]=0.50; v3_3[1,3]=0.25;
v3_3[2,1]=0.50; v3_3[2,2]=1.00; v3_3[2,3]=0.50;
v3_3[3,1]=0.25; v3_3[3,2]=0.50; v3_3[3,3]=1.00;
call invert(v3_3,v3_3);
put v3_3[1,1]= v3_3[1,2]= v3_3[1,3]= /
v3_3[2,1]= v3_3[2,2]= v3_3[2,3]= /
v3_3[3,1]= v3_3[3,2]= v3_3[3,3]= /;
v3_3[1,1]=1.00; v3_3[1,2]=0.50; v3_3[1,3]=0;
v3_3[2,1]=0.50; v3_3[2,2]=1.00; v3_3[2,3]=0;
v3_3[3,1]=0; v3_3[3,2]=0; v3_3[3,3]=0;
array v2_2 [2,2] _temporary_;
do i=1 to 2;
do j=1 to 2;
v2_2[i,j] = v3_3[i,j];
end;
end;
call invert(v2_2,v2_2);
put v2_2[1,1]= v2_2[1,2]= /
v2_2[2,1]= v2_2[2,2]=;
run;
The matrix operators which are available in 9.2 (all invoked
employing a CALL statement) are:
ADDMATRIX(M1,M2, result) /* M1, M2 and result are m x n */
DET(M, det) /* M is m x m, det is scalar */
ELEMMULT(M1,M2, result) /* M1, M2 and result are m x n */
EXPMATRIX(M, t, result) /* M and result are m x m, t is scalar */
IDENTITY(M) /* M is m x m and becomes I(m) */
INV(M1, M2) /* M1 and M2 are m x m */
MULT(M1,M2, result) /* M1 m x n, M2 n x p, result m x p */
POWER(M, t, result) /* M and result are m x m, t is scalar */
SUBTRACTMATRIX(M1,M2, r) /* M1, M2, and r(esult) are m x n */
TRANSPOSE(M1, M2) /* M1 is m x n, M2 is n x m */
ZEROMATRIX(M1) /* M1 is m x n, becomes zero filled */
I know others have asked at times about matrix capabilities
available in a data step. I hope this is sufficient to get
people started.
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
|