Couldn't you take advantage of Proc Corr and simply identify any variables
which have a perfect correlation (i.e., r = 1) with variables other than themselves?
If so, then it wouldn't be very difficult to have Proc Corr create an
output data set and use either SQL, a macro, or arrays to exclude the redundant variables.
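To make the idea concrete, here is a rough sketch of the logic in plain Python rather than SAS. It is not Proc Corr or its output data set. It assumes the data sit in a dict mapping column name to a list of numbers, and the helper names (`pearson`, `perfectly_correlated_pairs`, `columns_to_drop`) are hypothetical. One caveat: r = 1 flags any exact positive linear relationship (e.g. c = 2*a), not only identical columns, so you may want an equality check on the flagged pairs. Also, with ~3000 columns there are about 4.5 million pairs to examine.

```python
from itertools import combinations
from statistics import fmean, pstdev

# Hypothetical helpers: a plain-Python stand-in for scanning a correlation
# matrix (such as the one Proc Corr can write out) for pairs with r = 1.

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns None when either column is constant, since r is undefined there.
    """
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return None
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def perfectly_correlated_pairs(columns, tol=1e-9):
    """All (a, b) pairs, a listed before b, whose correlation is 1 within tol."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if r is not None and abs(r - 1.0) <= tol:
            pairs.append((a, b))
    return pairs


def columns_to_drop(columns, tol=1e-9):
    """Keep the first column of each perfectly correlated set; drop the rest."""
    drop = set()
    for a, b in perfectly_correlated_pairs(columns, tol):
        if a not in drop:  # a is the earliest surviving member of its set
            drop.add(b)
    return [name for name in columns if name in drop]
```

For example, with columns `a = [1, 2, 3, 4]`, `b` identical to `a`, `c = [2, 4, 6, 8]` and `d = [4, 3, 2, 1]`, the sketch would drop `b` and `c`. Note that `c` is a rescaled copy of `a`, not an identical column. Column `d` has r = -1 and is kept, because only r = 1 is checked.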
On Sun, 6 Nov 2005 16:03:33 -0500, Paul Walker <walker.627@OSU.EDU> wrote:
>Does anyone know a fast way to identify variable pairs which are redundant
>(i.e. the columns are identical)?
>Context: I am mining a dataset with around 3000 columns. Some of the
>columns may be redundant, and I would like a quick way to identify them.
>If 5 columns are identified as being equivalent, then I would choose 1 out
>of 5, and reduce the dimensionality of the dataset before additional
>selection procedures are considered.
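If only literally identical columns matter, an alternative to the pairwise correlation approach is to group columns by their full contents. This takes one pass over the columns instead of comparing every pair, and it keeps one representative per group, as in the "choose 1 out of 5" case. Below is a minimal Python sketch, again not SAS code. It assumes a dict of column name to list of values, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers: group columns whose values are exactly equal,
# using the tuple of a column's values as a dictionary key.

def group_identical_columns(columns):
    """Return lists of column names that share exactly the same values."""
    groups = defaultdict(list)
    for name, values in columns.items():
        groups[tuple(values)].append(name)
    return [names for names in groups.values() if len(names) > 1]


def keep_one_per_group(columns):
    """Names to keep: the first column seen for each distinct set of values."""
    seen = set()
    keep = []
    for name, values in columns.items():
        key = tuple(values)
        if key not in seen:
            seen.add(key)
            keep.append(name)
    return keep
```

Unlike r = 1, this treats `[1, 2, 3]` and `[2, 4, 6]` as different columns, which matches the "columns are identical" wording of the question.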