Date: Fri, 1 Aug 2003 15:18:46 -0400
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: group data into different categories
Ougya,
In your example, you might use an arithmetic combination that gives a unique
correspondence between the pairs of categories and the numbers 0 to n - 1,
where n is the number of pairs (6 here).
proc format;
   value var1f
      0<-2    = '0'
      2<-5    = '1'
      5<-high = '2'
      other   = "error"
   ; *3 categories for var1;
   value var2f
      0<-1    = '0'
      1<-high = '1'
      other   = "error"
   ; *2 categories for var2;
run;
* in the data step, apply each format to get that variable's category index;
cat = 2 * input(put(var1,var1f.),3.) + input(put(var2,var2f.),3.) ;
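It is pretty clear that the combination is a one-to-one mapping on the
domain of possible values: var1 contributes 0, 2, or 4 and var2 adds 0 or 1,
so cat runs from 0 to 5. For unexpected values the label "error" cannot be
read by the 3. informat, so the log shows an invalid-argument note and cat is
missing.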
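For what it is worth, here is a minimal end-to-end sketch of the idea, run
against the four sample rows from the question. The dataset names have and
want are my own placeholders, and the put calls are assumed to use the var1f.
and var2f. formats, repeated here so the step runs on its own.

```sas
proc format;
   value var1f 0<-2 = '0'  2<-5 = '1'  5<-high = '2'  other = 'error';
   value var2f 0<-1 = '0'  1<-high = '1'  other = 'error';
run;

data have;   /* sample rows from the original question */
   input var1 var2 var3;
   datalines;
1 3 3
4 5 9
5 3 8
33 3 2
;
run;

data want;   /* hypothetical output dataset name */
   set have;
   /* 2 = number of var2 categories, so cat runs from 0 to 5 */
   cat = 2 * input(put(var1, var1f.), 3.) + input(put(var2, var2f.), 3.);
run;

proc print data=want;
run;
```

For these rows cat comes out as 1, 3, 3, and 5. The second and third rows
share a value because both fall in the (2,5] range for var1 and the (1,high]
range for var2.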
If there were three variables with a, b, and c categories, then the formula
would be
b*c*x + c*y + z
where x, y, and z are the respective input/puts. Each index is multiplied by
the product of the category counts of the variables to its right, so the
result runs from 0 to a*b*c - 1 with no collisions. When the counts are all
equal, the formula simply reduces to numbers base b, where b is the common
number of categories. (Just in the last few weeks somebody used this idea on
another problem, but I have forgotten the problem and who presented the
solution. Thanks.)
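As a quick check of the three-variable case, the sketch below enumerates
every index triple, computes the combined number, and decodes it again. The
counts a=3, b=3, c=2 and the dataset names check and distinct are made up for
illustration. It uses the mixed-radix form b*c*x + c*y + z, in which each
index is multiplied by the product of the category counts to its right.

```sas
data check;
   a = 3; b = 3; c = 2;              /* hypothetical category counts */
   do x = 0 to a - 1;
      do y = 0 to b - 1;
         do z = 0 to c - 1;
            cat = b*c*x + c*y + z;   /* combined category number */
            /* decode cat back into the three indices */
            x_back = int(cat / (b*c));
            y_back = int(mod(cat, b*c) / c);
            z_back = mod(cat, c);
            output;
         end;
      end;
   end;
run;

/* drop repeated cat values; none are removed if the mapping is one-to-one */
proc sort data=check out=distinct nodupkey;
   by cat;
run;
```

With these counts, distinct keeps all 18 rows and cat runs from 0 to 17, and
the decoded x_back, y_back, and z_back match x, y, and z on every row.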
IanWhitlock@westat.com
-----Original Message-----
From: ougya [mailto:jieguo01@YAHOO.COM]
Sent: Friday, August 01, 2003 2:12 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: group data into different categories
Hi,
Suppose a dataset has the following information
var1 var2 var3
1 3 3
4 5 9
5 3 8
33 3 2
one can format var1-var2 based on the following code
proc format;
value var1f
0<-2='0-2'
2<-5='2-5'
5<-high='5-high'
; *3 categories for var1;
value var2f
0<-1='0-1'
1<-high='1-high'
; *2 categories for var2;
Then, in the data step, var1cat=put(var1,var1f.) gives the
category.
My difficulty is this:
I want to create a category based on both var1 and var2 so that each
observation will be assigned to one of 6 categories (combinations of 3 var1
categories and 2 var2 categories), like var1 in (0,2] and var2 in (0,1].
Certainly, this can be done manually by explicitly writing
if 0<var1<=2 and 0<var2<=1 then cate=1;
if ....
In reality, I have 200 combinations in my dataset, so the above approach is
not efficient. Does anyone have a suggestion?
Thanks very much!
Jay