Date: Mon, 29 Jul 2002 15:58:53 -0400
Reply-To: Ed Heaton <EdHeaton@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ed Heaton <EdHeaton@WESTAT.COM>
Subject: Re: Consolidating categorical data
Content-Type: text/plain; charset="iso-8859-1"
Paula,
This uses your codeing idea, but might be simpler to implement.
Data test(drop=_:) ;
Input slogan $21. ;
Array _slogan[3] $7 ;
key = 0 ;
Do _i=1 to 3 ;
_slogan[_i] = scan(slogan,_i,"-") ;
Select ( subStr(_slogan[_i],1,4) ) ;
When ("visa") key = key + input( substr(_slogan[_i],5) , best. ) *
100000 ;
When ("mast") key = key + input( substr(_slogan[_i],5) , best. ) *
1000 ;
When ("amex") key = key + input( substr(_slogan[_i],5) , best. ) ;
Otherwise put "ERROR: ..." ;
End ;
End ;
Cards4 ;
visa1-mast2-amex3
visa2-mast3-amex9
mast7-visa2-amex8
visa1-mast2-amex3
visa1-amex3-mast2
mast2-visa1-amex3
mast2-amex3-visa1
amex3-visa1-mast2
amex3-mast2-visa1
;;;;
Proc sort data=test out=UnDuped(drop=key) noDupKey ;
By key ;
Run ;
Ed
Edward Heaton, Senior Systems Analyst,
Westat (An Employee-Owned Research Corporation),
1550 Research Boulevard, Room 2018, Rockville, MD 20850-3195
Voice: (301) 610-4818 Fax: (301) 294-3992
mailto:EdHeaton@westat.com http://www.westat.com
-----Original Message-----
From: paula D [mailto:sophe@USA.NET]
Sent: Monday, July 29, 2002 2:46 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Consolidating categorical data
I have one character variable (slogan) with values like this:
visa1-mast2-amex3;
visa2-mast3-amex9;
mast7-visa2-amex8;
....
that is, every value randomly takes 3 of (visa1-visa50, mast1-mast50,
amex1-amex150) and combine with 2 "-" in between. It is Ok if the
value is
visa1-visa1-visa1.
My problem is if the 3 components in any 2 values are the same, such
as
visa1-mast2-amex3 and mast2-visa1-amex3,
I want to treat them as the same, taking 1 of the 2 values to
represent them.
So far I have tried one idea. It works, but too complicated. The idea
is to assign a value to each component in such as way that
1. comp1=substr(slogan, 1, 5); comp2=substr(slogan, 7,5);
comp3=substr(slogan, 13,5);
2. if comp1="visa1" then tag1=1; else if comp1="visa2" then tag1=10;
else if comp1="visa3" then tag1=100......;
3. if comp2="visa1" then tag2=1; else if comp2="visa2" then tag2=10;
else if comp2="visa3" then tag2=100......;
then the total of tag1-tag3 is at a unique range to correspond to each
unique original value in slogan (in this sense, the magnitude does not
have to be 10). Finally, re-format the totals back using original
Slogan values.
I would appreciate if someone can help with any better solution. I
guess I am not the first person who has ever needed to do this, but I
have no idea what keywords to use to query the SAS-L to find previous
listings in this regard. Thank you in advance.
Again, my old email sophe@usa.net is no longer in use. Please use
sophe88@yahoo.com for emails. Thanks.
Paula D