LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (July 2002, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 29 Jul 2002 15:58:53 -0400
Reply-To:     Ed Heaton <EdHeaton@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ed Heaton <EdHeaton@WESTAT.COM>
Subject:      Re: Consolidating categorical data
Comments: To: paula D <sophe@USA.NET>
Content-Type: text/plain; charset="iso-8859-1"

Paula,

This uses your codeing idea, but might be simpler to implement.

Data test(drop=_:) ; Input slogan $21. ; Array _slogan[3] $7 ; key = 0 ; Do _i=1 to 3 ; _slogan[_i] = scan(slogan,_i,"-") ; Select ( subStr(_slogan[_i],1,4) ) ; When ("visa") key = key + input( substr(_slogan[_i],5) , best. ) * 100000 ; When ("mast") key = key + input( substr(_slogan[_i],5) , best. ) * 1000 ; When ("amex") key = key + input( substr(_slogan[_i],5) , best. ) ; Otherwise put "ERROR: ..." ; End ; End ; Cards4 ; visa1-mast2-amex3 visa2-mast3-amex9 mast7-visa2-amex8 visa1-mast2-amex3 visa1-amex3-mast2 mast2-visa1-amex3 mast2-amex3-visa1 amex3-visa1-mast2 amex3-mast2-visa1 ;;;; Proc sort data=test out=UnDuped(drop=key) noDupKey ; By key ; Run ;

Ed

Edward Heaton, Senior Systems Analyst, Westat (An Employee-Owned Research Corporation), 1550 Research Boulevard, Room 2018, Rockville, MD 20850-3195 Voice: (301) 610-4818 Fax: (301) 294-3992 mailto:EdHeaton@westat.com http://www.westat.com

-----Original Message----- From: paula D [mailto:sophe@USA.NET] Sent: Monday, July 29, 2002 2:46 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Consolidating categorical data

I have one character variable (slogan) with values like this:

visa1-mast2-amex3; visa2-mast3-amex9; mast7-visa2-amex8; ....

that is, every value randomly takes 3 of (visa1-visa50, mast1-mast50, amex1-amex150) and combine with 2 "-" in between. It is Ok if the value is

visa1-visa1-visa1.

My problem is if the 3 components in any 2 values are the same, such as

visa1-mast2-amex3 and mast2-visa1-amex3,

I want to treat them as the same, taking 1 of the 2 values to represent them.

So far I have tried one idea. It works, but too complicated. The idea is to assign a value to each component in such as way that

1. comp1=substr(slogan, 1, 5); comp2=substr(slogan, 7,5); comp3=substr(slogan, 13,5); 2. if comp1="visa1" then tag1=1; else if comp1="visa2" then tag1=10; else if comp1="visa3" then tag1=100......; 3. if comp2="visa1" then tag2=1; else if comp2="visa2" then tag2=10; else if comp2="visa3" then tag2=100......;

then the total of tag1-tag3 is at a unique range to correspond to each unique original value in slogan (in this sense, the magnitude does not have to be 10). Finally, re-format the totals back using original Slogan values.

I would appreciate if someone can help with any better solution. I guess I am not the first person who has ever needed to do this, but I have no idea what keywords to use to query the SAS-L to find previous listings in this regard. Thank you in advance.

Again, my old email sophe@usa.net is no longer in use. Please use sophe88@yahoo.com for emails. Thanks.

Paula D


Back to: Top of message | Previous page | Main SAS-L page