Date: Tue, 5 Dec 2006 21:40:55 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: nominal to dummy
In-Reply-To: <ed52c9100612051302p2499aea3x1a5b9af1c5c09d07@mail.gmail.com>
Content-Type: text/plain; format=flowed
johbee@GMAIL.COM replied:
>On 12/5/06, David L Cassell <davidlcassell@msn.com> wrote:
>>
>>n.mitsakakis@UTORONTO.CA wrote:
>> >
>> >Hello,
>> >
>> >I have a SAS data set with many nominal variables, with integer values.
>>For
>> >each one of these variables there is a corresponding format, which
>>assigns
>> >some text to each of the variable values, For example, the variable
>> >"weather" takes values 1-5 and there is a format wthfmt where 1
>>corresponds
>> >to "very good", 2 to "good", 3 to "medium", 4 to "bad" and 5 to "very
>>bad".
>> >
>> >Now I want to generate 4 dummy variables for this data set, and for our
>> >example, their names or labels and values should be as follows:
>> >- "very bad vs very good" should take the value 1 if weather=5,
>>otherwise
>>0
>> >- "bad vs very good" should take the value 1 if the weather=4, otherwise
>>0
>> >- "medium vs very good" should take the value 1 if the weather=3,
>>otherwise
>> >0
>> >- "good vs very good" should take the value 1 if the weather=2,
>>otherwise
>>0
>> >
>> >Now, since I have many nominal variables of this type (not just
>>weather),
>> >and each one of those with different number of levels, I want to find
>>some
>> >way to do this automatically.
>> >
>> >If anyone has any ideas of how this can be done, please let me know, I
>> >would
>> >appreciate it.
>> >
>> >Thanks,
>> >
>> >Nicholas
>>
>>I'm going to agree with my learned colleagues.
>>
>>You probably do not want to do this in exactly the way you
>>have envisioned. SAS has a lot of functionality for problems like
>>these, and you may only need to invoke already-available tools.
>>
>>So, precisely why are you focusing on these contrasts,
>>and in what procedure,
>>and for what kind of data,
>>and from what data sources,
>>and with what meta-data,
>>and from what kind of design,
>>and with what statistical approach in mind,
>>and with what hypotheses,
>>and ...
>>
>>
>>David
>>--
>>David L. Cassell
>>mathematical statistician
>>Design Pathways
>>3115 NW Norwood Pl.
>>Corvallis OR 97330
>
>*--- Perhaps an idea of what you want to do dynamically (see example).
>However, it's a bit much for only 4 required combinations which would be
>just a simple format. Yet, if there are a great number of discrete
>combinations, then you may want this, or an expanded version (changing the
>sql around slightly). But given the few combinations of weather
>conditions,
>and the request, at this point it may be a bit over kill compared to a few
>lines in value $wthcfmt. Yet somehow, I suspect this is what you asking
>for, and under data conditions with many unknow combinations, it would
>useful.
>
>Hope this helps or helps to clarify your question.
>
>Jake
>
>*------ example --------------------------------*;
>options fmtsearch=(work.formats);
>
>
>
>proc
>format;
>
>value
>wthfmt
>
> 1="very
>good"
>
>
>2="good"
>
>
>3="medium"
>
>
>4="bad"
>
> 5="very
>bad";
>
>run;
>
>quit;
>
>
>
>
>
>proc format library=work.formats cntlout=weather_fmts(keep=fmtname label
>start end);
>select
>wth:;
>
>quit;
>
>
>
>proc sql feedback
>undo_polcy=none;
> create table
>a
>
> as select distinct monotonic() as id, label,
>start
> from
>weather_fmts
>
> order by
>start;
>
>
> create table
>b
>
> as select label,
>start
> from
>weather_fmts
>
> order by start
>desc;
>
>
> create table
>b
>
> as select monotonic() as id,
>*
> from
>b
>
> where
>compress(start)='1';
>
>
>
> create table
>tempfmt
>
> as select a.label, b.label as b_label, a.start, b.start as
>b_start
> from
>a,
>
>
>b;
>
>
>
> create table
>wthcfmt
> as select distinct 'WTHCFMT' as FMTNAME length
>32,
> 'C' as type length
>1,
> '' as hlo length
>11,
> compbl(label||' vs '||b_label) as label length
>40,
> compress(start||'_'||b_start) as start length
>16,
> compress(start||'_'||b_start) as end length
>16
> from
>tempfmt;
>
>quit;
>
>
>data
>wthcfmt;
>
> set wthcfmt
>end=eof;
>
>
>fmtname='WTHCFMT';
>
>
>output;
>
> if eof
>then
>
>
>do;
>
>
>fmtname='WTHCFMT';
>
>
>type='C';
>
> default=22;
>length=22;
> start
>='**OTHER**';
>
> end
>='**OTHER**';
>
> label ='No
>Decode';
>
>hlo='O';
>
>
>output;
>
>
>end;
>
>run;
>
>
>
>proc format library=work
>cntlin=wthcfmt;
>quit;
>
>
>
>data
>weather;
>
>format weather
>3.;
>
>input
>weather;
>
>datalines;
>
>1
>
>2
>
>3
>
>4
>
>5
>
>;
>
>run;
>
>
>
>data
>wea2;
>
>set
>weather;
>
> format value 3. wea_cond
>$wthcfmt.;
> length condition
>$3;
>
>value=0;
>
> if weather in (5 4 3 2) then
>value=1;
>
>condition=compress(put(weather,3.-r)||'_'||put(value,3.-r));
>
>
>wea_cond=condition;
>
>run;
>
>
>
>proc print
>data=wea2;
>
>run;
Umm, actually, no. This doesn't help answer any of my concerns.
Let me clarify some of my ramblings. I wrote:
>>So, precisely why are you focusing on these contrasts,
These might not be the correct contrasts. Or these might be
immediately achievable by a proc without any extra work. Or
they might lead to different analyses. So I asked.
>>and in what procedure,
Again, without some background, it is impossible to tell if
the desired proc will automagically do this, or if this will take a
few extra steps, or even this might be the wrong thing to be
analyzing.
>>and for what kind of data,
Do we even have linear models theory to fall back on? It
depends on the data.
>>and from what data sources,
If the data are from a survey sample, then our poster might well
be in the wrong proc. If the variables of interest turn out to
be random effects, or time series data, or repeated measures,
or . . . then the poster might be in the wrong proc. I'm just
trying to get enough information to decide.
>>and with what meta-data,
>>and from what kind of design,
>>and with what statistical approach in mind,
>>and with what hypotheses,
>>and ...
yada yada yada, as I'm sure you're sick of hearing by now.
Bottom line: we need enough information to help without
accidentally pointing our poster down the wrong path.
Hence #1 in my list of "The 7 Habits Of Highly Effective SAS-ers":
"Know Your Problem"
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
_________________________________________________________________
Talk now to your Hotmail contacts with Windows Live Messenger.
http://clk.atdmt.com/MSN/go/msnnkwme0020000001msn/direct/01/?href=http://get.live.com/messenger/overview
|