Date: Tue, 8 Nov 2011 23:34:04 -0500
Reply-To: R B <ryan.andrew.black@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: R B <ryan.andrew.black@GMAIL.COM>
Subject: Re: Array Question
In-Reply-To: <CAL9q9Trw9yCsxoM=6WktRtOcc_KfJ9uZZZbiWorNrKU3DGdyJQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Works like a charm. Quite efficient. Thank you!
On Tue, Nov 8, 2011 at 10:58 PM, Fareeza Khurshed <fkhurshed@gmail.com>wrote:
> If a SQL solution will work the following does it in 'one step'.
>
>
> *data* have;
>
> input id x;
>
> cards;
>
> 1 1
>
> 1 0
>
> 1 1
>
> 2 2
>
> 2 1
>
> 2 1
>
> 3 4
>
> 3 4
>
> 3 0
>
> ;
>
> *run*;
>
>
> *proc* *sql*;
>
> create table want as
>
> select id, x, sum(x)**0.5* as z
>
> from have
>
> group by id;
>
> *quit*;
>
>
>
> Cheers,
>
> Fareeza
>
>
>
> On Tue, Nov 8, 2011 at 7:42 PM, R B <ryan.andrew.black@gmail.com> wrote:
>
>> Sorry, Joe, but I must not be explaining myself well tonight. Your code:
>>
>> data want;
>> set have;
>> z=x*0.5;
>> run;
>>
>> does not sum the x values associated with a particular id. Suppose we have
>> 3 ids which repeat 3 times as follows:
>>
>> id x
>> 1 1
>> 1 0
>> 1 1
>> 2 2
>> 2 1
>> 2 1
>> 3 4
>> 3 4
>> 3 0
>>
>> If we sum the x-values associated with each id and then multiply by 0.5,
>> we
>> would have the following:
>>
>> z_id1 = (1 + 0 + 1)*.50 = 1
>> z_id2 = (2 + 1 + 1)*.50 = 2
>> z_id3 = (4 + 4+ 0)*.50 = 4
>>
>> and the resulting dataset would be:
>>
>> id x z
>> 1 1 1
>> 1 0 1
>> 1 1 1
>> 2 2 2
>> 2 1 2
>> 2 1 2
>> 3 4 4
>> 3 4 4
>> 3 0 4
>>
>> Hope this clears it up. I have a 100 ids, so I'm looking for an efficient
>> approach.
>>
>> Does this clear up my question?
>>
>> Ryan
>>
>> On Tue, Nov 8, 2011 at 10:28 PM, Joe Matise <snoopy369@gmail.com> wrote:
>>
>> > Does that constant vary by ID, or is it the same? If it's the same,
>> > then it's just
>> >
>> > data want;
>> > set have;
>> > z=x*0.5;
>> > run;
>> >
>> > If it varies by ID, there are a number of ways to do it. Most easily
>> > I think is to have a separate dataset of factors:
>> >
>> >
>> > data factors;
>> > input id factor;
>> > datalines;
>> > 1 0.5
>> > 2 0.3
>> > 3 0.4
>> > ;;;;
>> > run;
>> >
>> > data want;
>> > merge have(in=a) factors(in=b);
>> > by id;
>> > if a and b;
>> > z = x*factor;
>> > run;
>> >
>> > That would do it (requires HAVE to be sorted by ID). Also could use a
>> > custom format, a hash table, or a number of other solutions. If you
>> > must put it in code, which is not generally recommended, you could use
>> > a SELECT statement to organize it more clearly.
>> >
>> > -Joe
>> >
>> > On Tue, Nov 8, 2011 at 9:22 PM, R B <ryan.andrew.black@gmail.com>
>> wrote:
>> > > Thank you for responding. Let me try to clarify.
>> > >
>> > > I'd like the x values associated with each id to be summed and then
>> > > multiplied by a constant (e.g., .50). In the resulting dataset, the z
>> > value
>> > > should be the same for cases associated with id=1, the z value should
>> be
>> > > the same for cases associated with id=2, and so forth.
>> > >
>> > > To be clear, I want all cases to remain in the dataset.
>> > >
>> > > Thank you.
>> > >
>> > > Ryan
>> > >
>> > >
>> > > On Tue, Nov 8, 2011 at 10:14 PM, Joe Matise <snoopy369@gmail.com>
>> wrote:
>> > >
>> > >> What do you actually want (in english, not equations)? Final
>> dataset,
>> > >> what should it actually contain? Are you trying to sum the X value
>> > >> for each id and get a dataset with 1 row per ID with the sum of the X
>> > >> values (or 1/2 of the x values or whatever)?
>> > >>
>> > >> If that's what you are trying to do, then this should work:
>> > >>
>> > >> proc sort data=have;
>> > >> by id;
>> > >> run;
>> > >>
>> > >> data want;
>> > >> set have;
>> > >> by id;
>> > >> if first.id then z=0;
>> > >> z+(0.5*x);
>> > >> if last.id then output;
>> > >> keep id z;
>> > >> run;
>> > >>
>> > >> If something else you will need to explain. You could also use a
>> proc
>> > >> (PROC TABULATE, PROC MEANS/SUMMARY, etc.) to do this, although you'd
>> > >> have to do the 1/2 bit separately.
>> > >>
>> > >> -Joe
>> > >>
>> > >> On Tue, Nov 8, 2011 at 8:48 PM, R B <ryan.andrew.black@gmail.com>
>> > wrote:
>> > >> > Dear SAS-L,
>> > >> >
>> > >> > Here's an illustration of my dataset:
>> > >> >
>> > >> > id x
>> > >> > 1 3
>> > >> > 2 4
>> > >> > 3 5
>> > >> > 1 2
>> > >> > 2 4
>> > >> > 3 9
>> > >> > 1 3
>> > >> > 2 1
>> > >> > 3 2
>> > >> >
>> > >> > I'd like to apply the following equation in a data step:
>> > >> >
>> > >> > data test;
>> > >> > set test;
>> > >> > if id = 1 then z = sum(x)*.50;
>> > >> > if id = 2 then z = sum(x)*.50;
>> > >> > if id = 3 then z = sum(x)*.50;
>> > >> > run;
>> > >> >
>> > >> > I actually have 100 ids, and I would like to create an array (or
>> > >> whatever's
>> > >> > most efficient) to apply the formula above in a data step for each
>> id.
>> > >> >
>> > >> > Any help would be appreciated.
>> > >> >
>> > >> > Thanks,
>> > >> >
>> > >> > Ryan
>> > >> >
>> > >>
>> > >
>> >
>>
>
>
|