Date: Tue, 29 Jun 2010 13:19:22 -0400
Reply-To: "Lamias, Mark (CDC/OID/OD) (CTR)" <bnz6@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Lamias, Mark (CDC/OID/OD) (CTR)" <bnz6@CDC.GOV>
Subject: Re: how to output partial data with bigger "n" freqs
In-Reply-To: <004001cb177c$72974190$57c5c4b0$@com.br>
Content-Type: text/plain; charset=iso-8859-1
Maybe I am misunderstanding your example, but according to what you are requesting, Joe, Richard, and Peter should all have values of "Yes" for the top20inregion variable, since there are only three vendors in this region (any region where the number of vendors less than or equal to 20 would have a value of "Yes" as they would be in the top 20 vendors as well as the bottom 20 vendors). This contradicts your desired output below. Are you just looking for the top vendor in each of perhaps 20 (or more) regions? If so this could be handled by a first.vendorname and first.regionn after obtaining counts for each vendor in region using proc sql. For example:
data work.temp;
input Region $ Vendorname $ amount carname $;
datalines;
North Richard 20000 Honda
North Joe 30000 Ford
North Peter 35000 Ford
North Joe 30000 Ford
North Richard 20000 Honda
North Peter 31000 Ford
North Joe 30000 Ford
North Richard 20000 Honda
North Peter 35000 Ford
North Joe 25000 Ford
North Richard 20000 Honda
North Peter 23000 Ford
North Joe 30000 Ford
North Richard 20000 Honda
North Zico 35000 Ford
North Joe 30000 Ford
North Peter 35000 Ford
North Richard 20000 Honda
North Peter 31000 Ford
North Joe 30000 Ford
North Joe 25000 Ford
North Zico 20000 Honda
North Richard 20000 Honda
North Peter 23000 Ford
South Joe 30000 Ford
South Peter 35000 Ford
South Richard 20000 Honda
South Peter 31000 Ford
South Joe 30000 Ford
South Joe 25000 Ford
South Zico 20000 Honda
South Richard 20000 Honda
South Peter 23000 Ford
;
run;
proc sql;
create table work.temp2(DROP=n) as
select count(*) as n, * from work.temp
group by region, vendorname;
quit;
data work.temp2;
set work.temp2;
by region vendorname;
if first.vendorname and first.region then top20inregion="Yes";
else top20inregion="No";
run;
Mark J. Lamias
SAIC Statistical Consultant & SAIC Team Lead
Office of the Director -- Office of Informatics
National Center for Emerging and Zoonotic Infectious Diseases (NCEZID)
US Centers for Disease Control and Prevention
w: (404) 498-2646
m: (404) 543-1394
f: (404) 639-1391
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Adriano Rodrigues
Sent: Tuesday, June 29, 2010 7:16 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: RES: how to output partial data with bigger "n" freqs
Sorry,
Maybe i not explain very well. Each vendor appear many times on my list in
each region. I want export dataset with all observations, for this top 20
vendors, my data will not have 20 lines, but 500, because each one sell a
lot.
Previous:
The dataset has 3 regions and i want output the top 20 vendors (number of
occurences, not amount total) by region, to work only with them. How to do
it?
Region vendorname amount carname
North Joe 30000 Ford
North Richard 20000 Honda
North Peter 35000 Ford
North Joe 30000 Ford
North Richard 20000 Honda
North Peter 31000 Ford
North Joe 30000 Ford
North Richard 20000 Honda
North Peter 35000 Ford
North Joe 25000 Ford
North Richard 20000 Honda
North Peter 23000 Ford
North Joe 30000 Ford
North Richard 20000 Honda
North Zico 35000 Ford
North Joe 30000 Ford
North Richard 20000 Honda
North Peter 31000 Ford
North Joe 30000 Ford
North Zico 20000 Honda
North Peter 35000 Ford
North Joe 25000 Ford
North Richard 20000 Honda
North Peter 23000 Ford
....
Same for region South and Others. What i need is like this:
Region vendorname amount carname top20inregion
North Joe 30000 Ford Yes
North Richard 20000 Honda No
North Peter 35000 Ford No
North Joe 30000 Ford Yes
North Richard 20000 Honda No
North Peter 31000 Ford No
North Joe 30000 Ford Yes
North Richard 20000 Honda No
North Peter 35000 Ford No
North Joe 25000 Ford Yes
....
....
Where this last variable "top20inregion" is based on number of sells (from
proc freq, for example)
-----Mensagem original-----
De: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] Em nome de Daniel
Fernández
Enviada em: segunda-feira, 28 de junho de 2010 15:09
Para: SAS-L@LISTSERV.UGA.EDU
Assunto: Re: how to output partial data with bigger "n" freqs
hi,
why don´t you do, after proc freq, a proc sort:
proc sort data= freq_output out= need (obs=200);
by descending count; run;
You will get the bigest 200 groups.
HTH
Daniel Fernández.
'In God we trust. All others must bring data'.(Anónimo)
--- El lun, 28/6/10, goladin@gmail.com <goladin@GMAIL.COM> escribió:
De: goladin@gmail.com <goladin@GMAIL.COM>
Asunto: Re: how to output partial data with bigger "n" freqs
Para: SAS-L@LISTSERV.UGA.EDU
Fecha: lunes, 28 de junio, 2010 15:18
Hi,
I think you can use proc sql to achieve your needs.
Regards,
Murphy
Sent from my Nokia phone
-----Original Message-----
From: Adriano Rodrigues
Sent: 28/06/2010 9:12:24 PM
Subject: how to output partial data with bigger "n" freqs
Hi,
I have one dataset with 4 variables region,vendorname,amount,carname.
The dataset has 3 regions and i want output the top 20 vendors (number of
occurences, not amount total) by region, to work only with them. How to do
it?
Basically i need make one new variable "top20 in region?" with Yes/no. But i
really dont know how to do that in faster way.
Poor and fast solution by now is proc freq, get names and make one IF name
by name...but IF i need top 200 would be very bad...
Thanks in advance,
Adriano