Date: Thu, 10 Jan 2008 15:31:06 -0700
Reply-To: Richard Read Allen <peakstat@WISPERTEL.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Richard Read Allen <peakstat@WISPERTEL.NET>
Subject: Re: PROC SQL--select DISTINCT
In-Reply-To: <20080110214751.ABA921BF28D@ws1-1.us4.outblaze.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Tom,
Here a possible SQL solution.
Richard
==================================================
data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0
200279098 02Aug2005 A16620006 92081 0
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;
proc sql;
create table claim1 as
select ID, Date, Claim_ID, max(flag) as sum_flag
from claim
group by ID, Date, Claim_ID;
quit;
proc print;
run;
Tom White wrote:
> Thank you data _null_. This seems to solve my problem. I just couldn't get anywehere with SQL and DISTINCT.
>
> Also thank you to the the rest of the responders whose ideas I will very likely use very soon.
>
> T
>
>
>
> ----- Original Message -----
> From: "data _null_,"
> To: "Tom White"
> Subject: Re: PROC SQL--select DISTINCT
> Date: Thu, 10 Jan 2008 15:13:50 -0600
>
>
> I believe this gives the correct result. It uses 2 PROC SUMMARY's
> which you may find objectionable.
>
>
> data work.claim;
> input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
> format date date9.;
> cards;
> 200279098 01Aug2005 A16620005 92285 0
> 200279098 01Aug2005 A16620005 99243 0
>
> 200279098 02Aug2005 A16620006 92081 0
>
> 200279098 06Sep2005 A16880748 15823 1
> 200279098 06Sep2005 A16880748 15823 1
> 200279098 06Sep2005 A16880748 15823 1
> 200279098 06Sep2005 A16880748 15823 1
>
> 200279098 01Nov2005 A17313211 67904 1
> 200279098 01Nov2005 A17313211 67904 1
> 200279098 01Nov2005 A17313211 67904 1
> 200279098 01Nov2005 A17313211 67904 1
> 200279098 01Nov2005 A17313211 92285 0
> 200279098 01Nov2005 A17313211 92285 0
> ;;;;
> run;
> proc summary data=work.claim nway;
> class id date claim_id code_id;
> output out=work.claim0(drop=_:) idgroup(max(flag) out(flag)=);
> run;
> *proc print;
> run;
> proc summary data=work.claim0 nway;
> class id date;
> output out=work.claim1(drop=_:) sum(flag)=sum_flag;
> run;
> proc print;
> run;
>
> On Jan 10, 2008 2:45 PM, Tom White wrote:
>
>> Perhaps I am making this too difficult.
>>
>> Given this dataset
>>
>> ID DATE CLAIM_ID CODE_ID FLAG
>> 200279098 01Aug2005 A16620005
>> 92285 0
>> 200279098 01Aug2005 A16620005
>> 99243 0
>>
>> 200279098 02Aug2005 A16620006
>> 92081 0
>>
>> 200279098 06Sep2005 A16880748
>> 15823 1
>> 200279098 06Sep2005 A16880748
>> 15823 1
>> 200279098 06Sep2005 A16880748
>> 15823 1
>> 200279098 06Sep2005 A16880748
>> 15823 1
>>
>> 200279098 01Nov2005 A17313211
>> 67904 1
>> 200279098 01Nov2005 A17313211
>> 67904 1
>> 200279098 01Nov2005 A17313211
>> 67904 1
>> 200279098 01Nov2005 A17313211
>> 67904 1
>> 200279098 01Nov2005 A17313211
>> 92285 0
>> 200279098 01Nov2005 A17313211
>> 92285 0
>>
>> I would like to sum up the FLAG field within each ID (in this case I only
>> show one ID, but in real data I have many such IDs) as follows:
>>
>> For ID=200279098 and DATE=01Aug2005 I see two claims (CLAIM_ID)
>> which are the same. (Note that CODE_ID are different).
>> Since FLAG=0 for both claims, sum of FLAG= 0 + 0 =0.
>>
>> For ID=200279098 and DATE=02Aug2005 I see only one claim so there
>> is nothing going on here.
>> The sum of FLAG is 0.
>>
>> For ID=200279098 and DATE=06Sep2005 I see four entries all of
>> which are identical.
>> In this case the sum of FLAG will be 1. (Not 4.)
>>
>> For ID=200279098 and DATE=01Nov2005 I see six entries, four of
>> which are identical,
>> in which case sum of FLAG=1 and two remaining entries where
>> FLAG=0 and therefore sum FLAG=0.
>> Their total sum is SUM_FLAG=1+0=1.
>>
>> So, the desired table will look like the one below keeping only
>> ID and SUM_FLAG as the final fields of interest.
>>
>>
>> ID SUM_FLAG
>> 200279098 0
>> 200279098 0
>> 200279098 1
>> 200279098 1
>>
>>
>> Perhaps there is an easy way to do this.
>>
>> Thnak you.
>>
>> ----- Original Message -----
>> From: "toby dunn"
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: PROC SQL--select DISTINCT
>>
>> Date: Thu, 10 Jan 2008 18:35:21 +0000
>>
>>
>> Tom ,
>>
>> Well when you grab the distinct values you now loose any linking
>> back to the other 2 variables, perhaps this will head you in the
>> right direction:
>>
>> Proc SQL ;
>>
>> Create Table Distinct As
>> Select Distinct Var1 , Var2 , Var3
>> From Have ;
>>
>> Create Table Need As
>> Select A.* , B.Var4 , B.Var5
>> From Distinct As A ,
>> Have As B ;
>>
>> Quit ;
>>
>>
>> If you need distinct values for Var-Var3 seperately then:
>>
>> Proc SQL ;
>> Create Table Need As
>> Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
>> From
>> ( Select Distinct Var1 From Have ) As A ,
>> ( Select Distinct Var2 From Have ) As B ,
>> ( Select Distinct Var3 From Have ) As C ,
>> ( Select Var4 , Var5 From Have ) As D
>> ;
>> Quit ;
>>
>>
>>
>>
>> Toby Dunn
>>
>> "Don't bail. The best gold is at the bottom of barrels of crap."
>> Randy Pausch
>>
>> "Be prepared. Luck is where preparation meets opportunity."
>> Randy Pausch
>>
>>
>>
>>> Date: Thu, 10 Jan 2008 13:20:33 -0500
>>> From: tw2@MAIL.COM
>>> Subject: Re: PROC SQL--select DISTINCT
>>> To: SAS-L@LISTSERV.UGA.EDU
>>>
>>> Toby,
>>>
>>> What I am attempting to do is to select DISTINCT values for VAR1,
>>> VAR3, and VAR3
>>> but all values for VAR4 and VAR5. The data summation problem I am
>>> trying to solve maybe requires something like this.
>>>
>>> T
>>>
>>> ----- Original Message -----
>>> From: "toby dunn"
>>> To: "Tom White" , sas-l@listserv.uga.edu
>>> Subject: RE: PROC SQL--select DISTINCT
>>> Date: Thu, 10 Jan 2008 18:12:22 +0000
>>>
>>>
>>>
>>> Tom ,
>>>
>>> Im not sure what you are trying to end up with. Your code and what
>>> I gather from your write up is basically after distinct values of
>>> soome variables but non distinct values of others. The structure
>>> doesnt makes sense, in order to get the distinct values you will
>>> have to eleminate some of the values from the non distinct values,
>>> in order to get the non distinct values you have to make the
>>> distinct values non distinct.
>>>
>>>
>>>
>>>
>>> Toby Dunn
>>>
>>> "Don't bail. The best gold is at the bottom of barrels of crap."
>>> Randy Pausch
>>>
>>> "Be prepared. Luck is where preparation meets opportunity."
>>> Randy Pausch
>>>
>>>
>>>
>>>> Date: Thu, 10 Jan 2008 12:54:44 -0500
>>>> From: tw2@MAIL.COM
>>>> Subject: PROC SQL--select DISTINCT
>>>> To: SAS-L@LISTSERV.UGA.EDU
>>>>
>>>> Hello SAS-L
>>>>
>>>> I have a SAS data set with many fields (variables).
>>>>
>>>> Suppose I would like to write sql code like
>>>>
>>>> proc sq;
>>>> create table FOO2 as
>>>> select(distinct VAR1), distinct(VAR2), distinct(VAR3),
>>>> VAR4, VAR5
>>>> from FOO1
>>>> group by VAR1, VAR2, VAR3, VAR4, VAR5;
>>>> quit;
>>>>
>>>> SAS does not recognize the distinct function. In other words,
>>>> I cannot write
>>>>
>>>> .....
>>>> select(distinct VAR1), distinct(VAR2), distinct(VAR3),
>>>> .....
>>>> quit;
>>>>
>>>> Is there any way to select multiple DISTINCT variables like I am
>>>> attempting to do above?
>>>>
>>>> Thank you.
>>>> T
>>>>
>>>> --
>>>> Are we headed for a recession? Read more on the Money Portal
>>>> Mail.com Money - http://www.mail.com/Money.aspx?cat=money
>>>>
>>> _________________________________________________________________
>>> Make distant family not so distant with Windows Vista(R) + Windows Live™.
>>>
>>>
>> http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
>>
>>> --
>>> Are we headed for a recession? Read more on the Money Portal
>>> Mail.com Money - http://www.mail.com/Money.aspx?cat=money
>>>
>> _________________________________________________________________
>> Get the power of Windows + Web with the new Windows Live.
>> http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008
>>
>>
>>
>> --
>> Are we headed for a recession? Read more on the Money Portal
>> Mail.com Money - http://www.mail.com/Money.aspx?cat=money
>>
>>
>
>
> --
> Are we headed for a recession? Read more on the Money Portal
> Mail.com Money - http://www.mail.com/Money.aspx?cat=money
>
>
>
|