Date: Wed, 4 Jan 2006 14:24:18 -0600
Reply-To: baogong jiang <bgjiang@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: baogong jiang <bgjiang@GMAIL.COM>
Subject: Re: select distinct id from a big dataset
In-Reply-To: <20060104185528.31293.qmail@web31909.mail.mud.yahoo.com>
Content-Type: text/plain; charset=ISO-8859-1
Vora:
I deleted that data this morning, since it was not supposed to be that big
and other workers complained about space limits (we share the same server).
So I cannot try this anymore.
Thank you,
Baogong
On 1/4/06, VORA MANAN <manancvora@yahoo.com> wrote:
>
> Baogong,
>
> If your IDs are numeric, try this:
>
> proc sql;
> create table recip_id as
> select recip_id
> from srcdata.98statin
> group by recip_id
> having max(recip_id);
> quit;
>
>
> Let me know if this works.
>
> Thanks,
> Manan.
>
> Dennis Diskin <ddiskin@GMAIL.COM> wrote:
> It depends on your system. I'd say try it. I suggested proc freq because it
> builds an in-memory table instead of sorting, so you need a few tens of
> megabytes of virtual memory, but no work space.
>
> HTH,
> Dennis Diskin
>
>
> On 1/4/06, baogong jiang wrote:
> >
> > Hi Dennis,
> > Thank you. The dataset has about 670,000 distinct recip_id values. Can I
> > use proc freq?
> >
> > baogong
> >
> >
> >
> > On 1/4/06, Dennis Diskin wrote:
> >
> > > Baogong,
> > >
> > > SQL is probably sorting the file first to find the distinct IDs. This
> > > is what uses a lot of work space. Is your file possibly already in ID
> > > order? If not, how many distinct IDs do you expect? If not too many,
> > > one option is proc freq, which can create a distinct ID file:
> > >
> > > proc freq data=srcdata.98statin(keep=recip_id);
> > > table recip_id /out=recip_id(keep=recip_id);
> > > run;
> > >
> > > HTH,
> > > Dennis Diskin
> > >
> > >
> > > On 1/4/06, baogong jiang wrote:
> > > >
> > > > Hello, and happy new year to all,
> > > >
> > > > I need to get the distinct recip_id values from a big file (100 million
> > > > records with 8 variables). I tried the following code:
> > > >
> > > > proc sql;
> > > > create table recip_id as
> > > > select distinct recip_id
> > > > from srcdata.98statin;
> > > >
> > > > I got the error: Insufficient memory. Then I tried:
> > > > proc sql;
> > > > create table recip_id as
> > > > select distinct recip_id
> > > > from srcdata.98statin(keep=recip_id);
> > > >
> > > > Still, it did not work. I also tried:
> > > >
> > > > proc sort data=srcdata.98statin(keep=recip_id) out=recip_id nodup;by
> > > > recip_id;run;
> > > >
> > > > This ran out of work space.
> > > >
> > > > Is there any other way I can solve this problem? In the end, I will
> > > > use this recip_id table as a lookup table to pull information related
> > > > to those recip_id values.
> > > >
> > > >
> > > > thank you,
> > > >
> > >
> > >
> >
> >
> > --
> > Baogong Jiang
> > Department of Agronomy
> > Louisiana State University
>
>
>
>
>
--
Baogong Jiang
Department of Agronomy
Louisiana State University
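
For anyone who hits the same problem: the in-memory idea Dennis describes for
proc freq can also be sketched with a DATA step hash object, which collects
the distinct keys in memory in one pass, without sorting. This is only a
sketch under assumptions, not something tested in this thread. It assumes
SAS 9.1 or later (where the hash object is available) and enough memory for
about 670,000 keys. The dataset name srcdata.statin98 is a hypothetical
stand-in, since a SAS member name normally cannot begin with a digit; the
variable name recip_id is taken from the thread.

```sas
/* Sketch only: srcdata.statin98 is a hypothetical stand-in for the */
/* poster's dataset; recip_id may be numeric or character.          */
data _null_;
  /* Define recip_id in the program data vector without reading rows. */
  if 0 then set srcdata.statin98(keep=recip_id);

  /* In-memory table keyed on recip_id, kept in ascending key order. */
  declare hash ids(hashexp: 16, ordered: 'a');
  ids.defineKey('recip_id');
  ids.defineData('recip_id');
  ids.defineDone();

  /* Single pass over the big file; only unseen IDs are added. */
  do until (eof);
    set srcdata.statin98(keep=recip_id) end=eof;
    if ids.check() ne 0 then rc = ids.add();
  end;

  /* Write the distinct IDs to a small dataset for use as a lookup table. */
  ids.output(dataset: 'work.recip_id');
  stop;
run;
```

The output dataset holds one row per distinct recip_id, so the work space it
needs is small compared with a sort of the full file.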