LISTSERV at the University of Georgia
Date:         Wed, 4 Jan 2006 14:24:18 -0600
Reply-To:     baogong jiang <bgjiang@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         baogong jiang <bgjiang@GMAIL.COM>
Subject:      Re: select distinct id from a big dataset
Comments: To: VORA MANAN <manancvora@yahoo.com>
In-Reply-To:  <20060104185528.31293.qmail@web31909.mail.mud.yahoo.com>
Content-Type: text/plain; charset=ISO-8859-1

vora:

I deleted that data this morning, since it was not supposed to be that big and other workers complained about space limits (we share the same server). So I cannot try this anymore.

thank you,

baogong

On 1/4/06, VORA MANAN <manancvora@yahoo.com> wrote:
>
> Baoogng,
>
> If your IDs are numeric, try this:
>
> proc sql;
> create table recip_id as
> select recip_id
> from srcdata.98statin
> group by recip_id
> having max(recip_id);
> quit;
>
> Let me know if this works.
>
> Thanks,
> Manan.
>
> Dennis Diskin <ddiskin@GMAIL.COM> wrote:
> It depends on your system. I'd say try it. I suggested proc freq because it
> builds an in-memory table instead of sorting, so you need a few tens of
> megabytes of virtual memory, but not work space.
>
> HTH,
> Dennis Diskin
>
> On 1/4/06, baogong jiang wrote:
> >
> > hi, Dennis,
> > thank you. The dataset has about 670,000 distinct recip_id. Can I
> > use proc freq?
> >
> > baogong
> >
> > On 1/4/06, Dennis Diskin wrote:
> > >
> > > Baogong,
> > >
> > > SQL is probably sorting the file first to find the distinct IDs. This
> > > is what uses a lot of work space. Is your file possibly already in ID order?
> > > If not, how many distinct IDs do you expect? If not too many, you could use
> > > proc freq for one, to create a distinct ID file:
> > >
> > > proc freq data=srcdata.98statin(keep=recip_id);
> > > table recip_id /out=recip_id(keep=recip_id);
> > > run;
> > >
> > > HTH,
> > > Dennis Diskin
> > >
> > > On 1/4/06, baogong jiang wrote:
> > > >
> > > > hello, Happy new year to all,
> > > >
> > > > I need to get the distinct recip_id from a big file (100 million records
> > > > with 8 variables). I tried the following code:
> > > >
> > > > proc sql;
> > > > create table recip_id as
> > > > select distinct recip_id
> > > > from srcdata.98statin;
> > > >
> > > > I got the error: Insufficient memory, then I tried:
> > > >
> > > > proc sql;
> > > > create table recip_id as
> > > > select distinct recip_id
> > > > from srcdata.98statin(keep=recip_id);
> > > >
> > > > Still, it is not working. I also tried:
> > > >
> > > > proc sort data=srcdata.98statin(keep=recip_id) out=recip_id nodup;by
> > > > recip_id;run;
> > > >
> > > > This ran out of work space.
> > > >
> > > > Is there any other way I can solve this problem? In the end, I will use
> > > > this recip_id as a lookup table to pull information related to those
> > > > recip_id.
> > > >
> > > > thank you,
> >
> > --
> > Baoogng Jiang
> > Department of Agronomy
> > Louisiana State University

--
Baoogng Jiang
Department of Agronomy
Louisiana State University
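
The thread contrasts sort-based approaches (proc sql distinct, proc sort) with building an in-memory table of distinct IDs. One other way to get the in-memory behavior, not proposed by anyone in the thread, is a DATA step hash object. The following is only a sketch under stated assumptions: it assumes a SAS release that includes the DATA step hash object, it uses the hypothetical dataset name srcdata.statin98 as a stand-in for the poster's file, and it assumes the distinct IDs (about 670,000 according to the thread) fit in memory.

```sas
/* Sketch only: srcdata.statin98 is a hypothetical stand-in name */
/* for the poster's 100-million-record file.                     */
data _null_;
  /* In-memory table keyed on recip_id; it holds one entry per   */
  /* distinct ID, so memory use grows with the number of IDs,    */
  /* not with the number of records.                             */
  declare hash ids(hashexp: 16);
  ids.defineKey('recip_id');
  ids.defineData('recip_id');
  ids.defineDone();

  /* One sequential pass over the file; nothing is sorted, so no */
  /* sort work space is needed.                                  */
  do until (done);
    set srcdata.statin98(keep=recip_id) end=done;
    /* check() returns 0 when the key is already in the table */
    if ids.check() ne 0 then rc = ids.add();
  end;

  /* Write one row per distinct ID to WORK.recip_id */
  rc = ids.output(dataset: 'recip_id');
  stop;
run;
```

The output rows are not guaranteed to come out in ID order, so if the lookup step needs a sorted table, the much smaller WORK.recip_id can be sorted afterwards.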

