Date:         Fri, 30 Apr 2010 13:10:57 -0500
Reply-To:     Joe Matise <snoopy369@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Joe Matise <snoopy369@GMAIL.COM>
Subject:      Re: Error in using Hash Objects
Comments: To: Muthia Kachirayan <muthia.kachirayan@gmail.com>
In-Reply-To:  <y2l2fc7f3341004301010tfa823174wcebd1f226365ef0c@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Can you explain that further? I thought that first statement was just defining the PDV and the second set statement fully brought in the hash table... perhaps I don't understand hash tables adequately though?

-Joe

On Fri, Apr 30, 2010 at 12:10 PM, Muthia Kachirayan <muthia.kachirayan@gmail.com> wrote:

> Naresh Kumar,
>
> In Mark's code the statement
>
>     set have (keep=tn order_number obs=0);
>
> excludes adding to the hash table. Try replacing the first part with
>
>     data want(drop = rc);
>       if _n_ = 1 then do;
>         if 0 then set have;
>         declare hash found_keys();
>         found_keys.definekey('tn', 'order_number');
>         found_keys.definedone();
>       end;
>
> This will output unduplicated records with all variables.
>
> Muthia Kachirayan
>
> On Fri, Apr 30, 2010 at 12:48 PM, Joe Matise <snoopy369@gmail.com> wrote:
>
> > I think that's what Mark's code does. It only puts two variables into the
> > hash table, but it outputs the entire row (174 vars) to the dataset.
> >
> > -Joe
> >
> > On Fri, Apr 30, 2010 at 11:42 AM, naresh kmar <nareshkmar@yahoo.co.in> wrote:
> >
> > > Mark,
> > >
> > > Thanks. I would like to get the other 174 variables as well in my output
> > > dataset. Actually, I don't need to sort the dataset but would like to
> > > remove duplicates on the composite key (TN+ORDER_NUMBER). I don't think a
> > > hash table will be able to take in all those variables through
> > > definedata().
> > >
> > > Any thoughts??
> > >
> > > Thanks,
> > > Naresh
> > >
> > > ________________________________
> > > From: "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
> > > To: SAS-L@LISTSERV.UGA.EDU
> > > Sent: Fri, 30 April, 2010 8:18:42 PM
> > > Subject: Re: Error in using Hash Objects
> > >
> > > Naresh:
> > >
> > > You are asking for WAY too much memory. So PROC SORT, which substitutes
> > > disk I/O for memory, may be the preferred tactic.
> > >
> > > BUT ... you could use a hash if, by "removing duplicates" you mean keeping
> > > only one record for each combination of identification variables, say TN
> > > and ORDER_NUMBER. That's apparently your intention in your code.
> > >
> > > If so, consider the below. Here the hash table only accommodates the two
> > > id variables, merely for maintaining a list tracking which id values have
> > > already been encountered at any point in your progress through dataset
> > > HAVE.
> > >
> > > data want (drop=rc);
> > >   ** Get variable attributes of the key variables into the PDV **;
> > >   set have (keep=tn order_number obs=0);
> > >
> > >   declare hash found_keys (hashexp:16);
> > >     found_keys.definekey('TN','ORDER_NUMBER');
> > >     found_keys.definedone();
> > >
> > >   do until (end_of_have);
> > >     set have end=end_of_have;
> > >     rc=found_keys.check();
> > >     if rc^=0 then do;        /* If not yet in table ...    */
> > >       rc=found_keys.add();   /* .. add to the table ...    */
> > >       output;                /* .. and write to WANT       */
> > >     end;
> > >   end;
> > >   stop;
> > > run;
> > >
> > > Whenever a record is encountered whose TN/ORDER_NUMBER are already in
> > > FOUND_KEYS, then no OUTPUT statement is executed.
> > >
> > > Note this will NOT sort the data, but it will write out only one record
> > > per TN/ORDER_NUMBER combination.
> > >
> > > Regards,
> > > Mark
> > >
> > > > -----Original Message-----
> > > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > > > naresh kmar
> > > > Sent: Friday, April 30, 2010 10:24 AM
> > > > To: SAS-L@LISTSERV.UGA.EDU
> > > > Subject: Error in using Hash Objects
> > > >
> > > > Hi All,
> > > >
> > > > I am running the below code on 14 million record dataset and received
> > > > an error. Could anyone let me know how to resolve this? work.indsn has
> > > > 14 million records and 176 variables. My objective is to sort the input
> > > > dataset and remove duplicates based on the key. I could have used PROC
> > > > sort but heard that Hash objects are more efficient.
> > > >
> > > > DATA _NULL_ ;
> > > >   IF _N_=1 THEN SET work.indsn ;
> > > >   DECLARE HASH HH ( DATASET: 'work.indsn', HASHEXP: 16, ORDERED: 'A') ;
> > > >   HH.DEFINEKEY ( 'TN', 'ORDER_NUMBER' ) ;
> > > >   HH.DEFINEDATA ( 'var1','var2',....,'var176') ; /****** ADD ALL VARIABLES ****/
> > > >   HH.DEFINEDONE () ;
> > > >   HH.OUTPUT(DATASET:'work.outdsn');
> > > >   STOP;
> > > > RUN;
> > > >
> > > > ERROR: Hash object added 131056 items when memory failure occurred.
> > > > FATAL: Insufficient memory to execute data step program. Aborted during
> > > > the EXECUTION phase.
> > > >
> > > > Thanks,
> > > > Naresh
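
For anyone following the thread, the approach quoted above (Mark's key-only hash, with Muthia's `if 0 then set` in place of the executed `set ... obs=0`) can be sketched end to end roughly as follows. This is a minimal sketch and not code posted by anyone in the thread: the toy HAVE dataset, its AMOUNT variable, and all its values are made up for illustration. The `if 0 then set` form is used because a SET statement that actually executes against zero observations ends the DATA step at that point, whereas a SET that never executes still defines the variables in the PDV at compile time.

```sas
/* Hypothetical toy data standing in for HAVE; names and values are invented. */
data have;
   input tn order_number amount;
   datalines;
1 10 100
1 10 150
2 20 200
1 11 300
2 20 250
;
run;

data want (drop=rc);
   /* Never executed: only defines TN and ORDER_NUMBER in the PDV at compile time. */
   if 0 then set have (keep=tn order_number);

   /* The hash holds only the two key variables, not the full record. */
   declare hash found_keys (hashexp:16);
   found_keys.definekey('tn', 'order_number');
   found_keys.definedone();

   do until (end_of_have);
      set have end=end_of_have;      /* reads every variable of HAVE          */
      rc = found_keys.check();       /* 0 means this key pair was seen before */
      if rc ne 0 then do;
         rc = found_keys.add();      /* remember the key pair                 */
         output;                     /* write the whole record to WANT        */
      end;
   end;
   stop;
run;
```

With the toy data, WANT should keep the first record seen for each TN/ORDER_NUMBER pair (three records here), in the original input order and with every variable from HAVE carried through. Memory use grows with the number of distinct key pairs rather than with the 176-variable record width, which is the point of keeping the data portion out of the hash.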

