LISTSERV at the University of Georgia
Date:         Wed, 29 Apr 2009 08:42:44 -0700
Reply-To:     "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject:      Re: How to create a dataset by appending a single (i.e same)
              dataset multiple times.??
Comments: To: Gerhard Hellriegel <gerhard.hellriegel@T-ONLINE.DE>
In-Reply-To:  A<200904290939.n3SLgWXD009480@malibu.cc.uga.edu>
Content-Type: text/plain; charset="iso-8859-1"

Thanks Gerhard! I had forgotten the DEFER option. I occasionally need to set large numbers of files together, and usually I break the job into steps to avoid crashing the program. Your suggestion will provide me a much more elegant solution.

Paul Choate
DDS Data Extraction
(916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Gerhard Hellriegel
Sent: Wednesday, April 29, 2009 2:39 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: How to create a dataset by appending a single (i.e same) dataset multiple times.??

I meant that statement:

data A;
  set A A A . .... ..................;
run;

That is not optimal. I once caused an IPL on a production mainframe with a SET statement with 50 input datasets with around 400 variables. That caused a memory overflow and the job went into a loop, unfortunately just at the moment when a message was being sent to the system console... That message blocked the console (I think one of the 10 billion must have been bad...). If you try it out with sashelp.class, you will see that it takes a long time to allocate all the buffers. Using

set a a open=defer a open=defer ... ;

opens the datasets one after another, so only one buffer is needed. For this to work, all the datasets must have the same structure. It also has an advantage over PROC APPEND: the overhead of starting the PROC each time is avoided.

With 100 datasets and short observations you will see no difference. With 1000 or more, the difference should become visible.

options fullstimer;

%macro test(iterate,ds);
  data a;
    set
    %do i=1 %to &iterate;
      &ds open=defer
    %end;
    ;
  run;
%mend;

%test(1000,sashelp.class);

Gerhard
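
For comparison, the repeated PROC APPEND approach that this thread weighs against the single DATA step might look roughly like the sketch below. Nobody in the thread posted this code: the macro name append_n and the output name want_append are made up for illustration, and the 100 iterations simply mirror the original question.

```sas
options fullstimer;

/* Hypothetical helper, not posted in the thread:              */
/* appends &ds to work.&out a total of &times times.            */
%macro append_n(times, ds, out);
  %local i;

  /* Remove any earlier copy of the output so reruns do not accumulate rows. */
  /* NOWARN keeps PROC DATASETS quiet if work.&out does not exist yet.       */
  proc datasets library=work nolist nowarn;
    delete &out;
  quit;

  /* One PROC APPEND step per copy; the first step creates work.&out. */
  %do i = 1 %to &times;
    proc append base=work.&out data=&ds;
    run;
  %end;
%mend append_n;

%append_n(100, sashelp.class, want_append);
```

With FULLSTIMER on, every PROC APPEND step writes its own timing note to the log, so the per-step overhead discussed in the thread can be read off directly and compared with the single note from the DATA step version.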

On Wed, 29 Apr 2009 05:01:25 -0400, Søren Lassen <s.lassen@POST.TELE.DK> wrote:

>Gerhard,
>I do not suggest using "set statement with 100 input datasets". I suggest
>iterating over the same set statement a number of times.
>
>Why so complicated? Because the original poster wanted that order
>of the obs.
>
>I think that Dan is right - as the size of the input data set grows,
>the advantage of using a single data step decreases, and may
>eventually disappear. On the other hand, I still prefer this log
>entry (the times were for the original 3 obs. sample data set):
>
>NOTE: The data set WORK.WANT has 300 observations and 2 variables.
>NOTE: DATA statement used:
>      real time           0.00 seconds
>      cpu time            0.00 seconds
>
>to parsing a log with 100 notes about PROC APPEND.
>
>But of course, if the order of the observations is not
>important, your suggestion is probably to be preferred.
>
>Regards,
>Søren
>
>On Wed, 29 Apr 2009 04:44:37 -0400, Gerhard Hellriegel
><gerhard.hellriegel@T-ONLINE.DE> wrote:
>
>>I'd not use a set statement with 100 input datasets! Note that SAS creates
>>a buffer for each dataset which costs a lot of memory and a lot of CPU
>>time.
>>If you want to do that, use at least open=defer as option for each dataset.
>>
>>One question: why so complicated? The following does the same, only with
>>another order for the obs:
>>
>>data a;
>>  set sashelp.class;
>>  do i= 1 to 100;
>>    output;
>>  end;
>>  drop i;
>>run;
>>
>>Gerhard
>>
>>On Wed, 29 Apr 2009 01:15:37 -0700, Daniel Nordlund
>><djnordlund@VERIZON.NET> wrote:
>>
>>>> -----Original Message-----
>>>> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
>>>> Behalf Of Søren Lassen
>>>> Sent: Tuesday, April 28, 2009 11:55 PM
>>>> To: SAS-L@LISTSERV.UGA.EDU
>>>> Subject: Re: How to create a dataset by appending a single
>>>> (i.e same) dataset multiple times.??
>>>>
>>>> While I generally recommend using proc append for appending data,
>>>> there are limits - running the append procedure one hundred times
>>>> after each other will cost a lot of overhead compared to this
>>>> solution:
>>>>
>>>> data want;
>>>>   do __i=1 to 100;
>>>>     do __p=1 to __n;
>>>>       set A nobs=__n point=__p;
>>>>       output;
>>>>     end;
>>>>   end;
>>>>   stop;
>>>>   drop __:;
>>>> run;
>>>>
>>>> Regards,
>>>> Søren
>>>>
>>>> On Tue, 28 Apr 2009 23:24:43 -0700, pinu
>>>> <amarmundankar@GMAIL.COM> wrote:
>>>>
>>>> >There is a dataset A as;
>>>> >id num
>>>> >1 11
>>>> >2 22
>>>> >3 33
>>>> >Now I want to create a dataset named A which will consists of records
>>>> >from A appended 100 times.
>>>> >Sample o/p of Dataset A will be:
>>>> >1 11
>>>> >2 22
>>>> >3 33
>>>> >1 11
>>>> >2 22
>>>> >3 33
>>>> >..
>>>> >..
>>>> >..
>>>> >..
>>>> >Is there any other way than using the set statement and writing A 100
>>>> >times after that
>>>> >e.g. . data A;
>>>> >        set A A A . .... ..................;
>>>> >       run;
>>>
>>>Søren,
>>>
>>>I ran a few quick and dirty tests. The first test used a file of 100
>>>records. The second used a file with 100000 records. With the small file,
>>>the proc append solution ran in approx 3 seconds (real time), your set with
>>>point solution ran in .09 seconds. With the large file, the proc append
>>>solution ran in 25-30 seconds and the set with point solution ran in 15-30
>>>seconds. This is not conclusive because it was a quick and dirty test.
>>>
>>>But I did notice two points of interest. First, as one might expect, it
>>>appears that the overhead of proc append will become a small percentage of
>>>the overall processing time as file size increases. Second, the times for
>>>both methods were quite variable, probably due to a variety of background
>>>tasks (I am running on a WinXP system). But it was interesting that the
>>>individual times for proc append with the 100k record file varied between
>>>.04 and 3.1 seconds. It would seem that the proc append could theoretically
>>>finish in as little as 4 seconds. So the overhead of running proc append
>>>may not rule out using it 100 times. The variability of these times will
>>>probably vary across systems depending on amount of ram (and how the OS
>>>manages it), type of file system, background activity, etc.
>>>
>>>I may try to benchmark this a little more carefully to get a better
>>>assessment of the timings for these two approaches. I would be interested
>>>in your comments (others feel free to jump in here as well).
>>>
>>>Dan
>>>
>>>Daniel Nordlund
>>>Bothell, WA USA

