Date: Wed, 29 Apr 2009 09:07:51 -0500
Reply-To: Mary <mlhoward@AVALON.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mary <mlhoward@AVALON.NET>
Subject: Re: How to create a dataset by appending a single (i.e same)
dataset multiple times.??
Content-Type: text/plain; format=flowed; charset="Windows-1252";
reply-type=original
SQL INSERT statements (if it is to append to a table in a database, then use
pass-through SQL rather than SAS SQL) are likely to be very fast as well;
certainly worth trying.
proc sql noprint;
insert into table (variables) select ....;
quit;
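
To make that a bit more concrete, here is a minimal, untested sketch for the
sample data in the original post. It assumes a source dataset A with
variables id and num; the target name WANT and the macro name REPEAT_INSERT
are made up for illustration. SQL does not formally guarantee row order, so
check the result if the order of the observations matters.

```sas
/* Sketch only: WANT and REPEAT_INSERT are hypothetical names. */
%macro repeat_insert(times);
  %local i;
  proc sql noprint;
    /* empty target with the same structure as A */
    create table want like a;
    /* one INSERT ... SELECT per repetition, all inside a single PROC SQL */
    %do i = 1 %to &times;
      insert into want (id, num)
        select id, num from a;
    %end;
  quit;
%mend repeat_insert;

%repeat_insert(100);
```

This keeps everything inside one PROC SQL step, so the procedure start-up
cost is paid once rather than 100 times.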
-Mary
----- Original Message -----
From: "Søren Lassen" <s.lassen@POST.TELE.DK>
To: <SAS-L@LISTSERV.UGA.EDU>
Sent: Wednesday, April 29, 2009 6:55 AM
Subject: Re: How to create a dataset by appending a single (i.e same)
dataset multiple times.??
> Gerhard,
> You are absolutely right. Still, I think Dan has a point,
> as PROC APPEND basically is faster than the data step.
>
> Therefore, if the input data set is large enough,
> the repeated APPEND solution may be faster than any
> of the data step solutions.
>
> Regards,
> Søren
>
> On Wed, 29 Apr 2009 05:39:11 -0400, Gerhard Hellriegel
> <gerhard.hellriegel@T-ONLINE.DE> wrote:
>
>>I meant that statement:
>>
>>data A;
>> set A A A . .... ..................;
>>run;
>>
>>That is not optimal. I once caused an IPL on a production mainframe with a
>>set with 50 input datasets with around 400 variables. That caused a memory
>>overflow and went into a loop, unfortunately just in the moment when a
>>message was sent to the system console... That message blocked the console
>>(think one of the 10 billions must have been bad...).
>>If you try that out with sashelp.class, you will see that it takes a long
>>time to allocate all the buffers.
>>Using
>>
>>set a
>> a open=defer
>> a open=defer
>> ...
>> ;
>>
>>opens all the datasets in sequence. Only one buffer is needed. It is
>>necessary that all datasets have the same structure for that.
>>That has advantages over PROC APPEND, because the overhead for starting
>>the PROC is avoided.
>>
>>With 100 datasets and short obs you will see no difference. With
>>1000 or more the difference becomes visible.
>>
>>options fullstimer;
>>%macro test(iterate,ds);
>>data a;
>> set
>> %do i=1 %to &iterate;
>> &ds open=defer
>> %end;
>> ;
>>run;
>>%mend;
>>%test(1000,sashelp.class);
>>
>>Gerhard
>>
>>
>>
>>
>>
>>On Wed, 29 Apr 2009 05:01:25 -0400, Søren Lassen
>><s.lassen@POST.TELE.DK> wrote:
>>
>>>Gerhard,
>>>I do not suggest using "set statement with 100 input datasets". I suggest
>>>iterating over the same set statement a number of times.
>>>
>>>Why so complicated? Because the original poster wanted that order
>>>of the obs.
>>>
>>>I think that Dan is right - as the size of the input data set grows,
>>>the advantage of using a single data step decreases, and may
>>>eventually disappear. On the other hand, I still prefer this log
>>>entry (the times were for the original 3 obs. sample data set):
>>>
>>>NOTE: The data set WORK.WANT has 300 observations and 2 variables.
>>>NOTE: DATA statement used:
>>> real time 0.00 seconds
>>> cpu time 0.00 seconds
>>>
>>>to parsing a log with 100 notes about PROC APPEND.
>>>
>>>But of course, if the order of the observations is not
>>>important, your suggestion is probably to be preferred.
>>>
>>>Regards,
>>>Søren
>>>
>>>On Wed, 29 Apr 2009 04:44:37 -0400, Gerhard Hellriegel
>>><gerhard.hellriegel@T-ONLINE.DE> wrote:
>>>
>>>>I'd not use a set statement with 100 input datasets! Note that SAS creates
>>>>a buffer for each dataset which costs a lot of memory and a lot of CPU
>>>>time.
>>>>If you want to do that, use at least open=defer as option for each dataset.
>>>>
>>>>One question: why so complicated? The following does the same, only with
>>>>another order for the obs:
>>>>
>>>>data a;
>>>> set sashelp.class;
>>>> do i= 1 to 100;
>>>> output;
>>>> end;
>>>> drop i;
>>>>run;
>>>>
>>>>Gerhard
>>>>
>>>>
>>>>
>>>>On Wed, 29 Apr 2009 01:15:37 -0700, Daniel Nordlund
>>>><djnordlund@VERIZON.NET> wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
>>>>>> Behalf Of Søren Lassen
>>>>>> Sent: Tuesday, April 28, 2009 11:55 PM
>>>>>> To: SAS-L@LISTSERV.UGA.EDU
>>>>>> Subject: Re: How to create a dataset by appending a single
>>>>>> (i.e same) dataset multiple times.??
>>>>>>
>>>>>> While I generally recommend using proc append for appending data,
>>>>>> there are limits - running the append procedure one hundred times
>>>>>> after each other will cost a lot of overhead compared to this
>>>>>> solution:
>>>>>>
>>>>>> data want;
>>>>>>   do __i=1 to 100;
>>>>>>     do __p=1 to __n;
>>>>>>       set A nobs=__n point=__p;
>>>>>>       output;
>>>>>>     end;
>>>>>>   end;
>>>>>>   stop;
>>>>>>   drop __:;
>>>>>> run;
>>>>>>
>>>>>> Regards,
>>>>>> Søren
>>>>>>
>>>>>> On Tue, 28 Apr 2009 23:24:43 -0700, pinu
>>>>>> <amarmundankar@GMAIL.COM> wrote:
>>>>>>
>>>>>> >There is a dataset A as;
>>>>>> >id num
>>>>>> >1 11
>>>>>> >2 22
>>>>>> >3 33
>>>>>> >Now I want to create a dataset named A which will consist of records
>>>>>> >from A appended 100 times.
>>>>>> >Sample o/p of Dataset A will be:
>>>>>> >1 11
>>>>>> >2 22
>>>>>> >3 33
>>>>>> >1 11
>>>>>> >2 22
>>>>>> >3 33
>>>>>> >..
>>>>>> >..
>>>>>> >..
>>>>>> >..
>>>>>> >Is there any other way than using the set statement and writing A 100
>>>>>> >times after that
>>>>>> >e.g. . data A;
>>>>>> > set A A A . .... ..................;
>>>>>> > run;
>>>>>
>>>>>Søren,
>>>>>
>>>>>I ran a few quick and dirty tests. The first test used a file of 100
>>>>>records. The second used a file with 100000 records. With the small file,
>>>>>the proc append solution ran in approx 3 seconds (real time), your set with
>>>>>point solution ran in .09 seconds. With the large file, the proc append
>>>>>solution ran in 25-30 seconds and the set with point solution ran in 15-30
>>>>>seconds. This is not conclusive because it was a quick and dirty test.
>>>>>
>>>>>But I did notice two points of interest. First, as one might expect, it
>>>>>appears that the overhead of proc append will become a small percentage of
>>>>>the overall processing time as file size increases. Second, the times for
>>>>>both methods were quite variable, probably due to a variety of background
>>>>>tasks (I am running on a WinXP system). But it was interesting that the
>>>>>individual times for proc append with the 100k record file varied between
>>>>>.04 and 3.1 seconds. It would seem that the proc append could theoretically
>>>>>finish in as little as 4 seconds (100 runs at .04 seconds each). So the
>>>>>overhead of running proc append may not rule out using it 100 times. The
>>>>>variability of these times will probably differ across systems depending
>>>>>on amount of ram (and how the OS manages it), type of file system,
>>>>>background activity, etc.
>>>>>
>>>>>I may try to benchmark this a little more carefully to get a better
>>>>>assessment of the timings for these two approaches. I would be interested
>>>>>in your comments (others feel free to jump in here as well).
>>>>>
>>>>>Dan
>>>>>
>>>>>Daniel Nordlund
>>>>>Bothell, WA USA