| Date: | Thu, 10 Apr 2008 22:30:27 -0400 |
| Reply-To: | Joe Whitehurst <joewhitehurst@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Joe Whitehurst <joewhitehurst@GMAIL.COM> |
| Subject: | Re: proc sort performance |
| In-Reply-To: | <200804101404.m3AAlFa3018959@malibu.cc.uga.edu> |
| Content-Type: | text/plain; charset=ISO-8859-1 |
Consider the implications of this 9.2 documentation for your question:
------------------------------
TAGSORT Option
The TAGSORT option in the PROC SORT statement is useful in sorts when there
might not be enough disk space to sort a large SAS data set. When you
specify TAGSORT, the sort is a single-threaded sort. Do not specify TAGSORT
if you want SAS to use multiple threads to sort.
When you specify the TAGSORT option, only sort keys (that is, the variables
specified in the BY statement) and the observation number for each
observation are stored in the temporary files. The sort keys, together with
the observation number, are referred to as tags. At the completion of the
sorting process, the tags are used to retrieve the records from the input
data set in sorted order. Thus, in cases where the total number of bytes of
the sort keys is small compared with the length of the record, temporary
disk use is reduced considerably. You should have enough disk space to hold
another copy of the data (the output data set) or two copies of the tags,
whichever is greater. Note that while using the TAGSORT option can reduce
temporary disk use, *the processing time can be much higher*. However, on
PCs with limited available disk space, the TAGSORT option can allow sorts to
be performed in situations where they would otherwise not be possible.
------------------------------
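To make the mechanism in that excerpt concrete, here is a rough sketch in Python. It is not how SAS implements TAGSORT internally; it only illustrates the idea of sorting (key, observation number) tags and then fetching the full records, plus the disk space rule quoted above. The function names, the sample rows, and the 8-byte observation number are assumptions made for illustration.

```python
def tag_sort(records, key):
    """Sort records by building (key, observation number) tags first."""
    # Step 1: keep only the sort key and the observation number of each record.
    tags = [(key(rec), obs) for obs, rec in enumerate(records)]
    # Step 2: sort the small tags rather than the full-width records.
    tags.sort()
    # Step 3: use the observation numbers to retrieve records in sorted order.
    return [records[obs] for _, obs in tags]


def min_disk_bytes(n_obs, record_len, key_len, obs_num_len=8):
    """Quoted rule: one output copy or two copies of the tags, whichever is greater.

    obs_num_len=8 is an assumed size for the stored observation number.
    """
    output_copy = n_obs * record_len
    two_tag_copies = 2 * n_obs * (key_len + obs_num_len)
    return max(output_copy, two_tag_copies)


# Hypothetical sample data: a short key plus a wide payload per record.
rows = [
    {"id": 3, "payload": "c" * 50},
    {"id": 1, "payload": "a" * 50},
    {"id": 2, "payload": "b" * 50},
]
sorted_rows = tag_sort(rows, key=lambda r: r["id"])
print([r["id"] for r in sorted_rows])

# One million 500-byte records with an 8-byte key.
print(min_disk_bytes(n_obs=1_000_000, record_len=500, key_len=8))
```

Note that step 3 reads the input in key order rather than in storage order. On a large data set that probably means many scattered reads, which would fit the documentation's warning that processing time can be much higher even though less temporary data is written.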
On Thu, Apr 10, 2008 at 10:04 AM, Kelvin Yuen <cyuen3@hotmail.com> wrote:
> Does anyone have an idea of whether the TAGSORT option in PROC SORT can
> practically improve performance by reducing the number of I/O operations
> and the amount of data written to disk? According to the SAS online doc,
> PROC SORT with the TAGSORT option will load and sort the key fields only
> and then retrieve the corresponding rows into the final table. The doc
> only mentions that PROC SORT with this option can require less space for
> the sort, but does not suggest that performance can be enhanced because of
> fewer I/O operations.
>