| Date: | Tue, 4 Jul 2000 11:01:43 -0700 |
| Reply-To: | kmself@IX.NETCOM.COM |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | kmself@IX.NETCOM.COM |
| Subject: | Re: "PROC SQL", "PROC SORT + MERGE" |
| Content-Type: | multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature"; |
|---|
On Tue, Jul 04, 2000 at 05:08:43PM +0800, wong wrote:
> hi,
>
> Suppose I have 2 datasets with 1 common key, I can merge these 2
> datasets by
> 1) PROC SQL
> or
> 2) Use PROC SORT first, then use MERGE
>
> If both datasets are very large, is it better to use SQL ? (The purpose
> is to use less virtual memory).
How large is large?
- How many records (rows) and variables (columns) per dataset?
- What OS are you running on? What hardware limitations (disk size,
available memory)?
- Have you tried running the job yet?
By default, the SORT/MERGE and SQL methods are roughly equivalent in
terms of memory usage. SAS doesn't (usually) read an entire dataset
into memory, so you shouldn't see an appreciable difference here. More
typically, clock time, CPU time, and working-set space (SAS WORK
storage) are the limiting factors for large merges and/or joins.
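For reference, a minimal sketch of the two approaches being compared. The dataset names A and B, the key ID, and the variable VALUE_B are hypothetical placeholders, not anything from the original question. FULLSTIMER is switched on so the log reports memory and time for each step, which is the easiest way to compare them on your own data.

```sas
/* Hypothetical inputs: WORK.A and WORK.B, both carrying the key ID.   */
/* VALUE_B is an assumed variable that exists only in B.               */

options fullstimer;   /* write memory, CPU and elapsed time to the log */

/* Method 1: PROC SQL inner join; no explicit pre-sort is needed. */
proc sql;
  create table joined_sql as
  select a.*, b.value_b
  from a inner join b
    on a.id = b.id;
quit;

/* Method 2: sort both inputs by the key, then match-merge. */
proc sort data=a; by id; run;
proc sort data=b; by id; run;

data joined_merge;
  merge a(in=in_a) b(in=in_b keep=id value_b);
  by id;
  if in_a and in_b;   /* keep only keys present in both inputs */
run;
```

The two give the same rows only when ID is unique in at least one of the inputs. With repeated keys on both sides, SQL returns every pairing within a key while MERGE pairs observations positionally, so check your key structure before treating them as interchangeable.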
Depending on the sort of processing you're doing, the size of the
datasets, available memory, and characteristics of the data itself, you
may want to explore:
- Pre-processing to reduce the amount of input data.
- SAS FORMATs used as lookup tables in place of a join.
- Hand-coded hash joins (research Paul Dorfman's posts on this list).
- Use of indexes rather than sorts.
- Forcing the SQL internal hash join (contact SAS tech support).
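Two of the options above, sketched under stated assumptions: a FORMAT built from the smaller dataset and used as a lookup, and an indexed lookup with SET ... KEY=. As before, A, B, ID and VALUE_B are hypothetical names. The sketch assumes ID is numeric and unique in B, and that VALUE_B is character with at most 40 bytes and is never blank. The format name BLOOKUP and the dataset FMT_IN are likewise made up for illustration.

```sas
/* --- FORMAT lookup: turn B into a numeric format keyed on ID --- */
data fmt_in;
  length fmtname $8 type $1 label $40 hlo $1;
  retain fmtname 'blookup' type 'N';
  set b(keep=id value_b) end=last;
  start = id;
  label = value_b;
  hlo   = ' ';
  output;
  if last then do;      /* OTHER entry: keys absent from B map to blank */
    hlo   = 'O';
    label = ' ';
    output;
  end;
  keep fmtname type start label hlo;
run;

proc format cntlin=fmt_in;
run;

data joined_fmt;
  set a;
  length value_b $40;
  value_b = put(id, blookup.);   /* lookup via the format, no sort of A */
  if value_b ne ' ';             /* drop rows of A with no match in B   */
run;

/* --- Index lookup: index B on ID, then read it by key --- */
proc datasets library=work nolist;
  modify b;
  index create id / unique;
quit;

data joined_idx;
  set a;
  set b(keep=id value_b) key=id / unique;
  if _iorc_ = 0 then output;     /* match found in B                    */
  else _error_ = 0;              /* no match: clear error flag, drop row */
run;
```

Neither version sorts A. The format approach holds the lookup table in memory, so it suits a small B against a large A; the index approach does a keyed read per record of A, which tends to pay off when A is small relative to B.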
--
Karsten M. Self <kmself@ix.netcom.com> http://www.netcom.com/~kmself
Evangelist, Opensales, Inc. http://www.opensales.org
What part of "Gestalt" don't you understand? Debian GNU/Linux rocks!
http://gestalt-system.sourceforge.net/ K5: http://www.kuro5hin.org
GPG fingerprint: F932 8B25 5FDD 2528 D595 DC61 3847 889F 55F2 B9B0
|