Date: Thu, 17 Aug 2006 17:50:53 -0400
Reply-To: "Luo, Peter" <pluo@DRAFTNET.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Luo, Peter" <pluo@DRAFTNET.COM>
Subject: Re: How slow is normal with large datasets?
If you're doing stepwise selection, 1.5M records and 50 variables will take a
while, and the number of variables drives run time far more than the number
of records. Still, taking a sample should help. Since you have only a
thousand ones, you might keep all of them and take a 5-10% sample of the
million zeros.
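The sampling idea above (keep every one, sample a fraction of the zeros) can be sketched as follows. This is a language-neutral Python sketch, not SAS; the function name `subsample_zeros`, the row layout (dicts with `y` and `weight` keys), and the seed are hypothetical. Dividing each sampled zero's weight by the sampling rate is an added assumption, not part of the advice above: it keeps the weighted count of zeros roughly unchanged, so a weighted model should still describe the full population.

```python
import random


def subsample_zeros(rows, rate, seed=12345):
    """Keep every event (y == 1) and roughly a `rate` fraction of non-events.

    rows: list of dicts with keys 'y' and 'weight' (hypothetical layout).
    Assumption: each sampled zero's weight is divided by `rate` so the
    weighted total of zeros is approximately preserved.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for row in rows:
        if row["y"] == 1:
            sample.append(dict(row))  # keep all the rare ones, weights unchanged
        elif rng.random() < rate:
            new_row = dict(row)  # copy so the input rows are not modified
            new_row["weight"] = row["weight"] / rate
            sample.append(new_row)
    return sample
```

In SAS the same logic would typically be a DATA step that tests a uniform random number. If the weights are left unadjusted, an unweighted logistic fit on such a sample generally keeps the slope estimates consistent but shifts the intercept by about log(1/rate).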
-----Original Message-----
From: Paul [mailto:ptvonhippel@CHECKFREE.COM]
Sent: Thursday, August 17, 2006 5:57 AM
To: SAS-L@LISTSERV.VT.EDU
Subject: How slow is normal with large datasets?
I'm analyzing a large data set with 1.5M rows and 50 variables. With 30
regressors, PROC LOGISTIC takes at least 15 minutes to return results,
and it can take much longer if other users are running SAS jobs at the same time.
It's an imbalanced data set -- a thousand ones, a million zeroes, and
sampling weights that vary from 1 to 100. The DATA step is slow, too,
so I can spend the bulk of my day just coding up a couple of new
variables and running a few regressions that use them.
In my previous job, I did academic research using data sets that were
at least 10 times smaller. Since then I've learned to use SAS more
efficiently: avoiding sorts, keeping only the variables I need, and using
pass-through SQL. These make a big difference, but I still can't get the
turnaround fast enough to feel that I'm really interacting with the
data.
Do I need to accept fate, or should I be pounding the table for a
better technical setup? We're using SAS Enterprise Guide on a remote
server in another state. I'm sharing resources with up to 4 other
users.
I'd be interested in hearing comparisons (either "That sounds about
right" or "I run more complicated analyses on larger data sets and
never have to wait more than a minute for results"). And I'd be
interested in hearing suggestions about diagnosing the problem. I'm not
sure if our connection is slow, or if it's fundamentally inappropriate
to be using EG for this purpose. Maybe I should have my own SAS
installation on my local PC?
Best --
Paul
This message is the property of Draft FCB Group and contains information
which may be privileged or confidential. It is meant only for the
intended recipients and/or their authorized agents. If you believe you
have received this message in error, please notify us immediately by
return e-mail and destroy any printed or electronic copies of the
message. Any unauthorized use, dissemination, disclosure, or copying of
this message or the information contained in it, is strictly prohibited
and may be unlawful. Thank you for your cooperation.