LISTSERV at the University of Georgia
Date:         Thu, 17 Aug 2006 17:50:53 -0400
Reply-To:     "Luo, Peter" <pluo@DRAFTNET.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Luo, Peter" <pluo@DRAFTNET.COM>
Subject:      Re: How slow is normal with large datasets?
Comments: To: Paul <ptvonhippel@CHECKFREE.COM>
Content-Type: text/plain; charset="US-ASCII"

If you run stepwise selection, 1.5M records and 50 variables will take a while, and the number of variables costs far more time than the number of records. Taking a sample should help, though. Since you only have a thousand ones, you might keep all of them and take a 5-10% sample of those million zeros.
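A minimal sketch of that sampling idea, written in Python rather than SAS purely for illustration. The function name `downsample_zeros` and the column names `y` (0/1 outcome) and `w` (sampling weight) are hypothetical and do not come from the original post. It keeps every one and keeps each zero with probability `rate`. Dividing the weight of each kept zero by `rate` is an assumption about how one might keep the weighted totals roughly unchanged, not something the thread prescribes.

```python
import random


def downsample_zeros(rows, rate=0.10, seed=12345):
    """Keep every row with y == 1 and roughly `rate` of the rows with y == 0.

    Each row is a dict with hypothetical keys "y" (0/1 outcome) and
    "w" (sampling weight). Kept zeros get their weight divided by `rate`
    so that the weighted count of zeros stays approximately the same.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for row in rows:
        if row["y"] == 1:
            # Rare events: keep all of them, weight unchanged.
            sample.append(dict(row))
        elif rng.random() < rate:
            # Common non-events: keep a fraction and scale the weight up.
            kept = dict(row)
            kept["w"] = row["w"] / rate
            sample.append(kept)
    return sample
```

With a thousand ones and a million zeros, a 10% rate would leave roughly 101,000 rows to fit instead of about a million, while every event is retained.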

-----Original Message-----
From: Paul [mailto:ptvonhippel@CHECKFREE.COM]
Sent: Thursday, August 17, 2006 5:57 AM
To: SAS-L@LISTSERV.VT.EDU
Subject: How slow is normal with large datasets?

I'm analyzing a large data set with 1.5M rows and 50 variables. With 30 regressors, PROC LOGISTIC takes at least 15 minutes to give results -- and can take much longer if other users are running SAS jobs as well. It's an imbalanced data set -- a thousand ones, a million zeroes, and sampling weights that vary from 1 to 100. The DATA step is slow, too, so I can spend the bulk of my day just coding up a couple of new variables and running a few regressions that use them.

In my previous job, I did academic research using data sets that were at least 10 times smaller. So I've learned to use SAS more efficiently -- avoiding sorts, keeping only the variables I need, using pass-through SQL. These make a big difference, but I just can't seem to get the run times down to where I really feel I'm interacting with the data.

Do I need to accept fate, or should I be pounding the table for a better technical setup? We're using SAS Enterprise Guide on a remote server in another state. I'm sharing resources with up to 4 other users.

I'd be interested in hearing comparisons (either "That sounds about right" or "I run more complicated analyses on larger data sets and never have to wait more than a minute for results"). And I'd be interested in hearing suggestions about diagnosing the problem. I'm not sure if our connection is slow, or if it's fundamentally inappropriate to be using EG for this purpose. Maybe I should have my own SAS installation on my local PC?

Best -- Paul

