LISTSERV at the University of Georgia
Date:         Wed, 28 Jun 2006 10:05:44 -0400
Reply-To:     Kevin Roland Viel <kviel@EMORY.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Kevin Roland Viel <kviel@EMORY.EDU>
Subject:      Re: UNIX datastep question
In-Reply-To:  <7.0.1.0.2.20060627151242.03591018@viergever.net>
Content-Type: TEXT/PLAIN; charset=US-ASCII

Jennifer,

Having been stung once by ICD-9 codes, the first thing I would do is obtain a frequency listing of *all* codes. There might not be a difference between, say, 714, 714.0, and 714.00, but there may be...
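A minimal sketch of that first step, written in Python rather than SAS (in SAS the listing itself would come from PROC FREQ on the code variable). It assumes the codes are stored as character strings; read as numbers, 714, 714.0, and 714.00 would collapse silently before you ever saw them. The function names and sample codes are hypothetical. The grouping only flags spellings that differ by trailing zeros after the decimal point for review; whether they mean the same thing is a coding question, so nothing is merged.

```python
from collections import Counter, defaultdict


def code_frequencies(codes):
    """Frequency listing of every raw code string, exactly as stored."""
    return Counter(codes)


def variant_groups(freq):
    """Group spellings that differ only by trailing zeros after the decimal point.

    Returns {(major, minor_without_trailing_zeros): [spellings]} for groups
    with more than one spelling. Flags for review only; nothing is merged.
    """
    groups = defaultdict(set)
    for code in freq:
        major, _, minor = code.strip().partition(".")
        groups[(major, minor.rstrip("0"))].add(code)
    return {key: sorted(s) for key, s in groups.items() if len(s) > 1}
```

With the hypothetical input `["714", "714.0", "714.00", "714.0", "250.01", "250.1", "401.9"]`, only the 714 spellings are grouped; 250.01 and 250.1 stay apart because their decimal parts really differ.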

I am pretty surprised that no one has questioned the layout of the data. Obviously, many, many patients do NOT have 15 Dx's, so most of those 15 columns are empty. The diagnoses should be held in a separate table, one row per patient and diagnosis.
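A rough Python sketch of moving the diagnoses into their own table, assuming hypothetical field names `patient_id` and `dx1` to `dx15` (in SAS this would typically be a DATA step with an ARRAY and an OUTPUT statement, or PROC TRANSPOSE). Empty slots are simply not written, so a patient with two diagnoses costs two rows instead of fifteen mostly empty columns.

```python
def wide_to_long(rows, n_dx=15):
    """Turn one wide record per patient (dx1..dxN) into one record per diagnosis.

    `rows` is a list of dicts; field names are illustrative assumptions.
    Blank or missing diagnosis slots produce no output row.
    """
    long_rows = []
    for row in rows:
        for slot in range(1, n_dx + 1):
            code = row.get(f"dx{slot}")
            if code:  # skip missing or blank slots
                long_rows.append(
                    {"patient_id": row["patient_id"], "dx_seq": slot, "dx": code}
                )
    return long_rows
```

Keeping `dx_seq` preserves the original slot order, which matters if the first position means "principal diagnosis" in your source.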

Also, when you are against the wall, Ian's sage suggestion of using a VIEW (no intention to slight others, but I stopped reading intensely at this point) will serve you well. A VIEW, however, is recreated on the fly *each* time you read it. This means it is potentially dynamic (it reflects the underlying data as they are at the moment it is read) and could require more CPU time, but if you don't have the memory or disk space for another copy of the data, and your code is already efficient, you have no alternative.
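To make the trade-off concrete, here is a small Python analogy of a DATA step view (in SAS, something like `data work.v / view=work.v;`): the object stores only the recipe, and every read reruns it against the source. The class, the `flag_ra` transform, and treating codes beginning with 714 as the condition of interest are all hypothetical and purely illustrative.

```python
class RowView:
    """Stand-in for a view: stores the transformation, never the rows.

    Each pass recomputes every row (CPU cost) but needs no extra storage,
    and it sees the source as it is at the moment of reading.
    """

    def __init__(self, source, transform):
        self.source = source
        self.transform = transform
        self.passes = 0  # number of times the view has been read

    def __iter__(self):
        self.passes += 1
        for row in self.source:
            yield self.transform(row)


def flag_ra(row):
    """Hypothetical derived flag computed on the fly, not stored."""
    return {**row, "ra_flag": row["dx"].startswith("714")}
```

Appending a row to the source and reading the view again yields the new row as well, which is the "potentially dynamic" behavior; each read also increments `passes`, which is the repeated CPU cost.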

You might be able to dispense with the flag altogether, either by using a format or a hash object. An exercise like this will hone your attention to efficiency, whether in execution time or in storage space. This is why I always made my students aware of the little things, even if our classroom datasets were only a few hundred observations; at some point, they will have a "big" dataset.
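A Python sketch of the "no stored flag" idea, with a plain dict standing in for a user-written SAS format (applied with PUT()) or a hash object: the group is looked up when the data are read, so no flag variable takes space in every observation. The table contents, function names, and group labels are hypothetical, and the rows are assumed to be in one-row-per-diagnosis form.

```python
# Hypothetical lookup table playing the role of a format or hash object:
# code -> analysis group. The groupings are illustrative only.
DX_GROUP = {
    "714.0": "RA",
    "714.1": "RA",
    "250.01": "DIABETES",
    "250.1": "DIABETES",
}


def dx_group(code):
    """Look the code up at read time; codes not in the table map to 'OTHER'."""
    return DX_GROUP.get(code, "OTHER")


def patients_in_group(long_rows, group):
    """Distinct patients with at least one diagnosis in `group`, no flag stored."""
    return {row["patient_id"] for row in long_rows if dx_group(row["dx"]) == group}
```

The trade is the same one as with the VIEW: a lookup on every read instead of an extra variable on every row, and changing the table changes the grouping without rewriting the data.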

Yours is exactly the example I give to the genetics folks who are astounded by the size of our data, which will soon be the entire 3 billion base pairs of the genome (genome = one person's complete collection of DNA); relish the thought!!! I guess I also cite the likely datasets of the financial industry.

Good luck,

Kevin

Kevin Viel
Department of Epidemiology
Rollins School of Public Health
Emory University
Atlanta, GA 30322

