LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2007, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 6 Jun 2007 10:13:12 -0500
Reply-To:   Tom White <tw2@MAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Tom White <tw2@MAIL.COM>
Subject:   Re: PROC FREQ--DATA STEP--MODELING QUESTION
Content-Type:   text/plain; charset="iso-8859-1"

Well, it looks like I can'r even get PROC FREQ to create the firts table, as the error message below indicates. The second table is needed because that's how my real data set looks like. When I apply PROC FREQ to the second table (well, if anyone helps me with PROC FREQ), then I get the firts table. PROC FREQ rolls up the rows in the second table and turns them into the raws in the first table. So, for my purposes, I need to create additianl predictive fields using counts and percentages for the CODE variable. This CODE variable is avery important variable from a business perspective, so we thought that we should use it to generate more predictive fields out of it like counts and percentages. I thought that if I could even genereate table (1) then maybe it would be easier to genearte table (2). Any help, anyone? Thank you. tom

----- Original Message ----- From: "Zack, Matthew M. (CDC/CCHP/NCCDPHP)" To: "Tom White" Subject: RE: PROC FREQ--DATA STEP--MODELING QUESTION Date: Wed, 6 Jun 2007 07:57:17 -0400

Why do you need your second table to run PROC LOGISTIC?

Matthew Zack

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Tom White Sent: Tuesday, June 05, 2007 6:27 PM To: SAS-L@LISTSERV.UGA.EDU Subject: PROC FREQ--DATA STEP--MODELING QUESTION

Hello SAS-L,

The data below are already sorted (or can be sorted if need be) by ID (first) and CODE (second).

The data set of interest contains about 10 mil records and about 100 fields.

In this example, I show 20 obs and two fields of interest.

data foo; input ID $ CODE $; cards; 1271 . 1271 201 1435 . 1435 842 1435 842 1435 307 1435 307 1435 307 1435 309 1435 . 1434 . 8393 070 8393 070 8393 070 8393 070 8393 070 8393 070 8393 070 8393 070 8393 070 ; run;

I would like to produce two datasets like:

(1)

ID CODE COUNT PERCENT PCT_ROW PCT_COL 1271 1 1271 201 1 6.25 100 100 1435 3 1435 842 2 12.5 33.33 100 1435 307 3 18.75 50 100 1435 309 1 6.25 16.67 100 8393 070 9 56.25 100 100

This data set is easily created by using;

proc freq data=foo; tables ID * CODE/outpct out=stats; run;

However, when I run this PROC FREQ on the entire dataset of 10 mil obs, I get a message error

ERROR: The requested table is too large to process. NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 10154176 observations read from the data set WORK.FOO. WARNING: The data set WORK.STATS may be incomplete. When this step was stopped there were 0 observations and 6 variables. WARNING: Data set WORK.STATS was not replaced because this step was stopped. NOTE: PROCEDURE FREQ used: real time 54.70 seconds cpu time 53.28 seconds

(2)

The other table I would like to get is

ID CODE COUNT PERCENT PCT_ROW PCT_COL 1271 1271 201 1 6.25 100 100 1435 1 1435 842 1 6.25 16.665 50 1435 842 1 6.25 16.665 50 1435 307 1 6.25 16.667 33.33 1435 307 1 6.25 16.667 33.33 1435 307 1 6.25 16.667 33.33 1435 309 1 6.25 16.67 100 1435 1 1434 1 8393 070 1 6.25 11.11 11.11 8393 070 1 6.25 11.11 11.11 8393 070 1 6.25 11.11 11.11 8393 070 1 6.25 11.11 11.11 8393 070 1 6.25 11.11 11.11 8393 070 1 6.25 11.11 11.11 8393 070 1 6.25 11.11 11.11 8393 070 1 6.25 11.11 11.11 8393 070 1 6.25 11.11 11.11

This table is simply the same one as table (1) except that it is not rolled-up like table (1) is.

For example, in (1) I show,

ID CODE COUNT PERCENT PCT_ROW PCT_COL 1435 3 1435 842 2 12.5 33.33 100 1435 307 3 18.75 50 100 1435 309 1 6.25 16.67 100

That's 9 instances of ID 1435 as shown in tbale (2) above--not rolled-up.

So I am thinking, since I will need table (2) at some point for modeling, I need to keep (2) in it's original form and not like in form (1) coming out of PROC FREQ. yet, I still need to create the new modeling variables COUNT, PERCENT, PCT_ROW, and maybe PCT_COL as shown in (1) and populate these corect values in the form of table (2).

These new variables in (2) will become inputs to my logistic model I am trying to build.

So, then, since in (1) PERCENT=12.5 for ID=1435 having COUNT=2, then, I am thinking, maybe wrongly, that if I divide 12.5 by 2, I should put 6.25 in above table (2).

And so on with all the rest of the numbers, I just divide numbers in (1) by however many counts (COUNT) I have, and hopefully I get something like table (2).

Please give me some guidance as to how to create table (1) and then, for modeling purposes, how can I keep the original data intact, yet create new modeling fields like counts and percentages as shown above in (2).

Thank you.

tom

= Pedometers as Low as $1 - Free Shipping Huge Selection of Quality Brands Like, Yamax, Sportline, Freestyle, and More. Customer Logos, Free Shipping. Fast Delivery. http://a8-asy.a8ww.net/a8-ads/adftrclick?redirectid=1ed972f8bac8e902c6af baf95c7b7930

-- Get a free http://www.mail.com account & e-mail address today! Choose from over 100 personalized domains.


Back to: Top of message | Previous page | Main SAS-L page