|
Well, it looks like I can't even get PROC FREQ to create the first table,
as the error message below indicates. The second table is needed because
that's what my real data set looks like. When I apply PROC FREQ to the
second table (well, if anyone helps me with PROC FREQ), I get the
first table. PROC FREQ rolls up the rows in the second table and turns
them into the rows in the first table. So, for my purposes, I need to
create additional predictive fields using counts and percentages for the
CODE variable. This CODE variable is a very important variable from a
business perspective, so we thought that we should use it to generate
more predictive fields, like counts and percentages. I thought
that if I could even generate table (1), then maybe it would be easier to
generate table (2). Any help, anyone? Thank you. tom
----- Original Message -----
From: "Zack, Matthew M. (CDC/CCHP/NCCDPHP)"
To: "Tom White"
Subject: RE: PROC FREQ--DATA STEP--MODELING QUESTION
Date: Wed, 6 Jun 2007 07:57:17 -0400
Why do you need your second table to run PROC LOGISTIC?
Matthew Zack
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Tom White
Sent: Tuesday, June 05, 2007 6:27 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: PROC FREQ--DATA STEP--MODELING QUESTION
Hello SAS-L,
The data below are already sorted (or can be sorted if need be) by ID
(first) and CODE (second).
The data set of interest contains about 10 mil records and about 100
fields.
In this example, I show 20 obs and two fields of interest.
data foo;
input ID $ CODE $;
cards;
1271 .
1271 201
1435 .
1435 842
1435 842
1435 307
1435 307
1435 307
1435 309
1435 .
1434 .
8393 070
8393 070
8393 070
8393 070
8393 070
8393 070
8393 070
8393 070
8393 070
;
run;
I would like to produce two datasets like:
(1)
ID    CODE  COUNT  PERCENT  PCT_ROW  PCT_COL
1271            1
1271  201       1     6.25      100      100
1435            3
1435  842       2     12.5    33.33      100
1435  307       3    18.75       50      100
1435  309       1     6.25    16.67      100
8393  070       9    56.25      100      100
This data set is easily created using:
proc freq data=foo;
tables ID * CODE/outpct out=stats;
run;
However, when I run this PROC FREQ on the entire data set of 10 mil obs,
I get this error message:
ERROR: The requested table is too large to process.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 10154176 observations read from the data set WORK.FOO.
WARNING: The data set WORK.STATS may be incomplete. When this step was stopped there were 0 observations and 6 variables.
WARNING: Data set WORK.STATS was not replaced because this step was stopped.
NOTE: PROCEDURE FREQ used:
real time 54.70 seconds
cpu time 53.28 seconds
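One possible way around the "table is too large" error is to compute the same statistics with PROC SQL, which aggregates by group rather than building the full ID-by-CODE crosstabulation. The following is a minimal sketch, not a tested fix at the 10-million-row scale. The data set name cell and the column names N_ID, N_CODE, and N_ALL are made up for illustration. Unlike table (1), rows with a missing CODE are left out here.

```sas
proc sql;
  /* Count of each non-missing ID*CODE combination (hypothetical helper table) */
  create table cell as
    select ID, CODE, count(*) as COUNT
    from foo
    where not missing(CODE)
    group by ID, CODE;

  /* Percent of grand total, of the ID (row) total, and of the CODE (column) total */
  create table stats as
    select c.ID, c.CODE, c.COUNT,
           100 * c.COUNT / t.N_ALL  as PERCENT,
           100 * c.COUNT / r.N_ID   as PCT_ROW,
           100 * c.COUNT / k.N_CODE as PCT_COL
    from cell as c,
         (select ID,   sum(COUNT) as N_ID   from cell group by ID)   as r,
         (select CODE, sum(COUNT) as N_CODE from cell group by CODE) as k,
         (select sum(COUNT) as N_ALL from cell)                      as t
    where c.ID = r.ID and c.CODE = k.CODE
    order by c.ID, c.CODE;
quit;
```

On the 20-row sample, the row for ID 1435 and CODE 842 should come out with COUNT=2 and PERCENT=12.5, matching table (1). The output is sorted by ID and CODE, so row order may differ from the listing above.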
(2)
The other table I would like to get is
ID    CODE  COUNT  PERCENT  PCT_ROW  PCT_COL
1271
1271  201       1     6.25      100      100
1435            1
1435  842       1     6.25   16.665       50
1435  842       1     6.25   16.665       50
1435  307       1     6.25   16.667    33.33
1435  307       1     6.25   16.667    33.33
1435  307       1     6.25   16.667    33.33
1435  309       1     6.25    16.67      100
1435            1
1434            1
8393  070       1     6.25    11.11    11.11
8393  070       1     6.25    11.11    11.11
8393  070       1     6.25    11.11    11.11
8393  070       1     6.25    11.11    11.11
8393  070       1     6.25    11.11    11.11
8393  070       1     6.25    11.11    11.11
8393  070       1     6.25    11.11    11.11
8393  070       1     6.25    11.11    11.11
8393  070       1     6.25    11.11    11.11
This table is simply the same one as table (1) except that it is not
rolled-up like table (1) is.
For example, in (1) I show,
ID    CODE  COUNT  PERCENT  PCT_ROW  PCT_COL
1435            3
1435  842       2     12.5    33.33      100
1435  307       3    18.75       50      100
1435  309       1     6.25    16.67      100
That's 9 instances of ID 1435 as shown in table (2) above--not
rolled-up.
So I am thinking, since I will need table (2) at some point for
modeling, I need to keep (2) in its original form and not in the
rolled-up form (1) coming out of PROC FREQ. Yet I still need to create
the new modeling variables COUNT, PERCENT, PCT_ROW, and maybe PCT_COL
as shown in (1), and populate the correct values in the form of table (2).
These new variables in (2) will become inputs to the logistic model I
am trying to build.
So, then, since in (1) PERCENT=12.5 for ID=1435 with CODE=842 and
COUNT=2, I am thinking, maybe wrongly, that if I divide 12.5 by 2, I
should put 6.25 in table (2) above.
And so on with all the rest of the numbers: I just divide the numbers in
(1) by however many counts (COUNT) I have, and hopefully I get
something like table (2).
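The divide-by-COUNT idea above can be sketched as a join of the original rows back to the rolled-up statistics. This is only an illustration under stated assumptions: it assumes a data set named stats already exists with the layout of table (1) (for example, the OUT= data set from the PROC FREQ step run on the 20-row sample), and the names foo_model and CELL_COUNT are made up here.

```sas
proc sql;
  /* One output row per original row of foo; foo itself is not modified */
  create table foo_model as
    select f.ID, f.CODE,
           s.COUNT             as CELL_COUNT,  /* rolled-up count from table (1) */
           s.PERCENT / s.COUNT as PERCENT,     /* e.g. 12.5 / 2 = 6.25           */
           s.PCT_ROW / s.COUNT as PCT_ROW,
           s.PCT_COL / s.COUNT as PCT_COL
    from foo as f
         left join stats as s
         on f.ID = s.ID and f.CODE = s.CODE;
quit;
```

The left join keeps every row of foo, including those with a missing CODE, whose percentages stay missing. PROC SQL does not guarantee the original row order, so add an ORDER BY clause if the order matters. Dropping the division and selecting s.PERCENT, s.PCT_ROW, and s.PCT_COL directly would instead attach the rolled-up values of table (1) to every row.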
Please give me some guidance on how to create table (1) and then,
for modeling purposes, how to keep the original data intact yet
create new modeling fields like the counts and percentages shown above
in (2).
Thank you.
tom
|