| Date: | Wed, 14 Feb 2007 09:28:24 -0500 |
| Reply-To: | "data _null_;" <datanull@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "data _null_;" <datanull@GMAIL.COM> |
| Subject: | Re: Manuplating large data sat |
|
| In-Reply-To: | <200702140207.l1DLEcTt007900@mailgw.cc.uga.edu> |
| Content-Type: | text/plain; charset=ISO-8859-1; format=flowed |
I don't understand exactly what you are asking. What if any
significance do the numbers following _B have?
Here is a bit of code that normalizies the data. Perhaps it will be helpful.
data work.qtr1;
infile cards missover;
input Uid:$1. x_b2001 y_b2001 z_b2001 x_B2051 y_b2051 z_b2051
x_b2301 y_b2301 z_b2301;
cards;
a 1 2 5 . . . 3 5 6
b 3 4 5 6 7 4 . . .
c . . . 2 3 5 . .
;;;;
run;
data work.qtr2;
infile cards missover;
input Uid:$1. x_b2001 y_b2001 z_b2001 x_b2555 y_b2555 z_b2555
x_b2301 y_b2301 z_b2301;
cards;
a 1 2 5 . . . 3 5 6
b 3 4 5 . . . 4 7 9
c . . . 2 3 5 . .
;;;;
run;
data work.qtr / view=work.qtr;
set work.qtr1(in=in1) work.qtr2(in=in2);
quarter = in1*1 + in2*2;
run;
proc transpose
data = work.qtr
out = work.Tqtr1(where=(not missing(col1)))
;
by quarter uid;
var x: y: z:;
run;
data work.Tqtr1;
set work.Tqtr1;
attrib xyz length=$1. label='Variable group (x,y,z)';
attrib num length=8 label='Number part of variable name';
xyz = _name_;
num = input(substr(_name_,indexc(_name_,'bB')+1),f4.);
run;
proc print;
run;
On 2/13/07, mesecca L katram <mesecca@yahoo.com> wrote:
> Hi all
> Here I have a large data set it has some 2000+ columns
> the data is in the following format
> Let us call this data set QTR_1
>
>
> Uid x_b2001 y_b2001 z_b2001 x_b2051 y_b2051 z_b2051 x_b2301 y_b2301 z_b2301
> a 1 2 5 . . . 3 5 6
> b 3 4 5 6 7 4 . . .
> c . . . 2 3 5 . . .
>
> These is no pattern for the number for which x_b---- repeat but there 600
> sets of x_b---- y_b---- Z_b----
>
> I have to creat a column count cnt_b2001 cnt_b2501 cnt_b2301 for each of
> this and creat a data set
> and then roll this up the resultant should look like this
>
>
> x_b2001 y_b2001 z_b2001 cnt_2001 x_b2051 y_b2051 z_b2051 cnr_2051 x_b2301
> y_b2301 z_b2301 cnt_2301
> 4 6 10 2 8 10 9 2 3 5 6 1
>
> This simply rolling up the data and counintg the non missing value
>
>
> I have similar QTR_2, QTR_3 QTR_4 with almost 2000+ columns each
> for which the variables are same but columns differ
>
>
> QTR_2
>
> Uid x_b2001 y_b2001 z_b2001 x_b2555 y_b2555 z_b2555 x_b2301 y_b2301 z_b2301
> a 1 2 5 . . . 3 5 6
> b 3 4 5 . . . 4 7 9
> c . . . 2 3 5 . . .
>
>
> x_b2001 y_b2001 z_b2001 cnt_2001 x_b2555 y_b2555 z_b2555 cnt_2051 x_b2301
> y_b2301 z_b2301 cnt_2301
> 4 6 10 2 2 3 5 1 2 12 12 1
>
>
>
> Finally I need to find the count
> and display the report
> Quater x y z count
> QTR_1
> QTR_2
> QTR_3
> QTR_4
>
> as X y Z patterns repaeat 600 times I cannot hardcode these
>
> What do you think is the most feasible way to do it
> roll up by each qtr and then stack these the data might be in Giga bytes if
> I stack before I roll upp
> Please help
> Thanks In advance
>
|