Date: Fri, 12 Oct 2001 08:18:00 -0700
Reply-To: "Huang, Ya" <ya.huang@PFIZER.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Huang, Ya" <ya.huang@PFIZER.COM>
Subject: Re: reading hierarchical file w. varying number of sub-records
Jim,
Here is my solution:
data xx;
input v $;
cards;
126
216
236
173
284
345
356
147
238
221
231
243
;
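/* Split each record into two rows: one for the x value, one for the y value */
data xx;
  set xx;
  length f $1 grpc $5;
  retain grp grpc;
  /* a type 1 record starts a new case */
  if substr(v,1,1)='1' then do; grp+1; grpc=v; end;
  subg=substr(v,1,1);   /* record type (column 1) */
  subv=substr(v,2,1);   /* x value (column 2) */
  f='x';
  output;
  subv=substr(v,3,1);   /* y value (column 3) */
  f='y';
  output;
run;
proc sort;
  by grp f subg;
run;
/* Number the records within each case, variable and record type,
   then build the target variable name, e.g. r2x1 */
data xx;
  set xx;
  length vn $20;
  retain subgi;
  by grp f subg;
  if first.subg then subgi=0;
  subgi+1;
  vn=compress('r'||subg||f||put(subgi,best.));
run;
/* One row per case, one column per generated name */
proc transpose out=yy (drop=grp grpc _name_);
  by grp grpc;
  var subv;
  id vn;
run;
options nocenter ls=64;
proc print heading=v noobs;
run;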
--------------------------------------------
r r r r r r r r r r r r r r
1 2 2 1 2 2 3 3 3 3 2 2 2 2
x x x y y y x x y y x x y y
1 1 2 1 1 2 1 2 1 2 3 4 3 4

2 1 3 6 6 6
7 8   3 4   4 5 5 6
4 3 2 7 8 1         3 4 1 3
I added a type 3 record for testing. The results are all
character, but it should not be very hard to convert them
to numeric. The order of the variables in the output is
different from your sample, which should not be a problem
since they have the correct names.
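If you do need numeric values and your original column order, something
along these lines might do it. This is an untested sketch, not part of the
run above: the data set names xxn and yyn and the variable subvn are made up
for illustration, and it assumes xx is still the output of the third data
step (so it still holds grp, grpc, subv and vn).

```sas
/* Sketch only: convert the single-digit values before transposing */
data xxn;
  set xx;
  subvn=input(subv,8.);   /* numeric copy of the character value */
run;
proc transpose data=xxn out=yyn (drop=grp grpc _name_);
  by grp grpc;
  var subvn;              /* transpose the numeric copy instead of subv */
  id vn;
run;
/* Naming the variables in a RETAIN ahead of the SET fixes their order */
data yyn;
  retain r1x1 r1y1 r2x1 r2x2 r2x3 r2x4 r2y1 r2y2 r2y3 r2y4
         r3x1 r3x2 r3y1 r3y2;
  set yyn;
run;
```

The RETAIN list has to name every variable the transpose creates, so it
would need to be extended if a case has more sub-records than the test data.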
I hope it gives you a hint.
Regards,
Ya Huang
-----Original Message-----
From: Jim Moody [mailto:moody.77@OSU.EDU]
Sent: Thursday, October 11, 2001 10:26 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: reading hierarchical file w. varying number of sub-records
Hello -
Thanks in advance for any help you can give. I have an ASCII data file that
is hierarchical, with a varying number of lines for each case. That is,
each case has a 'type1' record, but may have 0 to ?? type 2, type 3, etc.
records. An example of the type of file would be something like:
/* data file: 3 main cases, with a varying number of type 2 records.
Substantive variables are listed in the next two columns. */
126
216
236
173
284
147
238
221
231
243
The type of record is indexed by the first column, and I want to create a
single record for each case, with the multiple 'type2' records falling into
an array. Thus, the resulting data set should look something like:
R1X1 R1Y1 R2X1 R2X2 R2X3 R2X4 R2Y1 R2Y2 R2Y3 R2Y4
   2    6    1    3    .    .    6    6    .    .
   7    8    .    .    .    .    4    .    .    .
   4    7    3    2    3    4    8    1    1    3
where each column is a variable. I'm using the following code as a start,
which doesn't work.
data a;
infile 'c:\moody\data\hdattst.txt'; /* bring in the data */
input type 1 @;
array r2x (5) r2x1-r2x5; /* 5 is the maximum number of type 2 records in
any case */
array r2z (5) r2z1-r2z5;
if type=1 then do; /* input statements for type 1 lines */
input r1x1 2 r1z1 3;
end;
else if type=2 then do; /* input statements for type 2 lines */
do j=1 to 5 until (type^=2); /* loop until we no longer have a type2
record */
input type 1 @;
if type=2 then do;
input r2x(j) 2 r2z(j) 3;
end;
end;
end;
drop type;
run;;
If I break the code into parts, using one data step for records of type 1,
a second for records of type 2, etc., then I can get the sub-pieces of
code to work (but then I have to re-merge the data). While this strategy
works, it's ugly.
Any suggestions?
Thanks again for the help.
Peaceful thoughts,
Jim
James Moody
Assistant Professor
Department of Sociology
Ohio State University
300 Bricker Hall
190 N. Oval Mall
Columbus, OH 43210
(614) 292-1722
Moody.77@osu.edu