Date: Thu, 24 Jul 2008 10:25:31 -0400
Reply-To: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Subject: Re: Reading multiple text files into one dataset
In-Reply-To: <556804.64076.qm@web55913.mail.re3.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1
If you are reading varying-length records in Windows
and using LRECL=,
then you also need to use the INFILE option
PAD.
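
A minimal sketch of what PAD does, assuming a hypothetical fixed-column file
(the path, LRECL value, and column layout below are illustrative only and are
not taken from Dave's data). With PAD, records shorter than LRECL are filled
with blanks, so column input does not run past the end of a short record:

```sas
/* Hypothetical example: file name, LRECL, and columns are assumptions. */
data demo;
  infile "C:\SasFiles\example.txt"
         lrecl=80 pad;      /* pad short records with blanks up to LRECL */
  input id   $ 1-8          /* columns 1-8                               */
        note $ 9-80;        /* columns 9-80; blank-filled if record ends early */
run;
```

On Dave's INFILE statement the option would sit alongside LRECL= in the same
way; whether it clears the truncation message for his tab-delimited file is
not established in this thread.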
Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov
> -----Original Message-----
> From: owner-sas-l@listserv.uga.edu
> [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of dave crimkey
> Sent: Thursday, July 24, 2008 9:00 AM
> To: Arthur Tabachneck; data _null_,
> Cc: SAS-L@listserv.uga.edu
> Subject: Re: Reading multiple text files into one dataset
>
> Thanks. I'm having other problems with the files as well. I
> was trying different methods to read them and decided that
> I'd better try reading just one for now. So I did that with
> the following code (listing out all the variables to be
> safe). Each VAR is a very precise value (about 22 dec
> places). If I look at the file in a text editor, the maximum
> length of a record is 3090 but I keep getting the Truncated
> message and the last variable VAR129 will not read in. In
> addition only half the file is reading in. So I should be
> reading 300 records in and only 150 records are coming in.
> I'm not sure what's going on with my infile statement or the
> file itself but I won't be able to read all the files in
> until I can get one file in.
>
> DATA A;
> INFILE "C:\SasFiles\A_102m_Bx_1.txt" lrecl=3090 DLM='09'x
> missover dsd;
> INPUT VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10
> VAR11 VAR12 VAR13 VAR14 VAR15 VAR16 VAR17 VAR18 VAR19 VAR20
> VAR21 VAR22 VAR23 VAR24 VAR25 VAR26 VAR27 VAR28 VAR29 VAR30
> VAR31 VAR32 VAR33 VAR34 VAR35 VAR36 VAR37 VAR38 VAR39 VAR40
> VAR41 VAR42 VAR43 VAR44 VAR45 VAR46 VAR47 VAR48 VAR49 VAR50
> VAR51 VAR52 VAR53 VAR54 VAR55 VAR56 VAR57 VAR58 VAR59 VAR60
> VAR61 VAR62 VAR63 VAR64 VAR65 VAR66 VAR67 VAR68 VAR69 VAR70
> VAR71 VAR72 VAR73 VAR74 VAR75 VAR76 VAR77 VAR78 VAR79 VAR80
> VAR81 VAR82 VAR83 VAR84 VAR85 VAR86 VAR87 VAR88 VAR89 VAR90
> VAR91 VAR92 VAR93 VAR94 VAR95 VAR96 VAR97 VAR98 VAR99 VAR100
> VAR101 VAR102 VAR103 VAR104 VAR105 VAR106 VAR107 VAR108 VAR109 VAR110
> VAR111 VAR112 VAR113 VAR114 VAR115 VAR116 VAR117 VAR118 VAR119 VAR120
> VAR121 VAR122 VAR123 VAR124 VAR125 VAR126 VAR127 VAR128 VAR129 ;
> RUN;
>
>
> --- On Thu, 7/24/08, data _null_, <datanull@gmail.com> wrote:
>
> From: data _null_, <datanull@gmail.com>
> Subject: Re: Reading multiple text files into one dataset
> To: "Arthur Tabachneck" <art297@netscape.net>, "dave crimkey"
> <d_crimkey@yahoo.com>
> Cc: SAS-L@listserv.uga.edu
> Date: Thursday, July 24, 2008, 8:41 AM
>
> Art, I think what is implied in Dave's message is that all the files
> can be read with the same input statement, creating a few extra
> variables derived from the filename along the way.
>
> Therefore, it is unnecessary to read them using separate data steps.
> The following program creates some files with names as described by
> Dave, then reads the files using two different methods that are really
> mostly the same.
>
> 1) using a filename with a wild card reference or
> 2) using a file of file names and filevar infile option.
>
> In my second example I read the "file of file names" directly from
> the Windows DIR command through the "PIPE".
>
> ** 0) Create some files, Dave will not need this;
> filename mydoc "C:\Documents and Settings\&sysuserid\My Documents";
> data _null_;
> length Path Command Filename $256;
> path = pathname('mydoc');
> putlog 'NOTE: ' path=;
> path = catx('\',path,'Dave Crimkey');
> command = catx(' ','mkdir',quote(strip(path)),'2>&1');
> infile dummy1 pipe filevar=command eof=eof;
> input;
> putlog _infile_;
> return;
> eof:
> do subject = 102 to 104 by 1;
> sex = substr('mf',rantbl(12345,.5),1);
> do l = 'x','y';
> do g = 1 to 2;
> filename = cats('A',subject,sex,'_B',l,g);
> putlog 'NOTE: ' filename=;
> filename = cats(path,'\',filename,'.csv');
> file dummy2 filevar=filename dlm=',' dsd lrecl=1024;
> array v[8] (1:8);
> do i = 1 to 3;
> put v[*];
> end;
> end;
> end;
> end;
> stop;
> run;
>
> ** 1) Read them using wildcard, *.CSV in the directory;
> filename allcsv "C:\Documents and Settings\&sysuserid\My Documents\Dave Crimkey\*.csv";
> data subjects1;
> length name $16 subjid $4 gender $1 group $3;
> retain name subjid gender group;
> length filename $256;
> infile allcsv filename=filename lrecl=1024 eov=eov dsd dlm=',';
> array v[8];
> input v[*];
> if _n_ eq 1 or eov then do;
> eov = 0;
> name = scan(filename,-2,'.\');
> subjid = substr(name,1,4);
> gender = substr(name,5,1);
> group = substr(name,7,3);
> end;
> run;
> proc contents varnum;
> proc print;
> run;
>
> ** 2) read the files using FILEVAR from piped DIR;
> filename dircsv "C:\Documents and Settings\&sysuserid\My Documents\Dave Crimkey";
> data subjects2;
> length command path filename $256;
> path = pathname('DIRCSV');
> command = catx(' ','DIR /b',quote(strip(path)));
> infile dummy1 pipe filevar=command truncover;
> length name $16 subjid $4 gender $1 group $3;
> input name;
> filename = catx('\',path,name);
> name = scan(filename,-2,'.\');
> subjid = substr(name,1,4);
> gender = substr(name,5,1);
> group = substr(name,7,3);
>
> infile dummy2 filevar=filename lrecl=1024 end=eof dsd dlm=',';
> do until(eof);
> array v[8];
> input v[*];
> output;
> end;
> drop path;
> run;
> proc contents varnum;
> proc print;
> run;
>
> proc compare base=subjects1 compare=subjects2;
> run;
>
> data _null_;
> command = catx(' ','RMDIR /Q /S',quote(pathname('dircsv')),'2>&1');
> infile dummy pipe filevar=command;
> input;
> list;
> run cancel; *remove cancel to delete DIRCSV;
>
>
>
> On 7/23/08, Arthur Tabachneck <art297@netscape.net> wrote:
> > Dave,
> >
> > See if a method similar to the one shown at http://xrl.us/okm9z will
> > suffice. You can skip the first part (i.e. using a pipe to build a file
> > with the file names) since you already created that file.
> >
> > Parsing the file name can be done by using the substr function.
> >
> > HTH,
> > Art
> > --------
> > On Wed, 23 Jul 2008 19:11:58 -0700, dave crimkey <d_crimkey@YAHOO.COM>
> > wrote:
> >
> > >I have about 500 text files each containing about 80 variables
> > >representing measurements. The text files are named with subject number,
> > >sex and group number. Each subject has four files associated with him or
> > >her. For example, file A102m_Bx1.txt would be subject number A102, a
> > >male, and group Bx1. Similarly there would be A102m_Bx2.txt,
> > >A102m_By1.txt, A102m_By2.txt and so on. I'd like to read all of the
> > >files into one dataset while creating three new variables representing
> > >subject number, sex and group. I've stored all the filenames in a file
> > >but I'm not sure how to read them all into the same dataset and add the
> > >few identifying variables.
> > >
> > >Thanks,
> > >Dave
> >