Date: Tue, 7 Oct 2003 08:16:01 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: automating the reading in of variable number of columns
Domenic-
Another post warned of large numbers of missing values. Without consulting
the documentation I seem to remember that proc import reads the first 20
rows or something and decides whether the column is numeric or character.
Unlike Excel, a column can't be both although numbers can be stored as
characters. If a column with initially numeric values later has character
values and is read as numeric, the character values will be lost. This sort
of thing makes CSV format unstable, but if your data is generally clean and
uniform, as your sample was, with a little post-import clean up it will
likely work for you.
I've used it on numerous occasions with no problems. If you're working with
data from external sources it is good practice to independently validate
what you have input. If you have an idea of the distribution of the data in
your columns or the expected frequency of the dates or something like that
you should check them.
Glad to help
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: dude [mailto:dude@DUDES.COM]
Sent: Monday, October 06, 2003 5:56 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: automating the reading in of variable number of columns
Paul, Thank you very much, that worked great. When would have to use
infile statements, I'm new to PROC IMPORT, and am not familiar with its
shortcomings if any, any thoughts?
Domenic
Choate, Paul@DDS wrote:
> This works, it was generated by the import wizard:
>
> PROC IMPORT OUT= WORK.test
> DATAFILE= "C:\test.csv"
> DBMS=CSV REPLACE;
> GETNAMES=YES;
> DATAROW=2;
> RUN;
>
>
>
> The CONTENTS Procedure
>
> Data Set Name: WORK.TEST Observations: 4
> Member Type: DATA Variables: 22
>
> -----Engine/Host Dependent Information-----
>
> File Name: C:\WINDOWS\TEMP\SAS Temporary
> Files\_TD3144\test.sas7bdat
> Release Created: 8.0202M0
> Host Created: WIN_PRO
>
>
> -----Alphabetic List of Variables and Attributes-----
>
> # Variable Type Len Pos Format Informat
> ------------------------------------------------------------
> 1 time_t Num 8 0 BEST12. BEST32.
> 2 dow Char 3 152 $3. $3.
> 3 mon Char 3 155 $3. $3.
> 4 day Num 8 8 BEST12. BEST32.
> 5 time Char 8 158 $8. $8.
> 6 year Num 8 16 BEST12. BEST32.
> 7 _0_1P Num 8 24 BEST12. BEST32.
> 8 _1_1P Num 8 32 BEST12. BEST32.
> 9 _2_1P Num 8 40 BEST12. BEST32.
> 10 _3_1P Num 8 48 BEST12. BEST32.
> 11 _0_1Q Num 8 56 BEST12. BEST32.
> 12 _1_1Q Num 8 64 BEST12. BEST32.
> 13 _2_1Q Num 8 72 BEST12. BEST32.
> 14 _3_1Q Num 8 80 BEST12. BEST32.
> 15 _0_2V Num 8 88 BEST12. BEST32.
> 16 _1_2V Num 8 96 BEST12. BEST32.
> 17 _2_2V Num 8 104 BEST12. BEST32.
> 18 _3_2V Num 8 112 BEST12. BEST32.
> 19 _0_2W Num 8 120 BEST12. BEST32.
> 20 _1_2W Num 8 128 BEST12. BEST32.
> 21 _2_2W Num 8 136 BEST12. BEST32.
> 22 _3_2W Num 8 144 BEST12. BEST32.
>
>
> Paul Choate
> DDS Data Extraction
> (916) 654-2160
>
> -----Original Message-----
> From: dude [mailto:dude@DUDES.COM]
> Sent: Sunday, October 05, 2003 9:42 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: automating the reading in of variable number of columns
>
> I have many files that I'd like to read in to SAS, they each have
> variable number of columns, all have header rows. Sometimes the file has
> 16 columns, sometimes 22, othertimes 32.
>
> Sample data is as follows the first line is header row:
>
>
time_t,dow,mon,day,time,year,00-1P,01-1P,02-1P,03-1P,10-1Q,11-1Q,12-1Q,13-1Q
> ,40-2V,41-2V,42-2V,43-2V,50-2W,51-2W,52-2W,53-2W
>
1057017600,Tue,Jul,01,00:00:00,2003,12.0,17.0,24.0,23.0,15.0,18.0,16.0,10.0,
> 11.0,14.0,24.0,23.0,16.0,17.0,17.0,10.0
>
1057018500,Tue,Jul,01,00:15:00,2003,13.0,18.0,28.0,27.0,15.0,17.0,32.0,12.0,
> 13.0,15.0,27.0,27.0,15.0,15.0,33.0,13.0
>
1057019400,Tue,Jul,01,00:30:00,2003,11.0,13.0,22.0,22.0,16.0,13.0,13.0,7.0,1
> 0.0,12.0,22.0,22.0,16.0,12.0,14.0,7.0
>
1057020300,Tue,Jul,01,00:45:00,2003,9.0,11.0,23.0,22.0,10.0,9.0,10.0,5.0,8.0
> ,11.0,22.0,22.0,10.0,8.0,12.0,6.0
>
> I'd like to read in the header row as my variable names. I have done
> separate input statements, but that gets old after a while if you can't
> predict the number of columns.
>
> Is there some sample code on how to do this?
>
> I have looked at Tip # 274 @ http://www.sconsig.com/, but that does not
> name the columns as per the header row.
>
> Thanks in advance.
> Dude,
>
> Please reply to list.
|