| Date: | Wed, 3 Oct 2001 15:16:20 -0400 |
| Reply-To: | muon33@nyc.rr.com |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Michael Stuart <muon33@HOTMAIL.COM> |
| Subject: | Re: Text File Import Problem |
|
| Content-Type: | text/plain; format=flowed |
|---|
Mark (and SAS-L) -- thanks, I'm still having the same problem. I've
adopted some of your coding. First, here's the program:
filename epilist "H:\Vendors\Ffx\MailingTest\EPI-REG-OPEN-100K.TXT" ;
data ffx.epilist (keep = email domain) ;
length email $ 50 ;
infile epilist length=lenvar ;
input @1 email varying. lenvar ;
email = left(trim(lowcase(email))) ;
domain = substr(email,(index(email,'@')+1)) ;
run ;
proc print data=ffx.epilist (obs=10) ;
run ;
THIS GENERATES THE FOLLOWING LOG:
85 data ffx.epilist (keep = email domain) ;
86 length email $ 50 ;
87 infile epilist length=lenvar ;
88 input @1 email varying. lenvar ;
89
90 email = left(trim(lowcase(email))) ;
91
92 domain = substr(email,(index(email,'@')+1)) ;
93
94 run ;
NOTE: The infile EPILIST is:
File Name=H:\Vendors\Ffx\MailingTest\EPI-REG-OPEN-100K.TXT,
RECFM=V,LRECL=256
NOTE: 7771 records were read from the infile EPILIST.
The minimum record length was 43.
The maximum record length was 256.
One or more lines were truncated.
NOTE: The data set FFX.EPILIST has 7771 observations and 2 variables.
NOTE: Compressing data set FFX.EPILIST increased size by 6.19 percent.
Compressed is 103 pages; un-compressed would require 97 pages.
NOTE: DATA statement used:
real time 3.68 seconds
95
96 proc print data=ffx.epilist (obs=10) ;
97
98 run ;
NOTE: There were 10 observations read from the data set FFX.EPILIST.
NOTE: PROCEDURE PRINT used:
real time 0.10 seconds
A NOTE ABOUT THIS LOG -- there are 100K records in this file, not 7771.
Here's the output from the print step:
The SAS System 14:56 Wednesday,
October 3, 2001 2
Obs email
1 tabernathy@andonet.com
mjkarv@pacifier.com
chrismc
2 kelbriney@earthlink.net
justineschubert@hotmail.co
3 walterj@adelphia.net
sheila_dubin@hotmail.com
phca
4 dbparagon@msn.com
rita@simalfa.com
etambrose@yahoo
5 christmas@paradise.net.nz
yvonne.graser@foodbrands
6 churchill25@rcn.com
gabo@enter.net.mx
nanzo@datasy
7 theresa_kuhlman@agilent.com
zelgroup1@mindspring.c
8 nurse37@carolina.net
gsirett@tiaa-cref.org
htamvad
9 arouge@ait-applied.com
bonneau@vvm.com
perelk@eart
10 linhem@epix.net
desertdawn@281.com
jcsimo@netzero.
Obs domain
1 andonet.com
mjkarv@pacifier.com
chrismc
2 earthlink.net
justineschubert@hotmail.co
3 adelphia.net
sheila_dubin@hotmail.com
phca
4 msn.com
rita@simalfa.com
etambrose@yahoo
5 paradise.net.nz
yvonne.graser@foodbrands
6 rcn.com
gabo@enter.net.mx
nanzo@datasy
7 agilent.com
zelgroup1@mindspring.c
8 carolina.net
gsirett@tiaa-cref.org
htamvad
9 ait-applied.com
bonneau@vvm.com
perelk@eart
10 epix.net
desertdawn@281.com
jcsimo@netzero.
A NOTE ABOUT THE OUTPUT - when I view it in the SAS output window, there
are no line breaks within each observation -- I see a series of email
addresses 'glommed' together, separated by a hollow, square box (something
non-printable). When I cut & paste the output into any other application
(like IE/Hotmail), the hollow, square boxes become line breaks.
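One way to see what the hollow boxes actually are is to dump a few stored values in hex. This is only a sketch: it assumes the FFX.EPILIST data set created by the program above, and the width in $HEX100. is chosen to cover all 50 bytes of EMAIL (two hex digits per byte).

```sas
* Write the first few EMAIL values to the log as hex ;
* 0D = carriage return, 0A = line feed, 20 = blank ;
data _null_ ;
   set ffx.epilist (obs=3) ;
   put email $hex100. ;
run ;
```

A 0D or 0A sitting between two addresses in the hex string would suggest that the file contains line terminators SAS did not treat as end-of-record.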
I looked at the input file with a hex editor; each record on each line is
followed by 0D (CR) 0A (LF). I've tried the infile statement with both
truncover and missover options -- and I'm getting the same results.
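To check the whole file rather than the part visible in a hex editor, the terminator bytes can be counted one at a time. A minimal sketch, assuming the EPILIST fileref defined above; the names prev, ch, n_cr, n_lf and n_crlf are made up for illustration.

```sas
* Read the file one byte per record and count line-terminator bytes ;
data _null_ ;
   infile epilist recfm=f lrecl=1 end=eof ;
   retain prev ' ' ;
   input ch $char1. ;
   if ch = '0D'x then n_cr + 1 ;
   if ch = '0A'x then do ;
      n_lf + 1 ;
      if prev = '0D'x then n_crlf + 1 ;   * LF directly after CR ;
   end ;
   prev = ch ;
   if eof then put n_cr= n_lf= n_crlf= ;
run ;
```

If all three counts come out near 100K, every line really does end in CR LF. If n_cr or n_lf is much larger than n_crlf, some lines end in a lone CR or LF, which would be consistent with several addresses landing in one record.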
This is driving me nuts! Any other ideas?
Thanks ...
>From: "Terjeson, Mark" <TerjeMW@dshs.wa.gov>
>To: "'Mike Stuart'" <muon33@NYC.RR.COM>, SAS-L@LISTSERV.UGA.EDU
>Subject: RE: Text File Import Problem
>Date: Wed, 3 Oct 2001 10:57:47 -0700
>
>Hi Mike,
>
>
> * make sample data ;
>data _null_;
> file 'c:\temp\abc.txt';
> put 'test1@dkadk.com';
> put '3234dk@efg.com';
> put 'jieuw@lmdkadk.com';
> put '3234dk@efg.com';
>run;
>
>
> * Read variable line length flat file ;
>data table1(keep=myline);
> length myline $ 200;
> infile 'c:\temp\abc.txt' length=lenvar;
> input @1 myline $varying. lenvar;
>run;
>
>
> * adding your goodies ;
>data table1(keep=email domain);
> length email $ 50;
> infile 'c:\temp\abc.txt' length=lenvar;
> input @1 email $varying. lenvar;
> email = left(trim(lowcase(email))) ;
> domain = substr(email,(index(email,'@')+1)) ;
>run;
>
>
> * another variation using SCAN() ;
> * to replace INDEX() and SUBSTR() ;
>data table1(keep=email domain);
> length email $50;
> infile 'c:\temp\abc.txt' length=lenvar;
> input @1 email $varying. lenvar;
> email = left(trim(lowcase(email))) ;
> domain = scan(email,2,'@') ;
>run;
>
>
>Hope this is helpful,
>Mark Terjeson
>Washington State Department of Social and Health Services
>Division of Research and Data Analysis (RDA)
>mailto:terjemw@dshs.wa.gov
>
>
>
>-----Original Message-----
>From: Mike Stuart [mailto:muon33@NYC.RR.COM]
>Sent: Wednesday, October 03, 2001 9:29 AM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Text File Import Problem
>
>
>Having problems with a straight-forward import, I think the problem
>has to do with non-viz characters, but I'm not sure. When viewed, the
>file I'm trying to read in is relatively straight-forward, one email
>address per line. I'm using the code below to read in this file.
>
>data ffx.epilist ;
> infile epilist truncover ;
> input @1 email $50. ;
>
> email = left(trim(lowcase(email))) ;
>
> domain = substr(email,(index(email,'@')+1)) ;
>
> run ;
>
>The output however looks like this:
>
>
> obs email domain
> 1 test1@dkadk.com 3234dk@efg.com jieuw@lm dkadk.com
>3234dk@efg.com
>
>etc.
>
>It looks like the line delimiter is missing. Suggestions on how to
>fix? I've tried the import wizard using a number of different
>delimiter options and am getting the same result.
>
>Thanks -
>