Date: Wed, 8 Sep 2004 17:27:55 -0400
Reply-To: "Miller, Jeremy T." <zyp9@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Miller, Jeremy T." <zyp9@CDC.GOV>
Subject: Re: "Back up" one record during read of complex data set parsing
Content-Type: text/plain; charset="us-ascii"
Have you checked the archives? Here is one that may be particularly
helpful:
Subject: Re: Parsing Text File into separate cols.
From: Ian Whitlock <[PRIVACY PROTECTION]>
Date: Wed, 11 Dec 2002 11:22:44 -0500
--------------------------------------------------------------------------------
Rashida,
You present an interesting problem. I suspect that the fact that the line
"Providers:" does not give a provider, but has a provider on the following
line, is an indication of incomplete information about the organization of
the file.
I will assume "Providers:" has at most one provider following. The same
question arises about "Specialty(ies):" - what does the situation look like
when there is more than one? I assume that, whatever it is, it is only on
one line. I did add a second provider in the first case to see how the
program would handle it.
When faced with a messy reading problem it is often best to simplify by
reducing the data to a more manageable form and then obtaining the final
data set. In this case, one problem is identifying a logical record. I
assumed every logical record begins with "Group NAME:" and that line is
always present.
The next problem is that quotes are used only sometimes. The DSD option can
handle both situations, so I turned it into a DSD problem with a delimiter
"FF"X which presumably is never in the file. (Hey, Michael! Is this a
sleazy trick?)
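To see the effect of that trick in isolation, here is a minimal sketch. The data set name demo and the two sample lines are made up for illustration; the INFILE options and the INPUT statement are the ones used in the program. With DSD, a value wrapped in quotes should have the quotes removed, and because the only delimiter is "FF"X, the commas and blanks inside the line should not split it.

```sas
/* Hypothetical demo: the same text, once quoted and once unquoted. */
data demo ;
   length line $ 100 ;
   /* "ff"x is assumed never to occur in the data, so each line is one field */
   infile cards dsd dlm = "ff"x ;
   input line :$char100. ;
cards ;
"Group Name: David G. Parker, DDS, PA"
Group Name: David G. Parker, DDS, PA
;

/* Both observations are expected to hold the same value of line. */
proc print data = demo ;
run ;
```

If the two printed values differ on your data, the quoting in the file is probably less regular than assumed here.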
Hopefully this is enough to understand the logic of the program. If not,
just ask questions. After you look more closely at the provider/specialty
problem you may find the program easy to fit to the situation. If not, ask
more questions. Here is the program.
data w ;
   retain seq ;
   length line $ 100 ;
   infile cards dsd dlm="ff"x ;
   input line :$char100. ;
   /* "Providers:" stands alone, so pull the provider up from the next line */
   if line = "Providers:" then
      do ;
         input line :$char100. ;
         line = "Providers: " || line ;
      end ;
   /* number the logical records: each one starts with "Group Name:" */
   if upcase(line) =: "GROUP NAME:" then seq + 1 ;
cards ;
"Group Name: David G. Parker, DDS, PA"
"Address/Phone: 227 North Knights Avenue, Brandon, FL 33510 (813) 685-5611"
Office Status: Accepting New Patients
Providers:
"Parker, David G., DDS"
"Parker's Brother"
Primary Office #: 112716
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
"Address/Phone: 413 West Robertson Street Suite B, Brandon, FL 33511 (813) 684-5554"
Office Status: Accepting New Patients
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics
;
data q ( keep = gpname addr primoffice officestat providers spec prob );
   length test $ 20
          rest gpname addr primoffice officestat
          providers spec prob $ 100
   ;
   /* one trip through the loop per logical record, one output per record */
   do until ( last.seq ) ;
      set w ;
      by seq ;
      /* split each line into its label (up to the colon) and the rest */
      x = index ( line , ":" ) ;
      if x > 0 then
         do ;
            test = substr ( line , 1 , x ) ;
            rest = substr ( line , x + 2 ) ;
         end ;
      else
         do ;
            test = "problem" ;
            rest = line ;
         end ;
      select ( upcase(test) ) ;
         when ( "GROUP NAME:" ) gpname = rest ;
         when ( "ADDRESS/PHONE:" ) addr = rest ;
         when ( "PRIMARY OFFICE #:" ) primoffice = rest ;
         when ( "OFFICE STATUS:" ) officestat = rest ;
         when ( "PROVIDERS:" ) providers = rest ;
         when ( "SPECIALTY(IES):" ) spec = rest ;
         otherwise prob = line ;
      end ;
   end ;
run ;
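As written, a second provider line has no label, so it ends up in prob. If it turns out that several providers can follow "Providers:", one way the program might be fitted to that situation is sketched below. This is only a sketch under assumptions: the variable lasttag, the "NO LABEL" marker, the data set name q2, the "; " separator, and the shortened sample data are all made up for illustration, and CATX requires SAS 9.

```sas
/* Step 1: same idea as the program above, on a shortened sample. */
data w ;
   length line $ 100 ;
   infile cards dsd dlm = "ff"x ;
   input line :$char100. ;
   if line = "Providers:" then
      do ;
         input line :$char100. ;
         line = "Providers: " || line ;
      end ;
   if upcase ( line ) =: "GROUP NAME:" then seq + 1 ;
cards ;
"Group Name: David G. Parker, DDS, PA"
Providers:
"Parker, David G., DDS"
"Parker's Brother"
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics
;

/* Step 2 variant: lasttag (hypothetical) remembers the last label seen, */
/* so an unlabeled line right after the providers is appended to them.   */
data q2 ( keep = gpname providers spec prob ) ;
   length test lasttag $ 20
          rest gpname spec prob $ 100
          providers $ 200 ;
   do until ( last.seq ) ;
      set w ;
      by seq ;
      x = index ( line , ":" ) ;
      if x > 0 then
         do ;
            test = upcase ( substr ( line , 1 , x ) ) ;
            rest = substr ( line , x + 2 ) ;
            lasttag = test ;
         end ;
      else
         do ;
            test = "NO LABEL" ;
            rest = line ;
         end ;
      select ( test ) ;
         when ( "GROUP NAME:" ) gpname = rest ;
         when ( "PROVIDERS:" ) providers = rest ;
         when ( "SPECIALTY(IES):" ) spec = rest ;
         when ( "NO LABEL" )
            do ;
               if lasttag = "PROVIDERS:" then
                  providers = catx ( "; " , providers , rest ) ;
               else prob = line ;
            end ;
         otherwise prob = line ;
      end ;
   end ;
run ;
```

The sketch still assumes that provider names never contain a colon; a name that does would be mistaken for a label and land in prob.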
[PRIVACY PROTECTION]
-----Original Message-----
From: Rashida Patwa [mailto:[PRIVACY PROTECTION]]
Sent: Wednesday, December 11, 2002 10:18 AM
Subject: Parsing Text File into separate cols.
Hi, I need some help parsing this text file into separate cols. I have
shown 2 records, and the rest of the records are in the same pattern. I
have colored text blue for record 1 and colored green for record 2.
I need this info in cols,
e.g.: group name, street addr, city, state, zip, phone, doc name, doc #,
Specialty
This text file is variable length. How can I do this? The file has over
1000 docs with 8-9 lines per doc.
Any help would be appreciated.
Thanks.
"Group Name: David G. Parker, DDS, PA"
"Address/Phone: 227 North Knights Avenue, Brandon, FL 33510 (813) 685-5611"
Office Status: Accepting New Patients
Providers:
"Parker, David G., DDS"
Primary Office #: 112716
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
"Address/Phone: 413 West Robertson Street Suite B, Brandon, FL 33511 (813) 684-5554"
Office Status: Accepting New Patients
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics
Rashida Patwa
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of jsl
Sent: Wednesday, September 08, 2004 12:16 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: "Back up" one record during read of complex data set parsing
I am trying to read a bunch of data from a big text file and parse some
important information out of it. The top of each record has a "code
word" that I can key off until I get to the next occurrence of that code
word. I want to keep some of the lines in between the code words and
dump others. I am struggling, however, with one problem. I am using a
do until loop that does everything in the loop until it reaches the next
occurrence of the code word. The problem is I then lose that record that
contains the code (which is important since I key off that record to
parse much of the data). For example, here's what my code looks like:
if LINE="CODEWORD" then do;
   input @2 ID $char10.;
   do until (LINE2="CODEWORD");
      input @2 LINE2 $char10. @;
      if LINE2="WANT" then do;
         VAR1=substr(LINE2,1,3);
         VAR2=substr(LINE2,4,3);
         input DUMP $3.;
      end;
      else do;
         input DUMP $3.;
      end;
   end;
end;
It partially works in that it gets my data for every other occurrence of
CODEWORD. It misses every other one because when the until loop
advances the record to check for the code word, I then no longer have
the record with the code word in it for the original IF statement.
There is nothing else in the text file that I can key off of to know
that I am approaching the next record.
Anybody know what I am doing wrong?
Thanks,
Jim
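One possible way around the lost record, in the spirit of the archived post: read each line exactly once, with a single INPUT per pass of the DATA step, and RETAIN whatever comes from the code word line instead of looping ahead to the next one. The sketch below is not a drop-in fix, since the real file layout is not shown in the thread. It assumes a hypothetical layout in which every line starts with a tag followed by a value; the tags CODEWORD and WANT come from the question, while the variables tag and rest, the data set name parsed, and the sample lines are made up.

```sas
/* Hypothetical layout: a tag, a blank, then a value on every line. */
data parsed ( keep = id var1 var2 ) ;
   length tag $ 8 id $ 10 rest $ 20 var1 var2 $ 3 ;
   retain id ;                 /* ID survives until the next CODEWORD line */
   infile cards truncover ;
   input tag $ rest $ ;        /* the only INPUT: one line per iteration */
   if tag = "CODEWORD" then id = rest ;
   else if tag = "WANT" then
      do ;
         var1 = substr ( rest , 1 , 3 ) ;
         var2 = substr ( rest , 4 , 3 ) ;
         output ;              /* one observation per WANT line */
      end ;
   /* any other tag falls through and is simply dropped */
cards ;
CODEWORD A001
WANT abcdef
SKIP xxxxxx
CODEWORD A002
WANT ghijkl
;
```

Because no inner loop reads ahead, the CODEWORD line is never consumed before the IF test sees it, so no record should be skipped.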