LISTSERV at the University of Georgia
Date:         Wed, 8 Sep 2004 17:27:55 -0400
Reply-To:     "Miller, Jeremy T." <zyp9@CDC.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Miller, Jeremy T." <zyp9@CDC.GOV>
Subject:      Re: "Back up" one record during read of complex data set parsing
Content-Type: text/plain; charset="us-ascii"

Have you checked the archives? Here is one post that may be particularly helpful:

Subject: Re: Parsing Text File into separate cols.
From: Ian Whitlock <[PRIVACY PROTECTION]>
Date: Wed, 11 Dec 2002 11:22:44 -0500

------------------------------------------------------------------------

Rashida,

You present an interesting problem. I suspect that the fact that the line "Providers:" does not give a provider, but has a provider on the following line, is an indication of incomplete information about the organization of the file.

I will assume "Providers:" has at most one provider following. The same question arises about "Specialty(ies):" - what does the situation look like when there is more than one? I assume that, whatever it is, it is on only one line. I did add a second provider in the first case to see how the program would handle it.

When faced with a messy reading problem it is often best to simplify by first reducing the data to a more manageable form and then obtaining the final data set. In this case, one problem is identifying a logical record. I assumed every logical record begins with "Group Name:" and that this line is always present.

The next problem is that quotes are used only some of the time. The DSD option can handle both situations, so I turned it into a DSD problem with a delimiter "FF"X which presumably is never in the file. (Hey, Michael! Is this a sleazy trick?)

Hopefully this is enough to understand the logic of the program. If not, just ask questions. After you look more closely at the provider/specialty problem you may find the program easy to adapt to the situation. If not, ask more questions. Here is the program.

data w ;
   retain seq ;
   length line $ 100 ;
   infile cards dsd dlm="ff"x ;
   input line :$char100. ;
   if line = "Providers:" then do ;
      input line :$char100. ;
      line = "Providers: " || line ;
   end ;
   if upcase(line) =: "GROUP NAME:" then seq + 1 ;
cards ;
"Group Name: David G. Parker, DDS, PA"
"Address/Phone: 227 North Knights Avenue, Brandon, FL 33510 (813) 685-5611"
Office Status: Accepting New Patients
Providers:
"Parker, David G., DDS"
"Parker's Brother"
Primary Office #: 112716
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
"Address/Phone: 413 West Robertson Street Suite B, Brandon, FL 33511 (813) 684-5554"
Office Status: Accepting New Patients
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics
;

data q ( keep = gpname addr primoffice officestat providers spec prob ) ;
   length test $ 20
          rest gpname addr primoffice officestat providers spec prob $ 100 ;
   do until ( last.seq ) ;
      set w ;
      by seq ;
      x = index ( line , ":" ) ;
      if x > 0 then do ;
         test = substr ( line , 1 , x ) ;
         rest = substr ( line , x + 2 ) ;
      end ;
      else do ;
         test = "problem" ;
         rest = line ;
      end ;
      select ( upcase(test) ) ;
         when ( "GROUP NAME:" )       gpname = rest ;
         when ( "ADDRESS/PHONE:" )    addr = rest ;
         when ( "PRIMARY OFFICE #:" ) primoffice = rest ;
         when ( "OFFICE STATUS:" )    officestat = rest ;
         when ( "PROVIDERS:" )        providers = rest ;
         when ( "SPECIALTY(IES):" )   spec = rest ;
         otherwise                    prob = line ;
      end ;
   end ;
run ;

[PRIVACY PROTECTION]

-----Original Message-----
From: Rashida Patwa [mailto:[PRIVACY PROTECTION]]
Sent: Wednesday, December 11, 2002 10:18 AM
Subject: Parsing Text File into separate cols.

Hi, I need some help parsing this text file into separate cols. I have shown 2 records, and the rest of the records follow the same pattern. I have colored the text blue for record 1 and green for record 2. I need this info in cols.

eg: group name street addr city state zip phone doc name doc # Specialty

This text file is variable length. How can I do this? The file has over 1000 docs with 8-9 lines per doc. Any help would be appreciated.

Thanks.

"Group Name: David G. Parker, DDS, PA"
"Address/Phone: 227 North Knights Avenue, Brandon, FL 33510 (813) 685-5611"
Office Status: Accepting New Patients
Providers:
"Parker, David G., DDS"
Primary Office #: 112716
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
"Address/Phone: 413 West Robertson Street Suite B, Brandon, FL 33511 (813) 684-5554"
Office Status: Accepting New Patients
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics

Rashida Patwa

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of jsl
Sent: Wednesday, September 08, 2004 12:16 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: "Back up" one record during read of complex data set parsing

I am trying to read a bunch of data from a big text file and parse some important information out of it. The top of each record has a "code word" that I can key off until I get to the next occurrence of that code word. I want to keep some of the lines in between the code words and dump others. I am struggling, however, with one problem. I am using a do until loop that does everything in the loop until it reaches the next occurrence of the code word. The problem is that I then lose the record that contains the code word (which is important, since I key off that record to parse much of the data). For example, here's what my code looks like:

if LINE="CODEWORD" then do;
   input @2 ID $char10.;
   do until (LINE2="CODEWORD");
      input @2 LINE2 $char10. @;
      if LINE2="WANT" then do;
         VAR1=substr(LINE2,1,3);
         VAR2=substr(LINE2,4,3);
         input DUMP $3.;
      end;
      else do;
         input DUMP $3.;
      end;
   end;

It partially works in that it gets my data for every other occurrence of CODEWORD. It misses every other one because when the until loop advances the record to check for the code word, I no longer have the record with the code word in it for the original IF statement. There is nothing else in the text file that I can key off of to know that I am approaching the next record.

Anybody know what I am doing wrong?

Thanks,

Jim
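
One way to avoid having to back up at all, sketched here along the lines of the archived program quoted above: read every line exactly once, number the logical records as each code word goes by, and do the parsing in a second step. This is only a sketch under assumptions the post does not confirm: the file name bigfile.txt and the data set names w and wanted are hypothetical, the code word is assumed to start in column 1, lines are assumed to be at most 100 characters long, and the lines worth keeping are assumed to start with WANT.

```sas
/* Step 1: read each line exactly once and number the logical records. */
/* Nothing is skipped, so the code-word line is never lost.            */
data w ;
   retain seq 0 ;
   length line $ 100 ;
   infile "bigfile.txt" truncover ;        /* hypothetical file name            */
   input line $char100. ;                  /* assumes lines of <= 100 chars     */
   if line =: "CODEWORD" then seq + 1 ;    /* assumes code word is in column 1  */
run ;

/* Step 2: keep the code-word line and the WANT lines; seq identifies */
/* the logical record each kept line belongs to.                      */
data wanted ;
   set w ;
   if line =: "CODEWORD" or line =: "WANT" ;
run ;
```

From here the kept lines can be collapsed to one observation per record with SET and BY seq inside a DO UNTIL ( last.seq ) loop, as the archived program does in its second step. Lines that appear before the first code word carry seq = 0.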

