Date: Wed, 23 Jun 2004 11:57:51 -0600
Reply-To: Alan Churchill <EmailMeDirectly@ThisWebSite-erratix.us>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Alan Churchill <EmailMeDirectly@THISWEBSITE-ERRATIX.US>
Subject: Re: Problems to read unstructured text into a dataset
Here's an alternative approach. There are some subtleties that still
need to be worked out, but hopefully it is less confusing than the code
you posted:
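data test ;
   attrib record length=$2048 saksnr arkiv length=$40 ;
   retain record ;
   input ;                            /* load the current line into _infile_ */
   if _infile_ =: "---" then          /* a dashed line closes a record */
      do ;
         if record ne "" then         /* ignore repeated or wrapped dashed lines */
            do ;
               /* lines are joined with "|", so piece n is the n-th "NAME: value" line */
               saksnr = left(scan(scan(record,1,"|"),2,":")) ;
               arkiv  = left(scan(scan(record,2,"|"),2,":")) ;
               output ;
            end ;
         record = "" ;
      end ;
   else
      /* CATX (SAS 9) strips blanks and skips empty arguments */
      record = catx("|", record, _infile_) ;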
datalines4;
SAKSNR.: 1991/1
ARKIV: RAPPORTARKIV
TITTEL: GEOLOGICAL COMLETION REPORT PL 098 WELL 7120/10-2 NOVEMBER
1990
SAKSDATO: 02.01.1991 SISTE DOK.: 02.01.1991 ANT.DOK: 1
SANSV.: OD/ODH/KT
DOKNR.: 1 JAVD: ODJ SBH: OD/ODH/K
JDATO:02.01.1991 UOFF 5 A
AVS/MOT: ESSO
DOK.TITTEL: GEOLOGICAL COMLETION REPORT PL 098 WELL 7120/10-2
AVSKR.DOKNR.: DATO: 02.01.1991
AVSKR. MÅTE: X
-----------------------------------------------------------------------------------------------
SAKSNR.: 1991/2
ARKIV: RAPORTARKIV NR 9
TITTEL: RESERVOARTEKNISK RAPPORT STATFJORD NORD 1 EKS
SAKSDATO: 02.01.1991 SISTE DOK.: 02.01.1991 ANT.DOK: 1
SANSV.: OD/ODH/RO
DOKNR.: 1 JAVD: ODJ SBH: OD/ODH/R
JDATO:02.01.1991 UOFF FORTR
AVS/MOT: OD-S
DOK.TITTEL: RESERVOARTEKNISK RAPPORT STATFJORD NORD 1 EKS
AVSKR.DOKNR.: DATO: 02.01.1991
AVSKR. MÅTE: X
-----------------------------------------------------------------------------------------------
;;;;
run;
--
Alan Churchill
Savian
"Bridging SAS and Microsoft technologies"
(719) 687-5954
"Rune Runnestoe" <rune@fastlane.no> wrote in message
news:24410121.0406230941.17b2b965@posting.google.com...
> Hi All,
>
> This is what the code looks like:
> ---------------------------------
>
> data test;
>   infile cards dlm=":" dsd;
>   length fname $40 value $100;
>   if _n_=1 then do;
>     recnum=0;***Create a record counter;
>   end;
>   input @1 _check_ $1.@;
>   if _check_="-" then do;
>     delete;***Get rid of the ------ lines;
>   end;
>   else do;
>     input @1 fname $ value $;
>     if fname='SAKSNR.' then recnum=recnum+1;***Each record begins with SAKSNR;
>   end;
>   drop _check_;
>   retain recnum;***Keep the counter;
> cards;
> -----------------------------------------------------------------------------------------------
> SAKSNR.: 1991/1
> ARKIV: RAPPORTARKIV
> TITTEL: GEOLOGICAL COMLETION REPORT PL 098 WELL 7120/10-2 NOVEMBER
> 1990
> SAKSDATO: 02.01.1991 SISTE DOK.: 02.01.1991 ANT.DOK: 1
> SANSV.: OD/ODH/KT
> DOKNR.: 1 JAVD: ODJ SBH: OD/ODH/K
> JDATO:02.01.1991 UOFF 5 A
> AVS/MOT: ESSO
> DOK.TITTEL: GEOLOGICAL COMLETION REPORT PL 098 WELL 7120/10-2
> AVSKR.DOKNR.: DATO: 02.01.1991
> AVSKR. MÅTE: X
> -----------------------------------------------------------------------------------------------
> SAKSNR.: 1991/2
> ARKIV: RAPORTARKIV NR 9
> TITTEL: RESERVOARTEKNISK RAPPORT STATFJORD NORD 1 EKS
> SAKSDATO: 02.01.1991 SISTE DOK.: 02.01.1991 ANT.DOK: 1
> SANSV.: OD/ODH/RO
> DOKNR.: 1 JAVD: ODJ SBH: OD/ODH/R
> JDATO:02.01.1991 UOFF FORTR
> AVS/MOT: OD-S
> DOK.TITTEL: RESERVOARTEKNISK RAPPORT STATFJORD NORD 1 EKS
> AVSKR.DOKNR.: DATO: 02.01.1991
> AVSKR. MÅTE: X
> -----------------------------------------------------------------------------------------------
> run;
>
>
> The result should be like this:
> --------------------------------
> SAKSNR 1991/1
> ARKIV RAPPORTARKIV
> TITTEL GEOLOGICAL COMLETION REPORT PL 098 WELL 7120/10-2
> NOVEMBER 1990
> SAKSDATO 02.01.1991
> SISTE DOK 02.01.1991
> ANT.DOK 1
> SANSV OD/ODH/KT
> DOKNR 1
> JAVD ODJ
> SBH OD/ODH/K
> JDATO 02.01.1991
> UOFF 5 A
> AVS/MOT ESSO
> DOK.TITTEL GEOLOGICAL COMLETION REPORT PL 098 WELL 7120/10-2
> AVSKR.DOKNR
> DATO 02.01.1991
> AVSKR. MÅTE X
> SAKSNR 1991/2
> ARKIV RAPORTARKIV NR 9
> TITTEL RESERVOARTEKNISK RAPPORT STATFJORD NORD 1 EKS
> SAKSDATO 02.01.1991
> SISTE DOK 02.01.1991
> ANT.DOK 1
> SANSV OD/ODH/RO
> DOKNR 1
> JAVD ODJ
> SBH OD/ODH/R
> JDATO 02.01.1991
> UOFF FORTR
> AVS/MOT OD-S
> DOK.TITTEL RESERVOARTEKNISK RAPPORT STATFJORD NORD 1 EKS
> AVSKR.DOKNR
> DATO 02.01.1991
> AVSKR. MÅTE X
>
>
>
> But it doesn't
> ---------------
> Something is wrong with the code, and I can't find out what it is.
> In the first record, for instance, it doesn't catch the following
> fname/value pairs:
> SISTE DOK 02.01.1991
> ANT.DOK 1
> DOKNR 1
> JAVD ODJ
> SBH OD/ODH/K
> JDATO 02.01.1991
> UOFF 5 A
> DATO 02.01.1991
>
> In the TEST dataset, "DATO" is treated as the value of
> "AVSKR.DOKNR"; in fact, "AVSKR.DOKNR" has no value. And "SISTE DOK" is
> treated as part of the value of "SAKSNR". Similarly, "JAVD" is
> wrongly treated as part of the value of fname "DOKNR".
>
>
> Another concern is that cards is used. Maybe it would be more
> convenient to treat the records as if they were contained in a file.
> You see, in this simple case, it's just two records. But the files I
> work with in reality may have thousands of records. They might cause
> the editor file to be pretty large.
> Can anyone suggest how I should write the code if I used a file, not
> cards?
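
On reading from a file instead of cards: a minimal sketch, assuming SAS 9
(for CATX) and a hypothetical path c:\temp\saker.txt. The fileref name
sakfile is made up as well. Only the INFILE statement really changes; the
END= flag is there to flush a last record that is not followed by a dashed
line.

```sas
/* hypothetical path and fileref; point them at your own file */
filename sakfile "c:\temp\saker.txt" ;

data test ;
   attrib record length=$2048 ;
   retain record ;
   infile sakfile lrecl=2048 end=last ;
   input ;                                  /* load the line into _infile_ */
   if _infile_ =: "---" then                /* dashed line: record is complete */
      do ;
         if record ne "" then output ;
         record = "" ;
      end ;
   else
      record = catx("|", record, _infile_) ; /* join lines with "|" */
   /* the file may not end with a dashed line, so write out what is left */
   if last and record ne "" then output ;
run ;
```

Each row of TEST then holds one whole record in RECORD, with the original
lines separated by "|", ready to be picked apart with SCAN.
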
>
> The dataset TEST is like this:
> --------------------------------
> attribute1 value
> attribute2 value
> attribute3 value
> ...
>
> Actually, I would prefer it to look like this:
> ----------------------------------------------------
> attribute1 attribute2 attribute3
> value value value
> value value value
>
> That is, each record in the file becomes a row in the dataset.
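
If you keep the long layout (one name/value pair per row plus your recnum
counter), PROC TRANSPOSE can turn it into one row per record. A rough
sketch, using a small made-up LONG dataset with cleaned field names. Names
such as SAKSNR. or AVS/MOT are not valid SAS variable names under the
default settings, so you would probably want to clean them up first.

```sas
/* made-up sample in the long layout: one name/value pair per row */
data long ;
   length recnum 8 fname $40 value $100 ;
   infile datalines dlm="|" ;
   input recnum fname $ value $ ;
datalines ;
1|SAKSNR|1991/1
1|ARKIV|RAPPORTARKIV
2|SAKSNR|1991/2
2|ARKIV|RAPORTARKIV NR 9
;
run ;

/* BY needs the data sorted by recnum; the sample already is */
proc transpose data=long out=wide (drop=_name_) ;
   by recnum ;      /* one output row per record */
   id fname ;       /* field names become variable names */
   var value ;      /* field values fill the new variables */
run ;
```

WIDE should then have the variables recnum, SAKSNR and ARKIV, with one row
per record. Note that PROC TRANSPOSE complains if the same fname occurs
twice within one recnum.
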
>
>
>
> Regards
> Rune Runnestoe