| Date: | Mon, 27 Dec 2010 21:30:44 -0600 |
| Reply-To: | Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US> |
| Subject: | Re: Is there an easier way to solve this? |
| Content-Type: | text/plain; charset="iso-8859-1" |
I've been following this thread and thinking the same. It seems that to make the code robust, you need to allow for data-entry problems, unless you have a QA/QC program on the front end.
Warren Schlechte
-----Original Message-----
From: toby dunn [mailto:tobydunn@HOTMAIL.COM]
Sent: Mon 12/27/2010 8:05 PM
Subject: Re: Is there an easier way to solve this?
Art,
Since you are essentially dealing with a free-form text string, I have a question: is there a possibility that any of the three parts you want to parse will be missing?
Toby Dunn
"I'm a hell bent 100% Texan til I die"
"Don't touch my Willie, I don't know you that well"
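On the robustness point and the question of missing parts, a front-end check could look something like the sketch below. It is only a sketch under stated assumptions: SAS 9.2 or later (for COUNTW), and a valid third line that ends in a two-letter state followed by a five-digit zip. The data set name check, the variables line, w_zip, w_state and problem, and the sample records are hypothetical and not taken from the thread.

```sas
data check;
   length line $80 w_zip w_state $20 city $40 state $2 zip $5 problem $25;
   infile datalines truncover;
   input line $char80.;
   /* Fewer than 3 blank-delimited words means at least one part is missing */
   if countw(line, ' ') < 3 then problem = 'fewer than 3 words';
   else do;
      w_zip   = scan(line, -1, ' ');   /* candidate zip: last word          */
      w_state = scan(line, -2, ' ');   /* candidate state: next-to-last word */
      if length(w_zip) ne 5 or notdigit(strip(w_zip)) > 0 then
         problem = 'zip is not 5 digits';
      else if length(w_state) ne 2 or notalpha(strip(w_state)) > 0 then
         problem = 'state is not 2 letters';
      else do;
         zip   = w_zip;
         state = w_state;
         /* City is everything before the start of the state word */
         call scan(line, -2, _pos, _len, ' ');
         city = substr(line, 1, _pos - 1);
      end;
   end;
   drop w_zip w_state _pos _len;
datalines;
New York NY 85044
Vienna VA
xxxx xxx xx NY   111111
;
run;

proc print data=check;
run;
```

With these sample records, the second and third lines should be flagged in problem and only the first parsed into city, state and zip. The candidate words are held in longer work variables so that an over-long zip is caught rather than silently truncated to 5 characters.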
> Date: Mon, 27 Dec 2010 20:14:40 -0500
> From: art297@ROGERS.COM
> Subject: Re: Is there an easier way to solve this?
> To: SAS-L@LISTSERV.UGA.EDU
>
> Max,
>
> Tonight you get a free lesson.
>
> When you want to know if something is more or less efficient, DON'T ask the
> list! Just build some code that allows you to make the two (or however
> many) sets of code comparable and run them.
>
> If you find something that the rest of us might be interested in, then share
> the results.
>
> Thus, for your current question, wouldn't something like the following tell
> you what you want to know?:
>
> data _null_;
> file "c:\testdata.txt";
> input;
> do i=1 to 100000;
> put _infile_;
> end;
> cards;
> Uptown, Smalltown XX 12345
> The City of New York Big Apple Big Apple NY 85044
> ;
> run;
>
> data address;
> length city $40 state $2 zip $5;
> informat zip $revers5.;
> informat state $revers2.;
> informat city $revers40.;
> infile "c:\testdata.txt";
> input @;
> _infile_=reverse(_infile_);
> input zip state city &;
> run;
>
> data b;
> Length zip $ 5 state $ 2 city $ 40;
> infile "c:\testdata.txt";
> input;
> Zip=scan(_infile_, -1);
> State=scan(_infile_, -2);
> City=tranwrd(_infile_, state||" "||zip, " ");
> run;
>
> The log will tell you everything that you want to know and you, then, can
> decide if it is just something to learn or something you should share.
>
> In the present case, it might just be worth sharing.
>
> Art
> -------
> On Mon, 27 Dec 2010 18:26:35 -0500, bbser 2009 <bbser2009@GMAIL.COM> wrote:
>
> >Nat
> >
> >
> >Yes. As an alternative to my first code using tranwrd(), maybe it would be
> >more robust to adjust it like this (using tranwrd() twice instead of just once):
> >
> >...
> >zip=scan(x, -1);
> >temp=tranwrd(x, zip, " "); *First get rid of the value of zip from the full
> >string in x;
> >state=scan(temp, -1); *It is minus one, not minus two;
> >city=tranwrd(temp, state, " "); *Secondly get rid of state from temp;
> >...
> >
> >I guess this gets rid of the problem of a varying number of blanks and is
> >better than my second code where I used scan() and catx().
> >Now I am wondering how this compares to Søren's in terms of efficiency.
> >
> >Max
> >
> >-----Original Message-----
> >From: Nat Wooding [mailto:nathani@verizon.net]
> >Sent: December-27-10 5:28 PM
> >To: 'bbser 2009'
> >Subject: RE: [SAS-L] Is there an easier way to solve this?
> >
> >Max
> >
> >I don't use Tranwrd enough to know all of its nuances. If you used
> >
> >Record = compbl( record );
> >
> >You would get rid of extra blanks.
> >
> >Nat
> >
> >-----Original Message-----
> >From: bbser 2009 [mailto:bbser2009@gmail.com]
> >Sent: Monday, December 27, 2010 5:04 PM
> >To: 'Nat Wooding'
> >Cc: SAS-L@LISTSERV.UGA.EDU
> >Subject: RE: [SAS-L] Is there an easier way to solve this?
> >
> >Nat
> >
> >Thanks for letting me know about the "continue" statement. Glad to add it to my
> >arsenal.
> >As for my earlier code, I just thought it might not be robust.
> >For example, suppose some records, like the one below, have two more spaces between
> >NY and 111111.
> >
> >xxxx xxx xx NY   111111
> >
> >Then using tranwrd(record, a||""||b, "") does not replace "NY 111111"
> >totally with blanks.
> >
> >Max
> >
> >-----Original Message-----
> >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Nat
> >Wooding
> >Sent: December-27-10 4:08 PM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: Re: [SAS-L] Is there an easier way to solve this?
> >
> >Max
> >
> >SAS fussed at me until I made the variable for City longer (I used 50).
> >
> >Your earlier solution was simpler but this does work. One thing that I would
> >do would be to stop the loop as soon as it got to the end of the words in
> >the string as in the following code.
> >
> >Nat
> >
> >do i=1 to 20;
> > word[i]=scan(x,-i);
> > if word[i]='' then leave; *<<<< new line. LEAVE exits the loop, whereas CONTINUE only skips to the next iteration;
> >end;
> >
> >-----Original Message-----
> >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of bbser
> >2009
> >Sent: Monday, December 27, 2010 3:42 PM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: Re: Is there an easier way to solve this?
> >
> >Got typos in the code. Here is the new one.
> >------------------
> >How about this below? It looks "elementary" for newbies like me and seemingly
> >works fine for even the longest US city names.
> >
> >Max
> >
> >----------
> >data a;
> > keep x zip state city;
> > x="The City of New York Big Apple Big Apple NY 85044";
> > Length zip $ 5 state $ 2 city $ 30;
> > array word{20} $ 15;
> > do i=1 to 20;
> > word[i]=scan(x,-i);
> > end;
> > zip=word[1];
> > state=word[2];
> > city=catx("", of word20-word3);
> >run;
> >proc print;
> >run;
> >
> >-----Original Message-----
> >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Nat
> >Wooding
> >Sent: December-27-10 1:16 PM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: Re: [SAS-L] Is there an easier way to solve this?
> >
> >Matt
> >
> >I took the liberty to send my reply to the list in case there are any bored
> >Birdies listening in.
> >
> >I totally agree that SAS is not publicizing it as well as they could but I
> >am generally unhappy with having the NLS formats and informats segregated
> >from their peers, particularly since some are very useful.
> >
> >Did you notice the companion informat
> >
> >$REVERJw.@ inputs text right to left, preserves leading and trailing
> >blanks
> > ABCD | $reverj6. | ' DCBA'
> >
> >I copied this from TS486 and not the standard docs.
> >
> >Nat
> >-----Original Message-----
> >From: matt.pettis@thomsonreuters.com [mailto:matt.pettis@thomsonreuters.com]
> >Sent: Monday, December 27, 2010 12:43 PM
> >To: nathani@VERIZON.NET
> >Subject: RE: Is there an easier way to solve this?
> >
> >Thanks Nat! It is indeed a nice informat to keep in your back pocket...
> >just think SAS shouldn't be hiding this light under a bushel...
> >
> >Thanks again,
> >Matt
> >
> >-----Original Message-----
> >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Nat
> >Wooding
> >Sent: Monday, December 27, 2010 11:38 AM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: Re: Is there an easier way to solve this?
> >
> >Matt
> >
> >Art and I spoke of this offline earlier today. 9.1.3 docs have an entry for
> >it within the normal informats but refer you to the NLS docs.
> >
> >Art and I also corresponded about the width defaulting to 1. I find that if
> >I have a length statement in the code, I do not need to supply a width.
> >
> >Nat
> >
> >-----Original Message-----
> >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Matthew
> >Pettis
> >Sent: Monday, December 27, 2010 12:28 PM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: Re: Is there an easier way to solve this?
> >
> >That '$revers.' informat wasn't documented in my local SAS Help files. I
> >had to google it and found that it is an NLS informat. Does anybody know why it
> >wouldn't be in the base help files that come with my SAS install (I have
> >9.2)?
> >
> >Just curious,
> >Thanks,
> >matt
> >
> >-----Original Message-----
> >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur
> >Tabachneck
> >Sent: Monday, December 27, 2010 9:42 AM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: Re: Is there an easier way to solve this?
> >
> >Nat pointed out to me, offline, that I had left off the critical statement
> >in Søren's code, namely the informat assignment:
> >
> > informat zip state city_soren $revers.;
> >
> >Yes, indeed, a VERY nice solution.
> >
> >Art
> >--------
> >On Mon, 27 Dec 2010 10:05:50 -0500, Arthur Tabachneck <art297@ROGERS.COM>
> >wrote:
> >
> >>Nat,
> >>
> >>Either you (or I) have had too much Excedrin or our systems are functioning
> >>differently (hmmm .. maybe they've had too much Excedrin). Running your code
> >>I get (what I expected), Søren's city 'k' for New York and Søren's city 'a'
> >>for Vienna.
> >>
> >>If you get something different, please email me a copy of the resulting
> >>file.
> >>
> >>Art
> >>--------
> >>On Mon, 27 Dec 2010 09:27:12 -0500, Nat Wooding <nathani@VERIZON.NET>
> >>wrote:
> >>
> >>>Art
> >>>
> >>>It looks to me that both solutions produce identical results. Try the
> >>>following (which includes a merge sans by statement!!)
> >>>
> >>>Nat
> >>>
> >>> data soren;
> >>> length firstname lastname address_SOREN $20 city_soren $40 state $2 zip $5;
> >>> informat zip state city_soren $revers.;
> >>> input firstname lastname /
> >>> address_SOREN & /
> >>> @;
> >>> _infile_=reverse(_infile_);
> >>> input zip state city_soren &;
> >>> drop zip state firstname lastname;
> >>>cards;
> >>>Lee Athnos
> >>>1215 Raintree Circle
> >>>New York NY 85044
> >>>Heidie Baker
> >>>1751 Diehl Road
> >>>Vienna VA 22124
> >>>;run;
> >>>
> >>>data Art;
> >>> Length City_art $ 40
> >>> State $ 2
> >>> Zip $ 5
> >>>;
> >>>keep city_art address_art;
> >>> infile cards ;
> >>> input FirstName $ LastName $ /
> >>> Address_art $ 1 - 20 /
> >>> @;
> >>> _infile_ = reverse(_infile_);
> >>> input zip state city_art &;
> >>> city_art=reverse(trim(city_art));
> >>> state=reverse(state);
> >>> zip=reverse(zip);
> >>> cards;
> >>>Lee Athnos
> >>>1215 Raintree Circle
> >>>New York NY 85044
> >>>Heidie Baker
> >>>1751 Diehl Road
> >>>Vienna VA 22124
> >>>;
> >>>run;
> >>>
> >>>Data Test;
> >>>merge soren art;
> >>>run;
> >>>
> >>>
> >>>-----Original Message-----
> >>>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur
> >>>Tabachneck
> >>>Sent: Monday, December 27, 2010 9:16 AM
> >>>To: SAS-L@LISTSERV.UGA.EDU
> >>>Subject: Re: Is there an easier way to solve this?
> >>>
> >>>Søren,
> >>>
> >>>Definitely creates less of a dependency on Excedrin. However, it would
> >>>require three more uses of the reverse function, and city appears strange
> >>>unless one trims the leading (following?) space:
> >>>
> >>>data Address;
> >>> Length City $ 40
> >>> State $ 2
> >>> Zip $ 5
> >>>;
> >>> infile cards ;
> >>> input FirstName $ LastName $ /
> >>> Address $ 1 - 20 /
> >>> @;
> >>> _infile_ = reverse(_infile_);
> >>> input zip state city &;
> >>> city=reverse(trim(city));
> >>> state=reverse(state);
> >>> zip=reverse(zip);
> >>> cards;
> >>>Lee Athnos
> >>>1215 Raintree Circle
> >>>New York NY 85044
> >>>Heidie Baker
> >>>1751 Diehl Road
> >>>Vienna VA 22124
> >>>;
> >>>run;
> >>>
> >>>But, definitely a nice way to accomplish the task.
> >>>
> >>>Art
> >>>--------
> >>>On Mon, 27 Dec 2010 01:11:50 -0500, Søren Lassen
> >>><s.lassen@POST.TELE.DK> wrote:
> >>>
> >>>>Art,
> >>>>How about this:
> >>>>data address;
> >>>> length firstname lastname address $20 city $40 state $2 zip $5;
> >>>> informat zip state city $revers.;
> >>>> input firstname lastname /
> >>>> Address & /
> >>>> @;
> >>>> _infile_=reverse(_infile_);
> >>>> input zip state city &;
> >>>>cards;
> >>>>John Doe
> >>>>33 10 Av.
> >>>>Uptown, Smalltown XX 12345
> >>>>;run;
> >>>>
> >>>>Regards,
> >>>>Søren
> >>>>
> >>>>On Sun, 26 Dec 2010 11:07:11 -0500, Arthur Tabachneck <art297@ROGERS.COM>
> >>>>wrote:
> >>>>
> >>>>>The following was a question that was raised on the SAS discussion forum.
> >>>>>You are confronted with data that has 3 lines per subject. The third
> >>>>>line has variables that may contain embedded spaces, and there is only one
> >>>>>space between variables.
> >>>>>
> >>>>>The only suggestion I could think of was the one shown below. Is there an
> >>>>>easier way?
> >>>>>
> >>>>>data work.Address (drop=_:);
> >>>>> infile cards;
> >>>>> input FirstName $ LastName $ /
> >>>>> Address $ 1 - 20 /
> >>>>> _Third_Line & $80.;
> >>>>> format City $10.;
> >>>>> Zip=scan(_Third_Line,-1);
> >>>>> State=scan(_Third_Line,-2);
> >>>>> call scan(_Third_Line, -2, _position, _length);
> >>>>> City=substr(_Third_Line,1,_position-1);
> >>>>> cards;
> >>>>>Lee Athnos
> >>>>>1215 Raintree Circle
> >>>>>New York NY 85044
> >>>>>Heidie Baker
> >>>>>1751 Diehl Road
> >>>>>Vienna VA 22124
> >>>>>;
> >>>>>
> >>>>>Thanks in advance,
> >>>>>Art