| Date: | Tue, 8 Feb 2011 09:22:38 -0500 |
| Reply-To: | Arthur Tabachneck <art297@ROGERS.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Arthur Tabachneck <art297@ROGERS.COM> |
| Subject: | Re: Help with parsing a string |
| Content-Type: | text/plain; charset=ISO-8859-1 |
Søren,
Of course that could be further complicated depending upon the setting you
have for the cardimage system option. See, e.g.:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000201795.htm
or, in short form: http://tiny.cc/uft5v
Art
--------
On Tue, 8 Feb 2011 01:47:33 -0500, Søren Lassen
<s.lassen@POST.TELE.DK> wrote:
>Ann,
>I am assuming that the U=flower in the second data line should have been
>OU=flower.
>
>Named input is definitely the way to go. The only problem is the period
>before the <name>=, where SAS expects a blank. (BTW, is it really a period,
>and not some other unprintable character that got translated to a period
>when you posted here?)
>
>So, we have to manipulate the line read first, so that SAS can recognize
>the names:
>
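>data test;
>  /* compile the substitution pattern once, on the first iteration */
>  if _N_=1 then
>    prxid=prxparse('s/\.([A-Z]+=)/ $1/');
>  retain prxid;
>  drop prxid;
>  /* load the record into the input buffer and hold it */
>  input @;
>  /* replace every ".NAME=" with " NAME=" in the buffer */
>  call prxchange(prxid,-1,substr(_INFILE_,1));
>  length record $4 type $6 serial CN OU O U $40;
>  /* list input for the first three fields, named input for the rest */
>  input record type serial cn= ou= o= u=;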
>cards;
>0001 apple1 00.25.Monkey@address.com.CN=I'll be a monkeys uncle.OU=Mocking Bird City.O=some other data.U=some other data . . .
>0001 apple6 00679D46CKJL.CN=Help - I need someone.OU=flower.O=some other data.U=some other data . . .
>0001 apple6 00679D46CKJL.CN=Help - I need someone.OU=flower.U=some other data.
>0001 apple6 00679D46CKJL.CN=Help - I need someone.OU=flower.O=some other data.
>;
>run;
>
>What the string in PRXPARSE does: "s" means substitution, and the first
>"/" is a delimiter that starts the pattern. "\." means a literal period (a
>special character, so it is escaped with "\"; if the "period" is not really
>a period but, e.g., hex 0A, you can change the "\." to a hexadecimal escape:
>"\x0A"). The parentheses mean "remember this" (create a capture buffer).
>"[A-Z]" means any uppercase letter, "+" means one or more of the
>aforementioned, and "=" means "=". The second "/" is a delimiter meaning
>"here comes the replacement string", which is very simply a blank followed
>by whatever was remembered (capture buffer no. 1, a name followed by "=").
>The final "/" is a delimiter marking the end of the replacement string.
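
To see the substitution described above in isolation, here is a minimal sketch that applies the same pattern to one sample record with the PRXCHANGE function rather than the CALL PRXCHANGE routine used in the posted step. The variable names line and fixed are made up for this illustration and do not come from the original code.

```sas
data _null_;
  length line fixed $200;   /* illustrative names, not from the thread */
  line = "0001 apple6 00679D46CKJL.CN=Help - I need someone.OU=flower.O=some other data.";
  /* -1 = replace every match; each ".NAME=" becomes " NAME=" */
  fixed = prxchange('s/\.([A-Z]+=)/ $1/', -1, line);
  put fixed=;               /* write the rewritten record to the log */
run;
```

The log should show the record with blanks in front of CN=, OU= and O=, which is the form named input expects. The final period stays in place because it is not followed by an uppercase name and "=".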
>
>I found a strange bug in SAS (9.1 on Windows), namely that PRXCHANGE
>truncates the _infile_ variable to 80 characters - even more strangely,
>that can be circumvented by using "substr(_INFILE_,1)" instead. If you
>are reading from an actual file (and not a cards statement) you will
>probably not be affected by this, and can just write
> call prxchange(prxid,-1,_infile_);
>
>Regards,
>Søren
>
>On Thu, 3 Feb 2011 15:31:01 -0500, Ann Mackey <thearchies@LIVE.COM> wrote:
>
>>I've tried many, MANY things, but I'm just not getting it -
>>Neurons just aren't firing too bright today.
>>
>>Here's a sample of the data - all on one line, the third chunk is over 200
>>characters - notice the many types of delimiters, spaces, ., =, '.CN=',
>>etc., and every variable can be a different length:
>>
>>1    6      13
>>0001 apple1 00.25.Monkey@address.com.CN=I'll be a monkeys uncle.OU=Mocking
>>Bird City.O=some other data.U=some other data . . .
>>0001 apple6 00679D46CKJL.CN=Help - I need someone.U=flower.O=some other
>>data.U=some other data . . .
>>
>>I want to get all 0001 records, keeping the type (apple1), and then parse
>>out the first three sections of the last looong variable length field, in
>>this case it would be:
>>
>>Record  Type    Serial                    CN                       OU
>>0001    apple1  00.25.Monkey@address.com  I'll be a monkeys uncle  Mocking Bird City
>>0001    apple6  00679D46CKJL              Help - I need someone    flower
>>
>>I've attempted this with scan, prxparse, indexw, substr, DLM=, etc.
>>
>>Any direction/help is greatly appreciated, and dare I say... eagerly
>>anticipated!!
>>Thanks,
>>Ann