Date: Thu, 30 Sep 2004 20:57:23 +0100
Reply-To: Peter Crawford <Peter.Crawford@BLUEYONDER.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Crawford <Peter.Crawford@BLUEYONDER.CO.UK>
Subject: FW: Infile for csv file where char variable contains inbedded
comma
Content-Type: text/plain; charset="us-ascii"
-----Original Message-----
From: Peter Crawford [mailto:Peter.Crawford@blueyonder.co.uk]
Sent: 30 September 2004 20:48
To: 'Pudding Man'
Subject: RE: Infile for csv file where char variable contains inbedded comma
Hey "Man"
I'm sorry to find I want to criticize your data for not being in valid .CSV
format. That, I consider would require, not this line
"Completed","A123",""123 MAIN TREET, LOT 126 TR"","n/a"
but either
this
"Completed","A123","123 MAIN TREET, LOT 126 TR","n/a"
or this
"Completed","A123","""123 MAIN TREET, LOT 126 TR""","n/a"
These two are the cards I used in this simple data step
55 data uu;
56 infile cards dsd truncover;
57 length var10 $35. var11 $4. var12 $100. var13 $10. ;
58 input var10 - var13 @1 card1 $char100.;
59 put (_all_)(/=);
60 cards;
var10=Completed
var11=A123
var12=123 MAIN TREET, LOT 126 TR
var13=n/a
card1="Completed","A123","123 MAIN TREET, LOT 126 TR","n/a"
var10=Completed
var11=A123
var12="123 MAIN TREET, LOT 126 TR"
var13=n/a
card1="Completed","A123","""123 MAIN TREET, LOT 126 TR""","n/a"
NOTE: The data set WORK.UU has 2 observations and 5 variables.
When I pass your original line in, I get this output in the log
var10=Completed var11=A123 var12=""123 MAIN TREET var13=LOT 126 TR
card1="Completed","A123",""123 MAIN TREET, LOT 126 TR"","n/a" This is
because quotes(") need to be balanced, and when a quote is to be treated as
data it is repeated. By starting with '""' your data cell is interpreted as
starting with '"' but having no embedded delimiters. So the first comma it
finds is expected to be a data delimiter
I really don't know how you expect to parse that data
,""123 MAIN TREET, LOT 126 TR"",
Have you tried it through the Microsoft import routine in excel or access ?
What sort of application created that data?
Peter
-----Original Message-----
From: Pudding Man [mailto:pudding.man@gmail.com]
Sent: 30 September 2004 17:05
To: Peter Crawford
Cc: sas-l@listserv.uga.edu
Subject: Re: Infile for csv file where char variable contains inbedded comma
On Wed, 29 Sep 2004 19:08:31 -0400, Peter Crawford
<peter.crawford@blueyonder.co.uk> wrote:
>
>DSD infile option
>is meant to deal with all this
But does it? Effectively? Perhaps a demo?
Plunking around with W2k V9, I find DSD behavior to
be both a benefit and a hindrance.
I had hoped to just read the entire intended VAR12
field and parse. The combination of contigious dq's
and embedded comma seem to confound DSD.
If I'm not mistaken (always possible), it _should_
be practical to read VAR12 as 2 comma-separated
fields, then concatenate and parse. When I tried
this, reading the second var ate the terminating
comma and subsequent field value. Nicht Gut!
The following is materially ugly, but seems to
work:
data uu(drop = _:);
infile cards dsd column=_p1;
length var10 $35. var11 $4. var12 $100. var13 $10.;
input var10 $ var11 $ @;
_p2 = index(substr(_infile_, _p1 + 2), '""') + _p1 + 3;
_len = _p2 - _p1;
input @_p1 _var12x $varying100. _len +(-2) _dum $ var13 $;
var12 = input(tranwrd(_var12x, '""', '"'), $quote100.);
put _all_;
cards;
"Completed","A123",""123 MAIN TREET, LOT 126 TR"","n/a"
; run;
Skoal,
Puddin'
******************************************************
*** Puddin' Man PuddingDotMan at GmailDotCom ***
******************************************************;
"I'm sitting here wonderin'
would a matchbox hold my clothes.
I ain't got so many matches
but I got so far to go."
from "Matchbox Blues", Blind Lemon Jefferson, maybe 1927
>On Wed, 29 Sep 2004 15:47:28 -0500, Pudding Man <pudding.man@GMAIL.COM>
>wrote:
>>I will studiously avoid all your code, most of your data.
>>
>>This shouldn't be necessary, but ...
>>
>>Consider use of TRANWRD for "" --> ", then $QUOTE to
>>remove remaining dq's. Following might suggest something
>>viable:
>>
>>data _null_;
>> input v12x :&$100.;
>> v12 = input(tranwrd(v12x, '""', '"'), $quote100.);
>> put _all_;
>>cards;
>>""123 MAIN TREET, LOT 126 TR""
>>; run;
>>
>>Hope it hep's ...
>>
>> Puddin'
>>
>>******************************************************
>>*** Puddin' Man PuddingDotMan at GmailDotCom ***
>>******************************************************;
>>
>>"Now, I may -look- like I'm crazy,
>> but po' John do know right from wrong!"
>> -from "Drop Down, Mama", Sleepy John Estes
>>
>>On 23 Sep 04 22:59:48 GMT, chuck_sas@CFEDATA.COM (Chuck Enright)
>>wrote:
>>
>>>In the following data step, variable VAR12 contains an inbedded comma
>>>in
>the
>>>value and the variable is wrapped with 2 sets of double quotes. How
>>>do I modify the infile, informat, or input statement to read the
>>>value with the comma and exclude the quote?
>>>
>>>Desired output
>>>VAR12 = 123 MAIN TREET, LOT 126 TR
>>>
>>>
>>>DATA TEST;
>>> INFILE DATALINES MISSOVER DSD;
>>>
>>> INFORMAT
>>> VAR01 $20.
>>> VAR02 8.
>>> VAR03 MMDDYY8.
>>> VAR04 $20.
>>> VAR05 $8.
>>> VAR06 8.
>>> VAR07 8.
>>> VAR08 8.
>>> VAR09 MMDDYY8.
>>> VAR10 $35.
>>> VAR11 $4.
>>> VAR12 $100.
>>> VAR13 $10.
>>> VAR14 $28.
>>> VAR15 $2.
>>> VAR16 $10.
>>> VAR17 $25.
>>> VAR18 $1.
>>> VAR19 $5.
>>> VAR20 COMMA15.2
>>> VAR21 COMMA15.2
>>> VAR22 $10.
>>> VAR23 COMMA15.2
>>> VAR24 8.
>>> VAR25 $100.
>>> VAR26 $25. ;
>>>
>>> INPUT
>>> VAR01 $ VAR02 VAR03 VAR04 $ VAR05 $ VAR06 VAR07
>>> VAR08 VAR09 VAR10 $ VAR11 $ VAR12 $ VAR13 $ VAR14 $
>>> VAR15 $ VAR16 $ VAR17 $ VAR18 $ VAR19 $ VAR20 VAR21
>>> VAR22 $ VAR23 VAR24 VAR25 $ VAR26 $
>>> ;
>>>datalines;
>>>"ABCDEFGHIJKLM","1234","01/01/04","ABC123456789","Q12345","315.00","0
>>>","10
>","01/23/04","Completed","A123",""123
>>>MAIN TREET, LOT 126
>>>TR"","n/a","DALLAS","TX","11111","COUNTY","Y","O","30000.00","26000.0
>>>0",""
>,"35000.00","16.67","No
>>>Reason","ABC"
>>>;
>>>RUN;
|