|
This solves your first two problems, which are relatively easy. It does not
solve the last two, which don't seem to have the right number of codes in
them, which causes some problems. Not sure quite where I'd go for those,
unless you have a lookup table of valid codes for each spot, in which case
perhaps you could check for those?
data problems;
infile datalines truncover;
retain prx1;
input linedata $500.;
if _n_ = 1 then
prx1 =
prxparse("/(\d*),(.*),(\d*),(.*),(\d*),(.*),(\d*),(.*),(\d*),(.*)/");
rc = prxmatch(prx1,linedata);
array code[5];
array val[5] $100;
if rc then
do __t = 1 to 5;
code[__t] = input(prxposn(prx1,(__t-1)*2+1,linedata),BEST12.);
val[__t] = prxposn(prx1,__t*2,linedata);
end;
datalines;
111111,VANITY TOP OVAL , 31X22",4456,MARBLE
FACTORY,The,30,Plumbing,3002,Finish Plumbing,060,Bathroom Furniture
22222,MIRROR 251/2X21/4X33, NLCM-2433,9250,FOREMOST ,GROUPS
INC,30,Plumbing,3002,Finish Plumbing,060,Bathroom Furniture
3333,ACID ,BRUSH 1/2" 3/PK,1120,OATEY CANADA SUPPLY CHAIN
SERV.CO.,30,Plumbing,3004,Rough
Plumbing
444,FLARED MALE AD,. 1/4X1/4 48-4B,4056,FAIRVIEW FITTINGS &
MFG,LTD,30,,3004,Rough Plumbing
;;;;
run;
-Joe
On Fri, Sep 24, 2010 at 3:01 PM, Ming Chen <chenming@gmail.com> wrote:
> Hi All,
>
> I am trying to import a client csv format data file with comma within some
> columns. Since the client won't fix their crap and the boss thinks we can
> fix it by at least manually removing the comma in the txt field.
>
> By checking the data, I am sure that regular expression can come to rescue
> me since there are some obvious patterns in the data. However, I am just
> know a little bit the regular expression stuff and I really need some help
> here.
>
>
> Here are some sample data:
> columns: code1, code1_desc, code2, code2_desc, code3,code3_desc,
> code4,code4_desc,code5,code5_desc
>
> 111111,VANITY TOP OVAL , 31X22",4456,MARBLE FACTORY,
> The,30,Plumbing,3002,Finish Plumbing,060,Bathroom Furniture
> 22222,MIRROR 251/2X21/4X33, NLCM-2433,9250,FOREMOST ,GROUPS
> INC,30,Plumbing,3002,Finish Plumbing,060,Bathroom Furniture
> 3333,ACID ,BRUSH 1/2" 3/PK,1120,OATEY CANADA SUPPLY CHAIN
> SERV.CO.,30,Plumbing,3004,Rough
> Plumbing
> 444,FLARED MALE AD,. 1/4X1/4 48-4B,4056,FAIRVIEW FITTINGS & MFG,
> LTD,30,,3004,Rough Plumbing
>
> I am sure that I can fix the csv file just by removing the comma in the 5
> code_desc columns and there are only number with comma before and after the
> desc part.
>
> Thanks
>
> Ming
>
|