Date: Wed, 3 Feb 2010 05:23:17 -0800
Reply-To: ChrisG <chris.godlewski@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: ChrisG <chris.godlewski@GMAIL.COM>
Organization: http://groups.google.com
Subject: Re: data cleaning
Content-Type: text/plain; charset=UTF-8
On 3 fév, 11:20, ChrisG <chris.godlew...@gmail.com> wrote:
> On 3 fév, 01:24, art...@NETSCAPE.NET (Arthur Tabachneck) wrote:
>
>
>
> > Chris,
>
> > I'm not sure what you are asking. Â In my example, the columns didn't have
> > any variable names present in the txt file.
>
> > Art
> > -------
> > On Tue, 2 Feb 2010 14:22:31 -0800, ChrisG <chris.godlew...@GMAIL.COM>
> > wrote:
>
> > >On 2 f鶬 22:58, art...@NETSCAPE.NET (Arthur Tabachneck) wrote:
> > >> Chris,
>
> > >> If you're reading in a txt file (e.g., c:\have.txt), then I think
> > >> something like the following will get you what you want:
>
> > >> data want (drop=first_stuff);
> > >> Â infile "c:\have.txt" truncover;
> > >> Â retain id;
> > >> Â informat first_stuff $25.;
> > >> Â format date mmddyy10.;
> > >> Â input first_stuff & type role id2 league percentage;
> > >> Â date=inputn(first_stuff,'mmddyy',10);
> > >> Â if missing(date) then id=first_stuff;
> > >> Â else do;
> > >> Â Â output;
> > >> Â end;
> > >> run;
>
> > >> HTH,
> > >> Art
> > >> ---------
> > >> On Tue, 2 Feb 2010 08:14:42 -0800, ChrisG <chris.godlew...@GMAIL.COM>
> > >> wrote:
>
> > >> >Hi
>
> > >> >I have this kind of data :
>
> > >> >1    LN469113 Corp  .    .    .    .    .
> > >> >3 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 4 Â Â Â 171906 Â 0 Â Â Â 0
> > >> >4 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 15 Â Â Â 171906 Â 0 Â Â Â 0
> > >> >5 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 150 Â Â 171906 Â 0 Â Â Â 0
> > >> >6 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 220 Â Â 171906 Â 0 Â Â Â 0
> > >> >7 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 364908 Â 0 Â Â Â 0
> > >> >8 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 909795 Â 0 Â Â Â 0
> > >> >1    LN469121 Corp  .    .    .    .    .
> > >> >3 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 4 Â Â Â 171906 Â 0 Â Â Â 0
> > >> >4 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 15 Â Â Â 171906 Â 0 Â Â Â 0
> > >> >5 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 150 Â Â 171906 Â 0 Â Â Â 0
> > >> >6 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 220 Â Â 171906 Â 0 Â Â Â 0
> > >> >7 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 364908 Â 0 Â Â Â 0
> > >> >8 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 909795 Â 0 Â Â Â 0
> > >> >1    LN469125 Corp  .    .    .    .    .
> > >> >3 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 4 Â Â Â 171906 Â 0 Â Â Â 0
> > >> >4 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 15 Â Â Â 171906 Â 0 Â Â Â 0
> > >> >5 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 150 Â Â 171906 Â 0 Â Â Â 0
> > >> >6 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 220 Â Â 171906 Â 0 Â Â Â 0
> > >> >7 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 364908 Â 0 Â Â Â 0
> > >> >8 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 909795 Â 0 Â Â Â 0
>
> > >> >And i need it that way (i added the name of variables for
> > >> >convenience) :
>
> > >> >id            date          type   role   id2
> > >> league  percentage
> > >> >LN469113 Corp  11/25/2005    6    4    171906  0    0
> > >> >LN469113 Corp  11/25/2005    6    15    171906  0    0
> > >> >LN469113 Corp  11/25/2005       6     150   171906
> > 0 Â Â Â 0
> > >> >LN469113 Corp  11/25/2005    6    220   171906  0    0
> > >> >LN469113 Corp  11/25/2005    6    251   364908  0    0
> > >> >LN469113 Corp  11/25/2005       6     251   909795
> > 0 Â Â Â 0
> > >> >LN469121 Corp  11/25/2005    6    4    171906  0    0
> > >> >LN469121 Corp  11/25/2005       6     15    171906
> > 0 Â Â Â 0
> > >> >LN469121 Corp  11/25/2005    6    150   171906  0    0
> > >> >LN469121 Corp  11/25/2005    6    220   171906  0    0
> > >> >LN469121 Corp  11/25/2005    6    251   364908  0    0
> > >> >LN469121 Corp  11/25/2005    6    251   909795  0    0
> > >> >LN469125 Corp  11/25/2005       6     4    171906
> > 0 Â Â Â 0
> > >> >LN469125 Corp  11/25/2005       6     15    171906
> > 0 Â Â Â 0
> > >> >LN469125 Corp  11/25/2005    6    150   171906  0    0
> > >> >LN469125 Corp  11/25/2005       6     220   171906
> > 0 Â Â Â 0
> > >> >LN469125 Corp  11/25/2005       6     251   364908
> > 0 Â Â Â 0
> > >> >LN469125 Corp  11/25/2005       6     251   909795
> > 0 Â Â Â 0
>
> > >> >I definitely can't make SAS copy paste the content of a cell in
> > >> >another colmun and duplicate it ...
> > >> >And i am not a proc iml - user at all by the way
>
> > >> >Hopefully someone around here had a similar problem...
>
> > >> >Thanks in advance
> > >> >Cheers
> > >> >CG
>
> > >Dear Art
>
> > >just one hint please
> > >in the raw dataset in .txt
>
> > >1    LN469113 Corp  .    .    .    .    .
> > >3 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 4 Â Â Â 171906 Â 0 Â Â Â 0
> > >4 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 15 Â Â Â 171906 Â 0 Â Â Â 0
> > >5 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 150 Â Â 171906 Â 0 Â Â Â 0
> > >6 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 220 Â Â 171906 Â 0 Â Â Â 0
> > >7 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 364908 Â 0 Â Â Â 0
> > >8 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 909795 Â 0 Â Â Â 0
> > >1    LN469121 Corp  .    .    .    .    .
> > >3 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 4 Â Â Â 171906 Â 0 Â Â Â 0
> > >4 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 15 Â Â Â 171906 Â 0 Â Â Â 0
> > >5 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 150 Â Â 171906 Â 0 Â Â Â 0
> > >6 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 220 Â Â 171906 Â 0 Â Â Â 0
> > >7 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 364908 Â 0 Â Â Â 0
> > >8 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 909795 Â 0 Â Â Â 0
> > >1    LN469125 Corp  .    .    .    .    .
> > >3 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 4 Â Â Â 171906 Â 0 Â Â Â 0
> > >4 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 15 Â Â Â 171906 Â 0 Â Â Â 0
> > >5 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 150 Â Â 171906 Â 0 Â Â Â 0
> > >6 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 220 Â Â 171906 Â 0 Â Â Â 0
> > >7 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 364908 Â 0 Â Â Â 0
> > >8 Â Â Â 11/25/2005 Â Â Â 6 Â Â Â 251 Â Â 909795 Â 0 Â Â Â 0
>
> > >should the columns have some names ?
> > >if yes any or some specific that fits what i need in the sas output
> > >dataset at the end ?
>
> > >thanx
> > >CG
>
> ok sorry for my ignorance
> i have understood at least partially what was wrong
> i deleted the first column from the raw txt data with this kind of
> weird "id"
> hence now the first column is with the BL142536 and stuff and the
> dates (as in the 2nd column from the sample above)
> and the code is working at leaast i get some dataset at the end but i
> only get this :
> (extraction)
>
> id    date   type   role   id2   league  percentage
> BL478027 Â Â Â Â 09/11/2008
> BL478031 Â Â Â Â 09/11/2008
> BL478035 Â Â Â Â 09/11/2008
> BL478356 Â Â Â Â 02/18/2005
> BL478356 Â Â Â Â 02/18/2005
> BL478356 Â Â Â Â 02/18/2005
> BL478396 Â Â Â Â 12/01/2004
> BL478396 Â Â Â Â 12/01/2004
> BL478396 Â Â Â Â 12/01/2004
> BL478396 Â Â Â Â 12/01/2004
> BL478396 Â Â Â Â 12/01/2004
> BL478396 Â Â Â Â 12/01/2004
>
> this is already a huge step forward but why i don't get the data for
> the rest of the variables : type until percentage ???
>
> thanks in advance !
> best
> CG
just in case i put here the code from Art that finally works
perfectly !
data want (drop=first_stuff);
infile "C:\Users\Standard\Desktop\bb.txt" truncover delimiter='09'x;
retain loanid;
informat first_stuff $25. ;
format date mmddyy10. type $2. role $3. id2 $10. league $25.
percentage $5. ;
input first_stuff /* delete->date*/ type role id2 league
percentage;
date=inputn(first_stuff,'mmddyy',10);
if missing(date) then loanid=first_stuff;
else do;
output ;
end;
run;
many thanks to Art !
and thanks to everybody that replied to my query here
Best
CG
|