I have a medium-to-large dataset with 13m obs and 4 vars, 2 of them
numeric ids: one unique id and one batch id. Size is about 700mb.
Unfortunately it's not sorted correctly, so the transpose I need to run on
it is failing. I'm sorting it now and hoping it won't take too long on
this toy pc (1GB ram, WinXP)... there you go, 25 mins. Would a hash sort
have made a noticeable difference? I figure that in the time it would take
me to get one working I may as well sort it using proc sort, but:
1) would the hash fit in memory? (The two text vars are a 3-char one, the
transpose id variable, and one of about 600 chars, the transpose var; the
BY variable is the batch id.)
2) what performance gains might I expect? Of course I realise my
mileage may vary, but isn't the majority of the work I/O?
3) I'm guessing there's no such thing as a hash transpose :)
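For concreteness, the kind of hash sort I have in mind is roughly the
following. It's an untested sketch: the dataset names (work.have,
work.sorted) and the variable names (batch_id, unique_id) are placeholders,
and it can only work if every row fits in memory at once, which is exactly
what question 1 is asking.

```sas
data _null_;
  /* Placeholder names throughout: work.have, work.sorted, batch_id, unique_id */
  if 0 then set work.have;                 /* define host variables without reading rows */
  declare hash h(dataset: 'work.have', ordered: 'a');  /* load the table, ascending key order */
  rc = h.defineKey('batch_id', 'unique_id');  /* unique_id keeps each key distinct */
  rc = h.defineData(all: 'y');                /* carry all four variables as data */
  rc = h.defineDone();
  rc = h.output(dataset: 'work.sorted');      /* write rows out in key order */
  stop;
run;
```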
Any comments much appreciated,