|
You could try storing the data as character and make a view that converts
them to numbers.
data mydata_real ;
input (flag1-flag5) ($1.) ;
cards;
10101
run;
proc sql noprint ;
create view mydata_view as select
input(flag1,1.) as flag1
,input(flag2,1.) as flag2
,input(flag3,1.) as flag3
,input(flag4,1.) as flag4
,input(flag5,1.) as flag5
from mydata_real
;
quit;
On Thu, 19 Apr 2012 18:02:12 -0400, Bolotin Yevgeniy
<YBolotin@SCHOOLS.NYC.GOV> wrote:
>This is a semi-theoretical question, so don't spend too much time on it
>- i can just keep using numerics, if no better solution is available
>
>
>
>
>
>I have a dataset with a large number of 0/1 variables
>
>
>
>If variables are stored as numeric, they take up 3 bytes
>
>If variables are stored as character, they take up 1 byte
>
>
>
>However, i do various aggregate functions over subsets of these
>variables, which take about half a second each for numerics and 5
>seconds each for characters
>
>(using proc sql, 5 second results for both input(var, 1.) and case
>when... . proc sql does not implicitly convert char<->num)
>
>Aggregates = sum, average, count, etc
>
>
>
>The data is not indexed, so it's not just an index lookup vs table scan
>
>
>
>
>
>Is there any way for me to get this to work fast but still get the space
>savings of character variables?
>
>
>
>
>
>
>
>
>
>Yevgeniy (Eugene) Bolotin
>Report Developer, Progress Reports
>NYC Department of Education
>
>(212) 374-7949
>
>ybolotin@schools.nyc.gov <blocked::mailto:mmadow@schools.nyc.gov>
>
>
|