|
At 12:43 am 2/11/2005 -0500, Haigang Zhou wrote:
>I have some stock prices as shown below. I want to count the number of
>occurrences of "-9" for each stock (stk1 - stk5), and delete a stock if the
>total number of occurrences of "-9" exceeds 15% of the total observations.
>For example, stk1 has 5 occurrences of "-9", and therefore, I want to delete
>the entire column from the dataset.
You've made this problem more difficult than it needs to be by having a bad
data structure. Information about values (in this case, the name of the
stock) is stored in column names, when it should be stored in a
variable. If you rearrange your data, the problem becomes much easier.
You could normalize your data set in a data step:
===== untested code
data normal;
retain date;
length stock $3.;
set one;
array stocks acx--btr;
do over stocks;
call vname(stocks, stock);
price = stocks;
output;
end;
keep date stock price;
run;;
=====
Then it's trivial to find the stocks you want to delete using PROC SQL:
===== untested code
proc sql noprint;
select stock
into :drop separated by ' '
from normal
group by stock
having sum(price=-9) / count(*) >= .15
;
quit;
=====
Then you can use the drop macro variable to drop the unwanted columns from
your data set:
===== untested code
data two;
set one;
drop &DROP.;
run;;
=====
There are past postings to SAS-L which contain similar code. Harry
Droogendyk mentioned back in 2003 that CALL VNAME is expensive, but given
your data structure you might be struck with expensive. data normal above
could be a view, which might speeds things up a bit - or not, maybe SAS
would have to create a spill file and that might be more expensive than
using a regular data set. Perhaps someone could benchmark this.
>Thank you very much!
>
>
>
>DATA ONE;
> INPUT date acx dyk ojm wuq btr;
> DATALINES;
> 1 16.15 19.63 19.55 15.86 12.36
> 2 10.74 -9.00 13.01 11.62 17.77
> 3 18.10 11.51 10.95 10.09 15.88
> 4 17.00 18.80 11.19 15.87 15.56
> 5 12.38 -9.00 16.14 10.73 12.85
> 6 -9.00 14.16 -9.00 11.22 16.59
> 7 12.60 13.54 12.85 -9.00 11.90
> 8 -9.00 13.17 10.52 14.97 13.55
> 9 12.75 13.18 14.41 18.20 14.13
> 10 -9.00 17.35 11.92 18.61 10.02
> 11 18.49 -9.00 -9.00 16.01 16.41
> 12 18.40 12.07 19.66 13.66 18.61
> 13 -9.00 12.39 15.79 18.00 10.07
> 14 -9.00 19.72 12.83 19.28 15.46
> 15 16.77 19.11 -9.00 11.47 10.56
> 16 10.03 -9.00 10.08 15.21 12.65
> 17 16.18 10.04 14.14 12.86 14.45
> 18 19.01 14.67 12.67 19.76 12.06
> 19 18.86 19.03 16.91 12.99 15.85
> 20 18.02 12.06 14.99 16.02 13.70
>;
>RUN;
-----
Jack Hamilton
jfh@alumni.stanford.org
Sacramento, California
|