Date: Thu, 10 Feb 2005 23:31:52 -0800
From: Jack Hamilton
Subject: Re: Computing Percentages within Subtotals
To: SAS-L "SAS(r) Discussion"
In-Reply-To: Haigang Zhou <200502110543.j1B5hAxY013039@listserv.cc.uga.edu>

At 12:43 am 2/11/2005 -0500, Haigang Zhou wrote:

>I have some stock prices as shown below. I want to count the number of
>occurrences of "-9" for each stock (stk1 - stk5), and delete a stock if the
>total number of occurrences of "-9" exceeds 15% of the total observations.
>For example, stk1 has 5 occurrences of "-9", and therefore, I want to delete
>the entire column from the dataset.

You've made this problem more difficult than it needs to be by having a bad data structure. Information about values (in this case, the name of the stock) is stored in column names, when it should be stored in a variable. If you rearrange your data, the problem becomes much easier.
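For illustration, in normalized ("long") form each observation holds one date/stock/price combination, so the first two input rows become ten observations, something like:

    date    stock    price
       1    acx      16.15
       1    dyk      19.63
       1    ojm      19.55
       1    wuq      15.86
       1    btr      12.36
       2    acx      10.74
       2    dyk      -9.00
       ...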

You could normalize your data set in a data step:

===== untested code
data normal;
    retain date;
    length stock $ 3;
    set one;
    array stocks acx--btr;        /* implicit array over the stock columns */
    do over stocks;
        call vname(stocks, stock);  /* column name -> STOCK variable */
        price = stocks;
        output;
    end;
    keep date stock price;
run;
=====

Then it's trivial to find the stocks you want to delete using PROC SQL:

===== untested code
proc sql noprint;
    select stock
        into :drop separated by ' '
        from normal
        group by stock
        having sum(price = -9) / count(*) >= .15;
quit;
=====
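Checked by hand against the sample data (20 observations per stock), the -9 counts work out to:

    acx: 5/20 = 25%
    dyk: 4/20 = 20%
    ojm: 3/20 = 15%
    wuq: 1/20 =  5%
    btr: 0/20 =  0%

Note that OJM sits exactly on the boundary: HAVING ... >= .15 drops it, while a strict > (matching "exceeds 15%" literally) keeps it. Use whichever comparison matches your intent.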

Then you can use the drop macro variable to drop the unwanted columns from your data set:

===== untested code
data two;
    set one;
    drop &DROP.;
run;
=====
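One caveat, not covered above: if no stock crosses the threshold, SELECT INTO never creates the macro variable and &DROP. won't resolve. A sketch of a guard, assuming SAS 9's %SYMEXIST is available (untested, same caveat as the code above):

===== untested code
%macro drop_bad;
    %if %symexist(drop) %then %do;
        /* at least one stock crossed the threshold */
        data two;
            set one;
            drop &drop.;
        run;
    %end;
    %else %do;
        /* nothing to drop - just copy the data set */
        data two;
            set one;
        run;
    %end;
%mend drop_bad;
%drop_bad
=====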

There are past postings to SAS-L which contain similar code. Harry Droogendyk mentioned back in 2003 that CALL VNAME is expensive, but given your data structure you may be stuck with expensive. The NORMAL data set above could instead be a view, which might speed things up a bit - or not: SAS might have to create a spill file, and that could cost more than using a regular data set. Perhaps someone could benchmark this.
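The view version is a one-keyword change to the normalization step (untested sketch, same caveats as above):

===== untested code
data normal / view=normal;   /* defer execution until NORMAL is read */
    retain date;
    length stock $ 3;
    set one;
    array stocks acx--btr;
    do over stocks;
        call vname(stocks, stock);
        price = stocks;
        output;
    end;
    keep date stock price;
run;
=====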

>Thank you very much!
>
>DATA ONE;
>   INPUT date acx dyk ojm wuq btr;
>   DATALINES;
> 1 16.15 19.63 19.55 15.86 12.36
> 2 10.74 -9.00 13.01 11.62 17.77
> 3 18.10 11.51 10.95 10.09 15.88
> 4 17.00 18.80 11.19 15.87 15.56
> 5 12.38 -9.00 16.14 10.73 12.85
> 6 -9.00 14.16 -9.00 11.22 16.59
> 7 12.60 13.54 12.85 -9.00 11.90
> 8 -9.00 13.17 10.52 14.97 13.55
> 9 12.75 13.18 14.41 18.20 14.13
> 10 -9.00 17.35 11.92 18.61 10.02
> 11 18.49 -9.00 -9.00 16.01 16.41
> 12 18.40 12.07 19.66 13.66 18.61
> 13 -9.00 12.39 15.79 18.00 10.07
> 14 -9.00 19.72 12.83 19.28 15.46
> 15 16.77 19.11 -9.00 11.47 10.56
> 16 10.03 -9.00 10.08 15.21 12.65
> 17 16.18 10.04 14.14 12.86 14.45
> 18 19.01 14.67 12.67 19.76 12.06
> 19 18.86 19.03 16.91 12.99 15.85
> 20 18.02 12.06 14.99 16.02 13.70
>;
>RUN;

----- Jack Hamilton jfh@alumni.stanford.org Sacramento, California
