|
Benjamen,
Perhaps you could work with a small data set, such as the one generated by
data dailydata ;
   do fiveminutes = 1 to 3 ;
      do obs = 1 to 10 ;
         field1 = ceil ( 10 * ranuni (4936) ) ;
         output ;
      end ;
   end ;
run ;
Then you could eyeball the data your code produces and wonder why you
thought there should be a relationship between the graphs.
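A minimal sketch of that check follows. It assumes the small dailydata set generated above and uses simplified stand-ins for the two methods, not the exact statements from the original post. The data set names by_value, by_interval, stage2_value and stage2_interval are made up for illustration.

```sas
/* Sketch only: simplified stand-ins for Method 1 and Method 2.     */
/* All output data set names here are hypothetical.                 */

/* Method 1 style: PROC FREQ writes one row per Field1 value within */
/* each FiveMinutes group; COUNT is the frequency of that value.    */
proc freq data=dailydata noprint ;
   by fiveminutes ;
   tables field1 / out=by_value ;
run ;

/* Method 2 style: PROC MEANS with no CLASS statement writes one    */
/* row per FiveMinutes group; COUNT is the number of nonmissing     */
/* Field1 values in the whole group (10 for the test data).         */
proc means data=dailydata noprint ;
   by fiveminutes ;
   var field1 ;
   output out=by_interval n=count ;
run ;

/* Second stage, applied to each table in turn. */
proc means data=by_value noprint ;
   by fiveminutes ;
   var count ;
   output out=stage2_value mean=daymean std=daystddev ;
run ;

proc means data=by_interval noprint ;
   by fiveminutes ;
   var count ;
   output out=stage2_interval mean=daymean std=daystddev ;
run ;

title 'Counts per Field1 value within each interval' ;
proc print data=by_value ; run ;
title 'Counts per interval' ;
proc print data=by_interval ; run ;
title 'Second stage from per-value counts' ;
proc print data=stage2_value ; run ;
title 'Second stage from per-interval counts' ;
proc print data=stage2_interval ; run ;
title ;
```

In this sketch the per-value table holds several small counts per interval that add up to 10, while the per-interval table holds a single row per interval with a count of 10. The second stage therefore averages small numbers in the first case, and in the second case it returns the one total as the mean with a missing standard deviation.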
One of the advantages of SAS as a programming language is that it lets you
get close to the data easily or to make up data to test your ideas of
analysis. When I first compared SAS to FORTRAN many years ago it was the
immediate feedback of information that impressed me the most.
IanWhitlock@westat.com
-----Original Message-----
From: Witness [mailto:bmeyer67@CALVIN.EDU]
Sent: Monday, July 08, 2002 5:12 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: PROC FREQ vs. PROC MEANS
Ok, so I'm still working on the 5 minute analysis I asked about last
week or so.
I have the entire script done, and am getting ready to graph it. And I
can get two different graphs, and I'm not sure why.
Here are the code differences:
Method 1:
PROC FREQ DATA=DailyData;
BY FiveMinutes;
TABLES Field1*FiveMinutes / NOCOL NOROW OUT=Counted_Month;
RUN;
...
Method 2:
PROC MEANS DATA=DailyData;
BY FiveMinutes;
ID Field1 FiveMinutes;
OUTPUT OUT=Counted_Month
N(FiveMinutes) = COUNT;
RUN;
After this, I sort the data again, and then use another PROC MEANS to
get the mean and standard deviation of the COUNT column, like so:
PROC MEANS DATA=Counted_Month;
BY FiveMinutes;
ID Field1 FiveMinutes COUNT;
OUTPUT OUT=Temp_Counted
MEAN(COUNT) = DayMean
STDDEV(COUNT) = DayStdDev;
RUN;
So the only difference between the two runs, as far as getting the data
goes, is which of the two methods above is used.
The graph is generated using the PLOT statement in PROC REG. When I
output the data, I get the following difference:
Method 1:
Generates a graph on a scale of 1 to 10, and is pretty much a straight
line, though a few points are anomalous.
Method 2:
Generates a graph on a scale of 1 to 3000, and the points vary within a
range of 100 or 200, with about an eighth (possibly more) being anomalous.
Why am I getting this difference? (Note: I had to break what is done
in Methods 1 & 2 out of the second PROC MEANS because SAS didn't like me
using MEAN(COUNT) with N(FiveMinutes) = COUNT right before it, or even
MEAN(N(FiveMinutes)); this also applied to STDDEV().)
I am really befuddled by the difference (especially since it is so
dramatic). I am leaning towards Method 2 as providing the correct graph
based on what other people have done where I work.
Thanks,
Benjamen R. Meyer
|