|
Dale McLerran <stringplayer_2@yahoo.com> replied:
> I still must disagree. If you were to allow a burn-in of, say,
> 50 iterations before you start outputting u or e values, then
> your approach would generate data which follow an AR(1) process.
> Note that your code produces
>
> u0 ~ N(0,1),
>
> /* First iteration */
> u1=rho*u0 + err
> err~N(0,1)
> u1~N(0,1+rho**2)
> output u1
> assign u0=u1
>
> /* Second iteration */
> u1=rho*u0 + err
> err~N(0,1)
> u1~N(0,1+rho**2+rho**4)
> output u1
> assign u0=u1
>
> /* Third iteration */
> u1=rho*u0 + err
> err~N(0,1)
> u1~N(0,1+rho**2+rho**4+rho**6)
> output u1
> assign u0=u1
>
> ...
>
> /* K-th iteration */
> u1=rho*u0 + err
> err~N(0,1)
> u1~N(0,1+rho**2+rho**4+rho**6+...+rho**(2K))
> output u1
> assign u0=u1
>
>
> The difference between variance of u1 at iteration m and variance
> of u1 at iteration m+1 is
>
> V(u1{m+1}) - V(u1{m}) = rho**(2*(m+1))
>
>
> For m sufficiently large, the difference in variances does become
> negligible. However, for the first two iterations (m=1), the
> difference in variance is rho**4. That is not small unless
> rho is near zero. The following code shows the variance function
> by iteration for rho=0.7.
>
> data var;
> rho=0.7;
> V=1;
> do iter=1 to 50;
> V = 1 + (rho**2)*V;
> output;
> end;
> run;
>
> proc print data=var;
> by rho;
> id rho;
> var iter V;
> format V 10.8;
> run;
>
>
> The variance approaches a stable value by about the 5th
> iteration. However, for iterations 1 through 3, the variance
> is quite far from the stable value.
>
> Now, to demonstrate the validity of your AR(1) generation process
> you use the code
>
> data test;
> seed=1234579; u0 = rannor(seed);
> do iter=1 to 100000;
> /* u{t} = rho*u{t-1} + err */
> u1 = 0.7*u0 + rannor(seed);
> output;
> u0=u1;
> end;
> run;
> title "Covariance matrix: errors constructed as u{t}=rho*u{t-1} +
> err";
>
> proc corr data=test cov;
> var u:;
> run;
>
>
> This is a very long process, and certainly the variance is stable
> for most of the 100000 iterations. If you were using a long
> sequence like this, then I would have little concern with assuming
> that ALL the data follow the same AR(1) process. However, you have
> relatively short processes (20 or 40 iterations). The first few
> terms of a short process could have strong effect on your parameter
> estimates.
>
> Note that the simulation which I previously posted shows the
> variance structure generated by your data generation process
> through five iterations. That is, the generation model
>
> u0 = rannor(seed);
> u1 = 0.7*u0 + rannor(seed);
> u2 = 0.7*u1 + rannor(seed);
> u3 = 0.7*u2 + rannor(seed);
> u4 = 0.7*u3 + rannor(seed);
> u5 = 0.7*u4 + rannor(seed);
>
> is exactly the same as
>
> u0~N(0,1)
>
> /* First iteration */
> u1=rho*u0 + err
> err~N(0,1)
> u1~N(0,1+rho**2)
> output u1
> assign u0=u1
>
> /* Second iteration */
> u1=rho*u0 + err
> err~N(0,1)
> u1~N(0,1+rho**2+rho**4)
> output u1
> assign u0=u1
>
> ...
>
> /* Fifth iteration */
> u1=rho*u0 + err
> err~N(0,1)
> u1~N(0,1+rho**2+rho**4+rho**6+rho**8+rho**10)
> output u1
> assign u0=u1
>
>
> It most certainly does demonstrate that for the first few
> iterations, your process is NOT AR(1). But the alternative
> process in which we have u{i}=rho*u{i-1} + sqrt(1-rho*rho)*err
> DOES FOLLOW THE AR(1) PROCESS FROM ITERATION 1.
>
> IF YOU WANT YOUR DATA GENERATION MODEL TO BE AR(1) FOR ALL
> OBSERVATIONS, THEN YOU NEED TO MODIFY YOUR CODE. As I stated
> previously, one way to modify your code so that all the data
> follow an AR(1) process is to allow a burn-in period of some M
> iterations before you start accepting data as following the
> AR(1) process. Better yet is to use
>
> u0 = err{0};
> do iter=1 to n;
> u1 = rho*u0 + (1-rho*rho)*err{iter};
> output;
> u0=u1;
> end;
>
> where err{j}~N(0,V). This follows the AR(1) process from the
> very first iteration.
I'm going to side with Dale on this one. Mostly. :-)
I agree with Dale that if you want the generated stream to be correct
from the first iteration, you need to do this:
u0 = rannor(seed);
do i=1 to k;
u1 = rho*u0 + sqrt(1-rho*rho)*rannor(seed);
output;
u0 = u1;
end;
This is the formulation that I learned in intro time series courses,
back when dinosaurs still ruled the earth. Shiling is writing the
time series in a traditional manner, using the duality of AR processes
as infinite-order MA processes. But that's not particularly efficient
for process generation. It *is* really useful once you start using
the backshift operator and you want to do a little theory. I think
that Dale's formulation will make the generation process a lot easier
once Shiling moves on up to more complex time series models, with
higher-order ARIMA(p,d,q) models.
But the idea of 'burn-in' is not new. It's a fundamental part of
some statistical tools. The Gibbs sampler comes to mind, for one.
So Shiling could make this work. I would recommend a burn-in period
of at least M=10 for this case, although the computer cost is so
small that there's really no point in not using a burn-in period
of, say, M=1000. If Shiling builds a much more complex time series
model, he may want to use an M of 100 or so, just to address any
long-term hysteresis of his models.
So here's my version of Shiling's code:
u0 = rannor(seed);
M = 1000; /* just pick something big enough */
do i=1 to k+M;
u1 = rho*u0 + rannor(seed);
if i > M then output;
u0 = u1;
end;
With this sort of minor mod, Shiling's code will do just what he
wants.
Just sticking my nose in,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
|