| Date: | Wed, 6 Dec 2006 22:33:54 -0800 |
| Reply-To: | David L Cassell <davidlcassell@MSN.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | David L Cassell <davidlcassell@MSN.COM> |
| Subject: | Re: help to choose stat analysis |
| In-Reply-To: | <1165412747.370276.253860@73g2000cwn.googlegroups.com> |
| Content-Type: | text/plain; format=flowed |
|---|
tpejovich@GOOGLEMAIL.COM wrote:
>
>Hi,
>
>I have the following problem.
>I have weather data (Wind speed, direction, min temp, rain and gust)
>and weather delays at the airport. When there were no weather related
>delays values are set to 0.
>
>I've found that a correlation between the weather parameters and delays
>exists, but it is very low.
>Now I want to find (if possible) the thresholds of each weather
>element at which they really start to matter, e.g. where delays start
>to increase dramatically.
>
>Data is not normally distributed and not linear.
>
>Some people so far suggested ordered probit or loglinear analysis if I
>divide parameters or delays into groups.
>
>Any other suggestions? I've never done anything with non-parametric
>data so I really could use all the help I can get!
>
>Thanks in advance!
>Tash

Unfortunately, your data are not normally distributed and do not
look like a continuous variable, in part because of bad database
management decisions.

You have a large number of zeroes. These are not real. You (or
some people working with you) have chosen to stick fake values in
for the weather data whenever the delay is zero. This means that all
those values are actually useless. If you cannot find out the real values
for your regressors when the delay is zero, then you need to
throw out ALL the delay=0 data and simply model positive DELAY as
a function of your regressors.

As others have noted, you cannot get adequate models without
accounting for effects that are non-local. If I were modeling
flight delays at PDX (Portland, Oregon, USA) I would insist on having
additional data on weather issues for all the major airline hubs which
feed into Portland (Chicago, Denver, Minneapolis, Dallas, San Francisco,
...) because most of the time at PDX, flight delays are caused by
*inbound* delays. Unless you have a variable which separates
inbound causes of delays from local (outbound) causes, you cannot
separate these impacts.

If I were you, I would complain to the people who provided the data,
and see if there are ways of repairing the flaws that are going to
ruin your modeling efforts.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330