LISTSERV at the University of Georgia
Date:   Wed, 6 Dec 2006 22:33:54 -0800
Reply-To:   David L Cassell <davidlcassell@MSN.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   David L Cassell <davidlcassell@MSN.COM>
Subject:   Re: help to choose stat analysis
In-Reply-To:   <1165412747.370276.253860@73g2000cwn.googlegroups.com>
Content-Type:   text/plain; format=flowed

tpejovich@GOOGLEMAIL.COM wrote:
>
>Hi,
>
>I have a following problem.
>I have weather data (Wind speed, direction, min temp, rain and gust)
>and weather delays at the airport. When there were no weather related
>delays values are set to 0.
>
>I've did find that correlation between weather parameters and delays
>exist and that is very low.
>Now I want to find somehow (if possible) tresholds of each weather
>element , when they realy start to matter eg. delays dramaticly istart
>to increase.
>
>Data is not normaly distributed and not linear.
>
>Some people so far suggested ordered probit or loglinear analysis if I
>divide parameters or delays into groups.
>
>Any other suggestions? I've never done anything with non-parametric
>data so I really could use all the help I can get!
>
>Thanks in advance!
>Tash

Unfortunately, your data are not normally distributed and do not look like a continuous variable, in part because of bad database management decisions.

You have a large number of zeroes. These are not real. You (or some people working with you) have chosen to stick fake values in for the data when the delays are zero. This means that all those values are actually useless. If you cannot find out the real values for your regressors when the delays are zero, then you need to throw out ALL the delay=0 data and simply model positive DELAY as a function of your regressors.

As others have noted, you cannot get adequate models without accounting for effects that are non-local. If I were modeling flight delays at PDX (Portland, Oregon, USA) I would insist on having additional data on weather issues for all the major airline hubs which feed into Portland (Chicago, Denver, Minneapolis, Dallas, San Francisco, ...) because most of the time at PDX, flight delays are caused by *inbound* delays. Unless you have a variable which separates inbound causes of delays from local (outbound) causes, you cannot separate these impacts.

If I were you, I would complain to the people who provided the data, and see if there are ways of repairing the flaws that are going to ruin your modeling efforts.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
