LISTSERV at the University of Georgia
Date:   Sat, 19 Dec 2009 11:28:08 -0600
Reply-To:   Satindra Chakravorty <satindra@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Satindra Chakravorty <satindra@GMAIL.COM>
Subject:   Re: OT: World Cup Soccer Model
Comments:   To: sudip chatterjee <sudip.memphis@gmail.com>
In-Reply-To:   <1ca751e30912190816s3e9f7364h832727daad252d99@mail.gmail.com>
Content-Type:   text/plain; charset=ISO-8859-1

This is an interesting topic. However, your email raises several questions.

1. The formulation of your model suggests that the predictor variables are based on information known only AFTER the dependent variable data (i.e. the outcome of the World Cup) is known. If that is the case, I am not sure how you would predict the winner of the 2010 tournament.

2. I am assuming the unit of observation in your modeling sample is a participating country; hence your N of 32, since there are currently 32 nations in the World Cup. There have been 18 World Cups held so far. If each of the 32 nations had played in each of the 18 World Cups, you would have 18 records per country. However, not every one of the 32 countries scheduled to play in the 2010 Cup has appeared in each of the previous Cups. In fact, if I am not mistaken, about half of these 32 countries may have participated in only 3-4 Cups, so you have a significant missing data issue on your hands.

3. I don't know what your predictor variables are. For the teams that have had several Cup appearances, you are using observations over a very long period of time (1930 - present). There have been significant changes in many factors that might influence the level of play for a given country over time. Typically, one would want to model using data that is representative of the future data that the model will be applied to. Since you don't have the luxury of simply discarding old data, which would significantly reduce your sample size, are you doing anything else to account for time-based effects on predictor variables?

4. For validation purposes, one would typically hold out from model fitting a portion of data similar to that on which the model is trained. The same predictor variables used in the model would be constructed from the holdout validation data, which would then be scored using the model. Again, you probably can't set aside any portion of the modeling data for validation due to a restricted sample size. In such cases one option might be to find somewhat similar data to test the model on. The FIFA Confederations Cup comes to mind. I don't know how long a history this tournament has; however, it is a dress rehearsal for the World Cup, and many World Cup participating teams play in the Confederations Cup. If you could construct the same model attributes using Confederations Cup data, maybe you could use outcomes from this tournament to validate your World Cup winner prediction model?

5. Finally, have you considered other modeling techniques? A decision tree comes to mind - non-parametric, robust, easily handles missing data, naturally handles interactions, etc.
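
On point 1, here is a rough sketch (in Python, just for illustration) of what I mean by building predictors only from information available before the outcome. The function name, the tuple layout and the toy values are all my own assumptions, not anything from your data.

```python
def as_of_feature(measurements, country, tournament_year):
    """Latest value for `country` measured strictly before `tournament_year`.

    `measurements` is a list of (country, year_measured, value) tuples.
    Returns None when nothing was measured before the tournament.
    """
    prior = [(year, value) for (c, year, value) in measurements
             if c == country and year < tournament_year]
    # max() on (year, value) tuples picks the most recent year.
    return max(prior)[1] if prior else None

# Hypothetical measurements, NOT real FIFA data.
toy_measurements = [
    ("Team A", 2005, 10),
    ("Team A", 2009, 4),
    ("Team B", 2009, 7),
]

feature_2006 = as_of_feature(toy_measurements, "Team A", 2006)
feature_2010 = as_of_feature(toy_measurements, "Team A", 2010)
missing_2006 = as_of_feature(toy_measurements, "Team B", 2006)
```

With predictors built this way, the 2006 outcome is only ever paired with pre-2006 information, and the same construction can be applied to the 2010 teams.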
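
On point 2, the arithmetic is simple enough to check yourself: 32 countries times 18 tournaments gives 576 possible country-by-tournament records. A minimal sketch for measuring how much of that panel is actually filled, assuming you have a list of appearance years per country (the team names and years below are made up):

```python
# World Cup years through 2006 (no tournaments in 1942 and 1946).
WORLD_CUP_YEARS = [1930, 1934, 1938] + list(range(1950, 2007, 4))

def panel_coverage(appearances, years=WORLD_CUP_YEARS):
    """Return (observed, possible, fill_rate) for a country-by-tournament panel."""
    possible = len(appearances) * len(years)
    observed = sum(len(set(played) & set(years))
                   for played in appearances.values())
    return observed, possible, observed / possible

# Hypothetical appearance lists, NOT real FIFA records.
toy_appearances = {
    "Team A": WORLD_CUP_YEARS,      # played every tournament
    "Team B": [1998, 2002, 2006],
    "Team C": [1994, 2006],
    "Team D": [2006],
}

observed, possible, fill_rate = panel_coverage(toy_appearances)
full_panel_size = 32 * len(WORLD_CUP_YEARS)  # 32 countries x 18 tournaments
```

Running this on the real appearance lists would tell you how sparse the panel is before you decide how to handle the gaps.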
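
On point 4, scoring a separate tournament with an already fitted logistic model could look roughly like the sketch below. The coefficients, feature values and outcomes are placeholders I invented; the Brier score is just one reasonable choice of accuracy measure, not necessarily the one you should use.

```python
import math

def predict_proba(intercept, coefs, features):
    """Logistic model probability for one row of predictor values."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Hypothetical fitted coefficients and hypothetical validation rows.
intercept, coefs = 0.0, [1.0, -0.5]
validation_rows = [[2.0, 1.0], [0.0, 0.0], [-1.0, 2.0]]
validation_outcomes = [1, 0, 0]

validation_probs = [predict_proba(intercept, coefs, row)
                    for row in validation_rows]
validation_brier = brier_score(validation_probs, validation_outcomes)
```

A lower Brier score on the validation tournament than a naive constant prediction would be some evidence that the model generalizes.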

Satindra.

On Sat, Dec 19, 2009 at 10:16 AM, sudip chatterjee <sudip.memphis@gmail.com> wrote:

> Dear Users,
>
> I must start with the fact that I am a fanatic soccer fan. Most of you might know that in June there will be world cup soccer in South Africa. So, prediction model are floating in terms of who will win the world cup this time. My interest, knowledge and experience provoked me to make a prediction model ( who will win world cup ) this year. I went to FIFA website & collected all relevant informations about the team taking part in this year world cup & also about past world cup facts. I made the model & it seems, I need to validate the model before I start discussing the results so here are my question
>
> 1) My data collection forced me to model in this way
> depVar(t-1) = predVar(t)
>
> I was wondering if this kind of modeling sounds ok ? Do I need to add any special remark while doing this kind of modeling, I am using simple logistic regression . Where my N= 32 and I my depvar is the information if any country has won the world cup before from 1930 - 2006. Now my predictors are current informations.
>
> 2) After my model in logistic regression I want check the results through simulation process what kind of proc's will help me to do that ?
>
> I must say that I have no commercial interest but shear interest & I work on this model only during weekends.
>
> I wish all of you an advanced Merry Christmas !
>
> regards

