LISTSERV at the University of Georgia
Date:         Tue, 12 Dec 2006 15:49:40 -0500
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: Stats SAS Project for college statistics course,
              issue with results
Comments: To: rlira007@gmail.com
In-Reply-To:  <1165915833.853418.239770@80g2000cwy.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"

Finding the 'best model' in this situation might not produce a model good enough for prime time, but it may help you learn a lot about data analysis and statistical modelling. I'll offer a few comments about the data analysis and predictive modelling side of your questions, but leave the statistical estimation side to real statisticians.

First, you'll need to tighten up your description of your data. It looks as if you are trying to fit a model that relates five continuous variables (rates and estimated means), plus a categorical variable (Group), to estimated GNP per capita. No respectable econometrician would claim that these predictors could support a good predictive model for per capita GNP, so let's assume that, as an academic exercise, you are trying to develop the best model for these data. I see no reason to expect the predictions of a model based on these data to be 'in whack'.
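
In the program quoted below, Group enters the PROC REG model as the numbers 1 to 6, so REG estimates a single slope for it, as if the six groups were equally spaced points on a scale. A minimal sketch of treating Group as categorical instead, assuming the GNPPredictor data set built by that program already exists: a CLASS statement in PROC GLM gives each group its own effect, the Type III test for Group addresses the question about differences among the six groups, and LSMEANS compares the groups pairwise.

```sas
/* Sketch only: assumes the GNPPredictor data set from the original program. */
/* Observations with missing GNP are dropped from the fit automatically.     */
proc glm data=GNPPredictor;
   class Group;                              /* treat the 1-6 codes as labels, not amounts */
   model GNP = LBR DeathRate InfantDeath LifeEXPM LifeEXPF Group / solution;
   lsmeans Group / pdiff adjust=tukey;       /* Tukey-adjusted pairwise group comparisons  */
run;
quit;
```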

SAS provides a number of useful exploratory and diagnostic statistics and graphs, and your instructor has helped by specifying several of them. Focus on the residuals (prediction errors). A bad model usually violates a statistical estimation method's assumptions about the distribution of prediction errors; for OLS, for example, marked departures from a random scatter of residuals around zero are a warning sign. Residuals much larger than most (outliers) may indicate that a model omits important predictive variables, and systematic patterns in the residuals suggest the wrong form of statistical model.
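
A minimal sketch of saving and inspecting these residual diagnostics, again assuming the GNPPredictor data set exists. The data set and variable names (resid_diag, yhat, resid, rstud, cooksd) are illustrative choices. For brevity it uses only the five continuous predictors, because PROC REG has no CLASS statement and would otherwise treat Group's codes as a measurement. The VIF option is included because the two life-expectancy variables are likely to be strongly correlated with each other, which can make individual coefficients unstable.

```sas
/* Sketch only: assumes GNPPredictor exists; output names are illustrative. */
proc reg data=GNPPredictor;
   model GNP = LBR DeathRate InfantDeath LifeEXPM LifeEXPF / vif;
   /* save predicted values, raw and studentized residuals, and Cook's D */
   output out=resid_diag p=yhat r=resid rstudent=rstud cookd=cooksd;
run;
quit;

/* list countries with large studentized residuals (a common rule of thumb) */
proc print data=resid_diag;
   where abs(rstud) > 2;
   var Country GNP yhat resid rstud cooksd;
run;

/* residuals vs. predicted values: look for curvature or a funnel shape */
proc gplot data=resid_diag;
   plot resid*yhat / vref=0;
run;
quit;
```

The same OUTPUT keywords (P=, R=, RSTUDENT=, COOKD=) also work in PROC GLM if you prefer to keep Group in the model as a CLASS variable.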

A quick review of the documentation of diagnostic statistics such as Cook's D, along with attention to plots of residuals, will help you learn more about the assumptions underlying statistical models. Simple linear models certainly can generate negative predictions of variables that can only be positive, such as GNP. Exponential or logarithmic transformations of the dependent and predictor variables (for example, modelling log(GNP), which makes predicted GNP an exponential function of the predictors) may improve the fit of the model to the data.
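
A minimal sketch of one such transformation, assuming the GNPPredictor data set exists: model the natural log of GNP, then exponentiate the predictions. Because exp() of any real number is positive, the back-transformed predictions cannot be negative. The names GNPlog, pred_log, logGNP, logGNP_hat and GNP_hat are illustrative. Whether the log scale actually fits better is something to check with the same residual plots.

```sas
/* Sketch only: assumes GNPPredictor exists; new names are illustrative. */
data GNPlog;
   set GNPPredictor;
   if GNP > 0 then logGNP = log(GNP);   /* natural log; missing GNP stays missing */
run;

proc glm data=GNPlog;
   class Group;
   model logGNP = LBR DeathRate InfantDeath LifeEXPM LifeEXPF Group / solution;
   output out=pred_log p=logGNP_hat;    /* predictions on the log scale */
run;
quit;

data pred_log;
   set pred_log;
   /* back-transform: exp() is always positive, so GNP_hat cannot be negative */
   if not missing(logGNP_hat) then GNP_hat = exp(logGNP_hat);
run;
```

On the log scale the coefficients describe proportional rather than additive differences in GNP, and exp() of a predicted log value estimates something closer to a median (geometric mean) GNP than to a mean.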

Always keep in mind the central question: what determines the variable that you are trying to predict (per capita GNP)? As George Box said, "All models are wrong, but some are useful." Eventually you will learn that everything determines everything else. Since death rates and life expectancies likely depend as much on per capita GNP as per capita GNP depends on them, I'd focus on the fit of variables that could have a positive or adverse impact on economic conditions. Remember that an implausible linear relation between a predictor and an outcome requires extraordinary evidence to support it. In the sample that you have (likely summarized over a short interval of time), the chances that the sample misleads you likely exceed the chances of discovering a previously unknown truth. A good fit does not necessarily yield a good model.

Sig

-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of rlira007@gmail.com
Sent: Tuesday, December 12, 2006 4:31 AM
To: sas-l@uga.edu
Subject: Stats SAS Project for college statistics course, issue with results

Hi, I'm attempting to analyze data for 97 nations with, I believe, 7 or 8 predictors. Anyway, here is the direct request: "Analyze these data to estimate the best model that describes the relationship between the response (Gross National Product) and the predictors (all other variables except for country). Can these variables be used to predict GNP? Which of these variables are the most important? Are there significant differences among the 6 groups of countries?"

This is the program I put together for this problem:

data GNPPredictor;
   /* Country needs more than the default 8 characters (e.g. United_Arab_Emirates) */
   length Country $ 24;
   /* Missing GNP values are coded as a period; list input rejects * as invalid numeric data */
   input index LBR DeathRate InfantDeath LifeEXPM LifeEXPF GNP Group Country $;
   cards;
1 24.7 5.7 30.8 69.6 75.5 600 1 Albania
2 12.5 11.9 14.4 68.3 74.7 2250 1 Bulgaria
3 13.4 11.7 11.3 71.8 77.7 2980 1 Czechoslovakia
4 12 12.4 7.6 69.8 75.9 . 1 Former_E._Germany
5 11.6 13.4 14.8 65.4 73.8 2780 1 Hungary
6 14.3 10.2 16 67.2 75.7 1690 1 Poland
7 13.6 10.7 26.9 66.5 72.4 1640 1 Romania
8 14 9 20.2 68.6 74.5 . 1 Yugoslavia
9 17.7 10 23 64.6 74 2242 1 USSR
10 15.2 9.5 13.1 66.4 75.9 1880 1 Byelorussian_SSR
11 13.4 11.6 13 66.4 74.8 1320 1 Ukrainian_SSR
12 20.7 8.4 25.7 65.5 72.7 2370 2 Argentina
13 46.6 18 111 51 55.4 630 2 Bolivia
14 28.6 7.9 63 62.3 67.6 2680 2 Brazil
15 23.4 5.8 17.1 68.1 75.1 1940 2 Chile
16 27.4 6.1 40 63.4 69.2 1260 2 Columbia
17 32.9 7.4 63 63.4 67.6 980 2 Ecuador
18 28.3 7.3 56 60.4 66.1 330 2 Guyana
19 34.8 6.6 42 64.4 68.5 1110 2 Paraguay
20 32.9 8.3 109.9 56.8 66.5 1160 2 Peru
21 18 9.6 21.9 68.4 74.9 2560 2 Uruguay
22 27.5 4.4 23.3 66.7 72.8 2560 2 Venezuela
23 29 23.2 43 62.1 66 2490 2 Mexico
24 12 10.6 7.9 70 76.8 15540 3 Belgium
25 13.2 10.1 5.8 70.7 78.7 26040 3 Finland
26 12.4 11.9 7.5 71.8 77.7 22080 3 Denmark
27 13.6 9.4 7.4 72.3 80.5 19490 3 France
28 11.4 11.2 7.4 71.8 78.4 22320 3 Germany
29 10.1 9.2 11 65.4 74 5990 3 Greece
30 15.1 9.1 7.5 71 76.7 9550 3 Ireland
31 9.7 9.1 8.8 72 78.6 16830 3 Italy
32 13.2 8.6 7.1 73.3 79.9 17320 3 Netherlands
33 14.3 10.7 7.8 67.2 75.7 23120 3 Norway
34 11.9 9.5 13.1 66.5 72.4 7600 3 Portugal
35 10.7 8.2 8.1 72.5 78.6 11020 3 Spain
36 14.5 11.1 5.6 74.2 80 23660 3 Sweden
37 12.5 9.5 7.1 73.9 80 34064 3 Switzerland
38 13.6 11.5 8.4 72.2 77.9 16100 3 U.K.
39 14.9 7.4 8 73.3 79.6 17000 3 Austria
40 9.9 6.7 4.5 75.9 81.8 25430 3 Japan
41 14.5 7.3 7.2 73 79.8 20470 3 Canada
42 16.7 8.1 9.1 71.5 78.3 21790 3 U.S.A.
43 40.4 18.7 181.6 41 42 168 5 Afghanistan
44 28.4 3.8 16 66.8 69.4 6340 4 Bahrain
45 42.5 11.5 108.1 55.8 55 2490 4 Iran
46 42.6 7.8 69 63 64.8 3020 4 Iraq
47 22.3 6.3 9.7 73.9 77.4 10920 4 Israel
48 38.9 6.4 44 64.2 67.8 1240 4 Jordan
49 26.8 2.2 15.6 71.2 75.4 16150 4 Kuwait
50 31.7 8.7 48 63.1 67 . 4 Lebanon
51 45.6 7.8 40 62.2 65.8 5220 4 Oman
52 42.1 7.6 71 61.7 65.2 7050 4 Saudi_Arabia
53 29.2 8.4 76 62.5 65.8 1630 4 Turkey
54 22.8 3.8 26 68.6 72.9 19860 4 United_Arab_Emirates
55 42.2 15.5 119 56.9 56 210 5 Bangladesh
56 41.4 16.6 130 47 49.9 . 5 Cambodia
57 21.2 6.7 32 68 70.9 380 5 China
58 11.7 4.9 6.1 74.3 80.1 14210 5 Hong_Kong
59 30.5 10.2 91 52.5 52.1 350 5 India
60 28.6 9.4 75 58.5 62 570 5 Indonesia
61 23.5 18.1 25 66.2 72.7 . 5 Korea
62 31.6 5.6 24 67.5 71.6 2320 5 Malaysia
63 36.1 8.8 68 60 62.5 110 5 Mongolia
64 39.6 14.8 128 50.9 48.1 170 5 Nepal
65 30.3 8.1 107.7 59 59.2 380 5 Pakistan
66 33.2 7.7 45 62.5 66.1 730 5 Philippines
67 17.8 5.2 7.5 68.7 74 11160 5 Singapore
68 21.3 6.2 19.4 67.8 71.7 470 5 Sri_Lanka
69 22.3 7.7 28 63.8 68.9 1420 5 Thailand
70 31.8 9.5 64 63.7 67.9 . 5 Vietnam
71 35.5 8.3 74 61.6 63.3 2060 6 Algeria
72 47.2 20.2 137 42.9 46.1 610 6 Angola
73 48.5 11.6 67 52.3 59.7 2040 6 Botswana
74 46.1 14.6 73 50.1 55.3 1010 6 Congo
75 38.8 9.5 49.4 57.8 60.3 600 6 Egypt
76 48.6 20.7 137 42.4 45.6 120 6 Ethiopia
77 39.4 16.8 103 49.9 53.2 390 6 Gabon
78 47.4 21.4 143 41.4 44.6 260 6 Gambia
79 44.4 13.1 90 52.2 55.8 390 6 Ghana
80 47 11.3 72 56.5 60.5 370 6 Kenya
81 44 9.4 82 59.1 62.56 5310 6 Libya
82 48.3 25 130 38.1 41.2 200 6 Malawi
83 35.5 9.8 82 59.1 62.5 960 6 Morocco
84 45 18.5 141 44.9 48.1 80 6 Mozambique
85 44 12.1 135 55 57.5 1030 6 Namibia
86 48.5 15.6 105 48.8 52.2 360 6 Nigeria
87 48.2 23.4 154 39.4 42.6 240 6 Sierra_Leone
88 50.1 20.2 132 43.4 46.6 120 6 Somalia
89 32.1 9.9 72 57.5 63.5 2530 6 South_Africa
90 44.6 15.8 108 48.6 51 480 6 Sudan
91 46.8 12.5 118 42.9 49.5 810 6 Swaziland
92 31.1 7.3 52 64.9 66.4 1440 6 Tunisia
93 52.2 15.6 103 49.9 52.7 220 6 Uganda
94 50.5 14 106 51.3 54.7 110 6 Tanzania
95 45.6 14.2 83 50.3 53.7 220 6 Zaire
96 51.1 13.7 80 50.4 52.5 420 6 Zambia
97 41.7 10.3 66 56.5 60.1 640 6 Zimbabwe
;
run;

proc gplot data=GNPPredictor;
   plot GNP*LBR;
   plot GNP*DeathRate;
   plot GNP*InfantDeath;
   plot GNP*LifeEXPM;
   plot GNP*LifeEXPF;
   plot GNP*Group;
run;
quit;

proc univariate data=GNPPredictor plot normal;
   var GNP;
   histogram GNP / normal kernel(L=2);
   qqplot GNP / normal(L=1 mu=est sigma=est);
run;

proc reg data=GNPPredictor;
   /* Group is a numeric code (1-6) here, so PROC REG fits one slope for it as if it were a measurement */
   model GNP = LBR DeathRate InfantDeath LifeEXPM LifeEXPF Group / p r clb cli clm;
   plot r.*p.;
run;
quit;

Okay, my issue is that after running the program, the predicted GNP values are completely out of whack compared with the GNP values in my data. How can I get a predicted GNP that's negative? Where is my program wrong?

Hopefully someone can give me at least a hint of what to do.

