LISTSERV at the University of Georgia
Date:         Wed, 1 Jul 2009 19:12:49 -0700
Reply-To:     Daniel Nordlund <djnordlund@VERIZON.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Daniel Nordlund <djnordlund@VERIZON.NET>
Subject:      Re: Need help with self-selection bias....
In-Reply-To:  <200907020047.n620CHXk021557@malibu.cc.uga.edu>
Content-type: text/plain; charset=iso-8859-1

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Pete
> Sent: Wednesday, July 01, 2009 5:48 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Need help with self-selection bias....
>
> Hi Folks-
>
> Longtime SAS programmer here, but a newbie to SAS procedures that correct
> for self-selection bias.
>
> I have non-random data on energy project costs that was self-reported by a
> handful of companies. Some companies censored the project data by only
> sending us a share of their projects and the associated costs (most likely
> their best performing and most costly projects were submitted). Anyway, I
> do have some information on the total number of projects that each company
> undertook as well as the number of projects that were actually submitted to
> us.
>
> I have two questions:
> 1) Is there a way to re-weight the sample data using acceptable bias
> corrections so that I can report average project costs that may be more
> indicative of the true population?
>
> 2) If I were to model project costs (dependent) against a handful of
> independent variables, does it sound like the Heckman two-stage method using
> the proportion of submitted projects makes the most sense? Are there better
> methods out there?
>
> Many thanks~you folks have been very helpful in the past.

Pete,

I think we are going to need a lot more information before you get "useful" advice.

1. What are your study questions? Are you trying to evaluate some intervention or are you just doing descriptive analyses?
2. How many companies are we talking?
3. What is the range of "proportion of submitted projects"? Are we talking about a range like 90-100% or is it more like 60-100% or even worse?

Depending on your answers (especially to 3.) you may or may not have any options. Your main problem appears to be missing data, and the data is not missing at random. You might want to look at multiple imputation methods. If the proportion of submitted projects is never very low, you might just downweight those companies proportional to their submission percentage. If the submission percentage is very low or a lot of companies have held back projects, I am not hopeful about doing anything useful.
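One literal reading of the downweighting idea above can be sketched in Python. The company records, the cost figures, and the helper name `downweighted_mean_cost` are all made up for illustration, and using the submitted share as the weight is only an assumption about what "proportional to their submission percentage" would mean in practice. It is not an established bias correction.

```python
def downweighted_mean_cost(companies):
    """Weighted mean of company-level average project costs.

    Each company's weight is its submission proportion
    (submitted / total), so companies that held back more
    projects count for less in the overall average.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for company in companies:
        weight = company["submitted"] / company["total"]
        weighted_sum += weight * company["mean_cost"]
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no company submitted any projects")
    return weighted_sum / weight_total


# Hypothetical data: company A submitted everything, company B only half.
companies = [
    {"name": "A", "total": 10, "submitted": 10, "mean_cost": 100.0},
    {"name": "B", "total": 10, "submitted": 5, "mean_cost": 200.0},
]
result = downweighted_mean_cost(companies)
```

With these invented numbers, company B gets half the weight of company A, so the average is pulled toward A's costs. This only limits the influence of incomplete reporters; it does not recover the costs of the projects that were never submitted.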

You asked about reweighting based on "acceptable bias corrections." Did you have something particular in mind?

Your missing data problem is different from the typical selection bias scenario where observation units (patients, companies, ...) self-select into a treatment/intervention group. In that typical scenario, one could use a 2-stage Heckman (or other type of propensity score analysis) to try to adjust for differences based on observable characteristics. I don't know if this applies in your situation, but I suspect it won't be helpful in dealing with your missing data. That said, if you know something about the characteristics of the projects not submitted (not just the percentage of unsubmitted projects), you might be able to do something, but you would need to be able to model the probability of a project being submitted.
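To make the last point concrete: if a submission probability could be estimated for each submitted project, one way to use it is to weight each observed cost by the inverse of that probability. The sketch below is plain inverse probability weighting, not the Heckman two-stage method, and it assumes the probabilities already exist (for example, from a logistic model that is not shown). The function name `ipw_mean` and the numbers are hypothetical.

```python
def ipw_mean(costs, submit_probs):
    """Inverse-probability-weighted mean of observed project costs.

    costs        -- costs of the projects that were actually submitted
    submit_probs -- assumed probability that each of those projects
                    would be submitted (must be in (0, 1])
    """
    if len(costs) != len(submit_probs):
        raise ValueError("costs and submit_probs must have the same length")
    if not costs:
        raise ValueError("no observed projects")
    weights = []
    for p in submit_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("submission probabilities must be in (0, 1]")
        # A project that was unlikely to be submitted stands in for
        # more unobserved projects, so it gets a larger weight.
        weights.append(1.0 / p)
    weighted_sum = sum(w * c for w, c in zip(weights, costs))
    return weighted_sum / sum(weights)


# Hypothetical example: the cheaper project had only a 50% chance of
# being submitted, so it counts twice as much as the expensive one.
estimate = ipw_mean([100.0, 300.0], [0.5, 1.0])
```

The estimate is only as good as the probability model behind it, which is why information about the unsubmitted projects matters so much here.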

If you can provide more context to your research/evaluation task, someone may be able to give you better help.

Dan

Daniel Nordlund
Bothell, WA USA

