Date: Wed, 1 Jul 2009 19:12:49 -0700
Reply-To: Daniel Nordlund <djnordlund@VERIZON.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Daniel Nordlund <djnordlund@VERIZON.NET>
Subject: Re: Need help with self-selection bias....
In-Reply-To: <200907020047.n620CHXk021557@malibu.cc.uga.edu>
Content-type: text/plain; charset=iso-8859-1
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Pete
> Sent: Wednesday, July 01, 2009 5:48 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Need help with self-selection bias....
>
> Hi Folks-
>
> Longtime SAS programmer here, but a newbie to SAS procedures that
> correct for self-selection bias.
>
> I have non-random data on energy project costs that was self-reported
> by a handful of companies. Some companies censored the project data by
> only sending us a share of their projects and the associated costs
> (most likely their best performing and most costly projects were
> submitted). Anyway, I do have some information on the total number of
> projects that each company undertook as well as the number of projects
> that were actually submitted to us.
>
> I have two questions:
> 1) Is there a way to re-weight the sample data using acceptable bias
> corrections so that I can report average project costs that may be
> more indicative of the true population?
>
> 2) If I were to model project costs (dependent) against a handful of
> independent variables, does it sound like the Heckman two-stage method
> using the proportion of submitted projects makes the most sense? Are
> there better methods out there?
>
> Many thanks~you folks have been very helpful in the past.
Pete,
I think we are going to need a lot more information before you get "useful"
advice.
1. What are your study questions? Are you trying to evaluate some
intervention or are you just doing descriptive analyses?
2. How many companies are we talking about?
3. What is the range of "proportion of submitted projects"? Are we talking
about a range like 90-100% or is it more like 60-100% or even worse?
Depending on your answers (especially to 3.), you may or may not have any
options. Your main problem appears to be missing data, and the data are not
missing at random. You might want to look at multiple imputation methods,
keeping in mind that standard multiple imputation assumes the data are
missing at random. If the proportion of submitted projects is never very
low, you might just downweight those companies in proportion to their
submission percentage. If the submission percentage is very low, or a lot
of companies have held back projects, I am not hopeful about doing anything
useful.
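
To make the downweighting idea concrete, here is a minimal sketch in Python
(not SAS) of one possible reading of it: each company's mean submitted cost
is weighted by the fraction of its projects that it submitted. The function
name, the data layout, and the numbers are all hypothetical, and other
weighting schemes are equally defensible.

```python
def downweighted_mean_cost(companies):
    """Average of company mean costs, weighted by submission fraction.

    companies: iterable of (n_undertaken, submitted_costs) pairs, where
    n_undertaken is the total number of projects the company undertook
    and submitted_costs is the list of costs it actually reported.
    (Hypothetical layout, chosen only for illustration.)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for n_undertaken, submitted_costs in companies:
        if not submitted_costs:
            continue  # a company that submitted nothing contributes nothing
        weight = len(submitted_costs) / n_undertaken  # submission fraction
        company_mean = sum(submitted_costs) / len(submitted_costs)
        weighted_sum += weight * company_mean
        weight_total += weight
    return weighted_sum / weight_total


# Made-up example: the first company submitted all 4 of its projects,
# the second submitted only 2 of 4, so it gets half the weight.
example = [
    (4, [100.0, 100.0, 100.0, 100.0]),
    (4, [200.0, 200.0]),
]
result = downweighted_mean_cost(example)
```

Note that this only reduces the influence of companies that held projects
back. It does not recover the costs of the projects that were never
submitted.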
You asked about reweighting based on "acceptable bias corrections." Did you
have something particular in mind?
Your missing data problem is different from the typical selection bias
scenario, where observation units (patients, companies, ...) self-select into
a treatment/intervention group. In that scenario, one could use a two-stage
Heckman model (or some type of propensity score analysis) to try to adjust
for differences based on observable characteristics. I don't know if this
applies in your situation, but I suspect it won't be helpful in dealing with
your missing data. However, if you know something about the
characteristics of the projects not submitted (not just the percentage of
unsubmitted projects), you might be able to do something, provided you can
model the probability of a project being submitted.
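
As a rough illustration of what modeling the submission probability could
buy you, here is a minimal Python sketch of inverse-probability weighting.
It assumes, hypothetically, that every project (submitted or not) can be
assigned to a stratum such as a size class, and it estimates the submission
probability as the submitted share within each stratum. The function name,
the strata, and the numbers are invented for the example. It also assumes
that, within a stratum, submission is unrelated to cost, which may well not
hold for your data.

```python
from collections import defaultdict


def ipw_mean_cost(projects):
    """Inverse-probability-weighted mean cost of the submitted projects.

    projects: list of (stratum, was_submitted, cost) tuples covering ALL
    projects; cost is None for projects that were not submitted.
    (Hypothetical layout, chosen only for illustration.)
    """
    total_count = defaultdict(int)
    submitted_count = defaultdict(int)
    for stratum, was_submitted, _cost in projects:
        total_count[stratum] += 1
        if was_submitted:
            submitted_count[stratum] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, was_submitted, cost in projects:
        if not was_submitted:
            continue
        # Estimated probability that a project in this stratum is submitted.
        p_submit = submitted_count[stratum] / total_count[stratum]
        weight = 1.0 / p_submit
        weighted_sum += weight * cost
        weight_total += weight
    return weighted_sum / weight_total


# Made-up example: both large projects were submitted, but only one of
# the four small projects was, so that one stands in for all four.
example_projects = [
    ("large", True, 300.0),
    ("large", True, 500.0),
    ("small", True, 100.0),
    ("small", False, None),
    ("small", False, None),
    ("small", False, None),
]
ipw_result = ipw_mean_cost(example_projects)
naive_result = (300.0 + 500.0 + 100.0) / 3
```

A stratum with no submitted projects at all gets no weight here, so the
sketch cannot say anything about that part of the population.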
If you can provide more context about your research/evaluation task, someone
may be able to give you better help.
Dan
Daniel Nordlund
Bothell, WA USA