```
Date:         Wed, 10 Apr 2002 11:48:18 -0700
Reply-To:     "Grichuhin, Theodore J"
Sender:       "SAS(r) Discussion"
From:         "Grichuhin, Theodore J"
Subject:      Re: hospital charge data
Content-Type: text/plain; charset="iso-8859-1"

In the older datasets circa 1990, a value of 999999.99 meant this claim is a
cost outlier and to look for a continuation record, which will have the amount
over 999,999.99. There is a separate field that flags these records.

-----Original Message-----
From: Robert Virgile [mailto:virgile@ATTBI.COM]
Sent: Wednesday, April 10, 2002 10:30
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: hospital charge data

Frank,

Some discussion has already taken place here, but I'll add a couple of ideas.

First, you may want to distinguish between cleaning your data vs. finding
outliers. What does 9999999 really mean? What does 0 really mean?

Second, a practical approach might work backwards. How many data points do you
really want to check? It's easy enough for proc univariate to find the 99th
percentile for each variable. Is checking 1% of the data values too much work?
In similar terms, if mean + 3 SD generates too many points to check, then
change it to mean + 4 SD. Alternatively, if the data points are largely invalid
using your initial cutoff method, then relax it to include more data points.

Good luck.

Bob V.

-----Original Message-----
From: Frank Schiffel
Newsgroups: bit.listserv.sas-l
To: SAS-L@LISTSERV.UGA.EDU
Date: Tuesday, April 09, 2002 4:38 PM
Subject: hospital charge data

>we're trying to determine outliers in a data set of a few million variables.
>
>obviously there are pure errors, some high values, and something that we just
>don't want to report as it's not meaningful.
>
>it's not a nice Gaussian distribution, there is some skewness in the data.
>
>what's a good way to do this? put a floor as whatever the insurance pays for
>ER visits (say $50), look at the mean plus 3 SD? cap at the 1% and 99% in proc
>univariate? (obviously I'm running out of ideas)
>
>I haven't seen how nationally this is dealt with in some of the analysis
>(sometimes they just sample and don't do anything, assuming their large n will
>cover it). we're going to report at a county level and some are pretty small.
>plus once you slice and dice data, you know how that goes. we'll do some
>demographics on it also in the reporting.
>
>it helps we can't legally report n less than 20 for an average value.
>
>but the cleaning is a real problem.
>
>any comments or suggestions would be helpful.
>
>
>Frank Schiffel, Research Analyst III
>Bureau of Health Care Performance Monitoring
>Center for Health Information Management and Evaluation
>PO Box 570
>Jefferson City, MO 65102-0570
>
>573 751-6279
```
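
The two cutoffs discussed in the thread (the 99th percentile and mean + k SD), together with the 999999.99 marker described for the older datasets, can be sketched roughly as follows. This is only an illustration in Python rather than SAS, not anything posted by the participants: the function name `review_candidates`, its parameters, and the choice to set marker records aside before computing any statistics are all assumptions made for the example.

```python
import statistics

# Assumption for this sketch: marker value described in the thread for the
# older (circa 1990) datasets, meaning "cost outlier, see continuation record".
SENTINEL = 999999.99


def review_candidates(charges, k=3.0, pct=99):
    """Return values worth a manual look (hypothetical helper).

    charges: list of charge amounts as floats
    k:       number of standard deviations above the mean (3, 4, ...)
    pct:     upper percentile used as the alternative cutoff (1..99)
    """
    # Marker records are coded values, not real charges, so keep them
    # out of the mean, SD, and percentile calculations.
    sentinels = [c for c in charges if c == SENTINEL]
    valid = [c for c in charges if c != SENTINEL]

    mean = statistics.mean(valid)
    sd = statistics.stdev(valid)

    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th one.
    pct_cut = statistics.quantiles(valid, n=100)[pct - 1]
    sd_cut = mean + k * sd

    return {
        "sentinels": sentinels,
        "pct_cut": pct_cut,
        "sd_cut": sd_cut,
        "above_percentile": [c for c in valid if c > pct_cut],
        "above_sd": [c for c in valid if c > sd_cut],
    }
```

Working backwards as suggested in the thread would mean calling this with `k=3`, counting the entries in `above_sd`, and raising `k` to 4 if that is too many values to check by hand. On skewed charge data the mean and SD are themselves pulled upward by the extreme values, so the two lists can differ considerably.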
