LISTSERV at the University of Georgia
Date:         Thu, 4 May 2006 15:53:26 -0400
Reply-To:     Nathaniel_Wooding@DOM.COM
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Nat Wooding <Nathaniel_Wooding@DOM.COM>
Subject:      Re: Lab data / Documentation of Below Detectable Limits (BDL) or
              Above Threshhold Value (ATV)
Comments: To: alte@uni-greifswald.de
In-Reply-To:  <4459C306.2080301@uni-greifswald.de>
Content-Type: text/plain; charset="ISO-8859-1"

Dietrich

For quite a number of years I have maintained a couple of databases of laboratory results, or of data from reports generated from these (i.e., I don't have access to the original lab data but can harvest the stuff in the reports). These data are from environmental water, soil, etc. samples or from water discharged by power plants. These data include just about all of the categories that you mention -- detected amounts and values that are below detection (I'll call these bdl), represented by "<some number", by "<QL", by words, etc.

My solution has always been to store the data as a character value and store exactly what the lab or report gave me. In our case, we report these data to various agencies who have differing thoughts as to how to handle bdl data -- use the number, use half, use 0, and so on. Hence, I cannot give up the information stored in that "<". Also, there are times when we need to estimate loadings, i.e., the mass of stuff discharged over some time. Here, I multiply the flow by the concentration. If the concentration is bdl, then I definitely want to be able to state that the loading is less than the value given.
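A minimal Python sketch of the idea above, not the code actually used for these databases: the raw character value is kept as reported, the "<" is split off only at analysis time, and the loading carries the qualifier along. The names `LabResult`, `parse_result`, `substitute`, and `loading`, as well as the policy labels "limit", "half", and "zero", are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LabResult(NamedTuple):
    raw: str                # exactly what the lab or report gave
    qualifier: str          # "<", ">", or "" for a detected amount
    value: Optional[float]  # numeric part; None when there is none (e.g. "<QL")


# Optional "<" or ">" followed by a plain decimal number.
_PATTERN = re.compile(r"^\s*([<>]?)\s*(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_result(raw: str) -> LabResult:
    """Split a reported value into qualifier and number, keeping the raw text."""
    m = _PATTERN.match(raw)
    if m is None:
        # Words, "<QL", and similar entries have no usable number.
        return LabResult(raw, "", None)
    return LabResult(raw, m.group(1), float(m.group(2)))


def substitute(result: LabResult, policy: str) -> Optional[float]:
    """Apply one agency-style rule to a bdl value: 'limit', 'half', or 'zero'.

    Detected values and ">" values are returned unchanged (an assumption;
    only bdl handling is described in the text).
    """
    if result.value is None:
        return None
    if result.qualifier != "<":
        return result.value
    factors = {"limit": 1.0, "half": 0.5, "zero": 0.0}
    return result.value * factors[policy]


def loading(flow: float, result: LabResult) -> Tuple[str, Optional[float]]:
    """Flow times concentration, with the qualifier carried over to the loading."""
    if result.value is None:
        return ("", None)
    return (result.qualifier, flow * result.value)


if __name__ == "__main__":
    r = parse_result("<2.0")
    print(substitute(r, "half"))  # half of the reported limit
    print(loading(10.0, r))       # qualifier "<" stays attached to the loading
```

Because the stored column is still the original string, the choice among the substitution rules can be made per report rather than once at load time.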

As I recall, you are associated with the medical college of a university. I have zero knowledge of any standards that may exist for medical data or what the practices are in analyzing these data.

As to using special missing values, I would question whether you have too many parameters and too many possible values for each parameter to use this approach (this assumes that a single data set would contain data on multiple parameters and that there may be a number of detection limits). This approach, if it would work, would avoid the step of having to strip off the "<" and create a number, but I would hope that someone would use the non-detect value (or values) as part of the analysis. In the case of our data, at least, this is important information, albeit a bit sticky to deal with.

You asked about a public file:

The United States Geological Survey (USGS) has started posting various water data online. The following very lengthy URL is for water quality data from the Potomac River in the state of West Virginia:

http://nwis.waterdata.usgs.gov/wv/nwis/qwdata?search_station_nm=potomac&search_station_nm_match_type=beginning&sort_key=site_no&group_key=NONE&sitefile_output_format=html_table&column_name=agency_cd&column_name=site_no&column_name=station_nm&column_name=lat_va&column_name=long_va&column_name=state_cd&column_name=county_cd&column_name=alt_va&column_name=huc_cd&begin_date=&end_date=&format=html_table&inventory_output=0&rdb_inventory_output=file&date_format=YYYY-MM-DD&rdb_compression=file&qw_sample_wide=0&survey_email_address=&list_of_search_criteria=search_station_nm

A perhaps easier way to reach the site would be to go to

http://nwis.waterdata.usgs.gov/wv/nwis/qwdata

There, check the box in the second column labeled "site name". Submit this and, on the next page, enter "Potomac" in the site name field and click on the "table of sites" radio button. If you scan the columns labeled "ammonia", you will see a few values with "<" prefixes.

This particular site is relatively new. In dealing with old tape-format USGS data, I seem to recall that they would present the number and include a separate column with a flag that indicated when something was bdl.
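The number-plus-flag layout and the "<"-prefixed text layout hold the same information, so one can be converted to the other. The following Python sketch shows the round trip under stated assumptions: the flag is taken to be the literal character "<" or ">" (the actual codes in the old USGS files are not given here), and the function names `to_columns` and `to_text` are hypothetical.

```python
from typing import Tuple


def to_columns(raw: str) -> Tuple[float, str]:
    """Split text such as '<0.02' into a number and a flag column.

    Detected values get an empty flag. Text without a number (for example
    '<QL') raises ValueError and would need separate handling.
    """
    text = raw.strip()
    flag = text[0] if text[:1] in ("<", ">") else ""
    return float(text[len(flag):]), flag


def to_text(value: float, flag: str) -> str:
    """Rebuild the prefixed text form from the two columns.

    The 'g' format drops trailing zeros, so '3.0' comes back as '3'.
    """
    return f"{flag}{value:g}"


if __name__ == "__main__":
    for raw in ["<0.02", "0.15", ">100"]:
        value, flag = to_columns(raw)
        print(raw, "->", (value, flag), "->", to_text(value, flag))
```

Note that the rebuilt text is not guaranteed to match the lab's original formatting, which is one more reason to keep the reported string as well.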

I do have one suggestion if you are going to offer these data to general users: offer a link to some sort of narrative that discusses values that are not detected and how one may need to use them in analyses.

Thanks for an interesting topic. I hope that we see some more replies.

Nat Wooding

Dietrich Alte <alte@UNI-GREIFSWALD.DE>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
05/04/2006 05:01 AM
To: SAS-L@LISTSERV.UGA.EDU
cc:
Subject: Lab data / Documentation of Below Detectable Limits (BDL) or Above Threshhold Value (ATV)

Please respond to alte@uni-greifswald.de

Dear SAS-Lers,

we just had a very engaged discussion in our group about how to represent special non-numerical off-the-scale (OTS) data from laboratory analyses. The original data from the lab software are character vars, with OTS values represented as "<2.0" or ">100" and on-scale data as "3.0", "3.7", etc. There are also (really) missing data like "no analysis made -->not enough blood". (A small extra complication is that, through (ir)regular calibration of the lab machines, the scale limits undergo small changes from time to time, e.g. <2.0 could change to <2.1, then <1.9, etc.)

Our task is to make the data _publicly_ available in numerical format for many (>100) SAS and SPSS users (in their own data formats).

The SPSS party (mostly medical/dental folk, some rather fresh in statistical analysis) likes the data in pure numerical form, e.g. "<2.0" transformed to 2.0 and then labeled as "<2.0" (to make sure OTS data are not dropped in (numerical) analyses when coded as missing).

The SAS party (statisticians, (bio-)mathematicians) sees the potential errors (bias, high influence etc.) when these "imputed" values enter regression analyses, and thus rather votes for using methods that can handle censored data and for representing the data either by a) coding them as special .X-like missing values (i.e. technically missing, but not semantically) with relevant labels ("<2.0") or b) leaving the data in character form as they come from the lab and having each analyst decide him/herself what to do.

Questions:
1) What are your experiences with this kind of data?
2) What is the best way for this task for a public-use-file? (with many users, who we sometimes do not know)
3) Are there any official/international rules how to do it?

Very interested in your input!

Regards

Dietrich

--
----------------------------------------------------------------
DIETRICH ALTE, Dipl.-Statistiker, Dr. rer. med.
Projektmanager "Study of Health in Pomerania (SHIP)"
Institut für Epidemiologie & Sozialmedizin
EMA-Universität Greifswald - Medizinische Fakultät
Walther-Rathenau-Str. 48, D-17487 Greifswald, Germany
URL www.medizin.uni-greifswald.de/epidem/
Phone ++49(0)3834-867713, Fax ++49(0)3834-866684
----------------------------------------------------------------


