LISTSERV at the University of Georgia
Date:   Fri, 2 Jun 2006 10:01:14 -0700
Reply-To:   Mak <makgeha@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Mak <makgeha@GMAIL.COM>
Organization:   http://groups.google.com
Subject:   Messy Data editing HELP!!!!!!!!
Comments:   To: sas-l@uga.edu
Content-Type:   text/plain; charset="iso-8859-1"

I am trying to edit a large data file (about 1 million records) for a dairy cattle survival analysis. The problem is that these are field data collected from farmers, and they contain many irregularities that I want to get rid of. Each cow has 12 variables, with records on different lactations. The response variable is disease, coded from 0 to 9 (each code represents a certain disease, and 0 means no disease reported). The format is as follows: Herd | Cow# | Lactation# | Disease, etc.

The trouble is that some cows are recorded, for example, in their second lactation with a disease (and so were taken out of the herd), yet the same cow appears again in the third lactation, which makes no sense at all. Another problem is that the same cow can have two disease scores in the same lactation, one showing no disease and the other showing a disease. I want to write a program that deals with these cases.

For the first case, the program should look up the lactation number and, if the cow shows up again in the next lactation after a disease was reported, change that disease score to 0.

For the second case, the program should check whether the cow appears in the next lactation. If she does, the record showing a disease is evidently wrong, so I want to delete it and keep the other record. If she does not, I want to delete both records, since there is no basis for judging which one is correct.
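The two rules above might be sketched roughly as follows. This is only an illustration in Python rather than SAS, and it rests on several assumptions that are not part of the original description: records are (herd, cow, lactation, disease) tuples, "appears in the next lactation" is read as "appears in any later lactation", exact duplicate records are collapsed into one, and the function name clean_records is made up.

```python
from collections import defaultdict

def clean_records(records):
    """Apply the two rules to (herd, cow, lactation, disease) tuples.

    Assumptions (not from the original post): any later lactation counts
    as "the cow appears again", and exact duplicates are collapsed.
    """
    # Group the disease codes seen for each cow, per lactation.
    # Cows are keyed by (herd, cow) because cow numbers repeat across herds.
    by_cow = defaultdict(lambda: defaultdict(set))
    for herd, cow, lact, disease in records:
        by_cow[(herd, cow)][lact].add(disease)

    cleaned = []
    for herd, cow in sorted(by_cow):
        lactations = by_cow[(herd, cow)]
        last = max(lactations)
        for lact in sorted(lactations):
            codes = lactations[lact]
            has_later = lact < last
            if len(codes) > 1:
                # Rule 2: conflicting scores in the same lactation.
                if has_later:
                    # The cow stayed in the herd, so keep "no disease".
                    cleaned.append((herd, cow, lact, 0))
                # Otherwise drop every conflicting record for this lactation.
            else:
                (code,) = codes
                # Rule 1: a disease followed by a later lactation is reset to 0.
                if code != 0 and has_later:
                    code = 0
                cleaned.append((herd, cow, lact, code))
    return cleaned
```

In SAS the same idea would probably start from sorting by herd, cow and lactation and then working within each cow's group, but the grouping logic is the part this sketch is meant to show.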

An example of the cases is as follows:

Herd | Cow# | Lactation# | Disease
1 | 1 | 01 | 0
1 | 1 | 02 | 5
1 | 1 | 03 | 2
1 | 1 | 04 | 0
1 | 1 | 05 | 5

Clearly, in this case the disease reports in lactations 2 and 3 are wrong, and I want them changed to 0 or, to be on the safe side, to delete all the records for this specific cow. Note that the same cow number may occur under different herd numbers (herds are the blocking factor).

Herd | Cow# | Lactation# | Disease
1 | 2 | 01 | 0
1 | 2 | 01 | 5
1 | 2 | 02 | 0
1 | 2 | 03 | 0
1 | 2 | 04 | 5

In this case, for lactation 1 we should keep the record that shows no disease and delete the other one or, as in the previous case, delete all the information about cow# 2.
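The safer alternative mentioned for both examples (dropping every record of a cow that shows either kind of inconsistency) could look something like the sketch below. Again this is Python used only for illustration, with assumptions of my own: records are (herd, cow, lactation, disease) tuples, a disease followed by any later lactation counts as inconsistent, and drop_inconsistent_cows is a made-up name.

```python
from collections import defaultdict

def drop_inconsistent_cows(records):
    """Remove all records of any cow with an inconsistent history.

    Assumption (not from the original post): a cow is inconsistent if one
    lactation carries conflicting disease codes, or if a disease is
    reported in a lactation that is followed by a later lactation.
    """
    codes = defaultdict(lambda: defaultdict(set))
    for herd, cow, lact, disease in records:
        codes[(herd, cow)][lact].add(disease)

    bad = set()
    for key, lactations in codes.items():
        last = max(lactations)
        for lact, seen in lactations.items():
            conflicting = len(seen) > 1
            disease_then_later = lact < last and any(c != 0 for c in seen)
            if conflicting or disease_then_later:
                bad.add(key)

    # Keep the original record order; only the flagged cows disappear.
    return [r for r in records if (r[0], r[1]) not in bad]
```

The cost of this choice is that a single bad report removes a cow's whole history, which may matter if many cows are affected.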

A third problem that I am facing is gaps between lactations, for example:

Herd | Cow# | Lactation# | Disease
1 | 3 | 01 | 0
1 | 3 | 03 | 0
1 | 3 | 04 | 2

Information about lactation 2 is missing in this case, so I want to delete all the records on that particular cow.
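The gap check could be sketched along these lines. It is a Python illustration, not SAS, and it assumes things the description does not settle: records are (herd, cow, lactation, disease) tuples, only gaps between a cow's first and last recorded lactation count (a history starting at lactation 2 is not treated as a gap), and drop_cows_with_gaps is a hypothetical name.

```python
from collections import defaultdict

def drop_cows_with_gaps(records):
    """Remove all records of cows whose lactation numbers skip a value.

    Assumption (not from the original post): only gaps between the first
    and last recorded lactation of a cow are considered.
    """
    seen = defaultdict(set)
    for herd, cow, lact, _disease in records:
        seen[(herd, cow)].add(lact)

    # A cow is complete when its lactations form an unbroken run.
    complete = {
        key for key, lacts in seen.items()
        if lacts == set(range(min(lacts), max(lacts) + 1))
    }
    return [r for r in records if (r[0], r[1]) in complete]
```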

I have been cracking my skull on this issue for the past couple of months with no success.

I would really appreciate it if someone out there could help me out.

Thanks everybody.

