Date: Wed, 14 Mar 2007 15:23:14 +1100
Reply-To: "Johnson, David" <David.Johnson@CBA.COM.AU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Johnson, David" <David.Johnson@CBA.COM.AU>
Subject: Re: Randomly splitting a dataset
Every day, another nugget of knowledge; a former colleague called them
"factoids".
I thought the SurveySelect procedure was part of some SAS product other
than SAS/STAT. Had I known this two days ago, I would not have coded
about five steps to select one sample record from each of six class
values (A to F), where a random half of the records had values between
0 and 1 and the other half had values between 1 and 2.
Shall I spend an hour now RTFMing and muddle my way through getting this
from the proc, or would you like to take pity on me, Dave, and suggest
some syntax? I've written some code to generate a sample data set.
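/* Sample data: 10,000 clients in six classes (A to F), with a */
/* loss rate spread uniformly between 0 and 2.                 */
Data CLIENTS;
  Do CLIENTID = 1 To 10000 By 1;
    /* Ceil(RanUni()*6) gives an integer from 1 to 6 */
    SECCLASS = Substr( "ABCDEF", Ceil( RanUni( 1234) * 6), 1);
    /* RanUni calls in one step share a single stream seeded by */
    /* the first call, so the seed 7890 here has no effect.     */
    LOSSRATE = RanUni( 7890) * 2;
    Output;
  End;
Run;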
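A possible sketch of the requested syntax, not a tested answer: stratify
on SECCLASS and ask for a simple random sample of size 1 within each
stratum. It assumes the CLIENTS data set from the step above; the output
name ONEPERCLASS and the seed value are arbitrary choices for
illustration. SURVEYSELECT expects its input sorted by the STRATA
variables, hence the PROC SORT first.

```sas
/* SURVEYSELECT requires input sorted by the STRATA variables */
proc sort data=CLIENTS;
  by SECCLASS;
run;

/* Simple random sample of one record per SECCLASS stratum.   */
/* ONEPERCLASS and the seed are hypothetical, chosen freely.  */
proc surveyselect data=CLIENTS out=ONEPERCLASS
                  method=srs sampsize=1 seed=20070314;
  strata SECCLASS;
run;
```

With six classes present in the data, ONEPERCLASS should hold six
observations, one per class.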
Kind regards
David
/* - - - - - - - - - - - - - - - - - - - - -
It is a capital mistake to theorize before one has data.
Insensibly one begins to twist facts to suit theories, instead of
theories to suit facts.
-Sir Arthur Conan Doyle
- - - - - - - - - - - - - - - - - - - - - */
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
David L Cassell
Sent: Wednesday, 14 March 2007 2:46 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Randomly splitting a dataset
kaylom01@ODJFS.STATE.OH.US wrote:
>
>Hello- I was wondering if anyone can help with SAS coding to randomly
>split a dataset into two parts. I have a dataset and I want to randomly
>divide it into two parts so that I can build a logistic regression
>model with one half of the data and then test the model on the second
>half. Any information/suggestions would be greatly appreciated!!! Thank
>you for your help and time.
Here's but one way:
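/* OUTALL keeps every observation and adds the SELECTED flag; */
/* SAMPRATE=50 is read as a percentage, i.e. half the data.   */
proc surveyselect data=YourData out=OutStuff seed=49487
                  outall
                  samprate=50;
run;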
Now you have a new variable SELECTED in your output data set: 50% of
your observations will have SELECTED=1 and the rest will have
SELECTED=0. Split on that using the WHERE= data set option.
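A minimal sketch of that split step, assuming OutStuff is the output of
the PROC SURVEYSELECT call above; TRAIN and VALID are hypothetical data
set names.

```sas
/* Model-building half: observations flagged by SURVEYSELECT */
data TRAIN;
  set OutStuff(where=(Selected=1));
run;

/* Validation half: everything that was not selected */
data VALID;
  set OutStuff(where=(Selected=0));
run;
```

The same data set option also works directly on a procedure's input, for
example data=OutStuff(where=(Selected=1)) on the PROC LOGISTIC
statement, so separate copies are not strictly needed.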
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330