|
Hi Venky,
Both of your responses were valid, and Jin Liang also responded with a good
comment as well about a situation involving multiple data sets.
I was not thinking of this in a broad scope. I assumed a situation
involving only one dataset with only one WHERE statement. Of course any
additional statements added, either by mistake or intent, will affect the
outcome in any given data set. But, if you have the same exact single data
set, and you use the same program statements, and the only difference is
the placement of the WHERE statement as an option or a statement, then the
results remain the same. The difference is in what SAS places in the PDV.
It's not my intent to upset anyone, nor do I think of myself as some kind
of officiando. It just so happens that this person asked a question about a
statement that I happened to be learning about at the same time and I
decided to share what I learned, for better or worse.
My apologies to you or to anyone that may have been offended by what I had
to offer. I need friends and support, not enemies. The idea here is that we
share what we know, we learn form each other and our blunders, and we all
become more proficient in the end. Maybe in about 10 years I may be as
knowledgeable as some of y'all, but until then I will keep plugging along
and learning what I can. When I get too far out of bounds, I trust someone
will correct me. I'll learn from those mistakes and march on.
On Mon, 11 Sep 2006 16:55:32 -0400, Venky Chakravarthy <swovcc@HOTMAIL.COM>
wrote:
>Hi Paul,
>
>So what would you need to change in order to have the results to be
>different in TEST1 and TEST2? Notice that I had used the WHERE= data set
>option on the OUTPUT data set and NOT on the INPUT sashelp.class. The point
>I was trying to convey was that the results can be different if the WHERE=
>dataset option is applied on the output. The WHERE statement applies to the
>input as specified. To illustrate, suppose CAROL turned CARL through the
>miracles of modern medicine. To carry this out we assign the new GENDER
>inside the data step. Would the results be the same? Try running:
>
>/* (1) WHERE dataset option */
>data test1 (where = (sex="M")) ;
> set sashelp.class ;
> if upcase(name) = "CAROL" then
> do ;
> sex = "M" ;
> name = "Carl" ;
> end ;
>run ;
>
>/* (2) WHERE statement */
>data test2 ;
> set sashelp.class ;
> where sex = "M" ;
> if upcase(name) = "CAROL" then
> do ;
> sex = "M" ;
> name = "Carl" ;
> end ;
>run ;
>
>Venky Chakravarthy
>
>On Mon, 11 Sep 2006 11:52:59 -0400, Paul St Louis <pstloui@DOT.STATE.TX.US>
>wrote:
>
>>Hi Venky,
>>
>>I am not aware of any differences in the end result according to whether
>>the WHERE option is used as a data statement or a dataset option. As I
>>suspected, in running your sample code below, the end result that is
>>produced is the same.
>>
>>The only difference I know of (and I'm relatively new here, so I can be
>>wrong,) is how the information is organized by SAS to begin with. The
>>dataset option streamlines what SAS sends to the PDV. The data statement
>>option takes what SAS has already sent to the PDV and further narrows down
>>the end results.
>>
>>I encourage the poster to post some example code which produces differing
>>results so that folks can look it over.
>>
>>>>>
>>Hi Paul,
>>
>>This is quite useful if the poster had asked the difference between WHERE
>>and IF. However, the poster asked why there is a difference between (1)
and
>>(2) below:
>>
>>/* (1) WHERE dataset option */
>>data test1 (where = (sex="M")) ;
>> set sashelp.class ;
>>run ;
>>
>>/* (2) WHERE statement */
>>data test2 ;
>> set sashelp.class ;
>> where sex = "M" ;
>>run ;
>>
>>Cheers,
>>
>>Venky Chakravarthy
>>
>>>>>
>>On Fri, 8 Sep 2006 13:25:45 -0400, Paul St Louis <pstloui@DOT.STATE.TX.US>
>>wrote:
>>
>>>From SAS Programming by Example...
>>>
>>>For the Where statement...consider a comparison to the IF statement:
>>>
>>>Are there important differences in the IF and WHERE statement? Both
>produce
>>>identical results, but there are differences between IF and WHERE
>>>statements. The WHERE statement may be more efficient then the subsetting
>>IF
>>>(espcially if you are taking a very small subset from a large file)
>because
>>>it checks on the validity of the condition before the observation is
>>>brought into a temporary holding area, whereas the subsetting IF
statement
>>>brings in the entire observation and then checks the condition to see if
>>>the observation is to be kept or not. This temporary holding area is
>called
>>>the program dta vector(PDF). A WHERE statement can only be used with
>>>variables in the existing data set, whereas a subsetting IF statement can
>>>be used with raw data as well.
>>>
>>>Another difference between a subsetting IF statement and a WHERE
statement
>>>may surface when you use the FIRST. and LAST. logical variables. When the
>>>WHERE condition is not ture, the observation is not brought into the PDV,
>>>and therefore it does not affect the logical values of the FIRST. and
>LAST.
>>>variables.
>>>
>>>Another major difference between IF and WHERE statements is that you may
>>>include WHERE statements in SAS procedures. For example, if you have a
>data
>>>set called ALL(containing the variables ID, SEX, and SALARY), and you
want
>>>a listing only for MALES(M), you could code this as:
>>>
>>>PROC PRINT DATA=ALL;
>>> WHERE SEX = 'M';
>>>RUN;
>>>
>>>This saves the work of creating a new data set just to obtain your
>listing.
>>>
>>>The Where Option limits what is brought into the PDV.
>>>
>>>>>>
>>>On Fri, 8 Sep 2006 12:51:21 -0400, Subscribe SAS-L Anonymous
>>><paulsrowland@YAHOO.COM> wrote:
>>>
>>>>what is the difference between using a where statement in a proc or data
>>>>step versus using the where= option in a proc or data step??
>>>>
>>>>my collegue and I subsetting a dataset using different methods(ie clause
>>vs
>>>>option) on a date field that could possibly be null. We ended up with
>>>>different record counts. Does anyone know why?
>>>>
>>>>thanks!
|