Date: Fri, 20 Dec 2002 09:39:05 -0500
Reply-To: Quentin McMullen <Quentin_McMullen@BROWN.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Quentin McMullen <Quentin_McMullen@BROWN.EDU>
Subject: Re: SQLheads (was RE: new "clashvars" macro)
In-Reply-To: <08B08C9FA5EBD311A2CC009027D5BF8104BFC20C@remailnt2-re01.westat.com>
Content-Type: text/plain; charset="iso-8859-1"
Many thanks to Sig, Ian, Dianne, Ed, and David for your comments. (4 out of
5 responses to my post came from SAS Mecca; that's what attracted me to
Westat in the first place. Why did I leave? : )
I am happy to accept Ian's point that I gave up my imagined control long
ago. Indeed, even as I wrote that the SAS data step language allowed me to
communicate "explicit control of the process", it seemed a bit silly to me.
Because of course there is so much happening implicitly (implicit loop of
the datastep, initialization of *some* variables at top of the loop, etc.).
My exploration of SAS started a few years back as a user, with no
programming background. At the time I thought of the SAS data step as a
communication of a "goal" rather than "instructions". That is to say, I
recognized I was writing instructions, but I had no idea how they were being
carried out. And 6 months into my SAS programming career, when I went to an
SI training course, I was confused as to why the instructor spent so much
time talking about the Program Data Vector, which seemed like unnecessary
minutiae to me.
And since then I have come to believe that to understand a SAS data step, I
need to think in terms of the sequential processing of instructions. So
when I'm debugging a data step, I might write out the PDV on paper (still
haven't learned the data step debugger), and work line by line through the
data step. And when I do this, I can (hopefully) understand the wonderful
results that come from a step with a DOW-loop or two, or a merge with
multiple by-values in different data sets, or....
So I suppose my real mistake was hoping that the path I took in learning SAS
(moving from believing in an automagic data step to thinking about
individual processes), would serve me well in other languages. I suppose if
I want to embrace SQL, I need to give up my (imagined) control over
processes, and accept the higher level of abstraction that communication via
SQL allows. And I will admit that the few times I have played with setting
up some data structures in a manner that would make them useful for SQL, it
has always been an educational/enjoyable experience. That is to say,
focussing on logical processing of the data seems to encourage logical
structuring of the data, and vice-versa. Since Sig's bookcase is now more
than 1 staircase away, I suppose I'll have to visit Amazon for a little
Christmas reading.
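
As a concrete illustration of the "instructions" versus "goal" contrast
discussed in this thread, here is a minimal sketch. It assumes a
hypothetical dataset VISITS with variables ID and AMOUNT; none of these
names come from the thread. The DATA step spells out the row-by-row
process with a DOW-loop, the PROC SQL step only states the desired
result, and PROC COMPARE checks the yield of one against the other, in
the spirit of Sig's suggestion.

```sas
/* Hypothetical input: several rows per id, already sorted by id. */
data visits;
  input id amount;
  datalines;
1 10
1 15
2 7
3 4
3 6
;
run;

/* DATA step with a DOW-loop: explicit, sequential instructions.   */
/* TOTAL is reset to missing at the top of each implicit iteration */
/* of the step, so it accumulates within one BY group only.        */
data totals_dow (keep=id total);
  do until (last.id);
    set visits;
    by id;
    total = sum(total, amount);
  end;
  /* the implicit OUTPUT at the bottom writes one row per id */
run;

/* PROC SQL: describe the result and let SAS choose the method. */
proc sql;
  create table totals_sql as
  select id, sum(amount) as total
  from visits
  group by id
  order by id;
quit;

/* Test the yield: compare the two result tables. */
proc compare base=totals_dow compare=totals_sql;
run;
```

If the two steps really are interchangeable, PROC COMPARE should report
no unequal values. The DATA step version depends on VISITS being sorted
by ID; the SQL version does not.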
Kind Regards,
--Quentin
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
> Sigurd Hermansen
> Sent: Thursday, December 19, 2002 6:16 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: SQLheads (was RE: new "clashvars" macro)
>
>
> Bravo, Ian. Despite my penchant for needling a certain SAS-L icon every so
> often, I recognize your expertise in SQL. To what you have stated so well
> already, I would add one simple but important idea. That is, if two
> programs produce the same tabular data object, one can interchange the two
> in a function that yields that same tabular data object. I recall
> overhearing this specific instance of a general principle some years ago
> when Harlan Mills was holding court on the subject of program
> verification. If we design a program as a composition of functions, some
> provided by the programming environment and others by a programmer, the
> program becomes a verifiable composite function. A logic programming
> language such as SQL puts functions that yield tabular data objects in an
> abstract framework that embeds these functions properly (in the sense of a
> logic diagram) within a composite function. SQL queries, like SAS data
> steps before them, encapsulate the underlying operations of devices so
> each query simply yields a tabular data object.
>
> While any one SQL compiler might fail to implement a composite function
> properly, a tested SQL compiler will fail far less often than an average
> programmer working under deadline pressure. This generalization does not
> apply to Quentin, in that we know him to be an exceptionally perceptive
> and thoughtful programmer, but it does have general implications. One can
> improve the accuracy of programming by raising the level of abstraction
> from variables in a sequence of records to tabular data objects (or even
> abstract relations). Rather than trace operations on data items through a
> sequence of assignments, conditional branches, and loops (the
> Turing-complete language of Paul Dorfman), Quentin might test the
> implications of a logical process (a query) applied to a set of tabular
> data objects (SAS datasets) by testing the yield of that query (another
> SAS dataset). While this method of checking a program ignores information
> in the sequencing of rows of data and makes it difficult to view
> intermediate results, it does focus on the relation of the final result to
> the data sources.
>
> Sig
>
> -----Original Message-----
> From: Ian Whitlock
> Sent: Wednesday, December 18, 2002 2:09 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: SQLheads (was RE: new "clashvars" macro)
>
>
> Quentin,
>
> In part you wrote:
>
> >>>>>>>>>>>>
> While I appreciate Sig's compliment, I think in my case the emphasis
> should be placed on "SQLhead **in the making** ". In the fraternity of
> SQLheads (which of course includes too many listers to mention), I'm
> nothing but a pledge, or even just a potential pledge.
>
> I think part of what holds me back from pursuing SQL more (besides
> laziness), is that I'm still troubled that I don't give "directions" to
> SQL. That is, when I write a DATA step I feel like I am communicating
> directions to SAS. When I write SQL, it feels less like I'm writing
> directions, and more like I'm writing a "goal". That is, in SQL, the
> communication seems to be "here is what I want, now do whatever it takes
> to make it". So in SQL I may not know what is actually happening (a sort?
> a hash table? a ...?) to give me what I want. So I imagine it would be
> hard for me to debug SQL code, since each line is not really an
> instruction (in some awkward sense).
>
> I can understand how many folks enjoy this aspect of SQL, i.e. you don't
> *have* to know what it's doing under the hood, and if you really want to
> know, there are ways to find out. But there's some part of me (inner
> control freak?) that likes communicating explicit control of the
> process, rather than just describing the desired outcome.
>
> That said, when every once in a while I play with a SQL step to replace a
> handful of DATA/PROC steps, or see some of the SQL solutions posted on the
> L, I can definitely see the attraction. So I imagine I won't make it too
> long without making a serious effort to expand my toolbox accordingly.
> <<<<<<<<<<<<<<
>
> You gave up control a long time ago, when you decided to write programs
> and get your meat in a grocery store, instead of running it down and
> killing it with a rock.
>
> When you write
>
> x = 2 * x ;
>
> do you care which register(s) the work is done in? Do you care whether it
> was accomplished by shifting the bits or some other means? Do you care
> whether the number was stored in data memory or instruction memory?
> Whatever control you feel is an illusion. Moreover, it doesn't matter,
> because you are more interested in the fact X is doubled than how it got
> that way. Are you not?
>
> You have pinpointed the greatest roadblock to learning SQL, your history
> and the feeling of control. Why do you care whether a sort or a hash table
> was used? Well maybe the files are too big for your machine, then you have
> to care. Otherwise, why? How often is the speed of your machine the
> limiting factor in the problems you solve? The method may be interesting,
> but why should you care in terms of solving the problem? Are you more
> interested in knowing how a problem got solved, or in knowing how to solve
> it? You may have to give up one to make progress with the other.
>
> You know that it is dangerous in solving a problem to take advantage of
> accidents in the data. Now think how much easier it is to control that
> urge when the solution doesn't involve the method and therefore cannot
> take advantage of data accidents.
>
> Just as the DATA step has rules, so SQL has rules. In SQL, it is still a
> matter of knowing what rules produce what results. Notice I say "produce
> what results" not how the results were produced. Some day you will say,
> "What I like about SQL, is that I have control over the results." But then
> the question may be, why do you care about the details of the results? It
> will then be knowing how to manipulate the agents that determine the
> results that will be important.
>
> At every stage you have to give up control of the details of the method to
> get better control over solving harder problems. Striving to understand
> the details is only important in that it helps to provide the rules for
> obtaining the solution. When the knowledge fails to help in providing the
> rules, consider it intellectual entertainment or decide whether you want
> it.
>
> If it helps, look at history. The first programming engineers felt like
> they lost control when they could no longer flip the switches setting the
> program. The 0/1 programmers felt like they lost control with the simple
> mnemonics of assembly language. The assembly programmers fought losing
> control to the Fortran and COBOL compilers. The C programmer fights losing
> control to the SAS compiler. Now you fight losing control to an SQL
> compiler that will decide the method of solution. You belong to an
> illustrious line of losers, but I doubt if you stay there much longer.
> When you start to solve SAS-L problems with SQL you have already committed
> yourself. You just don't know it yet.
>
> IanWhitlock@westat.com