Date: Sun, 4 Feb 2007 11:19:35 -0500
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: why infile much faster than proc import
Peter,
While I completely agree with almost everything you said, I definitely
would hope that you are wrong with respect to whether your preferred
practice is in line with mainstream thinking.
My own use of proc import is limited to three tasks:
(1) importing spreadsheets (I still find that more reliably auditable than
using the excel libname engine);
(2) initially investigating a new data structure; and
(3) obtaining a draft code template (i.e., running proc import, pressing
function key F4 to retrieve the actual data step that was submitted, and
then saving and tweaking that code for future use).
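
For what it's worth, task (3) usually ends up looking something like the
sketch below. The file path, the dataset names, and the four columns
(id, region, sale_date, amount) are all made up for illustration and are
not taken from anyone's actual file.

```sas
/* Hypothetical path and column layout; adjust both to the real file. */

/* Step 1: let proc import guess the structure for a first look. */
proc import datafile='C:\data\sales.csv'
            out=work.sales_draft
            dbms=csv
            replace;
    getnames=yes;   /* take variable names from the header row */
run;

/* Step 2: the hand-tuned data step kept for future use. */
data work.sales;
    /* dsd = comma-delimited with quoted values; skip the header row */
    infile 'C:\data\sales.csv' dsd firstobs=2 truncover;
    input id
          region    :$20.        /* character, up to 20 bytes */
          sale_date :yymmdd10.   /* dates such as 2007-02-04  */
          amount;
    format sale_date yymmdd10.;
run;
```

In the second step the types and lengths are declared up front, so nothing
has to be inferred from the data, which is presumably a large part of the
speed difference reported.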
Art
-----
On Sun, 4 Feb 2007 09:18:51 -0500, Peter Crawford
<peter.crawford@BLUEYONDER.CO.UK> wrote:
>On Sat, 3 Feb 2007 14:34:42 -0500, Wensui Liu <liuwensui@GMAIL.COM> wrote:
>
>>I just did a speed comparison of csv file import between infile and
>>proc import and realized infile is much much ... ... faster.
>>
>>what's the trick behind it?
>>
>>thanks.
>>--
>>WenSui Liu
>>A lousy statistician who happens to know a little programming
>>(http://spaces.msn.com/statcompute/blog)
>
>
>WenSui Liu
>
>I'm not surprised with the results you reported.
>
>I know this may not be in mainstream thinking, but I would only use
>proc import for a preliminary look at column structure of a file.
>Even then, I would only do that for an input file of unknown origins.
>
>Generally we know where a file has come from and what its structure
>should be. So, generally, I would use a datastep infile statement to
>connect the process with the data, and input statements to parse the
>data. And I can expect to be precise in my definition and results.
>That is something proc import cannot match when reading plain text,
>because it does not know the information structure, so it has to try
>to discover what columns are present in the input file.
>
>If we want to import from something that is not plain text, why use
>proc import? Instead we can use the relevant SAS library engine to
>deliver the data directly. Under the covers the excel engine
>generates syntax that looks like some flavour of sql.
>
>Proc import will never satisfy my need to be specific and precise
>about parsing input text. The data step provides all the
>flexibility and control for handling text in complex structure.
>For simple structure, data step syntax is probably easier to learn
>than proc import (but that's just my opinion).
>
>Peter Crawford
>Crawford Software Consultancy