I don't think you can make the statement that a flat file and a SAS infile
statement are always 10x faster. It depends on the sending and receiving
ends and how well those are coded. Parsers vary a lot in speed as you know.
There are new technologies being discussed, such as binary XML. I'm not sure
how it will shake out, but issues with SOAP speed won't be around forever. Too
much is invested in SOAs now, IMO.
On the Windows platform, look at Indigo (I forget the official name), due to
be released in Vista. I haven't played with it yet, but it supposedly
contains binary XML transports.
Savian "Bridging SAS and Microsoft Technologies"
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Sent: Tuesday, February 28, 2006 7:43 AM
Subject: Re: Reading large and complex XML
Sorry to drag this up, but it's relevant to what I'm looking at currently.
This thread was going somewhere before it was diverted by some
In short: XML data transfer, especially the interoperable messaging protocol
built around SOAP, which used to be Simple Object Access Protocol but is now
Service Oriented Architectural Planning. Meant for Web Services.
Problem: increased processing time over "binary" formats. This is discussed
quite usefully on Wikipedia (!):
"Because of the lengthy XML format, SOAP is considerably slower than
competing middleware technologies such as CORBA. Typically, SOAP is about 10
times slower than binary network protocols such as RMI or IIOP. Of course,
this is not an issue when only small messages are sent."
As such, any slow performance is not so much a SAS XML engine limitation as a
limitation of the format itself. While it is possible to reduce the bandwidth
used by large XML messages through compression (set against the additional CPU
overhead at either end), this does nothing to address the processing overhead
required to parse the message, unless, as (Sig?) suggested, the formatting can
at least partially be separated from the data.
IOW a flat file and a SAS infile statement will always be (10x+) faster.
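For example, reading a delimited extract is a trivial one-step job (the path,
delimiter, and variable layout below are only placeholders):

   data work.orders;
      /* placeholder file and layout */
      infile '/extracts/orders.txt' dlm='|' dsd firstobs=2 lrecl=32767;
      input order_id customer_id :$10. order_date :yymmdd10. amount;
      format order_date date9.;
   run;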
The requirement for repeated large XML loads is arguably bad system
architecture. Rather, those loads should only be in the form of an update or
query. Obviously, as time goes on, the capacity of conventional hardware for
XML transfers will increase along with processing and I/O...
On Tue, 17 Jan 2006 09:41:35 -0500, Sigurd Hermansen <HERMANS1@WESTAT.COM>
>Actually the schema comprises the whole of a database's metadata. Good so
far ... The catch comes in where XML packages data elements between tags. The
schema predetermines the header and attributes of each data table. A table
name links these metadata to columns and rows of data values. A table-name
tag and end tag can mark the beginning and end of a table of delimited data.
>In a rough sketch (the table name and rows below are only placeholders), the
layout might look like:
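   <Schema>
      <!-- metadata only: table names, column headers, types, constraints -->
   </Schema>
   <Data-Table name="DEMOG">
   1001,M,1949-05-17
   1002,F,1953-11-02
   </Data-Table>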
>I would argue that transports or replications of very large databases would
work better were it possible to append URLs for each Data-Table to a
Schema. Individual data tables will likely compress to a small fraction of
full size and can be zcat'd through an RDBMS's bulk loader.
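>For instance, each table entry in the Schema might carry nothing more than a
name and a URL (both made up here):

   <Data-Table name="DEMOG" href="http://example.com/extracts/demog.csv.gz"/>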
>An XML extension along these lines would take advantage of the separation
of schema and data and the tabular representation of both. Domain and constraint
tables in a Schema serve as a basis for validating the contents of Data-Tables
and triggering exceptions.
>From: email@example.com on behalf of Alan Churchill
>Sent: Mon 1/16/2006 8:41 PM
>To: 'Sigurd Hermansen'; SAS-L@LISTSERV.UGA.EDU
>Subject: RE: Reading large and complex XML
>What about the XML streams already containing a schema embedded at the top?
>This is good XML practice and should already be there. A good XML parser
>will be able to read in the schema and then do a forward read of the XML,
>parsing appropriately and breaking it into tables.
>The RDBMSs are starting to accommodate XML in and out of relational tables.
>Savian "Bridging SAS and Microsoft Technologies"
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Sigurd
>Sent: Monday, January 16, 2006 9:50 AM
>Subject: Re: Reading large and complex XML
>A couple of years ago we attempted to load GB-sized XML files into SAS and
>found the process unacceptably slow. Since then we opt for forward
>engineering a data model (basically capturing SQL CREATE statements from
>data modelling tools) and streaming data into SAS datasets. The same method
>also works well when using RDBMS bulk load methods to transfer data into an RDBMS.
>XML standards accommodate transfers of very complex data structures.
>Relational database tables, by contrast, are very simple data structures.
>Repetitive tags and extra parsing really drag down performance of the SAS
>XML engine.
>Perhaps those exporting data to your database could transfer metadata
>in XML and provide the actual data as compressed 'flat files'. The Unix
>zcat command in a SAS filename pipe streams data into SAS datasets very quickly.
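>For example (the fileref, path, and variable layout are placeholders):
>
>   filename demog pipe 'zcat /extracts/demog.csv.gz';
>
>   data work.demog;
>      infile demog dlm=',' dsd lrecl=32767;
>      input subjid :$10. sex :$1. birthdt :yymmdd10.;
>      format birthdt date9.;
>   run;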
>The current XML standards seem almost hostile to the idea of a relational
>database. To me that seems shortsighted. While encapsulating databases in
>one text stream sounds like a good idea, why would it not make equal sense
>to encapsulate metadata in a header stream and support bulk loads into
>related tables from separate data streams?
>From: firstname.lastname@example.org [mailto:email@example.com] On
>Behalf Of Jørgen Mangor Iversen
>Sent: Monday, January 16, 2006 8:13 AM
>Subject: Reading large and complex XML
>What do you do when you have to get a large (11GB) XML file into SAS? The
>example in mind has an XMLMAP prepared with the SAS XML Mapper tool,
>14 tables with up to 33 million rows. The box is a powerful HP-UX machine,
>SAS is version 9.1.3, and the XML engine is hopeless! It processes each
>defined table one at a time, at 7 hours apiece! This leads to a running time
>of more than 4 days. This job has to be done daily, as you might imagine.
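>The setup is essentially this (paths are placeholders):
>
>   filename bigxml '/data/extract.xml';
>   filename sasmap '/data/extract.map';   /* generated by SAS XML Mapper */
>   libname  bigxml xml xmlmap=sasmap;
>
>   data work.table1;                      /* one of the 14 tables in the map */
>      set bigxml.table1;
>   run;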
>Am I the only one who has ever come across such a problem? Is there a
>debate somewhere or another forum where SAS vs. XML is discussed?