I don't have any magic settings that will speed up transforms from XML
documents of some arbitrary form into SAS datasets of some arbitrary form.
When you say
"The XMLMAP functionality is used to define the SAS datasets from the XML
files as they have a complex structure."
do you mean that the SAS datasets have complex structures or that the
XML files have complex structures? Sounds like a bit of both. We have seen
many efficiency issues in situations where files with repeating segments
are mapped into SAS 'flat files' with repeating groups of variables. For
each repeating segment, the number of repeating groups of variables has to
equal the maximum number of times that segment repeats within any single
entity. This transformation often creates tens of thousands of variables
and leaves acres of space devoted to missing values.
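To make the 'acres of space' point concrete, here is a small sketch in Python (my choice of language; the segment counts and field count are invented for illustration). It compares a wide layout, with one row per entity and enough repeating variable groups to hold the largest entity, against a long layout with one row per segment.

```python
# Hypothetical sample: each entity (e.g. a customer) has a variable number of
# repeating segments (e.g. orders), and each segment carries a few fields.
segments_per_entity = [1, 3, 2, 250, 1]   # made-up counts for illustration
fields_per_segment = 4

def wide_layout_cells(counts, fields):
    """Variables and cells in a 'flat file' with one row per entity and
    enough repeating variable groups to hold the largest entity."""
    n_vars = max(counts) * fields
    return n_vars, len(counts) * n_vars

def long_layout_cells(counts, fields):
    """Cells in a relational child table with one row per segment."""
    return sum(counts) * fields

n_vars, wide = wide_layout_cells(segments_per_entity, fields_per_segment)
long_cells = long_layout_cells(segments_per_entity, fields_per_segment)
# Every real value lands somewhere in both layouts; the rest of the wide
# layout is padding filled with missing values.
print(f"wide layout: {n_vars} variables, {wide} cells, "
      f"{wide - long_cells} of them missing")
print(f"long layout: {long_cells} cells, none missing")
# prints:
# wide layout: 1000 variables, 5000 cells, 3972 of them missing
# long layout: 1028 cells, none missing
```

A single outlier entity with 250 segments is enough to force 1000 variables onto every row. That is the kind of size estimate worth doing before choosing a target structure.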
It should not take too much data modelling to estimate the size and scope
of the result of a transformation. In some cases it makes sense to map XML
to a relational database. SAS can then remap the information it needs into
any required data structure.
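As a rough sketch of the relational route, the following uses Python's standard library rather than any SAS tool. It streams a document with `xml.etree.ElementTree.iterparse` and writes one parent table plus one child table with a row per repeating segment. SAS could then read and reshape those tables. The element names (`customer`, `order`, `item`, `qty`) and the function `xml_to_tables` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: <customer> entities with a variable number of <order>
# segments. Element and attribute names are made up for illustration.
SAMPLE = b"""<customers>
  <customer id="c1"><name>Ann</name>
    <order><item>pen</item><qty>2</qty></order>
    <order><item>ink</item><qty>1</qty></order>
  </customer>
  <customer id="c2"><name>Bob</name>
    <order><item>pad</item><qty>5</qty></order>
  </customer>
</customers>"""

def xml_to_tables(source, parent_out, child_out):
    """Stream <customer> elements into a parent table and an order child
    table (one row per repeating segment, keyed by customer id)."""
    parents = csv.writer(parent_out)
    children = csv.writer(child_out)
    parents.writerow(["customer_id", "name"])
    children.writerow(["customer_id", "seq", "item", "qty"])
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "customer":
            continue  # wait until a whole <customer> subtree has been read
        cid = elem.get("id")
        parents.writerow([cid, elem.findtext("name")])
        for seq, order in enumerate(elem.findall("order"), start=1):
            children.writerow(
                [cid, seq, order.findtext("item"), order.findtext("qty")])
        # Free the finished subtree; only an empty shell stays on the root.
        elem.clear()

parent_buf, child_buf = io.StringIO(), io.StringIO()
xml_to_tables(io.BytesIO(SAMPLE), parent_buf, child_buf)
print(parent_buf.getvalue())
print(child_buf.getvalue())
```

The child table grows by rows rather than by columns, so an entity with many segments costs only as many rows as it actually has.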
Vendors now provide specialized caching systems for XML parsing. That
suggests to me that XML parsing has a long way to go before it matches the
efficiency of other database transfer methods. XML standards will probably
have to change to get around the intrinsic inefficiencies that limit
database transfer rates. All of those tags inevitably cost storage and
processing time. Better mapping technologies can do only so much.
Sig
-----Original Message-----
From: Geoff [mailto:gcd_smith@HOTMAIL.COM]
Sent: Wednesday, January 29, 2003 11:46 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: XMLMAP and large files
I am loading XML files into SAS v8.2 (Solaris) using the new XMLMAP
technology (downloaded from www.sas.com). The XMLMAP functionality is used
to define the SAS datasets from the XML files, as they have a complex
structure.
It is possible to load in an XML file containing 1000 'records' but a file
containing 10000 records causes SAS to hang or to terminate abnormally
(core dump). Ideally, I need to load about 300,000 records although 50,000
would be acceptable.
I am also performing an XSLT transform on the XML file so that I can get
around a problem caused by the XMLMAP functionality (this problem has been
verified by SAS). I am currently using the Xalan-Java XSLT processor, and
unfortunately it is taking a very long time to perform the operation: about
17 hours for a 200 MB file. Does anyone know how to make this quicker? (I am
using the SAX option.)
Any help would be greatly appreciated.
Many thanks, Geoff