LISTSERV at the University of Georgia
Date:         Wed, 6 Oct 2010 13:25:28 -0400
Reply-To:     "Gerstle, John (CDC/OID/NCHHSTP)" <yzg9@CDC.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Gerstle, John (CDC/OID/NCHHSTP)" <yzg9@CDC.GOV>
Subject:      Re: Resubmit: Reading XML files via XML92 getting 0 observation
              datasets
Comments: To: Joe Matise <snoopy369@gmail.com>
In-Reply-To:  <AANLkTin0iF2fc_=BeV9ovjyvdpa1=1_UPmyHqL92yH4N@mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"

Joe,

That's a good idea, and I did try something similar, but it didn't work. The structure of the file isn't conducive to splitting: there's a small node at the top (Header Info, the file's metadata), then the second major node is split into 2 smaller nodes, both with a lot of data in them. I did split off a file containing only the Header node, but that says nothing about the rest of the file. Of course, neither the map nor a shortened version of the map (Header only) worked (the 1 observation still was not read).

I do have a Tech support ticket open.

thanks

John Gerstle
Scientific Information Specialist
Centers for Disease Control and Prevention NCHHSTP\DHAP-SE\QSDMB\Data Management Team
Phone: 404-639-3980
Fax: 404-639-8642
Email: yzg9 at cdc dot gov
Socrates proclaimed: "I came to know one thing; that I know nothing".

"Every question I answer will simply lead to another question."

From: Joe Matise [mailto:snoopy369@gmail.com]
Sent: Wednesday, October 06, 2010 1:12 PM
To: Gerstle, John (CDC/OID/NCHHSTP)
Cc: SAS-L@listserv.uga.edu
Subject: Re: Resubmit: Reading XML files via XML92 getting 0 observation datasets

Not sure what the structure is, but is it splittable into multiple files? If so, can you do that and see whether it's some specific high-level node(s) that fail, or possibly even whether it's just the size?

For example, if you have 4,000 nodes at the second-highest level (or thereabouts) with 70 lines each, can you split that into files of 1,000 nodes, or even 100 nodes, and try reading each one in? If some read in and some don't, you might be able to pinpoint the issue, if it's data-related.

-Joe
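Joe's splitting idea can be tried outside SAS. The following is a minimal Python sketch, not something from the thread, assuming the file fits in memory (a 14 MB file should) and that the repeating nodes to be split are direct children of the root element. The function name `split_children` is hypothetical. Each chunk keeps a copy of the root tag and its attributes, so every piece is a well-formed document that could be read with the same XMLMap. If the repeating nodes sit one level deeper, the wrapper element above them would also have to be copied for the map's paths to keep matching. Note that ElementTree may rename namespace prefixes (e.g. to `ns0`) when it writes the chunks back out.

```python
import xml.etree.ElementTree as ET


def split_children(xml_text: str, chunk_size: int) -> list[str]:
    """Split the root's direct children into documents of at most chunk_size children.

    Hypothetical helper for narrowing down which nodes a reader chokes on.
    Each chunk gets a fresh copy of the root tag and attributes, so every
    returned string is a well-formed XML document on its own.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    root = ET.fromstring(xml_text)
    children = list(root)
    chunks = []
    for start in range(0, len(children), chunk_size):
        # New wrapper element with the same tag and attributes as the original root.
        wrapper = ET.Element(root.tag, dict(root.attrib))
        wrapper.extend(children[start:start + chunk_size])
        chunks.append(ET.tostring(wrapper, encoding="unicode"))
    return chunks


if __name__ == "__main__":
    # Hypothetical sample: five repeating <client> records under one root.
    sample = "<clients>" + "".join(f'<client id="{i}"/>' for i in range(5)) + "</clients>"
    for n, piece in enumerate(split_children(sample, 2), start=1):
        # In practice each piece would be written to its own file and read with
        # the same libname/XMLMap to see which chunks come back with 0 observations.
        print(n, piece)
```

Halving the chunk size on whichever pieces fail is the usual way to converge on a single offending node.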

On Wed, Oct 6, 2010 at 9:14 AM, Gerstle, John (CDC/OID/NCHHSTP) <yzg9@cdc.gov> wrote:

Alan, I have XMLSpy (and DiffDog) and have looked for problems in the XML itself but haven't found anything definitive. The problem file is over 280k lines, so it's not easy to eyeball. I compared it with a smaller XML file that SAS reads without issue and haven't found anything except what looks like some child-child-child nodes that don't line up, but that could be data-driven (some clients have the data and some do not).
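One way to make that structural comparison less of an eyeballing exercise is to list every distinct element path in each file and diff the lists, since an XMLMap's table and column paths have to match those paths. This is a minimal Python sketch, not from the thread: the function names are hypothetical, paths are built from tag names only (attributes are ignored, and namespaced tags appear in ElementTree's `{uri}name` form), and "first-seen order" is only a rough proxy for the node-order differences mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text: str) -> list[str]:
    """Return each distinct element path (e.g. '/root/header') in first-seen document order."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this acts as an ordered set

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        seen.setdefault(path, None)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return list(seen)


def compare_paths(good_xml: str, bad_xml: str) -> dict:
    """Report paths found in only one document and whether shared paths first appear in the same order."""
    good, bad = element_paths(good_xml), element_paths(bad_xml)
    good_set, bad_set = set(good), set(bad)
    return {
        "only_in_good": [p for p in good if p not in bad_set],
        "only_in_bad": [p for p in bad if p not in good_set],
        "same_first_seen_order": [p for p in good if p in bad_set]
        == [p for p in bad if p in good_set],
    }


if __name__ == "__main__":
    # Hypothetical miniature files: the "bad" one has its header after the data
    # and lacks one deeply nested node.
    good = "<r><h/><d><c><x/></c></d></r>"
    bad = "<r><d><c/></d><h/></r>"
    print(compare_paths(good, bad))
```

A path that exists only in the failing file, or a table path the map expects that never occurs, would be a concrete lead to check against the XMLMap.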

SAX vs. DOM: could you define these terms?
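For reference: a DOM (Document Object Model) parser builds the whole document as an in-memory tree that can then be navigated in any order, while a SAX (Simple API for XML) parser streams through the file once and fires a callback for each start tag, end tag, and text run, keeping nothing unless the program stores it. That is the sequential-versus-random distinction raised in John's original question 1. The Python sketch below is not SAS and says nothing about which parser any SAS XML engine uses internally; it only counts elements both ways with the standard library's `xml.dom.minidom` and `xml.sax` modules. The function and class names are hypothetical.

```python
import xml.dom.minidom
import xml.sax


def dom_count(xml_text: str, tag: str) -> int:
    # DOM: the whole tree is built in memory first; any node can then be
    # visited in any order (random access), at the cost of holding it all.
    doc = xml.dom.minidom.parseString(xml_text)
    return len(doc.getElementsByTagName(tag))


class TagCounter(xml.sax.ContentHandler):
    """SAX handler that counts opening tags with a given name."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the text;
        # earlier elements are already gone, so access is sequential only.
        if name == self.tag:
            self.count += 1


def sax_count(xml_text: str, tag: str) -> int:
    handler = TagCounter(tag)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count


if __name__ == "__main__":
    sample = "<root><client/><client><x/></client><header/></root>"
    print(dom_count(sample, "client"), sax_count(sample, "client"))
```

With the DOM version, asking for "the 3rd client" is a list lookup after parsing; with the SAX version, the only way to reach the 3rd client is to read past the first two, which is loosely analogous to the "Access by observation number not available" NOTE in the log.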

Thanks

>>-----Original Message-----
>>From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On
>>Behalf Of Alan Churchill
>>Sent: Tuesday, October 05, 2010 6:13 PM
>>To: SAS-L@LISTSERV.UGA.EDU
>>Subject: RE: Resubmit: Reading XML files via XML92 getting 0 observation
>>datasets
>>
>>John,
>>
>>Look at SAX vs. DOM for why access is limited. It depends on the engine
>>chosen.
>>
>>It is hard to guess what is happening without seeing the XML in question.
>>Have you opened the files in something like XMLSpy to look for
>>differences?
>>
>>Alan
>>
>>Alan Churchill
>>Savian
>>Work: 719-687-5954
>>Cell: 719-310-4870
>>
>>-----Original Message-----
>>From: Gerstle, John (CDC/OID/NCHHSTP) [mailto:yzg9@CDC.GOV]
>>Sent: Tuesday, October 05, 2010 9:26 AM
>>Subject: Resubmit: Reading XML files via XML92 getting 0 observation
>>datasets
>>
>>SAS v9.22, WinXP, XML Mapper
>>
>>I've manually created a map file from a complex schema and am using the
>>XML92 engine to read in the XML data files. I have successfully tested this
>>method on 3 XML files, 1 of which is close to 450 MB in size. Recently, I
>>received a new sample file (only 14 MB), and now it's failing (well, failing
>>in the sense that SAS reads no observations). Interestingly, within XML
>>Mapper, I can use the Table View tab to see the data, correctly mapped. But
>>Base SAS is unable to replicate this. Even SAS Explorer is unable to open
>>any 'tables' to view.
>>
>>Code:
>>
>>libname incoming xml92 "&xml_file"
>>   xmlmap="&xml_map"
>>   xmlschema="&xml_schema"
>>   xmltype=xmlmap
>>   xmlmeta=schemadata;
>>proc print data=incoming.x_headerinfo; run;
>>
>>...where x_headerinfo is the first node of data in the file.
>>
>>Log:
>>NOTE: Processing XMLMap version 1.9.
>>NOTE: Libref INCOMING was successfully assigned as follows:
>>      Engine: XML92
>>      Physical Name: W:\Data_Management\test.xml
>>2111  proc print data=incoming.x_headerinfo; run;
>>
>>NOTE: Access by observation number not available. Observation numbers will
>>      be counted by PROC PRINT.
>>NOTE: No observations in data set INCOMING.x_headerinfo.
>>NOTE: There were 0 observations read from the data set
>>      INCOMING.x_headerinfo.
>>
>>I've added an End Path for the table, which is the same as the Path, set as
>>End, and added an automatic enumerator to the table. No luck on the Base
>>SAS side, but I see correct mapping in the Table View of XML Mapper.
>>
>>I've been researching this problem for the past 2 weeks and have read
>>several really good papers on the subject (Larry Hoyle's recent papers and
>>Lex Jansen's workshop at SGF 2010), but haven't found a reference to this
>>specific problem.
>>
>>I feel that I've missed something in my map, though the map does work for
>>the other data files, so it's possible that the data file in question is
>>problematic.
>>
>>3 Questions:
>>1) What are the reasons why Base SAS is unable to achieve access by
>>observation number in an XML file? (Something to do with sequential reading
>>of the file instead of random reading?)
>>2) Any references to suggest?
>>3) Any suggestions for the above problem?
>>
>>I'm considering having the sender re-create their XML file. The only thing I
>>can find in their file that might be problematic is that the order of nodes
>>is not the same as in one of the other test files that does work.

