2008-09-02

Parsing XML with an Internet connection

I'm trying to use XSLT (Xalan-J in my case) to extract and reformat information from some XML files. These files refer to external entities such as the XHTML DTD at http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd. The SAX parser, even with validation turned off, insists on going out on the Web to fetch that DTD and its brethren.

I'm at my workplace, and access to the Web is through a proxy. There are Java system properties I could set to give my application Web access, but I'd have to include my proxy user ID and password, and the external access might still slow my application down.

I tore my hair out for some time, and this time Googling took a long while to bring me toward a solution.

The solution, in a nutshell, works like this:

  • Manually download all required files to a local directory that will later be included with the app;

  • Provide a catalog redirecting references from their customary URIs to local ones;

  • Include the Apache XML Commons Resolver library;

  • Provide a CatalogManager.properties file to tell the Resolver where to find the catalog;

  • Attach a new CatalogResolver() to the XML reader.


This is helpfully pointed out and very well explained here.
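
To make the last step concrete, here is a minimal sketch of what "attaching a resolver to the XML reader" amounts to. It is not the Resolver library itself: it uses only the JDK's SAX classes, and the class name OfflineEntityResolver, the helper newOfflineReader, and the map of local copies are all made up for illustration. With the Resolver library on the classpath, I'd expect the setEntityResolver call to take a new org.apache.xml.resolver.tools.CatalogResolver() instead, with the catalog file doing the job of the map.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical stand-in for a catalog: maps well-known system IDs
// (the customary URIs) to local copies shipped with the application.
final class OfflineEntityResolver implements EntityResolver {

    private final Map<String, String> localCopies;

    OfflineEntityResolver(Map<String, String> localCopies) {
        // Defensive copy so later changes to the caller's map have no effect here.
        this.localCopies = new HashMap<>(localCopies);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = localCopies.get(systemId);
        if (local == null) {
            // Returning null tells the parser to fall back to its default
            // behavior, which may mean opening a network connection.
            return null;
        }
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }

    // Builds a SAX reader that consults the local copies before anything else.
    static XMLReader newOfflineReader(Map<String, String> localCopies) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new OfflineEntityResolver(localCopies));
        return reader;
    }
}
```

To use such a reader with an XSLT transformation, it can be wrapped in a javax.xml.transform.sax.SAXSource together with an InputSource for the input document, and that SAXSource passed to Transformer.transform in place of a plain StreamSource. The map would hold one entry per downloaded file, for example the XHTML strict DTD URI mapped to a file: URI in the local directory.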
