2008-12-14

HTML Help Gotcha on toc.hhc

If you're trying to create MS HTML Help files (.CHM) programmatically, you may need to generate your own table of contents (TOC).

The documentation available from Microsoft does its best to keep you in the dark: They tell you how to generate it in Help Workshop using their point-and-click GUI. It's serviceable but not much fun to use. No wonder there's such a huge cottage industry of MS Help front ends!

Paul Wise bravely documents what he could reverse-engineer about the innards of the .CHM file, but only devotes a rather short section to the Sitemap format of the TOC, which I quote here:

These formats are based on HTML and use the following doctype:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

The <HEAD> tag contains a <meta> tag providing information on the program that generated the files and a comment indicating the version of the file. e.g.:

<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->

The <BODY> tag contains an <OBJECT> tag that stores properties of the file in <param> tags, followed by a <UL> tag, whose <LI> tags have <OBJECT> tags that store the properties of the Contents/Index items in <param> tags. e.g.:

<BODY>
<OBJECT type="text/site properties">
<param name="Property Name" value="Property Value">

</OBJECT>
<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Property Name" value="Property Value">

</OBJECT>

</UL>
</BODY>

Note that the Property Names and Property Values and tags are not case-sensitive, but HHW will always write all three in the default capitalization, when appropriate.

Note that the tags are mostly in uppercase and the <LI> tag is not closed; this is in compliance with the doctype.

I spent some days generating the TOC from a different structure. To help with eyeball debugging, I indented my HTML nicely and started a new line for each tag, which is not entirely unreasonable. Without the indentation, here's an example of what I created:
<HTML>
<BODY>
<UL>
<LI>
<OBJECT type="text/sitemap">
<param name="Name" value="Heading 1">
<param name="Local" value="h1.htm">
</OBJECT>
</LI>
</UL>
</BODY>
</HTML>

Try as I might, HHW (the Workshop) would not display such a TOC, and HHC (the Compiler) would not produce a working .CHM from it. I spent a long time figuring out what's needed to make it work, as in the following:
<HTML>
<BODY>
<UL>
<LI><OBJECT type="text/sitemap">
<param name="Name" value="Heading 1">
<param name="Local" value="h1.htm">
</OBJECT>
</LI>
</UL>
</BODY>
</HTML>

This appears to be one of the best kept secrets of the HTML Help generating industry:

If you don't put <LI> on the same line as <OBJECT>, the TOC parser will fail!
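If you generate the TOC programmatically, a quick check for this layout can save some head-scratching before the file ever reaches HHW or HHC. Here is a minimal Python sketch; the function name find_split_li_tags is hypothetical and not part of any HTML Help tool. It only encodes the rule observed above, treating whitespace between the two tags as acceptable (as in Paul's "<LI> <OBJECT" example), and is not a full sitemap parser.

```python
import re

# An opening <LI> tag in any case, with optional attributes. "</LI>" does not
# match because the character after "<" must be "l"/"L".
LI_TAG = re.compile(r"<li\b[^>]*>", re.IGNORECASE)
# An <OBJECT> tag, optionally preceded by spaces or tabs on the same line.
OBJECT_AFTER = re.compile(r"\s*<object\b", re.IGNORECASE)


def find_split_li_tags(hhc_text: str) -> list[int]:
    """Return 1-based line numbers where an <LI> tag is not followed by an
    <OBJECT> tag on the same line, the layout the TOC parser chokes on."""
    problems = []
    for lineno, line in enumerate(hhc_text.splitlines(), start=1):
        for match in LI_TAG.finditer(line):
            rest_of_line = line[match.end():]
            if not OBJECT_AFTER.match(rest_of_line):
                problems.append(lineno)
                break  # one report per line is enough
    return problems
```

Run against the first example above, it flags line 4 (the lone <LI>); against the working example, it returns an empty list.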

So the accepted format is essentially an HTML Unordered List of Objects with details in Params, give or take a horrible bug in the scanner/parser.

On the bright side, in my experience the following bits are not essential:

  • As Paul states, tags and even attribute names are accepted in upper or lower case.

  • The stuff in the <HEAD> isn't needed and can be dispensed with.

  • The text/site properties OBJECT is only needed if you want to fiddle with the options.

  • Tags may be consistently closed (XML fashion), including those that HTML leaves unclosed, such as <meta> and <param>. Similarly, it's OK to have closing </LI> tags.
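Putting the same-line rule and these simplifications together, a generator for a flat TOC can stay very small. This is a minimal Python sketch, not a tested production tool: the function name build_hhc and the list of (title, file) pairs as input are hypothetical, nested sections and the text/site properties object are not handled, and escaping &, <, > and " as named entities in param values is my assumption rather than something verified against HHC.

```python
from html import escape


def _attr(value: str) -> str:
    # Assumption: named entities (&amp; &lt; &gt; &quot;) are safe in param values.
    return escape(value, quote=False).replace('"', "&quot;")


def build_hhc(entries: list[tuple[str, str]]) -> str:
    """Build a minimal, flat .hhc TOC from (title, local_file) pairs.

    No <HEAD> and no text/site properties OBJECT are written, since
    neither is essential. Each <LI> shares a line with its <OBJECT>.
    """
    lines = ["<HTML>", "<BODY>", "<UL>"]
    for title, local_file in entries:
        # Keep <LI> and <OBJECT> on one line, or the TOC parser fails.
        lines.append('<LI><OBJECT type="text/sitemap">')
        lines.append(f'<param name="Name" value="{_attr(title)}">')
        lines.append(f'<param name="Local" value="{_attr(local_file)}">')
        lines.append("</OBJECT>")
        lines.append("</LI>")  # optional, but accepted
    lines += ["</UL>", "</BODY>", "</HTML>"]
    return "\n".join(lines) + "\n"
```

For the single entry ("Heading 1", "h1.htm") this reproduces the working example above. When writing the result to toc.hhc, an ANSI code page matching the project's language (for example cp1252) seems a safer choice than UTF-8 for a tool of HHC's vintage, though that is an assumption I haven't tested.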
