HTML to DocBook Project
Problem
A complete conversion tool must also be able to convert existing documentation written in HTML to Quickbook. As things currently stand, good progress is being made on the following part of the document conversion pipeline:
DocBook --[BoostBook + xsltproc]--> HTML --[Quickbook CSS]--> Quickbook
However, this project still lacks an important part:
HTML --[HTML to DocBook (missing)]--> DocBook --> [above pipeline] --> result
The aim of this subproject, then, is to investigate open source solutions to this problem and determine which one works best for Boost.
Converting HTML to DocBook XML
What exactly should this tool do? It should take an HTML document as input (not necessarily valid XHTML) and map its HTML tags to DocBook XML elements. For example:
<h1>My Section</h1> <p>Some text</p>
should become something like:
<section id="my_section"> <title>My Section</title> <para>Some text</para> </section>
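As a rough illustration of this mapping, the following Python sketch uses only the standard library (html.parser and xml.etree.ElementTree). It handles just h1 and p; the convert, make_id and HtmlToDocbook names, and the rule that derives id="my_section" from the heading text, are assumptions made for this example rather than part of any existing tool. Because HTMLParser does not require well-formed input, a missing </p> or </h1> does not stop the conversion.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def make_id(title):
    # Hypothetical id rule: lowercase, runs of non-alphanumerics become "_".
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


class HtmlToDocbook(HTMLParser):
    """Maps <h1> to <section><title> and <p> to <para>; other tags are ignored."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("article")  # wrapper so several sections can coexist
        self.section = None                # most recently opened <section>
        self.current = None                # element currently receiving text

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.section = ET.SubElement(self.root, "section")
            self.current = ET.SubElement(self.section, "title")
        elif tag == "p":
            # A new <p> implicitly ends any unclosed one, as HTML allows.
            parent = self.section if self.section is not None else self.root
            self.current = ET.SubElement(parent, "para")

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self.current = None

    def handle_data(self, data):
        # Text outside a title or para (e.g. whitespace between tags) is dropped.
        if self.current is not None:
            self.current.text = (self.current.text or "") + data


def convert(html):
    parser = HtmlToDocbook()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    # Assign ids only at the end, so an unclosed <h1> still gets one.
    for section in parser.root.iter("section"):
        section.set("id", make_id(section.find("title").text or ""))
    return parser.root


if __name__ == "__main__":
    root = convert("<h1>My Section</h1> <p>Some text</p>")
    print(ET.tostring(root[0], encoding="unicode"))
    # prints: <section id="my_section"><title>My Section</title><para>Some text</para></section>
```

Feeding it the sloppier input <h1>Intro<p>one<p>two still yields one section with id="intro" and two para elements, which hints at how a lenient parser can sidestep the "not valid XHTML" question for simple documents.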
Two main problems present themselves. First, what should the tool do if the original document does not validate as XHTML? Second, the mapping from HTML to DocBook will certainly be many-to-one: several HTML tags, such as <b> and <strong>, may correspond to a single DocBook element. Is it possible to find a general solution to this?
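For the many-to-one problem, one possible approach is an explicit lookup table in which several HTML tags share a DocBook target. The Python sketch below assumes that b and strong map to <emphasis role="bold">, i and em to <emphasis>, and tt and code to <literal>; this table and the INLINE_MAP, InlineConverter and convert_inline names are illustrative choices, not an agreed Boost mapping. It also shows one way to tolerate badly nested input: a close tag closes everything opened after its matching start tag, and unmatched close tags are ignored.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical many-to-one table: several HTML tags share one DocBook target.
INLINE_MAP = {
    "b": ("emphasis", {"role": "bold"}),
    "strong": ("emphasis", {"role": "bold"}),
    "i": ("emphasis", {}),
    "em": ("emphasis", {}),
    "tt": ("literal", {}),
    "code": ("literal", {}),
}


class InlineConverter(HTMLParser):
    """Converts the inline markup of one HTML paragraph into a DocBook <para>."""

    def __init__(self):
        super().__init__()
        self.para = ET.Element("para")
        self.stack = [self.para]  # open DocBook elements, innermost last
        self.open_tags = []       # HTML tag names matching stack[1:]

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_MAP:  # unmapped tags are dropped, their text is kept
            name, attributes = INLINE_MAP[tag]
            self.stack.append(ET.SubElement(self.stack[-1], name, dict(attributes)))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Tolerate bad nesting: a close tag also closes anything opened after
        # its matching start tag; a close tag with no match is ignored.
        if tag in self.open_tags:
            while self.open_tags:
                self.stack.pop()
                if self.open_tags.pop() == tag:
                    break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def convert_inline(html):
    parser = InlineConverter()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return ET.tostring(parser.para, encoding="unicode")


if __name__ == "__main__":
    print(convert_inline("Use <b>bold</b> or <strong>strong</strong> text"))
    # prints: <para>Use <emphasis role="bold">bold</emphasis> or <emphasis role="bold">strong</emphasis> text</para>
```

Keeping the mapping in a single table means a disputed choice can be revised in one place without touching the parser, though it does not by itself answer which DocBook element each HTML tag ought to become.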