
Version 4 (modified by Matias Capeletto, 15 years ago)


HTML to DocBook Project

Problem

A complete conversion tool must be able to convert existing documentation, written in HTML, to QuickBook. As things currently stand, good progress is being made on the following part of the document conversion pipeline:

docbook --[boostbook + xsltproc]--> HTML --[quickbook css]--> quickbook

However, this project still lacks an important part:

HTML --[html to docbook (missing)]--> docbook --[above pipeline]--> result

The aim of this subproject, then, is to investigate open source solutions to this problem and determine which one works best for Boost.

Converting HTML to DocBook XML

What exactly should this tool do? As input, it should take an HTML document (which may not be valid XHTML) and map the HTML tags to DocBook XML. For example:

<h1>My Section</h1>
<p>Some text</p>

should become something like:

<section id="my_section">
<title>My Section</title>
<para>Some text</para>
</section>

Two main problems present themselves. First, what should the tool do if the original document doesn't validate as XHTML? Second, there will certainly be a many-to-one mapping from HTML to DocBook. Is it possible to determine a general solution for this?
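
As a rough illustration of the mapping described in this section, here is a minimal Python sketch. It is not one of the candidate tools, and the names HtmlToDocbook and html_to_docbook are hypothetical. It assumes that only <h1> and <p> need handling, that each <h1> opens a new section, and that the section id is derived from the title text. It uses the standard library's html.parser, which does not require the input to be valid XHTML, so an unclosed <p> is simply ended by the next block tag.

```python
import re
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class HtmlToDocbook(HTMLParser):
    """Hypothetical converter: <h1> -> <section>/<title>, <p> -> <para>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []            # emitted DocBook lines
        self.text = []           # text collected for the open element
        self.current = None      # "h1", "p", or None
        self.in_section = False  # True once a <section> has been opened

    def _flush(self):
        # Emit the DocBook equivalent of whichever element is open.
        text = "".join(self.text).strip()
        if self.current == "h1":
            if self.in_section:
                self.out.append("</section>")
            # Assumption: ids are the lowercased title with "_" separators.
            section_id = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
            self.out.append('<section id="%s">' % section_id)
            self.out.append("<title>%s</title>" % escape(text))
            self.in_section = True
        elif self.current == "p" and text:
            self.out.append("<para>%s</para>" % escape(text))
        self.current = None
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._flush()  # an unclosed <p> ends at the next block tag
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self._flush()

    def handle_data(self, data):
        # Text inside inline tags such as <b> is kept; the tags are dropped.
        if self.current is not None:
            self.text.append(data)

    def result(self):
        self.close()   # process any text still buffered by the parser
        self._flush()
        if self.in_section:
            self.out.append("</section>")
            self.in_section = False
        return "\n".join(self.out)


def html_to_docbook(html):
    parser = HtmlToDocbook()
    parser.feed(html)
    return parser.result()


if __name__ == "__main__":
    print(html_to_docbook("<h1>My Section</h1>\n<p>Some text</p>"))
```

Dropping inline tags such as <b> and <strong> while keeping their text is one crude answer to the many-to-one question; a real tool would need an explicit table stating which DocBook element each HTML tag maps to.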

