Changes between Version 4 and Version 5 of HtmlToDockbookProject


Timestamp:
Jun 28, 2007, 9:14:01 AM (15 years ago)
Author:
glynos
Comment:

--

  • HtmlToDockbookProject

    v4 v5  
    6060<section id="my_section">
    6161<title>My Section</title>
    62 <para>Some text</p>
     62<para>Some text</para>
    6363</section>
    6464}}}
    6565
    6666Two main problems present themselves.  First, what should the tool do if the original document doesn't validate as XHTML?  Second, there will certainly be a many-to-one mapping from HTML to DocBook.  Is it possible to determine a general solution for this?
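Both problems can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a proposed tool: the `TAG_MAP` table and the function names are hypothetical, the mapping covers just a handful of elements, and the check is for XML well-formedness only (it does not validate against the XHTML DTD).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical mapping table: note that several XHTML tags collapse
# onto the same DocBook element (the many-to-one problem).
TAG_MAP = {
    "p": ("para", {}),
    "em": ("emphasis", {}),
    "i": ("emphasis", {}),
    "strong": ("emphasis", {"role": "bold"}),
    "b": ("emphasis", {"role": "bold"}),
    "code": ("literal", {}),
    "tt": ("literal", {}),
}


def convert(elem):
    """Recursively rebuild an XHTML element as its DocBook counterpart."""
    local = elem.tag.replace(XHTML_NS, "")
    if local not in TAG_MAP:
        # No general rule exists; a real tool would need a policy here.
        raise ValueError(f"no DocBook mapping for <{local}>")
    name, attrs = TAG_MAP[local]
    out = ET.Element(name, attrs)
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(convert(child))
    return out


def xhtml_to_docbook(source):
    """Convert a well-formed XHTML fragment; reject anything else."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError as err:
        # First problem: the input is not even well-formed XML.
        raise ValueError(f"input is not well-formed XML: {err}") from err
    result = convert(root)
    result.tail = None
    return ET.tostring(result, encoding="unicode")
```

Here `<b>` and `<strong>` produce identical output, so the original distinction is lost, and a fragment with mismatched tags is rejected outright rather than repaired.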
     67
     68=== Open Source Solutions ===
     69
     70For me, I'm comfortable with the idea of recommending [http://tidy.sourceforge.net Tidy] to produce validating XHTML.  It's open source and cross-platform.  Furthermore, it's the original author's responsibility to ensure that their input is valid, and I feel that this task falls outside the scope of this sub-project.
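As a rough sketch of what that recommendation might look like in practice, the following Python wrapper calls the `tidy` command-line program. The function names and file names are hypothetical, and it assumes a `tidy` executable is installed and on the `PATH`; only the `-asxhtml`, `-q` and `-o` options are used.

```python
import shutil
import subprocess


def tidy_command(src, dest):
    """Build the argument list for converting src (HTML) to dest (XHTML)."""
    # -asxhtml: emit well-formed XHTML; -q: suppress non-essential output;
    # -o: write the result to the given file.
    return ["tidy", "-asxhtml", "-q", "-o", dest, src]


def run_tidy(src, dest):
    """Run Tidy and return its exit status (0 ok, 1 warnings, 2 errors)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(tidy_command(src, dest))
    return result.returncode
```

A caller could treat an exit status of 2 as a signal to hand the document back to its author rather than attempt a conversion.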
     71
     72For the second point, I have found the following projects that have attempted to address this problem:
     73
     74 1. [http://www.eecs.umich.edu/~ppadala/projects/tidy/ http://www.eecs.umich.edu/~ppadala/projects/tidy/]
     75 2. [http://wiki.docbook.org/topic/Html2DocBook http://wiki.docbook.org/topic/Html2DocBook]
     76
     77The first of these seemed initially promising (and was proposed as a possible solution by Mathias) but I was unable to make it compile.  This makes me wonder whether this project is dead or not.  I've sent an e-mail to the developer and I'm awaiting a response.
     78
     79The second of these, as an XSL stylesheet, seems the more natural solution.  It's still not perfect and doesn't completely obviate the need for manual rechecking and retagging, but I feel that using it and adapting it for our own needs may be fruitful.  I haven't tried hacking the stylesheet yet (this will be the next thing I try), but of the things I've found so far this seems the most promising.
     80
     81
     82=== Conclusion (so far) ===
     83
     84With the (still only limited) investigation I've done so far, I think that the most natural solution for converting one XML format to another is to use an XSL stylesheet.  Short of developing one specifically for this project, it is best to use the one provided in solution 2, as this has been developed by someone who already has a lot of experience with DocBook (I haven't been in touch with him yet).  Further adapting it for Boost's requirements, I feel, may be the most fruitful solution.
     85
    6786
    6887----