


 mswordview.1()                                               mswordview.1()




 NAME
      mswordview - convert Word 8 files to HTML

 SYNOPSIS
      mswordview [-v] [--version] [-n] [--nocredits] [-c] [--corehtmlonly]
      [-f points] [--defaultfontsize points] [-w type] [--horizontalwhite type]
      [-u type] [--verticalwhite type] [-t seconds] [--timeout seconds] [-s url]
      [--symbolurl url] [-p url] [--patternurl url] [-d url] [--wingdingurl url]
      [-h] [--ignoreheadings] [-a] [--noannotations] [-m] [--mainonly] [-b]
      [--riskbadole] [-e] [--nofontfaces] [-o filename] [--outputfile filename]
      [-g erroroutputfile] [--errorfile erroroutputfile] [-y tabvalue]
      [--tabsize tabvalue] [-i dir] [--imagesdir dir] [-j url] [--imagesurl url]
      [-k] [--notablewidth] filename

 DESCRIPTION
      mswordview breaks the OLE Word document into its component streams,
      and then converts the document and its graphics to HTML.

    OPTIONS
      -v, --version
           Output program version.

      -n, --nocredits
           Don't append credits at the end of the HTML output.

      -c, --corehtmlonly
           Don't put <html> and </html> around the output.

      -f points, --defaultpointsize points
           The base point size for mswordview is 10 (as in MS Word). You
           can change this to a different size if you feel that your output
           is too large; otherwise, for example, a 12 point font becomes an
           HTML font +2, which can look too big.

           An aside: many of the files that mswordview outputs are tagged
           as being in Unicode. Often this turns out to be unnecessary, but
           there is sometimes no sure way to know whether the header is
           needed, short of examining every single character in advance to
           see if it falls into the ASCII range. Netscape will therefore
           use a Unicode font, and as most European readers will never have
           read a document in this font, they won't have customized the
           Unicode base font size as they might have done for the Western
           font size. So if you have set your usual language encoding font
           size away from the default, do the same for your Unicode font.
           Sorry about the long entry ;-)
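
           As a rough illustration of the arithmetic described above, the
           sketch below assumes that the relative HTML font size is simply
           the Word point size minus the base point size. The function name
           and the linear mapping are assumptions made for illustration;
           the mapping mswordview really uses may differ.

```python
def relative_font_size(points: int, base: int = 10) -> str:
    """Return a hypothetical relative HTML font size such as '+2'.

    Assumption: the size offset is just the difference between the
    Word point size and the base point size given with -f.
    """
    delta = points - base
    if delta == 0:
        return ""  # same as the base size, so no size change is needed
    return f"{delta:+d}"  # always carries an explicit sign


if __name__ == "__main__":
    # With the default base of 10, 12 point text is two steps larger.
    print(relative_font_size(12))
    # Raising the base to 12 leaves 12 point text at the normal size.
    print(repr(relative_font_size(12, base=12)))
```

           Under that assumption, raising the base with -f shrinks every
           offset by the same amount, which is why it helps when the output
           looks too big.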

      -w type, --horizontalwhite type
           Converting formatting done in Word with whitespace such as
           spaces and tabs is quite difficult. In HTML output there's no
           easy way to get nicely lined-up text using spaces, so whitespace
           padding looks awful, but of course so does making no attempt at
           formatting. So I have given six options. The default type is 0,
           but I am beginning to think that 2 is the best option really.
           0 convert runs of more than one space into hardcoded spaces, i.e.
           &nbsp;, and convert tabs into a clear gif with width equal to the
           tabsize option.

                                    - 1 -       Formatted:  January 15, 2025

           1 convert runs of more than one space into hardcoded spaces, i.e.
           &nbsp;, and convert tabs into a run of &nbsp;'s.
           2 convert runs of more than one space into hardcoded spaces, i.e.
           &nbsp;, but don't convert tabs into anything.
           3 don't convert spaces into anything at all, but convert tabs
           into a clear gif with width equal to the tabsize option.
           4 don't convert spaces into anything at all, but convert tabs
           into a run of &nbsp;'s.
           5 don't convert spaces into anything at all, and don't convert
           tabs into anything at all.
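
           The six types can be summarised in a small sketch. This is not
           mswordview's code: the function name, the clear.gif file name,
           the exact <img> markup, and the choice to replace every space in
           a run are all assumptions made for illustration.

```python
import re

NBSP = "&nbsp;"


def convert_horizontal(text: str, mode: int = 0, tabsize: int = 8,
                       gif_url: str = "clear.gif") -> str:
    """Apply one of the six whitespace types (0 to 5) to a line of text.

    tabsize is a pixel width for types 0 and 3, and a count of hard
    spaces for types 1 and 4. gif_url is a hypothetical image location.
    """
    if mode not in range(6):
        raise ValueError("mode must be between 0 and 5")
    if mode in (0, 1, 2):
        # Runs of two or more spaces become hard spaces, one for one.
        text = re.sub(r" {2,}", lambda m: NBSP * len(m.group()), text)
    if mode in (0, 3):
        # Tabs become a transparent image that is tabsize pixels wide.
        img = f'<img src="{gif_url}" width="{tabsize}" height="1">'
        text = text.replace("\t", img)
    elif mode in (1, 4):
        # Tabs become a run of tabsize hard spaces.
        text = text.replace("\t", NBSP * tabsize)
    # Types 2 and 5 leave tabs alone; types 3, 4 and 5 leave spaces alone.
    return text


if __name__ == "__main__":
    sample = "name  value\tcomment"
    for mode in range(6):
        print(mode, convert_horizontal(sample, mode, tabsize=4))
```

           Running the loop on a line that contains both a run of spaces
           and a tab makes it easy to compare the types side by side.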

      -u type, --verticalwhite type
           What to do with multiple line breaks is set here. There are three
           options for type:
           0 the default. A single line break becomes a <br>, but if there's
           a run of more than one, the first two are transformed into a <p>,
           and any further ones are output as <br>. The intention here is
           to retain the meaning that Word usually associates with two line
           breaks, namely the end of a paragraph, while supporting the fact
           that users of Word often whack away madly at the return key to
           try to force formatting decisions by that mechanism.
           1 replaces each line break one for one with a <br>.
           2 replaces a single line break with <br>, and a run of more than
           one (no matter how long) with a single <p>.
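
           The three types can be sketched as follows. This is only an
           illustration of the rules above, not mswordview's code; it
           assumes line breaks appear as newline characters in the input,
           and the function name is hypothetical.

```python
import re


def convert_vertical(text: str, mode: int = 0) -> str:
    """Translate runs of line breaks according to types 0, 1 and 2."""
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")

    def replace_run(match: "re.Match[str]") -> str:
        count = len(match.group())
        if mode == 1 or count == 1:
            # Type 1 always maps one for one; a lone break is always <br>.
            return "<br>" * count
        if mode == 0:
            # First two breaks become <p>, each extra break a <br>.
            return "<p>" + "<br>" * (count - 2)
        # Type 2: any run longer than one collapses to a single <p>.
        return "<p>"

    return re.sub(r"\n+", replace_run, text)


if __name__ == "__main__":
    sample = "one\ntwo\n\nthree\n\n\n\nfour"
    for mode in (0, 1, 2):
        print(mode, convert_vertical(sample, mode))
```

           Types 0 and 2 differ only for runs of three or more breaks,
           where type 0 keeps the extra vertical space and type 2 drops it.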


      -t seconds, --timeout seconds
           Time out after so many seconds. Useful if you use this as a web
           gateway, because there's no one watching the conversion process
           to realize that it's gone into a busy loop.

      -s url, --symbolurl url
           This is the URL that will be used to find the gif pics that are
           used for displaying the MS Symbol font. Not the tidiest of
           solutions for the problem, but it works.

      -d url, --wingdingurl url
           This is the URL that will be used to find the gif pics that are
           used for displaying the MS Wingdings font. Not the tidiest of
           solutions for the problem, but it works.

      -p url, --patternurl url
           This is the URL that will be used to find the background patterns
           that MS Word can use as backdrops for table cells. This is hardly
           the most important of MS Word's features, but there's always
           someone bleating for some feature that appears ridiculous to me
           to be included, so here this one is in all its glory. This
           directory is also used for any extra graphics that mswordview
           might use, e.g. the clear gif optionally used for tabs.



      -h, --ignoreheadings
           Don't convert MS Word heading types into HTML heading levels.
           Users sometimes use heading types inappropriately; if the user
           used heading types but changed their attributes so that they no
           longer suit HTML heading levels, use this option.

      -a, --noannotations
           By default mswordview will output annotations, but MS Word itself
           doesn't print annotations when outputting to paper, so use this
           option to leave them out.

      -m, --mainonly
           With this option, no footers or headers are shown.

      -b, --riskbadole
           With this option on, mswordview will attempt to decode files
           whose OLE tables are corrupt. More than likely the broken Word
           file will crash mswordview, and crash it hard.

      -e, --nofontfaces
           With this option set, mswordview won't insert font face tags. As
           it stands, font faces are on by default, but this feature is
           alpha, so it is only supported for ASCII-based languages (i.e.
           Western European only) and then only under certain conditions,
           as it is surprisingly difficult to be sure which of a few choices
           is the correct font to use otherwise.

      -o filename, --outputfile filename
           Set the filename to place output in. Use - as the filename to
           write to standard output (the screen). By default, output is put
           into a file with the same name as the input file with a .html
           ending. Any graphic files created have the same prefix as this
           file.

      -g filename, --errorfile filename
           Set the filename to place error messages in. The default is
           stderr (the screen).

      -y tabvalue, --tabsize tabvalue
           Specifies either the number of pixels of indentation that a tab
           should be translated into, or the number of hard spaces to
           replace one with; multiples of 8 only work in the second case.
           Read the horizontalwhite entry to understand which one will get
           used. Pixels is the default measurement. This is messy because
           tabs are obviously messy things under HTML, and we'd all be
           better off if they didn't exist at all, but we live in a world
           where they get used for indentation and, worse, alignment, and
           you'll basically just be damn lucky if you see any hint of that
           in the HTML output :-) Tabs basically just don't work.





      -i directory, --imagesdir directory
           Specifies the directory into which the graphics will be saved.
           The default is the same directory that the HTML file is placed
           in. If you use this but intend to move the graphics before
           viewing the HTML, or for some other reason want the HTML to link
           to the graphics with some custom img src URL, then use
           --imagesurl in conjunction with this.

      -j url, --imagesurl url
           Specifies the URL at which the graphics from the Word document
           can be found. The default is the same directory in which
           mswordview put the graphics itself.

      -k, --notablewidth
           With this on, table widths are not specified.
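
      For the web gateway case mentioned under --timeout, a wrapper script
      might assemble the command line and add its own outer time limit. The
      sketch below assumes mswordview is installed on the PATH; the helper
      names build_command and convert and the five-second grace period are
      hypothetical. Only the -t, -o, -i and -j options described above are
      used.

```python
import subprocess
from typing import List, Optional


def build_command(doc: str, output: Optional[str] = None,
                  images_dir: Optional[str] = None,
                  images_url: Optional[str] = None,
                  timeout: Optional[int] = None) -> List[str]:
    """Assemble an mswordview argument list from the documented options."""
    cmd = ["mswordview"]
    if timeout is not None:
        cmd += ["-t", str(timeout)]   # mswordview's own time limit
    if output is not None:
        cmd += ["-o", output]         # "-" means standard output
    if images_dir is not None:
        cmd += ["-i", images_dir]     # where graphics are saved
    if images_url is not None:
        cmd += ["-j", images_url]     # URL used to link to the graphics
    cmd.append(doc)                   # the input file comes last
    return cmd


def convert(doc: str, **options) -> subprocess.CompletedProcess:
    """Run mswordview, with an outer guard slightly above its own timeout."""
    cmd = build_command(doc, **options)
    limit = options.get("timeout")
    guard = None if limit is None else limit + 5  # arbitrary grace period
    # Raises subprocess.TimeoutExpired if the outer guard is exceeded.
    return subprocess.run(cmd, capture_output=True, timeout=guard)


if __name__ == "__main__":
    # Only print the command here; convert() needs mswordview installed.
    print(build_command("report.doc", output="-", timeout=30))
```

      Keeping the argument list separate from the call that runs it makes
      the command easy to log or inspect before anything is executed.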

 BUGS
      I appear to have gone a little mad on the number of command line
      options; I have only 4 letters left: l, q, x and z. Some of these
      options aren't really needed; I don't use any of them myself :-)

      mswordview can be incredibly slow when a document is fastsaved and has
      many tables.

 MORE INFORMATION
      More information is available at
      http://www.gnu.org/~caolan/docs/MSWordView.html or
      http://skynet.csn.ul.ie/~caolan/docs/MSWordView.html

 SEE ALSO
      laola(1), lls(1), elser(1), catdoc(1), word2x(1)

 AUTHOR
       Caolan McNamara
       WWW: http://www.csn.ul.ie/~caolan/
       Mail: Caolan.McNamara@ul.ie

















