mswordview.1() mswordview.1()
NAME
mswordview - convert word 8 files to html
SYNOPSIS
mswordview [-v] [--version] [-n] [--nocredits] [-c] [--corehtmlonly] [-f points]
[--defaultfontsize points] [-w type] [--horizontalwhite type] [-u type]
[--verticalwhite type] [-t seconds] [--timeout seconds] [-s url] [--symbolurl url]
[-p url] [--patternurl url] [-d url] [--wingdingurl url] [-h] [--ignoreheadings]
[-a] [--noannotations] [-m] [--mainonly] [-b] [--riskbadole] [-e] [--nofontfaces]
[-o filename] [--outputfile filename] [-g erroroutputfile]
[--errorfile erroroutputfile] [-y tabvalue] [--tabsize tabvalue] [-i dir]
[--imagesdir dir] [-j url] [--imagesurl url] [-k] [--notablewidth] filename
DESCRIPTION
mswordview breaks the OLE word document into its component streams,
and then converts the document and its graphics to html.
OPTIONS
-v, --version
Output program version.
-n, --nocredits
Don't append credits at the end of the html output.
-c, --corehtmlonly
Don't put <html> and </html> around output.
-f points, --defaultpointsize points
The base point size for mswordview is 10 (like ms word). You can
change this to a different size if you feel that your output is
too large; otherwise, e.g., a 12 point font becomes an html font +2,
which can look too big. An aside here: many of the files that
mswordview outputs are tagged as being in unicode. Often this
turns out to be unnecessary, but there's no sure way to know
whether the header is needed (short of examining every single
character in advance to see if it falls into the ascii range).
Netscape will therefore use a unicode font, and since most
european readers won't ever have read a document in this font, they
won't have customized the unicode base font size as they might
have done for the western font size. So if you have set your usual
language encoding font size away from the default, then do the
same for your unicode font. Sorry about the long entry ;-)
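As a rough illustration of the arithmetic above, here is a minimal Python
sketch. It assumes the relative html font size is simply the font's point
size minus the base point size, which matches the one example given (12
point against a base of 10 gives +2) but is not confirmed as the exact rule
mswordview applies. The function name html_font_offset is hypothetical.

```python
def html_font_offset(points: int, base_points: int = 10) -> str:
    """Return a relative html font size such as '+2' for a point size.

    Assumption: the offset is the plain difference between the font's
    point size and the base point size (the value passed with -f).
    """
    offset = points - base_points
    # The '+d' format always prints the sign, as html relative sizes need.
    return f"{offset:+d}"


if __name__ == "__main__":
    print(html_font_offset(12))       # default base of 10
    print(html_font_offset(12, 12))   # base raised to 12 with -f 12
```

Under this assumption, raising the base to 12 makes a 12 point font come out
at the browser's normal size instead of two steps larger.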
-w type, --horizontalwhite type
Attempting to convert formatting done in word with whitespace
such as space and tab is quite difficult. In html output there's
no easy way to get nicely lined-up text using spaces, so whitespace
padding looks awful, but of course so does making no attempt at
formatting. So I have given six options (0 to 5); the default type is 0,
but I am beginning to think that 2 is the best option really.
0 convert runs of more than one space into hardcoded spaces, i.e.
&nbsp;, and convert tabs into a clear gif with width equal to the
tabsize option.
- 1 - Formatted: October 31, 2025
1 convert runs of more than one space into hardcoded spaces, i.e.
&nbsp;, and convert tabs into a run of &nbsp;'s.
2 convert runs of more than one space into hardcoded spaces, i.e.
&nbsp;, but don't convert tabs into anything.
3 don't convert spaces into anything at all, but convert tabs into
a clear gif with width equal to the tabsize option.
4 don't convert spaces into anything at all, but convert tabs into
a run of &nbsp;'s.
5 don't convert spaces into anything at all, and don't convert tabs
into anything at all.
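The six types above can be sketched in Python as follows. This is an
illustration of the described behaviour, not mswordview's actual code. The
function name convert_horizontal_white and the file name clear.gif are
hypothetical, and the sketch assumes that every space in a run of two or
more becomes a hard space.

```python
import re

NBSP = "&nbsp;"
# Hypothetical file name for the clear gif; the real one is found via --patternurl.
CLEAR_GIF = "clear.gif"


def convert_horizontal_white(text: str, white_type: int, tabsize: int) -> str:
    """Apply one of the six -w types (0 to 5) to a piece of text."""
    if white_type not in range(6):
        raise ValueError("type must be between 0 and 5")
    # Types 0, 1 and 2: turn runs of two or more spaces into hard spaces
    # (assumption: each space in the run becomes one &nbsp;).
    if white_type in (0, 1, 2):
        text = re.sub(r" {2,}", lambda m: NBSP * len(m.group(0)), text)
    if white_type in (0, 3):
        # Tab becomes a clear gif that is tabsize pixels wide.
        gif = f'<img src="{CLEAR_GIF}" width="{tabsize}" height="1">'
        text = text.replace("\t", gif)
    elif white_type in (1, 4):
        # Tab becomes a run of tabsize hard spaces.
        text = text.replace("\t", NBSP * tabsize)
    # Types 2 and 5 leave tabs untouched.
    return text
```

For example, convert_horizontal_white("a  b\tc", 2, 8) hardens the two
spaces and leaves the tab alone, while type 5 returns the text unchanged.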
-u type, --verticalwhite type
What to do with multiple line breaks is set here. There are three
options for type:
0 the default. A single line break becomes a <br>, but if there's
a run of more than one, then the first two are transformed into a
<p>, and any further ones are output as <br>. The
intention here is to retain the meaning that word usually
associates with two line breaks, which is the end of
the paragraph, while supporting the fact that
users of word often whack away madly at the return key to try to
force formatting decisions by that mechanism.
1 replaces each line break one for one with a <br>.
2 replaces a single line break with <br>, and a run of more than
one (no matter how long) with a single <p>.
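The three types above can be sketched in Python like this. It is an
illustration only, not mswordview's actual code. The function name
convert_vertical_white is hypothetical, and the sketch assumes each line
break appears in the text as a single newline character.

```python
import re


def convert_vertical_white(text: str, white_type: int = 0) -> str:
    """Apply one of the three -u types (0, 1 or 2) to runs of line breaks."""
    if white_type not in (0, 1, 2):
        raise ValueError("type must be 0, 1 or 2")

    def replace_run(match: "re.Match[str]") -> str:
        count = len(match.group(0))
        # Type 1 maps every break to <br>; so does a lone break in any type.
        if white_type == 1 or count == 1:
            return "<br>" * count
        if white_type == 0:
            # The first two breaks become one <p>, the rest stay as <br>.
            return "<p>" + "<br>" * (count - 2)
        # Type 2: any run longer than one collapses to a single <p>.
        return "<p>"

    return re.sub(r"\n+", replace_run, text)
```

With four consecutive line breaks, type 0 yields a <p> followed by two <br>,
type 1 yields four <br>, and type 2 yields a single <p>.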
-t seconds, --timeout seconds
time out after so many seconds. Useful if you use this as a web
gateway, because there's no one watching the conversion process to
realize that it's gone into a busy loop.
-s url, --symbolurl url
this is the url that will be used to find the gif pics that
are used for displaying the ms symbol font. Not the tidiest of
solutions for the problem, but it works.
-d url, --wingdingurl url
this is the url that will be used to find the gif pics that are
used for displaying the ms wingding font. Not the tidiest of
solutions for the problem, but it works.
-p url, --patternurl url
this is the url that will be used to find the background patterns
that msword can use as backdrops for cells of a table. This is
hardly the most important of msword features, but there's always
someone bleating for some feature that appears ridiculous to me
to be included, so here this one is in all its glory. This dir
is also used for any extra graphics that mswordview might use,
e.g. the clear gif optionally used for tab.
-h, --ignoreheadings
don't convert msword heading types into html heading levels.
Sometimes users use heading types inappropriately; if the user
used heading types but changed the attributes so that the heading
type is inappropriate for html heading levels, use this option.
-a, --noannotations
By default mswordview will output annotations, but msword itself
doesn't print annotations when outputting to paper, so to leave
them out use this option.
-m, --mainonly
With this option no footers or headers are shown.
-b, --riskbadole
With this option on, mswordview will attempt to decode files
whose ole tables are corrupt. More than likely the broken word
file will crash mswordview, and crash it hard.
-e, --nofontfaces
With this option set mswordview won't insert fontface tags. As it
stands fontfaces are on by default, but this feature is alpha, so
it is only supported for ascii based languages (i.e. western
european only) and then only under certain conditions, as it is
surprisingly difficult to be sure which of a few choices is the
correct font to use otherwise.
-o filename, --outputfile filename
set the filename to place output in; use - as the filename to
output to standard output (the screen). The default is that
output is put into a file with the same name as the input file with a
.html ending. Any graphic files created have the same prefix as
this file.
-g filename, --errorfile filename
set the filename to place error messages in. The default is
stderr (the screen).
-y tabvalue, --tabsize tabvalue
specifies either the number of pixels of indentation that a tab
should be translated into, or the number of hard spaces to
replace one with; multiples of 8 only work in the second case.
Read the horizontalwhite entry to understand which one will get
used. Pixels is the default measurement. This is messy because
tabs are obviously messy things under html, and we'd all be better
off if they didn't exist at all, but we live in a world where they
get used for indentation and, worse, alignment, and you'll
basically just be damn lucky if you see any hint of that in the
html output :-) Tabs basically just don't work.
-i directory, --imagesdir directory
Specifies the dir into which the graphics will be saved; the
default is the same dir that the html file is placed in. If you
use this but intend to move the graphics before viewing the html
information, or for some other reason you want the html to link
to the graphics with some custom img src url, then use --imagesurl
in conjunction with this.
-j url, --imagesurl url
Specifies the url at which the graphics from the word doc can be
found; the default is the same dir in which mswordview itself put
the graphics.
-k, --notablewidth
With this on, table widths are not specified.
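As a usage illustration for the web gateway case mentioned under --timeout,
here is a minimal Python sketch that builds a command line from the options
documented above (-o - for standard output, -t, -c and -n). The helper names
build_mswordview_command and convert_to_html are hypothetical, the 30 second
default is an arbitrary choice, and the sketch assumes mswordview is
installed and on the PATH.

```python
import subprocess
from typing import List, Optional


def build_mswordview_command(
    doc_path: str,
    output: str = "-",
    timeout_seconds: Optional[int] = None,
    core_html_only: bool = False,
    no_credits: bool = False,
) -> List[str]:
    """Build an argument list using only options listed in OPTIONS."""
    argv = ["mswordview", "-o", output]  # "-" sends the html to standard output
    if timeout_seconds is not None:
        argv += ["-t", str(timeout_seconds)]
    if core_html_only:
        argv.append("-c")  # no <html> and </html> around the output
    if no_credits:
        argv.append("-n")  # no credits appended to the output
    argv.append(doc_path)
    return argv


def convert_to_html(doc_path: str, timeout_seconds: int = 30) -> bytes:
    """Run mswordview and return whatever it wrote to standard output."""
    argv = build_mswordview_command(
        doc_path,
        timeout_seconds=timeout_seconds,
        core_html_only=True,
        no_credits=True,
    )
    # The outer timeout is only a safety net on top of mswordview's own -t.
    result = subprocess.run(argv, capture_output=True, timeout=timeout_seconds + 5)
    return result.stdout
```

The output is returned as bytes rather than decoded text because, as noted
under --defaultpointsize, the html may or may not be tagged as unicode.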
BUGS
I appear to have gone a little mad on the number of command line
options; I have only 4 letters left: l, q, x & z. Some of these options
aren't really needed; I don't use any of them myself :-)
mswordview can be incredibly slow when a document is fastsaved and has
many tables.
MORE INFORMATION
More information may be found at
http://www.gnu.org/~caolan/docs/MSWordView.html or
http://skynet.csn.ul.ie/~caolan/docs/MSWordView.html
SEE ALSO
laola(1), lls(1), elser(1), catdoc(1), word2x(1)
AUTHOR
Caolan McNamara
WWW: http://www.csn.ul.ie/~caolan/
Mail: Caolan.McNamara@ul.ie