 HXNORMALIZE(1)                      7.x                      HXNORMALIZE(1)
 HTML-XML-utils                                               HTML-XML-utils

                                 10 Jul 2011



 NAME
      hxnormalize - pretty-print an HTML file

 SYNOPSIS
      hxnormalize [ -x ] [ -X ] [ -e ] [ -d ] [ -s ] [ -L ] [ -i indent ]
      [ -l line-length ] [ -c commentmagic ] [ file-or-URL ]

 DESCRIPTION
      The hxnormalize command pretty-prints an HTML or XML file, and also
      tries to fix small HTML errors. The output is the same file, but with
      a maximum line length and with optional indentation to indicate the
      nesting level of each line.

 OPTIONS
      The following options are supported:

      -x        Applies XML conventions: empty elements are written with a
                slash at the end (e.g., <IMG />) and, if the input is HTML,
                any < and & inside <style> and <script> elements are escaped
                as &lt; and &amp;. (The input is assumed to be HTML unless
                the -X option is present.) Implies -e.

      -e        Always inserts endtags, even if HTML does not require them
                (for example: </p> and </li>).

      -X        Makes hxnormalize assume the input is well-formed XML. It
                does not try to infer omitted HTML tags, does not assume
                elements such as <img> and <br> are empty, and does not
                treat < and & inside <style> and <script> as normal
                characters.

      -d        Omit the DOCTYPE from the output.

      -i indent Set the number of spaces to indent each nesting level.
                Default is 2.  Not all elements cause an indent. In general,
                elements that can occur in a block environment are started
                on a new line and cause an indent, but inline elements, such
                as EM and SPAN, do not cause an indent.

      -l line-length
                Sets the maximum length of lines.  hxnormalize will wrap
                lines so that all lines are as long as possible, but no
                longer than this length. Default is 72. Words that are
                longer than the line length will not be broken, and will
                extend past this length. (A word is a sequence of characters
                delimited by white space.) The content of the STYLE, SCRIPT
                and PRE elements will not be line-wrapped.

      -s        Omit <span> tags that don't have any attributes.



      -L        Remove redundant lang and xml:lang attributes. (I.e., those
                whose value is the same as the language inherited from the
                parent element.)

      -c commentmagic
                Comments are normally placed right after the preceding text.
                That is usually correct for short comments, but some
                comments are meant to be on a separate line.  commentmagic
                is a string and when that string occurs inside a comment,
                hxnormalize will output an empty line before that comment.
                E.g., -c "====" can be used to put all comments that contain
                ==== on a separate line, preceded by an empty line. By
                default, no comments are treated that way.
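
      The wrapping rule described for the -l option above can be sketched
      in Python. This is only an illustration of the documented rule for
      plain text (lines as long as possible, words never broken), not
      hxnormalize's own code. It ignores markup entirely, so it does not
      break inside tags and does not exempt STYLE, SCRIPT or PRE content.
      The function name wrap_words is hypothetical.

```python
import textwrap
from typing import List


def wrap_words(text: str, line_length: int = 72) -> List[str]:
    """Greedy word wrap: fill each line up to line_length.

    Words longer than line_length are kept whole and extend past the
    limit, mirroring the rule documented for -l (default 72).
    """
    return textwrap.wrap(
        text,
        width=line_length,
        break_long_words=False,  # over-long words stay whole and overflow
        break_on_hyphens=False,  # break only at white space
    )


sample = "short words then averyveryverylongwordthatcannotbesplit and more text"
for line in wrap_words(sample, line_length=20):
    print(line)
```

      With a limit of 20, the long word ends up alone on a line that is
      longer than the limit, while the other lines stay within it.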

 OPERANDS
      The following operand is supported:

      file-or-URL
                The name or URL of an HTML file. If absent, standard input
                is read instead.

 EXIT STATUS
      The following exit values are returned:

      0         Successful completion.

      > 0       An error occurred in the parsing of the HTML file.
                hxnormalize will try to correct the error and produce output
                anyway.
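
      Because output is produced even when the exit value is greater than
      0, a calling script may want to keep the output and record the error
      separately. The following Python sketch shows one way to do that. The
      helper name normalize and its cmd parameter are hypothetical. Only
      the exit-value convention and the -x, -i and -l options come from
      this page.

```python
import shutil
import subprocess


def normalize(html, cmd=("hxnormalize",)):
    """Pipe html through cmd on standard input.

    Returns (output, had_errors). A positive exit value is not treated
    as fatal, because the command still tries to produce output.
    """
    result = subprocess.run(
        list(cmd),
        input=html,
        capture_output=True,
        text=True,
        check=False,  # do not raise on a nonzero exit value
    )
    return result.stdout, result.returncode > 0


if __name__ == "__main__" and shutil.which("hxnormalize"):
    # Runs only when hxnormalize is installed and on the PATH.
    out, had_errors = normalize(
        "<p>one<p>two",
        cmd=("hxnormalize", "-x", "-i", "4", "-l", "60"),
    )
    print(out)
    if had_errors:
        print("input had parse errors; output is a best effort")
```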

 ENVIRONMENT
      To use a proxy to retrieve remote files, set the environment variables
      http_proxy and ftp_proxy.  E.g., http_proxy="http://localhost:8080/"

 BUGS
      The error recovery for incorrect HTML is primitive.

      hxnormalize will not omit an endtag if the white space after it could
      possibly be significant. E.g., it will not remove the first </p> from
      <div><p>text</p> <p>text</p></div>.

      hxnormalize can currently only retrieve remote files over HTTP. It
      doesn't handle password-protected files, nor files whose content
      depends on HTTP cookies.

      When converting from XML to HTML (option -X without option -x), any
      pairs of <![CDATA[ and ]]> are removed and character entities &lt;
      &gt; &quot; &apos; and &amp; are expanded (to <, >, ", ' and &,
      respectively), but any other character entities are not expanded. To
      expand other character entities, pipe the input through hxunent(1)
      first.

      To limit lines to a given number of characters, hxnormalize breaks
      lines at spaces (or inside tags). Some writing systems do not use
      spaces between words and thus hxnormalize may not be able to break
      lines, except at already existing line breaks.

      To make short lines longer, hxnormalize will combine lines and replace
      a line break by a space, except in writing systems that do not put
      spaces between words, where the line break is replaced by nothing.
      hxnormalize currently only does the latter for Japanese, Chinese,
      Korean, Khmer and Thai. (The text must be correctly marked up with
      lang or xml:lang.)
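
      The line-joining rule above can be illustrated with a short Python
      sketch. It is not taken from hxnormalize. The function name
      join_lines is hypothetical, and matching on the primary subtag of
      the lang value (so that "ja-JP" counts as Japanese) is an assumption
      about how the language is recognized.

```python
# Japanese, Chinese, Korean, Khmer and Thai, as ISO 639-1 codes.
NO_SPACE_LANGS = {"ja", "zh", "ko", "km", "th"}


def join_lines(lines, lang):
    """Combine short lines into one line.

    A line break becomes a space, except for the languages listed
    above, where it is replaced by nothing.
    """
    primary = lang.split("-")[0].lower()  # assumed: compare primary subtag
    separator = "" if primary in NO_SPACE_LANGS else " "
    return separator.join(line.strip() for line in lines)


print(join_lines(["Hello", "world"], "en"))
print(join_lines(["こんにちは", "世界"], "ja-JP"))
```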

 SEE ALSO
      asc2xml(1), xml2asc(1), hxunent(1), UTF-8 (RFC 2279)

                                                  Formatted:  April 20, 2024