


 pavuk(0.9pl25c)                 28 Jan 2000                 pavuk(0.9pl25c)
 Internet utils                                               Internet utils

                                      1



 NAME
      pavuk - HTTP, HTTP over SSL, FTP, FTP over SSL and Gopher recursive
      document retrieval program

 SYNOPSIS
      pavuk [-mode {normal | resumeregets | [-X] [-runX] [-bg/-nobg]
      [-prefs/-noprefs] [-h] [-v] [-logfile $file] [-slogfile $file]
      [-auth_file $file] [-cdir $dir] [-scndir $dir] [-scenario $str]
      [-dumpscn $filename] [-lmax $nr] [-dmax $nr] [-leave_level $nr]
      [-maxsize $nr] [-minsize $nr] [-asite $list] [-dsite $list]
      [-adomain $list] [-ddomain $list] [-asfx $list] [-dsfx $list]
      [-aprefix $list] [-dprefix $list] [-amimet $list] [-dmimet $list]
      [-pattern $pattern] [-url_pattern $pattern] [-rpattern $regexp]
      [-url_rpattern $regexp] [-skip_pattern $pattern]
      [-skip_url_pattern $pattern] [-skip_rpattern $regexp]
      [-skip_url_rpattern $regexp] [-newer_than $time] [-older_than $time]
      [-schedule $time] [-reschedule $nr] [-dont_leave_site/-leave_site]
      [-dont_leave_dir/-leave_dir] [-http_proxy $site[:$port]]
      [-ftp_proxy $site[:$port]] [-ssl_proxy $site[:$port]]
      [-gopher_proxy $site[:$port]] [-ftp_httpgw/-noftp_httpgw]
      [-ftp_dirtyproxy/-noftp_dirtyproxy] [-gopher_httpgw/-nogopher_httpgw]
      [-noFTP/-FTP] [-noHTTP/-HTTP] [-noSSL/-SSL] [-noGopher/-Gopher]
      [-FTPdir/-noFTPdir] [-noCGI/-CGI] [-FTPlist/-noFTPlist]
      [-FTPhtml/-noFTPhtml] [-noRelocate/-Relocate]
      [-force_reget/-noforce_reget] [-nocache/-cache]
      [-check_size/-nocheck_size] [-noRobots/-Robots] [-noEnc/-Enc]
      [-auth_name $user] [-auth_passwd $pass] [-auth_scheme 1/2/3]
      [-auth_reuse_nonce/-no_auth_reuse_nonce] [-http_proxy_user $user]
      [-http_proxy_pass $pass] [-http_proxy_auth 1/2]
      [-auth_reuse_proxy_nonce/-no_auth_reuse_proxy_nonce]
      [-ssl_key_file $file] [-ssl_cert_file $file] [-ssl_cert_passwd $pass]
      [-from $email] [-send_from/-nosend_from] [-identity $str]
      [-auto_referer/-noauto_referer] [-alang $list] [-acharset $list]
      [-retry $nr] [-nregets $nr] [-nredirs $nr]
      [-preserve_time/-nopreserve_time] [-preserve_perm/-nopreserve_perm]
      [-preserve_slinks/-nopreserve_slinks] [-bufsize $nr] [-maxrate $nr]
      [-minrate $nr] [-user_condition $str] [-cookie_file $file]
      [-cookie_send/-nocookie_send] [-cookie_recv/-nocookie_recv]
      [-cookie_update/-nocookie_update] [-cookies_max [-disable_html_tag
      $TAG,[$ATTRIB][;...]] [-enable_html_tag $TAG,[$ATTRIB][;...]]
      [-tr_del_chr $str] [-tr_str_str $str1 $str2] [-tr_chr_chr [-index_name
      $str] [-store_index/-nostore_index] [-store_name $str]
      [-debug/-nodebug] [-debug_level $level] [-browser $str]
      [-urls_file $file] [-file_quota $nr] [-trans_quota $nr]
      [-fs_quota $nr] [-fnrules $t $m $r] [-store_info/-nostore_info]
      [-all_to_local/-noall_to_local] [-sel_to_local/-nosel_to_local]
      [-all_to_remote/-noall_to_remote] [-url_strategie $strategie]
      [-remove_adv/-noremove_adv] [-adv_re $RE] [-check_bg/-nocheck_bg]
      [-send_if_range/-nosend_if_range] [-sched_cmd $str]
      [-unique_log/-nounique_log] [-post_cmd $str] [-ssl_version $v]
      [-unique_sslid/-nounique_sslid] [-aip_pattern $re] [-dip_pattern $re]
      [-use_http11/-nouse_http11] [-local_ip $addr] [-request $req]
      [-formdata $req] [-nthreads $nr] [-immesg/-noimmesg] [-dumpfd $nr]
      [URLs]



                                    - 1 -          Formatted:  April 19, 2024









      pavuk -mode {normal | singlepage | [-base_level $nr]

      pavuk -mode sync [-ddays $nr] [-subdir $dir] [-remove_old/-
      noremove_old]

      pavuk -mode resumeregets [-subdir $dir]

      pavuk -mode linkupdate [-X] [-h] [-v] [-cdir $dir] [-subdir

      pavuk -mode reminder [-remind_cmd $str]

 DESCRIPTION
      This manual page describes how to use pavuk. Pavuk can be used to
      mirror contents of internet/intranet servers and to maintain copies in
      a local tree of documents.  Pavuk stores retrieved documents in
      locally mapped disk space. The structure of the local tree is the same
      as the one on the remote server. Each supported service (protocol) has
      its own subdirectory in the local tree.  Each referenced server has
      its own subdirectory in these protocol subdirectories, followed by
      the port number on which the service resides, delimited by a character
      which can be changed. With the option -fnrules you can change the default
      layout of the local document tree, without losing link consistency.
      With pavuk it is possible to have up-to-date copies of remote
      documents in the local disk space.
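
      The default layout described above can be sketched in a few lines of
      Python. This is only an illustration, not pavuk's own code: the helper
      name local_path, the underscore delimiter (taken from the
      ftp.xx.org_21 example later in this page) and the table of default
      ports are assumptions.

```python
from urllib.parse import urlsplit

# Assumed default ports per protocol; pavuk's own table may differ.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def local_path(url, delimiter="_"):
    """Map a URL to a relative path of the form protocol/host<delim>port/path."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return f"{parts.scheme}/{parts.hostname}{delimiter}{port}{parts.path}"


print(local_path("ftp://ftp.xx.org/pub/pavuk/pavuk-current.tgz"))
```

      Options such as -fnrules change this mapping, so the sketch covers
      the default case only.
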
      As of version 0.3pl2, pavuk can automatically restart broken
      connections, and reget partial content from an FTP server (which must
      support the REST command), from a properly configured HTTP/1.1 server,
      or from a HTTP/1.0 server which supports Ranges.
      As of version 0.6 it is possible to handle configurations via so
      called scenarios.  The best way to create such a configuration file is
      to use the X Window interface and simply save the created
      configuration. The other way is to use the -dumpscn switch.
      As of version 0.7pl1 it is possible to store authentication
      information in an authinfo file, which pavuk can then parse and use.
      As of version 0.8pl4 pavuk can fetch documents for use in a local
      proxy/cache server without storing them in the local document tree.
      As of version 0.9pl4 pavuk supports SOCKS (4/5) proxies if you have
      the required libraries.
      As of version 0.9pl12 pavuk can preserve permissions of remote files
      and symbolic links, so it can be used for powerful FTP mirroring.
      Pavuk supports SSL connections to FTP servers, if you specify ftps://
      URL instead of ftp://.
      Pavuk can automatically handle file names containing characters that
      are unsafe for the filesystem.  So far this is implemented only for
      the Win32 platform, and it is hardcoded.
      Pavuk can now use the HTTP/1.1 protocol for communication with HTTP
      servers.  It can use persistent connections, so one TCP connection
      can be used to transfer several documents without closing it. This
      feature saves network bandwidth and also speeds up network
      communication.
      Pavuk can do configurable POST requests to HTTP servers and also
      supports file uploading via HTTP POST requests.
      Pavuk can run a configurable number of concurrently running download
      threads when compiled with multithreading support.


 Format of supported URLs
      HTTP
      http://[[user][:password]@]host[:port][/document]
      [[user][:password]@]host[:port][/document]

      HTTPS
      https://[[user][:password]@]host[:port][/document]
      ssl[.domain][:port][/document]

      FTP
      ftp://[[user][:password]@]host[:port][/relative_path]
      ftp://[[user][:password]@]host[:port][//absolute_path]
      ftp[.domain][:port][/document]

      FTPS
      ftps://[[user][:password]@]host[:port][/relative_path]
      ftps://[[user][:password]@]host[:port][//absolute_path]
      ftps[.domain][:port][/document]

      Gopher
      gopher://host[:port][/type[document]]
      gopher[.domain][:port][/type[document]]



 OPTIONS
       All options are case insensitive.


 List of options chapters
      Mode
      Help
      Indicate/Logging/Interface options
      Special start
      Scenario/Task options
      Directory options
      Preserve options
      Proxy options
      Proxy Authentication
      Protocol/Download Options
      Authentication
      Site/Domain Limitation Options
      Limitation Document properties
      Limitation Document name
      Limitation Protocol Option






      Other Limitation Options
      Cookie
      Filename/URL Conversion Option
      Other Options


 Mode
      -mode {normal , linkupdate , sync
           set operation mode.
           normal - retrieves documents recursively
           linkupdate - updates remote URLs in local HTML documents to local
           URLs if these URLs exist in the local tree
           sync - synchronizes remote documents with the local tree (if a
           local copy of a document is older than the remote one, the
           document is retrieved again, otherwise nothing happens)
           singlepage - URL is retrieved as one page with all inline objects
           (picture, sound ...)
           resumeregets - pavuk scans the local tree for files that were not
           retrieved fully and retrieves them again (uses partial get if
           possible)
           singlereget - get URL until it is retrieved in full
           dontstore - transfer page from server, but don't store it to the
           local tree. This mode is suitable for fetching pages that are
           held in a local proxy/cache server.
           reminder - used to inform the user about changed documents
           ftpdir - used to list the contents of FTP directories

           The default operation mode is normal.


 Help
      -h   print a long, verbose help message

      -v   print version information and the compile-time configuration.


 Indicate/Logging/Interface options
      -quiet
           Don't show any messages on the screen.

      -verbose
           Force output messages to be shown on the screen (default)

      -progress/-noprogress
           show retrieving progress while running in the terminal.

      -stime/-nostime
           show start and end time of transfer.







      -xmaxlog $nr
           maximal number of log lines in the Log widget. 0 means unlimited.
           This option is available only when compiled with the GTK+ GUI.

      -logfile $file
           file where all produced messages are stored.

      -unique_log/-nounique_log
           When the log file specified with the -logfile option is already
           used by another process, try to generate a new unique name for
           the log file.

      -slogfile $file
           file to store short logs in. This file contains one line of
           information per processed document.  This is meant to be used in
           connection with any sort of script to produce some statistics,
           for validating links on your website, or for generating simple
           sitemaps.  Multiple pavuk processes can use this file
           concurrently, without overwriting each other's entries.  Record
           structure:

           - PID of pavuk process
           - TIME current time
           - COUNTER in the format current/total number of URLs
           - STATUS contains the type of the error: FATAL, ERR,
             WARN or OK
           - ERRCODE is the number code of the error
             (see errcode.h in pavuk sources)
           - URL of the document
           - PARENTURL first parent document of this URL
             (when it doesn't have parent - [none])
           - FILENAME is the name of the local file the
             document is saved under
           - SIZE size of requested document if known
           - DOWNLOAD_TIME time taken to download this
             document, in the format seconds.milliseconds
           - HTTPRESP contains the first line of the HTTP server
             response

      -language $str
           native language that pavuk should use for communication with its
           user (works only when there is a message catalog for that
           language). GNU gettext support (for message internationalization)
           must also be compiled in.

      -gui_font $font
           font used in the GUI interface. To list available X fonts use the
           xlsfonts command. This option is available only when compiled
           with GTK+ GUI support.






 Special start
      -X   start program with X Window interface (if compiled with support
           for GTK+).

      -runX
           When used together with the -X option, pavuk starts processing of
           URLs immediately after the GUI window is launched. Without the -X
           given, this option doesn't have any effect.  Only available when
           compiled with GTK+ support.

      -bg/-nobg
           This option allows pavuk to detach from its terminal and run in
           background mode.  Pavuk will not output any messages to the
           terminal then. If you want to see messages, you have to use the
           -logfile option to specify a file where messages will be
           written.

      -check_bg/-nocheck_bg
           Normally, programs sent into the background after being run in
           foreground continue to output messages to the terminal.  If this
           option is activated, pavuk checks if it is running as background
           job and will not write any messages to the terminal in this case.
           After it becomes a foreground job again, it will start writing
           messages to terminal in the normal way.  This option is available
           only when your system supports retrieving of terminal info via
           tc*() functions.

      -prefs/-noprefs
           When you turn this option on, pavuk will preserve all settings
           when exiting, and when you run pavuk with GUI interface again,
           all settings will be restored.  The settings will be stored in
           the ~/.pavuk_prefs file.  This option is available only when
           compiled with GTK+.

      -schedule $time
           Execute pavuk at the time specified as parameter. The format of
           the $time parameter is YYYY.MM.DD.hh.mm.  You need properly
           configured scheduling with the at command on your system to use
           this option.
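
           A value for the $time parameter can be produced from a date and
           time as sketched below. The helper name schedule_arg is
           hypothetical, and the sketch assumes that every field is
           zero-padded to the width shown in the YYYY.MM.DD.hh.mm format.

```python
from datetime import datetime


def schedule_arg(when):
    """Format a datetime as YYYY.MM.DD.hh.mm for use with -schedule."""
    return when.strftime("%Y.%m.%d.%H.%M")


print(schedule_arg(datetime(2000, 1, 28, 23, 30)))
```
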

      -reschedule $nr
           Execute pavuk periodically with a period of $nr hours.  You need
           properly configured scheduling with the at command on your system
           to use this option.

      -sched_cmd $str
           Command to use for scheduling. Pavuk explicitly supports
           scheduling with at.  $str should contain regular characters and
           macros, escaped by the % character.  Supported macros are:

              %f
               - for script filename
              %t
               - for time (in format HH:MM)
               - all macros as supported by the strftime() function

      -urls_file $file
           If you use this option, pavuk will read URLs from $file before it
           starts processing.  In this file, each URL needs to be on a
           separate line. After the last URL, a single dot . followed by a
           LF (line-feed) character denotes the end.  Pavuk will start
           processing right after all URLs have been read.  If $file is
           given as the - character, standard input will be read.
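
           The file format described above (one URL per line, closed by a
           line holding a single dot) can be generated as in the following
           sketch. The helper name write_urls_file and the example URLs are
           illustrative only.

```python
import io


def write_urls_file(stream, urls):
    """Write one URL per line, then the terminating line with a single dot."""
    for url in urls:
        stream.write(url + "\n")
    stream.write(".\n")


buf = io.StringIO()
write_urls_file(buf, ["http://www.example.org/", "ftp://ftp.example.org/pub/"])
print(buf.getvalue(), end="")
```

           Any writable text stream works in place of the in-memory buffer,
           for example an open file or a pipe feeding standard input when
           $file is given as -.
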

      -store_info/-nostore_info
           This option causes pavuk to store information about each document
           into a separate file in the .pavuk_info directory. This file is
           used to store the original URL from which the document was
           downloaded. For files that are downloaded with the HTTP or HTTPS
           protocols, the whole HTTP response header is stored there.  I
           recommend using this option when you are using options that
           change the default layout of the local document tree, because
           this info file helps pavuk to map the local filename to the URL.
           This option is also very useful when different URLs have the same
           filename in the local tree. When this occurs, pavuk detects this
           using info files, and it will prefix the local name with numbers.

      -request $req
           With this option you can specify extended information for
           starting URLs, including query data for POST or GET requests.
           The current syntax of this option is: URL:["]$url["]
           [METHOD:["]{GET|POST}["]] [ENCODING:["]{u|m}["]]
           [FIELD:["]variable=value["]] [FILE:["]variable=filename["]]

           - URL: specifies request URL
           - METHOD: specifies request method for URL and is
             one of GET or POST.
           - ENCODING: specifies encoding for request body data.
               m is for multipart/form-data encoding
               u is for application/x-www-form-urlencoded
               encoding
           - FIELD: specifies field of request data in format
               variable=value. For encoding of special characters
               in variable and value you can use same encoding
               as is used in application/x-www-form-urlencoded
               encoding.
           - FILE: specifies special field of query, which is
               used to specify file for POST based file upload.
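
           The syntax above can be assembled mechanically. The sketch below
           is a hypothetical helper, not part of pavuk: it builds a $req
           string from its parts and applies application/x-www-form-urlencoded
           escaping to FIELD names and values. The URL and field names are
           made up for the example.

```python
from urllib.parse import quote_plus


def build_request(url, method="GET", encoding=None, fields=(), files=()):
    """Build a value for -request from a URL, method, fields and files."""
    parts = [f'URL:"{url}"', f"METHOD:{method}"]
    if encoding:
        parts.append(f"ENCODING:{encoding}")
    for name, value in fields:
        # Escape special characters as in x-www-form-urlencoded data.
        parts.append(f'FIELD:"{quote_plus(name)}={quote_plus(value)}"')
    for name, filename in files:
        parts.append(f'FILE:"{name}={filename}"')
    return " ".join(parts)


req = build_request("http://www.example.org/cgi-bin/search", "POST", "u",
                    fields=[("q", "pavuk mirror")])
print(req)
```

           The resulting string would be passed to pavuk as the single
           argument of -request, so it needs to be quoted for the shell.
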







      -formdata $req
           This option gives you the chance to specify contents for HTML
           forms found while traversing the document tree.
           The current syntax of this option is the same as for the -request
           option, but ENCODING: and METHOD: are meaningless in the
           semantics of this option.
           In URL: you have to specify the HTML form action URL, which will
           be matched against action URLs found in processed HTML documents.
           If pavuk finds an action URL which matches the one supplied in
           the -formdata option, pavuk will construct a GET or POST request
           from the data supplied in this option and from the default form
           field values supplied in the HTML document. Values supplied on
           the command line take precedence over those supplied in the HTML
           document.

      -nthreads $nr
           By means of this option you can specify how many concurrent
           threads will download documents.  This option is available only
           when pavuk is compiled to support multithreading.

      -immesg/-noimmesg
           Pavuk's default behaviour when running multiple download threads
           is to buffer all output messages in a memory buffer and to flush
           the buffered data only when a thread finishes processing one
           document. With this option you can change this behaviour to see
           each message immediately when it is produced. This is only
           useful when you want to debug some special cases.  This option is
           available only when pavuk is compiled to support multithreading.

      -dumpfd $nr
           For scripting it is sometimes useful to be able to download a
           document directly to a pipe or variable instead of storing it in
           a regular file. In such a case you can use this option to dump
           the data, for example, to stdout ($nr = 1).


 Scenario/Task options
      -scenario $str
           name of the scenario to load and/or run. Scenarios are files with
           a structure similar to the .pavukrc file.  Scenarios contain
           saved configurations. You can use them for periodical mirroring.
           Parameters given on the command line should override parameters
           loaded from the scenario.  To be able to use this option, you
           need to specify the scenario base directory with the option
           -scndir.

      -dumpscn $filename
           Store the current configuration in a scenario file named
           $filename.







 Directory options
      -msgcat $dir
           directory which contains the message catalog for pavuk. If you do
           not have permission to store a pavuk message catalog in the
           system directory, you should simply create a directory structure
           in your home directory similar to the one on your system.

           For example:

           Your native language is German, and your home directory is
           /home/jano.

           You should at first create the directory
           /home/jano/locales/de/LC_MESSAGES/, then put the German pavuk.mo
           there and set -msgcat to /home/jano/locales/.  If you have
           properly set locale environment values, you will see pavuk
           speaking German.  This option is available only when you compiled
           in support for GNU gettext messages internationalization.

      -cdir $dir
           directory where all retrieved documents are stored. If not
           specified, the current directory is used. If the specified
           directory doesn't exist, it will be created.

      -scndir $dir
           directory in which your scenarios are stored.


 Preserve options
      -preserve_time/-nopreserve_time
           store downloaded document with same modification time as on the
           remote site. Modification time will be set only when such
           information is available (some FTP servers do not support the
           MDTM command, and some documents on HTTP servers are created
           online so pavuk can't retrieve the modification time of this
           document).

      -preserve_perm/-nopreserve_perm
           store downloaded document with the same permissions as on the
           remote site. This option has effect only when downloading a file
           through FTP protocol and assumes that the -ftplist option is
           used.

      -preserve_slinks/-nopreserve_slinks
           set symbolic links to point exactly to same location as on the
           remote server; don't do any relocations. This option has effect
           only when downloading file through FTP protocol and assumes that
           the -ftplist option is used.







           For example, assume that on the FTP server ftp.xx.org there is a
           symbolic link /pub/pavuk/pavuk-current.tgz, which points to
           /tmp/pub/pavuk-0.9pl11.tgz.  Pavuk will create the symbolic link
           ftp/ftp.xx.org_21/pub/pavuk/pavuk-current.tgz.
           If the option -preserve_slinks is used, this symbolic link will
           point to
            /tmp/pub/pavuk-0.9pl11.tgz
           If the option -preserve_slinks is not used, this symbolic link
           will point to
            ../../tmp/pub/pavuk-0.9pl11.tgz
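
           The relocated target in this example is the link target
           expressed relative to the directory that holds the link. The
           sketch below reproduces that calculation with Python's posixpath
           module; the helper name relocated_target is hypothetical, and
           pavuk's own relocation code may handle further cases.

```python
import posixpath


def relocated_target(link_path, target_path):
    """Express an absolute remote link target relative to the link's directory."""
    return posixpath.relpath(target_path, posixpath.dirname(link_path))


print(relocated_target("/pub/pavuk/pavuk-current.tgz",
                       "/tmp/pub/pavuk-0.9pl11.tgz"))
```
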


 Proxy options
      -http_proxy $site[:$port]
           if this parameter is used, then all HTTP requests are going
           through this proxy server. This is useful if your site resides
           behind a firewall, or if you want to use a HTTP proxy cache
           server.  The default port number is 8080.

      -nocache/-cache
           use this switch whenever you want to get the document directly
           from the site and not from your HTTP proxy cache server.

      -ftp_proxy $site[:$port]
           if this parameter is used, then all FTP requests are going
           through this proxy server.  This is useful if your site resides
           behind a firewall, or if you want to use an FTP proxy cache
           server.  The default port number is 22. Pavuk supports three
           different types of proxies for FTP, see the options -ftp_httpgw,
           -ftp_dirtyproxy. If none of the mentioned options is used, then
           pavuk assumes a regular FTP proxy.

      -ftp_httpgw/-noftp_httpgw
           the specified FTP proxy is a HTTP gateway for the FTP protocol.

      -ftp_dirtyproxy/-noftp_dirtyproxy
           the specified FTP proxy is a HTTP proxy which supports a CONNECT
           request (pavuk should use the full FTP protocol, except for active
           data connections).  If both -ftp_dirtyproxy and -ftp_httpgw are
           specified, -ftp_dirtyproxy is preferred.

      -gopher_proxy $site[:$port]
           Gopher gateway or proxy/cache server.

      -gopher_httpgw/-nogopher_httpgw
           the specified Gopher proxy server is a HTTP gateway for Gopher
           protocol

      -ssl_proxy $site[:$port]
           SSL proxy (tunneling) server [as that in CERN httpd + patch or in
           Squid] with enabled CONNECT request (at least on port 443).  This
           option is available only when compiled with SSL support (you need
           the SSLeay or OpenSSL libraries and the development headers).

 Proxy Authentication
      -http_proxy_user $user
           username for HTTP proxy authentication.

      -http_proxy_pass $pass
           password for HTTP proxy authentication.

      -http_proxy_auth {1/2/3}
           authentication scheme for proxy access. Its meaning is similar to
           that of the -auth_scheme option.

      -auth_reuse_proxy_nonce/-noauth_reuse_proxy_nonce
           when using the HTTP proxy Digest access authentication scheme,
           reuse the first received nonce value in subsequent requests.


 Protocol/Download Options
      -retry $nr
           set the number of attempts to transfer processed document.
           Normally set to 1.

      -nregets $nr
           set the number of allowed regets on a single document, after a
           broken transfer.

      -nredirs $nr
           set number of allowed HTTP redirects. (use this for prevention of
           loops)

      -force_reget/-noforce_reget
           force regetting of the whole document after a broken transfer
           when the server doesn't support retrieving of partial content.

      -timeout $nr
           timeout for stalled connections in minutes. This value is also
           used for connection timeouts.

      -noRobots/-Robots
           this switch suppresses the use of the robots.txt standard, which
           is used to restrict access of Web robots to some locations on the
           web server.

      -noEnc/-Enc
           this switch suppresses the use of gzip, compress or deflate
           encoding in transfers. I don't know if some servers are broken or
           what, but they report documents of MIME type application/gzip or
           application/compress as encoded. This option is useful when you
           don't have gzip, which is used to decode documents encoded this
           way.

      -check_size/-nocheck_size
           the option -nocheck_size should be used if you are trying to
           download pages from a HTTP server which sends a wrong Content-
           Length: field of the header in response.

      -maxrate $nr
           If you don't want to give all your transfer bandwidth to pavuk,
           use this option to set pavuk's maximum transfer rate. This option
           accepts a floating point number to specify the transfer rate in
           kB/s. If you want to get optimal settings, you also have to play
           with the size of the read buffer (option -bufsize) because pavuk
           is doing flow control only at application level.

      -minrate $nr
           If you hate slow transfer rates, this option allows you to break
           transfers with slow speed. You can set the minimum transfer rate,
           and if the connection gets slower than the given rate, the
           transfer will be stopped. The minimum transfer rate is given in
           kB/s.

      -bufsize $nr
           This option is used to specify the size of the read buffer
           (default size: 32kB).  If you have a very fast connection, you
           may increase the size of the buffer to get a better read
           performance. If you need to decrease the transfer rate, you may
           need to decrease the size of the buffer and set the maximal
           transfer rate with the -maxrate option.  This option accepts the
           size of the buffer in kB.
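
           The interplay between -maxrate and -bufsize can be illustrated
           with a little arithmetic. The sketch below assumes (this is not
           taken from the pavuk sources) that application-level flow
           control means spacing out reads of one full buffer; the helper
           name min_read_interval is hypothetical.

```python
def min_read_interval(bufsize_kb, maxrate_kbs):
    """Seconds that must elapse per full buffer read to stay under the rate."""
    if maxrate_kbs <= 0:
        raise ValueError("maxrate must be positive")
    return bufsize_kb / maxrate_kbs


# A 32 kB buffer at 16 kB/s versus a 4 kB buffer at the same rate.
print(min_read_interval(32, 16.0))
print(min_read_interval(4, 16.0))
```

           Under this assumption a smaller buffer gives shorter pauses
           between reads and a smoother transfer at low rates, which is
           consistent with the advice to decrease the buffer size when
           limiting the transfer rate.
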

      -fs_quota $nr
           If you are running pavuk on a multiuser system, you may need to
           avoid filling up your file system.  This option lets you specify
           how much space must remain free.  If pavuk detects an underrun of
           the free space, it will stop downloading files. Specify this
           quota in kB.

      -file_quota $nr
           This option is useful when you want to limit downloading of big
           files, but want to download at least $nr kilobytes. A big file
           will be transferred, and when it reaches the specified size,
           transfer will break.  Such a document will be processed as
           properly downloaded, so be careful when using this option.

      -trans_quota $nr
           If you expect your selection to cover a large amount of data, you
           can use this option to limit the amount of transferred data.

      -url_strategy $strategy
           This option allows you to specify the order in which URLs are
           downloaded.  It accepts the following strings as parameters:

           level - orders URLs as they are loaded from HTML files
           leveli - as previous, but URLs of inline objects come first
           pre - inserts URLs from the current HTML document at the start,
           before the others
           prei - as previous, but URLs of inline objects come first
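
           The four strategies can be pictured as operations on a queue of
           pending URLs.  The Python sketch below is only an illustration
           and not pavuk's implementation: the function name enqueue, the
           (url, is_inline) pairs and the reading that inline objects come
           first among the URLs of one document are all assumptions.

```python
from collections import deque


def enqueue(queue, found, strategy):
    """Add the URLs found in one HTML document to the pending queue.

    found is a list of (url, is_inline) pairs in document order.
    """
    if strategy in ("leveli", "prei"):
        # Stable sort: inline objects first, document order otherwise kept.
        found = sorted(found, key=lambda item: not item[1])
    urls = [url for url, _ in found]
    if strategy in ("level", "leveli"):
        queue.extend(urls)                # append after everything pending
    elif strategy in ("pre", "prei"):
        queue.extendleft(reversed(urls))  # insert at the start, order kept
    else:
        raise ValueError("unknown strategy: " + strategy)
    return queue
```

           With "level" the tree is walked level by level, while "pre"
           follows the links of the most recent document first.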

      -send_if_range/-nosend_if_range
           Send the If-Range: header in HTTP requests.  I found out that
           some HTTP servers (greetings, MS :-)) send different ETag:
           fields in different responses for the same, unchanged document.
           This causes problems when pavuk attempts to reget a document
           from such a server: pavuk will remember the old ETag value.  If
           the server compares it with the new ETag value and they differ,
           it will refuse to send only part of the document.

      -ssl_version $v
           Set required SSL protocol version for SSL communication. $v is
           one of ssl2, ssl23 or ssl3.  This option is available only when
           compiled with SSL support.

      -unique_sslid/-nounique_sslid
           This option can be used if you want to use a unique SSL ID for
           all SSL sessions.  This option is available only when compiled
           with SSL support.

      -use_http11/-nouse_http11
           This option is used to switch between the HTTP/1.0 and HTTP/1.1
           protocols used with HTTP servers.  HTTP/1.1 is currently not the
           default because its implementation is very fresh and not 100%
           tested.  Even so, using HTTP/1.1 is highly recommended, because
           it is faster than HTTP/1.0 and uses less network bandwidth for
           initiating connections.

      -local_ip $addr
           You can use this option when you want to use a specific network
           interface for communication with other hosts.  This option is
           suitable for multihomed hosts with several network interfaces.
           The address should be entered as a regular IP address or as a
           host name.

      -identity $str
           This option allows you to specify the content of the User-Agent:
           field of the HTTP request.  This is useful when scripts on a
           remote server return different documents for the same URL to
           different browsers, or if some HTTP server refuses to serve
           documents to Web robots like pavuk.

      -auto_referer/-noauto_referer
           This option forces pavuk to send the HTTP Referer: header field
           with starting URLs.  The content of this field will be the URL
           itself.  Using this option is required when the remote server
           checks the Referer: field.


 Authentication
      -auth_file $file
           File where you have stored authentication information for
           access to some service.  For the file structure see the FILES
           section below.

      -auth_name $user
           If you use this parameter, the program performs authentication
           with each HTTP access to a document.  Use this only if you know
           that only one HTTP server will be accessed, or use the -asite
           option to specify the site for which the authentication is
           meant.  Otherwise your auth parameters will be sent to every
           accessed HTTP server.

      -auth_passwd $passwd
           The value of this parameter is used as the password for
           authentication.

      -auth_scheme {1 , 2 , 3}
           This parameter specifies the authentication scheme to use.
           1 means the user authentication scheme is used as defined in
           HTTP/1.0 or HTTP/1.1.  Password and user name are sent unencoded.
           2 means the Basic authentication scheme is used as defined in
           HTTP/1.0.  Password and user name are sent BASE64 encoded.
           3 means the Digest access authentication scheme based on MD5
           checksums, as defined in RFC2069.
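
           To make clear what "BASE64 encoded" means for scheme 2, here is
           a small Python sketch of how a Basic credential string is
           built.  It is not taken from pavuk; the function name
           basic_credentials and the sample user and password are
           hypothetical.

```python
import base64


def basic_credentials(user, password):
    """Return the Base64 form of "user:password" used by HTTP Basic auth."""
    raw = f"{user}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def authorization_header(user, password):
    """Build the request header line carrying the Basic credentials."""
    return "Authorization: Basic " + basic_credentials(user, password)
```

           Base64 is an encoding and not encryption, so anyone who can
           read the request can recover the password.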

      -auth_reuse_nonce/-noauth_reuse_nonce
           While using the HTTP Digest access authentication scheme, reuse
           the first received nonce value in subsequent requests.

      -ssl_key_file $file
           File with the public key for the SSL certificate (learn more
           from the SSLeay or OpenSSL documentation).  This option is
           available only when compiled with SSL support (you need the
           SSLeay or OpenSSL libraries and development headers).

      -ssl_cert_file $file
           Certificate file in PEM format (learn more from the SSLeay or
           OpenSSL documentation).  This option is available only when
           compiled with SSL support (you need the SSLeay or OpenSSL
           libraries and development headers).

      -ssl_cer_passwd $str
           Password used to generate the certificate (learn more from the
           SSLeay or OpenSSL documentation).  This option is available only
           when compiled with SSL support (you need the SSLeay or OpenSSL
           libraries and development headers).

      -from $email
           This parameter is used as the password when accessing an
           anonymous FTP server, or is optionally inserted into the From:
           field of the HTTP request.  If not specified, the program
           derives it from the USER environment variable and the site
           hostname.

      -send_from/-nosend_from
           This option enables or disables sending of the user
           identification entered with the -from option as the anonymous
           FTP password and in the From: field of the HTTP request.  This
           option is off by default.


 Site/Domain Limitation Options
      -asite $list
           Specify a comma-separated list of allowed sites on which
           referenced documents are stored.

      -dsite $list
           Specify a comma-separated list of disallowed sites.  This is
           the opposite of the previous parameter.  If both are used, the
           last occurrence takes effect.

      -adomain $list
           Specify a comma-separated list of allowed domains on which
           referenced documents are stored.

      -ddomain $list
           Specify a comma-separated list of disallowed domains.  This is
           the opposite of the previous parameter.  If both are used, the
           last occurrence takes effect.


 Limitation Document properties
      -amimet $list
           Comma-separated list of allowed MIME types.

      -dmimet $list
           Comma-separated list of disallowed MIME types.  This is the
           opposite of the previous parameter.  If both are used, the last
           occurrence takes effect.

      -maxsize $nr
           Maximal allowed size of a document.  This option is applied only
           when pavuk is able to detect the size of the document before
           starting the transfer.

      -minsize $nr
           Minimal allowed size of a document.  This option is applied only
           when pavuk is able to detect the size of the document before
           starting the transfer.

      -newer_than $time
           Allow only the transfer of documents with a modification time
           newer than that specified in the parameter $time.  The format
           of $time is YYYY.MM.DD.hh:mm.  To apply this option, pavuk must
           be able to detect the modification time of the document.

      -older_than $time
           Allow only the transfer of documents with a modification time
           older than that specified in the parameter $time.  The format
           of $time is YYYY.MM.DD.hh:mm.  To apply this option, pavuk must
           be able to detect the modification time of the document.
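
           The YYYY.MM.DD.hh:mm format is easy to get wrong by hand.  The
           following Python sketch, which is not part of pavuk, produces
           and checks such a value; the names PAVUK_TIME_FORMAT,
           format_time and parse_time are hypothetical.

```python
from datetime import datetime

# strftime/strptime equivalent of YYYY.MM.DD.hh:mm
PAVUK_TIME_FORMAT = "%Y.%m.%d.%H:%M"


def format_time(moment):
    """Render a datetime in the form expected for $time."""
    return moment.strftime(PAVUK_TIME_FORMAT)


def parse_time(text):
    """Parse a $time value; raises ValueError if it is malformed."""
    return datetime.strptime(text, PAVUK_TIME_FORMAT)
```

           For 28 Jan 2000, 09:05, format_time() yields a value suitable
           for -newer_than or -older_than.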

      -noCGI/-CGI
           This switch prevents the transfer of dynamically generated
           parametric documents through the CGI interface.  These are
           detected by the occurrence of a ? character inside the URL.

      -alang $list
           This allows you to specify an ordered, comma-separated list of
           preferred natural languages.  This option works only with the
           HTTP protocol, using the Accept-Language: MIME entry.

      -acharset $list
           This option allows you to enter a comma-separated list of
           preferred encodings of transferred documents.  This works only
           with HTTP and HTTPS URLs and only if such document encodings
           are available on the destination server.
           example: -acharset iso-8859-2,windows-1250,utf8


 Limitation Document name
      -asfx $list
           This parameter allows you to specify a set of suffixes used to
           restrict the selection of documents which will be processed.

      -dsfx $list
           Set of suffixes used to restrict the selection of documents.
           This one is the inverse of the previous option.  The two are
           mutually exclusive.

      -aprefix $list , -dprefix $list
           These two options allow you to specify a set of allowed or
           disallowed prefixes of documents.  They are mutually exclusive.

      -pattern $pattern
           This option allows you to specify a wildcard pattern for
           documents.  All documents are tested to see if they match this
           pattern.

      -rpattern $reg_exp
           Same as the previous option, but uses regular expressions.
           Available only on platforms which have a supported RE
           implementation.

      -skip_pattern $pattern
           This option allows you to specify a wildcard pattern for
           documents that should be skipped.  All documents are tested to
           see if they match this pattern.

      -skip_rpattern $reg_exp
           Same as the previous option, but uses regular expressions.
           Available only on platforms which have a supported RE
           implementation.

      -url_pattern $pattern
           This option allows you to specify a wildcard pattern for URLs.
           All URLs are tested to see if they match this pattern.
           Example:
           -url_pattern http://\*.idata.sk:\*/~ondrej/\*
           This option enables all HTTP URLs from the domain .idata.sk, on
           all ports, which are located under /~ondrej/.
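
           The effect of such a wildcard pattern can be tried out with
           Python's fnmatch module, as in the sketch below.  This is only
           an approximation of what pavuk does: it assumes that the URL is
           compared as text with the port written out, and the name
           url_allowed is hypothetical.  The backslashes in the example
           above only protect the * characters from the shell, so they
           are not part of the pattern itself.

```python
from fnmatch import fnmatchcase

# The pattern from the example, without the shell escapes.
PATTERN = "http://*.idata.sk:*/~ondrej/*"


def url_allowed(url, pattern=PATTERN):
    """Return True if the URL text matches the wildcard pattern."""
    return fnmatchcase(url, pattern)
```

           Note that in this sketch * also matches the / character, so
           documents in subdirectories of /~ondrej/ match as well.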

      -url_rpattern $reg_exp
           Same as the previous option, but uses regular expressions.
           Available only on platforms which have a supported RE
           implementation.

      -skip_url_pattern $pattern
           This option allows you to specify a wildcard pattern for URLs
           that should be skipped.  All URLs are tested to see if they
           match this pattern.

      -skip_url_rpattern $reg_exp
           Same as the previous option, but uses regular expressions.
           Available only on platforms which have a supported RE
           implementation.

      -aip_pattern $re
           This option allows you to limit the set of transferred
           documents by server IP address.  IP addresses can be specified
           as regular expressions, so it is possible to specify a set of
           IP addresses with one expression.  Available only on platforms
           which have a supported RE implementation.

      -dip_pattern $re
           This option is similar to the previous option, but is used to
           specify the set of disallowed IP addresses.  Available only on
           platforms which have a supported RE implementation.

      -enable_js/-disable_js
           These options are used to enable or disable downloading of
           JavaScript script sources.  This doesn't mean that pavuk will

 Limitation Protocol Option
      -noHTTP/-HTTP
           This switch suppresses all transfers through the HTTP protocol.

      -noSSL/-SSL
           This switch suppresses all transfers through the HTTPS protocol
           (HTTP protocol over SSL).  This option is available only when
           compiled with SSL support (you need the SSLeay or OpenSSL
           libraries and development headers).

      -noGopher/-Gopher
           Suppress all transfers through the Gopher Internet protocol.

      -noFTP/-FTP
           This switch prevents processing of documents located on any FTP
           server.

      -noFTPS/-FTPS
           This switch prevents processing of documents located on any FTP
           server accessed through SSL.

      -FTPhtml/-noFTPhtml
           With the -FTPhtml option you can force pavuk to process HTML
           files downloaded with the FTP protocol.

      -FTPdir/-noFTPdir
           Force recursive processing of FTP directories too.

      -disable_html_tag $TAG,[$ATTRIB][;...]
      -enable_html_tag $TAG,[$ATTRIB][;...]
           Enable or disable processing of particular HTML tags or
           attributes.

           For example, if you don't want to process any images, you should
           use the option -disable_html_tag
           'IMG,SRC;INPUT,SRC;BODY,BACKGROUND'.
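
           The argument is a list of TAG,ATTRIB pairs separated by
           semicolons.  The Python sketch below shows one way to read that
           syntax; it is an illustration only, and the function name
           parse_tag_spec and the treatment of an omitted attribute as
           None are assumptions.

```python
def parse_tag_spec(spec):
    """Split 'TAG,ATTRIB;TAG,ATTRIB' into (tag, attribute) pairs.

    The attribute is None when it is left out after the comma.
    """
    pairs = []
    for item in spec.split(";"):
        if not item:
            continue  # ignore a trailing or doubled semicolon
        tag, _, attrib = item.partition(",")
        pairs.append((tag, attrib or None))
    return pairs
```

           Applied to the example above, it yields the pairs IMG/SRC,
           INPUT/SRC and BODY/BACKGROUND.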


 Other Limitation Options
      -subdir $dir
           Subdirectory of the local tree directory, used to limit the
           tree scan in some of the modes {sync , resumeregets ,
           linkupdate}.

      -dont_leave_site/-leave_site
           (Don't) leave starting site.

      -dont_leave_dir/-leave_dir
           (Don't) leave the starting directory.  If the -dont_leave_dir
           option is used, pavuk will stay in the starting directory
           (including its subdirectories).

      -lmax $nr
           Set the maximal allowed level of tree traversal.  The default is
           0, which means that pavuk can traverse ad infinitum.  As of
           version 0.8pl1, inline objects of HTML pages are placed at the
           same level as the parent HTML page.

      -leave_level $nr
           Maximal level of documents outside the site of the starting
           URL.  The default is 0, which means that no checking is applied.

      -site_level $nr
           Maximal level of sites outside the site of the starting URL.
           The default is 0, which means that no checking is applied.

      -dmax $nr
           Set the maximal allowed number of documents that are processed.
           The default value is 0, which means that the number of
           processed documents is not restricted.

      -FTPlist/-noFTPlist
           When the -FTPlist option is used, pavuk retrieves the content of
           FTP directories with the FTP command LIST instead of NLST, so
           the same listing is retrieved as with the "ls -l" UNIX command.
           This option is required if you need to preserve permissions of
           remote files or you need to preserve symbolic links.  Pavuk
           supports wide listings on FTP servers with regular BSD or SYSV
           style "ls -l" directory listings, on FTP servers with the EPFL
           listing format, VMS style listings, DOS/Windows style listings
           and the Novell listing format.

      -user_condition $str
           Script or program name for the user's own conditions.  You can
           write any script which decides with its exit value whether to
           download a URL or not.  The script gets any number of options
           from pavuk, with the following meaning:

              -url $url - processed URL
              -parent $url - any number of parent URLs
              -level $nr - level of this URL from starting URL
              -size $nr - size of requested URL
              -date $datenr - modification time of requested URL in format
              YYYYMMDDhhmmss

           Warning: use user conditions only if required, because of the
           big slowdowns caused by forking scripts for each checked URL.
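
           A user condition script can be written in any language.  The
           Python sketch below shows the general shape of one.  The text
           above does not say which exit value means "download", so the
           convention used here (0 accepts the URL, 1 rejects it) is an
           assumption that must be checked against your pavuk version; the
           limits MAX_LEVEL and MAX_SIZE are likewise hypothetical.

```python
#!/usr/bin/env python3
import sys

MAX_LEVEL = 3            # hypothetical limits chosen for illustration
MAX_SIZE = 1024 * 1024


def parse_options(argv):
    """Collect the -name value pairs; -parent may occur several times."""
    opts = {"parent": []}
    i = 0
    while i + 1 < len(argv):
        name, value = argv[i].lstrip("-"), argv[i + 1]
        if name == "parent":
            opts["parent"].append(value)
        else:
            opts[name] = value
        i += 2
    return opts


def decide(argv):
    """Return 0 to accept the URL, 1 to reject it (assumed convention)."""
    opts = parse_options(argv)
    if int(opts.get("level", 0)) > MAX_LEVEL:
        return 1
    if int(opts.get("size", 0)) > MAX_SIZE:
        return 1
    return 0


# Only act as a condition script when options were actually passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(decide(sys.argv[1:]))
```

           Keep such a script as small as possible, since it is started
           once for every checked URL.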


 Cookie
      -cookie_file $file
           File in which cookie information is stored.  This file must be
           in Netscape cookie file format (generated with Netscape
           Navigator or Communicator ...).

      -cookie_send/-nocookie_send
           use collected cookies in HTTP/HTTPS requests.

      -cookie_recv/-nocookie_recv
           store received cookies from HTTP/HTTPS responses into memory
           cookie cache.

      -cookie_update/-nocookie_update
           update cookie file on disc and synchronize it with changes made
           by any concurrent processes.

      -cookies_max $nr
           maximal number of cookies in memory cookie cache

      -disabled_cookie_domains $list
           Comma-separated list of cookie domains which are permitted to
           send cookies stored into the cookie cache.

      -cookie_check/-nocookie_check
           When receiving a cookie, check whether the cookie domain is
           equal to the domain of the server which sends this cookie.


 Filename/URL Conversion Option
      -noRelocate/-Relocate
           This switch prevents the program from rewriting relative URLs to
           absolute ones after an HTML document is transferred.

      -all_to_local/-noall_to_local
           This option forces pavuk to change all URLs inside an HTML
           document to local URLs immediately after the document is
           downloaded.

      -sel_to_local/-nosel_to_local
           This option forces pavuk to change all URLs which satisfy the
           conditions for download to local URLs inside the HTML document,
           immediately after the document is downloaded.  I recommend
           using this option when you are sure that the transfer will
           proceed without any problems.  This option should save a lot of
           processor time.

      -all_to_remote/-noall_to_remote
           This option forces pavuk to change all URLs inside an HTML
           document to remote URLs immediately after the document is
           downloaded.

      -tr_del_chr $str
           All characters found in $str will be deleted from the local name
           of the document.  $str may contain escape sequences similar to
           those of the tr command:
           \n - newline
           \r - carriage return
           \t - horizontal tab space
           \0xXX - hexadecimal ASCII value
           [:upper:] - all uppercase letters
           [:lower:] - all lowercase letters
           [:alpha:] - all letters
           [:alnum:] - all letters and digits
           [:digit:] - all digits
           [:xdigit:] - all hexadecimal digits
           [:space:] - all horizontal and vertical whitespaces
           [:blank:] - all horizontal whitespaces
           [:cntrl:] - all control characters
           [:print:] - all printable characters including space
           [:nprint:] - all non printable characters
           [:punct:] - all punctuation characters
           [:graph:] - all printable characters excluding space
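
           The effect on a local name can be previewed with a few lines of
           Python.  The sketch below is not pavuk's code: it handles only
           the six ASCII character classes listed in CLASSES, ignores the
           backslash escapes, and the names expand_set and delete_chars
           are hypothetical.

```python
import string

# Subset of the character classes, restricted to ASCII.
CLASSES = {
    "[:upper:]": string.ascii_uppercase,
    "[:lower:]": string.ascii_lowercase,
    "[:alpha:]": string.ascii_letters,
    "[:digit:]": string.digits,
    "[:alnum:]": string.ascii_letters + string.digits,
    "[:punct:]": string.punctuation,
}


def expand_set(spec):
    """Expand the classes in CLASSES; all other text is taken literally."""
    for name, chars in CLASSES.items():
        spec = spec.replace(name, chars)
    return spec


def delete_chars(name, spec):
    """Return name with every character of the expanded set removed."""
    unwanted = set(expand_set(spec))
    return "".join(ch for ch in name if ch not in unwanted)
```

           For instance, the set "[:digit:]-" removes all digits and
           hyphens from a file name.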

      -tr_str_str $str1 $str2
           The string $str1 in the local name of the document will be
           replaced with $str2.

      -tr_chr_chr $chrset1 $chrset2
           Characters from $chrset1 in the local name of the document will
           be replaced with the corresponding characters from $chrset2.
           $chrset1 and $chrset2 should have the same syntax as $str in
           the -tr_del_chr option.

      -store_name $str
           When you want to change the local filename of the first file
           downloaded in singlepage mode, you should use this option.

      -index_name $str
           With this option you can change the directory index name.  The
           default is _._.html .

      -store_index/-nostore_index
           With the -nostore_index option you can deny storing of directory
           indexes into HTML files.

      -fnrules $t $m $r
           Uff, this is quite a powerful option!  It is used to flexibly
           change the layout of the local document tree.  It accepts three
           parameters.  The first parameter, $t, says what type of pattern
           follows: F is used for a wildcard pattern (uses fnmatch()) and
           R is used for a regular expression pattern (using any supported
           RE implementation).  The second parameter is the matching
           pattern used to select URLs for this rule.  If a URL matches
           this pattern, the local name for this URL is computed following
           the rule in the third parameter.  The third parameter is the
           local name building rule.  Pavuk now supports two kinds of
           local name building rules.  One is simple, based only on simple
           macros; the other is a more complicated extended rule, which
           also allows you to call several functions.  The two kinds are
           distinguished by the first character of the rule: if the first
           character is '(', the rule is extended; in all other cases it
           is a simple rule.

           A simple rule should contain literals or escaped macros.  Macros
           are escaped by the % character or by the $ character.

           Here is the list of recognized macros:

           $x - where x is any positive number.  This macro is replaced
           with the x-th substring matched by the RE pattern.  (If you use
           this, you need to understand REs!)
           %i - is replaced with protocol id (http,https,ftp,gopher)
           %p - is replaced with password.  (use this only when usable)
           %u - is replaced with username.
           %h - is replaced with host name.
           %m - is replaced with domain name.
           %r - is replaced with port number.
           %d - is replaced with path to document.
           %n - is replaced with document name.
           %b - is replaced with basename of document (without extension).
           %e - is replaced with extension.
           %s - is replaced with searchstring.
           %x - where x is a positive number.  This macro is replaced with
           the x-th directory of the path to the document, counted from
           the beginning.
           %-x - where x is a positive number.  This macro is replaced with
           the x-th directory of the path to the document, counted from
           the end.

           Here is an example.  If you want to place documents into
           separate directories by extension, you should use the following
           fnrules option:
           -fnrules F '*' '/%e/%n'
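
           To see what a simple rule such as '/%e/%n' produces, the macro
           expansion can be imitated in Python.  The sketch below covers
           only the %i, %h, %d, %n, %b and %e macros, and the exact values
           pavuk substitutes (for example whether %e includes the dot) are
           assumptions; the name expand_simple_rule is hypothetical.

```python
import posixpath
import re
from urllib.parse import urlsplit


def expand_simple_rule(rule, url):
    """Expand the %i, %h, %d, %n, %b and %e macros of a simple rule."""
    parts = urlsplit(url)
    directory, name = posixpath.split(parts.path)
    base, ext = posixpath.splitext(name)
    macros = {
        "%i": parts.scheme,
        "%h": parts.hostname or "",
        "%d": directory,
        "%n": name,
        "%b": base,
        "%e": ext.lstrip("."),   # assumed: extension without the dot
    }
    # Single pass, so substituted text is never expanded a second time.
    return re.sub(r"%[ihdnbe]", lambda m: macros[m.group(0)], rule)
```

           With the rule from the example, an index.html document ends up
           as /html/index.html, whatever directory it came from.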

           An extended rule always begins with the character '('.  It uses
           a kind of LISP-like syntax.

           Here are the basic rules for writing extended rules:
           - the local filename produced by a rule of this kind is the
           return value of the function
           - each function is enclosed in round brackets ()
           - the first token right after the opening bracket is the
           function name
           - each function has a nonzero, fixed number of parameters
           - each function returns a numeric or string value
           - function parameters are separated by any number of space
           characters
           - a function parameter should be a string, number, macro or
           another function
           - a string is always quoted with "
           - each numeric parameter can be in any encoding supported by
           the strtod() function (octal, decimal, hexadecimal, ...)
           - there is no implicit conversion from number to string
           - each macro is prefixed by the % character and is one
           character long
           - each macro is replaced by its string representation from the
           current URL
           - function parameters are typed strictly
           - the toplevel function must return a string value

           Extended rules support the full set of % escaped macros
           supported by simple rules, plus the two following additional
           macros:
           %U - URL string
           %o - default localname for URL

           Here is a description of all supported functions:

           sc - concatenate two string parameters
              - accepts two string parameters
              - returns string value
           ss - substring from string
              - accepts three parameters
                - first is the string from which we want to cut a subpart
                - second is a number which represents the starting
                  position in the string
                - third is a number which represents the ending position
                  in the string
              - returns string value
           hsh - compute modulo hash value from string with specified base
              - accepts two parameters
                - first is the string for which we are computing the hash
                  value
                - second is the numeric value for the base of the modulo
                  hash
              - returns numeric value
           md5 - compute MD5 checksum for string
              - accepts one string value
              - returns string which represents the MD5 checksum
           lo - convert all characters inside string to lower case
              - accepts one string value
              - returns string value
           up - convert all characters inside string to upper case
              - accepts one string value
              - returns string value
           ue - encode unsafe characters in string with the same encoding
                which is used for encoding unsafe characters inside URLs
                (%xx).  By default all non-ASCII values are encoded when
                this function is used.
              - accepts two string values
                - first is the string which we want to encode
                - second is the string which contains the unsafe
                  characters
              - returns string value
           dc - delete unwanted characters from string (has similar
                functionality to the -tr_del_chr option)
              - accepts two string values
                - first is the string from which we want to delete
                - second is the string which contains the characters we
                  want to delete
              - returns string value
           tc - replace character with other character in string (has
                similar functionality to the -tr_chr_chr option)
              - accepts three string values
                - first is the string inside which we want to replace
                  characters
                - second is the set of characters which we want to replace
                - third is the set of characters with which we are
                  replacing
              - returns string value
           ts - replace some string inside string with any other string
                (has similar functionality to the -tr_str_str option)
              - accepts three string values
                - first is the string inside which we want to replace a
                  string
                - second is the from string
                - third is the to string
              - returns string value
           spn - calculate the initial length of string which contains
                only the specified set of characters (has the same
                functionality as the strspn() libc function)
              - accepts two string values
                - first is the input string
                - second is the set of acceptable characters
              - returns numeric value
           cspn - calculate the initial length of string which doesn't
                contain the specified set of characters (has the same
                functionality as the strcspn() libc function)
              - accepts two string values
                - first is the input string
                - second is the set of unacceptable characters
              - returns numeric value
           sl - calculate length of string
              - accepts one string value
              - returns numeric value
           ns - convert number to string by format
              - accepts two parameters
                - first parameter is a format string, same as for the
                  printf() function
                - second is the number which we want to convert
              - returns string value
           lc - return position of last occurrence of specified character
                inside string
              - accepts two string parameters
                - first is the string which we are searching in
                - second string contains the character which we are
                  looking for
              - returns numeric value
           + - add two numeric values
              - accepts two numeric values
              - returns numeric value
           - - subtract two numeric values
              - accepts two numeric values
              - returns numeric value
           % - modulo of two numeric values
              - accepts two numeric values
              - returns numeric value
           * - multiply two numeric values
              - accepts two numeric values
              - returns numeric value
           / - divide two numeric values
              - accepts two numeric values
              - returns numeric value

           For example, if you are mirroring a very large number of
           Internet sites into the same local directory, too many entries
           in one directory may cause performance problems.  You may use,
           for example, the hsh or md5 functions to generate one
           additional level of hash directories based on the hostname with
           one of the following options:

           -fnrules F '*' '(sc (ns "%02d/" (hsh %h 100)) %o)'
           -fnrules F '*' '(sc (ss (md5 %h) 0 2) %o)'
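
           What these two rules compute can be imitated in Python, as in
           the sketch below.  It rests on several assumptions: that
           (ss s 0 2) yields the first two characters of s, that the md5
           function returns a hexadecimal digest, and that a simple byte
           sum is an acceptable stand-in for the hash used by hsh, whose
           algorithm is not described here.  The function names are
           hypothetical.

```python
import hashlib


def md5_prefix_name(host, default_name):
    """Imitate the md5 rule: two digest characters, then the %o value."""
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return digest[0:2] + default_name


def modulo_hash_name(host, default_name, base=100):
    """Imitate the hsh rule with a stand-in hash (NOT pavuk's hash)."""
    bucket = sum(host.encode("utf-8")) % base
    # Format the bucket number with "%02d/" as the first rule does.
    return "%02d/" % bucket + default_name
```

           Either way all documents of one host share a prefix, so the
           top directory holds a limited number of entries no matter how
           many sites are mirrored.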

      -base_level $nr
           Number of directory levels to omit in local tree.

           For example, if you download the URL
           ftp://ftp.idata.sk/pub/unix/www/pavuk-0.7pl1.tgz and enter
           -base_level 4 on the command line, the local tree will contain
           www/pavuk-0.7pl1.tgz instead of
           ftp/ftp.idata.sk_21/pub/unix/www/pavuk-0.7pl1.tgz as it normally
           would.
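
           The effect of -base_level on the example above can be sketched in
           Python. The function apply_base_level is hypothetical and only
           mimics the described behaviour; keeping the file name even when
           $nr exceeds the number of directories is an assumption of this
           sketch.

```python
def apply_base_level(local_path, base_level):
    """Drop the first base_level directory components of a local path."""
    parts = local_path.split("/")
    # Assumption: the final component (the file name) is never dropped.
    keep_from = min(base_level, len(parts) - 1)
    return "/".join(parts[keep_from:])
```

           With the default local name from the example and a level of 4,
           the components ftp, ftp.idata.sk_21, pub and unix are omitted.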

      -remove_adv/-noremove_adv
           This option turns on/off the removal of HTML tags which contain
           advertisement banners.  The banners are not removed from the
           HTML file, but are commented out. Such URLs will also not be
           downloaded.  This option has an effect only when used together
           with the option -adv_re. This option is available only when your
           system has support for POSIX or Bell V8 regular expressions.

      -adv_re $RE
           This option is used to specify regular expressions for matching
           URLs of advertisement banners.  For example: -adv_re
           http://ad.doubleclick.net/.* matches all files from the server
           ad.doubleclick.net.  This option is available only when your
           system has a supported regular expression implementation.
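
           How such an expression classifies URLs can be tried out with a
           short Python sketch. Python's re module stands in for the POSIX
           or Bell V8 implementation here, and matching from the start of
           the URL is an assumption; is_advertisement is a hypothetical
           helper, not part of pavuk.

```python
import re

# The expression from the example above, used unchanged.
ADV_RE = re.compile(r"http://ad.doubleclick.net/.*")


def is_advertisement(url):
    """Return True when the URL matches the banner expression."""
    return ADV_RE.match(url) is not None
```

           Note that the unescaped dots match any character, so the
           expression is slightly broader than the literal host name.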


 Other Options
      -sleep $nr
           This option allows you to specify the number of seconds for
           which the program is suspended between two transfers.

      -ddays $nr
           If the modification time of a document is older than $nr days,
           then in sync mode pavuk attempts to retrieve a newer copy of the
           document from the remote server.
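
           The age test can be pictured with the following Python sketch.
           It assumes that the age is measured from the modification time
           of the document to the current time; needs_refresh is a
           hypothetical name and the exact comparison used by pavuk is not
           stated here.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def needs_refresh(mtime, ddays, now=None):
    """Return True when a document modified at mtime is older than ddays days.

    mtime and now are seconds since 1.1.1970; now defaults to the current time.
    """
    if now is None:
        now = time.time()
    return (now - mtime) > ddays * SECONDS_PER_DAY
```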

      -remove_old/-noremove_old
           Remove improper documents (those which no longer exist on the
           remote site).  This option has an effect only when used in sync
           mode.

      -browser $str
           Sets your browser command (in the URL tree you can right-click
           to raise a menu, from which you can start the browser on the
           currently selected URL).  This option is available only when
           compiled with the GTK GUI and with support for URL tree preview.




      -debug/-nodebug
           Turns on displaying of debug messages. This option is available
           only when compiled with -DDEBUG.  If the -debug option is used,
           pavuk outputs verbose information about documents, complete
           protocol-level information, locking information and more
           (depending on the -debug_level setup).

      -debug_level $level
           Sets the level of required debug information. $level can be a
           numeric value which represents a binary mask of the requested
           debug levels, or a comma-separated list of supported debug
           levels.  Currently pavuk supports the following debug levels:
           html - for HTML parser debugging
           protos - to see server side protocol messages
           protoc - to see client side protocol messages
           procs - to see some special procedure calls
           locks - for debugging of document locking
           net - for debugging some low level network stuff
           misc - for miscellaneous unsorted debug messages
           user - for verbose user level messages
           all - request all currently supported debug levels
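
           The two accepted forms of $level (numeric mask or comma-separated
           names) can be illustrated with a small Python sketch. The bit
           values below are purely hypothetical; the real mask values are
           defined in the pavuk sources and are not given in this manual.
           parse_debug_level is likewise a hypothetical helper.

```python
DEBUG_LEVELS = ["html", "protos", "protoc", "procs",
                "locks", "net", "misc", "user"]

# Hypothetical bit assignment: one bit per level, in the order listed above.
LEVEL_BITS = {name: 1 << i for i, name in enumerate(DEBUG_LEVELS)}
LEVEL_BITS["all"] = (1 << len(DEBUG_LEVELS)) - 1


def parse_debug_level(level):
    """Turn a numeric mask or a comma-separated list of names into a mask."""
    level = level.strip()
    if level.isdigit():
        return int(level)
    mask = 0
    for name in level.split(","):
        mask |= LEVEL_BITS[name.strip().lower()]
    return mask
```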

      -remind_cmd $str
           This option has an effect only when running pavuk in reminder
           mode.  pavuk sends the result of running reminder mode to the
           command specified with this option.  The result lists the URLs
           which have changed and the URLs which have errors.

      -nscache_dir $dir
           Path to the Netscape browser cache directory. If you specify
           this path, pavuk tries to find out whether the URL is in this
           cache. If the URL is there, it is fetched from the cache;
           otherwise pavuk downloads it from the network.

      -post_cmd $str
           Post-processing command, which is executed after the successful
           download of a document.  This command may process the document
           in some way. While this command is running, pavuk keeps the
           current document locked, so there is no chance that some other
           pavuk process will modify the document.  The post-processing
           command gets three additional parameters from pavuk:
              - local name of the document
              - 1 if the document is an HTML document, 0 if not
              - original URL of the document
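
           A post-processing command could be a small script such as the
           following Python sketch. It assumes that the three parameters
           arrive as the last three command line arguments, in the order
           listed above; parse_post_cmd_args is a hypothetical helper name.

```python
import sys


def parse_post_cmd_args(argv):
    """Interpret the last three arguments as described for -post_cmd."""
    local_name, html_flag, url = argv[-3:]
    return {
        "local_name": local_name,
        "is_html": html_flag == "1",
        "url": url,
    }


# Runs only when enough arguments are present (assumed calling convention).
if __name__ == "__main__" and len(sys.argv) >= 4:
    info = parse_post_cmd_args(sys.argv[1:])
    kind = "HTML" if info["is_html"] else "non-HTML"
    print("%s document %s saved from %s"
          % (kind, info["local_name"], info["url"]))
```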


 ENVIRONMENTAL VARIABLES
      USER
           used to construct the email address from the user name and
           hostname



      LC_* or LANG
           used to set internationalized environment

      PAVUKRC_FILE
           With this variable you can specify an alternative location for
           your pavukrc configuration file.


 REQUIRED EXTERNAL PROGRAMS
      at   is used for scheduling.

      gunzip
           is used to decode gzip or compress encoded documents.

 Bugs
      If you find any, please let me know.

 FILES
      /opt/pavuk/etc/pavukrc

      ~/.pavukrc

      ~/.pavuk_prefs

           These files are used as default configuration files.  There you
           may specify some constant values like your proxy server or your
           preferred WWW browser. Configuration options mirror the command
           line options.  Not all parameters are suitable for use in the
           default configuration file; select only those which you really
           need.

           File ~/.pavuk_prefs is a special file which contains
           automatically stored configuration. This file is used only when
           running the GUI interface of pavuk and the option -prefs is
           active.

           The first file parsed (if present) is /opt/pavuk/etc/pavukrc,
           then ~/.pavukrc (if present), then ~/.pavuk_prefs (if present).
           The command line is parsed last. The precedence is as follows:

           - highest -
           Entered in user interface
           Entered in command line
           ~/.pavuk_prefs
           ~/.pavukrc
           /opt/pavuk/etc/pavukrc
           - lowest -
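
           The precedence order above amounts to layering the sources so
           that later ones override earlier ones. A minimal Python sketch,
           with hypothetical option values and a hypothetical function
           name effective_config:

```python
def effective_config(*layers):
    """Merge option dictionaries; layers are given lowest precedence first."""
    merged = {}
    for layer in layers:
        # A later (higher precedence) layer overrides earlier values.
        merged.update(layer)
    return merged


# Hypothetical values, ordered from lowest to highest precedence.
system_rc = {"MaxLevel": "2", "HTTPProxy": "proxy:8080"}
user_rc = {"MaxLevel": "5"}
prefs = {}
command_line = {"MaxLevel": "1"}

config = effective_config(system_rc, user_rc, prefs, command_line)
```

           Here MaxLevel ends up as "1" from the command line, while
           HTTPProxy is still taken from the system-wide file.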

           Here is a table of config file - command line option pairs.




           MaxLevel:                  --->  -lmax
           MaxDocs:                   --->  -dmax
           MaxSize:                   --->  -maxsize
           MinSize:                   --->  -minsize
           SleepBetween:              --->  -sleep
           MaxRetry:                  --->  -retry
           MaxRegets:                 --->  -nregets
           MaxRedirections:           --->  -nredirs
           CommTimeout:               --->  -timeout
           RegetRollbackAmount:       --->  -rollback
           DocExpiration:             --->  -ddays
           UseCache:                  --->  -nocache
           UseRobots:                 --->  -noRobots
           AllowFTP:                  --->  -noFTP
           AllowHTTP:                 --->  -noHTTP
           AllowSSL:                  --->  -noSSL
           AllowGopher:               --->  -noGopher
           AllowCGI:                  --->  -noCGI
           AllowGZEncoding:           --->  -noEnc
           AllowFTPRecursion:         --->  -FTPdir
           ForceReget:                --->  -force_reget
           Debug:                     --->  -debug
           AllowedSites:              --->  -asite
           DisallowedSites:           --->  -dsite
           AllowedDomains:            --->  -adomain
           DisallowedDomains:         --->  -ddomain
           AllowedPrefixes:           --->  -aprefix
           DisallowedPrefixes:        --->  -dprefix
           AllowedSufixes:            --->  -asfx
           DisallowedSufixes:         --->  -dsfx
           AllowedMIMETypes:          --->  -amimet
           DisallowedMIMETypes:       --->  -dmimet
           PreferredLanguages:        --->  -alang
           PreferredCharset:          --->  -acharset
           WorkingDir:                --->  -cdir
           WorkingSubDir:             --->  -subdir
           HTTPAuthorizationScheme:   --->  -auth_scheme
           HTTPAuthorizationName:     --->  -auth_name
           HTTPAuthorizationPassword: --->  -auth_passwd
           AuthReuseDigestNonce:      --->  -auth_reuse_nonce
           SSLCertPassword:           --->  -ssl_cert_passwd
           SSLCertFile:               --->  -ssl_cert_file
           SSLKeyFile:                --->  -ssl_key_file
           EmailAddress:              --->  -from
           MatchPattern:              --->  -pattern
           REMatchPattern:            --->  -rpattern
           SkipMatchPattern:          --->  -skip_pattern
           SkipREMatchPattern:        --->  -skip_rpattern
           URLMatchPattern:           --->  -url_pattern
           URLREMatchPattern:         --->  -url_rpattern
           SkipURLMatchPattern:       --->  -skip_url_pattern
           SkipURLREMatchPattern:     --->  -skip_url_rpattern
           DefaultMode:               --->  -mode
           FTPProxy:                  --->  -ftp_proxy
           HTTPProxy:                 --->  -http_proxy
           SSLProxy:                  --->  -ssl_proxy
           GopherProxy:               --->  -gopher_proxy
           FTPViaHTTPProxy:           --->  -ftp_httpgw
           GopherViaHTTPProxy:        --->  -gopher_httpgw
           HTTPProxyUser:             --->  -http_proxy_user
           HTTPProxyPass:             --->  -http_proxy_pass
           HTTPProxyAuth:             --->  -http_proxy_auth
           AuthReuseProxyDigestNonce: --->  -auth_reuse_proxy_nonce
           Browser:                   --->  -browser
           ScenarioDir:               --->  -scndir
           ShowProgress:              --->  -progress
           XMaxLogSize:               --->  -xmaxlog
           LogFile:                   --->  -logfile
           RemoveOldDocuments:        --->  -remove_old
           AuthFile:                  --->  -auth_file
           BaseLevel:                 --->  -base_level
           FTPDirtyProxy:             --->  -ftp_dirtyproxy
           ActiveFTPData:             --->  -ftp_active/-ftp_passive
           ShowDownloadTime:          --->  -stime
           NLSMessageCatalogDir:      --->  -msgcat
           Quiet:                     --->  -quiet/-verbose
           NewerThan:                 --->  -newer_than
           OlderThan:                 --->  -older_than
           Reschedule:                --->  -reschedule
           DontLeaveSite:             --->  -dont_leave_site/-leave_site
           DontLeaveDir:              --->  -dont_leave_dir/-leave_dir
           PreserveTime:              --->  -preserve_time/-nopreserve_time
           LeaveLevel:                --->  -leave_level
           GUIFont:                   --->  -gui_font
           UserCondition:             --->  -user_condition
           CookieFile:                --->  -cookie_file
           CookieSend:                --->  -cookie_send/-nocookie_send
           CookieRecv:                --->  -cookie_recv/-nocookie_recv
           CookieUpdate:              --->  -cookie_update/-nocookie_update
           CookiesMax:                --->  -cookies_max
           CookieCheckDomain:         --->  -cookie_check/-nocookie_check
           DisabledCookieDomains:     --->  -disabled_cookie_domains
           DisableHTMLTag:            --->  -disable_html_tag
           EnableHTMLTag:             --->  -enable_html_tag
           TrDeleteChar:              --->  -tr_del_chr
           TrStrToStr:                --->  -tr_str_str
           TrChrToChr:                --->  -tr_chr_chr
           IndexName:                 --->  -index_name
           StoreName:                 --->  -store_name
           PreservePermisions:        --->  -preserve_perm/-nopreserve_perm
           PreserveAbsoluteSymlinks:  --->  -preserve_slinks/-nopreserve_slinks
           FTPListCMD:                --->  -FTPlist/-noFTPlist
           MaxRate:                   --->  -maxrate
           MinRate:                   --->  -minrate
           ReadBufferSize:            --->  -bufsize
           BgMode:                    --->  -bg/-nobg
           CheckSize:                 --->  -check_size/-nocheck_size
           SLogFile:                  --->  -slogfile
           Identity:                  --->  -identity
           SendFromHeader:            --->  -send_from/-nosend_from
           RunX:                      --->  -runX
           FnameRules:                --->  -fnrules
           StoreDocInfoFiles:         --->  -store_info/-nostore_info
           AllLinksToLocal:           --->  -all_to_local/-noall_to_local
           AllLinksToRemote:          --->  -all_to_remote/-noall_to_remote
           SelectedLinksToLocal:      --->  -sel_to_local/-nosel_to_local
           ReminderCMD:               --->  -remind_cmd
           AutoReferer:               --->  -auto_referer/-noauto_referer
           URLsFile:                  --->  -urls_file
           UsePreferences:            --->  -prefs/-noprefs
           FTPhtml:                   --->  -FTPhtml/-noFTPhtml
           StoreDirIndexFile:         --->  -store_index/-nostore_index
           Language:                  --->  -language
           FileSizeQuota:             --->  -file_quota
           TransferQuota:             --->  -trans_quota
           FSQuota:                   --->  -fs_quota
           EnableJS:                  --->  -enable_js/-disable_js
           UrlSchedulingStrategy:     --->  -url_strategy
           NetscapeCacheDir:          --->  -nscache_dir
           RemoveAdvertisement:       --->  -remove_adv/-noremove_adv
           AdvBannerRE:               --->  -adv_re
           CheckIfRunnigAtBackground: --->  -check_bg/-nocheck_bg
           SendIfRange:               --->  -send_if_range/-nosend_if_range
           SchedulingCommand:         --->  -sched_cmd
           UniqueLogName:             --->  -unique_log/-nounique_log
           PostCommand:               --->  -post_cmd
           URL:                       --->  one URL (more lines with URL:
                                            ... means more URL's)

           A line which begins with '#' is a comment.
           TrStrToStr: and TrChrToChr: must contain two quoted strings.
           All parameter names are case insensitive. If any option is
           missing here, try looking inside the config.c source file.

           See the pavukrc.sample file for an example.
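
           The rules above (comment lines, case insensitive names,
           repeated URL: lines) can be sketched as a small Python reader.
           It assumes one "Name: value" pair per line, which this manual
           implies but does not state in full; parse_pavukrc is a
           hypothetical helper, not pavuk's own parser.

```python
def parse_pavukrc(text):
    """Collect "Name: value" lines into a dict of lists keyed by lower-case name."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip empty lines and comments starting with '#'.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        # Names are case insensitive; repeated names (e.g. URL:) accumulate.
        options.setdefault(name.strip().lower(), []).append(value.strip())
    return options
```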

      .pavuk_authinfo
           The file should contain as many authentication records as you
           need.  Records are separated by any number of empty lines.
           Parameter names are case insensitive.

           Structure of record:

           Proto: <proto ID>    ---> identification of protocol
                                     (ftp/http/https/..)
                                - required field
           Host: <host[:port]>  ---> host name
                                - required field
           User: <user>         ---> name of user
                                - optional
           Pass: <password>     ---> password for user
                                - optional
           Base: <path>         ---> base prefix of document path
                                - optional
           Realm: <name>        ---> realm for HTTP authentication
                                - optional
           Type: <type>         ---> HTTP authentication scheme
                                          - 1 - user auth scheme
                                          - 2 - Basic auth scheme (default)
                                          - 3 - Digest auth scheme
                                - optional

           See the pavuk_authinfo.sample file for an example.
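
           The record structure can be illustrated by a short Python
           reader. It is a sketch under the rules stated above (records
           separated by empty lines, case insensitive names, Proto and
           Host required); parse_authinfo is a hypothetical helper and the
           sample host and user in the usage note are made up.

```python
def parse_authinfo(text):
    """Split the text into records (dicts keyed by lower-case field name)."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Any number of empty lines ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")
        if sep:
            current[name.strip().lower()] = value.strip()
    if current:
        records.append(current)
    for record in records:
        for required in ("proto", "host"):
            if required not in record:
                raise ValueError("record is missing required field " + required)
    return records
```

           For instance, a record made of the lines "Proto: ftp",
           "Host: ftp.idata.sk:21" and "User: anonymous" yields one
           dictionary with the keys proto, host and user.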

      ~/.pavuk_keys
           This file stores information about configurable menu option
           shortcuts. It is available only when compiled with Gtk+ 1.2 or
           higher.

      ~/.pavuk_remind_db
           This file contains information about URLs for running in
           reminder mode. The structure of this file is very simple. Each
           line contains information about one URL.  The first entry in a
           line is the last known modification time of the URL (stored in
           time_t format: number of seconds since 1.1.1970 GMT).  The
           second entry is the URL.
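
           A line of this file can be decoded with a few lines of Python.
           The sketch assumes that the two entries are separated by
           whitespace, which is not stated above; parse_remind_line is a
           hypothetical helper.

```python
from datetime import datetime, timezone


def parse_remind_line(line):
    """Return (modification time as UTC datetime, URL) for one line."""
    stamp, url = line.split(None, 1)
    mtime = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    return mtime, url.strip()
```

           For example, the time_t value 949017600 corresponds to
           28 Jan 2000, 00:00:00 GMT.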


 SEE ALSO
      Look into the ChangeLog file for more information about new features
      in particular versions of pavuk.


 AUTHOR
      Ondrejicka Stefan, <ondrej@idata.sk>
      Grammatic corrections in this man page by Kai Duebbert <kad@gmx.de>
      Resorted by Sergey Taranenko <star@itk.dp.ua>
      Many corrections by Colin Marquardt.


 AVAILABILITY
      pavuk is available via anonymous FTP from
      ftp://ftp.idata.sk/pub/unix/www/ or from
      ftp://sunsite.unc.edu/pub/Linux/apps/www/mirroring/. Or via Pavuk
      HOMEPAGE at http://www.idata.sk/~ondrej/pavuk/