pavuk(0.9pl25c) 28 Jan 2000 pavuk(0.9pl25c)
Internet utils Internet utils
1
NAME
pavuk - HTTP, HTTP over SSL, FTP, FTP over SSL and Gopher recursive
document retrieval program
SYNOPSIS
pavuk [-mode {normal | resumeregets | [-X] [-runX] [-bg/-nobg]
[-prefs/-noprefs] [-h] [-v] [-logfile $file] [-slogfile $file]
[-auth_file $file] [-cdir $dir] [-scndir $dir] [-scenario $str]
[-dumpscn $filename] [-lmax $nr] [-dmax $nr] [-leave_level $nr]
[-maxsize $nr] [-minsize $nr] [-asite $list] [-dsite $list]
[-adomain $list] [-ddomain $list] [-asfx $list] [-dsfx $list]
[-aprefix $list] [-dprefix $list] [-amimet $list] [-dmimet $list]
[-pattern $pattern] [-url_pattern $pattern] [-rpattern $regexp]
[-url_rpattern $regexp] [-skip_pattern $pattern]
[-skip_url_pattern $pattern] [-skip_rpattern $regexp]
[-skip_url_rpattern $regexp] [-newer_than $time] [-older_than $time]
[-schedule $time] [-reschedule $nr] [-dont_leave_site/-leave_site]
[-dont_leave_dir/-leave_dir] [-http_proxy $site[:$port]]
[-ftp_proxy $site[:$port]] [-ssl_proxy $site[:$port]]
[-gopher_proxy $site[:$port]] [-ftp_httpgw/-noftp_httpgw]
[-ftp_dirtyproxy/-noftp_dirtyproxy] [-gopher_httpgw/-nogopher_httpgw]
[-noFTP/-FTP] [-noHTTP/-HTTP] [-noSSL/-SSL] [-noGopher/-Gopher]
[-FTPdir/-noFTPdir] [-noCGI/-CGI] [-FTPlist/-noFTPlist]
[-FTPhtml/-noFTPhtml] [-noRelocate/-Relocate]
[-force_reget/-noforce_reget] [-nocache/-cache]
[-check_size/-nocheck_size] [-noRobots/-Robots] [-noEnc/-Enc]
[-auth_name $user] [-auth_passwd $pass] [-auth_scheme 1/2/3]
[-auth_reuse_nonce/-no_auth_reuse_nonce] [-http_proxy_user $user]
[-http_proxy_pass $pass] [-http_proxy_auth 1/2]
[-auth_reuse_proxy_nonce/-no_auth_reuse_proxy_nonce]
[-ssl_key_file $file] [-ssl_cert_file $file] [-ssl_cert_passwd $pass]
[-from $email] [-send_from/-nosend_from] [-identity $str]
[-auto_referer/-noauto_referer] [-alang $list] [-acharset $list]
[-retry $nr] [-nregets $nr] [-nredirs $nr]
[-preserve_time/-nopreserve_time] [-preserve_perm/-nopreserve_perm]
[-preserve_slinks/-nopreserve_slinks] [-bufsize $nr] [-maxrate $nr]
[-minrate $nr] [-user_condition $str] [-cookie_file $file]
[-cookie_send/-nocookie_send] [-cookie_recv/-nocookie_recv]
[-cookie_update/-nocookie_update] [-cookies_max [-disable_html_tag
$TAG,[$ATTRIB][;...]] [-enable_html_tag $TAG,[$ATTRIB][;...]]
[-tr_del_chr $str] [-tr_str_str $str1 $str2] [-tr_chr_chr [-index_name
$str] [-store_index/-nostore_index] [-store_name $str]
[-debug/-nodebug] [-debug_level $level] [-browser $str]
[-urls_file $file] [-file_quota $nr] [-trans_quota $nr]
[-fs_quota $nr] [-fnrules $t $m $r] [-store_info/-nostore_info]
[-all_to_local/-noall_to_local] [-sel_to_local/-nosel_to_local]
[-all_to_remote/-noall_to_remote] [-url_strategie $strategie]
[-remove_adv/-noremove_adv] [-adv_re $RE] [-check_bg/-nocheck_bg]
[-send_if_range/-nosend_if_range] [-sched_cmd $str]
[-unique_log/-nounique_log] [-post_cmd $str] [-ssl_version $v]
[-unique_sslid/-nounique_sslid] [-aip_pattern $re] [-dip_pattern $re]
[-use_http11/-nouse_http11] [-local_ip $addr] [-request $req]
[-formdata $req] [-nthreads $nr] [-immesg/-noimmesg] [-dumpfd $nr]
[URLs]
- 1 - Formatted: November 5, 2025
pavuk -mode {normal | singlepage | [-base_level $nr]
pavuk -mode sync [-ddays $nr] [-subdir $dir]
[-remove_old/-noremove_old]
pavuk -mode resumeregets [-subdir $dir]
pavuk -mode linkupdate [-X] [-h] [-v] [-cdir $dir] [-subdir
pavuk -mode reminder [-remind_cmd $str]
DESCRIPTION
This manual page describes how to use pavuk. Pavuk can be used to
mirror contents of internet/intranet servers and to maintain copies in
a local tree of documents. Pavuk stores retrieved documents in
locally mapped disk space. The structure of the local tree is the same
as the one on the remote server. Each supported service (protocol) has
its own subdirectory in the local tree. Each referenced server has
its own subdirectory within its protocol's subdirectory, named after
the host followed by the port number on which the service resides,
delimited by the character _ (for example ftp/ftp.xx.org_21). With
the option -fnrules you can change the default layout of the local
document tree, without losing link consistency.
With pavuk it is possible to have up-to-date copies of remote
documents in the local disk space.
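As a rough illustration of the default layout described above, the
following Python sketch maps a URL to a path made of the protocol
directory, then host_port, then the remote path. The helper name
local_path, the table of default ports and the use of the URL scheme
as the directory name are assumptions made for this example; only the
ftp/ftp.xx.org_21/... form is confirmed by the example given under
-preserve_slinks. Index file naming and -fnrules are not modelled.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not name a port.
# (Assumption for this sketch; pavuk's own table is not shown here.)
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "gopher": 70}


def local_path(url):
    """Return the assumed default local path for url (hypothetical helper)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # protocol directory / host_port / remote path
    return f"{parts.scheme}/{parts.hostname}_{port}{parts.path}"


print(local_path("ftp://ftp.xx.org/pub/pavuk/pavuk-current.tgz"))
```

The printed path matches the symbolic link location used in the
-preserve_slinks example.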
As of version 0.3pl2, pavuk can automatically restart broken
connections, and reget partial content from an FTP server (which must
support the REST command), from a properly configured HTTP/1.1 server,
or from a HTTP/1.0 server which supports Ranges.
As of version 0.6 it is possible to handle configurations via so
called scenarios. The best way to create such a configuration file is
to use the X Window interface and simply save the created
configuration. The other way is to use the -dumpscn switch.
As of version 0.7pl1 it is possible to store authentication
information in an authinfo file, which pavuk can then parse and use.
As of version 0.8pl4 pavuk can fetch documents for use in a local
proxy/cache server without storing them in the local document tree.
As of version 0.9pl4 pavuk supports SOCKS (4/5) proxies if you have
the required libraries.
As of version 0.9pl12 pavuk can preserve permissions of remote files
and symbolic links, so it can be used for powerful FTP mirroring.
Pavuk supports SSL connections to FTP servers, if you specify ftps://
URL instead of ftp://.
Pavuk can automatically handle file names containing characters that
are unsafe for the filesystem. So far this is implemented only for
the Win32 platform, and it is hardcoded.
Pavuk can now use the HTTP/1.1 protocol for communication with HTTP
servers. It can use persistent connections, so a single TCP connection
can be used to transfer several documents without being closed. This
feature saves network bandwidth and also speeds up network
communication.
Pavuk can send configurable POST requests to HTTP servers and also
supports file uploads via HTTP POST requests.
Pavuk can run a configurable number of concurrently running download
threads when compiled with multithreading support.
Format of supported URLs
HTTP
http://[[user][:password]@]host[:port][/document]
[[user][:password]@]host[:port][/document]
HTTPS
https://[[user][:password]@]host[:port][/document]
ssl[.domain][:port][/document]
FTP
ftp://[[user][:password]@]host[:port][/relative_path]
ftp://[[user][:password]@]host[:port][//absolute_path]
ftp[.domain][:port][/document]
FTPS
ftps://[[user][:password]@]host[:port][/relative_path]
ftps://[[user][:password]@]host[:port][//absolute_path]
ftps[.domain][:port][/document]
Gopher
gopher://host[:port][/type[document]]
gopher[.domain][:port][/type[document]]
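One way to read the scheme-less forms listed above is that the first
label of the host name selects the protocol, with HTTP as the
fallback. The Python sketch below encodes that reading. It is an
interpretation of the list, not pavuk's actual parser, and the name
guess_scheme is hypothetical.

```python
# First host-name label -> protocol, taken from the forms listed above.
PREFIX_TO_SCHEME = {
    "ssl": "https",
    "ftp": "ftp",
    "ftps": "ftps",
    "gopher": "gopher",
}


def guess_scheme(url):
    """Guess the protocol of url from its scheme or host prefix (sketch)."""
    if "://" in url:
        return url.split("://", 1)[0].lower()
    # Keep the part before the first "/", drop an optional user:password@
    # prefix, then drop an optional :port suffix.
    host = url.split("/", 1)[0].rsplit("@", 1)[-1].split(":", 1)[0]
    first_label = host.split(".", 1)[0].lower()
    return PREFIX_TO_SCHEME.get(first_label, "http")


for candidate in ("ftp.xx.org/pub/", "ssl.example.org", "www.example.org:8080/"):
    print(candidate, "->", guess_scheme(candidate))
```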
OPTIONS
All options are case insensitive.
List of options chapters
Mode
Help
Indicate/Logging/Interface options
Special start
Scenario/Task options
Directory options
Preserve options
Proxy options
Proxy Authentication
Protocol/Download Options
Authentication
Site/Domain Limitation Options
Limitation Document properties
Limitation Document name
Limitation Protocol Option
Other Limitation Options
Cookie
Filename/URL Conversion Option
Other Options
Mode
-mode {normal, linkupdate, sync, singlepage, resumeregets,
singlereget, dontstore, reminder, ftpdir}
set operation mode.
normal - retrieves documents recursively
linkupdate - update remote URLs in local HTML documents to local
URLs if these URLs exist in the local tree
sync - synchronize remote documents with local tree (if a local
copy of a document is older than remote, the document is
retrieved again, otherwise nothing happens)
singlepage - URL is retrieved as one page with all inline objects
(picture, sound ...)
resumeregets - pavuk scans the local tree for files that were not
retrieved fully and retrieves them again (uses partial get if
possible)
singlereget - get URL until it is retrieved in full
dontstore - transfer page from server, but don't store it to the
local tree. This mode is suitable for fetching pages that are
held in a local proxy/cache server.
reminder - used to inform the user about changed documents
ftpdir - used to list the contents of FTP directories
The default operation mode is normal.
Help
-h print long verbose help message
-v print version information and the configuration chosen at
compilation time.
Indicate/Logging/Interface options
-quiet
Don't show any messages on the screen.
-verbose
Force output messages to be shown on the screen (default).
-progress/-noprogress
show retrieving progress while running in the terminal.
-stime/-nostime
show start and end time of transfer.
-xmaxlog $nr
maximal number of log lines in the Log widget. 0 means unlimited.
This option is available only when compiled with the GTK+ GUI.
-logfile $file
file where all produced messages are stored.
-unique_log/-nounique_log
When the log file specified with the option -logfile is already in
use by another process, try to generate a new unique name for the
log file.
-slogfile $file
file to store short logs in. This file contains one line of
information per processed document. This is meant to be used in
connection with any sort of script to produce some statistics,
for validating links on your website, or for generating simple
sitemaps. Multiple pavuk processes can use this file
concurrently, without overwriting each other's entries. Record
structure:
- PID of pavuk process
- TIME current time
- COUNTER in the format current/total number of URLs
- STATUS contains the type of the error: FATAL, ERR,
WARN or OK
- ERRCODE is the number code of the error
(see errcode.h in pavuk sources)
- URL of the document
- PARENTURL first parent document of this URL
(when it doesn't have parent - [none])
- FILENAME is the name of the local file the
document is saved under
- SIZE size of requested document if known
- DOWNLOAD_TIME time taken to download this
document, in the format seconds.milliseconds
- HTTPRESP contains the first line of the HTTP server
response
-language $str
native language that pavuk should use for communication with its
user (works only when there is a message catalog for that
language). GNU gettext support (for message internationalization)
must also be compiled in.
-gui_font $font
font used in the GUI interface. To list available X fonts use the
xlsfonts command. This option is available only when compiled
with GTK+ GUI support.
Special start
-X start program with X Window interface (if compiled with support
for GTK+).
-runX
When used together with the -X option, pavuk starts processing of
URLs immediately after the GUI window is launched. Without the -X
option given, this option doesn't have any effect. Only available
when compiled with GTK+ support.
-bg/-nobg
This option allows pavuk to detach from its terminal and run in
background mode. Pavuk will not output any messages to the
terminal then. If you want to see messages, you have to use the
-logfile option to specify a file where messages will be
written.
-check_bg/-nocheck_bg
Normally, programs sent into the background after being run in
foreground continue to output messages to the terminal. If this
option is activated, pavuk checks if it is running as background
job and will not write any messages to the terminal in this case.
After it becomes a foreground job again, it will start writing
messages to terminal in the normal way. This option is available
only when your system supports retrieving of terminal info via
tc*() functions.
-prefs/-noprefs
When you turn this option on, pavuk will preserve all settings
when exiting, and when you run pavuk with the GUI interface again,
all settings will be restored. The settings will be stored in
the ~/.pavuk_prefs file. This option is available only when
compiled with GTK+.
-schedule $time
Execute pavuk at the time specified as parameter. The format of
the $time parameter is YYYY.MM.DD.hh.mm. You need properly
configured scheduling with the at command on your system to use
this option.
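When the start time is computed by a script, the YYYY.MM.DD.hh.mm
form can be produced with strftime(). The Python sketch below assumes
that all fields are zero-padded, which the format string above
suggests but does not state; the helper name schedule_arg is
hypothetical.

```python
from datetime import datetime


def schedule_arg(when):
    """Format a datetime as YYYY.MM.DD.hh.mm for use with -schedule."""
    return when.strftime("%Y.%m.%d.%H.%M")


# Example: 28 Jan 2000, 03:30 local time.
print(schedule_arg(datetime(2000, 1, 28, 3, 30)))
```

Note that -newer_than and -older_than use a different separator
between hours and minutes.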
-reschedule $nr
Execute pavuk periodically, with a period of $nr hours. You need
properly configured scheduling with the at command on your system
to use this option.
-sched_cmd $str
Command to use for scheduling. Pavuk explicitly supports
scheduling with at. $str should contain regular characters and
macros, escaped by the % character. Supported macros are:
%f - for script filename
%t - for time (in format HH:MM)
- all macros as supported by the strftime() function
-urls_file $file
If you use this option, pavuk will read URLs from $file before it
starts processing. In this file, each URL needs to be on a
separate line. After the last URL, a single dot . followed by a
LF (line-feed) character denotes the end. Pavuk will start
processing right after all URLs have been read. If $file is
given as the - character, standard input will be read.
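A script that generates the URL list only needs to emit one URL per
line followed by the terminating dot line. The Python sketch below
builds such text; the helper name urls_file_text and the example URLs
are hypothetical, and skipping blank entries is a choice made for this
example rather than something pavuk is documented to require.

```python
def urls_file_text(urls):
    """Return text for -urls_file: one URL per line, then a lone dot."""
    lines = [u.strip() for u in urls if u.strip()]
    # Each URL ends with LF; a single "." followed by LF marks the end.
    return "".join(line + "\n" for line in lines) + ".\n"


text = urls_file_text([
    "http://www.example.org/",
    "ftp://ftp.example.org/pub/",
])
print(text, end="")
```

The same text can be written to a file named with -urls_file, or
piped to pavuk when $file is given as -.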
-store_info/-nostore_info
This option causes pavuk to store information about each document
into a separate file in the .pavuk_info directory. This file is
used to store the original URL from which the document was
downloaded. For files that are downloaded with the HTTP or HTTPS
protocols, the whole HTTP response header is stored there. I
recommend using this option when you are using options that
change the default layout of the local document tree, because
this info file helps pavuk to map the local filename to the URL.
This option is also very useful when different URLs have the same
filename in the local tree. When this occurs, pavuk detects this
using info files, and it will prefix the local name with numbers.
-request $req
With this option you can specify extended information for
starting URLs, including query data for POST or GET requests. The
current syntax of this option is: URL:["]$url["]
[METHOD:["]{GET|POST}["]] [ENCODING:["]{u|m}["]]
[FIELD:["]variable=value["]] [FILE:["]variable=filename["]]
- URL: specifies request URL
- METHOD: specifies request method for URL and is
one of GET or POST.
- ENCODING: specifies encoding for request body data.
m is for multipart/form-data encoding
u is for application/x-www-form-urlencoded
encoding
- FIELD: specifies field of request data in format
variable=value. For encoding of special characters
in variable and value you can use same encoding
as is used in application/x-www-form-urlencoded
encoding.
- FILE: specifies special field of query, which is
used to specify file for POST based file upload.
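Since the -request argument is a single string made of several tagged
parts, it can be convenient to assemble it programmatically. The
Python sketch below follows the syntax described above and
percent-encodes field names and values. The helper name build_request,
the example URL and the decision to always quote the URL, FIELD and
FILE parts are assumptions for this example.

```python
from urllib.parse import quote


def build_request(url, method="GET", encoding="u", fields=None, files=None):
    """Assemble a -request argument string (hypothetical helper)."""
    parts = ['URL:"%s"' % url, "METHOD:%s" % method, "ENCODING:%s" % encoding]
    for name, value in (fields or {}).items():
        # Special characters use application/x-www-form-urlencoded escapes.
        encoded = "%s=%s" % (quote(name, safe=""), quote(value, safe=""))
        parts.append('FIELD:"%s"' % encoded)
    for name, filename in (files or {}).items():
        parts.append('FILE:"%s=%s"' % (name, filename))
    return " ".join(parts)


print(build_request(
    "http://www.example.org/cgi-bin/search",
    method="POST",
    fields={"q": "pavuk mirror"},
))
```

The resulting string would be passed as one argument, so it needs to
be quoted appropriately for the shell.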
-formdata $req
This option gives you the chance to specify contents for HTML forms
found while traversing the document tree.
The current syntax of this option is the same as for the -request
option, but ENCODING: and METHOD: are meaningless in the semantics of
this option.
In URL: you have to specify the HTML form action URL, which will be
matched against action URLs found in processed HTML documents. If
pavuk finds an action URL which matches the one supplied in the
-formdata option, pavuk will construct a GET or POST request from the
data supplied in this option and from the default form field values
supplied in the HTML document. Values supplied on the command line
take precedence over those supplied in the HTML document.
-nthreads $nr
By means of this option you can specify how many concurrent
threads will download documents. This option is available only
when pavuk is compiled to support multithreading.
-immesg/-noimmesg
Pavuk's default behaviour when running multiple downloading
threads is to buffer all output messages in a memory buffer and
flush that buffered data only when a thread finishes processing
one document. With this option you can change this behaviour to
see the messages immediately as they are produced. It is only
useful when you want to debug some special cases. This option is
available only when pavuk is compiled to support multithreading.
-dumpfd $nr
For scripting it is sometimes useful to be able to download a
document directly to a pipe or variable instead of storing it in a
regular file. In such a case you can use this option to dump the
data, for example to stdout ($nr = 1).
Scenario/Task options
-scenario $str
name of the scenario to load and/or run. Scenarios are files with a
structure similar to the .pavukrc file. Scenarios contain saved
configurations. You can use them for periodical mirroring.
Parameters from scenarios specified at the command line should be
overwritten by command line parameters. To be able to use this
option, you need to specify the scenario base directory with the
option -scndir.
-dumpscn $filename
Store the current configuration into a scenario file with the name
$filename.
Directory options
-msgcat $dir
directory which contains the message catalog for pavuk. If you do
not have permission to store a pavuk message catalog in the
system directory, you should simply create a directory structure
in your home directory similar to the one on your system.
For example:
Your native language is German, and your home directory is
/home/jano.
You should first create the directory
/home/jano/locales/de/LC_MESSAGES/, then put the German pavuk.mo
there and set -msgcat to /home/jano/locales/. If you have
properly set locale environment values, you will see pavuk
speaking German. This option is available only when you compiled
in support for GNU gettext messages internationalization.
-cdir $dir
directory where all retrieved documents are stored. If not
specified, the current directory is used. If the specified
directory doesn't exist, it will be created.
-scndir $dir
directory in which your scenarios are stored.
Preserve options
-preserve_time/-nopreserve_time
store downloaded document with same modification time as on the
remote site. Modification time will be set only when such
information is available (some FTP servers do not support the
MDTM command, and some documents on HTTP servers are created
online so pavuk can't retrieve the modification time of this
document).
-preserve_perm/-nopreserve_perm
store downloaded document with the same permissions as on the
remote site. This option has effect only when downloading a file
through FTP protocol and assumes that the -ftplist option is
used.
-preserve_slinks/-nopreserve_slinks
set symbolic links to point exactly to same location as on the
remote server; don't do any relocations. This option has effect
only when downloading file through FTP protocol and assumes that
the -ftplist option is used.
For example, assume that on the FTP server ftp.xx.org there is a
symbolic link /pub/pavuk/pavuk-current.tgz, which points to
/tmp/pub/pavuk-0.9pl11.tgz. Pavuk will create the symbolic link
ftp/ftp.xx.org_21/pub/pavuk/pavuk-current.tgz
If the option -preserve_slinks is used, this symbolic link will
point to
/tmp/pub/pavuk-0.9pl11.tgz
If the option -preserve_slinks is not used, this symbolic link will
point to
../../tmp/pub/pavuk-0.9pl11.tgz
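The relocated target in this example is simply the original target
expressed relative to the directory that holds the link. The Python
sketch below reproduces that computation with posixpath.relpath(); it
only illustrates the arithmetic of the example and is not taken from
pavuk's sources. The helper name relocated_target is hypothetical.

```python
import posixpath


def relocated_target(link_path, target_path):
    """Express target_path relative to the directory containing link_path."""
    return posixpath.relpath(target_path, start=posixpath.dirname(link_path))


# Remote paths from the example above.
print(relocated_target("/pub/pavuk/pavuk-current.tgz",
                       "/tmp/pub/pavuk-0.9pl11.tgz"))
```

Because the result is relative, it keeps pointing inside the mirrored
tree of the same server.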
Proxy options
-http_proxy $site[:$port]
if this parameter is used, then all HTTP requests are going
through this proxy server. This is useful if your site resides
behind a firewall, or if you want to use a HTTP proxy cache
server. The default port number is 8080.
-nocache/-cache
use this switch whenever you want to get the document directly
from the site and not from your HTTP proxy cache server.
-ftp_proxy $site[:$port]
if this parameter is used, then all FTP requests are going
through this proxy server. This is useful if your site resides
behind a firewall, or if you want to use an FTP proxy cache
server. The default port number is 22. Pavuk supports three
different types of proxies for FTP, see the options -ftp_httpgw,
-ftp_dirtyproxy. If none of the mentioned options is used, then
pavuk assumes a regular FTP proxy.
-ftp_httpgw/-noftp_httpgw
the specified FTP proxy is a HTTP gateway for the FTP protocol.
-ftp_dirtyproxy/-noftp_dirtyproxy
the specified FTP proxy is a HTTP proxy which supports a CONNECT
request (pavuk should use the full FTP protocol, except for active
data connections). If both -ftp_dirtyproxy and -ftp_httpgw are
specified, -ftp_dirtyproxy is preferred.
-gopher_proxy $site[:$port]
Gopher gateway or proxy/cache server.
-gopher_httpgw/-nogopher_httpgw
the specified Gopher proxy server is a HTTP gateway for Gopher
protocol
-ssl_proxy $site[:$port]
SSL proxy (tunneling) server [as that in CERN httpd + patch or in
Squid] with enabled CONNECT request (at least on port 443). This
option is available only when compiled with SSL support (you need
the SSLeay or OpenSSL libraries and the development headers).
Proxy Authentication
-http_proxy_user $user
username for HTTP proxy authentication.
-http_proxy_pass $pass
password for HTTP proxy authentication.
-http_proxy_auth {1/2/3}
authentication scheme for proxy access. Same meaning as for the
-auth_scheme option.
-auth_reuse_proxy_nonce/-noauth_reuse_proxy_nonce
when using the HTTP proxy Digest access authentication scheme, reuse
the first received nonce value in subsequent requests.
Protocol/Download Options
-retry $nr
set the number of attempts to transfer a processed document.
Normally set to 1.
-nregets $nr
set the number of allowed regets on a single document, after a
broken transfer.
-nredirs $nr
set number of allowed HTTP redirects. (use this for prevention of
loops)
-force_reget/-noforce_reget
force reget'ing of the whole document after a broken transfer
when the server doesn't support retrieving of partial content
-timeout $nr
timeout for stalled connections in minutes. This value is also
used for connection timeouts.
-noRobots/-Robots
this switch suppresses the use of the robots.txt standard, which
is used to restrict access of Web robots to some locations on the
web server.
-noEnc/-Enc
this switch suppresses the use of gzip, compress or deflate
encoding during transfer. I don't know if some servers are broken or
what, but they report the MIME type application/gzip or
application/compress as encoded. This is useful when you don't
have gzip, which is used to decode documents encoded this way.
-check_size/-nocheck_size
the option -nocheck_size should be used if you are trying to
download pages from a HTTP server which sends a wrong
Content-Length: field in the response header.
-maxrate $nr
If you don't want to give all your transfer bandwidth to pavuk,
use this option to set pavuk's maximum transfer rate. This option
accepts a floating point number to specify the transfer rate in
kB/s. If you want to get optimal settings, you also have to play
with the size of the read buffer (option -bufsize) because pavuk
is doing flow control only at application level.
-minrate $nr
If you hate slow transfer rates, this option allows you to break
transfers with slow speed. You can set the minimum transfer rate,
and if the connection gets slower than the given rate, the
transfer will be stopped. The minimum transfer rate is given in
kB/s.
-bufsize $nr
This option is used to specify the size of the read buffer
(default size: 32kB). If you have a very fast connection, you
may increase the size of the buffer to get a better read
performance. If you need to decrease the transfer rate, you may
need to decrease the size of the buffer and set the maximal
transfer rate with the -maxrate option. This option accepts the
size of the buffer in kB.
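To see why the buffer size interacts with -maxrate, consider a
generic application-level throttle that pauses after each buffer read
so that the average rate stays under the limit. The Python sketch
below is such a generic scheme, not pavuk's actual algorithm; the
helper name pause_after_read is hypothetical and 1 kB is assumed to
mean 1024 bytes.

```python
def pause_after_read(bytes_read, seconds_elapsed, maxrate_kb_s):
    """Seconds to sleep so bytes_read does not exceed maxrate_kb_s."""
    # Minimum time the transfer of bytes_read may take at the limit.
    min_seconds = bytes_read / (maxrate_kb_s * 1024.0)
    return max(0.0, min_seconds - seconds_elapsed)


# A full 32 kB buffer read in 0.5 s with a 16 kB/s limit.
print(pause_after_read(32 * 1024, 0.5, 16))
# A 4 kB buffer read in the same time is already under the limit.
print(pause_after_read(4 * 1024, 0.5, 16))
```

With a large buffer the data arrives in bursts followed by long
pauses; a smaller buffer gives shorter pauses and a smoother rate.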
-fs_quota $nr
If you are running pavuk on a multiuser system, you may need to
avoid filling up your file system. This option lets you specify
how much space must remain free. If pavuk detects an underrun of
the free space, it will stop downloading files. Specify this
quota in kB.
-file_quota $nr
This option is useful when you want to limit downloading of big
files, but want to download at least $nr kilobytes. A big file
will be transferred, and when it reaches the specified size,
transfer will break. Such a document will be processed as
properly downloaded, so be careful when using this option.
-trans_quota $nr
If you know that your selection will cover a large amount of
data, you can use this option to limit the amount of
transferred data.
-url_strategy $strategy
This option allows you to specify a downloading order for URLs.
This option accepts the following strings as parameters :
level - orders URLs as they are loaded from HTML files
leveli - as previous, but inline object URLs come first
pre - inserts URLs from the current HTML document at the start,
before the others
prei - as previous, but inline object URLs come first
-send_if_range/-nosend_if_range
send If-Range: header in HTTP request. I found out that some
HTTP servers (greetings, MS :-)) are sending different ETag:
fields in different responses for the same, unchanged document.
This generates problems when pavuk attempts to reget a document
from such a server: pavuk will remember the old ETag value. If
the server checks it with the new ETag value and it differs, it
will refuse to send only part of the document.
-ssl_version $v
Set required SSL protocol version for SSL communication. $v is
one of ssl2, ssl23 or ssl3. This option is available only when
compiled with SSL support.
-unique_sslid/-nounique_sslid
This option can be used if you want to use a unique SSL ID for
all SSL sessions. This option is available only when compiled
with SSL support.
-use_http11/-nouse_http11
This option is used to switch between the HTTP/1.0 and HTTP/1.1
protocols used with HTTP servers. HTTP/1.1 is currently not the
default because its implementation is very fresh and not 100%
tested. Even so, using HTTP/1.1 is highly recommended, because it
is faster than HTTP/1.0 and uses less network bandwidth for
initiating connections.
-local_ip $addr
You can use this option when you want to use a specific network
interface for communication with other hosts. This option is
suitable for multihomed hosts with several network interfaces.
The address should be entered as a regular IP address or as a host
name.
-identity $str
this option allows you to specify the content of the User-Agent: field of
HTTP request. This is useful when scripts on the remote server
return different documents for the same URL to different browsers,
or if some HTTP server refuses to serve documents to Web robots like
pavuk.
-auto_referer/-noauto_referer
this option forces pavuk to send the HTTP Referer: header field with
starting URLs. The content of this field will be the URL itself.
Using this option is required when the remote server checks the
Referer: field.
Authentication
-auth_file $file
file where you have stored authentication information for
access to some service. For the file structure see the FILES
section below.
-auth_name $user
if you use this parameter, the program authenticates on each HTTP
access to a document. Use this only if you know that only one HTTP
server will be accessed, or use the -asite option to specify the
site for which you use authentication. Otherwise your auth
parameters will be sent to each accessed HTTP server.
-auth_passwd $passwd
value of this parameter is used as the password for authentication.
-auth_scheme {1 , 2 , 3}
this parameter specifies the authentication scheme to use.
1 means the user authentication scheme is used as defined in
HTTP/1.0 or HTTP/1.1. Password and user name are sent unencoded.
2 means the Basic authentication scheme is used as defined in
HTTP/1.0. Password and user name are sent BASE64 encoded.
3 means the Digest access authentication scheme based on MD5
checksums is used, as defined in RFC2069.
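For scheme 2, the BASE64 encoded value is built from the user name
and password joined by a colon, as the HTTP Basic scheme defines. The
Python sketch below shows that encoding so it is clear that BASE64 is
an encoding and not encryption. The helper name basic_credentials is
hypothetical, and the credentials are the well-known example pair from
the HTTP authentication RFCs.

```python
import base64


def basic_credentials(user, password):
    """Return the BASE64 token sent after 'Authorization: Basic '."""
    raw = f"{user}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


token = basic_credentials("Aladdin", "open sesame")
print("Authorization: Basic " + token)
# Decoding recovers the plain credentials, so use SSL where it matters.
print(base64.b64decode(token).decode("utf-8"))
```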
-auth_reuse_nonce/-noauth_reuse_nonce
when using the HTTP Digest access authentication scheme, reuse the
first received nonce value in subsequent requests.
-ssl_key_file $file
file with public key for SSL certificate (learn more from the SSLeay
or OpenSSL documentation). This option is available only when
compiled with SSL support (you need the SSLeay or OpenSSL libraries
and development headers).
-ssl_cert_file $file
certificate file in PEM format (learn more from the SSLeay or OpenSSL
documentation). This option is available only when compiled with
SSL support (you need the SSLeay or OpenSSL libraries and development
headers).
-ssl_cert_passwd $str
password used to generate certificate (learn more from the SSLeay or
OpenSSL documentation). This option is available only when
compiled with SSL support (you need the SSLeay or OpenSSL libraries
and development headers).
-from $email
this parameter is used as the password when accessing an anonymous
FTP server, or is optionally inserted into the From: field of an HTTP
request. If not specified, the program derives it from the USER
environment variable and from the site hostname.
-send_from/-nosend_from
this option enables or disables sending of the user identification
entered with the -from option as the anonymous FTP password and in
the From: field of HTTP requests. This option is off by default.
Site/Domain Limitation Options
-asite $list
specify comma separated list of allowed sites on which referenced
documents are stored.
-dsite $list
specify comma separated list of disallowed sites. This is the
opposite of the previous parameter. If both are used, the last
occurrence takes effect.
-adomain $list
specify comma separated list of allowed domains on which
referenced documents are stored.
-ddomain $list
specify comma separated list of disallowed domains. This is the
opposite of the previous parameter. If both are used, the last
occurrence takes effect.
Limitation Document properties
-amimet $list
list of comma separated allowed MIME types
-dmimet $list
list of comma separated disallowed MIME types. This is the
opposite of the previous parameter. If both are used, the last
occurrence takes effect.
-maxsize $nr
maximal allowed size of document. This option is applied only
when pavuk is able to detect the document size before starting the
transfer.
-minsize $nr
minimal allowed size of document. This option is applied only when
pavuk is able to detect the document size before starting the
transfer.
-newer_than $time
Allow only transfer of documents with modification time newer
than specified in parameter $time. Format of $time is:
YYYY.MM.DD.hh:mm. To apply this option pavuk must be able to
detect modification time of document.
-older_than $time
Allow only transfer of documents with modification time older
than specified in parameter $time. Format of $time is:
YYYY.MM.DD.hh:mm. To apply this option pavuk must be able to
detect modification time of document.
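The two time limits can be combined to select a window of
modification times. The Python sketch below parses the
YYYY.MM.DD.hh:mm form and applies both limits to a document's
modification time. Treating the limits as strict comparisons is an
assumption of this example, and the helper name allowed_by_time is
hypothetical.

```python
from datetime import datetime

# Format used by -newer_than and -older_than (note the colon).
TIME_FORMAT = "%Y.%m.%d.%H:%M"


def allowed_by_time(modified, newer_than=None, older_than=None):
    """Return True if modified lies inside the requested time window."""
    if newer_than and modified <= datetime.strptime(newer_than, TIME_FORMAT):
        return False
    if older_than and modified >= datetime.strptime(older_than, TIME_FORMAT):
        return False
    return True


doc_time = datetime(2000, 1, 28, 12, 0)
print(allowed_by_time(doc_time, newer_than="2000.01.01.00:00"))
print(allowed_by_time(doc_time, older_than="2000.01.01.00:00"))
```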
-noCGI/-CGI
this switch prevents the transfer of dynamically generated
parametric documents through the CGI interface. These are detected
by the occurrence of the ? character inside the URL.
-alang $list
this allows you to specify an ordered comma separated list of
preferred natural languages. This option works only with the HTTP
protocol, using the Accept-Language: MIME entry.
-acharset $list
this option allows you to enter a comma separated list of preferred
encodings of transferred documents. This works only with HTTP and
HTTPS URLs and only if such document encodings are available on the
destination server.
example: -acharset iso-8859-2,windows-1250,utf8
Limitation Document name
-asfx $list
this parameter allows you to specify set of suffixes used to
restrict selection of documents which will be processed.
-dsfx $list
set of suffixes used to exclude documents from selection. This
is the inverse of the previous option; the two are mutually
exclusive.
-aprefix $list , -dprefix $list
these two options allow you to specify a set of allowed or
disallowed prefixes of documents. They are mutually exclusive.
-pattern $pattern
this option allows you to specify a wildcard pattern for
documents. All documents are tested against this pattern.
-rpattern $reg_exp
same as the previous option, but uses regular expressions.
Available only on platforms which have any supported RE
implementation.
-skip_pattern $pattern
this option allows you to specify a wildcard pattern for
documents that should be skipped. All documents are tested
against this pattern.
-skip_rpattern $reg_exp
same as the previous option, but uses regular expressions.
Available only on platforms which have any supported RE
implementation.
-url_pattern $pattern
this option allows you to specify a wildcard pattern for URLs.
All URLs are tested against this pattern.
Example:
-url_pattern http://\*.idata.sk:\*/~ondrej/\*
This option enables all HTTP URLs from the domain .idata.sk, on
all ports, which are located under /~ondrej/.
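To get a feel for what such a wildcard pattern accepts, the example can be tried with Python's fnmatch module. This is only an approximation: Python's matcher is not the C library matcher pavuk uses, and the sketch assumes that the tested URL carries an explicit port, which may not be how pavuk normalizes URLs. The backslashes in the example above only protect the asterisks from the shell, so the pattern itself contains plain asterisks.

```python
from fnmatch import fnmatchcase

# Pattern from the -url_pattern example, without the shell escapes.
PATTERN = "http://*.idata.sk:*/~ondrej/*"


def url_matches(url, pattern=PATTERN):
    """Return True if url matches the wildcard pattern (case sensitive)."""
    return fnmatchcase(url, pattern)


if __name__ == "__main__":
    for candidate in (
        "http://www.idata.sk:80/~ondrej/pavuk/",
        "http://www.idata.sk:80/~other/",
        "http://www.example.com:80/~ondrej/",
    ):
        print(candidate, url_matches(candidate))
```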
-url_rpattern $reg_exp
same as the previous option, but uses regular expressions.
Available only on platforms which have any supported RE
implementation.
-skip_url_pattern $pattern
this option allows you to specify a wildcard pattern for URLs
that should be skipped. All URLs are tested against this pattern.
-skip_url_rpattern $reg_exp
same as the previous option, but uses regular expressions.
Available only on platforms which have any supported RE
implementation.
-aip_pattern $re
this option allows you to limit the set of transferred documents
by server IP address. IP addresses can be specified as regular
expressions, so it is possible to specify a set of IP addresses
with one expression. Available only on platforms which have any
supported RE implementation.
-dip_pattern $re
this option is similar to the previous option, but is used to
specify the set of disallowed IP addresses. Available only on
platforms which have any supported RE implementation.
-enable_js/-disable_js
these options are used to enable or disable downloading of
JavaScript script sources. This doesn't mean that pavuk will
Limitation Protocol Option
-noHTTP/-HTTP
this switch suppresses all transfers through the HTTP protocol.
-noSSL/-SSL
this switch suppresses all transfers through the HTTPS protocol
(HTTP protocol over SSL). This option is available only when
compiled with SSL support (you need the SSLeay or OpenSSL
libraries and development headers).
-noGopher/-Gopher
suppress all transfers through the Gopher Internet protocol.
-noFTP/-FTP
this switch prevents processing of documents located on all FTP
servers.
-noFTPS/-FTPS
this switch prevents processing of documents located on all FTP
servers accessed through SSL.
-FTPhtml/-noFTPhtml
with the option -FTPhtml you can force pavuk to process HTML
files downloaded with the FTP protocol.
-FTPdir/-noFTPdir
force recursive processing of FTP directories too.
-disable_html_tag $TAG,[$ATTRIB][;...]
-enable_html_tag $TAG,[$ATTRIB][;...]
enable or disable processing of particular HTML tags or
attributes. For example, if you don't want to process any images
you should use the option
-disable_html_tag 'IMG,SRC;INPUT,SRC;BODY,BACKGROUND'
Other Limitation Options
-subdir $dir
subdirectory of the local tree directory, used to limit the tree
scan in some of the modes {sync , resumeregets , linkupdate}.
-dont_leave_site/-leave_site
(Don't) leave starting site.
-dont_leave_dir/-leave_dir
(Don't) leave starting directory. If -dont_leave_dir option is
used pavuk will stay only in starting directory (including its
own subdirectories).
-lmax $nr
set maximal allowed level of tree traversal. Default is 0,
which means that pavuk can traverse ad infinitum. As of version
0.8pl1, inline objects of HTML pages are placed at the same level
as the parent HTML page.
-leave_level $nr
maximal level of documents outside from site of starting URL.
Default is set to 0. 0 means that checking is not applied.
-site_level $nr
maximal level of sites outside from site of starting URL. Default
is set to 0. 0 means that checking is not applied.
-dmax $nr
set maximal allowed number of documents that are processed.
Default value is 0. That means no restrictions are used in number
of processed documents.
-FTPlist/-noFTPlist
When the option -FTPlist is used, pavuk will retrieve the content
of FTP directories with the FTP command LIST instead of NLST, so
the same listing will be retrieved as with the "ls -l" UNIX
command. This option is required if you need to preserve
permissions of remote files or you need to preserve symbolic
links. Pavuk supports wide listing on FTP servers with regular BSD
or SYSV style "ls -l" directory listing, on FTP servers with EPFL
listing format, VMS style listing, DOS/Windows style listing and
Novell listing format.
-user_condition $str
script or program name for the user's own conditions. You can
write any script which decides with its exit value whether to
download a URL or not. The script gets any number of options from
pavuk, with this meaning:
-url $url - processed URL
-parent $url - any number of parent URLs
-level $nr - level of this URL from starting URL
-size $nr - size of requested URL
-date $datenr - modification time of requested URL in format
YYYYMMDDhhmmss
Warning: use user conditions only if required, because of the big
slowdowns caused by forking scripts for each checked URL.
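The options listed above could be collected by a condition script roughly as follows. This is a sketch in Python under stated assumptions: the function name is hypothetical, options are assumed to arrive as "-name value" pairs, and $nr values are assumed to be plain integers. Which exit value means "download" is not stated in this page, so the sketch stops at parsing and leaves the decision to the caller.

```python
from datetime import datetime


def parse_condition_args(argv):
    """Collect the documented options into a dict (hypothetical helper)."""
    info = {"url": None, "parents": [], "level": None, "size": None, "date": None}
    # Assumption: every option is followed by exactly one value.
    for name, value in zip(argv[0::2], argv[1::2]):
        if name == "-url":
            info["url"] = value
        elif name == "-parent":
            # -parent may be given any number of times.
            info["parents"].append(value)
        elif name == "-level":
            info["level"] = int(value)
        elif name == "-size":
            info["size"] = int(value)
        elif name == "-date":
            # Documented layout: YYYYMMDDhhmmss
            info["date"] = datetime.strptime(value, "%Y%m%d%H%M%S")
    return info


if __name__ == "__main__":
    import sys

    print(parse_condition_args(sys.argv[1:]))
```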
Cookie
-cookie_file $file
file where cookie information is stored. This file must be in
Netscape cookie file format (generated with Netscape Navigator or
Communicator ...).
-cookie_send/-nocookie_send
use collected cookies in HTTP/HTTPS requests.
-cookie_recv/-nocookie_recv
store received cookies from HTTP/HTTPS responses into memory
cookie cache.
-cookie_update/-nocookie_update
update cookie file on disc and synchronize it with changes made
by any concurrent processes.
-cookies_max $nr
maximal number of cookies in memory cookie cache
-disabled_cookie_domains $list
comma-separated list of cookie domains which are permitted to
send cookies stored into cookie cache
-cookie_check/-nocookie_check
check when receiving cookie, if cookie domain is equal to domain
of server which sends this cookie.
Filename/URL Conversion Option
-noRelocate/-Relocate
this switch prevents the program from rewriting relative URLs to
absolute ones after an HTML document is transferred.
-all_to_local/-noall_to_local
this option forces pavuk to change all URLs inside HTML document
to local URLs immediately after download of document.
-sel_to_local/-nosel_to_local
this option forces pavuk to change all URLs which satisfy the
conditions for download to local URLs inside the HTML document
immediately after download of the document. I recommend using
this option when you are sure that the transfer will proceed
without any problems. This option should save a lot of processor
time.
-all_to_remote/-noall_to_remote
this option forces pavuk to change all URLs inside HTML document
to remote URLs immediately after download of document.
-tr_del_chr $str
all characters found in $str will be deleted from the local name
of the document. $str may contain escape sequences similar to
those of the tr command:
\n - newline
\r - carriage return
\t - horizontal tab space
\0xXX - hexadecimal ASCII value
[:upper:] - all uppercase letters
[:lower:] - all lowercase letters
[:alpha:] - all letters
[:alnum:] - all letters and digits
[:digit:] - all digits
[:xdigit:] - all hexadecimal digits
[:space:] - all horizontal and vertical whitespaces
[:blank:] - all horizontal whitespaces
[:cntrl:] - all control characters
[:print:] - all printable characters including space
[:nprint:] - all non printable characters
[:punct:] - all punctuation characters
[:graph:] - all printable characters excluding space
-tr_str_str $str1 $str2
string $str1 from local name of document will be replaced with
$str2.
-tr_chr_chr $chrset1 $chrset2
characters from $chrset1 in the local name of the document will
be replaced with the corresponding character from $chrset2.
$chrset1 and $chrset2 should have the same syntax as $str in the
-tr_del_chr option.
-store_name $str
when you want to change the local filename of the first file
downloaded in singlepage mode, you should use this option.
-index_name $str
with this option you can change the directory index name. The
default is _._.html .
-store_index/-nostore_index
With the option -nostore_index you can prevent storing of
directory indexes into HTML files.
-fnrules $t $m $r
Uff, this is quite a powerful option! It is used to flexibly
change the layout of the local document tree. It accepts three
parameters. The first parameter $t says what type the following
pattern is: F is used for a wildcard pattern (uses fnmatch()) and
R is used for a regular expression pattern (using any supported
RE implementation). The second parameter is the matching pattern
used to select URLs for this rule. If a URL matches this pattern,
then the local name for this URL is computed following the rule in
the third parameter. The third parameter is the local name
building rule. Pavuk now supports two kinds of local name building
rules. One is simple, based only on simple macros, and the other
is a more complicated extended rule, which also makes it possible
to call several functions. The two kinds of rules are told apart
by the first character of the rule. When the first character is
'(', the rule is extended; in all other cases it is the simple
kind of rule.
A simple rule should contain literals or escaped macros. Macros
are escaped by the % character or by the $ character.
Here is the list of recognized macros:
$x - where x is any positive number. This macro is replaced with
the x-th substring matched by the RE pattern. (If you use this you
need to understand RE!)
%i - is replaced with protocol id (http,https,ftp,gopher)
%p - is replaced with password. (use this only when usable)
%u - is replaced with username.
%h - is replaced with host name.
%m - is replaced with domain name.
%r - is replaced with port number.
%d - is replaced with path to document.
%n - is replaced with document name.
%b - is replaced with basename of document (without extension).
%e - is replaced with extension.
%s - is replaced with searchstring.
%x - where x is a positive number. This macro is replaced with
the x-th directory from the path to the document, counted from
the beginning.
%-x - where x is a positive number. This macro is replaced with
the x-th directory from the path to the document, counted from
the end.
Here is an example. If you want to place documents into separate
directories by extension, you should use the following fnrules
option:
-fnrules F '*' '/%e/%n'
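The effect of a simple rule such as '/%e/%n' can be illustrated with a small Python sketch. It is not pavuk's implementation: the function name is hypothetical, only a few of the macros are covered, and the exact strings pavuk substitutes (for example whether %d carries a leading slash) are assumptions.

```python
import posixpath
from urllib.parse import urlsplit


def expand_simple_rule(rule, url):
    """Expand a subset of the % macros of a simple rule (hypothetical helper)."""
    parts = urlsplit(url)
    directory, name = posixpath.split(parts.path)
    base, ext = posixpath.splitext(name)
    values = {
        "i": parts.scheme,          # protocol id
        "h": parts.hostname or "",  # host name
        "d": directory,             # path to document (assumed form)
        "n": name,                  # document name
        "b": base,                  # basename without extension
        "e": ext.lstrip("."),       # extension
    }
    out = []
    i = 0
    while i < len(rule):
        if rule[i] == "%" and i + 1 < len(rule) and rule[i + 1] in values:
            out.append(values[rule[i + 1]])
            i += 2
        else:
            # Literal characters and unsupported macros pass through unchanged.
            out.append(rule[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(expand_simple_rule("/%e/%n", "http://www.idata.sk/~ondrej/pavuk/index.html"))
```

Under these assumptions the rule places index.html below a directory named html, and a .tgz archive below a directory named tgz.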
An extended rule always begins with the character '('. It uses a
kind of LISP-like syntax.
Here are the basic rules for writing extended rules:
- the local filename produced by a rule of this kind is the
return value of its function
- each function is enclosed inside round braces ()
- the first token right after the opening brace is the function name
- each function has a nonzero fixed number of parameters
- each function returns a numeric or string value
- function parameters are separated by any number of space
characters
- a function parameter should be a string, number, macro or other
function
- a string is always quoted with "
- each numeric parameter can be in any encoding supported by the
strtod() function (octal, decimal, hexadecimal, ...)
- there is no implicit conversion from number to string
- each macro is prefixed by the % character and is one character long
- each macro is replaced by its string representation from the
current URL
- function parameters are typed strictly
- the toplevel function must return a string value
Extended rules support the full set of % escaped macros supported
by simple rules, plus the two following additional macros:
%U - URL string
%o - default localname for URL
Here is a description of all supported functions:
sc - concat two string parameters
- accepts two string parameters
- returns string value
ss - substring from string
- accepts three parameters
- first is string from which we want to cut subpart
- second is number which represents starting position in
string
- third is number which represents ending position in string
- returns string value
hsh - compute modulo hash value from string with specified base
- accepts two parameters
- first is string for which we are computing the hash value
- second is numeric value for base of modulo hash
- returns numeric value
md5 - compute MD5 checksum for string
- accepts one string value
- returns string which represents MD5 checksum
lo - convert all characters inside string to lower case
- accepts one string value
- returns string value
up - convert all characters inside string to upper case
- accepts one string value
- returns string value
ue - encode unsafe characters in string with the same encoding
which is used for encoding unsafe characters inside URLs (%xx). By
default all non-ASCII values are encoded when this function is
used.
- accepts two string values
- first is string which we want to encode
- second is string which contains unsafe characters
- returns string value
dc - delete unwanted characters from string (has similar
functionality to the -tr_del_chr option)
- accepts two string values
- first is string from which we want to delete
- second is string which contains characters we want to
delete
- returns string value
tc - replace character with other character in string (has
similar functionality to the -tr_chr_chr option)
- accepts three string values
- first is string inside which we want to replace characters
- second is set of characters which we want to replace
- third is set of characters with which we are replacing
- returns string value
ts - replace some string inside string with any other string
(has similar functionality to the -tr_str_str option)
- accepts three string values
- first is string inside which we want to replace string
- second is the from string
- third is to string
- returns string value
spn - calculate initial length of string which contains only
specified set of characters (has the same functionality as the
strspn() libc function)
- accepts two string values
- first is input string
- second is set of acceptable characters
- returns numeric value
cspn - calculate initial length of string which doesn't contain
specified set of characters (has the same functionality as the
strcspn() libc function)
- accepts two string values
- first is input string
- second is set of unacceptable characters
- returns numeric value
sl - calculate length of string
- accepts one string value
- returns numeric value
ns - convert number to string by format
- accepts two parameters
- first parameter is format string same as for printf()
function
- second is number which we want to convert
- returns string value
lc - return position of last occurrence of specified character
inside string
- accepts two string parameters
- first is string which we are searching in
- second string contains the character we are looking for
- returns numeric value
+ - add two numeric values
- accepts two numeric values
- returns numeric value
- - subtract two numeric values
- accepts two numeric values
- returns numeric value
% - modulo addition
- accepts two numeric values
- returns numeric value
* - multiply two numeric values
- accepts two numeric values
- returns numeric value
/ - divide two numeric values
- accepts two numeric values
- returns numeric value
For example, if you are mirroring a very large number of internet
sites into the same local directory, too many entries in one
directory can cause performance problems. You may use, for
example, the hsh or md5 functions to generate one additional level
of hash directories based on hostname, with one of the following
options:
-fnrules F '*' '(sc (ns "%02d/" (hsh %h 100)) %o)'
-fnrules F '*' '(sc (ss (md5 %h) 0 2) %o)'
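The md5 variant of the rule, (sc (ss (md5 %h) 0 2) %o), can be mimicked in Python to see what kind of name it yields. This is a sketch under assumptions, not pavuk's code: md5 is assumed to return the lowercase hexadecimal digest, ss is assumed to use zero-based positions with an exclusive end, and the function name and the sample default local name are hypothetical.

```python
import hashlib


def md5_prefixed_name(host, default_name, start=0, end=2):
    """Mimic (sc (ss (md5 %h) start end) %o) under the stated assumptions."""
    # md5: checksum of the host name as a hex string (assumed representation)
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    # ss: cut out the sub-part; sc: concatenate with the default local name
    return digest[start:end] + default_name


if __name__ == "__main__":
    # "/http/www.idata.sk_80/index.html" stands in for %o; its real form may differ.
    print(md5_prefixed_name("www.idata.sk", "/http/www.idata.sk_80/index.html"))
```

With a two-character hexadecimal prefix, hosts are spread over at most 256 top-level directories.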
-base_level $nr
Number of directory levels to omit in local tree.
For example, when downloading the URL
ftp://ftp.idata.sk/pub/unix/www/pavuk-0.7pl1.tgz with -base_level 4
on the command line, the local tree will contain
www/pavuk-0.7pl1.tgz, not
ftp/ftp.idata.sk_21/pub/unix/www/pavuk-0.7pl1.tgz as normally.
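The effect of -base_level on the example path amounts to dropping leading path components. A minimal Python sketch follows; the function name is hypothetical, and what pavuk does when $nr exceeds the depth of the path is not covered here.

```python
def apply_base_level(local_path, base_level):
    """Drop the first base_level directory levels of a local path (hypothetical helper)."""
    parts = local_path.split("/")
    return "/".join(parts[base_level:])


if __name__ == "__main__":
    full = "ftp/ftp.idata.sk_21/pub/unix/www/pavuk-0.7pl1.tgz"
    print(apply_base_level(full, 4))
```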
-remove_adv/-noremove_adv
This option turns on/off the removal of HTML tags which contain
advertisement banners. The banners are not removed from the HTML
file, but are commented out. Such URLs will also not be
downloaded. This option has effect only when used together with
the option -adv_re. This option is available only when your system
has support for POSIX or Bell V8 regular expressions.
-adv_re $RE
This option is used to specify regular expressions for matching
URLs of advertisement banners. For example: -adv_re
http://ad.doubleclick.net/.* is used to match all files from the
server ad.doubleclick.net. This option is available only when
your system has any supported regular expressions implementation.
Other Option
-sleep $nr
this option allows you to specify the number of seconds for
which the program will be suspended between two transfers.
-ddays $nr
if the document's modification time is more than $nr days old,
then in sync mode pavuk attempts to retrieve a newer copy of the
document from the remote server.
-remove_old/-noremove_old
remove improper documents (those which don't exist on the remote
site). This option has effect only when used in sync mode.
-browser $str
is used to set your browser command (in the URL tree you can use
right click to raise a menu, from which you can start the browser
on the currently selected URL). This option is available only when
compiled with GTK GUI and with support for URL tree preview.
-debug/-nodebug
turns on displaying of debug messages. This option is available
only when compiled with -DDEBUG. If the -debug option is used,
pavuk will output verbose information about documents, whole
protocol level information, locking information and more (depends
on the -debug_level setup).
-debug_level $level
Set the level of required debug information. $level can be a
numeric value which represents a binary mask for the requested
debug levels, or a comma-separated list of supported debug levels.
Currently pavuk supports the following debug levels:
html - for HTML parser debugging
protos - to see server side protocol messages
protoc - to see client side protocol messages
procs - to see some special procedure calls
locks - for debugging of document locking
net - for debugging some low level network stuff
misc - for miscellaneous unsorted debug messages
user - for verbose user level messages
all - request all currently supported debug levels
-remind_cmd $str
this option has effect only when running pavuk in reminder mode.
Pavuk sends the result of running reminder mode to the command
specified with this option. The result lists URLs which have
changed and URLs which have any errors.
-nscache_dir $dir
Path to the Netscape browser cache directory. If you specify this
path, pavuk tries to find out whether you have the URL in this
cache. If the URL is there it will be fetched from the cache,
otherwise pavuk will download it from the network.
-post_cmd $str
Post-processing command, which will be executed after successful
download of a document. This command may process the document in
some way. While this command is running, pavuk leaves the current
document locked, so there is no chance that some other pavuk
process will modify the document. This post-processing command
will get three additional parameters from pavuk:
- local name of document
- 1/0 1 if document is HTML document, 0 if not
- original URL of this document
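A post-processing script could pick up these three parameters as sketched below in Python. The names are hypothetical, and the sketch assumes that pavuk appends the parameters in the order listed, after any arguments already present in $str.

```python
from typing import NamedTuple


class PostCommandArgs(NamedTuple):
    local_name: str  # local name of document
    is_html: bool    # True if pavuk passed 1, False if it passed 0
    url: str         # original URL of the document


def parse_post_cmd_args(argv):
    """Interpret the last three arguments as the parameters added by pavuk."""
    local_name, html_flag, url = argv[-3:]
    return PostCommandArgs(local_name, html_flag == "1", url)


if __name__ == "__main__":
    import sys

    print(parse_post_cmd_args(sys.argv[1:]))
```

Because pavuk keeps the document locked while the command runs, a script built on this sketch should finish quickly.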
ENVIRONMENTAL VARIABLES
USER variable is used to construct email address from user and
hostname
LC_* or LANG
used to set internationalized environment
PAVUKRC_FILE
with this variable you can specify alternative location for your
pavukrc configuration file.
REQUIRED EXTERNAL PROGRAMS
at is used for scheduling.
gunzip
is used to decode gzip or compress encoded documents.
Bugs
If you find any, please let me know.
FILES
/opt/pavuk/etc/pavukrc
~/.pavukrc
~/.pavuk_prefs
These files are used as default configuration files. There you
may specify some constant values like your proxy server or your
preferred WWW browser. Configuration options reflect command line
options. Not all parameters are suitable for use in the default
configuration file. You should select only those which you really
need.
File ~/.pavuk_prefs is a special file which contains automatically
stored configuration. This file is used only when running the GUI
interface of pavuk and the option -prefs is active.
The first file parsed (if present) is /opt/pavuk/etc/pavukrc, then
~/.pavukrc (if present), then ~/.pavuk_prefs (if present). The
command line is parsed last. The precedence is as follows:
- highest -
Entered in user interface
Entered in command line
~/.pavuk_prefs
~/.pavukrc
/opt/pavuk/etc/pavukrc
- lowest -
Here is a table of config file and command line option pairs.
MaxLevel: ---> -lmax
MaxDocs: ---> -dmax
MaxSize: ---> -maxsize
MinSize: ---> -minsize
SleepBetween: ---> -sleep
MaxRetry: ---> -retry
MaxRegets: ---> -nregets
MaxRedirections: ---> -nredirs
CommTimeout: ---> -timeout
RegetRollbackAmount: ---> -rollback
DocExpiration: ---> -ddays
UseCache: ---> -nocache
UseRobots: ---> -noRobots
AllowFTP: ---> -noFTP
AllowHTTP: ---> -noHTTP
AllowSSL: ---> -noSSL
AllowGopher: ---> -noGopher
AllowCGI: ---> -noCGI
AllowGZEncoding: ---> -noEnc
AllowFTPRecursion: ---> -FTPdir
ForceReget: ---> -force_reget
Debug: ---> -debug
AllowedSites: ---> -asite
DisallowedSites: ---> -dsite
AllowedDomains: ---> -adomain
DisallowedDomains: ---> -ddomain
AllowedPrefixes: ---> -aprefix
DisallowedPrefixes: ---> -dprefix
AllowedSufixes: ---> -asfx
DisallowedSufixes: ---> -dsfx
AllowedMIMETypes: ---> -amimet
DisallowedMIMETypes: ---> -dmimet
PreferredLanguages: ---> -alang
PreferredCharset: ---> -acharset
WorkingDir: ---> -cdir
WorkingSubDir: ---> -subdir
HTTPAuthorizationScheme: ---> -auth_scheme
HTTPAuthorizationName: ---> -auth_name
HTTPAuthorizationPassword: ---> -auth_passwd
AuthReuseDigestNonce: ---> -auth_reuse_nonce
SSLCertPassword: ---> -ssl_cert_passwd
SSLCertFile: ---> -ssl_cert_file
SSLKeyFile: ---> -ssl_key_file
EmailAddress: ---> -from
MatchPattern: ---> -pattern
REMatchPattern: ---> -rpattern
SkipMatchPattern: ---> -skip_pattern
SkipREMatchPattern: ---> -skip_rpattern
URLMatchPattern: ---> -url_pattern
URLREMatchPattern: ---> -url_rpattern
SkipURLMatchPattern: ---> -skip_url_pattern
SkipURLREMatchPattern: ---> -skip_url_rpattern
DefaultMode: ---> -mode
FTPProxy: ---> -ftp_proxy
HTTPProxy: ---> -http_proxy
SSLProxy: ---> -ssl_proxy
GopherProxy: ---> -gopher_proxy
FTPViaHTTPProxy: ---> -ftp_httpgw
GopherViaHTTPProxy: ---> -gopher_httpgw
HTTPProxyUser: ---> -http_proxy_user
HTTPProxyPass: ---> -http_proxy_pass
HTTPProxyAuth: ---> -http_proxy_auth
AuthReuseProxyDigestNonce: ---> -auth_reuse_proxy_nonce
Browser: ---> -browser
ScenarioDir: ---> -scndir
ShowProgress: ---> -progress
XMaxLogSize: ---> -xmaxlog
LogFile: ---> -logfile
RemoveOldDocuments: ---> -remove_old
AuthFile: ---> -auth_file
BaseLevel: ---> -base_level
FTPDirtyProxy: ---> -ftp_dirtyproxy
ActiveFTPData: ---> -ftp_active/-ftp_passive
ShowDownloadTime: ---> -stime
NLSMessageCatalogDir: ---> -msgcat
Quiet: ---> -quiet/-verbose
NewerThan: ---> -newer_than
OlderThan: ---> -older_than
Reschedule: ---> -reschedule
DontLeaveSite: ---> -dont_leave_site/-leave_site
DontLeaveDir: ---> -dont_leave_dir/-leave_dir
PreserveTime: ---> -preserve_time/-nopreserve_time
LeaveLevel: ---> -leave_level
GUIFont: ---> -gui_font
UserCondition: ---> -user_condition
CookieFile: ---> -cookie_file
CookieSend: ---> -cookie_send/-nocookie_send
CookieRecv: ---> -cookie_recv/-nocookie_recv
CookieUpdate: ---> -cookie_update/-nocookie_update
CookiesMax: ---> -cookies_max
CookieCheckDomain: ---> -cookie_check/-nocookie_check
DisabledCookieDomains: ---> -disabled_cookie_domains
DisableHTMLTag: ---> -disable_html_tag
EnableHTMLTag: ---> -enable_html_tag
TrDeleteChar: ---> -tr_del_chr
TrStrToStr: ---> -tr_str_str
TrChrToChr: ---> -tr_chr_chr
IndexName: ---> -index_name
StoreName: ---> -store_name
PreservePermisions: ---> -preserve_perm/-nopreserve_perm
PreserveAbsoluteSymlinks: ---> -preserve_slinks/-nopreserve_slinks
FTPListCMD: ---> -FTPlist/-noFTPlist
MaxRate: ---> -maxrate
MinRate: ---> -minrate
ReadBufferSize: ---> -bufsize
BgMode: ---> -bg/-nobg
CheckSize: ---> -check_size/-nocheck_size
SLogFile: ---> -slogfile
Identity: ---> -identity
SendFromHeader: ---> -send_from/-nosend_from
RunX: ---> -runX
FnameRules: ---> -fnrules
StoreDocInfoFiles: ---> -store_info/-nostore_info
AllLinksToLocal: ---> -all_to_local/-noall_to_local
AllLinksToRemote: ---> -all_to_remote/-noall_to_remote
SelectedLinksToLocal: ---> -sel_to_local/-nosel_to_local
ReminderCMD: ---> -remind_cmd
AutoReferer: ---> -auto_referer/-noauto_referer
URLsFile: ---> -urls_file
UsePreferences: ---> -prefs/-noprefs
FTPhtml: ---> -FTPhtml/-noFTPhtml
StoreDirIndexFile: ---> -store_index/-nostore_index
Language: ---> -language
FileSizeQuota: ---> -file_quota
TransferQuota: ---> -trans_quota
FSQuota: ---> -fs_quota
EnableJS: ---> -enable_js/-disable_js
UrlSchedulingStrategy: ---> -url_strategy
NetscapeCacheDir: ---> -nscache_dir
RemoveAdvertisement: ---> -remove_adv/-noremove_adv
AdvBannerRE: ---> -adv_re
CheckIfRunnigAtBackground: ---> -check_bg/-nocheck_bg
SendIfRange: ---> -send_if_range/-nosend_if_range
SchedulingCommand: ---> -sched_cmd
UniqueLogName: ---> -unique_log/-nounique_log
PostCommand: ---> -post_cmd
URL: ---> one URL (more lines with URL:
... means more URLs)
A line which begins with '#' is a comment.
TrStrToStr: and TrChrToChr: must contain two quoted strings. All
parameter names are case insensitive. If any option is missing here,
try looking inside the config.c source file.
See the pavukrc.sample file for an example.
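The line format described above (Name: value pairs, '#' comments, case-insensitive names, repeatable URL: lines) can be read with a few lines of Python. This is a sketch only: the function name is hypothetical, values are kept as raw strings because their syntax is not specified here, and quoting rules such as those for TrStrToStr: are not handled.

```python
def parse_pavukrc(text):
    """Split pavukrc-style text into (options, urls) (hypothetical helper)."""
    options = {}
    urls = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        key = name.strip().lower()  # parameter names are case insensitive
        if key == "url":
            urls.append(value.strip())  # URL: may appear on several lines
        else:
            options[key] = value.strip()
    return options, urls


if __name__ == "__main__":
    sample = """# sample configuration
MaxLevel: 3
HTTPProxy: proxy.example.org:8080
URL: http://www.idata.sk/~ondrej/pavuk/
URL: ftp://ftp.idata.sk/pub/unix/www/
"""
    print(parse_pavukrc(sample))
```

The host proxy.example.org in the sample is a placeholder, not a value taken from this page.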
.pavuk_authinfo
The file should contain as many authentication records as you
need. Records are separated by any number of empty lines.
Parameter names are case insensitive.
Structure of record:
Proto: <proto ID> ---> identification of protocol
(ftp/http/https/..)
- required field
Host: <host:[port]> ---> host name
- required field
User: <user> ---> name of user
- optional
Pass: <password> ---> password for user
- optional
Base: <path> ---> base prefix of document path
- optional
Realm: <name> ---> realm for HTTP authentication
- optional
Type: <type> ---> HTTP authentication scheme
- 1 - user auth scheme
- 2 - Basic auth scheme (default)
- 3 - Digest auth scheme
- optional
See the pavuk_authinfo.sample file for an example.
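The record structure can be read back with a short Python sketch. It is not pavuk's parser: the function names are hypothetical, and the sample host and credentials are placeholders.

```python
def parse_authinfo(text):
    """Split authinfo-style text into a list of records (hypothetical helper)."""
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # Any number of empty lines separates two records.
            if current:
                records.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")
        if sep:
            current[name.strip().lower()] = value.strip()  # names are case insensitive
    if current:
        records.append(current)
    return records


def missing_required(record):
    """Return the required fields (Proto, Host) absent from a record."""
    return [field for field in ("proto", "host") if field not in record]


if __name__ == "__main__":
    sample = """Proto: http
Host: www.example.org:80
User: guest
Pass: secret
Type: 3


proto: ftp
User: anonymous
"""
    for rec in parse_authinfo(sample):
        print(rec, missing_required(rec))
```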
~/.pavuk_keys
this is the file where information about configurable menu
option shortcuts is stored. This is available only when compiled
with Gtk+1.2 and higher.
~/.pavuk_remind_db
this file contains information about URLs for running in
reminder mode. The structure of this file is very simple. Each
line contains information about one URL. The first entry in the
line is the last known modification time of the URL (stored in
time_t format - number of seconds since 1.1.1970 GMT). The second
entry is the URL.
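A line of this file can be decoded as sketched below in Python. The function name is hypothetical, and the sketch assumes that the two entries are separated by whitespace, which is not stated explicitly above.

```python
from datetime import datetime, timezone


def parse_remind_line(line):
    """Return (modification time in UTC, URL) for one line (hypothetical helper)."""
    stamp, url = line.split(None, 1)  # assumed separator: whitespace
    moment = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    return moment, url.strip()


if __name__ == "__main__":
    when, url = parse_remind_line("946684800 http://www.idata.sk/~ondrej/pavuk/")
    print(when.isoformat(), url)
```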
SEE ALSO
look into the ChangeLog file for more information about new
features in particular versions of pavuk.
AUTHOR
Ondrejicka Stefan, <ondrej@idata.sk>
Grammatic corrections in this man page by Kai Duebbert <kad@gmx.de>
Resorted by Sergey Taranenko <star@itk.dp.ua>
Many corrections by Colin Marquardt.
AVAILABILITY
pavuk is available via anonymous FTP from
ftp://ftp.idata.sk/pub/unix/www/ or from
ftp://sunsite.unc.edu/pub/Linux/apps/www/mirroring/. Or via Pavuk
HOMEPAGE at http://www.idata.sk/~ondrej/pavuk/