uri(3) uri(3)
local
NAME
uri - a set of functions to manipulate URIs
DESCRIPTION
Include the header file of the library with #include <uri.h> and link
the library using -luri.
uri is a library that analyses and transforms URIs. It is designed
to be fast and to use as little memory as possible. The basic use of
this library is to transform a URI into a structure with one field
for each component of the URI, and vice versa.
LIBRARY MODE
The library behaviour is controlled by the flags described below. The
default set of flags is URI_MODE_CANNONICAL|URI_MODE_ERROR_STDERR.
URI_MODE_CANNONICAL
All objects store the URI in canonical form.
URI_MODE_LOWER_SCHEME
The scheme of the URI is always converted to lower case.
URI_MODE_ERROR_STDERR
If an error occurs, the error string is printed on the STDERR
channel.
URI_MODE_FIELD_MALLOC
Each field may have its own malloc'd space. When the caller sets a
field, it can assume that the content of the field is saved in the
object. If this flag is not set, the caller that sets a field must
make sure that the memory containing the value of the field is not
freed before the object is deallocated.
URI_MODE_FURI_MD5
Use an MD5 key calculated from the URL as the path name instead of
the readable path name described in the FURI section below. For
example, http://www.foo.com/ is transformed into the MD5 key
33024cec6160eafbd2717e394b5bc201 and the corresponding FURI is
33/02/4c/ec6160eafbd2717e394b5bc201.
URI_MODE_URI_STRICT
Behave in strict mode (see STRICTNESS below).
- 1 - Formatted: June 9, 2026
URI_MODE_URI_STRICT_SCHEME
Behave in strict mode (see STRICTNESS below).
URI_MODE_FLAG_DEFAULT
The default mode of the library.
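The directory layout used with URI_MODE_FURI_MD5 can be sketched as
follows. This is only an illustration of how the key shown above is
cut into path components; demo_md5_key_to_path is a hypothetical name
and is not part of the library, and the MD5 key itself is assumed to
have been computed elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the uri library): split a 32-digit
 * hexadecimal MD5 key into three two-digit directories followed by
 * the remaining 26 digits.
 * out must have room for 36 bytes (32 digits, 3 slashes, NUL).
 */
static void demo_md5_key_to_path(const char *key, char *out, size_t out_size)
{
    assert(key != NULL && strlen(key) == 32);
    assert(out != NULL && out_size >= 36);
    snprintf(out, out_size, "%.2s/%.2s/%.2s/%s",
             key, key + 2, key + 4, key + 6);
}
```

Called with the key 33024cec6160eafbd2717e394b5bc201, the helper
writes 33/02/4c/ec6160eafbd2717e394b5bc201 into out.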
STRUCTURE AND ALLOCATION
The uri_t type is a structure describing the URI. Access functions are
provided and should be used to get the values of the fields and to set
new values. All the fields are character strings whose size is
exactly the size of the string they contain. One can safely overwrite
the value contained in a field in place, as long as the replacement
string is no longer than the original. If the replacement string is
longer, the caller must use a buffer of its own.
If the flag URI_MODE_FIELD_MALLOC is not set, which is the default,
the allocation policy for a uri_t object is minimal. When an object
is allocated using uri_alloc, memory is allocated by the library to
store the object. This memory is released when the object is
freed using uri_free. When a field is set, the pointer is stored in
the object and no copy of the string is kept. It is the responsibility
of the caller to make sure that the string lives as long as the
object does. This policy is designed to avoid allocation as much as
possible. For example, a program that operates on 50 000 URLs needs
only one malloc and a few reallocs, instead of 50 000 malloc/free
pairs multiplied by the number of fields of the structure. The loop
looks like this:
    int i;
    /*
     * Allocate an empty object.
     */
    uri_t* uri = uri_alloc_1();
    for(i = 0; i < 50000; i++) {
        /*
         * Reuse the object for another URL. The object grows
         * only if the URL is larger than any previously seen URL.
         */
        uri_realloc(uri, url[i], strlen(url[i]));
        /* ... do something with uri ... */
        /*
         * Print the URL on stdout.
         */
        printf("%s\n", uri_uri(uri));
    }
If the flag URI_MODE_FIELD_MALLOC is set, each field has a
separately allocated space, if necessary. The caller may assume that
the object is always self-contained and does not depend on externally
allocated strings. Each set function (uri_scheme_set, uri_host_set
etc.) allocates the necessary space and duplicates the string given as
argument. The info field contains flags that record which fields
point to malloc'd space and which do not (URI_INFO_M_* flags). This
information is only valid between two calls of the library functions.
For instance, uri_cannonicalize will reorganize allocated space. This
policy is used for integration of the library into scripting languages
such as Perl.
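The difference between the two policies can be sketched with a
stand-in structure. Everything below is hypothetical: demo_uri,
demo_host_set and DEMO_INFO_M_HOST only mimic the behaviour described
above, and the real uri_t layout and flag values are those of uri.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for uri_t and one URI_INFO_M_* flag. */
#define DEMO_INFO_M_HOST 0x1   /* host points to malloc'd memory */

typedef struct {
    int   info;   /* bit field recording which fields are malloc'd */
    char *host;
} demo_uri;

/*
 * Set host either by storing the caller's pointer (default policy)
 * or by duplicating the string (URI_MODE_FIELD_MALLOC-like policy).
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int demo_host_set(demo_uri *u, char *value, int field_malloc)
{
    assert(u != NULL && value != NULL);
    if (u->info & DEMO_INFO_M_HOST) {      /* release a previous copy */
        free(u->host);
        u->host = NULL;
        u->info &= ~DEMO_INFO_M_HOST;
    }
    if (field_malloc) {
        size_t n = strlen(value) + 1;
        char *copy = malloc(n);
        if (copy == NULL) return -1;
        memcpy(copy, value, n);
        u->host = copy;                    /* object owns the copy */
        u->info |= DEMO_INFO_M_HOST;
    } else {
        u->host = value;                   /* caller keeps ownership */
    }
    return 0;
}
```

With field_malloc set to 0, the caller's buffer must outlive the
object. With field_malloc set to 1, the caller may reuse or free its
buffer immediately after the call.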
info A bit field carrying information about the URI. Each bit has a
corresponding define with the following meaning.
URI_INFO_CANNONICAL
Set if the URI is in canonical form.
URI_INFO_RELATIVE
Set if the URI is a relative URI (does not start with
{http,..}://).
URI_INFO_RELATIVE_PATH
Set if the URI is a relative URI and the path does not start with
a /.
URI_INFO_PARSED
Set if the URI was successfully parsed. If this flag is not set,
the content of the object is undefined.
URI_INFO_ROBOTS
Set if the URI is an http robots.txt file.
URI_INFO_M_*
There is one such flag for each field of the uri_t structure. If
the flag is set, the memory pointed to by this field has been
allocated by malloc.
scheme
The scheme of the URI (http, ftp, file or news).
host The host name part of the URI.
port The port number associated with host, if any.
path The path name of the URI.
params
The parameters of the URI (i.e. what is found after the ; in the
path).
query
The query part of a cgi-bin call (i.e. what is found after the ?
in the path).
frag The fragment of the document (i.e. what is found after the # in
the path).
user If authentication information is set, the user name.
passwd
If authentication information is set, the password.
FUNCTIONS
uri_t* uri_alloc_1()
Allocate an empty object that must be filled with the uri_realloc
function.
uri_t* uri_alloc(char* uri, int uri_length)
The uri is split into fields and the corresponding uri_t
structure is returned. The structure is allocated using malloc.
The URI is put in canonical form. If it cannot be put in
canonical form, an error message is printed on stderr and a null
pointer is returned.
uri_t* uri_object(char* uri, int uri_length)
The uri is split into fields and the corresponding uri_t
structure is returned. The returned structure is statically
allocated and must not be freed. The URI is put in canonical
form. If it cannot be put in canonical form, an error message is
printed on stderr and a null pointer is returned.
int uri_realloc(uri_t* object, char* uri, int
The uri is split into fields in the previously allocated
object structure. The URI is put in canonical form and
URI_CANNONICAL is returned. If it cannot be put in canonical
form, nothing is done and URI_NOT_CANNONICAL is returned.
void uri_free(uri_t* object)
The object previously allocated by uri_alloc is deallocated.
uri_t* uri_abs(uri_t* base, char* relative_string, int
Transform the relative URI relative_string into an absolute URI
using base as the base URI. The returned uri_t object is
allocated statically and must not be freed.
uri_t* uri_abs_1(uri_t* base, uri_t* relative)
Transform the relative URI relative into an absolute URI using
base as the base URI. The returned uri_t object is allocated
statically and must not be freed.
int uri_info(uri_t* object)
returns the content of the info field.
char* uri_scheme(uri_t* object)
returns the content of the scheme field.
char* uri_host(uri_t* object)
returns the content of the host field.
char* uri_port(uri_t* object)
returns the value of the port field of the object. If the port
field is empty, returns the default port for the corresponding
scheme. For instance, if the scheme is http, the string "80" is
returned. The returned string is statically allocated and must
not be freed.
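The fallback to a default port can be sketched as follows. The names
demo_default_port and demo_port are hypothetical, and the table only
lists the well-known defaults for http and ftp; the table used by the
library itself may contain other schemes.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical table of default ports. Only http (80) and ftp (21)
 * are listed here; an unknown scheme yields the empty string.
 */
static const char *demo_default_port(const char *scheme)
{
    assert(scheme != NULL);
    if (strcmp(scheme, "http") == 0) return "80";
    if (strcmp(scheme, "ftp") == 0)  return "21";
    return "";
}

/* Return the explicit port when there is one, else the scheme default. */
static const char *demo_port(const char *scheme, const char *port)
{
    if (port != NULL && port[0] != '\0') return port;
    return demo_default_port(scheme);
}
```

Both functions return pointers to storage the caller does not own,
which mirrors the rule above that the result must not be freed.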
char* uri_path(uri_t* object)
returns the content of the path field.
char* uri_params(uri_t* object)
returns the content of the params field.
char* uri_query(uri_t* object)
returns the content of the query field.
char* uri_frag(uri_t* object)
returns the content of the frag field.
char* uri_user(uri_t* object)
returns the content of the user field.
char* uri_passwd(uri_t* object)
returns the content of the passwd field.
char* uri_netloc(uri_t* object)
returns a concatenation of the host and port fields, separated by
a :. If the host field is not set, the null pointer is returned
and a message is printed on stderr. The returned string is
statically allocated and must not be freed.
char* uri_auth_netloc(uri_t* object)
returns a concatenation of the host and port fields, separated by
a :. If the user field is set, the user and passwd fields are
prepended to the netloc, separated from it by a @. If the host
field is not set, the null pointer is returned and the error
condition is set. The returned string is statically allocated and
must not be freed.
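The string layout produced by uri_auth_netloc can be sketched as
follows. demo_auth_netloc is a hypothetical helper that writes into a
caller-supplied buffer instead of static storage, and it assumes that
the port has already been resolved to a non-null string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write [user:passwd@]host:port into out.
 * Returns 0 on success, -1 if host is missing or out is too small.
 */
static int demo_auth_netloc(const char *user, const char *passwd,
                            const char *host, const char *port,
                            char *out, size_t out_size)
{
    int n;
    assert(port != NULL && out != NULL && out_size > 0);
    if (host == NULL || host[0] == '\0') return -1;
    if (user != NULL && user[0] != '\0')
        n = snprintf(out, out_size, "%s:%s@%s:%s",
                     user, passwd != NULL ? passwd : "", host, port);
    else
        n = snprintf(out, out_size, "%s:%s", host, port);
    /* snprintf reports the length it needed; reject truncation. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, user loic, password secret, host www.foo.com and port
7000 give loic:secret@www.foo.com:7000.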
char* uri_auth(uri_t* object)
returns a concatenation of the user and passwd fields, separated
by a :, or an empty string if either of them is not set. The
returned string is statically allocated and must not be freed.
char* uri_all_path(uri_t* object)
returns a concatenation of the path, params and query fields in
the form /path;params?query. Note that a leading slash is added to
the returned value if the object is not a relative URI. The
returned string is statically allocated and must not be freed.
void uri_info_set(uri_t* object, int value)
set the info field to value.
void uri_scheme_set(uri_t* object, char* value)
set the scheme field to value. The URI_INFO_RELATIVE flag is
updated according to the new value.
void uri_host_set(uri_t* object, char* value)
set the host field to value. The URI_INFO_RELATIVE flag is
updated according to the new value.
void uri_params_set(uri_t* object, char* value)
set the params field to value.
void uri_query_set(uri_t* object, char* value)
set the query field to value.
void uri_user_set(uri_t* object, char* value)
set the user field to value.
void uri_passwd_set(uri_t* object, char* value)
set the passwd field to value.
void uri_copy(uri_t* to, uri_t* from)
copy the content of object from into object to.
uri_t* uri_clone(uri_t* from)
creates a new object containing the same data as from. The
returned object must be freed using uri_free.
void uri_clear(uri_t* object)
clear all information contained in object.
void uri_set_root(const char* root)
Set the path that uri_furi will prepend to the FURI. By default
it is the empty string.
const char* uri_get_root()
Get the path set by uri_set_root, or the empty string if none was set.
char* uri_furi(uri_t* object)
returns a string containing the FURI (File equivalent of an URI)
built from object. The returned string is statically allocated
and must not be freed.
char* uri_uri(uri_t* object)
returns a string containing the URI built from object. The
returned string is statically allocated and must not be freed.
void uri_string(uri_t* object, char** stringp, int*
Build a string representation of object in stringp according to
flags. The possible values of flags are described under the
uri_cannonicalize_string function. Upon return, the stringp
pointer points to an array of stringp_size bytes allocated
with malloc. If stringp is not null, it must point to a buffer
allocated with malloc, which is reallocated to fit the needs of the
string conversion. This function is the back end of all
object-to-string translation functions.
char* uri_escape(char* string, char* range)
return a statically allocated copy of string with all characters
found in the range string transformed into escaped form (%xx).
A few predefined values for the range argument are available:
URI_ESCAPE_RESERVED, URI_ESCAPE_PATH, URI_ESCAPE_QUERY, and
uri_escape_unsafe.
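The escaping rule can be sketched as follows. demo_escape is a
hypothetical helper, not the library function: it writes into a
caller-supplied buffer rather than static storage, and the use of
lower-case hexadecimal digits is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy in to out, replacing each character found
 * in range by its %xx form.
 * Returns 0 on success, -1 if out is too small.
 */
static int demo_escape(const char *in, const char *range,
                       char *out, size_t out_size)
{
    size_t used = 0;
    assert(in != NULL && range != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (strchr(range, *in) != NULL) {
            /* three characters plus the NUL written by snprintf */
            if (used + 4 > out_size) return -1;
            snprintf(out + used, 4, "%%%02x", (unsigned char)*in);
            used += 3;
        } else {
            /* one character plus room for the final NUL */
            if (used + 2 > out_size) return -1;
            out[used++] = *in;
        }
    }
    if (used + 1 > out_size) return -1;
    out[used] = '\0';
    return 0;
}
```

With a range containing only the double quote, abc"def.html becomes
abc%22def.html.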
char* uri_unescape(char* string)
return a statically allocated copy of string with all escape
sequences (%xx) transformed to characters.
char* uri_cannonicalize_string(char* uri, int uri_length, int
returns the canonical form of the uri given in argument. The
canonical form is formatted according to the value of flag.
Values of flag are bits that can be OR'ed together:
URI_STRING_FURI_STYLE
return a FURI.
URI_STRING_URI_STYLE
return a URI.
URI_STRING_ROBOTS_STYLE
return the corresponding robots.txt URI.
URI_STRING_URI_NOHASH_STYLE
do not include the frag in the returned string.
Returns 0 (a null pointer) if uri is malformed.
uri_t* uri_cannonical(uri_t* object)
returns an object containing the canonical form of object. If
the URI_MODE_CANNONICAL flag is set, the object itself is
returned.
int uri_consistent(uri_t* object)
Returns 0 if object contains an unparsable URL, and a non-zero
value if object contains a well-formed URL. Must be called after a
set of field changes to reset the flags and to ensure that the
modified URL is well formed.
HTTP FUNCTIONS
char* uri_robots(uri_t* object)
returns a string containing the URI of the robots.txt file
corresponding to the URI contained in object. For instance, if
the URI contained in object is
http://www.foo.com/dir/dir/file.html, the returned string is
http://www.foo.com/robots.txt. The returned string is statically
allocated and must not be freed.
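The mapping from a URI to its robots.txt URI can be sketched as
follows. demo_robots is a hypothetical helper working directly on a
string; the library function works on a parsed uri_t object and
returns static storage instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: keep scheme://netloc of an absolute URI and
 * replace everything after it by /robots.txt.
 * Returns 0 on success, -1 if uri has no "://" or out is too small.
 */
static int demo_robots(const char *uri, char *out, size_t out_size)
{
    const char *sep, *netloc;
    size_t prefix;
    int n;
    assert(uri != NULL && out != NULL);
    sep = strstr(uri, "://");
    if (sep == NULL) return -1;
    netloc = sep + 3;
    /* the netloc ends at the first /, ? or #, or at end of string */
    prefix = (size_t)(netloc - uri) + strcspn(netloc, "/?#");
    n = snprintf(out, out_size, "%.*s/robots.txt", (int)prefix, uri);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```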
CANONICAL FORM
The canonical form of a URI is a single, arbitrarily chosen spelling
that stands for all the possible variations of the same URI. For
instance, http://www.foo.com/abc"def.html is transformed to
http://www.foo.com/abc%22def.html. Most of the transformations follow
the instructions found in draft-fielding-uri-syntax-04, but some of
them do not.
Additionally, when the path of the URI contains dots and double dots,
it is reduced. For instance, http://www.foo.com/dir/.././file.html is
transformed to http://www.foo.com/file.html.
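The reduction of dots and double dots can be sketched as follows.
demo_reduce_path is a hypothetical helper that only handles the path
part of the URI. Dropping a .. that would climb above the root is an
assumption; the library may treat that case differently.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: remove "." and ".." segments from an absolute
 * path, in place. The result is never longer than the input.
 */
static void demo_reduce_path(char *path)
{
    char *in = path, *out = path;
    assert(path != NULL && path[0] == '/');
    while (*in != '\0') {
        size_t len;
        in++;                          /* skip the / that opens the segment */
        len = strcspn(in, "/");        /* length of the segment */
        if (len == 1 && in[0] == '.') {
            /* "." : drop the segment */
        } else if (len == 2 && in[0] == '.' && in[1] == '.') {
            /* ".." : drop the previously kept segment, if any */
            while (out > path && *--out != '/')
                ;
        } else {
            *out++ = '/';
            memmove(out, in, len);     /* regions may overlap */
            out += len;
        }
        in += len;
    }
    if (out == path) *out++ = '/';     /* everything was reduced away */
    *out = '\0';
}
```

Applied to /dir/.././file.html, the helper leaves /file.html in the
buffer.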
If the URI_MODE_CANNONICAL flag is set, the uri_t object always
contains the canonical form of the URL. The original form is lost.
If the URI_MODE_CANNONICAL flag is not set, the canonical form of the
URI is stored in a separate object and the uri_t object contains the
original form of the URI. This takes more memory but may be
useful in some situations.
ERROR HANDLING
When an error occurs (the URI cannot be canonicalized or parsed, for
instance), the global variable uri_errstr contains the full text of
the error message. The library functions never reset this variable
when no error occurs, so it may still hold the message of an earlier
error.
Additionally, the error string is printed on the error channel
(STDERR) if the URI_MODE_ERROR_STDERR flag is set. This is the
default.
STRICTNESS
The draft describing URI syntax (draft-fielding-uri-syntax-04)
specifies that a URI of the type http:g may be interpreted in two
different ways. If the URI_MODE_URI_STRICT flag is set, the library
interprets it as an absolute URI, otherwise as a relative URI.
If the URI_MODE_URI_STRICT flag is not set, the
URI_MODE_URI_STRICT_SCHEME flag may be set so that a relative URI
containing a scheme is interpreted as an absolute URI only if the
scheme is different from the scheme of the base URI.
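The decision described in this section can be sketched as a small
function. The flag values and the name demo_scheme_ref_is_absolute are
hypothetical; the real flags are defined in uri.h. The sketch also
assumes that both schemes have already been converted to lower case.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag values standing in for the library flags. */
#define DEMO_STRICT        0x1   /* stands in for URI_MODE_URI_STRICT */
#define DEMO_STRICT_SCHEME 0x2   /* stands in for URI_MODE_URI_STRICT_SCHEME */

/*
 * Decide how a reference of the form scheme:rest (without //), such
 * as http:g, is interpreted. Returns 1 for absolute, 0 for relative.
 */
static int demo_scheme_ref_is_absolute(int mode, const char *base_scheme,
                                       const char *ref_scheme)
{
    assert(base_scheme != NULL && ref_scheme != NULL);
    if (mode & DEMO_STRICT)
        return 1;                  /* always absolute */
    if (mode & DEMO_STRICT_SCHEME)
        return strcmp(base_scheme, ref_scheme) != 0;
    return 0;                      /* always relative */
}
```

With a base URI whose scheme is http, the reference http:g is absolute
only under DEMO_STRICT, while ftp:g is also absolute under
DEMO_STRICT_SCHEME.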
FURI
It is sometimes convenient to convert a URI into a path name. Some
functions of the uri library provide such a conversion (uri_furi, for
instance). These path names are called FURI (File equivalent of a
URI) for short. Here is a description of the transformation.
    http://www.ina.fr:700/imagina/index.html#queau
     |     \____________/\_________________/\____/
     |          |                |           lost
     |          |                |
     |   /^^^^^^^^^^^^\/^^^^^^^^^^^^^^^^^\
    http/www.ina.fr:700/imagina/index.html
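The transformation shown in the diagram can be sketched as follows.
demo_furi is a hypothetical helper that only covers the readable form:
it ignores the root set by uri_set_root and the URI_MODE_FURI_MD5
mode, and it does no canonicalization.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn scheme://netloc/path#frag into
 * scheme/netloc/path, dropping the fragment.
 * Returns 0 on success, -1 if uri has no "://" or out is too small.
 */
static int demo_furi(const char *uri, char *out, size_t out_size)
{
    const char *sep, *rest;
    int n;
    assert(uri != NULL && out != NULL);
    sep = strstr(uri, "://");
    if (sep == NULL) return -1;
    rest = sep + 3;                        /* netloc and path */
    n = snprintf(out, out_size, "%.*s/%.*s",
                 (int)(sep - uri), uri,            /* scheme */
                 (int)strcspn(rest, "#"), rest);   /* stop at the fragment */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```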
EXAMPLES
Show the canonical form of a URI
    char* uri = "http://www.foo.com/";
    uri = uri_cannonicalize_string(uri, strlen(uri), URI_STRING_URI_STYLE);
    if(uri) printf("uri = %s\n", uri);
Show the host and port of a URI
    char* uri = "http://www.foo.com:7000/";
    uri_t* object = uri_object(uri, strlen(uri));
    if(object) printf("netloc = %s\n", uri_netloc(object));
Change the query part of a URI
    char* uri = "http://www.foo.com/cgi-bin/bar?param=1";
    uri_t* object = uri_object(uri, strlen(uri));
    if(object) {
        uri_query_set(object, "param=2");
        printf("uri = %s\n", uri_uri(object));
    }
ADDING NEW SCHEMES
Add the name of the scheme to the SCHEMES file. With nothing else
done, this binds the scheme to a generic parser that follows the URI
parsing rules. If you want to define specific behaviour for this
scheme, mimic the uri_scheme_http.c file and recompile. If gperf(1)
complains about conflicts, you will have to experiment with the -k
option in order to find a working range that does not conflict and
takes as little space as possible.
AUTHOR
Loic Dachary loic@senga.org
SEE ALSO
draft-fielding-uri-syntax-04