 uri(3)                                                               uri(3)
                                    local



 NAME
      uri - a set of functions to manipulate URIs


 DESCRIPTION
      The header file for the library is #include <uri.h> and the library
      may be linked using -luri.

      uri is a library that analyses URIs and transforms them. It is designed
      to be fast and to use as little memory as possible. The basic usage of
      this library is to transform a URI into a structure with one field
      for each component of the URI, and vice versa.


 LIBRARY MODE
      The library behaviour is controlled by the flags described below. The
      default set of flags is URI_MODE_CANNONICAL|URI_MODE_ERROR_STDERR.


      URI_MODE_CANNONICAL
           All objects store the URI in canonical form.


      URI_MODE_LOWER_SCHEME
           The scheme of the URI is always converted to lower case.


      URI_MODE_ERROR_STDERR
           If an error occurs, the error string is printed on the standard
           error channel (stderr).


      URI_MODE_FIELD_MALLOC
           Each field may have its own malloc'd space. When the caller sets
           a field, it can assume that the content of the field is copied
           into the object. If this flag is not set, the caller must make
           sure that the memory containing the value of the field is not
           freed before the object is deallocated.


      URI_MODE_FURI_MD5
           Use an MD5 key calculated from the URL as a path name instead of
           the readable path name described in the FURI section below. For
           example http://www.foo.com/ is transformed into the MD5 key
           33024cec6160eafbd2717e394b5bc201 and the corresponding FURI is
           33/02/4c/ec6160eafbd2717e394b5bc201.
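
           The directory split shown in this example can be sketched as
           follows. This is only an illustration of the layout, assuming the
           digest is already available as 32 hexadecimal characters;
           md5_hex_to_path is a hypothetical helper and is not part of the
           uri library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Split a 32-character hexadecimal MD5 digest into a path of the form
 * xx/xx/xx/rest. Returns 0 on success, -1 if the digest does not have
 * 32 characters or the output buffer is too small.
 * Hypothetical helper, not a function of the uri library.
 */
int md5_hex_to_path(const char *hex, char *out, size_t out_size)
{
    if (strlen(hex) != 32 || out_size < 32 + 3 + 1)
        return -1;
    /* Three two-character directory levels, then the remaining 26. */
    snprintf(out, out_size, "%.2s/%.2s/%.2s/%s",
             hex, hex + 2, hex + 4, hex + 6);
    return 0;
}
```

           Called with 33024cec6160eafbd2717e394b5bc201 it writes
           33/02/4c/ec6160eafbd2717e394b5bc201 into the output buffer.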


      URI_MODE_URI_STRICT
           Behave in strict mode (see STRICTNESS below).




                                    - 1 -           Formatted:  June 9, 2026






      URI_MODE_URI_STRICT_SCHEME
           Behave in strict mode (see STRICTNESS below).


      URI_MODE_FLAG_DEFAULT
           The default mode of the library.


 STRUCTURE AND ALLOCATION
      The uri_t type is a structure describing the URI. Access functions are
      provided and should be used to get the values of the fields and to set
      new values. All the fields are character strings whose size is
      exactly the size of the string they contain. One can safely overwrite
      the value contained in a field in place, as long as the replacement
      string is no longer than the original. If the replacement string is
      longer, the caller must use a buffer of its own.

      If the flag URI_MODE_FIELD_MALLOC is not set, which is the default,
      the allocation policy for a uri_t object is minimal. When an object
      is allocated using uri_alloc, memory is allocated by the library to
      store the object. This memory is released when the object is freed
      using uri_free. When a field is set, the pointer is stored in the
      object and no copy of the string is kept. It is the responsibility
      of the caller to make sure that the string lives as long as the
      object does. This policy is designed to avoid allocation as much as
      possible. For a program that operates on 50 000 URLs, only one malloc
      and a few realloc calls are necessary, instead of 50 000 malloc/free
      pairs multiplied by the number of fields of the structure. The loop
      looks like this:
           /* url[] is the caller's array of 50000 URL strings. */
           int i;
           /*
            * Alloc an empty object.
            */
           uri_t* uri = uri_alloc_1();

           for(i = 0; i < 50000; i++) {
              /*
               * Reuse the object for another url. The object grows
               * only if needed, because the url is larger than
               * any previously seen url.
               */
              if(uri_realloc(uri, url[i], strlen(url[i])) != URI_CANNONICAL)
                 continue; /* not canonical: nothing was done */
              /* ... do something with uri ... */
              /*
               * Print the url on stdout
               */
              printf("%s\n", uri_uri(uri));
           }
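
      The saving comes from reusing storage that only ever grows. The
      following sketch shows that idea in isolation, under the assumption
      that a single buffer is kept between iterations; holder_t and its
      functions are hypothetical names and do not describe the internals of
      uri_realloc.

```c
#include <stdlib.h>
#include <string.h>

/* A reusable string holder that only grows. Hypothetical, for illustration. */
typedef struct {
    char  *data;       /* storage owned by the holder */
    size_t capacity;   /* bytes currently allocated */
    int    grow_count; /* number of times the storage had to grow */
} holder_t;

/* Copy s into the holder, growing the storage only if s does not fit. */
int holder_set(holder_t *h, const char *s)
{
    size_t need = strlen(s) + 1;

    if (need > h->capacity) {
        char *p = realloc(h->data, need);
        if (p == NULL)
            return -1;
        h->data = p;
        h->capacity = need;
        h->grow_count++;
    }
    memcpy(h->data, s, need);
    return 0;
}

/* Release the storage once, after the last use. */
void holder_free(holder_t *h)
{
    free(h->data);
    h->data = NULL;
    h->capacity = 0;
}
```

      With a holder initialized to all zeroes, the storage is reallocated
      only when a string is longer than every string seen before, so the
      number of allocations depends on the longest inputs and not on the
      number of iterations.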

      If the flag URI_MODE_FIELD_MALLOC is set, each field has a separately
      allocated space, if necessary. The caller may assume that the object
      is always self-contained and does not depend on externally allocated
      strings. Each set function (uri_scheme_set, uri_host_set etc.)
      allocates the necessary space and duplicates the string given as
      argument. The info field contains flags that record which fields
      contain malloc'd space and which do not (URI_INFO_M_* flags). This
      information is only valid between two calls of the library functions.
      For instance uri_cannonicalize will reorganize allocated space. This
      policy is used for integration of the library into scripting languages
      such as Perl.
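
      The bookkeeping described here, a private copy per field plus a bit
      that records ownership, can be sketched as follows. The type demo_t,
      the bit DEMO_INFO_M_HOST and both functions are hypothetical; they
      only illustrate the pattern and are not the library's definitions.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_INFO_M_HOST 0x01 /* hypothetical bit: host was malloc'd by us */

typedef struct {
    int   info; /* ownership bits */
    char *host; /* owned only if DEMO_INFO_M_HOST is set in info */
} demo_t;

/* Store a private copy of value and remember that the object owns it. */
int demo_host_set_copy(demo_t *d, const char *value)
{
    char *copy = malloc(strlen(value) + 1);

    if (copy == NULL)
        return -1;
    strcpy(copy, value);
    if (d->info & DEMO_INFO_M_HOST)
        free(d->host); /* release the previous private copy */
    d->host = copy;
    d->info |= DEMO_INFO_M_HOST;
    return 0;
}

/* Free only what the object owns, then forget the field. */
void demo_clear(demo_t *d)
{
    if (d->info & DEMO_INFO_M_HOST)
        free(d->host);
    d->host = NULL;
    d->info &= ~DEMO_INFO_M_HOST;
}
```

      Because the object holds its own copy, the caller may modify or free
      the original string immediately after the set call.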


      info A bit field carrying information about the URI. Each bit has a
           corresponding define with the following meaning.

           URI_INFO_CANNONICAL
                Set if the URI is in canonical form.

           URI_INFO_RELATIVE
                Set if the URI is a relative URI (does not start with
                {http,..}://).

           URI_INFO_RELATIVE_PATH
                Set if the URI is a relative URI and the path does not
                start with a /.

           URI_INFO_PARSED
                Set if the URI was successfully parsed. If this flag is
                not set, the content of the object is undefined.

           URI_INFO_ROBOTS
                Set if the URI is an http robots.txt file.

           URI_INFO_M_*
                There is one such flag for each field of the uri_t
                structure. If the flag is set, the memory pointed to by
                this field has been allocated by malloc.


      scheme
           The scheme of the URI (http, ftp, file or news).


      host The host name part of the URI.


      port The port number associated with host, if any.


      path The path name of the URI.






      params
           The parameters of the URI (i.e. what is found after the ; in the
           path).


      query
           The query part of a cgi-bin call (i.e. what is found after the ?
           in the path).


      frag The fragment of the document (i.e. what is found after the # in
           the path).


      user If authentication information is set, the user name.


      passwd
           If authentication information is set, the password.


 FUNCTIONS
      uri_t* uri_alloc_1()
           Allocate an empty object that must be filled with the uri_realloc
           function.


      uri_t* uri_alloc(char* uri, int uri_length)
           The uri is split into fields and the corresponding uri_t
           structure is returned. The structure is allocated using malloc.
           The URI is put in canonical form. If it cannot be put in
           canonical form, an error message is printed on stderr and a null
           pointer is returned.


      uri_t* uri_object(char* uri, int uri_length)
           The uri is split into fields and the corresponding uri_t
           structure is returned. The returned structure is statically
           allocated and must not be freed. The URI is put in canonical
           form. If it cannot be put in canonical form, an error message is
           printed on stderr and a null pointer is returned.


      int uri_realloc(uri_t* object, char* uri, int
           The uri is split into fields in the previously allocated
           object structure. The URI is put in canonical form and
           URI_CANNONICAL is returned. If it cannot be put in canonical
           form, nothing is done and URI_NOT_CANNONICAL is returned.






      void uri_free(uri_t* object)
           The object previously allocated by uri_alloc is deallocated.


      uri_t* uri_abs(uri_t* base, char* relative_string, int
           Transform the relative URI relative_string into an absolute URI
           using base as the base URI. The returned uri_t object is
           allocated statically and must not be freed.


      uri_abs_1(uri_t* base, uri_t* relative)
           Transform the relative URI relative into an absolute URI using
           base as the base URI. The returned uri_t object is allocated
           statically and must not be freed.


      int uri_info(uri_t* object)
           returns the content of the info field.


      char* uri_scheme(uri_t* object)
           returns the content of the scheme field.


      char* uri_host(uri_t* object)
           returns the content of the host field.


      char* uri_port(uri_t* object)
           returns the value of the port field of the object. If the port
           field is empty, returns the default port for the corresponding
           scheme. For instance, if the scheme is http, the string 80 is
           returned. The returned string is statically allocated and must
           not be freed.
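
           The fallback can be pictured as a small lookup table. The sketch
           below is an assumption about the general shape only: the function
           port_or_default is hypothetical, and apart from http (80), which
           is stated above, the table entries are the usual well-known ports
           for FTP (21) and NNTP (119), not values confirmed for this
           library.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the explicit port if there is one, else the well-known default
 * for the scheme, else NULL. Hypothetical helper, not the library's code.
 */
const char *port_or_default(const char *scheme, const char *port)
{
    static const struct { const char *scheme; const char *port; } defaults[] = {
        { "http", "80"  },
        { "ftp",  "21"  },
        { "news", "119" },
    };
    size_t i;

    if (port != NULL && port[0] != '\0')
        return port;
    for (i = 0; i < sizeof defaults / sizeof defaults[0]; i++)
        if (strcmp(scheme, defaults[i].scheme) == 0)
            return defaults[i].port;
    return NULL; /* no default known for this scheme */
}
```

           As in the description above, the returned pointer refers either
           to the caller's string or to static storage, so it must not be
           freed.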


      char* uri_path(uri_t* object)
           returns the content of the path field.


      char* uri_params(uri_t* object)
           returns the content of the params field.


      char* uri_query(uri_t* object)
           returns the content of the query field.


      char* uri_frag(uri_t* object)
           returns the content of the frag field.




      char* uri_user(uri_t* object)
           returns the content of the user field.


      char* uri_passwd(uri_t* object)
           returns the content of the passwd field.


      char* uri_netloc(uri_t* object)
           returns a concatenation of the host and port fields, separated
           by a :. If the host field is not set, a null pointer is returned
           and a message is printed on stderr. The returned string is
           statically allocated and must not be freed.


      char* uri_auth_netloc(uri_t* object)
           returns a concatenation of the host and port fields, separated
           by a :. If the user field is set, the user and passwd fields are
           prepended to the netloc, separated from it by a @. If the host
           field is not set, a null pointer is returned and an error
           condition is set. The returned string is statically allocated
           and must not be freed.
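
           The layout of the resulting string can be sketched as below. The
           helper build_auth_netloc is hypothetical and writes into a
           caller-supplied buffer instead of static storage; it also assumes
           that a port string is always supplied, which may differ from the
           library's behaviour.

```c
#include <stdio.h>

/*
 * Build [user:passwd@]host:port into out. Returns 0 on success, -1 if
 * host is missing or out is too small. port must not be NULL.
 * Hypothetical helper, not a function of the uri library.
 */
int build_auth_netloc(const char *user, const char *passwd,
                      const char *host, const char *port,
                      char *out, size_t out_size)
{
    int n;

    if (host == NULL || host[0] == '\0')
        return -1;
    if (user != NULL && user[0] != '\0')
        n = snprintf(out, out_size, "%s:%s@%s:%s",
                     user, passwd != NULL ? passwd : "", host, port);
    else
        n = snprintf(out, out_size, "%s:%s", host, port);
    /* snprintf reports the length it needed; reject truncated results. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

           Without a user the result has the same form as the plain netloc,
           for example www.foo.com:7000.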


      char* uri_auth(uri_t* object)
           returns a concatenation of the user and passwd fields, separated
           by a :, or an empty string if either of them is not set. The
           returned string is statically allocated and must not be freed.


      char* uri_all_path(uri_t* object)
           returns a concatenation of the path, params and query fields in
           the form /path;params?query. Note that a leading slash is added
           to the returned value if the object is not a relative URI. The
           returned string is statically allocated and must not be freed.


      void uri_info_set(uri_t* object, int value)
           set the info field to value.


      void uri_scheme_set(uri_t* object, char* value)
           set the scheme field to value. The URI_INFO_RELATIVE flag is
           updated according to the new value.


      void uri_host_set(uri_t* object, char* value)
           set the host field to value. The URI_INFO_RELATIVE flag is
           updated according to the new value.





      void uri_params_set(uri_t* object, char* value)
           set the params field to value.


      void uri_query_set(uri_t* object, char* value)
           set the query field to value.


      void uri_user_set(uri_t* object, char* value)
           set the user field to value.


      void uri_passwd_set(uri_t* object, char* value)
           set the passwd field to value.


      void uri_copy(uri_t* to, uri_t* from)
           copy the content of object from into object to.


      uri_t* uri_clone(uri_t* from)
           creates a new object containing the same data as from. The
           returned object must be freed using uri_free.


      void uri_clear(uri_t* object)
           clear all information contained in object.


      void uri_set_root(const char* root)
           Set the path that uri_furi will prepend to the FURI. By default
           it is the empty string.


      const char* uri_get_root()
           Get the path set by uri_set_root, or the empty string if none
           was set.


      char* uri_furi(uri_t* object)
           returns a string containing the FURI (File equivalent of an URI)
           built from object. The returned string is statically allocated
           and must not be freed.


      char* uri_uri(uri_t* object)
           returns a string containing the URI built from object. The
           returned string is statically allocated and must not be freed.


      void uri_string(uri_t* object, char** stringp, int*
           Build a string representation of object in stringp according to
           flags. The possible values of flags are described under the
           uri_cannonicalize_string function. Upon return, stringp points
           to a buffer of stringp_size bytes allocated with malloc. If
           stringp is not null on entry, it must point to a buffer allocated
           with malloc, which is reallocated to fit the needs of the string
           conversion. This function is the backend of all object to string
           translation functions.


      char* uri_escape(char* string, char* range)
           return a statically allocated copy of string with all characters
           found in the range string converted to escaped form (%xx).
           A few examples of range argument are defined:
           URI_ESCAPE_RESERVED, URI_ESCAPE_PATH, URI_ESCAPE_QUERY, and
           uri_escape_unsafe.


      char* uri_unescape(char* string)
           return a statically allocated copy of string with all escape
           sequences (%xx) transformed to characters.
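
           The %xx transformation itself can be sketched as follows. This is
           an illustration and not the library's code: escape_copy and
           unescape_in_place are hypothetical names, the escaped copy is
           returned in malloc'd memory instead of static storage, decoding
           is done in place, and lower-case hexadecimal digits are an
           arbitrary choice.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of s with every character found in range as %xx. */
char *escape_copy(const char *s, const char *range)
{
    static const char hex[] = "0123456789abcdef";
    char *out = malloc(strlen(s) * 3 + 1); /* worst case: all escaped */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (strchr(range, c) != NULL) {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0f];
        } else {
            *p++ = (char)c;
        }
    }
    *p = '\0';
    return out;
}

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode %xx sequences in place; malformed sequences are left untouched. */
void unescape_in_place(char *s)
{
    char *out = s;

    while (*s != '\0') {
        int hi = -1, lo = -1;
        if (s[0] == '%') {
            hi = hex_value((unsigned char)s[1]);
            if (hi >= 0)
                lo = hex_value((unsigned char)s[2]);
        }
        if (hi >= 0 && lo >= 0) {
            *out++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

           Escaping abc"def.html with a range containing only the double
           quote gives abc%22def.html, and unescaping that string restores
           the original.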


      char* uri_cannonicalize_string(char* uri, int uri_length, int
           returns the canonical form of the uri given in argument. The
           canonical form is formatted according to the value of flag.
           Values of flag are bits that can be OR-ed together.

           URI_STRING_FURI_STYLE
                Return a FURI.

           URI_STRING_URI_STYLE
                Return a URI.

           URI_STRING_ROBOTS_STYLE
                Return the corresponding robots.txt URI.

           URI_STRING_URI_NOHASH_STYLE
                Do not include the frag in the returned string.

           Returns 0 if uri is malformed.


      uri_t* uri_cannonical(uri_t* object)
           returns an object containing the canonical form of object. If
           the URI_MODE_CANNONICAL flag is set, the object itself is
           returned.


      int uri_consistent(uri_t* object)
           Returns 0 if object contains an unparsable URL, and a non-zero
           value if object contains a well-formed URL. Must be called after
           a set of field changes to reset flags and ensure that the
           modified URL is well formed.


 HTTP FUNCTIONS
      char* uri_robots(uri_t* object)
           returns a string containing the URI of the robots.txt file
           corresponding to the URI contained in object. For instance, if
           the URI contained in object is
           http://www.foo.com/dir/dir/file.html the returned string will be
           http://www.foo.com/robots.txt. The returned string is statically
           allocated and must not be freed.
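
           The mapping amounts to keeping the scheme and network location
           and replacing everything after them with /robots.txt. A minimal
           sketch, assuming an absolute URL given as a string; robots_url is
           a hypothetical helper working on a caller-supplied buffer, not
           the library's implementation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write scheme://netloc/robots.txt for an absolute URL into out.
 * Returns 0 on success, -1 if url has no "://" or out is too small.
 * Hypothetical helper, not a function of the uri library.
 */
int robots_url(const char *url, char *out, size_t out_size)
{
    const char *sep = strstr(url, "://");
    size_t prefix_len;
    int n;

    if (sep == NULL)
        return -1;
    /* The netloc ends at the first '/', '?' or '#' after "://". */
    prefix_len = (size_t)(sep + 3 - url) + strcspn(sep + 3, "/?#");
    n = snprintf(out, out_size, "%.*s/robots.txt", (int)prefix_len, url);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

           For http://www.foo.com/dir/dir/file.html this writes
           http://www.foo.com/robots.txt.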


 CANONICAL FORM
      The canonical form of a URI is an arbitrary choice that encodes all
      the possible variations of the same URI as one string. For instance
      http://www.foo.com/abc"def.html will be transformed to
      http://www.foo.com/abc%22def.html. Most of the transformations follow
      the instructions found in draft-fielding-uri-syntax-04, but some of
      them do not.

      Additionally, when the path of the URI contains single-dot (.) and
      double-dot (..) segments, it is reduced. For instance
      http://www.foo.com/dir/.././file.html will be transformed to
      http://www.foo.com/file.html.
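
      The reduction of dot segments can be sketched on the path component
      alone. The function reduce_dots below is hypothetical and simplified:
      it only handles absolute paths, works in place, and also drops empty
      segments (as in a//b), which the library may treat differently.

```c
#include <string.h>

/*
 * Reduce "." and ".." segments of an absolute path in place.
 * A ".." at the root is dropped. Paths that do not start with '/'
 * are left untouched. Hypothetical helper, not the library's code.
 */
void reduce_dots(char *path)
{
    char *in = path;
    char *out = path;
    int ends_with_slash = 0;

    if (path[0] != '/')
        return;
    while (*in != '\0') {
        size_t len;

        while (*in == '/')
            in++;
        len = strcspn(in, "/");
        if (len == 0) {                 /* the path ended with a slash */
            ends_with_slash = 1;
            break;
        }
        if (len == 1 && in[0] == '.') {
            ends_with_slash = 1;        /* "." names the current directory */
        } else if (len == 2 && in[0] == '.' && in[1] == '.') {
            /* Drop the last segment written so far, if any. */
            while (out > path && *--out != '/')
                ;
            ends_with_slash = 1;
        } else {
            *out++ = '/';
            memmove(out, in, len);      /* out never passes in */
            out += len;
            ends_with_slash = 0;
        }
        in += len;
    }
    if (ends_with_slash)
        *out++ = '/';
    *out = '\0';
}
```

      Applied to /dir/.././file.html the buffer ends up holding /file.html,
      which matches the example above.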

      If the URI_MODE_CANNONICAL flag is set, the uri_t object always
      contains the canonical form of the URL. The original form is lost.

      If the URI_MODE_CANNONICAL flag is not set, the canonical form of the
      URI is stored in a separate object. The uri_t object contains the
      original form of the URI. This takes more memory but may be useful in
      some situations.


 ERROR HANDLING
      When an error occurs (URI cannot be canonicalized or parsed, for
      instance), the global variable uri_errstr contains the full text of
      the error message. This variable is not reset by the library
      functions when no error occurs, so it may still hold the message of
      an earlier error.

      Additionally, the error string may be printed on the error channel
      (stderr) if the URI_MODE_ERROR_STDERR flag is set. This is the
      default.


 STRICTNESS
      The draft describing URI syntax (draft-fielding-uri-syntax-04)
      specifies that a URI of the type http:g may be interpreted in two
      different ways. If the URI_MODE_URI_STRICT flag is set, the library
      interprets it as an absolute URI, otherwise as a relative URI.

      If the URI_MODE_URI_STRICT flag is not set, the
      URI_MODE_URI_STRICT_SCHEME flag may be set so that a relative URI
      containing a scheme is interpreted as an absolute URI only if the
      scheme is different from the scheme of the base URI.





 FURI
      It is sometimes convenient to convert a URI into a path name. Some
      functions of the uri library provide such a conversion (uri_furi for
      instance). These path names are called FURI (File equivalent of an
      URI) for short. Here is a description of the transformation.
       http://www.ina.fr:700/imagina/index.html#queau
         |    \____________/ \________________/\____/
         |          |              |               lost
         |          |              |
         |          |              |
        /           |              |
        |           |              |
        |           |              |
        |           |              |
       /            |              |
       |   /^^^^^^^^^^^^^\/^^^^^^^^^^^^^^^^\
      http/www.ina.fr:700/imagina/index.html
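
      The transformation drawn above can be sketched as a string operation:
      the :// separator becomes a single / and the fragment is dropped. The
      helper url_to_furi is hypothetical; it ignores the root set with
      uri_set_root, the MD5 mode and canonicalization, so it shows the
      readable layout only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Convert scheme://netloc/path#frag into scheme/netloc/path.
 * Returns 0 on success, -1 if url has no "://" or out is too small.
 * Hypothetical helper, not a function of the uri library.
 */
int url_to_furi(const char *url, char *out, size_t out_size)
{
    const char *sep = strstr(url, "://");
    size_t scheme_len, rest_len;
    int n;

    if (sep == NULL)
        return -1;
    scheme_len = (size_t)(sep - url);
    rest_len = strcspn(sep + 3, "#");   /* the fragment is lost */
    n = snprintf(out, out_size, "%.*s/%.*s",
                 (int)scheme_len, url, (int)rest_len, sep + 3);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

      For http://www.ina.fr:700/imagina/index.html#queau this writes
      http/www.ina.fr:700/imagina/index.html.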


 EXAMPLES
      Show the canonical form of a URI
      char* uri = "http://www.foo.com/";
      uri = uri_cannonicalize_string(uri, strlen(uri), URI_STRING_URI_STYLE);
      if(uri) printf("uri = %s\n", uri);

      Show the host and port of a URI
      char* uri = "http://www.foo.com:7000/";
      /* the variable must not be named uri_object, or it hides the function */
      uri_t* object = uri_object(uri, strlen(uri));
      if(object) printf("netloc = %s\n", uri_netloc(object));

      Change the query part of a URI
      char* uri = "http://www.foo.com/cgi-bin/bar?param=1";
      uri_t* object = uri_object(uri, strlen(uri));
      if(object) {
           uri_query_set(object, "param=2");
           printf("uri = %s\n", uri_uri(object));
      }


 ADDING NEW SCHEMES
      Add the name of the scheme to the SCHEMES file. At a minimum, this
      binds the scheme to a generic parser following the URI parsing
      rules. If you want to define specific behaviour for this scheme,
      mimic the uri_scheme_http.c file and recompile. If gperf(1) complains
      because it has conflicts, you will have to play with the -k option in
      order to find a working range that does not conflict and takes as
      little space as possible.


 AUTHOR
      Loic Dachary loic@senga.org



 SEE ALSO
      draft-fielding-uri-syntax-04