
Contents
=========

  1. mozg Introduction.

  2. "Soft" rules.

  3. mozg Programmer's guide.

    3.1 MLPerceptron class (high level).

      3.1.1 High level functions.

      3.1.2 Code example of a network learning.

      3.1.3 Code example of a saved net using.

    3.2 MLPerceptron class (middle & low level).

      3.2.1 Middle level functions.

      3.2.2 Low level functions(layer interactions functions).

      3.2.3 How to insert other output function.

        3.2.3.1 For net = sum(w_{ij}*y_j).

        3.2.3.2 For RBF_net = -.5*sum(m_{ij}-y_j)^2/sigma_i^2.

      3.2.4 How to change basic types.

    3.3 Kernel library (mozgVector,mozgMatrix classes templates).

      3.3.1 Description.

      3.3.2 Optimisation technique.

  4. Bibliography.

*****************************************************************************

                        1. mozg Introduction
                       ======================

    mozg is a flexible, fast neural network library. At present it allows one
    to create, train and use a multi-layer perceptron (MLP) [1,9], which is
    the most popular artificial neural network (ANN) for a large number of
    the problems solved by ANNs.

      The structural parts of mozg are:

        1. The Kernel library is a library of fast vector and matrix
    operations. It consists of the mozgVector and mozgMatrix class templates
    [2] and the function templates using them (see mozgVectorMatrix.cc,
    mozgVectorMatrix.hh). The reasons for its existence are explained in
    3.3.1 (Description).

        2. The MLPerceptron class (which uses the Kernel library) simulates
    the MLP and its behaviour. Through the constructor one can create an ANN
    with the desired number of layers, the number of neurons (called units)
    in each of them, the output function of each layer (except the input
    layer, which only stores the network's inputs) and with or without a bias
    term; one can then train, use, save and load the network
    (see MLPerceptron.cc, MLPerceptron.hh).

      Dynamic memory allocations are made only in the MLP constructor (and in
    putLParams (b)), so they do not affect the CPU time of learning and using
    the MLP. Constructors of vectors and matrices are called only in the MLP
    constructor and in putLParams (b), so there are no unnecessary vector and
    matrix constructor calls.

      The output of the i-th neuron of a layer is computed as [1,3]:

        output_i = f(net_i) = f(sum_j(w_{ij}*input_j) + theta_i),

    where f(net_i) -- the neuron's output function,
          net_i -- the weighted sum of the inputs of the i-th neuron,
          w_{ij} -- the weight of the j-th input of the i-th neuron,
          input_j -- the j-th input of the neuron,
          theta_i -- the bias term (or simply bias) of the neuron; for a
    binary neuron it is called the threshold.
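
    The formula above can be sketched in a few lines of C++. This is only an
    illustration and not mozg code: the function names logistic and
    layer_outputs and the use of std::vector are assumptions made for the
    example, and the logistic function is taken as f.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic output function f(net) = 1/(1+exp(-net)).
inline float logistic(float net) { return 1.0f / (1.0f + std::exp(-net)); }

// Outputs of one layer: output_i = f(sum_j(w[i][j]*input[j]) + theta[i]).
// `w` holds one row of weights per neuron (hypothetical layout).
std::vector<float> layer_outputs(const std::vector<std::vector<float> >& w,
                                 const std::vector<float>& theta,
                                 const std::vector<float>& input)
{
    assert(w.size() == theta.size());
    std::vector<float> out(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        assert(w[i].size() == input.size());
        float net = theta[i];                    // bias term
        for (std::size_t j = 0; j < input.size(); ++j)
            net += w[i][j] * input[j];           // weighted input sum
        out[i] = logistic(net);
    }
    return out;
}
```

    With all weights and the bias equal to zero, net_i is zero and the
    logistic output is 0.5 whatever the inputs are.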

      Sometimes a neuron's activation function is used:

        a_i(t) = g(a_i(t-1),net_i),

        output_i(t) = f(a_i(t)),

    to make the neuron's output depend on time. This has not been implemented
    in the simulator.

      The temperature (T) in the output function allows one to change the
    slope parameter of the output function [3]:

        output_i = f(net_i/T).

    Decreasing it sharpens the error surface (called the energy landscape),
    and conversely increasing it softens the surface, so one can control
    learning with it.
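
    A small sketch of the effect of T on the logistic function may help
    here. The names logistic_T and logistic_T_slope are hypothetical and
    are not part of mozg; the slope formula f*(1-f)/T follows from
    differentiating f(net/T) with respect to net.

```cpp
#include <cassert>
#include <cmath>

// Logistic output function with temperature T: f(net/T) = 1/(1+exp(-net/T)).
inline double logistic_T(double net, double T)
{
    assert(T > 0.0);
    return 1.0 / (1.0 + std::exp(-net / T));
}

// Slope of logistic_T with respect to net: f'(net) = f*(1-f)/T.
inline double logistic_T_slope(double net, double T)
{
    const double f = logistic_T(net, T);
    return f * (1.0 - f) / T;
}
```

    At net = 0 the slope is 0.25/T, so halving the temperature doubles the
    steepness of the output function.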

      For any layer (except the input one, which stores the network's inputs)
    the neurons' output function may be chosen from the standard set:

        1. 1/(1+exp[-net/T]),
        2. tanh(net/T),
        3. net/T,
        4. Radial Basis Function --
           RBF = exp(-.5*sum(m_{ij}-y_j)^2/(sigma_i^2*T)),
        5. (2/Pi)*arctan(net/T),

    or a desired one may be inserted (see the mozg Programmer's guide). The
    RBF is a "local" function and differs from the others in its "net", which
    is not a weighted input sum as it is for them. Because of this the
    learning rule changes considerably for the RBF layer and the previous
    one. I have derived and implemented in the simulator the error
    backpropagation rule for three cases:

      a) RBF for the hidden and the output layer,

      b) RBF for the hidden layer, sigmoid for the output layer,

      c) sigmoid for the hidden layer, RBF for the output layer,

    which allows RBF to be used in any combination with sigmoids on different
    layers. For an example of deriving the error backpropagation rule for an
    RBF output layer and a sigmoid hidden layer see [5]. The local nature of
    the RBF may lead to a "sharp sweep". This is an undesirable phenomenon
    caused by a lack of learning examples or by their bad "scattering" in the
    input space, and it shows itself clearly in function approximation
    problems. One can overcome this trouble by setting RBF_sigma_min > 0 (a
    large value makes the network unable to learn) or by scattering the
    learning examples well.

      The implemented learning rule is "vanilla" backpropagation extended by
    a momentum term [1,2,7], Langevin noise [4] and weight decay [4,7]:

        w_{ij}(t+1) = w_{ij}(t) + delta w_{ij}(t),

        delta w_{ij}(t) = - eta * d E / d w_{ij}
                          + alpha * delta w_{ij}(t-1)
                          + Langevin_noise_term
                          - decay_coef * w_{ij}(t).
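
    The rule can be illustrated by the following sketch of one update step
    over a flat array of weights. It is not the mozg implementation: the
    function name update_weights, its parameters and the use of the C++11
    <random> header for the Gaussian noise are assumptions made for the
    example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One update step for a flat array of weights:
//   dw(t) = -eta*grad + alpha*dw(t-1) + noise - decay*w(t)
//   w(t+1) = w(t) + dw(t)
// `noise_sigma` is the standard deviation of the Gaussian (Langevin) noise;
// with noise_sigma == 0 no random numbers are drawn.
void update_weights(std::vector<double>& w,
                    std::vector<double>& prev_dw,     // dw(t-1), updated in place
                    const std::vector<double>& grad,  // dE/dw
                    double eta, double alpha, double decay,
                    double noise_sigma, std::mt19937& rng)
{
    assert(w.size() == prev_dw.size() && w.size() == grad.size());
    std::normal_distribution<double> gauss(0.0, 1.0);
    for (std::size_t k = 0; k < w.size(); ++k) {
        const double noise =
            (noise_sigma > 0.0) ? noise_sigma * gauss(rng) : 0.0;
        const double dw =
            -eta * grad[k] + alpha * prev_dw[k] + noise - decay * w[k];
        w[k] += dw;
        prev_dw[k] = dw;   // remembered for the momentum term of the next step
    }
}
```

    With alpha, decay and noise_sigma all equal to zero the step reduces to
    "vanilla" gradient descent, w(t+1) = w(t) - eta * dE/dw.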

      Any of three energy functions may be used for learning of a network:

        1) summed squared error E = sum_p sum_i (t_i^p-o_i)^2,

        2) Cross-Entropy E = - sum_p sum_i (t_i^p*ln(o_i) +
                                           (1-t_i^p)*ln(1-o_i)),

        3) log-squared error E = - sum_p sum_i ln(1-(o_i-t_i^p)^2),

    where t_i^p is the i-th target output for learning example p and o_i is
    the corresponding network output. They may of course be used in turn
    while learning one network.
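
    For one learning example the three energy functions might be computed
    as in the sketch below. The function names are hypothetical and not
    part of mozg, and the cross-entropy is written in its usual form with a
    leading minus sign, so that it is non-negative and is minimized when
    the outputs equal the targets.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Summed squared error for one example: sum_i (t_i - o_i)^2.
double squared_error(const Vec& t, const Vec& o)
{
    assert(t.size() == o.size());
    double e = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        e += (t[i] - o[i]) * (t[i] - o[i]);
    return e;
}

// Cross-entropy for one example; requires 0 < o_i < 1.
double cross_entropy(const Vec& t, const Vec& o)
{
    assert(t.size() == o.size());
    double e = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        e -= t[i] * std::log(o[i]) + (1.0 - t[i]) * std::log(1.0 - o[i]);
    return e;
}

// Log-squared error for one example; requires |o_i - t_i| < 1.
double log_squared_error(const Vec& t, const Vec& o)
{
    assert(t.size() == o.size());
    double e = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        e -= std::log(1.0 - (o[i] - t[i]) * (o[i] - t[i]));
    return e;
}
```

    The domain restrictions in the comments show why the cross-entropy
    should not be combined with output functions that can return values
    outside the interval (0,1).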

      The simulator can present the learning examples in random order during
    learning; this improves network generalization (the ability of the
    network to recognize correctly examples that were not used in learning)
    [4].

      Sometimes I had trouble with learning networks with 1, 2, ... hidden
    layers and a linear output function in all layers (such a network is
    meaningless, but I tried it out of interest). Learning lost stability
    and the simulator crashed. I don't know why this occurs.

      Wasilx Urazmetow rendered noticeable assistance and can be named the
    second author of mozg.

      Please send bugs you find and suggestions related to mozg to the
    e-mail address filin@desert.ihep.su. Good luck!
                                                Alexey P. Filin 19/09/99

*******************************************************************************

                        2. "Soft" rules
                       =================

        1. The choice of network inputs and outputs influences learning
        results considerably; in a bad case the network can't be trained.
        There is no guarantee of a good learning result for a given task at
        all. Your experience is your adviser.

        2. Normalize inputs and outputs [6], i.e. make the transformation

            new_input = ( old_input [- old_input_min] )
                          / | old_input_max - old_input_min |

        where the term in square brackets, [- old_input_min], is optional.

        3. Divide your set of examples into two subsets:
             - a learning set (used for learning),
             - a test set (not used for learning).

        4. Try to use the "vanilla" backpropagation rule [4] first (without
        extensions), i.e. use function 2.a) (putLParams) to set the learning
        parameters; the optimised weight adjustment is then used.
           The momentum term and Langevin noise are used to eliminate "flat
        spots" in the error-weight space [7]. When the error derivative is
        too small for learning by "vanilla" backpropagation, the extensions
        of the backpropagation rule have to be used. Langevin_sigma != 0
        leads to an N-FOLD increase in CPU time, so use Langevin noise only
        when you need it.
           Weight decay [4,7] is used for network architecture optimisation;
        it decreases the absolute values of the weights. It may lead to a bad
        learning result.
           A bias term may sometimes be very useful.

        5. The number of learning examples is recommended to be >> the number
        of network weights, or the ANN will not generalize [4]: it will
        memorize the learning examples together with their noise.

        6. Usually three successive regions of the test error (the error for
        examples not used in learning) exist in the learning phase [6]:
             - the test error decreases,
             - the test error reaches its optimal value,
             - the test error increases (overfitting).

  NOTE: the error for learning examples usually decreases during the whole
        learning time. Because of this, checking the test error is very
        useful for observing the learning result (see the code example).

        7. Try different numbers of layers, numbers of hidden units and
        output functions. Learning results may be very different (That's
        life :).
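
    The normalization of rule 2 can be sketched as follows for the values
    of one input (or output) component collected over the whole set of
    examples. The function normalize and its subtract_min flag are
    hypothetical names for this illustration; they are not part of mozg.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scale every value of one input (or output) component over the whole set:
//   new = (old - min) / |max - min|   if subtract_min is true,
//   new =  old        / |max - min|   otherwise.
// Returns false and leaves the data unchanged if all values are equal.
bool normalize(std::vector<float>& values, bool subtract_min)
{
    assert(!values.empty());
    float lo = values[0], hi = values[0];
    for (std::size_t k = 1; k < values.size(); ++k) {
        if (values[k] < lo) lo = values[k];
        if (values[k] > hi) hi = values[k];
    }
    const float range = std::fabs(hi - lo);
    if (range == 0.0f) return false;          // nothing to scale by
    for (std::size_t k = 0; k < values.size(); ++k)
        values[k] = (values[k] - (subtract_min ? lo : 0.0f)) / range;
    return true;
}
```

    With subtract_min set to true the values 2, 4, 6 become 0, 0.5, 1. It
    is probably sensible to keep the minimum and the range found on the
    learning set and to apply the same transformation to the test set.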


*******************************************************************************

                   3. mozg Programmer's guide
                  ============================

    Three levels of functions may be distinguished:

        * the level of interface functions, which provide the interaction of
    the network with outer objects and network management. They have "public"
    access in MLPerceptron (except the operator functions ">>" and "<<"),

        * the level of member functions of MLPerceptron with "protected"
    access (except propLayer()). These functions provide the interaction of
    neighbouring network layers,

        * the low level (kernel library): functions for vector and matrix
    operations. These functions use the class templates mozgVector and
    mozgMatrix.

      Let's go into the details of these levels.

3.1 MLPerceptron class (high level):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3.1.1 The high level functions for using MLPerceptron are (see MLPerceptron.hh
and MLPerceptron.cc):

member functions:

        1. MLPerceptron (int  number_of_layers,
                         int* num_outs_layers,
                         int* func_number,
                         float weight_dispersion,
                         bool bias_term_flag);
                         
              int number_of_layers -- the number of layers (including the
                input layer); three layers are usually enough, sometimes four,

              int* num_outs_layers -- pointer to the array which contains the
                numbers of units in the layers; if the bias term is used, the
                number of units in a layer MUST INCLUDE one extra unit -- the
                bias unit, which models the bias input for the next layer,
                that is

                       units_number = real_units_number + 1,

                EXCEPT the output layer, which doesn't contain a bias unit
                (because the layer is the last one);

          NOTE: in the first layer the number of units must also include the
                bias unit, but the dimension of the network input vector
                equals the number of layer units WITHOUT the bias unit,

              int* func_number -- pointer to the array which contains the
                 output function numbers of the layers. Each may be set to:

                    1 -- 1/(1+exp[-net/T]),
                    2 -- tanh(net/T),
                    3 -- net/T,
                    4 -- Radial Basis Function,
                    5 -- (2/Pi)*arctan(net/T),
                    0 -- user defined function (see 3.2.3 of the text),
                    6 -- user defined function (the same).

                  For example, { 0, 4, 2}:
                    0 for the input layer (it may be any number, because
                      the layer has no output function),
                    4 for the hidden layer,
                    2 for the output layer,

              float weight_dispersion -- the dispersion of the initial weight
                distribution; both a very small and a very large value make
                the network fail to learn; try to use about 0.4;

              bool bias_term_flag -- the bias term flag: true (1) switches it
                on and false (0) switches it off; the bias is modeled by a
                neuron input which always equals one -- the bias input;
                sometimes it can be useful, for example for function
                approximation: without the term and with an odd neuron output
                function (arctan, tanh) the network's outputs are zero for
                zero network inputs, so the network can't approximate a
                function that is nonzero when all inputs are zero.

        creates the network; if RBF isn't used, RBF_sigma_min doesn't
        influence the computations,
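
    Because the bias unit rule for num_outs_layers is easy to get wrong, a
    small helper like the following sketch may be convenient. The name
    units_with_bias is hypothetical and not part of mozg; the helper only
    prepares the numbers to be passed to the constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given the "real" number of units per layer, return the numbers to pass as
// num_outs_layers: one extra (bias) unit in every layer except the last one
// when the bias term is switched on.
std::vector<int> units_with_bias(const std::vector<int>& real_units, bool bias)
{
    assert(real_units.size() >= 2);      // at least input and output layer
    std::vector<int> units(real_units);
    if (bias)
        for (std::size_t l = 0; l + 1 < units.size(); ++l)
            ++units[l];                  // bias unit for the next layer
    return units;
}
```

    For a network with 2 inputs, 5 hidden units and 1 output the helper
    returns { 3, 6, 1 } when the bias term is on; the address of the first
    element, &units[0], could then be passed as num_outs_layers.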


        2.a) putLParams (float learning_rate,
                         float temperature,
                         float RBF_sigma_minimum,
                         int energy_function);

                float learning_rate -- the common learning rate; a value of
                    about 0.3 is usually a good one. The learning rate for
                    each layer is computed as (the fan-in rule):

                       layer's_eta = learning_rate / layer's_inputs_number

                float temperature -- the temperature in the output function;
                    a very small value makes the energy landscape very sharp,
                    so learning can get stuck in local minima; a very large
                    one makes it too flat and the network learns too slowly.
                    At first try to use 1,

                float RBF_sigma_minimum -- the minimum RBF sigma allowed in
                    the learning phase; if RBF isn't used, it doesn't
                    influence the computations (at first try to use zero);

                int energy_function -- one of the following energy functions
                    can be used:

                      1 -- summed squared error:
                         E = sum_p sum_i (t_i^p-o_i)^2,

                      2 -- Cross-Entropy (don't use it for odd sigmoids):
                         E = - sum_p sum_i (t_i^p*ln(o_i) +
                                           (1-t_i^p)*ln(1-o_i)),

                      3 -- log-squared error:
                         E = - sum_p sum_i ln(1-(o_i-t_i^p)^2);

          sets the learning parameters for the "vanilla" backpropagation rule,

          b) putLParams (float learning_rate,
                         float temperature,
                         float RBF_sigma_minimum,
                         int energy_function,
                         float mom_term_coef,
                         float lang_noise_begin_dispersion,
                         float lang_noise_disp_decr_coef,
                         float weight_decay_coef);

                float mom_term_coef -- the momentum term coefficient (alpha)
                    in the weight changing rule:

                    w_{ij}(t+1) = w_{ij}(t) + delta w_{ij}(t) =
                                = w_{ij}(t) + ... + alpha * delta w_{ij}(t-1)

                    It is recommended to be in the interval 0...1.

                float lang_noise_begin_dispersion -- the initial value of the
                    dispersion of the Langevin noise (Gaussian noise) term in
                    the weight changing rule:

                    w_{ij}(t+1) = w_{ij}(t) + ... + noise_term

                    It must be sufficiently small for successful learning.

                float lang_noise_disp_decr_coef -- the decreasing coefficient
                    of the Langevin dispersion in the learning phase:

                    Langevin_sigma(t+1) =
                            Langevin_sigma(t) * lang_noise_disp_decr_coef

                    where t+1 -- the next call of adjustWeights (). It should
                    be set to approximately 0.99.

                float weight_decay_coef -- the weight decay coefficient for
                    the decay term in the weight changing rule:

                    w_{ij}(t+1) = w_{ij}(t) + ...
                                  - weight_decay * w_{ij}(t).

                    It must be << 1 for the network to be able to learn (for
                    normalized inputs);

        sets the learning parameters for the backpropagation rule with
        extensions.

    ALL RECOMMENDED VALUES ARE INTENDED FOR NORMALIZED INPUTS AND OUTPUTS (see
    "Soft" rules).

  NOTE: you need not call the function; then the default values are used for
        learning:

           layer[i].eta = 0.3 / num_outputs_layers[i];
           layer[i].temperature = 1.;
           RBF_sigma_min             = 0.;
           switch_of_energy_function = 1;
           alpha                     = 0.;
           sigma_Langevin            = 0.;
           sigma_Langevin_decrease   = 0.;
           decay                     = 0.;
           flag_of_optimisation      = true;
           error_threshold           = 0.;
           rate_coeff = 1.;
           temperature_coeff = 1.;


        3.a) learnNet (float* linputs,   // learning input vector
                       float* loutputs)  // learning output vector

          trains the net on one learning example (one error backpropagation);
          with this function you can choose the order of the learning examples
          in an epoch yourself, which is useful if you want to present them in
          random order (this helps to improve the network's generalization),

          b) learnNet (float** linputs,  // array of learning input vectors
                       float** loutputs, // array of learning output vectors
                       int lssize,       // learning set size
                       int vec_order,    // order of vectors used in learning
                       int epnum,        // number of epochs the network must
                                         // be trained for
                       int pepnum)       // answerMessage() (9.) is called
                                         // every pepnum-th epoch

          trains the network:

            1) vec_order = 0 -- with the learning examples in random order (a
               uniform distribution is used), which helps to improve network
               generalisation; in this case some examples will be used more
               often and some less often, but the total number of error
               backpropagations is the same as for successive order;

            2) vec_order = 1 -- with the learning examples in the successive
               order given by the array; one error backpropagation is made
               for each learning example in an epoch,

          every pepnum-th epoch the learning error is printed.

    NOTE: the learning error isn't the true mean squared error on the learning
          set; it is the mean of the squared output errors over the learning
          set:

               learning_error = sum_i(target_i-realoutput_i)^2/lssize

          where target_i -- the i-th target output vector,
                realoutput_i -- the i-th real output vector,
          and each term of the sum is calculated after a weight change (so
          different terms are computed with different weights),

          c) learnNet (float** linputs,  // array of learning input vectors
                       float** loutputs, // array of learning output vectors
                       int lssize,       // learning set size
                       int vec_order,    // order of vectors used in learning
                       int epnum,        // number of epochs the network must
                                         // be trained for
                       int pepnum,       // answerMessage() (9.) is called
                                         // every pepnum-th epoch
                       float** tinputs,  // array of test input vectors
                       float** toutputs, // array of test output vectors
                       int tssize);      // test set size

          trains the network (one error backpropagation for each learning
          example); every pepnum-th epoch the learning error and the test
          error are printed.

    NOTE: the test error is the true mean squared error on the test set:

                 test_error = (sum_i(target_i-realoutput_i)^2)/tssize ,

          where i (= 1...tssize) -- the index of the test example,
                target_i -- the i-th target output vector,
                realoutput_i -- the i-th real output vector,
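
    The test error formula can be written out as in the sketch below, where
    the network outputs are assumed to have been computed beforehand with
    fixed weights. The function mean_squared_error and the use of
    std::vector are assumptions for this illustration, not the mozg code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<float> Vec;

// Mean over the set of the squared distance between target and real output:
//   error = sum_i |target_i - realoutput_i|^2 / set_size
float mean_squared_error(const std::vector<Vec>& targets,
                         const std::vector<Vec>& outputs)
{
    assert(!targets.empty() && targets.size() == outputs.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < targets.size(); ++i) {
        assert(targets[i].size() == outputs[i].size());
        for (std::size_t k = 0; k < targets[i].size(); ++k) {
            const float d = targets[i][k] - outputs[i][k];
            sum += d * d;                // squared error of one component
        }
    }
    return sum / static_cast<float>(targets.size());
}
```

    The sum runs over all components of all vectors, but it is divided by
    the number of examples only, as in the formula above.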


        4. testNet (float** tinp, // array of test input vectors
                    float** tout, // array of test output vectors
                    int tssize);  // test set size

          tests the network and returns the test error.

    NOTE: the test error is the true mean squared error on the test set,


        5.a) propInputs (float* invec,   // input vector
                         float* outvec); // output vector (allocated by the
                                         // programmer)
          propagates the input vector invec through the network and places the
          output vector in outvec,

          b) propInputs (float** invecarr,  // array of input vectors
                         float** outvecarr, // array of output vectors
                                            // (allocated by the programmer)
                         int ssize);        // number of vectors
          propagates the array of input vectors through the network and places
          the array of output vectors in outvecarr,


non-member functions:

        6. output << net
           ofstream& output -- output file stream,
           MLPerceptron& net -- network to be saved,
        saves the network in the file,


        7. input >> p_p_net
            ifstream& input -- input file stream,
            MLPerceptron** p_p_net -- address of the pointer to the network,

        loads the network from the file,

  NOTE: the network is loaded into dynamic memory which is allocated in the
        operator function, so you must destroy the network yourself when it
        is no longer needed,


        8. output << net
            ostream& output -- output stream (cout),
            MLPerceptron& net -- network to be written to the output,

        shows the network.


PLEASE DEFINE IT:

        9. bool answerMessage(float learn_error, // learning error
			      float test_error); // test error

     The function is intended for error checking and for managing learning
   "on the fly". It is called by learnNet b),c) every pepnum-th epoch and
   receives the learning error and, if it is calculated, the test error. If
   the test error isn't calculated (for learnNet b)), zero is passed as the
   second parameter of the function. If you like graphics, use a graphics
   library; then you can admire learning in real time.
     The function MUST RETURN:
       - true if network learning must be ended (the error is small enough);
   learnNet() then returns,
       - false otherwise (continue learning).
     A simple example:

        bool MLPerceptron::answerMessage(float learn_error,
        				 float test_error)
        {
          cout << "Overall number of epoch since learning begin: "
        // overall_epnum -- member-datum of MLPerceptron class (float)
               << overall_epnum << endl
               << "  learning error: " << learn_error << endl
               << "  test error    : " << test_error << endl;

        // is_call_aM -- member-datum of MLPerceptron class (bool);
        // to enter test error once
          if (!is_call_aM) {
            is_call_aM = true;
            cout << "Enter test error which can be allowed: ";

        // error_threshold -- member-datum of MLPerceptron class (float)
            cin >> error_threshold;
          }
          return (test_error < error_threshold);
        }

    I don't use static variables in answerMessage() instead of the member data
    overall_epnum, is_call_aM and error_threshold, because I had trouble with
    them: when I created several networks, the static variables were shared
    by all of them. Please don't forget this if you want to use a static
    variable.
      During learning you can change some learning parameters, the learning
    rate or the temperature of each layer except the input one; for this you
    can write in answerMessage():

          if (!is_call_aM) {

            is_call_aM = true;
            cout << "Enter rate_coeff and temperature_coeff:";
            cin >> rate_coeff >> temperature_coeff;
          }

          for (int i=1; i<num_layers; i++) {

            layers[i].rate *= rate_coeff;
            layers[i].temperature *= temperature_coeff;
          }

  NOTE: you can continue learning of the network by another call of learnNet
        (any of a),b),c)); overall_epnum is then NOT zeroed and equals the
        overall number of epochs over the several calls of learnNet().
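
    Besides a fixed error threshold, the decision returned by
    answerMessage() could be based on the test error starting to grow
    (overfitting, see "Soft" rules). The class below is a hypothetical
    helper written for this illustration; it is not part of mozg and
    assumes that "no improvement for several checks in a row" is an
    acceptable stopping criterion for your task.

```cpp
#include <cassert>

// Remembers the best (lowest) test error seen so far and reports when the
// test error has failed to improve `patience` checks in a row.
class OverfitWatch {
public:
    explicit OverfitWatch(int patience)
        : patience_(patience), bad_checks_(0), best_(0.0f), first_(true)
    { assert(patience > 0); }

    // Returns true when learning should be ended.
    bool should_stop(float test_error)
    {
        if (first_ || test_error < best_) {
            best_ = test_error;          // new best value, start counting anew
            bad_checks_ = 0;
            first_ = false;
            return false;
        }
        return ++bad_checks_ >= patience_;
    }

    float best() const { return best_; }

private:
    int   patience_;
    int   bad_checks_;
    float best_;
    bool  first_;
};
```

    Such an object would have to be kept per network (for example as a
    member datum) rather than as a static variable, for the reason given
    above.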


3.1.2 Code example of a network learning:

        ...

        float** learning_set_inputs,
             ** learning_set_outs,
             ** test_set_inputs,
             ** test_set_outs,
                eta,
                begin_wts_sigma,
                learning_rate,
                temperature,
                RBF_sigma_minimum;

        int   number_layers,
            * num_outputs_at_layers,
            * num_output_f,
              learning_set_size,
              test_set_size,
              vec_order,
              epoch_num,
              print_epoch_num,
              energy_function;

        bool flag_of_bias_term;

        // Initialisations of inputs & outputs vector arrays
        ...

        // Enter network and its learning parameters
        params_input (...);

        // Network is created:
        MLPerceptron net (number_layers,
                          num_outputs_at_layers,
                          num_output_f,
                          begin_wts_sigma,
                          flag_of_bias_term);

        // Learning parameters initialisation:
        net.putLParams (learning_rate,
                        temperature,
                        RBF_sigma_minimum,
                        energy_function);

        // Network is learned:
        net.learnNet (learning_set_inputs,
                      learning_set_outs,
                      learning_set_size,
                      vec_order,
                      epoch_num,
                      print_epoch_num,
                      test_set_inputs,
                      test_set_outs,
                      test_set_size);


        // and is saved:
        bool flag_of_saving;
        char net_file_name[100];

        cout << "Save network? (yes-1, no-0)";
        cin >> flag_of_saving;

        if (flag_of_saving) {

          cout << "Enter file name network must be saved in: ";
          cin >> net_file_name;

          ofstream out_file (net_file_name);
          out_file << net;
        }
        ...

3.1.3 Code example of a saved net using:

        float* set_inputs,
             * input_vec,
             * output_vec;
        ...

        MLPerceptron* p_net;

        ifstream from_file (net_file_name);

        from_file >> &p_net;

        // I want to see loaded net...
        cout << *p_net;

        output_vec = new float[num_outs];

        p_net->propInputs (input_vec,
                           output_vec);
        ...

3.2 MLPerceptron class (middle & low level).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3.2.1 Middle level functions:

        10. putInputs (inputs)
            float* inputs -- pointer to the beginning of the array containing
                the network inputs,

        puts the inputs in the network input layer,

        11. compOutputErrors (target)
           float* target -- pointer to the beginning of the array of desired
                outputs,

        computes the network output errors and places the squared norm of the
        error vector, sum_i (t_i-y_i)^2, in error_over_outs,

        12. backpropErrors ()

        propagates the output errors back through the network,

        13. adjustWeights ()

        adjusts the network weights,

        14. getOutputs(net_outputs)
           float* net_outputs -- pointer to the array the output vector is
                                placed in,

        copies the outputs of the output layer to net_outputs (which is
        allocated by the programmer),


3.2.2 Low level(layer interactions functions):

        1. propLayer (...), DON'T FORGET THE BIAS TERM if you use
        the function (for an example, see the definition of learnNet).

        propagates the signal through a layer,

the following functions have "protected" access:

        2. compute_lay_outs (...)

        computes the outputs of a hidden layer,

        3. compute_output_errors (...)

        computes the errors of the output layer,

        4. compute_lay_errors (...)

        computes the errors of a hidden layer,

        5. adjust_lay_weights (...)

        updates the weights of the neurons of a layer.


3.2.3 How to insert other output function.

    All you need to do to insert another output function is to write the
    desired function and its derivative in MLPOutFunc.cc and recompile mozg.
    Below I consider two cases: the case of a weighted input sum net and the
    case of an RBF net.

        1. For a weighted input sum, net = sum(w_{ij}*y_j), you must write the
        desired output function in mozgweinet(net,sigm) and its derivative in
        mozgDweinet(out,net,sigm) (see MLPOutFunc.cc).

        For example, if you want to insert y = sin(net), you must write in
        mozgweinet (net,sigm):

	  mozgflt mozgweinet (mozgflt net,mozgflt sigm)
	  {
  	    return sin(net);
	  }

        Then, because y' = cos(net) (which equals sqrt(1-y^2) wherever
        cos(net) >= 0), you must write in mozgDweinet (out,net,sigm):

          mozgflt mozgDweinet (mozgflt out,mozgflt net,mozgflt sigm)
          {
            return cos(net);
          }
        or (for some functions using "out" instead of "net" is more
        convenient; for the sine this form is correct only where
        cos(net) >= 0)

          mozgflt mozgDweinet (mozgflt out,mozgflt net,mozgflt sigm)
          {
            return sqrt(1 - out * out);
          }

  NOTE: net is the neuron's net already divided by the temperature.

        2. If you want to use RBFnet = -.5*sum(m_{ij}-y_j)^2/sigma_i^2, for
     	example y = sin(RBFnet), you must write in mozgrbfnet (RBFnet,sigm):

	  mozgflt mozgrbfnet (mozgflt RBFnet,mozgflt sigm)
	  {
 	    return sin(-.5 * RBFnet / (sigm * sigm));
	  }

        and, because y' = cos(RBFnet) (which equals sqrt(1-y^2) wherever
        cos(RBFnet) >= 0), you must write in mozgDrbfnet (out,RBFnet,sigm):

	  mozgflt mozgDrbfnet (mozgflt out,mozgflt RBFnet,mozgflt sigm)
	  {
  	    return cos(-.5 * RBFnet / (sigm * sigm)) / (sigm * sigm);
	  }

        or (for some functions using "out" instead of "RBFnet" is more
        convenient)

	  mozgflt mozgDrbfnet (mozgflt out,mozgflt RBFnet,mozgflt sigm)
	  {
  	    return sqrt(1 - out * out) / (sigm * sigm);
	  }

  NOTE: RBFnet is the neuron's RBFnet already multiplied by the gain (slope
        parameter).

    If you change the output function frequently, it is better to take
    MLPOutFunc.cc out of the library, so that only this file has to be
    recompiled after each change of the output function.


3.2.4 How to change basic types.

    If you want to change the basic value type "float" to "double" (for
    example to increase precision) and the value type "int" to "long", you
    need to replace "float" by "double" and "int" by "long" in "types.hh" and
    recompile mozg. Then ALL "float" values of mozg are replaced by "double"
    values and ALL "int" values are replaced by "long" values.


3.3 Kernel library(mozgVector,mozgMatrix classes templates).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3.3.1 Description.

    Vector and matrix operations have been extracted from the neurosimulator
    because the ANN model is described in terms of vector and matrix
    operations. Calculations with vectors and matrices take nearly all the
    run time of a neurosimulator, leaving only a few percent or even a
    fraction of a percent for the rest, so extracting them into a separate
    library is very convenient for further acceleration of the
    neurosimulator: you only need to replace some library functions by faster
    ones (for example, ones written in assembler or even implemented in
    hardware) to get a considerable acceleration. If you want to do this, I'm
    ready to answer your questions about the library.
      The kernel library consists of the mozgVector and mozgMatrix class
    templates (with 7 + 7 member functions), 32 optimised kernel function
    templates and 5 auxiliary functions. These functions may be used
    independently of MLPerceptron as a library for vector and matrix
    operations. But it isn't a universal vector and matrix library, because
    it has been created for the requirements of the neurosimulator (some of
    them will be used in following mozg versions). Explicit instantiation of
    these function templates for the float and int value types and the int
    size type has been made in VectorMatrix.cc.
      If you want to use the kernel library (for example, for writing your own
    neurosimulator), the checks that vector and matrix dimensions match, made
    in some functions (see VectorMatrix.cc), are very useful. To enable them
    type

      ./configure --with-debug

    during the installation process.


3.3.2 Optimisation technique.

      1. I actively used pointer arithmetic. For example, in the function
    computing the outer product of two vectors, instead of:

           matr_t** matr_prod = matrix->matr ();
           vec_t* temp1 = v1.vec (),
                * temp2;

           vec_size_t size1 = v1.size (),
                      size2 = v2.size (),
                      i, j;

           for (i=0; i<size1; i++)
             {
               temp2 = v2.vec ();

               for (j=0; j<size2; j++)
                 matr_prod[i][j] = temp1[i] * temp2[j];
             }

    I've written:

           matr_t* temp3 = matrix->matr () [0];
           vec_t* temp1 = v1.vec (),
                * temp2;

           vec_size_t size1 = v1.size (),
                      size2 = v2.size (),
                      i;

           while (size1--)
             {
               temp2 = v2.vec ();
               i = size2;

               while (i--)
                 *temp3++ = *temp1 * *temp2++;

               temp1++;
             }

    because I didn't want to rely upon the cleverness of compiler designers :).
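
    For readers who want to try the technique outside mozg, here is a
    self-contained sketch of the same idea. It uses plain float arrays
    instead of the mozgVector and mozgMatrix templates, and the function
    name outer_product is hypothetical; the matrix is assumed to be stored
    row after row in one contiguous block.

```cpp
#include <cassert>
#include <cstddef>

// Outer product of v1 (size1 elements) and v2 (size2 elements), written into
// a row-major matrix stored as one contiguous block of size1*size2 elements:
//   matr[i*size2 + j] = v1[i] * v2[j]
void outer_product(const float* v1, std::size_t size1,
                   const float* v2, std::size_t size2,
                   float* matr)
{
    assert(v1 && v2 && matr);
    while (size1--) {
        const float* p2 = v2;          // restart v2 for every row
        std::size_t  j  = size2;
        while (j--)
            *matr++ = *v1 * *p2++;     // walk the row with a single pointer
        ++v1;                          // next element of v1, next row
    }
}
```

    Whether this form is actually faster than the indexed loops depends on
    the compiler and its optimisation level, so it is worth measuring both
    variants.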

      2. The following method was used for the vector-matrix product:

            - the matrix is allocated as one contiguous block,

            - instead of:

                for(..i..) {
                  for(..j..) {
                    result_vector[i] += vector[j] * matrix[j][i];
                  }
                }

              the following has been written:

                for(..i..) {
                  mattrix = &matrix[0][i];
                  vec = vector;
                  for(..j..) {
                    *result_vector += *vec++ * *mattrix;
                    mattrix += matrix_columns_num;
                  }
                  result_vector++;
                }

    matrix[j][i] is compiled to *(*(matrix+j)+i), which requires more CPU time
    than mattrix += matrix_columns_num, so we get a considerable gain for
    intensive vector-matrix operations (in a neurosimulator, for example).
    Using a transposed matrix instead would lead to extra memory consumption,
    which is undesirable for big matrices.
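
    A complete, compilable version of this column walk might look like the
    sketch below. It is an illustration with plain float arrays, not the
    mozg kernel function; the name vector_matrix_product and its parameter
    list are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// result[i] = sum_j vec[j] * matr[j*cols + i] for a rows x cols matrix stored
// row after row in one contiguous block. The inner loop walks down column i
// by adding the stride `cols` to a single pointer.
void vector_matrix_product(const float* vec, const float* matr,
                           std::size_t rows, std::size_t cols,
                           float* result)
{
    assert(vec && matr && result);
    for (std::size_t i = 0; i < cols; ++i) {
        const float* v = vec;          // restart the vector for every column
        const float* m = matr + i;     // first element of column i
        float sum = 0.0f;
        for (std::size_t j = 0; j < rows; ++j) {
            sum += *v++ * *m;
            m += cols;                 // same column, next row
        }
        *result++ = sum;
    }
}
```

    The partial sum is kept in a local variable and stored once per column,
    so the result array does not have to be zeroed beforehand.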

*******************************************************************************

                    4. Bibliography
                   =================

    [1] J.Freeman, D.Skapura; Neural Networks: Algorithms, Applications, and
Programming Techniques. Addison-Wesley Publishing Company, 1991, 422 p.

    [2] B.Stroustrup; The C++ Programming Language, third edition,
Addison-Wesley Publishing Company, 1997, 9XX p.

    [3] C.Peterson, Th.Rögnvaldsson; An Introduction to Artificial Neural
Networks, Department of Theoretical Physics, University of Lund, Sweden.

    [4] C.Peterson, Th.Rögnvaldsson; JETNET 3.0 -- A Versatile Neural Network
Package, Department of Theoretical Physics, University of Lund, Sweden.

    [5] J.Proriol; Multi-modular neural network for the classification of
e+ e- hadronic events; NIMA 337 (1994).

    [6] C.Svarer; Neural Network for Signal Processing, Ph.D. Thesis,
ph.d. nr. 91-0112-134, CONNECT, Electronic Institute, Technical University of
Denmark, 1995.

    [7] A.Zell et al.; Stuttgart Neural Network Simulator: User Manual,
Version 4.2; University of Stuttgart, Institute for Parallel and Distributed
High Performance Systems (IPVR); University of Tübingen,
Wilhelm-Schickard-Institute for Computer Science.

    [8] http://world.std.com/~nr

    [9] ftp://ftp.sas.com/pub/neural/FAQ.html