HP UX Archive Centre

 Metakit 2.3.4 - last release candidate March, 2001

 Welcome to the latest release of the  Metakit  embedded  database  library.
 This is a near-final release, it is functionally 100% done, but a few minor
 problems/bugs still remain.  Release  MK  2.3.4  is  essentialy  ready  for
 production use, with one extremely important disclaimer:

    >>> Do *not* rely on new commit-aside/-extend models, these
    >>> new features are not good enough for general use yet.
    >>> Worse still, they will remain experimental in 2.3 final!

 This document describes the main changes since 2.01 in some detail:

  - a new version numbering scheme (again)
  - some essential file format changes
  - new commit modes: commit-aside and commit-extend
  - the memo datatype has been removed from the public API
  - made a start with making custom viewers modifiable
  - new "mapped views": hashed, blocked, and ordered
  - recursive view definitions
  - the Tcl language binding now includes a command-object API
  - better error checking to avoid disk full errors from being missed
  - documentation and current status

 Most of these changes are intended to make Metakit  more  scalable  and  to
 improve  performance  in  various  ways.   This process is not complete, as
 several new bottlenecks have been intoduced in the new code.  The main goal
 of  the  2.3  release  is  to  make  sure  the  file  format is correct and
 functioning properly.  The next minor releases will  then  address  various
 bottlenecks to take maximum advantage of the new file format.

 See the last section for notes about the status and stability of MK.


 NEW VERSION NUMBERING SCHEME

 I am adopting a more rigid 3-level numbering scheme, as follows:

  - stable release are X.0, X.2, X.4, i.e. even-numbered
  - development releases are X.1, X.3, X.5, i.e. odd-numbered
  - alpha releases are all called X.Y.1, they are distinguished only by
    the date embedded in the snapshot tarball, e.g "mk2-20000629.tar.gz"
  - beta releases are all called X.Y.2, again only dates in snapshots
  - the "final release candidate" will be called X.Y.3, with one extra
    chance to fix any remaining mistakes in a release called X.Y.4
  - the first final release will be called X.Y.5, with every new update
    and fix incrementing that third digit

 To avoid confusion between 2.01 and a  version  number  of  2.1,  this  new
 release has been dubbed 2.3 - since it is still development version.


 NEW FILE FORMAT

 This release has major changes in the way data is stored on file, or to  be
 more  accurate:  the data itself is hardly different, but administration of
 the data is now radically different (it used to be a tree-walk on each open
 and commit, the new format reads/commits far more lazily).

 The most immediate effect is that file opens are now O(1),  and  no  longer
 take  time  proprotional  to the number of subviews and columns.  There has
 been some "cutting corners" in this release, but the  file  format  is  now
 able  to  support  several  degrees  of  laziness,  which will gradually be
 implemented to make opens instant, and frequent commits cheap.

 The biggest drawback of this change, is that there are  now  two  different
 file  formats.   The  new  2.3  release will convert old formats on-the-fly
 during open, and will save the new  format  on  commit.   This  means  that
 moving  to  the  new  release  is relatively painless, but once changes are
 committed with 2.3, you can not go back to 2.01 or earlier.

 The "random" versus "serial" file format differences of pre-2.3 Metakit  is
 now  gone.  When serializing data out with "SaveTo", MK will now figure out
 how to save the file in an optimally compact form, suitable for loading  on
 demand.   That  means  that the best way to reorganize a MK datafile to get
 rid of internal free space, is now to serialize it out to file.   As  extra
 benefit, a serialized file has an accurate file size in the header, so that
 the exact size is known up front when serializing back  in.   This  can  be
 useful  when  data  is  streamed across a communications channel.  Finally,
 storage objects are now derived from views, and  both  can  be  initialized
 from an input file/stream.

 One special file format change affects the top level: MK datafiles are  now
 conformant  with *IFF file formats (i.e. 8-byte header with type and size).
 Serialized files, and files using the ordinary ("full") commits consist  of
 exactly  one section.  However, files using commit-extend (described below)
 will contain multiple sections.

 The new file format now uses a tail marker.  Because of  that,  a  datafile
 can  be  appended to another file.  MK will always start by looking for the
 tail marker to determine where the  file  starts  (resorting  to  old-style
 search-from-start to deal with pre-2.3 files).  Another effect, is that any
 existing file can now be opened by MK.  If there is no valid MK  data,  new
 contents  will  be  appended.  There is a new c4_Strategy::EndOfData member
 which returns -1 if the file contained to valid MK data.  One goal  of  all
 this logic, was to allow adding MK data at the end of executables.

 Several features have been prepared for in the new file  format,  including
 heterogenous  views,  clustering,  compression, encryption - as well as the
 ability to store revisions.  None of this is anywhere near  implementation,
 so  please  don't expect much in these areas for quite some time.  The main
 reason to mention this at all, is to indicate that the new file format  has
 plenty of room for future expansion.


 NEW COMMIT MODES

 There are two new commit modes: "commit-aside"  and  "commit-extend",  with
 very  different  characteristics.  They cannot yet be combined, but this is
 one of the goals, since that will support some interesting scenario's.

 Commit-aside stores all changes made to one datafile in another  one.   The
 primary  datafile  is not altered, it may be opened read-only.  The changes
 are saved in a special format and managed completely by  MK.   Opening  the
 original  file  later will of course lead to the original pre-commit state.
 By attaching that  other  file  (using  c4_Storage::SetAside),  the  latest
 changes  become  visible  again.  This logic can be stacked, though the API
 for all this is not very intuitive.  Commit-aside applies to  every  change
 made  to the file, including structure changes.  The second data file is in
 effect a "difference file", it has no use without the original file.

 In commit-aside mode, the c4_View::Commit call  takes  an  extra  argument,
 specifying  whether  to  do  a "fast" or a "full" commit (fast is default).
 Fast means: save the changes in the  aside  file,  full  means:  apply  all
 changes  to  the  primary  datafile and clear the aside file contents.  The
 rollback also comes in two versions: fast rollback rolls back to the  state
 before  the  last  commit,  while  full rollback forgets about commit aside
 altogether and reverts to the original datafile (without deleting the aside
 file  contents  - it can be re-attached later).  Note that any changes made
 to the primary datafile will invalidate the aside file.

 The purpose of commit-aside is two-fold: to speed up commits, and as a  way
 to  save  changes  when  a  key database needs to remain unchanged (this is
 useful when distributing over CD-ROM, or to manage original  releases,  for
 example).   Commit  aside  is  not  yet  optimally efficient, but it'll get
 better - the key issue is that the amount of data saved is proportional  to
 the amount of change, and *not* to the size/number of views & columns,

 Commit-extend is somewhat different: it is a modified version of  a  normal
 commit,  which  saves all changes at the end of the datafile, in such a way
 that current readers are not affected.  Readers will not  see  any  changes
 until  they  do  a rollback (which is a misnomer in this context).  At that
 point, the reader will resynchronize to the latest *committed* changes.

 Commit-extend supports multiple readers and a single writer, and due to the
 way  it  is  implemented it does so without any locking or contention.  The
 trade-off is that datafiles will grow  on  each  commit,  and  need  to  be
 cleaned  up  periodically  to  avoid  filling  the disk.  This can often be
 scheduled to some "idle" period, at which point a normal  (exclusive)  open
 and commit can be performed, possible followed by serialization to generate
 an optimally-packed datafile again.  Note that commit-extend  is  not  much
 more  efficient  than  a  normal commit, it still writes out entire changed
 columns (there is a speedup because free-space is not reclaimed).

 In this release, commit-aside and commit-extend cannot  be  combined:  this
 would greatly reduce the amount of data saved on commit, while supporting a
 form of high-performance concurrency.  This "hybrid commit" is planned  for
 the  final 2.3 release, as is a more sophisticated degree of delta- storage
 in commit-aside.


 NO MORE MEMOS TO WORRY ABOUT

 The memo ("M") datatype has always been a confusing addition to  MK,  since
 it complements the bytes ("B") datatype yet adds very little value.  It has
 always been difficult to decide between B and M in the  design  phase,  and
 since there were no conversion tools - that choice has to be made up front.
 This release does away with memos.

 At least, that's how it looks on the outside.  On the inside,  MK  has  now
 been  extended  to dynamically switch between column-wise (B) and item-wise
 (M) storage, depending on the amount of data and the size of the view.  The
 result  is  likely  to  be  better than either approach before, because the
 choise is based on actual dynamics, and  on  an  item-by-item  basis.   The
 heuristics  which  determine  how data is stored internally are still being
 tweaked.  But the good news is that  binary  data  storage  is  now  always
 specified  in  a uniform manner, i.e. through c4_BytesProp.  In a way, this
 is analogous to how MK has always offered  "adaptive  integers",  i.e.  the
 fully transparent switch between 1/2/4/8/16/32-bit integers.

 Another change is that strings are now internally based on  the  same  more
 adaptive  style,  they now scale much better than before and also no longer
 suffer from a startup delay (caused by a pre-2.3 null-byte scan on open).


 MODIFIABLE CUSTOM VIEWERS

 Custom viewers were introduced in version 1.8 and have  turned  out  to  be
 extremely  effective  and flexible.  Almost all the relational operators of
 MK are implemented as classes derived from  c4_CustomViewer.   As  of  this
 release, custom viewers now support the ability to handle modifications.

 Many of the existing viewers, such as  join  and  intersection,  are  still
 read-only  (and  some  changes  could never be handled properly), but a few
 others were realtively easy to extend.   View  operators  such  as  Concat,
 RemapWith,  Pair,  now  support modifications on the resulting view.  These
 modifications are passed back to the underlying views.

 There are several types of modifications: setting individual properties  in
 individual   rows,   inserting/removing   rows,   and  changing  structure.
 Propagating individual property changes is usually  easiest  to  implement.
 Inserting/deleting  rows  is  far  more  complex, and not even semantically
 obvious in many cases (where do you insert row N, if the view is the result
 of  concatenating  an  N-row view A and an M-row view B?).  For the moment,
 row insertions/deletions are only implemented in mapped views (see below).

 The last category of changes is the most complex of all, and has  not  been
 addressed  in this release.  The best way to stay out of trouble is to only
 restructure views on open (i.e. at app initialization time),  and  to  then
 only   deal   with  inserting/modifying/deleting  rows.   Subviews  are  no
 different, and can be modified in the same way as the main views.


 MAPPED VIEWS

 Modifiable custom viewers which support row  insertion  and  deletion  have
 been  dubbed  "mapped  views",  because they usually represent a mapping of
 some kind over one or more underlying views.  Changes are propagated to the
 underlying views in the way that is deemed most intuitive.

 The 2.3 release contains  three  new  mapped  view  implementations:  hash,
 blocked, and ordered.

 The hash view takes a main "data" view as input, plus a  "map"  view.   The
 map  view  must  be  defined  with  a fixed structure ("_H:I,_R:I"), and is
 managed fully by the hash view.  The data view can be any structure, it  is
 the  collection of data on which hashing is to be applied.  When hash views
 are defined, they must be told how many of the first properties form a  key
 (note  that  the key fields must always be first).  What the hash view does
 is create a new layer on top of the original data  view,  which  intercepts
 all modification and find requests.  Modifications are applied to the data,
 but also cause the map view to be adjusted as needed.  Find requests  which
 happen to specify a value for the key are singled out and cause a fast O(1)
 hash lookup to take place.

 There are several consequences of this approach: first of all, it is up  to
 the application to set up a hash view after open (for now at least).  Also,
 once you want hashing, you should *never* change the underlying data or map
 views  - the only way to make sure they contain the proper info for hashing
 to work is to always change *through* the hash view.  Note  that  the  data
 view  is  the  same as without hashing, all details needed for instant hash
 lookup are maintained in the separate map view.  Unlike  most  conventional
 hash  tables,  insertion  by position is still allowed, though insertion at
 the end is fastest (by far).  Also unlike most hash implementations, it  is
 possible  to  change the key of a row, because MK will rehash whenever that
 happens.  Caveat: the details of changing key fields are not yet correct in
 this release.

 Find requests automatically detect hash  views  and  optimize  accordingly.
 The "python/find.py" script demonstrates find access optimizations.

 The blocked view is a view which stores its  rows  in  "blocks",  i.e.  one
 extra level of subviews.  To make a view of the form "a:I,b:S,c:D" blocked,
 you have to define it as "_B[a:I,b:S,c:D]" instead, then call  the  Blocked
 member  to produce the view you can work with.  The blocked view then takes
 over all insertions and deletions and rebalances both root and leaf  blocks
 in  such  a way that none of the subviews ever contains either very few nor
 very many rows ("few" and "many"  depend  on  the  total  number  of  rows,
 everything  is  adaptive).   Due to the extra level of indirection, blocked
 views are slightly slower, but the gain  is  that  they  can  scale  to  an
 unlimited  number  of rows and still offer good performance (probably O(log
 N) instead of the usual  O(N)).   These  blocked  views  will  behave  like
 ordinary  views  in every respect, i.e.  you can iterate and access rows by
 position, regardless of the blocking and occasional rebalancing that  takes
 place underneath.

 The ordered view is a view which assumes the underlying view is  sorted  on
 its first N properties (N defaults to 1), and which maintains that order by
 ignoring isnert positions and supplying its  own  instead.   As  with  hash
 views,  keys are unique.  Inserting a row which has the same key as another
 one will cause the other row to be replaced instead.

 Ordered views intercept  changes,  but  also  find  requests.   If  a  find
 specifies  a value for the key field(s), then binary search is used to find
 the rows far more quickly.

 All of the above views can be combined, e.g. by layering a hash view on top
 of a blocked data view, you get a persistent hash structure which scales up
 indefinitely, yet offers O(1) hash access performance.

 An ordered view on top a blocked view creates  a  blocked  structure  which
 keeps  its  rows sorted.  The current release will take advantage of binary
 search if possible - but  real  btree-like  searches  will  be  implemented
 before 2.3 final (this is far more efficient than "just" binary search, due
 to its much higher locality of reference).

 And finally, you can layer an ordered view on a hash view  on  one  or  two
 blocked  views  (for  data  and/or  map).   This  creates a structure which
 maintains sort order (and can be traversed in that order), yet with instant
 hash  access  on  single  key values.  The "tcl/mapped.tcl" script shows an
 example of various ways in which this can be done.


 RECURSIVE VIEW DEFINIONS

 You can now define views with recursive subviews.  The way this is done  is
 to   specify   a   subview   of   the   form   "subview[^]".   An  example:
         roots[name:S,children[^]] This define a "roots" view  containing  0
 or  more  rows.  Each has the form given above, which is also identical to:
         name:S,children[name:S,children[^]] In other words, the  "children"
 subview  has the same format as its parent, and this continues to arbitrary
 depths.  The above example is in fact nothing but a definition  of  a  tree
 containing   named   nodes.    Restructuring  works,  it  will  recursively
 restructure the entire tree.


 COMMAND OBJECTS IN MK4TCL

 The Mk4tcl extension has undergone a massive face-lift.  The  original  API
 is still there, but there is now also an OO-style command interface (in the
 same way as Tk works).  This is based on work by Matt Newman.   Because  of
 this,  a far more powerful binding to the MK C++ core is now available from
 Tcl (as it has been in Python  for  a  while).   These  new  calls  include
 relational  operators,  such  as join, select, sort, etc.  For now, this is
 "thinly" documented in the "tcl/mk4tcl.h" header.  Another useful source to
 check, is the "tcl/test/mk5object.tcl" script.

 The Tequila server and the I/O handler package (iohan) have undergone  some
 changes (extensions, really).


 BETTER ERROR CHECKING

 The write and commit logic have been  improved  to  better  detect  whether
 anything went wrong during the commit, to prevent the final end marker from
 being updated.  This has the effect of  ignoring  all  changes.   The  most
 important  change  was  to check and remember errors in fflush, and to make
 the c4_Stream::Write member return a success flag i.s.o. void.


 DOCUMENTATION

 Documentation.  Hm, what documentation?  I have converted the C++  docs  to
 work  with  Doxygen,  which extracts everything from comments in the source
 code.  The C++ documentation is available as a scripted document.   To  get
 it,        you        need        to        download        the       file:
         http://www.equi4.com/previews/mk4api.bin and one of  the  following
 (sorry,       only       Linux       and       Windows       for      now):
         http://www.equi4.com/previews/tclkit-linux.bin
         http://www.equi4.com/previews/tclkit.exe  Uncompress  and  make all
 files executable, then drop mk4api on tclkit, or  whatever  your  explorer,
 navigator,  finder,  or  shell wants you to do.  If this is not workable, I
 will create a tarball of the html/gif files.

 For Python and Tcl, check the pages doc/python.html and doc/tcl.html in the
 source distribution (they are also part of the above "mk4api").  There is a
 new examples/ directory with several Python and Tcl examples.

 The documentation is far from complete ("non-existent" or "wrong" are  also
 applicable  terms  in  some  cases,  I'm  afraid).   I have started work on
 improving this.   Comments  on  what  needs  to  be  done  first  are  most
 gratefully accepted - there is a huge amount of work left to do.


 CURRENT STATUS

 This 2.3.4 build passes all but one of the original regression  tests  (the
 one which does not pass is related to serialized data loading).

 There are several performance bottlenecks, despite the fact  that  the  new
 format  really  should  work better than the pre-2.3 file format.  For now,
 the first goal is to make sure that the file  format  is  100%  frozen  and
 useable as is - to avoid even more complex future conversion nightmares.  I
 will start aiming for top speed *after* everything works properly.

 The commit logic can be slow for complex datasets, I am looking into it.

 There are some problems when changing keys in hash  &  ordered  views,  the
 best  workaround is to not change key properties - delete and re-insert the
 row instead if you need this capability.

 This MK 2.3.4 final release candidate is very solid.   Recent  versions  of
 the  beta  preceding it have been in use in a wide range of active projects
 without substantial problems.  A byte-order  conversion  bug  has  recently
 been found and fixed, as well as double alignment on Solaris.

 At this point, I recommend using 2.3.4 as replacement for 2.01, except when
 tweaks  for  top  performance  have  been done in 2.01 (2.3.4 is not yet as
 highly optimized, and has a slightly different performance behavior).


 -- Jean-Claude Wippler <jcw@equi4.com>