Metakit 2.3.4 - last release candidate March, 2001 Welcome to the latest release of the Metakit embedded database library. This is a near-final release, it is functionally 100% done, but a few minor problems/bugs still remain. Release MK 2.3.4 is essentialy ready for production use, with one extremely important disclaimer: >>> Do *not* rely on new commit-aside/-extend models, these >>> new features are not good enough for general use yet. >>> Worse still, they will remain experimental in 2.3 final! This document describes the main changes since 2.01 in some detail: - a new version numbering scheme (again) - some essential file format changes - new commit modes: commit-aside and commit-extend - the memo datatype has been removed from the public API - made a start with making custom viewers modifiable - new "mapped views": hashed, blocked, and ordered - recursive view definitions - the Tcl language binding now includes a command-object API - better error checking to avoid disk full errors from being missed - documentation and current status Most of these changes are intended to make Metakit more scalable and to improve performance in various ways. This process is not complete, as several new bottlenecks have been intoduced in the new code. The main goal of the 2.3 release is to make sure the file format is correct and functioning properly. The next minor releases will then address various bottlenecks to take maximum advantage of the new file format. See the last section for notes about the status and stability of MK. NEW VERSION NUMBERING SCHEME I am adopting a more rigid 3-level numbering scheme, as follows: - stable release are X.0, X.2, X.4, i.e. even-numbered - development releases are X.1, X.3, X.5, i.e. odd-numbered - alpha releases are all called X.Y.1, they are distinguished only by the date embedded in the snapshot tarball, e.g "mk2-20000629.tar.gz" - beta releases are all called X.Y.2, again only dates in snapshots - the "final release candidate" will be called X.Y.3, with one extra chance to fix any remaining mistakes in a release called X.Y.4 - the first final release will be called X.Y.5, with every new update and fix incrementing that third digit To avoid confusion between 2.01 and a version number of 2.1, this new release has been dubbed 2.3 - since it is still development version. NEW FILE FORMAT This release has major changes in the way data is stored on file, or to be more accurate: the data itself is hardly different, but administration of the data is now radically different (it used to be a tree-walk on each open and commit, the new format reads/commits far more lazily). The most immediate effect is that file opens are now O(1), and no longer take time proprotional to the number of subviews and columns. There has been some "cutting corners" in this release, but the file format is now able to support several degrees of laziness, which will gradually be implemented to make opens instant, and frequent commits cheap. The biggest drawback of this change, is that there are now two different file formats. The new 2.3 release will convert old formats on-the-fly during open, and will save the new format on commit. This means that moving to the new release is relatively painless, but once changes are committed with 2.3, you can not go back to 2.01 or earlier. The "random" versus "serial" file format differences of pre-2.3 Metakit is now gone. When serializing data out with "SaveTo", MK will now figure out how to save the file in an optimally compact form, suitable for loading on demand. That means that the best way to reorganize a MK datafile to get rid of internal free space, is now to serialize it out to file. As extra benefit, a serialized file has an accurate file size in the header, so that the exact size is known up front when serializing back in. This can be useful when data is streamed across a communications channel. Finally, storage objects are now derived from views, and both can be initialized from an input file/stream. One special file format change affects the top level: MK datafiles are now conformant with *IFF file formats (i.e. 8-byte header with type and size). Serialized files, and files using the ordinary ("full") commits consist of exactly one section. However, files using commit-extend (described below) will contain multiple sections. The new file format now uses a tail marker. Because of that, a datafile can be appended to another file. MK will always start by looking for the tail marker to determine where the file starts (resorting to old-style search-from-start to deal with pre-2.3 files). Another effect, is that any existing file can now be opened by MK. If there is no valid MK data, new contents will be appended. There is a new c4_Strategy::EndOfData member which returns -1 if the file contained to valid MK data. One goal of all this logic, was to allow adding MK data at the end of executables. Several features have been prepared for in the new file format, including heterogenous views, clustering, compression, encryption - as well as the ability to store revisions. None of this is anywhere near implementation, so please don't expect much in these areas for quite some time. The main reason to mention this at all, is to indicate that the new file format has plenty of room for future expansion. NEW COMMIT MODES There are two new commit modes: "commit-aside" and "commit-extend", with very different characteristics. They cannot yet be combined, but this is one of the goals, since that will support some interesting scenario's. Commit-aside stores all changes made to one datafile in another one. The primary datafile is not altered, it may be opened read-only. The changes are saved in a special format and managed completely by MK. Opening the original file later will of course lead to the original pre-commit state. By attaching that other file (using c4_Storage::SetAside), the latest changes become visible again. This logic can be stacked, though the API for all this is not very intuitive. Commit-aside applies to every change made to the file, including structure changes. The second data file is in effect a "difference file", it has no use without the original file. In commit-aside mode, the c4_View::Commit call takes an extra argument, specifying whether to do a "fast" or a "full" commit (fast is default). Fast means: save the changes in the aside file, full means: apply all changes to the primary datafile and clear the aside file contents. The rollback also comes in two versions: fast rollback rolls back to the state before the last commit, while full rollback forgets about commit aside altogether and reverts to the original datafile (without deleting the aside file contents - it can be re-attached later). Note that any changes made to the primary datafile will invalidate the aside file. The purpose of commit-aside is two-fold: to speed up commits, and as a way to save changes when a key database needs to remain unchanged (this is useful when distributing over CD-ROM, or to manage original releases, for example). Commit aside is not yet optimally efficient, but it'll get better - the key issue is that the amount of data saved is proportional to the amount of change, and *not* to the size/number of views & columns, Commit-extend is somewhat different: it is a modified version of a normal commit, which saves all changes at the end of the datafile, in such a way that current readers are not affected. Readers will not see any changes until they do a rollback (which is a misnomer in this context). At that point, the reader will resynchronize to the latest *committed* changes. Commit-extend supports multiple readers and a single writer, and due to the way it is implemented it does so without any locking or contention. The trade-off is that datafiles will grow on each commit, and need to be cleaned up periodically to avoid filling the disk. This can often be scheduled to some "idle" period, at which point a normal (exclusive) open and commit can be performed, possible followed by serialization to generate an optimally-packed datafile again. Note that commit-extend is not much more efficient than a normal commit, it still writes out entire changed columns (there is a speedup because free-space is not reclaimed). In this release, commit-aside and commit-extend cannot be combined: this would greatly reduce the amount of data saved on commit, while supporting a form of high-performance concurrency. This "hybrid commit" is planned for the final 2.3 release, as is a more sophisticated degree of delta- storage in commit-aside. NO MORE MEMOS TO WORRY ABOUT The memo ("M") datatype has always been a confusing addition to MK, since it complements the bytes ("B") datatype yet adds very little value. It has always been difficult to decide between B and M in the design phase, and since there were no conversion tools - that choice has to be made up front. This release does away with memos. At least, that's how it looks on the outside. On the inside, MK has now been extended to dynamically switch between column-wise (B) and item-wise (M) storage, depending on the amount of data and the size of the view. The result is likely to be better than either approach before, because the choise is based on actual dynamics, and on an item-by-item basis. The heuristics which determine how data is stored internally are still being tweaked. But the good news is that binary data storage is now always specified in a uniform manner, i.e. through c4_BytesProp. In a way, this is analogous to how MK has always offered "adaptive integers", i.e. the fully transparent switch between 1/2/4/8/16/32-bit integers. Another change is that strings are now internally based on the same more adaptive style, they now scale much better than before and also no longer suffer from a startup delay (caused by a pre-2.3 null-byte scan on open). MODIFIABLE CUSTOM VIEWERS Custom viewers were introduced in version 1.8 and have turned out to be extremely effective and flexible. Almost all the relational operators of MK are implemented as classes derived from c4_CustomViewer. As of this release, custom viewers now support the ability to handle modifications. Many of the existing viewers, such as join and intersection, are still read-only (and some changes could never be handled properly), but a few others were realtively easy to extend. View operators such as Concat, RemapWith, Pair, now support modifications on the resulting view. These modifications are passed back to the underlying views. There are several types of modifications: setting individual properties in individual rows, inserting/removing rows, and changing structure. Propagating individual property changes is usually easiest to implement. Inserting/deleting rows is far more complex, and not even semantically obvious in many cases (where do you insert row N, if the view is the result of concatenating an N-row view A and an M-row view B?). For the moment, row insertions/deletions are only implemented in mapped views (see below). The last category of changes is the most complex of all, and has not been addressed in this release. The best way to stay out of trouble is to only restructure views on open (i.e. at app initialization time), and to then only deal with inserting/modifying/deleting rows. Subviews are no different, and can be modified in the same way as the main views. MAPPED VIEWS Modifiable custom viewers which support row insertion and deletion have been dubbed "mapped views", because they usually represent a mapping of some kind over one or more underlying views. Changes are propagated to the underlying views in the way that is deemed most intuitive. The 2.3 release contains three new mapped view implementations: hash, blocked, and ordered. The hash view takes a main "data" view as input, plus a "map" view. The map view must be defined with a fixed structure ("_H:I,_R:I"), and is managed fully by the hash view. The data view can be any structure, it is the collection of data on which hashing is to be applied. When hash views are defined, they must be told how many of the first properties form a key (note that the key fields must always be first). What the hash view does is create a new layer on top of the original data view, which intercepts all modification and find requests. Modifications are applied to the data, but also cause the map view to be adjusted as needed. Find requests which happen to specify a value for the key are singled out and cause a fast O(1) hash lookup to take place. There are several consequences of this approach: first of all, it is up to the application to set up a hash view after open (for now at least). Also, once you want hashing, you should *never* change the underlying data or map views - the only way to make sure they contain the proper info for hashing to work is to always change *through* the hash view. Note that the data view is the same as without hashing, all details needed for instant hash lookup are maintained in the separate map view. Unlike most conventional hash tables, insertion by position is still allowed, though insertion at the end is fastest (by far). Also unlike most hash implementations, it is possible to change the key of a row, because MK will rehash whenever that happens. Caveat: the details of changing key fields are not yet correct in this release. Find requests automatically detect hash views and optimize accordingly. The "python/find.py" script demonstrates find access optimizations. The blocked view is a view which stores its rows in "blocks", i.e. one extra level of subviews. To make a view of the form "a:I,b:S,c:D" blocked, you have to define it as "_B[a:I,b:S,c:D]" instead, then call the Blocked member to produce the view you can work with. The blocked view then takes over all insertions and deletions and rebalances both root and leaf blocks in such a way that none of the subviews ever contains either very few nor very many rows ("few" and "many" depend on the total number of rows, everything is adaptive). Due to the extra level of indirection, blocked views are slightly slower, but the gain is that they can scale to an unlimited number of rows and still offer good performance (probably O(log N) instead of the usual O(N)). These blocked views will behave like ordinary views in every respect, i.e. you can iterate and access rows by position, regardless of the blocking and occasional rebalancing that takes place underneath. The ordered view is a view which assumes the underlying view is sorted on its first N properties (N defaults to 1), and which maintains that order by ignoring isnert positions and supplying its own instead. As with hash views, keys are unique. Inserting a row which has the same key as another one will cause the other row to be replaced instead. Ordered views intercept changes, but also find requests. If a find specifies a value for the key field(s), then binary search is used to find the rows far more quickly. All of the above views can be combined, e.g. by layering a hash view on top of a blocked data view, you get a persistent hash structure which scales up indefinitely, yet offers O(1) hash access performance. An ordered view on top a blocked view creates a blocked structure which keeps its rows sorted. The current release will take advantage of binary search if possible - but real btree-like searches will be implemented before 2.3 final (this is far more efficient than "just" binary search, due to its much higher locality of reference). And finally, you can layer an ordered view on a hash view on one or two blocked views (for data and/or map). This creates a structure which maintains sort order (and can be traversed in that order), yet with instant hash access on single key values. The "tcl/mapped.tcl" script shows an example of various ways in which this can be done. RECURSIVE VIEW DEFINIONS You can now define views with recursive subviews. The way this is done is to specify a subview of the form "subview[^]". An example: roots[name:S,children[^]] This define a "roots" view containing 0 or more rows. Each has the form given above, which is also identical to: name:S,children[name:S,children[^]] In other words, the "children" subview has the same format as its parent, and this continues to arbitrary depths. The above example is in fact nothing but a definition of a tree containing named nodes. Restructuring works, it will recursively restructure the entire tree. COMMAND OBJECTS IN MK4TCL The Mk4tcl extension has undergone a massive face-lift. The original API is still there, but there is now also an OO-style command interface (in the same way as Tk works). This is based on work by Matt Newman. Because of this, a far more powerful binding to the MK C++ core is now available from Tcl (as it has been in Python for a while). These new calls include relational operators, such as join, select, sort, etc. For now, this is "thinly" documented in the "tcl/mk4tcl.h" header. Another useful source to check, is the "tcl/test/mk5object.tcl" script. The Tequila server and the I/O handler package (iohan) have undergone some changes (extensions, really). BETTER ERROR CHECKING The write and commit logic have been improved to better detect whether anything went wrong during the commit, to prevent the final end marker from being updated. This has the effect of ignoring all changes. The most important change was to check and remember errors in fflush, and to make the c4_Stream::Write member return a success flag i.s.o. void. DOCUMENTATION Documentation. Hm, what documentation? I have converted the C++ docs to work with Doxygen, which extracts everything from comments in the source code. The C++ documentation is available as a scripted document. To get it, you need to download the file: http://www.equi4.com/previews/mk4api.bin and one of the following (sorry, only Linux and Windows for now): http://www.equi4.com/previews/tclkit-linux.bin http://www.equi4.com/previews/tclkit.exe Uncompress and make all files executable, then drop mk4api on tclkit, or whatever your explorer, navigator, finder, or shell wants you to do. If this is not workable, I will create a tarball of the html/gif files. For Python and Tcl, check the pages doc/python.html and doc/tcl.html in the source distribution (they are also part of the above "mk4api"). There is a new examples/ directory with several Python and Tcl examples. The documentation is far from complete ("non-existent" or "wrong" are also applicable terms in some cases, I'm afraid). I have started work on improving this. Comments on what needs to be done first are most gratefully accepted - there is a huge amount of work left to do. CURRENT STATUS This 2.3.4 build passes all but one of the original regression tests (the one which does not pass is related to serialized data loading). There are several performance bottlenecks, despite the fact that the new format really should work better than the pre-2.3 file format. For now, the first goal is to make sure that the file format is 100% frozen and useable as is - to avoid even more complex future conversion nightmares. I will start aiming for top speed *after* everything works properly. The commit logic can be slow for complex datasets, I am looking into it. There are some problems when changing keys in hash & ordered views, the best workaround is to not change key properties - delete and re-insert the row instead if you need this capability. This MK 2.3.4 final release candidate is very solid. Recent versions of the beta preceding it have been in use in a wide range of active projects without substantial problems. A byte-order conversion bug has recently been found and fixed, as well as double alignment on Solaris. At this point, I recommend using 2.3.4 as replacement for 2.01, except when tweaks for top performance have been done in 2.01 (2.3.4 is not yet as highly optimized, and has a slightly different performance behavior). -- Jean-Claude Wippler <jcw@equi4.com>