This is the first beta release of an AWK-to-C Translator I'm currently
developing. It is based on GAWK 2.15.6 and supports various Unix systems,
as well as Windows NT/95 and DOS16/DOS32 on PC systems. The development
is being done on a Linux system.

Features of AWK/GAWK not yet supported (I'll probably forget some
feature(s)... do let me know if you discover a "missing feature"):

1) All the builtin functions (e.g. printf, gsub, split). Note however
   that the 'print' statement is supported.
2) String concatenations
3) The 'for XX in YY' statement. Associative arrays are otherwise
   fully supported.
4) I/O redirection within the AWK program
   (e.g. print "fjklfj" > "out.dat")
5) Variable initialization via the command line
   (e.g. awk -v foo=2.3 -f test.awk test.in)
6) User functions

(they're listed in the order of priority for support in the next release)


1. Instructions for building the AWK-to-C Translator
====================================================

1) Change to the src/ subdirectory

2) Unix systems: Run 'configure', specifying the option for your system
   (run it by itself to get a list)

   PC systems: Change to the src/pc/ subdirectory, run config.bat with
   the option for your OS/Compiler combination, and change back to the
   main src/ subdirectory.
      OS choices:       Windows NT, Win95, DOS, 32-bit DOS
      Compiler choices: MSC or Visual C++, Watcom C/C++, GCC
   (run it by itself to get a list)

   The configuration process will produce a Makefile and config.h
   tailored to your system and copy any special source files to the
   src/ directory.

3) Compiling the translator and static library: Run 'make all' (PC
   compilers like MSC use 'nmake', Watcom C/C++ uses 'wmake', etc).
   For Unix systems, the make will produce the 'awk2c' translator
   executable and the 'libawk2c.a' static library. For PC systems, the
   files will be 'awk2c.exe' and 'awk2c.lib'. You can ignore the
   warnings given by the PC compilers during the build.
   NOTE: You might want to add any compiler flags you like to the
   CFLAGS variable in the Makefile.

4) Copy 'awk2c', 'a2c', and 'a2cb' to some directory in your path
   (e.g. /usr/local/bin). (awk2c.exe, a2c.bat, and a2cb.bat for PC)

5) Copy 'libawk2c.a' or 'awk2c.lib' to your favourite library directory
   where the linker can find it (e.g. /usr/local/lib)

6) Copy 'awk.h' and 'config.h' to your favourite include directory
   where the compiler can find them (e.g. /usr/local/include)

NOTE: If you want, you can keep everything in the src directory and
compile the converted C program there too (see section 3)


2. Using the 'awk2c' translator
===============================

The translator behaves just like the standard gawk program except that
it doesn't expect any input files or stdin. So you can translate an AWK
program in 2 ways:

1) awk2c '<YOUR AWK PROGRAM GOES HERE>'    (awk2c "..." for PC)
2) awk2c -f <AWK PROGRAM FILE>

(run 'awk2c' by itself or 'awk2c --help' to get a synopsis)

The converted C program is written to standard output (the raw output
from awk2c is not indented or pretty-printed). The translator also
accepts and understands any of the gawk options which aren't specific
to the runtime and execution of the AWK program. (Note that most of the
options are severely under-tested)

You can also use the 'a2c' Unix script (or 'a2c.bat' for PC), which
takes two arguments: the AWK program filename and the destination C
filename. The a2c script passes awk2c's output to 'indent' to
pretty-print the code. For example, to translate ~/awk/foo.awk to
test.c do:

   'a2c ~/awk/foo.awk test.c'

Run 'a2c' with no arguments to get a synopsis.


3. Compiling the converted C program
====================================

The scheme currently used (at a very high level) is as follows. The
translator produces a C file with three functions (one each to run the
BEGIN block, the pattern/action blocks, and the END block) plus the AWK
user variable declarations.
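To make the scheme a bit more concrete, here is a hand-written sketch
(NOT actual awk2c output) of what a three-function layout could look
like for a tiny AWK program that counts input records. Every name in it
(begin_block, rule_block, end_block, drive) is made up for illustration;
the real generated code uses the types and helpers declared in awk.h,
and the real driver lives in driver.o.

```c
#include <assert.h>
#include <stdio.h>

/* AWK program being illustrated:
 *     BEGIN { n = 0 }
 *           { n++ }
 *     END   { print n }
 * All names below are hypothetical, not what awk2c really emits. */

double n;                           /* AWK user variable, kept numeric */

/* Runs the BEGIN block once, before any input is read. */
void begin_block(void)
{
    n = 0;
}

/* Runs the pattern/action blocks once per input record. */
void rule_block(const char *record)
{
    (void)record;                   /* this program ignores the text */
    n++;
}

/* Runs the END block once, after the input is exhausted. */
void end_block(FILE *out)
{
    fprintf(out, "%g\n", n);
}

/* Stand-in for the driver: feeds records to the three functions.
 * Simplification: a line longer than the buffer is seen as several
 * records, which a real AWK runtime would not do. */
int drive(FILE *in, FILE *out)
{
    char record[1024];

    assert(in != NULL && out != NULL);
    begin_block();
    while (fgets(record, sizeof record, in) != NULL)
        rule_block(record);
    end_block(out);
    return ferror(in) ? 1 : 0;
}
```

In this sketch a main() that handles the command line and then calls
drive(stdin, stdout) would play the part of the driver; the actual
driver may well be organised differently.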
The translated C file is compiled and linked with other objects to
produce the final executable. One of these "other" objects is
'driver.o', which contains the main() function for driving the compiled
AWK program.

There are two methods for compiling the converted C program:

1) Name your converted C program 'awk2c_cprogram.c' ('a2c_cprg.c' on PC
   systems) and compile it in the main src/ subdirectory by running
   'make driver'. This uses the Makefile, and the make will produce the
   compiled program in a file called 'driver'. (Obviously this method
   requires you to keep the various object files, header files, and
   Makefile lying around)

2) Use the 'a2cb' script (or a2cb.bat on PC systems). This script takes
   two arguments: the source C filename and the destination executable
   name. It also expects to find 'libawk2c.a' at link time, and 'awk.h'
   & 'config.h' at compile time. (run a2cb by itself to get a synopsis)

   This method is cleaner than method 1) in that it only requires you
   to keep 'awk2c'/'a2c'/'a2cb' (obvious), libawk2c.a (or awk2c.lib) in
   your favourite library directory, and awk.h & config.h in an include
   directory of your choice. Everything else can be zapped after the
   translator is built. You might want to modify the 'CC' and 'CFLAGS'
   environment variables set in the 'a2cb' script to your compiler and
   options (default is CC=gcc, CFLAGS=-O2).

   A typical Unix setup:
      'awk2c', 'a2c' and 'a2cb' go in /usr/local/bin
      'libawk2c.a' goes in /usr/local/lib
      'awk.h' and 'config.h' go in /usr/local/include

*NOTE: For some weird reason, Method 1) seems to produce slightly
faster executables on my Linux 1.2.13 system running gcc 2.6.3 (the
order in which the objects are linked seems to affect the runtime
performance)


4. Using the compiled AWK program
=================================

You use the compiled AWK program just as you would use gawk, except
that the options related to the AWK program itself no longer apply.
You can use the '--help' option to get a synopsis.
The program will run on data input from standard input or pipes. To run
it on input files, just pass the filename(s) as arguments as you would
with gawk, e.g. 'program < data.in', 'program data1.in data2.in', etc
(where 'program' is the filename of the compiled executable)


5. Runtime Performance
======================

My tests show that the speedup of translator-produced executables over
gawk ranges from minimal to more than 2000% (20+ times as fast). To be
fair, I compiled both GAWK and the converted C programs with the same
optimization options.

Generally, you can win big if your AWK program uses loops, moderate to
heavy expression processing or calculations, etc. As the complexity of
the AWK program increases, the speed improvement factor increases. One
other observation is that as the number of input records increases, the
speed improvement for the same AWK program also increases.

Generally, if your AWK program uses variables which keep a single type
throughout the program or undergo only a few number<->string
conversions, then the optimizer can detect this and emit more efficient
C code.

You can look at src/test/report/perf.rep to see the performance
numbers. (Some testcases are shown in "a" and "b" series. The former is
the testcase run on a small input set, while the latter is the same
testcase run on a larger input set).


6. Testcases
============

I've developed a suite of verification and performance testcases in the
src/test/fvt/ and src/test/performance/ subdirectories. You can look
over these AWK programs (*test.*.awk) and their C counterparts
(*test.*.c) to get an idea of how the translator works.

To make my life easier, I developed a testing tool (the 'tst' script in
the main src/ subdirectory) which can run the verification and
performance tests on a given set of AWK programs and inputs.
(The performance tests are only meaningful if you're on a Unix system
where activity is low and you have most/all of the processor cycles.)
The verification tests simply check that both gawk and the compiled AWK
program produce the same output (=> meaningless if the testcase has no
output).

To run the entire FVT suite, run 'tst v'; for performance, run 'tst p'.
(Run 'tst' w/o arguments to get a synopsis). You can also run a single
testcase or a select group of testcases, as long as each testcase
filename has a corresponding input file. For example:

If you're in the src/ directory, you can do
   'tst v test/fvt/test.[1-4].awk'
   'tst p test/performance/test.1*.awk'

If you have a testcase in ~/test/foo.awk and an input file
~/test/foo.in, you can do 'tst v ~/test/foo.awk'.


Misc
====

I hope you find use for this translator, and I'll be interested in any
comments or problem reports you have. If you have an idea for making
something better, or want to help in porting it to an unsupported
system, also let me know or post to the comp.lang.awk usenet newsgroup.

Leonard Theivendra
IBM Toronto Software lab
------------------------------------------------------------------------------
E-Mail: firstname.lastname@example.org  (<= 08/31/96)
        email@example.com  (>= 07/01/96)
Standard Disclaimer: Any opinions expressed are solely my own and not
that of IBM Corp.