[tex-live] TL2004: Technical problems and testing

Olaf Weber olaf at infovore.xs4all.nl
Fri May 7 18:50:00 CEST 2004

Karl Berry writes:

>>> - 8-bit-troubles (tcx): These have caused severe problems in the
>>>   past, but are supposed to be solved.

>> We should help Olaf to find the right solution. The last statement from
>> him to me was that the automatic loading of cp8bit.tcx does not work
>> any more (due to encTeX) and that he rather desperately needs feedback
>> on what to do instead.

> Olaf, where does the code stand at this point?

Basically, the automatic loading of cp8bit.tcx (a horrible kludge at
best) interferes with loading the xord/xchr/xprn arrays for encTeX.
The way it works, when loading a tcx you always lose whatever
xord/xchr/xprn is in the format file.  So if cp8bit.tcx is loaded by
default, then encTeX breaks.

Given the amount of feedback I've had, I've mainly gone on my sense of
what would be useful, usable, and (I think) comprehensible.

- Currently, the code distinguishes between writing to the terminal
  and log on the one hand, and other writes (\write) on the other.

- For writes to the terminal and log, the isprint(3) function is used,
  which is locale-dependent.

- For \write, the setup is as follows:

  - By default, printable means printable ASCII, codes [32..126].
    This is Knuth's original definition.

  - Giving the -8bit option changes the default to make all characters
    printable.

  - In a TCX file you can specify a third value, to explicitly set
    a character as printable or non-printable.  This is the way to
    make characters non-printable if -8bit is used as well.

  - The xord/xchr/xprn arrays are always saved in the format, for all
    engines except those in the Omega family.  Thus, a TCX or -8bit
    given in INI mode sets the defaults for that format.

  - If a TCX is specified, it overrules whatever is in the format.

  - If -8bit is specified, it overrules whatever is in the format.

  - The \xchr, \xord, \xprn primitives are only available in encTeX.
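A rough sketch of the \write rules listed above, in Python rather than
in web2c's own code.  The function names are made up for illustration,
printable ASCII is taken to be " " through "~", and the way a TCX
"overrules" the format (rebuild the defaults, then apply the TCX's
explicit entries) is my assumption, not something stated above.

```python
# Hypothetical model of the xprn ("is printable") table; not web2c code.

def default_xprn(eight_bit=False):
    """Return a 256-entry table, True where a character is printable."""
    if eight_bit:
        # -8bit: every character is printable by default.
        return [True] * 256
    # Default: printable ASCII only, " " (32) through "~" (126).
    return [32 <= code <= 126 for code in range(256)]


def effective_xprn(format_xprn, tcx=None, eight_bit=False):
    """Pick the table in effect for one run.

    format_xprn: table saved in the format file.
    tcx: dict mapping character code -> bool (the explicit third value
         of a TCX entry), or None if no TCX was specified.
    """
    if tcx is None and not eight_bit:
        # Nothing on the command line: the format's table stands.
        return list(format_xprn)
    # A TCX or -8bit overrules the format (assumed: start from defaults).
    table = default_xprn(eight_bit)
    for code, printable in (tcx or {}).items():
        table[code] = printable  # explicit TCX value wins over -8bit
    return table
```

For example, effective_xprn(fmt, tcx={200: False}, eight_bit=True)
leaves every character printable except code 200, which is the
"non-printable with -8bit" case mentioned above.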

Another issue that has come up is the extension of format files.  At
present a dump begins with a magic number ("W2TX", "W2MF", or "W2MP")
followed by the name of the engine as a <size, name> pair, with the
name terminated by one or more NUL chars to get a size rounded to a
multiple of four (as we turn out to write dumps in multiples of four
bytes).

With this, I can produce a comprehensible error message if the wrong
engine tries to load a dump, e.g.:

	infovore:/home/olaf/web2c/src/texk/texk/web2c$ ./pdfetex \&latex
	This is pdfeTeXk, Version 3.141592-1.20a-rc1-2.1 (Web2C 7.5.3)
	 %&-line parsing enabled.
	---! latex.efmt was written by etex
	(Fatal format file error; I'm stymied)
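
To make the header layout described above concrete, here is a minimal
Python sketch of writing and checking it.  The layout details that are
not given above are assumptions on my part: the magic is taken as four
ASCII bytes and the size as a 4-byte big-endian integer.  The function
names are hypothetical and this is not the web2c code.

```python
import struct

MAGIC_TEX = b"W2TX"  # "W2MF" and "W2MP" would mark METAFONT/MetaPost dumps


def pack_header(engine, magic=MAGIC_TEX):
    """Build magic + <size, name>, with the name NUL-padded to 4n bytes."""
    name = engine.encode("ascii") + b"\0"   # at least one terminating NUL
    name += b"\0" * (-len(name) % 4)        # round size up to a multiple of 4
    return magic + struct.pack(">I", len(name)) + name


def check_header(data, engine, magic=MAGIC_TEX):
    """Return the offset just past the header, or raise on a mismatch."""
    if data[:4] != magic:
        raise ValueError("not a format dump")
    (size,) = struct.unpack(">I", data[4:8])
    writer = data[8:8 + size].rstrip(b"\0").decode("ascii")
    if writer != engine:
        # Enough information for a message like the one shown above.
        raise ValueError(f"format was written by {writer}")
    return 8 + size
```

With these assumptions, a dump written by "etex" starts with a 16-byte
header, and check_header(pack_header("etex"), "pdfetex") fails with
"format was written by etex".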

What I would like to do is use only three extensions: .fmt for
formats, .base for bases, and .mem for mems.  If we expect that
multiple engines will want to use the same format name, we can use a
construct like this
	WEB2CDIR = .;{$TEXMF}/{$engine,}//
to get things sorted.  I've come to the conclusion that this kind of
occurrence ought to be rare, especially on systems using fmtutil, so
even that may be overkill.
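
For readers unfamiliar with the {$engine,} idiom, the following Python
sketch shows what the path element above expands to.  It is a
simplified stand-in for Kpathsea's expansion: it substitutes variables
first, handles only one level of braces, and the values given for
TEXMF and engine are made-up examples.

```python
import itertools
import re


def expand_braces(element, variables):
    """Expand $name references, then one level of {a,b} alternatives."""
    # Substitute $name with its value (hypothetical variable table).
    text = re.sub(r"\$(\w+)", lambda m: variables[m.group(1)], element)
    # Split into literal text and brace bodies; odd indexes are bodies.
    parts = re.split(r"\{([^{}]*)\}", text)
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    # One result per combination of alternatives, in left-to-right order.
    return ["".join(combo) for combo in itertools.product(*choices)]


paths = expand_braces(
    "{$TEXMF}/{$engine,}//",
    {"TEXMF": "/usr/share/texmf", "engine": "pdfetex"},
)
for p in paths:
    print(p)
```

The empty alternative after the comma is what makes the shared
directory a fallback: the engine-specific subdirectory is listed
first, then the tree without it.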

Having a single extension for formats would clean up some things, but
I do not feel strongly about it.  What I do feel strongly about is
that we should either use a single extension, or have each engine
use its own extension; at present we're doing neither as pdftex uses
.fmt and pdfetex uses .efmt.

Olaf Weber

               (This space left blank for technical reasons.)
