[tex-live] Locale in texlive

Zdenek Wagner zdenek.wagner at gmail.com
Mon Jan 30 15:14:27 CET 2012

2012/1/30 Norbert Preining <preining at logic.at>:
> On Mo, 30 Jan 2012, Bernhard Kleine wrote:
>> I am quite convinced that you are not correct: with Kile (latest
>> version), all input files utf8 coded plus \usepackage[utf8]{inputenc} I
>> get still iso8859-15 coded log files. This is apparently a question of
>
> Yes, that is true. (pdf)tex does not produce utf8 encoded log files
> by default.
> tex itself uses anyway the ^^ output method consistently in the log
> file, so no encoding problem (all is ascii)

That's because these characters are not marked as printable in the
xprn table.
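
As a rough illustration of the ^^ convention, here is a toy Python
model. It is not TeX's actual code; the name log_bytes and the
printable sets are made up for this sketch. A byte marked printable is
written to the log as is, any other byte is written in ^^ notation,
which is pure ASCII:

```python
# Toy model of how a byte reaches the log file (hypothetical helper names).
ASCII_PRINTABLE = set(range(32, 127))


def log_bytes(byte, printable):
    """Return the bytes written to the log for one internal byte."""
    if byte in printable:
        return bytes([byte])                  # printable: written unchanged
    if byte < 64:
        return b"^^" + bytes([byte + 64])     # e.g. TAB (9) -> ^^I
    if byte < 128:
        return b"^^" + bytes([byte - 64])     # e.g. DEL (127) -> ^^?
    return b"^^" + f"{byte:02x}".encode("ascii")  # 8-bit: two hex digits


# 0xA5 is the slot of "e with caron" in the T1 font encoding.
print(log_bytes(0xA5, ASCII_PRINTABLE))                          # b'^^a5'
print(log_bytes(0xA5, ASCII_PRINTABLE | set(range(128, 256))))   # b'\xa5'
```

With only ASCII marked printable the log stays 7-bit. Once the upper
half is marked printable, the raw 8-bit byte goes to the log and an
editor displays it in whatever 8-bit encoding it assumes.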

> For pdftex (and thus also latex etc) I guess you need to use
> either -translate-file or something else, but I am not sure about it.
>
These options switch the TCX tables. The advantage is that you do not
need inputenc, the \catcode of a letter is 11 (this may be important
in some macros), and the input character is converted directly to a
character in the font encoding, not to LICR via expansion of an active
character. In addition, the xprn table is defined so that you get
correct output in the log. However, TCX tables can only handle 8-bit
encodings, not UTF-8.

Petr Olsak wrote the encTeX extension, in which TCX tables may be
replaced with code that can handle UTF-8; TCX tables are still
supported, as is inputenc. Use of inputenc with UTF-8 has two
problems:

1. From the user's point of view the input consists of characters,
but TeX sees the individual bytes of each UTF-8 sequence. Some letters
will then have \catcode=11, some 13. Thus you cannot use
\ifnum\catcode`#1=11 to test whether you see a letter. Moreover,
\futurelet\character\dosomething may see only a part of a character.

2. The paragraph breaking algorithm works internally in the font
encoding. When an under-/overfull box is reported, TeX does not use
any hook that could be intercepted by a macro. Only the xprn table is
used for byte->byte conversion. If a byte is not found in the xprn
table, the ^^ convention is used.
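
The first problem can be illustrated with a small Python toy model. It
only mimics the situation and is not how TeX or inputenc is
implemented: toy_catcode and catcodes are invented names, and the rule
"every byte >= 0x80 is active" is a simplifying assumption about what
inputenc sets up.

```python
def toy_catcode(byte):
    """Toy catcode of one input byte: 11 = letter, 13 = active, 12 = other."""
    if 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A:
        return 11   # ASCII letters
    if byte >= 0x80:
        return 13   # assumption: inputenc makes all 8-bit bytes active
    return 12


def catcodes(text):
    """Return (hex byte, toy catcode) pairs for the UTF-8 bytes of text."""
    return [(f"{b:02x}", toy_catcode(b)) for b in text.encode("utf-8")]


# Three characters for the user, five bytes with mixed catcodes for TeX.
print(catcodes("kůň"))
# [('6b', 11), ('c5', 13), ('af', 13), ('c5', 13), ('88', 13)]

# A one-token lookahead only gets the first byte of a two-byte character.
first_byte = "ů".encode("utf-8")[:1]
print(first_byte)   # b'\xc5', not a complete UTF-8 sequence
```

A macro that tests for catcode 11 accepts "k" but rejects both bytes
of each accented letter, which is exactly the mixture described in
point 1.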

As I wrote, TCX tables are byte->byte, so in principle they cannot
handle UTF-8. encTeX can do that because the tables are extended for
multibyte sequences and the lookup algorithms are modified
accordingly. Thus you have 3 possibilities:

1. encTeX disabled (default in TL) - you cannot use it, you still have
TCX tables and inputenc

2. encTeX enabled in the format but switched off in the document - you
have a few more primitives but the extended lookup is not performed,
you still have only TCX and inputenc

3. encTeX enabled and switched on in the document - extended features
are available, both TCX tables and multibyte lookup tables can be
modified by macros, inputenc should not be used (it is easy to mess
everything up)
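
The difference between a byte->byte table and a multibyte lookup can
be sketched with a toy Python model. This is not encTeX's real data
structure or algorithm; make_byte_table and multibyte_lookup are names
invented for the sketch, and the single entry (the letter "e with
caron", 0xEC in ISO-8859-2, 0xA5 in T1) is just a sample.

```python
def make_byte_table(mapping):
    """Build a 256-entry byte->byte table, identity except for mapping."""
    table = bytearray(range(256))
    for src, dst in mapping.items():
        table[src] = dst
    return bytes(table)


def multibyte_lookup(data, table):
    """Replace the longest matching byte sequence at each position."""
    out = bytearray()
    longest = max(map(len, table), default=1)
    i = 0
    while i < len(data):
        for n in range(min(longest, len(data) - i), 0, -1):
            hit = table.get(data[i:i + n])
            if hit is not None:
                out += hit
                i += n
                break
        else:
            out.append(data[i])   # no entry: pass the byte through
            i += 1
    return bytes(out)


tcx_like = make_byte_table({0xEC: 0xA5})

# An 8-bit input encoding works: one byte in, one byte out.
print("ě".encode("iso-8859-2").translate(tcx_like))   # b'\xa5'

# UTF-8 input is two bytes; a byte->byte table cannot combine them.
print("ě".encode("utf-8").translate(tcx_like))        # b'\xc4\x9b'

# A table keyed by byte sequences can.
print(multibyte_lookup("mě".encode("utf-8"), {b"\xc4\x9b": b"\xa5"}))  # b'm\xa5'
```

In this picture, possibilities 1 and 2 only ever use the first kind of
table, and possibility 3 adds the second.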

Although encTeX offers nice features, it can sometimes be tricky and
problems may arise. I think that lualatex or xelatex are now the
better choice (although I still use encTeX every day and can explain
to you how to write fmtutil-local.cnf).

> Other than that, as Markus wrote, these are all questions related to
> TeX in general and will apply to every distribution of TeX and related
> programs.
>
> Best wishes
>
> Norbert

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz