2004-01-30 Artem B. Bityuckiy <abitytsky@softminecorp.com>

Jeff Johnston  <jjohnstn@redhat.com>

        * libc/iconv/iconv.tex: Updated with more information.
Jeff Johnston 2004-01-30 20:44:05 +00:00
parent ff41498a19
commit 2892ec6800
2 changed files with 426 additions and 3 deletions


@@ -1,3 +1,8 @@
2004-01-30 Artem B. Bityuckiy <abitytsky@softminecorp.com>
Jeff Johnston <jjohnstn@redhat.com>
* libc/iconv/iconv.tex: Updated with more information.
2004-01-30 Thomas Pfaff <tpfaff@gmx.net>
Jeff Johnston <jjohnstn@redhat.com>


@@ -1,14 +1,432 @@
@node Iconv
@chapter Character-set conversions (@file{iconv.h})
This chapter describes the Newlib iconv library.
The iconv function declarations are in
@file{iconv.h}.
@menu
* iconv:: Character set conversion routines
* iconv architecture:: Architecture of Newlib iconv library
* iconv configuration:: Newlib iconv-specific configure options
* Generating CCS tables:: How to generate CCS tables
* Adding new converter:: Steps on adding a new converter
@end menu
@page
@include iconv/iconv.def
@page
@node iconv architecture
@section iconv architecture
@findex iconv architecture
@findex encoding
@findex CCS
@findex CES
@findex iconv converter
@*
@itemize @bullet
@item
Encoding - a rule to represent computer text by means of bits and bytes.
@item
CCS (Coded Character Set) - a mapping from an abstract character set
to a set of non-negative integers (character codes).
@item
CES (Character Encoding Scheme) - a mapping from a set of character codes
to a sequence of bytes.
@end itemize
@*
Examples of CCS: ASCII, ISO-8859-x, KOI8-R, KSX-1001, GB-2312.@*
Examples of CES: UTF-8, UTF-16, EUC-JP, ISO-2022-JP.
@*
The iconv library is used to convert an array of characters in one encoding
to an array in another encoding.
@*
From a user's point of view, the iconv library is a set of converters. Each converter
corresponds to one encoding (e.g., KOI8-R converter, UTF-8 converter).
Internally the meaning of converter is different.
@*
The iconv library always performs conversions through UCS-32: i.e., to convert
from A to B, the iconv library first converts A to UCS-32, and then UCS-32 to B.
@*
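The two-step conversion can be illustrated with a minimal sketch. It is written in Python for brevity and is not Newlib code: Python's built-in codecs stand in for the converters, a list of integer code points stands in for the UCS-32 intermediate form, and the function names are hypothetical.

```python
from typing import List


def to_ucs(data: bytes, encoding: str) -> List[int]:
    """Step 1: decode bytes in the source encoding into code points."""
    return [ord(ch) for ch in data.decode(encoding)]


def from_ucs(codes: List[int], encoding: str) -> bytes:
    """Step 2: encode code points into bytes in the target encoding."""
    return "".join(chr(code) for code in codes).encode(encoding)


def convert(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Convert A -> B by going through the code-point pivot."""
    return from_ucs(to_ucs(data, from_enc), to_enc)


if __name__ == "__main__":
    koi8_text = "привет".encode("koi8-r")
    print(convert(koi8_text, "koi8-r", "utf-8").decode("utf-8"))
```

With a common pivot, supporting N encodings needs N pairs of conversion routines rather than one routine for every pair of encodings.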
Each encoding consists of CES and CCS. CCS may be represented as data tables
but CES always implies some code (algorithm). Iconv uses CCS tables
to map from some encoding to UCS-32. CCS tables are placed into
the iconv/ccs subdirectory of newlib. The iconv code also uses CES
modules which can convert some CCS to and from UCS-32. CES modules are placed
in the iconv/ces subdirectory.
@*
Some encodings have CES = CCS (e.g., KOI8-R). For such encodings iconv uses
special subroutines which perform simple table conversions (ccs_table.c).
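A simple table conversion of this kind can be sketched as follows. This is an illustration in Python, not the code in ccs_table.c: the table is a hypothetical two-entry excerpt of KOI8-R, and the function names are assumptions.

```python
from typing import Dict, Optional

# Hypothetical excerpt of an 8-bit CCS table: byte value -> code point.
# Only two KOI8-R entries are listed; a real table covers the whole code page.
KOI8_R_EXCERPT: Dict[int, int] = {
    0xC1: 0x0430,  # CYRILLIC SMALL LETTER A
    0xC2: 0x0431,  # CYRILLIC SMALL LETTER BE
}


def ccs_to_ucs(byte: int, table: Dict[int, int]) -> Optional[int]:
    """Map one byte to a code point; None means the byte is not in the table."""
    if byte < 0x80:  # the lower half of KOI8-R agrees with ASCII
        return byte
    return table.get(byte)


def ucs_to_ccs(code: int, table: Dict[int, int]) -> Optional[int]:
    """Reverse lookup: map a code point back to its single-byte code."""
    if code < 0x80:
        return code
    reverse = {ucs: byte for byte, ucs in table.items()}
    return reverse.get(code)
```

Because each character is exactly one byte, no separate CES algorithm is needed: the table lookup is the whole conversion.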
@*
Among specialized CES modules, the iconv library has
generic support for EUC and ISO-2022-family encodings (ces_euc.c and
ces_iso2022.c).
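To illustrate why one generic EUC module can serve several encodings, the sketch below splits an EUC-JP byte stream into (CCS name, character code) pairs. It is a Python illustration of the standard EUC-JP layout, not the code in ces_euc.c; the list name, the function name and the spelling of the CCS names are assumptions.

```python
from typing import List, Tuple

# CCS used by EUC-JP, indexed by EUC code set number (0 to 3).
EUC_JP_CCS = ["us_ascii", "jis_x0208_1983", "jis_x0201", "jis_x0212_1990"]


def split_euc_jp(data: bytes) -> List[Tuple[str, int]]:
    """Split EUC-JP bytes into (CCS name, character code) pairs."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # code set 0: one ASCII byte
            code_set, size = 0, 1
        elif lead == 0x8E:           # code set 2: SS2 prefix + one byte
            code_set, size = 2, 2
        elif lead == 0x8F:           # code set 3: SS3 prefix + two bytes
            code_set, size = 3, 3
        elif 0xA1 <= lead <= 0xFE:   # code set 1: two bytes
            code_set, size = 1, 2
        else:
            raise ValueError("invalid lead byte 0x%02X" % lead)
        chunk = data[i:i + size]
        if len(chunk) < size:
            raise ValueError("truncated multibyte sequence")
        if code_set == 0:
            code = lead
        elif code_set == 2:
            code = chunk[1]
        else:
            # Clear the high bits of the last two bytes to get the CCS code.
            code = ((chunk[-2] & 0x7F) << 8) | (chunk[-1] & 0x7F)
        out.append((EUC_JP_CCS[code_set], code))
        i += size
    return out
```

The byte-level framing is the same for every EUC encoding; only the list of CCS tables changes, so each resulting code can then be looked up in the table named next to it.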
@*
To enable iconv to work with CCS or CES-based encodings, the corresponding
CCS table or CES module should be linked with Newlib. The iconv support
can also load CCS tables dynamically from external files (.cct files from
the iconv/ccs/binary subdirectory). CES modules, on the other hand, can't
be dynamically loaded.
@*
Each iconv converter has one name and a set of aliases. The list of
aliases for each converter's name is in the iconv/charset.aliases file.
Note: iconv always normalizes converter names and aliases before using them.
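Name normalization and alias lookup can be sketched as follows. The exact normalization rule is not stated here, so this Python illustration assumes lower-casing and replacing hyphens with underscores; the alias table is a hypothetical excerpt, not the contents of iconv/charset.aliases.

```python
from typing import Dict, List, Optional

# Hypothetical alias table: converter name -> aliases.
ALIASES: Dict[str, List[str]] = {
    "koi8_r": ["cskoi8r"],
    "us_ascii": ["ascii", "ansi_x3.4_1968"],
}


def normalize(name: str) -> str:
    """Assumed rule: trim, lower-case, and turn '-' into '_'."""
    return name.strip().lower().replace("-", "_")


def resolve(name: str) -> Optional[str]:
    """Return the converter name for a name or alias, or None if unknown."""
    wanted = normalize(name)
    for converter, aliases in ALIASES.items():
        if wanted == converter or wanted in (normalize(a) for a in aliases):
            return converter
    return None
```

Under these assumptions, resolve("KOI8-R") and resolve("csKOI8R") both select the same converter, which is why either form can be passed by the caller.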
@page
@node iconv configuration
@section iconv configuration
@findex iconv configuration
@findex iconv converter
@*
To enable iconv, the --enable-newlib-iconv configuration option should be
used when configuring newlib.
@*
To link a specific converter (CCS table or CES module) into Newlib, the
--enable-newlib-builtin-converters option should be used. A
comma-separated list of converters can be passed with this option
(e.g., --enable-newlib-builtin-converters=koi8-r,euc-jp to link the KOI8-R
and EUC-JP converters). Either converter names or aliases may be used.
@*
If the target system has a file system accessible by Newlib, table-based
converters may be loaded dynamically from external files. The iconv
code tries to load files from the iconv_data subdirectory of the directory
specified by the NLSPATH environment variable.
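The lookup location can be sketched as follows. This Python illustration only builds the path; the function name is hypothetical, and naming the file after the normalized converter name with a .cct suffix is an assumption based on the .cct files mentioned in this chapter.

```python
import posixpath
from typing import Mapping, Optional


def cct_path(converter: str, environ: Mapping[str, str]) -> Optional[str]:
    """Build the assumed path of a loadable CCS table, or None if NLSPATH is unset."""
    base = environ.get("NLSPATH")
    if not base:
        return None  # nowhere to load from: the table must be built in
    return posixpath.join(base, "iconv_data", converter + ".cct")


if __name__ == "__main__":
    import os
    print(cct_path("koi8_r", os.environ))
```

When no path can be built, a converter has to be linked in with the configure option described above.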
@*
Since Newlib has no generic dynamic module load support, CES-based converters
can't be dynamically loaded and should be linked in.
@page
@node Generating CCS tables
@section Generating CCS tables
@*
CCS tables are placed in the ccs subdirectory of the iconv directory.
This subdirectory contains .cct and .c files. The .cct files are for
dynamic loading whereas the .c files are for static linking with Newlib.
Both .c and .cct files are generated by the 'iconv_mktbl' Perl script
from special source files (referred to here as
.txt files). The 'iconv_mktbl' script can be found in the iconv/ccs
subdirectory. Input .txt files can be found at the Unicode.org site or
other locations found on the web.
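As an illustration of what such a source file contains, the sketch below parses the layout commonly used by the Unicode.org mapping files: one entry per line, the encoding's code and the Unicode code point as hexadecimal columns, and comments introduced by '#'. It is written in Python with a hypothetical function name and does not reproduce the behaviour of 'iconv_mktbl'.

```python
from typing import Dict


def parse_mapping(text: str) -> Dict[int, int]:
    """Parse mapping text into a dict: encoding code -> Unicode code point."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # code listed without a Unicode mapping
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


SAMPLE = """\
# Sample lines in the style of a KOI8-R mapping file
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0x80\t0x2500\t# BOX DRAWINGS LIGHT HORIZONTAL
0xC1\t0x0430\t# CYRILLIC SMALL LETTER A
"""

if __name__ == "__main__":
    for code, ucs in sorted(parse_mapping(SAMPLE).items()):
        print("0x%02X -> U+%04X" % (code, ucs))
```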
@*
The .c files are linked with Newlib if the corresponding 'configure' script
option was given. This is needed to use iconv on targets without file system
support. If a CCS table isn't configured to be linked, the iconv library
tries to load it dynamically from a corresponding .cct file.
@*
The following are commands to build .c and .cct CCS table files from .txt
files for several supported encodings.
@*
@itemize
@item
cp775:@*
iconv_mktbl -Co cp775.c cp775.txt@*
iconv_mktbl -o cp775.cct cp775.txt
@end itemize
@itemize
@item
cp850:@*
iconv_mktbl -Co cp850.c cp850.txt@*
iconv_mktbl -o cp850.cct cp850.txt
@end itemize
@itemize
@item
cp852:@*
iconv_mktbl -Co cp852.c cp852.txt@*
iconv_mktbl -o cp852.cct cp852.txt
@end itemize
@itemize
@item
cp855:@*
iconv_mktbl -Co cp855.c cp855.txt@*
iconv_mktbl -o cp855.cct cp855.txt
@end itemize
@itemize
@item
cp866@*
iconv_mktbl -Co cp866.c cp866.txt@*
iconv_mktbl -o cp866.cct cp866.txt
@end itemize
@itemize
@item
iso-8859-1@*
iconv_mktbl -Co iso-8859-1.c iso-8859-1.txt@*
iconv_mktbl -o iso-8859-1.cct iso-8859-1.txt
@end itemize
@itemize
@item
iso-8859-4@*
iconv_mktbl -Co iso-8859-4.c iso-8859-4.txt@*
iconv_mktbl -o iso-8859-4.cct iso-8859-4.txt
@end itemize
@itemize
@item
iso-8859-5@*
iconv_mktbl -Co iso-8859-5.c iso-8859-5.txt@*
iconv_mktbl -o iso-8859-5.cct iso-8859-5.txt
@end itemize
@itemize
@item
iso-8859-2@*
iconv_mktbl -Co iso-8859-2.c iso-8859-2.txt@*
iconv_mktbl -o iso-8859-2.cct iso-8859-2.txt
@end itemize
@itemize
@item
iso-8859-15@*
iconv_mktbl -Co iso-8859-15.c iso-8859-15.txt@*
iconv_mktbl -o iso-8859-15.cct iso-8859-15.txt
@end itemize
@itemize
@item
big5@*
iconv_mktbl -Co big5.c big5.txt@*
iconv_mktbl -o big5.cct big5.txt
@end itemize
@itemize
@item
ksx1001@*
iconv_mktbl -Co ksx1001.c ksx1001.txt@*
iconv_mktbl -o ksx1001.cct ksx1001.txt
@end itemize
@itemize
@item
gb_2312@*
iconv_mktbl -Co gb_2312-80.c gb_2312-80.txt@*
iconv_mktbl -o gb_2312-80.cct gb_2312-80.txt
@end itemize
@itemize
@item
jis_x0201@*
iconv_mktbl -Co jis_x0201.c jis_x0201.txt@*
iconv_mktbl -o jis_x0201.cct jis_x0201.txt
@end itemize
@itemize
@item
shift_jis@*
iconv_mktbl -Co shift_jis.c shift_jis.txt@*
iconv_mktbl -o shift_jis.cct shift_jis.txt
@end itemize
@itemize
@item
jis_x0208@*
iconv_mktbl -C -c 1 -u 2 -o jis_x0208-1983.c jis_x0208-1983.txt@*
iconv_mktbl -c 1 -u 2 -o jis_x0208-1983.cct jis_x0208-1983.txt
@end itemize
@itemize
@item
jis_x0212@*
iconv_mktbl -Co jis_x0212-1990.c jis_x0212-1990.txt@*
iconv_mktbl -o jis_x0212-1990.cct jis_x0212-1990.txt
@end itemize
@itemize
@item
cns11643-plane1@*
iconv_mktbl -C -p 0x1 -o cns11643-plane1.c cns11643.txt@*
iconv_mktbl -p 0x1 -o cns11643-plane1.cct cns11643.txt
@end itemize
@itemize
@item
cns11643-plane2@*
iconv_mktbl -C -p 0x2 -o cns11643-plane2.c cns11643.txt@*
iconv_mktbl -p 0x2 -o cns11643-plane2.cct cns11643.txt
@end itemize
@itemize
@item
cns11643-plane14@*
iconv_mktbl -C -p 0xE -o cns11643-plane14.c cns11643.txt@*
iconv_mktbl -p 0xE -o cns11643-plane14.cct cns11643.txt
@end itemize
@itemize
@item
koi8-r@*
iconv_mktbl -Co koi8-r.c koi8-r.txt@*
iconv_mktbl -o koi8-r.cct koi8-r.txt
@end itemize
@itemize
@item
koi8-u@*
iconv_mktbl -Co koi8-u.c koi8-u.txt@*
iconv_mktbl -o koi8-u.cct koi8-u.txt
@end itemize
@itemize
@item
us-ascii@*
iconv_mktbl -Cao us-ascii.c iso-8859-1.txt@*
iconv_mktbl -ao us-ascii.cct iso-8859-1.txt
@end itemize
@*
Source files for CCS tables can be taken from at least two places:
@*
@enumerate
@item
http://www.unicode.org/Public/MAPPINGS/ contains a lot of encoding
map files.
@item
http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains original
iconv sources and encoding map files.
@end enumerate
@*
The following are URLs where source files for some of the CCS tables
are found:
@itemize
@item
big5:@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
@end itemize
@itemize
@item
cns11643_plane14, cns11643_plane1 and cns11643_plane2:@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/CNS11643.TXT
@end itemize
@itemize
@item
cp775, cp850, cp852, cp855, cp866:@*
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
@end itemize
@itemize
@item
gb_2312_80:@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
@end itemize
@itemize
@item
iso_8859_15, iso_8859_1, iso_8859_2, iso_8859_4, iso_8859_5:@*
http://www.unicode.org/Public/MAPPINGS/ISO8859/
@end itemize
@itemize
@item
jis_x0201, jis_x0208_1983, jis_x0212_1990, shift_jis@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT
@end itemize
@itemize
@item
koi8_r@*
http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
@end itemize
@itemize
@item
ksx1001@*
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSX1001.TXT
@end itemize
@itemize
@item
koi8-u can be obtained from the original FreeBSD iconv library distribution:@*
http://www.dante.net/staff/konstantin/FreeBSD/iconv/
@end itemize
@*
Moreover, http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains a
lot of additional CCS tables that you can use with Newlib (iso-2022 and
RFC1345 encodings).
@page
@node Adding new converter
@section Adding a new iconv converter
@*
The following steps should be taken to add a new iconv converter:
@*
@enumerate
@item
The converter's name and list of aliases should be added to
the iconv/charset.aliases file.
@item
All iconv converters are protected by a _ICONV_CONVERTER_XXX
macro, where XXX is the converter name. This protection macro should be added to
the newlib/newlib.hin file.
@item
The converter's name and aliases should also be registered in the
_iconv_builtin_aliases table in iconv/lib/bialiasesi.c. The list should be
protected by the corresponding macro mentioned above.
@item
If a new converter is just a CCS table, the corresponding .cct and .c files
should be added to the iconv/ccs/ subdirectory. The file names
should match the normalized encoding name. The 'iconv_mktbl'
Perl script (found in iconv/ccs) may
be used to generate such files. The file names should be added to the
iconv/ccs/Makefile.am and iconv/ccs/binary/Makefile.am files and then
automake should be used to regenerate the Makefile.in files.
@item
If a new converter has a CES algorithm, the appropriate file should be
added to the iconv/ces/ subdirectory. The name of the file again should be
equivalent to the normalized encoding name.
@item
If a converter is an EUC or ISO-2022-family CES, then the converter
is just an array listing the CCS it uses (see ccs/euc-jp.c for an example). This
is because iconv already has EUC and ISO-2022 support. The CCS tables it uses
should be provided in iconv/ccs/.
@item
If a converter isn't an EUC or ISO-2022-based CES, the following functions
should be provided (see utf-8.c for an example):
@itemize @minus
@item A function to convert from the new CES to UCS-32;
@item A function to convert from UCS-32 to the new CES;
@item An 'init' function;
@item A 'close' function;
@item A 'reset' function to reset the shift state for a stateful CES.
@end itemize
@*
All these functions are registered into a 'struct iconv_ces_desc' object.
The name of the object should be _iconv_ces_module_XXX, where XXX is the
name of the converter.
@item
For CES converters, the corresponding 'struct iconv_ces_desc' reference should
be added to the iconv/lib/bices.c file.
@*
For CCS converters, the corresponding table reference should be added into
the iconv/lib/biccs.c file.
@end enumerate
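The set of functions that a CES module provides (the two conversion functions plus 'init', 'close' and 'reset') can be sketched as a descriptor object. This is a Python illustration only: the class, its field names and the toy two-byte CES are hypothetical and do not reflect the actual layout of 'struct iconv_ces_desc'.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CesDesc:
    """Hypothetical stand-in for a CES descriptor (five registered functions)."""
    init: Callable[[], dict]
    close: Callable[[dict], None]
    reset: Callable[[dict], None]
    to_ucs: Callable[[dict, bytes], Optional[Tuple[int, int]]]
    from_ucs: Callable[[dict, int], Optional[bytes]]


# Toy stateless CES: every character is two bytes, big-endian.
def _init() -> dict:
    return {}


def _close(state: dict) -> None:
    state.clear()


def _reset(state: dict) -> None:
    state.clear()  # nothing to reset, the toy CES has no shift state


def _to_ucs(state: dict, data: bytes) -> Optional[Tuple[int, int]]:
    if len(data) < 2:
        return None  # incomplete input
    return (data[0] << 8) | data[1], 2  # (code point, bytes consumed)


def _from_ucs(state: dict, code: int) -> Optional[bytes]:
    if not 0 <= code <= 0xFFFF:
        return None  # not representable in two bytes
    return bytes([code >> 8, code & 0xFF])


toy_ces = CesDesc(_init, _close, _reset, _to_ucs, _from_ucs)


def decode_all(ces: CesDesc, data: bytes) -> List[int]:
    """Drive a CES through init, repeated to_ucs calls, and close."""
    state = ces.init()
    codes, pos = [], 0
    try:
        while pos < len(data):
            result = ces.to_ucs(state, data[pos:])
            if result is None:
                raise ValueError("incomplete or invalid input")
            code, used = result
            codes.append(code)
            pos += used
    finally:
        ces.close(state)
    return codes
```

Grouping the functions in one object lets the generic conversion loop work with any CES without knowing its details, which is the role the registered descriptor plays.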