2004-01-30 Artem B. Bityuckiy <abitytsky@softminecorp.com>
Jeff Johnston <jjohnstn@redhat.com> * libc/iconv/iconv.tex: Updated with more information.
This commit is contained in:
parent
ff41498a19
commit
2892ec6800
|
@ -1,3 +1,8 @@
|
|||
2004-01-30 Artem B. Bityuckiy <abitytsky@softminecorp.com>
|
||||
Jeff Johnston <jjohnstn@redhat.com>
|
||||
|
||||
* libc/iconv/iconv.tex: Updated with more information.
|
||||
|
||||
2004-01-30 Thomas Pfaff <tpfaff@gmx.net>
|
||||
Jeff Johnston <jjohnstn@redhat.com>
|
||||
|
||||
|
|
|
@ -1,14 +1,432 @@
|
|||
@node Iconv
|
||||
@chapter Character-set conversions (@file{iconv.h})
|
||||
|
||||
This chapter describes the iconv character-set conversion functions.
|
||||
The corresponding declarations are in
|
||||
This chapter describes the Newlib iconv library.
|
||||
The iconv function declarations are in
|
||||
@file{iconv.h}.
|
||||
|
||||
@menu
|
||||
* iconv:: Character set conversion routines
|
||||
* iconv:: Character set conversion routines
|
||||
* iconv architecture:: Architecture of Newlib iconv library
|
||||
* iconv configuration:: Newlib iconv-specific configure options
|
||||
* Generating CCS tables:: How to generate CCS tables
|
||||
* Adding new converter:: Steps on adding a new converter
|
||||
@end menu
|
||||
|
||||
@page
|
||||
@include iconv/iconv.def
|
||||
|
||||
@page
|
||||
@node iconv architecture
|
||||
@section iconv architecture
|
||||
@findex iconv architecture
|
||||
@findex encoding
|
||||
@findex CCS
|
||||
@findex CES
|
||||
@findex iconv converter
|
||||
@*
|
||||
@itemize @bullet
|
||||
@item
|
||||
Encoding - a rule to represent computer text by means of bits and bytes.
|
||||
@item
|
||||
CCS (Coded Character Set) - a mapping from an abstract character set
|
||||
to a set of non-negative integers (character codes).
|
||||
@item
|
||||
CES (Character Encoding Scheme) - a mapping from a set of character codes
|
||||
to a sequence of bytes.
|
||||
@end itemize
|
||||
|
||||
@*
|
||||
Examples of CCS: ASCII, ISO-8859-x, KOI8-R, KSX-1001, GB-2312.@*
|
||||
Examples of CES: UTF-8, UTF-16, EUC-JP, ISO-2022-JP.
|
||||
|
||||
@*
|
||||
The iconv library is used to convert an array of characters in one encoding
|
||||
to an array in another encoding.
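A minimal sketch of such a conversion through the standard @file{iconv.h} interface (iconv_open, iconv, iconv_close) is given here. The helper name convert_buffer is hypothetical and only for illustration, and the encoding names a caller can pass depend on which converters the linked iconv implementation provides.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert srclen bytes at src (encoded as `from`) into dst (encoded as
   `to`).  Returns the number of bytes written to dst, or (size_t)-1 if
   the converter cannot be opened or the conversion fails. */
size_t convert_buffer(const char *to, const char *from,
                      const char *src, size_t srclen,
                      char *dst, size_t dstcap)
{
    assert(src != NULL && dst != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;   /* iconv() takes a non-const input pointer */
    char *out = dst;
    size_t inleft = srclen;
    size_t outleft = dstcap;

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    if (rc != (size_t)-1) {
        /* A NULL input buffer asks iconv() to emit any pending
           shift sequence for stateful target encodings. */
        rc = iconv(cd, NULL, NULL, &out, &outleft);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return dstcap - outleft;
}
```

A caller might convert a short ISO-8859-1 buffer with convert_buffer("UTF-8", "ISO-8859-1", src, srclen, dst, sizeof dst) and treat a (size_t)-1 result as "converter unavailable or input not convertible".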
|
||||
|
||||
@*
|
||||
From a user's point of view, the iconv library is a set of converters. Each converter
|
||||
corresponds to one encoding (e.g., KOI8-R converter, UTF-8 converter).
|
||||
Internally the meaning of converter is different.
|
||||
|
||||
@*
|
||||
The iconv library always performs conversions through UCS-32: i.e., to convert
|
||||
from A to B, the iconv library first converts A to UCS-32, and then UCS-32 to B.
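The two-stage idea can be sketched in plain C as follows. This is not Newlib's code: the function names are hypothetical, ISO-8859-1 stands in for encoding A and UTF-8 for encoding B, and a uint32_t plays the role of the intermediate 32-bit character code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stage 1: A -> 32-bit code.  In ISO-8859-1 every byte value equals
   the character code it represents. */
static uint32_t latin1_to_ucs(unsigned char byte)
{
    return byte;
}

/* Stage 2: 32-bit code -> B (UTF-8).  Returns the number of bytes
   written, or 0 if the code is invalid or dst has too little room. */
static size_t ucs_to_utf8(uint32_t cp, unsigned char *dst, size_t cap)
{
    if (cp < 0x80) {
        if (cap < 1) return 0;
        dst[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        if (cap < 2) return 0;
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cap < 3 || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        if (cap < 4) return 0;
        dst[0] = (unsigned char)(0xF0 | (cp >> 18));
        dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* A -> 32-bit code -> B, one character at a time.  Returns the number
   of bytes written, or (size_t)-1 on failure. */
size_t latin1_to_utf8(const unsigned char *src, size_t srclen,
                      unsigned char *dst, size_t dstcap)
{
    size_t used = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < srclen; i++) {
        uint32_t cp = latin1_to_ucs(src[i]);
        size_t n = ucs_to_utf8(cp, dst + used, dstcap - used);
        if (n == 0)
            return (size_t)-1;
        used += n;
    }
    return used;
}
```

The benefit of the pivot is that each encoding needs only one "to" and one "from" routine, instead of a dedicated routine for every pair of encodings.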
|
||||
|
||||
@*
|
||||
Each encoding consists of CES and CCS. CCS may be represented as data tables
|
||||
but CES always implies some code (algorithm). Iconv uses CCS tables
|
||||
to map from some encoding to UCS-32. CCS tables are placed into
|
||||
the iconv/ccs subdirectory of newlib. The iconv code also uses CES
|
||||
modules which can convert some CCS to and from UCS-32. CES modules are placed
|
||||
in the iconv/ces subdirectory.
|
||||
|
||||
@*
|
||||
Some encodings have CES = CCS (e.g., KOI8-R). For such encodings iconv uses
|
||||
special subroutines which perform simple table conversions (ccs_table.c).
|
||||
|
||||
@*
|
||||
Among specialized CES modules, the iconv library has
|
||||
generic support for EUC and ISO-2022-family encodings (ces_euc.c and
|
||||
ces_iso2022.c).
|
||||
|
||||
@*
|
||||
To enable iconv to work with CCS or CES-based encodings, the corresponding
|
||||
CCS table or CES module should be linked with Newlib. The iconv support
|
||||
can also load CCS tables dynamically from external files (.cct files from
|
||||
iconv/ccs/binary subdirectory). CES modules, on the other hand, can't
|
||||
be dynamically loaded.
|
||||
|
||||
@*
|
||||
Each iconv converter has one name and a set of aliases. The list of
|
||||
aliases for each converter's name is in the iconv/charset.aliases file.
|
||||
Note: iconv always normalizes converter names and aliases before using.
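The text does not spell out the normalization rule. The sketch below assumes one plausible rule, namely lowercasing and replacing '-' with '_', which matches spellings such as koi8_r and iso_8859_15 used later in this chapter. Both the rule and the function name normalize_name are assumptions, not Newlib's actual implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed normalization: ASCII letters are lowercased and '-' becomes
   '_'.  Writes a NUL-terminated result to dst.  Returns 0 on success,
   -1 if dst (cap bytes) is too small. */
int normalize_name(const char *name, char *dst, size_t cap)
{
    size_t i;
    assert(name != NULL && dst != NULL);

    for (i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (i + 1 >= cap)
            return -1;          /* no room for this byte plus the NUL */
        dst[i] = (c == '-') ? '_' : (char)tolower(c);
    }
    if (cap == 0)
        return -1;
    dst[i] = '\0';
    return 0;
}
```

Under this assumed rule, "KOI8-R" and "koi8_r" normalize to the same string, so a name and its differently spelled aliases can be compared with a plain string comparison.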
|
||||
|
||||
@page
|
||||
@node iconv configuration
|
||||
@section iconv configuration
|
||||
@findex iconv configuration
|
||||
@findex iconv converter
|
||||
@*
|
||||
To enable iconv, the --enable-newlib-iconv configuration option should be
|
||||
used when configuring newlib.
|
||||
|
||||
@*
|
||||
To link a specific converter (CCS table or CES module) into Newlib, the
|
||||
--enable-newlib-builtin-converters option should be used. A
|
||||
comma-separated list of converters can be passed with this option
|
||||
(e.g., --enable-newlib-builtin-converters=koi8-r,euc-jp to link KOI8-R
|
||||
and EUC-JP converters). Either converter names or aliases may be used.
|
||||
|
||||
@*
|
||||
If the target system has a file system accessible by Newlib, table-based
|
||||
converters may be loaded dynamically from external files. The iconv
|
||||
code tries to load files from the iconv_data subdirectory of the directory
|
||||
specified by the NLSPATH environment variable.
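As a rough illustration of that lookup, the sketch below builds a candidate file name from NLSPATH. The function names are hypothetical, and the assumption that the file is called after the normalized converter name with a .cct suffix is inferred from the rest of this chapter, not taken from the Newlib sources.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<base>/iconv_data/<name>.cct" in dst.  Returns 0 on success,
   -1 if base is empty or dst (cap bytes) is too small. */
int cct_path_in(const char *base, const char *name, char *dst, size_t cap)
{
    int n;
    assert(base != NULL && name != NULL && dst != NULL);

    if (strlen(base) == 0)
        return -1;
    n = snprintf(dst, cap, "%s/iconv_data/%s.cct", base, name);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Same, taking the base directory from the NLSPATH environment
   variable.  Returns -1 if NLSPATH is unset. */
int cct_path(const char *name, char *dst, size_t cap)
{
    const char *base = getenv("NLSPATH");
    if (base == NULL)
        return -1;
    return cct_path_in(base, name, dst, cap);
}
```

With NLSPATH set to /usr/share/nls, a request for koi8_r would yield /usr/share/nls/iconv_data/koi8_r.cct under these assumptions.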
|
||||
|
||||
@*
|
||||
Since Newlib has no generic dynamic module load support, CES-based converters
|
||||
can't be dynamically loaded and should be linked in.
|
||||
|
||||
@page
|
||||
@node Generating CCS tables
|
||||
@section Generating CCS tables
|
||||
@*
|
||||
CCS tables are placed in the ccs subdirectory of the iconv directory.
|
||||
This subdirectory contains .cct and .c files. The .cct files are for
|
||||
dynamic loading whereas the .c files are for static linking with Newlib.
|
||||
Both .c and .cct files are generated by the 'iconv_mktbl' Perl script
|
||||
from special source files (call them
|
||||
.txt files). The 'iconv_mktbl' script can be found in the iconv/ccs
|
||||
subdirectory. Input .txt files can be found at the Unicode.org site or
|
||||
other locations found on the web.
|
||||
|
||||
@*
|
||||
The .c files are linked with Newlib if the corresponding 'configure' script
|
||||
option was given. This is needed to use iconv on targets without file system
|
||||
support. If a CCS table isn't configured to be linked, the iconv library
|
||||
tries to load it dynamically from a corresponding .cct file.
|
||||
|
||||
@*
|
||||
The following are commands to build .c and .cct CCS table files from .txt
|
||||
files for several supported encodings.
|
||||
|
||||
@*
|
||||
@itemize
|
||||
@item
|
||||
cp775:@*
|
||||
iconv_mktbl -Co cp775.c cp775.txt@*
|
||||
iconv_mktbl -o cp775.cct cp775.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cp850:@*
|
||||
iconv_mktbl -Co cp850.c cp850.txt@*
|
||||
iconv_mktbl -o cp850.cct cp850.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cp852:@*
|
||||
iconv_mktbl -Co cp852.c cp852.txt@*
|
||||
iconv_mktbl -o cp852.cct cp852.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cp855:@*
|
||||
iconv_mktbl -Co cp855.c cp855.txt@*
|
||||
iconv_mktbl -o cp855.cct cp855.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cp866@*
|
||||
iconv_mktbl -Co cp866.c cp866.txt@*
|
||||
iconv_mktbl -o cp866.cct cp866.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
iso-8859-1@*
|
||||
iconv_mktbl -Co iso-8859-1.c iso-8859-1.txt@*
|
||||
iconv_mktbl -o iso-8859-1.cct iso-8859-1.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
iso-8859-4@*
|
||||
iconv_mktbl -Co iso-8859-4.c iso-8859-4.txt@*
|
||||
iconv_mktbl -o iso-8859-4.cct iso-8859-4.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
iso-8859-5@*
|
||||
iconv_mktbl -Co iso-8859-5.c iso-8859-5.txt@*
|
||||
iconv_mktbl -o iso-8859-5.cct iso-8859-5.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
iso-8859-2@*
|
||||
iconv_mktbl -Co iso-8859-2.c iso-8859-2.txt@*
|
||||
iconv_mktbl -o iso-8859-2.cct iso-8859-2.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
iso-8859-15@*
|
||||
iconv_mktbl -Co iso-8859-15.c iso-8859-15.txt@*
|
||||
iconv_mktbl -o iso-8859-15.cct iso-8859-15.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
big5@*
|
||||
iconv_mktbl -Co big5.c big5.txt@*
|
||||
iconv_mktbl -o big5.cct big5.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
ksx1001@*
|
||||
iconv_mktbl -Co ksx1001.c ksx1001.txt@*
|
||||
iconv_mktbl -o ksx1001.cct ksx1001.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
gb_2312@*
|
||||
iconv_mktbl -Co gb_2312-80.c gb_2312-80.txt@*
|
||||
iconv_mktbl -o gb_2312-80.cct gb_2312-80.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
jis_x0201@*
|
||||
iconv_mktbl -Co jis_x0201.c jis_x0201.txt@*
|
||||
iconv_mktbl -o jis_x0201.cct jis_x0201.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
shift_jis@*
iconv_mktbl -Co shift_jis.c shift_jis.txt@*
|
||||
iconv_mktbl -o shift_jis.cct shift_jis.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
jis_x0208@*
|
||||
iconv_mktbl -C -c 1 -u 2 -o jis_x0208-1983.c jis_x0208-1983.txt@*
|
||||
iconv_mktbl -c 1 -u 2 -o jis_x0208-1983.cct jis_x0208-1983.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
jis_x0212@*
|
||||
iconv_mktbl -Co jis_x0212-1990.c jis_x0212-1990.txt@*
|
||||
iconv_mktbl -o jis_x0212-1990.cct jis_x0212-1990.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cns11643-plane1@*
|
||||
iconv_mktbl -C -p 0x1 -o cns11643-plane1.c cns11643.txt@*
|
||||
iconv_mktbl -p 0x1 -o cns11643-plane1.cct cns11643.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cns11643-plane2@*
|
||||
iconv_mktbl -C -p 0x2 -o cns11643-plane2.c cns11643.txt@*
|
||||
iconv_mktbl -p 0x2 -o cns11643-plane2.cct cns11643.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cns11643-plane14@*
|
||||
iconv_mktbl -C -p 0xE -o cns11643-plane14.c cns11643.txt@*
|
||||
iconv_mktbl -p 0xE -o cns11643-plane14.cct cns11643.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
koi8-r@*
|
||||
iconv_mktbl -Co koi8-r.c koi8-r.txt@*
|
||||
iconv_mktbl -o koi8-r.cct koi8-r.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
koi8-u@*
|
||||
iconv_mktbl -Co koi8-u.c koi8-u.txt@*
|
||||
iconv_mktbl -o koi8-u.cct koi8-u.txt
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
us-ascii@*
|
||||
iconv_mktbl -Cao us-ascii.c iso-8859-1.txt@*
|
||||
iconv_mktbl -ao us-ascii.cct iso-8859-1.txt
|
||||
@end itemize
|
||||
|
||||
@*
|
||||
Source files for CCS tables can be taken from at least two places:
|
||||
|
||||
@*
|
||||
@enumerate
|
||||
@item
|
||||
http://www.unicode.org/Public/MAPPINGS/ contains a lot of encoding
|
||||
map files.
|
||||
@item
|
||||
http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains original
|
||||
iconv sources and encoding map files.
|
||||
@end enumerate
|
||||
|
||||
@*
|
||||
The following are URLs where source files for some of the CCS tables
|
||||
are found:
|
||||
|
||||
@itemize
|
||||
@item
|
||||
big5:@*
|
||||
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cns11643_plane14, cns11643_plane1 and cns11643_plane2:@*
|
||||
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/CNS11643.TXT
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
cp775, cp850, cp852, cp855, cp866:@*
|
||||
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
gb_2312_80:@*
|
||||
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
iso_8859_15, iso_8859_1, iso_8859_2, iso_8859_4, iso_8859_5:@*
|
||||
http://www.unicode.org/Public/MAPPINGS/ISO8859/
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
jis_x0201, jis_x0208_1983, jis_x0212_1990, shift_jis@*
|
||||
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
koi8_r@*
|
||||
http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
ksx1001@*
|
||||
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSX1001.TXT
|
||||
@end itemize
|
||||
|
||||
@itemize
|
||||
@item
|
||||
koi8-u can be obtained from the original FreeBSD iconv library distribution:@*
|
||||
http://www.dante.net/staff/konstantin/FreeBSD/iconv/
|
||||
@end itemize
|
||||
|
||||
@*
|
||||
Moreover, http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains a
|
||||
lot of additional CCS tables that you can use with Newlib (iso-2022 and
|
||||
RFC1345 encodings).
|
||||
|
||||
@page
|
||||
@node Adding new converter
|
||||
@section Adding a new iconv converter
|
||||
@*
|
||||
The following steps should be taken to add a new iconv converter:
|
||||
|
||||
@*
|
||||
@enumerate
|
||||
@item
|
||||
The converter's name and list of aliases should be added to
|
||||
the iconv/charset.aliases file.
|
||||
@item
|
||||
All iconv converters are protected by a _ICONV_CONVERTER_XXX
|
||||
macro, where XXX is the converter name. This protection macro should be added to the
|
||||
newlib/newlib.hin file.
|
||||
@item
|
||||
The converter's name and aliases should also be registered in the _iconv_builtin_aliases
|
||||
table in iconv/lib/bialiasesi.c. The list should be protected by
|
||||
the corresponding macro mentioned above.
|
||||
@item
|
||||
If a new converter is just a CCS table, the corresponding .cct and .c files
|
||||
should be added to the iconv/ccs/ subdirectory. The name of the files
|
||||
should be equivalent to the normalized encoding name. The 'iconv_mktbl'
|
||||
Perl script (found in iconv/ccs) may
|
||||
be used to generate such files. The file's name should be added to
|
||||
iconv/ccs/Makefile.am and iconv/ccs/binary/Makefile.am files and then
|
||||
automake should be used to regenerate the Makefile.in files.
|
||||
@item
|
||||
If a new converter has a CES algorithm, the appropriate file should be
|
||||
added to the
|
||||
iconv/ces/ subdirectory. The name of the file again should be equivalent
|
||||
to the normalized
|
||||
encoding name.
|
||||
@item
|
||||
If a converter is EUC or ISO-2022-family CES, then the converter
|
||||
is just an array with a list of used CCS (see ccs/euc-jp.c for example). This
|
||||
is because iconv already has EUC and ISO-2022 support. Used CCS tables should
|
||||
be provided in iconv/ccs/.
|
||||
@item
|
||||
If a converter isn't an EUC or ISO-2022-based CES, the following functions
|
||||
should be provided (see utf-8.c for example):
|
||||
@enumerate @minus
|
||||
@item A function to convert from new CES to UCS-32;
|
||||
@item A function to convert from UCS-32 to new CES;
|
||||
@item An 'init' function;
|
||||
@item A 'close' function;
|
||||
@item A 'reset' function to reset shift state for stateful CES.
|
||||
@end enumerate
|
||||
|
||||
@*
|
||||
All these functions are registered into a 'struct iconv_ces_desc' object.
|
||||
The name of the object should be _iconv_ces_module_XXX, where XXX is the
|
||||
name of the converter.
|
||||
@item
|
||||
For CES converters, the corresponding 'struct iconv_ces_desc' reference should
|
||||
be added to the iconv/lib/bices.c file.
|
||||
|
||||
@*
|
||||
For CCS converters, the corresponding table reference should be added into
|
||||
the iconv/lib/biccs.c file.
|
||||
@end enumerate
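To make the shape of a CES module more concrete, here is a hedged sketch of a descriptor holding the five functions listed above. The structure example_ces_desc, its field names and all function signatures are hypothetical; the real 'struct iconv_ces_desc' in Newlib may be laid out quite differently, so utf-8.c remains the reference. A trivial stateless 7-bit ASCII module is used as the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; NOT the actual Newlib structure. */
struct example_ces_desc {
    void  *(*init)(void);          /* allocate per-conversion state     */
    void   (*close)(void *state);  /* release that state                */
    void   (*reset)(void *state);  /* return to the initial shift state */
    /* Decode one character; returns bytes consumed, 0 on error. */
    size_t (*to_ucs)(void *state, const unsigned char *src, size_t len,
                     uint32_t *cp);
    /* Encode one character; returns bytes written, 0 on error. */
    size_t (*from_ucs)(void *state, uint32_t cp, unsigned char *dst,
                       size_t cap);
};

/* ASCII is stateless, so init/close/reset have nothing to do. */
static void *ascii_init(void)         { return NULL; }
static void  ascii_close(void *state) { (void)state; }
static void  ascii_reset(void *state) { (void)state; }

static size_t ascii_to_ucs(void *state, const unsigned char *src,
                           size_t len, uint32_t *cp)
{
    (void)state;
    assert(src != NULL && cp != NULL);
    if (len < 1 || src[0] > 0x7F)
        return 0;
    *cp = src[0];
    return 1;
}

static size_t ascii_from_ucs(void *state, uint32_t cp,
                             unsigned char *dst, size_t cap)
{
    (void)state;
    assert(dst != NULL);
    if (cap < 1 || cp > 0x7F)
        return 0;
    dst[0] = (unsigned char)cp;
    return 1;
}

/* Named after the _iconv_ces_module_XXX convention described above,
   with a different prefix to mark it as an illustration only. */
const struct example_ces_desc example_ces_module_ascii = {
    ascii_init, ascii_close, ascii_reset, ascii_to_ucs, ascii_from_ucs
};
```

Grouping the functions behind one descriptor object lets a table of built-in converters refer to each module by a single symbol, which fits the registration step described for iconv/lib/bices.c.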
|
||||
|
||||
|
|