69 lines
3.3 KiB
Plaintext
69 lines
3.3 KiB
Plaintext
|
ICONV - Charset Conversion Library. Version 2.0
|
||
|
-----------------------------------------------
|
||
|
|
||
|
This distribution provides:
|
||
|
* the library (libiconv.a and .so) for conversion between
|
||
|
various charsets (character encoding schemes);
|
||
|
* and the command line utility (iconv), providing
|
||
|
conversion of a file, standard input or its argument
|
||
|
line from one charset to another;
|
||
|
* a set of coded character set tables (binary files) and
|
||
|
character encoding schemes (dynamically loaded modules)
|
||
|
for use by the library;
|
||
|
* a utility for creating character set tables from Unicode
|
||
|
conversion tables and RFC1345-style charset descriptions.
|
||
|
|
||
|
Syntax of the library functions (iconv_open, iconv, iconv_close)
|
||
|
and the utility is described in the man pages.
|
||
|
|
||
|
Features of the library:
|
||
|
- Coded character set (CCS) tables are binary files containing
|
||
|
pairs of tables for converting characters from some charset to
|
||
|
Unicode (UCS-2 in host byte order) and vice versa. There are 4
|
||
|
types of tables supported in iconv-2.0: for 7-bit, 8-bit, 14-bit
|
||
|
and 16-bit charsets. The library uses memory mapping (in
|
||
|
read-only mode) to access the table data.
|
||
|
- Character encoding schemes (CES) are small sets of C structures
|
||
|
and functions. The functions implement virtual methods for
|
||
|
converting a sequence of characters in some charset to a Unicode
|
||
|
character (UCS-4 in host byte order). Each encoding scheme is
|
||
|
located in a separate C file and can be compiled to a dynamically
|
||
|
loaded shared module.
|
||
|
- A universal CES for all table driven charsets is compiled into
|
||
|
the library and used for all CCS tables.
|
||
|
- Both CCS tables and CES C code can be built into the library by
|
||
|
specifying the corresponding charset name in the
|
||
|
ICONV_BUILTIN_CHARSETS make variable. By default us-ascii, utf-8
|
||
|
and ucs-4-internal are built in (plus the CES for all CCS
|
||
|
tables). All the CES modules are included to a static version of
|
||
|
the library (libiconv.a).
|
||
|
- Multiple aliases for every charset are supported. All aliases are
|
||
|
listed in the charset.aliases file(s). The library uses memory
|
||
|
mapping to parse alias information and find a canonical name
|
||
|
of a charset before looking it up in the internal list or
|
||
|
external table or shared module. Alias information can also be
|
||
|
compiled into the library (which is useful for compiled-in
|
||
|
charsets ;-)
|
||
|
- ISO/IEC 10646 conformance of the internal representation of
|
||
|
characters; conversion is done in two steps:
|
||
|
(1) a sequence of zero or more bytes from input buffer coded in
|
||
|
the source charset is converted to exactly one valid UCS-4
|
||
|
character and
|
||
|
(2) the UCS-4 character is converted to a sequence of zero or
|
||
|
more bytes in the target charset to the output buffer.
|
||
|
In the case when two charset names are found to be aliases
|
||
|
of the same charset, conversion is done via a simplified
|
||
|
converter by copying the data from the input buffer to the
|
||
|
output one.
|
||
|
- Open module API: adding new modules is easy. API has only been
|
||
|
documented via iconv.h file comments so far. A perl utility is
|
||
|
provided for conversion of Unicode charset tables
|
||
|
(http://www.unicode.org/Public/MAPPINGS/) and RFC1345-style
|
||
|
charset tables into the CCS format recognized by the library.
|
||
|
- API conformance to Unix98 specification.
|
||
|
- BSD-style copyright.
|
||
|
|
||
|
Konstantin Chuguev
|
||
|
<Konstantin.Chuguev@dante.org.uk>
|
||
|
November 2000.
|