HKIUG CJK/Unicode Resources
A. About the HKIUG Unicode Task Force
The HKIUG Unicode Task Force was officially established by the
HKIUG Standing Committee
in February 2005 to maintain the CJK/Unicode resources produced by the
HKIUG Unicode Project in 2003. It is also responsible for developing
new resources such as the TSVCC tables; facilitating the searching,
display and retrieval of CJK records in library catalogs; and assisting
member libraries in migrating from EACC-based character encoding to
Unicode.
Members of the Task Force include:
Past Task Force members:
In 2003, HKIUG member
libraries were in the process of rolling out Innovative's Unicode-based
Millennium modules as well as testing the CJK and Unicode support in
the web-based Online Public Access Catalog. It became clear that a
collaborative effort among INNOPAC/Millennium users was needed to fix
the retrieval and storage problems caused by incorrect mappings among
the Unicode (UTF-8), EACC and BIG5 encodings.
In July 2003, a working group
of catalogers and systems librarians from HKIUG member libraries was
established to study the issues and develop a joint proposal for the
vendor. After two months of effort, the group completed its study,
produced an EACC/Unicode Mapping Table and submitted the proposal to the
vendor for implementation.
The EACC/Unicode Mapping
Table is a useful CJK/Unicode resource. It supplements the Library of
Congress's East
Asian Code Tables in two main respects:
- Identified multi-mapping cases and marked HKIUG's
preferred mapping
System implementers can make use of this multi-mapping information to
develop logic for handling the MANY-TO-ONE mapping problem from EACC to
Unicode.
- Included Pure CCCII characters
In addition to the EACC characters as found in LC's code tables, 7,044
Pure CCCII characters are also included in the HKIUG Code Table to
reduce the occurrence of missing characters. These are CCCII (and
non-EACC) characters that have been in use in HKIUG member libraries.
They are called Pure CCCII because their inclusion in the table does
not introduce more multi-mapping cases.
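As a rough illustration of the first point, the sketch below shows one way an implementer might use a preferred-mapping flag when several EACC codes map to the same Unicode code point. The row layout, the function name `build_tables` and the code values are assumptions made for this example; they are not taken from the HKIUG table or from any vendor implementation.

```python
from collections import defaultdict

# Hypothetical rows: (EACC code, Unicode code point, preferred flag).
# The values are placeholders for illustration, not real table entries.
ROWS = [
    ("213021", "4E00", True),
    ("2D3021", "4E00", False),
    ("213022", "4E01", True),
]


def build_tables(rows):
    """Build forward and reverse lookups from mapping rows.

    EACC -> Unicode is unambiguous because each EACC code has one target.
    Unicode -> EACC is ambiguous in multi-mapping cases, so the reverse
    lookup uses the row flagged as preferred.
    """
    eacc_to_unicode = {}
    candidates = defaultdict(list)
    preferred = {}
    for eacc, ucs, is_preferred in rows:
        eacc_to_unicode[eacc] = ucs
        candidates[ucs].append(eacc)
        if is_preferred:
            preferred[ucs] = eacc

    unicode_to_eacc = {}
    for ucs, eaccs in candidates.items():
        if ucs in preferred:
            unicode_to_eacc[ucs] = preferred[ucs]
        elif len(eaccs) == 1:
            unicode_to_eacc[ucs] = eaccs[0]
        # Ambiguous with no preferred row: left unmapped for manual review.

    # Multi-mapping cases: code points reached from more than one EACC code.
    multi = {ucs: eaccs for ucs, eaccs in candidates.items() if len(eaccs) > 1}
    return eacc_to_unicode, unicode_to_eacc, multi


eacc_to_unicode, unicode_to_eacc, multi = build_tables(ROWS)
```

With tables like these, records converted from EACC to Unicode and back again would end up on the preferred EACC code, and the `multi` dictionary lists the cases that may need special handling.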
You can download the HKIUG EACC/Unicode Mapping Table in HTML
and XML formats from
the following links:
Attempts to create TSVCC
(Traditional, Simplified, Variant Chinese Characters) links for Chinese
characters began in 2004. TSVCC linking allows retrieval systems to
implement search logic so that searching one form of a character will
also retrieve all the other forms. This linking information is
particularly essential for native Unicode database systems. Unlike
EACC-based systems, which can make use of EACC's internal structure for
linking, Unicode-based systems have to rely on an external resource in
order to implement such linking logic.
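A minimal sketch of such linking logic in a Unicode-based system follows. It assumes the linking data has already been loaded as groups of equivalent characters; the groups shown, and the function names `build_index`, `normalize` and `search`, are illustrative only and are not taken from the HKIUG TSVCC Table. Treating each group as a simple equivalence class is a simplification, since real variant relationships may be more nuanced.

```python
# Illustrative groups of traditional, simplified and variant forms.
# These are examples chosen for the sketch, not rows from the TSVCC Table.
GROUPS = [
    {"國", "国", "囯"},
    {"學", "学"},
]


def build_index(groups):
    """Map every character in a group to one stable representative."""
    index = {}
    for group in groups:
        representative = min(group)  # any deterministic choice works
        for ch in group:
            index[ch] = representative
    return index


def normalize(text, index):
    """Replace each character with its group representative, if it has one."""
    return "".join(index.get(ch, ch) for ch in text)


def search(query, records, index):
    """Return records that contain the query in any linked form."""
    key = normalize(query, index)
    return [record for record in records if key in normalize(record, index)]


index = build_index(GROUPS)
titles = ["國學概論", "数学"]
matches = search("国学", titles, index)  # the simplified query finds the traditional title
```

Normalizing both the query and the stored text to one representative per group keeps the comparison itself simple. A production system would more likely normalize at indexing time than on every search.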
There are two versions of
the HKIUG TSVCC Table, one for EACC-based systems and the other for
Unicode-based systems. You can download them from the following links:
This table
is useful for people who are interested in knowing the Pinyin and
Wade-Giles romanization as well as the radicals and stroke counts of
Chinese characters as found in Unicode's Unihan database.
E. Presentations
Please send
comments and enquiries to the Chair of the HKIUG Unicode Task Force.
Last revised on 14 November 2013