Chinese in Mac OS X 10.5 Leopard

What's New

  • The two primary Traditional Chinese input methods, Pinyin and Zhuyin, are revised. They now support direct, tones-optional word/phrase input.
  • A new input-method framework allows for more consistency between the primary Chinese input methods, both in terms of input key sequences and the storage of user data.
  • Plug-in input methods are easier to install and more functional than ever before.
  • English-language Help is now built into the Chinese input methods.
  • Underlying support for Windows TrueType fonts and collections is significantly improved. Standard fonts that set off alarms in Panther and Tiger do not do so in Leopard.

Installation

Under the Language tab in System Preferences... International, you will find a list of languages supported by OS X 10.5:

Language tab

The language at the top of the list is used by the Finder. Adjustments to this list affect the default font behavior in applications that use Apple's built-in text engine, like Mail, Safari, and Pages. Unless you are running the system (i.e., the Finder) in an East Asian language, we recommend the following order: Simplified Chinese (简体中文), Traditional Chinese (繁體中文), Japanese (日本語), Korean Hangul (한글).

The "Order for sorted lists" pop-up menu has seven choices that affect Chinese:

  1. Standard ~ Unicode order. Sorts by Unicode block, then radical-stroke.
  2. Standard (unihan) ~ Sorts all three Unicode CJK Unified Ideographs (CJKUI) blocks together by radical-stroke.
  3. Chinese ~ Sorts by pronunciation, using Hanyu Pinyin.
  4. Chinese (Pinyin Order) ~ Same as above.
  5. Chinese (Simplified Chinese (GB2312)) ~ Sorts by GB code.
  6. Chinese (Stroke Order) ~ Sorts GB 18030 by total number of strokes, then radical.
  7. Chinese (Traditional Chinese (Big5)) ~ Sorts by Big Five code.

Note: Unless otherwise noted, these only sort the original CJKUI block. Extensions A and then B follow, each sorted by radical-stroke.
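To get a feel for how much these orders differ, here is a minimal Python sketch that sorts the same three characters by Unicode code point, by GB 2312 code, and by Big Five code. It uses Python's own codecs as a stand-in for the system's sort tables, so it only approximates choices 1, 5, and 7; the function names are ours, for illustration only.

```python
# Sketch: three ways to order the same characters.
# Python's codecs stand in for the system's sort tables (an assumption).
chars = ["中", "文", "字"]


def by_unicode(ch):
    # Code-point order within the main CJK block.
    return ord(ch)


def by_gb2312(ch):
    # Order by the character's GB 2312 byte value.
    return ch.encode("gb2312")


def by_big5(ch):
    # Order by the character's Big Five byte value.
    return ch.encode("big5")


unicode_order = sorted(chars, key=by_unicode)
gb_order = sorted(chars, key=by_gb2312)
big5_order = sorted(chars, key=by_big5)

for label, order in [("Unicode", unicode_order),
                     ("GB 2312", gb_order),
                     ("Big Five", big5_order)]:
    print(label, "".join(order))
```

The same list comes out in a different sequence depending on which code is used as the sort key, which is why the choice in this pop-up menu matters for file lists and address books.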

To see if an application can be localized for Chinese (i.e., run with menus and dialogs in Chinese), select its icon in the Finder and choose Get Info from the File menu. See the Languages section of the window that opens. Apple uses "zh_CN" for Simplified Chinese and "zh_TW" for Traditional Chinese. To localize an application for Chinese, simply uncheck all languages listed above Chinese in the Language tab (see above).
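The same check can be sketched in Python. Localizations in an application bundle normally live as ".lproj" folders inside Contents/Resources, so listing those folders shows roughly what Get Info shows. The function names and the TextEdit path below are only illustrative assumptions.

```python
from pathlib import Path


def bundle_localizations(app_path):
    """Return the names of the .lproj folders inside an application bundle."""
    resources = Path(app_path) / "Contents" / "Resources"
    if not resources.is_dir():
        return []
    return sorted(p.stem for p in resources.glob("*.lproj") if p.is_dir())


def supports_chinese(app_path):
    """Report whether zh_CN and zh_TW localizations are present."""
    names = bundle_localizations(app_path)
    return {code: code in names for code in ("zh_CN", "zh_TW")}


# Example path only; substitute any application on your own system.
print(supports_chinese("/Applications/TextEdit.app"))
```

On a machine without that application (or without OS X) the sketch simply reports that neither localization was found.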

Fonts

Chinese-capable fonts in Leopard:

  • Five standard GB 18030 fonts:
    • In the /System/Library/Fonts folder: 华文黑体 ST Hei Regular (STHeiti) and 华文细黑 ST Hei Light (STXihei).
      • STXihei includes 4,235 additional characters for Vietnamese Hán-Nôm.
    • In the /Library/Fonts folder: 华文楷体 ST Kai Regular (STKaiti), 华文宋体 ST Song Regular (STSong), and 华文仿宋 ST Fangsong Regular (STFangsong).
  • Two GB 2312 fonts:
    • In the /Library/Fonts folder: Hei and Kai.
  • Two Big-5E plus HKSCS 2001 fonts:
    • In the /System/Library/Fonts folder: LiHei 儷黑 Pro.
    • In the /Library/Fonts folder: LiSong 儷宋 Pro.
    • These contain a selection of 17,607 characters from the Unicode CJK Unified Ideographs block, 512 from Extension A, and 1,640 from Extension B.
  • Three standard Big Five fonts:
    • In the /Library/Fonts folder: Apple LiGothic, Apple LiSung, and BiauKai.
  • Arial Unicode MS is installed in the /Library/Fonts folder.
  • A GB 18030 bitmap font is installed in the /Library/Fonts folder.
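The character counts above are given per Unicode block. As a rough illustration of what those blocks mean, the following Python sketch classifies characters by the published code-point ranges of the CJK Unified Ideographs block and Extensions A and B. It does not inspect any font; the function names are hypothetical.

```python
from collections import Counter

# Published Unicode block ranges (first and last code point, inclusive).
CJK_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
]


def cjk_block(ch):
    """Return the name of the CJK block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def count_by_block(text):
    """Count how many characters of text fall in each CJK block."""
    counts = Counter()
    for ch in text:
        block = cjk_block(ch)
        if block is not None:
            counts[block] += 1
    return dict(counts)


sample = "中文" + chr(0x3400) + chr(0x20000) + "A"
print(count_by_block(sample))
```

Run over a text file of characters, a tally like this gives a quick sense of whether a document needs a font with Extension A or B coverage.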

Getting Started

Input Menu

Under the Input Menu tab in System Preferences... International, you will find check boxes that activate the components of the Chinese input methods and cause them to appear in the Input menu:

Input Menu tab

Make sure that the "Show input menu in menu bar" box is also checked. You can also check the Character Palette box to make it appear, and so on.

"Keyboard Shortcuts..." leads to the Keyboard Shortcuts tab in System Preferences... Keyboard & Mouse, where you will find two keyboard shortcuts listed under the "Input Menu" heading. To enable them, you'll also need to disable those under the Spotlight heading:

  • Command-space [⌘Space] ~ Selects the previous input source. Toggles back and forth between the last two input sources selected in the Input menu.
  • Option-command-space [⌥⌘Space] ~ Selects the next input source. Cycles through the keyboards and input methods in the Input menu.

The Chinese input methods and plug-ins you choose will appear right away in the Input menu itself, which appears on the right side of the Menu bar:

Input menu

To activate a keyboard or input method, choose it from the menu. Its icon will appear in the Menu bar and it will have a check mark beside it in the menu. In the above example, the U.S. keyboard is followed by two Japanese input modes and then ITABC, the built-in Simplified Chinese Pinyin input method. Two Traditional Chinese input methods, Zhuyin and Pinyin, are next, and then QIM (the most advanced Pinyin input method available for Mac OS X). The last item in the first section is Biaoyin, a CIN plug-in input method for typing Chinese romanizations.

Input Methods

Help

In OS X 10.5, the built-in Chinese input methods include a full set of English-language help instructions. To access this Help, select an input mode in the Input menu. Its extended menu will appear, with Help at the bottom.

In ITABC, for example, it will look like this:

SCIM menu

Note: Apple's Help for the Chinese input methods is very good. What follows below is a general description, not designed to be especially helpful.

Key Sequences

After you have typed an appropriate input string for the input mode you are using, you can:

  • Use space to invoke the standard input mode. [ITABC, Pinyin, Zhuyin]
  • Use shift-space to invoke Structural Pinyin. [ITABC]

The return key always inputs whatever is displayed inline: either the input string or the selection in the Candidate window. Use caps lock to use your keyboard normally (i.e., to type in English or whatever) from within the Chinese input methods.

Keyboard shortcuts are listed here.

Simplified Chinese

Apple's primary simplified-Chinese input mode, ITABC, uses Pinyin with "stroke shape" [笔形] numbers (optional) instead of tone numbers for direct word/phrase input of the GB 2312 character set.

Stroke-shape input uses the number keys (1-8) to indicate the shapes of the strokes that make up a character:

Shapes

The chart above is not comprehensive, but it should give you a good sense of how this works. Shape numbers 3, 4, 5, and 6 each cover a set of related forms. Shape numbers 7 and 8 are actually combinations of strokes, and they take precedence over the individual strokes (thus, 7, not 1-2, and 8, not 2-5-1). For example, 苹 is "ping72": typing "ping7" yields four choices, and "ping72" narrows it down to two.

Stroke-shape numbers are listed in the "Information window" that you can turn on in the ITABC Preferences (only the first two digits are used with Pinyin):

Information window

Traditional Chinese

Apple's traditional-Chinese input modes Pinyin and Zhuyin support direct, tones-optional word/phrase input of the Big-5 character set. These are very different from the former Apple input modes with the same names and icons.

Note: Hanin and its key sequences remain the same as in Tiger.

Structural Pinyin

In addition to its usual Pinyin input mode, ITABC also provides access to the Jiegou Pinyin [结构拼音, "Structural Pinyin"] input mode, which covers the entire GB 18030-2000 character set. Standard Pinyin readings are used for the graphic and/or phonetic components of the structure of the character, usually left-right, top-bottom, inner-outer. These are listed in the Chaibai [拆白, "Components"] category in the Simplified Chinese section of the Character Palette and the ITABC Information window (see above).

The purpose of this is to allow you to use Pinyin to input obscure characters that you don't otherwise know how to pronounce. For example, you probably don't know the pronunciation of 龘 (dá), but with a basic reading knowledge of Chinese you can see that it is composed of three dragons [龍 lóng], and thus you know its Structural Pinyin reading is long-long-long.

Framework

The SCIM and TCIM components are located in the /System/Library/Input Methods folder. These do not contain user data.

The user data for most of the Chinese input methods (i.e., everything except Hanin) is stored in the home (/Users/) ~/Library/Dictionaries folder. You can empty your phrase and/or frequency data by trashing the "...NewPhraseDictionary" and/or "...DynFreqDictionary" files for either input method. Just drag them to the Trash and log out/in.
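Before trashing anything, it can help to see which files are involved. This Python sketch only lists candidates; it deletes nothing. It assumes the files can be recognized by the two name fragments quoted above, and the function name is ours.

```python
from pathlib import Path

# Name fragments taken from the description above; full names are not assumed.
MARKERS = ("NewPhraseDictionary", "DynFreqDictionary")


def find_user_data(dictionaries_dir=None):
    """List input-method user data files without modifying them."""
    if dictionaries_dir is None:
        folder = Path.home() / "Library" / "Dictionaries"
    else:
        folder = Path(dictionaries_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir()
                  if any(marker in p.name for marker in MARKERS))


for path in find_user_data():
    print(path)
```

If the list looks right, drag those files to the Trash by hand and log out/in as described.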

Plug-ins

Plug-in input methods are easier to install and more functional than ever before. You simply create a plain-text source file, change the file extension to either ".inputplugin" (for the Apple format) or ".cin" (a common open-source format), and then place it in the /Library/Input Methods folder or your Home ~/Library/Input Methods folder.
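As a rough illustration of what such a source file can look like, the Python sketch below builds the text of a tiny ".cin" table. The directives used (%ename, %cname, %selkey, %keyname, %chardef) follow the layout commonly seen in open-source .cin tables; that layout, the two sample entries, and the name "Demo" are our assumptions, so check the input method Help for what Leopard actually expects.

```python
def build_cin(ename, cname, table):
    """Build the text of a minimal .cin table.

    table is a list of (key sequence, output text) pairs.
    The directive layout is an assumption based on common .cin files.
    """
    # Every key used in the table gets a %keyname entry.
    keys = sorted({key for code, _ in table for key in code})
    lines = [
        f"%ename {ename}",
        f"%cname {cname}",
        "%selkey 1234567890",
        "%keyname begin",
    ]
    lines += [f"{key} {key}" for key in keys]
    lines.append("%keyname end")
    lines.append("%chardef begin")
    lines += [f"{code} {text}" for code, text in table]
    lines.append("%chardef end")
    return "\n".join(lines) + "\n"


# Hypothetical two-entry table, for illustration only.
cin_text = build_cin("Demo", "示範", [("zhong", "中"), ("wen", "文")])
print(cin_text)
```

Saving the printed text as plain text (UTF-8 is a reasonable assumption) with a ".cin" extension gives you a file to experiment with in your Home ~/Library/Input Methods folder.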

Character Palette

In Cocoa applications, the Character Palette is always accessible via Edit > Special Characters... There are multiple ways to view Chinese characters. To input characters into text in an application, just double-click on the character you want, or use the "Insert" button:

  • Simplified Chinese displays the GB 18030 character set. Use the "by Radical" tab (shown below, includes both Simplified and Traditional characters) to look for characters. If you highlight a character and then pause the mouse over it, a panel will appear, giving the UTF-16, UTF-8, and GB code points:

Character Palette SC

  • Traditional Chinese displays the Big-5 character set. Use the "by Radical" tab (shown below) to look for characters. If you highlight an individual character and then pause the mouse over it, a panel will appear, giving the UTF-16, UTF-8, and Big-5E and/or HKSCS-2001 code points:

Character Palette TC

  • All Characters displays all of the characters defined in Unicode. Chinese characters are found in the "by Radical" tab.
  • Code Tables displays Chinese characters in both the "Unicode" tab and the "Other Encodings" tab. Other Encodings provides tables of four Chinese encodings: Big-5E, HKSCS-2001, GB2312, and GB18030.
  • Glyph displays the complete contents of the selected font.
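The code points shown in those pop-up panels can be approximated outside the palette. This Python sketch prints the UTF-16, UTF-8, GB 18030, and Big Five byte values for a character using Python's built-in codecs. Those codecs may not match the exact Big-5E and HKSCS-2001 tables the palette uses, and the function name is hypothetical.

```python
def code_points(ch):
    """Return hex code values for one character in several encodings."""

    def hex_bytes(encoding):
        # None means the character is not in that character set.
        try:
            return ch.encode(encoding).hex().upper()
        except UnicodeEncodeError:
            return None

    return {
        "Unicode": f"U+{ord(ch):04X}",
        "UTF-16": hex_bytes("utf-16-be"),
        "UTF-8": hex_bytes("utf-8"),
        "GB 18030": hex_bytes("gb18030"),
        "Big5": hex_bytes("big5"),
    }


for name, value in code_points("中").items():
    print(name, value)
```

A value of None for an encoding means the character is outside that character set, which is a quick way to spot characters that will not survive a round trip through an older encoding.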

In the Character Info section (shown above), you will find a list of characters related to the selected character, along with input key sequences for the Apple input methods. You can drag/copy any character from an application and drop/paste it into the Character Info section to get information about that character.

In the Font Variation section (shown above), you can see all available glyphs for the selected character in the different fonts on your system. In addition, you can choose between "glyph variants" for a single Unicode character. Currently, the only fonts that contain glyph variants are Japanese: the Hiragino fonts and Adobe's Kozuka Pro fonts. Try U+9957, for example. Not all applications support glyph variants.

The widget in the bottom left of the palette provides access to Font Book via "Manage Fonts..."

Widget

If you select a character in a Cocoa application like TextEdit or Pages and then choose "Show Character Selected in Application" in the Character Palette, it will jump to that character.

Last, but not least, there is the search box at the bottom right of the palette. Here you can search for Chinese characters using their Hanyu Pinyin readings in three categories: Simplified Chinese (the GB 18030 character set) Pinyin + tone number, Traditional Chinese Pinyin + tone number, and Structural Pinyin "chaibai" readings. Double-click on a character in the list of search results to bring it up in the Character Palette.

Search

You can also search for Zhuyin readings, Japanese readings, Korean readings, Unicode character names, code points, and so on.

Applications

Mail 3

Mail automatically sets the encoding of outgoing messages based on content. If your system is set to run in English (in the Language tab of System Preferences... International), or anything other than Chinese or Japanese, the default encoding for outgoing Chinese messages is UTF-8. When the system language is set to Traditional Chinese, the default is Big Five. For Simplified Chinese it is GB 2312. For Japanese it is ISO-2022-JP.

You can manually set the encoding of an outgoing message (and subject) in Message > Text Encoding. For example, "Simplified Chinese (EUC)" sets the charset name to GB2312.
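The defaults described above amount to a small lookup table. The Python sketch below models them; it is not Mail's actual logic. In particular, the fallback to UTF-8 when the text does not fit the preferred encoding is our assumption, and the function name and language codes are illustrative.

```python
# Defaults as described above; any other system language maps to UTF-8.
DEFAULT_CHARSET = {
    "zh_TW": "big5",
    "zh_CN": "gb2312",
    "ja": "iso-2022-jp",
}


def pick_charset(text, system_language):
    """Choose a charset name for an outgoing message.

    Assumption: if the text cannot be encoded in the preferred
    charset, fall back to UTF-8.
    """
    preferred = DEFAULT_CHARSET.get(system_language, "utf-8")
    try:
        text.encode(preferred)
    except UnicodeEncodeError:
        return "utf-8"
    return preferred


for language in ("en", "zh_CN", "zh_TW", "ja"):
    print(language, pick_charset("中文", language))
```

A check like this is also handy before sending: if the text cannot be encoded in the charset your recipient expects, you know in advance that it needs UTF-8.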

TextEdit 1.5

You can customize the pop-up menu for encodings in Preferences. At the bottom of the menu is a "Customize Encodings List..." command, which brings you to a checklist of all supported encodings.

Font Book 2.1

One thing that no font manager, including Font Book, provides is detailed information about Chinese character-set coverage in a given font. For example, they don't tell you what version of Hong Kong SCS is supported. Toward that end, we provide text files containing selected Chinese character sets (hanzi only):

  • Big-5E (1998), listed by Big-5 block: [Download]
  • Hong Kong SCS 1999, 2001, 2004, 2008: [Download]
  • Unicode CJK Unified Ideographs Extension A (1999), plus six Extension B (2001) characters required for GB 18030 compliance: [Download]
  • Unicode CJK Strokes 2005, 2008: [Download]
  • Unicode CJK Unified Ideographs Extensions C (2009) and D (2010): [Download]

Just copy the text and paste it into the Preview > Custom window in Font Book.
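If you want a similar test text for a contiguous Unicode range, one can be generated with a few lines of Python. The sketch below is not how the files above were produced. It builds the 6,582 characters of Extension A as first published (U+3400 through U+4DB5), and the function name and line width are arbitrary choices of ours.

```python
# Extension A as first published: U+3400 through U+4DB5.
EXT_A_FIRST = 0x3400
EXT_A_LAST = 0x4DB5


def block_text(first, last, width=40):
    """Return every character from first to last, wrapped into lines."""
    chars = "".join(chr(cp) for cp in range(first, last + 1))
    return "\n".join(chars[i:i + width] for i in range(0, len(chars), width))


ext_a_text = block_text(EXT_A_FIRST, EXT_A_LAST)
print(len(ext_a_text.replace("\n", "")), "characters")
```

Paste the resulting text into the Custom preview to see which of those characters a font actually draws.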

Office 2008

Microsoft Office 2008 is Unicode-savvy for the Basic Multilingual Plane (BMP) only. Two Chinese fonts are installed during the standard installation: SimSun.ttf (v2.92) and PMingLiU.ttf (v4.55). They may appear in the Font menu as 宋体 and 新細明體.
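To check whether a given text would run into the BMP limitation, you can look for code points above U+FFFF. A minimal Python sketch (the function name is hypothetical):

```python
def non_bmp_characters(text):
    """Return the characters in text that lie outside the BMP."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# U+20000 is the first character of CJK Extension B, outside the BMP.
sample = "中文" + chr(0x20000)
for ch in non_bmp_characters(sample):
    print(f"U+{ord(ch):04X}")
```

An empty result means the text stays within the BMP. Anything listed is in a supplementary plane, such as CJK Extension B.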

To activate the advanced East Asian features in Office 2008 applications, you must use the Microsoft Language Register (in the Additional Tools folder). Choose Japanese in the pop-up menu that appears. Features available in the Format menu in Word 2008 include phonetic guides (ruby/furigana text), combined characters, and enclosed characters. Support for changes in text direction (i.e., vertical text) is available in both the Format menu and the Formatting Palette. Chinese can be used for numbered lists, page numbers, footnote/endnote numbers, and so on. These features are designed for Japanese, but they also work well for Chinese and Korean.

If you need to handle documents created by any version of Office for Windows on a regular basis in Mac OS X, Office 2008 is a good solution (its limitation to the BMP notwithstanding). It can read files created by any version of Word for Windows, including the localized Chinese versions of Windows 95 and above. It also includes a "Compatibility Report" feature designed to address the problem of moving documents to Windows.