Computerized Pinyin: Caron or Breve?

I will admit that before today I had no idea what a caron or a breve was. While working on the next version of Hanzi Warrior I had the luxury of finding out. An audit of the internal wordlists showed that the third tone was sometimes written with a caron and sometimes with a breve. Unicode has no precomposed u with both diaeresis and breve, so the breve set cannot cover ü. I therefore converted all breves to carons with a little one-line sed script:

sed 's/ă/ǎ/g;s/ŏ/ǒ/g;s/ĕ/ě/g;s/ĭ/ǐ/g;s/ŭ/ǔ/g'

Here’s a handy little chart for future reference:

Caron  Unicode  Breve  Unicode
ǎ      01ce     ă      0103
ǒ      01d2     ŏ      014f
ě      011b     ĕ      0115
ǐ      01d0     ĭ      012d
ǔ      01d4     ŭ      016d
ǚ      01da     N/A    N/A
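
For anyone who would rather not use sed, here is a minimal Python sketch of the same conversion, built from the code points in the chart. The names `BREVE_TO_CARON` and `breves_to_carons` are made up for illustration, and like the sed script it only covers the lowercase vowels listed above.

```python
# Breve -> caron mapping, keyed by the code points from the chart.
BREVE_TO_CARON = {
    0x0103: 0x01CE,  # ă -> ǎ
    0x014F: 0x01D2,  # ŏ -> ǒ
    0x0115: 0x011B,  # ĕ -> ě
    0x012D: 0x01D0,  # ĭ -> ǐ
    0x016D: 0x01D4,  # ŭ -> ǔ
}


def breves_to_carons(text: str) -> str:
    """Replace each lowercase breve vowel with its caron counterpart."""
    return text.translate(BREVE_TO_CARON)


print(breves_to_carons("nĭ hăo"))  # prints "nǐ hǎo"
```

Text without breves passes through unchanged, so it should be safe to run over a whole wordlist more than once.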

I still do not really know what the difference is beyond shape: the caron is the pointy one, like a small v, while the breve is rounded. Until someone gets the Unicode Consortium to add a diaeresis/breve combo, I suppose carons will rule the day. Does anyone have any preferences? Do you secretly use breves for Pinyin?


4 thoughts on “Computerized Pinyin: Caron or Breve?”

  1. jonathan

    You can do it with the “combining breve”, but it looks awful. Unicode has made a policy of not adding any new precomposed characters; you’re supposed to compose them yourself with the “combining” accents.
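
To illustrate the combining approach from the comment above, here is a small Python sketch using the standard `unicodedata` module. The helper name `compose` is hypothetical. It builds ü with a caron and ü with a breve from combining marks, then applies NFC normalization: the caron version collapses to the single precomposed code point from the chart, while the breve version stays as two code points because no precomposed form exists.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"
COMBINING_CARON = "\u030C"
COMBINING_BREVE = "\u0306"


def compose(base: str, *marks: str) -> str:
    """Join a base letter with combining marks, then normalize to NFC."""
    return unicodedata.normalize("NFC", base + "".join(marks))


with_caron = compose("u", COMBINING_DIAERESIS, COMBINING_CARON)
with_breve = compose("u", COMBINING_DIAERESIS, COMBINING_BREVE)

print([f"U+{ord(c):04X}" for c in with_caron])  # ['U+01DA']
print([f"U+{ord(c):04X}" for c in with_breve])  # ['U+00FC', 'U+0306']
```

How the two-code-point breve form actually renders depends on the font, which is probably why it can look awful.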

