Even Better Java i18n Pluralisation using ICU4J
The previous post showed how Java provides functionality for internationalisation, but this post exposes the limitations of pluralisation within Java's functionality and demonstrates how ICU4J solves this for all locales!
Last week I wrote about Java's built-in ChoiceFormat class and the support it provides for pluralisation. It is a very useful class, but as pointed out by two commenters (btw... thanks for the feedback!) it doesn't cater well for all languages - particularly those that have more complex rules. This led me to investigate further, as I was certain there would be something useful out there - after all, internationalisation is a very common requirement of a large number of applications. After a little digging, I found that the one library that stands out is ICU4J. So who uses it? Well... pretty much everyone!
So for those that have more complex internationalisation requirements, this is an excellent library to use! I generally find that the best way to find out how something works is to see an example, so I've used the pluralisation example provided in the comments of my previous post to demonstrate ICU4J. I chose this example for a few reasons: firstly, because someone took the time to ask a question and I want to answer it; secondly, because it is clearly not supported by the JDK ChoiceFormat class; and lastly, because I only know languages with simple pluralisation rules.
I wrote a very basic class that simply prints out a localised message looked up from a ResourceBundle - which is probably the most commonly used approach and therefore familiar to most readers.
The code above should be familiar to everyone, as it shouldn't be all that different from how you're already doing i18n. However, note that I've imported com.ibm.icu.text.MessageFormat instead of the usual java.text.MessageFormat. The really interesting part comes in when we use ICU4J's plural format type, which is shown in the following properties files:
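As a rough sketch of what such entries can look like, assuming a hypothetical message key files.count, hypothetical file names, and a single numeric argument (none of these come from the original example):

```properties
# messages_en.properties (hypothetical file name)
files.count={0, plural, one{# file} other{# files}}

# messages_pl.properties (hypothetical file name)
# \u00f3 is the escaped form of the Polish letter o-acute, so the file is safe in ISO-8859-1
files.count={0, plural, one{# plik} few{# pliki} many{# plik\u00f3w} other{# pliku}}
```

Within each branch, # is replaced by the formatted number, and the other branch is the one a plural pattern always has to provide.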
I'm sure you'll immediately notice that I'm not specifying numbers in these patterns, as we did with ChoiceFormat. Instead, I'm simply referring to categories of numbers by predefined mnemonics. This really cool feature is available because a number of language pluralisation rules have already been defined by the Unicode CLDR (Common Locale Data Repository). In particular, we're using the Language Plural Rules, which are provided in the ICU4J package. To explain how this works, let's look at the English example and then work our way up to the Polish example.
English has two categories - singular/plural. These two categories are named one and other - fairly straightforward. What this really means in terms of plural rule definition is that one applies when the number is exactly 1, and other covers every other number.
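Polish is more complex than this and requires a number of rules to be defined: one applies to exactly 1; few applies to whole numbers ending in 2 to 4, except those ending in 12 to 14; many covers the remaining whole numbers; and other covers everything else, which in practice means fractions.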
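To make the two rule sets concrete, here is a minimal plain-Java sketch of how whole numbers map to categories in English and Polish. It is only an illustration of the rules, not ICU4J's implementation; the class and method names are hypothetical, and it assumes non-negative whole numbers (fractions, which land in other for Polish, are left out).

```java
/** Plain-Java illustration of the English and Polish plural categories (not ICU4J code). */
final class PluralCategorySketch {

    /** English: "one" for exactly 1, "other" for every other whole number. */
    static String english(long n) {
        requireNonNegative(n);
        return n == 1 ? "one" : "other";
    }

    /** Polish rules for whole numbers; fractions would fall into "other" and are not handled here. */
    static String polish(long n) {
        requireNonNegative(n);
        if (n == 1) {
            return "one";
        }
        long lastDigit = n % 10;
        long lastTwoDigits = n % 100;
        boolean endsInTwoToFour = lastDigit >= 2 && lastDigit <= 4;
        boolean endsInTwelveToFourteen = lastTwoDigits >= 12 && lastTwoDigits <= 14;
        if (endsInTwoToFour && !endsInTwelveToFourteen) {
            return "few";
        }
        // Every remaining whole number (0, 5-21, 25-31, ...) is "many".
        return "many";
    }

    // Assumption for this sketch: counts are never negative.
    private static void requireNonNegative(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("expected a non-negative count, got " + n);
        }
    }

    public static void main(String[] args) {
        long[] samples = {0, 1, 2, 5, 12, 21, 22, 112};
        for (long n : samples) {
            System.out.println(n + " -> en:" + english(n) + " pl:" + polish(n));
        }
    }
}
```

In real code you wouldn't write this by hand: ICU4J's PluralRules class can look up the rules for a locale and select the category for you, which is exactly what the plural format type does behind the scenes.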
Clearly the definition of rules makes our lives a lot easier. All we need to know is which category of numbers we want to provide a pluralisation for, and define the message against that name using the format category{message}.
Note: The CLDR points out that the names are just mnemonics and aren't intended to describe the exact contents of the category, so try not to focus too much on them. It's merely providing categorisation by a recognisable name.
The above example only uses the predefined number categories, but we could easily mix this with explicit values if needed. In this case, the explicit values would be checked first for an exact match, and if none was found then the categories would be searched, and failing that the other category would be used. Here's an example of how you can mix the two concepts together:
If we formatted this with the numbers 1 to 5 in a loop, this would be formatted as follows:
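The lookup order described above (exact value first, then category, then other) can be sketched in plain Java. This is an illustration of the selection order only, not ICU4J's implementation; the class name, the demo messages and the English-only category rule are assumptions made for the sketch. The demo variants correspond to an ICU-style pattern along the lines of {0, plural, =2{a pair of files} one{# file} other{# files}}.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java illustration of the plural lookup order (not ICU4J code). */
final class PluralSelectionSketch {

    /** English-only stand-in for the locale's plural rules. */
    static String englishCategory(long n) {
        return n == 1 ? "one" : "other";
    }

    /** Hypothetical variants mixing an explicit value ("=2") with categories. */
    static Map<String, String> demoVariants() {
        Map<String, String> variants = new HashMap<>();
        variants.put("=2", "a pair of files");
        variants.put("one", "# file");
        variants.put("other", "# files");
        return variants;
    }

    static String select(Map<String, String> variants, long n) {
        // 1. An explicit value such as "=2" wins if it matches exactly.
        String chosen = variants.get("=" + n);
        // 2. Otherwise try the category the number belongs to.
        if (chosen == null) {
            chosen = variants.get(englishCategory(n));
        }
        // 3. Otherwise fall back to the mandatory "other" variant.
        if (chosen == null) {
            chosen = variants.get("other");
        }
        if (chosen == null) {
            throw new IllegalArgumentException("an 'other' variant is required");
        }
        // '#' stands for the number itself.
        return chosen.replace("#", Long.toString(n));
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 5; n++) {
            System.out.println(select(demoVariants(), n));
        }
    }
}
```

With these demo variants, 1 is picked up by the one category, 2 by the explicit =2 entry, and 3 to 5 fall through to other.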
Of course, there may be circumstances where the predefined rules don't do what you want (although, we're probably talking about exceptional circumstances now). In this case, you can simply define your own set of rules. This can be done using the PluralRules class or by customising the locale data that's available to ICU4J.
I've only scratched the surface of what you can do with this library - and pluralisation is only one very small part of what it provides - but I hope this is useful and is able to help get you started using it.