Tuesday, December 24, 2013

Code Generation and Polymer i18n


So how do internationalization messages work in Dart? I was able to make good use of Intl.plural() in a silly Polymer example yesterday. I even used Intl.message(), but I am pretty sure it is just dead code at this point.

My Polymer displays a locale-dependent greeting, buttons and the number of balloons that a person has:



Thanks to Intl.plural(), my Polymer can display a grammatically correct count of the balloons that I have, be it 99 or 1:



Or even, sadly, no red balloons:



I am doing this with locale-specific maps that are chosen at runtime:
  static Map get fr => {
    'hello': 'Bonjour',
    'done': 'Fin',
    'how_many': 'Combien?',
    'instructions': 'Présentez-vous une expérience personnalisée incroyable!',
    'count_message': (count) =>
      Intl.message(
        '''
        ${Intl.plural(
            count,
            zero: "Je n'ai pas de ballon rouge",
            one: "J'ai un ballon rouge",
            other: "J'ai $count ballons rouges"
          )}.''')
  };
I am using Intl.message() for the count_message property-function. But as I mentioned, it serves no purpose. In fact, it rather gets in the way. If I remove it, the code still works and is far more readable:
  static Map get fr => {
    // ...
    'count_message': (count) =>
      Intl.plural(
        count,
        zero: "Je n'ai pas de ballon rouge",
        one: "J'ai un ballon rouge",
        other: "J'ai $count ballons rouges"
      )
  };
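A minimal sketch of how choosing one of these maps at runtime might look. The `plural()` helper is a hand-rolled stand-in for Intl.plural() so that the snippet runs without the intl package, and `labelsFor()` and the English fallback are assumptions for illustration, not code from the element itself:

```dart
// Hypothetical stand-in for Intl.plural(): picks a variant by count.
String plural(int count,
    {String? zero, String? one, required String other}) {
  if (count == 0 && zero != null) return zero;
  if (count == 1 && one != null) return one;
  return other;
}

// One map per locale, in the same shape as the maps above.
Map<String, dynamic> get en => {
      'hello': 'Hello',
      'count_message': (int count) => plural(count,
          zero: 'I have no red balloons',
          one: 'I have one red balloon',
          other: 'I have $count red balloons'),
    };

Map<String, dynamic> get fr => {
      'hello': 'Bonjour',
      'count_message': (int count) => plural(count,
          zero: "Je n'ai pas de ballon rouge",
          one: "J'ai un ballon rouge",
          other: "J'ai $count ballons rouges"),
    };

// Hypothetical chooser: matches on the language part of the locale and
// falls back to English (an assumption) for anything it does not know.
Map<String, dynamic> labelsFor(String locale) {
  final language = locale.split('_').first;
  return language == 'fr' ? fr : en;
}

void main() {
  final labels = labelsFor('fr_FR');
  print(labels['hello']); // Bonjour
  print(labels['count_message'](99)); // J'ai 99 ballons rouges
  print(labels['count_message'](0)); // Je n'ai pas de ballon rouge
}
```

The map values are either plain strings or functions of the count, which is why the lookup result is called directly for 'count_message'.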
But darn it, Intl.message() does something.

I extract the Intl.message() call out into a function:
count_message(count) =>
  Intl.message(
    '''
    ${Intl.plural(
        count,
        zero: "Je n'ai pas de ballon rouge",
        one: "J'ai un ballon rouge",
        other: "J'ai $count ballons rouges"
    )}.''',
    name: 'count_message',
    desc: 'Reports the number of red balloons a person has.',
    args: [count],
    examples: {'count': 99}
  );
And I dig through the intl code to find that I should run:
$ dart \
  --package-root=packages \
  /home/chris/.pub-cache/hosted/pub.dartlang.org/intl-0.9.1/test/message_extraction/extract_to_json.dart \
  lib/hello_you.dart
Which produces an intl_messages.json file with:
[{
  "name":"count_message",
  "desc":"Reports the number of red balloons a person has.",
  "examples":"{'count' : 99}",
  "args":[],
  "message":"\n    ${Intl.plural(count, zero: 'Je n\\'ai pas de ballon rouge', one: 'J\\'ai un ballon rouge', other: 'J\\'ai ${count} ballons rouges')}."
}]
(the actual file output is on a single line -- I split the lines here for readability)

Great. So what does this file actually do?

Well, based on the documentation, I think it mostly serves as a template for translation_* files. I think that I have likely made a mistake here starting with the French, so I am going to create a separate French translation as well as an English translation in translation_en.json:
{
  "locale": "en",
  "count_message": "${Intl.plural(count, zero: 'I have no red balloons', one: 'I have one red balloon', other: 'I have ${count} red balloons')}."
}
Then I run the opposite of the original extract_to_json.dart script: generate_from_json.dart. This takes the same command line arguments as the previous script, plus the translation_* files:
$ dart \
  --package-root=packages \
  /home/chris/.pub-cache/hosted/pub.dartlang.org/intl-0.9.1/test/message_extraction/generate_from_json.dart \
  lib/hello_you.dart \
  translation_en.json translation_fr.json
That fails for me with the following error:
Unable to open file: /home/chris/repos/polymer-book/play/i18n/dart/packages/serialization/serialization.dart
'file:///home/chris/.pub-cache/hosted/pub.dartlang.org/intl-0.9.1/test/message_extraction/generate_from_json.dart': error: line 25 pos 1: library handler failed
import 'package:serialization/serialization.dart';
^
Bleh. So I add serialization to my pubspec.yaml, run pub get, and try again. And I still get an error:
Unhandled exception:
The null object does not have a getter 'string'.

NoSuchMethodError : method not found: 'string'
Receiver: null
Arguments: []
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:42)
#1      generateLocaleFile (file:///home/chris/.pub-cache/hosted/pub.dartlang.org/intl-0.9.1/test/message_extraction/generate_from_json.dart:95:32)
#2      main (file:///home/chris/.pub-cache/hosted/pub.dartlang.org/intl-0.9.1/test/message_extraction/generate_from_json.dart:63:23)
#3      _startIsolate.isolateStartHandler (dart:isolate-patch/isolate_patch.dart:188)
#4      _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:93)
Looking through the code in the stacktrace, it seems that my JSON translation files need a _locale attribute, not a locale attribute:
{
  "_locale": "fr",
  "count_message": "${Intl.plural(count, zero: 'Je n\\'ai pas de ballon rouge', one: 'J\\'ai un ballon rouge', other: 'J\\'ai ${count} ballons rouges')}."
}
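A quick pre-flight check along these lines would have caught the wrong key before running the generator. This is only a sketch: `localeOf()` is a hypothetical helper, and the only thing taken from the intl script is the `_locale` key name:

```dart
import 'dart:convert';

// Hypothetical helper: returns the locale named in a translation file's
// JSON text, or throws if the "_locale" key is missing.
String localeOf(String jsonText) {
  final data = jsonDecode(jsonText) as Map<String, dynamic>;
  final locale = data['_locale'];
  if (locale is! String) {
    throw FormatException('translation file needs a "_locale" key');
  }
  return locale;
}

void main() {
  // Raw strings keep the JSON text exactly as written.
  const good = r'''{"_locale": "en", "count_message": "placeholder"}''';
  const bad = r'''{"locale": "en", "count_message": "placeholder"}''';

  print(localeOf(good)); // en
  try {
    localeOf(bad);
  } on FormatException catch (e) {
    print(e.message); // translation file needs a "_locale" key
  }
}
```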
With that now working, I have three new files: messages_all.dart, messages_en.dart, and messages_fr.dart.

So what do I do with these?

I start with a simple import:
import 'messages_all.dart';
That, combined with setting Intl.defaultLocale, has no effect in the code -- the default French message is always shown. I find that I need to call the initializeMessages() function from the generated messages_all.dart file:
@CustomTag('hello-you')
class HelloYou extends PolymerElement {
  @observable String balloon_message;
  // ...
  HelloYou.created(): super.created() {
    initializeMessages('en_US');
  }
  // ...
  updateCount() {
    // count_message is the i18n function:
    balloon_message = count_message(int.parse(count));
  }
}
With that, I am setting my locale such that the count_message() function will honor it. Almost:



It seems that my translation_en.json is not generating a proper messages_en.dart. At this point, I just want to get this working, so I remove backslashes from messages_en.dart:
class MessageLookup extends MessageLookupByLibrary {
  get localeName => 'en';
  static count_message(count) => "\${Intl.plural(count, zero: \'I have no red balloons\', one: \'I have one red balloon\', other: \'I have ${count} red balloons\')}.";
  // ...
}
After removing the backslashes, I am left with:
  static count_message(count) => "${Intl.plural(count, zero: 'I have no red balloons', one: 'I have one red balloon', other: 'I have ${count} red balloons')}.";
Then I have my Dart intl-based solution working.
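The effect of those backslashes can be reproduced in isolation. In a Dart string literal, an escaped dollar sign is emitted literally instead of starting an interpolation. The function names here are made up for the illustration:

```dart
// Escaped dollar sign: the interpolation syntax is kept as literal text,
// so the count argument is never used.
String escaped(int count) => "\${count} red balloons";

// Unescaped: the expression is evaluated and its value substituted.
String interpolated(int count) => "${count} red balloons";

void main() {
  print(escaped(3)); // ${count} red balloons
  print(interpolated(3)); // 3 red balloons
}
```

That is why the generated messages_en.dart displayed the Intl.plural() source text rather than the pluralized message.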

Having gone through all that, I can safely say that this is a huge pain. Even if the code were not mis-generated in the last step, this feels awkward to use. Much of that is probably my own bias, so I am not dismissing this approach outright. I personally prefer metaprogramming to code generation. But I recognize that code generation may very well be the right approach in this case. I also recognize that this approach may scale a little better than my poor man's map-based approach.

But in the end, I do not understand what initializeMessages() does, even after digging through it. That alone may be enough to push me away. But I will sleep on it first.


Day #975

7 comments:

  1. You are right. Internationalization in Dart is a mess.

  2. There's a polymer internationalization example in the samples as polymer_intl.

    Intl.message provides a level of indirection so that we can replace the message with the translated version at runtime. So if you write
    hello(name) => Intl.message("Hello $name");
    then at runtime if your active locale is French, it will return "Bonjour Chris".

    Intl.plural and Intl.gender allow you to specify variants of the message. However, as you note, wrapping them inside a literal string inside an Intl.message call would make for very messy escaping. So if you call Intl.plural it also acts the same as Intl.message, and you can omit it.

    The general idea is that you would write your application in some language, extract the messages, have them translated, and then generate code based on the translations. The translation format in the repository is a hacked-up one that's really just for the tests. As you also noticed, the literal form for complex plurals/genders is very painful, so it just serializes the data structure. To use it in a real way you'd presumably want a real translation system and generate code based on that format.

    initializeMessages is a hook to allow for async loading of the data for a particular language. Since the deferred loading of code doesn't entirely work right now it's not very useful, but if it did work, that would potentially go and load the appropriate library from the server so that you don't need to have every language's data as part of your program. It would also be possible to load those as data files rather than as deferred-loading code. While I share your general prejudice in favour of meta-programming, there are significant advantages for generating the code in this case. We might also be able to provide something that ran more directly. I'd consider this stuff still unfinished in some respects. It doesn't help that the dartdoc viewer only wants to show docs for one library per package, so you can't easily see the docs on any of this, such as they are.

    Replies
    1. Much thanks for the clarifications -- they definitely help. FWIW I did see the sample Polymer i18n, but found it a little hard to follow. The documentation in the code and tests made more sense to me, with the sample Polymer i18n helping to clarify along the way.

      I'm still not sold on code generation, but you've thought about this much more than I have so I'm happy to defer to your expertise. I'm eager to see all of this evolve - I'm learning a ton already :)

      Thanks again for the reply!

  3. Thanks for the post, Chris! My experience is similar to yours and it's nice to see that I'm not alone. IMHO localization in Dart is currently a pain, with 99% of the pain being caused by the code generation. Thanks also for the descriptions of the steps you followed, it will be helpful in getting the intl package working for me (I'm not there yet).

    Cocoa gets this right, with localization done in strings files outside of code (http://goo.gl/3GKsj1). There's a code-to-strings-file utility, but its use is optional and there's no need for a strings-file-to-code step (and the accompanying importing of the generated code). I wish the Dart team would consider doing something similar. As an added bonus, this approach would allow unused locale data to be ignored without depending on Dart's incomplete deferred code loading.

    I can't think of any benefit the code generation brings, but I'll raise the question on misc@dartlang.org when I next get a chance. One thing code generation allows for is the ability to use the 'Hello $name' syntax of Dart string literals for substitutions in localized strings, but I think that's actually a disadvantage since using the same syntax is messy in the (admittedly rare) cases where substitution needs to be deferred.

    Replies
    1. The reason for the code generation is for date, gender, and pluralization handling. If you want to internationalize “I ate $x pizzas” and have it work for the zero case, the one case and the 2+ cases, then code generation is a decent approach. Dart's Intl handles that along with gender and dates.

      That said, I agree with you that an external lookup file approach in addition to the current Intl would be nice.

    2. Hmmm... I can't agree with you there. I think code generation is a bad fit for this problem, and Dart localization won't be simplified until that approach is ditched. Patching the current code generation system with an additional lookup file would only make things more complicated.

      If anything, Dart's date, gender, and pluralization localization (which I am familiar with :-) just proves this point. I've written localization code for dates, genders, and plurals at different points and with different tools over the last 20 years (including with one localization framework I wrote) and I've never found it to be as complex as it is with Dart.

      Another data point is that OS X and iOS ship to hundreds of millions of users in more than 18 languages even though the localization mechanism they depend on doesn't use code generation. That localization mechanism is based on 20 year old NeXTSTEP technology and it's still simpler both to understand and to use than Dart's!

      I'd love to be proven wrong and to see a localization package that yields simplicity through code generation. I doubt that will happen, but in the meantime I'd just like to have a reasonably simple Dart localization framework. The current Intl does not qualify, and it will not qualify even after it's more complete and better documented unless it is significantly changed.

    3. Heh. I didn't mean to imply that I thought code generation was the right solution. I'm pretty skeptical myself, but trying to give them the benefit of the doubt to see how it shakes out.

      And really, I think we're also in agreement about patching the current system with a lookup file. I tend to think it ought to be completely separate as well, possibly with a migration path to Intl. Something without code generation.

      For what it's worth, I am not using Intl myself. I am glad to have gotten it to work (mostly), but I have opted for something along the lines of: http://japhr.blogspot.com/2013/12/a-possible-approach-to-polymer-i18n.html (which uses JSON conf files).

      Really, someone should write another i18n package for Dart. Maybe someone with experience with different i18n tools over the last 20 years? :D
