Thursday, November 7, 2013

Unicode Dart Code


It would seem that Dart's primary encoding scheme is UTF-16. I honestly never knew that.

I ran into an encoding problem while getting various Pub packages ready for the imminent 1.0 release of Dart (OK, imminent may be overstating it, but I'm too excited to not use that word to describe it!). At first, I thought the problem might just be a continuous build glitch. But I saw the same behavior in the dev channel build and then again today in the latest release of the SDK.

The problem is simple enough. In my ctrl-alt-foo keyboard shortcut package, I like to describe Apple shortcuts with the Unicode character for the Apple Command key. For example, my test for the Keys.shortcuts high-level interface looks like:
    test("can establish shortcut listerner with a simple map", (){
      Keys.shortcuts({
        'Esc':          (){ /* ... */ },
        'Ctrl+N':       (){ /* ... */ },
        'Ctrl+O, ⌘+O':  expectAsync0((){}),
        'Ctrl+Shift+H': (){ /* ... */ }
      });

      typeCommand('O');
    });
Until last night, that had always passed. I had never really given it much thought, but I always assumed that the test.dart file itself was UTF-8 encoded and that Dart could handle it. And, since the test had always passed, that seemed a perfectly valid assumption.

Until last night when I started seeing “passing” tests that looked like:
PASS: ShortCut throws an error for bad attempts at multiples (Ctl+O ⌘+O)
Worse, the Keys.shortcuts test above is outright failing:
ERROR: Keys can establish shortcut listerner with a simple map
  Test failed: Caught Instance of 'InvalidShortCutString'
  package:ctrl_alt_foo/shortcut.dart 67:9                       ShortCut.ShortCut.fromString
  package:ctrl_alt_foo/keys.dart 13:27                          Keys.shortcuts.<fn>.<fn>
  dart:core-patch/growable_array.dart 240                       List.forEach
  package:ctrl_alt_foo/keys.dart 13:16                          Keys.shortcuts.<fn>
  dart:collection-patch/collection_patch.dart 965               _HashMap&_LinkedHashMapMixin.forEach
  package:ctrl_alt_foo/keys.dart 10:22                          Keys.shortcuts
  ../test.dart 166:21
If I inspect the value of the shortcut passed by the test, I see the same garbled rendering of ⌘+O rather than the ⌘+O that I see when I look at the code file. It seems that I can no longer make assumptions about encodings.
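One way a UTF-8 encoded ⌘ can come out looking different at runtime is if something along the way reads its bytes one character per byte. The following is a minimal sketch of that effect, not a diagnosis of what Dartium actually did. It assumes a current Dart SDK (the lowercase `utf8` and `latin1` codecs from `dart:convert`), and the helper name `misdecodeAsLatin1` is made up for illustration.

```dart
import 'dart:convert';

// Hypothetical helper: encode a string as UTF-8, then (wrongly) decode
// those bytes as Latin-1, one character per byte.
String misdecodeAsLatin1(String s) => latin1.decode(utf8.encode(s));

void main() {
  const command = '\u2318'; // the Apple Command key symbol

  // The bytes a UTF-8 encoded source file stores for that one character.
  final bytes = utf8.encode(command);
  print(bytes.map((b) => b.toRadixString(16)).join(' ')); // e2 8c 98

  // Decoding those bytes as UTF-8 recovers the original character.
  print(utf8.decode(bytes) == command); // true

  // Assumption for illustration: the reader treats each byte as a character.
  final garbled = misdecodeAsLatin1(command);
  print(garbled.length); // 3
  print(garbled == command); // false
}
```

A string garbled this way no longer contains the command character at all, so any code that looks for it (a replace, a comparison, a map lookup) would quietly stop matching.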

A quick check in Emacs (as awesome as the DartEditor is, it is no Emacs) reveals that I have encoded my code, Unicode and all, as UTF-8. A quick M-x set-buffer-file-coding-system to utf-16 (in both my test and application code), and I have:
PASS: ShortCut throws an error for bad attempts at multiples (Ctl+O ⌘+O)
PASS: Keys can establish shortcut listerner with a simple map
So Dart code must be encoded with UTF-16, right?

Well, I'm not so sure. To better understand, I break this down into a smaller test case. I write a file string_test.dart, encoded in UTF-8, as:
main() {
  var command = '⌘';
  print('command char: $command');
  alias(command);
}

alias(String orig) {
  var aka = orig.replaceAll("⌘", "Meta");
  print('another name: ${aka}');
}
This should expose the same problem that I saw in my shortcut library—only it does not. When I run this, I find:
➜  tmp  dart --version
Dart VM version: 0.8.10.6_r30036 (Thu Nov  7 01:23:45 2013) on "linux_x64"
➜  tmp  dart string_test.dart
command char: ⌘
another name: Meta
The replace was a long shot. The shortcut library itself does a replace, so I thought maybe that was the problem, but no. It seems to work just fine with UTF-8.
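Since a terminal or console can hide what a string really contains, it can help to print the code units instead of the characters. At runtime a Dart string is a sequence of UTF-16 code units, which is a separate matter from how the source file is encoded on disk. A minimal sketch, with a hypothetical helper named `describe`:

```dart
// Hypothetical helper: show each UTF-16 code unit of a string as hex.
String describe(String s) => s.codeUnits
    .map((u) => '0x${u.toRadixString(16).toUpperCase().padLeft(4, '0')}')
    .join(' ');

void main() {
  // A correctly read command character is a single code unit.
  print(describe('\u2318')); // 0x2318

  // For comparison, three separate code units, which is what the same
  // character would become if its UTF-8 bytes were read one per character.
  print(describe('\u00E2\u008C\u0098')); // 0x00E2 0x008C 0x0098
}
```

Dropping a call like `print(describe(command))` into string_test.dart would show whether the runtime received one code unit or several, regardless of how the output is rendered.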

At least on the command line. I had been running my tests in Dartium. Maybe that's the problem? So I create a string_test.html that runs string_test.dart:
<head>
  <script src="packages/browser/dart.js"></script>
  <script src="string_test.dart" type="application/dart"></script>
</head>
When I load that page in Dartium and check the console, I find:
command char: ⌘
another name: Meta
So this is a bug rather than a new feature of character encoding. I tend to think that the behavior of the command-line Dart VM is the desired behavior, but I filed a bug to see what others think. If nothing else, it seems like my shortcut library ought to support both UTF-8 and UTF-16, since I can never be sure how library users might encode the Command key. Or I could force everyone to type out “Command,” but where's the fun in that?
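A rough sketch of what tolerating several spellings of the Command key might look like follows. Nothing here is taken from ctrl-alt-foo itself: the names `commandAliases` and `normalizeCommandKey` are hypothetical, the "Meta" replacement is borrowed from string_test.dart above, and the two garbled spellings assume the UTF-8 bytes were misread as Latin-1 or Windows-1252.

```dart
// Hypothetical list of spellings to treat as the Apple Command key.
const commandAliases = <String>[
  '\u2318', // the command character, read correctly
  '\u00E2\u008C\u0098', // assumed: its UTF-8 bytes misread as Latin-1
  '\u00E2\u0152\u02DC', // assumed: its UTF-8 bytes misread as Windows-1252
  'Command', // the typed-out name
];

// Hypothetical helper: rewrite every known spelling to 'Meta'.
String normalizeCommandKey(String shortcut) {
  var result = shortcut;
  for (final alias in commandAliases) {
    result = result.replaceAll(alias, 'Meta');
  }
  return result;
}

void main() {
  print(normalizeCommandKey('Ctrl+O, \u2318+O')); // Ctrl+O, Meta+O
  print(normalizeCommandKey('\u00E2\u008C\u0098+O')); // Meta+O
  print(normalizeCommandKey('Command+O')); // Meta+O
}
```

Matching garbled spellings only papers over the problem, so something like this would be a stopgap at best until the underlying encoding issue is sorted out.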


Day #928

2 comments:

  1. Replies
    1. Pffft. I should pay them for all the fun stuff they make for me to play with. Between SPDY, Dart, Angular, and Polymer, it's pretty much been Christmas every day for nearly three years straight :)
