Unicode conversion
Written by Brian Chandler (c) 2003 / copyleft: use, but acknowledge
Notes and references
- This page relies on the fact that Javascript strings are natively Unicode to make the conversions. Therefore: (a) it requires Javascript to be enabled, and (b) it only knows what Javascript knows about Unicode. In particular, it is unlikely to cater for characters outside the BMP (the Basic Multilingual Plane, code points U+0000 to U+FFFF, which fit in 16 bits): Javascript strings are sequences of 16-bit units, so anything beyond the BMP is held as a surrogate pair. This HTML document is (notionally) encoded in UTF-8, though in principle the encoding shouldn't matter.
- Just click the Convert button, and avoid the temptation to press the Enter key: the latter should work, but involves submitting the form to the server, which sends it back again, only to invoke the Javascript.
- The test page button creates a sample page including the converted strings - use it to check everything's as it should be.
- Excellent reference: UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn
- Some ad hoc notes on FORM submission and i18n by A J Flavell. For historians and masochists, the bit about Netscape 4 (the "unfinished browser") is interesting: this form seems to "work" with NN4; that is, paste in Japanese, and Javascript produces the right escape sequences, but the form input display is garbled.
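To illustrate the first note, here is a minimal sketch of the kind of conversion involved. It is not this page's actual script: the function names toJsEscapes and toHtmlReferences are hypothetical, and the two output formats (\uXXXX escapes and decimal HTML character references) are assumptions about what "escape sequences" means here. The first function works on 16-bit units, as a BMP-only converter would; the second walks whole code points, which is how characters outside the BMP can be handled in current Javascript.

```javascript
// Hypothetical helper, not the page's own code: emit one \uXXXX escape per
// 16-bit code unit. ASCII is passed through unchanged.
function toJsEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // a 16-bit code unit, not a code point
    out += unit < 0x80
      ? text[i]
      : "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

// Hypothetical helper: emit one decimal character reference per code point.
// The string iterator yields whole code points, so a surrogate pair becomes
// a single reference. ASCII (including & and <) is passed through unchanged.
function toHtmlReferences(text) {
  let out = "";
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint < 0x80 ? ch : "&#" + codePoint + ";";
  }
  return out;
}

console.log(toJsEscapes("日本"));            // \u65E5\u672C
console.log(toHtmlReferences("日本"));       // &#26085;&#26412;

// U+1F600 lies outside the BMP and occupies two code units.
console.log("\u{1F600}".length);             // 2
console.log(toJsEscapes("\u{1F600}"));       // \uD83D\uDE00
console.log(toHtmlReferences("\u{1F600}"));  // &#128512;
```

A converter that treated each 16-bit unit as a character would instead produce two references for U+1F600, one per surrogate, which is not valid HTML; this is the sort of case the test page button is useful for checking.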
