Name: Phil Perry
Date: July 9, 2006 at 19:06:25 Pacific
Subject: ABCs of transliterations
OS: Ubuntu Linux
CPU/Ram: P4/256
Model/Manufacturer: Dell
Comment:
Is there any place on the Web listing accepted, widely-used, and even official transliteration systems to write non-Latin alphabet languages using ASCII? The transliteration should be lossless (someone can go back to the original language unchanged). Multicharacter equivalents are OK (almost unavoidable), and the Latin-1 alphabet with standard (Western European) diacritics (accents) might be tolerable. I can go with conversion into either precomposed Unicode (base letter + diacritic as a single code) or a Unicode base letter followed by separate codes for various accents, tonal marks, and whatnot.
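To make the two Unicode options concrete, here is a minimal Python sketch. It relies only on the standard `unicodedata` module; the helper name `code_points` is made up for illustration. The single-code form is what Unicode calls precomposed (normalization form NFC), and the base-letter-plus-marks form is decomposed (NFD). Converting between them is lossless for a letter like é.

```python
import unicodedata

composed = "\u00e9"      # e-acute as one precomposed code point
decomposed = "e\u0301"   # plain e followed by COMBINING ACUTE ACCENT

# NFC composes where a precomposed form exists; NFD splits into base + marks.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

def code_points(text):
    """List the code points of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(composed))
print(code_points(decomposed))
```

Either form can be written to HTML; the practical point is to pick one and normalize consistently, since the two strings do not compare equal byte for byte.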
I'm curious how it's done for something like LaTeX with Babel, or ozTeX, or the like. Do users switch between Latin-1 for the commands and their native keyboard for the text itself? What do people do when all they have is a standard English keyboard and they want to write Greek, Russian and other Cyrillic languages, Hebrew, Arabic, various Indic languages, various Chinese languages, etc.? I'd prefer not to have to invent my own transliteration systems -- better to use those already accepted by lots of computer users.
I have some ideas for Web page software I'm thinking of commercializing, and it would be nice for people to be able to create Web pages in non-Latin alphabets. I can use TeX-style accents (e.g., \' for an acute accent on the following letter) for Latin alphabet-based languages. The source for a page would need to be a mixture of ASCII commands and ASCII or Latin-1 transliterated text. How do various systems handle this? Needless to say, there will probably be multiple transliteration systems in use for any language! My software would have to decode the transliteration and output the Unicode symbols in the HTML (or even output binary 8- or 16-bit characters).
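As a rough sketch of the decoding step described above, the following Python turns a handful of TeX-style accent commands into Unicode and then into ASCII-only HTML. The accent table is a small hypothetical subset, and the function names `decode_tex_accents` and `to_html_ascii` are invented for this example; real TeX accepts more accents and more spellings than this handles.

```python
import re
import unicodedata

# Hypothetical subset of TeX accent commands mapped to Unicode combining marks.
TEX_ACCENTS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis / umlaut
    "~": "\u0303",  # tilde
}

# Matches \'e as well as the braced form \'{e}.
ACCENT_RE = re.compile(r"""\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))""")

def decode_tex_accents(source):
    """Replace TeX-style accent commands with precomposed Unicode letters."""
    def replace(match):
        mark = TEX_ACCENTS[match.group(1)]
        letter = match.group(2) or match.group(3)
        # Base letter + combining mark, composed to one code point where possible.
        return unicodedata.normalize("NFC", letter + mark)
    return ACCENT_RE.sub(replace, source)

def to_html_ascii(text):
    """Emit non-ASCII characters as numeric character references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_html_ascii(decode_tex_accents(r"caf\'e")))
```

The same pattern (a lookup table plus a longest-match scan) would extend to multi-letter transliterations for other scripts, but the tables themselves would have to come from whichever scheme the author is using.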
"Is there any place on the Web listing accepted, widely-used, and even official transliteration systems to write non-Latin alphabet languages using ASCII?"
The two places I would look are with Apple and Microsoft. They will be on top of any working standards.
"Do users switch between Latin-1 for the commands and their native keyboard for the text itself?"
Depends on the language. On the Internet, it is conventional to use ASCII for programming languages, APIs, etc. The character data is treated like any byte stream in most cases.
"What do people do when all they have is a standard English keyboard and they want to write Greek, Russian and other Cyrillic languages, Hebrew, Arabic, various Indic languages, various Chinese languages, etc.?"
In all the cases mentioned, they have a modified keyboard layout. In a pinch, many are trained to use a Latin QWERTY keyboard with software mappings. Some mappings are more settled than others; there are many, many Chinese variants, for example.
"I'd prefer not to have to invent my own transliteration systems -- better to use those already accepted by lots of computer users."
Honestly, I never thought about it. I'd prefer in almost every case to use the language directly. And decoding transliteration sounds like a nightmare.
"anonproxy", thanks for the information. I know there are some transliteration schemes around (e.g., multiletter for Cyrillic). So, do all authors have ASCII available to enter commands, and some way to directly enter their own alphabet? That sounds like your preferred method. It would be easiest for me, too, so long as there is no overlap between byte(s) used for non-ASCII text and bytes used for ASCII command sequences. My ASCII commands will be buried within their non-ASCII language text. Without knowing much about their alphabet, I need to be able to pick out where ASCII commands start. This isn't quite like a regular programming language, where non-ASCII text can be set off cleanly in strings. I suppose I'll need to be told how many bytes per character, any escape codes, and byte ranges used, so I can figure out where their character ends and an ASCII command might begin? HTML, (La)TeX, and others must have solved this kind of problem already!
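One encoding that avoids the overlap worry is UTF-8: every byte of a multibyte character is 0x80 or above, so a byte in the ASCII range always is that ASCII character. Some legacy multibyte encodings (Shift_JIS is the usual example) do reuse ASCII-range values as trailing bytes, which is where the problem described above comes from. The sketch below assumes UTF-8 input and a made-up command syntax of a backslash followed by ASCII letters; `find_commands` is a hypothetical helper, not part of any existing system.

```python
def find_commands(data: bytes, marker: bytes = b"\\"):
    """Return (byte_offset, command_name) pairs for ASCII commands in UTF-8 data.

    The backslash marker and the letters-only command syntax are assumptions
    made for illustration.
    """
    commands = []
    i = 0
    while i < len(data):
        if data[i:i + 1] == marker:
            j = i + 1
            # Command names are ASCII letters only. Every byte of a UTF-8
            # multibyte character is >= 0x80, so it always ends the name.
            while j < len(data) and (65 <= data[j] <= 90 or 97 <= data[j] <= 122):
                j += 1
            if j > i + 1:
                commands.append((i, data[i + 1:j].decode("ascii")))
            i = j
        else:
            i += 1
    return commands

page = "Привет \\bold мир \\link 你好".encode("utf-8")
print(find_commands(page))

# No byte of the non-ASCII text falls in the ASCII range.
assert all(b >= 0x80 for b in "Привет".encode("utf-8"))
```

With this property the scanner never needs to know how many bytes a given character occupies; it only has to look for the marker byte.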