ABCs of transliterations

Name: Phil Perry
Date: July 9, 2006 at 19:06:25 Pacific
OS: Ubuntu Linux
CPU/Ram: P4/256
Product: Dell
Comment:

Is there any place on the Web listing accepted, widely used, or even official transliteration systems for writing non-Latin-alphabet languages in ASCII? The transliteration should be lossless (someone can get back to the original text unchanged). Multicharacter equivalents are OK (almost unavoidable), and the Latin-1 alphabet with standard Western European diacritics (accents) might be tolerable. On output, I can go with either precomposed Unicode (base letter plus diacritic as a single code) or a Unicode base letter followed by separate codes for the various accents, tonal marks, and whatnot.
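The two output forms mentioned above correspond to Unicode's composed (NFC) and decomposed (NFD) normalization forms. The sketch below, using Python's standard `unicodedata` module, shows that the two forms can be converted back and forth; the helper names are illustrative, not part of any existing tool.

```python
import unicodedata

def to_decomposed(text: str) -> str:
    """Split each accented letter into base letter + combining mark(s) (NFD)."""
    return unicodedata.normalize("NFD", text)

def to_composed(text: str) -> str:
    """Merge base letter + combining mark(s) into one code point where one exists (NFC)."""
    return unicodedata.normalize("NFC", text)

composed = "\u00e9"                      # e-acute as a single code point
decomposed = to_decomposed(composed)     # "e" followed by U+0301 COMBINING ACUTE ACCENT

print([hex(ord(ch)) for ch in composed])
print([hex(ord(ch)) for ch in decomposed])
print(to_composed(decomposed) == composed)
```

Not every base-plus-mark combination has a precomposed form, so a decoder that must handle arbitrary accents may find the decomposed form the safer internal representation.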

I'm curious how it's done for something like LaTeX with Babel, or ozTeX, or the like. Do users switch between Latin-1 for the commands and their native keyboard for the text itself? What do people do when all they have is a standard English keyboard and they want to write Greek, Russian and other Cyrillic languages, Hebrew, Arabic, various Indic languages, various Chinese languages, etc.? I'd prefer not to have to invent my own transliteration systems -- better to use those already accepted by lots of computer users.

I have some ideas for Web page software I'm thinking of commercializing, and it would be nice for people to be able to create Web pages in non-Latin alphabets. I can use TeX-style accents (e.g., \' for an acute accent on the following letter) for Latin alphabet-based languages. The source for a page would need to be a mixture of ASCII commands and ASCII or Latin-1 transliterated text. How do various systems handle this? Needless to say, there will probably be multiple transliteration systems in use for any language! My software would have to decode the transliteration and output the Unicode symbols in the HTML (or even output binary 8 or 16 bit characters).
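A minimal sketch of the decode step described above, assuming only the simple `\'e` form of TeX accents (the braced form `\'{e}` is not handled) and a small, illustrative accent table. The function names are hypothetical. Output is written as HTML numeric character references so the page itself stays pure ASCII.

```python
import re
import unicodedata

# Illustrative subset: TeX accent command -> Unicode combining mark.
ACCENTS = {
    "'": "\u0301",   # acute
    "`": "\u0300",   # grave
    "^": "\u0302",   # circumflex
    '"': "\u0308",   # diaeresis / umlaut
    "~": "\u0303",   # tilde
}

# A backslash, one accent character, then the letter it applies to.
ACCENT_PATTERN = re.compile(r"\\(['`^\"~])([A-Za-z])")

def decode_tex_accents(source: str) -> str:
    """Replace \\'e style sequences with the matching accented letter."""
    def replace(match: re.Match) -> str:
        mark = ACCENTS[match.group(1)]
        # Compose base letter + combining mark into a single code point if possible.
        return unicodedata.normalize("NFC", match.group(2) + mark)
    return ACCENT_PATTERN.sub(replace, source)

def to_html_ascii(text: str) -> str:
    """Emit non-ASCII characters as numeric character references.

    Note: this does not escape &, < or >; a real page generator would.
    """
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

print(to_html_ascii(decode_tex_accents(r"caf\'e, na\"ive, se\~nor")))
```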

Thanks, Phil




Response Number 1
Name: anonproxy
Date: July 12, 2006 at 00:11:55 Pacific
Reply:

"Is there any place on the Web listing accepted, widely-used, and even official transliteration systems to write non-Latin alphabet languages using ASCII?"

The two places I would look are Apple and Microsoft. They will be on top of any working standards.

"Do users switch between Latin-1 for the commands and their native keyboard for the text itself?"

Depends on the language. On the Internet, it is conventional to use ASCII-based programming languages, APIs, etc. The character data is treated like any byte stream in most cases.

"What do people do when all they have is a standard English keyboard and they want to write Greek, Russian and other Cyrillic languages, Hebrew, Arabic, various Indic languages, various Chinese languages, etc.?"

In all the cases mentioned, people have a modified keyboard layout. In a pinch, many are trained to use a Latin QWERTY keyboard with software mappings. Some mappings are more settled than others; there are many, many Chinese variants, for example.

"I'd prefer not to have to invent my own transliteration systems -- better to use those already accepted by lots of computer users."

Honestly, I never thought about it. I'd prefer in almost every case to use the language directly. And decoding transliteration sounds like a nightmare.
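For a sense of what decoding a multiletter transliteration involves, here is a minimal sketch using greedy longest-match lookup. The table is a tiny, made-up subset for Russian, not any official or widely adopted scheme, and the names are hypothetical. A table like this is only lossless if no multiletter key can also be read as a sequence of shorter keys for different letters, which is exactly where the difficulty lies.

```python
# Hypothetical ASCII -> Cyrillic table (illustrative subset only).
TABLE = {
    "shch": "щ",
    "zh": "ж",
    "sh": "ш",
    "ch": "ч",
    "a": "а",
    "i": "и",
    "k": "к",
    "s": "с",
    "u": "у",
    "z": "з",
}
MAX_KEY_LEN = max(len(key) for key in TABLE)

def decode(text: str) -> str:
    """Decode transliterated ASCII, always preferring the longest matching key."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "shch" wins over "sh" + "ch".
        for length in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + length]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            # No key matches here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

print(decode("zhuk"))
print(decode("shchi"))
```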



Response Number 2
Name: Phil Perry
Date: July 17, 2006 at 09:22:12 Pacific
Reply:

"anonproxy", thanks for the information. I know there are some transliteration schemes around (e.g., multiletter for Cyrillic). So, do all authors have ASCII available to enter commands, and some way to directly enter their own alphabet? That sounds like your preferred method. It would be easiest for me, too, so long as there is no overlap between byte(s) used for non-ASCII text and bytes used for ASCII command sequences. My ASCII commands will be buried within their non-ASCII language text. Without knowing much about their alphabet, I need to be able to pick out where ASCII commands start. This isn't quite like a regular programming language, where non-ASCII text can be set off cleanly in strings. I suppose I'll need to be told how many bytes per character, any escape codes, and byte ranges used, so I can figure out where their character ends and an ASCII command might begin? HTML, (La)TeX, and others must have solved this kind of problem already!
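One encoding that avoids the overlap problem is UTF-8: every byte of a multi-byte character is 0x80 or higher, so a byte in the ASCII range always is an ASCII character. (Some legacy encodings, such as Shift-JIS, do not give this guarantee.) The sketch below assumes the source is UTF-8 and uses a hypothetical `\name` command syntax to show commands being located directly in the byte stream, with no knowledge of the surrounding alphabet.

```python
import re

# Hypothetical command syntax: a backslash followed by ASCII letters.
COMMAND = re.compile(rb"\\[A-Za-z]+")

def find_commands(data: bytes) -> list[tuple[int, str]]:
    """Return (byte offset, command) pairs found in UTF-8 encoded source."""
    return [(m.start(), m.group().decode("ascii")) for m in COMMAND.finditer(data)]

def uses_only_high_bytes(text: str) -> bool:
    """True if every UTF-8 byte of the text is >= 0x80 (i.e. no ASCII bytes at all)."""
    return all(byte >= 0x80 for byte in text.encode("utf-8"))

source = "Привет \\bold мир".encode("utf-8")

print(uses_only_high_bytes("Привет"))   # Cyrillic letters never produce ASCII bytes
print(find_commands(source))
```

Because the scan works on bytes, the offsets it reports are byte offsets, not character counts.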

Phil

