unicode_iso10646_codes.txt

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

A detailed discussion of Unicode and its relationship to the draft
international standard ISO 10646 appears in the article "ASCII Goes
Global" by Kenneth M. Sheldon, BYTE, July 1991, Volume 16, Number 7,
pages 108-116.  (Reprinted in \The Best of BYTE/, Jay Ranade and Alan
Nash, editors.  New York, McGraw-Hill, 1994, ISBN 0-07-05134-9,
page 267.)

- Unicode establishes an unambiguous, fixed 16-bit codeset.  The ISO
  committee originally proposed a 32-bit codeset, in part because of a
  desire by Japan and Korea not to share codes for ideographs with
  Chinese.  The designers of Unicode chose to unify these "Han" codes,
  making a 16-bit representation possible.  The ISO committee was
  influenced by the acceptance of Unicode.

- Design goals of Unicode

  * completeness:  Unicode was intended to eventually cover the full
    range of characters used in text creation, including dead languages
    and future inventions.

  * simplicity and efficiency:  Every Unicode character is the same
    length, 16 bits, and each code represents a character (no embedded
    escape sequences).

  * unambiguity:  Each code represents a character; an error in reading
    one character does not propagate to following characters.

  * correctness:  Every character represented is a real character that
    an expert in the appropriate language would recognize.

  * fidelity:  No data in the text should be lost when it is converted
    between Unicode and any pre-existing coding standard.

- Contributors to the protocol include:

    Xerox Corporation
    Apple Computer, Inc.
    Metaphor Computer Systems, Inc.
    IBM
    Sun Microsystems, Inc.
    NeXT, Inc.
    Research Libraries Group, Inc.
    Go Corporation (acquired by Eo Corporation?)
    Novell, Inc.
    Lotus Development Corporation (now owned by IBM)
    Aldus Corporation (now owned by Adobe Corporation)
    Pacific Rim Connections

    The Unicode Consortium
    P.O. Box 700519
    San Jose, CA  95170-0519
    USA
    Voice:  +1 408/777-5870
    Fax:    +1 408/777-5082
    E-mail: unicode-inc@UNICODE.ORG
    WWW:    http://www.stonehand.com/unicode.html
    FTP:    ftp://unicode.org/

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The magazine article "Unicode Breaks the ASCII Barrier" by John R. Vacca
(\Datamation/, 1 August 1991, Volume 37, Number 15, pages 55-56)
explained the general concepts behind Unicode at the time of its initial
launch.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

A "Frequently Asked Questions" list for International Character Sets is
available on the World Wide Web at this URL:

  ftp://mirrors.aol.com/pub/rtfm/usenet/comp.lang.c/Programming_for_Internationalization_FAQ

The original is at MIT, but AOL's site is easier to get into.
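
[For orientation, a minimal C sketch of the fixed-width design goal
described above: in the original 16-bit Unicode design (before surrogate
pairs), every character is one 16-bit code, so scanning text needs no
escape-sequence or shift-state handling.  The buffer contents and the
function name below are illustrative only.]

    #include <stddef.h>
    #include <stdio.h>

    typedef unsigned short ucs2_t;     /* one 16-bit Unicode code value */

    /* Count the characters in a 16-bit Unicode buffer.  Because every
     * character occupies exactly one code and there are no embedded
     * escape sequences, the count is simply the number of array
     * elements before the terminating 0x0000. */
    static size_t ucs2_length(const ucs2_t *s)
    {
        size_t n = 0;
        while (s[n] != 0x0000)
            n++;
        return n;
    }

    int main(void)
    {
        /* "Ab", GREEK CAPITAL LETTER OMEGA (U+03A9), and a CJK
         * ideograph (U+4E00): four characters, four 16-bit codes. */
        ucs2_t text[] = { 0x0041, 0x0062, 0x03A9, 0x4E00, 0x0000 };
        printf("%lu characters\n", (unsigned long)ucs2_length(text));
        return 0;
    }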
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Article 1462 of comp.terminals:
Newsgroups: comp.std.internat,comp.terminals
Path: cs.utk.edu!avdms8.msfc.nasa.gov!europa.eng.gtefsd.com!howland.reston.ans.net!spool.mu.edu!olivea!sgigate.sgi.com!odin!shams.mti.sgi.com!anoosh
From: anoosh@shams.mti.sgi.com (Anoosh Hosseini)
Subject: Re: Requirements for an ISO 10646 terminal emulator
Message-ID:
Sender: news@odin.corp.sgi.com (Net News)
Nntp-Posting-Host: shams.mti.sgi.com
Organization: Silicon Graphics, Inc., Mountain View, CA, USA
References: <2a3ffiE97q@uni-erlangen.de> <2a6iv1Egd8@uni-erlangen.de> <2a8p8kINNqeb@life.ai.mit.edu>
Date: Fri, 22 Oct 1993 18:18:10 GMT
Lines: 34
Xref: cs.utk.edu comp.std.internat:1162 comp.terminals:1462

In article <2a8p8kINNqeb@life.ai.mit.edu>,
glenn@wheat-chex.ai.mit.edu (Glenn A. Adams) writes:
|>
|> Also, on the issue of having *two* spaces, a LTR space and a RTL space,
|> 10646 & Unicode do not have two separate space characters.  What they
|> do have is LTR MARK and RTL MARK.  These may be used to alter the
|> directional semantics of characters having weak or neutral
|> directionality, such as SPACE.  A keyboard or input method, on the
|> other hand, might support the notion of two spaces, and use a
|> combination of SPACE with the appropriate directional mark characters
|> to encode the result of such input.  Note that a user interface is not
|> bound to present to the user textual elements which map one to one to
|> character codes; a user interface should, in general, be prepared to
|> offer an abstraction of what "units of text" are that is independent
|> of how those units of text are encoded (e.g., using one or more coded
|> characters).

My deepest apologies; Glenn is quite correct here.  Unfortunately, while
commenting on bi-directional issues, I replied in the context of an
internal representation based on the updated Iranian standard, which
employs a Persian SPACE, a pseudo space PSP, and a pseudo connection PCN
(and file I/O for the subset of Unicode supported).  PSP and PCN provide
the same functionality as the joiner/non-joiner in Unicode.

Though Unicode has done much-respected work in the area of bidirectional
text, it is optimized for multi-lingual solutions.  Most people who
request Middle Eastern software are at best bilingual, and thus more
efficient encodings are appropriate.

|>
|> -- Glenn Adams
|>

-anooshiravan
SGI/MIPS Technology Inc.        anoosh@mti.sgi.com

Newsgroups: comp.std.internat,comp.terminals
Path: cs.utk.edu!emory!darwin.sura.net!howland.reston.ans.net!xlink.net!fauern!rrze.uni-erlangen.de!not-for-mail
Message-ID: <2abe6bE5p@uni-erlangen.de>
References: <2a3ffiE97q@uni-erlangen.de> <2a6iv1Egd8@uni-erlangen.de> <2a8p8kINNqeb@life.ai.mit.edu>
Date: Sat, 23 Oct 1993 15:13:31 +0100
Organization: Regionales Rechenzentrum Erlangen, Germany
From: unrza3@cd4680fs.rrze.uni-erlangen.de (Markus Kuhn)
Subject: Re: Requirements for an ISO 10646 terminal emulator

Is there any standard that defines what ISO 10646/Unicode 1.1 terminal
emulators should do?  The information in ISO 10646 surely isn't
sufficient to guarantee that, e.g., two independently implemented
Unicode extensions to xterm will be compatible, especially with regard
to bi-directional text and switching between UTF-2 and other encodings.
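
["UTF-2" was the name used at the time for the file-system-safe UCS
transformation format, essentially the byte encoding later standardized
as UTF-8.  As a point of reference, here is a minimal C sketch of that
encoding for 16-bit character values; the function name is illustrative
only.]

    #include <stddef.h>

    /* Encode one 16-bit character value in UTF-2/FSS-UTF form (the
     * transformation format later standardized as UTF-8).  Returns the
     * number of bytes written to out[], which must hold at least 3. */
    static size_t utf2_encode(unsigned short c, unsigned char out[3])
    {
        if (c < 0x80) {                   /* 1 byte:  0xxxxxxx */
            out[0] = (unsigned char)c;
            return 1;
        }
        if (c < 0x800) {                  /* 2 bytes: 110xxxxx 10xxxxxx */
            out[0] = (unsigned char)(0xC0 | (c >> 6));
            out[1] = (unsigned char)(0x80 | (c & 0x3F));
            return 2;
        }
        /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }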
Markus

--
Markus Kuhn, Computer Science student           «°o°»
University of Erlangen, Germany
Internet: mskuhn@cip.informatik.uni-erlangen.de | X.500 entry available

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.std.internat,comp.terminals
Path: cs.utk.edu!emory!sol.ctr.columbia.edu!hamblin.math.byu.edu!news.byu.edu!cwis.isu.edu!u.cc.utah.edu!cs.weber.edu!terry
Organization: Weber State University, Ogden, UT
Message-ID: <2ack7p$786@u.cc.utah.edu>
References: <2a0t6fEi3k@uni-erlangen.de>
Date: 24 Oct 1993 01:02:49 GMT
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Requirements for an ISO 10646 terminal emulator

In article <2a0t6fEi3k@uni-erlangen.de> mskuhn@cip.informatik.uni-erlangen.de writes:
>Suppose I'd like to write a free terminal emulator (like kermit, xterm, ...)
>tomorrow.  I've just read ISO 10646, but I still have several questions
>that separate me from a clear vision of how an ISO 10646-capable
>terminal emulator might look.

[ ... ]

>b) What special functions are required for non-Latin scripts?  OK,
>   Arabic is written from right to left.  Do I have to send a special
>   direction switch character for the direction, or should the emulator
>   "know" that all Arabic letters are written from right to left
>   and switch automatically?

Left-to-right ligaturing is probably the least of your problems.  When I
attacked this problem for a console driver and an X terminal for BSD, I
noted the following:

Unicode, or Unicode in the guise of ISO 10646, is useful for
internationalization as a means of localization; it is much less useful
for internationalization as a means of creating a multilingual system.

For a multilingual system, Unicode *requires* the use of compounding, as
described in volume 1 of the Unicode standard.  The problem is the
selection of fonts for languages using the same glyphs rendered as
unique symbols; the biggest issue is multilingual use of Japanese and
Chinese in the same document, where the difference has been likened to
Pica Elite vs. Old English -- the glyph rendering style is bound to the
language being rendered.  I have been involved in learning spoken
Japanese on my own for nearly a year now, and have recently gotten into
Kanji (from Kana), and even I can see the differences.

From a straw poll, the major problem that Japanese users (programmers,
really, since users don't know what storage or representational methods
are used, nor do they care, as long as it works) have with Unicode is
that compounding is necessary to distinguish between Japanese and
Chinese text stored as Unicode encoded data.  A secondary complaint is
that Chinese dictionary order is used, and uniquely Japanese characters
are forced into this.  The reason this is not a really serious issue is
that Unicode does not attempt to impose a sort order; even were it to do
so, there are multiple orders available to Japanese (just as German
supports "phone book" vs. "dictionary" order).

Since my interest was in ease of localization, these were non-issues for
me.  I did not intend to display multinational data except as a round
trip from some set that supports it (i.e., JIS208/JIS212).  Another
reason to ignore (or discount) multinational display is the differences
in rendering order (as per your Arabic + English example), which are not
resolvable with lexically based translation via round trip (i.e.,
characters drawn in different orders are lexically distinguished by a
simple arithmetic comparison).
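
[The "round trip" mentioned above can be illustrated with a minimal C
sketch: a legacy code set is mapped into Unicode and back through a pair
of tables, and the conversion is lossless if every code returns to
itself.  The table below is invented sample data, not a real JIS X 0208
mapping.]

    #include <stddef.h>

    #define LEGACY_SIZE 4

    /* Hypothetical legacy code (0..3) -> Unicode value table. */
    static const unsigned short legacy_to_ucs[LEGACY_SIZE] = {
        0x3042, 0x3044, 0x4E00, 0x4E8C
    };

    /* Reverse mapping; returns -1 if the Unicode value has no
     * representation in the legacy set. */
    static int ucs_to_legacy(unsigned short ucs)
    {
        int i;
        for (i = 0; i < LEGACY_SIZE; i++)
            if (legacy_to_ucs[i] == ucs)
                return i;
        return -1;
    }

    /* Returns 1 if every legacy code survives legacy -> UCS -> legacy. */
    static int round_trip_ok(void)
    {
        int i;
        for (i = 0; i < LEGACY_SIZE; i++)
            if (ucs_to_legacy(legacy_to_ucs[i]) != i)
                return 0;
        return 1;
    }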
The second problem in using Unicode directly is that it requires a
rendering engine; canonically, since there are no rendering hints in the
lexical ordering of characters (and no "holes" in such sets as would
require it -- i.e., ligatured sets -- to allow the rendering hints to be
externally imposed outside the standard), rendering of such Indic
scripts as Tamil, Devanagari, etc. is not possible using direct X
technology.  An example of what is necessary was recently posted to
comp.sources.x -- it's called "xtamil".  Suffice it to say, there is an
assumption that you aren't using X.

>c) I assume that the font mechanisms have to be quite flexible, because
>   very few single authors can provide a full ~40000 characters font
>   set.  So some kind of font config file is necessary that lists
>   font files which contain subsets of all Unicode characters.
>   A typical implementation might perhaps only provide a font
>   which covers the ISO 10646 characters that are in the superset
>   of ISO 8859, IBM CP 437, Macintosh and e.g. the several hundred
>   mathematical symbols in UCS.  Other people might then provide
>   other ISO 10646 subsets and can integrate them in the font config
>   file.  Better ideas?  Is any free full UCS font (e.g. the one that
>   was used for printing ISO 10646) available?

If you discount the complexly ligatured character sets (character sets
that can't be ligatured simply by picking a common joint location and
butting the characters up next to one another), you can fit a 14x14
"Unicode font" (this term will get me flamed, I'm afraid) in right
around 1M of memory.  The problem with this is that you need a
horizontal pixel resolution of at least 1120 pixels to get an 80-column
screen (and 375 vertical pixels, assuming only 1 pixel of vertical
spacing between characters, which is too small).

>d) Hardware like the IBM VGA in text mode can only show 256 or, with
>   tricks, 512 characters simultaneously.  So we can't load the full
>   UCS font or even a reasonable subset completely in the character
>   bitmap table.  But in most applications, less than 512 characters
>   will be needed simultaneously, and a kind of font cache that keeps
>   only the needed characters in the hardware font buffer would
>   be all right.  Has anyone experience with similar implementations?

Direct rendering is required; if you intend to support ligatured sets at
all (Arabic and Hebrew can be supported by "butting" characters up next
to each other, even if it does look ugly in some combinations), then you
will need to do pixel draws to do joining.  That means you can't use the
standard character index technology employed by VGA text modes.  Using a
VGA graphics mode for the rendering, you throw out the 256/512 character
limits (in point of fact, there is at least one card that will support 8
simultaneous sets for a total of 2048 characters in VGA "text mode"),
but direct graphic rendering buys you unlimited sizing on the number of
distinct characters displayable.

One way to "cheat" to get near-VGA text speeds on the rendering is to
assume a large amount of memory on the card -- generally, 2M.  You can
load the glyphs into card memory that is not used for display, and use
blit operations to do the draws (copying from one location in card
memory to another).

>e) Some characters (e.g. CJK ideograms) seem to be difficult to
>   display on VGA 16x(8+1) cells.  Would the place needed by two
>   VGA characters be sufficient?  How are e.g. VGA text modes used
>   with non-Latin scripts today?
You probably need documentation on the NEC PC98; you may wish to contact
some Japanese FTP sites, or the Tokyo computer club (which has done a
NEC PC98 port of the 386BSD OS).  Generally, I believe you will be
restricted to using graphics modes for most of your rendering.

>Other ideas?  What would your dream ISO 10646 terminal (emulator) look
>like (if it uses VGA text mode or a graphics mode)?

Generally, I would say don't use UTF; I made my tty driver for my
console 16-bit aware.  With a slight mod to the canonical processing of
I/O, there is very little else necessary to support most textual
rendering.  Other issues include a Unicode name space in directory
entries for the file system (files themselves require no modification)
and modifications to the way "well known files" are accessed to allow
them to be renamed but still directly accessed from precompiled
programs.  You may want to look at rewriting shells and standard
utilities to use a common locale-based error directory, etc.

Terry Lambert
terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.std.internat,comp.terminals
Path: cs.utk.edu!emory!darwin.sura.net!spool.mu.edu!bloom-beacon.mit.edu!ai-lab!wheat-chex!glenn
Organization: MIT Artificial Intelligence Lab
Message-ID: <2ae4dlINNjkk@life.ai.mit.edu>
References: <2a6iv1Egd8@uni-erlangen.de> <2a8p8kINNqeb@life.ai.mit.edu>
From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
Subject: Re: Requirements for an ISO 10646 terminal emulator
Date: 24 Oct 1993 14:45:09 GMT
Summary: Two Spaces

Joe Becker, Xerox PARC, has asked me to post the following.  Personal
responses should go to .

-- Glenn

Re: the issue of having *two* spaces, a LTR space and a RTL space

  [Archiver's Note: I assume this is "left-to-right" and
   "right-to-left".  ...RSS]

Without trying to revisit all the complexities of bidirectionality, I
would like to point out that RTL variants of ASCII characters,
especially space, were excluded from Unicode very intentionally, because
they may cause severe problems while solving nothing.

The problems include, in any system where the user can cut and paste
text, the danger that the user may unwittingly intermingle
identical-appearing directional variants, resulting in text whose
directional layout will be incomprehensible.  This might happen most
easily with space, which might, for example, be copied along with a word
from one directional context to another.

There are no compensating benefits to defining directional variants of
ASCII characters; any desired behavior can be attained via real ASCII
characters, using the various control mechanisms in extreme cases.  In
most normal text no control mechanisms at all are needed; for example,
there is no reason to imagine that a RTL space must be used in normal
Arabic-script or Hebrew-script text -- it is entirely sufficient just to
define the normal ASCII space as directionally neutral.

I realize that various systems do implement RTL variants of ASCII
characters and differ from the Unicode model in other ways.  I believe,
for example, that Apple's Macintosh Arabic script system uses RTL
spaces.  Suffice it to say that the designer of that system was a major
author of the Unicode bidi design; I believe his implementation
experience has changed his opinion on this topic.  We (Xerox) are in
just the same position of accepting the Unicode design as superior to
our own past implementations.
All of us have learned from our past implementations and have tried to
ensure that the Unicode bidi model, if not perfect, at least encompasses
the virtues of past designs while avoiding their flaws.  We can only
hope that others too will eventually transition to the Unicode model,
because there does need to be a single world standard for bidirectional
text representation.

----------------------------------------------------------------

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.std.internat,comp.terminals
Path: cs.utk.edu!emory!europa.eng.gtefsd.com!uunet!pmcgw!personal-media.co.jp
Message-ID:
Date: 25 Oct 93 10:46:50 GMT
References: <2a0t6fEi3k@uni-erlangen.de> <2ack7p$786@u.cc.utah.edu>
Sender: news@pmcgw.personal-media.co.jp
Reply-To: ishikawa@personal-media.co.jp
Followup-To: comp.std.internat
Organization: Personal Media Corp., Tokyo Japan
From: ishikawa@personal-media.co.jp (Chiaki Ishikawa)
Subject: Re: Requirements for an ISO 10646 terminal emulator

In article <2ack7p$786@u.cc.utah.edu>,
terry@cs.weber.edu (A Wizard of Earth C) writes:
>
> For a multilingual system, Unicode *requires* the use of compounding, as
> described in volume 1 of the Unicode standard.  The problem is the
> selection of fonts for languages using the same glyphs rendered as
> unique symbols; the biggest issue is multilingual use of Japanese and
> Chinese in the same document, where the difference has been likened
> to Pica Elite vs. Old English -- the glyph rendering style is bound to
> the language being rendered.

Yes, we need a compound-text structure.  Mixing of Japanese and Chinese
is one such instance where a compound-text structure is necessary, and
Japanese programmers who have been used to just bilingual handling
(i.e., the Japanese and ASCII character sets) are complaining about the
added complexity of the compound-text structure.

I have a nagging doubt here.  Maybe the console driver needs to be very
small and can't handle such real multi-lingual display.  But I may be
wrong.

> I have been involved in learning spoken Japanese on my own for nearly a
> year now, and have recently gotten into Kanji (from Kana), and even I
> can see the differences.

Good luck to you in learning Japanese!

> From a straw poll, the major problem that Japanese users (programmers,
> really, since users don't know what storage or representational methods
> are used, nor do they care, as long as it works) have with Unicode is
> that compounding is necessary to distinguish between Japanese and
> Chinese text stored as Unicode encoded data.

Yes, you are right.  Users don't care as long as it `works' right.

>e) Some characters (e.g. CJK ideograms) seem to be difficult to
>   display on VGA 16x(8+1) cells.  Would the place needed by two
>   VGA characters be sufficient?  How are e.g. VGA text modes used
>   with non-Latin scripts today?

I think the area of two VGA characters is sufficient.  Regarding
non-Latin scripts, as far as Japan is concerned, IBM Japan has released
DOS/V with support for drivers to display/input Japanese characters, and
the added character I/O BIOS calls are, I think, extended somehow to
encompass Japanese character display.

> You probably need documentation on the NEC PC98; you may wish to contact
> some Japanese FTP sites, or the Tokyo computer club (which has done a
> NEC PC98 port of the 386BSD OS).

I think Terry meant the Kyoto (not Tokyo) computer club here.  They have
ported 386BSD to the NEC PC.
There is also an implementation of the KON driver, a console driver that
supports Kanji character display on Japanese PCs, done by someone here.

I am looking at the possibility of using Linux on my home computer: at
work, I use Sun, X11R5, Nemacs (a hacked version of Emacs 18.5x to
support Japanese input/output), Wnn (a Kanji input server that helps me
input Kanji/Kana characters using roman-letter input), and kterm (a
Kanji terminal emulator based on xterm from the MIT X11 distribution).
I am hoping that I can use the same set of tools on my PC.  So I will
look forward to seeing your [Markus Kuhn's] result used widely.

I have a short summary of the Japanization of Linux posted to the
insoft-l mailing list.  If you want to take a look, just let me know.

Chiaki Ishikawa, Personal Media Corp., MY Bldg, 1-7-7 Hiratsuka,
Shinagawa, Tokyo 142, JAPAN.
FAX: +81-3-5702-0359, Phone: +81-3-5702-0351
UUNET: ishikawa@personal-media.co.jp

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.std.internat,comp.terminals
Path: cs.utk.edu!emory!swrinde!cs.utexas.edu!utnut!torn!watserv2.uwaterloo.ca!undergrad.math.uwaterloo.ca!megauthi
From: megauthi@watcgl.uwaterloo.ca (Marc E. Gauthier)
Subject: Re: Requirements for an ISO 10646 terminal emulator
Message-ID:
Sender: news@undergrad.math.uwaterloo.ca
Organization: University of Waterloo
References: <2a6iv1Egd8@uni-erlangen.de> <2a8p8kINNqeb@life.ai.mit.edu> <2ae4dlINNjkk@life.ai.mit.edu>
Date: Thu, 28 Oct 1993 03:52:29 GMT

In article <2ae4dlINNjkk@life.ai.mit.edu>, Glenn A. Adams wrote:
>Joe Becker, Xerox PARC, has asked me to post the following.  Personal
>responses should go to .
>
>-- Glenn
>
>[...description of experience on issue of bidirectionality deleted...]
>of past designs while avoiding their flaws.  We can only hope that others too
>will eventually transition to the Unicode model, because there does need to be
>a single world standard for bidirectional text representation.

To tackle this properly, shouldn't scripting in arbitrary directions be
considered?  While handling bidirectionality is not trivial, handling a
mix of left<->right and up<->down scripts would be even less so.  Yet
Japanese and Chinese (there must be others?) are, I understand,
traditionally written up->down.  [I've heard of Japanese films whose
credits, written up->down, included some English also written up->down
(so you'd tilt your head to read it :-).]  Has any work been done to
address this extra dimension of the problem?

Now, just to make things interesting, imagine some script comes along
that writes along concentric circles, spirals, or some other
pattern... :-)

The ISO 10646-1.2 draft mentions explicit left-to-right and
right-to-left marks and other characters for bidirectionality, but
doesn't give any characters for multiple directionality (e.g. up<->down
vs. left<->right).  Section "22 Order of characters" suggests the
possibility of mixing vertical and horizontal scripts, but does not
address how that might be represented or interpreted.

For me this is a rather academic question, since I have no immediate
application for it.  Just wondering...

-Marc

--
Marc E. Gauthier  +1 819 777 5841 (res, Hull)  +1 613 991 6975 (work, Ottawa)
mgauthier@iit.nrc.ca        Software Eng Lab, IIT, National Research Council Canada
aj313@freenet.carleton.ca   Disclaimer: I speak for myself, not my employer

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.std.internat,comp.terminals
Path: cs.utk.edu!darwin.sura.net!spool.mu.edu!bloom-beacon.mit.edu!ai-lab!wheat-chex!glenn
Organization: MIT Artificial Intelligence Lab
Message-ID: <2aokqlINNrql@life.ai.mit.edu>
References: <2ae4dlINNjkk@life.ai.mit.edu>
Date: 28 Oct 1993 14:26:29 GMT
From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
Subject: Re: Requirements for an ISO 10646 terminal emulator

In article megauthi@watcgl.uwaterloo.ca (Marc E. Gauthier) writes:
>To tackle this properly, shouldn't scripting in arbitrary directions
>be considered?  While handling bidirectionality is not trivial, handling
>a mix of left<->right and up<->down scripts would be even less so.

It is important to understand that the intention of the Unicode BIDI
controls and algorithm is not to support the specification of the
appearance of a text (i.e., formatting, style, etc.) in all its
richness; rather, it is to specify the minimum amount of information
that prevents ambiguity in the semantics of character data.

In the case of Arabic and Hebrew script-based texts, and also the
admixture of these texts and other normally left-to-right scripts, there
are cases where the semantics of the text's content depends on resolving
three kinds of ambiguities:

  (1) the directionality of neutrals (spaces, punctuation, some symbols,
      some numbers, etc.);
  (2) the nesting relationship of embedded directional texts; and
  (3) the directionality of certain substrings, such as formulas, part
      numbers, etc.

These three problems are solved with the use of directional marks (LRM,
RLM), directional embedding controls (LRE, RLE, PDF), and directional
override controls (LRO, RLO, PDF), respectively.

As for vertical formatting, this, in general, is viewed as a formatting
or appearance-related function unrelated to the semantics or content.
The kinds of ambiguities that arise with BIDI don't appear here; at
least, they don't appear in CJK and Tibetan texts.  In CJK texts, and a
little more so with Tibetan, there *are* certain differences in how one
sets a text in the context of a vertical mode; however, this is really a
matter of style.  One can probably argue that with Tibetan one does need
a vertical mode control because of the special use of vertical columns
embedded in a normally horizontal text which are used for certain
mantras (this is different from the normal Tibetan stacking behaviour).
On the other hand, one could treat Japanese ranmoji and warichu (ways of
embedding horizontal text in vertical text or vice versa) in a similar
fashion.  In both of these cases, it is a matter more for a rich text
format which specifies a much richer set of formatting parameters than
one would find in an unadorned plain text.

Keep in mind that Unicode's charter (as it were) is to encode only
"plain" text, i.e., content alone.  The encoding of style and appearance
is considered to be another dimension that is layered on top of plain
text or specified in an out-of-band channel, i.e., by a higher-level
protocol, and, consequently, is not defined by Unicode.
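
[For reference, a short C sketch listing the directional controls Glenn
Adams mentions above, together with their code positions, and a helper
that brackets a run of 16-bit text in a right-to-left embedding
(RLE ... PDF).  The helper illustrates how the controls are placed in a
character stream; it is not an implementation of the bidi algorithm.]

    #include <stddef.h>

    #define LRM 0x200E   /* LEFT-TO-RIGHT MARK         */
    #define RLM 0x200F   /* RIGHT-TO-LEFT MARK         */
    #define LRE 0x202A   /* LEFT-TO-RIGHT EMBEDDING    */
    #define RLE 0x202B   /* RIGHT-TO-LEFT EMBEDDING    */
    #define PDF 0x202C   /* POP DIRECTIONAL FORMATTING */
    #define LRO 0x202D   /* LEFT-TO-RIGHT OVERRIDE     */
    #define RLO 0x202E   /* RIGHT-TO-LEFT OVERRIDE     */

    /* Copy src (n 16-bit codes) into dst as an embedded right-to-left
     * run: RLE <text> PDF.  dst must hold n + 2 codes; returns n + 2. */
    static size_t embed_rtl(unsigned short *dst,
                            const unsigned short *src, size_t n)
    {
        size_t i;

        dst[0] = RLE;
        for (i = 0; i < n; i++)
            dst[i + 1] = src[i];
        dst[n + 1] = PDF;
        return n + 2;
    }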
Regards,

Glenn Adams
Technical Director, Unicode Consortium

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.protocols.kermit.misc, comp.terminals, de.comp.os.sinix,
  comp.sys.dec, comp.os.aos, comp.sys.ibm.as400.misc, bit.listserv.ibm-main,
  comp.sys.hp.misc
Followup-To: comp.protocols.kermit.misc
Message-ID: <6vjfop$bmr$1@apakabar.cc.columbia.edu>
Organization: Columbia University
Date: Thursday, 8 Oct 1998 22:53:13 GMT
From: Frank da Cruz
Subject: Adapting Unicode to Terminal Emulation

Anybody interested in full emulation of IBM, DEC, Data General,
Hewlett-Packard, Siemens-Nixdorf, or other terminals in Unicode-based
terminal emulation software probably knows that Unicode lacks some of
the special characters used by these terminals.  Work to add them, along
with additional debugging capabilities (e.g. for graphical display of
control characters, hex dumps, etc.), is in progress.  The current
working paper can be found at:

  ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt

Comments and suggestions welcome.  Follow up to
comp.protocols.kermit.misc, or by email to me.  Thanks.
Frank da Cruz
The Kermit Project
Columbia University

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
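
[Appended for reference: a minimal C sketch of the "font cache" idea
Terry Lambert discusses earlier in this file in reply to question (d),
for hardware whose glyph table holds only a limited number of characters
(e.g. 512 slots).  Each 16-bit character is assigned a slot on first
use, and the least recently used slot is evicted when the table is full.
The slot count and the glyph-loading routine are hypothetical, not tied
to any particular adapter.]

    #include <stddef.h>

    #define SLOTS 512

    struct glyph_cache {
        unsigned short code[SLOTS];      /* character held in each slot */
        unsigned long  last_use[SLOTS];  /* timestamp of last reference */
        unsigned long  clock;
    };

    /* Hypothetical hardware access: in a real driver this would copy
     * the glyph bitmap for 'code' into the adapter's character
     * generator RAM for 'slot'. */
    static void load_glyph_into_slot(int slot, unsigned short code)
    {
        (void)slot;
        (void)code;
    }

    /* Return the hardware slot holding 'code', loading it if needed. */
    static int cache_slot(struct glyph_cache *gc, unsigned short code)
    {
        int i, victim = 0;

        gc->clock++;
        for (i = 0; i < SLOTS; i++) {
            if (gc->code[i] == code) {       /* already resident */
                gc->last_use[i] = gc->clock;
                return i;
            }
            if (gc->last_use[i] < gc->last_use[victim])
                victim = i;                  /* remember the LRU slot */
        }
        load_glyph_into_slot(victim, code);  /* evict and reload */
        gc->code[victim] = code;
        gc->last_use[victim] = gc->clock;
        return victim;
    }

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=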