Standardization of Regional Alphabet Codes
  Adopting Standards for the Computer Market
 

Poland is a market of 38 million people with a growing demand for computer equipment. Under communist rule, the computer and communication markets were strongly limited. At that time the world's leading producers of software and hardware assigned low budgets to adapting their products to this market. On the other hand, the communist authorities were not interested in approaching these producers with their own standardization proposals that would have simplified the market.

As a result, one emerging issue today is that there are too many different hardware and software standards for the letters of the Polish alphabet. This causes serious problems in the electronic data interchange of Polish documents in business and industry. At first glance the Polish alphabet looks very similar to the Latin one, especially when compared with, for example, the Greek or Russian alphabets, but in fact it contains eighteen special letters with diacritical marks (nine letters, each in upper and lower case). Underestimating this fact in the early stages of software or hardware development projects for the Polish market may have caused unexpected cost increases in the later stages, and for that reason some such projects could not be completed.

Originally, the American alphabet code was established as a seven-bit code under the name ASCII (American Standard Code for Information Interchange). IBM code page 850 uses it unchanged as its lower half (values 0-127).
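A short Python 3 sketch makes the 7-bit limit concrete. It uses only the built-in `ascii` and `cp850` codecs; the names `POLISH_LETTERS` and `fits_in_ascii` are illustrative, not part of any standard.

```python
# ASCII is a 7-bit code: only the values 0-127 exist.
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"  # the 18 Polish-specific letters

def fits_in_ascii(ch: str) -> bool:
    """True if the character has a 7-bit ASCII code."""
    return ord(ch) < 128

# IBM code page 850 keeps ASCII unchanged in its lower half (bytes 0-127).
lower_half = bytes(range(128))
same_lower_half = lower_half.decode("ascii") == lower_half.decode("cp850")

# None of the Polish letters has an ASCII code, so an 8-bit extension is needed.
missing = [ch for ch in POLISH_LETTERS if not fits_in_ascii(ch)]

print(same_lower_half)  # True
print(len(missing))     # 18
```

In every 8-bit row of the code table in this article, the Polish letters occupy values between 128 and 255, the half that ASCII leaves undefined.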

In the early eighties IBM, then the leader in R&D, worked out code page 852, known as the IBM Latin-2 page, together with a corresponding keyboard for DOS. However, this standard was evidently prepared by someone who hardly spoke Polish, because its keyboard differs substantially from the standard Polish typewriter keyboard. In addition, Polish letters that resemble letters already used in Western languages were placed in very different positions of the code table.

The Polish computer industry developed its own code standard, called Mazovia, which matches the Polish typewriter keyboard much better. Text produced in this code can also be read with more popular code pages.
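A minimal sketch of how a Mazovia decoder could be built in Python 3. It assumes, as Mazovia is usually described, that it is IBM code page 437 with the 18 Polish positions from the table in this article substituted. Python has no built-in Mazovia codec, so `MAZOVIA_CODES`, `decode_mazovia` and `encode_mazovia` are hypothetical helpers.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"
# Mazovia row of the table, in the same letter order as POLISH_LETTERS.
MAZOVIA_CODES = [143, 149, 144, 156, 165, 163, 152, 160, 161,
                 134, 141, 145, 146, 164, 162, 158, 166, 167]

# Hypothetical lookup tables: byte value <-> Polish letter.
MAZOVIA_POLISH = dict(zip(MAZOVIA_CODES, POLISH_LETTERS))
_REVERSE = {ch: code for code, ch in MAZOVIA_POLISH.items()}

def decode_mazovia(data: bytes) -> str:
    """Decode Mazovia bytes; ASSUMPTION: non-Polish positions follow CP 437."""
    return "".join(
        MAZOVIA_POLISH.get(b) or bytes([b]).decode("cp437") for b in data
    )

def encode_mazovia(text: str) -> bytes:
    """Encode text; Polish letters from the table, everything else via CP 437."""
    out = bytearray()
    for ch in text:
        if ch in _REVERSE:
            out.append(_REVERSE[ch])
        else:
            out += ch.encode("cp437")
    return bytes(out)

encoded = encode_mazovia("Łódź")
print(list(encoded))            # [156, 162, 100, 166]
print(decode_mazovia(encoded))  # Łódź
```

Because Mazovia puts ó at 162 and ń at 164, where CP 437 already has ó and ñ, such letters stay recognizable even when the text is displayed with CP 437.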

In the early nineties Microsoft, as part of its multilanguage software strategy, adopted its own code page 1250, also known as Windows ANSI or Windows Latin-2. Unfortunately, even this time not all the main barriers were overcome. Four letters (Ś, Ź, ś and ź) are placed in the range 128-159, which is normally used for a very different purpose (the ISO 8859 standards reserve it for control codes): after the eighth bit is cut off, these letters become control characters. As a result, documents are severely altered after passing through communication servers. A simple adaptation of European and American programs to the Polish language is still not solved here: some basic functions, such as the clipboard, are still affected and do not work properly.
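The effect of cutting off the eighth bit can be sketched in Python 3 with the built-in `cp1250` and `iso8859_2` codecs. The helper `strip_eighth_bit` models a 7-bit channel by keeping only `value & 0x7F`; treating a communication server this way is an assumption for illustration.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def strip_eighth_bit(data: bytes) -> bytes:
    """Model of a 7-bit channel: keep only the low seven bits of each byte."""
    return bytes(b & 0x7F for b in data)

def control_after_stripping(codepage: str) -> list[str]:
    """Polish letters whose code turns into a C0 control character (0-31)."""
    result = []
    for ch in POLISH_LETTERS:
        stripped = strip_eighth_bit(ch.encode(codepage))[0]
        if stripped < 32:
            result.append(ch)
    return result

print(control_after_stripping("cp1250"))     # ['Ś', 'Ź', 'ś', 'ź']
print(control_after_stripping("iso8859_2"))  # []
```

The four Windows 1250 letters land on values 12, 15, 28 and 31, all control codes, whereas every ISO 8859-2 Polish letter turns into a printable (if wrong) ASCII character.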

Currently, many application programs are based on DOS and many others on Windows, and there is no compatibility between them as regards Polish alphabet codes.
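A Python 3 sketch of this incompatibility: text saved by a DOS program in CP 852 and opened by a Windows program that assumes CP 1250. Some CP 852 byte values are undefined in CP 1250, so `errors="replace"` substitutes U+FFFD instead of raising an exception. The sample phrase and variable names are illustrative.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"
text = "Zażółć gęślą jaźń"

dos_bytes = text.encode("cp852")                                # written under DOS
seen_by_windows = dos_bytes.decode("cp1250", errors="replace")  # read under Windows

print(seen_by_windows)                    # garbled Polish letters
print(dos_bytes.decode("cp852") == text)  # True: the right code page restores it

# Polish letters that a CP 1250 reader gets wrong when given CP 852 bytes.
misread = [ch for ch in POLISH_LETTERS
           if ch.encode("cp852").decode("cp1250", errors="replace") != ch]
print(len(misread))                       # 18
```

Only the plain ASCII letters survive; not a single Polish letter has the same code in both code pages.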

The first related international standard was ISO 8859-2:1987, Information processing -- 8-bit single-byte coded graphic character sets -- Part 2: Latin alphabet No. 2. In it, all but six of the Polish letters have the same codes as in Windows ANSI (code page 1250). This standard has been officially adopted as the Polish standard and is the one mostly used on the Internet and in Unix environments.
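The count of six can be reproduced with Python 3's built-in `iso8859_2` and `cp1250` codecs; `differing_letters` is an illustrative helper.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def differing_letters(cp_a: str, cp_b: str) -> list[tuple[str, int, int]]:
    """Return (letter, code in cp_a, code in cp_b) where the two codes differ."""
    diffs = []
    for ch in POLISH_LETTERS:
        a, b = ch.encode(cp_a)[0], ch.encode(cp_b)[0]
        if a != b:
            diffs.append((ch, a, b))
    return diffs

for ch, iso, win in differing_letters("iso8859_2", "cp1250"):
    print(ch, iso, win)
# Ą 161 165
# Ś 166 140
# Ź 172 143
# ą 177 185
# ś 182 156
# ź 188 159
```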

The 8-bit code standards use the same 8-bit value for different characters in different languages, depending on which regional part of the standard is applied. These parts are assigned to particular language regions and comprise 10 code tables. Control characters are not always placed identically in the various codes, so e-mail passed through a number of e-mail systems, each using its own coding, may not arrive intact.
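A Python 3 sketch of this ambiguity: one byte value read under three different parts of ISO 8859, using the built-in `iso8859_1`, `iso8859_2` and `iso8859_5` codecs. The receiver cannot tell which character was meant unless it knows which part the sender used.

```python
BYTE = bytes([0xB1])  # 177: the code of 'ą' in ISO 8859-2

readings = {part: BYTE.decode(part)
            for part in ("iso8859_1", "iso8859_2", "iso8859_5")}

for part, char in readings.items():
    print(part, char)
# iso8859_1 ±
# iso8859_2 ą
# iso8859_5 Б
```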

The Windows 95 multiprofile strategy also makes communication easier at the level of multilanguage applications. For example, a few years ago, when I tried to write documents in Polish on a Swedish computer, it created big problems for the system operator, and my documents were not sufficiently legible for Polish readers, so both sides were suspicious. Now that I have better access to the software, I could install Polish fonts on my Swedish Windows 95 within a few minutes.

The following table shows the differences between the 8-bit codes of the Polish letters used on the market (the ISO 10646 rows give the 16-bit code, in decimal and in hexadecimal):

 
 
Internationally applied codes of the Polish-specific letters (decimal values unless noted)

Letter                    Ą    Ć    Ę    Ł    Ń    Ó    Ś    Ź    Ż    ą    ć    ę    ł    ń    ó    ś    ź    ż
ISO 8859-2              161  198  202  163  209  211  166  172  175  177  230  234  179  241  243  182  188  191
ISO 10646 (decimal)     260  262  280  321  323  211  346  377  379  261  263  281  322  324  243  347  378  380
ISO 10646 (hex)         104  106  118  141  143  0D3  15A  179  17B  105  107  119  142  144  0F3  15B  17A  17C
Windows-EE (CP 1250)    165  198  202  163  209  211  140  143  175  185  230  234  179  241  243  156  159  191
IBM (CP 852)            164  143  168  157  227  224  151  141  189  165  134  169  136  228  162  152  171  190
Mazovia                 143  149  144  156  165  163  152  160  161  134  141  145  146  164  162  158  166  167
Mac                     132  140  162  252  193  238  229  143  251  136  141  171  184  196  151  230  144  253
Amiga PL                194  202  203  206  207  211  212  218  219  226  234  235  238  239  243  244  250  251
TeXPL                   129  130  134  138  139  211  145  153  155  161  162  166  170  171  243  177  185  187
No Polish letters        65   67   69   76   78   79   83   90   90   97   99  101  108  110  111  115  122  122
Polish Standard         161  198  202  163  209  211  166  172  175  177  230  234  179  241  243  182  188  191
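The rows that correspond to codecs shipped with Python 3 (ISO 8859-2, Windows 1250 and IBM CP 852) can be cross-checked mechanically. A sketch assuming the built-in `iso8859_2`, `cp1250` and `cp852` codecs; Mazovia, Amiga PL, TeXPL and the Mac row are not checked here, and `TABLE_ROWS` and `mismatches` are illustrative names.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

# Rows copied from the table (order: Ą Ć Ę Ł Ń Ó Ś Ź Ż ą ć ę ł ń ó ś ź ż).
TABLE_ROWS = {
    "iso8859_2": [161, 198, 202, 163, 209, 211, 166, 172, 175,
                  177, 230, 234, 179, 241, 243, 182, 188, 191],
    "cp1250":    [165, 198, 202, 163, 209, 211, 140, 143, 175,
                  185, 230, 234, 179, 241, 243, 156, 159, 191],
    "cp852":     [164, 143, 168, 157, 227, 224, 151, 141, 189,
                  165, 134, 169, 136, 228, 162, 152, 171, 190],
}

def mismatches(codec: str, row: list[int]) -> list[str]:
    """Letters whose single-byte code in `codec` differs from the table."""
    return [ch for ch, code in zip(POLISH_LETTERS, row)
            if ch.encode(codec) != bytes([code])]

for codec, row in TABLE_ROWS.items():
    print(codec, mismatches(codec, row) or "all 18 letters match")

# The ISO 10646 (decimal) row is simply each letter's code point.
print([ord(ch) for ch in POLISH_LETTERS])
```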
 

The latest international standard, ISO/IEC 10646-1:1993, Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane, uses 16-bit coding in its Basic Multilingual Plane (UCS-2), which allows up to 65,536 characters; the standard also defines a four-octet form (UCS-4). The idea is that each character has a unique bit representation worldwide, so no conversion is needed. This coding system will ensure reliable transport of any character through all communication links.
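A Python 3 sketch showing that every Polish letter fits in a single 16-bit unit. It uses UTF-16 big-endian, which for characters of the Basic Multilingual Plane yields exactly the 16-bit UCS-2 values; `ucs2_units` is an illustrative helper. The printed values match the ISO 10646 (hex) row of the table.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def ucs2_units(text: str) -> list[int]:
    """16-bit code values of a string containing only BMP characters."""
    data = text.encode("utf-16-be")  # two bytes per BMP character
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

codes = ucs2_units(POLISH_LETTERS)
print([f"{c:03X}" for c in codes])
# ['104', '106', '118', '141', '143', '0D3', '15A', '179', '17B',
#  '105', '107', '119', '142', '144', '0F3', '15B', '17A', '17C']
```

The same two bytes travel unchanged on every system, so no code-page table is needed to interpret them.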

It would be more practical if communication relied on the 16-bit code of ISO/IEC 10646-1 and each local system converted its own codes to this standard for communication purposes.
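A minimal Python 3 sketch of this scheme, assuming the 16-bit transport format is UTF-16 big-endian (identical to UCS-2 for these letters). The helpers `to_wire` and `from_wire` are hypothetical: each system converts only between its own code page and the common code, never directly to another local code page.

```python
def to_wire(local_data: bytes, local_codepage: str) -> bytes:
    """Sender side: local 8-bit code -> 16-bit UCS for transmission."""
    return local_data.decode(local_codepage).encode("utf-16-be")

def from_wire(wire_data: bytes, local_codepage: str) -> bytes:
    """Receiver side: 16-bit UCS -> the receiver's own 8-bit code."""
    return wire_data.decode("utf-16-be").encode(local_codepage)

# A DOS user (CP 852) sends a word to a Windows user (CP 1250).
dos_bytes = "Łódź".encode("cp852")
wire = to_wire(dos_bytes, "cp852")
windows_bytes = from_wire(wire, "cp1250")

print(list(dos_bytes))      # [157, 162, 100, 171]
print(wire.hex())           # 014100f30064017a
print(list(windows_bytes))  # [163, 243, 100, 159]
```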

    The conclusion is that two kinds of development projects are highly advisable.

  1. A worldwide project (perhaps in the billion-dollar class) to implement 16-bit communication, whose main advantage would be a decrease in the number of conversion operations between different alphabet codes during data transfer.

    Individual producers should adjust their programs so that their code pages can be converted to the 16-bit standard.

  2. A local project (5-10 million ECU) on procedures for encoding the character sets of European languages for worldwide use, a situation that remains acute as long as 16-bit communication is not implemented. For example, in Poland the Council for Coordination in Telecommunication is preparing criteria and recommendations for the Polish public administration on how to apply the proper standards for EDI purposes.
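One way to quantify the advantage claimed in point 1 is to count the converters that have to be written and maintained. A back-of-the-envelope Python sketch, assuming every code must be convertible into every other: direct conversion needs one converter per ordered pair of codes, while a common 16-bit code needs only two per local code. `direct_converters` and `pivot_converters` are illustrative names.

```python
def direct_converters(n: int) -> int:
    """One converter for every ordered pair of distinct codes."""
    return n * (n - 1)

def pivot_converters(n: int) -> int:
    """One converter to and one from the common 16-bit code per local code."""
    return 2 * n

# 8-bit Polish codes in the table: ISO 8859-2, Windows 1250, CP 852,
# Mazovia, Mac, Amiga PL and TeXPL.
N = 7
print(direct_converters(N), pivot_converters(N))  # 42 14
```

The gap grows quadratically: for the 10 ISO 8859 code tables alone, direct conversion would need 90 converters against 20.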

For the United Nations Organisation, the communication development projects above might save many millions that would otherwise be needed for emergency aid actions when communication fails.

 

© Jacek Gancarczyk                              (last updated 96-11-5)