Standardization of Regional Alphabet Code

Adopting Standards for the Computer Market

Poland is a market of 38 million people with a growing demand for computer equipment. Under communist rule the computer and communication markets were strongly limited. The world's leading software and hardware producers therefore assigned only small budgets to adapting their products to this market. At the same time, the communist authorities made no effort to approach these producers with their own standardization proposals that could have simplified the market.

As a result, one issue that has now emerged is that there are too many different hardware and software standards for the letters of the Polish alphabet. This causes serious problems in the electronic data interchange of Polish documents in business and industry. At first glance the Polish alphabet looks very similar to the Latin one, especially when compared with, for example, the Greek or Russian alphabets, but it in fact contains eighteen special letters with diacritical marks ("tails"): nine lowercase and nine uppercase. Underestimating this fact in the early stages of software or hardware projects for the Polish market could cause an unexpected rise in costs in the later stages, and for that reason some such projects could not be completed.

The American alphabet code was originally established as a 7-bit code under the name ASCII (American Standard Code for Information Interchange). Its 128 characters form the lower half of 8-bit code pages such as IBM code page 850.
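
A minimal Python sketch of the 7-bit limitation, using only the codecs that ship with Python: bytes 0 to 127 decode to the same characters in ASCII and in `cp850`, while none of the Polish letters has an ASCII code at all. The helper names and the `POLISH_LETTERS` constant are our own, chosen for illustration.

```python
# The 18 Polish letters with diacritical marks: 9 uppercase, then 9 lowercase
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def ascii_is_lower_half_of(codepage: str) -> bool:
    """True if bytes 0-127 decode to the same characters in ASCII and in `codepage`."""
    lower_half = bytes(range(128))
    return lower_half.decode("ascii") == lower_half.decode(codepage)

def encodable_in_ascii(text: str) -> bool:
    """True if every character of `text` has a 7-bit ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(ascii_is_lower_half_of("cp850"))                    # True
print(encodable_in_ascii("Lodz"), encodable_in_ascii("Łódź"))  # True False
print(all(ord(ch) > 127 for ch in POLISH_LETTERS))         # True
```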

In the early eighties IBM, then a leader in R&D, worked out code page 852, known as the IBM Latin-2 page, together with a corresponding keyboard layout for DOS. However, this standard was surely prepared by someone who hardly spoke Polish, because its keyboard differs substantially from the standard Polish typewriter keyboard. In addition, Polish letters that are similar to letters already adopted for Western languages were placed at very different positions in the code table.

The Polish computer industry developed its own standard code, called Mazovia, which suits the Polish typewriter keyboard much better. Text produced with this code can also be read with more popular code pages.
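
Python has no built-in Mazovia codec, so the sketch below builds a hypothetical one from the Mazovia row of the table in this article. It rests on a simplifying assumption, stated here and in the code: every byte that is not one of the 18 Polish letters is decoded as in IBM code page 437. Real Mazovia variants may differ in other positions.

```python
# Mazovia byte values of the 18 Polish letters, copied from this article's table
MAZOVIA_POLISH = {
    143: "Ą", 149: "Ć", 144: "Ę", 156: "Ł", 165: "Ń", 163: "Ó", 152: "Ś", 160: "Ź", 161: "Ż",
    134: "ą", 141: "ć", 145: "ę", 146: "ł", 164: "ń", 162: "ó", 158: "ś", 166: "ź", 167: "ż",
}
MAZOVIA_REVERSE = {ch: b for b, ch in MAZOVIA_POLISH.items()}

def decode_mazovia(data: bytes) -> str:
    """Decode Mazovia bytes. ASSUMPTION: all non-Polish bytes follow IBM code page 437."""
    return "".join(MAZOVIA_POLISH.get(b) or bytes([b]).decode("cp437") for b in data)

def encode_mazovia(text: str) -> bytes:
    """Encode text to Mazovia; non-Polish characters go through cp437 (may raise)."""
    return b"".join(
        bytes([MAZOVIA_REVERSE[ch]]) if ch in MAZOVIA_REVERSE else ch.encode("cp437")
        for ch in text
    )

print(decode_mazovia(bytes([0x9C, 0xA2, 0x64, 0xA6])))  # Łódź
```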

In the early nineties Microsoft, as part of its multilingual software strategy, adopted its own code page 1250, also known as Windows ANSI or Windows Latin-2. Unfortunately, not all of the main barriers were overcome this time either. Four letters (Ś, Ź, ś, ź) are placed at positions 140, 143, 156 and 159, a range that other code tables such as ISO 8859 keep for control characters: once the eighth bit is cut off, these letters become control characters. As a result, documents passing through communication servers are severely altered. Simply adapting European and American programs to the Polish language therefore remains unsolved here. Some basic functions, for example the clipboard, are still affected and do not work properly.
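
The effect can be reproduced with Python's `cp1250` codec. The sketch below lists the Polish letters whose Windows-1250 code falls in the 0x80-0x9F range and simulates a 7-bit channel by clearing the eighth bit of every byte. The helper names are our own; `strip_eighth_bit` is only a model of what such a server does, not a real server API.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def letters_in_c1_range(encoding: str) -> list[str]:
    """Polish letters whose single-byte code in `encoding` lies in 0x80-0x9F,
    the range ISO 8859 leaves free for C1 control characters."""
    return [ch for ch in POLISH_LETTERS if 0x80 <= ch.encode(encoding)[0] <= 0x9F]

def strip_eighth_bit(data: bytes) -> bytes:
    """Model of a 7-bit channel: clear the high bit of every byte."""
    return bytes(b & 0x7F for b in data)

for ch in letters_in_c1_range("cp1250"):
    code = ch.encode("cp1250")[0]
    stripped = strip_eighth_bit(bytes([code]))[0]
    # Anything below 0x20 is in the C0 control range (form feed, shift-in, ...)
    kind = "control" if stripped < 0x20 else "printable"
    print(ch, hex(code), "->", hex(stripped), kind)
```

The same check on `iso8859_2` returns an empty list, which is one reason ISO 8859-2 survives such channels better.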

Currently many application programs are based on DOS and many others on Windows, and they are not compatible with each other as far as Polish alphabet codes are concerned.
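
A short Python sketch of the DOS/Windows mismatch, assuming a DOS program that writes code page 852 and a Windows program that reads code page 1250. Reading the bytes unconverted garbles the Polish letters; converting them first (here through Python's Unicode strings) restores the text. `transcode` is a hypothetical helper name.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between code pages via a Unicode string.
    Raises UnicodeEncodeError if a character has no code in `target`."""
    return data.decode(source).encode(target)

dos_bytes = "Łódź".encode("cp852")                   # as saved by a DOS program
misread = dos_bytes.decode("cp1250", errors="replace")  # read by Windows without conversion
converted = transcode(dos_bytes, "cp852", "cp1250")

print(dos_bytes.hex(), repr(misread))
print(converted.decode("cp1250"))  # Łódź
```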

The first related international standard was ISO 8859-2:1987, Information processing — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2. Here all but six of the Polish letters (Ą, Ś, Ź, ą, ś, ź) have the same codes as in Windows ANSI. This standard has been officially adopted as a Polish standard and is the one most used on the Internet and in Unix environments.
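
The six differing letters can be confirmed with Python's built-in `iso8859_2` and `cp1250` codecs; `differing_letters` is our own illustrative helper.

```python
POLISH_LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def differing_letters(enc_a: str, enc_b: str) -> list[str]:
    """Polish letters that receive different byte values in two code pages."""
    return [ch for ch in POLISH_LETTERS if ch.encode(enc_a) != ch.encode(enc_b)]

print(differing_letters("iso8859_2", "cp1250"))  # ['Ą', 'Ś', 'Ź', 'ą', 'ś', 'ź']
```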

8-bit code standards use the same byte value for different characters in different languages, depending on which regional standard (part number) is used. These standards are assigned to particular language regions and are represented by 10 code tables. Control characters are not always placed identically, so transferring e-mail through a number of mail systems that each use their own coding may not be reliable.
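
To illustrate "same byte, different character", the sketch below decodes the single byte 0xB1 with three parts of ISO 8859 as implemented by Python's codecs. The choice of byte and of parts 1, 2 and 5 is ours, made only for illustration.

```python
def decode_byte(value: int, part: str) -> str:
    """Character that one byte value stands for in a given ISO 8859 part."""
    return bytes([value]).decode(part)

# One byte value, three regional standards, three different characters
for part in ("iso8859_1", "iso8859_2", "iso8859_5"):
    ch = decode_byte(0xB1, part)
    print(part, ch, f"U+{ord(ch):04X}")
```

A receiver that guesses the wrong part therefore shows a plus-minus sign or a Cyrillic letter where the sender wrote "ą".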

The multiprofile strategy of Windows 95 also speeds up communication at the level of multilingual applications. For example, a few years ago, when I tried to write documents in Polish on a Swedish computer, it caused big problems for the system operator, and on the other hand my documents were not sufficiently legible for Polish readers. So both sides were suspicious. Now, with better access to the software, I was able to install Polish fonts on my Swedish Windows 95 within a few minutes.

The following table shows the differences between the codes of Polish letters used on the market:

  
 
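Internationally applied codes of Polish-specific letters (decimal values; the hex row gives the ISO 10646 code points in hexadecimal, and the "No diacritics" row gives the plain ASCII letter used when Polish letters are not available)

Code                 Ą   Ć   Ę   Ł   Ń   Ó   Ś   Ź   Ż   ą   ć   ę   ł   ń   ó   ś   ź   ż
ISO 8859-2         161 198 202 163 209 211 166 172 175 177 230 234 179 241 243 182 188 191
ISO 10646 (dec)    260 262 280 321 323 211 346 377 379 261 263 281 322 324 243 347 378 380
ISO 10646 (hex)    104 106 118 141 143 0D3 15A 179 17B 105 107 119 142 144 0F3 15B 17A 17C
Windows-EE (1250)  165 198 202 163 209 211 140 143 175 185 230 234 179 241 243 156 159 191
IBM (CP 852)       164 143 168 157 227 224 151 141 189 165 134 169 136 228 162 152 171 190
Mazovia            143 149 144 156 165 163 152 160 161 134 141 145 146 164 162 158 166 167
Mac                132 140 162 252 193 238 229 143 251 136 141 171 184 196 151 230 144 253
Amiga PL           194 202 203 206 207 211 212 218 219 226 234 235 238 239 243 244 250 251
TeX PL             129 130 134 138 139 211 145 153 155 161 162 166 170 171 243 177 185 187
No diacritics       65  67  69  76  78  79  83  90  90  97  99 101 108 110 111 115 122 122
Polish Standard    161 198 202 163 209 211 166 172 175 177 230 234 179 241 243 182 188 191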
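
Three rows of the table can be cross-checked against codecs that ship with Python (`iso8859_2`, `cp1250`, `cp852`). The Mazovia, Mac, Amiga and TeX rows are not checked here, since we do not rely on a codec for them. `TABLE_ROWS` and `codec_row` are illustrative names.

```python
LETTERS = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"  # column order of the table

# Decimal rows copied from the table, keyed by Python codec name
TABLE_ROWS = {
    "iso8859_2": [161, 198, 202, 163, 209, 211, 166, 172, 175, 177, 230, 234, 179, 241, 243, 182, 188, 191],
    "cp1250":    [165, 198, 202, 163, 209, 211, 140, 143, 175, 185, 230, 234, 179, 241, 243, 156, 159, 191],
    "cp852":     [164, 143, 168, 157, 227, 224, 151, 141, 189, 165, 134, 169, 136, 228, 162, 152, 171, 190],
}

def codec_row(encoding: str) -> list[int]:
    """Byte value of each Polish letter according to Python's built-in codec."""
    return [ch.encode(encoding)[0] for ch in LETTERS]

for encoding, row in TABLE_ROWS.items():
    print(encoding, "matches" if codec_row(encoding) == row else "differs")
```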

The latest international standard, ISO/IEC 10646-1:1993 (Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane), uses 16-bit coding in its two-octet form, which theoretically allows 65,536 characters. The idea is that each character has a unique bit representation worldwide, so no conversion is needed. This coding system should therefore ensure reliable transport of any given character through all communication links.
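
A minimal sketch of the 16-bit representation, assuming that Python's `utf-16-be` codec stands in for two-octet UCS-2 (the two agree for characters in the Basic Multilingual Plane, which includes all Polish letters). It also shows that text coming from a DOS (CP852) source and from a Windows (CP1250) source ends up with identical 16-bit codes. `ucs2_units` is a hypothetical helper.

```python
def ucs2_units(text: str) -> list[str]:
    """Hex value of each 16-bit unit of `text` in big-endian UCS-2."""
    data = text.encode("utf-16-be")  # equals UCS-2 for BMP characters
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]

print(ucs2_units("Łódź"))  # ['0141', '00F3', '0064', '017A']

# The same word stored under two different 8-bit code pages
from_dos = "Łódź".encode("cp852").decode("cp852")
from_windows = "Łódź".encode("cp1250").decode("cp1250")
print(ucs2_units(from_dos) == ucs2_units(from_windows))  # True
```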

It would be more practical if communication relied on the 16-bit code of ISO/IEC 10646-1 and every local system converted its own codes to this standard for communication purposes.
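
The proposed scheme can be sketched in Python as two conversions: the sender turns its local 8-bit code into 16-bit UCS-2 (again modelled with `utf-16-be`), and the receiver turns UCS-2 into its own code. The function names and the DOS-to-Unix scenario are hypothetical.

```python
def to_ucs2(data: bytes, local_encoding: str) -> bytes:
    """Sender side: local 8-bit code page -> 16-bit UCS-2 for transmission."""
    return data.decode(local_encoding).encode("utf-16-be")

def from_ucs2(data: bytes, local_encoding: str) -> bytes:
    """Receiver side: 16-bit UCS-2 -> the receiver's own code page."""
    return data.decode("utf-16-be").encode(local_encoding)

# Hypothetical exchange: a DOS (CP852) sender and a Unix (ISO 8859-2) receiver
text = "Zażółć gęślą jaźń"
outgoing = to_ucs2(text.encode("cp852"), "cp852")
received = from_ucs2(outgoing, "iso8859_2")
print(received.decode("iso8859_2"))
```

Neither side needs to know the other's code page; each only needs a converter to and from the common 16-bit code.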

The conclusion is that two kinds of development projects are highly relevant:

  1. A worldwide project (perhaps in the billion-dollar class) to implement 16-bit communication, which should bring one main advantage: fewer conversion operations between different alphabet codes during data transfer. Individual producers should adjust their programs to guarantee conversion of their code pages to the 16-bit standard.
  2. A local project (5-10 million ECU) on worldwide encoding procedures for the character sets of European languages, whose situation remains acute as long as 16-bit communication is not implemented. In Poland, for example, the Council for Coordination in Telecommunication is preparing criteria and recommendations for the Polish public administration sector on how to apply the proper standards for EDI purposes.
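
One way to see why a common 16-bit code reduces conversion work is to count converters. The arithmetic below is our own illustration, not a figure from this article: with n local codes, direct conversion between every pair needs n(n-1) one-way converters, while conversion through one common code needs only 2n (one to it and one from it per code). The choice n = 7 assumes the seven distinct 8-bit codes listed in the table.

```python
def pairwise_converters(n: int) -> int:
    """One-way converters needed to convert directly between every pair of n codes."""
    return n * (n - 1)

def hub_converters(n: int) -> int:
    """Converters needed when each code converts only to and from one common 16-bit code."""
    return 2 * n

# Illustration: ISO 8859-2, Windows-1250, CP852, Mazovia, Mac, Amiga PL, TeX PL
n = 7
print(pairwise_converters(n), hub_converters(n))  # 42 14
```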

For the United Nations Organisation, the communication development projects above could probably save many millions that would otherwise be needed for emergency relief actions when communication fails.

 
© Jacek Gancarczyk                              (last updated 96-11-5)