Standardization of Regional Alphabet Code

Adopting Standards for the Computer Market

Poland is a market of 38 million people with a growing demand for computer equipment. Under communist rule the computer and communication markets were severely restricted. The world's leading producers of software and hardware therefore assigned only small budgets to adapting their products for this market. The communist authorities, for their part, were not interested in approaching these producers with standardization proposals of their own that would have simplified the market.

As a result, there are now too many different hardware and software standards for the letters of the Polish alphabet. This causes serious problems in the electronic data interchange of Polish documents in business and industry. The Polish alphabet, which at first glance looks very similar to the Latin one when compared with, for example, the Greek or Russian alphabets, in fact contains eighteen special letters with diacritical marks (nine letters, each in upper and lower case). Underestimating this fact in the early stages of software or hardware development projects for the Polish market could lead to an unexpected rise in costs in the later stages, to the point that such projects could not be completed.

Originally the American alphabet code was established as a seven-bit code under the name ASCII (American Standard Code for Information Interchange). It occupies the first 128 positions of the basic IBM code page 850.

In the early eighties IBM, a leader in R&D at that time, worked out code page 852, known as the IBM Latin-2 page, with a corresponding keyboard layout for DOS. This standard, however, was apparently prepared by someone with little knowledge of the Polish language, because it differs substantially from the established Polish typewriter keyboard. In addition, Polish letters that resemble letters already adopted in Western languages were placed in very different positions of the code table.

The Polish computer industry developed its own standard code, called Mazovia, which suits the Polish typewriter keyboard much better. Text produced with this code can also be read with the more popular code pages.

In the early nineties Microsoft adopted, as a result of its multilanguage software strategy, its own code page 1250, also known as Windows ANSI or Windows Latin-2. Unfortunately, not all the main barriers were overcome this time either. Four letters are placed at positions normally reserved for a very different purpose: once the eighth bit is cut off, they become control characters. As a result, documents are severely altered after passing through communication servers. The simple adaptation of European and American programs to the Polish language is still not solved here. Some basic functions, for example the clipboard, are still affected and do not work properly.
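
The effect on those four letters (Ś, Ź, ś and ź in code page 1250) can be reproduced in a few lines. The following is only a sketch in Python, assuming that a 7-bit channel simply clears the top bit of every byte; the helper names `strip_eighth_bit` and `is_control` are illustrative and not part of any standard.

```python
# The four code page 1250 letters that lie in the range 128-159.
LETTERS = "ŚŹśź"

def strip_eighth_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)

def is_control(byte: int) -> bool:
    """True for the ASCII control codes (0-31 and 127)."""
    return byte < 0x20 or byte == 0x7F

encoded = LETTERS.encode("cp1250")    # Python's name for Windows code page 1250
stripped = strip_eighth_bit(encoded)

for letter, before, after in zip(LETTERS, encoded, stripped):
    print(f"{letter}: {before} -> {after}, control character: {is_control(after)}")
```

All four letters end up below 32, that is among the control codes, which is why a server that cuts the eighth bit damages the document instead of merely dropping the diacritical marks.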

At present many application programs are based on DOS and many others on Windows, and there is no compatibility between them with regard to Polish alphabet codes.
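
A small Python sketch illustrates the incompatibility. It assumes that the DOS program writes text in code page 852 and that the Windows program reads the same bytes as code page 1250; the sample phrase is arbitrary. Some code page 852 values have no meaning at all in code page 1250, so the decoder is asked to substitute a replacement mark instead of failing.

```python
text = "Zażółć gęślą jaźń"   # sample phrase using all nine lower-case Polish letters

# A DOS program stores the text in code page 852.
dos_bytes = text.encode("cp852")

# A Windows program reads the same bytes as code page 1250.
# errors="replace" is needed because some of these bytes are undefined there.
garbled = dos_bytes.decode("cp1250", errors="replace")

# Reading the bytes with the code page they were written in restores the text.
restored = dos_bytes.decode("cp852")

print(garbled)
print(restored)
```

The bytes themselves are never damaged here. Only the assumption about the code page is wrong, which is exactly what happens when a file moves between the two environments without conversion.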

The first related international standard was ISO 8859-2:1987 (Information processing, 8-bit single-byte coded graphic character sets, Part 2: Latin alphabet No. 2). All but six of the Polish letters have the same codes here as in the Windows ANSI code page. This standard has been officially adopted as the Polish standard and is the one mostly used on the Internet and in Unix environments.
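
The six letters can be found by comparing the two code tables directly. The sketch below uses the codecs that ship with Python, where `iso8859_2` and `cp1250` are Python's names for the two standards; the helper `code` is illustrative.

```python
POLISH = "ĄĆĘŁŃÓŚŹŻąćęłńóśźż"

def code(letter: str, codec: str) -> int:
    """Return the single-byte code of a letter in the given 8-bit code page."""
    return letter.encode(codec)[0]

# Letters whose code differs between ISO 8859-2 and Windows code page 1250.
differing = [ch for ch in POLISH if code(ch, "iso8859_2") != code(ch, "cp1250")]

for ch in differing:
    print(ch, code(ch, "iso8859_2"), code(ch, "cp1250"))
```

The letters that differ are Ą, Ś, Ź and their lower-case forms; the remaining twelve have identical codes in both tables.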

The 8-bit code standards use the same 8-bit configuration for different signs in different languages, depending on the number of the regional standard. These standards are assigned to particular language regions and are represented by 10 code tables. Control characters are not always placed identically, so e-mail transfer through a number of e-mail systems that each use their own coding may not be reliable.
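
As an illustration, the single value 161 can be read under four parts of ISO 8859. This is a sketch using the codecs bundled with Python; the choice of the value 161 and of these four parts is mine and serves only as an example.

```python
BYTE = bytes([161])   # one and the same 8-bit configuration

# Parts 1, 2, 3 and 5 of ISO 8859 (Latin-1, Latin-2, Latin-3, Cyrillic).
PARTS = ("iso8859_1", "iso8859_2", "iso8859_3", "iso8859_5")

meanings = {part: BYTE.decode(part) for part in PARTS}

for part, sign in meanings.items():
    print(part, sign)
```

The same byte is an inverted exclamation mark in Latin-1, the Polish Ą in Latin-2, the Maltese Ħ in Latin-3 and the Cyrillic Ё in part 5, so a receiver who does not know the number of the regional standard cannot recover the text.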

The Windows 95 multiprofile strategy also speeds up communication at the level of multilanguage applications. An example: when a few years ago I tried to write documents in Polish on a Swedish computer, it created a big problem for the system operator and, on the other hand, my documents were not sufficiently legible for Polish readers. So both sides were suspicious. Now that I have better access to the software, I was able to install Polish fonts on my Swedish Windows 95 within a few minutes.

The following table shows the differences between the particular 8-bit codes for Polish letters in use on the market:

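**Internationally applied codes of Polish-specific letters**

| Code | Ą | Ć | Ę | Ł | Ń | Ó | Ś | Ź | Ż | ą | ć | ę | ł | ń | ó | ś | ź | ż |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ISO 8859-2 | 161 | 198 | 202 | 163 | 209 | 211 | 166 | 172 | 175 | 177 | 230 | 234 | 179 | 241 | 243 | 182 | 188 | 191 |
| ISO 10646 (decimal) | 260 | 262 | 280 | 321 | 323 | 211 | 346 | 377 | 379 | 261 | 263 | 281 | 322 | 324 | 243 | 347 | 378 | 380 |
| ISO 10646 (hex) | 104 | 106 | 118 | 141 | 143 | 0D3 | 15A | 179 | 17B | 105 | 107 | 119 | 142 | 144 | 0F3 | 15B | 17A | 17C |
| Windows-EE (CP 1250) | 165 | 198 | 202 | 163 | 209 | 211 | 140 | 143 | 175 | 185 | 230 | 234 | 179 | 241 | 243 | 156 | 159 | 191 |
| IBM (CP 852) | 164 | 143 | 168 | 157 | 227 | 224 | 151 | 141 | 189 | 165 | 134 | 169 | 136 | 228 | 162 | 152 | 171 | 190 |
| Mazovia | 143 | 149 | 144 | 156 | 165 | 163 | 152 | 160 | 161 | 134 | 141 | 145 | 146 | 164 | 162 | 158 | 166 | 167 |
| Mac | 132 | 140 | 162 | 252 | 193 | 238 | 229 | 143 | 251 | 136 | 141 | 171 | 184 | 196 | 151 | 230 | 144 | 253 |
| Amiga PL | 194 | 202 | 203 | 206 | 207 | 211 | 212 | 218 | 219 | 226 | 234 | 235 | 238 | 239 | 243 | 244 | 250 | 251 |
| TeXPL | 129 | 130 | 134 | 138 | 139 | 211 | 145 | 153 | 155 | 161 | 162 | 166 | 170 | 171 | 243 | 177 | 185 | 187 |
| No Polish letters (plain ASCII) | 65 | 67 | 69 | 76 | 78 | 79 | 83 | 90 | 90 | 97 | 99 | 101 | 108 | 110 | 111 | 115 | 122 | 122 |
| Polish Standard | 161 | 198 | 202 | 163 | 209 | 211 | 166 | 172 | 175 | 177 | 230 | 234 | 179 | 241 | 243 | 182 | 188 | 191 |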

The latest international standard, ISO/IEC 10646-1:1993 (Information technology, Universal Multiple-Octet Coded Character Set (UCS), Part 1: Architecture and Basic Multilingual Plane), uses 16-bit coding, which theoretically allows 65,536 signs. The idea is that each sign has a unique bit representation worldwide, so no conversion is needed. This coding system will ensure the reliable transport of any particular sign through all communication links.
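
The uniqueness can be checked on the letter Ą, using its codes from the table above. The sketch below relies on the codecs bundled with Python. Python has no codec named after ISO 10646, so `utf-16-be` is used here as a stand-in on the assumption that, for these letters, it gives the same two octets as the 16-bit form of the standard.

```python
# Code of the capital letter Ą in three 8-bit code pages (values from the table).
LEGACY_CODES = {"iso8859_2": 161, "cp1250": 165, "cp852": 164}

# Decode each single byte and take the resulting ISO 10646 code number.
code_points = {
    codec: ord(bytes([value]).decode(codec))
    for codec, value in LEGACY_CODES.items()
}

capacity = 2 ** 16                      # number of distinct 16-bit values
two_octets = "Ą".encode("utf-16-be")    # the 16-bit form of the letter

print(code_points)
print(capacity, two_octets.hex())
```

Whatever 8-bit code the letter started from, it ends up as the single number 260 (hex 104), the value given in the ISO 10646 rows of the table.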

It would be more practical if communication relied on the 16-bit code of ISO/IEC 10646-1 and all the particular local systems converted their own codes to this standard for communication purposes.
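
Such an arrangement could be sketched as follows in Python. The function names are hypothetical, `utf-16-be` again stands in for the 16-bit code, and the count of eight code pages in the usage lines is only an illustrative figure.

```python
def to_universal(data: bytes, local_codec: str) -> bytes:
    """Convert text from a local 8-bit code page to the 16-bit form for sending."""
    return data.decode(local_codec).encode("utf-16-be")

def from_universal(data: bytes, local_codec: str) -> bytes:
    """Convert received 16-bit text to the receiver's local 8-bit code page."""
    return data.decode("utf-16-be").encode(local_codec)

def converters_needed(code_pages: int):
    """Return (direct pairwise converters, converters via the common code)."""
    pairwise = code_pages * (code_pages - 1)   # one per ordered pair of code pages
    via_common = 2 * code_pages                # one to and one from the common code
    return pairwise, via_common

# A DOS system (code page 852) sends a word to a Windows system (code page 1250).
sent = to_universal("Łódź".encode("cp852"), "cp852")
received = from_universal(sent, "cp1250")

print(received.decode("cp1250"))
print(converters_needed(8))
```

With eight local code pages, direct conversion between every pair would need 56 converters, while conversion through the common 16-bit code needs 16, and each producer only has to maintain the two that concern its own code page.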

The conclusion is that two kinds of development projects are highly applicable.

A worldwide project (perhaps in the billion-dollar class) on the implementation of 16-bit communication, which should give one main advantage: a decrease in the number of conversion operations between different alphabet codes during data transfer. The individual producers should adjust their programs to provide conversion of their code pages to the 16-bit standard.
A local project (5-10 million ECU) on worldwide encoding procedures for the character sets of European languages, whose situation remains acute as long as 16-bit communication is not implemented. For example, in Poland the Council for Coordination in Telecommunication is preparing criteria and recommendations for the Polish public administration sector on how to apply the proper standards for EDI purposes.

For the United Nations Organisation, the above communication development projects may well save many millions that would otherwise be needed for emergency help actions if communication fails.

© Jacek Gancarczyk                              (last updated 96-11-5)
