ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. It is informally referred to as “Latin-2”. It is generally intended for Central[1] or “Eastern European” languages that are written in the Latin script. ISO/IEC 8859-2 is very different from code page 852 (MS-DOS Latin 2, PC Latin 2), which is also referred to as “Latin-2” in Czech and Slovak regions.[2] Almost half of the encoding’s use is for Polish, for which it is the main legacy encoding, although on the web virtually all of that use has been replaced by UTF-8.
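The practical effect of the two encodings sharing the “Latin-2” nickname can be illustrated with a short Python sketch. It assumes the codec names `iso8859_2` and `cp852` from Python's standard library; the sample word is an arbitrary choice for illustration.

```python
# Encode a Polish word in ISO/IEC 8859-2, then read the same bytes
# back under both encodings that are informally called "Latin-2".
text = "żółć"
raw = text.encode("iso8859_2")

print(raw.hex(" "))             # bf f3 b3 e6
print(raw.decode("iso8859_2"))  # round-trips to the original word
print(raw.decode("cp852"))      # same bytes read as code page 852: mojibake

# Upper-half bytes on which the two encodings happen to agree.
agree = [
    b for b in range(0x80, 0x100)
    if bytes([b]).decode("iso8859_2") == bytes([b]).decode("cp852", errors="replace")
]
print(f"{len(agree)} of 128 upper-half bytes decode identically")
```

Because the bytes themselves carry no label, the encoding has to be known from context (for example a declared charset) before decoding.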
ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Less than 0.04% of all web pages use ISO-8859-2 as of October 2022.[3][4] Microsoft has assigned code page 28592 a.k.a. Windows-28592 to ISO-8859-2 in Windows. IBM assigned code page 912 to ISO 8859-2,[5] until that code page was extended in 1999.[6] Code page 1111 is similar, but replaces byte B0 ° (degree sign) with U+02DA ˚ (ring above).
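As a rough illustration, the sketch below looks up the encoding by two of its names in Python and derives a code page 1111 decoder from the single difference described above. `decode_cp1111` is a hypothetical helper, not a standard-library codec, and it assumes that byte B0 is the only difference between the two code pages.

```python
import codecs

# Python resolves several names for this encoding to one codec.
print(codecs.lookup("ISO-8859-2").name)
print(codecs.lookup("latin2").name)


def decode_cp1111(data: bytes) -> str:
    """Hypothetical helper: decode code page 1111 via ISO-8859-2.

    Assumes the only difference is byte 0xB0, as described in the text.
    In ISO-8859-2 the degree sign U+00B0 is produced only by byte 0xB0,
    so replacing it after decoding is equivalent to remapping that byte.
    """
    return data.decode("iso8859_2").replace("\u00b0", "\u02da")


print(decode_cp1111(bytes([0x32, 0x30, 0xB0])))  # "20" followed by U+02DA
```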
Windows-1250 is similar to ISO-8859-2 and contains all of its printable characters, plus others. However, a few of them are at different positions (unlike Windows-1252, which keeps all printable characters from ISO-8859-1 in the same place).
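The relationship between the two encodings can be checked with a small Python sketch. It assumes the standard-library codecs `iso8859_2` and `cp1250`; the variable names are illustrative only.

```python
# Printable characters of ISO-8859-2 live in 0xA0-0xFF.
iso = {b: bytes([b]).decode("iso8859_2") for b in range(0xA0, 0x100)}

# Windows-1250 also assigns printable characters in 0x80-0x9F, and leaves a
# few bytes undefined, hence errors="replace".
win = {b: bytes([b]).decode("cp1250", errors="replace") for b in range(0x80, 0x100)}
win_position = {ch: b for b, ch in win.items()}

# Characters of ISO-8859-2 that Windows-1250 lacks (expected to be none).
missing = [ch for ch in iso.values() if ch not in win_position]

# Characters that exist in both but sit at a different byte value.
moved = {b: ch for b, ch in iso.items() if win[b] != ch}

print("missing from Windows-1250:", missing)
for b, ch in moved.items():
    print(f"{ch}  ISO-8859-2 0x{b:02X} -> Windows-1250 0x{win_position[ch]:02X}")
```

Text that uses only the unmoved positions decodes identically under both encodings, which is one reason mislabelled data can go unnoticed until a moved letter such as Ą or Š appears.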
Language coverage
These code values can be used for the following languages:
- The missing letter Å is officially part of the Finnish alphabet; however, it has no native use and appears only in foreign names.
- In 2017, the Council for German Orthography officially added a capital ẞ, but it is not required, as SS can be used instead.
- This character set unifies Ș and Ț (S and T with comma below) with Ş and Ţ (S and T with cedilla), as did virtually all other character sets, including Microsoft’s Windows-1250 and the first version of Unicode. However, Unicode subsequently disunified them, which complicates processing of Romanian data, as pre-existing data and input methods still contain the older cedilla code points.[citation needed]
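One way to handle the Romanian case after decoding is sketched below in Python. The helper name `decode_romanian` and the decision to normalize at all are assumptions for illustration; the mapping is only appropriate when the text is known to be Romanian, since the cedilla forms are legitimate letters in other languages.

```python
# Map the cedilla code points that ISO-8859-2 decodes to onto the
# comma-below code points that Unicode later added for Romanian.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # Ş -> Ș
    "\u015f": "\u0219",  # ş -> ș
    "\u0162": "\u021a",  # Ţ -> Ț
    "\u0163": "\u021b",  # ţ -> ț
})


def decode_romanian(data: bytes) -> str:
    """Hypothetical helper: decode ISO-8859-2 and normalize to comma-below."""
    return data.decode("iso8859_2").translate(CEDILLA_TO_COMMA)


raw = bytes([0xFE, 0x61, 0x72, 0xE3])  # "ţară" as stored in ISO-8859-2
print(raw.decode("iso8859_2"))         # cedilla form, U+0163
print(decode_romanian(raw))            # comma-below form, U+021B
```

Searching or comparing Romanian text reliably requires both sides to use the same form, so the inverse mapping is needed before encoding back to ISO-8859-2, which has no comma-below code points.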
Code page layout
Cells that differ from ISO-8859-1 also show the Unicode code point (in hexadecimal).
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 1x |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ |  |
| 8x |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 9x |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Ax | NBSP | Ą 0104 | ˘ 02D8 | Ł 0141 | ¤ | Ľ 013D | Ś 015A | § | ¨ | Š 0160 | Ş 015E | Ť 0164 | Ź 0179 | SHY | Ž 017D | Ż 017B |
| Bx | ° | ą 0105 | ˛ 02DB | ł 0142 | ´ | ľ 013E | ś 015B | ˇ 02C7 | ¸ | š 0161 | ş 015F | ť 0165 | ź 017A | ˝ 02DD | ž 017E | ż 017C |
| Cx | Ŕ 0154 | Á | Â | Ă 0102 | Ä | Ĺ 0139 | Ć 0106 | Ç | Č 010C | É | Ę 0118 | Ë | Ě 011A | Í | Î | Ď 010E |
| Dx | Đ 0110 | Ń 0143 | Ň 0147 | Ó | Ô | Ő 0150 | Ö | × | Ř 0158 | Ů 016E | Ú | Ű 0170 | Ü | Ý | Ţ 0162 | ß |
| Ex | ŕ 0155 | á | â | ă 0103 | ä | ĺ 013A | ć 0107 | ç | č 010D | é | ę 0119 | ë | ě 011B | í | î | ď 010F |
| Fx | đ 0111 | ń 0144 | ň 0148 | ó | ô | ő 0151 | ö | ÷ | ř 0159 | ů 016F | ú | ű 0171 | ü | ý | ţ 0163 | ˙ 02D9 |
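The annotated cells of the layout can be regenerated programmatically. The following Python sketch assumes the standard-library codecs `iso8859_2` and `latin_1` and lists every byte in 0xA0–0xFF whose character differs between the two; `differs_from_latin1` is an illustrative name.

```python
def differs_from_latin1() -> dict:
    """Return {byte: character} for cells where ISO-8859-2 differs from ISO-8859-1."""
    out = {}
    for b in range(0xA0, 0x100):
        ch = bytes([b]).decode("iso8859_2")
        if ch != bytes([b]).decode("latin_1"):
            out[b] = ch
    return out


diff = differs_from_latin1()
for b, ch in diff.items():
    # Byte value, character, and Unicode code point, as in the table cells.
    print(f"0x{b:02X}  {ch}  U+{ord(ch):04X}")
print(len(diff), "cells differ from ISO-8859-1")
```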
References
- [1] “Microsoft Outlook Message Encodings”. 10 January 2017.
- [2] “The Czech and Slovak Character Encoding Mess Explained”. luki.sdf-eu.org. Retrieved 2022-02-27.
- [3] “Usage Statistics and Market Share of ISO-8859-2 for Websites, October 2022”. w3techs.com. Retrieved 2022-10-23.
- [4] “Historical trends in the usage statistics of character encodings for websites, February 2022”.
- [5] “Icu-data/Charset/Data/XML/Ibm-912_P100-1995.XML at main · unicode-org/Icu-data”. GitHub.
- [6] “Icu-data/Charset/Data/Ucm/Ibm-912_P100-1999.ucm at main · unicode-org/Icu-data”. GitHub.
External links
- ISO/IEC 8859-2:1999
- Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets – Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)
- ISO-IR 101 Right-Hand Part of Latin Alphabet No.2 (February 1, 1986)
- ISO 8859-2 (Latin 2) Resources