KOI8-RU is an 8-bit character encoding designed to cover Russian, Ukrainian, and Belarusian, each of which uses a Cyrillic alphabet. It is closely related to KOI8-R, which covers Russian and Bulgarian, but replaces ten box-drawing characters with the five Ukrainian and Belarusian letters Ґ, Є, І, Ї, and Ў, each in upper and lower case. It is even more closely related to KOI8-U, which does not include Ў but otherwise makes the same letter replacements. The additional letter allocations are matched by KOI8-E, except for Ґ, which is added to KOI8-F.
IBM assigns KOI8-RU the code page/CCSID 1167.[1][2]
KOI8 remains much more commonly used than ISO 8859-5, which never really caught on.[citation needed] Another common Cyrillic character encoding is Windows-1251. Both KOI8 and Windows-1251 may eventually give way to Unicode.
KOI8 stands for Kod obmena informatsiey, 8 bit (Russian: Код обмена информацией, 8 бит), which means “Code for Information Exchange, 8 bit”.
The KOI8 character sets place the Russian Cyrillic letters in pseudo-Roman order rather than the natural Cyrillic alphabetical order used in ISO 8859-5. Although this may seem unnatural, it means that if the eighth bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, “Код Обмена Информацией” in KOI8-RU becomes kOD oBMENA iNFORMACIEJ (the Russian meaning of the “KOI” acronym) if the eighth bit is stripped.
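The bit-stripping behaviour described above can be reproduced in a few lines. This is a minimal sketch, not a reference implementation: it uses Python's standard `koi8_r` codec as a stand-in, on the assumption (consistent with the table in this article) that the Russian letters occupy the same byte positions in KOI8-R and KOI8-RU. The function name `strip_eighth_bit` is made up for this illustration.

```python
def strip_eighth_bit(text: str) -> str:
    """Encode Russian text as KOI8 bytes, clear bit 7 of each byte, read the result as ASCII."""
    # Assumption: koi8_r stands in for KOI8-RU; only Russian letters and ASCII are handled here.
    raw = text.encode("koi8_r")
    # Masking with 0x7F clears the eighth (most significant) bit of every byte.
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")


print(strip_eighth_bit("Код Обмена Информацией"))  # kOD oBMENA iNFORMACIEJ
```

The case reversal falls out of the layout: lower-case Cyrillic letters sit in rows Cx and Dx, which map onto the upper-case ASCII rows 4x and 5x once the top bit is cleared, while upper-case Cyrillic (rows Ex and Fx) maps onto lower-case ASCII (rows 6x and 7x).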
Character set
The following table shows the KOI8-RU encoding. Each character is shown with its equivalent Unicode code point.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | | | | | | | | | | | | | | | | |
| 1x | | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
| 9x | ░ 2591 | ▒ 2592 | ▓ 2593 | “[a] 201C | ■ 25A0 | ∙ 2219 | ” 201D | —[a] 2014 | № 2116 | ™[a] 2122 | NBSP | » 00BB | ® 00AE | « 00AB | · 00B7 | ¤ 00A4 |
| Ax | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | є[b][c] 0454 | ╔ 2554 | і[b][c] 0456 | ї[b][c] 0457 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ґ[b] 0491 | ў[c] 045E | ╞ 255E |
| Bx | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | Є[b][c] 0404 | ╣ 2563 | І[b][c] 0406 | Ї[b][c] 0407 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | Ґ[b] 0490 | Ў[c] 040E | © 00A9 |
| Cx | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
| Dx | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
| Ex | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
| Fx | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |
Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.
Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
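As an illustration of how the table can be used, the following sketch transcribes the upper half (0x80 to 0xFF) of the table into a lookup string and builds a decoder and encoder from it. It follows the main-text mapping of RFC 2319 for the two positions discussed above (0x95 as U+2219, 0xB4 as U+0404). The names `HIGH_HALF`, `decode_koi8_ru`, and `encode_koi8_ru` are hypothetical and do not come from any library; the sketch also assumes bytes 0x00 to 0x7F are plain ASCII.

```python
# Upper half of KOI8-RU (bytes 0x80-0xFF), one string per table row.
# The letters that replace box-drawing characters, and a few punctuation
# marks, are written as escapes so that look-alike glyphs cannot creep in.
HIGH_HALF = (
    "─│┌┐└┘├┤┬┴┼▀▄█▌▐"                                   # 8x
    "░▒▓\u201c■\u2219\u201d\u2014№™\u00a0»®«·¤"          # 9x
    "═║╒ё\u0454╔\u0456\u0457╗╘╙╚╛\u0491\u045e╞"          # Ax: є і ї ґ ў
    "╟╠╡Ё\u0404╣\u0406\u0407╦╧╨╩╪\u0490\u040e©"          # Bx: Є І Ї Ґ Ў
    "юабцдефгхийклмно"                                   # Cx
    "пярстужвьызшэщчъ"                                   # Dx
    "ЮАБЦДЕФГХИЙКЛМНО"                                   # Ex
    "ПЯРСТУЖВЬЫЗШЭЩЧЪ"                                   # Fx
)
assert len(HIGH_HALF) == 128

# Reverse mapping from character to byte value, used by the encoder.
_ENCODE = {ch: 0x80 + i for i, ch in enumerate(HIGH_HALF)}


def decode_koi8_ru(data: bytes) -> str:
    """Decode bytes using the table; bytes below 0x80 are treated as ASCII."""
    return "".join(chr(b) if b < 0x80 else HIGH_HALF[b - 0x80] for b in data)


def encode_koi8_ru(text: str) -> bytes:
    """Encode text using the table; raise ValueError for unmappable characters."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        elif ch in _ENCODE:
            out.append(_ENCODE[ch])
        else:
            raise ValueError(f"U+{ord(ch):04X} has no KOI8-RU mapping in this table")
    return bytes(out)


print(decode_koi8_ru(b"\xeb\xc9\xa7\xd7"))  # Київ
```

To follow the Windows-1251-compatible reading of 0x95 instead, the `\u2219` entry in the 9x row would be changed to `\u2022`; nothing else in the sketch depends on that choice.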
References
- “Code page 1167 information document”. Archived from the original on 2017-01-16.
- “CCSID 1167 information document”. Archived from the original on 2016-03-27.
- Leisher, Mark (1999-12-20). “KOI8-RU Belarusian/Ukrainian Cyrillic to Unicode 2.1 mapping table” (KOI8RU.TXT). Archived from the original on 2020-07-28. Retrieved 2020-04-29.
- “Code Page CPGID 01167” (PDF). IBM.
- “Code Page CPGID 01167” (TXT). IBM.
External links
- Nechayev, Valentin (2013) [2001]. “Review of 8-bit Cyrillic encodings universe”. Archived from the original on 2016-12-05. Retrieved 2016-12-05.