Here's some good news...
--- Begin Message ---
- Subject: UTF-8 manual pages
- From: Colin Watson <cjwatson [ at ] debian [ dot ] org>
- Date: Mon, 11 Feb 2008 12:02:37 +0000
Manual pages may now be installed in UTF-8
==========================================

Historically, translated manual pages have been installed using a variety of character encodings, usually legacy ones (ISO-8859-*, KOI8-R, EUC-*, and so on). While these encodings are still supported, I now recommend that Debian developers begin to install all manual pages in UTF-8.

User locales are unaffected by this change. Provided that all the characters involved can be handled in the locale in question, manual pages installed in UTF-8, or indeed in other encodings, will display just as well regardless of the locale.

Pages should continue to be installed in /usr/share/man/LL, where LL is the ISO-639-1 code for the language. Country codes should not be used unless they make a significant difference to the language (as with pt_BR, zh_CN, and zh_TW). There is no need to include an encoding in the directory name.

Dependencies
------------

The necessary support in man-db is now in testing, and the consensus on debian-policy was that no additional dependencies would in general be needed when converting pages to UTF-8, any more than a package delivering a file in a new version of HTML would need to conflict with browsers that do not implement it. As an exception, maintainers of packages consisting solely of translated manual pages may choose to conflict with man-db (<< 2.5.1-1).

Migration arrangements
----------------------

For packages using debhelper and dh_installman, a simple rebuild with debhelper 6.0.5 or newer will install your manual pages in the UTF-8 character encoding automatically [1]. If you do not wish to use dh_installman, the debhelper patch may give you an idea of how to do the same thing by hand. If dh_installman guesses the source encoding wrongly, see manconv(1) for an override mechanism.

Manual pages that are maintained in the Debian diff or in Debian-native packages may have their source form migrated to UTF-8 at your convenience, perhaps in consultation with translators.
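Re-encoding a source file is what 'iconv -f <original encoding> -t UTF-8' does. As a rough illustration only (not part of debhelper, man-db, or any other Debian tool; the helper names are hypothetical), the Python sketch below performs the same conversion on a file's bytes and adds a heuristic guard against the usual mistake: converting a file that is already UTF-8, which yields double-encoded garbage. The guard assumes that text which still decodes as valid UTF-8 after being re-encoded in the legacy charset was UTF-8 to begin with; it can misfire on unusual input, so the result still deserves a look.

```python
# Hypothetical helpers, not part of debhelper or man-db: a rough Python
# equivalent of 'iconv -f <original encoding> -t UTF-8' for one page.


def looks_double_encoded(text: str, legacy_encoding: str) -> bool:
    """Heuristic check for UTF-8 that was mistakenly decoded as legacy_encoding.

    If re-encoding the text in the legacy charset yields valid UTF-8 that
    differs from the text itself, the original bytes were almost certainly
    UTF-8 already, and encoding them again would produce garbage.
    """
    try:
        reinterpreted = text.encode(legacy_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not UTF-8 in disguise: genuine legacy text.
        return False
    # Pure ASCII round-trips unchanged and is harmless to convert.
    return reinterpreted != text


def convert_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode a legacy-encoded page and re-encode it as UTF-8."""
    text = data.decode(source_encoding)
    if looks_double_encoded(text, source_encoding):
        raise ValueError(
            f"input does not look like {source_encoding}; "
            "it may already be UTF-8, so converting it would double-encode it"
        )
    return text.encode("utf-8")
```

For example, 'convert_to_utf8(b"caf\xe9", "iso-8859-1")' returns 'b"caf\xc3\xa9"', whereas passing bytes that are already UTF-8 with the same source encoding raises ValueError instead of silently producing 'cafÃ©'.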
Ordinary files may be converted using 'iconv -f <original encoding> -t UTF-8', but make sure to check the result to ensure that you have not produced double-encoded UTF-8 (i.e. garbage) by mistake. For manual pages produced using po4a, adding opt:"-L UTF-8" to the [type:man] section in po4a.cfg, converting any addenda to UTF-8 as above, and regenerating the output files should be sufficient.

There should generally be no need to ask upstream maintainers to convert their manual pages to UTF-8. Support for legacy systems may often require the use of legacy encodings, and the measures above mean that we can move gradually towards a fully UTF-8 system without needing to disturb their existing arrangements.

After migration
---------------

If you convert your source to UTF-8, note that current groff limitations mean that you must ensure that all characters in the source remain representable in the usual legacy encoding for that language (so, for example, a French manual page may not contain Russian characters, since those are not available in ISO-8859-1). This occasionally causes problems, particularly when writing authors' names; groff_char(7) may help if this affects you.

You should avoid using special Unicode punctuation characters such as hyphens, dashes, and so on in manual page source files. See groff_char(7) for safe equivalents. This work is intended for better representation of alphabetic characters, not so that we can use more Unicode gadgets.

Other software
--------------

Graphical manual page viewers may have problems with differing encodings, although usually not significantly worse than the problems they already had with encoding soup. I have sent patches for yelp [2] and konqueror [3]; at least xman and tkman could also do with work from developers who understand them.
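For a viewer, the practical upshot is that it need not guess how a page was installed: man-db can supply either the page source recoded to UTF-8 ('man --recode UTF-8') or formatted output for groff's utf8 device ('man -Tutf8'). The Python sketch below only illustrates that choice; 'man_command' and 'fetch_page' are hypothetical names, and it assumes man-db 2.5.1-1 or later with 'man' on the PATH.

```python
# Hypothetical viewer-side helpers; assumes man-db >= 2.5.1-1 provides 'man'.
import subprocess
from typing import List, Optional


def man_command(page: str, section: Optional[str], own_renderer: bool) -> List[str]:
    """Build the man(1) invocation a viewer would run for one page.

    own_renderer=True: the viewer parses roff itself, so request the page
    source recoded to UTF-8 rather than formatted output.
    own_renderer=False: the viewer displays formatted text, so let man run
    groff (with tbl, eqn, etc. as needed) for the utf8 output device.
    """
    if own_renderer:
        cmd = ["man", "--recode", "UTF-8"]
    else:
        cmd = ["man", "-Tutf8"]
    if section is not None:
        cmd.append(section)
    cmd.append(page)
    return cmd


def fetch_page(page: str, section: Optional[str] = None, own_renderer: bool = True) -> str:
    """Run man(1) and decode its output as UTF-8, which both modes produce."""
    result = subprocess.run(
        man_command(page, section, own_renderer),
        capture_output=True,
        check=True,  # raise CalledProcessError if man exits non-zero (e.g. no such page)
    )
    return result.stdout.decode("utf-8")
```

A viewer with its own roff parser would call 'fetch_page("ls", "1")' and hand the resulting string to its renderer; a groff-based viewer would pass 'own_renderer=False' and show the formatted text. In both cases the decoding step is fixed to UTF-8, independent of the encoding the page was installed in.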
As a general rule, viewers that implement their own manual page rendering engines should read source files via 'man --recode UTF-8', instruct the rendering engine to expect UTF-8, and depend on man-db (>= 2.5.1-1). Viewers that use groff, troff, or nroff to format manual pages should instead use 'man -Tutf8' (which also removes the need to call tbl, eqn, et al. explicitly), instruct the display code to expect UTF-8, and depend on man-db.

Policy manual
-------------

A policy amendment [4] is in progress to ratify these arrangements.

Acknowledgements and references
-------------------------------

Thanks to Adam Borowski, Jens Seidel, Russ Allbery, Brian M. Carlson, Joey Hess, and others for discussion and work leading up to this. I posted some blog entries [5] [6] while working on this, which may be interesting for historical context.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=462937
    "debhelper: recode manual pages to UTF-8"
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=465229
    "yelp: Recode manual pages to UTF-8"
[3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=449554
    "konqueror: man pages viewed in konqueror are not in utf-8 (but in iso8859 for fr ...)"
[4] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440420
    "[AMENDMENT 11/02/2008] Manual page encoding"
[5] http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2007-09-17-man-db-encodings.html
    "Encodings in man-db"
[6] http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2008-01-29-utf-8-manual-pages.html
    "UTF-8 manual pages"

Thanks,

-- 
Colin Watson [cjwatson [ at ] debian [ dot ] org]
--- End Message ---