Defines character classification, case conversion, and other character attributes.
The LC_CTYPE category of a locale definition source file defines character classification, case conversion, and other character attributes. This category begins with an LC_CTYPE category header and terminates with an END LC_CTYPE category trailer.
All operands for LC_CTYPE category statements are defined as lists of characters. Each list consists of one or more semicolon-separated characters or symbolic character names.
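As an illustration of this operand format, the following Python sketch splits an operand into its list items. It is not part of any locale tooling: the function name `split_operand` is hypothetical, and the sketch assumes that a backslash at the end of a line continues the operand on the next line, as in the example later in this section. It does not handle a literal semicolon used as a list item.

```python
def split_operand(text):
    """Split an LC_CTYPE operand into its semicolon-separated items."""
    # Join continuation lines: a trailing backslash continues the operand.
    joined = "".join(
        line.strip().rstrip("\\") for line in text.splitlines()
    )
    # Drop surrounding whitespace and any empty pieces.
    return [item.strip() for item in joined.split(";") if item.strip()]


items = split_operand("<A>;<B>;\\\n    <C>;<D>")
print(items)  # ['<A>', '<B>', '<C>', '<D>']
```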
The following keywords are recognized in the LC_CTYPE category. In the descriptions, the term automatically included means that an error does not occur if the referenced characters are included or omitted. The characters will be provided if they are missing and will be accepted if they are present.
Keyword | Description |
---|---|
copy | Specifies the name of an existing locale to be used as the definition of this category. If a copy statement is included in the file, no other keyword can be specified. |
upper | Defines uppercase letter characters. No character defined by the cntrl, digit, punct, or space keyword can be specified. At a minimum, the uppercase letters A-Z must be defined. |
lower | Defines lowercase letter characters. No character defined by the cntrl, digit, punct, or space keyword can be specified. At a minimum, the lowercase letters a-z must be defined. |
alpha | Defines all letter characters. No character defined by the cntrl, digit, punct, or space keyword can be specified. Characters defined by the upper and lower keywords are automatically included in this character class. |
digit | Defines numeric digit characters. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified. |
alnum | Defines alphanumeric characters. No character defined by the cntrl, punct, or space keyword can be specified. Characters defined by the alpha and digit keywords are automatically included in this character class. |
space | Defines whitespace characters. No character defined by the upper, lower, alpha, digit, graph, cntrl, or xdigit keyword can be specified. At a minimum, the <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab> characters, and any characters defined by the blank keyword, must be specified. |
cntrl | Defines control characters. No character defined by the upper, lower, alpha, digit, punct, graph, print, xdigit, or space keyword can be specified. |
punct | Defines punctuation characters. A character defined as the <space> character and characters defined by the upper, lower, alpha, digit, cntrl, or xdigit keyword cannot be specified. |
graph | Defines printable characters, excluding the <space> character. If this keyword is not specified, characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class. No character defined by the cntrl keyword can be specified. |
print | Defines printable characters, including the <space> character. If this keyword is not specified, the <space> character and characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class. No character defined by the cntrl keyword can be specified. |
xdigit | Defines hexadecimal digit characters. The digits 0-9 and the letters A-F and a-f can be specified. If this keyword is not specified, the class defaults to these characters. |
blank | Defines blank characters. If this keyword is not specified, the <space> and <horizontal-tab> characters are included in this character class. Any characters defined by this statement are automatically included in the space keyword class. |
charclass | Defines one or more locale-specific character class names as strings separated by semicolons. Each named character class can then be defined subsequently in the LC_CTYPE definition. A character class name consists of 1 to 32 bytes of alphanumeric characters from the portable character set. The first character of a character class name cannot be a digit. The name cannot match any of the LC_CTYPE keywords defined in this section. |
charclass-name | Defines characters to be classified as belonging to the named locale-specific character class. Locale-specific named character classes need not exist in the POSIX locale. If a class name is defined by a charclass keyword, but no characters are subsequently assigned to it, it represents a class without any characters belonging to it. The charclass-name can be used as the Property parameter in the wctype subroutine, in regular expressions and shell pattern-matching expressions, and by the tr command. |
toupper | Defines the mapping of lowercase characters to uppercase characters. Operands for this keyword consist of semicolon-separated character pairs. Each character pair is enclosed in ( ) (parentheses), and the two characters within a pair are separated by a , (comma). The first character in each pair is considered lowercase; the second character is considered uppercase. Only characters defined by the lower and upper keywords can be specified. |
tolower | Defines the mapping of uppercase characters to lowercase characters. Operands for this keyword consist of semicolon-separated character pairs. Each character pair is enclosed in ( ) (parentheses), and the two characters within a pair are separated by a , (comma). The first character in each pair is considered uppercase; the second character is considered lowercase. Only characters defined by the lower and upper keywords can be specified. |
The tolower keyword is optional. If this keyword is not specified, the mapping defaults to the reverse mapping of the toupper keyword, if specified. If the toupper and tolower keywords are both unspecified, the mapping for each defaults to that of the C locale.
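The reverse-mapping default can be pictured with a short Python sketch. It only models the rule stated above and is not how any locale compiler is implemented; the helper names `parse_pairs` and `default_tolower` are hypothetical, and the sketch assumes every uppercase character appears in at most one toupper pair, so that the reversal is unambiguous.

```python
import re


def parse_pairs(operand):
    """Parse '(<a>,<A>);(<b>,<B>)' into a dict mapping first item to second."""
    pairs = re.findall(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)", operand)
    return dict(pairs)


def default_tolower(toupper):
    """Reverse a toupper mapping to obtain the default tolower mapping."""
    # Assumes each uppercase character is the target of only one pair.
    return {upper: lower for lower, upper in toupper.items()}


toupper = parse_pairs("(<a>,<A>);(<b>,<B>);(<c>,<C>)")
tolower = default_tolower(toupper)
print(tolower["<A>"])  # <a>
```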
The LC_CTYPE category does not support multicharacter elements. For example, the German sharp-s character is traditionally classified as a lowercase letter. There is no corresponding uppercase letter; in proper capitalization of German text, the sharp-s character is replaced by the two characters ss. This kind of conversion is outside of the scope of the toupper and tolower keywords.
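For comparison only, the Python sketch below shows what a one-to-many case conversion looks like in an environment that does support it. Python's `str.upper()` applies full Unicode case mapping, which is unrelated to the LC_CTYPE tables; the dictionary `one_to_one` is a hypothetical stand-in for a table of character pairs such as the one that toupper defines.

```python
sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

# Full Unicode case mapping may change the length of the string.
upper = sharp_s.upper()
print(upper, len(upper))  # SS 2

# A table of single-character pairs has no slot for a two-character result,
# so a lookup leaves the sharp-s character unchanged.
one_to_one = {"a": "A", "b": "B"}  # illustrative subset only
print(one_to_one.get(sharp_s, sharp_s) == sharp_s)  # True
```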
The following is an example of a possible LC_CTYPE category listed in a locale definition source file:
LC_CTYPE
#"alpha" is by default "upper" and "lower"
#"alnum" is by default "alpha" and "digit"
#"print" is by default "alnum", "punct" and the space character
#"graph" is by default "alnum" and "punct"
#"tolower" is by default the reverse mapping of "toupper"
#
upper <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
<N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
#
lower <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
<n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
#
digit <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
<seven>;<eight>;<nine>
#
space <tab>;<newline>;<vertical-tab>;<form-feed>;\
<carriage-return>;<space>
#
cntrl <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
<form-feed>;<carriage-return>;<NUL>;<SOH>;<STX>;\
<ETX>;<EOT>;<ENQ>;<ACK>;<SO>;<SI>;<DLE>;<DC1>;<DC2>;\
<DC3>;<DC4>;<NAK>;<SYN>;<ETB>;<CAN>;<EM>;<SUB>;\
<ESC>;<IS4>;<IS3>;<IS2>;<IS1>;<DEL>
#
punct <exclamation-mark>;<quotation-mark>;<number-sign>;\
<dollar-sign>;<percent-sign>;<ampersand>;<asterisk>;\
<apostrophe>;<left-parenthesis>;<right-parenthesis>;\
<plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
<colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
<greater-than-sign>;<question-mark>;<commercial-at>;\
<left-square-bracket>;<backslash>;<circumflex>;\
<right-square-bracket>;<underline>;<grave-accent>;\
<left-curly-bracket>;<vertical-line>;<tilde>;\
<right-curly-bracket>
#
xdigit <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
<seven>;<eight>;<nine>;<A>;<B>;<C>;<D>;<E>;<F>;\
<a>;<b>;<c>;<d>;<e>;<f>
#
blank <space>;<tab>
#
toupper (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
(<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
(<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
(<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
(<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);\
(<z>,<Z>)
#
END LC_CTYPE
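The defaulting rules noted in the comments of this example can be sketched with Python sets. This is a model of the automatic inclusion described for the alpha, alnum, graph, and print keywords, not an implementation of any locale compiler; the function name `apply_defaults` and the dictionary layout are hypothetical.

```python
def apply_defaults(classes):
    """Return a copy of classes with automatically included members added."""
    result = {name: set(members) for name, members in classes.items()}

    def members(name):
        # A class that was never specified is treated as empty.
        return result.get(name, set())

    # alpha automatically includes upper and lower.
    result["alpha"] = members("alpha") | members("upper") | members("lower")
    # alnum automatically includes alpha and digit.
    result["alnum"] = members("alnum") | result["alpha"] | members("digit")
    # Characters shared by the graph and print defaults.
    visible = (result["alpha"] | members("digit")
               | members("xdigit") | members("punct"))
    # graph and print receive defaults only when they are not specified.
    if "graph" not in classes:
        result["graph"] = set(visible)
    if "print" not in classes:
        result["print"] = visible | {"<space>"}
    return result


classes = {
    "upper": {"<A>", "<B>"},
    "lower": {"<a>", "<b>"},
    "digit": {"<zero>", "<one>"},
    "punct": {"<comma>"},
}
full = apply_defaults(classes)
print(sorted(full["alpha"]))  # ['<A>', '<B>', '<a>', '<b>']
print("<space>" in full["print"], "<space>" in full["graph"])  # True False
```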
File | Description |
---|---|
/usr/lib/nls/loc/* | Specifies locale definition source files for supported locales. |
/usr/lib/nls/charmap/* | Specifies character set description (charmap) source files for supported locales. |