CTYPE(3) | Library Functions Manual | CTYPE(3) |
ctype - character classification and mapping functions
#include <ctype.h>
int isalpha(int c);
int isupper(int c);
int islower(int c);
int isdigit(int c);
int isxdigit(int c);
int isalnum(int c);
int isspace(int c);
int ispunct(int c);
int isprint(int c);
int isgraph(int c);
int iscntrl(int c);
int isblank(int c);
int toupper(int c);
int tolower(int c);
See the specific manual pages for information about the test or conversion performed by each function.
For example, the following code prints an upper-case version of a string to standard output:

    const char *s = "xyz";

    while (*s != '\0') {
        putchar(toupper((unsigned char)*s));
        s++;
    }
These functions, with the exception of isblank(), conform to ANSI X3.159-1989 (“ANSI C89”). All described functions, including isblank(), also conform to IEEE Std 1003.1-2001 (“POSIX.1”).
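The argument of each of these functions is of type int, but it must either be the value of the macro EOF (which has a negative value), or must be a non-negative value within the range representable as unsigned char. Passing invalid values leads to undefined behavior.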
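The range rule above can be written down as a small predicate. This is only a sketch: ctype_arg_is_valid is a hypothetical helper introduced for illustration and is not part of <ctype.h>.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of <ctype.h>): returns 1 if c may be
 * passed to the ctype functions, that is, if c is EOF or lies within
 * the range of unsigned char, and 0 otherwise.
 */
int
ctype_arg_is_valid(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}
```

Such a check is mainly useful in assertions or debugging code; the usual fix for an invalid argument is a conversion at the call site rather than a runtime test.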
Values of type int that were returned by getc(3), fgetc(3), and similar functions or macros are already in the correct range, and may be safely passed to these ctype functions without any casts.
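As an illustration of the no-cast case, the following sketch counts the decimal digits in a stream. The function name count_digits is hypothetical; the point is only that the result of getc(3) is kept in an int and passed on unchanged.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical example function: counts the decimal digits that can
 * be read from fp.  The result of getc() is stored in an int, so it
 * is either EOF or an unsigned char value, and it can be passed to
 * isdigit() without a cast.
 */
long
count_digits(FILE *fp)
{
    long n = 0;
    int c;      /* int, not char: it must be able to hold EOF */

    while ((c = getc(fp)) != EOF) {
        if (isdigit(c))
            n++;
    }
    return n;
}
```

Storing the result of getc(3) in a char before the test would reintroduce the problem, because the value would then need a conversion to unsigned char again.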
Values of type char or signed char must first be cast to unsigned char, to ensure that the values are within the correct range. Casting a negative-valued char or signed char directly to int will produce a negative-valued int, which will be outside the range of allowed values (unless it happens to be equal to EOF, but even that would not give the desired result).
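The conversion can be isolated in one place, as in the sketch below. Both ctype_arg and count_spaces are hypothetical names chosen for this example, and the numeric values mentioned afterwards assume 8-bit bytes.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: converts a char to a value that is valid as a
 * ctype argument.  The result is always within the range of unsigned
 * char, whether or not plain char is signed on the platform.
 */
int
ctype_arg(char ch)
{
    return (unsigned char)ch;
}

/* Counts the white-space characters in the NUL-terminated string s. */
size_t
count_spaces(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (isspace(ctype_arg(*s)))
            n++;
    }
    return n;
}
```

With 8-bit bytes, ctype_arg('\270') yields 184 on every platform, whereas a direct conversion to int yields -72 where char is signed.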
Because such bugs may manifest as silent misbehavior or as crashes only when fed input outside the US-ASCII range, the NetBSD implementation of the ctype functions is designed to elicit a compiler warning for code that passes inputs of type char, in order to flag code that may pass negative values at runtime that would lead to undefined behavior:
    #include <ctype.h>
    #include <locale.h>
    #include <stdio.h>

    int
    main(int argc, char **argv)
    {

        if (argc < 2)
            return 1;
        setlocale(LC_ALL, "");
        printf("%d %d\n", *argv[1], isprint(*argv[1]));
        printf("%d %d\n", (int)(unsigned char)*argv[1],
            isprint((unsigned char)*argv[1]));
        return 0;
    }
When compiling this program, GCC reports a warning for the line that passes char. At runtime, you may get nonsense answers for some inputs without the cast — if you're lucky and it doesn't crash:
    % gcc -Wall -o test test.c
    test.c: In function 'main':
    test.c:12:2: warning: array subscript has type 'char'
    % LC_CTYPE=C ./test $(printf '\270')
    -72 5
    184 0
    % LC_CTYPE=C ./test $(printf '\377')
    -1 0
    255 0
    % LC_CTYPE=fr_FR.ISO8859-1 ./test $(printf '\377')
    -1 0
    255 2
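The wording of the warning suggests that the argument ends up as an array subscript. The sketch below shows one way a macro can be written so that a char argument should trigger GCC's -Wchar-subscripts check, which -Wall enables. It is not the NetBSD source: my_ctype_tab, MY_DIGIT, and my_isdigit are hypothetical names, and the table layout assumes that EOF is -1 and that bytes have 8 bits.

```c
#include <stdio.h>

#define MY_DIGIT 0x01   /* hypothetical classification bit */

/* Assumption of this sketch: EOF is -1 (compilation fails otherwise). */
typedef char my_eof_is_minus_one[(EOF == -1) ? 1 : -1];

/*
 * Hypothetical table: entry 0 stands for EOF, entries 1 to 256 stand
 * for the unsigned char values 0 to 255.  Entries that are not listed
 * are zero-initialized.
 */
static const unsigned char my_ctype_tab[1 + 256] = {
    [1 + '0'] = MY_DIGIT, [1 + '1'] = MY_DIGIT, [1 + '2'] = MY_DIGIT,
    [1 + '3'] = MY_DIGIT, [1 + '4'] = MY_DIGIT, [1 + '5'] = MY_DIGIT,
    [1 + '6'] = MY_DIGIT, [1 + '7'] = MY_DIGIT, [1 + '8'] = MY_DIGIT,
    [1 + '9'] = MY_DIGIT
};

/*
 * The argument is used directly as the subscript, so the compiler
 * sees its type: an argument of type char is reported, while an int
 * or an unsigned char argument is not.
 */
#define my_isdigit(c)   ((int)((my_ctype_tab + 1)[(c)] & MY_DIGIT))
```

In this sketch a negative argument other than EOF would read outside the table, which is one concrete form the undefined behavior can take.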
Some implementations of libc, such as glibc as of 2018, attempt to avoid the worst of the undefined behavior by defining the functions to work for all integer inputs representable by either unsigned char or char, and suppress the warning. However, this is not an excuse for avoiding conversion to unsigned char: if EOF coincides with any such value, as it does when it is -1 on platforms with signed char, programs that pass char will still necessarily confuse the classification and mapping of EOF with the classification and mapping of some non-EOF inputs.
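The collision with EOF can be made concrete with two small predicates. This is a sketch for illustration only; the function names are hypothetical, and signed char is used explicitly to model a platform where plain char is signed.

```c
#include <stdio.h>

/*
 * Returns 1 if ch, passed on without a cast, arrives as the same int
 * value as EOF and therefore cannot be told apart from it.
 */
int
looks_like_eof_uncast(signed char ch)
{
    return (int)ch == EOF;
}

/*
 * Returns 1 if ch still equals EOF after conversion to unsigned char.
 * Because EOF is negative and the converted value is not, this never
 * happens.
 */
int
looks_like_eof_cast(signed char ch)
{
    return (int)(unsigned char)ch == EOF;
}
```

Where EOF is -1, the byte written as '\377' in the session above is the value for which looks_like_eof_uncast returns 1; after the conversion it becomes 255 and stays distinct from EOF.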
January 15, 2019 | NetBSD 9.0 |