https://en.wikipedia.org/wiki/C_string_handling#s...
C string handling
From Wikipedia, the free encyclopedia
Contents
1 Definitions
2 Character encodings
3 Overview of functions
3.1 Constants and types
3.2 Functions
3.2.1 Multibyte functions
3.3 Numeric conversions
4 Popular extensions
5 Replacements
6 See also
7 Notes
8 References
9 External links
Definitions
A string is a contiguous sequence of code units terminated by the first zero code
(\0, corresponding to the null character). In C, there are two types of strings:
string, sometimes called byte string, which uses the type char as code
units (one char is at least 8 bits), and wide string,[1] which uses the type wchar_t as
code units.
A common misconception is that all char arrays are strings, because string literals
are converted to arrays during the compilation (or translation) phase.[2] It is
important to remember that a string ends at the first zero code unit. An array or
string literal that contains a zero before the last byte therefore contains a string,
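The distinction between a char array and the string it holds can be sketched in C as follows. This is a minimal illustration, assuming a hosted C99 compiler; the helper name count_strings is hypothetical and not part of the standard library. It counts how many zero-terminated strings are stored back to back in an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the strings stored back to back in
   buf[0..size). Each string ends at a zero code unit; bytes after the
   last zero do not form a string and are not counted. */
size_t count_strings(const char *buf, size_t size)
{
    assert(buf != NULL);
    size_t count = 0;
    size_t start = 0;
    while (start < size) {
        const char *nul = memchr(buf + start, '\0', size - start);
        if (nul == NULL)
            break;                       /* unterminated tail */
        count++;
        start = (size_t)(nul - buf) + 1; /* continue after the zero */
    }
    return count;
}
```

For the literal "ab\0cd", the array occupies six bytes, strlen reports 2 because it stops at the first zero, and count_strings finds two strings. An array such as {'a', 'b', 'c'} has no zero code unit and holds no string at all.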
Character encodings
Each string ends at the first occurrence of the zero code unit of the appropriate
kind (char or wchar_t). Consequently, a byte string can contain non-NUL characters
in ASCII or any ASCII extension, but not characters in encodings such as UTF-16
(even though a 16-bit code unit might be nonzero, its high or low byte might be
zero). The encodings that can be stored in wide strings are defined by the width of
wchar_t. In most implementations, wchar_t is at least 16 bits, and so all 16-bit
encodings, such as UCS-2, can be stored. If wchar_t is 32 bits, then 32-bit
encodings, such as UTF-32, can be stored.
Variable-width encodings can be used in both byte strings and wide strings. String
length and offsets are measured in bytes or wchar_t, not in "characters", which can
be confusing to beginning programmers. UTF-8 and Shift JIS are often used in C
byte strings, while UTF-16 is often used in C wide strings when wchar_t is 16 bits.
Truncating strings with variable length characters using functions like strncpy can
produce invalid sequences at the end of the string. This can be unsafe if the
truncated parts are interpreted by code that assumes the input is valid.
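The truncation hazard can be sketched as follows, assuming UTF-8 input that is itself valid. Both function names are hypothetical. byte_truncate cuts at a byte count in the way a strncpy-based copy does, while utf8_copy_trunc backs up to a code point boundary by skipping UTF-8 continuation bytes (those of the form 10xxxxxx).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-oriented truncation: may cut a multi-byte sequence in half. */
void byte_truncate(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

/* Copies at most cap-1 bytes of valid UTF-8 from src, always
   zero-terminates, and never splits a multi-byte sequence.
   Returns the number of bytes written, excluding the terminator. */
size_t utf8_copy_trunc(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    size_t len = strlen(src);
    size_t n = len < cap ? len : cap - 1;
    if (n < len) {
        /* If the first dropped byte is a continuation byte, the cut
           falls inside a sequence: back up to its lead byte. */
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With the five-byte input "caf\xC3\xA9" and a five-byte buffer, byte_truncate leaves a dangling 0xC3 lead byte at the end, whereas utf8_copy_trunc stores "caf".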
Support for Unicode literals such as char foo[512] = ""; (UTF-8) or wchar_t
foo[512] = L""; (UTF-16 or UTF-32) is implementation defined,[4] and may
require that the source code be in the same encoding. Some compilers or editors
will require entering all non-ASCII characters as \xNN sequences for each byte of
UTF-8, and/or \uNNNN for each word of UTF-16.
Overview of functions
Most of the functions that operate on C strings are declared in the string.h header
(cstring in C++), while functions that operate on C wide strings are declared in the
Notes
NULL
wchar_t
Type used for a code unit in wide strings, usually either 16 or 32 bits.
No specific interpretation is specified for these code units; the C
standard requires of a wchar_t only that it be at least as large as a char,
not that it can actually store Unicode code points or UTF-16 code
units.[5]
wint_t
Integer type that can hold any value of a wchar_t as well as the value of
the macro WEOF. This type is unchanged by integral promotions.
Usually a 32-bit signed value.
mbstate_t
Contains all the information about the conversion state required from
one call to a function to the other.
Functions
                     Byte string    Wide string    Description[note 1]
String manipulation  strcpy[6]      wcscpy[7]
                     strncpy[8]     wcsncpy[9]
                     strcat[10]     wcscat[11]
                     strncat[12]    wcsncat[13]
                     strxfrm[14]    wcsxfrm[15]
String examination   strlen[16]     wcslen[17]
                     strcmp[18]     wcscmp[19]
                     strncmp[20]    wcsncmp[21]
                     strcoll[22]    wcscoll[23]
                     strchr[24]     wcschr[25]
                     strrchr[26]    wcsrchr[27]
                     strspn[28]     wcsspn[29]
                     strcspn[30]    wcscspn[31]
                     strpbrk[32]    wcspbrk[33]
                     strstr[34]     wcsstr[35]
                     strtok[36]     wcstok[37]
Miscellaneous        strerror[38]   N/A
Memory manipulation  memset[39]     wmemset[40]
                     memcpy[41]     wmemcpy[42]
                     memmove[43]    wmemmove[44]
                     memcmp[45]     wmemcmp[46]
                     memchr[47]     wmemchr[48]
Multibyte functions
Name
Description
mblen[49]
mbtowc[50]
wctomb[51]
mbstowcs[52]
wcstombs[53]
btowc[54]
wctob[55]
mbsinit[56]
mbrlen[57]
mbrtowc[58]
wcrtomb[59]
mbsrtowcs[60]
wcsrtombs[61]
"state" is used by encodings that rely on history such as shift states. This is not
needed by UTF-8 or UTF-32. UTF-16 uses them to keep track of surrogate pairs
and to hide the fact that it actually is a multi-word encoding.
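How the conversion state is threaded through successive calls can be sketched with mbrtowc, one of the restartable functions listed above. This is a minimal illustration, not a complete decoder; the helper name mb_char_count is hypothetical, and the result depends on the locale in effect (the "C" locale at program start unless setlocale has been called).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Counts the characters of the multibyte string s under the current
   locale. Returns (size_t)-1 on an invalid or incomplete sequence. */
size_t mb_char_count(const char *s)
{
    assert(s != NULL);
    mbstate_t state;
    memset(&state, 0, sizeof state);  /* zeroed object = initial state */
    size_t remaining = strlen(s);
    size_t count = 0;
    while (remaining > 0) {
        wchar_t wc;
        /* The same state object is passed to every call so that any
           shift state carries over from one character to the next. */
        size_t used = mbrtowc(&wc, s, remaining, &state);
        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;
        if (used == 0)
            break;                    /* decoded a null character */
        s += used;
        remaining -= used;
        count++;
    }
    return count;
}
```

In the default "C" locale, a plain ASCII string such as "hello" yields 5. After a call to setlocale selecting a UTF-8 locale, a multi-byte character would count once even though it occupies several bytes.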
Numeric conversions
Byte string                                      Wide string                                      Description[note 1]
atof[62]                                         N/A
atoi, atol, atoll[63]                            N/A
strtof (C99)[64], strtod[65], strtold (C99)[66]  wcstof (C99)[67], wcstod[68], wcstold (C99)[69]
strtol, strtoll[70]                              wcstol, wcstoll[71]
strtoul, strtoull[72]                            wcstoul, wcstoull[73]
The C standard library contains several functions for numeric conversions. The
functions that deal with byte strings are defined in the stdlib.h header (cstdlib
header in C++). The functions that deal with wide strings are defined in the
wchar.h header (cwchar header in C++).
The strtoxxx functions are not const-correct, since they accept a const string pointer
and return a non-const pointer within the string. Also, since the Normative
Amendment 1 (C95), atoxx functions are considered subsumed by strtoxxx
functions, for which reason neither C95 nor any later standard provides
wide-character versions of these functions.[74]
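A typical use of strtol can be sketched as follows; the wrapper name parse_long is hypothetical. Note the non-const end pointer obtained from a const input, which is the const-correctness gap described above. Unlike atoi, strtol lets the caller detect empty input, trailing characters, and out-of-range values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical wrapper: succeeds only if all of text is a base-10
   integer that fits in a long. Leading whitespace is accepted, as
   strtol itself skips it. */
bool parse_long(const char *text, long *out)
{
    assert(text != NULL && out != NULL);
    char *end;                 /* non-const, although text is const */
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text)           /* no digits were consumed */
        return false;
    if (*end != '\0')          /* trailing characters after the number */
        return false;
    if (errno == ERANGE)       /* value out of range for long */
        return false;
    *out = value;
    return true;
}
```

A call such as parse_long("42", &v) stores 42 in v and returns true, while "42abc", the empty string, and a number too large for long are all rejected.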
Popular extensions
Name            Platform                          Description
memccpy[75]     SVID, POSIX
mempcpy[76]     GNU
strcasecmp[77]  POSIX, BSD                        case-insensitive versions of strcmp
strcat_s[78]    C (2011) and ISO/IEC WDTR 24731
strcpy_s[79]    C (2011) and ISO/IEC WDTR 24731
strdup[80]      POSIX
strerror_r[81]  POSIX 1, GNU
stricmp[82]     Various                           case-insensitive versions of strcmp
strlcpy[83]     BSD, Solaris
strlcat[83]     BSD, Solaris
strsignal[85]   POSIX:2008
strtok_r[86]    POSIX                             a variant of strtok that is thread-safe
Replacements
Despite the well-established need to replace strcat[10] and strcpy[6] with functions
that do not allow buffer overflows, no accepted standard has arisen. This is partly
due to the mistaken belief by many C programmers that strncat and strncpy have
the desired behavior; however, neither function was designed for this (they were
intended to manipulate null-padded fixed-size string buffers, a data format less
commonly used in modern software), and their behavior and arguments are
non-intuitive and often used incorrectly even by expert programmers.[84]
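The fixed-size field behavior of strncpy can be seen in a short sketch; the helper names fill_field and is_terminated are hypothetical. When the source is at least as long as the field, no terminating zero is written; when it is shorter, the rest of the field is padded with zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..size) contains a zero code unit, else 0. */
int is_terminated(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}

/* Fills a fixed-size, null-padded field. The result is a string only
   if src is shorter than the field. */
void fill_field(char *field, size_t size, const char *src)
{
    assert(field != NULL && src != NULL);
    strncpy(field, src, size); /* pads with zeros, may omit terminator */
}
```

Filling a four-byte field from "abcdef" stores the bytes a, b, c, d with no terminator, so passing the field to strlen or printf("%s") would read past its end. Filling it from "a" stores one letter followed by three zero bytes.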
The most popular[a] replacements are the strlcat and strlcpy functions, which
appeared in OpenBSD 2.4 in December 1998.[84] These functions always write
one NUL to the destination buffer, truncating the result if necessary, and return
the length of the string they tried to create (one less than the buffer size that
would be needed), which allows detection of the truncation
and provides a size for creating a new buffer that will not truncate. They have
been criticized on the basis of allegedly being inefficient,[87] encouraging the use
of C strings, and creating more problems than they were initially trying to solve.[88][89]
Consequently, they have not been included in the GNU C library (used by software
on Linux), although they are implemented in OpenBSD, FreeBSD, NetBSD, Solaris,
OS X, and QNX. The lack of GNU C library support has not stopped various library
authors from using them and bundling a replacement, among others SDL, GLib,
ffmpeg, rsync, and even the Linux kernel internally. Open-source
implementations of these functions are available.[90][91]
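The copy-and-truncate contract described above can be sketched in portable C. This is an illustrative stand-in under the stated behavior, not the OpenBSD source; the name sketch_strlcpy is hypothetical and is used to avoid clashing with a platform's own strlcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst, which has room for size bytes, truncating if
   necessary and zero-terminating whenever size > 0. Returns
   strlen(src), so a result >= size means the copy was truncated. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    assert(src != NULL && (dst != NULL || size == 0));
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A caller can test for truncation with a single comparison, for example if (sketch_strlcpy(buf, input, sizeof buf) >= sizeof buf), and can allocate the returned length plus one byte to hold the full string.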
As part of its 2004 Security Development Lifecycle, Microsoft introduced a family
of "secure" functions, such as strcpy_s and strcat_s (along with many others);[92]
these functions were later standardized with some minor changes, and are now
part of the optional C11 (Annex K) as proposed by ISO/IEC WDTR 24731. These
functions perform runtime integrity checks of their arguments; if the checks fail, a
user-specified "runtime-constraint handler" function is called.[93] If the user has
not specified such a function, the default behavior is implementation-defined.[94]
Microsoft's C runtime will abort the program when the constraints are
violated.[95] Some functions perform destructive operations before calling the
runtime-constraint handler; for example, strcat_s sets the destination to the empty
string,[96] which can make it difficult to recover from error conditions or debug
them. These functions attracted considerable criticism because initially they were
implemented only on Windows, and at the same time Microsoft Visual C++ started to
produce warning messages advising programmers to use these
functions instead of the standard ones. Some have speculated that this was an
attempt by Microsoft to lock developers into its platform.[97] Although
open-source implementations of these functions are available,[98] these functions
are not present in common Unix C libraries. Experience with these functions
has shown significant problems with their adoption, and the removal of the
optional C11 (Annex K) has been proposed for the next revision of the standard.[99]
If the string length is known, then memcpy[41] or memmove[43] can be more efficient
than strcpy, so some programs use them to optimize C string manipulation. They
accept a buffer length as a parameter, so they can be employed to prevent buffer
overflows in a manner similar to the aforementioned functions.
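A length-tracking append built on memcpy might look like the following sketch. The function name append_known and its parameter layout are assumptions made for illustration; the point is that when both lengths are already known, the bounds check is a comparison and the copy does not need to scan for a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends src_len bytes of src to the string in dst, whose current
   length is *dst_len and whose buffer holds cap bytes. Returns 1 on
   success; returns 0 and leaves dst untouched if the result, with
   its terminator, would not fit. */
int append_known(char *dst, size_t cap, size_t *dst_len,
                 const char *src, size_t src_len)
{
    assert(dst != NULL && dst_len != NULL && src != NULL);
    if (*dst_len >= cap || src_len >= cap - *dst_len)
        return 0;                       /* would overflow the buffer */
    memcpy(dst + *dst_len, src, src_len);
    *dst_len += src_len;
    dst[*dst_len] = '\0';
    return 1;
}
```

Because the caller keeps the running length, repeated appends avoid the rescanning that strcat performs on every call.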
See also
C syntax, Strings: source code syntax, including backslash escape
sequences
String functions
Null-terminated string
Notes
a. On GitHub, there are 7,813,206 uses of strlcpy, ... strcpy_s (and
15,286,150 uses of strcpy).
References
1. "The C99 standard draft + TC3" (PDF). 7.1.1p1. Retrieved 7 January 2011.
2. "The C99 standard draft + TC3" (PDF). 6.4.5p7. Retrieved 7 January 2011.
3. "The C99 standard draft + TC3" (PDF). Section 6.4.5 footnote 66. Retrieved 7 January
2011.
4. "The C99 standard draft + TC3" (PDF). 5.1.1.2 Translation phases, p1. Retrieved
23 December 2011.
5. Gillam, Richard (2003). Unicode Demystified: A Practical Programmer's Guide to the
Encoding Standard. Addison-Wesley Professional. p. 714.
6. "strcpy - cppreference.com". En.cppreference.com. 2014-01-02. Retrieved
2014-03-06.
7. "wcscpy - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
8. "strncpy - cppreference.com". En.cppreference.com. 2013-10-04. Retrieved
2014-03-06.
9. "wcsncpy - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
10. "strcat - cppreference.com". En.cppreference.com. 2013-10-08. Retrieved
2014-03-06.
11. "wcscat - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
12. "strncat - cppreference.com". En.cppreference.com. 2013-07-01. Retrieved
2014-03-06.
13. "wcsncat - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
14. "strxfrm - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
15. "wcsxfrm - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
16. "strlen - cppreference.com". En.cppreference.com. 2013-12-27. Retrieved
2014-03-06.
17. "wcslen - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
18. "strcmp - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
19. "wcscmp - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
20. "strncmp - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
21. "wcsncmp - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
22. "strcoll - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
23. "wcscoll - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
24. "strchr - cppreference.com". En.cppreference.com. 2014-02-23. Retrieved
2014-03-06.
25. "wcschr - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
26. "strrchr - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
27. "wcsrchr - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
28. "strspn - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
29. "wcsspn - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
30. "strcspn - cppreference.com". En.cppreference.com. 2013-05-31. Retrieved
2014-03-06.
31. "wcscspn - cppreference.com". En.cppreference.com. Retrieved 2014-03-06.
External links
Fast memcpy in C
(http://www.danielvik.com/2010/02/fastmemcpy-in-c.html), multiple C coding
examples to target different types of CPU instruction architectures
The Wikibook C Programming has a page on the topic of: C Programming/Strings