
14.1 Different Character Encodings

One area internationalization addresses is how to handle different character encodings. This issue arises mainly in Asia, where different encodings are used to represent the same character set. The issue normally comes in conjunction with character encodings that use more than 8 bits. To process such characters, it is necessary to use new concepts and functions for text processing.

14.1.1
Wide-Character and Multibyte Text

Two different approaches are common to address character sets that have more than 256 characters: multibyte representation and wide-character representation:

  • With multibyte representation, the number of bytes used for a character is variable. A 1-byte character, such as an ISO Latin-1 character, can be followed by a 3-byte character, such as a Japanese ideogram.

  • With wide-character representation, the number of bytes used to represent a character is always the same, independent of the character being represented. Typical representations use 2 or 4 bytes. Conceptually, this does not differ from representations that use just 1 byte for locales where ISO Latin-1 or even ASCII is sufficient.

        </li>
    
    
    <p>This multibyte representation is more compact than the wide-character representation. Thus, the multibyte representation is normally used to store data outside of programs. Conversely, it is much easier to process characters of fixed size, so the wide-character representation is usually used inside programs.</p>
    
    <p>Like ISO C, ISO C++ uses the type wchar_t to represent wide characters. However in C++, wchar_t is a keyword rather than a type definition. Thus, it is possible to overload all functions with this type.</P>
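<p>As a minimal sketch of what this makes possible, the following two overloads differ only in the character type. The function name describe() is hypothetical and serves only as an illustration:</p>

```cpp
#include <string>

// Because wchar_t is a distinct built-in type in C++ (not a typedef
// for some integral type), these two overloads do not collide.
// describe() is a hypothetical name used only for this sketch.
inline std::string describe(char)
{
    return "narrow character";
}

inline std::string describe(wchar_t)
{
    return "wide character";
}
```

<p>A call such as describe('a') selects the first overload, whereas describe(L'a') selects the second.</p>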
    
    <P>In a multibyte string, the same byte may represent a character or even just a part of the character. During iteration through a multibyte string, each byte is interpreted according to a current "shift state." Depending on the value of the byte and the current shift state, a byte may represent a certain character or a change of the current shift state. A multibyte string always starts in some defined initial shift state. For example, in the initial shift state the bytes may represent ISO Latin-1 characters until an escape character is encountered. The character following the escape character identifies the new shift state. For example, that character may switch to a shift state in which the bytes are interpreted as Arabic characters until the next escape character is encountered.</P>
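<p>The following sketch illustrates the idea of iterating through a multibyte string while carrying a shift state along. Note that it does not use codecvt&lt;> but the C library function mbrtowc() from &lt;cwchar>, which interprets the bytes according to the current global C locale. The helper name countMultibyteChars() is an assumption made for this example:</p>

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Counts the characters (not the bytes) of the multibyte string s.
// Returns -1 if an invalid or incomplete byte sequence is found.
// countMultibyteChars() is a hypothetical helper for this sketch.
inline long countMultibyteChars(const char* s)
{
    std::mbstate_t state = std::mbstate_t();   // initial shift state
    std::size_t remaining = std::strlen(s);
    long count = 0;

    while (remaining > 0) {
        wchar_t wc;
        // convert the next character; state is updated as a side effect
        std::size_t n = std::mbrtowc(&wc, s, remaining, &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2)) {
            return -1;                         // invalid or incomplete sequence
        }
        if (n == 0) {
            break;                             // null character reached
        }
        s += n;
        remaining -= n;
        ++count;
    }
    return count;
}
```

<p>In the default "C" locale, every byte is typically a character of its own, so the count equals the number of bytes. With a multibyte locale installed via setlocale(), the two numbers may differ.</p>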
    
    <P>The class template codecvt&lt;> (described in Section 14.4.4) is used to convert between different character encodings. This class is used mainly by the class basic_filebuf&lt;> (see page 627) to convert between internal and external representations. The C++ standard actually makes no assumptions about multibyte character encodings, but it supports the notion of shift states. The members of the codecvt&lt;> class support an argument that may be used to store an arbitrary state of a string. They also support a function intended to determine the character sequence used to return to the initial shift state.</p>
    

    14.1.2
    Character Traits

    The different representations of character sets imply variations that are relevant for the processing of strings and I/O. For example, the value used to represent "end-of-file" or the details of comparing characters may differ for representations.

    <p>The string and stream classes are intended to be instantiated with built-in types, especially with char and wchar_t. The interface of built-in types cannot be changed. Thus, the details on how to deal with aspects that depend on the representation are factored into a separate class, a so-called "traits class." Both the string and stream classes take a traits class as a template argument. This argument defaults to the class char_traits, parameterized with the template argument that defines the character type of the string or stream:</P>
    
    <Pre>
        
    

    namespace std {
    template<class charT,
    class traits = char_traits<charT>,
    class Allocator = allocator<charT> >
    class basic_string;
    }
    namespace std {
    template <class charT,
    class traits = char_traits<charT> >
    class basic_istream;
    template <class charT,
    class traits = char_traits<charT> >
    class basic_ostream;
    ...
    }

    </prE>
    
    <P>The character traits have type char_traits&lt;>. This type is defined in &lt;string> and is parameterized for the specific character type:</p>
    
    <pre>
        
    

    namespace std {
    template <class charT>
    struct char_traits {
    ...
    };
    }

    </pre>
    
    <p>The traits classes define all fundamental properties of the character type and the corresponding operations necessary for the implementation of strings and streams as static components. Table 14.1 lists the members of char_traits.</P>
    
    <P>The functions that process strings or character sequences are present for optimization only. They could also be implemented by using the functions that process single characters. For example, copy() can be implemented using assign(). However, there might be more efficient implementations when dealing with strings.</p>
    
    <P>Note that counts used in the functions are exact counts, not maximum counts. That is, string termination characters within these sequences are ignored.</P>
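<p>To illustrate both points, here is a sketch of how a sequence function could be expressed in terms of the single-character functions. The name copyByAssign() is hypothetical; real implementations are free to use something faster:</p>

```cpp
#include <cstddef>
#include <string>

// Copies exactly n characters from src to dest by calling the
// single-character assign() of the traits class for each element.
// The count is exact: embedded termination characters are copied too.
// copyByAssign() is a hypothetical helper for this sketch.
template <class Traits>
typename Traits::char_type*
copyByAssign(typename Traits::char_type* dest,
             const typename Traits::char_type* src,
             std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        Traits::assign(dest[i], src[i]);
    }
    return dest;
}
```

<p>Called as copyByAssign&lt;std::char_traits&lt;char> >(dest, src, n), the function has the same effect as char_traits&lt;char>::copy() for ranges that do not overlap.</p>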
    
    <P>The last group of functions cares about the special processing of the character that represents end-of-file (EOF). This character extends the character set by an artificial character to indicate special processing. For some representations, the character type may be insufficient to accommodate this special character because it has to have a value that differs from the values of all "normal" characters of the character set. C established the convention to return a character as int instead of as char from functions reading characters. This technique was extended in C++. The character traits define char_type as the type to represent all characters, and int_type as the type to represent all characters plus EOF. The functions to_char_type(), to_int_type(), not_eof(), and eq_int_type() define the corresponding conversions and comparisons. It is possible that char_type and int_type are identical for some character traits. This can be the case if not all values of char_type are necessary to represent characters so that there is a spare value that can be used for end-of-file.</P>
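<p>The relationship between char_type, int_type, and EOF can be checked with a few lines. The following sketch uses only members of std::char_traits&lt;char>; the names Traits, roundTrips(), and notEofNeverYieldsEof() are assumptions made for this example:</p>

```cpp
#include <string>

typedef std::char_traits<char> Traits;   // shorthand for this sketch

// Returns true if c survives the conversion to int_type and back
// and if its int_type representation differs from end-of-file.
inline bool roundTrips(char c)
{
    Traits::int_type i = Traits::to_int_type(c);
    return !Traits::eq_int_type(i, Traits::eof())
           && Traits::eq(Traits::to_char_type(i), c);
}

// Returns true if not_eof() maps the EOF value to something else.
inline bool notEofNeverYieldsEof()
{
    return !Traits::eq_int_type(Traits::not_eof(Traits::eof()),
                                Traits::eof());
}
```

<p>For char_traits&lt;char>, roundTrips() should hold for every value of char, including '\xff', because to_int_type() converts via unsigned char before widening to int.</p>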
    
    <P>pos_type and off_type are used to define file positions and offsets (see page 634 for details).</p>
    
    <table width="100%">
    <caption>Table 14.1. Character Traits Members</caption>
    <tr><th>Expression</th><th>Meaning</th></tr>
    <tr><td>char_type</td><td>The character type (that is, the template argument for char_traits)</td></tr>
    <tr><td>int_type</td><td>A type large enough to represent an additional, otherwise unused value for end-of-file</td></tr>
    <tr><td>pos_type</td><td>A type used to represent positions in streams</td></tr>
    <tr><td>off_type</td><td>A type used to represent offsets between positions in streams</td></tr>
    <tr><td>state_type</td><td>A type used to represent the current state in multibyte streams</td></tr>
    <tr><td>assign(c1,c2)</td><td>Assigns character c2 to c1</td></tr>
    <tr><td>eq(c1,c2)</td><td>Returns whether the characters c1 and c2 are equal</td></tr>
    <tr><td>lt(c1,c2)</td><td>Returns whether character c1 is less than character c2</td></tr>
    <tr><td>length(s)</td><td>Returns the length of the string s</td></tr>
    <tr><td>compare(s1,s2,n)</td><td>Compares up to n characters of strings s1 and s2</td></tr>
    <tr><td>copy(s1,s2,n)</td><td>Copies n characters of string s2 to string s1</td></tr>
    <tr><td>move(s1,s2,n)</td><td>Copies n characters of string s2 to string s1, where s1 and s2 may overlap</td></tr>
    <tr><td>assign(s,n,c)</td><td>Assigns the character c to n characters of string s</td></tr>
    <tr><td>find(s,n,c)</td><td>Returns a pointer to the first character in string s that is equal to c, or returns zero if there is no such character among the first n characters</td></tr>
    <tr><td>eof()</td><td>Returns the value of end-of-file</td></tr>
    <tr><td>to_int_type(c)</td><td>Converts the character c into the corresponding representation as int_type</td></tr>
    <tr><td>to_char_type(i)</td><td>Converts the representation i as int_type to a character (the result of converting EOF is undefined)</td></tr>
    <tr><td>not_eof(i)</td><td>Returns the value i unless i is the value for EOF; in this case an implementation-dependent value different from EOF is returned</td></tr>
    <tr><td>eq_int_type(i1,i2)</td><td>Tests the equality of the two characters i1 and i2 represented as int_type (that is, the characters may be EOF)</td></tr>
    </table>

    <p>The C++ standard library provides specializations of char_traits&lt;> for types char and wchar_t:</p>
    
    <PRE>
        
    

    namespace std {
    template<> struct char_traits<char>;
    template<> struct char_traits<wchar_t>;
    }

    </PRE>
    
    <p>The specialization for char is usually implemented by using the global string functions of C that are defined in &lt;cstring> or &lt;string.h>. An implementation might look as follows:</p>
    
    <PRE>
        
    

    template<> struct char_traits<char> {
        // type definitions:
        typedef char      char_type;
        typedef int       int_type;
        typedef streampos pos_type;
        typedef streamoff off_type;
        typedef mbstate_t state_type;

        // functions:
        static void assign(char& c1, const char& c2) {
            c1 = c2;
        }
        static bool eq(const char& c1, const char& c2) {
            return c1 == c2;
        }
        static bool lt(const char& c1, const char& c2) {
            return c1 < c2;
        }
        static size_t length(const char* s) {
            return strlen(s);
        }
        static int compare(const char* s1, const char* s2, size_t n) {
            return memcmp(s1,s2,n);
        }
        static char* copy(char* s1, const char* s2, size_t n) {
            return (char*)memcpy(s1,s2,n);
        }
        static char* move(char* s1, const char* s2, size_t n) {
            return (char*)memmove(s1,s2,n);
        }
        static char* assign(char* s, size_t n, char c) {
            return (char*)memset(s,c,n);
        }
        static const char* find(const char* s, size_t n,
                                const char& c) {
            return (const char*)memchr(s,c,n);
        }
        static int eof() {
            return EOF;
        }
        static int to_int_type(const char& c) {
            return (int)(unsigned char)c;
        }
        static char to_char_type(const int& i) {
            return (char)i;
        }
        static int not_eof(const int& i) {
            return i!=EOF ? i : !EOF;
        }
        static bool eq_int_type(const int& i1, const int& i2) {
            return i1 == i2;
        }
    };

    </pre>
    
    <p>See Section 11.2.14, for the implementation of a user-defined traits class that lets strings behave in a case-insensitive manner.</p>
    

    14.1.3
    Internationalization of Special Characters

    One issue in conjunction with character encodings remains: How are special characters such as the newline or the string termination character internationalized? The class basic_ios has members widen() and narrow() that can be used for this purpose. Thus, the newline character in an encoding appropriate for the stream strm can be written as follows:

    <PRE>
        
    

    strm.widen('\n')    // internationalized newline character

    </prE>
    
    <P>The string termination character in the same encoding can be created like this:</P>
    
    <Pre>
        
    

    strm.widen('\0')    // internationalized string termination character

    </PRE>
    
    <P>See the implementation of the endl manipulator on page 613 for an example use.</p>
    
    <p>The functions widen() and narrow() actually use a locale object, more precisely the ctype facet of this object. This facet can be used to convert all characters between char and some other character representations. It is described in Section 14.4.4. For example, the following expression converts the character c of type char into an object of type char_type by using the locale object loc:[2]</p>


    [2] Note that you have to put a space between the two ">" characters. ">>" would be parsed as shift operator, which would result in a syntax error.

    <prE>
        
    

    std::use_facet<std::ctype<char_type> >(loc).widen(c)

    </PRE>
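<p>Both ways of widening a character can be wrapped in small helpers, as in the following sketch for the character type wchar_t. The function names widenWithStream() and widenWithFacet() are hypothetical:</p>

```cpp
#include <ios>
#include <locale>
#include <sstream>

// Widens c by using the locale that is stored in the stream.
// widenWithStream() is a hypothetical helper for this sketch.
inline wchar_t widenWithStream(const std::wios& strm, char c)
{
    return strm.widen(c);
}

// Widens c by asking the ctype facet of loc directly.
// widenWithFacet() is a hypothetical helper for this sketch.
inline wchar_t widenWithFacet(const std::locale& loc, char c)
{
    return std::use_facet<std::ctype<wchar_t> >(loc).widen(c);
}
```

<p>For a stream that still uses the classic locale, both helpers are expected to map '\n' to L'\n'.</p>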
    
    <P>The details of the use of locales and their facets are described in the following sections.</P>
    
    
        
    
    14.2 The Concept of Locales

    A common approach to internationalization is to use environments, called locales, to encapsulate national or cultural conventions. The C community uses this approach. Thus, in the context of internationalization, a locale is a collection of parameters and functions used to support national or cultural conventions. According to X/Open conventions,[3] the environment variable LANG is used to define the locale to be used. Depending on this locale, different formats for floating-point numbers, dates, monetary values, and so on are used.

    [3] POSIX and X/Open are standards for operating system interfaces.

    The format of the string defining a locale is normally this:

    language [_area [.code]]

    language represents the language, such as English or German. area is the area, country, or culture where this language is used. It is used, for example, to support different national conventions even if the same language is used in different nations. code defines the character encoding to be used. This is mainly important in Asia, where different character encodings are used to represent the same character set.
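    To make the format concrete, the following sketch splits a locale name into its three parts. It assumes that a name really follows the pattern language[_area[.code]], which is not guaranteed on every system. The type LocaleNameParts and the function splitLocaleName() are hypothetical:

```cpp
#include <string>

// The three optional parts of a locale name (hypothetical type).
struct LocaleNameParts {
    std::string language;
    std::string area;
    std::string code;
};

// Splits a name of the form language[_area[.code]].
// splitLocaleName() is a hypothetical helper for this sketch.
inline LocaleNameParts splitLocaleName(const std::string& name)
{
    LocaleNameParts parts;

    // without an underscore, the whole name is the language
    std::string::size_type us = name.find('_');
    if (us == std::string::npos) {
        parts.language = name;
        return parts;
    }
    parts.language = name.substr(0, us);

    // the code part is only present after an area
    std::string::size_type dot = name.find('.', us + 1);
    if (dot == std::string::npos) {
        parts.area = name.substr(us + 1);
    }
    else {
        parts.area = name.substr(us + 1, dot - us - 1);
        parts.code = name.substr(dot + 1);
    }
    return parts;
}
```

    For example, the name ja_JP.EUC is split into the language ja, the area JP, and the code EUC.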

    Table 14.2 presents a selection of typical language strings. However, note that these strings are not yet standardized. For example, sometimes the first character of language is capitalized. Some implementations deviate from the format mentioned previously and, for example, use english to select an English locale. All in all, the locales that are supported by a system are implementation specific.

    For programs, it is normally no problem that these names are not standardized! This is because the locale information is provided by the user in some form. It is common that programs simply read environment variables or some similar database to determine which locales to use. Thus, the burden of finding the correct locale names is put on the users. Only if the program always uses a special locale does the name need to be hard coded in the program. Normally, for this case, the C locale is sufficient, and is guaranteed to be supported by all implementations and to have the name C.

    The next section presents the use of different locales in C++ programs. In particular, it introduces facets of locales that are used to deal with specific formatting details.

    C also provides an approach to handle the problem of character sets with more than 256 characters. This approach is to use the character type wchar_t, a type definition for one of the integral types with language support for wide-character constants and wide-character string literals. However, apart from this, only functions to convert between wide characters and narrow characters are supported. This approach was also incorporated into C++ with the character type wchar_t, which is, unlike the C approach, a distinct type in C++. However, C++ provides more library support than C, because basically everything available for char is also available for wchar_t, and any other type may be used as a character type.

    Table 14.2. Selection of Locale Names
    Locale Meaning
    C Default: ANSI-C conventions (English, 7 bit)
    de_DE German in Germany
    de_DE.88591 German in Germany with ISO Latin-1 encoding
    de_AT German in Austria
    de_CH German in Switzerland
    en_US English in the United States
    en_GB English in Great Britain
    en_AU English in Australia
    en_CA English in Canada
    fr_FR French in France
    fr_CH French in Switzerland
    fr_CA French in Canada
    ja_JP.jis Japanese in Japan with Japanese Industrial Standard (JIS) encoding
    ja_JP.sjis Japanese in Japan with Shift JIS encoding
    ja_JP.ujis Japanese in Japan with UNIXized JIS encoding
    ja_JP.EUC Japanese in Japan with Extended UNIX Code encoding
    ko_KR Korean in Korea
    zh_CN Chinese in China
    zh_TW Chinese in Taiwan
    lt_LN.bit7 ISO Latin, 7 bit
    lt_LN.bit8 ISO Latin, 8 bit
    POSIX POSIX conventions (English, 7 bit)

    14.2.1
    Using Locales

    Using translations of textual messages is normally not sufficient for true internationalization. For example, different conventions for numeric, monetary, or date formatting also have to be used. In addition, functions manipulating letters should depend on character encoding to ensure the correct handling of all characters that are letters in a given language.

    <P>According to the POSIX and X/Open standards, it is already possible in C programs to set a locale. This is done using the function setlocale(). Changing the locale influences the results of character classification and manipulation functions, such as isupper() and toupper(), and the I/O functions, such as printf().</P>
    
    <P>However, the C approach has several limitations. Because the locale is a global property, using more than one locale at the same time (for example, when reading floating-point numbers in English and writing them in German) is either not possible or is possible only with a relatively large effort. Also, locales cannot be extended. They provide only the facilities the implementation chooses to provide. If something the C locales do not provide must also be adapted to national conventions, a different mechanism has to be used to do this. Finally, it is not possible to define new locales to support special cultural conventions.</P>
    
    <p>The C++ standard library addresses all of these problems with an object-oriented approach. First, the details of a locale are encapsulated in an object of type locale. Doing this immediately provides the possibility of using multiple locales at the same time. Operations that depend on locales are configured to use a corresponding locale object. For example, a locale object can be installed for each I/O stream, which is then used by the different member functions to adapt to the corresponding conventions. This is demonstrated by the following example:</p>
    
    <pre>
        
    

    // i18n/loc1.cpp

    #include <iostream>
    #include <locale>
    using namespace std;

    int main()
    {
        // use classic C locale to read data from standard input
        cin.imbue(locale::classic());

        // use a German locale to write data to standard output
        cout.imbue(locale("de_DE"));

        // read and output floating-point values in a loop
        double value;
        while (cin >> value) {
            cout << value << endl;
        }
    }

    </PRE>
    
    <P>The statement</P>
    
    <pRE>
        
    

    cin.imbue(locale::classic());

    </pRE>
    
    <p>assigns the "classic" C locale to the standard input channel. For the classic C locale, formatting of numbers and dates, character classification, and so on is handled as it is in original C without any locales. The expression</P>
    
    <PRE>
        
    

    std::locale::classic()

    </PRe>
    
    <p>obtains a corresponding object of class locale. Using the expression</p>
    
    <PRE>
        
    

    std::locale("C")

    </Pre>
    
    <p>instead would yield the same result. This last expression constructs a locale object from a given name. The name "C" is a special name, and actually is the only one a C++ implementation is required to support. There is no requirement to support any other locale, although it is assumed that C++ implementations also support other locales.</p>
    
    <p>Correspondingly, the statement</p>
    
    <pre>
        
    

    cout.imbue(locale("de_DE"));

    </prE>
    
    <P>assigns the locale de_DE to the standard output channel. This is, of course, successful only if the system supports this locale. If the name used to construct a locale object is unknown to the implementation, an exception of type runtime_error is thrown.</p>
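<p>A program that must keep running on systems without the requested locale can catch this exception. The following sketch falls back to the classic locale in that case. The function name localeOrClassic() is hypothetical, and whether falling back silently is appropriate depends on the application:</p>

```cpp
#include <locale>
#include <stdexcept>

// Returns the locale with the passed name or, if the implementation
// does not know that name, the classic C locale.
// localeOrClassic() is a hypothetical helper for this sketch.
inline std::locale localeOrClassic(const char* name)
{
    try {
        return std::locale(name);
    }
    catch (const std::runtime_error&) {
        // name unknown to this implementation
        return std::locale::classic();
    }
}
```

<p>With this helper, a statement such as cout.imbue(localeOrClassic("de_DE")) uses German conventions where they are available and classic C conventions elsewhere.</p>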
    
    <p>If everything was successful, input is read according to the classic C conventions and output is written according to the German conventions. The loop thus reads floating-point values in the normal English format, for example</p>
    
    <PRE>
        
    

    47.11

    </Pre>
    
    <p>and prints them using the German format, for example</p>
    
    <PRE>
        
    

    47,11

    </Pre>
    
    <p>Yes, the Germans really use a comma as a "decimal point".</p>
    
    <P>Normally, a program does not predefine a specific locale except when writing and reading data in a fixed format. Instead, the locale is determined using the environment variable LANG. Another possibility is to read the name of the locale to be used. The following program demonstrates this:</p>
    
    <pre>
        
    

    // i18n/loc2.cpp

    #include <iostream>
    #include <locale>
    #include <string>
    #include <cstdlib>
    using namespace std;

    int main()
    {
        // create the default locale from the user's environment
        locale langLocale("");

        // and assign it to the standard output channel
        cout.imbue(langLocale);

        // process the name of the locale
        bool isGerman;
        if (langLocale.name() == "de_DE" ||
            langLocale.name() == "de" ||
            langLocale.name() == "german") {
            isGerman = true;
        }
        else {
            isGerman = false;
        }

        // read locale for the input
        if (isGerman) {
            cout << "Sprachumgebung fuer Eingaben: ";
        }
        else {
            cout << "Locale for input: ";
        }
        string s;
        cin >> s;
        if (!cin) {
            if (isGerman) {
                cerr << "FEHLER beim Einlesen der Sprachumgebung"
                     << endl;
            }
            else {
                cerr << "ERROR while reading the locale" << endl;
            }
            return EXIT_FAILURE;
        }
        locale cinLocale(s.c_str());

        // and assign it to the standard input channel
        cin.imbue(cinLocale);

        // read and output floating-point values in a loop
        double value;
        while (cin >> value) {
            cout << value << endl;
        }
    }

    </PRE>
    
    <p>In this example, the following statement creates an object of the class locale:</P>
    
    <PRe>
        
    

    locale langLocale("");

    </pre>
    
    <P>Passing an empty string as the name of the locale has a special meaning: The default locale from the user's environment is used (this is often determined by the environment variable LANG). This locale is assigned to the standard output stream with the statement</P>
    
    <Pre>
        
    

    cout.imbue(langLocale);

    </prE>
    
    <P>The expression</P>
    
    <Pre>
        
    

    langLocale.name()

    </pre>
    
    <p>is used to retrieve the name of the default locale, which is returned as an object of type string (see Chapter 11).</P>
    
    <P>The following statements construct a locale from a name read from standard input:</P>
    
    <PRE>
        
    

    string s;
    cin >> s;
    ...
    locale cinLocale(s.c_str());

    </pRE>
    
    <p>To do this, a word is read from the standard input and used as the constructor's argument. If the read fails, the ios_base::failbit is set in the input stream, which is checked and handled in this program:</p>
    
    <pre>
        
    

    if (!cin) {
    if (isGerman) {
    cerr << "FEHLER beim Einlesen der Sprachumgebung"
    << endl;
    }
    else {
    cerr << "ERROR while reading the locale" << endl;
    }
    return EXIT_FAILURE;
    }

    </PRE>
    
    <P>Again, if the string is not a valid value for the construction of a locale, a runtime_error exception is thrown.</p>
    
    <p>If a program wants to honor local conventions, it should use corresponding locale objects. The static member function global() of the class locale can be used to install a global locale object. This object is used as the default value for functions that take an optional locale object as an argument. If the locale object set with the global() function has a name, it is also arranged that the C functions dealing with locales react correspondingly. If the locale set has no name, the consequences for the C functions depend on the implementation.</P>
    
    <P>Here is an example of how to set the global locale object depending on the environment in which the program is running:</P>
    
    <PRe>
        
    

    /* create a locale object depending on the program's environment and
     * set it as the global object
     */
    std::locale::global(std::locale(""));

    </pre>

    <p>Among other things, this arranges for the corresponding registration for the C functions to be executed. That is, the C functions are influenced as if the following call was made:</p>

    <pre>

    std::setlocale(LC_ALL,"")

    </prE>
    
    <P>However, setting the global locale does not replace locales already stored in objects. It only modifies the locale object copied when a locale is created with a default constructor. For example, the stream objects store locale objects that are not replaced by a call to locale::global(). If you want an existing stream to use a specific locale, you have to tell the stream to use this locale using the imbue() function.</p>
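<p>This behavior can be observed with two string streams, one created before and one created after the call of locale::global(). The following sketch uses an unnamed locale with a hypothetical facet CommaDecimalPoint so that it does not depend on a German locale being installed. The function name formatWithOldAndNewStream() is also an assumption made for this example:</p>

```cpp
#include <locale>
#include <sstream>
#include <string>

// Punctuation facet that uses a comma as decimal point
// (hypothetical class for this sketch).
class CommaDecimalPoint : public std::numpunct<char> {
  protected:
    virtual char do_decimal_point() const {
        return ',';
    }
};

// Writes value to a stream created before and to a stream created
// after the global locale is changed and returns both results,
// separated by a space.
inline std::string formatWithOldAndNewStream(double value)
{
    std::ostringstream before;    // stores a copy of the current global locale

    // install a modified locale as the global locale
    std::locale custom(std::locale::classic(), new CommaDecimalPoint);
    std::locale previous = std::locale::global(custom);

    std::ostringstream after;     // stores a copy of the new global locale

    before << value;              // still uses the old locale
    after << value;               // uses the modified locale

    std::locale::global(previous);    // restore the old global locale
    return before.str() + " " + after.str();
}
```

<p>If the global locale is the classic locale when the function is called, the value 47.11 yields "47.11 47,11": the stream that existed before the change keeps its point, and only the new stream uses the comma.</p>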
    
    <p>The global locale is used if a locale object is created with the default constructor. In this case, the new locale behaves as if it is a copy of the global locale at the time it was constructed. The following three lines install the default locale for the standard streams:</p>
    
    <pre>
        
    

    // register global locale object for streams
    std::cin.imbue(std::locale());
    std::cout.imbue(std::locale());
    std::cerr.imbue(std::locale());

    </pre>
    
    <p>When using locales in C++, it is important to remember that the C++ locale mechanism is only loosely coupled to the C locale mechanism. There is only one relation to the C locale mechanism: The global C locale is modified if a named C++ locale object is set as the global locale. In general, you should not assume that the C and the C++ functions operate on the same locales.</p>
    

    14.2.2
    Locale Facets

    The actual dependencies on national conventions are separated into several aspects that are handled by corresponding objects. An object dealing with a specific aspect of internationalization is called a facet. A locale object is used as a container of different facets. To access an aspect of a locale, the type of the corresponding facet is used as the index. The type of the facet is passed explicitly as a template argument to the template function use_facet(), accessing the desired facet. For example, the expression

    <PRE>
        
    

    std::use_facet<std::numpunct<char> >(loc)

    </PRE>
    
    <p>accesses the facet type numpunct for the character type char of the locale object loc. Each facet type is defined by a class that defines certain services. For example, the facet type numpunct provides services used in conjunction with the formatting of numeric and Boolean values. For example, the following expression returns the string used to represent true in the locale loc.</P>
    
    <PrE>
        
    

    std::use_facet<std::numpunct<char> >(loc).truename()

    </PRE>
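<p>Such expressions are often wrapped in small functions. The following sketch queries two services of the numpunct facet. The function names trueNameOf() and decimalPointOf() are hypothetical:</p>

```cpp
#include <locale>
#include <string>

// Returns the textual representation of true in the locale loc.
// trueNameOf() is a hypothetical helper for this sketch.
inline std::string trueNameOf(const std::locale& loc)
{
    return std::use_facet<std::numpunct<char> >(loc).truename();
}

// Returns the character used as decimal point in the locale loc.
// decimalPointOf() is a hypothetical helper for this sketch.
inline char decimalPointOf(const std::locale& loc)
{
    return std::use_facet<std::numpunct<char> >(loc).decimal_point();
}
```

<p>For the classic locale, these functions return "true" and '.', respectively. For other locales, the results depend on the implementation.</p>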
    
    <P>Table 14.3 provides an overview of the facets predefined by the C++ standard library. Each facet is associated with a category. These categories are used by some of the constructors of locales to create new locales as the combination of other locales.</P>
    
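    <table width="100%">
    <caption>Table 14.3. Facet Types Predefined by the C++ Standard Library</caption>
    <tr><th>Category</th><th>Facet Type</th><th>Used for</th></tr>
    <tr><td>numeric</td><td>num_get&lt;>()</td><td>Numeric input</td></tr>
    <tr><td></td><td>num_put&lt;>()</td><td>Numeric output</td></tr>
    <tr><td></td><td>numpunct&lt;>()</td><td>Symbols used for numeric I/O</td></tr>
    <tr><td>time</td><td>time_get&lt;>()</td><td>Time and date input</td></tr>
    <tr><td></td><td>time_put&lt;>()</td><td>Time and date output</td></tr>
    <tr><td>monetary</td><td>money_get&lt;>()</td><td>Monetary input</td></tr>
    <tr><td></td><td>money_put&lt;>()</td><td>Monetary output</td></tr>
    <tr><td></td><td>moneypunct&lt;>()</td><td>Symbols used for monetary I/O</td></tr>
    <tr><td>ctype</td><td>ctype&lt;>()</td><td>Character information (toupper(), isupper())</td></tr>
    <tr><td></td><td>codecvt&lt;>()</td><td>Conversion between different character encodings</td></tr>
    <tr><td>collate</td><td>collate&lt;>()</td><td>String collation</td></tr>
    <tr><td>messages</td><td>messages&lt;></td><td>Message string retrieval</td></tr>
    </table>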

    <p>It is possible to define your own versions of the facets to create specialized locales. The following example demonstrates how this is done. It defines a facet using German representations of the Boolean values:</p>
    
    <pre>
        
    

    class germanBoolNames : public std::numpunct_byname<char> {
      public:
        germanBoolNames (const char *name)
          : std::numpunct_byname<char>(name) {
        }
      protected:
        virtual std::string do_truename() const {
            return "wahr";
        }
        virtual std::string do_falsename() const {
            return "falsch";
        }
    };

    </pre>
    
    <p>The class germanBoolNames derives from the class numpunct_byname, which is defined by the C++ standard library. This class defines punctuation properties depending on the locale used for numeric formatting. Deriving from numpunct_byname instead of from numpunct means that the members not overridden explicitly still depend on the name used as the argument to the constructor. If the class numpunct had been used as the base class, the behavior of the other functions would be fixed. The class germanBoolNames itself overrides only the two functions used to determine the textual representation of true and false.</P>
    
    <p>To use this facet in a locale, you need to create a new locale using a special constructor of the class locale. This constructor takes a locale object as its first argument and a pointer to a facet as its second argument. The created locale is identical to the first argument except for the facet that is passed as the second argument. This facet is installed in the newly created locale after the first argument is copied:</P>
    
    <PRe>
        
    

    std::locale loc (std::locale(""), new germanBoolNames(""));

    </pre>
    
    <P>The new expression creates a facet that is installed in the new locale. Thus, it is registered in loc to create a variation of locale(""). Since locales are immutable, you have to create a new locale object if you want to install a new facet to a locale. This locale object can be used like any other locale object. For example,</P>
    
    <pre>
        
    

    std::cout.imbue(loc);
    std::cout << std::boolalpha << true << std::endl;

    </pre>
    
    <p>would have the following output:</p>
    
    <pre>
        
    

    wahr

    </pre>
    
    <p>You also can create a completely new facet. In this case, the function has_facet() can be used to determine whether such a new facet is registered for a given locale object.</P>
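<p>As a rough sketch of what a completely new facet might look like: such a class derives from locale::facet and provides a static member id of type locale::id, which serves as the index of the facet inside a locale. The class UnitSystem and its member lengthUnit() are hypothetical and serve only as an illustration:</p>

```cpp
#include <cstddef>
#include <locale>
#include <string>

// A new facet type that stores the name of a length unit
// (hypothetical class for this sketch).
class UnitSystem : public std::locale::facet {
  public:
    static std::locale::id id;    // identifies this facet type in a locale

    explicit UnitSystem(const std::string& lengthUnit, std::size_t refs = 0)
      : std::locale::facet(refs), unit(lengthUnit) {
    }
    const std::string& lengthUnit() const {
        return unit;
    }
  private:
    std::string unit;
};

std::locale::id UnitSystem::id;
```

<p>A locale created with std::locale(std::locale::classic(), new UnitSystem("m")) contains the new facet, so has_facet&lt;UnitSystem>() yields true for it and use_facet&lt;UnitSystem>() can be used to access it. For the classic locale itself, has_facet&lt;UnitSystem>() yields false.</p>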
    
    
        
    

    13.12 Input/Output Operators for User-Defined Types

    As mentioned earlier in this chapter, a major advantage of streams over the old I/O mechanism of C is the possibility that the stream mechanism can be extended to user-defined types. To do this, you must overload operators << and >>. This is demonstrated using a class for fractions in the following subsection.

    13.12.1
    Implementing Output Operators

    In an expression with the output operator, the left operand is a stream and the right operand is the object to be written:

    <pre>
        
    

    stream << object

    </pre>
    
    <p>According to language rules this can be interpreted in two ways:</p>
    
  • As stream.operator<<(object)

  • As operator<<(stream,object)

        </Li>
    
    
    <p>The first way is used for built-in types. For user-defined types you have to use the second way because the stream classes are closed for extensions. All you have to do is implement global operator &lt;&lt; for your user-defined type. This is rather easy, unless access to private members of the objects is necessary (which I cover later).</p>
    
    <p>For example, to print an object of class Fraction with the format numerator/denominator, you can write the following function:</P>
    
    <pRE>
        
    

    // io/frac1out.hpp

    #include <iostream>

    inline
    std::ostream& operator << (std::ostream& strm, const Fraction& f)
    {
        strm << f.numerator() << '/' << f.denominator();
        return strm;
    }

    </PrE>
    
    <P>The function writes the numerator and the denominator, separated by the character '/', to the stream that is passed as the argument. The stream can be a file stream, a string stream, or some other stream. To support the chaining of write operations or the access to the stream's state in the same statement, the stream is returned by the function.</P>
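<p>To try the operator without the full class, a minimal stand-in for class Fraction is enough. The sketch below assumes only the two accessors numerator() and denominator(); the class body and the helper chainDemo() are hypothetical and not the chapter's actual class. It shows that returning the stream allows chaining and that the same operator works for a string stream.</p>

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical minimal stand-in for the chapter's class Fraction.
class Fraction {
  public:
    Fraction(int n = 0, int d = 1) : num(n), denom(d) {}
    int numerator() const { return num; }
    int denominator() const { return denom; }
  private:
    int num, denom;
};

inline std::ostream& operator<<(std::ostream& strm, const Fraction& f)
{
    strm << f.numerator() << '/' << f.denominator();
    return strm;    // returning the stream allows chaining
}

// Writes two fractions in one chained statement to a string stream.
inline std::string chainDemo()
{
    std::ostringstream os;
    os << Fraction(1, 2) << " and " << Fraction(3, 4);
    return os.str();
}
```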
    
    <P>This simple form has two drawbacks:</P>
    
  • Because ostream is used in the signature, the function applies only to streams with the character type char. If the function is intended only for use in Western Europe or in North America, this is no problem. On the other hand, a more general version requires only a little extra work, so it should at least be considered.

  • Another problem arises if a field width is set. In this case, the result is probably not what might be expected. The field width applies to the immediately following write; in this case, to the numerator. Thus, the statements

    Fraction vat(16,100);    // I'm German and we have a uniform VAT of 16%...
    std::cout << "VAT: \"" << std::left << std::setw(8)
              << vat << '"' << std::endl;

    result in this output:

    VAT: "16      /100"

        </Li>
    
    
    <p>The next version solves both of these problems:</P>
    
    <PRE>
        
    

    // io/frac2out.hpp

    #include <iostream>
    #include <sstream>

    template <class charT, class traits>
    inline
    std::basic_ostream<charT,traits>&
    operator << (std::basic_ostream<charT,traits>& strm,
                 const Fraction& f)
    {
        /* string stream
         * - with same format
         * - without special field width
         */
        std::basic_ostringstream<charT,traits> s;
        s.copyfmt(strm);
        s.width(0);

        // fill string stream
        s << f.numerator() << '/' << f.denominator();

        // print string stream
        strm << s.str();

        return strm;
    }

    </PRe>
    
    <p>The operator has become a template function that is parameterized to suit all kinds of streams. The problem with the field width is addressed by writing the fraction first to a string stream without setting any specific width. The constructed string is then sent to the stream passed as the argument. This results in the characters representing the fraction being written with only one write operation, to which the field width is applied. Thus, the statements</p>
    
    <pre>
        
    

    Fraction vat (16,100);    // I'm German...
    std::cout << "VAT: \"" << std::left << std::setw(8)
              << vat << '"' << std::endl;

    </pre>
    
    <p>now produce the following output:</p>
    
    <pre>
        
    

    VAT: "16/100  "

    </pre>
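<p>The difference between the two versions can be checked side by side. This is a sketch under stated assumptions: the Fraction class is a minimal stand-in, and the names writeNaive(), writeBuffered(), and formatVat() are hypothetical helpers used instead of two competing operator &lt;&lt; overloads.</p>

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical minimal stand-in for the chapter's class Fraction.
class Fraction {
  public:
    Fraction(int n = 0, int d = 1) : num(n), denom(d) {}
    int numerator() const { return num; }
    int denominator() const { return denom; }
  private:
    int num, denom;
};

// First version: the field width applies to the numerator only.
template <class charT, class traits>
std::basic_ostream<charT,traits>&
writeNaive(std::basic_ostream<charT,traits>& strm, const Fraction& f)
{
    strm << f.numerator() << '/' << f.denominator();
    return strm;
}

// Second version: format the fraction in a string stream first, so that
// the field width applies to the whole fraction in a single write.
template <class charT, class traits>
std::basic_ostream<charT,traits>&
writeBuffered(std::basic_ostream<charT,traits>& strm, const Fraction& f)
{
    std::basic_ostringstream<charT,traits> s;
    s.copyfmt(strm);    // same format
    s.width(0);         // but without special field width
    s << f.numerator() << '/' << f.denominator();
    strm << s.str();
    return strm;
}

// Formats the VAT example with either version and returns the result.
inline std::string formatVat(bool buffered)
{
    Fraction vat(16, 100);
    std::ostringstream os;
    os << "VAT: \"" << std::left << std::setw(8);
    if (buffered) {
        writeBuffered(os, vat);
    }
    else {
        writeNaive(os, vat);
    }
    os << '"';
    return os.str();
}
```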
    

    13.12.2
    Implementing Input Operators

    Input operators are implemented according to the same principle as output operators (described in the previous subsection). However, input incurs the likely problem of read failures. Input functions normally need special handling of cases in which reading might fail.

    <P>When implementing a read function you can choose between simple or flexible approaches. For example, the following function uses a simple approach. It reads a fraction without checking for error situations:</P>
    
    <PRe>
        
    

    // io/frac1in.hpp

    #include <iostream>

    inline
    std::istream& operator >> (std::istream& strm, Fraction& f)
    {
        int n, d;

        strm >> n;         // read value of the numerator
        strm.ignore();     // skip '/'
        strm >> d;         // read value of the denominator

        f = Fraction(n,d); // assign the whole fraction

        return strm;
    }

    </PRe>
    
    <p>This implementation has the problem that it can be used only for streams with the character type char. In addition, whether the character between the two numbers is indeed the character '/' is not checked.</P>
    
    <P>Another problem arises when undefined values are read. When reading a zero for the denominator, the value of the read fraction is not well-defined. This problem is detected in the constructor of the class Fraction that is invoked by the expression Fraction(n,d). However, leaving the check to class Fraction means that a format error in the input is reported through the error handling of class Fraction rather than through the stream. Because it is common practice to record format errors in the stream, it might be better to set ios_base::failbit in this case.</p>
    
    <P>Lastly, the fraction passed by reference might be modified even if the read operation is not successful. This can happen, for example, when the read of the numerator succeeds, but the read of the denominator fails. This behavior contradicts common conventions established by the predefined input operators, and thus is best avoided. A read operation should be successful or have no effect.</P>
    
    <P>The following implementation is improved to avoid these problems. It is also more flexible because it is parameterized to be applicable to all stream types:</P>
    
    <pre>
        
    

    // io/frac2in.hpp

    #include <iostream>

    template <class charT, class traits>
    inline
    std::basic_istream<charT,traits>&
    operator >> (std::basic_istream<charT,traits>& strm, Fraction& f)
    {
        int n, d;

        // read value of numerator
        strm >> n;

        /* if available
         * - read '/' and value of denominator
         */
        if (strm.peek() == '/') {
            strm.ignore();
            strm >> d;
        }
        else {
            d = 1;
        }

        /* if denominator is zero
         * - set failbit as I/O format error
         */
        if (d == 0) {
            strm.setstate(std::ios::failbit);
            return strm;
        }

        /* if everything is fine so far
         * - change the value of the fraction
         */
        if (strm) {
            f = Fraction(n,d);
        }

        return strm;
    }

    </pre>
    
    <p>Here the denominator is read only if the first number is followed by the character '/'; otherwise, a denominator of one is assumed and the integer read is interpreted as the whole fraction. Hence, the denominator is optional.</P>
    
    <P>This implementation also tests whether a denominator with value 0 was read. In this case, the ios_base::failbit is set, which might trigger a corresponding exception (see Section 13.4.4). Of course, the behavior can be implemented differently if the denominator is zero. For example, an exception could be thrown directly, or the check could be skipped so that the zero denominator is passed on to class Fraction, which would then throw the appropriate exception itself.</p>
    
    <P>Lastly, the state of the stream is checked and the new value is assigned to the fraction only if no input error occurred. This final check should always be done to make sure that the value of an object is changed only if the read was successful.</P>
    
    <P>Of course, it can be argued whether it is reasonable to read integers as fractions. In addition, there are other subtleties that may be improved. For example, the numerator must be followed by the character '/' without separating whitespaces. But the denominator may be preceded by arbitrary whitespaces because normally these are skipped. This hints at the complexity involved in reading nontrivial data structures.</p>
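<p>The conventions discussed in this subsection can be exercised with a small test harness. The following is a sketch, not the chapter's exact code: Fraction is a minimal stand-in, parseFraction() is a hypothetical helper, and two details are assumptions added here. The local variables are initialized, and peek() is called only while the stream is still good, so that an integer at the very end of the input is accepted as a whole fraction.</p>

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical minimal stand-in for the chapter's class Fraction.
class Fraction {
  public:
    Fraction(int n = 0, int d = 1) : num(n), denom(d) {}
    int numerator() const { return num; }
    int denominator() const { return denom; }
  private:
    int num, denom;
};

template <class charT, class traits>
inline std::basic_istream<charT,traits>&
operator>>(std::basic_istream<charT,traits>& strm, Fraction& f)
{
    int n = 0, d = 1;

    // read value of numerator
    strm >> n;

    // if available, read '/' and value of denominator
    // (assumption: skip peek() when the stream already hit end-of-file)
    if (strm.good() &&
        traits::eq_int_type(strm.peek(),
                            traits::to_int_type(strm.widen('/')))) {
        strm.ignore();
        strm >> d;
    }

    // a zero denominator is recorded as a format error in the stream
    if (d == 0) {
        strm.setstate(std::ios_base::failbit);
        return strm;
    }

    // change the value of the fraction only if everything went fine
    if (strm) {
        f = Fraction(n, d);
    }
    return strm;
}

// Hypothetical helper: parses text into f and reports success.
inline bool parseFraction(const std::string& text, Fraction& f)
{
    std::istringstream in(text);
    return !(in >> f).fail();
}
```

<p>With this sketch, a failed read leaves the fraction passed by reference untouched, which is the behavior the predefined input operators establish.</p>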
    

    13.12.3
    Input/Output Using Auxiliary Functions

    If the implementation of an I/O operator requires access to the private data of an object, the standard operators should delegate the actual work to auxiliary member functions. This technique also allows polymorphic read and write functions. This might look as follows:

    <pre>
        
    

    class Fraction {
        ...
      public:
        virtual void printOn (std::ostream& strm) const;    // output
        virtual void scanFrom (std::istream& strm);         // input
        ...
    };

    std::ostream& operator << (std::ostream& strm, const Fraction& f)
    {
        f.printOn (strm);
        return strm;
    }

    std::istream& operator >> (std::istream& strm, Fraction& f)
    {
        f.scanFrom (strm);
        return strm;
    }

    </pRE>
    
    <P>A typical example is the direct access to the numerator and denominator of a fraction during input:</P>
    
    <PRe>
        
    

    void Fraction::scanFrom (std::istream& strm)
    {
    ...
    // assign values directly to the components
    num = n;
    denom = d;
    }

    </pRE>
    
    <p>If a class is not intended to be used as a base class, the I/O operators can be made friends of the class. However, note that this approach reduces the possibilities significantly when inheritance is used. Friend functions cannot be virtual; so as a result, the wrong function might be called. For example, if a reference to a base class actually refers to an object of a derived class and is used as an argument for the input operator, the operator for the base class is called. To avoid this problem, derived classes should not implement their own I/O operators. Thus, the implementation sketched previously is more general than the use of friend functions. It should be used as a standard approach, although most examples use friend functions instead.</p>
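<p>The effect of delegating to a virtual member function can be seen in a small sketch. The classes Shape and Circle and the helper describe() are hypothetical and stand in for any base class with derived classes; only the single output operator for the base class is defined.</p>

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical base class that delegates output to a virtual function.
class Shape {
  public:
    virtual ~Shape() {}
    virtual void printOn(std::ostream& strm) const { strm << "shape"; }
};

// Hypothetical derived class: overrides printOn(), defines no operator.
class Circle : public Shape {
  public:
    explicit Circle(int r) : radius(r) {}
    virtual void printOn(std::ostream& strm) const {
        strm << "circle(" << radius << ')';
    }
  private:
    int radius;
};

// The only output operator: it is defined for the base class.
inline std::ostream& operator<<(std::ostream& strm, const Shape& s)
{
    s.printOn(strm);    // virtual call selects the derived implementation
    return strm;
}

// Writes any shape through a reference to the base class.
inline std::string describe(const Shape& s)
{
    std::ostringstream os;
    os << s;
    return os.str();
}
```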
    

    13.12.4
    User-Defined Operators Using Unformatted Functions

    The I/O operators implemented in the previous subsections delegate most of the work to some predefined operators for formatted I/O. That is, operators << and >> are implemented in terms of the corresponding operators for more basic types.

    <p>The I/O operators defined in the C++ standard library are defined differently. The common scheme used for these operators is as follows: First, with some preprocessing the stream is prepared for actual I/O. Then the actual I/O is done, followed by some postprocessing. This scheme should be used for your own I/O operators, too, to provide consistency for I/O operators.</p>
    
    <p>The classes basic_istream and basic_ostream each define an auxiliary class sentry. The constructor of these classes does the preprocessing, and the destructor does the corresponding postprocessing. These classes replace the member functions that were used in former implementations of the IOStream library (ipfx(), isfx(), opfx(), and osfx()). Using the new classes ensures that the postprocessing is invoked even if the I/O is aborted with an exception.</p>
    
    <p>If an I/O operator uses a function for unformatted I/O or operates directly on the stream buffer, the first thing to be done should be the construction of a corresponding sentry object. The remaining processing should then depend on the state of this object, which indicates whether the stream is OK. This state can be checked using the conversion of the sentry object to bool. Thus, I/O operators generally look like this:</P>
    
    <PRE>
        
    

    sentry se(strm); // indirect pre- and postprocessing
    if (se) {
    ... // the actual processing
    }

    </PrE>
    
    <P>The sentry object takes the stream strm, on which the preprocessing and postprocessing should be done, as the constructor argument.</p>
    
    <P>The additional processing is used to arrange general tasks of the I/O operators. These tasks include synchronizing several streams, checking whether the stream is OK, and skipping whitespaces, as well as possibly implementation-specific tasks. For example, in a multithreaded environment, the additional processing can be used for corresponding locking.</P>
    
    <P>For input streams, the sentry object can be constructed with an optional Boolean value that indicates whether skipping of whitespace should be avoided even though the flag skipws is set:</p>
    
    <pre>
        
    

    sentry se(strm,true); // don't skip whitespaces during the additional processing

    </pre>
    
    <p>The following examples demonstrate this for class Row, which is used to represent the lines in a text processor or editor:</p>
    
    <uL>
    
  • The output operator writes a line by using the stream buffer's member function sputn():

    std::ostream& operator<< (std::ostream& strm, const Row& row)
    {
        // ensure pre- and postprocessing
        std::ostream::sentry se(strm);
        if (se) {
            // perform the output directly on the stream buffer
            strm.rdbuf()->sputn(row.c_str(),row.len());
        }

        return strm;
    }

        </LI>
    
  • The input operator reads a line character-by-character in a loop. The argument true is passed to the constructor of the sentry object to avoid the skipping of whitespaces:

    std::istream& operator>> (std::istream& strm, Row& row)
    {
        /* ensure pre- and postprocessing
         * - true: Yes, don't ignore leading whitespaces
         */
        std::istream::sentry se(strm,true);
        if (se) {
            // perform the input
            char c;
            row.clear();
            while (strm.get(c) && c != '\n') {
                row.append(c);
            }
        }

        return strm;
    }

        </li>
    

    Of course, it is also possible to use this framework even if functions do not use unformatted functions for their implementation but use I/O operators instead. However, using basic_istream or basic_ostream members for reading or writing characters within code guarded by sentry objects is unnecessarily expensive. Whenever possible, the corresponding basic_streambuf should be used instead.
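<p>A self-contained variant of both operators that works on the stream buffer inside the code guarded by the sentry might look as follows. This is a sketch under stated assumptions: the class Row shown here is a hypothetical stand-in wrapping a std::string, and the handling of end-of-file (setting eofbit, and additionally failbit if nothing could be read) is one possible choice rather than a prescribed one.</p>

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for class Row (a line in a text editor).
class Row {
  public:
    void clear() { text.clear(); }
    void append(char c) { text += c; }
    const char* c_str() const { return text.c_str(); }
    std::streamsize len() const {
        return static_cast<std::streamsize>(text.size());
    }
    const std::string& str() const { return text; }
  private:
    std::string text;
};

inline std::ostream& operator<<(std::ostream& strm, const Row& row)
{
    std::ostream::sentry se(strm);    // pre- and postprocessing
    if (se) {
        // write directly to the stream buffer; record a short write
        if (strm.rdbuf()->sputn(row.c_str(), row.len()) != row.len()) {
            strm.setstate(std::ios_base::badbit);
        }
    }
    return strm;
}

inline std::istream& operator>>(std::istream& strm, Row& row)
{
    typedef std::istream::traits_type traits;
    std::istream::sentry se(strm, true);    // true: keep leading whitespace
    if (se) {
        std::streambuf* buf = strm.rdbuf();
        bool any = false;
        row.clear();
        for (;;) {
            traits::int_type c = buf->sbumpc();
            if (traits::eq_int_type(c, traits::eof())) {
                // end of input: failure only if nothing at all was read
                strm.setstate(any ? std::ios_base::eofbit
                                  : std::ios_base::eofbit
                                    | std::ios_base::failbit);
                break;
            }
            any = true;
            if (traits::to_char_type(c) == '\n') {
                break;    // the newline is consumed but not stored
            }
            row.append(traits::to_char_type(c));
        }
    }
    return strm;
}
```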

    13.12.5
    User-Defined Format Flags

    When user-defined I/O operators are being written, it is often desirable to have formatting flags specific to these operators, probably set by using a corresponding manipulator. For example, it would be nice if the output operator for fractions, shown previously, could be configured to place spaces around the slash that separates numerator and denominator.

    <p>The stream objects support this by providing a mechanism to associate data with a stream. This mechanism can be used to associate corresponding data (for example, using a manipulator), and later retrieve the data. The class ios_base defines the two functions iword() and pword(), each taking an int argument as the index, to access a specific long&amp; or void*&amp; respectively. The idea is that iword() and pword() access long or void* objects in an array of arbitrary size stored with a stream object. Formatting flags to be stored for a stream are then placed at the same index for all streams. The static member function xalloc() of the class ios_base is used to obtain an index that is not yet used for this purpose.</P>
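<p>As a minimal sketch of this storage mechanism, the following helpers reserve one index and use it to keep a long value per stream. The names indentIndex(), setIndent(), and getIndent() are hypothetical; only xalloc() and iword() belong to the library.</p>

```cpp
#include <ios>
#include <sstream>

// Hypothetical helper: reserves one index, once, for all streams.
inline int indentIndex()
{
    static const int index = std::ios_base::xalloc();
    return index;
}

// Stores a value in the slot of this particular stream.
inline void setIndent(std::ios_base& strm, long n)
{
    strm.iword(indentIndex()) = n;
}

// Retrieves the value stored for this particular stream.
inline long getIndent(std::ios_base& strm)
{
    return strm.iword(indentIndex());
}
```

<p>The index is the same for every stream, but each stream has its own slot, so a value set for one stream does not show up in another.</p>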
    
    <P>Initially, the objects accessed with iword() or pword() are set to 0. This value can be used to represent the default formatting or to indicate that the corresponding data was not yet accessed. Here is an example:</p>
    
    <pre>
        
    

    // get index for new ostream data
    static const int iword_index = std::ios_base::xalloc();

    // define manipulator that sets this data
    std::ostream& fraction_spaces (std::ostream& strm)
    {
        strm.iword(iword_index) = true;
        return strm;
    }

    std::ostream& operator<< (std::ostream& strm, const Fraction& f)
    {
        /* query the ostream data
         * - if true, use spaces between numerator and denominator
         * - if false, use no spaces between numerator and denominator
         */
        if (strm.iword(iword_index)) {
            strm << f.numerator() << " / " << f.denominator();
        }
        else {
            strm << f.numerator() << "/" << f.denominator();
        }
        return strm;
    }

    </pre>

    <p>This example uses a simple approach to the implementation of the output operator because the main feature to be exposed is the use of the function iword(). The format flag is considered to be a Boolean value that defines whether spaces between numerator and denominator should be written.</p>

    <p>In the first line, the function ios_base::xalloc() is used to obtain an index that can be used to store the format flag. The result of this call is stored in a constant because it is never modified. The function fraction_spaces() is a manipulator that sets the long value that is stored at the index iword_index in the array associated with the stream strm to true. The output operator retrieves that value and writes the fraction according to the value stored. If the value is false, the default formatting using no spaces is used. Otherwise, spaces are placed around the slash.</p>

    <p>When iword() and pword() are used, references to long or void* objects are returned. These references stay valid only until the next call of iword() or pword() for the corresponding stream object or until the stream object is destroyed. Normally, the results from iword() and pword() should not be saved. It is assumed that the access is fast, although it is not required that the data is really represented by using an array.</p>

    <p>The function copyfmt() copies all format information (see page 615). This includes the arrays accessed with iword() and pword(). This may pose a problem for the objects stored with a stream using pword(). For example, if a value is the address of an object, the address is copied instead of the object. If you copy only the address, it may happen that if the format of one stream is changed, the format of other streams would be affected. In addition, it may be desirable that an object associated with a stream using pword() is destroyed when the stream is destroyed. So, a deep copy rather than a shallow copy may be necessary for such an object.</p>

    <p>A callback mechanism is defined by ios_base to support behavior such as making a deep copy if necessary or deleting an object when destroying a stream. The function register_callback() can be used to register a function that is called if certain operations are performed on the ios_base object. It is declared as follows:</p>

    <pre>

    namespace std {
        class ios_base {
          public:
            // kinds of callback events
            enum event { erase_event, imbue_event, copyfmt_event };
            // type of callbacks
            typedef void (*event_callback) (event e, ios_base& strm,
                                            int arg);
            // function to register callbacks
            void register_callback (event_callback cb, int arg);
            ...
        };
    }

    </pRE>
    
    <P>register_callback() takes a function pointer as the first argument and an int argument as the second. The int argument is passed as the third argument when a registered function is called. It can, for example, be used to identify an index for pword() to signal which member of the array has to be processed. The argument strm that is passed to the callback function is the ios_base object that caused the call to the callback function. The argument e identifies the reason why the callback function was called. The reasons for calling the callback functions are listed in Table 13.40.</p>
    
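    <p><table width="100%">
    <caption>Table 13.40. Reasons for Callback Events</caption>
    <tr><th>Event</th><th>Reason</th></tr>
    <tr><td>ios_base::imbue_event</td><td>A locale is set with imbue()</td></tr>
    <tr><td>ios_base::erase_event</td><td>The stream is destroyed or copyfmt() is used</td></tr>
    <tr><td>ios_base::copyfmt_event</td><td>copyfmt() is used</td></tr>
    </table></p>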

    <p>If copyfmt() is used, the callbacks are called twice for the object on which copyfmt() is called. First, before anything is copied, the callbacks are invoked with the argument erase_event to do all the cleanup necessary (for example, deleting objects stored in the pword() array). The callbacks called are those registered for the object. After the format flags are copied, which includes the list of callbacks from the argument stream, the callbacks are called again, this time with the argument copyfmt_event. This pass can, for example, be used to arrange for deep copying of objects stored in the pword() array. Note that the callbacks are also copied and the original list of callbacks is removed. Thus, the callbacks invoked for the second pass are the callbacks just copied.</p>
    
    <p>The callback mechanism is very primitive. It does not allow callback functions to be unregistered, except by using copyfmt() with an argument that has no callbacks registered. Also, registering a callback function twice, even with the same argument, results in calling the callback function twice. It is, however, guaranteed that the callbacks are called in the opposite order of registration. This has the effect that a callback function registered from within some other callback function is not called before the next time the callback functions are invoked.</P>
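<p>A sketch of how pword() and register_callback() might be combined to give each stream its own copy of an associated object is shown below. The helpers labelIndex(), labelCallback(), setLabel(), and getLabel() are hypothetical names; the sketch assumes that the only data stored at the index is a heap-allocated std::string. The callback deletes the string on erase_event and replaces the shallowly copied pointer by a deep copy on copyfmt_event.</p>

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: index for a per-stream label stored via pword().
inline int labelIndex()
{
    static const int index = std::ios_base::xalloc();
    return index;
}

// Callback that keeps each stored string owned by exactly one stream.
inline void labelCallback(std::ios_base::event e, std::ios_base& strm,
                          int index)
{
    void*& slot = strm.pword(index);
    std::string* label = static_cast<std::string*>(slot);
    if (e == std::ios_base::erase_event) {
        // stream is destroyed or is about to be overwritten by copyfmt()
        delete label;
        slot = 0;
    }
    else if (e == std::ios_base::copyfmt_event) {
        // the pointer was copied from the other stream: make a deep copy
        if (label != 0) {
            slot = new std::string(*label);
        }
    }
}

// Associates a label with the stream; registers the callback once.
inline void setLabel(std::ios_base& strm, const std::string& text)
{
    const int index = labelIndex();
    if (strm.pword(index) == 0) {
        strm.register_callback(labelCallback, index);
        // fetch the reference again: it is not saved across other calls
        strm.pword(index) = new std::string(text);
    }
    else {
        *static_cast<std::string*>(strm.pword(index)) = text;
    }
}

// Returns the label of the stream or an empty string if none is set.
inline std::string getLabel(std::ios_base& strm)
{
    void* p = strm.pword(labelIndex());
    return p != 0 ? *static_cast<std::string*>(p) : std::string();
}
```

<p>After b.copyfmt(a), both streams carry equal labels, but changing the label of a no longer affects b, and each stream deletes only its own string when it is destroyed.</p>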
    

    13.12.6
    Conventions for User-Defined Input/Output Operators

    Several conventions that should be obeyed by the implementations of your own I/O operators have been presented. They correspond to the behavior that is typical for the predefined I/O operators. To summarize, these conventions are the following:

    <UL>
    
  • The output format should allow an input operator that can read the data without loss of information. Especially for strings, this is close to impossible because a problem with spaces arises. A space character in the string cannot be distinguished from a space character between two strings.

  • The current formatting specification of the stream should be taken into account when doing I/O. This applies especially to the width for writing.

  • If an error occurs, an appropriate state flag should be set.

  • The objects should not be modified in case of an error. If multiple data is read, the data should first be stored in auxiliary objects before the value of the object passed to the read operator is set.

  • Output should not be terminated with a newline, mainly because it is otherwise impossible to write other objects on the same line.

  • Even values that are too large should be read completely. After the read, a corresponding error flag should be set, and the value returned should be some meaningful value, such as the maximum value.

  • If a format error is detected, no character should be read, if possible.

        </li>
    
    13.13 The Stream Buffer Classes

    As mentioned in Section 13.2.1, the actual reading and writing is not done by the streams directly, but is delegated to stream buffers. This section describes how these classes operate. The discussion not only gives a deeper understanding of what is going on when I/O streams are used, but also provides the basis to define new I/O channels. Before going into the details of stream buffer operation, the public interface is presented for those only interested in using stream buffers.

    13.13.1
    User's View of Stream Buffers

    To the user of a stream buffer the class basic_streambuf is not much more than something that characters can be sent to or extracted from. Table 13.41 lists the public functions for writing characters.

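    <p><table width="100%">
    <caption>Table 13.41. Public Members for Writing Characters</caption>
    <tr><th>Member Function</th><th>Meaning</th></tr>
    <tr><td>sputc(c)</td><td>Sends the character c to the stream buffer</td></tr>
    <tr><td>sputn(s, n)</td><td>Sends n characters from the sequence s to the stream buffer</td></tr>
    </table></p>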

    <P>The function sputc() returns traits_type::eof() in case of an error, where traits_type is a type definition in the class basic_streambuf. The function sputn() writes the number of characters specified by the second argument unless the stream buffer cannot consume them. It does not care about string termination characters. This function returns the number of characters written.</p>
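<p>A short sketch of both functions follows. The wrappers writeChar() and writeText() are hypothetical helpers that merely translate the return values described above into something easy to check.</p>

```cpp
#include <sstream>
#include <streambuf>
#include <string>

// Sends one character; returns false if sputc() reports eof().
inline bool writeChar(std::streambuf& buf, char c)
{
    typedef std::streambuf::traits_type traits;
    return !traits::eq_int_type(buf.sputc(c), traits::eof());
}

// Sends a character sequence; returns the number of characters written.
inline std::streamsize writeText(std::streambuf& buf, const std::string& s)
{
    return buf.sputn(s.data(), static_cast<std::streamsize>(s.size()));
}
```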
    
    <p>The interface to reading characters from a stream buffer is a little bit more complex (Table 13.42). This is because for input it is necessary to have a look at a character without consuming it. Also, it is desirable that characters can be put back into the stream buffer when parsing. Thus, the stream buffer classes provide corresponding functions.</p>
    
    <p><table width="100%">
    <caption>Table 13.42. Public Members for Reading Characters</caption>
    <tr><th>Member Function</th><th>Meaning</th></tr>
    <tr><td>in_avail()</td><td>Returns a lower bound on the characters available</td></tr>
    <tr><td>sgetc()</td><td>Returns the current character without consuming it</td></tr>
    <tr><td>sbumpc()</td><td>Returns the current character and consumes it</td></tr>
    <tr><td>snextc()</td><td>Consumes the current character and returns the next character</td></tr>
    <tr><td>sgetn(b, n)</td><td>Reads n characters and stores them in the buffer b</td></tr>
    <tr><td>sputbackc(c)</td><td>Returns the character c to the stream buffer</td></tr>
    <tr><td>sungetc()</td><td>Steps one step back to the previous character</td></tr>
    </table></p>

    <P>The function in_avail() can be used to determine how many characters are at least available. This can be used, for example, to make sure that reading does not block when reading from the keyboard. However, there can be more characters available.</p>
    
    <p>Until the stream buffer has reached the end of the stream, there is a current character. The function sgetc() is used to get the current character without moving on to the next character. The function sbumpc() reads the current character and moves on to the next character, making this the new current character. The last function reading a single character, snextc(), makes the next character the current one and then reads this character. All three functions return traits_type::eof() to indicate failure. The function sgetn() reads a sequence of characters into a buffer. The maximum number of characters to be read is passed as an argument. The function returns the number of characters read.</p>
    
    <p>The two functions sputbackc() and sungetc() are used to move one step back, making the previous character the current one. The function sputbackc() can be used to replace the previous character by some other character. These two functions should only be used with care. Often it is only possible to put back just one character.</p>
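<p>How the reading functions interact can be sketched with a small parsing helper. The function readUntil() is a hypothetical example: it looks at the current character with sgetc(), advances with snextc(), and stops at a delimiter without consuming it.</p>

```cpp
#include <sstream>
#include <streambuf>
#include <string>

// Reads characters up to, but not including, the delimiter.
// The delimiter (if any) stays the current character.
inline std::string readUntil(std::streambuf& buf, char delim)
{
    typedef std::streambuf::traits_type traits;
    std::string result;
    traits::int_type c = buf.sgetc();    // look without consuming
    while (!traits::eq_int_type(c, traits::eof())
           && traits::to_char_type(c) != delim) {
        result += traits::to_char_type(c);
        c = buf.snextc();                // consume current, look at next
    }
    return result;
}
```

<p>After readUntil() returns, sbumpc() can be used to consume the delimiter, sgetn() to read the remaining characters in one call, and sungetc() to step back by one character.</p>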
    
    <p>Finally, there are functions to access the imbued locale object, to change the position, and to influence buffering. Table 13.43 lists these functions.</P>
    
    <P><table width="100%">
    <caption>Table 13.43. Miscellaneous Public Stream Buffer Functions</caption>
    <tr><th>Member Function</th><th>Meaning</th></tr>
    <tr><td>pubimbue(loc)</td><td>Imbues the stream buffer with the locale loc</td></tr>
    <tr><td>getloc()</td><td>Returns the current locale</td></tr>
    <tr><td>pubseekpos(pos)</td><td>Repositions the current position to an absolute position</td></tr>
    <tr><td>pubseekpos(pos, which)</td><td>Same with specifying the I/O direction</td></tr>
    <tr><td>pubseekoff(offset, rpos)</td><td>Repositions the current position relative to another position</td></tr>
    <tr><td>pubseekoff(offset, rpos, which)</td><td>Same with specifying the I/O direction</td></tr>
    <tr><td>pubsetbuf(b, n)</td><td>Influences buffering</td></tr>
    </table></P>

    <p>pubimbue() and getloc() are used for internationalization (see page 625). pubimbue() installs a new locale object in the stream buffer returning the previously installed locale object. getloc() returns the currently installed locale object.</p>
    
    <p>The function pubsetbuf() is intended to provide some control over the buffering strategy of stream buffers. However, whether it is honored depends on the concrete stream buffer class. For example, it makes no sense to use pubsetbuf() for string stream buffers. Even for file stream buffers the use of this function is only portable if it is called before the first I/O operation is performed and if it is called as pubsetbuf(0,0) (that is, no buffer is to be used). This function returns 0 on failure and the stream buffer otherwise.</p>
    
    <p>The functions pubseekoff() and pubseekpos() are used to manipulate the current position used for reading and/or writing. Which position is manipulated depends on the last argument, which is of type ios_base::openmode and which defaults to ios_base::in|ios_base::out if it is not specified. If ios_base::in is set, the read position is modified. Correspondingly, the write position is modified if ios_base::out is set. The function pubseekpos() moves the stream to an absolute position specified as the first argument whereas the function pubseekoff() moves the stream relative to some other position. The offset is specified as the first argument. The position used as starting point is specified as the second argument and can be either ios_base::cur, ios_base::beg, or ios_base::end (see page 635 for details). Both functions return the position to which the stream was positioned or an invalid stream position. The invalid stream position can be detected by comparing the result with the object pos_type(off_type(-1)) (pos_type and off_type are types for handling stream positions; see page 634). The current position of a stream can be obtained using pubseekoff():</P>
    
    <PRE>
        
    

    sbuf.pubseekoff(0, std::ios::cur)

    </pre>
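<p>The positioning functions can be tried on a string stream buffer. The helpers readPosition() and seekRead() below are hypothetical. Note one assumption made explicit in the code: the direction ios_base::in is passed as the last argument, because a string stream buffer does not support a seek relative to ios_base::cur for the read and write positions at the same time.</p>

```cpp
#include <ios>
#include <sstream>
#include <streambuf>

// Returns the current read position (or -1 if it cannot be determined).
inline std::streamoff readPosition(std::streambuf& buf)
{
    return buf.pubseekoff(0, std::ios_base::cur, std::ios_base::in);
}

// Moves the read position to an absolute offset; reports success.
inline bool seekRead(std::streambuf& buf, std::streamoff offset)
{
    typedef std::streambuf::pos_type pos_type;
    typedef std::streambuf::off_type off_type;
    const pos_type invalid = pos_type(off_type(-1));
    return buf.pubseekpos(pos_type(offset), std::ios_base::in) != invalid;
}
```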
    

    13.13.2
    Stream Buffer Iterators

    An alternative to using the member functions for unformatted I/O is to use the stream buffer iterator classes. These classes provide iterators that conform to input iterator or output iterator requirements and read or write individual characters from stream buffers. This fits character-level I/O into the algorithm library of the C++ standard library.

    <p>The template classes istreambuf_iterator and ostreambuf_iterator are used to read or to write individual characters from or to objects of type basic_streambuf. The classes are defined in the header &lt;iterator> like this:</p>
    
    <pRE>
        
    

    namespace std {
        template <class charT,
                  class traits = char_traits<charT> >
        class istreambuf_iterator;
        template <class charT,
                  class traits = char_traits<charT> >
        class ostreambuf_iterator;
    }

    </PRe>
    
    <p>These iterators are special forms of stream iterators, which are described in Section 7.4.3. The only difference is that their elements are characters.</p>
    
    Output Stream Buffer Iterators
        <p>Here is how a string can be written to a stream buffer using an ostreambuf_iterator:</p>
    
        <pre>
    

    // create iterator for buffer of output stream cout
    std::ostreambuf_iterator<char> bufWriter(std::cout);

    std::string hello("hello, world\n");
    std::copy(hello.begin(), hello.end(),    // source: string
              bufWriter);                    // destination: output buffer of cout

        </PRE>
    
        <P>The first line of this example constructs an output iterator of type ostreambuf_iterator from the object cout. Instead of passing the output stream you could also pass a pointer to the stream buffer directly. The remainder constructs a string object and copies the characters in this object to the constructed output iterator.</P>
    
        <P>Table 13.44 lists all operations of output stream buffer iterators. The implementation is similar to ostream iterators (see page 278). In addition, you can initialize the iterator with a buffer and you can call failed() to query whether the iterator is able to write. If any prior writing of a character failed, failed() yields true. In this case, any writing with operator = has no effect.</P>
    
        <P><table width="100%">
        <caption>Table 13.44. Operations of Output Stream Buffer Iterators</caption>
        <tr><th>Expression</th><th>Effect</th></tr>
        <tr><td>ostreambuf_iterator&lt;char>(ostream)</td><td>Creates an output stream buffer iterator for ostream</td></tr>
        <tr><td>ostreambuf_iterator&lt;char>(buffer_ptr)</td><td>Creates an output stream buffer iterator for the buffer to which buffer_ptr refers</td></tr>
        <tr><td>*iter</td><td>No-op (returns iter)</td></tr>
        <tr><td>iter = c</td><td>Writes character c to the buffer by calling sputc(c) for it</td></tr>
        <tr><td>++iter</td><td>No-op (returns iter)</td></tr>
        <tr><td>iter++</td><td>No-op (returns iter)</td></tr>
        <tr><td>failed()</td><td>Returns whether the output stream iterator is not able to write anymore</td></tr>
        </table></P>
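<p>The use of failed() can be sketched with a helper that writes a string through an ostreambuf_iterator that is initialized with a pointer to a stream buffer. The function writeString() is a hypothetical name; it queries the iterator returned by copy(), because the algorithm works on a copy of the iterator passed to it.</p>

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>

// Writes text to the buffer; returns false if any write failed.
inline bool writeString(std::streambuf* buf, const std::string& text)
{
    std::ostreambuf_iterator<char> out(buf);
    out = std::copy(text.begin(), text.end(),    // source: string
                    out);                        // destination: buffer
    return !out.failed();
}
```

<p>For a string stream buffer that was opened for reading only, no character can be written, so the helper is expected to report failure.</p>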

    Input Stream Buffer Iterators
        <p>Table 13.45 lists all operations of input stream buffer iterators. The implementation is similar to istream iterators (see page 280). In addition, you can initialize the iterator with a buffer, and a member function, equal(), is provided, which returns whether two input stream buffer iterators are equal. Two input stream buffer iterators are equal when they are both end-of-stream iterators or when neither is an end-of-stream iterator.</p>
    
        <p>What is somewhat obscure is what it means for two objects of type istreambuf_iterator to be equivalent: Two istreambuf_iterator objects are equivalent if both iterators are end-of-stream iterators or if neither of them is an end-of-stream iterator (whether the stream buffer is the same doesn't matter). One possibility to get an end-of-stream iterator is to construct an iterator with the default constructor. In addition, an istreambuf_iterator becomes an end-of-stream iterator when an attempt is made to advance the iterator past the end of the stream (in other words, if sbumpc() returns traits_type::eof()). This behavior has two major implications:</p>
    
        <p><table width="100%">
    

    Table 13.45. Operations of Input Stream Buffer Iterators

    <p><b>Expression</B></P>
    
    <P><B>Effect</B></p>
    
    <p>istreambuf_iterator&lt;char>()</P>
    
    <P>Creates an end-of-stream iterator</P>
    
    <p>istreambuf_iterator&lt;char>(istream)</P>
    
    <P>Creates an input stream buffer iterator for istream and might read the first character using sgetc()</p>
    
    <p>istreambuf_iterator&lt;char>(buffer_ptr)</P>
    
    <p>Creates an input stream buffer iterator for the buffer to which buffer_ptr refers and might read the first character using sgetc()</p>
    
    <P>*iter</P>
    
    <p>Returns the actual character, read with sgetc() before (reads the first character if not done by the constructor)</p>
    
    <P>++iter</P>
    
    <p>Reads the next character with sbumpc() and returns its position</p>
    
    <p>iter++</p>
    
    <p>Reads the next character with sbumpc() but returns an iterator for the previous character</p>
    
    <P>iter1.equal(iter2)</p>
    
    <p>Returns whether both iterators are equal</p>
    
    <p>iter1 == iter2</p>
    
    <p>Tests iter1 and iter2 for equality</P>
    
    <P>iter1 != iter2</P>
    
    <P>Tests iter1 and iter2 for inequality</p>
    

  • A range from the current position in a stream to the end of the stream is defined by the two iterators istreambuf_iterator<charT,traits> (stream) (for the current position) and istreambuf_iterator<charT,traits>() (for the end of the stream), where stream is of type basic_istream<charT,traits> or basic_streambuf<charT,traits>.

  • It is not possible to create subranges using istreambuf_iterators.

    Example Use of Stream Buffer Iterators
        <P>The following example is the classic filter framework that simply writes all read characters with stream buffer iterators. It is a modified version of the example on page 611:</P>
    
        <pRE>
    

    // io/charcat2.cpp

    #include <iostream>
    #include <iterator>
    using namespace std;

    int main()
    {
        // input stream buffer iterator for cin
        istreambuf_iterator<char> inpos(cin);

        // end-of-stream iterator
        istreambuf_iterator<char> endpos;

        // output stream buffer iterator for cout
        ostreambuf_iterator<char> outpos(cout);

        // while input iterator is valid
        while (inpos != endpos) {
            *outpos = *inpos;    // assign its value to the output iterator
            ++inpos;
            ++outpos;
        }
    }

        </PRE>
    

    13.13.3
    User-Defined Stream Buffers

    Stream buffers are buffers for I/O. Their interface is defined by class basic_streambuf<>. For the character types char and wchar_t, the specializations streambuf and wstreambuf, respectively, are predefined. These classes are used as base classes when implementing the communication over special I/O channels. However, doing this requires an understanding of the stream buffer's operation.

    <P>The central interface to the buffers is formed by three pointers for each of the two buffers. The pointers returned from the functions eback(), gptr(), and egptr() form the interface to the read buffer. The pointers returned from the functions pbase(), pptr(), and epptr() form the interface to the write buffer. These pointers are manipulated by the read and write operations, which may in turn trigger actions on the corresponding read or write channel. The exact operation is examined separately for reading and writing.</p>
    
    User-Defined Output Buffers
        <p>A buffer used to write characters is maintained with three pointers that can be accessed by the three functions pbase(), pptr(), and epptr() (Figure 13.4). Here is what these pointers represent:</p>
    
        
    
        
    
  • pbase() ("put base") is the beginning of the output buffer.

  • pptr() ("put pointer") is the current write position.

  • epptr() ("end put pointer") is the end of the output buffer. This means that epptr() points to one past the last character that can be buffered.

        <P>The characters in the range from pbase() to pptr() (not including the character pointed to by pptr()) are already written but not yet transported (flushed) to the corresponding output channel.</P>
    
        <P>A character is written using the member function sputc(). This character is copied to the current write position if there is a spare write position. Then the pointer to the current write position is incremented. If the buffer is full (pptr() == epptr()), the contents of the output buffer are sent to the corresponding output channel. This is done by calling the virtual function overflow(). This function is effectively responsible for the actual sending of the characters to some "external representation" (which may actually be internal, as in the case of string streams). The implementation of overflow() in the base class basic_streambuf only returns end-of-file, which indicates that no more characters could be written.</P>
    
        <P>The member function sputn() can be used to write multiple characters at once. This function delegates the work to the virtual function xsputn(), which can be implemented for more efficient writing of multiple characters. The implementation of xsputn() in class basic_streambuf basically calls sputc() for each character. Thus, overriding xsputn() is not necessary. However, often, writing multiple characters can be implemented more efficiently than writing characters one at a time. Thus, this function can be used to optimize the processing of character sequences.</P>
    
        <p>Writing to a stream buffer does not necessarily involve using the buffer. Instead, the characters can be written as soon as they are received. In this case, the value 0 or NULL has to be assigned to the pointers that maintain the write buffer. The default constructor does this automatically.</p>
    
        <p>With this information, the following example of a simple stream buffer can be implemented. This stream buffer does not use a buffer. Thus, the function overflow() is called for each character. Implementing this function is all that is necessary:</P>
    
        <PRE>
    

    // io/outbuf1.hpp

    #include <streambuf>
    #include <locale>
    #include <cstdio>

    class outbuf : public std::streambuf
    {
      protected:
        /* central output function
         * - print characters in uppercase mode
         */
        virtual int_type overflow (int_type c) {
            if (c != EOF) {
                // convert lowercase to uppercase
                // (converted to char first so that the ctype<char> facet is used)
                c = static_cast<unsigned char>(
                        std::toupper(static_cast<char>(c),getloc()));

                // and write the character to the standard output
                if (putchar(c) == EOF) {
                    return EOF;
                }
            }
            return c;
        }
    };

        </pre>
    
        <p>In this case, each character sent to the stream buffer is written using the C function putchar(). However, before the character is written it is turned into an uppercase character using toupper() (see page 718). The function getloc() is used to get the locale object that is associated with the stream buffer (see also page 626).</p>
    
        <P>In this example, the output buffer is implemented specifically for the character type char (streambuf is the specialization of basic_streambuf&lt;> for the character type char). If other character types are used, you have to implement this function using character traits, which are introduced in Section 14.1.2. In this case, the comparison of c with end-of-file looks different: traits::eof() has to be returned instead of EOF, and if the argument c is EOF, the value traits::not_eof(c) should be returned (where traits is the second template argument to basic_streambuf). This might look as follows:</p>
    
        <pre>
    

    // io/outbuf1x.hpp

    #include <streambuf>
    #include <string>      // for std::char_traits
    #include <locale>
    #include <cstdio>

    template <class charT, class traits = std::char_traits<charT> >
    class basic_outbuf : public std::basic_streambuf<charT,traits>
    {
      protected:
        /* central output function
         * - print characters in uppercase mode
         */
        virtual typename traits::int_type
        overflow (typename traits::int_type c) {
            if (!traits::eq_int_type(c,traits::eof())) {
                // convert lowercase to uppercase
                // (this-> is required because getloc() is a member
                //  of a dependent base class)
                c = traits::to_int_type(
                        std::toupper(traits::to_char_type(c),this->getloc()));

                // and write the character to the standard output
                if (putchar(c) == EOF) {
                    return traits::eof();
                }
            }
            return traits::not_eof(c);
        }
    };

    typedef basic_outbuf<char> outbuf;
    typedef basic_outbuf<wchar_t> woutbuf;

        </pRE>
    
        <P>Using this stream buffer in the following program:</P>
    
        <PRe>
    

    // io/outbuf1.cpp

    #include <iostream>
    #include "outbuf1.hpp"

    int main()
    {
        outbuf ob;                // create special output buffer
        std::ostream out(&ob);    // initialize output stream with that output buffer

        out << "31 hexadecimal: " << std::hex << 31 << std::endl;
    }

        </PRE>
    
        <P>produces the following output:</P>
    
        <Pre>
    

    31 HEXADECIMAL: 1F

        </prE>
    
        <P>The same approach can be used to write to other arbitrary destinations. For example, the constructor of a stream buffer may take a file descriptor, the name of a socket connection, or two other stream buffers used for simultaneous writing to initialize the object. Writing to the corresponding destination requires only that overflow() be implemented. In addition, the function xsputn() should also be implemented to make writing to the stream buffer more efficient.</p>
    
        <p>For convenient construction of the stream buffer, it is also reasonable to implement a special stream class that mainly passes the constructor argument to the corresponding stream buffer. The next example demonstrates this. It defines a stream buffer class initialized with a file descriptor, to which characters are written with the function write() (a low-level I/O function used on UNIX-like operating systems). In addition, a class derived from ostream is defined that maintains such a stream buffer, to which the file descriptor is passed:</p>
    
        <PRE>
    

    // io/outbuf2.hpp

    #include <iostream>
    #include <streambuf>
    #include <cstdio>
    #include <unistd.h>    // POSIX declaration of write()

    class fdoutbuf : public std::streambuf {
      protected:
        int fd;    // file descriptor
      public:
        // constructor
        fdoutbuf (int _fd) : fd(_fd) {
        }
      protected:
        // write one character
        virtual int_type overflow (int_type c) {
            if (c != EOF) {
                char z = c;
                if (write (fd, &z, 1) != 1) {
                    return EOF;
                }
            }
            return c;
        }
        // write multiple characters
        virtual
        std::streamsize xsputn (const char* s,
                                std::streamsize num) {
            return write(fd,s,num);
        }
    };

    class fdostream : public std::ostream {
      protected:
        fdoutbuf buf;
      public:
        // the base class is initialized before the member buf,
        // so the stream buffer is installed in the constructor body
        fdostream (int fd) : std::ostream(0), buf(fd) {
            rdbuf(&buf);
        }
    };

        </PRE>
    
        <P>This stream buffer also implements the function xsputn() to avoid calling overflow() for each character if a character sequence is sent to this stream buffer. This function writes the whole character sequence with one call to the file identified by the file descriptor fd. The function xsputn() returns the number of characters written successfully. Here is a sample application:</p>
    
        <pre>
    

    // io/outbuf2.cpp

    #include <iostream>
    #include "outbuf2.hpp"

    int main()
    {
    fdostream out(1); // stream with buffer writing to file descriptor 1

    out << "31 hexadecimal: " << std::hex << 31 << std::endl;
    }

        </pre>
    
        <P>This program creates an output stream that is initialized with the file descriptor 1. This file descriptor, by convention, identifies the standard output channel. Thus, in this example the characters are simply printed. If some other file descriptor is available (for example, for a file or a socket), it can also be used as the constructor argument.</P>
    
        <P>To implement a stream buffer that really buffers, the write buffer has to be initialized using the function setp(). This is demonstrated by the next example:</p>
    
        <PRE>
    

    // io/outbuf3.hpp

    #include <cstdio>
    #include <streambuf>
    #include <unistd.h>    // POSIX declaration of write()

    class outbuf : public std::streambuf {
      protected:
        static const int bufferSize = 10;    // size of data buffer
        char buffer[bufferSize];             // data buffer

      public:
        /* constructor
         * - initialize data buffer
         * - one character less to let the bufferSizeth character
         *   cause a call of overflow()
         */
        outbuf() {
            setp (buffer, buffer+(bufferSize-1));
        }

        /* destructor
         * - flush data buffer
         */
        virtual ~outbuf() {
            sync();
        }

      protected:
        // flush the characters in the buffer
        int flushBuffer() {
            int num = pptr()-pbase();
            if (write (1, buffer, num) != num) {
                return EOF;
            }
            pbump (-num);    // reset put pointer accordingly
            return num;
        }

        /* buffer full
         * - write c and all previous characters
         */
        virtual int_type overflow (int_type c) {
            if (c != EOF) {
                // insert character into the buffer
                *pptr() = c;
                pbump(1);
            }
            // flush the buffer
            if (flushBuffer() == EOF) {
                // ERROR
                return EOF;
            }
            return c;
        }

        /* synchronize data with file/destination
         * - flush the data in the buffer
         */
        virtual int sync() {
            if (flushBuffer() == EOF) {
                // ERROR
                return -1;
            }
            return 0;
        }
    };

        </PRE>

        <P>The constructor initializes the write buffer with setp():</P>

        <PRE>

    setp (buffer, buffer+(bufferSize-1));

        </PRE>
    
        <P>The write buffer is set up such that overflow() is already called when there is still room for one character. If overflow() is not called with EOF as the argument, the corresponding character can be written to the write position because the pointer to the write position is not increased beyond the end pointer. After the argument to overflow() is placed in the write position, the whole buffer can be emptied.</p>
    
        <p>The member function flushBuffer() does exactly this. It writes the characters to the standard output channel (file descriptor 1) using the function write(). The stream buffer's member function pbump() is used to move the write position back to the beginning of the buffer.</p>
    
        <P>The function overflow() inserts the character that caused the call of overflow() into the buffer if it is not EOF. Then, pbump() is used to advance the write position to reflect the new end of the buffered characters. This moves the write position beyond the end position (epptr()) temporarily.</p>
    
        <p>This class also features the virtual function sync() that is used to synchronize the current state of the stream buffer with the corresponding storage medium. Normally, all that needs to be done is to flush the buffer. For the unbuffered versions of the stream buffer, overriding this function was not necessary because there was no buffer to be flushed.</p>
    
        <p>The virtual destructor ensures that data is written that is still buffered when the stream buffer is destroyed.</p>
    
        <p>These are the functions that are overridden for most stream buffers. If the external representation has some special structure, overriding additional functions may be useful. For example, the functions seekoff() and seekpos() may be overridden to allow manipulation of the write position.</P>
    
    
    User-Defined Input Buffers
        <p>The input mechanism works basically the same as the output mechanism. However, for input there is also the possibility of undoing the last read. The functions sungetc() (called by unget() of the input stream) or sputbackc() (called by putback() of the input stream) can be used to restore the stream buffer to its state before the last read. It is also possible to read the next character without moving the read position beyond this character. Thus, you must override more functions to implement reading from a stream buffer than is necessary to implement writing to a stream buffer.</p>
    
        <p>A stream buffer maintains a read buffer with three pointers that can be accessed through the member functions eback(), gptr(), and egptr() (Figure 13.5):</P>
    
        
    
        
    
  • eback() ("end back") is the beginning of the input buffer, or (this is where the name comes from) the end of the putback area. Characters can only be put back up to this position without taking special action.

  • gptr() ("get pointer") is the current read position.

  • egptr() ("end get pointer") is the end of the input buffer.

        <p>The characters between the read position and the end position have been transported from the external representation to the program's memory, but they still await processing by the program.</p>
    
        <p>Single characters can be read using the function sgetc() or sbumpc(). These two functions differ in that the read pointer is incremented by sbumpc(), but not by sgetc(). If the buffer is read completely (gptr() == egptr()), there is no character available and the buffer has to be refilled. This is done by a call of the virtual function underflow(). This function is responsible for the reading of data. The function sbumpc() calls the virtual function uflow() instead, if no characters are available. The default implementation of uflow() is to call underflow() and then increment the read pointer. The default implementation of underflow() in the base class basic_streambuf is to return EOF. This means it is impossible to read characters with the default implementation.</P>
    
        <p>The function sgetn() is used for reading multiple characters at once. This function delegates the processing to the virtual function xsgetn(). The default implementation of xsgetn() simply extracts multiple characters by calling sbumpc() for each character. Like the function xsputn() for writing, xsgetn() can be implemented to optimize the reading of multiple characters.</P>
    
        <p>For input it is not sufficient just to override one function, as is the case for output. Either a buffer has to be set up, or at the very least underflow() and uflow() have to be implemented. This is because underflow() does not move past the current character, but underflow() may be called from sgetc(). Moving on to the next character has to be done using buffer manipulation or using a call to uflow(). In any case, underflow() has to be implemented for any stream buffer capable of reading characters. If both underflow() and uflow() are implemented, there is no need to set up a buffer.</p>
    
        <p>A read buffer is set up with the member function setg(), which takes three arguments in this order:</P>
    
        
    
  • A pointer to the beginning of the buffer (eback())

  • A pointer to the current read position (gptr())

  • A pointer to the end of the buffer (egptr())

        <p>Unlike setp(), setg() takes three arguments. This is necessary to be able to define the room for storing characters that are put back into the stream. Thus, when the pointers to the read buffer are being set up, it is reasonable to have some characters (at least one) that are already read but still stored in the buffer.</P>
    
        <p>As mentioned, characters can be put back into the read buffer using the functions sputbackc() and sungetc(). sputbackc() gets the character to be put back as its argument and ensures that this character was indeed the character read. Both functions decrement the read pointer, if possible. Of course, this only works as long as the read pointer is not at the beginning of the read buffer. If you attempt to put a character back after the beginning of the buffer is reached, the virtual function pbackfail() is called. By overriding this function you can implement a mechanism to restore the old read position even in this case. In the base class basic_streambuf, no corresponding behavior is defined. Thus, in practice, it is not possible to go back an arbitrary number of characters. For streams that do not use a buffer, the function pbackfail() should be implemented because it is generally assumed that at least one character can be put back into the stream.</p>
    
        <p>If a new buffer was just read, another problem arises: Not even one character can be put back if the old data is not saved in the buffer. Thus, the implementation of underflow() often moves the last few characters (for example, four characters) of the current buffer to the beginning of the buffer and appends the newly read characters thereafter. This allows some characters to be moved back before pbackfail() is called.</P>
    
        <p>The following example demonstrates how such an implementation might look. In the class inbuf, an input buffer with ten characters is implemented. This buffer is split into a maximum of four characters for the putback area and six characters for the "normal" input buffer:</p>
    
        <pre>
    

    // io/inbuf1.hpp

    #include <cstdio>
    #include <cstring>
    #include <streambuf>
    #include <unistd.h>    // POSIX declaration of read()

    class inbuf : public std::streambuf {
      protected:
        /* data buffer:
         * - at most, four characters in putback area plus
         * - at most, six characters in ordinary read buffer
         */
        static const int bufferSize = 10;    // size of the data buffer
        char buffer[bufferSize];             // data buffer

      public:
        /* constructor
         * - initialize empty data buffer
         * - no putback area
         * => force underflow()
         */
        inbuf() {
            setg (buffer+4,     // beginning of putback area
                  buffer+4,     // read position
                  buffer+4);    // end position
        }

      protected:
        // insert new characters into the buffer
        virtual int_type underflow() {

            // is read position before end of buffer?
            if (gptr() < egptr()) {
                return traits_type::to_int_type(*gptr());
            }

            /* process size of putback area
             * - use number of characters read
             * - but at most four
             */
            int numPutback;
            numPutback = gptr() - eback();
            if (numPutback > 4) {
                numPutback = 4;
            }

            /* copy up to four characters previously read into
             * the putback buffer (area of first four characters)
             * - memmove() because the two ranges may overlap
             */
            std::memmove (buffer+(4-numPutback), gptr()-numPutback,
                          numPutback);

            // read new characters
            int num;
            num = read (0, buffer+4, bufferSize-4);
            if (num <= 0) {
                // ERROR or EOF
                return EOF;
            }

            // reset buffer pointers
            setg (buffer+(4-numPutback),    // beginning of putback area
                  buffer+4,                 // read position
                  buffer+4+num);            // end of buffer

            // return next character
            return traits_type::to_int_type(*gptr());
        }
    };

        </prE>
    
        <P>The constructor initializes all pointers so that the buffer is completely empty (Figure 13.6). If a character is read from this stream buffer, the function underflow() is called. This function is always used by this stream buffer to read the next characters. It starts by checking for characters in the input buffer that were already read. If such characters are present, they are moved to the putback area using the function memmove(), which, unlike memcpy(), also works when the source and destination ranges overlap. These are, at most, the last four characters of the input buffer. Then POSIX's low-level I/O function read() is used to read the next characters from the standard input channel. After the buffer pointers are adjusted to the new situation, the first character read is returned.</p>
    
        
    
        <P>For example, if the characters 'H', 'a', 'l', 'l', 'o', and 'w' are read by the first call to read(), the state of the input buffer changes, as shown in Figure 13.7. The putback area is empty because the buffer was filled for the first time, and there are no characters yet that can be put back.</P>
    
        
    
        <p>After these characters are extracted, the last four characters are moved into the putback area and new characters are read. For example, if the characters 'e', 'e', 'n', and '\n' are read by the next call of read() the result is as shown in Figure 13.8.</p>
    
        
    
        <P>Here is an example of the use of this stream buffer:</p>
    
        <PRE>
    

    // io/inbuf1.cpp

    #include <iostream>
    #include "inbuf1.hpp"

    int main()
    {
        inbuf ib;                // create special stream buffer
        std::istream in(&ib);    // initialize input stream with that buffer

        char c;
        for (int i=1; i<=20; i++) {
            // read next character (out of the buffer)
            in.get(c);

            // print that character (and flush)
            std::cout << c << std::flush;

            // after eight characters, put two characters back into the stream
            if (i == 8) {
                in.unget();
                in.unget();
            }
        }
        std::cout << std::endl;
    }

        </pre>
    
        <P>The program reads characters in a loop and writes them out. After the eighth character is read, two characters are put back. Thus, the seventh and eighth characters are printed twice.</P>
    
    
    
        
    

    13.14 Performance Issues


    This section specifically addresses issues that focus on performance. In general the stream classes should be pretty efficient, but performance can be improved further in applications in which I/O is performance critical.

    One performance issue was already mentioned in Section 13.2.3: You should only include those headers that are necessary to compile your code. In particular, you should avoid including <iostream> if the standard stream objects are not used.

    13.14.1
    Synchronization with C's Standard Streams

    By default, the eight C++ standard streams (the four narrow-character streams cin, cout, cerr, and clog, and their wide-character counterparts) are synchronized with the corresponding files from the C standard library (stdin, stdout, and stderr). By default, clog and wclog use the same stream buffer as cerr and wcerr, respectively. Thus, they are also synchronized with stderr by default, although there is no direct counterpart in the C standard library.

    <p>Depending on the implementation, this synchronization might imply some often unnecessary overhead. For example, if the standard C++ streams are implemented using the standard C files, this basically inhibits buffering in the corresponding stream buffers. However, the buffer in the stream buffers is necessary for some optimizations especially during formatted reading (see Section 13.14.2). To allow switching to a better implementation, the static member function sync_with_stdio() is defined for the class ios_base (Table 13.46).</P>
    
    <P><table width="100%">
    

    Table 13.46. Synchronizing Standard C++ and Standard C Streams

    Static Function           Meaning
    sync_with_stdio()         Returns whether the standard stream objects are synchronized with standard C streams
    sync_with_stdio(false)    Disables the synchronization of C++ and C streams, provided it is called before any I/O

    <P>sync_with_stdio() takes an optional Boolean value as argument that determines whether the synchronization with the standard C streams should be turned on. Thus, to turn the synchronization off you have to pass false as the argument:</P>
    
    <PRE>
        
    

    std::ios::sync_with_stdio(false);    // disable synchronization

    </prE>
    
    <P>Note that you have to disable the synchronization before any other I/O operation. Calling this function after any I/O has occurred results in implementation-defined behavior.</P>
    
    <P>The function returns the synchronization setting that was in effect before the call, that is, the value passed in the previous call. If it was not called before, it returns true to reflect the default setup of the standard streams.</p>
    

    13.14.2
    Buffering in Stream Buffers

    Buffering I/O is important for efficiency. One reason for this is that system calls are, in general, relatively expensive and it pays to avoid them if possible. There is, however, another more subtle reason in C++ for doing buffering in stream buffers, at least for input: The functions for formatted I/O use stream buffer iterators to access the streams, and operating on stream buffer iterators is slower than operating on pointers. The difference is not that big, but it is sufficient to justify improved implementations for frequently used operations like formatted reading of numeric values. However, for such improvements it is essential that stream buffers are buffered.

    <p>Thus, all I/O is done using stream buffers, which implement a mechanism for buffering. However, it is not sufficient to rely solely on this buffering because there are three aspects that conflict with effective buffering:</p>
    
  • It is often simpler to implement stream buffers without buffering. If the corresponding streams are not used frequently or are only used for output (for output the difference between stream buffer iterators and pointers is not as bad as for input; the main problem is comparing stream buffer iterators), buffering is probably not that important. However, for stream buffers that are used extensively, buffering should definitely be implemented.

  • The flag unitbuf causes output streams to flush the stream after each output operation. Correspondingly, the manipulators flush and endl also flush the stream. For the best performance all three should probably be avoided. However, when writing to the console, for example, it is probably still reasonable to flush the stream after writing complete lines. If you are stuck with a program that makes heavy use of unitbuf, flush, or endl, you might consider using a special stream buffer that does not use sync() to flush the stream buffer but uses some other function that is called when appropriate.

  • Tying streams with the tie() function (see Section 13.10.1) also results in additional flushing of streams. Thus, streams should only be tied if it is really necessary.

        </LI>
    
    
    <p>When implementing new stream buffers, it may be reasonable to implement them without buffering first. Then, if the stream buffer is identified as a bottleneck, it is still possible to implement buffering without affecting anything in the remainder of the application.</p>
    

    13.14.3
    Using Stream Buffers Directly

    All member functions of the classes basic_istream and basic_ostream that read or write characters operate according to the same schema: First, a corresponding sentry object is constructed, then the actual operation is performed. The construction of the sentry object results in flushing of potentially tied objects, skipping of whitespace (for input only), and implementation-specific operations like locking in multithreaded environments (see Section 13.12.4).

    <P>For unformatted I/O, most of the operations are normally useless anyway. Only the locking operation might be useful if the streams are used in multithreaded environments (note that the C++ standard does not address multithreading). Thus, when doing unformatted I/O it is normally much better to use stream buffers directly.</P>
    
    <p>To support this behavior, you can use operators &lt;&lt; and >> with stream buffers as follows:</p>
    
    <UL>
    
  • By passing a pointer to a stream buffer to operator <<, you can output all input of its device. This is probably the fastest way to copy files by using C++ I/O streams. For example:

    // io/copy1.cpp

    #include <iostream>

    int main()
    {
    // copy all standard input to standard output
    std::cout << std::cin.rdbuf();
    }

    Here, rdbuf() yields the buffer of cin (see page 638). Thus, the program copies all standard input to standard output.

  • By passing a pointer to a stream buffer to operator >>, you can read directly into a stream buffer.

    For example, you could also copy all standard input to standard output in the following way:

    // io/copy2.cpp

    #include <iostream>

    int main()
    {
    // copy all standard input to standard output
    // (noskipws clears skipws so that leading whitespace is copied, too)
    std::cin >> std::noskipws >> std::cout.rdbuf();
    }

    Note that you have to clear the flag skipws. Otherwise, leading whitespace of the input is skipped.

        </Li>
    

    Even for formatted I/O it may be reasonable to use stream buffers directly. For example, if lots of numeric values are read in a loop, it is sufficient to construct just one sentry object that exists for the whole time the loop is executed. Then, within the loop, whitespace is skipped manually (using the ws manipulator would also construct a sentry object) and then the facet num_get (see Section 14.4.1) is used for reading the numeric values directly.

    <P>Note that a stream buffer has no error state of its own. It also has no knowledge of the input or output stream that might connect to it. So, inside of:</P>
    
    <pre>
        
    

    // copy contents of in to out
    out << in.rdbuf();

    </Pre>
    
    <p>there is no way to change the error state of in due to a failure or end-of-file.</P>