15.2 Using Allocators as a Library Programmer

This section describes the use of allocators from the viewpoint of people who use allocators to implement containers and other components that are able to handle different allocators. This section is based, with permission, partly on Section 19.4 of Bjarne Stroustrup's The C++ Programming Language, 3rd edition.

Allocators provide an interface to allocate, create, destroy, and deallocate objects (Table 15.1). With allocators, containers and algorithms can be parameterized by the way the elements are stored. For example, you could implement allocators that use shared memory or that map the elements to a persistent database.

Table 15.1. Fundamental Allocator Operations
Expression Effect
a.allocate(num) Allocates memory for num elements
a.construct(p) Initializes the element to which p refers
a.destroy(p) Destroys the element to which p refers
a.deallocate(p,num) Deallocates memory for num elements to which p refers
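The four operations of Table 15.1 always occur in the same order. The following is a minimal sketch of that life cycle, not a definitive implementation. It assumes a C++11 compiler and goes through std::allocator_traits instead of calling the allocator's members directly, because newer standards no longer provide construct() and destroy() as members of std::allocator. The helper name round_trip is hypothetical, and the sketch assumes that the allocator's pointer type is a plain T*.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Builds num copies of val in storage obtained from alloc, reads the last
// one back, then destroys the elements and releases the storage.
// Hypothetical helper; assumes the allocator's pointer type is a plain T*.
template <class Alloc>
typename std::allocator_traits<Alloc>::value_type
round_trip(Alloc alloc, std::size_t num,
           const typename std::allocator_traits<Alloc>::value_type& val)
{
    typedef std::allocator_traits<Alloc> traits;
    typedef typename traits::value_type T;
    assert(num > 0);

    T* p = traits::allocate(alloc, num);        // raw memory, no objects yet
    for (std::size_t i = 0; i < num; ++i) {
        traits::construct(alloc, p + i, val);   // start each element's lifetime
    }

    T last = p[num - 1];

    for (std::size_t i = 0; i < num; ++i) {
        traits::destroy(alloc, p + i);          // end each element's lifetime
    }
    traits::deallocate(alloc, p, num);          // same num as passed to allocate
    return last;
}
```

Note that the sketch is not exception safe: if a constructor throws, the elements built so far and the memory are leaked.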

As an example, let's look at a naive implementation of a vector. A vector gets its allocator as a template or a constructor argument and stores it somewhere internally:

namespace std {
template <class T,
class Allocator = allocator<T> >
class vector {
...
private:
Allocator alloc; //allocator
T* elems; //array of elements
size_type numElems; //number of elements
size_type sizeElems; //size of memory for the elements
...

public:
//constructors
explicit vector(const Allocator& = Allocator());
explicit vector(size_type num, const T& val = T(),
const Allocator& = Allocator());
template <class InputIterator>
vector(InputIterator beg, InputIterator end,
const Allocator& = Allocator());
vector(const vector<T,Allocator>& v);
...
};
}

The second constructor that initializes the vector by num elements of value val could be implemented as follows:

namespace std {
template <class T, class Allocator>
vector<T,Allocator>::vector(size_type num, const T& val,
const Allocator& a)
: alloc(a) //initialize allocator
{
//allocate memory
sizeElems = numElems = num;
elems = alloc.allocate(num);

//initialize elements
for (size_type i=0; i<num; ++i) {
//initialize ith element
alloc.construct(&elems[i],val);
}
}
}

Table 15.2. Convenience Functions for Uninitialized Memory
Expression Effect
uninitialized_fill(beg,end,val) Initializes [beg,end) with val
uninitialized_fill_n(beg,num,val) Initializes num elements starting from beg with val
uninitialized_copy(beg,end,mem) Initializes the elements starting from mem with the elements of [beg,end)

However, for the initialization of uninitialized memory the C++ standard library provides some convenience functions (Table 15.2). Using these functions, the implementation of the constructor becomes even simpler:

namespace std {
template <class T, class Allocator>
vector<T,Allocator>::vector(size_type num, const T& val,
const Allocator& a)
: alloc(a) //initialize allocator
{
//allocate memory
sizeElems = numElems = num;
elems = alloc.allocate(num);

//initialize elements
uninitialized_fill_n(elems, num, val);
}
}
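The convenience functions can also be tried outside of a container. The following sketch, under the assumption of a C++11 compiler, fills one block of raw memory with uninitialized_fill_n() and copies it into a second block with uninitialized_copy(). The function name join_copies is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Fills raw storage with num copies of val, copies it into a second raw
// block, and returns the concatenation of the copied strings.
// join_copies is a hypothetical name used only for this sketch.
inline std::string join_copies(std::size_t num, const std::string& val)
{
    typedef std::allocator_traits<std::allocator<std::string> > traits;
    std::allocator<std::string> alloc;
    assert(num > 0);

    std::string* first  = traits::allocate(alloc, num);
    std::string* second = traits::allocate(alloc, num);

    std::uninitialized_fill_n(first, num, val);             // construct in place
    std::uninitialized_copy(first, first + num, second);    // copy-construct

    std::string result;
    for (std::size_t i = 0; i < num; ++i) {
        result += second[i];
    }

    // the uninitialized_ functions only construct; cleanup is still manual
    for (std::size_t i = 0; i < num; ++i) {
        traits::destroy(alloc, first + i);
        traits::destroy(alloc, second + i);
    }
    traits::deallocate(alloc, first, num);
    traits::deallocate(alloc, second, num);
    return result;
}
```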

The member function reserve(), which reserves more memory without changing the number of elements (see page 149), could be implemented as follows:

namespace std {
template <class T, class Allocator>
void vector<T,Allocator>::reserve(size_type size)
{
//reserve() never shrinks the memory
if (size <= sizeElems) {
return;
}

//allocate new memory for size elements
T* newmem = alloc.allocate (size);

//copy old elements into new memory
uninitialized_copy(elems,elems+numElems,newmem);

//destroy old elements
for (size_type i=0; i<numElems; ++i) {
alloc.destroy(&elems [i]);
}

//deallocate old memory
alloc.deallocate(elems,sizeElems);

//so, now we have our elements in the new memory
sizeElems = size;
elems = newmem;
}
}
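The naive reserve() leaks the new memory if copying an element throws. One possible way to close that gap is sketched here as a free function, assuming a C++11 compiler. The name grow_storage and its parameter list are hypothetical; the sketch relies on the fact that uninitialized_copy() destroys the elements it has already built when a copy throws.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Copies numElems elements from elems (capacity oldCap) into a new block of
// capacity newCap and returns the new block. If copying throws, the new
// block is released and the old one is left untouched.
// grow_storage is a hypothetical helper, not part of any library.
template <class T, class Alloc>
T* grow_storage(Alloc& alloc, T* elems, std::size_t numElems,
                std::size_t oldCap, std::size_t newCap)
{
    typedef std::allocator_traits<Alloc> traits;
    assert(newCap >= numElems);

    T* newmem = traits::allocate(alloc, newCap);
    try {
        // destroys what it already built if a copy constructor throws
        std::uninitialized_copy(elems, elems + numElems, newmem);
    }
    catch (...) {
        traits::deallocate(alloc, newmem, newCap);
        throw;
    }

    // only now give up the old elements and the old memory
    for (std::size_t i = 0; i < numElems; ++i) {
        traits::destroy(alloc, elems + i);
    }
    if (elems != 0) {
        traits::deallocate(alloc, elems, oldCap);
    }
    return newmem;
}
```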

Raw Storage Iterators

In addition, class raw_storage_iterator is provided to iterate over uninitialized memory to initialize it. Therefore, you can use any algorithms with a raw_storage_iterator to initialize memory with the values that are the result of that algorithm.

For example, the following statement initializes the storage to which elems refers with the values in the range [x.begin(),x.end()):

copy (x.begin(), x.end(), //source
      raw_storage_iterator<T*,T>(elems)); //destination

The first template argument (T*, here) has to be an output iterator for the type of the elements. The second template argument (T, here) has to be the type of the elements.

Temporary Buffers

In code you might also find the functions get_temporary_buffer() and return_temporary_buffer(). They handle uninitialized memory that is needed for short, temporary use inside a function. Note that get_temporary_buffer() might return less memory than requested. Therefore, get_temporary_buffer() returns a pair containing the address of the memory and the size of the memory (in element units). Here is an example of how to use it:

void f()
{
//allocate memory for num elements of type MyType
pair<MyType*,ptrdiff_t> p = get_temporary_buffer<MyType>(num);
if (p.second == 0) {
//could not allocate any memory for elements
...
}
else if (p.second < num) {
//could not allocate enough memory for num elements
//however, don't forget to deallocate it
...
}

//do your processing
...

//free temporarily allocated memory, if any
if (p.first != 0) {
return_temporary_buffer(p.first);
}
}

However, it is rather complicated to write exception-safe code with get_temporary_buffer() and return_temporary_buffer(), so they are usually no longer used in library implementations.

15.3 The Default Allocator

The default allocator is declared as follows:

namespace std {
template <class T>
class allocator {
public:
//type definitions
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
typedef T value_type;

//rebind allocator to type U
template <class U>
struct rebind {
typedef allocator<U> other;
};

//return address of values
pointer address(reference value) const;
const_pointer address(const_reference value) const;

//constructors and destructor
allocator() throw();
allocator(const allocator&) throw();
template <class U>
allocator(const allocator<U>&) throw();
~allocator() throw();

//return maximum number of elements that can be allocated
size_type max_size() const throw();

// allocate but don't initialize num elements of type T
pointer allocate(size_type num,
allocator<void>::const_pointer hint = 0);

// initialize elements of allocated storage p with value value
void construct(pointer p, const T& value);

// delete elements of initialized storage p
void destroy(pointer p);

// deallocate storage p of deleted elements
void deallocate(pointer p, size_type num);
};
}

The default allocator uses the global operators new and delete to allocate and deallocate memory. Thus, allocate() may throw a bad_alloc exception. However, the default allocator may be optimized by reusing deallocated memory or by allocating more memory than needed to save time in additional allocations. So, the exact moments when operator new and operator delete are called are unspecified. See page 735 for a possible implementation of the default allocator.

There is a strange definition of a template structure inside the allocator, called rebind. This template structure provides the ability that any allocator may allocate storage of another type indirectly. For example, if Allocator is an allocator type, then

Allocator::rebind<T2>::other

is the type of the same allocator specialized for elements of type T2.

rebind<> is useful if you implement a container and you have to allocate memory for a type that differs from the element's type. For example, to implement a deque you typically need memory for arrays that manage blocks of elements (see the typical implementation of a deque on page 160). Thus, you need an allocator to allocate arrays of pointers to elements:

namespace std {
template <class T,
class Allocator = allocator<T> >
class deque {
...
private:
//rebind allocator for type T*
typedef typename Allocator::template rebind<T*>::other PtrAllocator;

Allocator alloc; //allocator for values of type T
PtrAllocator block_alloc; //allocator for values of type T*
T** elems; //array of blocks of elements
...
};
}

To manage the elements of a deque you need one allocator to handle the blocks of elements and another allocator to handle the array of pointers to those blocks. The latter has type PtrAllocator, which is the same kind of allocator as the one for the elements. By using rebind<>, the allocator for the elements (Allocator) is rebound to the type of a pointer to elements (T*).
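The rebinding step can be tried in isolation. The following sketch assumes a C++11 compiler and uses std::allocator_traits<>::rebind_alloc, which yields the same type as the nested rebind<>::other and also works for allocators that do not define rebind themselves. The class BlockMap is hypothetical and manages only the array of block pointers, not the blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Owns an array of T* that is allocated with a rebound copy of the element
// allocator. BlockMap is a hypothetical class for illustration only.
template <class T, class Allocator = std::allocator<T> >
class BlockMap {
public:
    // same allocator kind as Allocator, but for values of type T*
    typedef typename std::allocator_traits<Allocator>::
        template rebind_alloc<T*> PtrAllocator;

    explicit BlockMap(std::size_t numBlocks, const Allocator& a = Allocator())
        : block_alloc(a), blocks(0), count(numBlocks)
    {
        assert(numBlocks > 0);
        blocks = std::allocator_traits<PtrAllocator>::allocate(block_alloc, count);
        for (std::size_t i = 0; i < count; ++i) {
            blocks[i] = 0;      // no block of elements allocated yet
        }
    }
    ~BlockMap() {
        std::allocator_traits<PtrAllocator>::deallocate(block_alloc, blocks, count);
    }
    std::size_t size() const { return count; }
    T* block(std::size_t i) const { return blocks[i]; }

private:
    BlockMap(const BlockMap&);              // copying is not part of the sketch
    BlockMap& operator=(const BlockMap&);

    PtrAllocator block_alloc;   // allocator for values of type T*
    T**          blocks;        // array of blocks of elements
    std::size_t  count;         // number of entries in blocks
};
```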

The default allocator has the following specialization for type void:

namespace std {
template <>
class allocator<void> {
public:
typedef void* pointer;
typedef const void* const_pointer;
typedef void value_type;
template <class U>
struct rebind {
typedef allocator<U> other;
};
};
}

14.3 Locales in Detail

A C++ locale is an immutable container for facets. It is defined in the <locale> header file as follows:

namespace std {
class locale {
public:
// global locale objects
static const locale& classic(); //classic C locale
static locale global(const locale&); //set global locale

// internal types and values
class facet;
class id;
typedef int category;
static const category none, numeric, time, monetary,
ctype, collate, messages, all;

// constructors
locale() throw();
explicit locale (const char* name);

// create locale based on other locales
locale (const locale& loc) throw();
locale (const locale& loc, const char* name, category);
template <class Facet>
locale (const locale& loc, Facet* fp);
locale (const locale& loc, const locale& loc2, category);

// assignment operator
const locale& operator= (const locale& loc) throw();
template <class Facet>
locale combine (const locale& loc) const;

// destructor
~locale() throw();

//name (if any)
basic_string<char> name() const;

// comparisons
bool operator== (const locale& loc) const;
bool operator!= (const locale& loc) const;

//sorting of strings
template <class charT, class Traits, class Allocator>
bool operator() (
const basic_string<charT,Traits,Allocator>& s1,
const basic_string<charT,Traits,Allocator>& s2) const;
};

//facet access
template <class Facet>
const Facet& use_facet (const locale&);
template <class Facet>
bool has_facet (const locale&) throw();
}

The strange thing about locales is how the objects stored in the container are accessed. A facet in a locale is accessed using the type of the facet as the index. Because each facet exposes a different interface and suits a different purpose, it is desirable to have the access function to locales return a type corresponding to the index. This is exactly what can be done with a type as the index. Using the facet's type as an index has the additional advantage of having a type-safe interface.

Locales are immutable. This means the facets stored in a locale cannot be changed (except when locales are being assigned). Variations of locales are created by combining existing locales and facets to create a new locale. Table 14.4 lists the constructors for locales.

Table 14.4. Constructing Locales
Expression Effect
locale() Creates a copy of the current global locale
locale (name) Creates a locale from the string name
locale (loc) Creates a copy of locale loc
locale (loc1,loc2, cat) Creates a copy of locale loc1, with all facets from category cat replaced with facets from locale loc2
locale (loc,name,cat) Equivalent to locale(loc, locale (name) ,cat)
locale (loc,fp) Creates a copy of locale loc and installs the facet to which fp refers
loc1 = loc2 Assigns locale loc2 to locale loc1
loc1.template combine<F>(loc2) Creates a copy of locale loc1 but with the facet of type F taken from loc2

Almost all constructors create a copy of some other locale. Merely copying a locale is considered to be a cheap operation. Basically, it consists of setting a pointer and increasing a reference count. Creating a modified locale is more expensive. In this case, a reference count for each facet stored in the locale has to be adjusted. Although the standard makes no guarantees about such efficient behavior, it is likely that all implementations will be rather efficient for copying locales.

Two of the constructors listed in Table 14.4 take names of locales. The names accepted are not standardized, with the exception of the name C. However, the standard requires that the documentation with the C++ standard library lists the accepted names. It is assumed that most implementations will accept names as outlined in Section 14.2.

The member function combine() needs some explanation because it uses a feature that was implemented in compilers only recently. It is a member function template with an explicitly specified template argument. This means the template argument is not deduced implicitly from an argument because there is no argument from which the type can be deduced. Instead, the template argument is specified explicitly (type F in this case).
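As an illustration of combine(), the following sketch takes the numpunct<char> facet from one locale and puts it into a copy of another. The class comma_point and the function with_punct_from are hypothetical names. The facet is derived from std::numpunct<char> here so that no named locale has to be installed on the system.

```cpp
#include <cassert>
#include <locale>

// Facet that only changes the decimal point. Because it derives from
// std::numpunct<char>, it is looked up under that facet's id.
// comma_point is a hypothetical name.
class comma_point : public std::numpunct<char> {
protected:
    virtual char do_decimal_point() const { return ','; }
};

// Returns a copy of base whose numpunct<char> facet is taken from donor.
inline std::locale with_punct_from(const std::locale& base,
                                   const std::locale& donor)
{
    assert(std::has_facet<std::numpunct<char> >(donor));
    // If base had a type that depends on a template parameter, this call
    // would have to be written as base.template combine<...>(donor).
    return base.combine<std::numpunct<char> >(donor);
}
```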

The two functions that access facets in a locale object use the same technique (Table 14.5). The major difference is that these two functions are global template functions, thereby making this ugly syntax involving the template keyword unnecessary.

The function use_facet() returns a reference to a facet. The type of this reference is the type passed explicitly as the template argument. If the locale passed as the argument does not contain a corresponding facet, the function throws a bad_cast exception. The function has_facet() can be used to test whether a particular facet is present in a given locale.

Table 14.5. Accessing Facets
Expression Effect
has_facet<F>(loc) Returns true if a facet of type F is stored in locale loc
use_facet<F> (loc) Returns a reference to the facet of type F stored in locale loc
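The behavior of both functions for a facet that is missing can be checked with a small sketch. The facet class unused_facet and the function lookup_throws are hypothetical; the facet is never installed anywhere, so every lookup for it is expected to fail.

```cpp
#include <cassert>
#include <locale>
#include <typeinfo>

// A facet type that is never installed in any locale. Hypothetical.
class unused_facet : public std::locale::facet {
public:
    static std::locale::id id;
    unused_facet() : std::locale::facet(0) {}
};
std::locale::id unused_facet::id;

// Returns true if use_facet<F>(loc) reports a missing facet by throwing
// std::bad_cast, and false if the facet was found.
template <class F>
bool lookup_throws(const std::locale& loc)
{
    try {
        (void)std::use_facet<F>(loc);
    }
    catch (const std::bad_cast&) {
        return true;
    }
    return false;
}
```

In code that cannot be sure about a facet, calling has_facet() first avoids the exception altogether.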

The remaining operations of locales are listed in Table 14.6. The name of a locale is maintained if the locale was constructed from a name, or one or more named locales. However, again, the standard makes no guarantees about the construction of a name resulting from combining two locales. Two locales are considered to be identical if one is a copy of the other or if both locales have the same name. It is natural to consider two objects to be identical if one is a copy of the other. But what about this naming stuff? The idea behind this is basically that the name of the locale reflects the names used to construct the named facets. For example, the locale's name might be constructed by joining the names of the facets in a particular order, separating the individual names by separation characters. Using this scheme it would be possible to identify two locale objects as identical if they are constructed by combining the same named facets into locale objects. In other words, the standard basically requires that two locales consisting of the same set of named facets be considered identical. Thus, the names will probably be constructed carefully to support this notion of equality.

Table 14.6. Operations of Locales
Expression Effect
loc.name() Returns the name of locale loc as string
loc1 == loc2 Returns true if loc1 and loc2 are identical locales
loc1 != loc2 Returns true if loc1 and loc2 are different locales
loc(str1,str2) Returns the Boolean result of comparing strings str1 and str2 for ordering (whether str1 is less than str2)
locale::classic() Returns locale("C")
locale::global (loc) Installs loc as the global locale and returns the previous global locale
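The operations of Table 14.6 can be exercised with the classic locale alone. The following sketch shows the typical save-and-restore pattern around locale::global(); the function name name_while_global is hypothetical.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Installs loc as the global locale, records the name of the global locale
// while loc is installed, and restores the previous global locale.
// name_while_global is a hypothetical helper.
inline std::string name_while_global(const std::locale& loc)
{
    std::locale previous = std::locale::global(loc);  // returns the old global
    std::string seen = std::locale().name();          // locale() copies the global
    std::locale::global(previous);                    // restore
    return seen;
}
```

A locale that was built by installing a single facet has no name; name() then returns the string "*".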

The parentheses operator makes it possible to use a locale object as a comparator for strings. This operator uses the string comparison from the collate facet to compare the strings passed as the argument for ordering. Thus, it returns whether one string is less than the other string according to the locale object. This is the behavior of an STL function object (see Section 8.1), so you can use a locale object as a sorting criterion for STL algorithms that operate on strings. For example, a vector of strings can be sorted according to the rules for string collation of the German locale as follows:

std::vector<std::string> v;
...
// sort strings according to the German locale
std::sort (v.begin(),v.end(), //range
std::locale("de_DE")); //sorting criterion
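Whether the name de_DE is accepted depends on the system, so the following sketch takes the locale as a parameter. With the classic locale it runs everywhere. The function name sort_with_locale is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>
#include <vector>

// Sorts v using the collate<char> facet of loc through locale::operator().
// sort_with_locale is a hypothetical helper.
inline void sort_with_locale(std::vector<std::string>& v, const std::locale& loc)
{
    assert(std::has_facet<std::collate<char> >(loc));
    std::sort(v.begin(), v.end(), loc);     // the locale object is the comparator
}
```

A call such as sort_with_locale(v, std::locale("de_DE")) would use the German rules, provided that the implementation knows this name; otherwise the locale constructor throws a runtime_error exception.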

14.4 Facets in Detail

The important aspect of locales is the facets they contain. All locales are guaranteed to contain certain standard facets. The description of the individual facets in the following subsections specifies which instantiations of the corresponding facet are guaranteed. In addition to these facets, an implementation of the C++ standard library may provide additional facets in the locales. What is important is that the user can also install her own facets or replace standard ones.

Section 14.2.2 discussed how to install a facet in a locale. For example, the class germanBoolNames was derived from the class numpunct_byname<char>, one of the standard facets, and installed in a locale using the constructor that takes a locale and a facet as arguments. But what do you need to create your own facet? Every class F that conforms to the following two requirements can be used as a facet:

  • F derives publicly from class locale::facet. This base class mainly defines some mechanism for reference counting that is used internally by the locale objects. It also declares the copy constructor and the assignment operator to be private, thereby making it infeasible to copy or to assign facets.

  • F has a publicly accessible static member named id of type locale::id. This member is used to look up a facet in a locale using the facet's type. The whole issue of using a type as the index is to have a type-safe interface. Internally, a normal container with an integer as the index is used to maintain the facets.
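The two requirements can be met with very little code. The following is a minimal sketch of a user-defined facet; the names greeting_facet, greeting(), and with_greeting are hypothetical and serve only to show the mechanism.

```cpp
#include <cassert>
#include <locale>
#include <string>

// A minimal user-defined facet. Only the public base class and the static
// member id are required; everything else is a hypothetical example.
class greeting_facet : public std::locale::facet {
public:
    static std::locale::id id;                  // the lookup key for this type

    explicit greeting_facet(const std::string& text)
        : std::locale::facet(0), text_(text) {} // refs == 0: the locale owns it

    std::string greeting() const { return text_; }

private:
    std::string text_;
};

std::locale::id greeting_facet::id;             // definition of the static member

// Returns a copy of base with a greeting_facet installed.
inline std::locale with_greeting(const std::locale& base, const std::string& text)
{
    return std::locale(base, new greeting_facet(text));
}
```

Because the facet is created with a reference argument of 0, the locale deletes it when the last locale referring to it is destroyed.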

The standard facets conform not only to these requirements but also to some special implementation guidelines. Although conforming to these guidelines is not required, doing so is useful. The guidelines are as follows:

  • All member functions are declared to be const. This is useful because use_facet() returns a reference to a const facet. Member functions that are not declared to be const can't be invoked.

  • All public functions are nonvirtual and delegate each request to a protected virtual function. The protected function is named like the public one, with the addition of a leading do_. For example, numpunct::truename() calls numpunct::do_truename(). This style is used to avoid hiding member functions when overriding only one of several virtual member functions that have the same name. For example, the class num_put has several functions named put(). In addition, it gives the programmer of the base class the possibility of adding some extra code in the nonvirtual functions, which is executed even if the virtual function is overridden.

The following description of the standard facets concerns only the public functions. To modify the facet you always have to override the corresponding protected functions. If you define functions with the same interface as the public facet functions, they would only overload them because these functions are not virtual.
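The following sketch overrides two protected functions of numpunct<char> and leaves the public interface alone. It is a variation of the germanBoolNames idea, but it derives from std::numpunct<char> rather than from numpunct_byname<char> so that no named locale is needed (an assumption made only for this sketch). The names german_bool_names and format_bool are hypothetical.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Replaces only the Boolean names by overriding the protected virtual
// functions; the public truename() and falsename() are inherited as is.
// german_bool_names is a hypothetical name.
class german_bool_names : public std::numpunct<char> {
protected:
    virtual std::string do_truename() const  { return "wahr"; }
    virtual std::string do_falsename() const { return "falsch"; }
};

// Formats b with boolalpha through a stream imbued with the facet above.
inline std::string format_bool(bool b)
{
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new german_bool_names));
    os << std::boolalpha << b;
    return os.str();
}
```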

For most standard facets, a "_byname" version is defined. This version derives from the standard facet and is used to create an instantiation for a corresponding locale name. For example, the class numpunct_byname is used to create the numpunct facet for a named locale. A German numpunct facet can be created like this:

std::numpunct_byname<char>("de_DE")

The _byname classes are used internally by the locale constructors that take a name as an argument. For each of the standard facets supporting a name, the corresponding _byname class is used to construct an instance of the facet.
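Because the set of accepted names is not standardized, code that uses a _byname class should be prepared for an unknown name. The following sketch assumes that the constructor reports an unknown name with a runtime_error exception, which is what common implementations do. The function name punct_by_name is hypothetical.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>

// Returns base with the numpunct<char> facet of the locale called name, or
// base unchanged if the implementation does not know that name.
// punct_by_name is a hypothetical helper.
inline std::locale punct_by_name(const std::locale& base, const char* name)
{
    assert(name != 0);
    try {
        // the facet is created on the heap and handed over to the locale
        return std::locale(base, new std::numpunct_byname<char>(name));
    }
    catch (const std::runtime_error&) {
        return base;    // unknown locale name
    }
}
```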

14.4.1 Numeric Formatting

Numeric formatting converts between the internal representation of numbers and the corresponding textual representations. The iostream operators delegate the actual conversion to the facets of the locale::numeric category. This category is formed by three facets:

  • numpunct, which handles punctuation symbols used for numeric formatting and parsing

  • num_put, which handles numeric formatting

  • num_get, which handles numeric parsing

In short, the facet num_put does the numeric formatting described for iostreams in Section 13.7, and num_get parses the corresponding strings. Additional flexibility not directly accessible through the interface of the streams is provided by the numpunct facet.
    
Numeric Punctuation

The numpunct facet controls the symbol used as the decimal point, the insertion of optional thousands separators, and the strings used for the textual representation of Boolean values. Table 14.7 lists the members of numpunct.

Table 14.7. Members of the numpunct Facet
Expression Meaning
np.decimal_point() Returns the character used as the decimal point
np.thousands_sep() Returns the character used as the thousands separator
np.grouping() Returns a string describing the positions of the thousands separators
np.truename() Returns the textual representation of true
np.falsename() Returns the textual representation of false

numpunct takes a character type charT as the template argument. The characters returned from decimal_point() and thousands_sep() are of this type, and the functions truename() and falsename() return a basic_string<charT>. The two instantiations numpunct<char> and numpunct<wchar_t> are required.

Because long numbers are hard to read without intervening characters, the standard facets for numeric formatting and numeric parsing support thousands separators. Often, the digits representing an integer are grouped into triples. For example, one million is written like this:

1,000,000

Unfortunately, it is not used everywhere exactly like that. For example, in German a period is used instead of a comma. Thus, a German would write one million like this:

1.000.000

This difference is covered by the thousands_sep() member. But this is not sufficient because in some countries digits are not put into triples. For example, in Nepal people would write

10.00.000

using even different numbers of digits in the groups. This is where the string returned from the function grouping() comes in. The number stored at index i gives the number of digits in the ith group, where counting starts with zero for the rightmost group. If there are fewer characters in the string than groups, the size of the last specified group is repeated. To create unlimited groups, you can use the value numeric_limits<char>::max() or, if there is no group at all, the empty string. Table 14.8 lists some examples of the formatting of one million.
    
Table 14.8. Examples of Numeric Punctuation of One Million
String Result
{ 0 } or "" (the default for grouping()) 1000000
{ 3, 0 } or "\3" 1,000,000
{ 3, 2, 3, 0 } or "\3\2\3" 10,00,000
{ 2, CHAR_MAX, 0 } 10000,00

Note that ordinary digit characters are usually not useful in a grouping string. For example, the string "2" specifies groups of 50 digits for ASCII encoding because the character '2' has the integer value 50 in the ASCII character set.
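The effect of the grouping string can be observed by installing a facet that returns it. In the following sketch the class grouped_punct and the function format_grouped are hypothetical names; the separator is fixed to a comma.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// numpunct facet with a caller-supplied grouping string and ',' as the
// thousands separator. grouped_punct is a hypothetical name.
class grouped_punct : public std::numpunct<char> {
public:
    explicit grouped_punct(const std::string& grouping) : grouping_(grouping) {}
protected:
    virtual char do_thousands_sep() const   { return ','; }
    virtual std::string do_grouping() const { return grouping_; }
private:
    std::string grouping_;
};

// Formats value with the given grouping string, for example "\3" or "\3\2".
// The characters are small integer values, not digit characters.
inline std::string format_grouped(long value, const std::string& grouping)
{
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new grouped_punct(grouping)));
    os << value;
    return os.str();
}
```

With the string "\3\2" the rightmost group has three digits and every further group has two, because the size of the last specified group is repeated.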
    
    
Numeric Formatting

The num_put facet is used for textual formatting of numbers. It is a template class that takes two template arguments: the type charT of the characters to be produced and the type OutIt of an output iterator to the location at which the produced characters are written. The output iterator defaults to ostreambuf_iterator<charT>. The num_put facet provides a set of functions, all called put() and differing only in the last argument. You can use the facet as follows:

std::locale loc;
OutIt to = ...;
std::ios_base& fmt = ...;
charT fill = ...;
T value = ...;

//get numeric output facet of the loc locale
const std::num_put<charT,OutIt>& np
= std::use_facet<std::num_put<charT,OutIt> >(loc);

//write value with numeric output facet
np.put(to, fmt, fill, value);

These statements would produce a textual representation of the value value using characters of type charT written to the output iterator to. The exact format is determined from the formatting flags stored in fmt, where the character fill is used as a fill character. The put() function returns an iterator pointing immediately after the last character written.
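A concrete version of these statements might look as follows. The sketch fixes charT to char, uses a string stream both as the destination and as the source of the formatting flags, and takes the facet from the classic locale. The function name put_long is hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Writes value through the num_put<char> facet of the classic locale,
// bypassing operator<<. flags selects the format (for example ios_base::hex).
// put_long is a hypothetical helper.
inline std::string put_long(long value, std::ios_base::fmtflags flags)
{
    std::ostringstream os;
    os.imbue(std::locale::classic());
    os.flags(flags);

    typedef std::num_put<char> facet_type;  // OutIt defaults to ostreambuf_iterator<char>
    const facet_type& np = std::use_facet<facet_type>(os.getloc());

    // arguments: destination iterator, format source, fill character, value
    std::ostreambuf_iterator<char> end_it =
        np.put(std::ostreambuf_iterator<char>(os), os, os.fill(), value);
    assert(!end_it.failed());
    (void)end_it;
    return os.str();
}
```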
    
The facet num_put provides member functions that take objects of types bool, long, unsigned long, double, long double, and void* as the last argument. It does not provide member functions, for example, for short or int. This is no problem because corresponding values of built-in types are promoted to supported types if necessary.

The standard requires that the two instantiations num_put<char> and num_put<wchar_t> are stored in each locale (both using the default for the second template argument). In addition, the C++ standard library supports all instantiations that take a character type as the first template argument and an output iterator type as the second. Of course, it is not required that all of these instantiations are stored in each locale because this would be an infinite number of facets.
    
    
Numeric Parsing

The facet num_get is used to parse textual representations of numbers. Corresponding to the facet num_put, it is a template that takes two template arguments: the character type charT and an input iterator type InIt, which defaults to istreambuf_iterator<charT>. It provides a set of get() functions that differ only in the last argument. You can use the facet as follows:

std::locale loc;
InIt from = ...;
InIt end = ...;
std::ios_base& fmt = ...;
std::ios_base::iostate err;
T value;

//get numeric input facet of the loc locale
const std::num_get<charT,InIt>& ng
= std::use_facet<std::num_get<charT,InIt> >(loc);

// read value with numeric input facet
ng.get(from, end, fmt, err, value);

These statements attempt to parse a numeric value corresponding to the type T from the sequence of characters between from and end. The format of the expected numeric value is defined by the argument fmt. If the parsing fails, err is modified to contain the value ios_base::failbit. Otherwise, ios_base::goodbit is stored in err and the parsed value in value. The value of value is modified only if the parsing is successful. get() returns the second parameter (end) if the sequence was used completely. Otherwise, it returns an iterator pointing to the first character that could not be parsed as part of the numeric value.
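The returned iterator makes it possible to see where the parsing stopped. The following sketch fixes charT to char and reads a long from the beginning of a string. The type parse_result and the function parse_long are hypothetical. Note that implementations may set eofbit in err when the input is used up, so the sketch tests only failbit.

```cpp
#include <cassert>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Result of parsing a leading integer from a string. Hypothetical type.
struct parse_result {
    bool ok;            // false if failbit was set
    long value;         // meaningful only if ok
    std::string rest;   // characters that were not consumed
};

// Parses a long at the start of text with the num_get<char> facet of the
// classic locale. parse_long is a hypothetical helper.
inline parse_result parse_long(const std::string& text)
{
    std::istringstream is(text);
    is.imbue(std::locale::classic());

    typedef std::istreambuf_iterator<char> InIt;
    typedef std::num_get<char, InIt> facet_type;
    const facet_type& ng = std::use_facet<facet_type>(is.getloc());

    std::ios_base::iostate err = std::ios_base::goodbit;
    long value = 0;
    InIt pos = ng.get(InIt(is), InIt(), is, err, value);

    parse_result r;
    r.ok = (err & std::ios_base::failbit) == 0;
    r.value = value;
    r.rest.assign(pos, InIt());     // whatever follows the number
    return r;
}
```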
    
The facet num_get supports functions to read objects of the types bool, long, unsigned short, unsigned int, unsigned long, float, double, long double, and void*. There are some types for which there is no corresponding function in the num_put facet; for example, unsigned short. This is because writing a value of type unsigned short produces the same result as writing a value of type unsigned short promoted to an unsigned long. However, reading a value as type unsigned long and then converting it to unsigned short may yield a different value than reading it as type unsigned short directly.

The standard requires that the two instantiations num_get<char> and num_get<wchar_t> be stored in each locale (both using the default for the second template argument). In addition, the C++ standard library supports all instantiations that take a character type as the first template argument and an input iterator type as the second. As with num_put, not all supported instantiations are required to be present in all locale objects.
    

14.4.2 Time and Date Formatting

The two facets time_get and time_put in the category time provide services for parsing and formatting times and dates. This is done by the member functions that operate on objects of type tm. This type is defined in the header file <ctime>. The objects are not passed directly; rather, a pointer to them is used as the argument.

Both facets in the time category depend heavily on the behavior of the function strftime() (also defined in the header file <ctime>). This function uses a string with conversion specifiers to produce a string from a tm object. Table 14.9 provides a brief summary of the conversion specifiers. The same conversion specifiers are also used by the time_put facet.

Of course, the exact string produced by strftime() depends on the C locale in effect. The examples in the table are given for the "C" locale.
    
Time and Date Parsing

The facet time_get is a template that takes a character type charT and an input iterator type InIt as template arguments. The input iterator type defaults to istreambuf_iterator<charT>.

Table 14.10 lists the members defined for the time_get facet. All of these members, except date_order(), parse the string and store the results in the tm object pointed to by the argument t. If the string could not be parsed correctly, either an error is reported (for example, by modifying the argument err) or an unspecified value is stored. This means that a time produced by a program can be parsed reliably but user input cannot. With the argument fmt, other facets used during parsing are determined. Whether other flags from fmt have any influence on the parsing is not specified.

All functions return an iterator that has the position immediately after the last character read. The parsing stops if parsing is complete or if an error occurs (for example, because a string could not be parsed as a date).

A function reading the name of a weekday or a month reads both abbreviated names and full names. If the abbreviation is followed by a letter, which would be legal for a full name, the function attempts to read the full name. If this fails, the parsing fails, even though an abbreviated name was already parsed successfully.
    
Table 14.9. Conversion Specifiers for strftime()
Specifier Meaning Example
%a Abbreviated weekday Mon
%A Full weekday Monday
%b Abbreviated month name Jul
%B Full month name July
%c Locale's preferred date and time representation Jul 12 21:53:22 1998
%d Day of the month 12
%H Hour of the day using a 24-hour clock 21
%I Hour of the day using a 12-hour clock 9
%j Day of the year 193
%m Month as decimal number 7
%M Minutes 53
%p Morning or evening (am or pm) pm
%S Seconds 22
%U Week number starting with the first Sunday 28
%W Week number starting with the first Monday 28
%w Weekday as a number (Sunday == 0) 0
%x Locale's preferred date representation Jul 12 1998
%X Locale's preferred time representation 21:53:22
%y The year without the century 98
%Y The year with the century 1998
%Z The time zone MEST
%% The literal % %
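The specifiers of Table 14.9 can be tried with a tm object that is filled by hand. The following sketch uses the date and time of the examples in the table; the function names example_time and format_time are hypothetical. Only purely numeric specifiers are checked here because they do not depend on the C locale in effect.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Returns a tm for Jul 12 1998, 21:53:22. Hypothetical helper.
inline std::tm example_time()
{
    std::tm t = std::tm();      // zero all fields first
    t.tm_year = 98;             // years since 1900
    t.tm_mon  = 6;              // months since January
    t.tm_mday = 12;
    t.tm_hour = 21;
    t.tm_min  = 53;
    t.tm_sec  = 22;
    t.tm_wday = 0;              // Sunday == 0, as in the %w row
    t.tm_yday = 192;            // days since January 1
    return t;
}

// Formats t with strftime(). Returns an empty string if the result does
// not fit into the buffer. format_time is a hypothetical helper.
inline std::string format_time(const std::tm& t, const char* format)
{
    assert(format != 0);
    char buffer[128];
    std::size_t n = std::strftime(buffer, sizeof buffer, format, &t);
    return std::string(buffer, n);
}
```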

        <p>Whether a function that is parsing a year allows two-digit years is unspecified. The year that is assumed for a two-digit year, if it is allowed, is also unspecified.</p>
    
        <p>date_order() returns the order in which the day, month, and year appear in a date string. This is necessary for some dates because the order cannot be determined from the string representing a date. For example, the first day in February in the year 2003 may be printed either as 3/2/1 or as 1/2/3. Class time_base, which is the base class of the facet time_get, defines an enumeration called dateorder for possible dale order values. Table 14.11 lists these values.</P>
    
        <p>The standard requires that the two instantiations time_get&lt;char> and time_get&lt;wchar_t> are stored in each locale. In addition, the C++ standard library supports all instantiations that take char or wchar_t as the first template argument, and a corresponding input iterator as the second. These additional instantiations are not required to be stored in each locale object.</P>
    
        <P><table width="100%">
    <caption>Table 14.10. Members of the time_get Facet</caption>
    <tr><th>Expression</th><th>Meaning</th></tr>
    <tr><td>tg.get_time(from,to,fmt,err,t)</td><td>Parses the string between from and to as the time produced by the X specifier for strftime()</td></tr>
    <tr><td>tg.get_date(from,to,fmt,err,t)</td><td>Parses the string between from and to as the date produced by the x specifier for strftime()</td></tr>
    <tr><td>tg.get_weekday(from,to,fmt,err,t)</td><td>Parses the string between from and to as the name of the weekday</td></tr>
    <tr><td>tg.get_monthname(from,to,fmt,err,t)</td><td>Parses the string between from and to as the name of the month</td></tr>
    <tr><td>tg.get_year(from,to,fmt,err,t)</td><td>Parses the string between from and to as the year</td></tr>
    <tr><td>tg.date_order()</td><td>Returns the date order used by the facet</td></tr>
        </table></P>
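        <p>As a minimal sketch of how get_time() might be called, the following helper parses a time of day with the time_get facet of the classic locale. The name parseTime is hypothetical, and the sketch assumes that the classic locale uses the HH:MM:SS layout for the X specifier. Because eofbit may be set when the whole input is consumed, only failbit is treated as an error here.</p>

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses text as a time of day with the time_get
// facet of loc and stores the fields in result.
// Returns false if the facet reported a parse error (failbit).
bool parseTime(const std::string& text, std::tm& result,
               const std::locale& loc = std::locale::classic())
{
    std::istringstream in(text);
    in.imbue(loc);   // the stream serves as the ios_base argument fmt

    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(loc);

    std::ios_base::iostate err = std::ios_base::goodbit;
    std::istreambuf_iterator<char> from(in);
    std::istreambuf_iterator<char> to;   // end-of-stream iterator

    tg.get_time(from, to, in, err, &result);

    // eofbit only says that the input was used up; failbit says that
    // the text could not be parsed as a time
    return (err & std::ios_base::failbit) == 0;
}
```

        <p>Called with the string "21:53:22", the helper is expected to fill tm_hour, tm_min, and tm_sec of the tm object passed as the second argument.</p>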

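        <p><table width="100%">
    <caption>Table 14.11. Members of the Enumeration dateorder</caption>
    <tr><th>Value</th><th>Meaning</th></tr>
    <tr><td>no_order</td><td>No particular order (for example, a date may be in Julian format)</td></tr>
    <tr><td>dmy</td><td>The order is day, month, year</td></tr>
    <tr><td>mdy</td><td>The order is month, day, year</td></tr>
    <tr><td>ymd</td><td>The order is year, month, day</td></tr>
    <tr><td>ydm</td><td>The order is year, day, month</td></tr>
        </table></p>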

    Time and Date Formatting
        <p>The facet time_put is used for formatting times and dates. It is a template that takes as arguments a character type charT and an optional output iterator type OutIt. The latter defaults to type ostreambuf_iterator (see page 665).</p>
    
        <p>The facet time_put defines two functions called put(), which are used to convert the date information stored in an object of type tm into a sequence of characters written to an output iterator. Table 14.12 lists the members of the facet time_put.</P>
    
        <p><table width="100%">
    <caption>Table 14.12. Members of the time_put Facet</caption>
    <tr><th>Expression</th><th>Meaning</th></tr>
    <tr><td>tp.put(oit,fmt,fill,t,cbeg,cend)</td><td>Converts according to the string [cbeg,cend)</td></tr>
    <tr><td>tp.put(oit,fmt,fill,t,cvt,mod)</td><td>Converts using the conversion specifier cvt</td></tr>
        </table></p>

        <p>Both functions write their results to the output iterator oit and return an iterator pointing immediately after the last character produced. The argument fmt is of type ios_base and is used to access other facets and potentially additional formatting information. The character fill is used when a space character is needed and for filling. The argument t points to an object of type tm that is storing the date to be formatted.</P>
    
        <P>The version of put() that takes two characters as the last two arguments formats the date found in the tm object to which t refers, interpreting the argument cvt like a conversion specifier to strftime(). This put() function does only one conversion; namely, the one specified by the cvt character. This function is called by the other put() function for each conversion specifier found. For example, using 'X' as the conversion specifier results in the time that is stored in *t being written to the output iterator. The meaning of the argument mod is not defined by the standard. It is intended to be used as a modifier to the conversion as found in several implementations of the strftime() function.</p>
    
        <P>The version of put() that takes a string defined by the range [cbeg,cend) to guide the conversion behaves very much like strftime(). It scans the string and writes every character that is not part of a conversion specification to the output iterator oit. If it encounters a conversion specification introduced by the character %, it extracts an optional modifier and a conversion specifier. The function continues by calling the other version of put(), using the conversion specifier and the modifier as the last two arguments. After processing a conversion specification, put() continues to scan the string.</P>
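        <p>The two put() functions can be sketched as follows. The helpers formatTime and formatField are hypothetical names introduced for illustration. They write into a string stream through an ostreambuf_iterator and use the classic locale by default, so only numeric conversion specifiers are exercised.</p>

```cpp
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats t according to a strftime()-like pattern
// by calling the put() version that takes the range [cbeg,cend).
std::string formatTime(const std::tm& t, const std::string& pattern,
                       const std::locale& loc = std::locale::classic())
{
    std::ostringstream out;
    out.imbue(loc);
    const std::time_put<char>& tp =
        std::use_facet<std::time_put<char> >(loc);
    tp.put(std::ostreambuf_iterator<char>(out), out, ' ', &t,
           pattern.data(), pattern.data() + pattern.size());
    return out.str();
}

// Hypothetical helper: performs exactly one conversion by calling the
// put() version that takes the conversion specifier cvt.
std::string formatField(const std::tm& t, char cvt,
                        const std::locale& loc = std::locale::classic())
{
    std::ostringstream out;
    out.imbue(loc);
    const std::time_put<char>& tp =
        std::use_facet<std::time_put<char> >(loc);
    tp.put(std::ostreambuf_iterator<char>(out), out, ' ', &t, cvt);
    return out.str();
}
```

        <p>Characters of the pattern that are not part of a conversion specification, such as '-' and ':', are copied to the output unchanged.</p>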
    
        <P>Note that this facet is somewhat unusual because it provides a nonvirtual member function; namely, the function put(), which uses a string as the conversion specification. This function cannot be overridden in classes derived from time_put. Only the other put() function can be overridden.</p>
    
        <p>The standard requires that the two instantiations time_put&lt;char> and time_put&lt;wchar_t> are stored in each locale. In addition, the C++ standard library supports all instantiations that take char or wchar_t as the first template argument and a corresponding output iterator as the second. There is no guaranteed support for instantiations using a type other than char or wchar_t as the first template argument. Also, it is not guaranteed that any instantiations other than time_put&lt;char> and time_put&lt;wchar_t> be stored in locale objects by default.</p>
    

    14.4.3
    Monetary Formatting

    The category monetary consists of the facets moneypunct, money_get, and money_put. The facet moneypunct defines the format of monetary values. The other two use this information to format or to parse a monetary value.

    Monetary Punctuation
        <p>Monetary values are printed differently depending on the context. The formats used in different cultural communities differ widely. Examples of the varying details are the placement of the currency symbol (if present at all), the notation for negative or positive values, the use of national or international currency symbols, and the use of thousands separators. To provide the necessary flexibility, the details of the format are factored into the facet moneypunct.</p>
    
        <p>The facet moneypunct is a template that takes as arguments a character type charT and a Boolean value that defaults to false. The Boolean value indicates whether local (false) or international (true) currency symbols are to be used. Table 14.13 lists the members of the facet moneypunct.</P>
    
        <P><table width="100%">
    <caption>Table 14.13. Members of the moneypunct Facet</caption>
    <tr><th>Expression</th><th>Meaning</th></tr>
    <tr><td>mp.decimal_point()</td><td>Returns a character to be used as the decimal point</td></tr>
    <tr><td>mp.thousands_sep()</td><td>Returns a character to be used as the thousands separator</td></tr>
    <tr><td>mp.grouping()</td><td>Returns a string specifying the placement of the thousands separators</td></tr>
    <tr><td>mp.curr_symbol()</td><td>Returns a string with the currency symbol</td></tr>
    <tr><td>mp.positive_sign()</td><td>Returns a string with the positive sign</td></tr>
    <tr><td>mp.negative_sign()</td><td>Returns a string with the negative sign</td></tr>
    <tr><td>mp.frac_digits()</td><td>Returns the number of fractional digits</td></tr>
    <tr><td>mp.pos_format()</td><td>Returns the format to be used for non-negative values</td></tr>
    <tr><td>mp.neg_format()</td><td>Returns the format to be used for negative values</td></tr>
        </table></P>

        <p>moneypunct derives from the class money_base. This base class defines an enumeration called part, which is used to form a pattern for monetary values. The class also defines a type called pattern (which is actually a structure whose only member is the array char field[4]). This type is used to store four values of type part that form a pattern describing the layout of a monetary value. Table 14.14 lists the five possible parts that can be placed in a pattern.</p>
    
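        <p><table width="100%">
    <caption>Table 14.14. Parts of Monetary Layout Patterns</caption>
    <tr><th>Value</th><th>Meaning</th></tr>
    <tr><td>none</td><td>At this position, spaces may appear but are not required</td></tr>
    <tr><td>space</td><td>At this position, at least one space is required</td></tr>
    <tr><td>sign</td><td>At this position, a sign may appear</td></tr>
    <tr><td>symbol</td><td>At this position, the currency symbol may appear</td></tr>
    <tr><td>value</td><td>At this position, the value appears</td></tr>
        </table></p>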

        <P>moneypunct defines two functions that return patterns: the function neg_format() for negative values and the function pos_format() for non-negative values. In a pattern, each of the parts sign, symbol, and value is mandatory, and one of the parts none and space has to appear. This does not mean, however, that there is really a sign or a currency symbol printed. What is printed at the positions indicated by the parts depends on the values returned from other members of the facet and on the formatting flags passed to the functions for formatting.</P>
    
        <P>Only the value always appears. Of course, it is placed at the position where the part value appears in the pattern. The value has exactly frac_digits() fractional digits, with decimal_point() used as the decimal point (unless there are no fractional digits, in which case no decimal point is used).</p>
    
        <p>The value may be interspersed with thousands separators, unless the string that is returned from grouping() is empty. The character used for the thousands separator is the one returned from thousands_sep(). The rules for the placement of the thousands separators are identical to the rules for numeric formatting (see page 705). When monetary values are printed, thousands separators are always inserted according to the string returned from grouping(). When monetary values are read, thousands separators are optional unless the grouping string is empty. The correct placement of thousands separators is checked after all other parsing is successful.</P>
    
        <P>The parts space and none control the placement of spaces. space is used at a position where at least one space is required. During formatting, if ios_base::internal is specified in the format flags, fill characters are inserted at the position of the space or the none part. Of course, filling is done only if the minimum width specified is not already used up by other characters. The character used as the space character is passed as the argument to the functions for the formatting of monetary values. If the formatted value does not contain a space, none can be placed at the last position. space and none may not appear as the first part in a pattern, and space may not be the last part in a pattern.</p>
    
        <p>Signs for monetary values may consist of more than one character. For example, in certain contexts parentheses around a value are used to indicate negative values. At the position where the sign part appears in the pattern, the first character of the sign appears. All other characters of the sign appear at the end after all other components. If the string for a sign is empty, no character indicating the sign appears. The character that is to be used as a sign is determined with the function positive_sign() for non-negative values and negative_sign() for negative values.</P>
    
        <P>At the position of the symbol part, the currency symbol appears. The symbol is present only if the formatting flags used during formatting or parsing have the ios_base::showbase flag set. The string returned from the function curr_symbol() is used as the currency symbol. The currency symbol is a local symbol to be used to indicate the currency if the second template argument is false (the default). Otherwise, an international currency symbol is used.</P>
    
        <P>Table 14.15 illustrates all of this, using the value $-1234.56 as an example. Of course, this means that frac_digits() returns 2. In addition, a width of 0 is always used.</p>
    
        <p>The standard requires that the instantiations moneypunct&lt;char>, moneypunct&lt;wchar_t>, moneypunct&lt;char, true>, and moneypunct&lt;wchar_t, true> are stored in each locale. The C++ standard library does not support any other instantiation.</P>
    
    
    Monetary Formatting
        <P>The facet money_put is used to format monetary values. It is a template that takes a character type charT as the first template argument and an output iterator OutIt as the second. The output iterator defaults to ostreambuf_iterator&lt;charT>. The two member functions put() produce a sequence of characters corresponding to the format specified by a moneypunct facet. The value to be formatted is either passed as type long double or as type basic_string&lt;charT>. You can use the facet as follows:</P>
    
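        <p><table width="100%">
    <caption>Table 14.15. Examples of Using the Monetary Pattern</caption>
    <tr><th>Pattern</th><th>Sign</th><th>Result</th></tr>
    <tr><td>symbol none sign value</td><td></td><td>$1234.56</td></tr>
    <tr><td>symbol none sign value</td><td>-</td><td>$-1234.56</td></tr>
    <tr><td>symbol space sign value</td><td>-</td><td>$ -1234.56</td></tr>
    <tr><td>symbol space sign value</td><td>()</td><td>$ (1234.56)</td></tr>
    <tr><td>sign symbol space value</td><td>()</td><td>($ 1234.56)</td></tr>
    <tr><td>sign value space symbol</td><td>()</td><td>(1234.56 $)</td></tr>
    <tr><td>symbol space value sign</td><td>-</td><td>$ 1234.56-</td></tr>
    <tr><td>sign value space symbol</td><td>-</td><td>-1234.56 $</td></tr>
    <tr><td>sign value none symbol</td><td>-</td><td>-1234.56$</td></tr>
        </table></p>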

        <PRe>
    

    //get monetary output facet of the loc locale
    const std::money_put<charT,OutIt>& mp
    = std::use_facet<std::money_put<charT,OutIt> >(loc);

    // write value with monetary output facet
    mp.put(out, intl, fmt, fill, value);

        </pre>
    
        <p>The argument out is an output iterator of type OutIt to which the formatted string is written. put() returns an object of this type pointing immediately after the last character produced. The argument intl indicates whether a local or an international currency symbol is to be used. fmt is used to determine formatting flags, such as the width to be used and the moneypunct facet defining the format of the value to be printed. Where a space character has to appear, the character fill is inserted.</p>
    
        <p>The argument value has type long double or type basic_string&lt;charT>. This is the value that is formatted. If the argument is a string, this string may consist only of decimal digits with an optional leading minus sign. If the first character of the string is a minus sign, the value is formatted as a negative value. After it is determined that the value is negative, the minus sign is discarded. The number of fractional digits in the string is determined from the member function frac_digits() of the moneypunct facet.</p>
    
        <p>The standard requires that the two instantiations money_put&lt;char> and money_put&lt;wchar_t> are stored in each locale. In addition, the C++ standard library supports all instantiations that take char or wchar_t as the first template argument and a corresponding output iterator as the second. These additional instantiations are not required to be stored in each locale object.</P>
    
    
    Monetary Parsing
        <P>The facet money_get is used for parsing monetary values. It is a template class that takes a character type charT as the first template argument and an input iterator type InIt as the second. The second template argument defaults to istreambuf_iterator&lt;charT>. This class defines two member functions called get() that try to parse a character sequence and, if the parse is successful, store the result in a value of type long double or of type basic_string&lt;charT>. You can use the facet as follows:</P>
    
        <PrE>
    

    //get monetary input facet of the loc locale
    const std::money_get<charT,InIt>& mg
    = std::use_facet<std::money_get<charT,InIt> >(loc);

    //read value with monetary input facet
    mg.get(ibeg, iend, intl, fmt, err, val);

        </PRE>
    
        <P>The character sequence to be parsed is defined by the sequence between ibeg and iend. The parsing stops as soon as either all elements of the used pattern are read or an error is encountered. If an error is encountered, the ios_base::failbit is set in err and nothing is stored in val. If parsing is successful, the result is stored in the value of type long double or basic_string that is passed by reference as argument val.</P>
    
        <p>The argument intl is a Boolean value that selects a local or an international currency string. The moneypunct facet defining the format of the value to be parsed is retrieved from the locale object imbued in the argument fmt. For parsing a monetary value, the pattern returned from the member neg_format() of the moneypunct facet is always used.</p>
    
        <P>At the position of none or space, the function that is parsing a monetary value consumes all available space, unless none is the last part in a pattern. Trailing spaces are not skipped. The get() functions return an iterator that points after the last character that was consumed.</p>
    
        <p>The standard requires that the two instantiations money_get&lt;char> and money_get&lt;wchar_t> be stored in each locale. In addition, the C++ standard library supports all instantiations that take char or wchar_t as the first template argument and a corresponding input iterator as the second. These additional instantiations are not required to be stored in each locale object.</p>
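        <p>The parsing direction can be sketched in the same way. The facet DollarPunct and the function parseMoney below are hypothetical. The sketch assumes "$" as the currency symbol, "-" as the negative sign, two fractional digits, and the layout symbol, none, sign, value. ios_base::showbase is set, so the currency symbol is expected in the input. The result is delivered in units of the smallest currency fraction.</p>

```cpp
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical moneypunct facet for illustration only.
class DollarPunct : public std::moneypunct<char> {
protected:
    virtual char        do_decimal_point() const { return '.'; }
    virtual std::string do_grouping() const      { return ""; }
    virtual std::string do_curr_symbol() const   { return "$"; }
    virtual std::string do_positive_sign() const { return ""; }
    virtual std::string do_negative_sign() const { return "-"; }
    virtual int         do_frac_digits() const   { return 2; }
    virtual pattern     do_neg_format() const {
        pattern p;
        p.field[0] = static_cast<char>(symbol);
        p.field[1] = static_cast<char>(none);
        p.field[2] = static_cast<char>(sign);
        p.field[3] = static_cast<char>(value);
        return p;
    }
    virtual pattern     do_pos_format() const { return do_neg_format(); }
};

// Hypothetical helper: parses text with money_get and stores the
// result in units. Returns false and leaves units untouched if the
// facet set failbit.
bool parseMoney(const std::string& text, long double& units)
{
    std::locale loc(std::locale::classic(), new DollarPunct);
    std::istringstream in(text);
    in.imbue(loc);   // money_get fetches moneypunct from this locale
    in.setf(std::ios_base::showbase);

    const std::money_get<char>& mg =
        std::use_facet<std::money_get<char> >(loc);

    std::ios_base::iostate err = std::ios_base::goodbit;
    std::istreambuf_iterator<char> ibeg(in);
    std::istreambuf_iterator<char> iend;
    long double val = 0;
    mg.get(ibeg, iend, false, in, err, val);

    if (err & std::ios_base::failbit) {
        return false;
    }
    units = val;
    return true;
}
```

        <p>For the input "$-1234.56", the helper is expected to store -123456 in units, because the two fractional digits are part of the integral unit count.</p>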
    

    14.4.4
    Character Classification and Conversion

    The C++ standard library defines two facets to deal with characters: ctype and codecvt. Both belong to the category locale::ctype. The facet ctype is used mainly for character classification, such as testing whether a character is a letter. It also provides methods for conversion between lowercase and uppercase letters and for conversion between char and the character type for which the facet is instantiated. The facet codecvt is used to convert characters between different encodings and is used mainly by basic_filebuf to convert between external and internal representations.

    Character Classification
        <P>The facet ctype is a template class parameterized with a character type. Three kinds of functions are provided by the class ctype&lt;charT>:</p>
    
        
    
  • Functions to convert between char and charT

  • Functions for character classification

  • Functions for conversion between uppercase and lowercase letters

        <p>Table 14.16 lists the members defined for the facet ctype.</p>
    
        <p><table width="100%">
    <caption>Table 14.16. Services Defined by the ctype&lt;charT> Facet</caption>
    <tr><th>Expression</th><th>Effect</th></tr>
    <tr><td>ct.is(m,c)</td><td>Tests whether the character c matches the mask m</td></tr>
    <tr><td>ct.is(beg,end,vec)</td><td>For each character in the range between beg and end, places a mask matched by the character in the corresponding location of vec</td></tr>
    <tr><td>ct.scan_is(m,beg,end)</td><td>Returns a pointer to the first character in the range between beg and end that matches the mask m or end if there is no such character</td></tr>
    <tr><td>ct.scan_not(m,beg,end)</td><td>Returns a pointer to the first character in the range between beg and end that does not match the mask m or end if all characters match the mask</td></tr>
    <tr><td>ct.toupper(c)</td><td>Returns an uppercase letter corresponding to c if there is such a letter; otherwise c is returned</td></tr>
    <tr><td>ct.toupper(beg,end)</td><td>Converts each letter in the range between beg and end by replacing the letter with the result of toupper()</td></tr>
    <tr><td>ct.tolower(c)</td><td>Returns a lowercase letter corresponding to c if there is such a letter; otherwise c is returned</td></tr>
    <tr><td>ct.tolower(beg,end)</td><td>Converts each letter in the range between beg and end by replacing the letter with the result of tolower()</td></tr>
    <tr><td>ct.widen(c)</td><td>Returns the char converted to charT</td></tr>
    <tr><td>ct.widen(beg,end,dest)</td><td>For each character in the range between beg and end, places the result of widen() at the corresponding location in dest</td></tr>
    <tr><td>ct.narrow(c,default)</td><td>Returns the charT c converted to char, or the char default if there is no suitable character</td></tr>
    <tr><td>ct.narrow(beg,end,default,dest)</td><td>For each character in the range between beg and end, places the result of narrow() at the corresponding location in dest</td></tr>
        </table></p>

        <p>The function is(beg,end, vec) is used to store a set of masks in an array. For each of the characters in the range between beg and end, a mask with the attributes corresponding to the character is stored in the array pointed to by vec. This is useful to avoid virtual function calls for the classification of characters if there are lots of characters to be classified.</P>
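        <p>A possible use of is(beg,end,vec) is sketched below. The function classify is a hypothetical helper that obtains the masks for all characters of a wide string with a single call to the ctype&lt;wchar_t> facet. The classic locale is assumed as the default.</p>

```cpp
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: returns one mask per character of text,
// computed with a single call of is(beg,end,vec).
std::vector<std::ctype_base::mask>
classify(const std::wstring& text,
         const std::locale& loc = std::locale::classic())
{
    std::vector<std::ctype_base::mask> masks(text.size());
    if (text.empty()) {
        return masks;
    }
    const std::ctype<wchar_t>& ct =
        std::use_facet<std::ctype<wchar_t> >(loc);
    ct.is(text.data(), text.data() + text.size(), &masks[0]);
    return masks;
}
```

        <p>Each resulting mask can then be tested with the bitwise and operator, for example against ctype_base::digit, without any further call to the facet.</p>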
    
        <P>The function widen() can be used to convert a character of type char from the native character set to the corresponding character in the character set used by a locale. Thus, it makes sense to widen a character even if the result is also of type char. For the opposite direction, the function narrow() can be used to convert a character from the character set used by the locale to a corresponding char in the native character set, provided there is such a char. For example, the following code converts the decimal digits from char to wchar_t:</P>
    
        <Pre>
    

    std::locale loc;
    char narrow[] = "0123456789";
    wchar_t wide [10];

    std::use_facet<std::ctype<wchar_t> >(loc).widen(narrow, narrow+10,
    wide);

        </pre>
    
        <p>Class ctype derives from the class ctype_base. This class is used only to define an enumeration called mask. This enumeration defines values that can be combined to form a bitmask used for testing character properties. The values defined in ctype_base are shown in Table 14.17. The functions for character classification all take a bitmask as an argument, which is formed by combinations of the values defined in ctype_base. To create bitmasks as needed, you can use the operators for bit manipulation (|, &amp;, ^, and ~). A character matches this mask if it is any of the characters identified by the mask.</p>
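        <p>The following sketch shows how a combined bitmask might be passed to is() and scan_is(). The helpers countMatching and firstMatching are hypothetical names, and the classic locale is assumed as the default.</p>

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: counts the characters of s that match any of
// the properties in the mask m.
std::size_t countMatching(const std::string& s, std::ctype_base::mask m,
                          const std::locale& loc = std::locale::classic())
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (ct.is(m, s[i])) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: returns the index of the first character of s
// that matches the mask m, or s.size() if there is none.
std::size_t firstMatching(const std::string& s, std::ctype_base::mask m,
                          const std::locale& loc = std::locale::classic())
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    const char* beg = s.data();
    const char* end = beg + s.size();
    return static_cast<std::size_t>(ct.scan_is(m, beg, end) - beg);
}
```

        <p>Passing ctype_base::alpha | ctype_base::digit as the mask counts letters and digits, which matches the meaning of ctype_base::alnum.</p>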
    
        <p><table width="100%">
    <caption>Table 14.17. Character Mask Values Used by ctype</caption>
    <tr><th>Value</th><th>Meaning</th></tr>
    <tr><td>ctype_base::alnum</td><td>Tests for letters and digits (equivalent to alpha | digit)</td></tr>
    <tr><td>ctype_base::alpha</td><td>Tests for letters</td></tr>
    <tr><td>ctype_base::cntrl</td><td>Tests for control characters</td></tr>
    <tr><td>ctype_base::digit</td><td>Tests for decimal digits</td></tr>
    <tr><td>ctype_base::graph</td><td>Tests for punctuation characters, letters, and digits (equivalent to alnum | punct)</td></tr>
    <tr><td>ctype_base::lower</td><td>Tests for lowercase letters</td></tr>
    <tr><td>ctype_base::print</td><td>Tests for printable characters</td></tr>
    <tr><td>ctype_base::punct</td><td>Tests for punctuation characters</td></tr>
    <tr><td>ctype_base::space</td><td>Tests for space characters</td></tr>
    <tr><td>ctype_base::upper</td><td>Tests for uppercase letters</td></tr>
    <tr><td>ctype_base::xdigit</td><td>Tests for hexadecimal digits</td></tr>
        </table></p>

    Specialization of ctype&lt;> for Type char
        <P>For better performance of the character classification functions, the facet ctype is specialized for the character type char. This specialization does not delegate the functions dealing with character classification (is(), scan_is(), and scan_not()) to corresponding virtual functions. Instead, these functions are implemented inline using a table lookup. For this case additional members are provided (Table 14.18).</P>
    
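        <P><table width="100%">
    <caption>Table 14.18. Additional Members of ctype&lt;char></caption>
    <tr><th>Expression</th><th>Effect</th></tr>
    <tr><td>ctype&lt;char>::table_size</td><td>Returns the size of the table (>=256)</td></tr>
    <tr><td>ctype&lt;char>::classic_table()</td><td>Returns the table for the "classic" C locale</td></tr>
    <tr><td>ctype&lt;char>(table,del=false)</td><td>Creates the facet with table table</td></tr>
    <tr><td>ct.table()</td><td>Returns the actual table of facet ct</td></tr>
        </table></P>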

        <P>Manipulating the behavior of these functions for specific locales is done with a corresponding table of masks that is passed as a constructor argument:</P>
    
        <Pre>
    

    // create and initialize the table
    std::ctype_base::mask mytable[std::ctype<char>::table_size] = {
    ...
    };

    // use the table for the ctype<char> facet ct
    std::ctype<char> ct(mytable, false);

        </pre>
    
        <p>This code constructs a ctype&lt;char> facet that uses the table mytable to determine the character class of a character. More precisely, the character class of the character c is determined by</p>
    
        <PRe>
    

    mytable[static_cast<unsigned char>(c)]

        </PRE>
    
        <P>The static member table_size is a constant defined by the library implementation and gives the size of the lookup table. This size is at least 256 characters. The second optional argument to the constructor of ctype&lt;char> indicates whether the table should be deleted if the facet is destroyed. If it is true, the table passed to the constructor is released by using delete [] when the facet is no longer needed.</p>
    
        <p>The member function table() is a protected member function that returns the table that is passed as the first argument to the constructor. The static protected member function classic_table() returns the table that is used for character classification in the classic C locale.</P>
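        <p>One way to put the table constructor and classic_table() to work is sketched below: a facet derived from ctype&lt;char> copies the classic table and additionally marks the comma as a space character, so that formatted input treats commas as separators. The names CommaIsSpace and splitFields are hypothetical, and the approach is only one of several possible designs.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classifies like the classic C locale, except
// that ',' also counts as a space character.
class CommaIsSpace : public std::ctype<char> {
public:
    explicit CommaIsSpace(std::size_t refs = 0)
      : std::ctype<char>(table_, false, refs)   // false: table is not deleted
    {
        // the base class only stores the pointer, so the table can be
        // filled after the base class has been constructed
        std::copy(classic_table(), classic_table() + table_size, table_);
        table_[static_cast<unsigned char>(',')] |= space;
    }
private:
    mask table_[table_size];
};

// Hypothetical helper: splits line at whitespace and commas by reading
// strings from a stream whose locale contains CommaIsSpace.
std::vector<std::string> splitFields(const std::string& line)
{
    std::istringstream in(line);
    in.imbue(std::locale(std::locale::classic(), new CommaIsSpace));

    std::vector<std::string> fields;
    std::string field;
    while (in >> field) {
        fields.push_back(field);
    }
    return fields;
}
```

        <p>Because classic_table() is protected, the copy has to be made inside a class derived from ctype&lt;char>. Consecutive separators are skipped like ordinary whitespace, so empty fields are not reported.</p>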
    
    
    Global Convenience Functions for Character Classification
        <p>Convenient use of the ctype facets is provided by predefined global functions. Table 14.19 lists all of the global functions.</p>
    
        <p><table width="100%">
    <caption>Table 14.19. Global Convenience Functions for Character Classification</caption>
    <tr><th>Function</th><th>Effect</th></tr>
    <tr><td>isalnum(c, loc)</td><td>Returns whether c is a letter or a digit (equivalent to isalpha()||isdigit())</td></tr>
    <tr><td>isalpha(c, loc)</td><td>Returns whether c is a letter</td></tr>
    <tr><td>iscntrl(c, loc)</td><td>Returns whether c is a control character</td></tr>
    <tr><td>isdigit(c, loc)</td><td>Returns whether c is a digit</td></tr>
    <tr><td>isgraph(c, loc)</td><td>Returns whether c is a printable, nonspace character (equivalent to isalnum()||ispunct())</td></tr>
    <tr><td>islower(c, loc)</td><td>Returns whether c is a lowercase letter</td></tr>
    <tr><td>isprint(c, loc)</td><td>Returns whether c is a printable character (including whitespaces)</td></tr>
    <tr><td>ispunct(c, loc)</td><td>Returns whether c is a punctuation character (that is, it is printable, but it is not a space, digit, or letter)</td></tr>
    <tr><td>isspace(c, loc)</td><td>Returns whether c is a space character</td></tr>
    <tr><td>isupper(c, loc)</td><td>Returns whether c is an uppercase letter</td></tr>
    <tr><td>isxdigit(c, loc)</td><td>Returns whether c is a hexadecimal digit</td></tr>
    <tr><td>tolower(c, loc)</td><td>Converts c from an uppercase letter to a lowercase letter</td></tr>
    <tr><td>toupper(c, loc)</td><td>Converts c from a lowercase letter to an uppercase letter</td></tr>
        </table></p>

        <P>For example, the following expression determines whether the character c is a lowercase letter in the locale loc:</p>
    
        <pre>
    

    std::islower(c,loc)

        </pre>
    
        <p>It returns a corresponding value of type bool.</p>
    
        <p>The following expression returns the character c converted to an uppercase letter, if c is a lowercase letter in the locale loc:</P>
    
        <Pre>
    

    std::toupper(c,loc)

        </prE>
    
        <P>If c is not a lowercase letter, the first argument is returned unmodified.</P>
    
        <p>The expression</p>
    
        <pre>
    

    std::islower(c,loc)

        </pre>
    
        <p>is equivalent to the following expression:</p>
    
        <pre>
    

    std::use_facet<std::ctype<char> >(loc).is(std::ctype_base::lower,c)

        </pre>
    
        <p>This expression calls the member function is() of the facet ctype&lt;char>. is() determines whether the character c fulfills any of the character properties that are passed as the bitmask in the first argument. The values for the bitmask are defined in the class ctype_base. See page 502 and page 669 for examples of the use of these convenience functions.</P>
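        <p>The equivalence of the two forms can be checked with a small sketch. The functions isLowerViaFacet and formsAgreeOnAscii are hypothetical helpers. The second one compares both forms for the 128 ASCII characters only, which is an arbitrary choice made to keep the example simple.</p>

```cpp
#include <locale>

// Hypothetical helper: the facet-based form of islower(c,loc).
bool isLowerViaFacet(char c, const std::locale& loc)
{
    return std::use_facet<std::ctype<char> >(loc)
               .is(std::ctype_base::lower, c);
}

// Hypothetical helper: returns true if the convenience function and the
// facet-based form agree for all ASCII characters in the locale loc.
bool formsAgreeOnAscii(const std::locale& loc)
{
    for (int i = 0; i < 128; ++i) {
        const char c = static_cast<char>(i);
        if (std::islower(c, loc) != isLowerViaFacet(c, loc)) {
            return false;
        }
    }
    return true;
}
```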
    
        <P>The global convenience functions for character classification correspond to C functions that have the same name but only the first argument. They are defined in &lt;cctype> and &lt;ctype.h>, and always use the current global C locale.[4]</P>
    

    [4] This locale is only identical to the global C++ locale if the last call to locale::global() was with a named locale and if there was no call to setlocale() since then. Otherwise, the locale used by the C functions is different from the global C++ locale.

    Their use is even more convenient:

        <pRE>
    

    if (std::isdigit(c)) {
        ...
    }

        </pRE>
    
        <P>However, by using them you can't use different locales in the same program. Also, you can't use a user-defined ctype facet using the C function. See page 497 for an example that demonstrates how to use these C functions to convert all characters of a string to uppercase letters.</P>
    
        <P>It is important to note that the C++ convenience functions should not be used in code sections where performance is crucial. It is much faster to obtain the corresponding facet from the locale and use the functions on this object directly. If a lot of characters are to be classified according to the same locale, this can be improved even more, at least for non-char characters. The function is(beg,end,vec) can be used to determine the masks for typical characters. This function determines for each character in the range [beg,end) a mask that describes the properties of the character. The resulting mask is stored in vec at the position corresponding to the character's position. This vector can then be used for fast lookup of the characters.</p>
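        <p>As a sketch of the faster approach, the following hypothetical helper toUpperCopy obtains the ctype&lt;char> facet once and converts a whole range with a single call of toupper(beg,end), instead of calling the convenience function for every character. The classic locale is assumed as the default.</p>

```cpp
#include <locale>
#include <string>

// Hypothetical helper: returns a copy of s in which all lowercase
// letters are replaced by uppercase letters according to loc.
std::string toUpperCopy(const std::string& s,
                        const std::locale& loc = std::locale::classic())
{
    std::string result(s);
    if (result.empty()) {
        return result;
    }
    // fetch the facet once, then convert the whole range in place
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    ct.toupper(&result[0], &result[0] + result.size());
    return result;
}
```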
    
    
    Character Encoding Conversion
        <p>The facet codecvt is used to convert between internal and external character encoding. For example, it can be used to convert between Unicode and EUC (Extended UNIX Code), provided the implementation of the C++ standard library supports a corresponding facet.</P>
    
        <P>This facet is used by the class basic_filebuf to convert between the internal representation and the representation stored in a file. The class basic_filebuf&lt;charT,traits> (see page 627) uses the instantiation codecvt&lt;charT,char,typename traits::state_type> to do so. The facet used is taken from the locale stored with basic_filebuf. This is the major application of the codecvt facet. Only rarely is it necessary to use this facet directly.</p>
    
        <p>In Section 14.1, some basics of character encodings are introduced. To understand codecvt, you need to know that there are two approaches for the encoding of characters: One is character encodings that use a fixed number of bytes for each character (wide-character representation), and the other is character encodings that use a varying number of bytes per character (multibyte representation).</P>
    
        <P>It is also necessary to know that multibyte representations use so-called shift states for space efficient representation of characters. The correct interpretation of a byte is possible only with the correct shift state at this position. This in turn can be determined only by walking through the whole sequence of multibyte characters (see Section 14.1, for more details).</P>
    
        <P>The codecvt&lt;> facet takes three template arguments:</p>
    
        
    
  • The character type internT used for an internal representation

  • The type externT used to represent an external representation

  • The type stateT used to represent an intermediate state during the conversion

        <P>The intermediate state may consist of incomplete wide characters or the current shift state. The C++ standard makes no restriction about what is stored in the objects representing the state.</P>
    
        <P>The internal representation always uses a representation with a fixed number of bytes per character. Mainly the two types char and wchar_t are intended to be used within a program. The external representation may be a representation that uses a fixed size or a multibyte representation. When a multibyte representation is used, the second template argument is the type used to represent the basic units of the multibyte encoding. Each multibyte character is stored in one or more objects of this type. Normally, the type char is used for this.</p>
    
        <p>The third argument is the type used to represent the current state of the conversion. It is necessary, for example, if one of the character encodings is a multibyte encoding. In this case, the processing of a multibyte character might be terminated because the source buffer is drained or the destination buffer is full while one character is being processed. If this happens, the current state of the conversion is stored in an object of this type.</p>
    
        <p>Similar to the other facets, the standard requires support for only a very few conversions. Only the following two instantiations are supported by the C++ standard library:</p>
    
        
    
  • codecvt<char,char,mbstate_t>, which converts the native character set to itself (this is actually a degenerated version of the codecvt facet)

  • codecvt<wchar_t,char,mbstate_t>, which converts between the native tiny character set (that is, char) and the native wide-character set (that is, wchar_t)

        <p>The C++ standard does not specify the exact semantics of the second conversion. The only natural thing to do, however, is to split each wchar_t into sizeof(wchar_t) objects of type char for the conversion from wchar_t to char, and to assemble a wchar_t from the same number of chars when converting in the opposite direction. Note that this conversion is very different from the conversion between char and wchar_t done by the widen() and narrow() member functions of the ctype facet: While the codecvt functions use the bits of multiple chars to form one wchar_t (or vice versa), the ctype functions convert a character in one encoding to the corresponding character in another encoding (if there is such a character).</p>
    
        <p>Like the ctype facet, codecvt derives from a base class used to define an enumeration type. This class is named codecvt_base, and it defines an enumeration called result. The values of this enumeration are used to indicate the results of codecvt's members. The exact meanings of the values depend on the member function used. Table 14.20 lists the member functions of the codecvt facet.</p>
    
        <p>The function in() converts an external representation to an internal representation. The argument s is a reference to a stateT. At the beginning, this argument represents the shift state used when the conversion is started. At the end, the final shift state is stored there. The shift state passed in can differ from the initial state if the input buffer to be converted is not the first buffer being converted. The arguments fb (from begin) and fe (from end) are of type const externT*, and represent the beginning and the end of the input buffer. The arguments tb (to begin) and te (to end) are of type internT*, and represent the beginning and the end of the output buffer.</p>
    
        <P><table width="100%">
    

    Table 14.20. Members of the codecvt Facet

    Expression Meaning
    cvt.in(s,fb,fe,fn,tb,te,tn) Converts external representation to internal representation
    cvt.out(s,fb,fe,fn,tb,te,tn) Converts internal representation to external representation
    cvt.unshift(s,tb,te,tn) Writes escape sequence to switch to initial shift state
    cvt.encoding() Returns information about the external encoding
    cvt.always_noconv() Returns true if no conversion will ever be done
    cvt.length(s,fb,fe,max) Returns the number of externTs from the sequence between fb and fe to produce max internal characters
    cvt.max_length() Returns the maximum number of externTs necessary to produce one internT

        <p>The arguments fn (from next, of type const externT*&amp;) and tn (to next, of type internT*&amp;) are references used to return the end of the sequence converted in the input buffer and the output buffer respectively. Either buffer may reach the end before the other buffer reaches the end. The function returns a value of type codecvt_base::result, as indicated in Table 14.21.</p>
    
        <p><table width="100%">
    

    Table 14.21. Return Values of the Conversion Functions

    Value Meaning
    ok All source characters were converted successfully
    partial Not all source characters were converted, or more characters are needed to produce a destination character
    error A source character was encountered that cannot be converted
    noconv No conversion was necessary

        <P>If ok is returned, the function made some progress. If fn == fe holds, this means that the whole input buffer was processed and the sequence between tb and tn contains the result of the conversion. The characters in this sequence represent the characters from the input sequence, potentially with a finished character from a previous conversion. If the argument s passed to in() was not the initial state, a partial character from a previous conversion that was not completed could have been stored there.</P>
    
        <P>If partial is returned, either the output buffer was full before the input buffer could be drained or the input buffer was drained when a character was not yet complete (for example, because the last byte in the input sequence was part of an escape sequence switching between shift states). If fe == fn, the input buffer was drained. In this case, the sequence between tb and tn contains all characters that were converted completely but the input sequence terminated with a partially converted character. The necessary information to complete this character's conversion during a subsequent conversion is stored in the shift state s. If fe != fn, the input buffer was not completely drained. In this case, te == tn holds; thus, the output buffer is full. The next time the conversion is continued, it should start with fn.</P>
    
        <p>The return value noconv indicates a special situation. That is, no conversion was necessary to convert the external representation to the internal representation. In this case, fn is set to fb and tn is set to tb. Nothing is stored in the destination sequence because everything is already stored in the input sequence.</P>
    
        <P>If error is returned, that means a source character that could not be converted was encountered. There are several reasons why this can happen. For example, the destination character set has no representation for a corresponding character, or the input sequence ends up with an illegal shift state. The C++ standard does not define any method that can be used to determine the cause of the error more precisely.</p>
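        <p>How the arguments and return values of in() fit together might be sketched like this. The sketch assumes the codecvt facet for wchar_t and char from the classic "C" locale and a single input buffer that starts in the initial shift state; the names Converter and toInternal are hypothetical.</p>

```cpp
#include <cwchar>
#include <locale>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> Converter;

// Hypothetical helper: converts the external (char) sequence in src to the
// internal (wchar_t) representation with one call to in().
// dest receives every character that was converted completely.
inline Converter::result toInternal(const std::string& src, std::wstring& dest,
                                    const std::locale& loc = std::locale::classic())
{
    dest.clear();
    if (src.empty()) {
        return Converter::ok;
    }

    const Converter& cvt = std::use_facet<Converter>(loc);
    std::mbstate_t state = std::mbstate_t();    // initial shift state

    // one internT per externT is enough room for this conversion
    dest.assign(src.size(), L'\0');

    const char* fb = src.data();                // input buffer (external)
    const char* fe = fb + src.size();
    const char* fn = fb;                        // set by in(): end of consumed input
    wchar_t*    tb = &dest[0];                  // output buffer (internal)
    wchar_t*    te = tb + dest.size();
    wchar_t*    tn = tb;                        // set by in(): end of produced output

    Converter::result res = cvt.in(state, fb, fe, fn, tb, te, tn);

    if (res == Converter::noconv) {
        dest.clear();                           // nothing was written to dest
    }
    else {
        dest.resize(tn - tb);                   // keep only what was converted
    }
    return res;
}
```

        <p>A caller that processes several buffers would keep the state object alive between calls and, after partial, continue with the input that starts at fn.</p>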
    
        <P>The function out() is equivalent to the function in(), except that it converts in the opposite direction. That is, it converts an internal representation to an external representation. The meanings of the arguments and the values returned are the same; only the types of the arguments are swapped. That is, fb and fe now have the type const internT*, and tb and te now have the type externT*. The same applies to fn and tn.</P>
    
        <P>The function unshift() inserts characters necessary to complete a sequence when the current state of the conversion is passed as the argument s. This normally means that a shift state is switched to the initial shift state. Only the external representation is terminated. Thus, the arguments tb and te are of type externT*, and tn is of type externT*&amp;. The sequence between tb and te defines the output buffer in which the characters are stored. The end of the result sequence is stored in tn. unshift() returns a value as shown in Table 14.22.</p>
    
        <p><table width="100%">
    

    Table 14.22. Return Values of the Function unshift()

    Value Meaning
    ok The sequence was completed successfully
    partial More characters need to be stored to complete the sequence
    error The state is invalid
    noconv No character was needed to complete the sequence

        <P>The function encoding() returns some information about the encoding of the external representation. If encoding() returns -1, the conversion is state dependent. If encoding() returns 0, the number of externTs needed to produce an internal character is not constant. Otherwise, the number of externTs needed to produce an internT is returned. This information can be used to provide appropriate buffer sizes.</P>
    
        <P>The function always_noconv() returns true if the functions in() and out() never perform a conversion. For example, the standard implementation of codecvt&lt;char, char, mbstate_t> does no conversion, and thus, always_noconv() returns true for this facet. However, this only holds for the codecvt facet from the "C" locale. Other instances of this facet may actually do a conversion.</p>
    
        <p>The function length() returns the number of externTs from the sequence between fb and fe necessary to produce max characters of type internT. If there are fewer than max complete internT characters in the sequence between fb and fe, the number of externTs used to produce a maximum number of internTs from the sequence is returned.</P>
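        <p>The informational members can be queried as in the following sketch. It assumes the codecvt facet for char and char from the classic "C" locale; the names CharConverter, ConversionInfo, describe, and externUnitsFor are hypothetical.</p>

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>

typedef std::codecvt<char, char, std::mbstate_t> CharConverter;

// Hypothetical aggregate collecting the informational members of a facet.
struct ConversionInfo {
    int  encoding;       // -1: state dependent, 0: variable width, n: fixed width
    bool alwaysNoconv;   // true if in() and out() never convert
    int  maxLength;      // maximum number of externTs per internT
};

inline ConversionInfo describe(const std::locale& loc = std::locale::classic())
{
    const CharConverter& cvt = std::use_facet<CharConverter>(loc);
    ConversionInfo info;
    info.encoding     = cvt.encoding();
    info.alwaysNoconv = cvt.always_noconv();
    info.maxLength    = cvt.max_length();
    return info;
}

// Number of externTs between fb and fe needed to produce at most max internTs.
inline int externUnitsFor(const char* fb, const char* fe, std::size_t max,
                          const std::locale& loc = std::locale::classic())
{
    const CharConverter& cvt = std::use_facet<CharConverter>(loc);
    std::mbstate_t state = std::mbstate_t();    // initial shift state
    return cvt.length(state, fb, fe, max);
}
```

        <p>Values such as maxLength can then be used to size an external buffer for a given number of internal characters.</p>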
    

    14.4.5 String Collation

    The facet collate handles differences between conventions for the sorting of strings. For example, in German the letter "ü" is treated as being equivalent to the letter "u" or to the letters "ue" for the purpose of sorting strings. For other languages, this letter is not even a letter, and it is treated as a special character, when it is treated at all. Other languages use slightly different sorting rules for certain character sequences. The collate facet can be used to provide a sorting of strings that is familiar to the user. Table 14.23 lists the member functions of this facet. In this table, col is an instantiation of collate, and the arguments passed to the functions are iterators that are used to define strings.

    <p><table width="100%">
    

    Table 14.23. Members of the collate<> Facet

    Expression Meaning
    col.compare(beg1,end1,beg2,end2) Returns 1 if the first string is greater than the second, 0 if both strings are equal, and -1 if the first string is smaller than the second
    col.transform(beg,end) Returns a string to be compared with other transformed strings
    col.hash(beg,end) Returns a hash value (of type long) for the string
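    The three members might be called as in the following sketch. It assumes collate for char and defaults to the classic "C" locale, in which the ordering is plain lexicographical; the helper names collateCompare, collateKey, and collateHash are hypothetical.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: three-way comparison of two strings by locale rules.
inline int collateCompare(const std::string& a, const std::string& b,
                          const std::locale& loc = std::locale::classic())
{
    const std::collate<char>& col = std::use_facet<std::collate<char> >(loc);
    return col.compare(a.data(), a.data() + a.size(),
                       b.data(), b.data() + b.size());
}

// Hypothetical helper: sort key; comparing two keys with operator< gives
// the same order as compare() on the original strings.
inline std::string collateKey(const std::string& s,
                              const std::locale& loc = std::locale::classic())
{
    const std::collate<char>& col = std::use_facet<std::collate<char> >(loc);
    return col.transform(s.data(), s.data() + s.size());
}

// Hypothetical helper: hash value that is equal for strings that collate equal.
inline long collateHash(const std::string& s,
                        const std::locale& loc = std::locale::classic())
{
    const std::collate<char>& col = std::use_facet<std::collate<char> >(loc);
    return col.hash(s.data(), s.data() + s.size());
}
```

    Computing the key once per string with transform() can pay off when the same strings are compared many times, for example while sorting a large collection.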

    <p>The collate facet is a class template that takes a character type charT as its template argument. The strings passed to collate's members are specified using iterators of type const charT*. This is somewhat unfortunate because there is no guarantee that the iterators used by 
    

    Chapter 14. Internationalization

    As the global market has increased in importance, so has internationalization (or i18n for short)[1] become more important for software development. As a consequence, the C++ standard library provides concepts to write code for international programs. These concepts influence mainly the use of I/O and string processing. This chapter describes these concepts. Many thanks to Dietmar Kühl, who is an expert on I/O and internationalization in the C++ standard library and wrote major parts of this chapter.

    [1] i18n is a common abbreviation for internationalization. It stands for the letter i, followed by 18 characters, followed by the letter n.

    <p>The C++ standard library provides a general approach to support national conventions without being bound to specific conventions. This goes to the extent, for example, that strings are not bound to a specific character type to support 16-bit characters in Asia. For the internationalization of programs, two related aspects are important:</p>
    
  • Different character sets have different properties. Handling them requires flexible solutions for problems, such as what is considered to be a letter or, worse, what type to use to represent characters. For character sets with more than 256 characters, type char is not sufficient as a representation.

  • The user of a program expects to see national or cultural conventions obeyed (for example, the formatting of dates, monetary values, numbers, and Boolean values).

        </lI>
    
    
    <P>For both aspects, the C++ standard library provides related solutions.</p>
    
    <p>The major approach toward internationalization is to use locale objects to represent an extensible collection of aspects to be adapted to specific local conventions. Locales are already used in C to adapt to specific local conventions. In the C++ standard, this mechanism was generalized and made more flexible. Actually, the C++ locale mechanism can be used to address all kinds of customization, depending on the user's environment or preferences. For example, it can be extended to deal with measurement systems, time zones, or paper size.</P>
    
    <P>Most of the mechanisms of internationalization involve no or only minimal additional work for the programmer. For example, when doing I/O with the C++ stream mechanism, numeric values are formatted according to the rules of some locale. The only work for the programmer is to instruct the I/O stream classes to use the user's preferences.</P>
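    <p>Instructing a stream to use a particular locale might look like the following sketch. The helper names userLocale and formatNumber are hypothetical. The fallback to the classic "C" locale is an assumption made here for environments in which the user's preferred locale is not available.</p>

```cpp
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: the locale selected by the user's environment, or the
// classic "C" locale if the environment names a locale that is not supported.
inline std::locale userLocale()
{
    try {
        return std::locale("");
    }
    catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: formats value according to the rules of loc.
inline std::string formatNumber(double value, const std::locale& loc)
{
    std::ostringstream out;
    out.imbue(loc);      // from now on the stream formats according to loc
    out << value;
    return out.str();
}
```

    <p>The same imbue() call works for the standard streams, for example std::cout.imbue(userLocale()).</p>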
    
    <P>In addition to such automatic use, the programmer may use locale objects directly for formatting, collation, character classification, and so on. Some internationalized aspects supported by the C++ standard library are not used by the C++ standard library itself, and to use them the programmer has to call those functions manually. For example, there are no stream functions defined in the C++ standard library that do time, date, or monetary formatting. To use these services, it is necessary to call them directly (for example, in user-defined stream operators writing objects of a money class).</p>
    
    <p>Strings and streams use another concept for internationalization: character traits. They define fundamental properties and operations that differ for different character sets, such as the value of "end-of-file" as well as functions to compare, assign, and copy strings.</P>
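    <p>A few of the operations provided by character traits are used in the following sketch. The function names compareViaTraits and isEndOfFile are hypothetical; only std::char_traits and its members come from the library.</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: three-way comparison of two zero-terminated strings
// that uses only operations of the traits class of the character type.
// Returns a negative value, zero, or a positive value.
template <class charT>
int compareViaTraits(const charT* a, const charT* b)
{
    typedef std::char_traits<charT> traits;
    std::size_t la = traits::length(a);
    std::size_t lb = traits::length(b);

    // compare the common prefix first
    int r = traits::compare(a, b, la < lb ? la : lb);
    if (r != 0) {
        return r;
    }
    // equal prefix: the shorter string is the smaller one
    return la < lb ? -1 : (la > lb ? 1 : 0);
}

// Hypothetical helper: checks a value read from a char stream against the
// "end-of-file" value defined by the traits class.
inline bool isEndOfFile(std::char_traits<char>::int_type c)
{
    typedef std::char_traits<char> traits;
    return traits::eq_int_type(c, traits::eof());
}
```

    <p>Because the template uses only the traits interface, the same code works for char and wchar_t.</p>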
    
    <P>The classes for internationalization were introduced to the standard relatively late. Although the general approach is extremely flexible, it still needs some work to make it really complete. For example, the functions for string collation (that is, comparing strings for sorting according to some locale conventions) use only iterators of type const charT*, where charT is some character type. Although it is very likely that basic_string&lt;charT> uses this type as an iterator type, it is not at all guaranteed. Thus, it is not guaranteed that string iterators can be used as arguments to the functions for string collation. However, it is possible to pass the result of the data() member function of basic_string to the string collation functions.</P>
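    <p>One way to apply the data() workaround when sorting is sketched below. The class CollateLess and the function sortByLocale are hypothetical names. The sketch defaults to the classic "C" locale, so the resulting order is only an example.</p>

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical function object: orders strings by the collate facet of a
// locale, passing const char* ranges obtained from data() instead of
// string iterators.
class CollateLess {
  public:
    explicit CollateLess(const std::locale& l = std::locale::classic())
     : loc(l) {
    }
    bool operator()(const std::string& a, const std::string& b) const {
        const std::collate<char>& col
            = std::use_facet<std::collate<char> >(loc);
        return col.compare(a.data(), a.data() + a.size(),
                           b.data(), b.data() + b.size()) < 0;
    }
  private:
    std::locale loc;     // keeps the facet alive while the object exists
};

// Hypothetical helper: sorts the strings according to the rules of loc.
inline void sortByLocale(std::vector<std::string>& v,
                         const std::locale& loc = std::locale::classic())
{
    std::sort(v.begin(), v.end(), CollateLess(loc));
}
```

    <p>The function object stores a copy of the locale, so the facet reference obtained from use_facet stays valid during each comparison.</p>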