//
//  Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
//  Distributed under the Boost Software License, Version 1.0. (See
//  accompanying file LICENSE_1_0.txt or copy at
//  http://www.boost.org/LICENSE_1_0.txt)
//


/*!

\mainpage Boost.Nowide

Table of Contents:

- \ref main 
- \ref main_rationale 
    - \ref main_the_problem 
    - \ref main_the_solution 
    - \ref main_wide 
    - \ref main_reading 
- \ref using
    - \ref using_standard
    - \ref using_custom
    - \ref using_windows_h
- \ref technical 
    - \ref technical_imple 
    - \ref technical_cio 
- \ref qna

\section main What is Boost.Nowide

Boost.Nowide is a library implemented by Artyom Beilis
that makes cross-platform, Unicode-aware programming
easier.

The library provides implementations of standard C and C++ library
functions whose inputs are UTF-8 aware on Windows, without
requiring the use of the Wide API.



\section main_rationale Rationale
\subsection main_the_problem The Problem

Consider a simple application that splits a big file into chunks so that
they can be sent by e-mail. It requires doing a few very simple tasks:

- Access command line arguments: <code>int main(int argc,char **argv)</code>
- Open an input file and several output files: <code>std::fstream::open(char const *,std::ios::openmode m)</code>
- Remove the files in case of a failure: <code>std::remove(char const *file)</code>
- Print a progress report to the console: <code>std::cout << file_name </code>

Unfortunately, it is impossible to implement this simple task in plain C++
if the file names contain non-ASCII characters.

A simple program that uses this API would work on systems that use UTF-8
internally, which covers the vast majority of Unix-like operating systems: Linux, Mac OS X,
Solaris, BSD. But it would fail on files like <code>War and Peace - Война и мир - מלחמה ושלום.zip</code>
under Microsoft Windows, because the native Windows Unicode-aware API is the Wide API, which uses UTF-16.

Thus, such a trivial task is very hard to implement in a cross-platform manner.

\subsection main_the_solution The Solution

Boost.Nowide provides a set of standard library functions that are UTF-8 aware and
make Unicode-aware programming easier.

The library provides:

- Easy to use functions for converting UTF-8 to/from UTF-16
- A class for fixing the \c argc, \c argv and \c env parameters of \c main to use UTF-8
- UTF-8 aware functions
    - \c stdio.h functions:
        - \c fopen
        - \c freopen
        - \c remove
        - \c rename
    - \c stdlib.h functions
        - \c system
        - \c getenv
        - \c setenv
        - \c unsetenv
        - \c putenv
    - \c fstream
        - \c filebuf
        - \c fstream/ofstream/ifstream
    - \c iostream
        - \c cout
        - \c cerr
        - \c clog
        - \c cin 
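
To give a feel for what the UTF-8 to UTF-16 conversion in the first item involves, here is a
minimal, self-contained sketch. It is not Boost.Nowide's implementation: the function name
\c utf8_to_utf16_sketch is hypothetical, and it uses \c std::u16string rather than \c std::wstring
so that it behaves the same on every platform.

\code
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only (not part of Boost.Nowide).
// Decodes UTF-8 into UTF-16 code units; returns false on malformed input.
inline bool utf8_to_utf16_sketch(std::string const &in,std::u16string &out)
{
    // Smallest code point that legitimately needs 0..3 continuation bytes
    static char32_t const min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    out.clear();
    std::size_t i = 0;
    while(i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        char32_t cp = 0;
        int trail = 0; // number of continuation bytes that must follow
        if(lead < 0x80)                { cp = lead; }
        else if((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trail = 1; }
        else if((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trail = 2; }
        else if((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trail = 3; }
        else return false; // stray continuation byte or invalid lead byte
        for(int k = 0; k < trail; k++) {
            if(i >= in.size())
                return false; // truncated sequence
            unsigned char c = static_cast<unsigned char>(in[i++]);
            if((c & 0xC0) != 0x80)
                return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values above U+10FFFF
        if(cp < min_cp[trail] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        if(cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        }
        else {
            cp -= 0x10000; // encode as a surrogate pair
            assert(cp <= 0xFFFFF); // guaranteed by the range check above
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return true;
}
\endcode

A code point outside the Basic Multilingual Plane becomes two UTF-16 code units, which is why
the two encodings cannot be mapped onto each other character by character.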


\subsection main_wide Why Not Narrow and Wide? 

Why not provide both Wide and Narrow implementations, so that the
developer can choose to use wide characters on Unix-like platforms?

Several reasons:

- \c wchar_t is not really portable: it can be 2 bytes, 4 bytes or even 1 byte, which makes Unicode-aware programming harder.
- The standard C and C++ libraries use narrow strings for OS interactions, and this library follows that general rule. There is
  no such thing as <code>fopen(wchar_t const *,wchar_t const *)</code> in the standard library, so it is better
  to stick to the standards rather than re-implement the Wide API in "Microsoft Windows style".
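
The portability problem in the first item can be made concrete with a small sketch. The helper
name \c wide_units_for is hypothetical; it reports how many \c wchar_t units one code point needs,
which depends on the width of \c wchar_t on the platform the code is compiled for.

\code
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only: how many wchar_t units are
// needed to store one Unicode code point on the current platform.
inline std::size_t wide_units_for(char32_t code_point)
{
    assert(code_point <= 0x10FFFF); // anything above is not a Unicode code point
    if(sizeof(wchar_t) == 2 && code_point > 0xFFFF)
        return 2; // 2-byte wchar_t (e.g. Windows): needs a surrogate pair
    return 1;     // one unit is enough (exotic 1-byte wchar_t is ignored in this sketch)
}
\endcode

The same text can therefore have a different length in \c wchar_t units depending on the platform,
while its UTF-8 form is byte-for-byte identical everywhere.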


\subsection main_reading Further Reading

- <a href="http://www.utf8everywhere.org/">www.utf8everywhere.org</a>
- <a href="http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/">Windows console i/o approaches</a>

\section using Using The Library
\subsection using_standard Standard Features

The library is mostly header-only; only console I/O requires separate compilation under Windows.

As a developer you are expected to use the \c boost::nowide functions instead of the functions available in the
\c std namespace.

For example, here is a Unicode-unaware implementation of a line counter:
\code
#include <fstream>
#include <iostream>

int main(int argc,char **argv)
{
    if(argc!=2) {
        std::cerr << "Usage: file_name" << std::endl;
        return 1;
    }

    std::ifstream f(argv[1]);
    if(!f) {
        std::cerr << "Can't open a file " << argv[1] << std::endl;
        return 1;
    }
    int total_lines = 0;
    while(f) {
        if(f.get() == '\n')
            total_lines++;
    }
    f.close();
    std::cout << "File " << argv[1] << " has " << total_lines << " lines" << std::endl;
    return 0;
}
\endcode

To make this program handle Unicode properly, we make the following changes:


\code
#include <boost/nowide/args.hpp>
#include <boost/nowide/fstream.hpp>
#include <boost/nowide/iostream.hpp>

int main(int argc,char **argv)
{
    boost::nowide::args a(argc,argv); // Fix arguments - make them UTF-8
    if(argc!=2) {
        boost::nowide::cerr << "Usage: file_name" << std::endl; // Unicode aware console
        return 1;
    }

    boost::nowide::ifstream f(argv[1]); // argv[1] - is UTF-8
    if(!f) {
        // the console can display UTF-8
        boost::nowide::cerr << "Can't open a file " << argv[1] << std::endl;
        return 1;
    }
    int total_lines = 0;
    while(f) {
        if(f.get() == '\n')
            total_lines++;
    }
    f.close();
    // the console can display UTF-8
    boost::nowide::cout << "File " << argv[1] << " has " << total_lines << " lines" << std::endl;
    return 0;
}
\endcode

This very simple and straightforward approach helps in writing Unicode-aware programs.
    
\subsection using_custom Custom API

Of course, this simple set of functions does not cover all needs. However, if you need
to access the Wide API from a Windows application that uses UTF-8 encoding internally, you can use
functions like \c boost::nowide::widen and \c boost::nowide::narrow.

For example:
\code
CopyFileW(  boost::nowide::widen(existing_file).c_str(),
            boost::nowide::widen(new_file).c_str(),
            TRUE);
\endcode

So the conversion is done at the last stage: you continue using
UTF-8 strings everywhere and switch to the Wide API only at the
glue points.
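
The opposite glue point, where a string returned by the Wide API has to come back into UTF-8,
works the same way in reverse. The following is a minimal sketch of that direction and not
Boost.Nowide's implementation: the name \c utf16_to_utf8_sketch is hypothetical, and
\c std::u16string stands in for \c std::wstring so that the sketch behaves identically on every platform.

\code
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only (not part of Boost.Nowide).
// Encodes UTF-16 code units as UTF-8; returns false on unpaired surrogates.
inline bool utf16_to_utf8_sketch(std::u16string const &in,std::string &out)
{
    out.clear();
    for(std::size_t i = 0; i < in.size(); i++) {
        char32_t cp = in[i];
        if(cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate
            if(i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return false;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++; // the low surrogate has been consumed as well
        }
        else if(cp >= 0xDC00 && cp <= 0xDFFF) {
            return false; // low surrogate without a preceding high one
        }
        if(cp < 0x80) {
            out += static_cast<char>(cp);
        }
        else if(cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if(cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else {
            assert(cp <= 0x10FFFF); // a surrogate pair cannot exceed this
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return true;
}
\endcode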

\c boost::nowide::widen returns \c std::wstring. Sometimes
it is convenient to avoid the allocation and use on-stack buffers
where possible. Boost.Nowide provides the \c boost::nowide::basic_stackstring class for this purpose.

The example above can then be rewritten as:

\code
boost::nowide::basic_stackstring<wchar_t,char,64> wexisting_file,wnew_file;
if(!wexisting_file.convert(existing_file) || !wnew_file.convert(new_file)) {
    // invalid UTF-8
    return -1;
}

CopyFileW(wexisting_file.c_str(),wnew_file.c_str(),TRUE);
\endcode

\note There are convenience typedefs \c stackstring, \c wstackstring, \c short_stackstring and \c wshort_stackstring
that use buffers of 256 or 16 characters; if the string is longer, they fall back to heap
allocation.
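
The idea behind these classes, a fixed on-stack buffer with a heap fallback, can be sketched in a
few lines. This is only an illustration under simplifying assumptions and not the Boost.Nowide
implementation: the class name \c small_buffer_sketch and its members are hypothetical, and it
copies characters instead of converting between encodings.

\code
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical class, for illustration only (not part of Boost.Nowide).
// Holds a NUL terminated copy of a string in an on-stack array when it
// fits, and falls back to a heap allocation otherwise.
template<std::size_t BufferSize>
class small_buffer_sketch {
    static_assert(BufferSize > 0,"the buffer must hold at least the NUL");
public:
    explicit small_buffer_sketch(char const *text)
    {
        assert(text != 0);
        std::size_t needed = std::strlen(text) + 1; // include the NUL
        if(needed > BufferSize)
            heap_.assign(text,text + needed);       // too long: allocate
        else
            std::memcpy(stack_,text,needed);        // fits: no allocation
    }
    // True when the string lives in the on-stack array
    bool uses_stack() const
    {
        return heap_.empty();
    }
    char const *c_str() const
    {
        return uses_stack() ? stack_ : heap_.data();
    }
private:
    char stack_[BufferSize];
    std::vector<char> heap_;
};
\endcode

Short strings, which are the common case for file names, then never touch the allocator.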

\subsection using_windows_h windows.h header

The library does not include \c windows.h, in order to prevent namespace pollution with its numerous
defines and types. Instead, the library declares the prototypes of the Win32 API functions it needs.

However, you may request the use of the original \c windows.h header by defining \c BOOST_NOWIDE_USE_WINDOWS_H
before including any of the Boost.Nowide headers.
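
In practice this means placing the define ahead of the first Boost.Nowide include. A minimal
fragment, using a header that already appears in the examples above:

\code
// Must come before any Boost.Nowide header
#define BOOST_NOWIDE_USE_WINDOWS_H
#include <boost/nowide/fstream.hpp>
\endcode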

\section technical Technical Details
\subsection technical_imple Windows vs POSIX

The library provides UTF-8 aware versions, for Microsoft Windows, of functions that usually live in the \c std namespace,
and places them in the \c boost::nowide namespace; for example, \c std::fopen becomes \c boost::nowide::fopen.

Under POSIX platforms, \c boost::nowide::fopen and all the other functions are aliases to the standard library functions:

\code
namespace boost {
namespace nowide {
#ifdef BOOST_WINDOWS
inline FILE *fopen(char const *name,char const *mode)
{
    ...
}
#else
using std::fopen;
#endif
} // nowide
} // boost
\endcode

\subsection technical_cio Console I/O

Console I/O is implemented as a wrapper over ReadConsoleW/WriteConsoleW, unless
the stream is not a TTY (for example, a pipe), in which case ReadFile/WriteFile is used.

This approach eliminates the need for manual code page handling. If TrueType
fonts are used, Unicode-aware input and output work.
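
The choice between the two code paths hinges on detecting whether a stream is attached to a
console. A minimal sketch of such a check follows. It is an assumption about how the test could
be written, not the code Boost.Nowide uses, and the function name \c is_console_sketch is
hypothetical. On Windows it relies on \c GetConsoleMode failing for handles that are not consoles,
and elsewhere on POSIX \c isatty.

\code
#include <cassert>
#include <cstdio>
#include <stdio.h> // fileno is POSIX and is declared here, not in the C++ standard
#ifdef _WIN32
#include <io.h>
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical helper, for illustration only (not part of Boost.Nowide).
// Returns true if the stream is attached to a console or terminal and
// false for files, pipes and other redirected streams.
inline bool is_console_sketch(std::FILE *stream)
{
    assert(stream != 0);
#ifdef _WIN32
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(stream)));
    DWORD mode = 0;
    // GetConsoleMode succeeds only for console handles
    return h != INVALID_HANDLE_VALUE && GetConsoleMode(h,&mode) != 0;
#else
    return isatty(fileno(stream)) != 0;
#endif
}
\endcode

A stream for which the check returns false would take the plain byte-oriented path, so UTF-8
output redirected to a file or a pipe stays UTF-8.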

\section qna Q & A

<b>Q: Why doesn't the library convert strings between the locale's encoding (when it is not UTF-8) and UTF-8 on POSIX systems?</b>

A: It is inherently incorrect
to convert strings to/from locale encodings on POSIX platforms.

You can create a file named "\xFF\xFF.txt" (invalid UTF-8), remove it, or pass its name as a parameter to a program,
and it would work whether the current locale is a UTF-8 locale or not.
Also, changing the locale from, let's say, \c en_US.UTF-8 to \c en_US.ISO-8859-1 would not magically change all
the files in the OS or the strings a user may pass to the program (which is different on Windows).

POSIX OSs treat strings as \c NUL terminated cookies.

So altering their content according to the locale would
actually lead to incorrect behavior.
 
For example, this is a naive implementation of the standard program "rm":

\code
#include <cstdio>

int main(int argc,char **argv)
{
   for(int i=1;i<argc;i++)
     std::remove(argv[i]);
   return 0;
}
\endcode

It would work with ANY locale, and changing the strings would
lead to incorrect behavior.
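
The claim about byte-string file names can be checked directly. The sketch below creates and
removes a file under a caller-supplied name using only standard C functions; the helper name
\c raw_name_roundtrip is hypothetical. The result depends on the file system: typical Linux file
systems accept names that are not valid UTF-8, whereas some file systems (and Windows) may reject
or reinterpret them, so no particular outcome is guaranteed.

\code
#include <cassert>
#include <cstdio>

// Hypothetical helper, for illustration only.
// Returns -1 if the file could not be created, 0 if it was created but
// could not be removed, and 1 if both steps succeeded.
inline int raw_name_roundtrip(char const *name)
{
    assert(name != 0);
    std::FILE *f = std::fopen(name,"w");
    if(!f)
        return -1; // name refused by the file system, or directory not writable
    std::fclose(f);
    return std::remove(name) == 0 ? 1 : 0;
}

// Example call with a name that is not valid UTF-8:
//     raw_name_roundtrip("\xFF\xFF.txt");
\endcode

No locale is consulted anywhere in this code: the bytes of the name are handed to the OS unchanged.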

The meaning of a locale under POSIX and Windows platforms
is different and has very different effects.



*/

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen

