/******************************************************************************
 *
 * 
 *
 * Copyright (C) 2009 
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation under the terms of the GNU General Public License is hereby 
 * granted. No representations are made about the suitability of this software 
 * for any purpose. It is provided "as is" without express or implied warranty.
 * See the GNU General Public License for more details.
 *
 * Documents produced by Doxygen are derivative works derived from the
 * input used in their production; they are not affected by this license.
 *
 */ /*! \page decomp Describing decompositions


One of the biggest challenges in working with PIO is setting up 
the call to \ref PIO_initdecomp.  The user must properly describe
how the data in each MPI task's memory should be placed on or retrieved from 
disk. PIO provides two methods to rearrange data between compute tasks and IO tasks.  
The first method, called box rearrangement, is the only one provided in PIO1.   
The second, called subset rearrangement, is introduced in PIO2.  

\section BOXREARR Box rearrangement

In this method data is rearranged from compute to IO tasks such that
the arrangement of data on the IO tasks optimizes the call from the IO
tasks to the underlying (NetCDF) IO library.  In this case each
compute task will transfer data to one or more IO tasks.





\section SUBSETREARR Subset rearrangement

In this method each IO task is associated with a unique subset of
compute tasks, so that each compute task transfers data to one and
only one IO task.  Since this technique does not guarantee that the data
on an IO task represents a contiguous block of data in the file, it
may require multiple calls to the underlying (NetCDF) IO library.

As an example, suppose we have a global two-dimensional grid of size 4x5
decomposed over 5 tasks. We represent the two-dimensional grid in terms of
offset from the initial element, i.e.
<pre>
     0  1  2  3 
     4  5  6  7 
     8  9 10 11
    12 13 14 15
    16 17 18 19 
</pre>
Now suppose this data is distributed over the compute tasks as follows:
<pre>
0: {  0  4  8 12 }
1: { 16  1  5  9 }
2: { 13 17  2  6 }
3: { 10 14 18  3 }
4: {  7 11 15 19 }
</pre>

If we have 2 IO tasks, the box rearranger would give:
<pre>
0: { 0  1  2  3  4  5  6  7  8  9  }
1: { 10 11 12 13 14 15 16 17 18 19 }
</pre>
While the subset rearranger would give:
<pre>
0: { 0  1  4  5  8  9  12 16 }
1: { 2  3  6  7  10 11 13 14 15 17 18 19 }
</pre>
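
The two IO-side layouts can be derived from the compute-task distribution
with a few lines of Python. This is only a sketch of the idea, not PIO's
implementation: the names compmap, groups, box_layout and subset_layout are
hypothetical, and the grouping of compute tasks into subsets (tasks 0-1 to IO
task 0, tasks 2-4 to IO task 1) is an assumption chosen to match this example.

```python
# Decomposition from the example: compute task rank -> file offsets it holds.
compmap = {
    0: [0, 4, 8, 12],
    1: [16, 1, 5, 9],
    2: [13, 17, 2, 6],
    3: [10, 14, 18, 3],
    4: [7, 11, 15, 19],
}
n_io = 2
total = sum(len(offsets) for offsets in compmap.values())  # 20 elements


def box_layout(compmap, n_io, total):
    """Give each IO task one contiguous, equally sized block of offsets."""
    block = -(-total // n_io)  # ceiling division
    layout = {io: [] for io in range(n_io)}
    for offsets in compmap.values():
        for off in offsets:
            layout[off // block].append(off)
    return {io: sorted(offs) for io, offs in layout.items()}


def subset_layout(compmap, groups):
    """Give each IO task everything held by its own group of compute tasks."""
    return {io: sorted(off for rank in ranks for off in compmap[rank])
            for io, ranks in groups.items()}


# Assumed grouping (hypothetical): IO task -> compute tasks it serves.
groups = {0: [0, 1], 1: [2, 3, 4]}

box = box_layout(compmap, n_io, total)
subset = subset_layout(compmap, groups)
print("box:   ", box)
print("subset:", subset)
```

In the box layout every IO task ends up with one contiguous range of offsets,
whereas in the subset layout each IO task simply holds the union of what its
compute tasks held.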

Note that while the box rearranger gives a data layout which is well
balanced and well suited to the underlying IO library, it has to
communicate with every compute task to do so.  On the other hand, the
subset rearranger communicates with only a portion of the compute tasks
but requires more work on the part of the underlying IO library to complete
the operation.
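
This trade-off can be made concrete for the example with a small Python
sketch. For each IO task it lists which compute tasks send it data and counts
the runs of consecutive offsets it ends up holding. The run count is only a
rough proxy for the work left to the IO library; the number of library calls
PIO actually issues is not stated here. All names are hypothetical, and the
subset grouping is assumed to be tasks 0-1 and tasks 2-4.

```python
# Decomposition from the example: compute task rank -> file offsets it holds.
compmap = {
    0: [0, 4, 8, 12],
    1: [16, 1, 5, 9],
    2: [13, 17, 2, 6],
    3: [10, 14, 18, 3],
    4: [7, 11, 15, 19],
}
n_io, total = 2, 20
block = total // n_io  # box rearranger: 10 contiguous offsets per IO task

# Assumed subset grouping (hypothetical): IO task -> compute tasks it serves.
groups = {0: [0, 1], 1: [2, 3, 4]}


def contiguous_runs(offsets):
    """Count maximal runs of consecutive offsets in a list."""
    offsets = sorted(offsets)
    return sum(1 for i, off in enumerate(offsets)
               if i == 0 or off != offsets[i - 1] + 1)


box_senders, box_data = {}, {}
for io in range(n_io):
    lo, hi = io * block, (io + 1) * block
    # A compute task is a sender if it holds any offset in this IO task's block.
    box_senders[io] = sorted(rank for rank, offs in compmap.items()
                             if any(lo <= o < hi for o in offs))
    box_data[io] = [o for offs in compmap.values() for o in offs if lo <= o < hi]

subset_senders = {io: sorted(ranks) for io, ranks in groups.items()}
subset_data = {io: [o for rank in ranks for o in compmap[rank]]
               for io, ranks in groups.items()}

for io in range(n_io):
    print("IO task", io,
          "| box senders", box_senders[io],
          "runs", contiguous_runs(box_data[io]),
          "| subset senders", subset_senders[io],
          "runs", contiguous_runs(subset_data[io]))
```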

Also note that if every task is an IO task then the box rearranger will need
to do an all-to-all communication, while the subset rearranger does none.
In fact, using the subset rearranger with every compute task acting as an IO task
provides a measure of what you might expect the performance of the underlying
IO library to be if it were used without PIO.
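
The case where every task is an IO task can be checked on the same example.
The Python sketch below counts the distinct (sender, receiver) pairs that must
exchange data under each scheme, assuming the box rearranger gives IO task i
the i-th contiguous block of 4 offsets and the subset rearranger makes each
task its own one-member subset. These assumptions and all names are
illustrative, not taken from PIO.

```python
# Decomposition from the example: compute task rank -> file offsets it holds.
compmap = {
    0: [0, 4, 8, 12],
    1: [16, 1, 5, 9],
    2: [13, 17, 2, 6],
    3: [10, 14, 18, 3],
    4: [7, 11, 15, 19],
}
n_tasks = len(compmap)      # every task is both a compute task and an IO task
total = 20
block = total // n_tasks    # box rearranger: 4 contiguous offsets per IO task

# Box: offset `off` belongs to IO task off // block; a message is needed
# whenever the holder of the offset is a different task.
box_messages = {(rank, off // block)
                for rank, offs in compmap.items()
                for off in offs
                if off // block != rank}

# Subset: each task forms its own subset, so data never leaves the task.
groups = {rank: [rank] for rank in compmap}
subset_messages = {(rank, io)
                   for io, ranks in groups.items()
                   for rank in ranks
                   if rank != io}

print("box messages:   ", len(box_messages))
print("subset messages:", len(subset_messages))
```

With the subset scheme each task writes exactly the data it already holds,
which is why this configuration approximates using the IO library directly.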


*/

