10. Data_Structures Module

Data structures are important. Programs with less than optimal data structures experience problems reaching their full potential, and programs with poor data structures can fail completely. Parallel data structures are even more important, and even more complex. Ideally, this complexity should be hidden from the casual user.

The Data_Structures Module implements the fundamental parallel data structures in the CÆSAR Code Package, and addresses the following issues:

First, all of the low-level communication calls in CÆSAR are wrapped by the Communication Class. These calls currently use the PGSLib Package, but they could be rewritten to use UPS or even bare calls from the MPI Package. The Communication Class also includes a serial version. Another data structure auxiliary type, which contains the information associated with a gather-scatter operation, is the Trace Class.

Every data structure in the Data Structures Module can be thought of as a multi-dimensional array, with one dimension spread across the processors. The dimension that is spread across the processors is referred to as the distributed (or parallel) dimension (or axis). ^10.1 The remaining serial axes are contained on a given processor, and are treated like standard Fortran 95 arrays. In the rest of this discussion the serial axes will be ignored and arrays will be described as vectors, with the understanding that a ``vector'' may actually have several serial axes in addition to its single parallel or distributed axis. Note also that the PEs are assumed to contain contiguous pieces of the vector, with the first section being on PE=1, the second section being on PE=2, etc.

Information about the basic distribution of an axis across the PEs is contained in the Base_Structure Class. This information includes items such as the total length of the vector, the length of the vector on this PE, the starting and ending indices for this PE, and the locus (or name) for the axis. For example, a user might specify a Base Structure for the nodes in a mesh (locus = `Nodes'), another one for the cells, and another one for the faces. Or equivalently, Base Structures could be defined for equations or variables. The actual locus is not specified by the Data_Structures Module, so the user may easily define new ones as needed.

The simplest CÆSAR data structure, the Assembled_Vector Class, is not parallel at all, but exists only on a single processor (see Figure 9.1). These vectors can be thought of as parallel data structures that have been ``assembled'' on a single PE (the IO processor). The primary use for an Assembled Vector is input and output, although there may certainly be times when a given parallel data structure is too large to be assembled on the IO PE. An Assembled Vector contains a pointer to a Base Structure which determines its structure.

**Figure 9.1:** A Schematic Diagram of the Assembled Vector data structure. The data has been assembled on the IO PE.
$\includegraphics[scale=.9,angle=-90]{../../images/Caesar/AssembledVector-nt.ps}$

The basic parallel data structure is defined by the Distributed_Vector Class. A Distributed Vector is a true parallel data structure, with data spread across the processors according to its Base Structure, a different amount on each PE (see Figure 9.2). A Distributed Vector contains a pointer to a Base Structure which determines its structure. The conversions between Assembled Vectors and Distributed Vectors (based on the same Base Structure) can be accomplished with an equals sign (via operator overloading). The operation changing a Distributed Vector to an Assembled Vector is known as assemble; the opposite operation is called distribute.

**Figure 9.2:** A Schematic Diagram of the Distributed Vector data structure. The data has been distributed to all of the PEs, a different amount on each one.
$\includegraphics[scale=.9,angle=-90]{../../images/Caesar/DistributedVector-nt.ps}$

Assembled and Distributed Vectors, and the Base Structures that they are based upon, satisfy the input/output and parallel distribution needs of CÆSAR. There is also a need for the association of two different distributions. This is used to define, for example, the nodes that correspond to each cell in a mesh. The association between two data distributions can also be thought of as indirection, which is very useful for unstructured mesh operations (or parallel distributed mesh operations).

Association between two Base Structures in CÆSAR is accomplished via the Data_Index Class. This association is considered to be a many-of-one relationship - ``many'' entries in the first Base Structure correspond to ``one'' entry in the second Base Structure. For example, many nodes in a node Base Structure correspond to each cell in a cell Base Structure. A Data Index therefore contains a Many (Base) Structure, a One (Base) Structure, and an index array to relate the two. It also stores some information about the communication pattern necessary to gather the ``many'' entries for each ``one'' entry on the one axis.

Note that ``many'' need not be more than unity - for instance there could be one boundary face for each regular face. In this case, the index array would be one-dimensional. If ``many'' is a constant number (for instance, there are always six faces on each cell), then the index array is a two-dimensional array (in this example, six by the number of cells on each PE). If ``many'' is a variable number (for instance, a polyhedral mesh where the number of faces per cell may vary), then a ragged-right index array is used. Ragged-right index arrays are not yet implemented, but one- and two-dimensional arrays are.

The Overlapped_Vector Class defines a data structure that represents an intermediate step in performing a gather and collect operation. The data in an Overlapped Vector has been gathered from a Distributed Vector with a Many Structure according to a specified One Structure, but has not been collected into place. An Overlapped Vector contains a Many (Base) Structure, a One (Base) Structure, a Many-of-One (Data) Index, and a Distributed Vector with a Many Structure (see Figure 9.3). It also contains the entries from the Distributed Vector that would be needed for a gather and collect operation that are not local, and so some data is represented multiple times - but no more than once on a given PE. This overlapped data can represent the data from the boundary of a given processor that does not reside locally in the Distributed Vector. For example, an Overlapped Vector can contain the coordinates for all the nodes that correspond to cells on a given processor, even though some of the nodes on the boundary actually reside on other processors. Since it does not store the nodes twice on a given processor (even if more than one cell on that processor contains a specific node), the Overlapped Vector data structure uses a smaller amount of memory (compared to a Collected Array, which is described in the next subsection). Since it does not require interprocessor communication to generate the collected form of the data, an Overlapped Vector can save on run time (compared to gathering and collecting a Distributed Vector).

**Figure 9.3:** A Schematic Diagram of the Overlapped Vector data structure. The green blocks represent the Distributed Vector that the Overlapped Vector includes. The blue blocks represent off-processor data necessary for a gather operation that has been stored locally - the overlapped part of the vector.
$\includegraphics[scale=.9,angle=-90]{../../images/Caesar/OverlappedVector-nt.ps}$

The fully-evaluated ``many-of-one'' relationship between two distributions is described by the Collected_Array Class. In contrast to the other CÆSAR data structures, the Collected Array is an array rather than a vector (see Figure 9.4). It has two axes to represent the single parallel axis that has been discussed so far (in addition to the many possible serial axes whose discussion is being suppressed here). The vertical axis in Figure 9.4 is distributed across the processors according to the One Structure. The horizontal axis of the Collected Array has been formed by collecting all of the ``many'' items that correspond to each ``one'' (note that a Collected Array for a two-dimensional Many-of-One Index is shown).

A Collected Array is the final result of a gather and collect operation on a Distributed Vector according to a Many-of-One Index, or the result of simply collecting an Overlapped Vector (with no interprocessor communication). The data in a Collected Array is in a very useable form, so that access requires little run-time (as compared to collecting an Overlapped Vector or gathering and collecting a Distributed Vector). The data in a Collected Array takes up much more memory than an Overlapped Vector, since ``many'' items that belong to more than a single ``one'' item (for example, nodes belonging to more than one cell) are stored multiply, even if they exist on the same processor. Sometimes this behavior is not only useful from a run-time perspective, but is required - for instance, a physical property evaluated on each face of a cell may require a different value for the same face in different cells due to different materials in the cells. The ``many'' axis in a Collected Array can be combined using a specified combination operator to form a Distributed Vector with a One Structure.

**Figure 9.4:** A Schematic Diagram of the Collected Array data structure. One way to form a Collected Array is to collect all of the ``many'' entries that correspond to each entry on the ``one'' axis. The green blocks represent data that was already on-processor when the data was distributed on the many axis in a Distributed Vector. The blue blocks represent off-processor data that was gathered and collected onto the one axis, which is the same as the blue overlapped blocks in an Overlapped Vector.
$\includegraphics[scale=.9,angle=-90]{../../images/Caesar/CollectedArray-nt.ps}$

In contrast to the communication, the calculations using data from the data structures are done using standard Fortran 95 arrays, pointered to allow dynamic memory allocation. This allows the compiler to make any optimizations possible with the Fortran 95 intrinsic array syntax. For the purpose of this discussion, to contrast these standard Fortran arrays with the other data structures described here, they are referred to as Bare Naked Vectors and Bare Naked Arrays, meaning that there is no encumbering derived type associated with them.

The CÆSAR data structures are very useful in coordinating the necessary communication for a parallel program. They provide a great deal of control over data layout, memory usage and cpu time, and enable CÆSAR to optimize in one direction or another depending on available resources (see Table 9.1). All of the complicated operations of converting from one data structure to another are hidden, and in fact can in most cases be accomplished with a simple equals sign, via operator overloading.

**Table 9.1:** Relative Data Structure Memory and CPU Requirements. Assuming that information is stored in a Distributed Vector and needs to be accessed in a gathered and collected Bare Naked Array form, this table shows the memory / cpu time trade-offs for various CÆSAR data structures.
Data Structure	Memory Usage	CPU Usage
Distributed Vector	Lowest	High, with communication
Overlapped Vector	Low (same as DV + off-PE entries)	Medium, no communication
Collected Array	High	Low, no communication

As an example, take the problem of reading in node coordinates, calculating cell center values, and writing them out. The steps to be taken are diagrammed in Figures 9.5 and 9.6. First, set-up information for the nodes is read in, in the form of the total number of nodes and the number on each PE, and a Base Structure for the nodes (referred to as the Node Structure) is initialized. A similar set up is done for the cells. Next, an index array that tells which nodes are associated with which cells is read, and a Data Index called the Nodes of Cells Index is initialized using the index array and the Node and Cell Structures.

Now the necessary data structures can be easily initialized from the Node Structure, the Cell Structure, and the Nodes of Cells Index. These are, using acronyms for the data structures (i.e. AV: Assembled Vector, DV: Distributed Vector, OV: Overlapped Vector, CA: Collected Array, BNV: Bare Naked Vector, BNA: Bare Naked Array):

Coordinates_Nodes_BNV (IO PE only),

Coordinates_Nodes_AV,

Coordinates_Nodes_DV,

Coordinates_Nodes_of_Cells_OV,

Coordinates_Nodes_of_Cells_CA,

Centers_Cells_DV,

Centers_Cells_AV, and

Centers_Cells_BNV (IO PE only).

The node coordinates are now read into a Bare Naked Vector called Coordinates_Nodes_BNV which is defined on the IO PE only. To store the coordinates in the Assembled Vector, a simple equals sign is used:

If the DEBUG_LEVEL is set high enough, copious error checking will be done to make sure that the left and right sides of the equals signs above are compatible.

Note that the solution given above is not unique. For example, to conserve memory, the Collected Array need not be formed, and a direct step around it may be taken:

**Figure 9.5:** A flow chart showing the hierarchical relationships between CÆSAR Data Structures. Operations in red require global communication.
$\includegraphics[scale=.95,angle=-90]{../../images/Caesar/DataStructures-simple-nt.ps}$

**Figure 9.6:** A flow chart showing operations that have been implemented for CÆSAR Data Structures. Operations in red require global communication.
$\includegraphics[scale=.6,angle=-90]{../../images/Caesar/DataStructures-implemented-nt.ps}$