More on SSA

Posted on by Heidy

Well, this is quite a delayed blog post. The first month of school was extremely hectic, more so than I thought it would be. Who knew that juggling five 4000-level CS courses, a TA position, and research would be this tough? Anyhow, for my Honors Thesis, I often tend to write summaries for the papers that I read. Here are two summaries with some commentary pertaining to my research project (which isn’t meant to be understood, as it was initially commentary to my professor).

Practical Improvements to the Construction and Destruction of Static Single Assignment Form

In Single Static Assignment, a merge point is necessary in order to correspond multiple SSA names to the original name. A new definition is given to the original name at a merge point, which is provided by a function whose right hand side is a pseudo-function called a {\phi}-function. “The arguments of the {\phi}-function are the names flowing into the convergence, and the {\phi}-function defines a single, new name. Subsequent uses of the original name will be replaced with the new name defined by the {\phi}-function”.

As of the time this paper was written (1998), there were two main forms of SSA, “minimal” and “pruned”. The “minimal” form of SSA inserts a {\phi}-function at every point where two SSA names are joined together in order to produce the proper definition for the original name. This means that a {\phi}-function is inserted even if the two values are never to be used again after the merge, “in data-flow analysis terminology, two values that are not live.”

Choi et al. proposed the “pruned” form of SSA, which “uses global data-flow analysis to decide where values are live”. Thus, {\phi}-functions are only inserted at merge points where the analysis indicates that the values are live. This reduces the number of {\phi}-functions inserted, but clearly increases the complexity due to the live analysis. However, there does exist various algorithms that efficiently compute live information, but the overhead is never eliminated.

Evidently, both “minimal” and “pruned” SSA have different time and space complexity. The minimal SSA form faults in its larger SSA form, yet avoids computing live information, while the pruned SSA form faults in computing live information, yet produces a more minute SSA form.

Briggs et. al propose a semi-pruned SSA form that is somewhat of a mix between “minimal” and “pruned” SSA. They believe that “different applications of SSA call for different flavors; at different times, our compiler builds both minimal and pruned SSA, along with a third flavor that we call semi-pruned. ” The semi-pruned SSA has fewer nodes than the minimal form “without the expense of solving data-flow equations to determine which values are live. The second speeds up the renaming process through an algorithmic improvement. The third and final improvement is an algorithmic fix to a problem that arises when translating SSA form back into executable code.”

I will not go into the detail of their implementation, because while all this is interesting and efficient for many languages, I find that minimal SSA will be much more efficient in our case. Due to the fact that our language is functional (although not purely functional), we do not have to reserve any state but that of the variables’, which are limited to a rather minimal scope. Not only that, we will be explicitly specifying the use of SSA in only some instances of our programming language, thus eliminating some of the hindrances that are associated with “minimal” SSA (the over insertion of {\phi}-functions) . Please refer to Page 8 for the algorithms construction of minimal SSA. I can only hope that this will be rather useful for our implementation.

Concurrent Static Single Assignment Form and Constant Propagation for Explicitly Parallel Programs

Lee et. al propose a Concurrent Static Single Assignment (CSSA) form which converts explicitly parallel programs into what they call a Concurrent Control Flow Graph (CCFG). They claim that their parallel static single assignment form varies from the original proposition made by Srinivasan et. al, as the latter form is restricted to Parallel Fortran with copy-in/copy-out semantics. The CSSA is designed for languages with “interleaving semantics and restricted form of post/wait synchronization. ” The CCFG is the intermediate representation of an explicitly parallel program, similar to many other parallel graphs representations. The CCFG is defined on page 4 of this paper. Crucial to the CCFG is the representation of conflicting edges which is essential to their CSSA form (discussed later in this summary):

 Definition 1 Two memory references in different threads  conflict if they reference the same memory location and at least one is a write. Two statements conflict  if they contain mutually conflicting references.

Lee et. al extend the usage of the {\phi}-functions by inserting them at join points where the threads are merged (Coend node in CCFG).

Definition 5  A {\phi}-function has the form {\phi} ({V_1} , {V_2} , …,{V_n}), where n is the number of incoming control flow edges of the node where it is places. The value of {\phi} ({V_1} , {V_2} , …,{V_n}) is one of the {V_i's} and the selection depends on the control flow. At Coend node, the selection depends on the interleavings of statements in different threads in execution.

Lee et. al then introduce the {\pi}-function, which are placed after the {\phi}-functions are inserted

 Definition 6 A {\pi}-function has the form {\pi} ({V_1} , {V_2} , …,{V_n}), where n is the number of incoming control flow edges plus the number of incoming distinct conflict edges labeled DU. The value of {\pi} ({V_1} , {V_2} , …,{V_n}) is one of the {V_i's} and determined nondeterministically because of the concurrency between threads.

Thus, {\pi}-functions are used in order to resolve conflicting edges, which as I define above, occur when two threads share a memory reference with at least one write. The paper then goes on to explain that before the insertion of both the {\phi}-functions and {\pi}-functions, a partial order of the statements is necessary. I believe that this aspect does not necessarily apply to what we’re attempting to do with parallelization, so I am disregarding it as of now (feel free to correct me if it is in fact of any use to us).

{\phi}-functions are assigned through depth first traversal, since after all, CCFG is a graph representation. Perhaps this is a feasible way for us to insert {\phi}-functions, but I find that the minimal SSA form is still the best way to handle {\phi}-function injections. Please refer to Theorem 1 on page 9 if you are interested in this implementation. I find that the placement algorithm for  {\pi}-functions may give us much more insight on an explicitly parallel implementation. {\phi}-functions clearly cannot be placed properly when two or more threads access or manipulate a variable, thus a separate function, in this case {\pi}-function, must properly solve the dilemma of conflicting edges.

 Definition 7 A collection of simple paths {p_i, 2 \le i \le n}, in a given concurrent control flow graph is said to converge by interleaving at a nod y if they satisfy the following conditions

  1. They have an end node y in common.
  2. They do not have any node in common except y.
  3. One and only one of these paths consists only of control flow edges.
  4. Every path{p_i, i \ne m}, consists of only one conflict edge and the type of edge is DU.

Definition 8 Given a collection of paths {p_i}converging by interleaving at a node y, y is called the concurrent join node of the start nodes of {p_i}‘s for a variable V, if the start nodes of all {p_i}‘s contain an assignment to V, and no other node in {p_i}contains an assignment to V, and the collection is the largest one possible for the variable V.

As you can see, Lee et.  al heavily utilize CCFG to place both {\phi}-functions and {\pi}-functions, but a “minimal” SSA form might be much more efficient in our case.


One comment on “More on SSA

Leave a Reply

Your email address will not be published. Required fields are marked *