require all array declarations that work in the static model to work in the dynamic model #99

GoogleCodeExporter · 2015-07-02T03:42:32Z

This discussion is a continuation of 
http://code.google.com/p/upc-specification/issues/detail?id=30 that focuses on 
the positive solution to the discrepancy between the static and dynamic threads 
model w.r.t. array declarations.

Dan has provided numerous examples of what works only in the static model, so 
I'll just name the following as my motivating example:

shared int A[THREADS][THREADS];

This should work in both the static and dynamic models.  Paul Hargrove has 
indicated that BUPC can do this, and no implementer has given any evidence that 
other UPC compilers can't do this as well.

Original issue reported on code.google.com by [email protected] on 17 Oct 2012 at 2:25


GoogleCodeExporter · 2015-07-02T03:42:32Z

"no implementer has given any evidence that other UPC compilers can't do this 
as well"

Copying my response from issue 30:

I believe there are two basic issues with relaxing the restrictions on shared 
types under dynamic threads.

1. The first issue deals with implementing the allocation of static shared 
data. UPC was designed to allow an implementation where the compiler can, at 
compile time, determine and allocate the shared space required on each thread 
for all statically-allocated objects. The dynamic threads restriction 
(described formally in issue 94, comment 11) ensures this calculation is always 
possible for shared arrays, so that statically-allocated shared arrays can be 
placed in the .bss linker section, just like every other statically-allocated 
object in C99.

That being said, nothing REQUIRES this implementation approach, and as Paul and 
others pointed out several compilers allocate the "static" shared data 
dynamically at startup anyhow - so those compilers could easily relax the 
allocation-related part of the restriction. I don't know how many of the major 
compilers fall into this category and how many rely upon the guarantee 
mentioned above, but changing an existing implementation from one strategy to 
the other probably represents a significant undertaking. That alone probably 
delays this feature enhancement to 2.0 or later.

2. The second, stickier issue is type-checking. Allowing the THREADS expression 
to appear in arbitrary places in array declarations under dynamic threads would 
mean allowing all those expressions to have a value which is not a compile-time 
constant. Various parts of type-checking for arrays in *C89* relied upon 
compile-time constant array dimensions, and at the time UPC was first specified 
many C implementations still shared that restriction. C99 relaxes this 
restriction somewhat with the variable-length array feature, whose 
implementation has now become more widespread and might be used to alleviate 
some of this issue. However even in C99 it is not legal to declare a 
statically-allocated array with a variable length (because of point 1 above), 
so it would need to be extended somewhat to handle UPC shared arrays which are 
always declared with a static lifetime (as opposed to stack variables, which 
are the target of C99's VLA). One would also have to decide "how variable" we 
allow the dimension expressions to become, i.e. just otherwise-constant 
expressions that include THREADS? What about expressions like 
ceil(sqrt(THREADS))? What about arbitrary user-provided function calls?
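
To make the C99 rule concrete, here is a minimal plain-C sketch (not UPC; the function name sum_of_squares is made up for illustration). It shows a variable-length array being accepted for an automatic (stack) object, with the rejected static-storage form left in a comment so the file still compiles.

```c
#include <stddef.h>

/* A VLA with automatic storage: the length n is only known at run time.
 * n must be greater than zero; a zero-length VLA is undefined behavior. */
long sum_of_squares(size_t n)
{
    long squares[n];
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        squares[i] = (long)(i * i);
        total += squares[i];
    }
    return total;
}

/* The static-storage equivalent is rejected by a C99 compiler, because an
 * object with static storage duration must have a constant size:
 *
 *     extern size_t n;
 *     static long table[n];    // error: variably modified type at file scope
 */
```

A shared array dimensioned by THREADS under dynamic threads would resemble the commented-out form more than the accepted one, which is why the VLA machinery would need extending rather than simply reusing.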

In any case, the problem also affects blocksizes, e.g. consider these declarations:

shared [THREADS] long a[1024];
shared [16] long *p = a;

Is the second line valid? The type compatibility depends on whether 
THREADS==16, which is not known at compile time for dynamic threads (and 
therefore cannot be typechecked statically, which is fundamental to the C 
philosophy). In the past we've occasionally tossed around the possibility of 
adding language support for variable blocksizes and it's never really taken 
off, however we would probably need a facility like that to support dynamic 
THREADS in a blocksize expression (without imposing a bunch of artificial 
limitations).

A minor (perhaps unimportant?) side effect of adding such a feature is it would 
make it easy to write programs that failed with memory exhaustion BEFORE 
REACHING main(). For example a declaration like this:

shared int x[THREADS*THREADS*THREADS*THREADS*THREADS];

would work fine when run with small thread counts, but at larger thread counts 
would quickly lead to a spawn-time error that cannot be diagnosed at compile or 
link time. This should not be surprising (when written this obviously), but it 
is novel - in the current UPC/C99 language the linker can reject erroneous 
attempts to create ludicrously-sized static data. Users are accustomed to the 
possibility that upc_alloc() or malloc() might fail due to memory exhaustion, 
but this failure would happen at startup before reaching any user code (which 
might make it more difficult to diagnose, depending on implementation support).
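
As a rough illustration of how fast that declaration grows, here is a small plain-C sketch that computes the total size of an array with THREADS^5 elements. The helper name total_bytes_pow5 and the 4-byte element size used in the figures below are assumptions for illustration only; nothing here reflects how any UPC runtime actually sizes its shared heap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes elem_size * threads^5 into *out.
 * Returns false (leaving *out untouched) if the product overflows 64 bits. */
bool total_bytes_pow5(uint64_t threads, uint64_t elem_size, uint64_t *out)
{
    uint64_t total = elem_size;

    for (int k = 0; k < 5; k++) {
        if (threads != 0 && total > UINT64_MAX / threads)
            return false;       /* next multiplication would overflow */
        total *= threads;
    }
    *out = total;
    return true;
}
```

With 4-byte elements this gives 4 KiB in total at 4 threads, 4 GiB at 64 threads, and 4 PiB at 1024 threads; at 65536 threads the size no longer fits in 64 bits.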

Original comment by danbonachea on 17 Oct 2012 at 2:26

GoogleCodeExporter · 2015-07-02T03:42:32Z

I oppose this relaxation.  The Cray compiler is an example of a UPC compiler 
that allocates static UPC data statically in the data segment.  Both this 
static allocation and the representation of array extents would be hindered by 
the need to support things like shared int x[THREADS*THREADS] or shared int 
x[THREADS][THREADS].  Detection of local references becomes more complicated 
when the compiler cannot rely on there being a single THREADS dimension.  
Finally, compilers that allow all of their usual C optimizations to apply to 
(loops making) UPC array references may need to modify the "non-UPC" parts of 
their compiler to understand this new kind of array extent.

The mention of C99 VLAs is an interesting comparison, but I think that we ought 
not to use it as a model for any UPC feature.  VLAs are very controversial and 
are one of the C features that never made it into C++.  In a compiler that 
supports C, UPC, and C++, the C99 VLAs are already a bit of a one-off 
oddity/nuisance.

Original comment by [email protected] on 17 Oct 2012 at 3:22

GoogleCodeExporter · 2015-07-02T03:42:32Z

Regarding comment 1: I think having the dimensions being a simple polynomial of 
THREADS is sufficient.  I don't know how to formalize this, but how about this?

dim = \sum_{i=1}^{n} c_i \cdot THREADS^i, for any finite n less than some 
implementation-defined and specified value that is greater than or equal to 12 
but hopefully much larger; each c_i is an integer constant.
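
A sketch of what evaluating such a dimension at startup might look like, in plain C. This assumes the intended form has one integer coefficient per power of THREADS, with powers running from 1 to n; the function name dim_from_polynomial and the coefficient layout are hypothetical.

```c
#include <stddef.h>

/* Evaluates sum over i = 1..n of c_i * threads^i using Horner's rule.
 * coeffs[k] holds c_(k+1), i.e. the coefficient of threads^(k+1).
 * Overflow is not checked in this sketch. */
long long dim_from_polynomial(const long long *coeffs, size_t n,
                              long long threads)
{
    long long dim = 0;

    for (size_t k = n; k > 0; k--)
        dim = (dim + coeffs[k - 1]) * threads;
    return dim;
}
```

For example, a dimension of THREADS*THREADS corresponds to the coefficients {0, 1}, which yields 64 at 8 threads; the coefficients {3, 2} describe 3*THREADS + 2*THREADS^2, which yields 44 at 4 threads.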

Regarding comment 2: So this sounds like the static model will provide better 
performance, which I think it should.  Clearly, more information enables 
greater optimization.  It sounds like the argument against this is because it 
requires work to implement.  I would imagine any new feature has this property.  
Paul suggested this feature target UPC 2.0, which seems like it might have 
other additions that would require additional implementation effort.  Should 
the language not be allowed to progress because it requires work to implement?  
I imagine Cray's C compiler is going to need a lot of work to support C11 as 
well.

I understand that this feature may require more work for implementers.  
However, I think it has tremendous value for the user community outside of the 
power users who build their application once for a particular piece of hardware 
and rarely modify it again.  

I believe that this feature will make it significantly easier to attract new 
users, particularly students who read the UPC book or look at the tutorials.  
Every example of multidimensional arrays I've found in those resources does not 
work with the dynamic model.  This is incredibly discouraging to new users and 
makes UPC feel like a domain-specific language.

Furthermore, it is effectively impossible to write a general purpose science 
code that uses multidimensional arrays because, as noted by others, this 
requires all sorts of error-prone pointer arithmetic that e.g. quantum chemists 
are not highly skilled at.  At the very least, it has to wait until someone 
like me writes a full-service replacement to GA in UPC to do all of that for 
them, but as anyone can see from the general behavior of the HPC community, the 
existence of a good library for X does not reduce the probability that a domain 
scientist will try to re-implement X anyways.  Forcing users to adopt libraries 
because the language lacks features seems like an unfriendly model.
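
For concreteness, the kind of hand-written index arithmetic I mean looks roughly like this. It is a plain-C sketch: malloc stands in for a UPC allocation call, n stands in for THREADS, and the names flat_index and make_square_grid are made up. It does not model affinity or blocking at all, which only makes the real UPC version harder.

```c
#include <stddef.h>
#include <stdlib.h>

/* Row-major offset of element (i, j) in an array with `cols` columns. */
static size_t flat_index(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}

/* Allocates an n-by-n grid as one flat block and stores i * 10 + j at
 * logical position (i, j). Returns NULL on allocation failure; the caller
 * is responsible for calling free() on the result. */
int *make_square_grid(size_t n)
{
    int *grid = malloc(n * n * sizeof *grid);

    if (grid == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            grid[flat_index(i, j, n)] = (int)(i * 10 + j);
    return grid;
}
```

Every access has to go through flat_index with the right column count, and swapping i and j or passing the wrong extent compiles without complaint, whereas A[i][j] on a declared two-dimensional array leaves that bookkeeping to the compiler.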

Original comment by [email protected] on 17 Oct 2012 at 3:56

GoogleCodeExporter · 2015-07-02T03:42:32Z

My argument is not "this feature would require work, therefore it is bad."  The 
argument is, as Dan has also pointed out, that static allocation of an array 
with more than one THREADS dimension cannot be implemented, no matter how much 
work is put into it.  Adding this feature effectively prohibits a UPC compiler 
from locating such an array in the static data segment.  Therefore, the 
compiler would need to use dynamic allocation (an under-the-covers 
upc_all_alloc), which returns a pointer, but ideally still optimize references 
through that pointer as well as if they were referencing through an array in 
the static data segment.  This is the part that _could_ be done, but I'm not 
convinced that we _should_.

There is another issue open for providing a library routine to help with UPC 
pointer arithmetic.  Would that be sufficient to at least lower the difficulty 
of using UPC multidimensional arrays to the level of using C multidimensional 
arrays?  I guess I'm having trouble seeing how we could add better support for 
multidimensional arrays to UPC when it doesn't already exist in C and that's 
the base language.  If we were successful, you'd have C programmers wanting to 
declare their multidimensional arrays as shared simply to take advantage of 
some nicer way of dealing with them.

Original comment by [email protected] on 17 Oct 2012 at 4:41

GoogleCodeExporter · 2015-07-02T03:42:32Z

"All array declarations that work in the static model" includes these:

int foo[THREADS+3]; // Not shared
struct {
  char names[THREADS][20]; // Structure field
  int flags;
} tdata[THREADS][THREADS]; // Also not shared

Original comment by brian.wibecan on 17 Oct 2012 at 7:10

GoogleCodeExporter · 2015-07-02T03:42:33Z

Thanks, Brian.  I was way too general in my issue proposal.  I will make a much 
more restrictive proposal shortly once I capture more of the semantics in my 
head.

Original comment by [email protected] on 17 Oct 2012 at 10:26

GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated Consensus-Low labels Jul 2, 2015
