MESA++ logo

Interfacing C++ to Fortran

This page has information on how C++ has been interfaced to Fortran in MESA++. .


Why?
What's in a name?
Procedure argument lists

Why?

C++ and its parent language, C, are widely used by writers of operating systems, compilers, tools, graphical toolkits, and other applications. These languages are also used by a sizable community of scientific programmers. Their popularity can be attributed to the availability of free high-quality compilers and development tools, such as g++ and Anjuta, and to the ability of C++ to accomodate a wide range of programming styles, from programming that is very close to the hardware to very high-level polymorphic and generic programming.

However, Fortran remains a popular language for numerical computing, in part from the perception that it is a more efficient language, and in part from tradition. There is a massive body of legacy scientific code written in various dialects of Fortran that ought not to be lightly discarded.

Fortran was originally developed for number crunching, and it has always emphasized computational efficiency: A real man's programming language, from the days when such a blatantly sexist statement didn't raise many eyebrows. It has evolved and improved over the years, but the insistence on efficiency first has made the Fortran community very conservative about adopting some of the more powerful modern programming language concepts, such as object-oriented programming. This has had the effect of causing Fortran to be scorned by many computer scientists. This author can attest to once being asked in an interview for a professorship whether he was willing to teach Fortran to physics graduate students, because the computer science department at the university in question was uninterested in teaching a Fortran course.

Curiously, C was also originally developed for efficiency, though with different applications in mind. C was originally developed for the PDP-11 family of computers, and its operator set closely matched the beautifully orthogonal instruction set of the PDP-11 machines. C continued to be developed as a language very close to the hardware, and, for certain applications, it is a very efficient language. Its chief disadvantage for number crunching is its very low-level treatment of arrays and its pervasive aliasing.

C really only offers one-dimensional arrays, because only one-dimensional arrays  map directly to the linear address space of most computer memory. Oh, it's true that the following is legal C code:

double a[10][20];
for (int i=0; i<9; ++i)
{
    for (int j=0; j<19; ++j)
        {
            a[i][j] = i+3*j;
        }
}

and it does what you expect, but a is not actually a 10 by 20 array of doubles: It is an array of 10 arrays of 20 doubles. This is not the same thing even if it looks superficially similar.

Closely related is the fact that an array is more or less freely convertible to a pointer to the underlying type. In other words,

double a[20];
double *aptr = a;

is perfectly legal C and is in fact what essentially takes place very time you pass an array to a procedure: The declarations

int func(double *a);

and

int func(double a[]);

are semantically identical. It follows that you can apply the subscript operator [] to any pointer, and the dereference operator * to any array. The expressions *a and aptr[0] are equivalent.

Fortran has reluctantly added pointers in its last few dialects, but with heavy restrictions on their use that are aimed at avoiding aliasing. Aliasing is the state of affairs in which the compiler cannot tell if two pointers will point to the same location in memory at run time. In Fortran, by default, two arrays coming into the same function are assumed to be different arrays, and this gives the compiler some optimization possibilities it would not have if the compiler had to assume the two array arguments might refer to the same memory. Furthermore, if an array is not specifically designated as a target, the compiler can assume that no pointer points to it.

C offers no such guarantees about aliasing. On the contrary, the compiler almost always is required to assume that two pointers might point at the same memory. This includes two pointers coming into a procedure as array arguments. This lack of restrictions on aliasing reflects the fact that pointers are true first-class objects in C and are very powerful tools for many (non-numerical) computing tasks.

How detrimental is aliasing to optimization, really? My most recent experience with C versus Fortran on a heavy-duty number crunching algorithm had the Fortran coming in about 5% faster with both the gcc and ifort/icpc compiler suites. That's in the grass, in my opinion. Your mileage may vary, but the potential of aliasing to inhibit optimizations should not be overstated.

C++ is very nearly a proper superset of C. In other words, almost all C code is also legal C++ code, which almost always has the same semantic meaning. The "almost always" can be a gotcha, but rarely a serious one. This means that C++ code can be written that is as efficient as any C code, and probably not much worse than Fortran.

However, the power of C++ is in the very high-level programming constructs it adds to its C foundation. These include polymorphic classes, generic programming, exception handling, and an extensive library. These powerful facilities often come at a price in efficiency, but the programmer has a lot of control over when they are used. The proper use of these features can be a joy. Their misuse in bottleneck portions of a code can be an efficiency disaster.

If C++ can be cleanly interfaced to Fortran, then one can have the best of both worlds. There are some definite challenges to doing so, but these are addressed in this article.

What's in a name?

In C++, every procedure or variable must be declared before it is first used, as if IMPLICIT NONE was always in effect. These declarations are traditionally put into a header file that is introduced wherever needed using a preprocessor #include directive. For a C++ procedure to call a Fortran procedure or access a Fortran variable, it must have available to it a C++ translation of the signature of each such procedure or the type of each such variable. We must also make sure the names of these procedures and variables are correctly translated. It may come as a surprise to you that this is not trivial.

In Fortran, there can only be one variable or procedure with a given name within each module or at the global scope. However, it is permissible to have variables or procedures with the same names in different modules. The linker that brings object files together into an executable program must see a unique name for each procedure or variable. This implies that the names you see in a Fortran source file are not necessarily the same as the names seen by the linker.

Even the name of a Fortran global procedure is not necessarily identical with the name seen by the linker. The majority of Fortran compilers tweak the name slightly, so that it will not collide with similar procedure names in the standard C library. For example, the gfortran compiler appends a single underscore to every Fortran global procedure name, so that a function named "my_function" becomes "my_function_" in the object file produced by the compiler.

The name of a module procedure or variable is distinguished from identical names in other modules or in the global scope by having the name of the module glued to the procedure or variable name. For example, if your code has a procedure named "my_function" in a module named "my_module", the gfortran compiler will convert the name to "__my_module_MOD_my_function" for purposes of linking. Note that the name is further tweaked by a couple of preceding underscores. Module variables are treated in exactly the same way, so that "my_variable" in "my_module" becomes "__my_module_MOD_my_variable".

C is much simpler. Every global procedure or  variable name is left completely unchanged by most C compilers. If your C function is named "my_function", the linker will see it as "my_function". Same for global variables. Does this seem rather C-centric? It is. Compiler and tool writers generally are fluent in C rather than Fortran and their conventions reflect this: C is the 500-pound gorilla.

C++ is another matter entirely, because it supports function overloading. A single name can refer to any number of functions, with any number of different parameter lists. This is necessary to support polymorphism and generic programming. The only restriction is that the compiler must be able to distinguish between procedures, based on the types of the arguments used to call them. Thus, you can simultaneously have a function int my_function(int) that takens an integer parameter and returns an integer result, and a function double my_function(double) that takes a double precision parameter and returns a double precision result. The compiler will decide which to use based on the actual argument: my_argument(1) will use int my_argument(int), while my_argument(1.0) will use double my_argument(double)

This poses a problem for linking, because each overloaded function must have a unique link name. C++ accomplishes this by something called name mangling, and the designation is apt. The link name for int my_function(int) generated by g++ is _Z11my_functioni, which encodes the fact that this is a global function taking a single integer argument.

C++ can portably interface with C functions by using the syntax:

extern "C" { int my_function(int); }

which indicates that the name of the function is to be left unmangled for linking purposes. For obvious reasons, there can only be a single extern "C" function of a given name, though it can still be overloaded with non-extern "C" functions with the same name.

In principle, a C++ compiler could provide an extern "FORTRAN" linkage feature, but while the C++ standard permits this, it does not mandate it. And I don't know of a single C++ compiler that actually provides "FORTRAN" linkage. Remember, C is the 500-pound gorilla. Thus, the key to interfacing C++ with Fortran is to use extern "C" linkage, then further tweak the function name by hand. If you are using gfortran and g++, you would have declarations something like

extern "C" {
   int my_fortran_function_(int &);
   double __my_module_MOD_my_fortran_dfunction(double &);
}

(I'll explain the ampersands in the next section.) This poses a portability problem, because different compiler suites will have different conventions for tweaking Fortran names. In MESA++ we get around this by using the preprocessor to do the mangling:

extern "C" {
   int SYMBOL_NAME(my_fortran_function)(int &);
   double MOD_SYMBOL_NAME(my_fortran_dfunction, my_module)(double &);
}

Here SYMBOL_NAME ande MOD_SYMBOL_NAME are preprocessor macros that do the necessary mangling. We define them for your convenience (and ours!) for all supported platforms. You just have to install the right generate-hh.hh file from those provided in the mesapp/cxx/skel directory.

Let me add in passing that a C++ subroutine is treated syntactically the same as a C++ function returning the pseudo-type void, that is, void my_subroutine(int) is a subroutine taking a single integer argument.

Procedure argument lists

The other important distinction between C++ and Fortran is the treatment of procedure arguments.

By default, most arguments to a Fortran procedure are passed by reference. In other words, what actually appears on the call frame for the procedure is the addresses of the arguments. The exceptions are variable-length character strings, shaped arrays, and pointers.

Variable-length character strings must somehow provide the procedure with the length of the string as well as its location. This is done by providing the length as a hidden argument in the procedure call frame. Unfortunately, this is done differently on different platforms. Some platforms put the hidden length argument immediately after the address of the string; other platforms put all the lengths of all the string arguments at the end of the call frame. For this reason MESA++ makes no attempt to interface directly to Fortran functions with variable length character string arguments.

Shaped arrays introduce an added level of indirection. The address of the array and the shape information (stored in something called a dope vector) are placed in a single structure and the address of that structure is then put on the call frame of the procedure being called. Pointers are also handled with two levels of indirection. Fortunately, this implementation is similar across platforms, so MESA++ is able to interface directly to procedures with shaped array and pointer arguments.

By default, C and C++ pass arguments by value. In other words, a copy of each argument is placed on the call frame, rather than the address of the original argument. For simple types, this is more efficient than pass by reference. On the other hand, when a C++ procedure is passed a copy of an argument, it has no way to modify the original argument. However, C++ also allows you to explicitly pass the argument by reference. Thus, int my_function(int a) passes a copy of a to my_function, while int my_function(int &a) passes a reference to the original argument a. In the former case, the function cannot change the value of the original argument, while in the latter case, the function can change the value of the original argument. 

Arrays appear to be handled differently, but not really. A procedure void p(double a[]) does not copy the entire contents of a onto the call frame, but only the address of a. C programmer don't view this as pass by reference, because void p(double a[]) is identical to void p(double *a). You're not passing an array by reference; you're passing a pointer by value. At least, that's how a C programmer sees it.

What does it all mean?

If your Fortran function has simple arguments, you declare these as reference arguments in the corresponding C++ declaration. Thus, if you have a Fortran subroutine

subroutine my_subroutine(a, b)
  integer :: a
  double precision :: d

...

then the corresponding C++ declaration will be

extern "C" {
    void SYMBOL_NAME(my_subroutine)(int &a, double &b);
}

If the arguments in the Fortran procedure are qualified as intent(in), you should const-qualify the corresponding C++ declaration:

extern "C" {
     void SYMBOL_NAME(my_subroutine)(int const &a, double const &b);
}

telling the C++ that the arguments will not be modified (they are constant). If you are passing a non-shaped array:

subroutine my_subroutine(n, a)
  integer, intent(in) :: n
  double precision :: a(n)

then the corresponding C++ declaration is

extern "C" {
  void SYMBOL_NAME(my_subroutine)(int const &a, double b[]);
}

This is true even for multidimensional arrays. For the procedure

subroutine my_subroutine(m, n, a)
  integer, intent(in) :: m, n
  double precision :: a(m,n)

the corresponding C++ declaration is

extern "C" {
  void SYMBOL_NAME(my_subroutine)(int const &m,
                                 int const &n,

                                              b[]);
}

We mentioned earlier that C++ does not provide true multidimensional arrays. C++ only provides arrays of arrays, which are not the same thing, and for most purposes (including  interfacing with Fortran) are probably more trouble than they are worth. In general, multidimensional arrays in C++ are best treated as single-dimensioned arrays using computed indices. We illustrate with an example. The Fortran code

double precision :: a(50, 70)

do i=1, 10, 2
  do j=1, 20, 3
    a(i,j) = func(i,j)
  enddo
enddo 

translates to the C++ code

double a[50*70];

for (i=0; i<10; i+=2)
 {

  for (j=0; j<20; j+=3)
  {
    a[i+50*j] = func(i,j);
  }
}

This doubtless seems inelegant, but it maps closely to the hardware used by virtually all modern computers, which is what C was originally all about. It's pretty much what your Fortran code does anyway, but in C it's made explicit. The worst of it is remembering the array dimensions every time you write code to access the array.

This is as good a place as any to point out what you may already have noticed: Array indices on C arrays start at 0, not 1. This reflects the near-equivalence of arrays and pointers: The expression a[n] is equivalent to *(a+n). Computer scientists love this. Physical scientists may take some time getting used to it. The most painful aspect of it for MESA++ is that the many integer constants defined in MESA that are meant to be used as array indices are all off by 1 in MESA++, and you  must explicitly subtract 1 when using them to access an array on the C++ side.

Fortran variables qualified as pointers are straightforward. The code fragment

subroutine my_subroutine(m)
  integer, pointer :: m

translates to

extern "C" {
  void SYMBOL_NAME(my_subroutine)(int *&m);
}

Finally, there are shaped arrays. Here we have to define a C++ structure to hold the array pointer and dope vector, so that

subroutine my_subroutine(a)
  double precision :: a(:,:)

translates to

extern "C"
{

  void SYMBOL_NAME(my_subroutine)(Shape<double, 2> &a);
}

The exact definition of the Shape<type, rank>  template will vary somewhat across platforms, but we define it properly for all supported platforms. You just have to install the right generate-hh.hh file from those provided in the mesapp/cxx/skel directory. The only difficult part is properly initializing the shaped arrays, and we provide functions to help you do this in the MESA++ utils package. There are also functions in the MESA++ utils package to wrap an existing C++ array into a Shape so it can be passed to MESA, and there are functions to allocate and deallocate a Shape of any desired size from scratch.



web traffic statsSourceForge.net Logowebsite design by
Andreas Viklund