Interfacing C++ to Fortran
This page has information on how C++ has been interfaced to Fortran
in MESA++. .
Why?
What's in a name?
Procedure argument lists
Why?
C++ and its parent language, C, are widely used by writers of
operating systems, compilers, tools, graphical toolkits, and other
applications. These languages are also used by a sizable community of
scientific programmers. Their popularity can be attributed to the
availability of free high-quality compilers and development tools, such
as g++ and Anjuta, and to the ability of C++ to accomodate a wide range
of programming styles, from programming that is very close to the
hardware to very high-level polymorphic and generic programming.
However, Fortran remains a popular language for numerical computing,
in part from the perception that it is a more efficient language, and
in part from tradition. There is a massive body of legacy scientific
code written in various dialects of Fortran that ought not to be
lightly discarded.
Fortran was originally developed for number crunching, and it has always emphasized computational efficiency: A real man's programming language, from the days when such a blatantly sexist statement didn't raise many eyebrows. It has evolved and improved over the years, but the insistence on efficiency first has made the Fortran community very conservative about adopting some of the more powerful modern programming language concepts, such as object-oriented programming. This has had the effect of causing Fortran to be scorned by many computer scientists. This author can attest to once being asked in an interview for a professorship whether he was willing to teach Fortran to physics graduate students, because the computer science department at the university in question was uninterested in teaching a Fortran course.
Curiously, C was also originally developed for efficiency, though
with different applications in mind. C was originally developed for the
PDP-11
family of computers, and its operator set closely matched the
beautifully orthogonal instruction set of the PDP-11 machines. C
continued to be developed as a language very
close to the hardware, and, for certain applications, it is a very
efficient language. Its chief disadvantage for number crunching is its
very low-level treatment of arrays and its pervasive aliasing.
C really only offers one-dimensional arrays, because only one-dimensional arrays map directly to the linear address space of most computer memory. Oh, it's true that the following is legal C code:
double
a[10][20];
for (int i=0; i<9; ++i)
{
for (int j=0; j<19; ++j)
{
a[i][j] =
i+3*j;
}
}
and it does what you expect, but a is not actually a 10 by 20 array of doubles: It is an array of 10 arrays of 20 doubles. This is not the same thing even if it looks superficially similar.
Closely related is the fact that an array is more or less freely
convertible to a pointer to the underlying type. In other words,
double a[20];
double *aptr = a;
is perfectly legal C and is in fact what essentially takes place very time you pass an array to a procedure: The declarations
int func(double *a);
and
int func(double a[]);
are semantically identical. It follows that you can apply the
subscript operator [] to any pointer, and the dereference operator * to
any array. The expressions *a and aptr[0] are
equivalent.
Fortran has reluctantly added pointers in its last few dialects, but
with heavy restrictions on their use that are aimed at avoiding
aliasing. Aliasing is the state of affairs in which the compiler cannot
tell if two pointers will point to the same location in memory at run
time.
In Fortran, by default, two arrays coming into the same function are
assumed to be different arrays, and this gives the compiler some
optimization possibilities it would not have if the compiler had to
assume the two array arguments might refer to the same memory.
Furthermore, if an array is not specifically designated as a target,
the compiler can assume that no pointer points to it.
C offers no such guarantees about aliasing. On the contrary, the
compiler almost always is required to assume that two pointers might
point at
the same memory. This includes two
pointers coming into a procedure as array arguments. This lack of
restrictions on aliasing reflects the fact that pointers are true
first-class objects in C and are very powerful tools for many
(non-numerical) computing tasks.
How detrimental is aliasing to optimization, really? My most recent
experience with C
versus Fortran on a heavy-duty number crunching algorithm had
the Fortran coming in about 5% faster with both the gcc and ifort/icpc
compiler suites. That's in the grass, in my
opinion. Your mileage may vary, but the potential of aliasing to
inhibit optimizations should
not be overstated.
C++ is very nearly a proper superset of C. In other words, almost all C code is also legal C++ code, which almost always has the same semantic meaning. The "almost always" can be a gotcha, but rarely a serious one. This means that C++ code can be written that is as efficient as any C code, and probably not much worse than Fortran.
However, the power of C++ is in the very high-level programming
constructs it adds to its C foundation. These include polymorphic
classes, generic programming, exception handling, and an extensive
library. These powerful facilities often come at a price in
efficiency, but the programmer has a lot of control over when they are
used. The proper use of these features can be a joy. Their misuse in
bottleneck portions of a code can be an efficiency disaster.
If C++ can be cleanly interfaced to Fortran, then
one can have the best of both worlds. There are some definite
challenges to doing so, but these are addressed in this article.
What's in a name?
In C++, every procedure or variable must be declared before it is
first used, as if IMPLICIT NONE was always in effect. These
declarations are traditionally put into a header file that is
introduced wherever needed using a preprocessor #include directive.
For
a
C++
procedure
to
call
a
Fortran
procedure
or access a Fortran
variable, it must have available to it a C++ translation of the
signature of each such procedure or the type of each such variable. We
must
also make sure the names of these procedures and variables are
correctly translated. It may come as a surprise to you that this is not
trivial.
In Fortran, there can only be one variable or procedure with a given
name within each module or at the global scope. However, it is
permissible to
have variables or procedures with the same names in different modules.
The linker that brings object files together into an executable program
must see a unique name for each procedure or
variable. This implies that the names you see in a Fortran source file
are not necessarily the same as the names seen by the linker.
Even the name of a Fortran global procedure is not necessarily identical with the name seen by the linker. The majority of Fortran compilers tweak the name slightly, so that it will not collide with similar procedure names in the standard C library. For example, the gfortran compiler appends a single underscore to every Fortran global procedure name, so that a function named "my_function" becomes "my_function_" in the object file produced by the compiler.
The name of a module procedure or variable is distinguished from identical names in other modules or in the global scope by having the name of the module glued to the procedure or variable name. For example, if your code has a procedure named "my_function" in a module named "my_module", the gfortran compiler will convert the name to "__my_module_MOD_my_function" for purposes of linking. Note that the name is further tweaked by a couple of preceding underscores. Module variables are treated in exactly the same way, so that "my_variable" in "my_module" becomes "__my_module_MOD_my_variable".
C is much simpler. Every global procedure or variable name is left completely unchanged by most C compilers. If your C function is named "my_function", the linker will see it as "my_function". Same for global variables. Does this seem rather C-centric? It is. Compiler and tool writers generally are fluent in C rather than Fortran and their conventions reflect this: C is the 500-pound gorilla.
C++ is another matter entirely, because it supports function overloading. A single name can refer to any number of functions, with any number of different parameter lists. This is necessary to support polymorphism and generic programming. The only restriction is that the compiler must be able to distinguish between procedures, based on the types of the arguments used to call them. Thus, you can simultaneously have a function int my_function(int) that takens an integer parameter and returns an integer result, and a function double my_function(double) that takes a double precision parameter and returns a double precision result. The compiler will decide which to use based on the actual argument: my_argument(1) will use int my_argument(int), while my_argument(1.0) will use double my_argument(double).
This poses a problem for linking, because each overloaded function
must have a unique link name. C++ accomplishes this by something called
name mangling, and the
designation is
apt. The link name for int
my_function(int) generated by g++ is
_Z11my_functioni,
which
encodes
the
fact
that
this
is
a
global
function
taking a single integer argument.
C++ can portably interface with C functions by using the syntax:
extern "C" { int my_function(int); }
which indicates that the
name of the function is to be left unmangled for linking purposes. For
obvious
reasons, there can only be a single extern "C" function of a given
name, though it can still be overloaded with non-extern "C" functions
with the same name.
In principle, a C++ compiler could provide an extern "FORTRAN" linkage feature, but while the C++ standard permits this, it does not mandate it. And I don't know of a single C++ compiler that actually provides "FORTRAN" linkage. Remember, C is the 500-pound gorilla. Thus, the key to interfacing C++ with Fortran is to use extern "C" linkage, then further tweak the function name by hand. If you are using gfortran and g++, you would have declarations something like
extern
"C"
{
int my_fortran_function_(int
&);
double
__my_module_MOD_my_fortran_dfunction(double &);
}
(I'll explain the ampersands in the next section.) This poses a portability problem, because different compiler suites will have different conventions for tweaking Fortran names. In MESA++ we get around this by using the preprocessor to do the mangling:
extern
"C"
{
int
SYMBOL_NAME(my_fortran_function)(int &);
double
MOD_SYMBOL_NAME(my_fortran_dfunction, my_module)(double &);
}
Here SYMBOL_NAME ande MOD_SYMBOL_NAME are preprocessor macros that do the necessary mangling. We define them for your convenience (and ours!) for all supported platforms. You just have to install the right generate-hh.hh file from those provided in the mesapp/cxx/skel directory.
Let me add in passing that a C++ subroutine is treated syntactically
the same as a C++ function returning the pseudo-type void, that
is,
void
my_subroutine(int) is a subroutine taking a single integer
argument.
Procedure argument lists
The other important distinction between C++ and Fortran is the treatment of procedure arguments.
By default, most arguments to a Fortran procedure are passed by
reference. In other words, what actually appears on the call frame for
the procedure is the addresses
of the arguments. The exceptions are variable-length character strings,
shaped arrays, and pointers.
Variable-length character strings must somehow provide the procedure
with the length of the string as well as its location. This is done by
providing the length as a hidden argument in the procedure call frame.
Unfortunately, this is done differently on different platforms. Some
platforms put the hidden length argument immediately after the address
of the string; other platforms put all the lengths of all the string
arguments at the end of the call frame. For this reason MESA++ makes no
attempt to interface directly to Fortran functions with variable length
character string arguments.
Shaped arrays introduce an added level of indirection. The address
of the array and the shape information (stored in something
called a dope vector) are placed in a single structure and the address
of that structure is then put on the call frame of the procedure being
called.
Pointers are also handled with two levels
of indirection. Fortunately, this implementation is similar across
platforms, so MESA++ is able to interface directly to procedures with
shaped array and pointer arguments.
By default, C and C++ pass arguments by value. In other words, a copy of each argument is placed on the call frame, rather than the address of the original argument. For simple types, this is more efficient than pass by reference. On the other hand, when a C++ procedure is passed a copy of an argument, it has no way to modify the original argument. However, C++ also allows you to explicitly pass the argument by reference. Thus, int my_function(int a) passes a copy of a to my_function, while int my_function(int &a) passes a reference to the original argument a. In the former case, the function cannot change the value of the original argument, while in the latter case, the function can change the value of the original argument.
Arrays appear to be handled differently, but not really. A procedure
void p(double
a[]) does not copy the entire contents of a onto the
call frame, but only the address of a. C
programmer don't view this as pass by reference, because void p(double a[])
is identical to void p(double *a).
You're
not
passing
an
array
by
reference;
you're
passing
a pointer by
value. At least, that's how a C programmer sees it.
What does it all mean?
If your Fortran function has simple arguments, you declare these as
reference arguments in the corresponding C++ declaration. Thus, if you
have a Fortran subroutine
subroutine
my_subroutine(a,
b)
integer :: a
double precision :: d
...
then the corresponding C++ declaration will be
extern
"C"
{
void
SYMBOL_NAME(my_subroutine)(int &a, double &b);
}
If the arguments in the Fortran procedure are qualified as intent(in), you should const-qualify the corresponding C++ declaration:
extern
"C"
{
void
SYMBOL_NAME(my_subroutine)(int const &a, double const &b);
}
telling the C++ that the arguments will not be modified (they are constant). If you are passing a non-shaped array:
subroutine
my_subroutine(n,
a)
integer, intent(in) :: n
double precision :: a(n)
then the corresponding C++ declaration is
extern
"C"
{
void
SYMBOL_NAME(my_subroutine)(int const &a, double b[]);
}
This is true even for multidimensional arrays. For the procedure
subroutine
my_subroutine(m,
n,
a)
integer, intent(in) :: m, n
double precision :: a(m,n)
the corresponding C++ declaration is
extern
"C"
{
void
SYMBOL_NAME(my_subroutine)(int const &m,
int const &n,
b[]);
}
We mentioned earlier that C++ does not provide true multidimensional arrays. C++ only provides arrays of arrays, which are not the same thing, and for most purposes (including interfacing with Fortran) are probably more trouble than they are worth. In general, multidimensional arrays in C++ are best treated as single-dimensioned arrays using computed indices. We illustrate with an example. The Fortran code
double precision :: a(50, 70)
do
i=1, 10, 2
do j=1, 20, 3
a(i,j) =
func(i,j)
enddo
enddo
translates to the C++ code
double a[50*70];
for
(i=0; i<10; i+=2)
{
for (j=0; j<20; j+=3)
{
a[i+50*j] =
func(i,j);
}
}
This doubtless seems inelegant, but it maps closely to the hardware
used by virtually all modern computers, which is what C was originally
all about. It's pretty much what your Fortran code does anyway, but
in C it's made explicit. The worst of it is remembering the array
dimensions
every
time you write code to access the array.
This is as good a place as any to point out what you may already
have noticed: Array indices on C arrays start at 0, not 1. This
reflects the
near-equivalence of arrays and pointers: The expression a[n] is
equivalent to *(a+n).
Computer
scientists
love
this.
Physical
scientists
may
take
some
time getting
used to it. The most painful aspect of it for MESA++ is that
the many integer constants defined in MESA that are meant to be used as
array indices are all off by 1 in MESA++, and you must explicitly
subtract 1 when using them to access an array on the C++ side.
Fortran variables qualified as pointers are straightforward. The
code fragment
subroutine
my_subroutine(m)
integer, pointer :: m
translates to
extern
"C"
{
void
SYMBOL_NAME(my_subroutine)(int *&m);
}
Finally, there are shaped arrays. Here we have to define a C++
structure to hold the array pointer and dope vector, so that
subroutine
my_subroutine(a)
double precision :: a(:,:)
translates to
extern
"C"
{
void
SYMBOL_NAME(my_subroutine)(Shape<double, 2> &a);
}
The exact definition of the Shape<type, rank> template will vary somewhat across platforms, but we define it properly for all supported platforms. You just have to install the right generate-hh.hh file from those provided in the mesapp/cxx/skel directory. The only difficult part is properly initializing the shaped arrays, and we provide functions to help you do this in the MESA++ utils package. There are also functions in the MESA++ utils package to wrap an existing C++ array into a Shape so it can be passed to MESA, and there are functions to allocate and deallocate a Shape of any desired size from scratch.
website
design
by
Andreas Viklund