Accelerating Python code with Numba and F2PY

Introduction

Following the suggestion mentioned in the conclusion of the previous article, the objective of this post is to speed up the computation of the 2D constant boundary element influence coefficients, $\mathbb{G}$ and $\mathbb{Q}$, by two methods and compare their performances.

Methods

The two methods to accelerate Python code that are studied in this post are Numba and F2PY. They are further examined in the following sections.

Numba

Numba is a open-source just-in-time compiler that translates a subset of Python and NumPy into fast machine code. The most common way of using Numba is through its decorators, which can be applied to the functions the user wants to compile. The following code snippet displays the application of Numba decorator jit to a class static method that computes the influence coefficients $\mathbb{G}$ and $\mathbb{Q}$ by analytical integration.

Class static method accelerated with Numba

# Extract from element.ipynb

import numpy as np
import numba

eps = np.finfo(np.float64).eps

class Element:
   
    @staticmethod
    @numba.jit('Tuple((f8, f8))(f8[:], f8)', nopython=True, cache=True)
    def _analytical_numba(field_local, element_length):
        """Get influence coefficients at a field point using analytical integration."""

        a = 0.5 * element_length
        x, y = field_local

        if np.abs(y) <= element_length * eps \
           and np.abs(np.abs(x) - a) <= element_length * eps:
            G = a / np.pi * (np.log(2*a) - 1)
            Q = 0.0
        else:
            xpa = x + a
            xma = x - a

            r1 = np.sqrt(xma**2 + y**2)
            r2 = np.sqrt(xpa**2 + y**2)
            t1 = np.arctan2(y, xma)
            t2 = np.arctan2(y, xpa)

            G = 0.5 / np.pi * (
                y * (t1 - t2) - xma * np.log(r1) + xpa * np.log(r2) - 2 * a
            )

            if np.abs(y) <= element_length * eps:
                # Q is discontinuous in |x| < a and y = 0.
                Q = 0.0
            else:
                Q = -0.5 / np.pi * (t1 - t2)

        return G, Q

The decorator could have been applied to a function, but it was used to a static method of the class Element since this calculation is exclusive to its objects. In Numba version 0.61.0, applying the decorator to an instance method was not possible, as the jit compiler could not interpret the self argument.

The used arguments to numba.jit are described below:

'Tuple((f8, f8))(f8[:], f8)': this is the signature that represents the types of the arguments, where f8 means 8-byte float number. The first part Tuple((f8, f8)) indicates two float outputs, the second part (f8[:], f8) describe one float array and one float variable as inputs. In general, the signature is optional, but it’s necessary when cache=True is used.
nopython=True: this argument tells Numba to run the static method entirely without the involvement of the Python interpreter. This is the recommended practice for best performance.
cache=True: this option makes Numba use a cached version of the compiled method. This avoids recompilation every time the routine is called for the first time.

F2PY

F2PY, already explored in another post, is a tool that builds Python wrappers for Fortran routines, allowing the speed of Fortran compiled code to be used from within the Python environment. Below is the Fortran code to be wrapped by F2PY.

Fortran subroutine

! Extract from incoef.f90

module incoef

  implicit none
  private
  public :: gauss_fortran

contains

  subroutine gauss_fortran(field_local, element_length, G, Q)
    ! Get influence coefficients at a field point using Gauss-Legendre quadrature.

    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp), intent(in) :: field_local(2)
    real(dp), intent(in) :: element_length
    real(dp), intent(out) :: G, Q
    real(dp), parameter :: pi = 3.141592653589793_dp
    real(dp), parameter :: roots(2) = [0.3399810435848563_dp, 0.8611363115940526_dp]
    real(dp), parameter :: weights(2) = [0.6521451548625461_dp, 0.3478548451374538_dp]
    real(dp) :: a, x, y
    integer :: i

    a = 0.5_dp * element_length
    x = field_local(1)
    y = field_local(2)

    G = 0.0_dp
    Q = 0.0_dp
    do i = 1, size(roots)
      G = G + weights(i) * (intG(a * roots(i)) + intG(-a * roots(i)))
      Q = Q + weights(i) * (intQ(a * roots(i)) + intQ(-a * roots(i)))
    end do

    G = G * 0.25_dp * a / pi
    Q = -Q * 0.5_dp * a / pi

  contains
    
    real(dp) function intG(t)
      real(dp), intent(in) :: t
      intG = log((x - t)**2 + y**2)
    end function intG

    real(dp) function intQ(t)
      real(dp), intent(in) :: t
      intQ = y / ((x - t)**2 + y**2)
    end function intQ

  end subroutine gauss_fortran

end module incoef

The incoef python module is created by running:

f2py -c incoef.f90 -m incoef

After creation of a shared library (incoef.cp312-win_amd64.pyd on the author machine), the gauss_fortran subroutine can then be called from a Python instance method like this:

# Extract from element.ipynb

import incoef

class Element:
    
    # [...]
        
    def get_influence_coefficients_gauss_fortran(self, field_global):
        field_local = self.get_point_local_coordinates(field_global)

        return incoef.incoef.gauss_fortran(field_local, self.length)

In the code above, the first incoef is the Python module created by F2PY. The second incoef is the Fortran module wrapped by F2PY. The inclusion of the Fortran subroutine inside a Fortran module allows the addition of auxilary functions that can be kept private from the Python environment.

Results

The previous post compared analytical integration and the 4-point Gauss-Legendre quadrature. Here, each method is independently reassessed, enhanced by the previously mentioned Numba and F2PY techniques, with focus on their performance improvements. The following plot shows the computation time for calculating the influence coefficients $\mathbb{G}$ and $\mathbb{Q}$ for several field points layed around a boundary element, the same conditions presented in the previous post.

On average, methods improved by Numba and Fortran perform similary, taking half the time of the original procedures.

Conclusion

The results indicate that both Numba and Fortran significantly enhance a Python routine performance. Numba’s ease of use encourages its further application, though it’s best suited for numerically oriented code using basic Python elements and NumPy, with limitations on other Python constructs. Conversely, Fortran offers greater flexibility in optimizing Python code, but the F2PY wrapper imposes constraints, requiring Fortran modules and procedures to follow a strict format distinct from traditional Fortran development.

References

Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. 2015. Numba: a LLVM-based Python JIT compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC (LLVM ‘15). Association for Computing Machinery, New York, NY, USA, Article 7, 1–6. https://doi.org/10.1145/2833157.2833162
NumPy Developers. 2024. F2PY. Distributed as part of NumPy. https://numpy.org/doc/stable/f2py/

Introduction#

Methods#

Numba#

F2PY#

Results#

Conclusion#

References#

Appendices#