Co-Array Fortran User Guide

# Co-Array Fortran

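# Introduction

Co-Array Fortran (CAF below) began as a small extension to Fortran 95, published by Numrich and Reid in 1998 in "Co-Array Fortran for Parallel Programming", ACM Fortran Forum, volume 17, issue 2. In May 2005 the ISO Fortran committee decided to include CAF in the draft of the next revision of the language standard, which is what we now call Fortran 2008.

It offers a concise, intuitive notation for data-decomposition problems that would otherwise usually be handled with a message-passing model, and it does so with natural, Fortran-like syntax. The syntax is independent of the hardware architecture and can be used on distributed-memory systems, shared-memory systems, and clusters.

Because coarrays are part of the language standard, users do not have to call a communication library such as MPI directly, which simplifies exchanging variables between processes (called images in CAF). The standard also provides built-in synchronization, which helps avoid race conditions and deadlocks when it is used properly.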

# A simple shared-memory example

# Writing the source program with coarrays

We use the Hello_World example from the GCC wiki to show the basics.

URL: https://gcc.gnu.org/wiki/CoarrayExample

Source file: Hello_World.f90, with the following content:

! Created by Tobias Burnus 2010.

program Hello_World
  implicit none
  integer :: i  ! Local variable
  character(len=20) :: name[*] ! scalar coarray
  ! Note: "name" is the local variable while "name[<index>]"
  ! accesses the variable on a remote image

  ! Interact with the user on Image 1
  if (this_image() == 1) then
    write(*,'(a)',advance='no') 'Enter your name: '
    read(*,'(a)') name

    ! Distribute information to other images
    do i = 2, num_images()
      name[i] = name
    end do
  end if

  sync all ! Barrier to make sure the data has arrived

  ! I/O from all nodes
  write(*,'(3a,i0)') 'Hello ',trim(name),' from image ', this_image()
end program Hello_world

# Compiling

# Load the compiler environment with the module command
module add Intel_compiler/18.0.4
module add MPI/Intel/IMPI/2018.4.274
# Compile
ifort -coarray Hello_World.f90 -o HelloWorld.out

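Notes:

ifort is the name of the compiler.
-coarray is the compiler option that enables coarray support.
MPI/Intel/IMPI/2018.4.274 is loaded because the link stage uses the libicaf.so library, which requires MPI.

Note

Your environment probably loads compilers differently from this example. Consult the user manual for your own system and load your own compiler accordingly.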

# Running

# Run directly
./HelloWorld.out
# Enter a name, for example zhenggang
zhenggang

# Screen output
$ ./HelloWorld.out
Enter your name: zhenggang
Hello zhenggang from image 1
Hello zhenggang from image 11
Hello zhenggang from image 9
Hello zhenggang from image 10
Hello zhenggang from image 26
Hello zhenggang from image 28
Hello zhenggang from image 37
Hello zhenggang from image 38
Hello zhenggang from image 2
Hello zhenggang from image 4
Hello zhenggang from image 7
...

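Note

In our tests, a program compiled this way appears to support only shared-memory execution, and the number of images (the value returned by num_images()) is chosen automatically. For flexible configuration and multi-node runs, more setup is needed at both compile time and run time; read on.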
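
If the automatically chosen image count is not what you want in shared-memory mode, the Intel Fortran runtime reads the environment variable `FOR_COARRAY_NUM_IMAGES` at startup. This variable is not mentioned in the text above, so treat the following as a sketch to check against your compiler's documentation; `run_with_images` is a hypothetical helper name.

```shell
# Run a command with a fixed number of coarray images.
# Assumption: the program was built with Intel Fortran, whose runtime
# reads FOR_COARRAY_NUM_IMAGES; other compilers use other mechanisms.
run_with_images() {
  local n
  n=$1
  shift
  # Reject anything that is not a positive integer without leading zeros
  case $n in
    ''|*[!0-9]*|0*)
      echo "image count must be a positive integer" >&2
      return 2 ;;
  esac
  FOR_COARRAY_NUM_IMAGES=$n "$@"
}

# Demo with a stand-in command, since ./HelloWorld.out may not exist here
run_with_images 4 sh -c 'echo "images requested: $FOR_COARRAY_NUM_IMAGES"'
```

With a real build, the call would look like `run_with_images 8 ./HelloWorld.out`.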

# A more involved distributed-memory example

# Writing the source program with coarrays

We write a Fortran program, term.f90, as follows:

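program term
  implicit none
  integer :: i[*], img   ! i is a scalar coarray; img holds this image's index
  real :: r

  img = this_image()
  i = img

  ! Image 1 stops early; the other images carry on
  if (img - 1 .eq. 0) stop "img cannot continue"

  ! Long loop so the remaining images keep working for a while
  do i = 1, 100000000
    r = atan(real(i))
  end do

  write (*,*) "img", img, "r", r
end program term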

# Compiling

# Load the compiler environment with the module command
module add Intel_compiler/18.0.4
module add MPI/Intel/IMPI/2018.4.274

# Compile
ifort -coarray=distributed -coarray-config-file=cafconfig.txt term.f90 -o term.x

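Notes:

-coarray=distributed: selects distributed-memory mode.
-coarray-config-file=cafconfig.txt: names a configuration file, cafconfig.txt, in the current directory. This setting is recorded in the executable term.x, and the file itself is read when the program starts.

Tip

You can give -coarray-config-file any other relative or absolute path, but the file must then be available to the executable at that path at run time. With a bare file name, as here, it is looked up in the current directory.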
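
The shared-memory and distributed-memory builds differ only in their flags, which a small wrapper can make explicit. This is a sketch: `caf_compile_cmd` is a hypothetical helper that only prints the command, the mode names `shared` and `distributed` are chosen for this example, and it uses just the `ifort` options shown above.

```shell
# Print the ifort command for a coarray build.
# Usage: caf_compile_cmd MODE SOURCE OUTPUT [CONFIG_FILE]
caf_compile_cmd() {
  local mode src out config
  mode=$1; src=$2; out=$3
  config=${4:-cafconfig.txt}   # default matches the file name used above
  case $mode in
    shared)
      echo "ifort -coarray $src -o $out" ;;
    distributed)
      echo "ifort -coarray=distributed -coarray-config-file=$config $src -o $out" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2 ;;
  esac
}

caf_compile_cmd shared Hello_World.f90 HelloWorld.out
caf_compile_cmd distributed term.f90 term.x
```

Printing the command rather than running it keeps the sketch usable on machines without the compiler; pipe the output to `sh` once the modules are loaded.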

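# Running

# Running locally (testing only)

Running directly fails

This time, let us first try to run the program directly:

./term.x

It fails with an error similar to: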

$ ./term.x
[mpiexec@ln0] HYDU_parse_hostfile (../../utils/args/args.c:553): 
	unable to open host file: cafconfig.txt
[mpiexec@tln0] config_tune_fn (../../ui/mpich/utils.c:2195): 
	error parsing config file
[mpiexec@ln0] match_arg (../../utils/args/args.c:243): 
	match handler returned error
[mpiexec@ln0] HYDU_parse_array_single (../../utils/args/args.c:294): 
	argument matching returned error

Usage: ./mpiexec [global opts] [exec1 local opts] : 
	[exec2 local opts] : ...

Global options (passed to all executables):

  Global environment options:
    -genv {name} {value}       environment variable name and value
    -genvlist {env1,env2,...}  environment variable list to pass
    -genvnone                  do not pass any environment variables
    -genvall                   pass all environment variables not managed
                                    by the launcher (default)
...

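This happens because the executable was built for distributed memory, so it has to be started through mpiexec or a similar parallel launcher. As the log shows, the launch fails because cafconfig.txt cannot be opened yet.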
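
A launcher script can catch this failure early by checking for the configuration file first. The following is a minimal sketch; `check_caf_config` is a hypothetical helper, and the only assumption taken from the text is that the file must be readable when the program starts.

```shell
# Return 0 if the coarray config file exists, is readable and is non-empty.
check_caf_config() {
  local cfg
  cfg=${1:-cafconfig.txt}
  if [ ! -r "$cfg" ]; then
    echo "missing config file: $cfg" >&2
    return 1
  fi
  if [ ! -s "$cfg" ]; then
    echo "config file is empty: $cfg" >&2
    return 1
  fi
  return 0
}

# Demo in a scratch directory so no real files are touched
demo_dir=$(mktemp -d)
check_caf_config "$demo_dir/cafconfig.txt" 2>/dev/null || echo "not ready"
echo "-n 4 ./term.x" > "$demo_dir/cafconfig.txt"
check_caf_config "$demo_dir/cafconfig.txt" && echo "ready"
rm -rf "$demo_dir"
```

In a real run this would be used as `check_caf_config && ./term.x`.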

Writing the hostsfile

First, we write by hand a file named hostsfile that contains the current node's host name and core count, for example:

ln0:40

Notes:

ln0: the host name of the current node, as printed by the hostname command.
:40: the number of cores on the current node, as printed by cat /proc/cpuinfo| grep "processor"| wc -l
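
The two lookups can be combined into one helper that prints the entry for the current node. A sketch, assuming a Linux system with `/proc/cpuinfo`; `local_host_entry` is a hypothetical helper name, and `uname -n` is used as a fallback where the `hostname` command is not installed.

```shell
# Print "<hostname>:<logical cores>" for the node this runs on.
local_host_entry() {
  local host cores
  host=$(hostname 2>/dev/null || uname -n)
  # Count the "processor" entries in /proc/cpuinfo
  cores=$(grep -c '^processor' /proc/cpuinfo)
  echo "$host:$cores"
}

local_host_entry                 # prints something of the form ln0:40
# local_host_entry > hostsfile   # uncomment to write the file
```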

Writing cafconfig.txt

Next, we write the cafconfig.txt file mentioned earlier:

-genvall -genv I_MPI_FABRICS=shm:tcp -machinefile ./hostsfile -n 4 ./term.x

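Parameters:

-genvall: inherit the environment variables of the current shell.
-genv I_MPI_FABRICS=shm:tcp: select the communication fabrics to use: shared memory (shm) within a node and TCP between nodes.
-machinefile ./hostsfile: the file that lists the cluster nodes to run on.
-n 4: run 4 CAF images, that is, use 4 cores.
./term.x: path and name of the executable; ./ means the current directory.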
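
Since the file is a single line assembled from these parts, it can be generated rather than typed. A sketch under that assumption; `write_cafconfig` is a hypothetical helper, and the optional last argument (the output path) exists only so the demo can write to a scratch file.

```shell
# Write a one-line coarray config file from its parts.
# Usage: write_cafconfig FABRICS MACHINEFILE NIMAGES EXE [OUTFILE]
write_cafconfig() {
  local fabrics machinefile n exe out
  fabrics=$1; machinefile=$2; n=$3; exe=$4
  out=${5:-cafconfig.txt}
  echo "-genvall -genv I_MPI_FABRICS=$fabrics -machinefile $machinefile -n $n $exe" > "$out"
}

# Demo: rebuild the line shown above in a scratch file and print it
demo_cfg=$(mktemp)
write_cafconfig shm:tcp ./hostsfile 4 ./term.x "$demo_cfg"
cat "$demo_cfg"
rm -f "$demo_cfg"
```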

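Recommendation

If the cluster has a high-speed interconnect other than TCP, such as InfiniBand, use it. Example settings:

-genv I_MPI_FABRICS=shm:ofa  // shm within a node, ofa between nodes
-genv I_MPI_FABRICS=ofa  // ofa both within and between nodes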

Supplement

The available communication fabrics are:

| Fabric | Description |
| --- | --- |
| shm | Shared memory transport (used for intra-node) |
| dapl | Direct Access Programming Library (DAPL)-capable network fabrics, such as InfiniBand and iWarp (through DAPL). |
| tcp | TCP/IP-capable network fabrics, such as Ethernet and InfiniBand (through IPoIB). |
| tmi | Tag Matching Interface (TMI)-capable network fabrics, such as Intel® True Scale Fabric, Intel® Omni-Path Architecture and Myrinet (through TMI). |
| ofa | OpenFabrics Alliance (OFA)-capable network fabrics, such as InfiniBand (through OFED verbs). |
| ofi | OpenFabrics Interfaces (OFI)-capable network fabrics, such as Intel® True Scale Fabric, Intel® Omni-Path Architecture, InfiniBand and Ethernet (through OFI API). |

The fabric is selected with the I_MPI_FABRICS variable. By default, shared memory is used within a node, and the first available fabric in the fabric list is used between nodes. The list is usually (dapl,ofa,tcp,tmi,ofi); see also I_MPI_FABRICS_LIST.
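
A misspelled fabric name is easy to miss in a config file. A small check against the fabric names listed above can catch it before a job is submitted. This is a sketch; `valid_fabrics` is a hypothetical helper, and the accepted names depend on the Intel MPI version installed.

```shell
# Check an I_MPI_FABRICS value such as "shm:ofa" or "ofa".
# Every colon-separated part must be one of the known fabric names.
valid_fabrics() {
  local rest part
  rest=$1
  [ -n "$rest" ] || { echo "empty fabric value" >&2; return 1; }
  while :; do
    part=${rest%%:*}             # text before the first colon
    case $part in
      shm|dapl|tcp|tmi|ofa|ofi) ;;
      *) echo "unknown fabric: '$part'" >&2; return 1 ;;
    esac
    case $rest in
      *:*) rest=${rest#*:} ;;    # move on to the text after that colon
      *)   return 0 ;;           # no colon left: every part was valid
    esac
  done
}

valid_fabrics shm:ofa && echo "shm:ofa accepted"
valid_fabrics sch:ofa 2>/dev/null || echo "sch:ofa rejected"
```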

Running

Put cafconfig.txt in the same directory as the executable and run:

./term.x

This time the program runs and produces the expected output:

img cannot continue
 img           3 r   1.570796
 img           4 r   1.570796
 img           2 r   1.570796

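Note

On a cluster's compute nodes, the node names and core counts change from run to run, and the node names are not known before resources are allocated. We therefore generate hostsfile and cafconfig.txt automatically with a script.

# Submitting a job

Taking the SLURM job scheduler as an example, we write a script that obtains the node information and generates hostsfile and cafconfig.txt automatically, so that CAF programs can be submitted as jobs.

The submission script sub.sh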

#!/bin/bash
#SBATCH -N 2
#SBATCH -n 56
#SBATCH -p debug

# -------------------------------
# change executable program name
# -------------------------------
EXE="./term.x"

# -------------------------------
# create hostsfile
# -------------------------------
# a: images every node gets; b: leftover images,
# handed out one each to the first b nodes
a=$(( SLURM_NPROCS / SLURM_NNODES ))
b=$(( SLURM_NPROCS % SLURM_NNODES ))
for name in $(yhcontrol show hostnames "$SLURM_NODELIST" | sort)
do
  if [[ $b -ne 0 ]]; then
    echo "$name:$(( a + 1 ))" >> "hostsfile-$SLURM_JOBID"
    b=$(( b - 1 ))
  else
    echo "$name:$a" >> "hostsfile-$SLURM_JOBID"
  fi
done

# -------------------------------
# create cafconfig.txt
# -------------------------------
echo "-genvall -genv I_MPI_FABRICS=shm:ofa -machinefile ./hostsfile-$SLURM_JOBID -n $SLURM_NPROCS $EXE" > cafconfig.txt

# -------------------------------
# run executable program
# -------------------------------
$EXE

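Submitting

yhbatch sub.sh

Checking the results

By default the output goes to a file named slurm-<jobid>.out, where <jobid> is a string of digits. Open this file; correct output looks like: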

img cannot continue
 img          34 r   1.570796
 img          47 r   1.570796
 img          49 r   1.570796
 img          51 r   1.570796
 img          52 r   1.570796
 img          53 r   1.570796
 img          30 r   1.570796
 img          31 r   1.570796
 img          33 r   1.570796
 img          35 r   1.570796
 img          36 r   1.570796
 img          38 r   1.570796
 img          40 r   1.570796
 img          43 r   1.570796
 img          44 r   1.570796
 img          48 r   1.570796
 img          54 r   1.570796
 img           2 r   1.570796
 img           7 r   1.570796
 img           9 r   1.570796
 img          10 r   1.570796
 img          11 r   1.570796
 img          14 r   1.570796
 img          15 r   1.570796
 img          16 r   1.570796
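
Because image 1 stops early, one would expect a complete run to report one result line fewer than the number of images. Counting the result lines is a quick sanity check. A sketch; `count_reporting_images` is a hypothetical helper, and the pattern assumes the output format shown above.

```shell
# Count the "img <n> r <value>" lines in a job output file.
count_reporting_images() {
  grep -Ec '^ *img +[0-9]+ +r ' "$1"
}

# Demo on a small sample instead of a real job output file
sample=$(mktemp)
cat > "$sample" <<'EOF'
img cannot continue
 img           3 r   1.570796
 img           4 r   1.570796
 img           2 r   1.570796
EOF
count_reporting_images "$sample"   # prints 3
rm -f "$sample"
```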

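Brief explanation:

The three #SBATCH lines set the number of nodes (-N), the total number of cores (-n), and the partition name (-p; use yhi to list the available partitions).
EXE="./term.x" is the name of the executable.
The next two steps create the hostsfile and cafconfig.txt; they need no changes.
Finally, the executable is run.
The yhbatch, yhcontrol, and yhi commands are this system's names for the SLURM commands sbatch, scontrol, and sinfo; use those on a standard SLURM installation.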
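
When the total number of images does not divide evenly by the number of nodes, one way to make the per-node counts add up to the total is to give one extra image to each of the first `total % nodes` nodes. The sketch below isolates that arithmetic so it can be checked without a scheduler; `distribute_images` is a hypothetical helper and the node names cn0, cn1, cn2 are made up.

```shell
# Print "<node>:<count>" lines whose counts sum to TOTAL.
# Usage: distribute_images TOTAL NODE...
distribute_images() {
  local total nodes base extra name
  total=$1
  shift
  nodes=$#
  [ "$nodes" -gt 0 ] || { echo "no nodes given" >&2; return 1; }
  base=$(( total / nodes ))     # images every node gets
  extra=$(( total % nodes ))    # leftover images, one each for the first nodes
  for name in "$@"; do
    if [ "$extra" -gt 0 ]; then
      echo "$name:$(( base + 1 ))"
      extra=$(( extra - 1 ))
    else
      echo "$name:$base"
    fi
  done
}

# 59 images over 3 nodes: prints cn0:20, cn1:20, cn2:19
distribute_images 59 cn0 cn1 cn2
```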

# Appendix

References:
