iSCSI multipath shared storage with OCFS2 filesystem on Ubuntu

The main purpose of multipath connectivity is to provide redundant access to storage devices, i.e. to keep access to a storage device when one or more components in a path fail. Another advantage of multipathing is increased throughput through load balancing. A common example is an iSCSI SAN-attached storage device: you get both redundancy and maximum performance.

The common use case for this kind of storage system is shared storage between multiple servers. It could be for a virtualization system like VMware ESXi with the VMFS file system, or just between Linux hosts using GFS or OCFS. This post describes my experience configuring iSCSI multipath on Ubuntu Server with the OCFS2 file system, to provide shared storage that two servers can access at the same time.

iSCSI multipath

Install the software

On Ubuntu, open-iscsi and multipath-tools are the two packages needed for multipath iSCSI configuration. The packages are available in the official Ubuntu repositories, so we can install them directly with apt-get.

$ sudo apt-get update
$ sudo apt-get install open-iscsi multipath-tools

iSCSI target discovery

With the tools installed, we can start discovering the iSCSI targets. We should see the IQN of each target in the output.

$ iscsiadm -m discovery -t st -p 10.20.0.101
$ iscsiadm -m discovery -t st -p 10.20.0.102
$ iscsiadm -m discovery -t st -p 10.20.0.103

We can manually log in to each target with the following commands:

$ iscsiadm -m node --targetname "iqn.2000-05.com.3pardata:20210002ac02426b" --portal "10.20.0.101:3260" --login
$ iscsiadm -m node --targetname "iqn.2000-05.com.3pardata:20220002ac02426b" --portal "10.20.0.102:3260" --login
$ iscsiadm -m node --targetname "iqn.2000-05.com.3pardata:21210002ac02426b" --portal "10.20.0.102:3260" --login
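
Alternatively, instead of logging in to each target one by one, iscsiadm can log in to all discovered targets in a single command:

$ iscsiadm -m node --loginall=all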

After logging in successfully, we should be able to see the new disk devices in the OS. Let’s use the fdisk -l command to validate them. In my example, there are 3 new devices: sdc, sdd and sde.

$ sudo fdisk -l
Disk /dev/sda: 745.2 GiB, 800132521984 bytes, 1562758832 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
Disklabel type: gpt
Disk identifier: 42CD7FBD-63FB-4E06-AC35-6781A5BCECA6

Device       Start        End    Sectors   Size Type
/dev/sda1     2048    1050623    1048576   512M EFI System
/dev/sda2  1050624 1562757119 1561706496 744.7G Linux LVM

Disk /dev/mapper/os--vg-root: 743.7 GiB, 798566121472 bytes, 1559699456 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes

Disk /dev/mapper/os--vg-swap_1: 976 MiB, 1023410176 bytes, 1998848 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes

Disk /dev/sdc: 6.9 TiB, 7516192768000 bytes, 14680064000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 16384 bytes / 16777216 bytes

Disk /dev/sdd: 6.9 TiB, 7516192768000 bytes, 14680064000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 16384 bytes / 16777216 bytes

Disk /dev/sde: 6.9 TiB, 7516192768000 bytes, 14680064000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 16384 bytes / 16777216 bytes

Disk /dev/mapper/360002ac000000000000000030002426b: 6.9 TiB, 7516192768000 bytes, 14680064000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 16384 bytes / 16777216 bytes
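
Besides fdisk, it can be useful to confirm the iSCSI sessions themselves. iscsiadm can list the active sessions and, at a higher print level, report which SCSI disk is attached to each session:

$ iscsiadm -m session
$ iscsiadm -m session -P 3 | grep "Attached scsi disk"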

On Ubuntu, the iSCSI feature is provided by the open-iscsi service. We can adjust its configuration at /etc/iscsi/iscsid.conf. For example, we can make the target login happen automatically and adjust the replacement_timeout value:

...
node.startup = automatic
node.session.timeo.replacement_timeout = 15
...

Restart the service

$ systemctl restart open-iscsi
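
Note that the node.* defaults in iscsid.conf are applied when a node record is created at discovery time; targets discovered before the change keep their old settings. Those existing records can be updated individually, for example (using the first target from above):

$ iscsiadm -m node --targetname "iqn.2000-05.com.3pardata:20210002ac02426b" --portal "10.20.0.101:3260" --op update -n node.startup -v automatic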

iSCSI multipath configuration

In the fdisk example above, the 3 new disks have exactly the same size and are aggregated into a single device named /dev/mapper/360002ac000000000000000030002426b. This device is created by multipath-tools. We can verify the iSCSI multipath status:

$ multipath -ll
360002ac000000000000000030002426b dm-2 3PARdata,VV
size=6.8T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 8:0:0:0 sdc 8:128 active ready running
  |- 3:0:0:0 sdd 8:48  active ready running
  `- 6:0:0:0 sde 8:96  active ready running

The multipath configuration file is /etc/multipath.conf. It is recommended to adjust it to control device blacklisting and naming. I normally blacklist all devices and only allow specific ones through blacklist_exceptions. I also set the device name to mpath0 using the alias directive.

blacklist {
    wwid .*
}
blacklist_exceptions {
    wwid "360002ac0000000000000008300023cc6"
}
multipaths {
    multipath {
        wwid "360002ac0000000000000008300023cc6"
        alias mpath0
    }
}

Note: You can get the wwid from the output of the multipath -ll command.
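
If you prefer to read the WWID straight from one of the member disks, the scsi_id helper shipped with udev should return the same identifier (sdc here is just the example device from this host):

$ sudo /lib/udev/scsi_id -g -u -d /dev/sdc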

Following is a full example of multipath.conf file:

defaults {
    polling_interval 2
    path_selector "round-robin 0"
    path_grouping_policy multibus
    uid_attribute ID_SERIAL
    rr_min_io 100
    failback immediate
    no_path_retry queue
    user_friendly_names yes
}

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^(td|hd)[a-z]"
    devnode "^dcssblk[0-9]*"
    devnode "^cciss!c[0-9]d[0-9]*"
    device {
        vendor "DGC"
        product "LUNZ"
    }
    device {
        vendor "EMC"
        product "LUNZ"
    }
    device {
        vendor "IBM"
        product "Universal Xport"
    }
    device {
        vendor "IBM"
        product "S/390.*"
    }
    device {
        vendor "DELL"
        product "Universal Xport"
    }
    device {
        vendor "SGI"
        product "Universal Xport"
    }
    device {
        vendor "STK"
        product "Universal Xport"
    }
    device {
        vendor "SUN"
        product "Universal Xport"
    }
    device {
        vendor "(NETAPP|LSI|ENGENIO)"
        product "Universal Xport"
    }
}
blacklist_exceptions {
    wwid "360002ac0000000000000008300023cc6"
}
multipaths {
    multipath {
        wwid "360002ac0000000000000008300023cc6"
        alias mpath0
    }
}
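
After changing /etc/multipath.conf, the maps have to be reloaded before the new alias appears. Reloading with multipath -r is usually enough; restarting the multipath service also works (on Ubuntu the service is typically exposed as multipath-tools, backed by multipathd):

$ sudo multipath -r
$ sudo systemctl restart multipath-tools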

Now if we list the available disks again, we should see the new name in the output:

$ sudo fdisk -l
...
Disk /dev/mapper/mpath0: 6.9 TiB, 7516192768000 bytes, 14680064000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 16384 bytes / 16777216 bytes

OCFS2 file system

In order to share the same disk between hosts, we cannot use a traditional file system such as ext4 or ZFS; we need a shared-disk file system. In this article, I am going to use OCFS2, the second version of the Oracle Cluster File System.

First of all, we need to install the package

$ sudo apt-get install ocfs2-tools

Now, on the first node, we initialize the cluster with the name ocfs2test and add the node members. In the following example, server1 and server2 are the two OCFS2 nodes. Make sure they can resolve each other's names to IP addresses properly (see the /etc/hosts sketch after the commands below).

$ sudo o2cb add-cluster ocfs2test
$ sudo o2cb add-node ocfs2test server1 172.17.0.11
$ sudo o2cb add-node ocfs2test server2 172.17.0.12
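
If DNS does not resolve these names, a simple way to satisfy the name-resolution requirement is a static entry in /etc/hosts on both servers (addresses taken from the example above):

172.17.0.11 server1
172.17.0.12 server2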

Now your cluster config file /etc/ocfs2/cluster.conf looks like this

cluster:
        heartbeat_mode = local
        node_count = 2
        name = ocfs2test

node:
        number = 0
        cluster = ocfs2test
        ip_port = 7777
        ip_address = 172.17.0.11
        name = server1

node:
        number = 1
        cluster = ocfs2test
        ip_port = 7777
        ip_address = 172.17.0.12
        name = server2

Note:

  • 7777 is the default port for OCFS2 cluster communication. Make sure this port is allowed between the nodes in the cluster.
  • After successfully initializing the cluster, copy its config file to the other nodes, for example with the scp sketch below.
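
A minimal way to do that copy over SSH, assuming the hostnames from the example above and that your user can sudo on the remote node:

$ scp /etc/ocfs2/cluster.conf server2:/tmp/cluster.conf
$ ssh -t server2 "sudo mkdir -p /etc/ocfs2 && sudo mv /tmp/cluster.conf /etc/ocfs2/cluster.conf"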

Then create the OCFS2 file system on the iSCSI disk provided by multipath (this only needs to be run on one node):

$ sudo mkfs.ocfs2 -L OCFS2_TEST /dev/mapper/mpath0
mkfs.ocfs2 1.8.5
Cluster stack: classic o2cb
Label: OCFS2_TEST
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super xattr indexed-dirs refcount discontig-bg append-dio
Block size: 4096 (12 bits)
Cluster size: 4096 (12 bits)
Volume size: 7086696038400 (1730150400 clusters) (1730150400 blocks)
Cluster groups: 53639 (tail covers 3072 clusters, rest cover 32256 clusters)
Extent allocator size: 444596224 (106 groups)
Journal size: 268435456
Node slots: 16
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 6 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful

On every node, configure ocfs2-tools using the Debian dpkg-reconfigure command. All the options can be left at their defaults, except that you must make the cluster start on boot and specify the cluster name we set in the previous step.

$ sudo dpkg-reconfigure ocfs2-tools
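
To confirm the cluster stack actually came up with the right name, you can restart and inspect the o2cb service on each node (on my setup the unit is named o2cb; names may vary slightly between Ubuntu releases):

$ sudo systemctl restart o2cb
$ sudo systemctl status o2cb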

Then update each node's fstab config file:

$ echo "/dev/mapper/mpath0 /data ocfs2 _netdev,defaults 0 0" | sudo tee -a /etc/fstab

Mount the new disk

$ sudo mkdir /data
$ sudo mount -a

Double-check with the df command; we should see the new disk mounted at the /data mountpoint:

$ df -h
Filesystem               Size  Used Avail Use% Mounted on
udev                     126G     0  126G   0% /dev
tmpfs                     26G  2.5M   26G   1% /run
/dev/mapper/os--vg-root  732G  2.0G  692G   1% /
tmpfs                    126G     0  126G   0% /dev/shm
tmpfs                    5.0M     0  5.0M   0% /run/lock
tmpfs                    126G     0  126G   0% /sys/fs/cgroup
/dev/sda1                511M  6.1M  505M   2% /boot/efi
/dev/mapper/mpath0       6.5T   12G  6.5T   1% /data
tmpfs                     26G     0   26G   0% /run/user/1000

The shared disk is now mounted at /data and can be accessed by multiple nodes at the same time.
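
As a quick sanity check, a file created on one node should show up on the other right away (the prompt prefix indicates which node each command runs on):

server1$ touch /data/hello-from-server1
server2$ ls /data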
