Using the Parallel NFS (pNFS) SCSI Layout with NVMe
hch@lst.de
Transport
NFSv4
This document explains how to use the Parallel Network File System
(pNFS) SCSI Layout Type with transports using the NVMe or NVMe
over Fabrics protocol.
The pNFS SCSI layout is a layout type
that allows NFS clients to perform I/O directly to block storage
devices while bypassing the Metadata Server (MDS). It is specified using
concepts from the SCSI protocol family for the data path to the storage
devices. This document explains how to use the SCSI layout type to
access PCI Express, RDMA, or Fibre Channel devices that use the
NVM Express protocol, by leveraging
the NVMe SCSI Translation Reference.
This document does not amend the pNFS SCSI layout document in any
way; instead, it explains how to map the SCSI constructs used in
the pNFS SCSI layout document to NVMe concepts using the NVMe
SCSI translation reference.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
The following definitions provide appropriate context for the
reader.
The "client" is the entity that accesses the NFS server's
resources. The client may be an application that contains the
logic to access the NFS server directly. The client may also be
the traditional operating system client that provides remote file
system services for a set of applications.
The "server" is the entity responsible for coordinating client
access to a set of file systems and is identified by a server
owner.
The SCSI layout definition references only
a few SCSI-specific concepts directly. This document uses the
NVMe SCSI Translation Reference document
to provide mappings from these SCSI concepts to the NVM Express
concepts that SHOULD be used when using the
pNFS SCSI layout with NVMe devices.
The NVMe SCSI Translation Reference is used to define the
NVMe commands and concepts that SHOULD be used to implement the
pNFS SCSI layout. Implementations MAY use an actual
SCSI to NVMe translation layer, but are not required to do so.
The SCSI layout uses the Device Identification VPD page (page code
0x83) from SCSI Primary Commands-4 (SPC-4) to identify the devices used by
a layout. Section 6.1.4 of the NVMe SCSI Translation Reference explains
how an implementation SHOULD construct a valid Device
Identification VPD page based on the NVMe Identify data.
Only NVMe devices that support either the EUI64 or NGUID value in the
Identify Namespace data SHOULD be used as storage devices for the
pNFS SCSI layout, as the methods based on the Serial Number for
legacy devices might not be suitable for unique addressing needs.
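The identifier selection and page construction described above can be sketched as follows. This is a minimal illustration, not the mapping mandated by the NVMe SCSI Translation Reference: the function names are hypothetical, preferring the NGUID over the EUI64 when both are set is an assumption, and the use of a single EUI-64 based designator (designator type 2h, binary code set) should be checked against Section 6.1.4 before relying on it. An all-zero EUI64 or NGUID is treated as "not supported".

```python
import struct
from typing import Optional

VPD_DEVICE_IDENTIFICATION = 0x83  # page code used by the SCSI layout
CODE_SET_BINARY = 0x1             # designator holds binary values
DESIGNATOR_TYPE_EUI64 = 0x2       # EUI-64 based designator


def pick_identifier(eui64: bytes, nguid: bytes) -> Optional[bytes]:
    """Return a unique identifier from Identify Namespace data, or None.

    Hypothetical helper: prefers the 16-byte NGUID, falls back to the
    8-byte EUI64, and returns None when both fields are all zeroes
    (such a device should not be used with the pNFS SCSI layout).
    """
    if len(eui64) != 8 or len(nguid) != 16:
        raise ValueError("EUI64 must be 8 bytes and NGUID 16 bytes")
    if any(nguid):
        return nguid
    if any(eui64):
        return eui64
    return None


def build_device_id_vpd(identifier: bytes) -> bytes:
    """Build a Device Identification VPD page with one designator."""
    if len(identifier) not in (8, 16):
        raise ValueError("expected an 8-byte EUI64 or a 16-byte NGUID")
    descriptor = bytes([
        CODE_SET_BINARY,        # protocol identifier 0h, code set 1h
        DESIGNATOR_TYPE_EUI64,  # PIV 0, association 0 (logical unit), type 2h
        0x00,                   # reserved
        len(identifier),        # designator length
    ]) + identifier
    # Page header: peripheral qualifier/device type, page code,
    # 16-bit big-endian page length.
    header = struct.pack(">BBH", 0x00, VPD_DEVICE_IDENTIFICATION,
                         len(descriptor))
    return header + descriptor
```

A caller would first run `pick_identifier` on the Identify Namespace fields and reject the device when it returns `None`, which mirrors the recommendation to avoid devices that only offer Serial Number based identification.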
The SCSI layout uses Persistent Reservations to provide client
fencing. For this, both the MDS and the clients have to register
a key with the storage device, and the MDS has to create a
reservation on the storage device.
Section 6.7 of the NVMe SCSI Translation Reference contains a full
mapping of the required PERSISTENT RESERVE IN and
PERSISTENT RESERVE OUT SCSI commands to NVMe commands, which
SHOULD be used when using NVMe devices as storage devices
for the pNFS SCSI layout.
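As a rough orientation for the mapping referenced above, the sketch below pairs each PERSISTENT RESERVE OUT service action with the NVMe reservation command that plays the corresponding role. The table is an illustrative summary built from the SCSI service action codes and the NVMe command opcodes; it is not a substitute for Section 6.7, which also covers key handling and parameter translation. The names `PR_OUT_TO_NVME` and `nvme_command_for` are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Tuple


class PrOutAction(IntEnum):
    """PERSISTENT RESERVE OUT service action codes."""
    REGISTER = 0x00
    RESERVE = 0x01
    RELEASE = 0x02
    CLEAR = 0x03
    PREEMPT = 0x04
    PREEMPT_AND_ABORT = 0x05
    REGISTER_AND_IGNORE_EXISTING_KEY = 0x06


class NvmeOpcode(IntEnum):
    """NVMe I/O command opcodes for the reservation commands."""
    RESERVATION_REGISTER = 0x0D
    RESERVATION_REPORT = 0x0E   # counterpart of PERSISTENT RESERVE IN
    RESERVATION_ACQUIRE = 0x11
    RESERVATION_RELEASE = 0x15


# Service action -> (NVMe command, NVMe action within that command).
# The register action actually used (register, unregister, replace)
# depends on the keys supplied, which this summary does not model.
PR_OUT_TO_NVME: Dict[PrOutAction, Tuple[NvmeOpcode, str]] = {
    PrOutAction.REGISTER: (NvmeOpcode.RESERVATION_REGISTER, "register"),
    PrOutAction.REGISTER_AND_IGNORE_EXISTING_KEY:
        (NvmeOpcode.RESERVATION_REGISTER, "register"),
    PrOutAction.RESERVE: (NvmeOpcode.RESERVATION_ACQUIRE, "acquire"),
    PrOutAction.PREEMPT: (NvmeOpcode.RESERVATION_ACQUIRE, "preempt"),
    PrOutAction.PREEMPT_AND_ABORT:
        (NvmeOpcode.RESERVATION_ACQUIRE, "preempt and abort"),
    PrOutAction.RELEASE: (NvmeOpcode.RESERVATION_RELEASE, "release"),
    PrOutAction.CLEAR: (NvmeOpcode.RESERVATION_RELEASE, "clear"),
}

# Reservation type codes differ between the two protocols.
# 0x6 (SCSI) and 0x4 (NVMe) both mean
# "Exclusive Access - Registrants Only".
SCSI_TO_NVME_RESERVATION_TYPE: Dict[int, int] = {
    0x1: 0x1,  # Write Exclusive
    0x3: 0x2,  # Exclusive Access
    0x5: 0x3,  # Write Exclusive - Registrants Only
    0x6: 0x4,  # Exclusive Access - Registrants Only
    0x7: 0x5,  # Write Exclusive - All Registrants
    0x8: 0x6,  # Exclusive Access - All Registrants
}


def nvme_command_for(action: PrOutAction) -> Tuple[NvmeOpcode, str]:
    """Look up the NVMe command for a PERSISTENT RESERVE OUT action."""
    return PR_OUT_TO_NVME[action]
```

In this summary, the key registration performed by the MDS and the clients corresponds to Reservation Register, and the reservation created by the MDS corresponds to Reservation Acquire.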
The equivalent of the WCE bit in the Caching Mode Page in
SCSI Block Commands-3 (SBC-3) is the Write Cache Enable field in the
NVMe Get Features command; see Section 6.3.3.2 of
the NVMe SCSI Translation Reference. If a write cache is enabled
on an NVMe device used as a storage device for the pNFS SCSI layout,
the MDS must ensure that the NVMe FLUSH command is used to flush
the volatile write cache.
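The write cache check can be sketched as below. This is an illustration under stated assumptions: the caller has already issued Get Features for the Volatile Write Cache feature (Feature Identifier 06h) and passes in Dword 0 of the completion entry, where bit 0 is the Write Cache Enable (WCE) field. The helper names are hypothetical, and how the resulting commands are submitted is left to the implementation.

```python
from typing import List

FEATURE_VOLATILE_WRITE_CACHE = 0x06  # Get Features feature identifier
NVME_CMD_FLUSH = 0x00                # NVMe I/O command opcode for Flush
WCE_BIT = 0x1                        # bit 0 of the Get Features result


def write_cache_enabled(get_features_dword0: int) -> bool:
    """Return True if the Write Cache Enable bit is set."""
    return bool(get_features_dword0 & WCE_BIT)


def commands_to_make_durable(get_features_dword0: int) -> List[int]:
    """Return the opcodes needed before data can be treated as stable.

    With a volatile write cache enabled, a Flush is required; without
    one, completed writes are already on non-volatile media.
    """
    if write_cache_enabled(get_features_dword0):
        return [NVME_CMD_FLUSH]
    return []
```

For a device that reports WCE set, `commands_to_make_durable(0x1)` yields a single Flush opcode, while a device with the cache disabled yields an empty list.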
Since no protocol changes are proposed here, no security
considerations apply.
The document does not require any actions by IANA.
Key words for use in RFCs to Indicate Requirement Levels
Harvard University
Parallel NFS (pNFS) SCSI Layout
SCSI Primary Commands-4
INCITS Technical Committee T10
SCSI Block Commands-3
INCITS Technical Committee T10
NVM Express Revision 1.2.1
NVM Express, Inc.
NVM Express: SCSI Translation Reference Revision 1.5
NVM Express, Inc.