Network Working Group B. Sarikaya Internet-Draft L. Dunbar Intended status: Best Current Practice Huawei USA Expires: September 22, 2016 B. Khasnabish ZTE (TX) Inc. F. Xia Huawei March 21, 2016 Virtual Machine Mobility Protocol for Overlay Networks draft-sarikaya-nvo3-vmm-dmm-pmip-10.txt Abstract This document describes a virtual machine mobility protocol commonly used in data centers built with overlay-based network virtualization approach. It is based on using a Network Virtualization Authority (NVA)-Network Virtualization Edge (NVE) protocol to update Address Resolution Protocol (ARP) table or neighbor cache entries at the NVA and the source NVEs tunneling in-flight packets to the destination NVE after the virtual machine moves from source NVE to the destination NVE. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on September 22, 2016. Copyright Notice Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents Sarikaya, et al. Expires September 22, 2016 [Page 1] Internet-Draft VM Mobility Solution March 2016 (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Conventions and Terminology . . . . . . . . . . . . . . . . . 3 3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 3 4. Overview of the protocol . . . . . . . . . . . . . . . . . . 4 5. Handling Packets in Flight . . . . . . . . . . . . . . . . . 5 6. Moving Local State of VM . . . . . . . . . . . . . . . . . . 6 7. Handling of Hot, Warm and Cold Virtual Machine Mobility . . . 7 8. Virtual Machine Operation . . . . . . . . . . . . . . . . . . 7 8.1. Virtual Machine Lifecycle Management . . . . . . . . . . 7 9. Security Considerations . . . . . . . . . . . . . . . . . . . 8 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 12.1. Normative References . . . . . . . . . . . . . . . . . . 8 12.2. Informative references . . . . . . . . . . . . . . . . . 10 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11 1. Introduction Data center networks are being increasingly used by telecom operators as well as by enterprises. In this document we are interested in overlay-based data center networks supporting multitenancy. These networks are organized as one large Layer 2 network geographically distributed in several buildings. In some cases geographical distribution can span across Layer 2 boundaries. In that case need arises for connectivity between Layer 2 boundaries which can be achieved by the network virtualization edge (NVE) functioning as Layer 3 gateway routing across bridging domain such as in Warehouse Scale Computers (WSC). Virtualization which is being used in almost all of today's data centers enables many virtual machines to run on a single physical computer or compute server. Virtual machines (VM) need hypervisor running on the physical compute server to provide them shared processor/memory/storage. Network connectivity is provided by the network virtualization edge (NVE) [I-D.ietf-nvo3-arch], [I-D.ietf-nvo3-nve-nva-cp-req]. Being able to move VMs dynamically, or live migration, from one server to another allows for dynamic load Sarikaya, et al. Expires September 22, 2016 [Page 2] Internet-Draft VM Mobility Solution March 2016 balancing or work distribution and thus it is a highly desirable feature [RFC7364]. There are many challenges and requirements related to migration, mobility, and interconnection of Virtual Machines (VMs)and Virtual Network Elements (VNEs). Retaining IP addresses after a move is a key requirement [RFC7364]. Such a requirement is needed in order to maintain existing transport connections. In view of many virtual machine mobility schemes that exist today, there is a desire to define a standard control plane protocol for virtual machine mobility. The protocol should be based on IPv4 or IPv6. In this document we specify such a protocol. 2. Conventions and Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and [I-D.ietf-nvo3-arch]. This document uses the terminology defined in [RFC7364]. In addition we make the following definitions: Hot VM Mobility. A given VM could be moved from one server to another in running state. Warm VM Mobility. In case of warm VM mobility, the VM states are mirrored to the secondary server (or domain) at a predefined (configurable) regular intervals. This reduces the overheads and complexity but this may also lead to a situation when both servers may not contain the exact same data (state information) Cold VM Mobility. A given VM could be moved from one server to another in stopped or suspended state. 3. Requirements This section states requirements on data center network virtual machine mobility. Data center network SHOULD support virtual machine mobility in IPv6. IPv4 SHOULD also be supported in virtual machine mobility. Virtual machine mobility protocol SHOULD not support host routes. Virtual machine mobility protocol SHOULD not support triangular Sarikaya, et al. Expires September 22, 2016 [Page 3] Internet-Draft VM Mobility Solution March 2016 routing. Virtual machine mobility protocol SHOULD not need to use tunneling except for handling packets in flight. 4. Overview of the protocol Being able to move Virtual Machines dynamically, from one server to another allows for dynamic load balancing or work distribution and thus it is a highly desirable feature. In a Layer-2 based data center approach, virtual machine moving to another server does not change its IP address. Because of this an IP based virtual machine mobility protocol is not needed. However, when a virtual machine moves, NVEs need to change their caches associating VM Layer 2 or Medium Access Control (MAC) address with NVE's IP address. Such a change enables NVE to send outgoing MAC frames addressed to the virtual machine. VM movement across Layer 3-boundaries is not typical but the same solution solution applies because the VM moves in the same link such as in WSCs. Virtual machine moves from its source NVE to a new, destination NVE. The move is initiated by the source NVE and is in the same L2 link, the virtual machine IP address(es) do not change but this virtual machine is now under a new NVE, previously communicating NVEs will continue to send their packets to the source NVE. Address Resolution Protocol (ARP) cache in IPv4 [RFC0826] or neighbor cache in IPv6 in the NVEs need to be updated. It takes a few seconds for a VM to move from its source NVE to the new destination one. During this period, a tunnel is needed so that source NVE forwards packets to the destination NVE. In IPv4, the virtual machine immediately after the move sends a gratuitous ARP request message containing its IPv4 and Layer 2 or MAC address in its new NVE, destination NVE. This message's destination address is the broadcast address. NVE receives this message. NVE should update VM's ARP entry in the central directory at the NVA. NVE asks NVA to update its mappings to record IPv4 address of VM along with MAC address of VM, and NVE IPv4 address. An NVE-to-NVA protocol is used for this purpose [I-D.ietf-nvo3-arch]. Reverse ARP (RARP) which enables the host to discover its IPv4 address when it boots from a local server [RFC0903] is not used by VMs because the VM already knows its IPv4 address. IPv4/v6 address is assigned to a newly created VM, possibly using Dynamic Host Configuration Protocol (DHCP). There are some vendor deployments (diskless systems or systems without configuration files) wherein VM users, i.e. end-user clients ask for the same MAC address upon migration. This can be achieved by the clients sending RARP request reverse message which carries the old MAC address looking for an IP Sarikaya, et al. Expires September 22, 2016 [Page 4] Internet-Draft VM Mobility Solution March 2016 address allocation. The server, in this case the new NVE needs to communicate with NVA, just like in the gratuitous ARP case to ensure that the same IPv4 address is assigned to the VM. NVA uses the MAC address as the key in the search of ARP cache to find the IP address and informs this to the new NVE which in turns sends RARP reply reverse message. This completes IP address assignment to the migrating VM. All NVEs communicating with this virtual machine uses the old ARP entry. If any VM in those NVEs need to talk to the new VM in the destination NVE, it uses the old ARP entry. Thus the packets are delivered to the source NVE. The source NVE MUST tunnel these in- flight packets to the destination NVE. When an ARP entry in those VMs times out, their corresponding NVEs should access the NVA for an update. IPv6 operation is slightly different: In IPv6, the virtual machine immediately after the move sends an unsolicited neighbor advertisement message containing its IPv6 address and Layer-2 MAC address in its new NVE, the destination NVE. This message is sent to the IPv6 Solicited Node Multicast Address corresponding to the target address which is VM's IPv6 address. NVE receives this message. NVE should update VM's neighbor cache entry in the central directory at the NVA. IPv6 address of VM, MAC address of VM and NVE IPv6 address are recorded to the entry. An NVE-to-NVA protocol is used for this purpose [I-D.ietf-nvo3-arch]. All NVEs communicating with this virtual machine uses the old neighbor cache entry. If any VM in those NVEs need to talk to the new VM in the destination NVE, it uses the old neighbor cache entry. Thus the packets are delivered to the source NVE. The source NVE MUST tunnels these in-flight packets to the destination NVE. When a neighbor cache entry in those VMs times out, their corresponding NVEs should access the NVA for an update. 5. Handling Packets in Flight Source hypervisor may receive packets from the virtual machine's ongoing communications and these packets should not be lost and they should be sent to the destination hypervisor to be delivered to the virtual machine. The steps involved in handling packets in flight are as follows: Preparation Step It takes some time, possibly a few seconds for a VM to move from its source hypervisor to a new destination one. Sarikaya, et al. Expires September 22, 2016 [Page 5] Internet-Draft VM Mobility Solution March 2016 During this period, a tunnel needs to be established so that source NVE forwards packets to the destination NVE. Tunnel Establishment - IPv6 Inflight packets are tunneled to the destination NVE using the encapsulation protocol such as VXLAN in IPv6. Source NVE gets destination NVE address from NVA in the request to move the virtual machine. Tunnel Establishment - IPv4 Inflight packets are tunneled to the destination NVE using the encapsulation protocol such as VXLAN in IPv4. Source NVE gets destination NVE address from NVA when NVA requests NVE to move the virtual machine. Tunneling Packets - IPv6 IPv6 packets are received for the migrating virtual machine encapsulated in an IPv6 header at the destination NVE. Destination NVE decapsulates the packet and sends IPv6 packet to the migrating VM. Tunneling Packets - IPv4 IPv4 packets are received for the migrating virtual machine encapsulated in an IPv4 header at the destination NVE. Destination NVE decapsulates the packet and sends IPv4 packet to the migrating VM. Stop Tunneling Packets When source NVE stops receiving packets destined to the virtual machine that has just moved to the destination NVE. 6. Moving Local State of VM After VM mobility related signaling (VM Mobility Registration Request/Reply), the virtual machine state needs to be transferred to the destination Hypervisor. The state includes its memory and file system. Source NVE opens a TCP connection with destination NVE over which VM's memory state is transferred. File system or local storage is more complicated to transfer. The transfer should ensure consistency, i.e. the VM at the destination should find the same file system it had at the source. Precopying is a commonly used technique for transferring the file system. First the whole disk image is transferred while VM continues to run. After the VM is moved any changes in the file system are packaged together and sent to the destination Hypervisor which reflects these changes to the file system locally at the destination. Sarikaya, et al. Expires September 22, 2016 [Page 6] Internet-Draft VM Mobility Solution March 2016 7. Handling of Hot, Warm and Cold Virtual Machine Mobility Cold Virtual Machine mobility is facilitated by the VM initially sending an ARP or Neighbor Discovery message at the destination NVE but the source NVE not receiving any packets inflight. Cold VM mobility also allows all previous source NVEs and all communicating NVEs to time out ARP/neighbor cache entries of the VM and then get NVA to push to NVEs or get NVEs to pull the updated ARP/neighbor cache entry from NVA. The VMs that are used for cold standby receive scheduled backup information but less frequently than that would be for warm standby option. Therefore, the cold mobility option can be used for non- critical applications and services. In cases of warm standby option, the backup VMs receive backup information at regular intervals. The duration of the interval determines the warmth of the standby option. The larger the duration, the less warm (and hence cold) the standby option becomes. In case of hot standby option, the VMs in both primary and secondary domains have identical information and can provide services simultaneously as in load-share mode of operation. If the VMs in the primary domain fails, there is no need to actively move the VMs to the secondary domain because the VMs in the secondary domain already contains identical information. The hot standby option is the most costly mechanism for providing redundancy, and hence this option is utilized only for mission-critical applications and services. 8. Virtual Machine Operation Virtual machines are not involved in any mobility signalling. Once VM moves to the destination NVE, VM IP address does not change and VM should be able to continue to receive packets to its address(es). This happens in hot VM mobility scenarios. Virtual machine sends a gratuitous Address Resolution Protocol or unsolicited Neighbor Advertisement message upstream after each move. 8.1. Virtual Machine Lifecycle Management Managing the lifecycle of VM includes creating a VM with all of the required resources, and managing them seamlessly as the VM migrates from one service to another during its lifetime. The on-boarding process includes the following steps: 1. Sending an allowed (authorized/authenticated) request to Network Virtualization Authority (NVA) in an acceptable format with Sarikaya, et al. Expires September 22, 2016 [Page 7] Internet-Draft VM Mobility Solution March 2016 mandatory/optional virtualized resources {cpu, memory, storage, process/thread support, etc.} and interface information 2. Receiving an acknowledgement from the NVA regarding availability and usability of virtualized resources and interface package 3. Sending a confirmation message to the NVA with request for approval to adapt/adjust/modify the virtualized resources and interface package for utilization in a service. 9. Security Considerations Security threats for the data and control plane are discussed in [I-D.ietf-nvo3-arch]. This document does not introduce any new security threats. 10. IANA Considerations This document makes no request to IANA. 11. Acknowledgements The authors are grateful to Qiang Zu, Tom Herbert, Andrew Malis and Saumya Dikshit for helpful comments. 12. References 12.1. Normative References [RFC0826] Plummer, D., "Ethernet Address Resolution Protocol: Or Converting Network Protocol Addresses to 48.bit Ethernet Address for Transmission on Ethernet Hardware", STD 37, RFC 826, DOI 10.17487/RFC0826, November 1982, . [RFC0903] Finlayson, R., Mann, T., Mogul, J., and M. Theimer, "A Reverse Address Resolution Protocol", STD 38, RFC 903, DOI 10.17487/RFC0903, June 1984, . [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, DOI 10.17487/RFC2629, June 1999, . Sarikaya, et al. Expires September 22, 2016 [Page 8] Internet-Draft VM Mobility Solution March 2016 [RFC5213] Gundavelli, S., Ed., Leung, K., Devarapalli, V., Chowdhury, K., and B. Patil, "Proxy Mobile IPv6", RFC 5213, DOI 10.17487/RFC5213, August 2008, . [RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy Mobile IPv6", RFC 5844, DOI 10.17487/RFC5844, May 2010, . [RFC3007] Wellington, B., "Secure Domain Name System (DNS) Dynamic Update", RFC 3007, DOI 10.17487/RFC3007, November 2000, . [RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, DOI 10.17487/RFC3022, January 2001, . [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006, . [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, DOI 10.17487/RFC4861, September 2007, . [RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution Problems in Large Data Center Networks", RFC 6820, DOI 10.17487/RFC6820, January 2013, . [I-D.ietf-nvo3-vm-mobility-issues] Rekhter, Y., Henderickx, W., Shekhar, R., Fang, L., Dunbar, L., and A. Sajassi, "Network-related VM Mobility Issues", draft-ietf-nvo3-vm-mobility-issues-03 (work in progress), June 2014. [I-D.ietf-nvo3-arch] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. Narten, "An Architecture for Overlay Networks (NVO3)", draft-ietf-nvo3-arch-04 (work in progress), October 2015. Sarikaya, et al. Expires September 22, 2016 [Page 9] Internet-Draft VM Mobility Solution March 2016 [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, . [RFC7364] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., Kreeger, L., and M. Napierala, "Problem Statement: Overlays for Network Virtualization", RFC 7364, DOI 10.17487/RFC7364, October 2014, . [RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for Data Center (DC) Network Virtualization", RFC 7365, DOI 10.17487/RFC7365, October 2014, . 12.2. Informative references [I-D.ietf-nvo3-nve-nva-cp-req] Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network Virtualization NVE to NVA Control Protocol Requirements", draft-ietf-nvo3-nve-nva-cp-req-04 (work in progress), July 2015. [I-D.wkumari-dcops-l3-vmmobility] Kumari, W. and J. Halpern, "Virtual Machine mobility in L3 Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work in progress), August 2011. [I-D.shima-clouds-net-portability-reqs-and-models] Shima, K., Sekiya, Y., and K. Horiba, "Network Portability Requirements and Models for Cloud Environment", draft- shima-clouds-net-portability-reqs-and-models-01 (work in progress), October 2011. [I-D.raggarwa-data-center-mobility] Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., Fang, L., and A. Sajassi, "Data Center Mobility based on E-VPN, BGP/MPLS IP VPN, IP Routing and NHRP", draft- raggarwa-data-center-mobility-07 (work in progress), June 2014. Sarikaya, et al. Expires September 22, 2016 [Page 10] Internet-Draft VM Mobility Solution March 2016 [I-D.khasnabish-vmmi-problems] Khasnabish, B., Liu, B., Lei, B., and F. Wang, "Mobility and Interconnection of Virtual Machines and Virtual Network Elements", draft-khasnabish-vmmi-problems-03 (work in progress), December 2012. Authors' Addresses Behcet Sarikaya Huawei USA 5340 Legacy Dr. Building 3 Plano, TX 75024 Email: sarikaya@ieee.org Linda Dunbar Huawei USA 5340 Legacy Dr. Building 3 Plano, TX 75024 Email: linda.dunbar@huawei.com Bhumip Khasnabish ZTE (TX) Inc. 55 Madison Avenue, Suite 160 Morristown, NJ 07960 Email: vumip1@gmail.com, bhumip.khasnabish@ztetx.com Frank Xia Huawei Nanjing, China Phone: +1 972-509-5599 Email: xiayangsong@huawei.com Sarikaya, et al. Expires September 22, 2016 [Page 11]