Formatting this post was tedious, and that is one of the reasons I put off publishing it. I took these notes months ago, but I was wary of posting them because of the time the formatting would take. As a result, this post could have been better; I just dreaded working on it. Either I find a new editor, or this will be the last post of this kind, with unordered lists and line items for bulleted points.
A few months ago we replaced our cores with a pair of 6509Es. On go-live night we ran into trouble because of some decisions we made at the last minute, and these notes saved us. I hope you find them as useful as we did. These are my notes from the design guide.
Because this post turned out so long, I put what most people will want to see at the top, the configuration. My notes from the configuration guide follow.
Configuration
1. Define the domain ID.
VSS(config)# switch virtual domain 100
2. Set the switch number (switch 1 on the first chassis, switch 2 on the second):
VSS(config-vs-domain)# switch 1
2a. Have the switches use virtual MAC addresses:
VSS(config-vs-domain)# mac-address use-virtual
2b. Check that MAC out-of-band (OOB) synchronization is active and set to 480 seconds.
! Show the OOB statistics
VSS# sh mac-address-table synchronize statistics
3. Configure the VSL port-channel:
VSS(config-vs-domain)# exit
3a. On each standalone switch, create the VSL port-channel and add its member ports.
Standalone SW1:
no hw-module 1 oversubscription
no hw-module 2 oversubscription
int po1
 switch virtual link 1
int range te1/1, te2/1
 channel-group 1 mode on
Standalone SW2:
no hw-module 1 oversubscription
no hw-module 2 oversubscription
int po2
 switch virtual link 2
int range te1/1, te2/1
 channel-group 2 mode on
4. Convert to VSS mode (on both switches):
VSS# switch convert mode virtual
! The switch asks to reload; confirm and let it reload.
5. This is needed only the first time you convert. It merges only the VSL-related configurations, and the guide says you MUST execute this command:
VSS# switch accept mode virtual
6. Configure fast-hello for dual-active detection. (p.4-29)
! Enable fast-hello under VSS global config.
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# dual-active detection fast-hello
! Enable fast-hello at the interface level
VSS(config)# int gi1/5/3
VSS(config-if)# dual-active fast-hello
VSS(config)# int gi2/5/3
VSS(config-if)# dual-active fast-hello
! Confirm fast-hello
VSS# sh switch virtual dual-active fast-hello
VSS# remote command standby-rp show switch virtual dual-active fast-hello
Commands
These are some commands that I kept for handy reference.
sh vslp lmp neighbor
VSS#sh vsl lmp nei

Instance #2:
LMP neighbors
Peer Group info: # Groups: 1 (* => Preferred PG)
PG #  MAC             Switch  Ctrl Interface  Interfaces
---------------------------------------------------------------
*1    9999.aaaa.0000  1       Te2/5/4         Te2/5/4, Te2/5/5
sh switch virtual role
VSS#sh switch virtual role

Switch  Switch  Status  Priority    Role     Session ID
        Number          Oper(Conf)           Local  Remote
------------------------------------------------------------------
LOCAL   2       UP      100(100)    ACTIVE   0      0
REMOTE  1       UP      100(100)    STANDBY  9111   9273
sh int vsl
VSS#sh int vsl

VSL Port-channel: Po1
Port: Te1/5/4
Port: Te1/5/5

VSL Port-channel: Po2
Port: Te2/5/4
Port: Te2/5/5
sh switch virtual
VSS#sh switch virtual

Switch mode                  : Virtual Switch
Virtual switch domain number : 100
Local switch number          : 2
Local switch operational role: Virtual Switch Active
Peer switch number           : 1
Peer switch operational role : Virtual Switch Standby

sh switch virtual redundancy

VSS#sh switch virtual redundancy

My Switch Id = 2
Peer Switch Id = 1
Last switchover reason = none
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso

Switch 2 Slot 5 Processor Information :
-----------------------------------------------
Current Software state = ACTIVE
Uptime in current state = 14 weeks, 4 days, 14 hours, 34 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXJ1, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2011 by Cisco Systems, Inc.
Compiled Wed 22-Jun-11 18:03 by prod_rel_team
BOOT = sup-bootdisk:s72033-ipservicesk9_wan-mz.122-33.SXJ1.bin,12;
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = ACTIVE

Switch 1 Slot 5 Processor Information :
-----------------------------------------------
Current Software state = STANDBY HOT (switchover target)
Uptime in current state = 14 weeks, 4 days, 14 hours, 31 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXJ1, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2011 by Cisco Systems, Inc.
Compiled Wed 22-Jun-11 18:03 by prod_rel_team
BOOT = sup-bootdisk:s72033-ipservicesk9_wan-mz.122-33.SXJ1.bin,12;
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = STANDBY
sh vsl rrp summ
VSS#sh vsl rrp summ

RRP Summary:
------------------------------------------------------------------------
RRP information for Instance 2
--------------------------------------------------------------------
Valid  Flags  Peer   Preferred  Reserved
              Count  Peer       Peer
--------------------------------------------------------------------
TRUE   V      1      1          1

        Peer   Valid  Switch  Status  Priority    Role     Local  Remote
Switch  Group         Number          Oper(Conf)           SID    SID
---------------------------------------------------------------------
Local   0      TRUE   2       UP      100(100)    ACTIVE   0      0
Remote  1      TRUE   1       UP      100(100)    STANDBY  9111   9273

Peer 0 represents the local switch
Flags : V - Valid
sh mls cef
VSS#sh mls cef

Codes: decap - Decapsulation, + - Push Label
Index  Prefix       Adjacency
64     0.0.0.0/32   receive
! Removed for brevity
sh mac-address-table synchronize statistics
VSS#sh mac-address-table synchronize statistics

MAC Entry Out-of-band Synchronization Feature Statistics:
---------------------------------------------------------
Switch [1] Module [1]
-----------------------
Module Status:
Statistics collected from Switch/Module : 1/1
Number of L2 asics in this module : 1
! Removed for brevity.
sh switch virtual redundancy mismatch
VSS#sh switch virtual redundancy mismatch

No Config Mismatch between Active and Standby switches
redundancy reload peer
! Reload a switch from RPR mode to hot-standby
VSS#redundancy reload peer
! Did not get output.
Configuration Guide Notes Below
Virtual Switch Member Boot-up Behavior
- Diagnostics
- VSL Link Initialization
- LMP Establishment
- Role negotiation through RRP
Link Management Protocol (LMP)
- Establishes and verifies bidirectional communication during startup and normal operation
- Exchanges switch IDs
- Sends hello packets to monitor the health of the VSL and the peer
Role Resolution Protocol (RRP)
- Determines the operational status of each switch member.
Virtual Switch Link (VSL)
Each member link must be configured in unconditional EtherChannel mode:
channel-group 12 mode on
Stateful Switch Over (SSO)
- Enables supervisor redundancy in a standalone 6000, keeping the backup supervisor up to date.
- State 13 (Active): the active supervisor is responsible for forwarding and managing the control plane. It manages control plane functions and synchronizes the configuration and the protocols.
- State 8 (Standby): the standby supervisor is synchronized with the active. This is the final state of a hot-standby supervisor.
- SSO is the core of VSS; however, VSS is a dual-forwarding solution while the control plane is managed by one supervisor.
Virtual Switch Priority
- The first to boot will become active.
- If both switches boot simultaneously, the lowest switch ID becomes active.
- The highest priority wins, except that the highest-priority switch will not take over as active unless preemption is enabled.
- Default priority is 100.
- Switch preemption should not be taken lightly.
- It forces multiple reboots of the VSS member.
- Cisco recommends _not_ configuring preemption.
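Here is a minimal sketch of setting the member priorities, assuming domain 100 from the configuration steps and that you want switch 1 to be the preferred active. The values 110 and 100 are only illustrative, and preemption is deliberately left out per Cisco's recommendation above.

```text
! Illustrative priorities; the default is 100
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# switch 1 priority 110
VSS(config-vs-domain)# switch 2 priority 100
VSS(config-vs-domain)# end
! Priority is shown as Oper(Conf); a newly configured value may not
! become operational until the VSS is reloaded
VSS# show switch virtual role
```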
Multi-chassis Etherchannel (MEC)
- Preferred connectivity method using VSS.
- Extends EtherChannel from multiple ports on one switch to multiple ports across two chassis.
- Access-layer switches are configured with traditional etherchannel.
- VSS with MEC is loop-free.
MEC Configuration
- Do not explicitly create layer-2 MEC from the CLI, allow IOS to generate the interface.
- Create a layer-3 MEC explicitly and associate the port-channel group under each member interface.
- These syslog configuration commands are recommended on VSS MEC interfaces:
int po20
 logging event link-status
 logging event spanning-tree status
To check which member link a given flow hashes to:
remote command switch test EtherChannel load-balance interface po 1 ip 1.1.1.1 2.2.2.2
show EtherChannel load-balance hash-result interface port-channel 2 205 ip 10.120.7.65 vlan 5 10.121.100.49
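A sketch of a layer-2 MEC built the way the first bullet describes: only the member ports are configured, one from each chassis, and IOS generates interface Port-channel20 itself. The port numbers and channel group 20 are assumptions for illustration.

```text
! Hypothetical member ports, one per chassis
VSS(config)# interface range gi1/2/1, gi2/2/1
VSS(config-if-range)# switchport
VSS(config-if-range)# switchport trunk encapsulation dot1q
VSS(config-if-range)# switchport mode trunk
VSS(config-if-range)# channel-group 20 mode desirable
VSS(config-if-range)# end
! Po20 should now exist, with members from both chassis
VSS# show etherchannel 20 summary
```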
MAC Addresses
- MAC address allocation is derived from the backplane EEPROM on each chassis, so a VSS instance has two pools. RRP determines which pool the VSS uses. MAC addresses do not change during a switchover event; however, they can change if both switches reboot and the mac-address use-virtual command is not configured. Keeping the MAC addresses stable avoids gratuitous ARPs.
- When upgrading, a change in the default gateway's MAC address can cause problems for hosts that cannot update their ARP entry for the default gateway, which is typically cached for four hours.
- MAC Out-of-Band Sync (OOB)
- In a single-chassis environment, MAC addresses age out normally.
- In VSS, depending on the EtherChannel hash, MAC addresses can age out on some line cards because they are not refreshed there.
- MAC OOB is designed to synchronize MAC addresses in all line cards of the VSS over the VSL.
- In VSS, the trunk mode (desirable or undesirable) of a port-channel interface does not behave the same as in standalone mode. When a member link is brought online, it is not a separate negotiation; it is an addition to the MEC. (p. 3-25)
PAgP
- The active switch is responsible for origination and termination of PAgP control plane traffic.
- The same device ID is sent by both VSS switches so the end device assumes a single logical device.
- Cisco recommends configuring PAgP neighbors in desirable-desirable mode with the silent sub-option.
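On the access side this is just a traditional EtherChannel. A minimal sketch, assuming a Catalyst 3750 named ACCESS with uplinks Gi1/0/49 and Gi1/0/50 (both hypothetical). Silent is PAgP's default sub-option, so `desirable` with no extra keyword gives desirable/silent.

```text
ACCESS(config)# interface range gi1/0/49 - 50
ACCESS(config-if-range)# channel-group 20 mode desirable
ACCESS(config-if-range)# end
! Both uplinks should list the same partner device ID, because
! the VSS sends one device ID from both chassis
ACCESS# show pagp 20 neighbor
```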
LACP p. 2-37
- In VSS it works for both layer-2 and layer-3 interfaces.
- The recommended mode for LACP neighbors is Active-Active
- During the EtherChannel bundling process, LACP performs a configuration consistency check on each link trying to become a port-channel member.
- If a port does not pass, it is placed in a “lettered” system bundle.
- The first etherchannel bundle contains the ports that passed the configuration check.
- The second “lettered” bundle includes the ports that did not pass the configuration check.
- Avoid the LACP min-links command.
- Avoid LACP fast hellos in VSS.
- During failover and recovery, the VSS might not recover before the remote end declares it down, a false positive.
- Fast hellos are sent per link, which can overrun a switch CPU in large deployments.
6500-VSS# show etherchannel 20 summary | inc Gi
Po20(SU)   LACP   Gi2/1(P)
Po20B(SU)  LACP   Gi2/2(P)
! Bundled in separate system-generated
! port-channel interface
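The same bundle with LACP in the recommended active/active mode, as a sketch; the hostnames, ports, and channel group are hypothetical. Nothing here changes the LACP rate or sets min-links, following the notes above.

```text
! VSS side: one member port per chassis
VSS(config)# interface range gi1/2/1, gi2/2/1
VSS(config-if-range)# channel-group 20 mode active
! Access side
ACCESS(config)# interface range gi1/0/49 - 50
ACCESS(config-if-range)# channel-group 20 mode active
! A port that fails the consistency check lands in a lettered bundle such as Po20B
VSS# show etherchannel 20 summary
```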
Implementation Notes
It is recommended to use one port from the supervisor and one from a line card; however, they have different queue structures, and the EtherChannel bundle would fail. To fix this, disable the channel consistency check:
no mls channel-consistency
The Sup720-10G uplink port can be configured in one of two modes:
- Default, Non-10g-only mode:
- All supervisor ports have the same CoS queuing mode if any 10G port is used for VSL. VSL only allows CoS-based queuing.
- Non-blocking, 10g-only mode:
- All 1G ports are disabled and the entire module operates in non-blocking mode. 12.2(33)SXI allows non-VSL 10G ports to use DSCP-based queuing.
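If you do want the non-blocking mode, it is a single global command. This is a sketch, assuming it is entered in global configuration mode on the active; remember that it disables the 1G uplinks, so those ports are no longer available as fast-hello links.

```text
! Non-blocking, 10g-only mode: the Sup720-10G 1G uplinks are disabled
VSS(config)# mls qos 10g-only
```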
Resilient VSL Design Options (pp. 2-18 through 2-20)
- Use the two 10G ports on the Sup720-10G supervisor.
- Most common, but does not provide optimal hardware diversity.
- Use one 10G port on the Sup720-10G and another from a VSL-capable line card.
- Best for balancing cost and redundancy.
- Use 10G ports on two separate VSL capable line cards.
- Best option for flexibility but not as cost effective.
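For the middle option, a minimal sketch of the VSL on standalone SW1 before conversion, assuming the Sup720-10G is in slot 5 (matching Te1/5/4 in the outputs above) and a VSL-capable 10G card is in slot 1; the card slot is hypothetical. SW2 mirrors it with port-channel 2 and `switch virtual link 2`. Because the supervisor and line card ports have different queue structures, the implementation note above about the bundle failing applies here.

```text
! Standalone SW1: one VSL member on the supervisor, one on a line card
SW1(config)# interface port-channel 1
SW1(config-if)# switch virtual link 1
SW1(config-if)# interface range te5/4, te1/1
SW1(config-if-range)# channel-group 1 mode on
```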
EtherChannel
EtherChannel is the fundamental building block of VSS. Traditionally, load sharing and failure recovery are governed by STP, FHRP, and topology (looped and non-looped). In VSS, EtherChannel replaces all three.
- The etherchannel hash algorithm becomes more important to get right in VSS.
- Layer-4 hashing is more random than layer-3 hashing.
- Layer-2 hashing is less efficient when all hosts send to the same default gateway, since their frames share a destination MAC address.
There are a variety of etherchannel options in VSS.
VSS(config-if)# port-channel port hash-distribution {adaptive | fixed}
By default, the load-sharing hash method on all non-VSL EtherChannels is fixed.
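A sketch of moving one MEC away from the fixed default, assuming a hypothetical Po20. The adaptive algorithm is intended to reduce traffic disruption when member links are added or removed.

```text
! Hypothetical MEC Po20: switch from fixed to adaptive hash distribution
VSS(config)# interface port-channel 20
VSS(config-if)# port-channel port hash-distribution adaptive
```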
VLAN ID
Traffic optimized when:
- With VSS it is possible to have more VLANs per closet.
- Traffic might not be fairly hashed due to similarities such as default gateway or multicast traffic.
VSS# sh platform hardware pfc mode
VSS# sh etherchannel load-balance
- Layer 3 and 4 Hash Tuning
- dst-mixed-ip-port
- src-dst-mixed-ip-port
- src-mixed-ip-port
- For lower end switches:
- Cisco Catalyst 4500
- src-dst-ip
- Cisco Catalyst 36xx, 37xx Stack, 29xx
- src-dst-ip
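Putting the hash tuning together, a sketch assuming a PFC3C-capable VSS (check with `sh platform hardware pfc mode`) and a hypothetical access switch named ACCESS. `src-dst-mixed-ip-port` is one of the mixed options listed above; pick the one that matches your traffic.

```text
! VSS: hash on a mix of layer-3 addresses and layer-4 ports
VSS(config)# port-channel load-balance src-dst-mixed-ip-port
VSS(config)# end
VSS# sh etherchannel load-balance
! Access layer (Catalyst 4500, 3750 stack, 29xx)
ACCESS(config)# port-channel load-balance src-dst-ip
```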
Failures
Convergence
VSS Member Failures
Core to VSS Failure
Access Layer to VSS Failure
STP Loops and VSS
- These issues can introduce a loop that STP might not block:
- Faulty hardware causes a missed BPDU.
- Faulty software causes high CPU load, preventing BPDU processing.
- Configuration mistakes.
- Non-standard switch implementations.
- VSS overcomes these issues:
- Creates a loop free topology using MEC.
- No FHRP needed, replaced by one logical node.
Unidirectional Link Detection (UDLD)
- Aggressive UDLD should _not_ be used as a link-integrity check; VSS is by definition a loop-free topology.
- STP protocols (RPVST+ and MST) converge faster than UDLD can detect a failure.
Spanning Tree Configuration with VSS
- The root of the STP should always be VSS.
- Loop guard is not needed.
- The active switch is responsible for generating BPDUs.
- Two ways to connect VSS to the core:
- Equal Cost Multipath (ECMP)
- Layer-3 MEC
- The higher the number of routes the longer ECMP takes to recover.
- Because MEC failure detection is hardware-based, the number of routes does not matter: the hardware detects the failure and moves traffic to the healthy link. Advantage: MEC.
- A single link failure in ECMP results in path reprogramming.
- When the active supervisor fails, the standby supervisor must reinitialize the routing protocol.
- This can be mitigated with Non-Stop Forwarding (NSF), and the neighboring router must be NSF-aware.
- NSF must be enabled on both the VSS and the adjacent nodes.
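Making the VSS the STP root, as the first spanning tree bullet recommends, could look like the following sketch, assuming Rapid PVST+ for all VLANs (MST would use its own instance commands).

```text
VSS(config)# spanning-tree mode rapid-pvst
VSS(config)# spanning-tree vlan 1-4094 root primary
VSS(config)# end
! The VSS bridge ID should be the root for every VLAN
VSS# show spanning-tree root
```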
Routing with VSS
Layer-3 MEC is the recommended design rather than ECMP.
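A sketch of a layer-3 MEC toward a core router, created explicitly as the MEC configuration notes describe. The port-channel number, addresses, and member ports are hypothetical; use `mode active` instead if the core peer runs LACP rather than PAgP.

```text
! Create the layer-3 port-channel explicitly
VSS(config)# interface port-channel 30
VSS(config-if)# no switchport
VSS(config-if)# ip address 10.0.0.1 255.255.255.252
! One member port per chassis, associated with the channel group
VSS(config-if)# interface range te1/1/1, te2/1/1
VSS(config-if-range)# no switchport
VSS(config-if-range)# no ip address
VSS(config-if-range)# channel-group 30 mode desirable
```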
Routing Protocols, Topology and Interaction
Link Failure Convergence
Path availability during link failure
Routing Protocol Interaction During Active Failure
VSS(config)# router ospf 100
VSS(config-router)# nsf cisco
VSS# sh ip ospf nsf
Dual Active Detection (p. 4-29)
- PAgP
- Fast-Hello
- BFD
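For the PAgP method, a minimal sketch, assuming domain 100 and an existing MEC Po20 to an access switch that supports enhanced PAgP (both assumptions). PAgP on the trusted port-channel is what the VSS then relies on to detect dual-active.

```text
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# dual-active detection pagp trust channel-group 20
VSS(config-vs-domain)# end
! Check which detection methods are configured
VSS# sh switch virtual dual-active summary
```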
Fast-Hello
- Requires a dedicated physical port between the VSS nodes.
- The dedicated link cannot carry control-plane or user-data traffic.
- During dual-active, the link configured to carry fast-hello remains operational and continues to exchange hellos. If the old active switch continues to see hellos during what it believes to be a failover state, it knows dual-active has occurred.
- The Sup720-10G 1G uplink ports can be used if the supervisor is not configured in 10g-only mode.
Configure fast-hello for dual-active detection. (p.4-29)
! Enable fast-hello under VSS global config.
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# dual-active detection fast-hello
! Enable fast-hello at the interface level
VSS(config)# int gi1/5/1
VSS(config-if)# no shut
VSS(config-if)# dual-active fast-hello
VSS(config)# int gi2/5/1
VSS(config-if)# no shut
VSS(config-if)# dual-active fast-hello
! Confirm fast-hello
VSS# sh switch virtual dual-active fast-hello
VSS# remote command standby-rp show switch virtual dual-active fast-hello
Using Bidirectional Forwarding Detection
- BFD session establishment is the indication of a dual-active condition.
- Normally the VSS cannot establish BFD with itself because it is one logical node.
- BFD takes 20-25 seconds for detection.
- Requires IP connectivity.
- Needs IP processes and a static route.
Configure BFD for dual-active Detection
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# dual-active pair interface gi1/5/1 int gi2/5/1 bfd
!
! Enable unique IP subnet and BFD interval on interfaces.
VSS(config)# int gi1/5/1
VSS(config-if)# ip add 192.168.1.1 255.255.255.0
VSS(config-if)# bfd interval 50 min_rx 50 multiplier 3
!
VSS(config)# int gi2/5/1
VSS(config-if)# ip add 192.168.2.1 255.255.255.0
VSS(config-if)# bfd interval 50 min_rx 50 multiplier 3
!
! The static route is automatically added.
! Confirm and monitor BFD.
VSS# sh switch virtual dual-active bfd
VSS# sh switch virtual dual-active summary
Dual-Active Recovery
- Once the VSL connectivity is established RRP handles the negotiation.
OSPF Tuning
VSS(config)# router ospf 100
VSS(config-router)# nsf
VSS(config-router)# auto-cost reference-bandwidth 20000
! Confirm OSPF
VSS# sh ip ospf neighbor detail
VSS# sh ip protocol
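The reason for raising the reference bandwidth is the OSPF cost formula: IOS divides the reference bandwidth by the interface bandwidth, truncates the result, and never goes below 1. With the default reference of 100 Mbps, every link of 100 Mbps or faster costs 1. With 20000 Mbps the costs separate, assuming a 2 x 10G MEC reports 20 Gbps of bandwidth:

$$
\text{cost} = \max\left(1,\ \left\lfloor \frac{\text{reference bandwidth (Mbps)}}{\text{interface bandwidth (Mbps)}} \right\rfloor\right)
$$

$$
2\times 10\text{G MEC: } \frac{20000}{20000} = 1, \qquad 10\text{G: } \frac{20000}{10000} = 2, \qquad 1\text{G: } \frac{20000}{1000} = 20
$$

Configure the same reference bandwidth on every OSPF router so the costs stay comparable across the network.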