This post was tedious to format, which is one of the reasons I put off posting it. The notes were taken months ago, but I was wary of posting them because of the time involved in formatting. As a result this post could have been better; I just dreaded working on it. Either I have to find a new editor or this will be the last post of this variety, with unordered lists and line items to make bulleted points.
A few months ago we replaced our cores with a pair of 6509Es. The night of go live we had trouble because of some decisions we made at the last minute, and these notes saved us. I hope you find them as useful as we did. These are my notes from the design guide.
Because this post turned out so long, I put what most people will want to see at the top, the configuration. My notes from the configuration guide follow.
Configuration
1. Define the domain ID.
[code]
VSS(config)# switch virtual domain 100
[/code]
2. Set the switch number:
[code]
VSS(config-vs-domain)# switch 1
[/code]
2a. Have the switches use virtual MAC addresses:
[code]VSS(config-vs-domain)# mac-address use-virtual[/code]
2b. Check to make sure OOB is active and set to 480 seconds.
[code]VSS# sh mac-address-table synchronize statistics ! show stats for OOB[/code]
3. Configure VSL port-channel
[code]VSS(config-vs-domain)# exit[/code]
3a.
Standalone SW1:
[code]
no hw-module 1 oversubscription
no hw-module 2 oversubscription
int po1
switch virtual link 1
int range te1/1, te2/1
channel-gr 1 mode on
[/code]
Standalone SW2:
[code]
no hw-module 1 oversubscription
no hw-module 2 oversubscription
int po2
switch virtual link 2
int range te1/1, te2/1
channel-gr 2 mode on
[/code]
4. Convert to VSS mode:
[code]
VSS# switch convert mode virtual
(Switch will ask to reload)
(Reload)
[/code]
5. This is needed only for the first-time conversion. It merges only the VSL-related configuration, and the guide says you MUST execute this command:
[code]
VSS# switch accept mode virtual
[/code]
6. Configure fast-hello for dual-active detection. (p.4-29)
[code]
! Enable fast-hello under VSS global config.
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# dual-active detection fast-hello
!Enable fast-hello at the interface level
VSS(config)# int gi1/5/3
VSS(config-if)# dual-active fast-hello
VSS(config)# int gi2/5/3
VSS(config-if)# dual-active fast-hello
! Confirm fast-hello
VSS# sh switch virtual dual-active fast-hello
VSS# remote command standby-rp show switch virtual dual-active fast-hello
[/code]
Commands
These are some commands that I kept for handy reference.
sh vslp lmp neighbor
[code]
VSS#sh vsl lmp nei
Instance #2:
LMP neighbors
Peer Group info: # Groups: 1 (* => Preferred PG)
PG #   MAC             Switch  Ctrl Interface  Interfaces
---------------------------------------------------------------
*1     9999.aaaa.0000  1       Te2/5/4         Te2/5/4, Te2/5/5
[/code]
sh switch virtual role
[code]
VSS#sh switch virtual role
Switch  Switch  Status  Priority    Role     Session ID
        Number          Oper(Conf)           Local  Remote
------------------------------------------------------------------
LOCAL   2       UP      100(100)    ACTIVE   0      0
REMOTE  1       UP      100(100)    STANDBY  9111   9273
[/code]
sh int vsl
[code]
VSS#sh int vsl
VSL Port-channel: Po1
Port: Te1/5/4
Port: Te1/5/5
VSL Port-channel: Po2
Port: Te2/5/4
Port: Te2/5/5
[/code]
sh switch virtual
[code]
VSS#sh switch virtual
Switch mode : Virtual Switch
Virtual switch domain number : 100
Local switch number : 2
Local switch operational role: Virtual Switch Active
Peer switch number : 1
Peer switch operational role : Virtual Switch Standby
[/code]
sh switch virtual redundancy
[code]
VSS#sh switch virtual redundancy
My Switch Id = 2
Peer Switch Id = 1
Last switchover reason = none
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Switch 2 Slot 5 Processor Information :
-----------------------------------------------
Current Software state = ACTIVE
Uptime in current state = 14 weeks, 4 days, 14 hours, 34 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXJ1, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2011 by Cisco Systems, Inc.
Compiled Wed 22-Jun-11 18:03 by prod_rel_team
BOOT = sup-bootdisk:s72033-ipservicesk9_wan-mz.122-33.SXJ1.bin,12;
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = ACTIVE
Switch 1 Slot 5 Processor Information :
-----------------------------------------------
Current Software state = STANDBY HOT (switchover target)
Uptime in current state = 14 weeks, 4 days, 14 hours, 31 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXJ1, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2011 by Cisco Systems, Inc.
Compiled Wed 22-Jun-11 18:03 by prod_rel_team
BOOT = sup-bootdisk:s72033-ipservicesk9_wan-mz.122-33.SXJ1.bin,12;
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = STANDBY
[/code]
sh vsl rrp summ
[code]
VSS#sh vsl rrp summ
RRP Summary:
------------------------------------------------------------------------
RRP information for Instance 2
--------------------------------------------------------------------
Valid  Flags  Peer   Preferred  Reserved
              Count  Peer       Peer
--------------------------------------------------------------------
TRUE   V      1      1          1
        Peer   Valid  Switch  Status  Priority    Role     Local  Remote
Switch  Group         Number          Oper(Conf)           SID    SID
---------------------------------------------------------------------
Local   0      TRUE   2       UP      100(100)    ACTIVE   0      0
Remote  1      TRUE   1       UP      100(100)    STANDBY  9111   9273
Peer 0 represents the local switch
Flags : V - Valid
[/code]
sh mls cef
[code]
sh mls cef
Codes: decap - Decapsulation, + - Push Label
Index  Prefix      Adjacency
64     0.0.0.0/32  receive
! Removed for brevity
[/code]
sh mac-address-table synchronize statistics
[code]
VSS#sh mac-address-table synchronize statistics
MAC Entry Out-of-band Synchronization Feature Statistics:
---------------------------------------------------------
Switch [1] Module [1]
-----------------------
Module Status:
Statistics collected from Switch/Module : 1/1
Number of L2 asics in this module : 1
! Removed for brevity.
[/code]
sh switch virtual redundancy mismatch
[code]
VSS#sh switch virtual redundancy mismatch
No Config Mismatch between Active and Standby switches
[/code]
redundancy reload peer
[code]
! Reload a switch from RPR mode to hot-standby
VSS#redundancy reload peer
! Did not get output.
[/code]
Configuration Guide Notes Below
Virtual Switch Member Boot-up Behavior
- Diagnostics
- VSL Link Initialization
- LMP Establishment
- Role negotiation through RRP
Link Management Protocol (LMP)
- Establishes and verifies bidirectional communication during startup and normal operation
- Exchange switch ID
- Sends hello packets to monitor health of VSL and peer
Role Resolution Protocol (RRP)
- Determines the operational status of each switch member.
Virtual Switch Link (VSL)
Each member link must be configured in unconditional EtherChannel mode:
channel-group 12 mode on
Stateful Switch Over (SSO)
- Enables supervisor redundancy in a standalone 6000, keeping the backup supervisor up to date.
- State 13-Active: when in the active state, the supervisor is responsible for forwarding and managing the control plane. It manages control plane functions and synchronizes the configuration and the protocols.
- State 8-Standby: the supervisor is synchronized with the active. This is the final state of the hot-standby supervisor.
- SSO is the core of VSS; however, VSS is a dual forwarding solution while the control plane is managed by one supervisor.
Virtual Switch Priority
- The first to boot will become active.
- If simultaneous boot, lowest switch ID becomes active.
- Highest priority wins, but a higher-priority switch will not take over from a switch that is already active unless preemption is enabled.
- Default priority is 100.
- Switch preemption should not be taken lightly.
- It forces multiple reboots of the VSS member.
- Cisco recommends _not_ configuring preemption.
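As a rough sketch of the priority notes above, this is how the priorities could be set without touching preemption. The domain ID 100 matches the configuration at the top of this post; the value 110 is only an illustration. As far as I know, a changed priority is only used the next time roles are negotiated (after the config is saved and the VSS restarts), so check the result before relying on it.
[code]
! Assumption: domain 100 as above; priority values are illustrative.
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# switch 1 priority 110
VSS(config-vs-domain)# switch 2 priority 100
VSS(config-vs-domain)# end
! Oper(Conf) shows the operating priority next to the configured one.
VSS# sh switch virtual role
[/code]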
Multi-chassis Etherchannel (MEC)
- Preferred connectivity method using VSS.
- Extends EtherChannel from multiple ports on one switch to multiple ports on two chassis.
- Access-layer switches are configured with traditional etherchannel.
- VSS with MEC is loop-free.
MEC Configuration
- Do not explicitly create layer-2 MEC from the CLI, allow IOS to generate the interface.
- Create a layer-3 MEC explicitly and associate the port-channel group under each member interface.
- This syslog configuration is recommended on MEC interfaces in VSS:
[code]
int po20
logging event link-status
logging event spanning-tree status
[/code]
- These hidden commands are now available in 12.2(33)SXH1:
[code]
remote command switch test EtherChannel load-balance interface po 1 ip 1.1.1.1 2.2.2.2
show EtherChannel load-balance hash-result interface port-channel 2 205 ip 10.120.7.65 vlan 5 10.121.100.49
[/code]
MAC Addresses
- MAC address allocation is derived from the backplane EEPROM on each chassis, so a VSS instance has two pools. The VSS MAC address pool is determined by RRP. MAC address allocation does not change during a switchover event, which avoids gratuitous ARPs. However, MAC addresses will change if both switches reboot without the mac-address use-virtual command.
- When upgrading, a change of MAC address for the default gateway can cause problems for hosts that cannot update their default gateway ARP entry. It is typically cached for four hours.
- MAC Out-of-Band Sync (OOB)
- MAC addresses normally age out in a single chassis environment.
- Depending upon the EtherChannel hash, MAC addresses have the chance to age out because they are not updated.
- MAC OOB is designed to synchronize MAC addresses in all line cards of the VSS over the VSL.
- In VSS, the trunk mode of a port-channel interface (desirable or undesirable) does not act the same as in standalone mode. When a member link is brought online it is not a separate negotiation; it is an addition to the MEC. (p. 3-25)
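For the MAC OOB notes above, this is a sketch of how the sync and the 480-second aging time from step 2b could be set if they are not already in place. On some hardware and software combinations the sync may already be enabled by default, so treat this as something to verify rather than something to paste in.
[code]
! Assumption: OOB sync is not yet enabled on this system.
VSS(config)# mac-address-table synchronize
! 480 seconds is the aging time called out in step 2b.
VSS(config)# mac-address-table aging-time 480
! Confirm the sync status per switch/module.
VSS# sh mac-address-table synchronize statistics
[/code]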
PAgP
- The active switch is responsible for origination and termination of PAgP control plane traffic.
- The same device ID is sent by both VSS switches so the end device assumes a single logical device.
- Cisco recommends PAgP neighbors to be in desirable-desirable mode with the silent sub option.
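A sketch of the access-switch side of the PAgP recommendation above. The prompt name, interface range and channel-group number are hypothetical. On the Catalyst IOS versions I have used, silent is the default behaviour of desirable mode and the optional keyword is non-silent, so nothing extra is typed here; confirm that on your platform.
[code]
! Assumption: Access, gi1/0/49 - 50 and group 20 are hypothetical.
Access(config)# int range gi1/0/49 - 50
Access(config-if-range)# channel-group 20 mode desirable
! The VSS member ports of the same MEC use desirable as well.
Access# sh etherchannel 20 summary
Access# sh pagp neighbor
[/code]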
LACP p. 2-37
- In VSS it works for both layer-2 and layer-3 interfaces.
- The recommended mode for LACP neighbors is Active-Active
- During the EtherChannel bundling process LACP performs a configuration consistency check on each link trying to become a port-channel member.
- If a port does not pass it is placed in a “lettered” system bundle.
- The first etherchannel bundle contains the ports that passed the configuration check.
- The second “lettered” bundle includes the ports that did not pass the configuration check.
- Avoid using the min-links LACP command
- Avoid LACP fast-hello in VSS
- During failover and recovery the VSS might not be able to recover before the remote end declares VSS down. False positive.
- Fast-hello is sent per link, which can overrun a switch CPU in large deployments.
[code]
6500-VSS# show etherchannel 20 summary | inc Gi
Po20(SU) LACP Gi2/1(P)
Po20B(SU) LACP Gi2/2(P) ! Bundled in separate system-generated
! port-channel interface
[/code]
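For the Active-Active LACP recommendation above, a minimal sketch of the access-switch side. The prompt name, interface range and channel-group number are hypothetical, and the VSS member ports of the same MEC would also be set to active.
[code]
! Assumption: Access, gi1/0/49 - 50 and group 20 are hypothetical.
Access(config)# int range gi1/0/49 - 50
Access(config-if-range)# channel-group 20 mode active
! A port that fails the consistency check shows up in a lettered bundle (Po20B above).
Access# sh etherchannel 20 summary
Access# sh lacp neighbor
[/code]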
Implementation Notes
The recommendation is to have one port from the supervisor and one from a line card; however, they have different queue structures and the EtherChannel bundle would fail. To fix this, turn on:
[code]
no mls channel-consistency
[/code]
The Sup720-10G uplink port can be configured in one of two modes:
- Default, Non-10g-only mode:
- All supervisor ports have the same CoS queuing mode if any 10G port is used for VSL. VSL only allows CoS-based queuing.
- Non-blocking, 10g-only mode:
- All 1G ports are disabled, the entire module operates in non-blocking mode. 12.2(33)SXI allows non-VSL 10G ports to be DSCP based.
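As far as I know, the non-blocking 10g-only mode described above is turned on with a single global command. This is a sketch from memory rather than something copied from the guide. Since it disables the 1G uplinks, make sure nothing (including a fast-hello link) depends on them first.
[code]
! Assumption: Sup720-10G with nothing in use on the 1G uplink ports.
VSS(config)# mls qos 10g-only
[/code]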
Resilient VSL Design Options (p2-18 thru 2-20)
- Use the two 10G ports on the Sup720-10G supervisor.
- Most common, does not provide optimal hardware diversity.
- Use one 10G port on the Sup720-10G and another from a VSL capable line card.
- Best for balancing cost and redundancy.
- Use 10G ports on two separate VSL capable line cards.
- Best option for flexibility but not as cost effective.
EtherChannel
Etherchannel is the fundamental building block of VSS. Traditionally load
sharing and failure are governed by STP, FHRP and topology (looped and
non-looped). In VSS Etherchannel replaces all three.
- The etherchannel hash algorithm becomes more important to get right in VSS.
- Layer-4 hashing is more random than layer-3 hashing.
- Layer-2 hashing is not as efficient when all hosts are sending to a default
gateway.
There are a variety of etherchannel options in VSS.
[code]
VSS(config-if)# port-channel port hash-distribution X
[/code]
By default the load-sharing hash method on all non-VSL etherchannel is fixed.
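The X above is a placeholder. To my knowledge the two choices are fixed and adaptive, so switching a MEC to adaptive might look like the sketch below. The port-channel number is hypothetical. As I understand it, the change is only picked up on a member link up/down/add/delete event, so plan for a shut/no shut of the port-channel if it needs to take effect right away.
[code]
! Assumption: po20 is a hypothetical non-VSL MEC.
VSS(config)# int po20
VSS(config-if)# port-channel port hash-distribution adaptive
[/code]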
VLAN ID
Traffic optimized when:
- With VSS it is possible to have more VLANs per closet.
- Traffic might not be fairly hashed due to similarities such as default gateway or multicast traffic.
[code]
VSS# sh platform hardware pfc mode
VSS# sh etherchannel load-balance
[/code]
- Layer 3 and 4 Hash Tuning
- dst-mixed-ip-port
- src-dst-mixed-ip-port
- src-mixed-ip-port
- For lower end switches:
- Cisco Catalyst 4500
- src-dst-ip
- Cisco Catalyst 36xx, 37xx Stack, 29xx
- src-dst-ip
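A sketch of how the hash options listed above are applied. The load-balance method is a global setting on each switch; the Access prompt is hypothetical. The mixed-ip-port options depend on the PFC mode, which is why sh platform hardware pfc mode is worth checking first.
[code]
! On the VSS (assumes the PFC mode supports the mixed options).
VSS(config)# port-channel load-balance src-dst-mixed-ip-port
! On a lower-end access switch (hypothetical prompt).
Access(config)# port-channel load-balance src-dst-ip
! Confirm on each switch.
VSS# sh etherchannel load-balance
[/code]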
Failures
Convergence
VSS Member Failures
Core to VSS Failure
Access Layer to VSS Failure
STP Loops and VSS
- These issues can introduce a loop that STP might not block
- Faulty hardware causes a missed BPDU
- Faulty software causes high CPU load, preventing BPDU processing.
- Configuration mistake
- Non-standard switch implementation
- VSS overcomes these issues
- Creates a loop free topology using MEC.
- No FHRP needed, replaced by one logical node.
Unidirectional Link Detection (UDLD)
- Aggressive UDLD should _not_ be used as link-integrity check. VSS is by definition a loop-free topology.
- STP protocols (RPVST+ and MST) converge faster than UDLD detects.
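If UDLD is still wanted to catch miscabling, the non-aggressive (normal) mode is the alternative to what the notes above warn against. This is a sketch under that assumption; the interface number is hypothetical.
[code]
! Normal mode globally (applies to fiber ports).
VSS(config)# udld enable
! Or per interface, without the aggressive keyword.
VSS(config)# int te1/2/1
VSS(config-if)# udld port
! Confirm.
VSS# sh udld te1/2/1
[/code]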
Spanning Tree Configuration with VSS
- The root of the STP should always be VSS.
- Loop guard is not needed.
- The active switch is responsible for generating the BPDU.
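To make the VSS the root as noted above, a minimal sketch could look like this. It uses VLAN 5 only as an example and assumes Rapid PVST+ is the STP mode in use.
[code]
! Assumption: Rapid PVST+; VLAN 5 is only an example.
VSS(config)# spanning-tree mode rapid-pvst
VSS(config)# spanning-tree vlan 5 root primary
! Confirm the VSS is the root for the VLAN.
VSS# sh spanning-tree root
[/code]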
- Two ways to connect VSS to the core:
- Equal Cost Multipath (ECMP)
- Layer-3 MEC
- The higher the number of routes, the longer ECMP takes to recover.
- Because MEC failure detection is hardware based, the number of routes does not matter; the hardware will detect the failure and adjust traffic to the healthy link.
- Advantage MEC.
- A single link failure in ECMP will result in path reprogramming.
- When the active supervisor fails, the standby supervisor must reinitialize the routing protocol.
- This can be mitigated with Non-Stop Forwarding (NSF), provided the neighboring router is NSF aware.
- NSF must be enabled on both the VSS and adjacent nodes.
- NSF configuration http://www.cisco.com/en/US/docs/ios/ha/configuration/guide/ha-nonstp_fwdg.html#wp1056927
- PAgP
- Fast-Hello
- BFD
- Requires a dedicated physical port between the VSS nodes.
- The dedicated link is not capable of carrying control-plane or user-data traffic.
- During dual-active the link that is configured to carry fast-hello is operational and continues to exchange hellos. If the old active continues to see hellos during what it believes to be a failover state, then it knows dual-active has occurred.
- The Sup720-10G 1Gb uplink ports can be used if the supervisor is not configured in 10Gb-only mode.
- BFD session establishment is the indication of a dual-active condition.
- Normally VSS would not be able to establish BFD with itself because it is one logical node.
- BFD takes 20-25 seconds for detection.
- Requires IP connectivity.
- Needs IP processes and static route.
- Once the VSL connectivity is established RRP handles the negotiation.
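PAgP is listed above as a detection method but has no configuration elsewhere in these notes. This is a sketch of how it could be enabled on an existing MEC. The port-channel number 20 is hypothetical, and as far as I know the port-channel has to be administratively down while the trust setting is changed.
[code]
! Assumption: po20 is a hypothetical PAgP MEC to a neighbor that supports enhanced PAgP.
VSS(config)# int po20
VSS(config-if)# shut
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# dual-active detection pagp
VSS(config-vs-domain)# dual-active detection pagp trust channel-group 20
VSS(config)# int po20
VSS(config-if)# no shut
! Confirm.
VSS# sh switch virtual dual-active pagp
[/code]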
Routing with VSS
Layer-3 MEC is the recommended design rather than ECMP.
Routing Protocols, Topology and Interaction
Link Failure Convergence
Path availability during link failure
Routing Protocol Interaction During Active Failure
[code]
VSS(config)# router ospf 100
VSS(config-router)# nsf cisco
VSS# sh ip ospf nsf
[/code]
Dual Active Detection (p. 4-29)
Fast-Hello
Configure fast-hello for dual-active detection. (p.4-29)
[code]
! Enable fast-hello under VSS global config.
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# dual-active detection fast-hello
!Enable fast-hello at the interface level
VSS(config)# int gi1/5/1
VSS(config-if)# no shut
VSS(config-if)# dual-active fast-hello
VSS(config)# int gi2/5/1
VSS(config-if)# no shut
VSS(config-if)# dual-active fast-hello
! Confirm fast-hello
VSS# sh switch virtual dual-active fast-hello
VSS# remote command standby-rp show switch virtual dual-active fast-hello
[/code]
Using Bidirectional Forwarding Detection
Configure BFD for dual-active Detection
[code]
VSS(config)# switch virtual domain 100
VSS(config-vs-domain)# dual-active pair interface gi1/5/1 int gi2/5/1 bfd
!
! Enable unique IP subnet and BFD interval on interfaces.
VSS(config)# int gi1/5/1
VSS(config-if)# ip add 192.168.1.1 255.255.255.0
VSS(config-if)# bfd interval 50 min_rx 50 multiplier 3
!
VSS(config)# int gi2/5/1
VSS(config-if)# ip add 192.168.2.1 255.255.255.0
VSS(config-if)# bfd interval 50 min_rx 50 multiplier 3
!
! The static route is automatically added.
! Confirm and monitor BFD.
VSS# sh switch virtual dual-active bfd
VSS# sh switch virtual dual-active summary
[/code]
Dual-Active Recovery
OSPF Tuning
[code]
VSS(config)# router ospf 100
VSS(config-router)# nsf
VSS(config-router)# auto-cost reference-bandwidth 20000
! Confirm OSPF
VSS# sh ip ospf neighbor detail
VSS# sh ip protocol
[/code]