Upon completion of this module, you will be able to:
Identify the components of an internetwork and explain the role of each component.
Describe the basic concepts behind the OSI model and how it facilitates information exchange on an internetwork.
Describe the basic concepts behind the TCP/IP model and how it facilitates information exchange on an internetwork.
Describe each layer of the TCP/IP model and the function it performs on an internetwork.
Describe each protocol in the TCP/IP suite of protocols and explain the services and applications that each provides.
Describe how TCP/IP provides connection-oriented and connectionless services, and list the applications best suited to each type of connection.
Explain how packets are routed on a TCP/IP network.
Describe ICMP router discovery standards, operation, and advertisement messages.
Describe the features and functionality offered by SNMP, management stations, managed devices, and MIBs.
Internetwork Example
An internetwork is a collection of individual networks connected by intermediate networking devices, which functions as a single large network.
Internetworking refers to the industry, products, and procedures that meet the challenge of creating and administering internetworks.
The diagram shows different kinds of network technologies that can be interconnected by routers and other networking devices to create an internetwork.
Local-Area Networks (LANs)
A local-area network (LAN) is a computer network that spans a small area. LANs can be confined to a single building or group of buildings. A LAN can also be connected to other LANs over any distance via telephone lines and radio waves. A system of LANs connected in this way is called a wide-area network (WAN).
LANs connect workstations and personal computers. Each individual computer (node) in a LAN has its own CPU with which it runs programs, but it is also able to access data and devices anywhere on the LAN. This means that many users can share expensive devices, such as laser printers, as well as data. Users can also use the LAN to communicate with each other by sending e-mail. There are many different types of LANs, with Ethernet being the most common for PCs.
The following characteristics differentiate one LAN from another:
topology : The geometric arrangement of devices on the network. For example, devices can be arranged in a ring or in a straight line.
protocols : The rules and encoding specifications for sending data. The protocols also determine whether the network uses a peer-to-peer or client/server architecture.
media : Devices can be connected by twisted-pair wire, coaxial cables, or fiber optic cables. Some networks do without connecting media altogether, communicating instead via radio waves.
Wide-Area Networks (WANs)
A wide-area network (WAN) is a computer network that spans a large geographical area. A WAN consists of two or more local-area networks (LANs).
Computers connected to a wide-area network are connected through public networks, such as the telephone system. However, they can also be connected through leased lines or satellites. Today, the largest WAN in existence is the Internet.
Intermediate Internetworking Devices (bridges, routers, switches)
Bridge
A bridge is a device that connects two local-area networks (LANs), or two segments of the same LAN. The two LANs being connected can be alike or dissimilar. A bridge can connect an Ethernet LAN with a Token-Ring network LAN. Unlike routers, bridges are protocol-independent. They only forward packets without analyzing and re-routing messages.
Router
A router is a device that connects any number of LANs. Routers use packet headers and a forwarding table to determine packet destinations.
Switch
A switch is a device that filters and forwards packets between LAN segments. Switches operate at the data link layer (layer 2) of the OSI Reference Model and therefore support any packet protocol. LANs that use switches to join segments are called switched LANs or, in the case of Ethernet networks, switched Ethernet LANs.
Routing on a TCP/IP network is performed by routers, which send information along a path (route) between two networks. This path can traverse one or more routers.
The router creates a logical path between networks as shown in the slide.
Role of IP and the IP Address
IP specifies the format of packets and the addressing scheme. IP is something like the postal system. It allows you to address a package and drop it in the system, but there's no direct link between you and the recipient.
The IP address is an identifier for a computer or device on a TCP/IP network. Networks using the TCP/IP protocol route messages based on the IP address of the destination.
The IP Packet header contains the sender’s source address and the receiver’s destination address. The source and destination addresses are 32-bit IP addresses.
The format of an IP address is a 32-bit numeric address written as four numbers separated by periods. Each number can be zero to 255. For example, 192.160.10.24 could be an IP address.
Within a single network, you can assign IP addresses at random as long as each one is unique. However, connecting a private network to the Internet requires that you use registered IP addresses (called Internet addresses) to avoid duplicates.
The four numbers in an IP address are used in different ways to identify a particular network and a host on that network.
The InterNIC Registration Service assigns Internet addresses from the following three classes.
Class A - supports 16 million hosts on each of 127 networks
Class B - supports 65,000 hosts on each of 16,000 networks
Class C - supports 254 hosts on each of 2 million networks
Given the increased number of Internet users, the supply of unassigned Internet addresses is running out. Therefore, a classless scheme called Classless Inter-Domain Routing (CIDR) is replacing the system based on classes A, B, and C, extending the life of the IPv4 address space until IPv6 is adopted.
With CIDR, a single IP address can be used to designate many unique IP addresses. A CIDR IP address looks like a normal IP address except that it ends with a slash followed by a number, called the IP prefix. For example:
172.200.0.0/16
The IP prefix specifies how many addresses are covered by the CIDR address, with lower numbers covering more addresses.
An IP prefix of /12, for example, can be used to address 4,096 former Class C addresses.
CIDR addresses reduce the size of routing tables and make more IP addresses available within organizations.
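The prefix arithmetic above can be checked with a short sketch. It uses Python's standard ipaddress module; the helper name former_class_c_count is a hypothetical name chosen for this illustration and is not part of any library.

```python
import ipaddress

# The example block from the text.
block = ipaddress.ip_network("172.200.0.0/16")
print(block.num_addresses)  # number of addresses covered by the /16

def former_class_c_count(prefix_len: int) -> int:
    """Number of /24 (former Class C) networks covered by one prefix.

    Hypothetical helper: each bit removed from a /24 doubles the count.
    """
    if not 0 <= prefix_len <= 24:
        raise ValueError("prefix length must be between 0 and 24")
    return 2 ** (24 - prefix_len)

print(former_class_c_count(12))  # a /12 covers this many former Class C networks
print(former_class_c_count(16))  # a /16 covers fewer, since the prefix is longer
```

A shorter prefix leaves more bits free, which is why lower prefix numbers cover more addresses.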
ARP and RARP
The Address Resolution Protocol (ARP) is a TCP/IP protocol used to convert an IP address into a physical address called a MAC address, such as an Ethernet address.
A host that wants to obtain a physical address broadcasts an ARP request onto the TCP/IP network. The host on the network that has the IP address in the request then replies with its physical hardware address.
There is also Reverse ARP (RARP), which can be used by a host to discover its IP address. The host broadcasts its physical address and a RARP server replies with the host's IP address.
Broadcast versus Point-to-Point
To broadcast means to send the same message to multiple recipients simultaneously, whereas point-to-point means to send a message to a single recipient only.
Importance of IP Addressing
The IP addressing scheme is integral to the process of routing IP data through an internetwork.
Two major scaling issues have emerged given the Internet’s continuous growth:
The depletion of IPv4 address space.
The ability to route traffic among the increasing number of networks making up the Internet.
The Classful IP addressing specification requires that each system attached to an IP-based Internet be assigned a unique, 32-bit Internet address.
Systems such as routers, which have interfaces to more than one network, must be assigned a unique IP address for each network interface.
The first part of an Internet address identifies the network on which the host resides.
The second part identifies the particular host on the given network.
This created the two-level addressing hierarchy.
Primary Address Classes
IP addressing supports the following three address classes: Class A, Class B, and Class C.
The first octet is the network portion of a Class A address. Therefore, the Class A address 10.1.20.1 has a major network address of 10. Octets 2, 3, and 4 (the next 24 bits) are reserved for the hosts. Note that Class A addresses are used for networks that have more than 65,534 hosts (up to 16,777,214 hosts).
The first two octets are the network portion of a Class B address. Therefore, the Class B address 172.16.100.202 has a major network address of 172.16. Octets 3 and 4 (the next 16 bits) are reserved for the hosts. Class B addresses are used for networks that have more than 254 and up to 65,534 hosts.
The first three octets are the network portion of a Class C address. The Class C address 193.18.9.30 has a major network address of 193.18.9. Octet 4 (the last 8 bits) is for hosts. Class C addresses are used for networks with no more than 254 hosts.
Dotted Decimal Notation
The 32-bit IP address is grouped 8 bits at a time. Each group of 8 bits is called an octet. The four octets are separated by periods (dots) and written in decimal format. This is known as dotted decimal notation.
Each bit in an octet has a binary weight (128, 64, 32, 16, 8, 4, 2, 1).
The minimum value for an octet is 0 (all bits set to 0).
The maximum value for an octet is 255 (all bits set to 1).
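The binary weights can be sketched in a few lines of Python. The function names here are hypothetical and chosen only for this illustration; the sample address is the one used earlier in this module.

```python
# Binary weight of each bit position in an octet, high-order bit first.
WEIGHTS = (128, 64, 32, 16, 8, 4, 2, 1)

def bits_to_octet(bits: str) -> int:
    """Add up the weights of the bits that are set to 1."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("an octet is exactly eight binary digits")
    return sum(weight for weight, bit in zip(WEIGHTS, bits) if bit == "1")

def octet_to_bits(value: int) -> str:
    """Render a decimal octet as eight binary digits."""
    if not 0 <= value <= 255:
        raise ValueError("an octet ranges from 0 to 255")
    return format(value, "08b")

def dotted_to_binary(address: str) -> str:
    """Convert dotted decimal notation to its dotted binary form."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError("an IPv4 address has four octets")
    return ".".join(octet_to_bits(int(octet)) for octet in octets)

print(bits_to_octet("00000000"), bits_to_octet("11111111"))
print(dotted_to_binary("192.160.10.24"))
```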
First Octet Rule
You can determine the class of an address by examining the first octet of the address.
The left-most (high-order) bits in the first octet indicate the network class.
For example, in the IP address 192.168.21.40, the first octet is 192. Because 192 falls between 192 and 223, our sample is a Class C address.
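As a minimal sketch, the first octet rule can be written as a test of the high-order bits. The function name address_class is hypothetical. First-octet values from 224 upward fall outside the three classes covered in this module, so the sketch simply reports them as "other".

```python
def address_class(address: str) -> str:
    """Classify an IPv4 address by the high-order bits of its first octet."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("first octet must be between 0 and 255")
    if first_octet & 0b10000000 == 0b00000000:   # 0xxxxxxx: 0 through 127
        return "A"
    if first_octet & 0b11000000 == 0b10000000:   # 10xxxxxx: 128 through 191
        return "B"
    if first_octet & 0b11100000 == 0b11000000:   # 110xxxxx: 192 through 223
        return "C"
    return "other"                               # 224 and above: not Class A, B, or C

for sample in ("10.1.20.1", "172.16.100.202", "193.18.9.30", "192.168.21.40"):
    print(sample, "is Class", address_class(sample))
```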
Limitations of Classful IP Addressing for the Internet
The IP addressing and routing table growth problems have their origins in the following early Internet design decisions:
The decision to limit the IPv4 IP address space to a 32-bit address.
Because the IP address space was perceived as unlimited, addresses were allocated based on an organization’s request rather than on the actual number of addresses it needed.
Although the Class A, B, and C Classful addressing scheme was easy to implement, it did not provide efficient allocation of the limited address space.
In the early days, organizations with a few hundred sites were allocated a single Class B address, rather than multiple Class C addresses. This resulted in the early depletion of the Class B address space, which leaves the Class C address space for medium-sized organizations.
IP Subnetting
All Classes of IP networks can be divided into smaller networks called subnetworks (or subnets).
Dividing the major class network is called subnetting.
Problems Solved with Subnetting
Subnetting gives network administrators the following benefits:
Subnetting provides extra flexibility, makes more efficient use of network addresses, and contains broadcast traffic because a broadcast will not cross a router.
Subnets are locally administered. Therefore, the outside world sees an organization only as a single network. External users and organizations have no detailed knowledge of the organization's internal network structure.
Subnet Address Hierarchy
A subnet address is created by "borrowing" bits from the host field and designating them as the subnet field.
The number of borrowed bits is variable and specified by the subnet mask.
The following figure shows how bits are "borrowed" from the host address field to create the subnet address field:
Subnet Example 1
An organization has been assigned the network number 193.1.1.0/24 and it needs to define six subnets. The largest subnet is required to support 25 hosts.
Defining the Destination Prefix Length
First, determine the number of bits required to define the six subnets. Since a network address can only be subnetted along binary boundaries, subnets must be created in blocks of powers of two [2 (2^1), 4 (2^2), 8 (2^3), 16 (2^4), etc.].
Therefore, you cannot define an IP address block such that it contains exactly six subnets. In this example, the network administrator must define a block of 8 (2^3) and have two unused subnets that can be reserved for future growth.
Extended-network-prefix
Since 8 = 2^3, three bits are required to enumerate the eight subnets in the block. Furthermore, our organization is subnetting a /24 so it needs three additional bits, or a /27, as the extended-network-prefix.
A 27-bit extended-network-prefix can be expressed in dotted-decimal notation as 255.255.255.224.
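The two calculations in this example, the number of borrowed bits and the dotted-decimal mask, can be sketched as follows. The helper names are hypothetical; the mask conversion relies on the netmask attribute of Python's standard ipaddress module.

```python
import ipaddress

def extended_prefix(base_prefix: int, subnets_needed: int) -> int:
    """Return the extended-network-prefix length for the requested subnets.

    The subnet count is rounded up to the next power of two, because
    subnets can only be created along binary boundaries.
    """
    if subnets_needed < 1:
        raise ValueError("at least one subnet is required")
    borrowed_bits = (subnets_needed - 1).bit_length()
    return base_prefix + borrowed_bits

def prefix_to_mask(prefix_len: int) -> str:
    """Express a prefix length as a dotted-decimal subnet mask."""
    return str(ipaddress.ip_network(f"0.0.0.0/{prefix_len}").netmask)

prefix = extended_prefix(24, 6)       # six subnets carved from a /24
print(prefix, prefix_to_mask(prefix))
```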
Subnet Example 1 (Extended-network-prefix)
A 27-bit extended-network-prefix has 5 bits to define host addresses on each subnet. Therefore, each subnetwork with a 27-bit prefix represents a contiguous block of 2^5 (32) individual IP addresses. However, since the all-0s and all-1s host addresses cannot be allocated, there are 30 (2^5 - 2) assignable host addresses on each subnet.
Defining Each Subnet Number
Each of the eight subnets will be numbered 0 through 7. The XXX₂ notation indicates the binary representation of the number. The 3-bit binary representations of the decimal values 0 through 7 are: 0 (000₂), 1 (001₂), 2 (010₂), 3 (011₂), 4 (100₂), 5 (101₂), 6 (110₂), and 7 (111₂).
To define Subnet #n, the network administrator places the binary representation of n into the bits of the subnet-number field. For example, to define Subnet #6, the network administrator simply places the binary representation of 6 (110₂) into the 3 bits of the subnet-number field.
The eight subnet numbers for this example are given below. In the base network line, the space marks the end of the /24 network prefix. In each subnet line, the space marks the end of the 27-bit extended-network-prefix, and the three bits immediately before the space are the subnet-number field:
Base Net:  11000001.00000001.00000001 .00000000 = 193.1.1.0/24
Subnet #0: 11000001.00000001.00000001.000 00000 = 193.1.1.0/27
Subnet #1: 11000001.00000001.00000001.001 00000 = 193.1.1.32/27
Subnet #2: 11000001.00000001.00000001.010 00000 = 193.1.1.64/27
Subnet #3: 11000001.00000001.00000001.011 00000 = 193.1.1.96/27
Subnet #4: 11000001.00000001.00000001.100 00000 = 193.1.1.128/27
Subnet #5: 11000001.00000001.00000001.101 00000 = 193.1.1.160/27
Subnet #6: 11000001.00000001.00000001.110 00000 = 193.1.1.192/27
Subnet #7: 11000001.00000001.00000001.111 00000 = 193.1.1.224/27
An easy way to check if the subnets are correct is to ensure that they are all multiples of the Subnet #1 address. In this case, all subnets are multiples of 32: 0, 32, 64, 96, and so on.
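The same subnet list, including the multiples-of-32 check, can be reproduced with a brief sketch based on Python's standard ipaddress module. It is offered as a way to verify the worked example, not as part of the original procedure.

```python
import ipaddress

base = ipaddress.ip_network("193.1.1.0/24")

# Borrow three host bits: a /24 becomes eight /27 subnets.
subnets = list(base.subnets(prefixlen_diff=3))

for number, subnet in enumerate(subnets):
    usable_hosts = subnet.num_addresses - 2      # exclude all-0s and all-1s hosts
    last_octet = int(subnet.network_address) & 0xFF
    assert last_octet % 32 == 0                  # every subnet is a multiple of 32
    print(f"Subnet #{number}: {subnet}  subnet bits={number:03b}  "
          f"usable hosts={usable_hosts}")
```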
Growth of the Internet
The Internet is the world’s largest public data network. It connects millions of users and businesses worldwide, and its growth has been driven by increased public awareness and the expanding popularity of the World Wide Web (WWW).
The ongoing technical advancements in local-area networking and networking hardware have also contributed to the Internet’s growth and its increased value to individual and corporate users.
The Internet is a global network connecting millions of computers. As of 1999, the Internet had more than 200 million users worldwide and reached more than 100 countries.
Growth of Internet Routing Tables
Another problem caused by the Internet’s expansion is the growth of Internet routing tables.
Internet backbone routers must maintain complete routing information for the entire Internet. As a result, the routing tables have grown exponentially over the past decade.
Additional factors related to router table capacity:
Increasing demand for CPU processing speed to handle routing table and topology updates
The dynamic nature of World Wide Web connections
Increased volume of diverse information
Although IP Next Generation (IPng, also known as IPv6) is the long-term solution to today’s Internet addressing and routing table limitations, the Internet community is not ready to make the investments required for the IPv4 to IPng migration and implementation.
Therefore, IPv4 has been modified to accommodate the Internet’s uninterrupted growth.
Classless Inter-Domain Routing (CIDR) Definition
Classless Inter-Domain Routing is a new IP addressing scheme that replaces the older system based on classes A, B, and C. With CIDR, a single IP address can be used to designate many unique IP addresses. A CIDR IP address looks like a normal IP address except that it ends with a slash followed by a number, called the IP prefix.
The following is an example of an IP address with CIDR notation:
172.100.0.0/16
The IP prefix specifies how many addresses are covered by the CIDR address, with lower numbers covering more addresses.
For example, an IP prefix of /12 can be used to address 4,096 former Class C addresses.
CIDR addresses reduce the size of routing tables and make more IP addresses available within organizations.
Implications of CIDR on the Router
In September 1993, CIDR was officially documented in RFCs 1517, 1518, 1519, and 1520. CIDR supports two important features that benefit the Internet routing system:
CIDR eliminates the traditional concept of Class A, Class B, and Class C network addresses. This enables the efficient allocation of the IPv4 address space which will allow the continued growth of the Internet until IPv6 is deployed.
CIDR supports route aggregation where a single routing table entry represents the address space of perhaps thousands of traditional classful routes. This allows a single routing table entry to specify how to route traffic to many individual network addresses.
Route aggregation controls the amount of routing information in the Internet's backbone routers, reduces route flapping (rapid changes in route availability), and eases the local administrative task of updating external routing information.
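Route aggregation can be illustrated with a small sketch. The prefixes below are hypothetical values chosen for this example only. The collapse_addresses function comes from Python's standard ipaddress module and merges contiguous, properly aligned blocks into the shortest equivalent list.

```python
import ipaddress

# Hypothetical example: eight contiguous /24 routes plus one unrelated /24.
routes = [ipaddress.ip_network(f"10.20.{third}.0/24") for third in range(16, 24)]
routes.append(ipaddress.ip_network("10.20.32.0/24"))

# Contiguous, aligned blocks merge into one entry; the unrelated
# route cannot be merged and stays as its own entry.
aggregated = list(ipaddress.collapse_addresses(routes))

print(f"{len(routes)} routes before aggregation, {len(aggregated)} after:")
for network in aggregated:
    print(" ", network)
```

Only routes that share a common prefix and sit on a binary boundary can be represented by a single entry, which is why address allocation and aggregation have to be planned together.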
Efficient Address Allocation
In a classful environment, an Internet Service Provider (ISP) can only allocate /8, /16, or /24 addresses. In a CIDR environment, the ISP can allocate a block of its registered address space that specifically meets the needs of each client, provides additional room for growth, and does not waste a scarce resource.
Let’s say an ISP has been assigned the address block 206.0.64.0/18. This block represents 16,384 (2^14) IP addresses, which can be interpreted as 64 /24s.
If a client requires 800 host addresses, rather than assigning a Class B address and wasting ~64,700 addresses, or four Class C addresses (and introducing 4 new routes into the global Internet routing tables), the ISP could assign the client the address block 206.0.68.0/22, a block of 1,024 (2^10) IP addresses (4 contiguous /24s).
The efficiency of this allocation is shown in the slide.
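The numbers in this allocation example can be verified with a short sketch. The helper smallest_prefix_for is a hypothetical name introduced for this illustration. It assumes that two addresses per block (all-0s and all-1s) are not assignable to hosts.

```python
import ipaddress

isp_block = ipaddress.ip_network("206.0.64.0/18")
client_block = ipaddress.ip_network("206.0.68.0/22")

def smallest_prefix_for(hosts: int) -> int:
    """Longest prefix whose block still holds the requested hosts.

    Assumes the all-0s and all-1s host addresses are reserved.
    """
    host_bits = (hosts + 1).bit_length()   # smallest h with 2**h >= hosts + 2
    return 32 - host_bits

print(isp_block.num_addresses)                        # addresses in the /18
print(len(list(isp_block.subnets(new_prefix=24))))    # number of /24s in the /18
print(smallest_prefix_for(800))                       # prefix length for 800 hosts
print(client_block.subnet_of(isp_block))              # client block lies inside the ISP block
print(len(list(client_block.subnets(new_prefix=24)))) # contiguous /24s in the client block
```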
CIDR Example 3 (Routing in a Classless Environment)
The slide shows the routing advertisements for an organization.
Because all of Organization A's routes are part of ISP #1's address block, the routes to Organization A are implicitly aggregated via ISP #1's aggregated announcement to the Internet.
Therefore, the eight networks assigned to Organization A are hidden behind a single routing advertisement. Using the longest match forwarding algorithm, Internet routers will route traffic to host 200.25.17.25 to ISP #1, which will in turn route the traffic to Organization A.
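The longest match forwarding algorithm can be sketched as follows. The slide's address blocks are not reproduced in this text, so the prefixes below are assumptions made for illustration: 200.25.0.0/16 for ISP #1 and 200.25.16.0/21 (eight /24s) for Organization A. The table contents and the function name longest_match are likewise hypothetical.

```python
import ipaddress
from typing import Dict, Optional

def longest_match(table: Dict[str, str], destination: str) -> Optional[str]:
    """Return the next hop of the most specific prefix containing the address."""
    address = ipaddress.ip_address(destination)
    best_network = None
    best_next_hop = None
    for prefix, next_hop in table.items():
        network = ipaddress.ip_network(prefix)
        if address in network and (
            best_network is None or network.prefixlen > best_network.prefixlen
        ):
            best_network, best_next_hop = network, next_hop
    return best_next_hop

# Assumed prefixes, for illustration only.
internet_router = {"200.25.0.0/16": "ISP #1"}
isp1_router = {
    "200.25.0.0/16": "ISP #1 internal",
    "200.25.16.0/21": "Organization A",
}

print(longest_match(internet_router, "200.25.17.25"))  # only the aggregate is visible
print(longest_match(isp1_router, "200.25.17.25"))      # the more specific /21 wins
```

A linear scan is used here for clarity. Production routers use specialized lookup structures, which this sketch does not attempt to model.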
The two basic functions of a router are route determination (sometimes called route processing) and packet forwarding (sometimes called network-layer switching).
The first step in route determination is to learn routes. A router may learn routes by several means, such as:
A network administrator can manually tell the router about routes (static routes).
Neighboring routers can tell the router about routes via some common routing protocol (dynamic routes).
The router can use known information, such as directly attached networks.
All of the learned routes are entered into the routing table, a database of known routes.
Frequently, the routing table contains more than one route to a single destination. The router must apply a selection process to determine the best or most preferred route. This route is entered into the forwarding table, a database of routes the router is actively using to forward packets.
When the router receives a packet, it examines the destination address in the IP header. The router then examines the forwarding table, and tries to find a route to that destination. If a route is found, the packet is forwarded out the specified interface. If no route is found, the packet is dropped.
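The relationship between the routing table and the forwarding table can be sketched as shown below. All routes, interface names, and preference values are hypothetical, and the rule that the lowest preference value wins is an assumption made for this illustration rather than a description of a particular router's selection process.

```python
import ipaddress
from typing import Dict, List, NamedTuple, Optional

class Route(NamedTuple):
    prefix: str
    next_hop: str
    interface: str
    source: str       # how the route was learned: direct, static, or dynamic
    preference: int   # assumed ranking: the lowest value is preferred

# Routing table: every learned route, possibly several per destination.
routing_table: List[Route] = [
    Route("192.0.2.0/24", "direct", "if0", "direct", 0),
    Route("198.51.100.0/24", "192.0.2.1", "if1", "static", 5),
    Route("198.51.100.0/24", "192.0.2.2", "if2", "dynamic", 10),
]

def build_forwarding_table(routes: List[Route]) -> Dict[str, Route]:
    """Keep only the most preferred route for each destination prefix."""
    best: Dict[str, Route] = {}
    for route in routes:
        current = best.get(route.prefix)
        if current is None or route.preference < current.preference:
            best[route.prefix] = route
    return best

def forward(forwarding_table: Dict[str, Route], destination: str) -> Optional[str]:
    """Return the outgoing interface, or None when the packet is dropped."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in forwarding_table.values()
               if address in ipaddress.ip_network(route.prefix)]
    if not matches:
        return None   # no route found: the packet is dropped
    chosen = max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)
    return chosen.interface

forwarding_table = build_forwarding_table(routing_table)
print(forward(forwarding_table, "198.51.100.7"))
print(forward(forwarding_table, "203.0.113.9"))
```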
A routing algorithm, as distinct from a forwarding algorithm, is responsible for obtaining destination prefixes (along with their metrics and other attribute information) from incoming dynamic updates as well as other sources, and extracting a subset of the best routes to all reachable destinations. The next-hop addresses of the routers that make those destinations available, along with the local port through which they are reached, constitute the essential parts of the forwarding table. The forwarding table can then be consulted to make forwarding decisions for packets.
This forwarding table shows the destination prefix, the next-hop address, and the outgoing interface (Netif). Other information shown in the table describes how the route was learned (Type and RtRef), the type of next hop (the second Type column), and the number of routes pointing to the next hop (NhRef).
An IP router does not need all the information in a packet’s IP header to forward the packet. In fact, the router probably does not even need the entire destination IP address. Conceptually, a router classifies all packets according to the next hop to which the packets are forwarded. It does not care about any more specific packet characteristics. Stating this in another way, the router interprets every packet as belonging to a Forwarding Equivalency Class (FEC), where each FEC is the set of packets forwarded on a particular route.
In this forwarding table, the NhRef of the first two entries indicates that there are a total of 74,212 routes using the next-hop address 10.100.76.254, reachable via interface fxp0.0. In other words, 74,212 routes belong to the same FEC. The significance of Forwarding Equivalency Classes becomes apparent later in this class, when you examine longest-match route lookups.
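As a rough sketch of the idea, forwarding entries can be grouped by next hop and outgoing interface, with each group forming one FEC. The next hop 10.100.76.254 and interface fxp0.0 are taken from the text; every prefix and the second next hop are hypothetical values made up for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (destination prefix, next-hop address, outgoing interface); prefixes are hypothetical.
entries: List[Tuple[str, str, str]] = [
    ("0.0.0.0/0", "10.100.76.254", "fxp0.0"),
    ("172.16.0.0/12", "10.100.76.254", "fxp0.0"),
    ("192.168.0.0/16", "10.100.76.254", "fxp0.0"),
    ("10.250.0.0/16", "10.250.0.1", "so-0/0/0.0"),
]

def group_by_fec(table: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group destination prefixes that share the same next hop and interface."""
    classes: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for prefix, next_hop, interface in table:
        classes[(next_hop, interface)].append(prefix)
    return dict(classes)

fecs = group_by_fec(entries)
for (next_hop, interface), prefixes in fecs.items():
    # The count of prefixes per group plays the role of NhRef in the table above.
    print(f"{next_hop} via {interface}: {len(prefixes)} routes")
```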
So, faced with a number of possible next-hop options for a particular prefix, how does a router go about deciding which to use? For Interior Gateway Protocols, the choice is usually made according to a metric – a value that somehow indicates the “goodness” of a route. That number may represent some quality on which the network is required to optimize, such as smallest hop count, highest bandwidth path, or lowest delay.
In practice, however, the use of these numbers dynamically is a difficult problem. It is difficult to continually update changing conditions on large networks, and fluctuating values can lead to the continual re-optimization of paths, which injects instability into the routing system. Therefore, modern IGPs like OSPF and IS-IS tend towards administratively configured per-interface metrics. The job of the routing protocol is then to simply spread the metric information around so that all routers have access to it in their calculations.
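To illustrate how administratively configured metrics feed the route calculation, the sketch below runs a shortest-path-first (Dijkstra) computation, the kind of calculation link-state IGPs such as OSPF and IS-IS perform. The four-router topology, the router names, and the metric values are all hypothetical.

```python
import heapq
from typing import Dict, Optional, Tuple

# Hypothetical topology: router -> {neighbor: configured interface metric}.
links: Dict[str, Dict[str, int]] = {
    "A": {"B": 10, "C": 5},
    "B": {"A": 10, "D": 1},
    "C": {"A": 5, "D": 20},
    "D": {"B": 1, "C": 20},
}

def lowest_metric_paths(
    topology: Dict[str, Dict[str, int]], source: str
) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Return the total metric and the first hop toward every reachable router."""
    cost: Dict[str, int] = {source: 0}
    first_hop: Dict[str, str] = {}
    queue: list = [(0, source, None)]
    while queue:
        distance, node, hop = heapq.heappop(queue)
        if distance > cost.get(node, float("inf")):
            continue   # stale queue entry: a cheaper path was already found
        for neighbor, metric in topology[node].items():
            new_cost = distance + metric
            if new_cost < cost.get(neighbor, float("inf")):
                cost[neighbor] = new_cost
                # Directly attached neighbors are their own first hop.
                first_hop[neighbor] = neighbor if node == source else hop
                heapq.heappush(queue, (new_cost, neighbor, first_hop[neighbor]))
    return cost, first_hop

cost, first_hop = lowest_metric_paths(links, "A")
print(cost)
print(first_hop)
```

Because the metrics are configured rather than measured, the result changes only when the topology or the configuration changes, which avoids the continual re-optimization described above.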
Juniper Networks Role in the Internet
Subsequent pages discuss the following topics:
Networking hardware evolution: How did routers evolve from a single CPU architecture?
Juniper Networks—the company: What are the business, mission, market, and products of Juniper Networks?
Juniper Networks M-series and T-series product line: What is the Juniper Networks M-series and T-series product line, and how do the products compare?
General-Purpose Computers
The first routers were general-purpose computers with a single CPU, RAM, and an operating system. These early routers typically supported a mix of LAN and low-speed serial interfaces. Juniper Networks took this hardware architecture and modified it to include a purpose-built operating system and a hardware-based switching engine capable of keeping pace with increases in optical bandwidth.
Networking Advancements
Networking and PC technology advancements spawned the growth of ever more PCs being attached to an increasing number of networks. These PCs ran applications with increasing application bandwidth requirements. Sophisticated software developers were able to keep this pace for a period of time; however, the single-CPU router architecture could not.
Breaking Tradition
To accommodate these technology trends, Juniper Networks began modifying the hardware architecture. The availability of application-specific integrated circuits (ASICs) allowed functions previously performed by software to be executed by hardware. The custom ASICs designed by Juniper Networks, combined with the purpose-built JUNOS software, have led to industry-leading forwarding and convergence characteristics.
The general design of Juniper Networks routers reflects the realization that the problem of Internet control, which involves active route selection through the operation of various routing protocols, was as complex as achieving wire-rate performance for all packet sizes in the presence of advanced services. Separating the router’s control and forwarding planes allows these orthogonal problems to be tackled with a best-of-breed solution.
Business
Juniper Networks, Inc. converts bandwidth into scalable, differentiable IP services using a new class of integrated silicon- and software-based routing systems. Juniper Networks products are designed and built by industry experts who have broad experience in IP infrastructure applications, as well as in-depth understanding of the future direction of IP infrastructures.
Juniper Networks does not simply make and sell routers. Juniper Networks is in the market of selling solutions to complex packet processing problems. These solutions allow providers to create and deploy services for additional revenue.
Mission
The Juniper Networks mission is to be the primary supplier of scalable, reliable, high-performance IP systems for the new IP infrastructure.
Market
Juniper Networks supplies systems to numerous worldwide markets that provide high-speed IP services with both core and edge solutions. Applications include backbone bandwidth management, multiservices content and Web hosting, public and private peering, and high-speed access. Customers include many of the world's leading service providers, including Cable & Wireless plc, VERIO, Inc., NTT/Verio, Inc, Verizon, Inc., Bell South, Inc., and UUNET, an MCI WorldCom company.
Juniper Networks Product Positioning
The network of today’s service providers is typically made up of two major components: the network edge and the network core. These two components operate differently and have different network device requirements and application focuses.
The consumer network edge is normally associated with a large number of broadband remote access servers (B-RAS) that support large numbers of low- to medium-speed subscriber interfaces using a variety of physical link layer technologies, such as DSL, ATM, Frame Relay, and dedicated access links based on T1/E1 and T3/E3 technology.
Edge devices often rely on simple static routing and might provide security and class-of-service (CoS) related features as needed. In most cases network edge applications are served by the E-series family of edge routers. Although smaller M-series routers can be deployed to address dedicated access business edge applications as needed, E-series products are specifically designed to meet the needs of large-scale B-RAS applications.
In contrast, the network core is often associated with a smaller number of routers supporting far fewer interfaces that operate at much higher speeds. These high-speed interfaces are typically based on SONET technology and act to aggregate the data from large numbers of individual subscriber lines for efficient long-haul transport. Core routers almost always run dynamic routing protocols, both for internal routing (IGP) and external routing (EBGP), and might also deploy Multiprotocol Label Switching (MPLS) for traffic engineering and VPN-related applications. Core routers might also provide CoS, and in some cases, security-related features.
Network core applications are normally served by M-series and T-series routing platforms.
High-Performance Remote and Dedicated Access Platforms
Juniper Networks acquired the E-series family of edge routers as a result of the acquisition of Unisphere Networks in mid-2002. The E-series family of hardware platforms brings high-performance broadband remote access server (B-RAS) capabilities to the Juniper Networks portfolio.
E-series Edge Router Curriculum
Due to the differences in hardware, command-line interface (CLI), and their typical edge-market application, coverage of E-series edge router technology is beyond the scope of this class. The material in this course covers the operation and configuration of Juniper Networks M-series and T-series platforms only.
For more information on E-series platform-related curriculum offerings, go to: http://www.juniper.net/training.
The M-series and T-series Product Line: Part 1
The key application driving the new IP infrastructure is the Internet, which continues to grow despite recent market downturns and various dot-bomb explosions. The Internet has moved from a convenience to a mission-critical platform for conducting and succeeding in business. As reliance on the Internet grows, so do customer expectations for value-added services that require increased bandwidth. There is little doubt that these rising demands for new services and for the Internet will set the standard for the emerging IP infrastructure.
Juniper Networks delivers a family of router platforms that provide industry-leading performance with solutions that offer high availability, scalability in multiple dimensions, market-leading port density, and flexible control over traffic to achieve optimal bandwidth efficiency. This Juniper Networks M-series and T-series product line currently consists of the M5/M10, M7i/M10i, M20, M40, M40e, M160, and T320 routers, and the T640 routing node. All these platforms run a common JUNOS Internet software image, and all are based on purpose-built ASICs for packet forwarding, including the Internet Processor II ASIC, which has the unique ability to provide enhanced services on all interfaces without compromising performance.
Platform highlights include:
The M40 Internet router provides more than 40-Gbps aggregate throughput and supports up to 32 PICs per chassis.
The M20 Internet router provides an aggregate throughput of 20+ Gbps and supports up to 16 PICs per chassis.
The M10 Internet router supports up to eight PICs per chassis with an aggregate throughput of 10+ Gbps.
The M5 Internet router supports up to four PICs per chassis with an aggregate throughput of 5+ Gbps.
The M-series and T-series Product Line: Part 2
The Juniper Networks product portfolio continues to grow with the release of ever-faster and higher-density routing platforms. A significant aspect of the M-series and T-series product line is that all platforms run a common software image with support for all features. This is significant when you consider that one router vendor currently has 5,800 images for the 2600/3600/3700 family of products alone!
Recent platform offerings include:
The T320 router is based on T640 ASIC technology and features a 1/3 rack footprint with an impressive 160 Gbps of throughput (320 Gbps aggregate).
The T640 Internet routing node is Juniper Networks' largest router, offering a throughput of 320+ Gbps (640 Gbps aggregate) and support for up to 128 OC-48/STM-16, 32 OC-192/STM-64, or 128 Gigabit Ethernet ports.
The M160 Internet router offers an aggregate throughput of 160+ Gbps. It supports up to 32 OC-48c/STM-16 PICs or up to eight OC-192c/STM-64 PICs per chassis.
The M40e Internet router delivers M40 throughput while meeting the needs of the edge aggregation market through its redundancy features and support of high-density PICs, such as the 48-port Fast Ethernet PIC.
The M7i and M10i platforms offer a migration path/alternative for customers using the previously released M5 and M10 platforms. Both platforms feature support for existing M5/M10 PICs. The M7i supports a Fixed Interface Card (FIC) with Ethernet connectivity for added value and flexibility as well as an integrated Tunnel Services PIC; an integral Adaptive Services PIC, which is in addition to the Tunnel PIC, is an optional CFEB upgrade. The M10i, on the other hand, offers RE and CFEB redundancy but does not support the FIC or integral services PIC. At only two rack units, up to 21 M7is can be mounted in a single 19-inch rack!
M-series and T-series Hardware Overview
We discuss the following topics in subsequent pages:
General platform architecture: The architecture of Juniper Networks M-series and T-series platforms is designed to separate the equally complex problems of control and packet forwarding.
Hardware overview: The unique roles of the Routing Engine (RE) and the Packet Forwarding Engine (PFE) are fully explored, as are the components that constitute the Packet Forwarding Engine on M-series and T-series platforms.
Craft Interface: The function and operation of the Craft Interface, including the LCD panel when so equipped, is described.
Typical Field Replaceable Units: The Field Replaceable Units (FRUs) associated with a typical Juniper Networks routing platform are discussed.
M-series and T-series Summary: A summary of the differences between the various M-series and T-series platforms currently supported by Juniper Networks is provided in table form.
Architectural Philosophy
Architecturally, all Juniper Networks M-series and T-series platforms share a common design that separates the router's control and forwarding planes. To this end, all M-series and T-series platforms consist of two major components:
The Routing Engine (RE): The RE is the brains of the platform; it is responsible for performing routing updates and system management. The RE runs various protocol and management software processes that live inside a protected memory environment. The RE is a general-purpose computer platform based on an Intel microprocessor. The RE is connected to the PFE through an internal 100-Mbps connection.
The Packet Forwarding Engine (PFE): The PFE is responsible for forwarding transit packets through the router using an ASIC-based switching path. The PFE is a high-performance switch capable of forwarding up to 160 Mpps, in the case of the M160 platform, for all packet sizes. By adding a cross-bar switching fabric between multiple PFEs, T-series platforms can achieve up to 640 Mpps of forwarding capacity in a single chassis!
Because this architecture separates control operations—such as routing updates and system management—from packet forwarding, the router can deliver superior performance and highly reliable Internet operation.
The simple fact that you can enable enhanced services without significantly impacting forwarding rates or system stability is a testament to the validity of M-series and T-series architecture.
Routing and Forwarding Table Interaction
The JUNOS software routing protocol process implements the various routing protocols that can be run on the router. The routing protocol process starts all configured routing protocols and handles all routing messages. The routing protocol daemon (rpd) maintains one or more routing tables, which consolidate the routing information learned from all routing protocols into common tables. From this routing information, the routing protocol process determines the active routes to network destinations and installs these routes into the Routing Engine’s forwarding table (FT).
Routing Engine and Packet Forwarding Engine Synchronization
The Packet Forwarding Engine (PFE) receives the forwarding table from the Routing Engine. The Packet Forwarding Engine’s FT and the Routing Engine’s FT are kept synchronized over the 100-Mbps fxp1 Ethernet link, which interconnects the two entities. This synchronization ensures that a change in topology produces identical FTs in the Routing Engine and Packet Forwarding Engine. FT updates are a high priority for the JUNOS software kernel and are performed incrementally. The router’s FT is large enough to hold over 800,000 entries. Thus, entries are never aged out of the FT to make room for new entries, or because they have not been recently used. This behavior ensures that packets are switched in hardware not software, as can happen on the platforms of other vendors when FT entries are aged out of line cards, forcing process-based switching of packets destined to those addresses.
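The incremental synchronization described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the ForwardingTable class, the apply_update method, the update tuple format, and the interface names are hypothetical and do not represent actual JUNOS kernel or PFE data structures.

```python
# Illustrative model only: names and message format are assumptions,
# not actual JUNOS kernel or PFE data structures.

class ForwardingTable:
    """A prefix -> next-hop map that is changed one entry at a time."""

    def __init__(self):
        self.entries = {}

    def apply_update(self, op, prefix, next_hop=None):
        # Apply a single incremental change instead of reloading the table.
        if op == "add":
            self.entries[prefix] = next_hop
        elif op == "delete":
            self.entries.pop(prefix, None)
        else:
            raise ValueError(f"unknown operation: {op}")


def synchronize(re_ft, pfe_ft, updates):
    """Apply each update to the RE copy, then mirror it to the PFE copy."""
    for update in updates:
        re_ft.apply_update(*update)
        pfe_ft.apply_update(*update)  # stands in for the transfer over fxp1


re_ft, pfe_ft = ForwardingTable(), ForwardingTable()
synchronize(re_ft, pfe_ft, [
    ("add", "10.0.0.0/8", "so-0/0/0"),
    ("add", "192.168.1.0/24", "fe-0/1/0"),
    ("delete", "10.0.0.0/8"),
])
print(re_ft.entries == pfe_ft.entries)
```

Because every change is applied to both copies in the same order, the two tables stay identical after each topology change, and no entry is removed unless a delete is received.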
JUNOS Software
The primary copy of JUNOS software resides on the flash memory of the router. A backup copy is available on the hard drive when you issue a request system snapshot command.
Routing Engine Intelligence
The Routing Engine handles all the routing protocol processes as well as other software processes that control the router’s interfaces, a few of the chassis components, system management, and user access to the router. These routing and software processes run on top of a kernel that interacts with the Packet Forwarding Engine. All routing protocol packets from the network are directed to the Routing Engine.
Command-Line Interface
The Routing Engine provides the command-line interface (CLI). The CLI runs on top of the kernel; it is controlled by the management daemon (mgd). We examine the CLI and its features in more detail in a subsequent module.
Packet Forwarding Engine Management
The Routing Engine controls the Packet Forwarding Engine by providing an accurate and up-to-date forwarding table and by downloading microcode and managing software daemons that live in the Packet Forwarding Engine’s microcode. The RE receives hardware and environmental status messages from the PFE and acts upon them as appropriate.
Routing Engine Specifications
Juniper Networks has periodically released updated Routing Engines (REs) to enhance the performance of a given routing platform. For example, you can replace the original 233-MHz RE that shipped with the M40 router (RE-M40) with a higher-performance RE-333 or RE-600 when you want increased memory and processor speed. The RE-333 and RE-600 are also known as RE2 and RE3, respectively, in the output of a show chassis hardware command.
At the time of this writing, the latest RE enhancement is reflected in the RE-600, which features a 600-MHz clock rate and support for up to 2 GB of RAM. The RE-600 is now standard issue on all M-series and T-series platforms except the M7i and M10i routers; upgrades are available for systems that originally shipped with the RE-333 or RE-M40.
The new M7i and M10i platforms make use of a specially designed RE model that is not based on a compact PCI platform. Juniper Networks chose the particulars of the RE-400 (RE5) to reduce platform costs (Celeron processor, optional compact flash and PCMCIA card support) while matching the processing requirements of the relatively small M7i and M10i routers. The RE-400 is only supported in the M7i and M10i platforms. Originally, the RE-600 shipped with 128 MB of flash memory. New RE-600 units now ship with 256 MB of flash memory to better accommodate the new partition scheme used in JUNOS software Release 6.x. Juniper Networks will upgrade RE-600 units returned for maintenance to 256 MB of flash storage when needed.
Custom ASICs
ASICs enable the router to achieve data forwarding rates that match current fiber-optic capacity. Such high forwarding rates are achieved by distributing packet processing tasks across highly integrated ASICs. As a result, Juniper Networks M-series and T-series platforms do not require a general purpose processor for packet forwarding; this makes process switching (the software-based handling of packet forwarding) an alien concept for Juniper Networks routers. The custom ASICs provide enhanced services and features, such as multicast, CoS/queuing, and firewall filtering in hardware so that you can enable services on production routers without concern of significant performance hits.
Divide-and-Conquer Architecture
Each ASIC provides a piece of the forwarding puzzle, allowing a single ASIC to perform its specific task optimally. We examine the role that each M-series and T-series ASIC plays in packet forwarding in the following slides.
Physical Interface Cards
Juniper Networks M-series platforms provide a complete range of fiber optic and electrical transmission interfaces to the network through a variety of Physical Interface Cards (PICs). These space-efficient modules offer exceptional flexibility and high port density.
Flexible PIC Concentrator
Flexible PIC Concentrators (FPCs) house the PICs and provide shared memory for the M-series switch fabric. These intelligent, high-performance interface concentrators allow you to mix and match PIC types within a given FPC.
System Control Boards
On M-series platforms, the System Control Boards provide the route lookup component of the Packet Forwarding Engine using the Internet Processor II ASIC. Each System Control Board on M-series routers provides the same function, despite their having different names. On the M5 and M10 routers, the FPC and control board components are combined onto a single board called the Forwarding Engine Board (FEB). The M7i and M10i combine the same functionality into a smaller version of the FEB called a Compact FEB (CFEB).
The M-series System Control Board (FEB/CFEB, SSB, or SCB) also houses the buffer management ASICs on all models except the M40.
The Midplane
The system midplane is the component of the Packet Forwarding Engine that distributes power and electrical signals to each card in the system. The midplane is passive in all M-series routers except the M40. The M40’s midplane houses the Distributed Buffer Manager ASICs.
The M40e and M160 Platforms
The M160 and the M40e platforms (the latter being a scaled down version of the M160) differ from the other M-series platforms in a number of ways. For example, the route lookup function associated with an M20’s SSB is now performed by the Switch Fabric Module (SFM). Differences between the original M-series platforms and the M40e and M160 platforms include:
Route lookup: On M40e and M160 platforms, the Internet Processor II route lookup ASIC is housed on the Switch Fabric Module (SFM). An M160 system supports a total of four SFMs. Each SFM is capable of performing 40 million packet lookup operations per second. The presence of four SFMs yields the 160-Mpps capacity of the M160. The failure of an SFM gracefully reduces total system throughput by approximately 25%. The M40e supports a total of two SFMs, with only one SFM active at any given time. The failure of the active SFM results in the automatic switchover to the spare SFM when so equipped.
System control: The Miscellaneous Control Subsystem (MCS) card works with the active Routing Engine to provide control and monitoring functions for the various components in the chassis.
PFE clock generation: The active host module on an M40e or M160 platform requires a PFE Clock Generator (PCG) to provide a 125-MHz signal that clocks the various gates in the PFE complex. Redundant PCGs are supported.
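The route lookup figures in the list above follow from simple arithmetic, which the Python sketch below reproduces. The 40-Mpps per-SFM rate and the four-SFM count come from the text; the function name is hypothetical.

```python
SFM_LOOKUP_MPPS = 40  # per-SFM lookup rate stated in the text


def lookup_capacity_mpps(active_sfms):
    """Total route lookup capacity for the given number of active SFMs."""
    return active_sfms * SFM_LOOKUP_MPPS


full = lookup_capacity_mpps(4)      # M160 with all four SFMs
degraded = lookup_capacity_mpps(3)  # M160 after one SFM failure
loss_percent = 100 * (full - degraded) / full
print(full, degraded, loss_percent)
```

With four SFMs the result is 160 Mpps; losing one SFM leaves 120 Mpps, which is the 25% reduction mentioned above.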
Physical Interface Cards
As with the M-series routers, PICs provide the T-series PFE with a large range of fiber optic and electrical transmission interfaces to the network.
The T-series PFE
T-series platforms implement either one or two complete PFE complexes on each FPC. On the T640 platform, a single PFE is present on the FPC2, while two PFEs are present on the FPC3. The latter FPC type is designed specifically for native T-series PICs. Packets that ingress and egress on the same PFE complex (for example, on PICs 0 and 1, or on PICs 2 and 3, of a given FPC) do not leave that PFE. Packets are switched between PFEs across the T-series switch fabric as needed.
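As a rough illustration of the PIC-to-PFE relationship just described, the following Python sketch decides whether a packet stays on one PFE or must cross the switch fabric. The function names are hypothetical, and the sketch assumes that every FPC in the comparison carries the same number of PFEs and that PICs are paired 0-1 and 2-3 as in the example in the text.

```python
PICS_PER_FPC = 4  # the text's example uses PIC slots 0 through 3


def pfe_index(pic_slot, pfes_on_fpc):
    """Map a PIC slot to the PFE that serves it on its FPC."""
    if pic_slot not in range(PICS_PER_FPC):
        raise ValueError("PIC slot must be 0-3")
    if pfes_on_fpc not in (1, 2):
        raise ValueError("an FPC carries one or two PFEs")
    # With two PFEs, PICs 0-1 map to PFE 0 and PICs 2-3 map to PFE 1.
    return pic_slot // (PICS_PER_FPC // pfes_on_fpc)


def crosses_switch_fabric(ingress, egress, pfes_on_fpc=2):
    """ingress and egress are (fpc_slot, pic_slot) tuples."""
    in_pfe = (ingress[0], pfe_index(ingress[1], pfes_on_fpc))
    out_pfe = (egress[0], pfe_index(egress[1], pfes_on_fpc))
    return in_pfe != out_pfe


print(crosses_switch_fabric((0, 0), (0, 1)))  # same PFE on an FPC3
print(crosses_switch_fabric((0, 1), (0, 2)))  # different PFEs, same FPC3
```

The first call reports that the packet stays local; the second reports a fabric crossing even though both PICs sit on the same FPC.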
The T-series Switch Fabric
T-series platforms make use of a shared memory switch fabric for intra- and inter-FPC/PFE communications. In addition, inter-FPC communication requires transit of the T-series cross-bar switch fabric, which is instantiated by the system’s Switch Interface Boards (SIBs). The T320 can support up to three SIBs, while the T640 supports five. In the case of the T640, four SIBs provide the necessary speedup for a nonblocking architecture. The fifth SIB is used only in the event of a SIB failure. The system’s throughput gracefully degrades in the unlikely event of multiple SIB failures. In normal operation, the T320 router makes use of SIBs 1 and 2, with SIB 0 functioning as a standby. In the event of a SIB failure, SIB 0 automatically becomes active. There might be a slight performance degradation when using SIB 0 because each FPC has only one high-speed line (HSL) to SIB 0 (two HSLs interconnect the FPCs to each of SIB 1 and SIB 2).
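The T320 standby behavior can be sketched in a few lines of Python. This is a simplified model, not platform software: the function name is hypothetical, and the handling of more than one simultaneous failure is an assumption, because the text describes only the single-failure case for the T320.

```python
def t320_active_sibs(failed=()):
    """Return the SIBs carrying traffic on a T320 for a given set of failures.

    SIBs 1 and 2 are active in normal operation; SIB 0 is the standby and
    becomes active when one of them fails.
    """
    failed = set(failed)
    active = [sib for sib in (1, 2) if sib not in failed]
    if len(active) < 2 and 0 not in failed:
        active.append(0)  # standby SIB 0 takes over
    return sorted(active)


print(t320_active_sibs())            # normal operation
print(t320_active_sibs(failed=[2]))  # SIB 2 has failed
```

In the failure case the result includes SIB 0, which is where the slight performance degradation noted above can occur, since SIB 0 has one HSL per FPC rather than two.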
The Midplane
The midplane distributes power and electrical signals to the components and cards that make up the PFE and switch fabric.
PIC Overview
PICs provide the physical connection to various network media types. PICs receive incoming packets from the network and transmit outgoing packets to the network. During this process, each PIC performs appropriate framing and signaling for its media type. Before transmitting outgoing data packets, the PIC adds media-specific framing to the packets received from the FPCs. You can install up to four PICs into slots on each FPC. PIC types can be intermixed within the same FPC. The number of ports on a given PIC varies with the PIC and platform type. As of this writing, M40e PICs are available with as many as 48 Fast Ethernet ports!
IP Services PICs enable a hardware assist for complex packet processing functions. Examples include the Tunnel Services and Multilink Services PICs. With the Tunnel Services PIC, routers can function as the ingress or egress point of an IP-IP unicast tunnel, a generic routing encapsulation (GRE) tunnel, or a Protocol Independent Multicast-Sparse Mode (PIM-SM) tunnel. The Multilink PIC uses the Multilink Point-to-Point Protocol (MLPPP) and Multilink Frame Relay (MLFR, FRF.15) to group up to eight T1 or E1 links per bundle to yield a service offering ranging from 1.5 Mbps through 12 Mbps (T1) or 2 Mbps through 16 Mbps (E1).
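The multilink bandwidth ranges quoted above are rounded from the standard T1 and E1 line rates (1.544 Mbps and 2.048 Mbps). The Python sketch below shows the arithmetic; the function name is hypothetical, and protocol overhead is ignored.

```python
LINK_RATE_MBPS = {"T1": 1.544, "E1": 2.048}  # standard line rates
MAX_LINKS_PER_BUNDLE = 8                     # limit stated in the text


def bundle_rate_mbps(link_type, links):
    """Raw bandwidth of a multilink bundle, ignoring protocol overhead."""
    if not 1 <= links <= MAX_LINKS_PER_BUNDLE:
        raise ValueError("a bundle groups one to eight links")
    return round(LINK_RATE_MBPS[link_type] * links, 3)


for link_type in ("T1", "E1"):
    print(link_type, bundle_rate_mbps(link_type, 1), bundle_rate_mbps(link_type, 8))
```

An eight-link T1 bundle works out to roughly 12.35 Mbps and an eight-link E1 bundle to roughly 16.38 Mbps, which the text rounds to 12 Mbps and 16 Mbps.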
Custom ASIC
Each PIC is equipped with an ASIC that performs control functions tailored to the PIC’s media type. For instance, an ATM PIC and a Fast Ethernet PIC each contain a unique ASIC suited to the particulars of each medium.
PIC Status
Each PIC supports one or more status LEDs that accommodate quick verification of the PIC, and in some cases, the port’s operational status.
Hot-Pluggable in Most Platforms
You can replace or install PICs without removing the associated FPC on all platforms except the M20 and M40. On these routers you must remove the host FPC, which is hot-removable and hot-insertable, to gain access to the PIC’s mounting screws.
General FPC Characteristics and Features
FPCs install into the backplane from the front of the chassis. You can install an FPC into any FPC slot; a specific order is not required. If an FPC does not occupy a slot, you must install a blank FPC carrier to shield the empty slot so that cooling air can circulate properly through the card cage. FPCs can support from one to four PICs, depending upon specifics. For example, an OC-192 interface on an M160 platform has an FPC built around it because the high-speed interface consumes all four PIC positions. This yields the lowest possible density of one PIC/FPC. Most FPCs support four PIC connectors; the current exception is the T320, which supports two PIC connectors per FPC.
When you install an FPC into a running system, the FPC requests its operating software from the Routing Engine, the FPC runs its diagnostics, and the PICs on the FPC slot are enabled. FPCs are hot-insertable and hot-removable on all platforms except the M5/M10 and the M7i/M10i. This is because these routers have FPCs that are combined with the system board components to create a Forwarding Engine Board/Compact Forwarding Engine Board (FEB/CFEB). On the M5 and M10 platforms, the FEB is not hot-pluggable/insertable. You must remove power before inserting or removing the FEB. The CFEB on the M7i and M10i is hot-insertable, however.
Note that when you remove or install an FPC on an M-series platform, the system must repartition the shared memory pool; this process results in about 200 milliseconds of disruption to all packets associated with the affected PFE. T-series platforms contain one or two complete PFEs on each FPC, and therefore packet forwarding on one FPC is not affected by the removal or insertion of other FPCs.
Some high-speed PICs, like the OC-48c/STM-16 for the M20/M40 and the M160’s OC-192c/STM-64 SONET/SDH PIC, are quad-wide and do not require an FPC because quad-wide PICs have FPC functionality built in.
A portion of the memory associated with each FPC is pooled together with the memory from other FPCs to create the M-series shared memory switch fabric. The actual amount of FPC memory varies by FPC type, but in all cases there is at least 100 milliseconds of delay buffer in each direction (transmit and receive), yielding a total of 200 milliseconds of delay/bandwidth buffering. Currently, the amount of memory present on a given FPC ranges from 64 MB on the M5/M10 to 1.2 GB on the T640 FPC3. In the latter case, this yields approximately 600 MB per PFE complex.
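To give a feel for what a 100-millisecond delay buffer means in memory terms, the Python sketch below computes the delay-bandwidth product for a given rate and repeats the per-PFE division from the text. The function name and the 1-Gbps example rate are illustrative assumptions, not platform specifications.

```python
def delay_buffer_megabytes(rate_gbps, delay_ms=100):
    """Memory needed to buffer `delay_ms` of traffic arriving at `rate_gbps`."""
    bits = rate_gbps * 1e9 * (delay_ms / 1000)
    return bits / 8 / 1e6  # bits -> bytes -> megabytes


# Example rate chosen only for illustration.
per_direction_mb = delay_buffer_megabytes(1)

# The text's T640 FPC3 figure: 1.2 GB shared by two PFE complexes.
fpc3_memory_mb = 1200
memory_per_pfe_mb = fpc3_memory_mb / 2

print(per_direction_mb, memory_per_pfe_mb)
```

At 1 Gbps, 100 milliseconds of traffic occupies 12.5 MB per direction, and the FPC3 division gives the 600 MB per PFE complex cited above.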
Some routing platforms support multiple FPC types to allow customers to reuse PICs from earlier platforms. For example, the original FPC is the only FPC type supported on the M20 and M40 platforms. The M160’s FPC1 is designed to support the reuse of M20 and M40 PICs in the higher-end M160 platform. The M160’s FPC2, on the other hand, supports PICs designed specifically for the M160’s increased throughput, like the OC-48c PIC. In a similar fashion, the T320 platform supports three types of FPCs (type 1, 2, and 3). The type 3 FPC offers support for native T-series PICs, while the type 2 and type 1 FPCs offer support for M160 and M20/M40 PICs, respectively. The T640 supports only type 2 and type 3 FPCs at this time.
Industry-Leading Throughput
M-series routers have an aggregate slot throughput of 6.4 Gbps, except for the M160 router, which supports an aggregate capacity of 25.6 Gbps as needed to support OC-192c PICs. T-series platforms increase aggregate throughput to a respectable 40 Gbps for the T320 platform and 80 Gbps for the T640 platform when using the native FPC3.
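The per-slot figures for the T-series platforms can be cross-checked against the chassis-level aggregate numbers quoted in the product line overview. The Python sketch below assumes eight FPC slots per chassis, as stated in the midplane discussion of this module; the names are hypothetical.

```python
SLOT_GBPS = {"T320": 40, "T640": 80}  # aggregate per-slot figures from the text
FPC_SLOTS = 8                         # FPC slots per chassis (midplane section)


def chassis_aggregate_gbps(platform):
    """Aggregate chassis throughput as slots multiplied by per-slot throughput."""
    return SLOT_GBPS[platform] * FPC_SLOTS


print(chassis_aggregate_gbps("T320"), chassis_aggregate_gbps("T640"))
```

The results, 320 Gbps and 640 Gbps, match the aggregate figures given for the T320 and T640 in the product overview.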
General M-series System Board Functions
The M-series System Board houses the Internet Processor II ASIC and performs a variety of functions. These include:
Route lookups and forwarding table maintenance: The Internet Processor ASIC performs route lookups using a forwarding table stored in the chip’s synchronous SRAM (SSRAM). The System Board updates its copy of the forwarding table when instructed by the JUNOS software kernel. The M20’s SSB contains the memory management ASICs as well as the Internet Processor II ASIC. The M40’s SCB does not contain the memory management ASICs, as these are part of the M40’s midplane.
Management of ASICs and PFE components: The System Board monitors various system components for failures and alarm conditions. It collects statistics from all sensors in the system and relays them to the Routing Engine, which sets the appropriate alarm. For example, if a temperature sensor exceeds the first internally defined threshold, the Routing Engine issues a high temp alarm. The System Board handles the power up and power down of PFE components, with diagnostic errors reported to the RE over the 100-Mbps fxp1 interface.
Environmental monitoring: The System Board monitors the various temperature sensors to control fan speed and over-temperature alarm generation.
SONET clock: The System Board generates a Stratum 3 clock reference used to clock SONET interfaces.
Transferring exception and control packets: The Internet Processor ASIC passes exception packets to a microprocessor on the System Board, which processes almost all of them. Remaining packets are sent to the RE for further processing. Errors originating in the Packet Forwarding Engine detected by the System Board are sent to the RE where they are logged and made available to the CLI.
The Names Vary
The System Board names vary by platform. The M40 has a System Control Board (SCB), while the M20 has a System Switching Board (SSB). SSB redundancy is supported on the M20. The M5/M10 and M7i/M10i platforms integrate System Board functionality into their Forwarding Engine Board/Compact Forwarding Engine Board (FEB/CFEB).
Enhanced System Boards
Enhanced System Boards support the second generation Internet Processor II ASIC on M-series platforms (except the M5/M10 and M7i/M10i). Enhanced system boards offer improved performance and scalability. For example, the size of the forwarding table is increased from approximately 420,000 entries to approximately 840,000 entries with an enhanced S-board. Some of the enhancements present in the second generation Internet Processor II ASIC are listed here:
Doubles the amount of on-chip memory (now 16 MB).
Increases memory to 128 MB in the CPU complex for the M40 router, and to 256 MB for the M20, M40e, and M160 routers.
Increases CPU speed (now 256 MHz).
Enhanced System Boards first shipped with JUNOS software Release 5.5 circa September 2002.
General Control Board Functions
Newer M-series routers and all T-series platforms make use of a Control Board to provide some of the functionality associated with the System Board found on previous M-series platforms. These functions include:
Management of ASICs and PFE components: The Control Board handles the power up and power down of other PFE components. Diagnostic errors are reported to the RE over the 100-Mbps fxp1 interface.
Environmental monitoring: The Control Board monitors the various temperature sensors to control fan speed and over-temperature alarm generation.
SONET clock generation: On M-series platforms the Miscellaneous Control Subsystem generates and distributes a Stratum 3 clock reference used to clock SONET interfaces. On T-series platforms the SONET Clock Generator (SCG) generates the Stratum 3 reference clock that is then distributed to the PFE components by the system’s active Control Board.
M160/M40e Platforms
The M160 and M40e platforms use a Miscellaneous Control Subsystem (MCS) board to provide control functions. The MCS works in conjunction with a Routing Engine to form a Host Module. Host Module redundancy is supported, as are redundant MCSs that are controlled by a common (nonredundant) RE.
T-series Platforms
T-series platforms use a Control Board (CB) to provide control functions. The CB works in conjunction with an RE to form a Host Subsystem. Host Subsystem redundancy is supported.
Internet Processor II
The Internet Processor ASIC, which first shipped with the M40 router in September 1998, heralded a breakthrough technology that facilitated longest-match traffic forwarding for virtually all packet sizes at or very near line rate. Performance tests in the lab, test networks, and on the Internet itself have all demonstrated 40 Mpps of 40-byte packets with 80,000 prefixes in the routing table!
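Longest-match forwarding means that, among all prefixes that contain the destination address, the most specific one (the longest prefix) determines the next hop. The Python sketch below illustrates the rule in software using the standard ipaddress module. It is a conceptual illustration only: the Internet Processor ASIC performs this lookup in hardware, and the table contents, interface names, and function name here are hypothetical.

```python
import ipaddress


def longest_match(forwarding_table, destination):
    """Return the next hop of the most specific prefix covering destination."""
    address = ipaddress.ip_address(destination)
    best = None  # (network, next_hop) of the longest match found so far
    for prefix, next_hop in forwarding_table.items():
        network = ipaddress.ip_network(prefix)
        if address in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    return best[1] if best else None


# Hypothetical forwarding table for illustration.
table = {
    "0.0.0.0/0": "so-0/0/0",
    "10.0.0.0/8": "so-1/0/0",
    "10.1.0.0/16": "fe-2/0/0",
}
print(longest_match(table, "10.1.2.3"))
print(longest_match(table, "10.9.9.9"))
print(longest_match(table, "192.0.2.1"))
```

The first address matches all three prefixes and is forwarded according to the /16 entry; the second falls back to the /8 entry; the third matches only the default route.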
Building on this tradition, the Internet Processor II ASIC continues to deliver best-of-class functionality for network core and edge applications. Simply put, as of this writing, T-series platforms are the highest performing systems on the market.
The experience of interacting with the world's largest service providers gives Juniper Networks a unique advantage in developing products and features that meet the unique requirements of the world's busiest and fastest networks. Juniper Networks translated the experience gained in the implementation and deployment of the first-generation Internet Processor ASIC into an expanded and scalable feature set on the Internet Processor II ASIC. While the Internet Processor II ASIC still delivers a 40-Mpps forwarding rate, it also adds the packet processing features you need to build a competitive advantage in a rapidly evolving industry. Offering rich packet processing features that include firewall filtering, sampling, logging, counting, and enhanced load balancing, the Internet Processor II ASIC maintains high performance in the presence of value-added feature sets and enhanced services.
For systems that did not originally ship with an Internet Processor II ASIC, a simple field upgrade of the system board is all that you need to enable Internet Processor II functionality on all existing interfaces.
A second generation Internet Processor II ASIC is offered on enhanced system boards, which are now available for all M-series platforms except the M5 and M10. The enhanced S-board contains the second generation Internet Processor II processor, which sports a faster clock speed and increased memory.
M5/M10 and M7i/M10i System Midplanes
The M5/M10 and M7i/M10i systems are based on FPC and System Board functionality combined into the Forwarding Engine Board/Compact Forwarding Engine Board (FEB/CFEB). Each FEB/CFEB can hold four PICs. Thus, the M10/M10i routers support up to eight PICs, while the M5/M7i routers support up to four PICs. The M7i platform features a Fixed Interface Card (FIC). The M7i FIC supports two Fast Ethernet interfaces, or one Gigabit Ethernet interface, depending on the specific configuration, and also provides alarm LEDs and the PIC online/offline buttons. The M10i platform uses a High-Availability Chassis Manager (HCM) card for PIC online/offline buttons and alarm indicators.
The M7i features an integral Tunnel Services PIC with optional support for an Adaptive Services PIC (ASP) (also internal). When so equipped, the two Services PICs share the bandwidth and FPC/PIC numbering (1/2/0) associated with the Tunnel Services PIC. While the M10i does not support a FIC or an integral Services PIC, the platform does feature RE and CFEB redundancy options.
M40e, M160, T640 and T320 System Midplanes
The midplanes for these platforms support up to eight FPCs (0–7 counting from left to right). The midplane also contains the connector interface panel (CIP) slot.
M20 System Midplane
The M20 system midplane provides four FPC slots (0–3 counting from top to bottom). This midplane also contains the System Switching Board (SSB) slots (control board redundancy is supported) and the Craft Interface slot.
Craft Interface
The Craft Interface is the collection of mechanisms on the Juniper Networks router that allows you to view system status messages and troubleshoot the router. The Craft Interface is located on the front of the chassis and contains system LEDs and the FPC and PIC online/offline buttons. On supported platforms, the Craft Interface includes an LCD screen that provides status reporting for the entire system.
The M7i’s Fixed Interface Card (FIC) and the M10i’s High-Availability Chassis Manager (HCM) card provide PIC offline and online functionality.
System Status LEDs
The system status LEDs include:
FPC LEDs: Two LEDs exist—one green OK and one red fail. These lights indicate the status of each FPC. Each LED pair is located on the Craft Interface aligned with the corresponding FPC module slot.
Routing Engine LEDs: A red fail LED and a green OK LED on the Craft Interface indicate the status of the Routing Engines.
FPC and PIC Offline Buttons
FPC and PIC offline buttons allow you to take a component offline gracefully. To take an FPC offline, press and hold the offline button near the FPC until the green OK LED extinguishes. For systems like the M5 and M10 that contain fixed FPCs, the online/offline buttons are used to prepare a PIC for removal from the system.
Red Alarm
The red alarm LED indicates a system failure likely to cause an interruption in service. Examples of red alarms are:
Routing Engine failure;
Cooling system failure; and
Interface loss of light or framing.
Yellow Alarm
The yellow alarm LED indicates a system warning not likely to interrupt service, but if left uncorrected, might eventually cause a service interruption. Examples of yellow alarms are:
Maintenance alert;
FPC with recoverable errors; and
Cooling system problems.
You can configure the mapping of various events to an alarm action of ignore, yellow, or red. Environmental and safety-related alarms cannot be remapped, however.
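The remapping rule can be modeled as a small lookup table with a guard for events that cannot be changed. The Python sketch below is not JUNOS configuration syntax: the event names, default actions, and class are hypothetical and serve only to illustrate the three allowed actions and the restriction on environmental and safety-related alarms.

```python
VALID_ACTIONS = ("ignore", "yellow", "red")
FIXED_EVENTS = {"over-temperature"}  # hypothetical environmental event


class AlarmMap:
    """Maps event names to an alarm action of ignore, yellow, or red."""

    def __init__(self):
        # Hypothetical defaults chosen for illustration.
        self.actions = {"link-down": "red", "over-temperature": "red"}

    def remap(self, event, action):
        if action not in VALID_ACTIONS:
            raise ValueError(f"action must be one of {VALID_ACTIONS}")
        if event in FIXED_EVENTS:
            raise PermissionError(f"{event} alarms cannot be remapped")
        self.actions[event] = action


alarm_map = AlarmMap()
alarm_map.remap("link-down", "ignore")
print(alarm_map.actions)
```

Remapping the hypothetical link-down event succeeds, while an attempt to remap the over-temperature event raises an error and leaves its action unchanged.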
LCD Display
The Craft Interface on selected platforms supports a four-line LCD screen with six navigation buttons. The LCD screen operates in one of two display modes. The default mode, idle mode, displays the current system status until it is preempted by alarm mode. The following list contains the basic status information displayed:
Router’s name, on the first line;
Number of days, hours, minutes, and seconds that the system has been running, on the second line; and
Status messages, on the fourth line, which are various system status messages that cycle at 3-second intervals.
You can alter the idle mode display by specifying a message of your choosing with the set chassis display message operational mode command. The Craft Interface display cycles between the user-defined and standard display every 2 seconds. The user-defined message only persists for 5 minutes unless you also give the permanent argument. You can view the LCD display, along with an ASCII representation of the status LEDs, with a show chassis craft-interface operational mode command.
The following example shows a custom user message being displayed:
lab@San_Jose-3> show chassis craft-interface
Red alarm: LED off, relay off
Yellow alarm: LED off, relay off
Routing Engine OK LED: On
Routing Engine fail LED: Off
FPCs 0 1 2 3
-------------------
Green * * . .
Red . . . .
LCD screen:
+--------------------+
|"NOC contact Foo @ |
|555-1212" |
| |
| |
+--------------------+
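The timing rules for a user-defined idle-mode message (alternating with the standard display every 2 seconds, and expiring after 5 minutes unless the permanent argument is given) can be expressed as a small Python function. The function name is hypothetical, and which of the two displays appears first is an assumption made for the sketch.

```python
CYCLE_SECONDS = 2         # display alternates every 2 seconds
MESSAGE_LIFETIME = 5 * 60  # user message persists for 5 minutes


def idle_display(seconds_since_set, message, permanent=False):
    """Return which idle-mode display is on screen: 'user' or 'standard'."""
    expired = not permanent and seconds_since_set >= MESSAGE_LIFETIME
    if message is None or expired:
        return "standard"
    # Assumption: the user-defined message is shown first in each cycle.
    if (seconds_since_set // CYCLE_SECONDS) % 2 == 0:
        return "user"
    return "standard"


for t in (0, 2, 4, 300):
    print(t, idle_display(t, "NOC contact Foo @ 555-1212"))
```

After 300 seconds the function always returns the standard display unless permanent is set, which mirrors the 5-minute lifetime described above.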
Alarm mode displays alarm conditions whenever the red or yellow alarm LED is lit. When a red or yellow alarm occurs, alarm mode preempts idle mode and the LCD displays a message to alert you of serious alarm conditions. In alarm mode, the screen displays the following information:
Router’s name, on the first line;
Number of alarms active on the router, on the second line; and
Individual alarms, with the most severe condition shown first, on the third and fourth lines. Each line indicates whether the alarm is a red (R) or yellow (Y) alarm.
Idle and alarm present display samples are provided here:
Currently, the only use for the menu and navigation buttons associated with the LCD is to display the port status for certain high-density PICs, such as the 12-port and 48-port Fast Ethernet PICs supported on the M40e and M160 platforms. To display port status on such a PIC, follow these steps:
Locate the LCD and select MENU.
Choose fe pic status and press ENTER.
Scroll the arrow buttons to select the slot and PIC number. Then press ENTER to see port status.
Read the port numbers vertically. You will see one of three symbols:
* (asterisk–equivalent to green port LED): Port is active and receiving data.
– (minus sign–equivalent to flashing green port LED): Port might be active but is not receiving data.
Blank: Port is not active.
Dry Relay Contacts
The Craft Interface on M20 and M40 routers contains two sets of relay contacts—one set is activated by a system red alarm and one set is activated by a system yellow alarm. You can connect the alarm relay contacts to an external alarm device such as a siren or bell. The term dry indicates that the relays provide either a normally open or normally closed contact but that current/voltage is not provided by the relays themselves.
Whenever a system alarm condition, such as fan failure or excessive temperature, triggers the red alarm on the Craft Interface, it also activates the red alarm relay contacts. Maintenance alerts, which trigger the yellow alarm on the Craft Interface, also activate the yellow alarm relay contacts.
On the M160, M40e, and T-series platforms, the dry relay contacts are located on the Connector Interface panel (CIP).
Cutoff Button
You can manually silence external devices connected to the alarm relay contacts by pressing the alarm cutoff/lamp test (ACO/LT) button, which is located on the Craft Interface panel. Silencing the device does not remove the alarm messages from the display or extinguish the alarm LEDs. In addition, new alarms that occur after silencing an external device reactivate the external device. If no alarm is present, pressing the alarm cutoff button illuminates all LEDs on the Craft Interface as a lamp (LED) test.
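The cutoff behavior described above can be summarized as a small state model. The following Python sketch is illustrative only; the class and method names (AlarmPanel, raise_alarm, press_aco_lt) are hypothetical and do not correspond to any router software interface.

```python
class AlarmPanel:
    """Toy model of the alarm LEDs, relay contacts, and ACO/LT button."""

    def __init__(self):
        self.leds = {"red": False, "yellow": False}
        self.relays = {"red": False, "yellow": False}
        self.lamp_test = False

    def raise_alarm(self, color):
        # A new alarm lights its LED and activates the matching relay,
        # even if an earlier alarm was silenced with the cutoff button.
        self.leds[color] = True
        self.relays[color] = True

    def press_aco_lt(self):
        self.lamp_test = False
        if any(self.leds.values()):
            # Alarm present: silence external devices only.
            # The alarm LEDs (and display messages) are left untouched.
            self.relays = {"red": False, "yellow": False}
        else:
            # No alarm present: the button acts as a lamp (LED) test.
            self.lamp_test = True


panel = AlarmPanel()
panel.raise_alarm("red")
panel.press_aco_lt()
print(panel.leds["red"], panel.relays["red"])  # True False
```

The model keeps the LED state and the relay state separate because the text treats them separately: the button affects only the relays.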
Typical Platform Components
This slide shows the location and placement of the field-replaceable units (FRUs) that are typically found in M-series and T-series routing platforms in the context of the T640 Internet routing node. The field-replaceable components of the T640 platform are similar to those of various other routing platforms.
Please refer to Appendix A for a detailed treatment of each platform’s FRUs and their locations.
Product Comparison: M-Series
This slide provides a matrix of key characteristics associated with the M-series product line. The model number for each router is based on the aggregate throughput capabilities of that router. For example, an M160 router can process 160 million packets per second. Also, all M-series routers support either AC or DC power (but not both simultaneously), except the M160, which requires DC power.
Product Comparison: T-series
Juniper Networks is currently shipping two T-series platforms. The T640 Internet routing node is the flagship of the product line with its clustering capability and single-chassis forwarding rate of 640 million packets per second. The T320 is a smaller platform that supports 16 PICs (two per FPC).
To allow the reuse of existing M-series PICs, the T320 platform supports three FPC types: the T320 FPC1 supports original M-series PICs, while the FPC2 supports PICs native to the M160. The FPC3 is the native FPC for the T-series and is designed to support high-speed T-series PICs. You can mix and match FPC types within a single chassis. The T640 supports only FPC types 2 and 3.
T-series platforms require DC power.
PICs
The following slides examine a few of the common PICs. For a complete listing, see the attached PIC appendix (Appendix D).
Basic PICs
The slide shows a partial listing of the PICs supported by Juniper Networks M-series and T-series routers. Please see Appendix D in this document or https://www.juniper.net/products/ip_infrastructure/modules/ for a current breakdown of PIC support. The slide begins with a partial listing of the basic PIC types currently available. These PICs provide conventional functionality and do not offer enhanced services.
IP Services PICs
IP Services PICs provide a hardware assist to complex packet processing tasks, such as the stateful firewall and Network Address Translation (NAT) functions provided by the Adaptive Services PIC or the PIM sparse mode register encapsulation provided by the Tunnel Services PIC. IP Services PICs have no physical connectors.
Services PICs
Juniper Networks is now shipping a new generation of service-enabling PICs based on Q Performance Processor (QPP) technology. These PICs deliver granular QoS capabilities along with extensive instrumentation and diagnostics on a per-logical interface basis. Some of the QPP highlights associated with Ethernet Services PICs are:
Granular per-VLAN QoS—including WRR, strict priority scheduling, RED, WRED, policing, marking, and shaping—supports differentiated services and converged applications over a single interface.
MAC policing and filtering enables providers to establish peering arrangements without complex routing configurations and also supports additional levels of QoS enforcement.
VLAN rewrite, tagging, and deleting enables flexible use of VLAN address space to support more customers and services.
Extensive per-MAC and per-VLAN billing and accounting capabilities are supported by multiple counters for gathering statistics on frames, packets, and bytes that are transmitted, received, or dropped.
Fast Ethernet PICs
The 4-port Fast Ethernet PIC for the M-series Internet routers provides economic 100-Mbps performance with high reliability and low maintenance costs. The 48-port Fast Ethernet PIC, which is supported in the M160 and M40e platforms, provides excellent performance and density for more demanding applications. Four high-density connectors on the PIC each support 12 10/100-Mbps ports using the very high-density connector interface (VHDCI) to RJ-21 cables provided. Note that each of the VHDCI connectors can support an aggregate throughput of 700 Mbps. Using all 48 ports at 100 Mbps represents oversubscription, which the router handles gracefully.
ATM PICs
The ATM OC-3/STM-1 and ATM OC-12/STM-4 PICs for the M-series Internet routers provide both the performance and the density to scale ATM-based backbones. They are useful for terminating ATM access circuits and for terminating ATM virtual circuits extending across a network backbone.
SONET PICs
Juniper Networks supports a variety of SONET PICs that range in speed from OC-3/STM-1 to OC-192c/STM-64. The OC-192c/STM-64 PIC is advantageous when offering high bandwidth for inter- and intra-POP connections. The PIC can also support four 2.4-Gbps OC-48/STM-16 circuits when operating in nonconcatenated mode. The OC-192c PIC does not require an FPC because the quad-wide PIC contains built-in FPC functionality. Note that on the T320 and T640 platforms the OC-192c/STM-64 interface is a PIC, and you can insert four such PICs into a T-series FPC3.
ASIC Functionality and Packet Flow
The following pages examine the ASICs and packet flow through the M-series platforms.
M-series ASICs
This slide displays the ASICs that make up an M-series router’s Packet Forwarding Engine (PFE). The function of each ASIC is detailed on subsequent pages. In M-series platforms the ASICs that comprise the PFE are located in the PICs, FPCs, and the System Board. On the M40 router the buffer management ASICs are mounted on the system midplane. The M5/M10 and M7i/M10i routers combine FPC and System Board functionality into the FEB/CFEB. The CFEB makes use of a combined I/O Manager, Distributed Buffer Manager, and Internet Processor II ASIC to reduce cost and power consumption while also improving reliability. This ASIC is sometimes called the ABC ASIC in keeping with the internal ASIC designation of A, B, and C for the Distributed Buffer Manager, I/O Manager, and Internet Processor II ASICs, respectively.
M-series Packet Flow: Part 1
When a packet arrives on an input interface of the router, the PIC controller ASIC performs all the media-specific operations such as physical-layer framing and link-level FCS (CRC) verification. The PIC then passes a serial stream of bits to the I/O Manager ASIC on the FPC.
M-series Packet Flow: Part 2
The I/O Manager ASIC parses the bit stream to locate the Layer 2 and Layer 3 encapsulation and chops the packet into 64-byte chunks called J-cells. These J-cells are then sent to the inbound Distributed Buffer Manager ASIC.
The I/O Manager ASIC also:
Removes Layer 2 encapsulation to locate the beginning of the Layer 3 packet;
Identifies incoming logical interface;
Performs basic packet integrity checks;
Counts packets and bytes for each logical circuit; and
Performs behavior aggregate (BA)-based traffic classification to associate traffic with a forwarding class for egress queuing and scheduling operations. Examples of BA classification include IP precedence, DiffServ code points, and MPLS EXP bit settings.
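The segmentation of a packet into 64-byte J-cells, and the reverse operation performed at egress, can be pictured with a short Python sketch. This is a simplified illustration only: the function names are hypothetical, and the internal cell format used by the ASICs (any headers or padding) is not described in this material and is not modeled here.

```python
J_CELL_SIZE = 64  # bytes, as stated in the text


def segment_into_j_cells(packet: bytes, cell_size: int = J_CELL_SIZE) -> list[bytes]:
    """Chop a packet into fixed-size chunks; the last chunk may be shorter."""
    return [packet[i:i + cell_size] for i in range(0, len(packet), cell_size)]


def reassemble(cells: list[bytes]) -> bytes:
    """Rebuild the original packet from its chunks, in order."""
    return b"".join(cells)


packet = bytes(range(200))            # a 200-byte stand-in for a Layer 3 packet
cells = segment_into_j_cells(packet)
print([len(c) for c in cells])        # [64, 64, 64, 8]
print(reassemble(cells) == packet)    # True
```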
M-series Packet Flow: Part 3
The Distributed Buffer Manager 1 ASIC receives J-cells from each FPC’s I/O Manager ASIC and writes them into the shared memory bank. The shared memory bank is made up of memory contributed by each FPC installed in the router.
The I/O Manager ASIC also extracts the key information, which is normally the first 64 bytes of a Layer 3 packet, and passes this information to the Internet Processor II ASIC in the form of a notification cell. The Internet Processor II performs a longest-match route lookup against the forwarding table to identify the packet’s outgoing interface and forwarding next hop.
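A longest-match lookup selects, from all forwarding table entries that contain the destination address, the entry with the longest prefix. The Python sketch below shows the idea with a linear scan. It is not how the Internet Processor II ASIC implements the lookup in hardware, and the prefixes and interface names in the table are made up for illustration.

```python
import ipaddress


def longest_match(forwarding_table, destination):
    """Return (network, next_hop) for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in forwarding_table.items():
        network = ipaddress.ip_network(prefix)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best


# Hypothetical forwarding table: prefix -> outgoing interface.
table = {
    "0.0.0.0/0": "so-0/0/0.0",
    "10.0.0.0/8": "fe-0/1/0.0",
    "10.1.0.0/16": "fe-0/1/1.0",
}

print(longest_match(table, "10.1.2.3")[1])   # fe-0/1/1.0
print(longest_match(table, "10.2.0.1")[1])   # fe-0/1/0.0
print(longest_match(table, "192.0.2.1")[1])  # so-0/0/0.0
```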
M-series Packet Flow: Part 4
The Internet Processor II ASIC determines the ultimate destination for every packet arriving on a transit interface. The Internet Processor ASIC consults a copy of the forwarding table, which contains destination prefixes and their corresponding next hops. The forwarding table is constructed by the Routing Engine and maintained by the JUNOS software kernel.
After the Internet Processor II ASIC determines the packet’s egress interface and forwarding next hop, it amends the notification cell with this information and passes the notification cell to the second Distributed Buffer Manager ASIC. The Distributed Buffer Manager 2 ASIC then passes the notification cell to the I/O Manager ASIC on the egress FPC (as identified by the modified contents of the notification cell). The Distributed Buffer Manager 2 ASIC acts as an agent for the FPC’s I/O Manager ASIC. Once the I/O Manager ASIC receives a notification cell indicating that a packet is waiting to be serviced, it issues read requests to the Distributed Buffer Manager 2 ASIC for the J-cells associated with this packet. As the I/O Manager receives the J-cells, it transmits them to the PIC Controller ASIC, which in turn transmits them out the appropriate port.
In the case of a multicast packet, multiple outgoing interfaces might exist, in which case the notification cell is directed to multiple FPCs or to the same FPC multiple times, once for each outgoing interface served by that FPC.
M-series Packet Flow: Part 5
When the egress FPC is ready to service the packet, the I/O Manager ASIC issues read requests for the 64-byte J-cells that comprise the packet. In response, the Distributed Buffer Manager 2 ASIC retrieves the J-cells from shared memory and feeds them to the I/O Manager ASIC. The I/O Manager ASIC reassembles the packet, decrements the packet’s TTL, adds the Layer 2 framing, and then sends the bit stream to the egress PIC.
The I/O Manager ASIC is responsible for CoS-related queuing, scheduling, and congestion avoidance operations at packet egress. Note that the packet itself is never queued on the FPC; rather, a pointer to the packet, in the form of a notification cell, is queued on the egress FPC. Each output port on a given PIC is associated with four forwarding classes (or queues). You configure schedulers to provide each forwarding class with some share of the port’s bandwidth.
Note that traffic classification, which associates traffic with one of the defined forwarding classes, occurs at the ingress FPC. Once identified at ingress, the traffic is handled in accordance with the parameters configured for that traffic class by the I/O Manager on the egress FPC. The I/O Manager implements the random early detection (RED) algorithm during egress processing to avoid tail drops and the resulting risk of global synchronization of TCP retransmissions. A full coverage of JUNOS software CoS capabilities is beyond the scope of this class.
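The idea behind RED is to begin dropping a small, random fraction of packets before a queue fills, so that TCP senders back off at different times rather than all at once. The Python sketch below follows the classic textbook form of RED, in which the drop probability rises linearly between two queue thresholds. The thresholds and the maximum probability are arbitrary example values, and the sketch should not be taken as a description of how the I/O Manager ASIC or JUNOS software parameterizes RED.

```python
import random


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Classic RED: no drops below min_th, certain drop at or above max_th,
    and a linear ramp up to max_p in between."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue, min_th=20, max_th=80, max_p=0.1, rng=random.random):
    """Randomly decide whether to drop an arriving packet (example defaults)."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)


for depth in (10, 50, 80):
    print(depth, red_drop_probability(depth, 20, 80, 0.1))
```

Because drops are spread out probabilistically as the queue grows, fewer flows lose packets at the same instant than with tail drop.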
ASIC Functionality and Packet Flow
The following pages examine the function of each T-series ASIC, the T-series switch fabric, and the flow of packets through the T-series PFE.
The T-series Packet Forwarding Engine
The term Packet Forwarding Engine (PFE) is used as a collective noun to describe the collection of components that work together to perform longest-match lookups and packet forwarding using a high-performance, silicon-based switching path. This slide lists the ASICs associated with the T-series PFE and provides a high-level description of the function performed by each ASIC. Subsequent pages delve into the role that each ASIC plays in packet forwarding in greater detail. Note that each T-series FPC provides one (FPC2) or two (FPC3) complete PFE complexes when the FPC is also equipped with one or more PICs:
Media-Specific ASIC: Each PIC type is equipped with one or more ASICs specifically designed to handle the needs of a particular medium. For example, a SONET PIC is equipped with an ASIC that handles SONET framing and alarm generation.
Layer 2/Layer 3 Processing ASIC: After the PIC performs the medium-specific functions, the bit stream is handed to the Layer 2/Layer 3 Processing ASIC, which removes Layer 2 encapsulation, parses the Layer 3 header, and segments the bit stream into 64-byte chunks.
Queuing and Memory Interface ASIC: The Queuing and Memory Interface ASIC is responsible for writing and reading the 64-byte chunks to and from the shared memory switch fabric present on each T-series PFE.
Internet Processor II ASIC: The Internet Processor II ASIC performs longest-match route lookups using the information found in the notification cell (the first 64-byte chunk of a Layer 3 packet).
Switch Interface ASIC: The Switch Interface ASICs handle the movement of data between T-series PFEs by facilitating the exchange of 64-byte chunks across the T-series cross-bar switch fabric.
The T-series Switch Fabric
T-series platforms use a nonblocking cross-bar switch fabric to switch traffic between the system’s FPCs. The switch fabric is instantiated by the Switch Interface Board (SIB), which contains the F16 ASIC. SIBs interface to each FPC through high-speed lines (HSLs) that terminate on the SIB’s F16 ASIC. The F16 ASIC provides a 16x16 matrix of high-speed input/output lines. Each HSL can support 10 Gbps of half-duplex traffic. By connecting each FPC to two of the F16’s HSLs, 10 Gbps of full-duplex capacity (20 Gbps aggregate throughput) is achieved between that FPC and SIB. Each FPC is connected to multiple SIBs to provide the speedup needed for a nonblocking switch fabric and for redundancy reasons.
T-series FPCs interface to the switch fabric over the fabric side (f) of the Switch Interface ASIC; the WAN (w) side of the ASIC interfaces to the Layer 2/Layer 3 Processing ASIC. The Switch Interface ASIC is also called the “N” chip. We use this terminology on the slide to save space.
The graphic illustrates the specifics of a T320’s switch fabric. Here, each FPC (or PFE) has four HSL connections to each of SIB 1 and SIB 2. This provides the T320 FPC with 40 Gbps of aggregate capacity. To accommodate SIB failures, each T320 FPC is also connected to a third SIB (SIB 0) using a single HSL. In normal operation, SIB 1 and 2 are active while SIB 0 functions in hot standby mode. SIB 0 automatically becomes active in the event of a SIB 1 or 2 failure. However, the fact that each FPC is connected to SIB 0 through a single HSL means that switch fabric speedup is reduced. The reduction in speedup results in a graceful degradation of the T320’s switch fabric that might result in some packet loss. The T640 makes use of five SIBs in a similar configuration, with the exception that all FPCs are attached to all SIBs using two HSLs. The result is that a T640’s switch fabric remains nonblocking despite the presence of a SIB failure. Multiple SIB failures result in graceful degradation of the T640 switch fabric capacity.
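The T320 figures above can be checked with simple arithmetic. The Python sketch below assumes the rule stated earlier (each HSL carries 10 Gbps half duplex, so two HSLs yield 10 Gbps full duplex) and reads the 40-Gbps figure as a full-duplex number; that reading is an assumption, as are the helper names. The calculation shows why activating SIB 0 after a failure leaves the FPC with less fabric capacity than in normal operation.

```python
HSL_HALF_DUPLEX_GBPS = 10  # per the text: each HSL supports 10 Gbps half duplex

# HSLs from one T320 FPC to each SIB, as described in the text.
T320_HSLS_PER_SIB = {"SIB 0": 1, "SIB 1": 4, "SIB 2": 4}


def full_duplex_gbps(hsl_count):
    """Two half-duplex HSLs together provide one 10-Gbps full-duplex path."""
    return hsl_count * HSL_HALF_DUPLEX_GBPS / 2


def fpc_fabric_capacity(hsls_per_sib, active_sibs):
    """Full-duplex capacity between one FPC and the set of active SIBs."""
    return full_duplex_gbps(sum(hsls_per_sib[sib] for sib in active_sibs))


print(fpc_fabric_capacity(T320_HSLS_PER_SIB, ["SIB 1", "SIB 2"]))  # 40.0 (normal)
print(fpc_fabric_capacity(T320_HSLS_PER_SIB, ["SIB 0", "SIB 2"]))  # 25.0 (SIB 1 failed)
```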
T-series Packet Flow: Part 1
We begin our tour of packet flow through a T-series routing node with the arrival of traffic on the incoming PIC interface.
The media-specific ASIC on the ingress PIC handles the required physical layer signaling, framing, and medium-specific alarm generation. The PIC passes the stream of bits to the Layer 2/Layer 3 ASIC on the FPC along with an indication that the frame was received without errors (no CRC error detected).
T-series Packet Flow: Part 2
The Layer 2/Layer 3 Packet Processing ASIC performs Layer 2 and Layer 3 parsing. The Layer 2/Layer 3 ASIC also divides the packets into 64-byte chunks called J-cells. The J-cells are sent to the Switch Interface ASIC.
When errors are detected during the Layer 2/Layer 3 parsing steps, or when the Layer 2/Layer 3 Processing ASIC receives an indication from the PIC that the received frame is corrupt, the error counters increment and any J-cells relating to the corrupted frame that are still housed in shared memory are effectively flagged as no-ops.
The Layer 2/Layer 3 Processing ASIC also performs behavior aggregate (BA) based traffic classification to associate traffic with a forwarding class for egress queuing and scheduling operations. Examples of BA classification include IP precedence and DiffServ code points.
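BA classification reads a marking already carried in the packet header and maps it to a forwarding class. The Python sketch below extracts the IP precedence and DiffServ code point (DSCP) values from the IPv4 type-of-service byte and looks up a class. The bit positions are standard; the mapping table and the forwarding class names are hypothetical examples, not a statement of the router's default classifier.

```python
def ip_precedence(tos_byte: int) -> int:
    """IP precedence is the three high-order bits of the ToS byte."""
    return (tos_byte >> 5) & 0x7


def dscp(tos_byte: int) -> int:
    """The DiffServ code point is the six high-order bits of the same byte."""
    return (tos_byte >> 2) & 0x3F


# Hypothetical classifier: IP precedence value -> forwarding class name.
PRECEDENCE_TO_CLASS = {
    0: "best-effort",
    5: "expedited-forwarding",
    6: "network-control",
    7: "network-control",
}


def classify(tos_byte, table=PRECEDENCE_TO_CLASS, default="best-effort"):
    """Map a packet's ToS byte to a forwarding class for egress queuing."""
    return table.get(ip_precedence(tos_byte), default)


print(ip_precedence(0xB8), dscp(0xB8), classify(0xB8))  # 5 46 expedited-forwarding
print(classify(0x00))                                    # best-effort
```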
T-series Packet Flow: Part 3
The Switch Interface ASIC extracts the route lookup key (consisting of the first 64 bytes of data in the Layer 3 packet), places it in a notification cell, and passes the notification to the T-series Internet Processor. The Switch Interface ASIC then passes the remaining data cells to the Queuing and Memory Interface ASICs. These ASICs manage the shared memory switch fabric associated with each T-series PFE. Note that the shared memory fabric facilitates the switching of packets within a specific PFE complex, such as occurs when the source and destination PICs share a PFE.
T-series Packet Flow: Part 4
The Queuing and Memory Interface ASICs pass the received J-cells to the PFE’s memory for buffering in the shared memory fabric within the PFE. Note that the cross-bar switch fabric is only used to exchange packets between PFE complexes.
While the J-cells are being written into shared memory, the Internet Processor II ASIC performs a route lookup operation on the key data. The modified notification cell is then forwarded to the Queuing and Memory Interface ASIC.
T-series Packet Flow: Part 5
At this stage of the packet’s processing, the Queuing and Memory Interface ASIC sends the notification cell to the Switch Interface ASIC that faces the switch fabric, unless the destination is a port on the same Packet Forwarding Engine. In this case, the notification is sent to the Switch Interface ASIC that faces the Layer 2/Layer 3 Processing ASIC. Packets exchanged between ports on a common PFE do not transit the switch fabric.
The Switch Interface ASIC sends bandwidth requests through the switch fabric to the destination PFE for those destinations that reside on another PFE. The Switch Interface ASIC also issues read requests to the Queuing and Memory Interface ASIC to begin reading data cells out of memory when the egress PFE (and the switch fabric) indicates it is ready to handle a given J-cell.
T-series Packet Flow: Part 6
The destination Switch Interface ASIC returns bandwidth grants through the switch fabric to the originating Switch Interface ASIC in response to received bandwidth requests.
Upon receipt of each bandwidth grant, the originating Switch Interface ASIC sends a cell through the switch fabric to the destination PFE.
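The request and grant exchange in Parts 5 and 6 means that the sending PFE transmits a cell across the fabric only after the receiving PFE has agreed to accept it. The Python sketch below models that handshake in rounds. The class, the per-round grant limit, and the notion of a "round" are simplifications introduced for illustration; the actual fabric signaling is not described at this level in the text.

```python
from collections import deque


class EgressPFE:
    """Destination side: grants a limited number of requests per round."""

    def __init__(self, grants_per_round):
        if grants_per_round < 1:
            raise ValueError("grants_per_round must be at least 1")
        self.grants_per_round = grants_per_round
        self.received = []

    def grant(self, request_count):
        """Return how many of the pending requests are granted this round."""
        return min(request_count, self.grants_per_round)


def transfer(cells, egress):
    """Source side: request bandwidth for waiting cells, send one cell per grant.
    Returns the number of request/grant rounds needed."""
    pending = deque(cells)
    rounds = 0
    while pending:
        granted = egress.grant(len(pending))            # one request per waiting cell
        for _ in range(granted):
            egress.received.append(pending.popleft())   # one cell per grant
        rounds += 1
    return rounds


egress = EgressPFE(grants_per_round=2)
print(transfer(["cell-1", "cell-2", "cell-3", "cell-4", "cell-5"], egress))  # 3
print(egress.received == ["cell-1", "cell-2", "cell-3", "cell-4", "cell-5"])  # True
```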
T-series Packet Flow: Part 7
The destination Switch Interface ASIC receives cells from the switch fabric. It modifies each notification cell to reflect the new memory locations for the related J-cells (the memory locations for each chunk vary by PFE) and forwards the modified notification cell to that PFE’s Internet Processor II ASIC for another longest-match lookup operation.
The T-series Internet Processor II performs the route lookup and forwards the notification to the Queuing and Memory Interface ASIC.
T-series Packet Flow: Part 8
The Queuing and Memory Interface ASIC forwards the modified notification cell, which now includes next-hop information, to the Switch Interface ASIC.
The Switch Interface ASIC sends read requests to the Queuing and Memory Interface ASIC to read the data cells out of memory and passes the cells to the Layer 2/Layer 3 Packet Processing ASIC.
T-series Packet Flow: Part 9
The Layer 2/Layer 3 Packet Processing ASIC reassembles the data cells into packets. The Layer 2/Layer 3 Processing ASIC then adds appropriate Layer 2 encapsulation and sends the resulting bit stream to the egress PIC.
The Layer 2/Layer 3 Processing ASIC is responsible for CoS-related queuing, scheduling, and congestion avoidance operations at packet egress. Each output port on a given PIC is associated with four forwarding classes (or queues). You configure schedulers to provide each forwarding class with some share of the port’s bandwidth.
Note that traffic classification, which associates traffic with one of the defined forwarding classes, occurs at the ingress FPC. Once so identified at ingress, the traffic is handled in accordance with the parameters configured for that traffic class by the Layer 2/Layer 3 Processing ASIC on the egress FPC. The random early detection (RED) algorithm is implemented by the Layer 2/Layer 3 Processing ASIC during egress processing to avoid tail drops and the resulting risk of global synchronization of TCP retransmissions.
The T-series Switch Interface ASICs handle switch fabric queuing and prioritization to extend CoS across the T-series switch fabric.
A full coverage of JUNOS software CoS capabilities is beyond the scope of this class.
T-series Packet Flow: Part 10
The final steps in egress packet processing come into play when the egress PIC sends the packet out into the network with the appropriate physical layer signaling and medium-specific framing. The egress PIC also calculates and adds a CRC to the frame as needed for each particular medium.
Exception Packets
Exception packets require some form of special handling. Examples of exception traffic include:
Packets addressed to the chassis, such as routing protocol updates, Telnet sessions, pings, traceroutes, and replies to traffic sourced from the RE.
IP packets with the IP options field. Options in the packet's IP header are rarely seen, but the Packet Forwarding Engine was purposely designed not to handle IP options. They must be sent to the Routing Engine for processing.
Traffic that requires the generation of ICMP messages. ICMP messages are sent to the packet’s source to report various error conditions and to respond to ping requests. Examples of ICMP errors include destination unreachable messages, which are sent when there is no entry in the forwarding table for the packet's destination address, or time-to-live (TTL) expired messages, which are sent when a packet’s TTL is decremented to zero. In most cases, the PFE process handles the generation of ICMP messages.
Packet Forwarding Engine CPU
The Internet Processor II ASIC passes exception packets to the microprocessor on the Packet Forwarding Engine Control Board, which in turn processes almost all of them. Certain exception packets are also sent to the Routing Engine for further processing. Exception traffic destined for the Routing Engine is sent over the 100 Mbps fxp1 interface. Exception traffic is rate-limited by the PowerPC processor to protect the Routing Engine from denial-of-service attacks. During times of congestion, the router gives preference to the local and control traffic, with the latter being afforded a minimum of 5% of the fxp1 interface’s bandwidth through hardware-based weighted round-robin (WRR) queuing.
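Weighted round-robin queuing guarantees each queue a minimum share of a link by serving the queues in turn, in proportion to configured weights. The Python sketch below illustrates the principle with two queues whose weights give the second queue 1 slot in every 20, which is 5 percent. The queue names, weights, and packet labels are assumptions made for the example; the hardware scheduler on the fxp1 interface is not described in this detail.

```python
from collections import deque


def weighted_round_robin(queues, weights, budget):
    """Serve up to `budget` packets. On each cycle, take at most weights[name]
    packets from each queue, in the order the weights are listed."""
    sent = []

    def work_remaining():
        return any(queues[name] for name, w in weights.items() if w > 0)

    while len(sent) < budget and work_remaining():
        for name, weight in weights.items():
            queue = queues[name]
            for _ in range(weight):
                if not queue or len(sent) >= budget:
                    break
                sent.append(queue.popleft())
    return sent


# Hypothetical queues, both permanently backlogged for the duration of the test.
queues = {
    "local": deque(f"L{i}" for i in range(100)),
    "control": deque(f"C{i}" for i in range(100)),
}
weights = {"local": 19, "control": 1}   # 1 of every 20 slots, or 5 percent

sent = weighted_round_robin(queues, weights, budget=40)
control_share = sum(1 for p in sent if p.startswith("C")) / len(sent)
print(control_share)  # 0.05
```

When a queue is empty, its unused slots go to the other queue, so the weight acts as a minimum guarantee under congestion rather than a cap.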
JUNOS Software
The following slides detail key features and characteristics of JUNOS software.
A Single JUNOS Image with All Features
Currently, all M-series and T-series platforms execute a common JUNOS software image. Further, this image can support all features on all platforms. In some cases, you need some form of IP Services PIC for a given feature. These PICs are available for all M-series and T-series platforms. The ease and simplicity of downloading a JUNOS software image and knowing that all features are supported across the entire product line are significant benefits to using Juniper Networks products.
Independent Modular Design
Software modules inside the JUNOS software (called processes) are separated by hardware-assisted memory protection, which prevents one software process from accessing memory being used by another. This arrangement allows the system to recover from errors quickly and divides the software debugging tasks into manageable pieces. For example, a failure in the network management software module does not impact any of the routing protocols or the forwarding performance.
A side effect of the modular design is that detailed failure information is saved for analysis by the Juniper Technical Assistance Center (JTAC). In most cases, the failure of a software module results in a memory snapshot (core dump) and an automatic attempt to restart the failed module—all without interrupting packet forwarding. A core file describes the exact state of the system when the error occurred, and this allows Juniper Networks Engineering personnel to accurately diagnose and correct the problem.
Purpose Built, Internet Proven
JUNOS software is based on stable, open source code that was modified by experts to provide industry-leading stability and scalability. To date, no vendor has demonstrated Internet software as robust and as scalable as JUNOS software, as documented by independent testing performed by Light Reading. Details can be found at: http://www.lightreading.com/document.asp?doc_id=4101&site=lightreading.
Software Processes Overview
JUNOS software consists of a series of system processes that handle the router’s management processes, routing protocols, and control functions. The JUNOS kernel, which is responsible for scheduling and device control, underlies and supports these processes. The JUNOS architecture is a multi-module design, with each process running in protected memory to guard against system crashes and to ensure that runaway applications do not corrupt each other. This modular design makes it significantly easier to restart or upgrade a specific module because you do not have to reboot the entire chassis. The introduction of services is a highly reliable process because the failure of one module does not impact the entire operating system adversely. Between these independent modules are clean, well-defined interfaces, which provide interprocess communication, resulting in a highly reliable software architecture.
JUNOS software resides in the Routing Engine, which is designed around an Intel-based PCI platform. The RE has a dedicated 100-Mbps internal connection to the PFE, which is responsible for ASIC-driven packet flow through the router.
The RE connects directly to the PFE. This separation of routing and forwarding performance ensures that the RE never processes transit packets. Of the traffic that goes to the RE, data link layer keepalives and routing protocol updates receive the highest priority to ensure that adjacencies never go down—regardless of the load—thereby preventing failures from cascading through the network.
Additionally, JUNOS software passes incremental changes in the forwarding table to the PFE so that high rates of change are handled quickly and cleanly. Together, the nearly instantaneous routing updates and JUNOS software stability ensure that the PFE continues to forward packets at wire-rate speeds during times of heavy route fluctuations.
JUNOS Kernel
The Routing Engine (JUNOS) kernel provides the underlying infrastructure for all the JUNOS software processes. In addition, the kernel provides the link between the routing tables and the Routing Engine's forwarding table. It is also responsible for all communication with the Packet Forwarding Engine, which includes keeping the PFE's copy of the forwarding table synchronized with the master copy in the Routing Engine.
Routing Protocol Daemon Core Functions
The routing protocol daemon (rpd) controls the routing protocols running on the router. It starts all configured routing protocols and handles all routing messages. It also maintains one or more routing tables, which are also called routing information bases (RIBs). These tables consolidate the routing information learned from various routing protocols into a common table.
The routing protocol process determines the active routes to network destinations and installs these routes into the Routing Engine's forwarding table, also called the forwarding information base (FIB). Finally, it implements routing policy, which allows you to control the routing information that is transferred between the routing protocols and the routing table. Using routing policy, you can filter routing information or modify attributes associated with the routes such as adding or removing BGP communities.
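The relationship between the routing table and the forwarding table can be pictured as a selection step: several protocols may offer a route to the same prefix, one route becomes active, and only the active route's next hop is installed for forwarding. The Python sketch below selects the active route by a per-protocol preference value, with the lowest value winning. The preference numbers, the data layout, and the addresses are illustrative assumptions; the text does not describe the selection rules used by rpd.

```python
# Illustrative preference values (lower is preferred). These are assumptions
# for the example, not values taken from this material.
PREFERENCE = {"direct": 0, "static": 5, "ospf": 10, "isis": 15, "rip": 100, "bgp": 170}


def select_active_routes(rib):
    """rib maps prefix -> list of (protocol, next_hop) candidates.
    Returns a forwarding table mapping prefix -> next hop of the active route."""
    fib = {}
    for prefix, candidates in rib.items():
        protocol, next_hop = min(candidates, key=lambda c: PREFERENCE[c[0]])
        fib[prefix] = next_hop
    return fib


rib = {
    "10.0.0.0/8": [("bgp", "192.0.2.1"), ("ospf", "192.0.2.2")],
    "172.16.0.0/12": [("rip", "192.0.2.3")],
}
print(select_active_routes(rib))
# {'10.0.0.0/8': '192.0.2.2', '172.16.0.0/12': '192.0.2.3'}
```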
JUNOS software implements unicast and multicast IP routing functionality for IP Version 4 (IPv4) and Version 6 (IPv6) and also supports MPLS signaling and switching. JUNOS software also supports IPSec and various forms of VPN services.
Unicast Routing Protocols
JUNOS software supports the following unicast routing protocols:
IS-IS: Intermediate System-to-Intermediate System (IS-IS) is an interior gateway, link-state routing protocol for IPv4 and IPv6.
OSPF: Open Shortest Path First (OSPF), Version 2, is an interior gateway protocol (IGP) that was developed for IP networks by the Internet Engineering Task Force (IETF). OSPF is a link-state protocol that makes routing decisions based on the SPF algorithm. Version 3 of the OSPF protocol adds support for IPv6.
RIP: Routing Information Protocol (RIP), Version 2, is a distance-vector IGP for IP networks based on the Bellman-Ford algorithm. The RIPng protocol adds support for IPv6.
BGP: Border Gateway Protocol (BGP), Version 4, is an exterior gateway protocol (EGP) that guarantees the loop-free exchange of routing information between routing domains, also called autonomous systems.
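The SPF algorithm mentioned in the OSPF entry above is commonly implemented as Dijkstra's algorithm: starting from the local router, it repeatedly settles the closest unsettled node and relaxes the costs of that node's neighbors. The Python sketch below computes least-cost distances over a small made-up topology. The graph, the link costs, and the function name are illustrative, and the sketch omits everything else a link-state protocol does (flooding, database synchronization, next-hop derivation).

```python
import heapq


def spf(graph, source):
    """Dijkstra's shortest-path-first computation.
    graph maps node -> {neighbor: link_cost}; returns node -> total cost."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Hypothetical four-router topology with directed link costs.
topology = {
    "A": {"B": 10, "C": 5},
    "B": {"D": 1},
    "C": {"B": 3, "D": 9},
    "D": {},
}
print(spf(topology, "A"))  # {'A': 0, 'B': 8, 'C': 5, 'D': 9}
```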
Multicast Protocols
JUNOS software supports the following multicast protocols:
DVMRP: Distance Vector Multicast Routing Protocol (DVMRP) is a dense-mode, or flood-and-prune, multicast routing protocol.
PIM sparse mode and PIM dense mode: Protocol Independent Multicast (PIM) is a multicast routing protocol. PIM-Sparse Mode routes to multicast groups that can span wide-area and interdomain internets. PIM dense mode is a flood-and-prune protocol.
MSDP: Multicast Source Discovery Protocol (MSDP) allows multiple PIM sparse mode domains to be joined. A rendezvous point (RP) in a PIM sparse mode domain has a peering relationship with an RP in another domain, enabling it to discover multicast sources from other domains.
IGMP: Internet Group Management Protocol (IGMP), Versions 1, 2, and 3, is used to manage membership in multicast groups. The Multicast Listener Discovery (MLD) protocol provides the equivalent functionality for IPv6 hosts.
MPLS Applications Protocols
JUNOS software supports the following MPLS applications protocols:
MPLS: Multiprotocol Label Switching (MPLS), formerly known as tag switching, allows you to establish label-switched paths (LSPs) through a network. The resulting LSPs allow you to direct traffic through particular paths, rather than relying on the IGP's least-cost algorithm. JUNOS software supports a variety of provider-provisioned VPN solutions based on MPLS forwarding within the provider’s backbone.
RSVP: The Resource Reservation Protocol (RSVP) provides a mechanism for signaling LSPs over paths that are independent of the shortest path seen by a link-state routing protocol. RSVP itself is not a routing protocol; it operates with current and future unicast and multicast routing protocols. The primary purpose of the JUNOS RSVP software is to support dynamic signaling for MPLS LSPs in a traffic engineered environment.
LDP: The Label Distribution Protocol (LDP) provides a mechanism for distributing labels in non-traffic-engineered MPLS applications.
If you plan to manually lift the chassis into the rack, you must remove enough of the system components so that two people can safely lift the chassis. After removing the components, for example, the M40 chassis weighs approximately 80 pounds (36 kg). The heaviest components are the power supplies and the Routing Engine. You remove the following components from the chassis:
From the back of the chassis:
Remove the power supplies
Remove the Routing Engine
Remove the fan assemblies
From the front of the chassis:
Remove the FPCs
Remove the SCB or SSB
Remove the fan assemblies
A Juniper router has three forms of storage media:
Removable media—Depending on router model, your router may have an LS-120 floppy drive (which reads a 120-MB LS-120 floppy disk or a standard 1.4 MB floppy disk) or a PCMCIA card slot (which reads flash cards). A copy of the JUNOS software is shipped on removable media with each router.
Flash drive—Nonrotating drive. On new Juniper routers, the JUNOS software is preinstalled on the flash drive.
Hard drive—Rotating drive. On new Juniper routers, a backup copy of the JUNOS software is preinstalled on the hard drive. This drive is also used to store system log files and diagnostic dump files.
A Juniper router typically boots either from the flash drive or from the hard drive. (While it is possible to boot the router from the removable media drive, this is not typically done.) These drives are referred to as the boot media. The drive from which the router boots is called the primary boot medium, and the other drive is the secondary boot medium. The primary boot medium is generally the flash drive, and the secondary boot medium is generally the hard drive.
When you receive a Juniper router, all the software is installed on the system. When you power on the router, the software and all the software processes start automatically. You simply need to configure the software and the router will be ready to participate in the network.
The software is installed on the router’s flash drive (a nonrotating drive) and hard drive (a rotating disk), and a copy of the software also is provided on either a 120-megabyte LS-120 floppy disk or a 110-megabyte PCMCIA flash card. Normally, when you power on the router, it runs the copy of the software that is installed on the internal flash drive.
Periodically, you might want to upgrade the router software as new features are made available or as software problems are fixed. You normally obtain new software by downloading the images onto your router or onto another system on your local network. Then you install the software upgrade on the router’s flash and hard drives and, optionally, copy it onto an LS-120 floppy disk or PCMCIA flash card.
If the copy of the software that is on the flash or hard drive becomes damaged, you should reinstall the software onto those media.
At power-on, when the router boots, it first attempts to start the image from the removable media if it is installed in the Routing Engine. If this fails, or no media is installed, the router next tries the flash drive, then finally the hard drive.
This sequence is controlled by hardware that waits for a special signal from the JUNOS kernel, indicating a successful boot. If the hardware does not receive the signal after a few minutes, it forces the system to boot from the next available device in the boot chain.
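The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the sequence, not router code: the function name, the device labels, and the `bootable` map (standing in for "the kernel signaled a successful boot before the hardware timer expired") are assumptions made for the example.

```python
from typing import Dict, Optional

# Boot order as described in the text: removable media first,
# then the flash drive, then the hard drive.
BOOT_ORDER = ("removable media", "flash drive", "hard drive")


def select_boot_device(bootable: Dict[str, bool]) -> Optional[str]:
    """Return the first device in BOOT_ORDER that boots successfully.

    `bootable` is a hypothetical map from device label to True/False.
    A device that is absent from the map is treated as not installed.
    """
    for device in BOOT_ORDER:
        if bootable.get(device, False):
            return device
    # No device produced a successful boot.
    return None


# No removable media installed: the flash drive is used.
print(select_boot_device({"flash drive": True, "hard drive": True}))
# Flash image damaged: the hard drive is the last resort.
print(select_boot_device({"flash drive": False, "hard drive": True}))
```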
When you receive a Juniper router, the JUNOS Internet software is preinstalled on the router; once the router is powered on, it is ready to be configured. You can configure the router from a console connected to the router’s console port.
Before you configure the software for the first time, you need the following information:
Root password desired for this router
Name of this router
Internet Protocol address and prefix length information for the management Ethernet interface
IP address of a default router
IP address of a Domain Name System (DNS) server
You should always set the root password. The secure shell (ssh) package is only available on domestic systems.
When setting the plain-text password, the system prompts for the password on a separate line and does not log the plain-text password if you enable command logging.
There are two methods used to configure a default router connected to your management interface. Juniper Networks recommends using both methods.
The first method ensures a default router is usable while the system is booting. Should the routing process fail to start, the backup router remains installed in the routing and forwarding tables and access to the router through the management interface will remain enabled. The backup default route is removed when the routing process starts successfully.
The second method specifies a static default route used by the routing protocol software. This route remains in the routing table until removed manually. The retain statement tells the system to keep the route in the system routing and forwarding tables even if the routing software process fails. The no-readvertise statement ensures that your default route is not advertised via routing protocols to other routers in your network.
You perform a complete software reinstallation when the JUNOS system software on the router has become damaged for some reason; for example, if the secondary storage on the system has failed. To reinstall the software, perform these steps:
Prepare to reinstall the JUNOS software
Reinstall the JUNOS software
Configure the JUNOS software
Before you install the JUNOS software, you must do the following:
If your router is currently running, take a quick inventory of your current network configuration, because you will need to re-enter some of this information during the installation process. Items to record include:
Name of the router
IP address and prefix length information for the fxp0 interface
IP address of a default router
IP address of a DNS server
If you want to continue using the existing configuration after you reinstall the software, copy the existing configuration, which is in the file /config/juniper.conf, from the router, either to another system or to a floppy disk.
Ensure that you have a Juniper installation floppy or PCMCIA card. Contact customer support if you cannot locate the installation media that shipped with your router.
Because installation resets the root password, you need to set the root password by entering either a clear-text password that the system encrypts, a password that is already encrypted, or an RSA public key string for use with SSH.
Each JUNOS software release consists of the base operating system and four software packages:
jkernel—Operating system package
jroute—Contains the software that runs on the Routing Engine
jpfe—Contains the software that runs on the router’s Packet Forwarding Engine
jdocs—Contains on-line configuration reference documentation
jbundle—Contains the above four packages combined
You can upgrade the software packages individually.
A JUNOS software package has a name in the following format:
package-m.nZnumber.tgz
m.n are two integers that represent the software release number.
Z is a capital letter that indicates the type of software release. In most cases, it is an R, to indicate that this is released software. If you are involved in testing prereleased software, this letter might be an A (for alpha-level software), B (for beta-level software), or I (a capital letter I; for internal, test, or experimental versions of software).
number represents the version of the software release, and includes the internal build number for that version. For example, 4.1R1.2 indicates release 4.1, version 1, build 2.
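As a rough illustration of the naming format, the following Python sketch splits a package file name into its parts. The regular expression, the function name, and the sample file name (built from the 4.1R1.2 example above) are assumptions for this sketch and are not part of the JUNOS software.

```python
import re
from typing import NamedTuple


class PackageName(NamedTuple):
    package: str       # for example jkernel, jroute, jpfe, jdocs, jbundle
    release: str       # m.n, the software release number
    release_type: str  # meaning of the capital letter Z
    version: int       # first part of "number"
    build: int         # internal build number


# Release-type letters listed in the text.
_RELEASE_TYPES = {"R": "released", "A": "alpha", "B": "beta", "I": "internal"}

# package-m.nZnumber.tgz, where number is version.build
_PATTERN = re.compile(
    r"^(?P<package>[a-z]+)-(?P<release>\d+\.\d+)(?P<type>[RABI])"
    r"(?P<version>\d+)\.(?P<build>\d+)\.tgz$"
)


def parse_package_name(filename: str) -> PackageName:
    """Split a package file name into its fields, or raise ValueError."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized package name: {filename!r}")
    return PackageName(
        package=match["package"],
        release=match["release"],
        release_type=_RELEASE_TYPES[match["type"]],
        version=int(match["version"]),
        build=int(match["build"]),
    )


# Hypothetical file name assembled from the example in the text.
print(parse_package_name("jbundle-4.1R1.2.tgz"))
```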
Before you upgrade the JUNOS software, or after you have upgraded the software on the router and are satisfied that the new packages are successfully installed and running, consider issuing the request system snapshot command to back up the software to the /altroot and /altconfig file systems on the router’s hard drive.
Specifically, the root file system (/) is backed up to /altroot and /config is backed up to /altconfig. Normally, the root and /config file systems are on the router's flash drive, and the /altroot and /altconfig file systems are on the router's hard drive.
Each FPC can have installed on it up to four PICs, which provide the actual physical interfaces to the network. These physical interfaces are the router’s transient interfaces. They are referred to as transient because you can hot-swap an FPC, along with its PICs, at any time, removing it from or inserting it into the router. From the point of view of the Packet Forwarding Engine, you can place any FPC into any slot, and you can generally place any combination of PICs in any location on an FPC. (You are limited by the total FPC bandwidth, which cannot exceed 12.8 Gbps on the M160, or 3.2 Gbps on all other Juniper Networks routers.) From the point of view of the Routing Engine, you must configure each of the transient interfaces based on which slot the FPC is installed in, which location on the FPC the PIC is installed in, and which port you are connecting to.
The number of ports varies depending on the PIC. The ports are numbered from top to bottom and, generally, from right to left. The port numbers are printed on the PIC.
For the interfaces on a Juniper router to function, you must configure them, specifying properties such as the interface location (that is, which slot the FPC is installed in and which location on the FPC the PIC is installed in), the interface type (such as SONET or ATM), encapsulation, and interface-specific properties. You can configure the interfaces that are currently present in the router, and you can also configure interfaces that are not currently present but that you might be adding in the future. When a configured interface appears, the JUNOS software detects its presence and applies the appropriate configuration to it.
When you configure an interface, you are effectively specifying the properties for a physical interface descriptor. Each physical interface descriptor corresponds to a single physical device and is identified by an interface name, which defines the media type, the slot the FPC is located in, the location on the FPC that the PIC is installed in, and the PIC port, and which can optionally define the interface’s channel and logical unit numbers.
Identifying Transient Interfaces
Transient interfaces are identified by the interface’s FPC slot number, the PIC slot number, and the PIC’s physical port number, in the form media-type-fpc-slot/pic-slot/port-number. Channelized interfaces identify a particular subchannel with a suffix in the form :sub-channel-number. A logical unit (also known as a subinterface) is identified with a suffix in the form .logical-interface-number.
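The naming scheme can be made concrete with a small parser. The Python sketch below is a reading aid only; the regular expression, the class and function names, and the sample names (using the so and t3 media-type prefixes) are assumptions for the example, and the order of the channel and unit suffixes follows the description above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# media-type-fpc-slot/pic-slot/port-number[:sub-channel][.logical-unit]
_NAME = re.compile(
    r"^(?P<media>[a-z][a-z0-9]*)-(?P<fpc>\d+)/(?P<pic>\d+)/(?P<port>\d+)"
    r"(?::(?P<channel>\d+))?(?:\.(?P<unit>\d+))?$"
)


@dataclass
class InterfaceName:
    media_type: str
    fpc_slot: int
    pic_slot: int
    port: int
    channel: Optional[int] = None  # only on channelized interfaces
    unit: Optional[int] = None     # logical unit (subinterface)


def _to_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_interface_name(name: str) -> InterfaceName:
    """Parse a transient interface name, or raise ValueError."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a transient interface name: {name!r}")
    return InterfaceName(
        media_type=m["media"],
        fpc_slot=int(m["fpc"]),
        pic_slot=int(m["pic"]),
        port=int(m["port"]),
        channel=_to_int(m["channel"]),
        unit=_to_int(m["unit"]),
    )


# SONET interface in FPC slot 1, PIC slot 2, port 3, logical unit 0.
print(parse_interface_name("so-1/2/3.0"))
# Channelized interface, subchannel 4, no logical unit given.
print(parse_interface_name("t3-0/1/2:4"))
```

Permanent interfaces such as fxp0 do not follow this pattern, so the sketch rejects them with a ValueError.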
FPC and PIC Slot Numbering Varies
The FPC and PIC slot numbering varies by platform because some platforms use vertically aligned FPC slots while others use a horizontal FPC arrangement. The slide details the differences in FPC and PIC slot numbering for the M160/M160 platforms versus the M20 platform. The graphic shows typical FPC and PIC numbering in the context of the T640 platform, which uses vertically aligned FPCs.
The Upside?
The upside to this story is that each platform has labels that clearly identify the FPC slot number and PIC number. Further, each PIC has a label to identify the number associated with that PIC’s physical ports.
Each physical interface descriptor can contain one or more logical interface descriptors. These allow you to map one or more logical (sometimes called virtual) interfaces to a single physical device. Creating multiple logical interfaces is useful for ATM and Frame Relay networks, in which you can associate multiple virtual circuits or data-link connections with a single physical interface device.
Each Juniper router has two permanent interfaces. One, the management Ethernet interface, provides an out-of-band method for connecting to the router. You can connect to the management interface over the network using utilities such as SSH and Telnet, and SNMP can also use the management interface to gather statistics from the router.
The second permanent interface is the internal Ethernet interface, which connects the Routing Engine (the portion of the router running the JUNOS Internet software) to the Packet Forwarding Engine.
For your router to function properly, you must configure each PIC interface present in the router. No PIC interfaces are preconfigured.
For each network media type, the software driver for that media sets reasonable default values for general interface properties, such as the interface’s MTU size, receive and transmit leaky bucket properties, link operational mode, and clock source.
For a physical interface device to function, you must configure at least one logical interface on that device. For each logical interface, you must at a minimum specify the protocol family that the interface supports. You can also configure other logical interface properties. These vary by PIC and encapsulation type, but include the IP address of the interface, whether the interface supports multicast traffic, DLCIs, VCIs and VPIs, and traffic shaping.
Each logical interface must have a logical unit number. The logical unit number corresponds to the logical unit part of the interface name.
PPP and Cisco HDLC encapsulations support only a single logical interface, and its logical unit number must be zero. Frame Relay and ATM encapsulations support multiple logical interfaces, so you can configure one or more logical unit numbers.
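The logical unit rules above lend themselves to a simple consistency check. The following Python sketch is hypothetical: the encapsulation labels and the function name are chosen for the example and are not JUNOS keywords or an actual validation routine.

```python
from typing import List

# Encapsulations that allow exactly one logical interface, unit 0.
SINGLE_UNIT_ENCAPSULATIONS = {"ppp", "cisco-hdlc"}
# Encapsulations that allow several logical interfaces.
MULTI_UNIT_ENCAPSULATIONS = {"frame-relay", "atm"}


def check_logical_units(encapsulation: str, units: List[int]) -> List[str]:
    """Return a list of rule violations; an empty list means the units are valid."""
    problems = []
    if not units:
        problems.append("at least one logical interface must be configured")
    if encapsulation in SINGLE_UNIT_ENCAPSULATIONS:
        if len(units) > 1:
            problems.append(f"{encapsulation} supports only a single logical interface")
        if any(unit != 0 for unit in units):
            problems.append(f"{encapsulation} requires logical unit number 0")
    elif encapsulation not in MULTI_UNIT_ENCAPSULATIONS:
        problems.append(f"unknown encapsulation label: {encapsulation}")
    return problems


print(check_logical_units("ppp", [0]))                   # valid
print(check_logical_units("ppp", [0, 1]))                # too many units, nonzero unit
print(check_logical_units("frame-relay", [100, 200]))    # valid
```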
For each logical interface, you can configure one or more of the following protocols that run on the interface:
inet—IP (Internet Protocol). You must configure this protocol family for the logical interface to support IP protocol traffic, including OSPF, BGP, and ICMP.
iso—ISO. You must configure this protocol family for the logical interface to support IS-IS traffic.
mpls—Multiprotocol Label Switching (MPLS). You must configure this protocol family for the logical interface to participate in an MPLS path.
For each address, you can optionally configure one or more of the following:
Address of the remote side of the connection (for point-to-point interfaces only)—Specify this in the destination statement.
Broadcast address for the interface’s subnet—Specify this in the broadcast statement.
Whether this address is the preferred address—Each subnet on an interface has a preferred local address. If you configure more than one address on the same subnet, the preferred local address is chosen by default as the source address when you originate packets to destinations on the subnet. By default, the preferred address is the lowest numbered address on the subnet. To override the default and explicitly configure the preferred address, include the preferred statement when configuring the address.
Whether this address is the primary address—Each interface has a primary local address. If an interface has more than one address, the primary local address is used by default as the source address when you originate packets out the interface where the destination gives no hint about the subnet (for example, some ping commands). By default, the primary address on an interface is the lowest numbered non-127 preferred address on the interface. To override the default and explicitly configure the primary address, include the primary statement when configuring the address.
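The default selection rules for preferred and primary addresses can be traced with a short Python sketch. It models only the defaults described above (no explicit preferred or primary statement), assumes IPv4 addresses, and uses function names and sample addresses invented for the example.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List, Optional


def preferred_addresses(
    addresses: List[str],
) -> Dict[ipaddress.IPv4Network, ipaddress.IPv4Address]:
    """Map each subnet to its default preferred address (lowest numbered on that subnet)."""
    by_subnet = defaultdict(list)
    for text in addresses:
        iface = ipaddress.ip_interface(text)  # for example "10.0.0.5/24"
        by_subnet[iface.network].append(iface.ip)
    return {network: min(ips) for network, ips in by_subnet.items()}


def primary_address(addresses: List[str]) -> Optional[ipaddress.IPv4Address]:
    """Default primary address: the lowest numbered non-127 preferred address."""
    candidates = [
        ip for ip in preferred_addresses(addresses).values() if not ip.is_loopback
    ]
    return min(candidates) if candidates else None


configured = ["10.0.0.5/24", "10.0.0.2/24", "192.168.1.1/30", "127.0.0.1/32"]
for network, address in preferred_addresses(configured).items():
    print(network, "preferred", address)
print("primary", primary_address(configured))
```

In this sample, 10.0.0.2 is preferred on the 10.0.0.0/24 subnet because it is lower than 10.0.0.5, and it also becomes the primary address because it is the lowest preferred address outside 127/8.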
For each interface, you can configure an interface-specific MTU. If you increase the size of the protocol MTU, you must ensure that the size of the media MTU is equal to or greater than the sum of the protocol MTU and the encapsulation overhead.
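The MTU rule is a single inequality: media MTU >= protocol MTU + encapsulation overhead. The Python sketch below expresses it directly. The numbers in the usage lines are illustrative only and are not taken from any particular PIC or encapsulation.

```python
def minimum_media_mtu(protocol_mtu: int, encapsulation_overhead: int) -> int:
    """Smallest media MTU that can carry the given protocol MTU."""
    return protocol_mtu + encapsulation_overhead


def media_mtu_is_sufficient(
    media_mtu: int, protocol_mtu: int, encapsulation_overhead: int
) -> bool:
    """True when the media MTU covers the protocol MTU plus the encapsulation overhead."""
    return media_mtu >= minimum_media_mtu(protocol_mtu, encapsulation_overhead)


# Illustrative values: a 1500-byte protocol MTU with 4 bytes of overhead.
print(minimum_media_mtu(1500, 4))
print(media_mtu_is_sufficient(1504, 1500, 4))
print(media_mtu_is_sufficient(1500, 1500, 4))
```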
When a network change occurs, it is in everyone’s interest to detect the change and distribute the information as quickly as possible in order to allow routers to change their tables to reflect the new topology. The amount of time that this process takes is called convergence time.
Following a change, at least three things must occur to allow for convergence: first, the routers involved in the change must notice it. That may be as simple as detecting the failure of an interface or the loss of signal, or be a more involved problem, such as detecting a lack of keepalive messages on an Ethernet interface. Second, the involved routers must adjust their own databases and/or routing tables to reflect the change, taking into account any information they may have about how best to adapt to it. Finally, the routers must propagate this change information to other routers.
All routing protocols store their routing information in a common routing table that is maintained by the JUNOS software. From the collected routing information, the JUNOS software calculates the best routes to each destination. These routes are used to forward traffic through the router, and they can be advertised to neighbors using one or more routing protocols.
Routing policy controls the routing information that is transferred between the routing protocols and the JUNOS software routing table. You can filter the routing information so that only some of it is transferred, and you can set properties associated with the routes.
You define routing policies in the following circumstances:
When you do not want a routing protocol to transfer all of its routes into the routing table. That is, when you do not want the routing table to learn about certain routes so that these routes are never used to forward packets through the router.
When you do not want a routing protocol to receive from the routing table all the active routes learned by that protocol.
When you want a routing protocol to receive active routes learned from another routing protocol. This is sometimes called route redistribution.
When you want to set or change the information associated with a route, such as the preference value, AS path, or the BGP community.
Before discussing the design of routing policy, it is necessary to understand two terms—import and export. JUNOS routing policy uses these terms to describe how routes move between the routing protocols and the routing tables:
When a routing protocol places its routes into the routing table, this process is referred to as importing routes into the routing table.
When a dynamic routing protocol uses the routes in the routing table to send protocol advertisements, the protocol takes the route from the routing table. This process is referred to as exporting routes from the routing table.
The process of moving routes between a routing protocol and the routing table is always described from the point of view of the routing table. That is, routes are imported into a routing table from a routing protocol and they are exported from a routing table to a routing protocol. It is important to remember this distinction when working with routing policy.
When evaluating routes for export, the policy software uses only active routes from the routing table. For example, if you have multiple routes to the same destination and one route has a more attractive metric, only that route is evaluated by the policy processing software. Said another way, export policy does not evaluate all routes, but instead it evaluates only those routes a routing protocol could correctly advertise to a neighbor.
Routing policy allows you to filter which routes a routing protocol imports into the routing table and which routes a routing protocol can export from the routing table. You also use routing policy to set the information associated with a route as it is being imported into or exported from the routing table.
Apply an import routing policy to control the routes that the routing protocol process uses to determine active routes.
Apply an export routing policy to control the routes that a protocol advertises to its neighbors.
Each routing protocol can have one or more points within its processing path where you can apply filters. With BGP, for example, you can filter all BGP routing information or limit filtering to a specific group of BGP neighbors. You can establish an import filter for a single BGP neighbor. In all cases, the most specific filter list is always applied.
To design routing policy, you construct a policy or a sequence of policies, which you then apply to a specific routing protocol. All routes received by that protocol are evaluated by the policies. When a route matches a policy, the route is either accepted or rejected. If a route does not match one policy, it is evaluated by the next policy. If the route matches none of the policies, it is subject to a per-protocol default policy action and is accepted or rejected on that basis.
To have routing policies take effect, you apply them to individual routing protocols. You can apply policies to routing information that a routing protocol is importing into the routing table, or to the routes that a routing protocol is exporting from the routing table.
Each policy term consists of statements that define match conditions and actions to take if the conditions are matched.
Actions specify how to handle routes that match the conditions in the term. There are four types of policy actions:
Flow control actions, which affect whether to evaluate the next term or next policy
Terminating actions, which determine if the route is accepted or rejected
Tracing actions, which log route matches to a file
Actions that set properties associated with the route’s information
If the conditions in a term do not match, the default action is taken. The default action can be one of the following:
If there is another term in the current policy, that term is evaluated.
Otherwise, if there is another routing policy listed in the import or export statement, the first term in that policy is evaluated.
Otherwise, if there are no more terms or policies, the protocol-specific default policy action is taken, which might result in the route being rejected or accepted.
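The evaluation order described above (term by term, then policy by policy, then the protocol default) can be modeled compactly. The Python sketch below is a simplified model and not the JUNOS policy engine: the Term class, the action strings, and the route dictionaries are assumptions for the example, tracing actions are omitted, and a matching term without a terminating or flow control action is assumed to fall through to the next term.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Route = Dict[str, object]


@dataclass
class Term:
    match: Callable[[Route], bool]      # match conditions
    action: Optional[str] = None        # "accept", "reject", "next policy", or None
    changes: Dict[str, object] = field(default_factory=dict)  # properties to set


Policy = List[Term]


def evaluate(route: Route, policies: List[Policy], default_action: str) -> Tuple[str, Route]:
    """Run a route through a chain of policies and return (result, modified route)."""
    route = dict(route)  # work on a copy so the caller's route is untouched
    for policy in policies:
        for term in policy:
            if not term.match(route):
                continue  # no match: try the next term
            route.update(term.changes)  # actions that set route properties
            if term.action in ("accept", "reject"):
                return term.action, route  # terminating action
            if term.action == "next policy":
                break  # flow control: skip the remaining terms in this policy
    # No term produced a terminating action: protocol-specific default applies.
    return default_action, route


policies = [
    [Term(match=lambda r: str(r["prefix"]).startswith("10."), action="reject")],
    [Term(match=lambda r: r["protocol"] == "static",
          action="accept", changes={"preference": 20})],
]

print(evaluate({"prefix": "10.1.0.0/16", "protocol": "bgp"}, policies, "accept"))
print(evaluate({"prefix": "192.0.2.0/24", "protocol": "static"}, policies, "reject"))
print(evaluate({"prefix": "192.0.2.0/24", "protocol": "ospf"}, policies, "reject"))
```

The third route matches no term in either policy, so its fate is decided by the default action passed in, which stands in for the per-protocol default policy.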
Valid BGP routes received from a peer can be displayed with the following CLI command, where x.x.x.x is the neighbor address:
show route receive-protocol bgp x.x.x.x
BGP routes that pass the import policy are placed in the local routing table, inet.0.
BGP routes that will be advertised to a peer can be displayed with the following CLI command:
show route advertising-protocol bgp x.x.x.x
In JUNOS you must configure your neighbors under groups.
Output fields:
Peer—Address of each BGP peer. Each peer has one line of output.
Type—Type of peer (internal or external).
State—BGP state for this neighbor.
Flags—Internal peer-specific flags for this neighbor.
Last State—BGP state that this neighbor was in prior to the current state.
Last Event—Last BGP state transition event.
Last Error—Last notification sent to the neighbor.
Options—Configuration options that are in effect for this neighbor.
Holdtime—Configured hold time for this neighbor.
Preference—Configured preference for routes learned from the neighbor.
Peer ID—Neighbor's router ID.
Local ID—Local system's router ID.
Active Holdtime—Hold-time value that was negotiated during the BGP open.
Group Bit—Internal bit being used for the peer group.
Send state—Whether all peers in the group have received all their updates (in sync or out of sync).
Active Prefixes—Number of prefixes accepted as active from this neighbor.
Last traffic (seconds)—How recently a BGP message was sent or received between the local system and this neighbor.
Output Queue—Number of BGP update messages that are pending for transmission to the neighbor.
Deleted routes—Prefixes that are queued for withdrawal through pending update messages.
Queued AS Path—An AS path that is queued for transmission in an update message.
Output fields:
Groups—Number of BGP groups.
Peers—Number of BGP peers.
Unestablished peers—Number of unestablished BGP peers.
Peer—Address of each BGP peer. Each peer has one line of output.
AS—Peer’s AS number.
InPkt—Number of packets received from the peer.
OutPkt—Number of packets sent to the peer.
OutQ—Count of the number of BGP packets that are queued to be transmitted to a particular neighbor. It usually is 0 because the queue is emptied quickly.
Last Up/Down—Time since the neighbor last transitioned to or from the established state.
State/#Act/Recv/Damped—Displays either the BGP state or, if the neighbor is connected, the number of paths received from the neighbor, the number of these paths that have been accepted as active and are being used for forwarding, and the number of routes being damped.
user@host> show route receive-protocol bgp 11.1.1.1
This command displays the routing information as it was received from a particular neighbor of a particular dynamic routing protocol; that is, the routes that the neighbor advertised to the local router. The information reflects the routes before they are filtered by that protocol’s import policy statements. This command works for routing protocols BGP, RIP, DVMRP, and PIM only.
user@host> show route advertising-protocol bgp 11.1.1.2
This command displays the routing information as it has been prepared for advertisement to a particular neighbor of a particular dynamic routing protocol. The information reflects the routes that the routing table exported into the routing protocol and that were filtered by that protocol’s export routing policy statements. This command works for routing protocols BGP, RIP, DVMRP and PIM only.
Integrates IP Routing and Layer 2 Switching
It is commonly believed that Multiprotocol Label Switching (MPLS) significantly enhances the forwarding performance of label-switching routers. It is more accurate to say that exact-match lookups, such as those performed by MPLS and ATM switches, have historically been faster than the longest-match lookups performed by IP routers. However, recent advances in silicon technology allow ASIC-based route-lookup engines to run just as fast as MPLS or ATM virtual path identifier/virtual circuit identifier (VPI/VCI) lookup engines.
The real benefit of MPLS is that it provides a clean separation between routing (that is, control) and forwarding (that is, moving data). This separation allows the deployment of a single forwarding algorithm—MPLS—that can be used for multiple services and traffic types.
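The difference between the two lookup styles mentioned above can be seen in a small Python sketch. It is conceptual only: the label table contents, the interface names, and the linear scan used for the longest match are assumptions for the example, and real forwarding hardware uses specialized data structures instead.

```python
import ipaddress
from typing import Dict, Optional, Tuple


def exact_match_lookup(
    label_table: Dict[int, Tuple[int, str]], label: int
) -> Optional[Tuple[int, str]]:
    """MPLS-style lookup: the incoming label is the key, one probe finds the entry."""
    return label_table.get(label)


def longest_match_lookup(route_table: Dict[str, str], destination: str) -> Optional[str]:
    """IP-style lookup: among all prefixes containing the address, pick the longest."""
    address = ipaddress.ip_address(destination)
    best_length = -1
    best_next_hop = None
    for prefix, next_hop in route_table.items():
        network = ipaddress.ip_network(prefix)
        if address in network and network.prefixlen > best_length:
            best_length = network.prefixlen
            best_next_hop = next_hop
    return best_next_hop


# Hypothetical tables: incoming label -> (outgoing label, outgoing interface).
labels = {100001: (100042, "so-0/0/0"), 100002: (100077, "so-0/1/0")}
routes = {"0.0.0.0/0": "so-0/0/0", "10.0.0.0/8": "so-0/1/0", "10.1.0.0/16": "so-0/2/0"}

print(exact_match_lookup(labels, 100001))
print(longest_match_lookup(routes, "10.1.2.3"))   # the /16 wins over the /8 and the default
print(longest_match_lookup(routes, "192.0.2.1"))  # only the default route matches
```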
Leverages Existing IP Infrastructure
In the future, the MPLS forwarding infrastructure can remain the same while new services are built by simply changing the way packets are assigned to an LSP. For example, packets could be assigned to a label-switched path based on a combination of the destination subnetwork and application type, a combination of the source and destination subnetworks, a specific quality of service (QoS) requirement, an IP multicast group, or a VPN identifier. Consequently, new services can migrate easily to a common MPLS forwarding infrastructure.
Optimizes IP Networks
MPLS provides the ability to implement traffic engineering easily. This ability optimizes the flow of traffic through the IP network. Traffic can be classified easily by type and be routed diversely or processed based upon different performance requirements. Additionally, both private networks and public traffic can co-exist on a common backbone; each type of traffic is opaque to the other—similar to ships in the night.
Traffic Engineering
The task of mapping traffic flows onto an existing physical topology is called traffic engineering. Traffic engineering provides the ability to move flows away from the shortest path selected by the interior gateway protocol (IGP) and onto a potentially less congested physical path across a network.
Prolonged congestion is the root of poor network performance. The two major causes of prolonged congestion are inefficient or inadequate network resources and the inefficient mapping of traffic streams onto available network resources. Two approaches to eliminating the first cause of prolonged congestion exist—expanding existing capacity and using classical techniques, such as rate limiting and queue management. To overcome the second source of prolonged network congestion, however, you must use traffic engineering.
MPLS is an integration of Layer 2 and Layer 3 technologies. By making traditional Layer 2 features available to Layer 3, MPLS enables traffic engineering. Thus, you can offer in a one-tier network what now can be achieved only by overlaying a Layer 3 network on a Layer 2 network.
MPLS traffic engineering automatically establishes and maintains a tunnel across the backbone, using the Resource Reservation Protocol (RSVP). The path used by a given tunnel at any point in time is determined based on the tunnel resource requirements and network resources, such as bandwidth. Available resources are flooded by using extensions to a link-state IGP.
Tunnel paths are calculated at the tunnel ingress based on a fit between required and available resources (constraint-based routing). The IGP automatically routes the traffic into these tunnels. Typically, a packet crossing the MPLS traffic engineering backbone travels on a single tunnel connecting the ingress point to the egress point.
Traffic Engineering Uses
Traffic engineering provides the following capabilities:
Routes primary paths around known bottlenecks or points of congestion in the network;
Provides precise control over how traffic is rerouted when the primary path is faced with single or multiple failures;
Provides efficient use of available aggregate bandwidth and long-haul fiber by ensuring that subsets of the network do not become overused while other subsets of the network along potential alternative paths are underused;
Maximizes operational efficiency;
Enhances the traffic-oriented performance characteristics of the network by minimizing packet loss, minimizing prolonged periods of congestion, and maximizing throughput; and
Enhances statistically bound performance characteristics of the network (such as loss ratio, delay variation, and transfer delay) required to support a multiservices Internet.
One approach to engineering a backbone is to define a mesh of tunnels from every ingress device to every egress device. The IGP, operating at an ingress device, determines which traffic should go to which egress device, and steers that traffic into the tunnel from ingress to egress. The MPLS traffic engineering path calculation and signaling modules determine the path taken by the LSP tunnel, subject to resource availability and the dynamic state of the network. For each tunnel, counts of packets and bytes sent are kept.
Sometimes a flow is so large that it cannot fit over a single link, so it cannot be carried by a single tunnel. In this case, multiple tunnels between an ingress router and an egress router can be configured, and the load of the flow can be shared among them.
Traffic engineering maps traffic flows to the physical network topology. Specifically, it provides the ability to move traffic flows away from the shortest path calculated by the IGP and onto a less congested path. The purpose of traffic engineering is to balance the traffic load on the various links, routers, and switches in the network so that none of these components is overutilized or underutilized. Traffic engineering allows an ISP to exploit its network infrastructure fully.
Information Distribution Component
Traffic engineering requires detailed knowledge about the network topology as well as dynamic information about network loading. The information distribution component is implemented by defining relatively simple extensions to the IGPs so that link attributes are included as part of each router's link-state advertisement. IS-IS extensions include the definition of new type/length/values (TLVs), while OSPF extensions are implemented with opaque LSAs. The standard flooding algorithm used by the link-state IGPs ensures that link attributes are distributed to all routers in the routing domain. Some of the traffic engineering extensions to be added to the IGP link-state advertisement include maximum link bandwidth, maximum reserved link bandwidth, current bandwidth reservation, and link coloring.
Each router maintains network link attributes and topology information in a specialized traffic engineering database (TED). The TED is used exclusively for calculating explicit paths for the placement of LSPs across the physical topology. A separate database is maintained so that the subsequent traffic engineering computation is independent of the IGP and the IGP’s link-state database. Meanwhile, the IGP continues its operation without modification, performing the traditional shortest-path calculation based on information contained in the router's link-state database.
Path Selection Component
After the IGP floods network link attributes and topology information and places this information in the TED, each ingress router uses the TED to calculate the paths for its own set of LSPs across the routing domain. The path for each LSP can be either a strict or loose explicit route. An explicit route is a preconfigured sequence of routers that should be part of the physical path of the LSP. If the ingress router specifies all the routers in the LSP, the LSP is a strict explicit route. If the ingress router specifies only some of the routers in the LSP, the LSP is a loose explicit route. Support for strict and loose explicit routes allows the path selection process to be given broad latitude whenever possible, but to be constrained when necessary. LSPs computed with constrained shortest path first (CSPF), described below, always generate a strict route.
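The difference between strict and loose explicit routes can be illustrated with a small check. This is only a sketch under the definitions given above: a strict route names every router in the path, while a loose route names only some of them, in order. The function name and the router names are hypothetical.

```python
def matches_explicit_route(path, hops, strict):
    """Return True if the physical path honors the explicit route.

    path:   routers actually traversed, ingress first
    hops:   routers named in the explicit route, in order
    strict: True for a strict route, False for a loose route
    """
    if strict:
        # A strict route names every router, so the path must match exactly.
        return list(path) == list(hops)
    # A loose route names only some routers; they must appear in the
    # same order, with any number of other routers in between.
    remaining = iter(path)
    return all(hop in remaining for hop in hops)


# Hypothetical four-router path used only for illustration.
path = ["A", "B", "C", "D"]
print(matches_explicit_route(path, ["A", "B", "C", "D"], strict=True))
print(matches_explicit_route(path, ["A", "C"], strict=False))
```

The loose check consumes the path left to right, so named routers that appear out of order are rejected.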
The ingress router determines the physical path for each LSP by applying a CSPF algorithm to the information in the TED. CSPF is a shortest path first algorithm that has been modified to take into account specific restrictions when calculating the shortest path across the network. Input into the CSPF algorithm includes:
Topology link-state information learned from the IGP and maintained in the TED;
Attributes associated with the state of network resources (such as total link bandwidth, reserved link bandwidth, available link bandwidth, and link color) carried by IGP extensions and stored in the TED; and
Administrative attributes required to support traffic traversing the proposed LSP (such as bandwidth requirements, maximum hop count, and administrative policy requirements) obtained from user configuration.
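The idea behind CSPF can be sketched in two steps: prune the links that do not satisfy the constraints, then run an ordinary shortest path first calculation on what remains. The example below is a minimal illustration, not the router's implementation. The link tuple layout, the sample TED contents, and the use of an "excluded colors" mask as the administrative policy are all assumptions made for this sketch.

```python
import heapq


def cspf(links, source, dest, required_bw, exclude_colors=0):
    """Return (cost, path) for the cheapest path meeting the constraints.

    links: iterable of (from_router, to_router, metric, available_bw, colors),
           each treated as unidirectional (hypothetical layout).
    Returns None when no path satisfies the constraints.
    """
    # Step 1: prune links that lack bandwidth or carry an excluded color.
    graph = {}
    for a, b, metric, available_bw, colors in links:
        if available_bw < required_bw or (colors & exclude_colors):
            continue
        graph.setdefault(a, []).append((b, metric))

    # Step 2: ordinary shortest path first (Dijkstra) on the pruned graph.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dest:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None


# Hypothetical TED contents: the short path has little bandwidth.
TED = [
    ("A", "B", 1, 50, 0b01),
    ("B", "D", 1, 50, 0b00),
    ("A", "C", 2, 200, 0b00),
    ("C", "D", 2, 200, 0b00),
]
print(cspf(TED, "A", "D", required_bw=10))   # unconstrained enough for A-B-D
print(cspf(TED, "A", "D", required_bw=100))  # forced onto A-C-D
```

Because the result lists every router from ingress to egress, it is naturally a strict route.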
Path Signaling Component
An LSP is unworkable until the signaling component actually establishes it. The signaling component, which is responsible for establishing LSP state and distributing labels, relies on a number of extensions to RSVP:
The explicit route object (ERO) allows an RSVP path message to traverse an explicit sequence of routers that is independent of conventional shortest-path IP routing. (The explicit route can be either strict or loose.)
The label request object permits the RSVP path message to request intermediate routers to provide a label binding for the LSP that it is establishing.
The label object allows RSVP to support the distribution of labels without having to change its existing mechanisms. Because the RSVP RESV message follows the reverse path of the RSVP path message, the label object supports the distribution of labels from downstream nodes to upstream nodes.
Packet Forwarding Component
The packet forwarding component of the JUNOS traffic engineering architecture is MPLS, which is responsible for directing a flow of IP packets along a predetermined path across a network. This path is called a label-switched path (LSP). LSPs are similar to ATM PVCs in that they are simplex in nature—that is, the traffic flows in one direction from the head-end, or ingress, router to a tail-end, or egress, router. Duplex traffic requires two LSPs—that is, one LSP to carry traffic in each direction. An LSP is created by the concatenation of one or more label-switched hops, allowing a packet to be forwarded from one router to another across the MPLS domain.
When an ingress router receives an IP packet, it adds an MPLS header to the packet and forwards it to the next router in the LSP. The labeled packet is forwarded along the LSP by each router until it reaches the tail end of the LSP, at which point the MPLS header is removed and the packet is forwarded based on Layer 3 information, such as the IP destination address. The key point is that the physical path of the LSP is not limited to what the IGP would choose as the shortest path to reach the destination IP address.
IGP Extensions
Each IGP propagates information through some form of extension. IS-IS carries different parameters in type/length/value tuples (TLVs). IS-IS TLVs propagate within a level; they do not propagate from one level to another. OSPF uses a Type 10 opaque LSA. There are three types of opaque LSAs: Types 9, 10, and 11. Type 10 LSAs have an area flooding scope, meaning that the information is propagated within an area. The information does not cross an area border router. The MPLS traffic engineering information carried by the IGP extensions is defined in IETF documents Draft-ietf-isis-traffic-02.txt and Draft-katz-yeung-ospf-traffic-02.txt.
Information Propagated
IS-IS propagates the following TLVs for traffic engineering. Draft-katz-yeung-ospf-traffic-02.txt defines essentially the same parameters for OSPF; the difference is that opaque LSAs carry the information.
Router ID: This TLV is a single stable address, regardless of the node’s interface state. Do not install /32 prefix for router ID into the forwarding table, as this can lead to forwarding loops for systems that do not support this TLV.
Extended IP reachability: This TLV includes one bit for route leaking. It also extends metrics from 6 bits to 32 bits.
Extended IS reachability: This TLV contains information about a series of neighbors. It consists of the following sub-TLVs:
IPv4 neighbor address: Contains the IP address of each neighboring router.
Maximum link bandwidth: A 32-bit field in IEEE floating-point format. Units are bytes per second. It is unidirectional.
Maximum reservable bandwidth: A 32-bit field in IEEE floating-point format. Units are bytes per second. It is unidirectional and supports oversubscription (can be greater than link bandwidth).
Unreserved bandwidth: A 32-bit field in IEEE floating-point format. Units are bytes per second. It has a value for each priority level from 0 through 7. Priority 0 is highest.
Traffic engineering default metric: A 24-bit unsigned integer.
For OSPF, the traffic engineering information is carried in an opaque LSA, so a router that does not support it silently ignores it. The opaque LSA carries the following TLVs:
Router: This TLV is the stable IP address of the advertising router.
Link: This TLV is composed of the following sub-TLVs:
Link type: Can be point-to-point or multi-access.
Link ID: Identifies the other end of the link. A designated router is identified if the link is used for multi-access.
Local interface address: The IP address of the link (the advertising router address if the link is unnumbered).
Remote interface IP address: The neighbor’s IP address. The first two octets are 0 if the link is unnumbered; the remaining octets are local interface index assignment. This sub-TLV and the local address sub-TLV are used to discern multiple, parallel links between systems.
Traffic engineering metric: The link metric for traffic engineering. It might be different from the standard OSPF link metric.
Maximum bandwidth: A unidirectional, 32-bit sub-TLV in IEEE floating-point format. Units are bytes per second.
Maximum reservable bandwidth: A 32-bit sub-TLV in IEEE floating-point format. Oversubscription is supported. Units are bytes per second.
Unreserved bandwidth: The unreserved bandwidth for each of the eight priority levels. Units are bytes per second. Each value is a 32-bit field in IEEE floating-point format and is less than or equal to the maximum reservable bandwidth.
Resource class/color: Specifies administrative group membership (also known as affinity class). It can have up to 32 different groups. Each group is represented by a different bit.
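The bandwidth and color encodings described above can be sketched as follows. This is an illustration only: it assumes network byte order for the 32-bit float, and it assumes that administrative group 0 corresponds to the least significant bit of the mask. The function names are hypothetical.

```python
import struct


def encode_bandwidth(bits_per_second):
    """Encode a bandwidth as a 4-byte IEEE float in bytes per second."""
    return struct.pack(">f", bits_per_second / 8)


def decode_bandwidth(raw):
    """Return the bandwidth in bytes per second from a 4-byte field."""
    return struct.unpack(">f", raw)[0]


def color_groups(mask):
    """List the administrative groups (0 through 31) set in a 32-bit mask.

    Assumes group 0 is the least significant bit (an assumption here).
    """
    return [bit for bit in range(32) if mask & (1 << bit)]


# A 1-Gbps link is advertised as 125,000,000 bytes per second.
field = encode_bandwidth(1_000_000_000)
print(len(field), decode_bandwidth(field))
print(color_groups(0b101))
```

Note the unit conversion: link speeds are usually quoted in bits per second, but the fields carry bytes per second.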
Path Selection
There are two primary methods of determining the physical path for LSPs, the offline approach and the online approach, and a hybrid method that combines the two.
The offline method lets network architects decide their own constraints for the paths. It provides deterministic behavior in that architects know exactly what path the LSPs follow. Additionally, the architects have a complete picture of the network. For example, the architects might realize that by switching the types of traffic carried on two different links, they can balance the traffic load. Real-time path calculators do not have the insight that a complete view of the network can provide. However, offline calculations can be extremely complex, and they can still produce suboptimal routes within the network. The network architects can make offline calculations by simply studying a map, by using a spreadsheet, or by using very expensive third-party tools (which can exceed $100,000). With the expensive offline tools, the architects can run what-if scenarios based on such constraints as link capacity and utilization.
Online path calculation has the router make decisions about what it thinks is the best path, based upon the information currently available in the traffic engineering database. Configuring online paths is easier because the router does all the work. The path selected is the best available at any given instant. Additionally, online path calculation requires no extra software. The only objection to online calculations is that they are not deterministic. Architects do not know in advance what the physical path of the LSP will be.
Path Signaling
A label-switched path is unusable until the signaling component actually establishes it. The path is determined based on constraints like the ones identified on the slide. The signaling component, which is responsible for establishing label-switched paths and label distribution, relies on RSVP or the Label Distribution Protocol (LDP).
Signaling Mechanisms
The IETF does not specify a particular signaling method for dynamic LSPs. There are three possible signaling protocols: the Label Distribution Protocol (LDP), the Resource Reservation Protocol (RSVP), and the Constraint-Based Routing Label Distribution Protocol (CR-LDP).
LDP: This protocol associates a set of destinations (route prefixes and router addresses) with each data-link layer LSP. This set of destinations is called the forwarding equivalence class (FEC). These destinations all share a common data LSP path egress and a common unicast routing path. LDP LSPs follow the same path that a routed IP packet follows. Juniper Networks M-series and T-series routers support LDP, version 1.0.
RSVP: Juniper Networks M-series and T-series routers use RSVP as their signaling protocol for traffic engineered LSPs. The following list provides the primary reasons RSVP was selected:
RSVP was designed to be the resource reservation protocol of the Internet and “provide a general facility for creating and maintaining distributed reservation state across a set of multicast or unicast delivery paths.” [1] Reservations are an important part of traffic engineering, so it made sense to continue to use RSVP for this purpose rather than reinventing the wheel.
1. R. Braden, Ed., L. Zhang, S. Berson, S. Herzog, S. Jamin, RFC 2205. Internet Engineering Task Force. Reston, VA. September, 1997.
RSVP was designed explicitly to support extensibility mechanisms by allowing it to carry opaque objects. This design encourages RSVP extensions that create and maintain distributed state for information other than pure resource reservation. The designers believed that extensions could be developed easily to add support for explicit routes and label distribution.
Extensions do not make the enhanced version of RSVP incompatible with existing RSVP implementations. An RSVP implementation can differentiate between LSP signaling and standard RSVP reservations by examining the contents of each message.
With the proper extensions, RSVP provides a tool that consolidates the procedures for a number of critical signaling tasks into a single message exchange. Extended RSVP can establish an LSP along an explicit route, can distribute label-binding information to LSRs in the LSP, and can reserve network resources in nodes comprising the LSP (the traditional role of RSVP). Extended RSVP also permits an LSP to be established to carry best-effort traffic without making a specific resource reservation.
CR-LDP: This protocol takes the traffic engineering extensions added to RSVP and adds them to LDP so that CR-LDP-signaled LSPs can do the same things that RSVP-TE-signaled LSPs can do. Juniper Networks M-series and T-series routers do not support CR-LDP.
Examine IP Packet
When packets enter an MPLS-based network, label edge routers (LERs) give them a label (identifier). These labels not only contain information based on the routing table entry but also refer to the IP header field (source IP address), Layer 4 socket number information, and differentiated service.
Packet Classification
Once the packet is examined, the output queue is assigned, the packet is assigned to an LSP, and the router places an outgoing label on the packets. The packet is then forwarded along the correct LSP.
Forwarding Equivalence Class
A forwarding equivalence class (FEC) is a stream of IP packets that are forwarded over the same path, treated in the same manner, and mapped to the same label. The FEC/label binding mechanism is currently based on the destination IP address prefix. This mechanism has two types: host FEC and prefix FEC. A host FEC corresponds to a /32 network address and uniquely identifies a host.
MPLS Labels
A label is a short, fixed-length packet identifier. It is unstructured and has link-local significance.
Labels are distributed by one of two methods: downstream on demand (DoD) and unsolicited downstream. In DoD, the router provides a label when it receives a label mapping request. In unsolicited downstream (or downstream unsolicited, depending upon which RFC you read), the router provides label mappings without waiting for a specific label mapping request.
Routers retain labels in their databases either liberally or conservatively. Liberal label retention means that the router retains all labels, even if they are not used by the router. This retention provides for quicker failover if the labels in use become invalid. In this scenario, there is no need to request new label mapping information because all the information is already retained for quick alternate path computation. Routers using conservative label retention retain only the labels currently in use. While much less information is maintained, the router needs more time to fail over after a topology change because it must request new label mappings.
The routers can use either ordered or independent control when advertising label values. In independent control, once a router recognizes a FEC, it can choose to distribute that specific label binding. In ordered control, a router distributes the label binding information only if it is the egress LSR for that FEC, or if it received a label binding from the downstream neighbor for that FEC.
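The trade-off between liberal and conservative label retention can be sketched as follows. This is a simplified illustration, not how a router stores its label database. The function names, neighbor names, and label values are hypothetical.

```python
def retain_labels(advertised, next_hop, mode):
    """Return the label mappings a router keeps for one FEC.

    advertised: dict mapping neighbor name -> label that neighbor advertised
    next_hop:   the neighbor currently used to reach the FEC
    mode:       "liberal" or "conservative"
    """
    if mode == "liberal":
        # Keep every mapping, even from neighbors that are not in use.
        return dict(advertised)
    if mode == "conservative":
        # Keep only the mapping from the neighbor currently in use.
        return {n: label for n, label in advertised.items() if n == next_hop}
    raise ValueError(f"unknown retention mode: {mode}")


def label_after_failover(retained, new_next_hop):
    """Return the retained label for the new next hop, or None.

    None means a new label mapping must be requested first.
    """
    return retained.get(new_next_hop)


# Hypothetical mappings advertised by two neighbors for the same FEC.
advertised = {"R2": 100200, "R3": 100300}
liberal = retain_labels(advertised, "R2", "liberal")
conservative = retain_labels(advertised, "R2", "conservative")
print(label_after_failover(liberal, "R3"))
print(label_after_failover(conservative, "R3"))
```

With liberal retention the alternate label is already on hand; with conservative retention the lookup comes back empty and the router must ask for a mapping.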
Label Processing
The router supports the following label operations:
Push: This operation adds a new label to the top of the packet. For IPv4 packets, the new label is the first label. The TTL, stack, and CoS fields are derived from the IP packet header. If the push operation is performed on an existing MPLS packet, the packet will have two or more labels. Having more than one label is called label stacking. The top label must have its S field set to 0, and might derive CoS and TTL from lower levels. In JUNOS software Releases 4.2 through 5.0, the new top label in a label stack always initializes its TTL to 255, regardless of the TTL value of lower labels. Starting in Release 5.1, the router can specify TTL values for all the labels in a stack.
Pop: This operation removes the label from the beginning of the packet. Once the label is removed, the TTL is copied from the label into the IP packet header, and the underlying IP packet is forwarded as a native IP packet. In the case of multiple labels in a packet (label stacking), removal of the top label yields another MPLS packet. The new top label might derive CoS and TTL from a previous top label. In JUNOS software Release 4.2 and later, the popped TTL value from the previous top label is not written back to the new top label (meaning that the label that used to be one from the top maintains its value when the router pops off the top label).
Swap: This operation replaces the label at the top of the label stack with a new label. The S and CoS bits are copied from the previous label, and the TTL value is copied and decremented (unless you configure the no-decrement-ttl or no-propagate-ttl statements). A transit router supports a label stack of any depth.
Multiple push: This operation adds multiple labels (up to three total) on top of existing packets. This operation is equivalent to doing push multiple times.
Swap and push: This operation replaces the existing top of the label stack with a new label, followed by pushing another new label on top. This operation is typically performed when a LDP-signaled LSP transits an RSVP-TE-signaled core.
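The label operations listed above can be modeled on a simple list, where index 0 is the top of the label stack. This is a sketch only: it tracks label values and leaves out the TTL, S, and CoS handling described in the text. The function names and label values are hypothetical.

```python
def push(stack, label):
    """Add a new label on top of the stack."""
    return [label] + list(stack)


def pop(stack):
    """Remove the top label; an empty result means a native IP packet."""
    if not stack:
        raise ValueError("no label to pop")
    return list(stack[1:])


def swap(stack, new_label):
    """Replace the top label with a new label."""
    if not stack:
        raise ValueError("no label to swap")
    return [new_label] + list(stack[1:])


def swap_and_push(stack, swapped_label, pushed_label):
    """Swap the top label, then push another label on top of it."""
    return push(swap(stack, swapped_label), pushed_label)


stack = push([], 100001)                        # first label on an IPv4 packet
stack = swap_and_push(stack, 100002, 100500)    # for example, LDP over an RSVP core
print(stack)
print(pop(stack))
```

Each function returns a new list, so the stack before and after an operation can be compared directly.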
MPLS Shim Header Fields
Juniper always uses a shim header. Juniper does not implement any version of MPLS that uses VPI/VCI or DLCI as the label. The following list provides descriptions of the MPLS header fields:
Label: This 20-bit field carries the actual value of the label. When a router receives a labeled packet, the router looks up the label value at the top of the stack. After a successful lookup, the router knows the next hop to which the packet is to be forwarded and the operation to be performed on the label stack before forwarding. This operation might be the replacement of the top label stack entry with another, or the popping of an entry off the label stack, or the replacement of the top label stack entry and then the pushing of one or more additional entries onto the label stack. In addition to learning the next hop and the label stack operation, the router might also learn the outgoing data-link layer encapsulation, and possibly other information needed to forward the packet properly.
Experimental: EXP bits in MPLS headers are actually interpreted as CoS bits. They map to the output interface. High-order bits indicate output queue, and the low-order bit indicates PLP. By default, all MPLS packets are serviced by queue 0. You can enable specific output interface mapping with the mpls-cos-map knob.
Stacking: If the stack (S) bit is 1, then the label is at the bottom of the stack. If it is 0, then the label is not the bottom of the stack. Label 0 must have S=1 (therefore no label stacking). Label 3 can have S=0 (therefore allows label stacking).
Time to live: The TTL is an 8-bit field that provides loop prevention. It defaults to IP TTL for S=1 (unstacked) and defaults to 255 for any stacked label (S=0) in JUNOS releases up through 5.0. In Release 5.1 code, the different labels can all inherit the IP TTL.
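The four header fields fit into a single 32-bit word: 20 bits of label, 3 experimental bits, 1 stack bit, and 8 bits of TTL, in that order, as laid out in RFC 3032. The following sketch packs and unpacks that word. The function names are hypothetical, and the example values are chosen only for illustration.

```python
import struct


def pack_shim(label, exp, s, ttl):
    """Build the 4-byte shim header: label(20) | EXP(3) | S(1) | TTL(8)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack(">I", word)  # network byte order


def unpack_shim(raw):
    """Split a 4-byte shim header back into its fields."""
    (word,) = struct.unpack(">I", raw)
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


header = pack_shim(label=25, exp=0, s=1, ttl=64)
print(header.hex())
print(unpack_shim(header))
```

Setting s=1 marks the entry as the bottom of the stack, which matches the stacking rule described above.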
Label Values
Label values 0 through 15 are reserved according to RFC 3032, MPLS Label Stack Encoding, January 2001.
Label 0 is the IPv4 explicit null label. This label can only exist on a label with a stack depth of one. The router that receives a label value of 0 must pop the label and process the packet as an IP packet.
Label 1 is the router alert label. All routers receiving this label value must process the packet, even if they are not the ultimate destination. Although reserved, there are currently no applications that use label value 1. The label can be used in any depth label stack. If label 1 is on top of a label stack, when the router receives it, the router delivers it to a local software module for processing. The label beneath label 1 then determines the actual destination of the packet. The router alert label should be pushed back onto the label stack before forwarding. However, we discourage its use, as it can be a denial-of-service target.
Label 2 is the IPv6 explicit null label. This label is equivalent to label 0 for IPv4 packets; it can only be used in a stack depth of one. The router must pop this label and forward the packet based upon IPv6 information.
Label 3 is the implicit null label. This label supports stack depths greater than one (current limit is 3). This label is actually assigned, but never gets placed on the front of an IP packet. It implies that the router that receives a label 3 mapping must pop the top label off the stack and forward the packet based on what is beneath the top label (either as an IP packet if the stack depth is 1 or as an MPLS packet if the stack depth is 2 or greater). LDP specifies a use for label 3, so a value is reserved. Juniper Networks M-series and T-series routers default to signaling label 3 dynamically (since Release 4.1).
Labels 16–1023 are allocated by Juniper Networks for static LSPs. Juniper Networks only assigns these label values to static LSPs. Juniper Networks M-series and T-series routers switch any label assigned by another router vendor, so the restriction does not impose any interoperability issue. Juniper Networks M-series and T-series routers support label values above 1023 for static assignment, but do not guarantee that such labels will not be interspersed with dynamically assigned labels. Using labels below 1024 guarantees that you do not have to account for dynamically allocated labels in that label range.
Labels 1024–99,999 are reserved for future use.
Labels 100,000–799,999 are assigned on a per-box basis. Juniper Networks M-series and T-series routers allocate these labels upstream to other routers. A label in this range is only allocated once, regardless of which interface it is allocated on. This method is the Juniper Networks default label assignment method.
Labels 800,000–1,048,575 are assigned on a per-interface basis. Juniper Networks M-series and T-series routers allocate these labels upstream to other routers. A label in this range can only be allocated once per interface; therefore, the router might allocate the same label value multiple times, but only once per interface.
Labels 0 and 1 are inserted in the mpls.0 switching/routing table when MPLS is configured.
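The label ranges above can be summarized as a simple lookup. The sketch below restates only what this section lists; the function name and the returned strings are hypothetical, and values 4 through 15 are reported simply as reserved.

```python
def classify_label(value):
    """Return a description of the range a 20-bit label value falls in."""
    if not 0 <= value <= 1_048_575:
        raise ValueError("label must fit in 20 bits")
    special = {
        0: "IPv4 explicit null",
        1: "router alert",
        2: "IPv6 explicit null",
        3: "implicit null",
    }
    if value in special:
        return special[value]
    if value <= 15:
        return "reserved"
    if value <= 1023:
        return "static LSP"
    if value <= 99_999:
        return "reserved for future use"
    if value <= 799_999:
        return "dynamic, per box"
    return "dynamic, per interface"


for label in (3, 19, 100_000, 800_000):
    print(label, classify_label(label))
```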
Label Swapping
Each LSR maintains a connection table. The connection table keeps an input port and label-to-output port and label operation mapping. Valid label operations include pop, push, swap, multiple push, and swap-push.
The slide shows an IP packet with an MPLS label (value 25) arriving on port 1. The LSR checks the connection table and determines that the output label operation for a packet arriving on interface 1 with a label of 25 is a swap operation. The output port is port 4 and the output label is label 19.
As a side note, what type of label mapping is this LSR using? Is it per box or per interface? It is probably using per-box label mapping because interface 1 has labels 22, 24, and 25, while interface 2 has label 23, so no label value is repeated. If the label mappings were per interface, there would probably be a duplicate label value on a different interface. The exception, of course, is that an entry that duplicated a label on port 1 could already have been torn down.
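The connection table lookup can be sketched as a dictionary keyed on input port and label. Only the entry described in the text is filled in here; the outputs for the other labels on the slide are not given, so they are left out. The function name and the table layout are hypothetical.

```python
# (input port, input label) -> (operation, output port, output label)
CONNECTION_TABLE = {
    (1, 25): ("swap", 4, 19),
}


def forward(in_port, stack):
    """Apply the label operation for the top label; index 0 is the top."""
    operation, out_port, out_label = CONNECTION_TABLE[(in_port, stack[0])]
    if operation == "swap":
        stack = [out_label] + stack[1:]
    elif operation == "pop":
        stack = stack[1:]
    elif operation == "push":
        stack = [out_label] + stack
    else:
        raise ValueError(f"unsupported operation: {operation}")
    return out_port, stack


print(forward(1, [25]))
```

Keying on the port as well as the label is what makes per-interface label allocation possible; with per-box allocation the label alone would be unique.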
Label-Switched Path
A label-switched path (LSP) is a one-way flow of traffic, carrying packets from beginning to end. Packets must enter the LSP at the beginning of the path and can only exit the LSP at the end. Packets cannot be injected into an LSP at an intermediate hop. Each router in an MPLS path performs a specific function based on whether the packet enters, transits, or leaves the router.
Label-Switching Router
A label-switching router (LSR) forwards MPLS packets, which are part of an LSP. In addition, an LSR participates in constructing LSPs for the portion of each LSP entering and leaving the LSR. Throughout this course, we use LSR and router interchangeably.
Ingress LSR (Head-End LSR)
At the beginning of the tunnel, the ingress router encapsulates an Internet Protocol (IP) packet within an MPLS Layer 2 frame and forwards it to the first router in the path. There can be only one ingress router in a path, and it is always at the beginning of the path. Each ingress router uses the traffic engineering database (TED) to calculate the paths for its own set of LSPs across the routing domain. The path for each LSP can be represented by either strict or loose explicit hops. The ingress router determines the physical path for each LSP by applying the constrained shortest path first (CSPF) algorithm to the information in the TED.
Transit LSR
A transit router forwards a received MPLS packet to the next hop in the MPLS path. There can be zero or more transit routers in a path. The MPLS protocol enforces a maximum limit of 253 transit routers in a single path. A transit LSR forwards MPLS packets using label swapping.
Penultimate Router
The penultimate router is the next-to-last router in the path. It is responsible for removing the top label prior to forwarding the packet to the egress router when the egress router signals with the implicit null label. This operation is known as penultimate hop popping.
Egress LSR (Tail-End LSR)
At the end of an LSP, the egress router removes the MPLS encapsulation for explicit null labels and forwards the packet towards its final destination using the normal IP forwarding table. Only one egress router can exist in a path.
FEC Determination
The ingress router maps a specific FEC to an LSP. In the example on the slide, the router binds Paris-bound traffic to the top LSP and Rome-bound traffic to the bottom LSP.
Label Switching
Once the traffic starts down the LSP, each LSR along the way performs label swapping until the packets reach the egress of the LSP.
Egress LSR
The egress LSR (or possibly the penultimate LSR) pops the top label and forwards the packet based upon the destination address.
Packet Forwarding Example
The ingress LSR maps destinations to the BGP next hop and the next physical hop. Both the 134.5/16 and the 200.3.2/24 prefixes share the same BGP next-hop address, 192.168.2.1. There is an LSP that exits at the same 192.168.2.1 address. The ingress LSR knows it can use the LSP to forward the traffic if the BGP next hop and the LSP egress point are the same.
Prefix 200.3.2/24 is mapped to outgoing label 99 on interface 3. The packet enters interface 1 of the next downstream router. That router has an MPLS switching table that maps a swap operation to packets arriving on interface 1 with a label of 99. The output interface is interface 2, with a swapped label of 56. The next router receives the packet on interface 3, with a label of 56. The MPLS switching table indicates that the output interface is interface 5, with an outgoing label of 3. But label 3 is the implicit null label, which means that the router should pop the label and forward what lies beneath the implicit null header out the interface on which the MPLS packet normally would have been forwarded, which, in this case, is interface 5. Popping off the label leaves an IP packet destined for IP address 200.3.2.7. The last router does an IP lookup on the prefix. Once the routing table is consulted, the traffic is forwarded out the correct interface, which, in this case, is the interface leading to 200.3.2.1.
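The walk-through above can be traced in a few lines. This sketch uses the interface numbers and label values from the example; the table layout, the function names, and the variable names are hypothetical.

```python
import ipaddress

IMPLICIT_NULL = 3

# Ingress: prefix -> (outgoing interface, outgoing label)
INGRESS_MAP = {ipaddress.ip_network("200.3.2.0/24"): (3, 99)}

# Transit tables: (input interface, input label) -> (output interface, output label)
SECOND_ROUTER = {(1, 99): (2, 56)}
PENULTIMATE_ROUTER = {(3, 56): (5, IMPLICIT_NULL)}


def ingress(destination):
    """Map a destination address to an interface and label at the ingress."""
    address = ipaddress.ip_address(destination)
    for prefix, result in INGRESS_MAP.items():
        if address in prefix:
            return result
    return None


def transit(table, in_interface, label):
    """Return (out_interface, out_label); None means the label was popped."""
    out_interface, out_label = table[(in_interface, label)]
    if out_label == IMPLICIT_NULL:
        # Label 3 is never sent on the wire; pop and forward what is beneath.
        return out_interface, None
    return out_interface, out_label


print(ingress("200.3.2.7"))
print(transit(SECOND_ROUTER, 1, 99))
print(transit(PENULTIMATE_ROUTER, 3, 56))
```

After the last step the packet is a plain IP packet again, and the last router forwards it with an ordinary IP lookup.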
Implicit Null Signaling
Once the LSP is established (essentially verifying connectivity from ingress to egress), the signaling protocol assigns label values to each leg of the path. Labels are established for each leg from the egress router toward the ingress router. In effect, the receiving router tells the sending router which label to use when sending packets for the LSP.
With signaled protocols, the egress router assigns label 3 to the last leg of the path. Label 3 is a reserved label that tells the penultimate router to pop the MPLS header from the packet and send the resulting IP packet to the egress router for final routing. Label 3 never appears in an MPLS header.
With implicit null signaling, the penultimate router, rather than the egress router, removes the MPLS header. Otherwise, the egress LSR receives an MPLS header with label 0, removes the header, and routes the packet at the same time. Either method is allowed by the MPLS specification. JUNOS software supports both methods.
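The direction of label assignment can be sketched as a walk from the egress router back toward the ingress router. This is an illustration of the ordering only, not of the signaling messages themselves. The function name, the router names, and the starting label value of 100000 are assumptions made for this sketch.

```python
from itertools import count


def signal_labels(path, first_label=100000, implicit_null=True):
    """Return {(upstream, downstream): label} for each leg of the path.

    Labels are assigned by the downstream router of each leg, starting
    at the egress. The egress advertises label 3 (implicit null) or
    label 0 (explicit null) for the last leg.
    """
    next_label = count(first_label)
    labels = {}
    last = len(path) - 1
    for i in range(last, 0, -1):  # walk from egress toward ingress
        leg = (path[i - 1], path[i])
        if i == last:
            labels[leg] = 3 if implicit_null else 0
        else:
            labels[leg] = next(next_label)
    return labels


print(signal_labels(["ingress", "transit", "egress"]))
print(signal_labels(["ingress", "transit", "egress"], implicit_null=False))
```

In both cases the receiving router of each leg chooses the label, which matches the rule that the receiver tells the sender which label to use.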
Internet Standard for Resource Reservation
The Resource Reservation Protocol (RSVP) is a resource reservation setup protocol used by both network hosts and routers. Hosts use RSVP to request a specific quality of service (QoS) from the network for particular application flows. Routers use RSVP to deliver QoS requests to all routers along the data path. RSVP also can maintain and refresh states for a requested QoS application flow. RSVP treats an application flow as a simplex connection. That is, the QoS request travels only in one direction—from the sender to the receiver.
Not a Routing Protocol
RSVP is an Internet-layer protocol (protocol code is 46) that uses IP as its network layer. It is similar to ICMP (protocol code 1) or IGMP (protocol code 2) in that it runs as a separate software process in the JUNOS Internet software and is not in the packet forwarding path.
Simplex Reservations
RSVP is one of the ways in which an application (or a router on behalf of an application) can signal the network for a desired level of QoS. RSVP relies on the periodic exchange of path and resv messages between the two ends. It is considered a receiver-initiated protocol because the receiver of the data flow initiates and maintains the resource reservation for that particular flow. Because RSVP requires that each intermediate router maintain state information about each RSVP flow, it can introduce scalability and cost issues when it is used over an infrastructure, such as the Internet, where the messages might have to traverse numerous routers. RSVP is useful where explicit QoS and granularity are a must, for example on low-speed WAN links. RSVP supports several different reservation styles. Additionally, it works for both IPv4 and IPv6.
Sessions
RSVP creates independent sessions to handle each data flow. A session is identified by a combination of the destination address, an optional destination port, and a protocol. Within a session, there can be one or more senders. Each sender is identified by a combination of its source address and source port. An out-of-band mechanism, such as a session announcement protocol or human communication, is used to communicate the session identifier to all senders and receivers.
In RSVP, a data flow is a sequence of messages that have the same source, destination (one or more), and QoS. QoS requirements are communicated through a network using a flow specification, which is a data structure used by internetwork hosts to request special services from the internetwork. A flow specification often guarantees how the internetwork handles some of its host traffic.
RSVP Message Types
The following list describes the RSVP message types:
Path: This message establishes state. The destination address is the egress LSR.
Resv: This message reserves resources. The destination address is the next hop.
PathTear: Path teardown messages delete the path state from nodes that receive them. This message is originated either by the sender or by a node whose path state has timed out. It always travels downstream toward the receiver.
ResvTear: Reservation teardown messages delete the reservation state from nodes that receive them. This message is originated either by the receiver or by a node whose reservation state has timed out. It always travels upstream toward the sender.
PathErr: Path error messages report errors in processing path messages and travel upstream to the sender along the reverse of the route taken by the path messages. Path error messages only report errors; they do not modify the path state of any node through which they pass.
ResvErr: Reservation error messages report errors in the processing of resv messages and travel hop-by-hop downstream to the receiver. Like path error messages, reservation error messages only report errors and do not modify the reservation state of any node through which they pass.
ResvConf: Reservation confirmation messages are sent by each node in the path to the receiver if the receiver requests a reservation confirmation in its resv message.
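As a quick reference, the message types above can be summarized by direction of travel. The sketch below only restates the list; the Direction enum and the messages_travelling helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class Direction(Enum):
    DOWNSTREAM = "toward the receiver (egress)"
    UPSTREAM = "toward the sender (ingress)"


# Message type -> (direction of travel, purpose), as described in the list above.
RSVP_MESSAGES = {
    "Path": (Direction.DOWNSTREAM, "establishes path state"),
    "Resv": (Direction.UPSTREAM, "reserves resources"),
    "PathTear": (Direction.DOWNSTREAM, "deletes path state"),
    "ResvTear": (Direction.UPSTREAM, "deletes reservation state"),
    "PathErr": (Direction.UPSTREAM, "reports path errors without changing state"),
    "ResvErr": (Direction.DOWNSTREAM, "reports resv errors without changing state"),
    "ResvConf": (Direction.DOWNSTREAM, "confirms a reservation to the receiver"),
}


def messages_travelling(direction: Direction) -> list:
    """Return the names of all message types that travel in the given direction."""
    return sorted(name for name, (d, _) in RSVP_MESSAGES.items() if d is direction)


for name, (direction, purpose) in RSVP_MESSAGES.items():
    print(f"{name:9} {direction.name:10} {purpose}")
```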
State Blocks
Different RSVP messages establish RSVP state block information. These state blocks are data structures that store the soft state information involved in allocating resources.
Path Message
A path message is transmitted by the ingress LSR toward the egress LSR when it wants to establish an LSP tunnel. The path message is addressed to the egress LSR, but it contains the router alert IP option (RFC 2113) in its IP header to indicate that the datagram requires special processing by intermediate routers. The path message can include a number of different RSVP objects:
Label request object: Requests label mapping from downstream node.
Explicit route object: Lists strict or loose nodes that RSVP path messages must visit.
Record route object: Lists addresses of all nodes visited by the path message.
Session attribute object: Provides characteristics of the session.
CoS flowspec object: Identifies the resources that will be allocated.
Resv Message
A resv message is transmitted from the egress LSR toward the ingress in response to the receipt of a path message. The resv message establishes path state in each LSR by distributing label bindings, requesting resource reservations along the path, and specifying the reservation style (fixed-filter or shared-explicit). Like the path message, it can contain a record route object, which contains the list of all the nodes visited by the resv message.
RSVP Path Message
The ingress LSR generates an RSVP path message with a session type of LSP tunnel supporting IPv4. The path message contains a label request object that asks intermediate LSRs and the egress LSR to provide a label binding for this path. If the label request object is not supported by each LSR along the path, the ingress LSR is notified by the first LSR on the path that does not support the label request object. In addition to the label request object, an RSVP path message can also contain a number of optional objects:
Explicit route object (ERO): This object can be added to specify a predetermined path for the LSP across the service provider's network. When the ERO is present, the RSVP path message is forwarded towards the egress LSR along the path specified by the ERO, independent of the IGP shortest path.
Record route object (RRO): This object allows the ingress LSR to receive a listing of the LSRs that the LSP tunnel traverses across the service provider's network.
Session attribute object: This object can be included in the RSVP path message to aid in session identification and diagnosis. The session attribute object also controls the path setup priority, holding priority, and local rerouting features.
The path message also contains the standard RSVP objects (as opposed to extended objects):
Sender template: This object contains the sender's IP address and perhaps some additional information to identify the sender of the path message.
Sender Tspec: This object describes the traffic characteristics of the flow that will be sent along the LSP. On the slide, LSR 4 uses this information to construct an appropriate receiver Tspec (describing the traffic flow) and Rspec (defining the desired QoS). The format and content of the Tspec and Rspec are opaque to RSVP.
Analysis
Given
Assume that MPLS and RSVP have been configured and enabled on LSR 1, LSR 2, LSR 3, and LSR 4.
By some mechanism, LSR 1 knows that the LSP should follow the explicit route (LSR 1 to LSR 2 to LSR 3 to LSR 4). Each abstract node in the ERO has the L bit cleared (a strict hop in the explicit route) and is a simple abstract node (consists of only a single node identified by a 32-bit IPv4 prefix).
Required
We want to establish an LSP for transit traffic that enters the service provider's network at LSR 1 and exits at LSR 4. Transit traffic should follow the physical path of the LSP rather than the route calculated by the IGP (LSR 1 to Router A to LSR 4) through the network. The result is that all transit traffic entering the service provider's network at LSR 1 (with LSR 4 as its IBGP next hop) is forwarded along the LSP.
The physical path for the LSP has been specifically selected by another process to reduce the amount of traffic flowing along the IGP route, optimize the overall utilization of network resources, enhance the traffic-oriented performance characteristics for the traffic flow, and enhance the traffic-oriented performance characteristics for the entire network.
Processing at LSR 1
The path message is transmitted toward LSR 4 along the path specified by the ERO. The path message is addressed to the egress LSR but contains the router alert IP option to indicate that the datagram requires special processing by intermediate routers.
Processing at LSR 2
When the path message arrives at LSR 2, it records the label request object and the ERO in its path state block. The path state block also contains the IP address of the previous hop, the session, the sender, and the Tspec. This information is used to route the corresponding resv message back to LSR 1.
LSR 2 forwards the path message toward LSR 4 along the path specified in the ERO. If LSR 2 cannot allocate a label for the LSP, it responds by sending a path error message with an unknown object class error to LSR 1.
Processing at LSR 3
When the path message arrives at LSR 3, it records the label request object and ERO in its path state block. The path state block also contains the previous hop, session, sender, and Tspec. This information is used to route the corresponding resv messages back to LSR 2.
The path message is forwarded toward LSR 4 along the path specified in the ERO. If LSR 3 cannot allocate a label for the LSP, it responds by sending a path error message to LSR 1.
Processing at LSR 4 (Egress LSR)
When the path message arrives at LSR 4, it notices from the label request object that it is the egress LSR for the LSP.
Resv Message
The egress LSR generates an RSVP reservation message with a session type of LSP tunnel supporting IPv4. The reservation message contains a label object that actually assigns the MPLS label between each pair of LSRs. The following points describe the processing of the reservation message:
LSR 4 transmits a resv message to LSR 3: Following standard RSVP procedures, LSR 4 generates a resv message for the session to distribute labels and establish forwarding state for the LSP tunnel. The IP destination address of the resv message is the unicast address of the previous hop node, obtained from the LSR's local path state block.
Processing at LSR 4 (Egress LSR): LSR 4 allocates a label with a value of 3 and places it in the label object of the resv message. The value of 3 has a special meaning to LSR 3. When LSR 3 receives a resv message that advertises a label value of 3, it knows that it is the penultimate LSR for the LSP. Rather than swapping labels, LSR 3 pops the top label off the label stack and forwards the resulting packet to LSR 4, which routes it based on the destination IP address contained in the IP header. If LSR 1 inserted a Tspec in the path message, LSR 4 uses this information to construct an appropriate receiver Tspec and Rspec. The resv message is transmitted back toward LSR 1 through LSR 3. The resv message does not carry a reverse ERO to find its way back along the path to LSR 1. Instead, the resv message follows the reverse path that the path message set up in the path state block of each LSR. In a way, the path message leaves a trail of bread crumbs that allows the resv message to follow the reverse path back to LSR 1.
Processing at LSR 3 and LSR 2: LSR 3 receives the resv message containing the label assigned by LSR 4. LSR 3 stores the label (3) as part of the reservation state for the LSP. LSR 3 uses this label when forwarding outgoing traffic along the LSP to LSR 4. LSR 3 allocates a new label (20) and places it in the label object (replacing the received label) of the resv message that it sends upstream to LSR 2. This label is the label that LSR 3 uses to identify incoming traffic on the LSP from LSR 2.
LSR 2 receives the resv message containing the label assigned by LSR 3. LSR 2 stores the label (20) as part of the reservation state for the LSP. LSR 2 uses this label when forwarding outgoing traffic along the LSP to LSR 3. LSR 2 allocates a new label (10) and places it in the label object (replacing the received label) of the resv message that it sends upstream to LSR 1. LSR 2 uses this label to identify incoming traffic on the LSP from LSR 1.
Processing at LSR 1 (Ingress LSR): LSR 1 receives the resv message that contains the label assigned by LSR 2. It uses this label for all outgoing traffic that it maps to the LSP. Because of these operations, the LSP is established from LSR 1 to LSR 4 following the explicitly routed path specified in the ERO. LSR 1 forwards traffic for prefix x to LSR 2 by pushing the label it received from LSR 2 (10) into the label header.
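The two passes described above (the path message recording the previous hop at each LSR, then the resv message handing labels back upstream) can be sketched as follows. The function signal_lsp and its arguments are hypothetical; the label values 3, 20, and 10 are the ones used in this example, and real RSVP state blocks hold far more than a previous-hop entry.

```python
IMPLICIT_NULL = 3  # label the egress LSR advertises to the penultimate LSR


def signal_lsp(ero, allocated_labels):
    """Simulate a path message travelling downstream and a resv message returning.

    ero: ordered list of LSR names from ingress to egress.
    allocated_labels: label each transit LSR advertises to its upstream neighbor.
    Returns (previous_hop, outgoing_label), both keyed by LSR name.
    """
    # Path message: every LSR after the ingress stores its previous hop.
    previous_hop = {}
    for upstream, node in zip(ero, ero[1:]):
        previous_hop[node] = upstream

    # Resv message: starts at the egress and follows the stored previous hops.
    outgoing_label = {}
    node = ero[-1]
    advertised = IMPLICIT_NULL
    while node in previous_hop:
        upstream = previous_hop[node]
        # The upstream LSR uses the advertised label when sending traffic to node.
        outgoing_label[upstream] = advertised
        # It then replaces the label with its own before sending resv further upstream.
        advertised = allocated_labels.get(upstream)
        node = upstream
    return previous_hop, outgoing_label


path = ["LSR 1", "LSR 2", "LSR 3", "LSR 4"]
previous_hop, outgoing_label = signal_lsp(path, {"LSR 3": 20, "LSR 2": 10})
print(outgoing_label)  # {'LSR 3': 3, 'LSR 2': 20, 'LSR 1': 10}
```

No reverse ERO is needed in the sketch: the return trip is driven entirely by the previous-hop entries that the path message left behind.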
Named Path
The explicit route object (ERO) is added to an RSVP path message by the ingress LSR to specify an explicit route for the message, independent of conventional IP routing. The ERO is to be used only when all routers along the explicit route support RSVP and the ERO. The ERO is also intended to be used only for unicast situations.
If you cite a path in the label-switched-path definition but do not configure the path at the [edit protocols mpls] level, the CSPF calculation fails because the path is not defined, even if there are no loose or strict explicit route elements to list. For example, if you want the primary path to have a bandwidth constraint while the secondary path stands by with no path constraint, you must still define the path.
Loose versus Strict Addresses
Loose addresses should be a loopback address so that if an interface comes down, the LSP can (potentially) reroute dynamically to another interface. If an interface address is specified and the interface comes down, the LSP will come down and will not reroute. Strict addresses MUST specify the next hop. The next hop can be either a directly connected interface or the loopback address of the next hop. The following process is used by each router to evaluate a named path explicit route object.
ERO Processing Algorithm
If destination address of RSVP message belongs to your router:
    You are the egress router
    End ERO processing
    Send resv message along reverse path to ingress
Otherwise, examine next object in ERO:
    Consult routing table
    Determine physical next hop
    If ERO object is strict:
        Verify that physical next hop is directly connected
    Forward to physical next hop
(Note: the router pops its own address off the top of the ERO stack before forwarding the message to the next RSVP neighbor.)
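The algorithm above can be sketched in Python. This is an illustration under stated assumptions, not router code: EroHop, process_ero, and all arguments are hypothetical; the routing table is reduced to a dictionary lookup; a strict hop is treated as valid only when its address belongs to a directly connected neighbor; and when the ERO is exhausted the message is routed loosely toward the destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EroHop:
    address: str
    strict: bool  # True = strict hop, False = loose hop


def process_ero(local_addresses, destination, ero, routing_table, connected_neighbors):
    """Return ("egress", None, []) or ("forward", next_hop, remaining_ero).

    routing_table: maps a target address to the physical next-hop address.
    connected_neighbors: addresses (interface or loopback) of adjacent routers.
    """
    if destination in local_addresses:
        # This router is the egress: ERO processing ends and a resv is sent back.
        return ("egress", None, [])

    # Pop this router's own address off the top of the ERO.
    remaining = list(ero)
    while remaining and remaining[0].address in local_addresses:
        remaining.pop(0)

    # Assumption: with no ERO entries left, head loosely for the destination.
    target = remaining[0] if remaining else EroHop(destination, strict=False)

    if target.strict and target.address not in connected_neighbors:
        raise ValueError(f"strict hop {target.address} is not directly connected")

    next_hop = routing_table[target.address]  # consult the routing table
    return ("forward", next_hop, remaining)


ero = [EroHop("10.0.0.2", strict=True), EroHop("10.0.0.4", strict=False)]
print(process_ero({"10.0.0.1"}, "10.0.0.4", ero,
                  {"10.0.0.2": "10.0.0.2"}, {"10.0.0.2"}))
```

Raising an exception for a failed strict check is a simplification; the sketch does not model how the failure is reported to the ingress router.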
Analysis
An ERO consisting entirely of strict elements specifies the path from the source to the destination. Each element must be connected directly to the next element, although a loopback address can be specified instead of an interface address. If one of the nodes fails, the LSP fails because there is no way to re-route traffic and still comply with the ERO.
Analysis
A loose ERO indicates that the node represented by the address must be visited, but does not state how the node is reached. Every router checks its routing table to determine how best to reach that next hop. Routing tables are consulted at each hop only when the default LSP routing behavior is disabled by issuing the no-cspf option. When the default behavior is enabled, a special constrained SPF algorithm on the ingress router determines the path from the source to the destination. This algorithm is discussed in a later section.
Analysis
Strict and loose nodes can be mixed in the same ERO. In this case, traffic must flow to C strictly (meaning as a next hop), then it can go any way possible to D, and then it must flow directly to F (as a next hop).
Analysis
This highlights that the loose address should be a loopback address so that, if an interface fails, the routing protocol can potentially reroute traffic to the node through another interface.
This slide shows that the path is defined at the [edit protocols mpls] hierarchy. The path definition is then referenced within the label-switched-path definition. This process is like defining a variable before it can be used in a computer program.
Analysis
You can verify that an LSP is using a particular named path by looking in the ActivePath field of the show mpls lsp command output.
Shortest Path First Algorithm
The ingress router determines the physical path for each LSP by applying a constrained shortest path first (CSPF) algorithm to the information in the traffic engineering database (TED). CSPF is a shortest path first algorithm modified to take into account specific restrictions when calculating the shortest path across the network. Links that do not comply with the restrictions are removed from the calculations.
CSPF: TED and User Constraint Integration
CSPF integrates topology link-state information learned from the IGP and maintained in the TED. Some of the information stored in the TED is:
Attributes associated with the state of network resources (such as total link bandwidth, reserved link bandwidth, available link bandwidth, and link color). These attributes are propagated by the IGP but stored in the TED.
Administrative attributes required to support traffic traversing the proposed LSP (such as bandwidth requirements, maximum hop count, and administrative policy requirements) obtained from user configuration.
Prune Non-Qualifying Links
As CSPF considers each candidate node and link for a new LSP, it either accepts or rejects a specific path component based on resource availability or whether selecting the component violates user policy constraints. The output of the CSPF calculation is an explicit route consisting of a sequence of router addresses providing the shortest path through the network that meets the constraints. This explicit route is then passed to the signaling component, which establishes forwarding state in the routers along the LSP.
Overview
The entire CSPF process has the following six major parts:
Information propagation: Traffic engineering extensions to either IS-IS or OSPF carry traffic engineering topology information.
Information storage: The router stores traffic engineering link-state information in the traffic engineering database.
User constraints: The user specifies constraints for a specific LSP.
Physical path calculation: The CSPF algorithm finds the shortest path of links that comply with the user constraints.
Explicit route generation: The router forms an explicit route (object) from the list of IP addresses that represent the shortest path.
RSVP signaling: The router uses the explicit route object to determine the forwarding path for RSVP path messages.
IGP Extensions
Traffic engineering must be enabled explicitly if OSPF is the IGP. This enabling tells the IGP to carry CSPF updates. Traffic engineering is enabled by default for IS-IS; it carries the updates once MPLS is enabled. The updates carry information on maximum reservable link bandwidth, remaining reservable bandwidth per priority level (eight different priority levels), and any administrative grouping information. The OSPF no-topology and IS-IS traffic engineering disable options both have the same effect. They are used in conjunction with the traffic engineering shortcuts feature; the router does not use the TED for CSPF calculations but can install downstream prefixes for the LSP into inet.3.
Mechanisms
OSPF uses type 10 LSAs to propagate information within an area. This LSA is an opaque LSA with area scope. Opaque LSAs carry information not essential to the OSPF protocol; the protocol does not take action upon their contents. IS-IS uses TLVs to carry the traffic engineering information.
Traffic Engineering Database
Each router maintains network link attributes and topology information in a specialized traffic engineering database (TED). The TED is used exclusively for calculating explicit paths for the placement of LSPs across the physical topology. Because the TED does not know about existing LSPs, it does not allow a CSPF LSP to form over an LSP (but because no-cspf LSPs look in the routing table on a hop-by-hop basis to forward the RSVP messages, no-cspf LSPs might try to form over LSPs).
Traffic Engineering Database Contents
CSPF uses the TED to calculate explicit paths across the physical topology. The TED is similar to IGP link-state databases and relies on extensions to the IGP, but it is stored independently of the IGP database. Included in the TED are network link attributes and topology information.
Traffic engineering requires detailed knowledge about the network topology as well as dynamic information about network loading. The information distribution component is implemented by defining relatively simple extensions to the IGPs so that each router's link-state advertisement includes link attributes. IS-IS extensions include the definition of new type/length/values (TLVs), while OSPF extensions are implemented with opaque LSAs. The standard flooding algorithm used by the link-state IGPs ensures that link attributes are distributed to all routers in the routing domain. The traffic engineering extensions added to the IGP link-state advertisement include maximum link bandwidth, maximum reservable link bandwidth, current bandwidth reservation, and link coloring.
User-Defined Constraints
You can apply the following constraints to path selection:
Bandwidth: The bandwidth to reserve for this LSP. The reserved bandwidth is calculated against each link’s available bandwidth. The available bandwidth is the bandwidth remaining after the subscription factor is applied to the link and all existing link subscriptions are removed.
Hop count: The maximum number of hops to extend the path to bypass the next downstream node when creating a fast-reroute detour.
Link color: Administrative groups, also known as link coloring or resource class, are manually assigned attributes that describe the color of links, such that links with the same color conceptually belong to the same class. You can use administrative groups to implement a variety of policy-based LSP setups.
Priority: Specifies the setup and hold priority for the LSP. New setup priorities are compared with existing hold values. Only if the setup priority is stronger than the hold priority is that link considered in the path.
Explicit route: Either CSPF or no-CSPF LSPs can specify a loose listing or exhaustive listing of IP addresses for the RSVP messages to follow when signaling the LSP.
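The bandwidth constraint comes down to simple arithmetic. The sketch below assumes the subscription factor is expressed as a percentage of the link speed; the function names and the example figures are hypothetical.

```python
def available_bandwidth(link_bps, subscription_percent, existing_reservations_bps):
    """Bandwidth left on a link after applying the subscription factor
    and subtracting every existing reservation."""
    reservable = link_bps * subscription_percent / 100
    return reservable - sum(existing_reservations_bps)


def link_can_carry(requested_bps, link_bps, subscription_percent, existing_reservations_bps):
    """True when the link still has room for the requested LSP bandwidth."""
    return requested_bps <= available_bandwidth(
        link_bps, subscription_percent, existing_reservations_bps
    )


# A 155 Mbps link at 100 percent subscription with 50 Mbps and 30 Mbps already reserved.
print(available_bandwidth(155_000_000, 100, [50_000_000, 30_000_000]))  # 75000000.0
print(link_can_carry(80_000_000, 155_000_000, 100, [50_000_000, 30_000_000]))  # False
```

A link for which link_can_carry returns False is one that CSPF would prune for this LSP.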
How CSPF Selects a Path
The following list provides details on CSPF path selection:
CSPF computes LSPs one at a time, beginning with the highest priority LSP (the one with the lowest setup priority value). Among LSPs of equal priority, CSPF starts with those that have the highest bandwidth requirement.
CSPF prunes from the traffic engineering database all links that are not full duplex and all links that do not have sufficient reservable bandwidth.
If the LSP configuration uses the include statement, CSPF prunes all links that do not share any included colors, including links with no color assigned.
If the LSP configuration uses the exclude statement, CSPF prunes all links containing excluded colors; links with no color are not pruned.
CSPF finds the shortest path towards the LSP's egress router, taking into account explicit-path constraints. For example, if the path must pass through Router A, two separate SPFs are computed, one from the ingress router to Router A, the other from Router A to the egress router.
If several paths have equal cost, CSPF chooses the one whose last hop address is the same as the LSP's destination.
If several equal-cost paths remain, CSPF selects the one with the fewest number of hops.
If several equal-cost paths remain, CSPF applies the CSPF load-balancing rule configured on the LSP (least-fill, most-fill, or random).
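Two of the ordering rules above lend themselves to a short sketch: the order in which LSPs are computed, and the tie-breaking among equal-cost paths. The class and function names are hypothetical, the pruning steps are omitted, and the final load-balancing rule (least-fill, most-fill, or random) is deliberately left out, so pick_paths may return more than one survivor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lsp:
    name: str
    setup_priority: int  # a lower value means a higher priority
    bandwidth: int


@dataclass
class CandidatePath:
    hops: List[str]  # hop addresses, ending with the last hop of the path
    cost: int


def computation_order(lsps):
    """Highest priority first; among equal priorities, largest bandwidth first."""
    return sorted(lsps, key=lambda lsp: (lsp.setup_priority, -lsp.bandwidth))


def pick_paths(paths, destination):
    """Apply the tie-breakers in order and return the surviving paths."""
    best_cost = min(p.cost for p in paths)
    remaining = [p for p in paths if p.cost == best_cost]

    # Prefer paths whose last hop address is the LSP's destination.
    exact = [p for p in remaining if p.hops[-1] == destination]
    if exact:
        remaining = exact

    # Then prefer the fewest hops.
    fewest = min(len(p.hops) for p in remaining)
    return [p for p in remaining if len(p.hops) == fewest]


lsps = [Lsp("a", 7, 10), Lsp("b", 3, 5), Lsp("c", 3, 50)]
print([lsp.name for lsp in computation_order(lsps)])  # ['c', 'b', 'a']
```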
RSVP Signaling
CSPF performs online LSP path calculation. First, you configure LSP constraints at the ingress LSR. These constraints could include bandwidth reservation, the inclusion or exclusion of specific links, and explicit route information. Next, the network actively participates in selecting an LSP through population of the TED. CSPF computes a full ERO for a path that meets the constraints and hands off the completed ERO to RSVP for signaling. The RSVP path messages follow the explicit route object hop by hop to the destination. RSVP resv messages return along the reverse path that the path messages followed. The LSP forms along the same path that the RSVP messages follow, with traffic for the LSP flowing from the source to the destination.
Administrative Groups
Administrative groups allow you to define an LSP that should only cross links that meet the correct qualification of administrative grouping. Each interface can support 32 different administrative groups. Each link’s administrative groups are transmitted with the IGP updates and placed in the TED. When the ingress router computes an explicit route object for the RSVP messages to follow when signaling the LSP, it either includes or excludes links with the appropriate colors—as specified in the LSP definition.
If you use administrative groups, you must configure them identically on all routers participating in an MPLS domain. You have the option to assign more than one administrative group to one physical link.
Administrative Groups
The IGP traffic engineering updates are sent as a 4-byte link affinity or administrative group associated with each link. Each bit value in the 4 bytes represents a different administrative group. The bit values are correlated to names (called colors, although they do not have to be colors; they can be any descriptive term you want). Each link can have one or more bits enabled. The colors advertised by each link display in hexadecimal format in the output of the show ted database extensive command. If no color is assigned, the word defaults to all zeros.
Administrative Groups
Under the [edit protocols mpls] hierarchy, the administrative groups define the group names and their associated bit values. The bit values can range from 0 to 31. The bit values are passed in IGP updates for placement in the TED; the group names are for local reference only.
You must configure all defined groups in the administrative domain, even if a particular group will not be referenced on an interface. Undefined administrative groups referenced in LSP definitions cause the configuration to fail parsing.
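The relationship between group names, bit values, and the 4-byte word can be illustrated with a few lines of Python. The group names and bit assignments below are hypothetical, and the hexadecimal rendering is just one plausible way to print a 32-bit word, not a reproduction of any command output.

```python
# Hypothetical group names mapped to bit values in the range 0 through 31.
ADMIN_GROUPS = {"gold": 1, "silver": 2, "management": 30}


def group_mask(names, table=ADMIN_GROUPS):
    """Combine the named groups into one 32-bit word, one bit per group."""
    mask = 0
    for name in names:
        bit = table[name]  # an undefined group name raises KeyError
        if not 0 <= bit <= 31:
            raise ValueError(f"bit value {bit} for {name!r} is outside 0-31")
        mask |= 1 << bit
    return mask


def format_mask(mask):
    """Render the word as eight hexadecimal digits."""
    return f"0x{mask:08x}"


print(format_mask(group_mask(["gold", "silver"])))  # 0x00000006
print(format_mask(group_mask(["management"])))      # 0x40000000
print(format_mask(group_mask([])))                  # 0x00000000 (no color assigned)
```

Because only the bit values travel in the IGP updates, two routers that map the same name to different bits would disagree about which links carry a color, which is why the definitions must match across the domain.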
Interfaces
Each interface references the ASCII text group name for the group names associated with that interface. You can configure multiple group names on a single interface.
interface so-0/0/0 is not a typographical error. The two administrative groups are “good” and “management.” Interface so-0/0/0 must be very happy, because it has good management!
Include/Exclude Groupings
If you omit the include or exclude statement, the path calculation proceeds unchanged using automatic LSP computation. If you configure an exclude list, links that have an administrative group present in the list will not be chosen. If you configure an include list, only links that have an administrative group present in the list will be chosen. Links that do not have an administrative group are automatically disqualified by any include list, but can be chosen if only an exclude list is defined. Changing the LSP's administrative group causes an immediate recomputation of the route; therefore, the LSP might be rerouted.
Logical Groupings
When you define multiple colors for either an include or an exclude statement, the requirement for that link to pass the CSPF calculation is that the link must possess any one of the colors defined in the LSP definition. This mimics the functionality of the logical OR.
When you define both an include and an exclude list in the label switched path definition, the link must comply with both lists. This mimics the functionality of the logical AND.
The IGP’s View of the Best Path
This page illustrates that, based on link metric, the IGP’s view of the optimal path between nodes A and H consists of the path A-D-H with a total IGP metric of 4. Subsequent pages detail how the use of administrative constraints and CSPF can force LSP routing away from the IGP’s shortest path.
Include and Exclude Constraints
The LSP definition from A to H requires that each link include either the color copper or the color bronze AND exclude the color admin. In other words, a link can [include copper and exclude admin] OR [include bronze and exclude admin]. The destination has changed from previous examples. Additionally, the cost of G-H changed and the color of G-I changed.
The CSPF algorithm applies the include constraint first and prunes the following links because they carry neither copper nor bronze: A-C, A-B, C-F, C-D, F-G, and F-H. It then applies the exclude constraint and prunes the D-H link because it possesses the excluded color admin. Links A-B and F-H were already pruned by the include constraint, but link D-H passed the include constraint and was not pruned until the exclude constraint was applied. The links that pass both include and exclude constraints are A-D, D-E, E-B, B-G, E-G, G-I, G-H, and H-I.
From the available links that comply with the constraints, a shortest path is computed. There are four possible paths: A-D-E-B-G-H (cost 14), A-D-E-G-H (cost 8), A-D-E-B-G-I-H (cost 13), and A-D-E-G-I-H (cost 7). The router uses the Dijkstra algorithm to calculate the shortest path. You can also enumerate every candidate path and pick the lowest-cost one yourself (a process referred to as an exhaustive computation). The shortest path is A-D-E-G-I-H. Note that in this case the shortest path has more hops than A-D-E-G-H, because link G-H has a cost of 3, whereas links G-I and I-H together cost only 2.
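The pruning and shortest-path steps can be reproduced with a short Python sketch. The link list and the path totals follow the example, but the individual link costs and color assignments below are assumptions chosen only to be consistent with the totals quoted in the text (IGP best path A-D-H at 4, constrained best path at 7); the slide's actual values may differ. The function names are hypothetical.

```python
import heapq

# (node, node): (cost, colors). Per-link costs and colors are ASSUMED values.
LINKS = {
    ("A", "B"): (5, {"admin"}),
    ("A", "C"): (3, set()),
    ("A", "D"): (2, {"copper"}),
    ("C", "D"): (2, set()),
    ("C", "F"): (2, set()),
    ("D", "E"): (1, {"bronze"}),
    ("D", "H"): (2, {"copper", "admin"}),
    ("E", "B"): (4, {"copper"}),
    ("B", "G"): (4, {"bronze"}),
    ("E", "G"): (2, {"copper"}),
    ("F", "G"): (2, set()),
    ("F", "H"): (2, {"admin"}),
    ("G", "H"): (3, {"bronze"}),
    ("G", "I"): (1, {"copper"}),
    ("H", "I"): (1, {"bronze"}),
}


def qualifies(colors, include, exclude):
    """Include list: any one color must match (OR). Exclude list: none may match."""
    if include and not (colors & include):
        return False
    return not (colors & exclude)


def prune(links, include, exclude):
    """Keep only the links that satisfy both the include and exclude lists."""
    return {pair: cost for pair, (cost, colors) in links.items()
            if qualifies(colors, include, exclude)}


def shortest_path(costs, source, target):
    """Dijkstra over undirected links; returns (total cost, node list) or None."""
    graph = {}
    for (a, b), cost in costs.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue, visited = [(0, [source])], set()
    while queue:
        total, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return total, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (total + cost, path + [neighbor]))
    return None


print(shortest_path(prune(LINKS, set(), set()), "A", "H"))
# (4, ['A', 'D', 'H'])
print(shortest_path(prune(LINKS, {"copper", "bronze"}, {"admin"}), "A", "H"))
# (7, ['A', 'D', 'E', 'G', 'I', 'H'])
```

With no constraints the sketch returns the IGP's path; with the include and exclude lists applied, D-H is pruned and the longer but cheaper constrained path wins.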
The Net Result
The graphic on the slide shows the results of the CSPF decision process based on administrative groups. The key point is that the IGP’s view of the best path was not chosen due to the need to comply with administrative group constraints.
A Short-Term Solution
You configure fast reroute on an LSP to minimize the effect of a failure in the LSP. Fast reroute enables a router upstream from the failure to route around the failure quickly to the router downstream of the failure. The upstream router then signals the outage to the ingress router, thereby maintaining connectivity before a new physical path for the LSP is established. It is only a short-term solution.
When fast reroute is enabled, the ingress router adds an object to the RSVP path messages telling downstream routers to do a fast reroute. All downstream routers then originate detour path messages to detour around the next downstream node. They receive detour resv messages with appropriate label information.
Once an active physical path fails, if a detour is available, the upstream router sends a path error message to the ingress router. That message triggers new CSPF computations, as well as a switch-over to any alternate path. If no detour is available, it sends a resv tear message and, at the same time, withdraws the MPLS labels, which brings down the LSP. A fast-reroute path can stay up indefinitely if an alternative path is not available.
Traffic Engineering Constraints
By default, the fast-reroute path only inherits the administrative group settings from the original LSP; therefore, it is possible for a fast-reroute path to have substantially less bandwidth than was specified in the original LSP. As soon as the ingress node re-signals the LSP, the fast reroute path tears down. The newly signaled LSP will have the correct traffic parameters, including bandwidth. You can configure the following fast-reroute parameters, if you want: bandwidth, hop limit, include administrative groups, and exclude administrative groups.
Fast Reroute in Operation
By default, the router uses the TED to calculate a detour path. The ingress router, which specifies the fast reroute, can add up to six additional hops to the path to bypass the next downstream node. You can use the hop-limit parameter in the LSP to change the default number of hops the router supports when calculating a reroute. When a router with a fast-reroute detour available recognizes a link or node failure, it immediately detours the traffic, giving the traffic a detour time on the order of hundreds of milliseconds.
Each downstream node originates its own detour path messages. It is possible for a node not to have an available detour path, which means that particular node is not included within the fast reroute protection. If that node fails, there will not be quick failover.
You can specify fast reroute either at the label-switched-path level or within a primary or secondary physical path definition. Specifying it at the label-switched-path level means that all primary and secondary physical paths maintain fast reroute, which requires additional state within the routers.
Enable Fast Reroute on Ingress LSR: Analysis
In this case, the San Francisco node determines that the next downstream node is Los Angeles, with a follow-on node of Austin. The San Francisco node therefore calculates and signals a fast-reroute path around Los Angeles to Austin. The Los Angeles node likewise calculates and signals a path around the Austin node. The Austin node calculates and signals a route around the Miami node. If any link or node fails, the fast-reroute path will recognize the failed LSP quickly and immediately begin sending traffic on the fast-reroute path.
mpls {
    label-switched-path to-ny {
        to 192.168.24.1;
        primary use-austin;
        secondary use-fargo;
        fast-reroute;
    }
    path use-austin {
        192.168.1.2 loose;
    }
    path use-fargo {
        192.168.8.1 loose;
    }
}
Los Angeles to Austin Link: Analysis
In this case, the link between Los Angeles and Austin has failed. Los Angeles recognizes the loss of link. It immediately forwards the traffic along the fast-reroute path to the Miami node. It also sends a path error message to San Francisco so San Francisco can re-signal the LSP.
Failover to Secondary Path: Analysis
The San Francisco node signals the secondary path through Fargo to New York. Traffic is migrated from the fast-reroute path to the secondary path.
Analysis
Fast reroute at the label-switched-path level means that both primary and secondary paths maintain extra state information. You can also put fast reroute at individual primary or secondary levels, but then only those particular paths have fast-reroute paths available.
By default, a fast-reroute detour can go at most six hops out of the way to rejoin the path downstream. You can configure a larger or smaller number with the hop-limit parameter.
Layer 2 Circuits
Circuit cross-connect (CCC) allows you to configure transparent connections between two circuits, where a circuit can be a Frame Relay DLCI, an ATM VC, a PPP interface, a Cisco HDLC interface, or an MPLS LSP. Using CCC, packets from the source circuit are delivered to the destination circuit with, at most, the Layer 2 address being changed. No other processing—such as header checksums, TTL decrementing, or protocol processing—is done.
Cross-Connect Types
CCC circuits fall into two categories: logical interfaces, which include DLCIs, VCs, and PPP and Cisco HDLC interfaces; and LSPs. The two circuit categories provide the following three types of cross-connect:
Layer 2 switching: Cross-connects between logical interfaces provide what is essentially Layer 2 switching. The interfaces that you connect must be of the same type.
MPLS tunneling: Cross-connects between interfaces and LSPs allow you to connect two distant interface circuits of the same type by creating MPLS tunnels that use LSPs as the conduit.
LSP stitching: Cross-connects between LSPs provide a way to stitch together two LSPs, including paths that fall in two different TED areas.
Interface Tunneling
CCC allows you to connect two ATM, FR, PPP, or Cisco HDLC access links using an MPLS tunnel. Layer 2 packets are essentially bridged from end to end in this configuration. In the figure on the slide, MPLS LSPs connect two ATM networks across an IP cloud. The ATM interface on the M40 expects a VCI of 514 (on whatever path is enabled on that interface). The M20 will transmit on VCI 590 (on whatever path is enabled on the output interface). The IP backbone between the two routers has two LSPs—one in each direction—that connect the two routers. When the cells are converted back to AAL 5 convergence sub-layer PDUs, the routers put an MPLS header on the PDUs and transmit them down the LSP. At the far end, the MPLS headers are stripped off, and the PDU is again segmented into cells and transmitted.
ATM Considerations
ATM packets are reassembled into an AAL 5 convergence sub-layer PDU on input, then encapsulated in an MPLS header and transmitted down the LSP. At the egress router, the MPLS header is stripped and the packet is divided into cells. These cells are transmitted through the outgoing ATM PVC. The MPLS tunnel is transparent to both ATM networks.
Analysis
When traffic from Router A (VC 514) reaches the M40, it is encapsulated and placed onto an LSP, which is sent through the backbone to the M20. At the M20, the label is removed and the packets are placed onto the ATM PVC (VC 590) and sent to Router B. The code samples on the slide show that the receive LSP on one router is the transmit LSP on the other router. The names referenced are the names of the transmit or receive LSPs displayed when you issue the show mpls lsp command.
To configure LSP tunnel cross-connects, you must also configure the CCC encapsulation on the ingress and egress routers (M40 and M20). An example of this configuration on the M40 is shown here:
[edit interfaces]
user@M40# show
at-7/1/1 {
    atm-options {
        vpi 1 maximum-vcs 1024;
    }
    unit 514 {
        point-to-point; # Default interface type
        encapsulation atm-ccc-vc-mux;
        vci 1.514;
    }
}
CCC Caveats
There are a variety of caveats for configuring CCC on Ethernet family interfaces:
VLAN-ID number: If the VLAN CCC encapsulation is not specified, GE/FE interfaces support VLAN-IDs from 0 to 4094. Regardless of the range of numbers supported, there is a limit of 1024 logical units. If you specify the VLAN CCC encapsulation at the physical interface level, you must specify the VLAN CCC encapsulation again on each logical unit that performs VLAN CCC, and the VLAN-ID of that unit must fall in the range of 512 to 4094. Logical units with VLAN-IDs between 0 and 511 support only normal IEEE 802.1Q VLAN tagging.
PIC revision: All Fast Ethernet PICs support VLAN CCC, but Gigabit Ethernet PICs must be Rev B or greater to support CCC. When you issue the show chassis hardware operational command, the output line for the GE PIC shows the part number. Compare that part number with the Rev A and Rev B lists that follow the sample output to determine the revision. A Rev A PIC does not support VLAN CCC.
lab@SanJose> show chassis hardware
Hardware inventory:
Item     Version  Part number  Serial number  Description
PIC 2    REV 08   750-001072   AC5837         1x G/E, 1000 BASE-SX
Given the output of the show chassis hardware command above, you can tell that this PIC (part number 750-001072) is a Rev A PIC and does not support VLAN CCC.
The following PICs are Rev A:
750-002980 is the original M5/M10 GigE SX
750-002981 is the original M5/M10 GigE LX
750-001072 is the original M20/M40 GigE SX
750-001324 is the original M20/M40 GigE LX
750-001887 is the original M160 GigE LX
750-001894 is the original M160 GigE SX
The following PICs are Rev B:
750-003163 is the new M5/M10 GigE SX
750-003164 is the new M5/M10 GigE LX
750-003074 is the new M5/M10 Quad GigE SX
750-003075 is the new M5/M10 Quad GigE LX
750-002785 is the new M20/M40 GigE SX
750-002786 is the new M20/M40 GigE LX
750-002879 is the new M20/M40 Quad GigE SX
750-002880 is the new M20/M40 Quad GigE LX
750-003141 is the new M160 GigE SX
750-003142 is the new M160 GigE LX
750-002510 is the original M160 dual port GigE SX
750-002731 is the original M160 dual port GigE LX
Frame Relay: The only issue with Frame Relay is the DLCI range. As stated earlier, when the physical interface is configured for Frame Relay CCC encapsulation, the logical units can be either normal Frame Relay interfaces or they can be CCC Frame Relay interfaces. Normal Frame Relay logical interfaces use a DLCI value between 1 and 511. CCC Frame Relay logical interfaces use a DLCI value between 512 and 1022. Additionally, the Frame Relay CCC encapsulation must also be configured on the logical interface.
PPP and Cisco HDLC: Because both protocols are point-to-point serial protocols, the logical unit can be 0 only. This is not a requirement of the CCC capability, but a requirement of the physical-layer encapsulation.
ATM: If an ATM interface is configured for atm-ccc-vc-mux encapsulation (which is another way of saying CCC), no families can be configured on the logical interface. Likewise, unless Cell Relay (explained later) is configured, CCC only works for ATM Adaptation Layer 5.
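The numeric ranges and part-number lists in the caveats above can be collected into a small lookup. The following is a minimal sketch that assumes only the values stated in this section; the function and variable names are hypothetical and are not part of any JUNOS tool.

```python
# Part numbers copied from the Rev A and Rev B lists in this section.
REV_A = {"750-002980", "750-002981", "750-001072",
         "750-001324", "750-001887", "750-001894"}
REV_B = {"750-003163", "750-003164", "750-003074", "750-003075",
         "750-002785", "750-002786", "750-002879", "750-002880",
         "750-003141", "750-003142", "750-002510", "750-002731"}


def gige_pic_revision(part_number):
    """Return 'A', 'B', or None when the part number is in neither list."""
    if part_number in REV_A:
        return "A"
    if part_number in REV_B:
        return "B"
    return None


def vlan_unit_kind(vlan_id):
    """Classify a VLAN-ID on an interface with VLAN CCC encapsulation."""
    if not 0 <= vlan_id <= 4094:
        raise ValueError("VLAN-ID must be between 0 and 4094")
    return "vlan-ccc" if vlan_id >= 512 else "normal-802.1q"


def frame_relay_unit_kind(dlci):
    """Classify a DLCI on an interface with Frame Relay CCC encapsulation."""
    if 1 <= dlci <= 511:
        return "normal-frame-relay"
    if 512 <= dlci <= 1022:
        return "frame-relay-ccc"
    raise ValueError("DLCI must be between 1 and 1022")
```

For the sample output shown earlier, gige_pic_revision("750-001072") returns "A", so that PIC cannot be used for VLAN CCC.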
Forwarding Equivalence Class
LDP associates a set of destinations (route prefixes and router addresses) with each data-link layer LSP. This set of destinations is called the forwarding equivalence class (FEC). These destinations all share a common LSP egress router and a common unicast routing path.
Managing FEC on LSP to the Egress Router
LDP maps groups of prefixes and router addresses to an egress router at the end of an LSP. LDP manages the LSP to the egress router for each FEC. You can implement VPNs using MPLS for tunneling. This implementation allows the use of overlapping address spaces by different VPNs. Some of these MPLS-based approaches to VPNs support only LDP for signaling. With the JUNOS software implementation of LDP, and Juniper Networks M-series and T-series routers at the core of a network, you can implement edge devices that support VPNs using LDP signaling for MPLS. LDP is not related to RSVP or traffic engineering concepts from previous lectures.
RFC 2547 mandates support for LDP (however, it does not require actually using LDP to signal LSPs—you can still use RSVP to signal the LSPs for RFC 2547).
Purpose of LDP
LDP maps FECs (address prefixes) to label values. The LSP forwarding paths look like a unicast forwarding path, in that MPLS traffic for the ultimate destination is forwarded along the unicast forwarding tree.
LDP allows multiple prefixes to share the same label mapping. No constraints are allowed when signaling the LSPs. The LSPs must follow the IGP path. Because LDP merges traffic from different tunnels, fewer total tunnels are required than with RSVP. Also, you can tunnel LDP-signaled LSPs through resource-constrained, RSVP-signaled LSPs (discussed next).
The slide shows the comparison of the number of labels on individual links. On the D-H link, for example, LSPs from A-I and C-I each end up with a separate label mapping for the D-H link—for a total of two labels on that link. LDP at D would aggregate the prefixes downstream from A and C and only advertise a single label on the D-H link. Although this is a small point for only a few LSPs, it might prove much more beneficial with a large number of prefix mappings.
Label Distribution Protocol
LDP establishes sessions between peers. First, the routers exchange hello messages using UDP. After neighbor recognition, the router with the higher IP address establishes the TCP session. Once the TCP session is established, LDP initialization occurs (negotiation/agreement of parameters). Once the LDP session is established, label request and label mapping messages map FECs (essentially, address prefixes with mask length indicators) to labels.
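The rule that the higher address opens the TCP session can be sketched as follows. This is a minimal illustration that assumes the addresses are compared numerically rather than as text; the function name is hypothetical.

```python
import ipaddress


def tcp_initiator(addr_a, addr_b):
    """Return the address of the peer that opens the LDP TCP session."""
    a = ipaddress.ip_address(addr_a)
    b = ipaddress.ip_address(addr_b)
    if a == b:
        raise ValueError("peers must have different addresses")
    # Address objects compare by numeric value, so 10.0.0.10 > 10.0.0.9.
    return str(max(a, b))
```

Comparing the addresses as numbers matters: a plain string comparison would rank 10.0.0.9 above 10.0.0.10.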
LDP Message Types
LDP uses several types of messages to establish and remove mappings and to report errors. All LDP messages have a common structure that uses a type/length/value (TLV) encoding scheme. LDP defines the following message types:
Discovery messages: These messages announce and maintain the presence of a router in a network. Routers indicate their presence in a network by sending the hello message periodically. This hello message is transmitted as a UDP packet to the LDP port at the group multicast address for all routers on the subnet.
Session messages: These messages establish, maintain, and terminate sessions between LDP peers. When a router establishes a session with another router learned through the hello message, it uses the LDP initialization procedure over TCP transport. When the initialization procedure completes successfully, the two routers are LDP peers and can exchange advertisement messages.
Advertisement messages: These messages create, change, and delete label mappings for FECs. The local router makes the decision to request a label or advertise a label mapping to a peer. In general, the router requests a label mapping from a neighboring router when it needs one and advertises a label mapping to a neighboring router when it wants the neighbor to use a label.
Notification messages: These messages provide advisory information and signal error information. LDP sends notification messages to report errors and other events of interest. The two kinds of LDP notification messages are:
Error notifications: Signal fatal errors. If a router receives an error notification from a peer for an LDP session, it terminates the LDP session by closing the TCP transport connection for the session and discarding all label mappings learned through the session.
Advisory notifications: Pass information about the LDP session or the status of some previous message received from the peer to a router.
LDP Label Mapping
Label request and label map messages associate FECs and labels. On the slide, the router on the right has knowledge of networks 11.0.0.0 and 10.0.0.0. It is running LDP with its upstream neighbors. The router in the middle receives a FEC mapping of networks 11.0.0.0 and 10.0.0.0 to label 52.
The middle router then advertises the FEC with networks 11.0.0.0 and 10.0.0.0 upstream to the left router with a label mapping of 17. The process continues until there are no more LDP adjacencies.
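The hop-by-hop advertisement described above can be modeled with a few lines of code. This sketch only mirrors the label values from the slide (52 and 17); the router names and the function are hypothetical.

```python
def build_label_tables(fec, advertised):
    """Build per-router label entries for one FEC.

    advertised lists (router, label) pairs ordered from the egress side
    toward the ingress side; each label is the one that router advertises
    to its upstream neighbor.
    """
    tables = {}
    out_label = None  # the router nearest the destinations has no outgoing label
    for router, in_label in advertised:
        tables[router] = {"fec": fec, "in": in_label, "out": out_label}
        out_label = in_label  # the upstream neighbor sends traffic with this label
    return tables, out_label


fec = ("10.0.0.0", "11.0.0.0")
tables, ingress_label = build_label_tables(fec, [("right", 52), ("middle", 17)])
```

Here the middle router swaps incoming label 17 for outgoing label 52, and the left router uses label 17 when it sends traffic for either network.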
Restrictions for LDP over RSVP
The IGP shortcut computation imposes some restrictions on the network topology allowed. All the routers in the traffic engineered core and in the surrounding LDP cloud must belong to the same OSPF area or IS-IS level. Using multiple areas or levels prevents the IGP shortcut computation from finding an RSVP LSP next hop (because the shortcut goes to the router ID, and the only way to determine if an address is a router ID is by having a single area). As a result, you cannot use a label from a remote LDP session for this router.
You can run LDP over LSPs established by RSVP, effectively tunneling the LDP-established LSP through the one established by RSVP. To do so, you must enable LDP on the lo0.0 interface. Additionally, you must configure the LSPs over which you want LDP to operate, including the ldp-tunneling statement.
Analysis
This slide shows that although the shortest path is the LDP path, RSVP tunneling can cause the LDP traffic to be forwarded through the RSVP tunnel over a traffic engineered path. By default, LDP always follows the IGP shortest path.
Basic Configuration
The only change required to cut and paste the code on the slide is to identify the interface in the second line and the IP address in the last line correctly.
To enable MPLS on a Juniper Networks M-series or T-series router, you must enable MPLS forwarding on each interface and on the router itself. All other MPLS configuration statements are optional.
To configure RSVP, you include statements at the [edit protocols rsvp] hierarchy level of the configuration. (By default, RSVP is disabled.)
If you have configured interface properties on a group of interfaces and want to disable RSVP on one of the interfaces, include the disable statement within the rsvp interface statement.
Finally, you must create a signaled LSP to build the tunnel. (You must configure MPLS and RSVP on all routers in the network that you want the LSP to transit.)
Displaying MPLS LSPs
The show mpls lsp command enables you to verify the status of LSPs touching the router. It displays separate sections that correlate to ingress LSPs, egress LSPs, and transit LSPs. The output categories of this command are:
Ingress LSP: Provides information about LSPs on the ingress router. Each session has one line of output.
Egress RSVP: Provides information about the LSPs on the egress router. MPLS learns this information by querying RSVP, which holds all the transit and egress session information. Each session has one line of output.
Transit RSVP: Provides information about the LSPs on the transit routers. MPLS learns this information by querying RSVP, which holds all the transit and egress session information. Each session has one line of output.
The output fields of this command are:
To: Displays the destination (egress router) of the session.
From: Displays the source (ingress router) of the session.
State: Displays the state of the LSP handled by this RSVP session. It can be either up or down (Up or Dn).
Rt: Displays the number of active routes (prefixes) installed in the routing table to follow this RSVP session. For ingress RSVP sessions, the routing table is the primary IPv4 table (inet.0). For transit and egress RSVP sessions, the routing table is the primary MPLS table (mpls.0).
ActivePath: Displays the named path used for LSP creation.
P: Displays whether or not the active path is the primary path. An asterisk (*) indicates that the designated path is the primary path for that LSP.
Style: Displays the RSVP reservation style. This field consists of two parts—the first is the number of active reservations, and the second is the reservation style, which can be FF (fixed filter), SE (shared explicit), or WF (wildcard filter).
Labelin: Displays the incoming label for this LSP.
Labelout: Displays the outgoing label for this LSP.
LSPname: Displays the name of the LSP.
Displaying Additional MPLS Information
The show mpls lsp extensive command provides additional information beyond what is displayed without the extensive flag. Typically, the reason for success or failure of the LSP is displayed in the LSP history, along with a timestamp. Specific RSVP-TE object values are also displayed.
The following list provides the output fields of the show mpls lsp extensive command. The slide does not show all of these fields, as the output does not fit on a single page.
Ingress LSP: Displays information about LSPs on the ingress router. Each session has one line of output.
Egress RSVP: Displays information about the LSPs on the egress router. MPLS learns this information by querying RSVP, which holds all the transit and egress routers’ session information. Each session has one line of output.
Transit RSVP: Displays information about the LSPs on the transit routers. MPLS learns this information by querying RSVP, which holds all the transit and egress routers’ session information. Each session has one line of output.
Address (first line of each section): Displays the destination (egress router) of the LSP.
From: Displays the source (ingress router) of the session.
State: Displays the state of the LSP handled by this RSVP session. It can be either up or down (Up or Dn).
ActiveRoute: Displays the number of active routes (prefixes) that have been installed in the forwarding table to follow this LSP. For ingress LSPs, the forwarding table is the primary IPv4 table (inet.0). For transit and egress RSVP sessions, the forwarding table is the primary MPLS table (mpls.0).
Style: Displays the RSVP reservation style. This field consists of two parts—the first is the number of active reservations, and the second is the reservation style, which can be FF (fixed filter), SE (shared explicit), or WF (wildcard filter). Although WF is a valid reservation style, Juniper Networks M-series and T-series routers do not signal for a WF style.
Labelin: Displays the incoming label for this LSP.
Labelout: Displays the outgoing label for this LSP.
LSPname: Displays the name of the LSP.
Time left: Displays the number of seconds remaining in the lifetime of the reservation.
Since: Displays the date and time when the RSVP session was initiated.
Tspec: Displays the sender’s traffic specification, which describes the sender’s traffic parameters.
Port number: Displays the protocol ID and sender/receiver port used in this RSVP session.
PATH rcvfrom: Displays the previous-hop router or local client.
PATH sentto: Displays the next-hop router or local client.
RESV rcvfrom: Displays the next-hop router or local client.
Record Route: Displays the recorded route for the session as taken from the record route object.
Details: Displays the details of the most recent 50 events.
Displaying the MPLS Switching Table
The show route table mpls.0 command lets you see the MPLS switching table. The label value on the left of the output is the expected incoming label value. The information to the right indicates the output interface and label value.
For every interface you enable, two special routes are installed in the label forwarding table. One route has a label value of 00000, and the second has a label value of 00001. Both labels 0 and 1 are reserved labels. Label 0 means IPv4 explicit null. Upon seeing label 0, the router automatically decapsulates the MPLS header and performs an IPv4 lookup. The penultimate LSR (the last router before the egress router) inserts label 0 in the MPLS header as it forwards the packet to the egress router. Label 0 indicates to a router that it is the egress LSR, and it should therefore perform an IPv4 lookup.
Label 1 means router alert. This alert is similar to the router alert option in the IP world. MPLS packets with label 1 are intercepted and processed, regardless of whether an LSP is configured. The interface configuration that installs these two routes does nothing more than enable the MPLS software on the specified interfaces and on the router.
The additional labels displayed show what the input labels expected by the router are and what the output LSP will be. To see the actual output label values, use the show rsvp session command. Some of the labels have an (S=0) flag, which means that this is the label mapping for an MPLS label, with the stack bit set to 0.
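To make the reserved labels and the stack bit concrete, the sketch below decodes a 4-byte MPLS header. The field layout (20-bit label, 3 experimental bits, 1 stack bit, 8-bit TTL) comes from the MPLS label stack encoding standard rather than from this section, and the function names are hypothetical.

```python
import struct

RESERVED = {0: "IPv4 explicit null", 1: "router alert"}


def encode_shim(label, exp, s, ttl):
    """Pack one MPLS header entry into 4 bytes in network byte order."""
    return struct.pack("!I", (label << 12) | (exp << 9) | (s << 8) | ttl)


def decode_shim(raw):
    """Unpack a 4-byte MPLS header entry into its fields."""
    (word,) = struct.unpack("!I", raw)
    label = word >> 12
    return {
        "label": label,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,  # 1 marks the bottom of the label stack
        "ttl": word & 0xFF,
        "meaning": RESERVED.get(label, "ordinary label"),
    }
```

An entry decoded with "s" equal to 0 corresponds to the (S=0) flag in the output: another label follows it in the stack.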
Displaying RSVP Session Information
The show rsvp session command displays information about the different categories of LSPs. It identifies the source and destination address for the various sessions. Interfaces without sessions are not displayed as sessions. The following list provides details of the output fields of this command:
Ingress RSVP: Displays information about ingress router RSVP sessions. Each session has one line of output.
Egress RSVP: Displays information about the egress router RSVP sessions. Each session has one line of output.
Transit RSVP: Displays information about the transit router RSVP sessions. Each session has one line of output.
To: Displays the destination (egress router) of the session.
From: Displays the source (ingress router) of the session.
State: Displays the state of the LSP handled by this RSVP session. It can be either up or down (Up or Dn).
Rt: Displays the number of active routes (prefixes) installed in the routing table to follow this RSVP session. For ingress router RSVP sessions, the routing table is the primary IPv4 table (inet.0). For transit and egress router RSVP sessions, the routing table is the primary MPLS table (mpls.0).
LSPname: Displays the name of the LSP.
Style: Displays the RSVP reservation style. This field consists of two parts—the first is the number of active reservations, and the second is the reservation style, which can be FF (fixed filter), SE (shared explicit), or WF (wildcard filter).
Labelin: Displays the incoming label for this LSP.
Labelout: Displays the outgoing label for this LSP.
Time left: Displays the number of seconds remaining in the lifetime of the reservation.
Displaying Neighbor Information
The show rsvp neighbor command identifies neighbors that have had active sessions. The following list provides details of the output fields:
RSVP neighbor: Displays the number of neighbors about which the router has learned. Each neighbor has one line of output.
Address: Displays the address of a learned neighbor.
Idle: Displays the amount of time the neighbor has been idle, in seconds.
Up/Dn: Displays the neighbor up/down transitions as detected by RSVP hello packets. If the up count is one greater than the down count, the neighbor is currently up. Otherwise, the neighbor is down. Neighbors that do not support RSVP hello packets, such as routers running JUNOS Release 3.2 or earlier, are not reported as being up or down.
LastChange: Displays how long ago the neighbor state changed (either from up to down or from down to up).
HelloInt: Displays the configured hello interval for the neighbor.
HelloTx/Rx: Displays the number of hello packets sent to and received from the neighbor.
MsgRcvd: Displays the number of path and resv messages that this router has received from the neighbor.
MsgType: Displays the types of RSVP messages that this router has received from the neighbor. Only path and resv messages are counted.
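The rule for reading the Up/Dn field can be written as a one-line check. This is a sketch of the rule as stated above; the function name is hypothetical.

```python
def neighbor_is_up(up_count, down_count):
    """A neighbor is up when it has one more up transition than down."""
    return up_count == down_count + 1
```

For example, a neighbor showing 3 up transitions and 2 down transitions is currently up, while one showing 2 and 2 is down.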
Displaying RSVP-Enabled Interfaces
The show rsvp interface command shows interfaces with RSVP enabled. Use it to verify that RSVP is configured. You can use the highwater mark for capacity planning purposes. The following list provides details of the output fields:
RSVP interface: Displays the number of interfaces on which RSVP is active. Each interface has one line of output.
Interface: Displays the name of the interface.
State: Displays the state of the interface. It can be up or down.
Active resv: Displays the number of reservations actively reserving bandwidth on the interface.
Subscription: Displays the user-configured subscription factor. The default is 100%.
Static BW: Displays the reservation bandwidth in bits per second (bps).
Available BW: Displays the amount of bandwidth, in bps, that RSVP is allowed to reserve. It is equal to the static bandwidth multiplied by the subscription factor.
Reserved BW: Displays the currently reserved bandwidth in bps.
Highwater mark: Displays the highest bandwidth, in bps, ever reserved on this interface.
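The relationship between the Static BW, Subscription, and Available BW fields can be checked with simple arithmetic. The sketch below assumes the subscription factor is expressed as a percentage, as in the field description; the function name is hypothetical.

```python
def available_bw(static_bw_bps, subscription_percent=100.0):
    """Bandwidth, in bps, that RSVP is allowed to reserve on an interface."""
    return static_bw_bps * subscription_percent / 100.0
```

With the default subscription of 100%, the available bandwidth equals the static bandwidth; a 50% subscription on a 100-Mbps interface leaves 50 Mbps reservable.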
Next Hop Resolution
Because LSPs have a lower (that is, more preferred) preference value than most routing protocols, the router prefers the LSP to the BGP next hop 192.168.24.1 over the route learned from IS-IS.
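The selection can be illustrated with a small comparison of preference values. The numbers below are the commonly documented JUNOS default preferences; they are not given in this section and should be treated as an assumption, as should the function name.

```python
# Assumed JUNOS default route preferences (lower wins).
DEFAULT_PREFERENCE = {
    "RSVP": 7,
    "LDP": 9,
    "OSPF internal": 10,
    "IS-IS Level 1 internal": 15,
    "IS-IS Level 2 internal": 18,
    "BGP": 170,
}


def preferred_source(candidates):
    """Return the candidate route source with the lowest preference value."""
    return min(candidates, key=lambda source: DEFAULT_PREFERENCE[source])
```

With both an RSVP-signaled LSP and an IS-IS route available to 192.168.24.1, preferred_source(["IS-IS Level 2 internal", "RSVP"]) selects the LSP.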
Proving the LSP Works
The traceroute command shows that the traceroute to a BGP route follows the LSP. The MPLS label values are visible, along with the remainder of the MPLS header fields. The command is abbreviated, so not all output is present.
This Module Discusses:
The roles of P, PE, and CE routers;
VPN-IPv4 address formats;
Route distinguisher use and formats;
RFC 2547bis control flow; and
RFC 2547bis data flow.
RFC 2547bis Terminology
The following slides examine RFC 2547bis terminology.
Customer Edge Routers
Customer edge (CE) routers are located at the customer location and provide access to the provider-provisioned VPN (PP-VPN) service. CE routers can interface to provider PE routers using virtually any Layer 2 technology and routing protocol.
Provider Edge Routers
Provider edge (PE) routers are located at the edge of the provider’s network. They interface to the CE routers on one side and to the provider’s core routers on the other. PE routers maintain site-specific VPN routing and forwarding tables (VRFs). The PE and CE routers function as routing peers, with the PE router terminating the routing exchange between customer sites and the provider’s core.
Routes learned from the CE routers (and stored in the PE router’s VRF) are sent to remote PE routers using MP-IBGP.
PE routers use MPLS LSPs when forwarding customer VPN traffic between sites. The use of fixed-length label swapping in the provider’s core allows the customer sites to use private addressing (RFC 1918).
Provider Routers
Provider (P) routers are located in the provider’s core. These routers do not carry VPN customer routes, nor do they participate in the VPN control and signaling planes. This is a key aspect of the RFC 2547bis scalability model; only PE routers are aware of VPN customer routes, and no single PE router must hold all VPN customer state information.
P routers are involved in the VPN forwarding plane where they act as label-switching routers (LSRs) performing label swapping (and popping) operations.
VPN Sites
A VPN site is a collection of devices that can communicate with each other without the need to transit the provider’s backbone. A site can range from a single location with one router to a network consisting of many geographically diverse routers.
Mapped to a VRF
Each VPN site is attached to at least one PE router, and can be dual-homed with multiple connections to different PE routers. Each site is associated with a site-specific VRF in the PE routers. It is here that the PE maintains the routes specific to that site and, based on policy, the routes for remote sites to which this location can communicate.
Virtual Private Network Routing and Forwarding Tables
In the Layer 3 VPN model, site-specific VPN routing and forwarding (VRF) tables house each site’s routes. This separation of routes allows VPN customers to use private addresses that can overlap with addresses used by other VPN customers.
On this slide, PE1 has three VRF tables—one for each of its attached VPN sites. The VRF tables store routes learned from the attached site, as well as routes learned through MP-IBGP interaction with remote PE routers. In the latter case, VPN policy determines what routes are copied into what VRFs based on the presence of a VPN-IPv4 route attribute known as a route target.
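The policy step that copies received routes into VRFs by route target can be sketched as a set-matching exercise. The route-target strings, VRF names, and function below are hypothetical and serve only to illustrate the idea.

```python
def import_into_vrfs(routes, vrf_import_targets):
    """Copy each route into every VRF whose import targets match it."""
    vrfs = {name: [] for name in vrf_import_targets}
    for route in routes:
        for name, wanted in vrf_import_targets.items():
            if wanted & route["targets"]:  # at least one shared route target
                vrfs[name].append(route["prefix"])
    return vrfs


received = [
    {"prefix": "10.1.0.0/16", "targets": {"target:65000:100"}},
    {"prefix": "10.2.0.0/16", "targets": {"target:65000:200"}},
]
policy = {
    "vpn-a": {"target:65000:100"},
    "vpn-b": {"target:65000:200"},
    "vpn-c": {"target:65000:300"},
}
result = import_into_vrfs(received, policy)
```

In this example each route lands in exactly one VRF, and the vpn-c table stays empty because no received route carries its target.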
VRF Population
As mentioned previously, each PE router maintains site-specific VRF tables that house routes learned from the local CE device, as well as routes learned from remote PE routers having matching route attributes.
Site Separation
When a packet is received from a given site, the PE router performs a longest-match Layer 3 lookup against only the entries housed in that site’s VRF. This separation permits duplicate addressing among VPN customers with no chance of routing ambiguity.
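A per-site lookup of this kind can be sketched with the standard ipaddress module. The site names, prefixes, and next-hop labels are hypothetical; the point is only that the search never leaves the table of the site the packet arrived from.

```python
import ipaddress


def vrf_lookup(vrfs, site, destination):
    """Longest-match lookup restricted to one site's VRF table."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in vrfs[site].items():
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, next_hop)
    return None if best is None else best[1]


vrfs = {
    "site-a": {"10.1.0.0/16": "to-pe2-vpn-a", "10.1.5.0/24": "to-pe3-vpn-a"},
    "site-b": {"10.1.0.0/16": "to-pe2-vpn-b"},
}
```

Both tables contain 10.1.0.0/16, yet a packet to 10.1.5.9 from site-a resolves through the more specific site-a entry, while the same destination from site-b resolves only through site-b's route.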
VPN-IPv4 Address Structure
The following pages examine the structure of VPN-IPv4 addresses.
Duplicate Addresses Welcome!
This slide stresses that two VPN customers can use overlapping address space with no issues due to the separation of their routes in site-specific VRF tables.
In this example, VPN site A is using the 10.1/16 address space, which is also being used by VPN customer B. Housing these overlapping routes in separate VRF tables on PE routers is only half of the solution. A mechanism is needed to allow the PE routers to exchange these routes with remote PE routers without any chance of one address stepping on the other.
For example, when PE1 advertises routes from its two VRF tables to PE2, they arrive over a common MP-IBGP connection that is not inherently associated with a particular VRF. How can we ensure that PE2 will interpret these routes as being independent and unrelated?
The answer lies in the structure of a VPN-IPv4 address containing a route distinguisher designed to fix the very problem posed here.
VPN-IPv4 Address Family
The slide shows the structure of a VPN-IPv4 address. VPN addresses use a new MP-BGP sub-address family identifier (SAFI). Because they are, in the end, IPv4 addresses, they use the same family identifier as conventional IPv4 routes.
VPN NLRI contains a 24-bit MPLS label field (which carries the 20-bit label value). This label is sometimes called a VRF label because its function is to associate packets with a particular VRF instance in the receiving PE router. VPN addresses also contain a route distinguisher field, which is used to disambiguate VPN routes. In other words, two identical IP prefixes are considered different, and therefore incomparable, when they carry different route distinguisher values.
Distributed by MP-BGP
Labeled VPN routes are exchanged over the MP-BGP sessions, which terminate on the PE routers.
VPN Route Masks
A 32-bit prefix combined with the other fields in a VPN address produces a route mask of 120 bits. JUNOS software only displays the mask for the IP prefix portion of the address. Thus, in this case, the operator would see a VPN route with a mask length of /32.
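The 120-bit figure follows from adding the fixed-size fields to the IPv4 prefix length. A minimal sketch of that arithmetic, assuming the 24-bit label field and the 8-byte route distinguisher described in this section (the function name is hypothetical):

```python
LABEL_BITS = 24                 # MPLS label field carried in the VPN NLRI
ROUTE_DISTINGUISHER_BITS = 64   # 8-byte route distinguisher

def vpn_ipv4_mask_length(ipv4_prefix_length):
    """Total mask length of a VPN-IPv4 route for a given IPv4 prefix length."""
    return LABEL_BITS + ROUTE_DISTINGUISHER_BITS + ipv4_prefix_length

print(vpn_ipv4_mask_length(32))  # 24 + 64 + 32 = 120
```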
Two Route Distinguisher Formats Are Defined
The route distinguisher can be formatted two ways:
Type 0: This format uses a 2-byte administration field that codes the provider’s autonomous system number, followed by a 4-byte assigned number field normally set to the router ID (RID) of the PE router advertising the routes.
Type 1: This format uses a 4-byte administration field that is normally coded with the RID of the advertising PE router, followed by a 2-byte assigned number field that carries a unique value for each VRF supported by the PE router.
The examples on the slide show both the type 0 and type 1 route distinguisher formats. The first example shows the 2-byte administration field with the 4-byte assigned number field (type 0).
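As a rough illustration of the two layouts, the following Python sketch packs each format into 8 bytes. It assumes the 2-byte type field that RFC 4364 places in front of the administration and assigned number fields. The function names and the sample values (AS 65000, RID 192.168.1.1) are hypothetical.

```python
import socket
import struct

def rd_type0(as_number, assigned_number):
    """Type 0: 2-byte type, 2-byte administration field (AS number), 4-byte assigned number."""
    return struct.pack("!HHI", 0, as_number, assigned_number)

def rd_type1(ipv4_address, assigned_number):
    """Type 1: 2-byte type, 4-byte administration field (IPv4 address), 2-byte assigned number."""
    return struct.pack("!H4sH", 1, socket.inet_aton(ipv4_address), assigned_number)

print(rd_type0(65000, 100).hex())        # 0000fde800000064
print(rd_type1("192.168.1.1", 5).hex())  # 0001c0a801010005
```

Both results are exactly 8 bytes long, which matches the 8-byte route distinguisher that is later stripped when a route is installed into a VRF.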
Disambiguates IPv4 Addresses
As mentioned on the previous page, the route distinguisher allows the router to disambiguate two identical IP prefixes.
VPN-IPv4 Routes
The ingress PE router adds or prepends the route distinguisher to the IPv4 prefix of routes received from each CE router. Then these VPN-IPv4 routes are exchanged between PE routers using MP-BGP. The egress router converts the VPN-IPv4 routes back into IPv4 routes before inserting them into the site’s routing table.
Used Only in the Control Plane
The VPN address family exists only in the signaling or control plane between PE routers. Routes that match VPN policy, and are therefore installed into a particular VRF, will have the 8-byte route distinguisher (and MPLS label) removed so that they will appear as conventional IPv4 routes in the VRF. Because the site-specific VRFs provide route isolation, there is no need for the route distinguisher once a route is safely stored away in a VRF. Only signaling exchanges between PE routers use the VPN address format.
Overlapping Routes Revisited
With the inclusion of the route distinguisher, the overlapping address spaces used by VPN customers A and B do not cause ambiguity at PE router 2, as the different route distinguishers make these routes incomparable.
The sole purpose of the route distinguisher is to make what would otherwise be identical addresses incomparable. The PE routers do not interpret or act on the fields in the route distinguisher for any other reason.
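The effect can be pictured as a table keyed on the pair (route distinguisher, prefix) rather than on the prefix alone. The sketch below is only an analogy, with hypothetical route distinguisher values written in "administrator:assigned number" notation.

```python
# Two customers advertise the same IPv4 prefix with different route distinguishers.
received_routes = [
    ("65000:1", "10.1.0.0/16", "customer A"),
    ("65000:2", "10.1.0.0/16", "customer B"),
]

def build_vpn_table(routes):
    """Key each route on (route distinguisher, prefix) so identical prefixes stay separate."""
    table = {}
    for route_distinguisher, prefix, owner in routes:
        table[(route_distinguisher, prefix)] = owner
    return table

vpn_table = build_vpn_table(received_routes)
print(len(vpn_table))                                   # 2: both routes survive
print(len({prefix for _, prefix, _ in received_routes}))  # 1: keyed on prefix alone, they would collide
```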
Operational Characteristics: Control Flow
The following slides examine the RFC 2547bis model for the policy-based exchange of VPN routing information.
Control Flow
VPN control flows exist at various places in the RFC 2547bis environment. First, we have the signaling exchange between CE and PE routers that can take the form of OSPF, RIP, BGP, or even static routing. The control exchanges between PE and CE routers are totally independent, due to the PE routers terminating the local CE-PE signaling flows. The PE routers then use MP-IBGP to convey routes from site-specific VRFs for the purposes of populating the VRFs on remote PE routers.
Finally, the need for LSPs in the provider’s networks results in the presence of MPLS-related signaling in the form of either RSVP or LDP.
Data Flow
Data flow relates to the actual forwarding of VPN traffic from CE router to CE router using MPLS label-based switching through the provider’s core.
Administrative Policy
The use of policy in the PE routers determines the connectivity that will result between VPN sites. While site connectivity requirements are defined by the VPN customers, the act of implementing this policy is the job of the service provider.
Mistakes made by the provider when defining and implementing VPN policy can lead to security breaches at worst and broken VPN connectivity at best.
VPN Topology Options
VPN policy is extremely flexible and can result in full-mesh, partial-mesh, or hub-and-spoke topologies. The combination of VPN import and export policy determines the resulting site connectivity.
Route Distribution between PE Routers
VPN policy makes use of extended BGP communities that allow PE routers to filter routes for which they have no VPN members. When a PE router has locally attached VPN members, these communities allow the PE router to install the VPN route into the VRF associated with specific sites.
The most important extended community is the route target, which is used to convey a route’s association with a given VPN/VRF. The site of origin (SoO) community is used in certain corner cases to prevent the unnecessary advertisement of routes back to the site that originated them.
Structure of Extended Communities
BGP extended communities are defined in draft-ramachandra-bgp-ext-communities. Extended community attributes have a structure similar to the route distinguisher in that they are 8 bytes in length and support the same type code options and structure.
Route Advertisements
Each VPN-IPv4 route advertised by a PE router contains one or more route target communities. These communities are added using VRF export policy or explicit configuration.
Receiving Routes
When a PE router receives route advertisements from remote PE routers, it determines whether the associated route target matches one of its local VRFs. Matching route targets cause the PE router to install the route into all VRFs whose configuration matches the route target.
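The import decision can be sketched as a set intersection between the route targets carried by the route and the targets each local VRF is configured to import. The VRF names and target values below are hypothetical, and this sketch ignores everything else an import policy could match on.

```python
# Hypothetical import configuration on one PE router: VRF name -> route targets it imports.
vrf_import_targets = {
    "red": {"target:65000:100"},
    "blue": {"target:65000:200"},
    "shared": {"target:65000:100", "target:65000:200"},
}

def vrfs_for_route(route_targets):
    """Return the sorted names of local VRFs that import at least one of the route's targets."""
    return sorted(
        name for name, imports in vrf_import_targets.items()
        if imports & set(route_targets)
    )

print(vrfs_for_route(["target:65000:100"]))  # installed into every matching VRF
print(vrfs_for_route(["target:65000:999"]))  # empty list: no local VRF wants the route
```

An empty result corresponds to the case where the PE router has no interest in the route.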
Careful Policy Administration
Because the application of policy determines a VPN’s connectivity, you must take extra care when writing and applying VPN policy to ensure that the VPN customer’s connectivity requirements are faithfully met. Several companies offer automated VPN provisioning tools to minimize the work required when re-provisioning a VPN to meet changing customer requirements. These tools can also limit the errors that tend to occur when changes are manually entered by human operators.
Go to http://www.juniper.net/products/network_mgmt_ioa.html to obtain updated information on the management alliances that Juniper Networks has formed with the providers of such provisioning tools.
Routing Exchange
The following sequence of slides discusses the end-to-end exchange of routing information between CE routers belonging to the same VPN.
CE-4 sends the routes associated with VPN A site 2 to its attached PE router. The 10.1/16 prefix can be exchanged using OSPF, RIP, or BGP. Static routing can also place a site’s routes into the local PE router’s VRF table.
Whatever protocol is used between CE-4 and PE-2, the operation of this protocol is terminated by the PE router. This termination isolates the VPN site’s routing protocol from the MP-IBGP protocol used to convey the routes between PE routers. This isolation improves scalability and stability, as malfunctions in the PE-CE routing protocol tend to be limited to that PE-CE pairing.
Populating the Local VRF
Routes received from a local CE device are automatically installed into the VRF associated with that site.
VRF Export Policy
The PE router evaluates the route based on its configuration. If the vrf-export policy accepts the route or if a vrf-target is configured, the PE router converts the address into the VPN format by adding the configured route distinguisher. At this time the PE router also chooses a 20-bit MPLS label value used to associate received traffic with this VRF. Lastly, the PE router associates the route with one or more extended communities. At a minimum, the route will have a route target community added.
Advertisement to Remote PE Routers
In step 4, PE-2 generates a BGP update message containing the route learned from CE-4 at VPN A site 2. This route is sent to all MP-IBGP peers configured on the PE router that have successfully negotiated the support of the VPN-IPv4 address family. Other routes learned from the CE device that share common community attributes can be packed into a single NLRI advertisement.
Import Targets Determine the Route’s Fate
Step 5 shows the remote PE routers receiving the VPN route advertisement. These PE routers use their configured VRF import policy or vrf-target to determine if any of their local VRFs have matching route targets.
If no local configuration matches on the route target, the PE router silently discards the route. Thus, a PE router needs to carry VPN routes only when it has one or more locally attached sites belonging to the same VPN. Should the remote PE router’s import policy or vrf-target change, BGP route refresh is used to solicit a retransmission of previously advertised routes, because route target matches can now occur due to the policy modifications. Use of BGP route refresh means that BGP sessions do not have to be disrupted when adds, moves, or changes to the VPN topology occur.
When the received route’s target does match a VRF’s route target configuration, the PE router copies the VPN route into the bgp.l3vpn.0 table. This table houses all received VPN routes whose route target matched at least one VPN’s configuration. The route is also copied into one or more local VRF tables, after having the route distinguisher removed. The result is that prefix 10.1/16 is now present in PE-1’s “red” VRF table in a native IPv4 format.
PE-1 now associates the RID of PE-2 as the next hop for 10.1/16 when forwarding traffic that matches the prefix and was received on its “red” VRF interface.
Label Association
When VPN routes are advertised, part of the NLRI is the VRF label chosen by the advertising PE router. This label is often called the inner label as it is always found at the bottom of the label stack. The purpose of this label is to associate received packets with the correct VRF table.
The receiving PE router must be able to resolve the RID of the advertising PE router to an MPLS LSP stored in the inet.3 table. If an LSP does not exist to the advertising PE router, the route is hidden due to an unusable next hop. VPN traffic can only be forwarded across the provider’s backbone using MPLS switching. If an LSP to the egress PE router does not exist, the VPN route can never be used.
The result of this process is a two-level label stack that is used to forward packets across the provider’s backbone, and then to associate the traffic with a specific VRF on the receiving PE router.
Common Labels
RFC 2547bis allows the PE router to issue a single VRF label for all routes belonging to a common VRF interface, or to allocate a unique label for each route being advertised. JUNOS software takes the former approach as it drastically reduces the number of VRF labels that must be managed. Compliant implementations that use per-route VRF label assignment are interoperable with this one-label per VRF interface approach, however.
Advertising Received Routes
In the last step of the Layer 3 VPN signaling flow, the receiving PE router (PE-1) re-advertises the routes learned from remote PE routers to its locally attached CE routers.
These routes can be exchanged using any supported PE-CE routing protocol, or can be defined statically on the CE device. The CE device associates the PE router’s VRF interface as the next hop for the routes learned from the PE router.
Because the local PE-CE routing protocol is terminated by the PE router’s VRF, in this example, CE-4 can run EBGP while CE-3 might be running OSPF or RIP.
Where desired, routing policy can be used to control or refine the route exchange between PE and CE routers further. This policy would function in addition to the VRF import and export policy discussed in this section.
Operational Characteristics: Data Forwarding
The following slides examine RFC 2547bis traffic forwarding.
LSP Must Exist between Ingress and Egress PE Routers
Because VPN traffic is forwarded across the provider’s backbone using MPLS, an MPLS LSP between the ingress and egress PE routers must be in place before VPN packets can be forwarded.
RSVP or LDP can establish the PE-to-PE LSP. The PE-to-PE LSPs can involve PE routers running LDP with the resulting LDP LSPs tunneled over a traffic engineered RSVP LSP.
CE Device Forwards VPN Traffic to PE Router
On this slide, the CE device performs a longest-match lookup on a packet addressed to 10.1/16. This lookup results in the CE device forwarding the packet to the IP address associated with the PE router’s VRF interface.
PE Router Consults VRF for Longest Match
Upon receipt of the packet, the PE router conducts a longest-match route lookup in the VRF associated with the interface on which the packet arrived.
Two Labels Are Derived
Assuming that a match is found, the PE router associates the packet with two labels: the VRF label originally advertised with the route, and an outer MPLS label, assigned by either LDP or RSVP, which is used to associate the packet with the LSP between ingress and egress PE routers.
Two-Level Label Stack Required
The PE router performs a double label push operation involving both the VRF and MPLS labels. The VRF label associates the packet with the correct VRF on the egress PE router while the MPLS label associates the packet with the LSP that terminates on the egress PE router. The ingress PE router now forwards the labeled packet to the next hop LSR along the LSP’s path.
MPLS Forwarding across Provider Core
As the labeled packet traverses the provider’s core, the LSRs that make up the LSP act upon (and swap) the outer MPLS label. In contrast, the inner VRF label remains untouched throughout the labeled packet’s journey.
The use of exact match MPLS forwarding allows the P routers to forward the packet towards the egress PE router correctly, without any awareness of the labeled packet’s contents. This concept is key to RFC 2547bis scalability, as this MPLS capability is what allows P routers to remain blissfully unaware of all things VPN.
Penultimate Hop Popping
The last P router in the LSP’s path performs a pop operation, which results in a single-level label stack. The packet is now forwarded to the LSP’s egress point with only the VRF label.
VRF Label Removed by Egress PE Router
The egress PE uses the received VRF label to map the packet to a specific VRF interface.
IPv4 Packet Is Sent to Outbound Interface
After mapping the packet to a specific VRF interface, the VRF label is popped, and the packet is sent to the CE device attached to that VRF interface.
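The forwarding steps above can be traced with a small Python sketch of the label stack. The label values are hypothetical, the top of the stack is the first list element, and the model assumes penultimate hop popping on the last hop of the LSP.

```python
def forward_vpn_packet(vrf_label, lsp_labels):
    """Trace the label stack of one packet from ingress PE to egress PE.

    lsp_labels lists the outer label in use on each hop of the LSP. The final
    entry is None to model the penultimate hop popping the outer label.
    """
    trace = []
    # Ingress PE: double push, VRF label at the bottom, LSP label on top.
    stack = [lsp_labels[0], vrf_label]
    trace.append(("ingress PE push", list(stack)))
    for next_label in lsp_labels[1:]:
        if next_label is None:
            stack.pop(0)                 # penultimate hop removes the outer label
            trace.append(("penultimate P pop", list(stack)))
        else:
            stack[0] = next_label        # P routers swap only the outer label
            trace.append(("P swap", list(stack)))
    # Egress PE: use the VRF label to select the VRF interface, then pop it.
    stack.pop(0)
    trace.append(("egress PE pop", list(stack)))
    return trace

for step, stack in forward_vpn_packet(vrf_label=1001, lsp_labels=[200, 300, None]):
    print(step, stack)
```

In every step before the egress PE, the inner VRF label stays at the bottom of the stack and is never modified.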
This Module Discussed:
The roles of P, PE, and CE routers;
VPN-IPv4 address formats;
Route distinguisher use and formats;
RFC 2547bis control flow; and
RFC 2547bis data flow.
This Module Discusses:
Policy overview;
Import versus export policies;
Default policies for common protocols;
Route filters and match types;
The creation of multiterm policies;
Using operational mode commands to monitor policy operation; and
Advanced policy capabilities.
Routing Policy
The slide shows the topics discussed on the following pages.
Concept of Routing Policy
The concept of routing policy has been around for many years and is not specific to Juniper Networks platforms. Policy is a very powerful tool that lets you manipulate routes that you receive and/or send. In other words, you can manipulate the default decision process of the router by changing route attributes or ignoring and suppressing routes. As we look at policy in more detail, note that policy evaluation is centered on the routing table. Subsequent slides address this fact.
Match/Action Pairs
JUNOS software policies are sets of match and action pairs. The match section is a listing of criteria; the action section defines what to do if the match criteria are satisfied.
Applying Policy
Generically speaking, you use JUNOS software policies when you want to alter the default behavior of the router. More specifically, you might want to filter routing information from a neighbor, filter routes to a neighbor, or redistribute routes between routing protocols.
The filtering of routing information is one major use of the policy language. Based on criteria such as protocols or individual routes, you have the ability to allow or deny information to neighboring routers.
If a situation exists in your networking environment where information from a particular protocol (such as static routes) must be sent to another protocol (such as BGP), you need a policy. Because of the match/action pairing within a policy, you can select all static routes as the match criterion and specify sending them out through BGP as the action with relative ease.
Lastly, you can alter and modify attribute information within the routes by using a policy. You can change things such as metric values and JUNOS software route preference.
Policy Filtering
All policy processing on Juniper Networks M-series and T-series platforms occurs with respect to the routing table. JUNOS software applies policies as the routing table adds and removes routing information. The keywords import and export imply the direction of data flow with respect to the routing table.
Policy Chaining
Policies can be cascaded to form a chain of policy processing. This is often done to solve a complex set of route manipulation tasks in a modular manner.
Evaluation Process
JUNOS software evaluates policies from left to right based on the order in which they are applied to a routing protocol. JUNOS software checks each policy's match criteria and performs the associated action when a match occurs. If the first policy does not match or if the match is associated with a nonterminating action, the route is evaluated against the next policy in the chain. This pattern repeats itself for all policies in the chain. JUNOS software ultimately applies the default policy for a given protocol when no terminating actions occur while evaluating the user-defined policy chain.
Policy processing stops once a route meets a terminating action, unless you are grouping policies with Boolean operators. Grouping policies for logical operations, such as AND or OR, is a subject that is beyond the scope of this class.
Individual Policies
Individual policies can consist of multiple entries called terms. Terms are individual match/action pairs and can be named numerically or symbolically.
JUNOS software lists terms sequentially from top to bottom and evaluates them in that manner. Each term is checked for its match criteria. When a match occurs, JUNOS software performs the associated action. If no match exists in the first term, JUNOS software checks the second term. If no match exists in the second term, JUNOS software checks the third term. This pattern repeats itself for all terms. If no match exists in the last term, JUNOS software checks the next applied policy.
When a match is found within a term, JUNOS software takes the corresponding action. When that action is taken, the processing of the terms and the applied policies stops.
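The evaluation order described above (policies left to right, terms top to bottom, default policy last) can be sketched as two nested loops. This is a simplified model with only terminating actions. The policy names and route fields are hypothetical.

```python
def evaluate_chain(route, policies, default_action):
    """Evaluate a route against a chain of policies; each policy is a list of terms.

    A term is a (match_function, action) pair, where action is "accept" or "reject".
    """
    for policy in policies:                # policies are evaluated left to right
        for match, action in policy:       # terms are checked top to bottom
            if match(route):
                return action              # terminating action: all processing stops
    return default_action                  # nothing matched: the protocol default applies

# Hypothetical single-term policies.
reject_private = [(lambda r: r["prefix"].startswith("10."), "reject")]
accept_static = [(lambda r: r["protocol"] == "static", "accept")]
chain = [reject_private, accept_static]

print(evaluate_chain({"prefix": "10.1.0.0/16", "protocol": "static"}, chain, "reject"))
print(evaluate_chain({"prefix": "192.168.1.0/24", "protocol": "static"}, chain, "reject"))
print(evaluate_chain({"prefix": "192.168.1.0/24", "protocol": "ospf"}, chain, "reject"))
```

The first route is rejected by the first policy before the second policy is ever consulted, which is why the order of policies in a chain matters.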
Basic Policy Syntax
Up to this point, we have been referring to a policy term as a match/action pair. JUNOS software uses the keywords shown below for the match and the action:
Match = from
Action = then
Match Criterion
Match conditions in a policy are the criteria to be met for a policy action to take place. A policy term can have a single match condition, multiple conditions, or no conditions. The absence of any match conditions means that all possible routes match.
Match Conditions
As shown on the slide, match conditions can include the IP address of a neighbor in the AS, routing protocol-specific information (for example, the OSPF area), or the keyword protocol. JUNOS software also supports regex matches on AS path and communities. The discussion of regex matches is beyond the scope of this course.
In JUNOS software, the keyword protocol has special meaning: it identifies the source of routing information. The routing table inet.0 uses the same keyword to record which protocol placed each IP prefix in the table. Because a policy implicitly refers to the routing table, the keyword means the same thing within a policy: which protocol in the routing table do you want to reference? Possible protocols to reference in a policy include:
BGP;
Direct;
IS-IS;
OSPF;
Static; and
Aggregate.
Match Actions
When the match criteria are met, any possible action can take place. Within a policy term, you can specify one action, multiple actions, or no action. If you configure no action, policy processing continues with the next term in the policy.
Terminate: The two terminating actions in a policy are accept and reject. Both of these actions stop the policy processing. If a term has multiple actions within it, the terminating action is completed last.
Flow control: Flow control actions allow you to match some routes from the routing table and move them either to the next term in the policy or the next policy in a string of applied policies. The next-policy action is useful when a policy has a global reject at the end of it, which normally rejects all routes. This way, a small subset can be matched and moved to further policies for processing while the majority of the routes get rejected.
Modify attributes: Lastly, you can modify routing protocol attributes within a policy action. Often these actions are specified along with a terminating action, such as accept. As noted above, JUNOS software performs the accept action last so that the modification can take place.
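The three kinds of action can be modeled in a short Python sketch of a single policy. It is a simplification under stated assumptions: attribute changes are written as (name, value) tuples, a matching term without a terminating or next-policy action falls through to the next term, and a policy with no terminating match hands the route to the next policy. The term contents are hypothetical.

```python
def evaluate_policy(route, terms):
    """Return (result, route), where result is "accept", "reject", or "next policy"."""
    route = dict(route)  # work on a copy so the caller's route is unchanged
    for term in terms:
        if not term["match"](route):
            continue
        outcome = None
        for action in term["then"]:
            if isinstance(action, tuple):    # attribute change such as ("metric", 10)
                attribute, value = action
                route[attribute] = value
            else:
                outcome = action             # remembered and acted on last
        if outcome in ("accept", "reject", "next policy"):
            return outcome, route
        # "next term" or no flow-control action: continue with the following term
    return "next policy", route

terms = [
    {"match": lambda r: r["protocol"] == "static", "then": [("metric", 10), "next term"]},
    {"match": lambda r: r["prefix"].startswith("192.168."), "then": ["accept", ("preference", 20)]},
    {"match": lambda r: True, "then": ["reject"]},
]

print(evaluate_policy({"protocol": "static", "prefix": "192.168.1.0/24"}, terms))
print(evaluate_policy({"protocol": "ospf", "prefix": "10.0.0.0/8"}, terms))
```

In the second term the accept action is listed before the preference change, yet the change is still applied, mirroring the rule that the terminating action is completed last.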
Default Policy
The default policy always applied to a string of policies sounds very mysterious, but in reality it is not. In fact, every routing protocol that runs on a Juniper Networks M-series or T-series platform always applies the default policy for that protocol. Simply put, the default policy is the default operation of the protocol.
Starting with JUNOS software Release 5.5, you can override the default action intrinsic to a particular protocol by including a default-action [accept | reject] within a policy statement. The default-action statement is a nonterminating action modifier, which means that subsequent policy statements can continue to evaluate matching routes.
IS-IS and OSPF
For IGPs such as OSPF and IS-IS, the default import policy is to accept all routes learned from that protocol. Technically speaking, Link State (LS) protocols do not receive routes. Instead, link state information is flooded to all routers to create a Link State Database. Each router then computes optimal paths from this database using a shortest path first algorithm. The default export policy rejects all routes; this is because these protocols advertise routes learned through that protocol, and local routes, by flooding link state information. Using an export policy to limit LSP/LSA flooding would break the operation of a LS protocol.
RIP
The default RIP import policy is to accept all routes learned through RIP. The default export policy advertises no routes, not even those learned through RIP.
BGP
The default BGP import policy has all received BGP routes imported into the routing table. For export, all active BGP routes are sent to all peers, with the exception of not sending routes learned through IBGP to other IBGP speakers. This behavior is in accordance with BGP protocol requirements.
Policy Configuration Example
As the slide shows, policies work best when you apply them to either a routing protocol or the forwarding table.
As far as the JUNOS software commit function is concerned, it is okay to define a policy under [edit policy-options] and not reference it elsewhere in the configuration. The opposite is not true, however. Any policy that you reference under a protocol must be defined under the [edit policy-options] hierarchy.
Multiple Conditions
Within the from section of a policy term, you can reference multiple match criteria. In this case, all the match criteria must be satisfied before the action is taken. This is a logical AND function.
It is possible to write and commit a policy term that can never be matched. For example, you could configure from protocol static and from interface fe-0/0/0 match conditions in the same policy term. While this configuration will commit, the term can never be matched because it is impossible for a static route to be associated with an incoming interface. Hence, nothing would match this policy term. This type of match criteria would be sensible for direct routes, because direct routes are associated with an interface.
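The logical AND can be illustrated with a few lines of Python. The route dictionaries are hypothetical stand-ins for routing table entries, where a static route carries no incoming interface.

```python
def term_matches(route, conditions):
    """A term matches only if every from condition matches (logical AND)."""
    return all(route.get(key) == value for key, value in conditions.items())

static_route = {"protocol": "static", "interface": None}
direct_route = {"protocol": "direct", "interface": "fe-0/0/0"}

never_matches = {"protocol": "static", "interface": "fe-0/0/0"}
sensible = {"protocol": "direct", "interface": "fe-0/0/0"}

print(term_matches(static_route, never_matches))  # False: a static route has no incoming interface
print(term_matches(direct_route, sensible))       # True: both conditions hold
print(term_matches(static_route, {}))             # True: no conditions means every route matches
```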
Applying Policies
You can apply policies as import or export policies, as described earlier. To apply a policy, you must specify an export or import statement that references the desired policy.
Link-State Protocols
When the goal of a policy is to affect IS-IS or OSPF routes, you must apply the policy to that protocol. Link-state protocols support export policies only and do not allow the filtering of LSA/LSPs, as this behavior could violate the standards for the link-state protocol in question. You accomplish LSA/LSP filtering with configuration for a specific protocol, such as a multilevel IS-IS configuration that filters Level 2 LSPs from Level 1 areas.
Link-state routing protocols support export policy applications at the global level only; thus, the policy will apply to all neighbors/adjacencies.
BGP and RIP
The BGP and RIP protocols support both import and export policy applications. You can also apply these policies at the global, group, or neighbor levels in the case of BGP. For RIP, you can apply import policy at the global, group, or neighbor levels. You can only apply export policy at the group level, however.
Filtering Points
BGP policies are not hierarchical. You can apply policy for BGP at the global, group, and neighbor levels.
BGP Policy Evaluation
If you configure single or multiple policies at the neighbor level, any group-level policies will never be evaluated. In this case, you must reference the same policy in both places if you want both levels to evaluate it.
BGP Policy Application
The following list explains the details of the slide:
Peer 1.2.2.4 will use the import policies martian-filter, long-prefix-filter, and as-47-filter. It will use the export policy local-customers.
Peer 1.2.2.6 will use the import policies as-47-filter, long-prefix-filter, and martian-filter. It will use the export policy kill-private-addresses.
Peer 1.2.2.8 will use the import policies reject-unwanted and as-666-routes. It will use the export policy kill-private-addresses.
Generally speaking, you can evaluate BGP policies (as well as other attributes) in the following way. Look at the neighbor level for configuration. If there is nothing at the neighbor level, go to the group level for configuration. If there is nothing at the group level, go to the global level for configuration. Put another way, JUNOS software always implements the most specific aspect of a configuration, and in the case of BGP, the most specific configuration item is applied to the exclusion of less specific items.
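This most-specific-wins rule amounts to returning the first non-empty policy list when searching from neighbor to group to global level. A minimal sketch, assuming the function is called once for import policies and once for export policies (the policy names are hypothetical):

```python
def effective_policies(neighbor_level, group_level, global_level):
    """Return the policy list from the most specific level that has one configured."""
    for configured in (neighbor_level, group_level, global_level):
        if configured:
            return configured
    return []  # nothing configured anywhere: only the default policy applies

# Neighbor-level policies exclude group and global policies entirely.
print(effective_policies(["neighbor-filter"], ["group-filter"], ["global-filter"]))
# With nothing at the neighbor level, the group-level list is used.
print(effective_policies([], ["group-filter"], ["global-filter"]))
```

Note that the lists are never merged, which is why a policy wanted at two levels must be referenced at both.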
Route Filters
Use route filters as a match condition in a policy where the goal is to match criteria for a particular route or group of routes. You can have multiple route filters as a match condition. A group of route filters in a single match condition of a term in a policy is evaluated as a longest match, where only one of the route filters can result in a match.
Route Filter Evaluation
The following slides describe each of the different match types listed on the slide. You can use the CLI operational-mode test command to test a policy for matches against the active routes in the main routing table. The use of the test command is beyond the scope of this class.
exact
The match type exact means that only routes that match the given prefix exactly will pass the filter statement. For example, on the slide, only the prefix 192.168/16, and no other prefixes, will pass the filter statement.
orlonger
The match type orlonger means that routes greater than or equal to the given prefix will pass the filter statement, so the exact route 192.168/16 on the slide will match the statement. In addition, all routes that start with 192.168 and have bit-mask lengths between /17 and /32 will also pass. For example, the following prefixes match the statement: 192.168/16, 192.168.65/24, 192.168.24.89/32, 192.168.128/18, and 192.168.0/17. The following prefixes do not match the statement: 10.0/16, 192.167.0/17, and 200.123.45/24.
longer
The match type longer means that only routes longer (more specific) than the given prefix will pass the filter statement, so from the example on the slide, all routes that start with 192.168 and have bit-mask lengths between /17 and /32 will pass the filter statement. The following prefixes match the statement: 192.168.65/24, 192.168.24.89/32, 192.168.128/18, and 192.168.0/17. The following prefixes do not match the statement: 192.168/16, 10.0/16, 192.167.0/17, and 200.123.45/24.
upto
The match type upto means that routes greater than or equal to the first specified prefix, but less than or equal to the second specified prefix, will pass the filter statement. Thus, using the example on the slide, the exact route 192.168/16 will match the statement. All routes that start with 192.168 and have bit-mask lengths between /17 and /24 will also pass. The following prefixes match the statement: 192.168/16, 192.168.65/24, 192.168.128/18, and 192.168.0/17. The following prefixes do not match the statement: 192.168.24.89/32, 10.0/16, 192.167.0/17, and 200.123.45/24.
through
The match type through represents a string of exact matches. In other words, instead of specifying multiple exact route-filter statements, you can use the through statement. However, the through route-filter statement requires that the exact matches follow a specific pattern described by the prefixes given. In the example on the slide, the first specified route in the statement is matched exactly (192.168/16), and the second specified route in the statement is matched exactly (192.168.16/20). Then prefixes along the path in a radix-like tree, called the J-tree, also match the policy statement. The following prefixes also match the statement: 192.168.0/17, 192.168.0/18, 192.168.0/19. All other prefixes do not match the statement.
prefix-length-range
The match type prefix-length-range works very much like the match type upto. The difference is that you are specifying both a starting bit-mask length and an ending length. So, all routes that start with 192.168 and have bit masks between /20 and /24 will pass the filter statement. In the example on the slide, the following prefixes match the statement: 192.168.0/20, 192.168.64/24, 192.168.128/21. The following prefixes do not match the statement: 192.168.24.89/32, 10.0/16, 192.167.0/17, 200.123.45/24, and 192.168.128/18.
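The six match types can be compared side by side with a short Python sketch built on the standard ipaddress module. It is an illustration of the matching rules as described in this section, not JUNOS code. The function name and the argument convention are hypothetical, and prefixes must be written in full dotted form (192.168.0.0/16 rather than 192.168/16).

```python
import ipaddress

def route_filter_match(candidate, prefix, match_type, argument=None):
    """Return True if candidate passes a route filter of the given match type."""
    route = ipaddress.ip_network(candidate)
    base = ipaddress.ip_network(prefix)
    within = route.subnet_of(base)    # True when route equals base or is more specific
    if match_type == "exact":
        return route == base
    if match_type == "orlonger":
        return within
    if match_type == "longer":
        return within and route.prefixlen > base.prefixlen
    if match_type == "upto":                  # argument: longest prefix length allowed
        return within and route.prefixlen <= argument
    if match_type == "prefix-length-range":   # argument: (shortest, longest) lengths
        shortest, longest = argument
        return within and shortest <= route.prefixlen <= longest
    if match_type == "through":               # argument: the second, more specific prefix
        end = ipaddress.ip_network(argument)
        return within and end.subnet_of(route)
    raise ValueError(f"unknown match type: {match_type}")

print(route_filter_match("192.168.24.89/32", "192.168.0.0/16", "orlonger"))   # True
print(route_filter_match("192.168.0.0/16", "192.168.0.0/16", "longer"))       # False
print(route_filter_match("192.168.24.89/32", "192.168.0.0/16", "upto", 24))   # False
print(route_filter_match("192.168.0.0/18", "192.168.0.0/16", "through", "192.168.16.0/20"))  # True
print(route_filter_match("192.168.128.0/21", "192.168.0.0/16", "prefix-length-range", (20, 24)))  # True
```

The through case works because a prefix lies on the path between the two configured prefixes exactly when it falls inside the first prefix and contains the second.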
Prefix Match
The slide shows a graphical summary of each of the route-filter match types.
Multiple Route Filters
Although having multiple match criteria in a single term is a logical AND function, the presence of multiple route filters is different. In this case, JUNOS software evaluates the prefixes on a route-by-route basis and performs a longest-match lookup. This lookup is similar to the one done by the forwarding table. Once that longest match is found, JUNOS software only evaluates that one route filter to determine a match for the policy term.
If there are multiple route filters and other match criteria in a single policy term, the one longest-match route filter must still be applied to the logical AND process with the other match criteria for the action to be taken.
Specified Route Filter Actions
You can specify policy actions at the route-filter level in addition to the policy term level. When there is an action at the route-filter level, that action takes precedence, and the policy-term action is ignored. If no route-filter action is specified, JUNOS software applies the policy-term action.
Policy Evaluation
While all three route-filter prefixes match the candidate route, the middle route filter of 10.0.67/24 is the longest match. Because this route filter does not have an action specified at its configuration level, JUNOS software applies the actions of the policy term. This route has its metric changed to 10 and is accepted.
Policy Evaluation
On this slide, only two of the route-filter prefixes match the candidate route: 10.0/16 and 10/8. The longest match for the candidate route is the 10.0/16 route filter. Because this route filter does have an action specified at its configuration level, JUNOS software takes that action and ignores the actions of the policy term. This route is accepted with no other modifications made to its attributes.
For extra credit, what happens to the 10.0.55.2/32 prefix if the match type of the 10.0/16 route-filter statement is changed to exact, as shown?
[edit policy-options policy-statement pop-quiz]
user@host# show
from {
    route-filter 10.0.0.0/16 exact accept;
    route-filter 10.0.67.0/24 orlonger;
    route-filter 10.0.0.0/8 orlonger reject;
}
then {
    metric 10;
    accept;
}
In this case, the longest match is still the 10.0/16 route-filter statement. However, because the match type is now exact, the prefix does not match the 10.0/16 statement. Because each prefix is evaluated only against the longest match within a particular term, this prefix matches nothing in the term. Therefore, it continues to be evaluated by any remaining policies (possibly the default policy).
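The reasoning in the pop-quiz answer can be traced with a small Python model of the term. It is a sketch under simplifying assumptions: only the exact and orlonger match types are modeled, actions are represented as plain strings, and the names POP_QUIZ_FILTERS and evaluate_term are hypothetical.

```python
import ipaddress

# (prefix, match type, route-filter-level action or None), from the pop-quiz term
POP_QUIZ_FILTERS = [
    ("10.0.0.0/16", "exact", "accept"),
    ("10.0.67.0/24", "orlonger", None),
    ("10.0.0.0/8", "orlonger", "reject"),
]

def evaluate_term(candidate, filters, term_action="metric 10; accept"):
    """Return the action applied to candidate, or 'no match'."""
    cand = ipaddress.ip_network(candidate)
    covering = [f for f in filters
                if cand.subnet_of(ipaddress.ip_network(f[0]))]
    if not covering:
        return "no match"
    # Step 1: pick the single longest covering filter; shorter ones are ignored.
    prefix, match_type, action = max(
        covering, key=lambda f: ipaddress.ip_network(f[0]).prefixlen)
    # Step 2: test the candidate against that filter's match type only.
    if match_type == "exact" and cand != ipaddress.ip_network(prefix):
        return "no match"  # evaluation moves on to the next term or policy
    # Step 3: a route-filter-level action overrides the term's then actions.
    return action if action is not None else term_action

print(evaluate_term("10.0.55.2/32", POP_QUIZ_FILTERS))
```

In this model, 10.0.55.2/32 selects the 10.0/16 filter in step 1, fails the exact test in step 2, and is never compared against the 10/8 orlonger filter, which is the behavior the answer describes.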
Monitoring Effects of Policy
The commands on the slide show routing updates received before import policy processing and the routing updates sent after export policy processing.
Use the show route receive-protocol protocol neighbor command to show the specified protocol-type route advertisements that a particular neighbor is advertising to your router before import policy is applied. Use the show route advertising-protocol protocol neighbor command to show the protocol-type route advertisements that you are advertising to a particular neighbor after export policy is applied.
The use of route filters marks an exception to the behavior documented above. JUNOS software evaluates route filters before the output of a show route receive-protocol command is generated. This means that you must specify the hidden switch to the show route receive-protocol command to display received routes filtered by your import policy.
Answer
After import policy processing, use the show route protocol protocol command to monitor the effects of your import policy. This command shows all routes from the protocol type specified that are installed in the routing table.
This Module Discussed:
Policy overview;
Import versus export policies;
Default policies for common protocols;
Route filters and match types;
The creation of multiterm policies;
Using operational mode commands to monitor policy operation; and
Advanced policy capabilities.