Usage patterns of RIS Live at Code BGP
RIS Live is a reliable, well-designed, operationally robust API that provides a real-time view of BGP updates. In this article, Ioannis Sermetziadis talks through the RIS Live features that have contributed to Code BGP building a platform capable of detecting BGP issues in seconds.
RIS Live provides real-time BGP data in JSON format using a streaming API. From a set of route collectors situated at various locations around the world - each of which peers with multiple BGP routers - the service observes inter-domain routing changes in the form of BGP updates, aggregates the data, and makes it available through a single API. As such, it serves as a public resource for BGP data that helps enable the development of monitoring tools, such as Code BGP.
The API is implemented by two protocols, WebSocket and HTTP. Of the two, the WebSocket API provides more options, and so is more useful for advanced use-cases. On top of BGP updates, the API provides state updates of its route collector peers as well as BGP keep-alives, open and notification messages. This post will focus on BGP update messages in order to serve as a guide for implementing systems that integrate with RIS Live for BGP update retrieval and processing.
RIS Live exposes the subscription entity as a first-class citizen, using the ris_subscribe and ris_unsubscribe message types. Clients can filter which BGP data they receive, and so reduce incoming load.
A subscription contains a set of filters - e.g. route collector host, peer IP, AS path and prefix - that form a data stream of interest. If multiple subscriptions are created, a BGP update will be delivered to the client if its attributes match any of the created subscriptions.
In other words, the union of all messages matching the filters of any subscription is delivered. The server acknowledges a subscription to signal that it has been received and applied. Finally, a client can cancel a subscription (and optionally resubscribe with new parameters) when its filters change dynamically.
The following cases are supported using subscriptions:
1. When a client needs the BGP updates relevant to one or more networks, one subscription should be created for each network prefix, containing the prefix filter value in CIDR format.
2. When a client needs the BGP updates for all prefixes announced by an Autonomous System (AS), two subscriptions are needed. The first uses the path filter to specify the origin AS; if the AS can be present anywhere in the path, the path filter can be adjusted accordingly. The second must be created explicitly for all withdrawals, so that the client is notified about them and can properly track the associated prefixes. Note that the path filter is not useful for withdrawals, since a withdrawal carries no path attribute; the prefix correlation needs to take place client-side.
3. Clients can also subscribe to data of a specific route collector, using the host filter, or of a specific BGP peer, using the peer filter.
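The three cases above can be sketched as subscription messages. This is a minimal sketch, not a definitive client: the helper names are hypothetical, and the field names (prefix, path, host, peer, type, require) and the "<asn>$" origin pattern follow our reading of the RIS Live manual and should be checked against it.

```python
import json


def subscribe_message(**filters):
    """Build a ris_subscribe message; filters left as None are omitted."""
    data = {key: value for key, value in filters.items() if value is not None}
    return json.dumps({"type": "ris_subscribe", "data": data}, sort_keys=True)


def prefix_subscriptions(prefixes):
    """Case 1: one subscription per network prefix, in CIDR format."""
    return [subscribe_message(prefix=p, type="UPDATE") for p in prefixes]


def origin_as_subscriptions(asn):
    """Case 2: announcements filtered by origin AS, plus all withdrawals."""
    return [
        # Assumption: "<asn>$" anchors the ASN at the end of the path (origin).
        subscribe_message(path=f"{asn}$", type="UPDATE", require="announcements"),
        # Withdrawals carry no AS path, so no path filter can be applied here.
        subscribe_message(type="UPDATE", require="withdrawals"),
    ]


def collector_subscription(host=None, peer=None):
    """Case 3: restrict the stream to a route collector and/or a BGP peer."""
    return subscribe_message(host=host, peer=peer, type="UPDATE")
```

Each returned string would be sent as one text frame on the same WebSocket connection, for example origin_as_subscriptions(65000) followed by collector_subscription(host="rrc00"), where both values are placeholders.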
Subscriptions enable clients to request multiple streams of BGP data to be delivered in a single WebSocket connection, which is optimal because clients do not need to maintain multiple connections in order to fetch multiple BGP data streams.
Subscription acknowledgements (ACKs) are returned by RIS Live when subscription messages are received and processed by the RIS Live server. Clients need to track these ACKs to ensure that the subscriptions are successfully created. Clients can also perform retries to recover from missing ACKs and ensure completeness of the BGP updates provided by RIS Live. By enforcing a strict policy on the delay of the ACKs, the client can minimise the effect of possible uninitialised subscriptions.
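A minimal sketch of ACK tracking with a strict deadline and bounded retries follows. The AckTracker class is hypothetical. It assumes that the acknowledgement echoes the subscription filters so that it can be matched to a pending request, and that acknowledgements have been enabled for the connection (to our knowledge, via a socket option on the subscription); both points should be verified against the RIS Live manual.

```python
import json
import time


class AckTracker:
    """Track subscriptions that were sent but not yet acknowledged."""

    def __init__(self, timeout=5.0, max_retries=3, clock=time.monotonic):
        self._timeout = timeout
        self._max_retries = max_retries
        self._clock = clock
        self._pending = {}  # key -> [subscription, deadline, attempts]

    @staticmethod
    def _key(subscription):
        # Canonical JSON, so that key order does not affect matching.
        return json.dumps(subscription, sort_keys=True)

    def sent(self, subscription):
        """Record that a subscription was (re)sent and start its deadline."""
        key = self._key(subscription)
        previous = self._pending.get(key)
        attempts = previous[2] + 1 if previous else 1
        self._pending[key] = [subscription, self._clock() + self._timeout, attempts]

    def acknowledged(self, subscription):
        """Clear a pending subscription; False if it was not pending."""
        return self._pending.pop(self._key(subscription), None) is not None

    def due_for_retry(self):
        """Return (to_resend, given_up) for subscriptions past their deadline."""
        now = self._clock()
        to_resend, given_up = [], []
        for key, (subscription, deadline, attempts) in list(self._pending.items()):
            if now < deadline:
                continue
            if attempts >= self._max_retries:
                given_up.append(subscription)
                del self._pending[key]
            else:
                to_resend.append(subscription)
        return to_resend, given_up
```

The caller re-sends everything in to_resend and calls sent() again for each; a non-empty given_up list is a reasonable trigger for a full reconnection.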
RIS Live signals errors to the client with the ris_error message type, which provides debugging information about the connection or about invalid messages received by the server.
RIS Live also provides a ping-pong mechanism at the API level, using the ping and pong message types, to ensure liveness between the RIS Live server and the client. This mechanism gives more insight into the validity of the connection. In practice, a connection might be open while the session with the API server is non-responsive. The ping-pong mechanism detects this kind of failure, which makes the client more resilient. To recover, the client reconnects and re-creates its subscriptions, which resets it to a valid state.
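The liveness check described above can be sketched as a small state machine that is independent of any WebSocket library. The LivenessMonitor class and its default timings are hypothetical, and the exact shape of the ping payload is an assumption to be checked against the RIS Live manual.

```python
import json
import time


class LivenessMonitor:
    """Decide when to send a ping and when the session should be reset."""

    def __init__(self, interval=10.0, timeout=30.0, clock=time.monotonic):
        self._interval = interval  # seconds between pings
        self._timeout = timeout    # maximum tolerated silence since last pong
        self._clock = clock
        self._last_ping = clock()
        self._last_pong = clock()

    def ping_if_due(self):
        """Return a ping message to send, or None if it is not time yet."""
        now = self._clock()
        if now - self._last_ping >= self._interval:
            self._last_ping = now
            return json.dumps({"type": "ping"})  # assumed payload shape
        return None

    def on_message(self, raw):
        """Feed every received frame; returns True if it was a pong."""
        if json.loads(raw).get("type") == "pong":
            self._last_pong = self._clock()
            return True
        return False

    def is_stale(self):
        """True when no pong has arrived within the timeout."""
        return self._clock() - self._last_pong > self._timeout
```

A client loop would call ping_if_due() periodically, pass each incoming frame to on_message(), and reconnect and resubscribe once is_stale() returns True.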
Connection failures are to be expected on WebSocket connections, because such connections are long-lived: they stay open until the client decides to close them. For this reason, it is important to apply a reconnection mechanism, so that connections are re-established after a server restart or any transient network issue. Retries can be implemented at different levels of the software stack. However, it is better to implement them at the lowest possible layer, which simplifies the higher application layers by removing this concern from them.
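One common way to implement such a reconnection mechanism is capped exponential backoff with jitter, kept in a thin layer below the application logic. The sketch below is illustrative rather than a description of the Code BGP implementation: the function names are hypothetical, and session stands for any callable that connects, subscribes and reads until the connection ends, returning True if the session was healthy for a while.

```python
import random
import time


def backoff_delays(base=1.0, cap=60.0, jitter=0.5, rng=random.random):
    """Yield capped exponential delays, each stretched by random jitter."""
    attempt = 0
    while True:
        delay = min(cap, base * (2 ** attempt))
        yield delay * (1 + jitter * rng())
        attempt += 1


def run_with_reconnect(session, should_stop, sleep=time.sleep, delays=backoff_delays):
    """Run session() repeatedly, backing off between attempts."""
    schedule = delays()
    while not should_stop():
        try:
            was_healthy = session()
        except OSError:  # includes ConnectionError and socket timeouts
            was_healthy = False
        if was_healthy:
            schedule = delays()  # restart the backoff after a good session
        sleep(next(schedule))
```

A real client would also catch the exception types raised by its WebSocket library, which are not necessarily subclasses of OSError.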
Dealing with withdrawal load
When subscribing to BGP updates by origin AS, the client also needs to subscribe to the withdrawals of all prefixes. Since AS path filtering does not apply to withdrawals, this is the only way to receive the withdrawals of the prefixes originated by that AS.
This can introduce very high load to a system, increasing its resource usage and degrading performance. The effect can be minimised, however, by employing a cache that holds all the prefixes seen in the received BGP announcements. When withdrawals are received, they are checked against that cache, and only the relevant withdrawals, i.e. those for prefixes present in the cache, are forwarded for processing.
This solution depends on the prefix announcement being received before the corresponding withdrawal, so that the prefix is already in the cache when the withdrawal arrives. The reasoning is that in BGP, before the withdrawal of a prefix is propagated, the prefix must have been announced previously (i.e., with an earlier timestamp), and that announcement is subject to path filtering.
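A minimal sketch of such a cache, assuming the data object of an update carries path, announcements (each with prefixes) and withdrawals fields. The WithdrawalFilter class is hypothetical. It re-checks the origin AS client-side, because an update delivered through the withdrawals subscription may also carry announcements that the path filter would not have matched, and it never evicts prefixes, which a production system would likely want to bound.

```python
class WithdrawalFilter:
    """Forward only withdrawals of prefixes previously announced by our AS."""

    def __init__(self, origin_asn):
        self._origin_asn = origin_asn
        self._announced = set()  # prefixes seen in matching announcements

    def _originated_by_us(self, path):
        if not path:
            return False
        last = path[-1]
        # The last hop may be a single ASN or an AS-SET (a list of ASNs).
        origins = last if isinstance(last, list) else [last]
        return self._origin_asn in origins

    def process(self, data):
        """Return the trimmed update to forward, or None if it is irrelevant."""
        announcements = []
        if self._originated_by_us(data.get("path", [])):
            announcements = data.get("announcements", [])
            for announcement in announcements:
                self._announced.update(announcement.get("prefixes", []))
        withdrawals = [p for p in data.get("withdrawals", []) if p in self._announced]
        if not announcements and not withdrawals:
            return None
        return {**data, "announcements": announcements, "withdrawals": withdrawals}
```

With this in place, the bulk of the global withdrawal stream is dropped by a set lookup before it reaches any heavier processing.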
BGP Update Batching
Multiple BGP update messages can be batched into a single protocol message of type ris_message. The path attribute is an array of ASNs or AS-SETs, which can represent multiple AS paths: choosing one ASN from each AS-SET, in every possible combination, yields the individual paths.
Additionally, one or more prefixes can be included in a message, using the prefixes attribute under the announcements attribute for BGP announcements, or using the withdrawals attribute for BGP withdrawals.
What's more, a ris_message can include multiple announcements with different next_hop values. All these data model options allow batching multiple prefix announcements that share the same originator and AS paths, or prefix withdrawals, making the API more efficient in terms of data load and bandwidth consumption.
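To make the batching concrete, the sketch below unpacks one batched update into individual per-prefix events, expanding any AS-SET in the path into its combinations. The function names and the tuple layout are hypothetical, and the message shape (path, announcements with next_hop and prefixes, withdrawals) is assumed from the description above.

```python
from itertools import product


def expand_paths(path):
    """Expand a path of ASNs and AS-SETs (lists) into plain AS paths."""
    hops = [hop if isinstance(hop, list) else [hop] for hop in path]
    return [list(combination) for combination in product(*hops)]


def flatten_update(data):
    """Turn one batched update into (kind, prefix, as_path, next_hop) tuples."""
    paths = expand_paths(data.get("path", []))
    events = []
    for announcement in data.get("announcements", []):
        next_hop = announcement.get("next_hop")
        for prefix in announcement.get("prefixes", []):
            for as_path in paths:
                events.append(("announce", prefix, as_path, next_hop))
    # Withdrawals carry neither an AS path nor a next hop.
    for prefix in data.get("withdrawals", []):
        events.append(("withdraw", prefix, None, None))
    return events
```

For example, a path of [64500, [64501, 64502], 65000] expands to two AS paths, one through 64501 and one through 64502.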
RIS Live has proven to be a well-designed API that forms the foundation of a reliable service providing BGP updates in real time, usable in both community and commercial initiatives. Additionally, the RIS Live service is operationally robust from the Code BGP Platform's point of view: no availability issues have been observed so far in 24/7 operation, and its streaming latency fits the needs of a real-time system.
The Code BGP Platform utilises RIS Live and other sources of BGP data - e.g. RPKI Validators, the Code BGP Monitor, users' own BGP routers - to provide a monitoring platform for BGP. The data is aggregated to extract the BGP state of autonomous systems, networks, peerings and routes of interest; the platform presents them in a web user interface in the form of a global looking glass.
In parallel, the BGP data is used to generate metrics in real time, which are persisted in Prometheus, so that the user can extract valuable information using Grafana dashboards and other visualisation utilities. BGP update logs and history retrieval are provided using Grafana Loki. Finally, alerts can be set using Prometheus Alertmanager and the expressiveness of Code BGP’s GraphQL data interface, so that the user can be notified in real time about non-policy-compliant BGP state changes.
In general, Code BGP helps organisations to detect and resolve BGP issues in seconds. This benefits the entire Internet community due to the nature of BGP - e.g., resolving a prefix hijack or a route leak affecting a client of Code BGP benefits all networks that exchange traffic with that client.
Note that RIS Live is also used by ARTEMIS, an open-source tool that detects BGP prefix hijacking attacks, which is currently maintained by Code BGP. Specifically, it powers the RIS Live monitoring component of ARTEMIS, feeding its detection and mitigation components. The ARTEMIS project has been funded by the RIPE NCC Community Projects Fund.