ONOS motivations, approaches & roadmap
Presentation Outline ● Future network demands ○ control plane requirements & challenges
● Architectural approaches to achieve... ○ high-availability, scalability, performance ○ device independence - portability of apps to different devices ○ agility & extensibility - allowing users to build their own solutions
● Retrospective & Roadmap ○ past releases ○ future plans
Future Networks require agility, reliability and performance at scale
Control Plane Requirements ● Agility (the key premise of SDN) ○ lift control & mgmt. planes from devices into a centralized entity ○ support high-level abstractions for network control ○ support both device configuration and flow programmability
● Reliability ○ a single controller is not a viable solution ○ individual components do not have to provide six 9s (99.9999%) availability; the overall solution does
● Performance at scale ○ consider the impact of a distributed control plane on performance ○ the active/backup approach imposes an inherent limit on scalability
Central Control Plane Benefits ● Global view of the network environment ○ overarching visibility, broader perspective
● Simplified programming model ○ network centric operations, e.g. intents
● Accessible to high-level languages ○ Java, Scala, Python, ...
● Open field for innovation and new capabilities ○ opportunity to integrate with external knowledge sources
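To illustrate the "network-centric operations, e.g. intents" point, here is a minimal sketch, assuming a toy topology of named switches: the application only states that traffic should reach a destination host, and a compiler derives the per-device flow rules along a breadth-first shortest path. The classes `ToyIntentCompiler` and `ToyFlowRule` are hypothetical and are not the ONOS intent or flow rule APIs; only one direction of traffic is compiled.

```java
import java.util.*;

// Hypothetical toy types; not the ONOS intent or flow rule APIs.
final class ToyFlowRule {
    final String device;  // switch the rule is installed on
    final String match;   // e.g. "dst=h2"
    final String output;  // next-hop switch, or the destination host on the last hop

    ToyFlowRule(String device, String match, String output) {
        this.device = device;
        this.match = match;
        this.output = output;
    }

    @Override
    public String toString() {
        return device + ": " + match + " -> " + output;
    }
}

final class ToyIntentCompiler {
    private final Map<String, Set<String>> adj = new HashMap<>();

    // Undirected link between two switches; TreeSet keeps BFS order deterministic.
    void addLink(String a, String b) {
        adj.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Breadth-first search; returns a shortest switch path, or an empty list.
    List<String> shortestPath(String src, String dst) {
        Map<String, String> prev = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(src);
        prev.put(src, src);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(dst)) break;
            for (String next : adj.getOrDefault(cur, Collections.emptySet())) {
                if (prev.putIfAbsent(next, cur) == null) queue.add(next);
            }
        }
        if (!prev.containsKey(dst)) return Collections.emptyList();
        LinkedList<String> path = new LinkedList<>();
        for (String at = dst; !at.equals(src); at = prev.get(at)) path.addFirst(at);
        path.addFirst(src);
        return path;
    }

    // The app states *what* it wants (reach dstHost); the compiler derives
    // one device-centric rule per switch on the path, in one direction only.
    List<ToyFlowRule> compile(String dstHost, String srcSwitch, String dstSwitch) {
        List<String> path = shortestPath(srcSwitch, dstSwitch);
        List<ToyFlowRule> rules = new ArrayList<>();
        for (int i = 0; i < path.size(); i++) {
            String out = (i + 1 < path.size()) ? path.get(i + 1) : dstHost;
            rules.add(new ToyFlowRule(path.get(i), "dst=" + dstHost, out));
        }
        return rules;
    }
}

class IntentDemo {
    public static void main(String[] args) {
        ToyIntentCompiler compiler = new ToyIntentCompiler();
        compiler.addLink("s1", "s2");
        compiler.addLink("s2", "s3");
        compiler.addLink("s1", "s4");
        compiler.addLink("s4", "s3");
        compiler.compile("h2", "s1", "s3").forEach(System.out::println);
    }
}
```

The point of the design is that the application never names a device-level rule; if the topology changes, recompiling the same intent yields a new set of rules.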
Central Control Plane Challenges ● Resilience, scale-out and performance ● Build control plane as a distributed system ○ this often raises the question “How is this solving anything?”
● Distributed control plane vs. distributed protocols ○ tightly coupled vs. loosely coupled ○ physically distributed yet logically centralized ○ cardinality of nodes is orders of magnitude smaller
● Hard problem with potential of high rewards ○ solving the problems unlocks the value of SDN
ONOS Unique Value Proposition ● Distributed core ○ provides high-availability, scalability and performance
● Abstractions & models ○ allows applications to configure and control the network without becoming dependent on device specifics
● Applications platform ○ allows developers to dynamically extend the core capabilities ○ neutral ground between apps and the network environment
High-Availability & Scalability
ONOS Core Architecture ● Physically-distributed system ○ to provide high-availability & scalability ○ symmetric - all cluster nodes are identical software-wise
● Logically-centralized state ○ location-transparent access to state and to control interactions
● Dynamic ○ cluster can be dynamically resized to environment needs
● Polyglot state management ○ applies the type of treatment appropriate to the type of state ○ strong-consistency treatment for edicts ○ eventual-consistency treatment for sensing & telemetry
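For the eventual-consistency side of polyglot state management, a minimal sketch, assuming a last-writer-wins (LWW) rule with logical timestamps: each replica accepts writes locally, and replicas converge by exchanging and merging entries. LWW is one common convergence technique chosen here for illustration; `LwwMap` is hypothetical and is not ONOS's `EventuallyConsistentMap` implementation. Strongly consistent edicts would instead need a consensus protocol, which is out of scope for this sketch.

```java
import java.util.*;

// Hypothetical toy last-writer-wins map; not ONOS's EventuallyConsistentMap.
final class LwwMap<K, V> {
    private static final class Stamped<T> {
        final T value;
        final long ts;      // logical (Lamport-style) timestamp
        final String node;  // writer id, used only to break timestamp ties

        Stamped(T value, long ts, String node) {
            this.value = value;
            this.ts = ts;
            this.node = node;
        }

        // Deterministic winner so every replica picks the same entry.
        boolean beats(Stamped<T> other) {
            if (ts != other.ts) return ts > other.ts;
            return node.compareTo(other.node) > 0;
        }
    }

    private final String nodeId;
    private long clock = 0;
    private final Map<K, Stamped<V>> entries = new HashMap<>();

    LwwMap(String nodeId) {
        this.nodeId = nodeId;
    }

    // Local write: no coordination with other replicas.
    void put(K key, V value) {
        clock++;
        entries.put(key, new Stamped<>(value, clock, nodeId));
    }

    V get(K key) {
        Stamped<V> s = entries.get(key);
        return s == null ? null : s.value;
    }

    // Anti-entropy merge: keep the winning entry per key and advance the
    // local clock so later local writes supersede what has been seen.
    void mergeFrom(LwwMap<K, V> peer) {
        for (Map.Entry<K, Stamped<V>> e : peer.entries.entrySet()) {
            Stamped<V> mine = entries.get(e.getKey());
            Stamped<V> theirs = e.getValue();
            if (mine == null || theirs.beats(mine)) entries.put(e.getKey(), theirs);
            clock = Math.max(clock, theirs.ts);
        }
    }
}

class LwwDemo {
    public static void main(String[] args) {
        LwwMap<String, String> a = new LwwMap<>("node-a");
        LwwMap<String, String> b = new LwwMap<>("node-b");
        a.put("of:0001/port1", "up");
        b.put("of:0001/port1", "down"); // concurrent conflicting write
        a.mergeFrom(b);                  // one anti-entropy round in each direction
        b.mergeFrom(a);
        System.out.println(a.get("of:0001/port1") + " " + b.get("of:0001/port1"));
    }
}
```

Both replicas end up agreeing on the same value without ever blocking a writer, which is acceptable for sensed state that will be refreshed anyway, but not for edicts.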
Distributed Primitives ● EventuallyConsistentMap ● ConsistentMap ● LeadershipService ● DistributedQueue ● DistributedSet ● AtomicCounter ● AtomicValue ● ...
Core Stores & Primitives ● topology - eventually consistent map ● device mastership - leadership service ● flow rules - primary/backup replication ● intents - consistent map ● network configuration - consistent map ● device keys - consistent map ● applications - eventually consistent map ● ...
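Since device mastership is listed as built on a leadership service, a single-process toy model may help show the idea: every cluster node volunteers for each device, the first volunteer is master, and when it leaves the next candidate takes over. `ToyLeadershipBoard`, `volunteer`, `currentLeader` and `nodeLeft` are hypothetical names, not ONOS's `LeadershipService` interface; a real distributed version needs the candidate list itself to be agreed on with strong consistency.

```java
import java.util.*;

// Hypothetical single-process model of leadership-based mastership.
final class ToyLeadershipBoard {
    // topic (device id) -> candidate nodes in arrival order; the head is leader
    private final Map<String, List<String>> candidates = new HashMap<>();

    synchronized void volunteer(String topic, String node) {
        List<String> line = candidates.computeIfAbsent(topic, k -> new ArrayList<>());
        if (!line.contains(node)) line.add(node);
    }

    synchronized Optional<String> currentLeader(String topic) {
        List<String> line = candidates.get(topic);
        return (line == null || line.isEmpty()) ? Optional.empty() : Optional.of(line.get(0));
    }

    // Failure or departure: drop the node from every topic; the next
    // candidate in line becomes master, which is how backups step up.
    synchronized void nodeLeft(String node) {
        for (List<String> line : candidates.values()) line.remove(node);
    }
}

class MastershipDemo {
    public static void main(String[] args) {
        ToyLeadershipBoard board = new ToyLeadershipBoard();
        for (String node : List.of("n1", "n2", "n3")) board.volunteer("of:0001", node);
        System.out.println(board.currentLeader("of:0001").orElse("none"));
        board.nodeLeft("n1");
        System.out.println(board.currentLeader("of:0001").orElse("none"));
    }
}
```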
Application Stores & Primitives ● CORD Virtual Tenant Networks ● CORD Optical Line Terminator ● Segment Routing ● OpenStack Networking ● Service Function Chaining ● ACL Management ● DHCP Server ● ...
Performance
Control Plane Properties ● Availability ○ ability to withstand control plane failures while retaining control
● Scalability ○ ability to adjust control resources as network size changes while maintaining or improving control performance ○ no inherent limits imposed on scale of environment
● Performance ○ ability to sustain high throughput of control operations ○ ability to react with low latency on control channels ○ no inherent limits imposed on data plane performance
Control Plane Performance ● Throughput of proactive provisioning actions ○ path flow provisioning & global optimization of existing flows
● Latency of responses to topology changes ○ path repair in wake of link or device failures
● Throughput of distributing and aggregating state ○ batching, caching, parallelism, dependency reduction
● Controller vs. device responsibilities ○ defer to devices to provide low-latency reactivity, backup paths
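Of the techniques listed for distributing and aggregating state, batching is the simplest to show. A minimal sketch, assuming operations are handed to some sink (for example one message per batch instead of one per operation): `Batcher` is a hypothetical helper, not an ONOS class, and a production version would also flush on a timer so a partially filled batch is not held indefinitely.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical batching helper; not an ONOS class.
final class Batcher<T> {
    private final int maxBatch;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    Batcher(int maxBatch, Consumer<List<T>> sink) {
        if (maxBatch <= 0) throw new IllegalArgumentException("maxBatch must be positive");
        this.maxBatch = maxBatch;
        this.sink = sink;
    }

    // Accumulate one operation; emit a full batch as soon as it is ready.
    synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= maxBatch) flush();
    }

    // Deliver whatever is pending (e.g. at shutdown or when a timer fires).
    synchronized void flush() {
        if (pending.isEmpty()) return;
        sink.accept(new ArrayList<>(pending)); // hand off a copy
        pending.clear();
    }
}

class BatcherDemo {
    public static void main(String[] args) {
        Batcher<String> batcher = new Batcher<>(3, batch -> System.out.println("send " + batch));
        for (int i = 1; i <= 7; i++) batcher.add("flow-op-" + i);
        batcher.flush(); // push out the trailing partial batch
    }
}
```

The trade-off is throughput versus latency: larger batches amortise per-message cost, while each operation may wait longer before it is sent.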
Performance Metrics ● Device & link sensing latency ○ how fast can the controller react to environment changes, e.g. a switch or port going down, rebuild the network graph and notify apps
● Flow rule operations throughput ○ how many flow rule operations can the controller cluster handle, and how does throughput relate to cluster size
● Intent operations throughput ○ how many intent operations can the controller cluster handle, and how does throughput relate to cluster size
● Intent operations latency ○ how fast can the controller react to environment changes and reprovision intents on the data plane, and how does this latency scale
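The metrics above reduce to a few simple calculations once raw measurements are collected. A minimal sketch, assuming completed-operation counts, elapsed time and per-operation latency samples are already available: throughput is operations per second, latency is summarised with nearest-rank percentiles, and the relationship with cluster size is expressed as a scaling efficiency. `RunStats` is hypothetical, and the demo numbers are made up, not measured ONOS results.

```java
import java.util.*;

// Hypothetical benchmark summary helpers; demo numbers are illustrative only.
final class RunStats {
    // Completed operations divided by elapsed wall-clock seconds.
    static double throughputOpsPerSec(long completedOps, long elapsedNanos) {
        return completedOps / (elapsedNanos / 1e9);
    }

    // Nearest-rank percentile: smallest sample with at least p% of samples <= it.
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // 1.0 means N nodes deliver exactly N times single-node throughput.
    static double scalingEfficiency(double singleNodeTput, double clusterTput, int nodes) {
        return clusterTput / (singleNodeTput * nodes);
    }
}

class RunStatsDemo {
    public static void main(String[] args) {
        long[] latenciesMs = {12, 9, 15, 11, 30, 10, 13, 9, 14, 12}; // made-up samples
        System.out.printf("p50=%d ms, p99=%d ms%n",
                RunStats.percentile(latenciesMs, 50), RunStats.percentile(latenciesMs, 99));
        System.out.printf("throughput=%.0f ops/s%n",
                RunStats.throughputOpsPerSec(1_200_000, 3_000_000_000L));
        System.out.printf("3-node efficiency=%.2f%n",
                RunStats.scalingEfficiency(400_000, 1_000_000, 3));
    }
}
```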
Abstractions & Models
[Figure: ONOS northbound and southbound abstractions; southbound protocols shown include OpenFlow, OVSDB, NETCONF, P4, ..., BGP, OSPF, PCEP, TL1, ...]
Approach to South-bound ● Standard protocols & models ○ promote & use whenever possible ○ however, do not rely on them solely
● Capabilities are more important than protocols ○ control & configuration capabilities are what is important ○ language is secondary
● Configuration & flow control ○ equally important ○ applications should not depend on device & protocol specifics
[Figure: network-centric layer (intents, resources, network config) above a device-centric layer of behaviours, each backed by a device-specific implementation]
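The split between device-centric behaviours and their implementations can be sketched as an interface that applications code against, with per-device drivers supplying protocol-specific implementations. All names here (`PortAdmin`, `NetconfPortAdmin`, `OpenFlowPortAdmin`, `DriverRegistry`, `PortShutdownApp`) are hypothetical, and the returned strings are deliberately simplified stand-ins, not valid NETCONF payloads or OpenFlow wire messages.

```java
import java.util.*;

// Device-agnostic behaviour the application depends on (hypothetical).
interface PortAdmin {
    String disablePort(int port); // returns the command that would be sent
}

// NETCONF-flavoured implementation; payload simplified for illustration.
final class NetconfPortAdmin implements PortAdmin {
    public String disablePort(int port) {
        return "<edit-config><port><id>" + port + "</id><enabled>false</enabled></port></edit-config>";
    }
}

// OpenFlow-flavoured implementation; text stands in for a port-mod message.
final class OpenFlowPortAdmin implements PortAdmin {
    public String disablePort(int port) {
        return "PORT_MOD port=" + port + " config=PORT_DOWN";
    }
}

// Maps each device to the driver implementing the behaviour for it.
final class DriverRegistry {
    private final Map<String, PortAdmin> byDevice = new HashMap<>();

    void bind(String deviceId, PortAdmin impl) {
        byDevice.put(deviceId, impl);
    }

    PortAdmin portAdmin(String deviceId) {
        PortAdmin impl = byDevice.get(deviceId);
        if (impl == null) throw new NoSuchElementException("no driver for " + deviceId);
        return impl;
    }
}

// The application: identical code regardless of the device's protocol.
final class PortShutdownApp {
    static List<String> shutDown(DriverRegistry reg, Map<String, Integer> targets) {
        List<String> sent = new ArrayList<>();
        for (Map.Entry<String, Integer> t : targets.entrySet()) {
            sent.add(reg.portAdmin(t.getKey()).disablePort(t.getValue()));
        }
        return sent;
    }
}

class DriverDemo {
    public static void main(String[] args) {
        DriverRegistry reg = new DriverRegistry();
        reg.bind("netconf:10.0.0.1", new NetconfPortAdmin());
        reg.bind("of:0001", new OpenFlowPortAdmin());
        Map<String, Integer> targets = new TreeMap<>(Map.of("of:0001", 3, "netconf:10.0.0.1", 7));
        PortShutdownApp.shutDown(reg, targets).forEach(System.out::println);
    }
}
```

Adding support for a new device type means adding a driver, not changing the application.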
Applications Platform
[Figure: example ONOS applications: R-CORD, E-CORD, M-CORD, SDN-IP, Segment Routing, OpenStack Integration, DHCP Server, Load Balancer, Distributed DPI, Service Function Chaining, Fault Management]
[Figure: ONOS software stack: GUI, REST API and Command Line above ONOS applications; the ONOS networking core, extensible with applications, drivers and protocols; the ONOS distributed applications platform (OSGi / Apache Karaf)]
Retrospective & Roadmap
Quarterly Releases ● Avocet (1.0.0) - released 2014-12 ○ initial release of clean and modular code-base, protocol independence
● Blackbird (1.1.0) - released 2015-03 ○ improved performance, scale-out, increased robustness
● Cardinal (1.2.0) - released 2015-06 ○ new use-cases, additional core features, additional SB protocols
● Drake (1.3.0) - released 2015-09 ○ platform enhancements, security, UI enhancements
● Emu (1.4.0) - released 2015-12 ○ CORD features, prototype of dynamic cluster scaling
Quarterly Releases ● Falcon (1.5.0) - released 2016-03 ○ dynamic cluster scaling, model extensibility, intents on flow objectives
● Goldeneye (1.6.0) - planned for 2016-06 ○ spring cleaning, YANG tools, GUI scaling, P4 PoC
● Hummingbird (1.7.0) - planned for 2016-09 ○ intent framework, network hypervisor, YANG at NB, P4 support
● I... (1.8.0) - planned for 2016-12 ○ separate platform & core, gRPC, rolling upgrade, ...
● ...
Brief Retrospective ● Started with a minimal platform with only a few apps ○ built with sound structure, solid code & a minimalistic REST API ○ 4 apps and 1 SB plugin
● Added new core functionality and apps with each release ○ deliberately balancing investments in platform vs. use-cases and apps ○ show innovation, but also take pragmatic steps to be deployment-ready ○ maintain coherence of architecture and quality of code
● Now a platform with many features and apps ○ new capabilities, distributed primitives and even greater extensibility ○ now 70+ apps, including SB plugins, drivers, and samples
Platform Productization ● ONOS core is stable and maturing ○ core team & community moving forward with new features
● Vendors are hardening the platform edges ○ Ciena developing a production-ready version of ONOS based on the Falcon release ○ Huawei selling solutions based on ONOS
YANG Tools & Shell ● Tools to parse YANG models and produce Java DTOs ○ codecs to produce/consume XML for use in NETCONF ○ codecs to produce/consume JSON for use in NB external APIs
● NB API adapter to enable off-platform apps ○ to engage with ONOS using network-centric YANG models ○ e.g. L3 VPN, SFC
● Being developed by the community (Huawei) ○ unable to use ODL YANG tools due to licensing & compatibility
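To make "YANG models to Java DTOs plus JSON codecs" concrete, a hand-written sketch of what such tooling might produce for a small, hypothetical YANG container. Real generated classes and codecs will differ in naming, structure and completeness; `VpnInstance` and its `toJson` method are illustrative only, and the JSON escaping handles just quotes, backslashes and control characters.

```java
import java.util.*;

// Hand-written stand-in for a DTO generated from a hypothetical container:
//   container vpn-instance { leaf name { type string; } leaf-list sites { type string; } }
final class VpnInstance {
    final String name;
    final List<String> sites;

    VpnInstance(String name, List<String> sites) {
        this.name = name;
        this.sites = List.copyOf(sites); // immutable, like a typical DTO
    }

    // Minimal JSON encoding of the container, as an NB codec might emit.
    String toJson() {
        StringBuilder sb = new StringBuilder("{\"name\":").append(quote(name)).append(",\"sites\":[");
        for (int i = 0; i < sites.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(quote(sites.get(i)));
        }
        return sb.append("]}").toString();
    }

    // Escape quotes, backslashes and control characters for a JSON string.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }
}

class YangDemo {
    public static void main(String[] args) {
        VpnInstance vpn = new VpnInstance("blue", List.of("siteA", "siteB"));
        System.out.println(vpn.toJson()); // {"name":"blue","sites":["siteA","siteB"]}
    }
}
```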
[Figure: YANG tooling: off-platform clients such as XOS use the YANG shell (Java/JSON implementation), while YANG-modelled behaviours are realized over NETCONF (Java/XML implementation)]
Virtual Network Subsystem ● Produces SDN-capable virtual networks ○ with topology and without implicit connectivity ○ connectivity has to be explicitly programmed (via intents)
● One of several possible virtualization approaches ○ shim under ONOS, overlays, …
● Permits arbitrary virtual topologies ○ from big switch to isomorphic mapping to the physical network
● Being developed by the community (Ciena) ○ seed work done in the Drake release
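The "big switch" end of the virtualization range, together with the rule that a virtual network has topology but no implicit connectivity, can be sketched as follows, assuming virtual ports that map onto physical edge ports and an explicit `connect` call standing in for a virtual intent. `BigSwitchVnet` and its methods are hypothetical, not the ONOS virtual network API.

```java
import java.util.*;

// Hypothetical "big switch" virtual network: one virtual device whose ports
// map onto physical edge ports. Topology exists from creation; connectivity
// exists only between port pairs that were explicitly connected.
final class BigSwitchVnet {
    private final Map<Integer, String> vportToPhys = new HashMap<>();
    private final Set<List<Integer>> connected = new HashSet<>();

    void addVirtualPort(int vport, String physicalPort) {
        vportToPhys.put(vport, physicalPort);
    }

    // Explicit, bidirectional connectivity request (stand-in for an intent).
    void connect(int a, int b) {
        if (!vportToPhys.containsKey(a) || !vportToPhys.containsKey(b))
            throw new IllegalArgumentException("unknown virtual port");
        connected.add(key(a, b));
    }

    boolean canReach(int a, int b) {
        return connected.contains(key(a, b));
    }

    String physicalPort(int vport) {
        return vportToPhys.get(vport);
    }

    // Order-independent key so connect(1, 2) and connect(2, 1) are the same pair.
    private static List<Integer> key(int a, int b) {
        return List.of(Math.min(a, b), Math.max(a, b));
    }
}

class VnetDemo {
    public static void main(String[] args) {
        BigSwitchVnet vnet = new BigSwitchVnet();
        vnet.addVirtualPort(1, "of:0001/1");
        vnet.addVirtualPort(2, "of:0003/2");
        System.out.println("before connect: " + vnet.canReach(1, 2));
        vnet.connect(1, 2);
        System.out.println("after connect: " + vnet.canReach(1, 2));
    }
}
```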
Intent Subsystem 2.0 ● Based on networks comprising regions with different technologies & limitations ○ different regions of a network can use different means to satisfy an intent ○ multiple intent domains within a single administrative domain
● Offers composable network-centric primitives ○ e.g. tunnel, default route, {broad|multi|any}cast ○ efficient use of network resources via shared use of primitives
● Offers apps to negotiate/select from alternatives ○ presently only one intent “solution” is implicitly selected
gRPC API & Kafka Integration ● gRPC allows fine-grained & high-performance interactions between ONOS and off-platform apps ○ such interactions are presently available only to on-platform apps via the Java API ○ the REST API is suitable only for low-frequency & coarse interactions ○ enables solutions based on a micro-service architecture (Ciena)
● Kafka integration - export ONOS events to off-platform applications ○ export in a reliable & deterministic manner
Cluster Federation ● Mechanism for coordination between multiple ONOS clusters ○ permits peer-to-peer & hierarchical arrangements ○ different administrative domains
● Represents peer networks as abstracted topologies
● Peer-to-peer variant being developed by GEANT
Rolling Software Upgrade ● Mechanism for gradually upgrading an ONOS cluster ○ upgrades cluster one node at a time without downtime
● Requires portable serialization for cluster comms ○ upgraded nodes must be able to speak the “old” language
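The requirement that upgraded nodes "speak the old language" can be sketched with an explicitly versioned message: a newer node writes the older frame layout when its peer has not been upgraded yet, and can read both layouts. The message type, field names and byte layout below are assumptions for illustration, not ONOS's actual cluster serialization.

```java
import java.io.*;

// Hypothetical versioned cluster message; not ONOS's real serialization.
final class LinkEvent {
    static final int CURRENT_VERSION = 2;

    final String src;
    final String dst;
    final int latencyMs; // field added in format v2; -1 when unknown

    LinkEvent(String src, String dst, int latencyMs) {
        this.src = src;
        this.dst = dst;
        this.latencyMs = latencyMs;
    }

    // Never write a newer format than the peer understands.
    byte[] encode(int peerVersion) {
        int v = Math.min(peerVersion, CURRENT_VERSION);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(v);
            out.writeUTF(src);
            out.writeUTF(dst);
            if (v >= 2) out.writeInt(latencyMs);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read either layout; fields missing from old frames get a default.
    static LinkEvent decode(byte[] frame) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
            int v = in.readUnsignedByte();
            String src = in.readUTF();
            String dst = in.readUTF();
            int latency = (v >= 2) ? in.readInt() : -1;
            return new LinkEvent(src, dst, latency);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class UpgradeDemo {
    public static void main(String[] args) {
        LinkEvent event = new LinkEvent("of:1/2", "of:3/4", 5);
        LinkEvent seenByOldPeer = LinkEvent.decode(event.encode(1));
        LinkEvent seenByNewPeer = LinkEvent.decode(event.encode(2));
        System.out.println(seenByOldPeer.latencyMs + " " + seenByNewPeer.latencyMs);
    }
}
```

During a rolling upgrade each node would encode for the lowest version still present in the cluster, and switch to the new format only once every node has been upgraded.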
ONOS Platform & Core Separation ● Separate the distributed network-agnostic platform from the network-aware core subsystems ○ presently the separation is logical, but there exist some (albeit weak) ties between the two
● Requests for such a platform from Fermilab & Oak Ridge National Lab, among others ○ provides significant benefits in the context of SDN as well
Summary ● Built to support large mission critical networks ○ high-availability, scalability, performance ○ device independence - portability of apps to different devices ○ agility & extensibility - allowing users to build their own solutions
● Balance between platform, use-cases & innovation ○ stable and easy-to-use platform ○ developer environment that invites experimentation