

#### **IEEE ICNP 2018**

The 26th IEEE International Conference on Network Protocols Cambridge, UK, September 24-27, 2018

## 1st P4 European Workshop (P4EU)

# Named Data Networking with Programmable Switches



LASIGE, Faculdade de Ciências Universidade de Lisboa (Faculty of Sciences, University of Lisbon), Lisbon

#### **Salvatore Signorello**

SnT, University of Luxembourg, Luxembourg

#### Fernando M. V. Ramos

LASIGE, Faculdade de Ciências Universidade de Lisboa (Faculty of Sciences, University of Lisbon), Lisbon

## PRESENTATION OUTLINE





# PART I

Background



IEEE ICNP 2018 1st P4 European Workshop

#### Named Data Networks

- A network architecture that **focuses on content distribution**.
  - In contrast to the point-to-point IP model.
  - Better suits nowadays Internet's most frequent use cases.
- Machines have no identification (addresses). Only data is named.
  - Consumers request a resource by name with an <u>Interest packet</u>.
  - Producers emit a <u>Data packet</u> uniquely bound to that name.
  - Routers forward these Interests according to its name.
  - Instead of asking a specific host for content, the consumer asks the network.





IEEE ICNP 2018 1st P4 European Workshop

#### Named Data Networks





#### Named Data Networks





#### Named Data Networks





T

#### Named Data Networks





IEEE ICNP 2018 1st P4 European Workshop

#### Named Data Networks





IEEE ICNP 2018 1st P4 European Workshop

#### Named Data Networks





IEEE ICNP 2018 1st P4 European Workshop

#### Named Data Networks





IEEE ICNP 2018 1st P4 European Workshop

#### Named Data Networks





IEEE ICNP 2018 1st P4 European Workshop

#### Named Data Networks





#### Named Data Networks





IEEE ICNP 2018 1st P4 European Workshop

#### Named Data Networks





IEEE ICNP 2018 1st P4 European Workshop

#### Motivation

- No line-rate production hardware exists for NDN.
- Existing routers can't be extended to support the nonconventional packet processing required by NDN.
  - Current solutions are software-based. FIB has thus far always been implemented in software.
- We leverage the recent availability of programmable switches to propose the design and implementation of a new line-rate NDN router written in P4\_16.



# PART II

Architecture



IEEE ICNP 2018 1st P4 European Workshop

#### P4<sub>16</sub> as implementation language

- High-level language to express data plane forwarding behavior.
- The programmer specifies **headers**, **parser and the processing sequence** a packet should undergo.
- A compiler maps the program to the underlying device's capabilities and hardware.





#### Architecture

Π





IEEE ICNP 2018 1st P4 European Workshop

#### Architecture

Π





IEEE ICNP 2018 1st P4 European Workshop



• **PROBLEM 1:** NDN packets do not follow the typical packet structure. They are a tree of **Type-Length-Values (TLVs)**.







• **PROBLEM 1:** NDN packets do not follow the typical packet structure. They are a tree of **Type-Length-Values (TLVs)**.



• What "uk/ac/cam/index" looks like at the network level:

| Interest            |    |           |    |              |   |    |      |   |    |              |   |     |              |   |       |
|---------------------|----|-----------|----|--------------|---|----|------|---|----|--------------|---|-----|--------------|---|-------|
| (TLV <sub>0</sub> ) | 22 | $TLV_{N}$ | 20 | <b>TLV</b> c | 2 | uk | TLVc | 2 | ас | <b>TLV</b> c | 3 | cam | <b>TLV</b> c | 5 | index |





• In P4\_14, parsing this packet structure incurs an enormous amount of parser states.



- In P4\_14, parsing this packet structure incurs an enormous amount of parser states.
- P4\_16 makes it easier. **SOLUTION:** 
  - 1. Use a header\_union. The union's members cover all the TLV possibilities.
  - Encapsulate TLV extraction logic inside a subparser. The main parser calls the subparser whenever a TLV needs to be extracted from the packet.



mediumTLV h mtlv;

largeTLV h ltlv;



- In P4\_14, parsing this packet structure incurs an enormous amount of parser states.
- P4\_16 makes it easier. **SOLUTION:** 
  - 1. Use a header\_union. The union's members cover all the TLV possibilities.
  - Encapsulate TLV extraction logic inside a subparser. The main parser calls the subparser whenever a TLV needs to be extracted from the packet.

#### • ADVANTAGES:

+ P4 program with less code & more readable



header smallTLV h {

bit<8> type;

10/30

header\_union tlv\_hu {
 smallTLV\_h stlv;
 mediumTLV\_h mtlv;
 largeTLV\_h ltlv;
}



- **PROBLEM 2:** Interest names may have any number of components.
  - "<u>uk/ac/cam/index</u>" (4 components)
- SOLUTION:
  - Use the header stack P4 type.
  - The parser is a state machine that can transition to itself.
  - BMv2-ss limitation: one must write MAX parser states, because only compile-time values are allowed as header stack indexes.





#### Architecture

Π





IEEE ICNP 2018 1st P4 European Workshop

## Forwarding Information Base (FIB)

- The **FIB** routes an Interest packet by longest prefix match of the name therein.
- **PROBLEM:** P4 has little support to process strings. We can parse them to varbit fields, but these can't be used to match on tables.

| FORWARDING INFO. BASE |      |  |  |  |  |
|-----------------------|------|--|--|--|--|
| /uk/ac                |      |  |  |  |  |
| /uk/ac/cam            |      |  |  |  |  |
| *                     | Drop |  |  |  |  |



## Forwarding Information Base (FIB)

- The **FIB** routes an Interest packet by longest prefix match of the name therein.
- **PROBLEM:** P4 has little support to process strings. We can parse them to varbit fields, but these can't be used to match on tables.

#### • SOLUTION:

- Use the hash() primitive to convert to an unsigned, fixed-length bit type.
- 2. Perform lpm (longest prefix match) match kind on the result (details follow).







- Introducing a new data structure, the hashtray.
  - DEFINITION: Given an NDN name, a hashtray is a series of blocks each with the result of hashing a component.
  - In our implementation, we used MAX=8 blocks and a 16-bit hash function (names longer than 8 components match only with the first 8 components).

typedef bit<8 × 16> hashtray\_t;

- The hashtray is used in 2 situations:
  - 1. When building FIB entries. Each FIB entry is a hashtray.
  - 2. Whenever an Interest needs to be routed.



## Constructing the FIB

• Let's add the entry: / uk / ac / cam  $\rightarrow$  port 2



• Let's add the entry:





• Let's add the entry:





• Let's add the entry:





• Let's add the entry:

Π

/ uk / ac / cam



#### Add hashtray to the FIB



• Our entry is associated with an Ipm mask covering three blocks (48 bits), as well as the outgoing port.

| FIB, collection of hashtrays (in TCAM) |   |   |   |   |          |  |  |  |
|----------------------------------------|---|---|---|---|----------|--|--|--|
| lpm mask                               |   |   |   |   | out port |  |  |  |
| 48 h("uk") h("ac") h("cam")            | 0 | 0 | 0 | 0 | 0 2      |  |  |  |
|                                        |   |   |   |   |          |  |  |  |
|                                        |   |   |   |   |          |  |  |  |
|                                        |   |   |   |   |          |  |  |  |
|                                        |   |   |   |   |          |  |  |  |
|                                        |   |   |   |   |          |  |  |  |



Π

#### Constructing the FIB

• All entries are added using the same process.

| FIB, collect | 16 bits | ashtray | s (in tCA | .1V1) |   |   |   |   | outport        |
|--------------|---------|---------|-----------|-------|---|---|---|---|----------------|
| 16           | h("uk") | 0       | 0         | 0     | 0 | 0 | 0 | 0 | (7)            |
| 32           | h("uk") | h("ac") | 0         | 0     | 0 | 0 | 0 | 0 | 9              |
| 48           | h("uk") | h("ac") | h("kcl")  | 0     | 0 | 0 | 0 | 0 | $\overline{3}$ |
| 48           | h("uk") | h("ac") | h("cam")  | 0     | 0 | 0 | 0 | 0 | (2)            |
| 48           | h("pt") | h("ul") | h("fc")   | 0     | 0 | 0 | 0 | 0 | (4)            |
|              |         |         |           |       |   |   |   |   |                |





| NAME : | uĸ | / | aC | / | Call | / | THUE |  |
|--------|----|---|----|---|------|---|------|--|
|        |    |   |    |   |      |   |      |  |

| pm mask | 16 bits |         |          |   |   |   |   |   | outpoi     |
|---------|---------|---------|----------|---|---|---|---|---|------------|
| 16      | h("uk") | 0       | 0        | 0 | 0 | 0 | 0 | 0 | 7          |
| 32      | h("uk") | h("ac") | 0        | 0 | 0 | 0 | 0 | 0 | 9          |
| 48      | h("uk") | h("ac") | h("kcl") | 0 | 0 | 0 | 0 | 0 | 3          |
| 48      | h("uk") | h("ac") | h("cam") | 0 | 0 | 0 | 0 | 0 | 2          |
| 48      | h("pt") | h("ul") | h("fc")  | 0 | 0 | 0 | 0 | 0 | $\sqrt{4}$ |

































| lpm mask | 16 bits |         |          |   |   |   |   |   | outport        |
|----------|---------|---------|----------|---|---|---|---|---|----------------|
| 16       | h("uk") | 0       | 0        | 0 | 0 | 0 | 0 | 0 | (7)            |
| 32       | h("uk") | h("ac") | 0        | 0 | 0 | 0 | 0 | 0 | (9)            |
| 48       | h("uk") | h("ac") | h("kcl") | 0 | 0 | 0 | 0 | 0 | $\overline{3}$ |
| 48       | h("uk") | h("ac") | h("cam") | 0 | 0 | 0 | 0 | 0 | (2)            |
| 48       | h("pt") | h("ul") | h("fc")  | 0 | 0 | 0 | 0 | 0 | $\overline{4}$ |



Π

• Our method guarantees line-rate processing and greatly reduces memory footprint over the state-of-the-art.





IEEE ICNP 2018 1st P4 European Workshop

#### • Our method is highly flexible.

- Not all components need to be used to build the hashtray;
- Blocks need not necessarily be the same size;
- So long as the FIB entries and temporary hashtrays from passing Interests are built the same way, the scheme works.

#### EXAMPLE:

- Doesn't start with "uk/ac/cam"  $\rightarrow$  port 1



#### FIB hash collisions

• **PROBLEM:** Hash collisions may lead to ambiguity between entries; in some cases, bad routing decisions.

#### • POSSIBLE SOLUTIONS:

- Wider hash outputs => higher memory requirements (double the width, double the memory ③), but has the advantage of supporting larger namespaces;
- Leave the problem to be solved by the control plane;
- (FUTURE WORK) Double hashing, cuckoo hashing;



#### Architecture

Π





IEEE ICNP 2018 1st P4 European Workshop



• Memorizes what each port requested.



|                  | port |   |   |   |  |  |  |
|------------------|------|---|---|---|--|--|--|
|                  | 4    | 3 | 2 | 1 |  |  |  |
| /uk/ac/cam/index | 0    | 1 | 0 | 1 |  |  |  |
| /pt/ul/fc/index  | 1    | 0 | 0 | 0 |  |  |  |



#### PIT

• Memorizes what each port requested.



- **SOLUTION:** Implemented with 2 register arrays.
  - Each register index holds a bit vector. Each bit represents a port. register<bit<MAX PORTS>>(PIT SIZE) PIT;
  - An additional register array stores hashtrays to deal with index collisions.
  - Whenever a hashtray indexes to an occupied cell, it is dropped and the consumer must reemit.





#### Architecture

Π





IEEE ICNP 2018 1st P4 European Workshop

#### **Content Store**

• Caches packets.

- Optional, but key to ensuring the network's efficiency.
- We implemented two versions in BMv2-ss:
  - Register implementation;
  - Directly implemented in C++ in the switch code.
- Possible solutions when porting to hardware:
  - Register implementation;
    - Optimal performance, but competes with PIT and FIB for memory.
  - Non-volatile storage attached to the device, accessed through extern calls;
  - Port mirroring: the content store is a host connected to the device.
    - Increased latency.
    - Additional mapping required.

# PART III

**Evaluation** 



IEEE ICNP 2018 1st P4 European Workshop

#### Experimental Setup

• Tests run on an off-the-shelf computer, **using Behavioral Model 2** and its **simple\_switch target** (BMv2-ss). A host emits Interests, the other receives them.



#### • Two tests run:

- CPU and throughput: The emitter attempts to exhaust system resources: first with a P4 Ethernet switch (parses the Ethernet header and matches against two tables) and then with our NDN router. We collect throughput and CPU usage measures.
- Functional block weight: Without exhausting the system, we find the latency of each functional block by isolating it from the others. Then, we vary the number of components in the Interest packets to assess the latency increase that results from that variation, on each of the functional blocks.



III

## CPU and throughput



• The P4 NDN router performs many more activities than the P4 Ethernet switch, so it has less throughput.



Ш

#### Functional block weight



- The parser and the hashtray construction ("Other") yield higher latency as the number of components increases.
  - The other functional blocks are within the acceptable error margins

# PART IV

### **Conclusions & Future Work**



IEEE ICNP 2018 1st P4 European Workshop



## Ongoing & Future Work

**Conclusions & Future** 

Work

- Optimize hashtray construction process, for example by parallelizing hash calculations.
- Port and adapt our solution to a hardware platform.
  - NetFPGA SUME

IV

- Barefoot Tofino
- Larger-scale evaluations.







### Conclusions

- NDN is a promising new architecture tailored for the Internet's most frequent use case: content distribution.
  - However, no NDN line-rate hardware exists.
  - By consequence, current NDN implementations are totally or partially software-based.
- Our contribution is the design and implementation of a P4\_16 NDN router that:
  - Includes all of an NDN router's functional blocks;
  - Can be ported to hardware, and takes existing hardware in consideration (e.g. fast TCAM for the FIB);
  - Requires scalable amounts of memory compared to the state-of-the-art solution.



## Thank you for listening!

## Questions?



(speaker)

LASIGE, Faculdade de Ciências Universidade de Lisboa (Faculty of Sciences, University of Lisbon), Lisbon

fc44396@alunos.fc.ul.pt



IEEE ICNP 2018 1st P4 European Workshop