When I was doing traffic mirroring with nginx, I stumbled upon a surprising problem – nginx was delaying the original request if the mirror backend was slow. This is really bad because you expect mirroring to be “fire and forget”. I worked around it by mirroring only part of the traffic, but this drove me to find another proxy that could mirror traffic without such problems. That’s when I finally found the time and energy to look into Envoy – I’d heard a lot of great things about it and had always wanted to get my hands dirty with it.
Just in case you’ve never heard of it – Envoy is a proxy server that is most commonly used in service mesh scenarios, but it can also be an edge proxy.
In this post, I will look only at the edge proxy scenario because I’ve never maintained a service mesh. Keep that use case in mind. Also, I will inevitably compare Envoy to nginx because that’s what I know and use.
The main reason I wanted to try Envoy was several compelling features:

- observability
- advanced load balancing
- active health checking
- support for protocols like Redis, MongoDB, and gRPC
- Lua scripting

Let’s unpack that list!
Observability is one of the most thorough features in Envoy. One of its design principles is to provide transparency into network communication, given how complex modern systems have become with all this microservices madness.
Out of the box, it provides lots of metrics for various metrics systems, including Prometheus.
To get that kind of insight in nginx, you have to buy nginx plus or use the VTS module, which means compiling nginx on your own. Hopefully, my project nginx-vts-build will help – I’m building nginx with the VTS module as a drop-in replacement for stock nginx, with a systemd service and basic configs. Think of it as an nginx distro. Currently, it has only one release, for Debian 9, but I’m open to suggestions. If you have a feature request, please let me know. But let’s get back to Envoy.
In addition to metrics, Envoy can be integrated with distributed tracing systems like Jaeger.
And finally, it can capture traffic for further analysis with Wireshark.
I’ve only looked at Prometheus metrics and they are quite nice!
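Enabling those metrics is just a matter of turning on the admin interface. Here is a minimal sketch (the port is my arbitrary choice):

# Sketch: Envoy admin interface exposing metrics
admin:
  access_log_path: /dev/null    # admin access log is mandatory, so send it to /dev/null
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901

With that in place, Prometheus can scrape http://127.0.0.1:9901/stats/prometheus.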
Load balancing in Envoy is very feature-rich. Not only does it support round-robin, weighted, and random policies, but also load balancing with consistent hashing algorithms like ketama and maglev. The point of the latter is fewer changes in traffic patterns when the upstream cluster is rebalanced.
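To illustrate, here is a sketch of what I believe that looks like, with a hypothetical x-user-id header as the hashing key:

# Sketch: consistent hashing (x-user-id is a hypothetical hash key)
# In the route, choose what to hash on:
route:
  cluster: backend
  hash_policy:
  - header:
      header_name: x-user-id
# In the cluster, choose the hashing algorithm:
clusters:
- name: backend
  type: STATIC
  connect_timeout: 1s
  lb_policy: MAGLEV    # RING_HASH gives you a ketama-style ring instead
  hosts:
  - socket_address:
      address: 127.0.0.1
      port_value: 10000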
Again, you can get the same advanced features in nginx but only if you pay for nginx plus.
To check the health of upstream endpoints, Envoy will actively send requests and expect valid answers – only then will an endpoint remain in the upstream cluster. This is a very nice feature that open source nginx lacks (but nginx plus has).
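Here is roughly what that looks like on a cluster – the /healthz path is a hypothetical health endpoint of the backend:

# Sketch: active HTTP health checking (/healthz is hypothetical)
clusters:
- name: backend
  type: STATIC
  connect_timeout: 1s
  hosts:
  - socket_address:
      address: 127.0.0.1
      port_value: 10000
  health_checks:
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 3    # eject the endpoint after 3 failed checks
    healthy_threshold: 2      # readmit it after 2 successful checks
    http_health_check:
      path: /healthz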
You can configure Envoy as a Redis proxy, a gRPC proxy, or a filter for DynamoDB, MongoDB, MySQL, or Thrift.
This is not a killer feature, imho, given that support for most of these protocols is experimental, but it’s nice to have and shows that Envoy is extensible.
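For illustration, here is a sketch of a Redis proxy listener, assuming a redis_backend cluster is defined alongside the others:

# Sketch: Redis proxy listener (redis_backend cluster assumed to be defined)
listeners:
- name: redis_listener
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 6379
  filter_chains:
  - filters:
    - name: envoy.redis_proxy
      config:
        stat_prefix: redis
        cluster: redis_backend
        settings:
          op_timeout: 5s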
It also supports Lua scripting out of the box. For nginx you have to use OpenResty.
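The Lua filter plugs into the http_filters chain before the router. A minimal sketch that tags every response with a header:

# Sketch: inline Lua filter
http_filters:
- name: envoy.lua
  config:
    inline_code: |
      function envoy_on_response(response_handle)
        response_handle:headers():add("x-served-by", "envoy")
      end
- name: envoy.router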
The features above alone are a very good reason to use Envoy. However, I found a few things that keep me from switching from nginx to Envoy:
Envoy doesn’t support caching of responses. This is a must-have feature for an edge proxy, and nginx implements it really well.
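For comparison, a basic nginx cache is just a few directives (the paths, zone name, and sizes here are arbitrary):

# Sketch: basic response caching in nginx
proxy_cache_path /var/cache/nginx keys_zone=edge_cache:10m max_size=1g;

server {
    listen 8000;

    location / {
        proxy_cache edge_cache;
        proxy_cache_valid 200 10m;    # cache successful responses for 10 minutes
        proxy_pass http://backend;
    }
}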
While Envoy does networking really well, it doesn’t touch the filesystem apart from loading the initial config file and handling runtime configuration. If you were thinking of serving static files like frontend assets (JS, HTML, CSS), you’re out of luck – Envoy doesn’t support that. nginx, again, does it very well.
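In nginx, it’s a couple of lines (the /var/www path is arbitrary):

# Sketch: serving static frontend files in nginx
location /static/ {
    root /var/www;    # serves files from /var/www/static/
    expires 1h;
}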
Envoy is configured via YAML, and to me its configuration feels very explicit, though I think that’s actually a good thing – explicit is better than implicit. But I feel that Envoy configuration is bounded by the features specifically implemented in Envoy. Maybe it’s a lack of experience with Envoy and old habits, but in nginx, with maps, the rewrite module (with the if directive) and other nice modules, I have a very flexible config system that allows me to implement anything. The cost of this flexibility is, of course, a good portion of complexity – nginx configuration requires some learning and practice, but in my opinion it’s worth it.
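To illustrate the kind of flexibility I mean, here is a hypothetical map that routes requests to different upstreams based on a header (assuming backend_v1 and backend_v2 upstreams are defined):

# Sketch: header-based routing with map (names are hypothetical)
map $http_x_api_version $backend_pool {
    default    backend_v1;
    "2"        backend_v2;
}

server {
    listen 8000;

    location / {
        proxy_pass http://$backend_pool;
    }
}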
Nevertheless, Envoy supports dynamic configuration, though it’s not that you can change some part of the configuration via a REST call – it’s about discovery of configuration settings. That’s what the whole xDS protocol is all about, with its EDS, CDS, RDS, and what-not-DS.
Citing docs:
Envoy discovers its various dynamic resources via the filesystem or by querying one or more *management servers*.
Emphasis is mine – I wanted to note that you have to provide a server that will respond to Envoy’s discovery (xDS) requests.
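In config terms it looks something like this – a sketch that assumes an xds_cluster pointing at your management server is defined under static_resources:

# Sketch: discovering clusters from a management server
dynamic_resources:
  cds_config:
    api_config_source:
      api_type: GRPC
      grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster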
However, there is no ready-made solution that implements Envoy’s xDS protocol. There was Rotor, but the company behind it shut down, so the project is mostly dead.
There is Istio, but it’s a monster I don’t want to touch right now. Also, if you’re on Kubernetes, there is Heptio Contour, but not everybody needs or uses Kubernetes.
In the end, you could implement your own xDS service using go-control-plane stubs.
But that doesn’t seem to be common. What I saw most people do is use DNS for EDS and CDS. Especially remembering that Consul has a DNS interface, it seems that we can use Consul to dynamically provide the list of hosts to Envoy.
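For example, a STRICT_DNS cluster pointed at a Consul service name will re-resolve it periodically and track all returned addresses – backend.service.consul here follows the usual Consul DNS naming scheme:

# Sketch: cluster resolved via Consul DNS
clusters:
- name: backend
  type: STRICT_DNS          # re-resolve the name and use every returned IP
  dns_refresh_rate: 5s
  connect_timeout: 1s
  hosts:
  - socket_address:
      address: backend.service.consul
      port_value: 10000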
This isn’t big news, because I can (and do) use Consul to provide the list of backends for nginx by using a DNS name in proxy_pass and the resolver directive.
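A minimal sketch of that setup – the Consul agent serves DNS on port 8600 by default, and using a variable in proxy_pass forces nginx to re-resolve the name instead of caching it forever:

# Sketch: resolving backends via Consul DNS in nginx
resolver 127.0.0.1:8600 valid=5s;    # Consul agent DNS interface

server {
    listen 8000;

    location / {
        set $backend_host backend.service.consul;
        proxy_pass http://$backend_host:10000;    # variable triggers re-resolution
    }
}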
Also, Consul Connect supports Envoy for proxying requests, but this is not about Envoy – this is about how awesome Consul is!
So this whole dynamic configuration thing in Envoy is really confusing and hard to follow, because whenever you try to google it you get bombarded with posts about Istio, which is distracting.
This is a minor thing, but it just annoys me. Also, I don’t like that the Docker images don’t have version tags. Maybe it’s intended so that you always run the latest version, but it seems very strange.
In the end, I’m not saying Envoy is bad in any way – from my point of view, it just has a different focus: advanced proxying and an out-of-process service mesh data plane. The edge proxy part is just a bonus that is suitable in some, but not many, situations.
With that being said, let’s see Envoy in practice and repeat the mirroring experiments from my previous post.
Here are two minimal configs – one for nginx and the other for Envoy. Both do the same thing – simply proxy requests to some backend service.
# nginx proxy config
upstream backend {
    server backend.local:10000;
}

server {
    server_name proxy.local;
    listen 8000;

    location / {
        proxy_pass http://backend;
    }
}
# Envoy proxy config
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 8001
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - name: local_service
              domains: ['*']
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend
          http_filters:
          - name: envoy.router
  clusters:
  - name: backend
    type: STATIC
    connect_timeout: 1s
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 10000
They perform identically:
$ # Load test nginx
$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8000
Summary:
Total: 10.0006 secs
Slowest: 0.0229 secs
Fastest: 0.0002 secs
Average: 0.0004 secs
Requests/sec: 996.7418
Total data: 36881600 bytes
Size/request: 3700 bytes
Response time histogram:
0.000 [1] |
0.002 [9963] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.005 [3] |
0.007 [0] |
0.009 [0] |
0.012 [0] |
0.014 [0] |
0.016 [0] |
0.018 [0] |
0.021 [0] |
0.023 [1] |
...
Status code distribution:
[200] 9968 responses
$ # Load test Envoy
$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8001
Summary:
Total: 10.0006 secs
Slowest: 0.0307 secs
Fastest: 0.0003 secs
Average: 0.0007 secs
Requests/sec: 996.1445
Total data: 36859400 bytes
Size/request: 3700 bytes
Response time histogram:
0.000 [1] |
0.003 [9960] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.006 [0] |
0.009 [0] |
0.012 [0] |
0.015 [0] |
0.019 [0] |
0.022 [0] |
0.025 [0] |
0.028 [0] |
0.031 [1] |
...
Status code distribution:
[200] 9962 responses
Anyway, let’s check the crucial part – mirroring to a backend with a delay. A quick reminder – nginx, in that case, will throttle the original request, thus affecting your production users.
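For reference, the nginx mirroring setup from that post looked roughly like this:

# Sketch: nginx mirroring (roughly as in the previous post)
upstream backend {
    server backend.local:10000;
}

upstream mirror_backend {
    server backend.local:20000;
}

server {
    listen 8000;

    location / {
        mirror /mirror;
        proxy_pass http://backend;
    }

    location = /mirror {
        internal;
        proxy_pass http://mirror_backend$request_uri;
    }
}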
Here is the mirroring config for Envoy:
# Envoy mirroring config
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 8001
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - name: local_service
              domains: ['*']
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend
                  request_mirror_policy:
                    cluster: mirror
          http_filters:
          - name: envoy.router
  clusters:
  - name: backend
    type: STATIC
    connect_timeout: 1s
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 10000
  - name: mirror
    type: STATIC
    connect_timeout: 1s
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 20000
Basically, we’ve added a request_mirror_policy to the main route and defined the cluster for mirroring. Let’s load test it!
$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8001
Summary:
Total: 10.0012 secs
Slowest: 0.0046 secs
Fastest: 0.0003 secs
Average: 0.0008 secs
Requests/sec: 997.6801
Total data: 36918600 bytes
Size/request: 3700 bytes
Response time histogram:
0.000 [1] |
0.001 [2983] |■■■■■■■■■■■■■■■■■
0.001 [6916] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.002 [72] |
0.002 [2] |
0.002 [0] |
0.003 [0] |
0.003 [3] |
0.004 [0] |
0.004 [0] |
0.005 [1] |
...
Status code distribution:
[200] 9978 responses
Zero errors and amazing latency! This is a victory and it proves that Envoy’s mirroring is truly “fire and forget”!
Envoy’s networking is of exceptional quality – its mirroring is well thought out, its load balancing is very advanced, and I like the active health checking feature.
I’m not convinced to use it in the edge proxy scenario, because there you might need web server features like caching, static content serving, and advanced configuration.
As for the service mesh – I’ll surely evaluate Envoy for that when the opportunity arises, so stay tuned – subscribe to the Atom feed and check my Twitter @AlexDzyoba.
That’s it for now, till the next time!