When I was doing traffic mirroring with nginx, I stumbled upon a surprising problem – nginx was delaying the original request if the mirror backend was slow. This is really bad because you expect mirroring to be “fire and forget”. I worked around it by mirroring only part of the traffic, but this drove me to find another proxy that could mirror traffic without such problems. That’s when I finally found the time and energy to look into Envoy – I’d heard a lot of great things about it and had always wanted to get my hands dirty with it.
Just in case you’ve never heard of it – Envoy is a proxy server that is most commonly used in service mesh scenarios, but it can also be an edge proxy.
In this post, I will look only at the edge proxy scenario because I’ve never maintained a service mesh. Keep that use case in mind. Also, I will inevitably compare Envoy to nginx because that’s what I know and use.
The main reason I wanted to try Envoy was several compelling features:

- observability
- advanced load balancing
- active health checking
- support for protocols like Redis, MongoDB, and gRPC
- Lua scripting

Let’s unpack that list!
Observability is one of the most thorough features in Envoy. One of its design principles is to provide transparency into network communication, given how complex modern systems have become with all this microservices madness.
Out of the box, it provides lots of metrics for various metrics systems, including Prometheus.
To get that kind of insight in nginx, you have to buy nginx plus or use the VTS module, which means compiling nginx on your own. Hopefully, my project nginx-vts-build will help – I’m building nginx with the VTS module as a drop-in replacement for stock nginx, with a systemd service and basic configs. Think of it as an nginx distro. Currently, it has only one release, for Debian 9, but I’m open to suggestions. If you have a feature request, please let me know. But let’s get back to Envoy.
In addition to metrics, Envoy can be integrated with distributed tracing systems like Jaeger.
And finally, it can capture traffic for further analysis with Wireshark.
I’ve only looked at Prometheus metrics and they are quite nice!
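Enabling those metrics is just a matter of turning on the admin interface. Here is a minimal sketch (the port is my arbitrary choice):

# Sketch: Envoy admin interface exposing metrics
admin:
  access_log_path: /dev/null    # admin access log is mandatory, so send it to /dev/null
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901

With that in place, Prometheus can scrape http://127.0.0.1:9901/stats/prometheus.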
Load balancing in Envoy is very feature-rich. Not only does it support round-robin, weighted, and random policies, but also load balancing with consistent hashing algorithms like ketama and maglev. The point of the latter is fewer changes in traffic patterns when the upstream cluster is rebalanced.
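To illustrate, here is a sketch of what I believe that looks like, with a hypothetical x-user-id header as the hashing key:

# Sketch: consistent hashing (x-user-id is a hypothetical hash key)
# In the route, choose what to hash on:
route:
  cluster: backend
  hash_policy:
  - header:
      header_name: x-user-id
# In the cluster, choose the hashing algorithm:
clusters:
- name: backend
  type: STATIC
  connect_timeout: 1s
  lb_policy: MAGLEV    # RING_HASH gives you a ketama-style ring instead
  hosts:
  - socket_address:
      address: 127.0.0.1
      port_value: 10000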
Again, you can get the same advanced features in nginx but only if you pay for nginx plus.
To check the health of upstream endpoints, Envoy will actively send requests and expect valid answers – only then will an endpoint remain in the upstream cluster. This is a very nice feature that open source nginx lacks (but nginx plus has).
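Here is roughly what that looks like on a cluster – the /healthz path is a hypothetical health endpoint of the backend:

# Sketch: active HTTP health checking (/healthz is hypothetical)
clusters:
- name: backend
  type: STATIC
  connect_timeout: 1s
  hosts:
  - socket_address:
      address: 127.0.0.1
      port_value: 10000
  health_checks:
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 3    # eject the endpoint after 3 failed checks
    healthy_threshold: 2      # readmit it after 2 successful checks
    http_health_check:
      path: /healthz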
You can configure Envoy as a Redis proxy, a gRPC proxy, or a filter for DynamoDB, MongoDB, MySQL, or Thrift.
This is not a killer feature, imho, given that support for most of these protocols is experimental, but it’s nice to have and shows that Envoy is extensible.
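For illustration, here is a sketch of a Redis proxy listener, assuming a redis_backend cluster is defined alongside the others:

# Sketch: Redis proxy listener (redis_backend cluster assumed to be defined)
listeners:
- name: redis_listener
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 6379
  filter_chains:
  - filters:
    - name: envoy.redis_proxy
      config:
        stat_prefix: redis
        cluster: redis_backend
        settings:
          op_timeout: 5s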
It also supports Lua scripting out of the box. For nginx you have to use OpenResty.
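The Lua filter plugs into the http_filters chain before the router. A minimal sketch that tags every response with a header:

# Sketch: inline Lua filter
http_filters:
- name: envoy.lua
  config:
    inline_code: |
      function envoy_on_response(response_handle)
        response_handle:headers():add("x-served-by", "envoy")
      end
- name: envoy.router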
The features above alone are a very good reason to use Envoy. However, I found a few things that keep me from switching from nginx to Envoy:
Envoy doesn’t support caching of responses. This is a must-have feature for an edge proxy, and nginx implements it really well.
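For comparison, a basic nginx cache is just a few directives (the paths, zone name, and sizes here are arbitrary):

# Sketch: basic response caching in nginx
proxy_cache_path /var/cache/nginx keys_zone=edge_cache:10m max_size=1g;

server {
    listen 8000;

    location / {
        proxy_cache edge_cache;
        proxy_cache_valid 200 10m;    # cache successful responses for 10 minutes
        proxy_pass http://backend;
    }
}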
While Envoy does networking really well, it doesn’t touch the filesystem apart from loading the initial config file and handling runtime configuration. If you were thinking of serving static files like frontend assets (JS, HTML, CSS), you’re out of luck – Envoy doesn’t support that. nginx, again, does it very well.
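In nginx, it’s a couple of lines (the /var/www path is arbitrary):

# Sketch: serving static frontend files in nginx
location /static/ {
    root /var/www;    # serves files from /var/www/static/
    expires 1h;
}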
Envoy is configured via YAML, and to me its configuration feels very explicit, though I think that’s actually a good thing – explicit is better than implicit. But I feel that Envoy configuration is bounded by the features specifically implemented in Envoy. Maybe it’s a lack of experience with Envoy and old habits, but in nginx, with maps, the rewrite module (with the if directive) and other nice modules, I have a very flexible config system that allows me to implement anything. The cost of this flexibility is, of course, a good portion of complexity – nginx configuration requires some learning and practice, but in my opinion it’s worth it.
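To illustrate the kind of flexibility I mean, here is a hypothetical map that routes requests to different upstreams based on a header (assuming backend_v1 and backend_v2 upstreams are defined):

# Sketch: header-based routing with map (names are hypothetical)
map $http_x_api_version $backend_pool {
    default    backend_v1;
    "2"        backend_v2;
}

server {
    listen 8000;

    location / {
        proxy_pass http://$backend_pool;
    }
}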
Nevertheless, Envoy supports dynamic configuration, though it’s not that you can change some part of the configuration via a REST call – it’s about discovery of configuration settings. That’s what the whole xDS protocol is all about, with its EDS, CDS, RDS, and what-not-DS.
Citing docs:
Envoy discovers its various dynamic resources via the filesystem or by querying one or more *management servers*.
Emphasis is mine – I wanted to note that you have to provide a server that will respond to Envoy’s discovery (xDS) requests.
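In config terms it looks something like this – a sketch that assumes an xds_cluster pointing at your management server is defined under static_resources:

# Sketch: discovering clusters from a management server
dynamic_resources:
  cds_config:
    api_config_source:
      api_type: GRPC
      grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster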
However, there is no ready-made solution that implements Envoy’s xDS protocol. There was Rotor, but the company behind it shut down, so the project is mostly dead.
There is Istio, but it’s a monster I don’t want to touch right now. Also, if you’re on Kubernetes, there is Heptio Contour, but not everybody needs or uses Kubernetes.
In the end, you could implement your own xDS service using go-control-plane stubs.
But that doesn’t seem to be common. What I saw most people do is use DNS for EDS and CDS. Especially remembering that Consul has a DNS interface, it seems that we can use Consul to dynamically provide the list of hosts to Envoy.
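For example, a STRICT_DNS cluster pointed at a Consul service name will re-resolve it periodically and track all returned addresses – backend.service.consul here follows the usual Consul DNS naming scheme:

# Sketch: cluster resolved via Consul DNS
clusters:
- name: backend
  type: STRICT_DNS          # re-resolve the name and use every returned IP
  dns_refresh_rate: 5s
  connect_timeout: 1s
  hosts:
  - socket_address:
      address: backend.service.consul
      port_value: 10000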
This isn’t big news, because I can (and do) use Consul to provide the list of backends for nginx by using a DNS name in proxy_pass and the resolver directive.
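A minimal sketch of that setup – the Consul agent serves DNS on port 8600 by default, and using a variable in proxy_pass forces nginx to re-resolve the name instead of caching it forever:

# Sketch: resolving backends via Consul DNS in nginx
resolver 127.0.0.1:8600 valid=5s;    # Consul agent DNS interface

server {
    listen 8000;

    location / {
        set $backend_host backend.service.consul;
        proxy_pass http://$backend_host:10000;    # variable triggers re-resolution
    }
}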
Also, Consul Connect supports Envoy for proxying requests, but this is not about Envoy – this is about how awesome Consul is!
So this whole dynamic configuration thing in Envoy is really confusing and hard to follow, because whenever you try to google it you get bombarded with posts about Istio, which is distracting.
This is a minor thing, but it just annoys me. Also, I don’t like that the Docker images don’t have version tags. Maybe it’s intended so that you always run the latest version, but it seems very strange.
In the end, I’m not saying Envoy is bad in any way – from my point of view, it just has a different focus: advanced proxying and an out-of-process service mesh data plane. The edge proxy part is just a bonus that is suitable in some, but not many, situations.
With that being said, let’s see Envoy in practice and repeat the mirroring experiments from my previous post.
Here are two minimal configs – one for nginx and the other for Envoy. Both do the same thing – simply proxy requests to some backend service.
# nginx proxy config
upstream backend {
    server backend.local:10000;
}

server {
    server_name proxy.local;
    listen 8000;

    location / {
        proxy_pass http://backend;
    }
}
# Envoy proxy config
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 8001
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - name: local_service
              domains: ['*']
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend
          http_filters:
          - name: envoy.router
  clusters:
  - name: backend
    type: STATIC
    connect_timeout: 1s
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 10000
They perform identically:
$ # Load test nginx
$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8000
Summary:
Total: 10.0006 secs
Slowest: 0.0229 secs
Fastest: 0.0002 secs
Average: 0.0004 secs
Requests/sec: 996.7418
Total data: 36881600 bytes
Size/request: 3700 bytes
Response time histogram:
0.000 [1] |
0.002 [9963] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.005 [3] |
0.007 [0] |
0.009 [0] |
0.012 [0] |
0.014 [0] |
0.016 [0] |
0.018 [0] |
0.021 [0] |
0.023 [1] |
...
Status code distribution:
[200] 9968 responses
$ # Load test Envoy
$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8001
Summary:
Total: 10.0006 secs
Slowest: 0.0307 secs
Fastest: 0.0003 secs
Average: 0.0007 secs
Requests/sec: 996.1445
Total data: 36859400 bytes
Size/request: 3700 bytes
Response time histogram:
0.000 [1] |
0.003 [9960] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.006 [0] |
0.009 [0] |
0.012 [0] |
0.015 [0] |
0.019 [0] |
0.022 [0] |
0.025 [0] |
0.028 [0] |
0.031 [1] |
...
Status code distribution:
[200] 9962 responses
Anyway, let’s check the crucial part – mirroring to a backend with a delay. A quick reminder – nginx, in that case, will throttle the original request, thus affecting your production users.
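For reference, the nginx mirroring setup from that post looked roughly like this:

# Sketch: nginx mirroring (roughly as in the previous post)
upstream backend {
    server backend.local:10000;
}

upstream mirror_backend {
    server backend.local:20000;
}

server {
    listen 8000;

    location / {
        mirror /mirror;
        proxy_pass http://backend;
    }

    location = /mirror {
        internal;
        proxy_pass http://mirror_backend$request_uri;
    }
}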
Here is the mirroring config for Envoy:
# Envoy mirroring config
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 8001
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - name: local_service
              domains: ['*']
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend
                  request_mirror_policy:
                    cluster: mirror
          http_filters:
          - name: envoy.router
  clusters:
  - name: backend
    type: STATIC
    connect_timeout: 1s
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 10000
  - name: mirror
    type: STATIC
    connect_timeout: 1s
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 20000
Basically, we’ve added a request_mirror_policy to the main route and defined the cluster for mirroring. Let’s load test it!
$ hey -z 10s -q 1000 -c 1 -t 1 http://proxy.local:8001
Summary:
Total: 10.0012 secs
Slowest: 0.0046 secs
Fastest: 0.0003 secs
Average: 0.0008 secs
Requests/sec: 997.6801
Total data: 36918600 bytes
Size/request: 3700 bytes
Response time histogram:
0.000 [1] |
0.001 [2983] |■■■■■■■■■■■■■■■■■
0.001 [6916] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.002 [72] |
0.002 [2] |
0.002 [0] |
0.003 [0] |
0.003 [3] |
0.004 [0] |
0.004 [0] |
0.005 [1] |
...
Status code distribution:
[200] 9978 responses
Zero errors and amazing latency! This is a victory and it proves that Envoy’s mirroring is truly “fire and forget”!
Envoy’s networking is of exceptional quality – its mirroring is well thought out, its load balancing is very advanced, and I like the active health checking feature.
I’m not convinced to use it in the edge proxy scenario, because there you might need web server features like caching, static content serving, and advanced configuration.
As for the service mesh – I’ll surely evaluate Envoy for that when the opportunity arises, so stay tuned – subscribe to the Atom feed and check my Twitter @AlexDzyoba.
That’s it for now, till the next time!