A lot of people use nginx as a web server and then reach for something like haproxy or traefik for service routing. But you can use nginx for that too! In my experience nginx provides rich and flexible ways to route your requests. Here are a few things that worked well for me when I was wearing a developer hat.
First, let’s look at a simple config that just forwards requests from http://proxy.local/ to a single backend at http://backend.local:10000.
user nginx;
worker_processes auto;
events {}
http {
access_log /var/log/nginx/access.log combined;
# include /etc/nginx/conf.d/*.conf;
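# One upstream group with a single backend instance.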
upstream backend {
server backend.local:10000;
}
server {
server_name proxy.local;
listen 8000;
location / {
proxy_pass http://backend;
}
}
}
You declare your backend service as an upstream group. Each instance of the backend is described with a server directive. Then you declare the entrypoint with a server block and a location. Given that it’s nginx, you can go crazy with regexp location matching and such, but that’s not what service routing requires. Finally, you forward requests with the proxy_pass directive.
From this simple config, we can start to build the necessary complexity.
If your service needs an active/passive configuration where one server is the main one handling requests and the other is a backup, you can configure it like this:
...
upstream backend {
server main-backend.local:10000;
server backup-backend.local:10000 backup;
}
...
The backup option tells nginx that this server in the upstream group will be used only if the primary server is unavailable. By default, a server is marked as unavailable after 1 connection error or timeout. This can be tuned with the max_fails option for each server in an upstream group like this:
...
upstream backend {
# Try 3 times for the main server
server main-backend.local:10000 max_fails=3;
# Try 10 times for backup server
server backup-backend.local:10000 backup max_fails=10;
}
...
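By the way, max_fails works together with the fail_timeout option (10 seconds by default): it is both the window in which failures are counted and the time the server stays marked as unavailable. A small sketch, with an arbitrary 30s value:
...
upstream backend {
# Three failures within 30 seconds take the main server out of rotation
# for the next 30 seconds (both intervals come from fail_timeout).
server main-backend.local:10000 max_fails=3 fail_timeout=30s;
server backup-backend.local:10000 backup;
}
...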
In addition to connection errors and timeouts, you can also count various HTTP error codes, like 500 or 429, as unsuccessful attempts. This is configured with the proxy_next_upstream directive.
...
upstream backend {
server main-backend.local:10000;
server backup-backend.local:10000 backup;
}
server {
server_name proxy.local;
listen 8000;
# Switch to the next upstream in case of connection error, timeout
# or HTTP 429 error (rate limit).
proxy_next_upstream error timeout http_429;
location / {
proxy_pass http://backend;
}
}
...
The max_fails option is crucial if your nginx is running inside Kubernetes and you want to proxy requests to a Kubernetes service (using cluster DNS). In this case, you should have a single server with max_fails=0 like this:
...
upstream backend {
server app.my-team.svc max_fails=0;
}
...
This way nginx will not mark the Kubernetes service as unavailable and won’t try to do passive health checks. None of this is needed because the Kubernetes service already does active health checks itself via readiness probes.
map
Sometimes you need to route requests based on some header value. Or query parameter. Or cookie value. Or hostname. Or any combination of those.
And this is where nginx really shines. It’s the only proxy server (in my experience) that allows routing requests with almost arbitrary logic.
The key part that makes this possible is ngx_http_map_module. This module allows you to define a variable from a combination of other variables using regular expressions. Sounds complicated, but wait for it.
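For a taste of what that looks like, here is a tiny standalone example that has nothing to do with routing yet (the $is_mobile variable is made up for illustration): it derives a new variable from the User-Agent header.
...
map $http_user_agent $is_mobile {
# Case-insensitive regex match against the User-Agent header.
"~*(android|iphone)" 1;
default 0;
}
...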
Say we have 3 backend services serving different kinds of data: one for live data, one for historical data, and one for aggregated counters. Call it microservices architecture, whatever.
These services are exposed to users via the same endpoint https://<date>.api.com/?report=<report>. Here are a few examples to give you an idea of how it works: a request to https://api.com/?report=<report> without a date subdomain returns live data, https://2021-05-01.api.com/?report=<report> returns historical data for 2021-05-01, and any request with report=counters returns aggregated counters. This may seem like an ugly API, but this is how the real world often looks and you have to deal with it.
So let’s write a routing configuration. First, define 3 upstream groups:
upstream live {
server live-backend-1:8000;
server live-backend-2:8000;
server live-backend-3:8000;
}
upstream hist {
server hist-backend-1:9999;
server hist-backend-2:9999;
}
upstream agg {
server agg-backend-1:7100;
server agg-backend-2:7100;
server agg-backend-3:7100;
}
Next, define the server that will listen for all requests and somehow route them:
server {
server_name *.api.com "";
listen 80;
location / {
# FIXME: proxy pass to who?
proxy_pass http://???;
}
}
The question is: what should we write in the proxy_pass directive? Since nginx configuration is declarative, we can write proxy_pass http://$destination/ and build the $destination variable with maps. In our example service, we make the routing decision based on the report query parameter and the date subdomain. This is what we need to extract into our variables:
map $host $date {
"~^((?<subdomain>\d{4}-\d{2}-\d{2}).)?api.com$" $subdomain;
default "";
}
The map parses the $host variable (one of the many predefined nginx variables) and puts the result into our $date variable. Inside the map there are parsing rules. In my case there are 2 rules: the main one with a regex, and a fallback denoted with the default keyword.
You can inspect the regex in regex101. The first symbol ~ marks the rule as a regular expression. Our regex starts with ^ and ends with $, which denote the start and end of the line. It’s a kind of best practice to make regexes explicitly match the whole string, and I use it as much as possible. To extract the subdomain we create a group with parentheses. Inside that group I use \d{4}-\d{2}-\d{2} to parse a date in the 2021-05-01 format. There is also the ?<subdomain> thing inside the group. This is a named capture group, and it simply gives a name to the matched part of the regex. The capture group is then used on the right side of the map rule to assign its value to the $date variable. Note that the subdomain is optional, so we wrap it in parentheses together with the dot (the subdomain delimiter) and add ? to the whole group.
Phew! The regex part is done so we may relax.
To extract the report we don’t need a map because nginx provides predefined $arg_<param> variables for query parameters. So the report query parameter can be accessed as $arg_report.
The full list of nginx variables can be googled with “nginx varindex” and is located here.
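If you want to sanity-check what got extracted, a quick way (just a debugging sketch, the /_debug/routing path is made up) is to echo the values back from a test location inside the server block:
...
location = /_debug/routing {
default_type text/plain;
# Show what the map and $arg_report extracted from the request.
return 200 "date=$date report=$arg_report\n";
}
...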
Ok, so now we have the date and the report. How can we construct the $destination variable from them? With another map! The trick here is that you can use a combination of variables to create a new variable in the map:
map "$arg_report:$date" $destination {
"~counters:.*" agg;
"~.*:.+" hist;
default live;
}
The combination here is a string where the 2 variables are joined with a colon. The colon is a personal choice, used for convenience. You can use any symbol, just make sure the resulting regexes stay unambiguous.
In the map, we have 3 rules:
1. Set $destination to agg when the report query parameter is counters.
2. Set $destination to hist when the $date variable is not empty.
3. Otherwise, fall back to the default and set $destination to live.
Regexes in the map are evaluated in order. Note that the $destination value is the name of an upstream group.
Here is the full config:
events {}
http {
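# Three backend groups: live data, historical data and aggregated counters.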
upstream live {
server live-backend-1:8000;
server live-backend-2:8000;
server live-backend-3:8000;
}
upstream hist {
server hist-backend-1:9999;
server hist-backend-2:9999;
}
upstream agg {
server agg-backend-1:7100;
server agg-backend-2:7100;
server agg-backend-3:7100;
}
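# Extract the date from the subdomain, e.g. 2021-05-01.api.com -> 2021-05-01.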
map $host $date {
"~^((?<subdomain>\d{4}-\d{2}-\d{2}).)?api.local$" $subdomain;
default "";
}
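# Pick the upstream group based on the report parameter and the date.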
map "$arg_report:$date" $destination {
"~counters:.*" agg;
"~.*:.+" hist;
default live;
}
server {
server_name *.api.com "";
listen 80;
location / {
proxy_pass http://$destination/;
}
}
}
If you use Consul for service discovery then your services can be accessed via the DNS provided by Consul. It’s as simple as curl myapp.service.consul. Very convenient, but nobody knows how to resolve names in the .consul zone out of the box. The Consul docs give a few ways to configure it universally in your infrastructure. I’ve used dnsmasq with great success.
Anyway, to route requests via Consul DNS you don’t have to go out of your way: nginx has a resolver directive for using custom DNS servers. Here is how to forward requests via Consul DNS from nginx:
...
server {
server_name *.api.com "";
listen 80;
# Resolve using Consul DNS. Fallback to Google and Cloudflare DNS.
resolver 10.0.0.1:8600 10.0.0.2:8600 10.0.0.3:8600 8.8.8.8 1.1.1.1;
location /v1/api {
proxy_pass http://prod.api.service.consul/;
}
location /v1/rpc {
proxy_pass http://prod.rpc.service.consul/;
}
}
...
Update: Nice people at lobste.rs pointed out that proxy_pass caches the DNS response until restart. There are a few ways to fix this. First, put the Consul service URL into an upstream and use the valid option of the resolver directive to tune the DNS response TTL. The other option is to use a variable in proxy_pass, as described by Jeppe Fihl-Pearson here. Apparently, when nginx sees a variable in proxy_pass it will honor the TTL of the DNS response.
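To illustrate the second approach, here is a minimal sketch (the valid=30s value and the $api_upstream variable name are my own choices; everything else comes from the config above). Note that without a URI part in proxy_pass the original request URI is passed to the backend unchanged, unlike the trailing-slash version above:
...
server {
server_name *.api.com "";
listen 80;
# Re-resolve Consul names instead of caching them forever;
# valid=30s caps how long an answer is reused.
resolver 10.0.0.1:8600 10.0.0.2:8600 10.0.0.3:8600 valid=30s;
location /v1/api {
# A variable makes nginx resolve the name at request time.
set $api_upstream prod.api.service.consul;
proxy_pass http://$api_upstream;
}
}
...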
Yes, it’s not dynamic in the way traefik is: if a new service needs to be added you have to edit the nginx config somehow, while traefik picks it up automatically. But you can implement decent service discovery using consul-template, which will regenerate the nginx config from Consul data.
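A rough sketch of what such a template could look like (the api service name is hypothetical): consul-template renders it into a regular nginx config and can reload nginx whenever the set of instances changes.
upstream api {
{{ range service "api" }}
server {{ .Address }}:{{ .Port }};
{{ end }}
}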
Nginx is a very versatile tool. It has a rich configuration language that enables nice features for developers.
Yes, it’s not perfect: upstream health checks are passive (in the open source version), configuration defaults are not modern, and the initial setup is rough.
But given all the richness, investing a little bit of time into it is worth it. Before ditching it in favor of something else, think hard about all the features that nginx provides.