Bastion Host with nsscache and Consul
When I started my new job last year, my first task was to redesign the infrastructure and move from EC2-Classic to EC2-VPC. I spent the first few weeks setting up a new VPC with a subnet for each concern, a bastion server to access the servers located within the network, and a Consul cluster to keep track of the running instances.
In the past, I used several ways to manage the accounts and keys to access my servers:
- a central LDAP server, but no jump host;
- a jump host which used SSH agent forwarding to access the other servers, but with only a handful of accounts/keys managed manually;
- a jump host, with all the accounts created on each server by Ansible, using the SSH public keys exposed by GitHub to authenticate the users.
I didn’t want to have to create the accounts on all the servers, and I wanted to avoid LDAP, because it would introduce a single point of failure and is a bit of a pain to manage.
After some research, it seemed there was no clear winner for implementing a bastion server. At the time, the options were:
- Netflix BLESS, which seemed overkill;
- HashiCorp Vault SSH Secret Backend, which only helps to set up an SSH CA, but does not manage the accounts;
- the good old LDAP server.
I chose to stick with a central LDAP server: a bastion server configured to only allow jumps, with all the servers connecting to the LDAP server to fetch the user info. The SSH keys would be stored on S3 and fetched when required using `AuthorizedKeysCommand`.
Note: Since then, Gravitational Teleport has become quite popular and might be a good alternative.
The issue with that first approach was the latency when contacting the LDAP server to retrieve the info, and the fact that if the LDAP server was unreachable, it was no longer possible to SSH in. That’s when I found Google’s nsscache, which synchronizes a local NSS cache (containing Unix users and groups) from a remote directory service. I set up a cron job on all the servers to create a local cache of what was stored in LDAP.
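As an illustration, the cron job can be a simple system crontab entry (the 15-minute interval here is an assumption; pick whatever freshness your setup needs):

```
# /etc/cron.d/nsscache -- refresh the local NSS cache from the directory service
*/15 * * * * root /usr/local/bin/nsscache update
```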
While I initially set up nsscache to use LDAP, it actually supports several other sources, one of them being Consul. Because I already had Consul set up, that seemed the way to go. I had to submit a PR to add support for caching `/etc/shadow`, in addition to the already existing caches for `/etc/passwd` and `/etc/group`. After that, I was able to drop LDAP and use nsscache with Consul for SSH authentication.
I will now guide you through the different steps required to set up nsscache with Consul.
Generating the cache
Each server must have nsscache installed:
```
apt-get install libnss-cache
```
The `/etc/nsscache.conf` file must be configured to read from Consul.
```
[DEFAULT]
```
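The configuration above is truncated; a fuller sketch, based on nsscache's standard options, might look like the following (the Consul-specific option names and the `bastion` folder are assumptions, so check the nsscache documentation for your version):

```ini
[DEFAULT]
# Fetch account data from Consul and write libnss-cache files.
source = consul
cache = files
maps = passwd, group, shadow
timestamp_dir = /var/lib/nsscache

# Consul connection settings (option names are assumptions here).
consul_server = localhost:8500
consul_folder = bastion
```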
You must then add the groups and accounts to Consul. Here is the tree of the data for a user `foo` in a group `bar`.
```
/bastion/group/bar/gid = 5000
```
The group cache accepts the following attributes, which map to the fields present in `/etc/group`:

- `passwd` (Default: `x`)
- `gid` (Required)
- `members` (Optional)
The passwd cache accepts the following attributes, which map to the fields present in `/etc/passwd`:

- `passwd` (Default: `x`)
- `uid` (Required)
- `gid` (Required)
- `comment` (Default: `''`)
- `home` (Default: `/home/{name}`)
- `shell` (Default: `/bin/bash`)
The shadow cache accepts the following attributes, which map to the fields present in `/etc/shadow`:

- `passwd` (Default: `x`)
- `lstchg` (Optional)
- `min` (Optional)
- `max` (Optional)
- `warn` (Optional)
- `inact` (Optional)
- `expire` (Optional)
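Putting the attribute lists above together, a fuller key tree for `foo` and `bar` could look like this (the uid, gid, and optional keys are illustrative values, following the `/bastion/...` layout shown earlier):

```
/bastion/group/bar/gid = 5000
/bastion/group/bar/members = foo
/bastion/passwd/foo/uid = 1000
/bastion/passwd/foo/gid = 5000
/bastion/passwd/foo/comment = Foo
/bastion/passwd/foo/home = /home/foo
/bastion/passwd/foo/shell = /bin/bash
/bastion/shadow/foo/passwd = x
```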
With nsscache configured and your data in Consul, you should be able to generate the cache files with the following command:
```
/usr/local/bin/nsscache update
```
Using the cache
To make your system use nsscache, you need to configure `/etc/nsswitch.conf` as follows:

```
passwd: compat cache
group: compat cache
shadow: compat cache
```
You should also configure PAM to create the home directory of the user upon login, if it doesn’t exist. It can be done by adding the following line to `/etc/pam.d/common-session`:
```
session required pam_mkhomedir.so skel=/etc/skel umask=0022
```
SSH key management
The last missing step is the management of the SSH keys. The following lines need to be added to `/etc/ssh/sshd_config` to allow OpenSSH to fetch the keys when a user tries to log in (sshd requires `AuthorizedKeysCommandUser` to be set for the command to run; `nobody` is a common choice):

```
AuthorizedKeysCommand /usr/local/bin/syncKeys.sh
AuthorizedKeysCommandUser nobody
```
And a minimal sketch of `/usr/local/bin/syncKeys.sh` itself, assuming the AWS CLI is installed on the servers:

```sh
#!/bin/sh
# Sketch: print the user's public key(s) to stdout for sshd.
# sshd passes the username as the first argument when
# AuthorizedKeysCommand is configured without arguments.
exec aws s3 cp "s3://YOUR_S3_BUCKET/keys/$1" -
```
After uploading your SSH key to `s3://YOUR_S3_BUCKET/keys/foo`, you should be able to SSH into your server.
Last words
Throughout this guide, you learned how to use Consul to store the accounts used to SSH into your servers. One of the advantages of Consul over LDAP is that a cluster with replicated data is easy to set up, so you no longer have a single point of failure.
I’ve written a quick demo that you can download and run on your computer.
There are still some improvements possible:
- enabling Consul ACLs to prevent other services from reading or editing your account data;
- using Vault instead of Consul to store user accounts and passwords.
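For the first point, a sketch of a Consul ACL rule (in HCL; the `bastion/` prefix matches the key tree used above, but the exact policy setup is an assumption) could be:

```hcl
# Allow read-only access to the account data; attach this policy to the
# token used by the nsscache cron job on each server.
key_prefix "bastion/" {
  policy = "read"
}
```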