Bastion Host with nsscache and Consul

When I started my new job last year, my first task was to redesign the infrastructure and move from EC2 Classic to EC2 VPC. I spent the first few weeks setting up a new VPC with different subnets for each concern, a bastion server to access the servers located within the network, and a Consul cluster to keep track of the running instances.

In the past, I used several ways to manage the accounts and keys to access my servers:

  • a central LDAP server, but no jump host;
  • a jump host which was using SSH Agent forwarding to access the other servers, but only a handful of accounts/keys managed manually;
  • a jump host, and all the accounts created on each server by Ansible, but using the SSH public keys exposed by GitHub to authenticate the user.

I didn’t want to have to create the accounts on all the servers, and I wanted to avoid LDAP, because it would introduce a single point of failure and its management is a bit of a pain.

After some research, it seemed there was no clear winner for implementing a bastion server at the time.

I chose to stick with a central LDAP server and a bastion server configured to only allow jumps; all the servers would connect to the LDAP server to fetch the user info. The SSH keys would be stored on S3 and fetched when required using AuthorizedKeysCommand.

Note: Since then, Gravitational Teleport has become quite popular and might be a good alternative.

The issue with that first approach was the latency of contacting the LDAP server to retrieve the user info, and the fact that if the LDAP server was unreachable, it was no longer possible to SSH in. That’s when I found Google nsscache, which synchronizes a local NSS cache (containing Unix users and groups) from a remote directory service. I set up a cron job on all the servers to create a local cache of what was stored in LDAP.

While I initially set up nsscache to use LDAP, it actually supports several other sources, one of them being Consul. Because I already had Consul set up, that seemed like the way to go. I had to submit a PR to add support for caching /etc/shadow, in addition to the already existing caches for /etc/passwd and /etc/group. After that, I was able to drop LDAP and use nsscache with Consul for SSH authentication.

I will now guide you through the different steps required to set up nsscache with Consul.

Generating the cache

Each server must have nsscache installed:

apt-get install libnss-cache
pip install git+https://github.com/google/nsscache.git@master#egg=nsscache pycurl

The /etc/nsscache.conf file must be configured to read from Consul.

[DEFAULT]
source = consul
cache = files
# NSS maps to be cached
maps = passwd, group, shadow
# Directory to store our update/modify timestamps
timestamp_dir = /var/lib/nsscache
##
# consul module
consul_datacenter = dc1
consul_passwd_url = http://127.0.0.1:8500/v1/kv/bastion/passwd
consul_group_url = http://127.0.0.1:8500/v1/kv/bastion/group
consul_shadow_url = http://127.0.0.1:8500/v1/kv/bastion/shadow
##
# nssdb module defaults
# Directory to store nssdb databases. Current libnss_db code requires
# the path below
nssdb_dir = /var/lib/misc
##
# files module defaults
# Directory to store the plain text files
files_dir = /etc
# Suffix used on the files module database files
files_cache_filename_suffix = cache

You must then add the groups and accounts to Consul. Here is the tree of the data for a user foo in a group bar.

/bastion/group/bar/gid = 5000
/bastion/passwd/foo/uid = 10000
/bastion/passwd/foo/gid = 5000
/bastion/passwd/foo/comment = Foo Bar
/bastion/shadow/foo/lstchg = 17313
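
These entries can be created with the Consul CLI, for example (a minimal sketch using consul kv put; the HTTP KV API works just as well):

consul kv put bastion/group/bar/gid 5000
consul kv put bastion/passwd/foo/uid 10000
consul kv put bastion/passwd/foo/gid 5000
consul kv put bastion/passwd/foo/comment "Foo Bar"
consul kv put bastion/shadow/foo/lstchg 17313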

The group cache accepts the following attributes, which map to the fields present in /etc/group:

  • passwd (Default: x)
  • gid (Required)
  • members (Optional)

The passwd cache accepts the following attributes, which map to the fields present in /etc/passwd:

  • passwd (Default: x)
  • uid (Required)
  • gid (Required)
  • comment (Default: '')
  • home (Default: /home/{name})
  • shell (Default: /bin/bash)

The shadow cache accepts the following attributes, which map to the fields present in /etc/shadow:

  • passwd (Default: x)
  • lstchg (Optional)
  • min (Optional)
  • max (Optional)
  • warn (Optional)
  • inact (Optional)
  • expire (Optional)
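
To give an idea of how the defaults fill in the missing fields, the foo entry above (only uid, gid and comment are set) should expand to roughly the following /etc/passwd-style line once cached:

foo:x:10000:5000:Foo Bar:/home/foo:/bin/bash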

With nsscache configured and your data in Consul, you should be able to generate the cache files with the following command:

/usr/local/bin/nsscache update
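
To keep the cache in sync, that command can simply be run from cron; for example (the 15-minute interval and the /etc/cron.d path are arbitrary choices):

# /etc/cron.d/nsscache -- refresh the NSS maps from Consul
*/15 * * * * root /usr/local/bin/nsscache update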

Using the cache

To make your system use nsscache, you need to configure /etc/nsswitch.conf as follows:

passwd:         compat cache
group:          compat cache
shadow:         compat cache
...
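
Once nsswitch.conf has been updated, you can check that the cached accounts resolve through NSS, for instance with getent and the foo/bar entries created earlier:

getent passwd foo
getent group bar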

You should also configure PAM to create the home directory of the user upon login, if it doesn’t exist. It can be done by adding the following line to /etc/pam.d/common-session:

session required    pam_mkhomedir.so skel=/etc/skel umask=0022

SSH keys management

The last missing step is the management of the SSH keys. The following lines need to be added to /etc/ssh/sshd_config so that OpenSSH can fetch the keys when a user tries to log in:

AuthorizedKeysCommand /usr/local/bin/syncKeys.sh
AuthorizedKeysCommandUser nobody
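
Note that OpenSSH requires the AuthorizedKeysCommand program to be referenced by an absolute path, owned by root and not writable by group or others, and sshd must be reloaded to pick up the change. Something along these lines should do (the service name may be sshd depending on your distribution):

install -o root -g root -m 0755 syncKeys.sh /usr/local/bin/syncKeys.sh
systemctl reload ssh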

And the /usr/local/bin/syncKeys.sh itself:

#!/usr/bin/env bash
# main accounts should only use keys stored locally
if [ "${1}" = "root" ]; then
  exit 0
fi
entry=$(getent passwd "${1}")
res=$?
# if account not found
if [ "${res}" -ne 0 ]; then
  logger "account not found"
  exit 0
fi
# the third field of the passwd entry is the uid
uid=$(echo "${entry}" | cut -d: -f3)
# if account is not managed by nsscache
if [ "${uid}" -lt 5000 ]; then
  logger "not managed by nsscache"
  exit 0
fi
logger "Fetching public key for ${1}"
aws s3 cp "s3://YOUR_S3_BUCKET/keys/${1}" - || true

After uploading your SSH key to s3://YOUR_S3_BUCKET/keys/foo, you should be able to SSH into your server.
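
For example, for the foo account (the key file and the hostnames are placeholders; adapt them to your setup):

aws s3 cp ~/.ssh/id_ed25519.pub s3://YOUR_S3_BUCKET/keys/foo
ssh -J foo@bastion.example.com foo@10.0.1.12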

Last words

Throughout this guide, you learned how to use Consul to store the accounts used to SSH into your servers. One of the advantages of Consul over LDAP is that it is easy to set up a cluster with replicated data, so you no longer have a single point of failure.

I’ve written a quick demo that you can download and run on your computer.

There are still some improvements possible:

  • enabling Consul ACLs to prevent other services from reading or editing your account data;
  • using Vault instead of Consul to store user accounts and passwords.