Session management at 100k MAU

At 100,000 monthly active users, your auth system is handling maybe 200-500 requests per second at peak. That’s still not a huge number — but a few specific pieces of the session infrastructure start to matter in ways they didn’t at 10k.

1. Session table growth

A typical session lasts 30 days. With 100k MAU each having 1-3 active devices and legacy rows from lapsed users, your sessions table easily hits 500k-1M rows. Unexamined, it grows indefinitely.

Index strategy:

Covering index on (user_id, expires_at, session_id) for “fetch active sessions for a user”
Partial index where expires_at > NOW() for hot-path lookups
BRIN index on created_at if you do time-range queries

Cleanup:

Nightly sweep: DELETE FROM sessions WHERE expires_at < NOW() - INTERVAL '7 days'
Keep recently expired rows for a few days to support debugging (support can look up why a user got logged out)

2. Session lookup latency

At 500k rows in Postgres, a primary-key lookup is still sub-millisecond. The cost grows if:

Your WHERE clause isn’t indexed (rare if you index properly)
You’re ORDER-BY’ing without an index
You’re reading related tables in the same query

Consider moving to Redis/Valkey once latency matters. Keyspace: session:{id} → JSON blob. TTL on the key for automatic expiration. Look up once, attach to the request context. Sub-millisecond P99.

The tradeoff: Redis sessions are harder to audit after the fact (“what sessions did this user have yesterday?”). Keep a mirror in Postgres if that matters for compliance.

3. Multi-device session management

Users at scale expect to see “your active sessions” with device names and locations, and to terminate individual devices. This requires structured session metadata:

session_id, user_id, device_name, user_agent, ip_country,
last_active_at, created_at, expires_at

Populate device_name on login (parse user agent for something readable). Update last_active_at on every request (or every N minutes — per-request updates can become a write bottleneck).

4. Session sync across regions

If you’re running multi-region for latency, session lookups need to be local. Options:

Replicate the session store. Redis with cross-region replication. Latency cost: session writes are slower, but reads are local.
Issue short-lived signed JWTs from sessions. The client presents the JWT (locally verified), the JWT has a reference to the session (fetched lazily if you need revocation). Combines the best of both.
Pin users to a home region. Simpler, but users traveling abroad hit cross-region latency.

The right answer depends on your product’s geographic distribution and latency tolerance.

The summary

None of this is exotic. The pattern is: observe your session system as a datastore in its own right, index it, clean it, and move to a purpose-built store when the DB query latency shows up on your P99 graphs. Most teams do this work reactively. Do it proactively and you skip the 3am incident.