At 100,000 monthly active users, your auth system is handling maybe 200-500 requests per second at peak. That’s still not a huge number — but a few specific pieces of the session infrastructure start to matter in ways they didn’t at 10k.
1. Session table growth
A typical session lasts 30 days. With 100k MAU each having 1-3 active devices and legacy rows from lapsed users, your sessions table easily hits 500k-1M rows. Unexamined, it grows indefinitely.
Index strategy:
- Covering index on
(user_id, expires_at, session_id)for “fetch active sessions for a user” - Partial index where
expires_at > NOW()for hot-path lookups - BRIN index on
created_atif you do time-range queries
Cleanup:
- Nightly sweep:
DELETE FROM sessions WHERE expires_at < NOW() - INTERVAL '7 days' - Keep recently expired rows for a few days to support debugging (support can look up why a user got logged out)
2. Session lookup latency
At 500k rows in Postgres, a primary-key lookup is still sub-millisecond. The cost grows if:
- Your WHERE clause isn’t indexed (rare if you index properly)
- You’re ORDER-BY’ing without an index
- You’re reading related tables in the same query
Consider moving to Redis/Valkey once latency matters. Keyspace: session:{id} → JSON blob. TTL on the key for automatic expiration. Look up once, attach to the request context. Sub-millisecond P99.
The tradeoff: Redis sessions are harder to audit after the fact (“what sessions did this user have yesterday?”). Keep a mirror in Postgres if that matters for compliance.
3. Multi-device session management
Users at scale expect to see “your active sessions” with device names and locations, and to terminate individual devices. This requires structured session metadata:
session_id, user_id, device_name, user_agent, ip_country,
last_active_at, created_at, expires_at
Populate device_name on login (parse user agent for something readable). Update last_active_at on every request (or every N minutes — per-request updates can become a write bottleneck).
4. Session sync across regions
If you’re running multi-region for latency, session lookups need to be local. Options:
- Replicate the session store. Redis with cross-region replication. Latency cost: session writes are slower, but reads are local.
- Issue short-lived signed JWTs from sessions. The client presents the JWT (locally verified), the JWT has a reference to the session (fetched lazily if you need revocation). Combines the best of both.
- Pin users to a home region. Simpler, but users traveling abroad hit cross-region latency.
The right answer depends on your product’s geographic distribution and latency tolerance.
The summary
None of this is exotic. The pattern is: observe your session system as a datastore in its own right, index it, clean it, and move to a purpose-built store when the DB query latency shows up on your P99 graphs. Most teams do this work reactively. Do it proactively and you skip the 3am incident.