Case study

FlagShip

A server-side enforcement system for business invariants — not a feature flag tool.


At a glance

Purpose

Enforce business rules, usage limits, and entitlements at the server layer where they cannot be bypassed — with auditability as a first-class concern.

Who it's for

SaaS engineering teams who need to enforce plan limits, gate features by entitlement, and maintain a complete audit trail of access decisions for compliance and debugging.

Core capabilities
  • Server-side enforcement of business invariants
  • Usage limit tracking and enforcement with real-time counters
  • Plan-based entitlement evaluation

Stack

TypeScript · PostgreSQL · Redis · Node.js · React

Context

FlagShip is not a feature-flag tool. It is a server-side enforcement system for business invariants. The distinction matters: feature flags are about deployment and experimentation. FlagShip is about enforcing who can do what, based on their plan, limits, and entitlements.

The Problem

Client-side checks are security theater. Browser dev tools can bypass them. API calls can circumvent UI-layer limits. Without centralized, server-side enforcement, business rules become inconsistent, unauditable, and trivially exploitable.

Approach

I chose to enforce everything server-side. Every entitlement check, every limit evaluation happens on the server. The client receives evaluated results but never makes decisions. This architecture accepts higher latency in exchange for correctness and auditability.

Architecture

FlagShip is built as three services:

• **Web Panel**: Admin UI for managing flags, limits, and plans
• **Server Engine**: The core evaluation service that handles feature checks and limit enforcement
• **Worker**: Background processing for audit log aggregation, limit reset jobs, and async tasks

PostgreSQL stores the source of truth for configurations and audit data. Redis provides the caching layer for high-performance evaluations.

Service architecture
services/
├── web/          # Admin panel (React)
├── engine/       # Evaluation service (Node.js)
└── worker/       # Background jobs
shared/
├── types/        # Shared TypeScript types
└── sdk/          # Client SDK for integrations
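
The shared SDK is the integration surface for application code. As a usage sketch (the client API below is hypothetical; the package name, constructor options, and method shape are assumptions, not the actual SDK surface), an application service calls the engine through the SDK and acts on the evaluated result:

SDK usage (illustrative)
import { FlagShipClient } from '@flagship/sdk'; // hypothetical package name

const client = new FlagShipClient({
  baseUrl: 'https://engine.internal', // assumed internal engine endpoint
  apiKey: process.env.FLAGSHIP_API_KEY!,
});

// The SDK forwards context to the engine; the decision is made server-side.
const result = await client.evaluate('exports.csv', {
  tenantId: 'tenant_123',
  userId: 'user_456',
  plan: 'pro',
  environment: 'prod',
});

if (!result.enabled) {
  throw new Error(`Access denied: ${result.reason}`);
}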

Why Server-Side Enforcement

Client-side checks fail under two conditions: adversarial clients and buggy clients. Both are inevitable.

• **Adversarial clients**: Users will inspect network traffic, modify JavaScript, and replay requests. If enforcement happens client-side, it's not enforcement — it's a suggestion.

• **Buggy clients**: Mobile apps with stale SDKs, browser extensions that interfere, network issues that prevent flag fetches. Client-side logic degrades unpredictably.

• **Auditability**: When a customer disputes a charge or a compliance audit asks "who had access to this feature in Q3?", you need server-side logs. Client-side checks leave no reliable trail.

• **Fail-open vs fail-closed**: This is a product decision, not just a technical one. For paid features, I chose fail-closed by default — if evaluation fails, deny access. For non-critical features, fail-open is acceptable. The system makes this configurable per entitlement (sketched below).

Server-side enforcement accepts higher latency (network round-trip) in exchange for correctness. I chose correctness.
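
One way to express that per-entitlement fail-mode choice as configuration (a sketch; the type and field names are assumptions, not FlagShip's actual schema):

Fail-mode configuration (illustrative)
type FailMode = 'fail_closed' | 'fail_open';

interface EntitlementConfig {
  featureKey: string;
  failMode: FailMode; // behavior when evaluation itself errors
}

// Paid features deny on evaluation failure; non-critical features allow.
const entitlements: EntitlementConfig[] = [
  { featureKey: 'exports.csv', failMode: 'fail_closed' },
  { featureKey: 'ui.new_dashboard', failMode: 'fail_open' },
];

function onEvaluationError(config: EntitlementConfig): boolean {
  return config.failMode === 'fail_open';
}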

Design Principles

FlagShip is built on non-negotiable principles:

• **Enforcement must be centralized**: All access decisions flow through the evaluation engine. No scattered if-statements in application code.

• **Evaluation must be deterministic**: Given the same inputs, the system returns the same result. No hidden state, no race conditions.

• **Auditability is non-optional**: Every evaluation is logged with full context — user, tenant, plan, flag version, decision, reasoning. This isn't optional; it's the point.

• **Systems should degrade predictably**: When Redis is down, evaluations fall back to PostgreSQL. When PostgreSQL is down, the system fails closed for paid features and fails open for non-critical ones. Degradation is explicit, not emergent (see the sketch below).

These aren't aspirations. They're constraints I designed against.
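
To make the degradation path concrete, here is a minimal sketch of the fallback chain (the helper functions are hypothetical; EntitlementConfig comes from the fail-mode sketch above, EvaluationContext from the evaluation flow below):

Predictable degradation (illustrative)
declare function evaluateFromRedis(
  key: string, ctx: EvaluationContext): Promise<boolean>;
declare function evaluateFromPostgres(
  key: string, ctx: EvaluationContext): Promise<boolean>;

async function evaluateWithFallback(
  featureKey: string,
  context: EvaluationContext,
  config: EntitlementConfig
): Promise<boolean> {
  try {
    // Fast path: cached evaluation via Redis.
    return await evaluateFromRedis(featureKey, context);
  } catch {
    try {
      // Redis is down: fall back to the source of truth.
      return await evaluateFromPostgres(featureKey, context);
    } catch {
      // Both stores are down: degrade per the configured fail mode.
      return config.failMode === 'fail_open';
    }
  }
}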

Feature Evaluation

The evaluation engine is the core of FlagShip. It takes a feature key and context (user, tenant, plan) and returns an evaluation result. The evaluation considers:

• Flag state (enabled/disabled)
• Targeting rules (user segments, percentages)
• Plan entitlements (which plans include this feature)
• Override rules (per-tenant or per-user overrides)

Evaluations are cached in Redis with short TTLs, balancing consistency with performance.

Evaluation flow (illustrative)
interface EvaluationContext {
  tenantId: string;
  userId: string;
  plan: string;
  environment: 'dev' | 'staging' | 'prod';
}

interface EvaluationResult {
  enabled: boolean;
  reason: 'flag_disabled' | 'plan_entitled' | 'targeting_match' | 'override';
  flagVersion: number;
}

// Server-side evaluation - never trust the client.
// Signature only: the engine resolves flag state, targeting rules,
// plan entitlements, and overrides against the supplied context.
declare function evaluate(
  featureKey: string,
  context: EvaluationContext
): Promise<EvaluationResult>;
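
The short-TTL cache mentioned above could look roughly like this (a sketch using node-redis; the key format and the 5-second TTL are assumptions):

Cached evaluation (illustrative)
import { createClient } from 'redis';

const redis = createClient(); // assumes redis.connect() is awaited at startup

async function cachedEvaluate(
  featureKey: string,
  context: EvaluationContext
): Promise<EvaluationResult> {
  const cacheKey = `eval:${context.tenantId}:${context.userId}:${featureKey}`;

  const hit = await redis.get(cacheKey);
  if (hit) return JSON.parse(hit) as EvaluationResult;

  const result = await evaluate(featureKey, context);

  // A short TTL bounds staleness to a few seconds.
  await redis.set(cacheKey, JSON.stringify(result), { EX: 5 });
  return result;
}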

Limit Enforcement

Usage limits (API calls, storage, seats) are tracked in real-time using Redis counters with periodic persistence to PostgreSQL. The enforcement logic supports:

• Hard limits: Requests are rejected when limits are exceeded
• Soft limits: Requests succeed but trigger alerts/webhooks
• Burst allowances: Temporary overages before enforcement kicks in

Limits reset on configurable schedules (daily, monthly, billing cycle) handled by the worker service.
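
A hard-limit check over a Redis counter might look like this (a sketch reusing the redis client from the caching example; the key format is an assumption):

Hard-limit check (illustrative)
async function checkApiLimit(
  tenantId: string,
  limit: number
): Promise<boolean> {
  // INCR is atomic, so concurrent requests cannot race past the limit.
  // The worker resets this counter on the configured schedule and
  // periodically persists it to PostgreSQL.
  const used = await redis.incr(`limit:api_calls:${tenantId}`);
  return used <= limit; // hard limit: reject once exceeded
}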

Audit Trail

Every access decision is logged with full context. The audit log captures:

• Timestamp and evaluation latency
• Feature key and version evaluated
• Full evaluation context
• Decision and reasoning
• SDK version and client info

This enables compliance reporting ("show me all denied access attempts in Q4"), debugging ("why was this user denied access?"), and analytics ("what's our flag evaluation volume?").
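
A record shape consistent with that list (the field names are assumptions about the log schema):

Audit record (illustrative)
interface AuditRecord {
  timestamp: string;                  // ISO 8601
  latencyMs: number;                  // evaluation latency
  featureKey: string;
  flagVersion: number;
  context: EvaluationContext;         // full evaluation context
  enabled: boolean;                   // the decision
  reason: EvaluationResult['reason']; // the reasoning
  sdkVersion: string;
  clientInfo: string;
}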

Tradeoffs

**Latency vs consistency**: Cached evaluations may be stale. FlagShip uses short TTLs (seconds) and supports cache invalidation for critical updates.

**Complexity vs flexibility**: The three-service architecture is more complex than a monolith, but enables independent scaling and deployment.

**Storage vs query speed**: The audit log grows quickly. FlagShip uses time-partitioned tables and supports archival to cold storage.

What I'd Improve Next

• Add A/B testing infrastructure with statistical analysis
• Implement gradual rollouts with automatic rollback on errors
• Add a GraphQL API alongside REST
• Build integrations with popular observability platforms
• Add multi-region support for lower-latency evaluations