Feature Flags That Survive Reality

Feature Flags That Survive Reality

A Mobile Safe-Rollout Playbook

Feature flags are usually pitched as "turn features on and off." That's the junior version.

The senior version: Feature flags are an operational safety system. They let you ship changes gradually, observe impact, and recover quickly when reality wins.

On mobile, this matters more because you can't instantly roll back an app store release, users stay on old versions for weeks, and local state (caches, offline queues) makes failures sticky. A bug can persist even after you fix the code—because the app's local state is already corrupted.

That's why teams that ship reliably treat flags as control-plane infrastructure, not as a product gimmick.

This article is a practical playbook for using flags to ship safely—plus a concrete operational pattern: a remotely-triggered cache wipe that can unbrick users and eliminate poisoned persistence state.


Types of Flags That Matter

Release flags (feature gating)

"Enable new search UI for 10%."

Use for new user flows, refactors, new endpoints, changing business logic.

Experiment flags (A/B)

"Variant A vs B."

Use for conversion experiments and UX variations. These require analytics discipline—if you can't measure, don't experiment.

Operational flags (the underrated ones)

This is where senior teams differentiate. Operational flags are for:

  • Kill switches: "Disable risky endpoint"
  • Degrade modes: "Turn off video, fall back to text"
  • Recovery commands: "Wipe poisoned cache once"

They're designed for bad days.


Mobile-Specific Realities

Cached configs + offline users: Your app may run for days with stale flag values. Design for TTL caching, "force refresh on next launch" capability, and conservative defaults when config is unavailable.

Version skew: Your backend and mobile clients won't move in lockstep. A reliable rollout often uses a two-sided contract: server supports both old and new behavior, client flag decides when to opt in.

Local persistence makes failures persistent: A bad cache shape can outlive your fix. You fix the bug in code, but the app keeps reading corrupted on-disk state. That's where "wipe cache" becomes a real operational tool.


Why Stale-Data Collisions Happen

Stale local data collides with new data shapes when:

  • Schema changes but old on-disk records remain
  • Migrations partially run then crash
  • Backend payloads evolve and cached responses don't
  • Offline edits were created under old validation rules then sync into new rules

You can't "just be careful." You need system guardrails.


Prevention First: What Should Stop Collisions

A "wipe cache" flag is your break-glass tool, not your primary strategy.

Versioned persistence schema:

struct PersistenceConfig {
    static let currentSchemaVersion = 7
}

func validateStorage() -> StorageAction {
    let stored = storage.schemaVersion
    
    switch stored {
    case currentSchemaVersion:
        return .proceed
    case ..<currentSchemaVersion:
        return .migrate(from: stored)
    case (currentSchemaVersion + 1)...:
        return .wipeAndRebuild  // Downgrade: can't read future schema
    default:
        return .wipeAndRebuild
    }
}

Versioned payload caching: Cache responses alongside etag / apiVersion. On mismatch, refetch or invalidate.

Tiered storage (critical):

enum StorageTier {
    case rebuildableCache    // Safe to wipe: server snapshots, UI caches, derived data
    case authoritativeLocal  // Dangerous to wipe: outbox queue, user drafts, unsynced edits
}

This one design choice prevents most "wipe nuked my work" disasters. Your wipe command targets rebuildableCache by default—never the outbox.


The Flagship Pattern: Remote Cache Wipe

Here's the senior framing:

A remote wipe isn't a caching strategy; it's a recovery strategy. Versioning + migration + tiered storage prevent most issues. The wipe command clears poisoned state when reality wins.

Why it's valuable:

  • Unbricks crash loops caused by corrupted persistence
  • Clears poisoned caches after a bad server payload rollout
  • Gives ops a lever that doesn't require an app update

Why it's dangerous:

  • Data loss (especially if you wipe offline work)
  • Infinite wipe loops if misconfigured
  • Silent damage if not observable

So we design it like a controlled operation, not a boolean.


Don't Use a Boolean. Use a Maintenance Command.

Instead of:

{ "wipeCache": true }

Use:

{
  "maintenance": {
    "wipe": {
      "target": "rebuildable_cache",
      "version": 3,
      "reason": "migration_7_hotfix",
      "minAppVersion": "2.18.0",
      "cohortPercent": 10
    }
  }
}

Key properties:

Field Purpose
target What tier to wipe (default: rebuildable cache)
version Makes it one-shot and intentionally repeatable
reason Audit trail
minAppVersion Prevents older clients from executing unknown behavior
cohortPercent Limits blast radius

One-Shot Execution: Avoid Wipe Loops

Rule: Apply a maintenance command at most once per device per command version.

final class MaintenanceExecutor {
    private let storage: UserDefaults
    private let key = "lastAppliedMaintenanceVersion"
    
    func executeIfNeeded(_ command: WipeCommand) -> Bool {
        let lastApplied = storage.integer(forKey: key)
        
        guard command.version > lastApplied else {
            return false  // Already applied
        }
        
        guard meetsRequirements(command) else {
            return false  // Wrong app version or not in cohort
        }
        
        // Execute the wipe
        CacheManager.wipe(tier: command.target)
        
        // Record completion
        storage.set(command.version, forKey: key)
        
        // Emit telemetry
        Analytics.track("cache_wipe_applied", [
            "version": command.version,
            "target": command.target,
            "reason": command.reason
        ])
        
        return true
    }
    
    private func meetsRequirements(_ command: WipeCommand) -> Bool {
        let appVersion = Bundle.main.appVersion
        let inCohort = DeviceIdentifier.cohortPercent <= command.cohortPercent
        return appVersion >= command.minAppVersion && inCohort
    }
}

Even if the config stays "on," the wipe doesn't repeat.


Never Wipe the Outbox by Default

If your app supports offline edits, your outbox is user work. Treat it as sacred.

Defaults:

  • target = rebuildable_cache
  • Separate command needed to wipe outbox (extreme, explicit incidents only)

If you ever add an outbox wipe: gate it behind a higher privilege level, add user-facing messaging, and consider preserving a backup for support recovery.


Observability

Every operational flag must generate metrics. For wipe commands:

  • cache_wipe_applied (command version, target, app version)
  • Crash-free rate before/after
  • Storage size before/after
  • Number of affected devices

You want to answer: Did it reduce crash loops? Did it reduce decode/migration failures? Did it create secondary issues?

Operational flags without telemetry are blindfolded incident response.


The Rollout Playbook

Stage Cohort Watch For
0: Internal Employees only Metrics, logs, one-shot behavior
1 1% Crash-free rate, API errors, persistence errors
2 10% No wipe loops, incident symptom dropping
3 50% Support tickets for data loss, performance impact
4 100% Only after 24–48h stable indicators

Kill criteria:

  • Crash-free rate drops beyond threshold
  • Spike in sign-outs / re-auth events
  • Unexpected wipe re-application rate (one-shot failed)
  • "Empty state" reports from users

Flag Governance: Avoid Flag Debt

Flags accumulate and become permanent. A mature policy:

  • Every flag has an owner
  • Every flag has an expiry date
  • Flags must be deleted after rollout completes
  • Maintenance commands must be versioned and archived

If flags aren't cleaned up, you eventually ship a system no one understands.


Closing

Feature flags aren't just product levers. They're your ability to operate.

For mobile, that includes operational controls that acknowledge reality: users don't update instantly, local data persists across fixes, and you need break-glass tools that reduce blast radius.

A versioned, one-shot, targeted wipe cache command is the pattern:

  • Safe when designed with tiered storage and observability
  • Dangerous when treated like a boolean

If you can answer these two questions, your system is probably solid:

  1. What happens if this command executes twice?
  2. What user data could be lost, and is it recoverable?

Read more