Snapshot and Publishing Workflow Analysis

Overview

The system uses two separate but complementary publishing mechanisms:

  1. is_published (per-entity): Individual content item visibility
  2. snapshot.status (per-snapshot): Batch version control

How They Work Together

1. Entity-Level Publishing (is_published)

Purpose: Content moderation and individual item review workflow

  • Scope: Per entity (domain, trail, concept, spark)
  • Default Values:
    • domains: DEFAULT true (published by default)
    • trails, concepts, sparks: DEFAULT false (unpublished by default)
  • Set During Import: Values come from the bundle JSON file
  • Use Case: Allows fine-grained control over which items are visible

Example:

{
  "domains": [
    { "name": "ML", "is_published": true },   // Visible
    { "name": "AI", "is_published": false }   // Hidden
  ],
  "trails": [
    { "title": "Intro ML", "is_published": true },   // Visible
    { "title": "Advanced ML", "is_published": false } // Hidden
  ]
}
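As a rough illustration of how the per-entity flag determines visibility, the sketch below filters a bundle shaped like the example above. The type names and the `resolvePublished` and `visibleNames` helpers are hypothetical, and falling back to the column defaults when a bundle entry omits `is_published` is an assumption, not documented importer behavior.

```typescript
// Hypothetical bundle shapes: only `name`, `title`, and `is_published`
// appear in the example above.
interface BundleDomain { name: string; is_published?: boolean }
interface BundleTrail { title: string; is_published?: boolean }
interface Bundle { domains: BundleDomain[]; trails: BundleTrail[] }

// Assumption: a missing flag falls back to the table's column default.
function resolvePublished(value: boolean | undefined, tableDefault: boolean): boolean {
  return value ?? tableDefault;
}

// Returns the names/titles that would be visible after import.
function visibleNames(bundle: Bundle): { domains: string[]; trails: string[] } {
  return {
    // domains: DEFAULT true
    domains: bundle.domains
      .filter((d) => resolvePublished(d.is_published, true))
      .map((d) => d.name),
    // trails: DEFAULT false
    trails: bundle.trails
      .filter((t) => resolvePublished(t.is_published, false))
      .map((t) => t.title),
  };
}

const example: Bundle = {
  domains: [
    { name: "ML", is_published: true },
    { name: "AI", is_published: false },
  ],
  trails: [
    { title: "Intro ML", is_published: true },
    { title: "Advanced ML", is_published: false },
  ],
};
console.log(visibleNames(example));
```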

2. Snapshot-Level Publishing (snapshot.status)

Purpose: Batch version control and atomic deployments

  • Scope: Entire snapshot (all entities imported together)
  • States:
    • draft: Import in progress, not visible to default queries
    • published: Active snapshot, visible to default queries
    • archived: Old snapshot, only visible via explicit snapshot_id query
  • Set During Import: Created as draft, promoted to published via --promote flag
  • Use Case: Atomic batch deployments, version rollback, preview before publish

Example Workflow:

# 1. Import creates draft snapshot
deno task import bundle.json
# → snapshot.status = 'draft'
# → Not visible in default queries

# 2. Preview draft content
GET /discovery/domains?snapshot_id=<draft-id>
# → Shows content even if snapshot is draft

# 3. Promote to published
deno task import bundle.json --promote
# → snapshot.status = 'published'
# → Becomes current active snapshot
# → Visible in default queries
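The status lifecycle above can be modeled as a small state transition. The following is a minimal in-memory sketch, not the importer's actual code: the `promote` and `currentSnapshotId` functions are hypothetical, and archiving the previously published snapshot on promotion is an assumption (the workflow above only states that the promoted snapshot becomes the current one).

```typescript
type SnapshotStatus = "draft" | "published" | "archived";

interface Snapshot {
  id: string;
  status: SnapshotStatus;
}

// Promotes a draft to published without mutating the input array.
// Assumption: any previously published snapshot becomes archived,
// so at most one snapshot is published at a time.
function promote(snapshots: Snapshot[], id: string): Snapshot[] {
  const target = snapshots.find((s) => s.id === id);
  if (!target) throw new Error(`Unknown snapshot: ${id}`);
  if (target.status !== "draft") {
    throw new Error(`Only drafts can be promoted (got ${target.status})`);
  }
  return snapshots.map((s): Snapshot => {
    if (s.id === id) return { ...s, status: "published" };
    if (s.status === "published") return { ...s, status: "archived" };
    return s;
  });
}

// The snapshot that default queries would read from, or null if none.
function currentSnapshotId(snapshots: Snapshot[]): string | null {
  return snapshots.find((s) => s.status === "published")?.id ?? null;
}
```

Returning a new array keeps the transition easy to test; a real implementation would presumably perform both status updates in a single database transaction so that default queries never observe two published snapshots.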

Current Query Logic

Default Queries (No snapshot_id parameter)

// 1. Get current snapshot ID (or null if none)
const snapshotId = await getEffectiveSnapshotId(supabase, null);

// 2. Filter by snapshot (if exists)
if (snapshotId) {
  query = query.eq("snapshot_id", snapshotId);
} else {
  query = query.is("snapshot_id", null); // Legacy data only
}

// 3. ALWAYS filter by is_published
query = query.eq("is_published", true);

Result: Returns only published entities from the current published snapshot (or legacy data if no snapshot).

Explicit Snapshot Queries (snapshot_id parameter)

// 1. Use requested snapshot ID
const snapshotId = await getEffectiveSnapshotId(supabase, requestedId);

// 2. Filter by snapshot
query = query.eq("snapshot_id", snapshotId);

// 3. ALWAYS filter by is_published
query = query.eq("is_published", true);

Result: Returns only published entities from the specified snapshot (even if snapshot is draft).
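Both query paths reduce to the same two filters, which can be checked against plain arrays. The sketch below is an in-memory stand-in for the database filters, not the real query code: `VisibleEntity` and `selectVisible` are hypothetical names, `currentId` plays the role of whatever `getEffectiveSnapshotId(supabase, null)` resolves to, and it assumes a requested ID is used as-is.

```typescript
interface VisibleEntity {
  id: string;
  snapshot_id: string | null; // null marks legacy (pre-snapshot) rows
  is_published: boolean;
}

// Mirrors the query logic above:
//   1. resolve the effective snapshot (requested ID wins over current),
//   2. filter by snapshot_id (null selects legacy rows only),
//   3. always filter by is_published.
function selectVisible(
  entities: VisibleEntity[],
  currentId: string | null,
  requestedId: string | null = null,
): VisibleEntity[] {
  const snapshotId = requestedId ?? currentId;
  return entities.filter(
    (e) => e.snapshot_id === snapshotId && e.is_published,
  );
}

const sample: VisibleEntity[] = [
  { id: "legacy", snapshot_id: null, is_published: true },
  { id: "live-1", snapshot_id: "s1", is_published: true },
  { id: "live-2", snapshot_id: "s1", is_published: false },
  { id: "draft-1", snapshot_id: "s2", is_published: true },
];

console.log(selectVisible(sample, "s1").map((e) => e.id));       // default query
console.log(selectVisible(sample, "s1", "s2").map((e) => e.id)); // explicit draft preview
console.log(selectVisible(sample, null).map((e) => e.id));       // no snapshot: legacy rows
```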

Is There Redundancy?

Yes, but with Different Purposes

Redundancy exists because both mechanisms control visibility, but they serve different use cases:

Aspect       | is_published       | snapshot.status
-------------|--------------------|-------------------
Granularity  | Per entity         | Per batch/snapshot
Use Case     | Content moderation | Version control
Workflow     | Individual review  | Atomic deployment
Flexibility  | Fine-grained       | All-or-nothing

Potential Issues

  1. Published Snapshot with Unpublished Entities
    • A snapshot can be status='published' (active) but contain entities with is_published=false
    • Result: Snapshot is active but some content is hidden
    • Is this intentional? Yes, it allows partial content rollout

  2. Draft Snapshot with Published Entities
    • A snapshot can be status='draft' but contain entities with is_published=true
    • Result: Content is ready but snapshot isn't active
    • Is this intentional? Yes, it allows preview before promotion

  3. Double Filtering
    • Both filters are always applied: snapshot_id AND is_published=true
    • Is this redundant? Partially, but both filters are necessary for:
      • Previewing draft snapshots (need is_published filter)
      • Legacy data support (need snapshot_id filter)
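The combinations discussed above can be enumerated mechanically. This is a small sketch of the resulting visibility matrix under the query logic described earlier; the `Status`, `Visibility`, and `visibilityFor` names are hypothetical, and it assumes a snapshot with status 'published' is the current active one.

```typescript
type Status = "draft" | "published" | "archived";

interface Visibility {
  defaultQuery: boolean;  // request without a snapshot_id parameter
  explicitQuery: boolean; // request with ?snapshot_id=<this snapshot>
}

// Assumption: a "published" snapshot is the current active snapshot.
function visibilityFor(status: Status, isPublished: boolean): Visibility {
  return {
    // Default queries need both an active snapshot and a published entity.
    defaultQuery: status === "published" && isPublished,
    // Explicit queries ignore snapshot status but still require is_published.
    explicitQuery: isPublished,
  };
}

const statuses: Status[] = ["draft", "published", "archived"];
for (const status of statuses) {
  for (const isPublished of [true, false]) {
    const v = visibilityFor(status, isPublished);
    console.log(
      `${status} / is_published=${isPublished}: ` +
        `default=${v.defaultQuery}, explicit=${v.explicitQuery}`,
    );
  }
}
```

Only one of the six combinations is visible to default queries, which is where the two flags overlap; the explicit-query column shows why is_published still matters when previewing a draft.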

Scenario 1: Batch Import with All Content Ready

# 1. Import bundle with all entities published
deno task import bundle.json --promote
# → All entities: is_published=true
# → Snapshot: status='published'
# → Immediately visible in production

Scenario 2: Staged Rollout

# 1. Import bundle with some entities unpublished
deno task import bundle.json --promote
# → Some entities: is_published=true (ready)
# → Some entities: is_published=false (not ready)
# → Snapshot: status='published'
# → Only published entities visible

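# 2. Later: Publish remaining entities (SQL, run per affected table:
#    domains, trails, concepts, sparks)
UPDATE domains SET is_published = true WHERE id IN (...);
# → Once every affected table is updated, all entities are visible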
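The second step of the staged rollout can be sketched in memory to show how the visible set grows without touching the snapshot. The `Row` type and the `publishByIds` and `visibleIds` helpers are hypothetical; `publishByIds` only mimics the effect of the UPDATE statement on a single table.

```typescript
interface Row {
  id: string;
  is_published: boolean;
}

// Returns a copy of `rows` with the listed IDs marked as published,
// mimicking UPDATE ... SET is_published = true WHERE id IN (...).
function publishByIds(rows: Row[], ids: string[]): Row[] {
  const wanted = new Set(ids);
  return rows.map((r) => (wanted.has(r.id) ? { ...r, is_published: true } : r));
}

// IDs that pass the is_published filter.
function visibleIds(rows: Row[]): string[] {
  return rows.filter((r) => r.is_published).map((r) => r.id);
}

// After the initial import with --promote: one entity ready, one held back.
const imported: Row[] = [
  { id: "d1", is_published: true },
  { id: "d2", is_published: false },
];
console.log(visibleIds(imported));                       // before the later publish step
console.log(visibleIds(publishByIds(imported, ["d2"]))); // after it
```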

Scenario 3: Preview Before Publish

# 1. Import as draft
deno task import bundle.json
# → All entities: is_published=true
# → Snapshot: status='draft'
# → Preview via: GET /discovery/domains?snapshot_id=<draft-id>

# 2. Verify and promote
deno task import bundle.json --promote
# → Snapshot: status='published'
# → Now visible in default queries

Potential Simplification

Option 1: Remove is_published (Use Only snapshot.status)

Idea: If all content in a snapshot should be published together, use only snapshot.status.

Pros:

  • Simpler model
  • Atomic publishing (all or nothing)
  • Less redundancy

Cons:

  • No fine-grained control
  • Can't do staged rollouts
  • Can't preview individual items

Option 2: Remove snapshot.status (Use Only is_published)

Idea: Use only entity-level publishing, no snapshot versioning.

Pros:

  • Simpler model
  • More flexible per-entity control

Cons:

  • No atomic batch deployments
  • No version rollback
  • No preview before publish
  • Harder to manage large imports

Option 3: Keep Both (Current Approach) ✅

Recommendation: Keep both mechanisms because they serve different purposes:

  • is_published: Content moderation workflow (individual items)
  • snapshot.status: Version control workflow (batch deployments)

Benefits:

  • Fine-grained content control
  • Atomic batch deployments
  • Preview before publish
  • Staged rollouts
  • Version history

Conclusion

While there is some redundancy between is_published and snapshot.status, they serve complementary purposes:

  • is_published = "Is this individual piece of content ready?"
  • snapshot.status = "Is this entire version/batch active?"

The current design allows for:

  1. ✅ Fine-grained content moderation
  2. ✅ Atomic batch deployments
  3. ✅ Preview before publish
  4. ✅ Staged rollouts
  5. ✅ Version history

Recommendation: Keep both mechanisms as they provide flexibility for different use cases.