May 7, 2026 · 12 min read · Cadence Editorial

How to design role-based access control (RBAC)

rbac design 2026 — How to design role-based access control (RBAC)
Photo by [Erik Mclean](https://www.pexels.com/@introspectivedsgn) on [Pexels](https://www.pexels.com/photo/selective-focus-photo-of-an-analog-telephone-7824263/)

How to design role-based access control (RBAC)

To design role-based access control in 2026, model five tables (users, roles, permissions, role_permissions, user_roles), scope every user-role assignment to a tenant, and route every read and write through one can(user, action, resource) function. Get those three things right and you can ship RBAC in a weekend that survives a year of growth before you need ABAC or ReBAC on top.

Most teams reach for roles first. That is the mistake. Roles are an abstraction over permissions, and you cannot abstract something you have not enumerated yet. Start with the verbs and nouns your product actually has, then group them into roles when you see the patterns.

This post is the engineering playbook: schema, working permission-check code, the three named open-source and managed authorization tools you should evaluate, the multi-tenant gotcha that bites every team once, and the concrete signals that you have outgrown RBAC.

Why RBAC design matters more in 2026

Two things changed in the last 24 months. AI coding agents now write most of the boring CRUD code, so authorization is one of the few places engineers spend real thought. And the bar for multi-tenant SaaS shifted: SOC 2, HIPAA, and ISO 27001 are table-stakes for B2B sales, and every audit asks for a role catalog on day one.

Tech Prescient's 2026 survey found 40 to 60% of organizations have unnecessary roles in production, and the worst offenders run 2,000+ roles for fewer than 500 employees. That is not access control, that is a graveyard.

A clean RBAC schema fits in five tables and 100 lines of code. The bad news: most tutorials skip the parts that matter (tenant scope, the check-function contract, the cache layer) and you ship a model the next team rips out in 18 months.

The default approach (and why it breaks)

The default approach is a single users.role enum column: 'admin', 'member', 'viewer'. It works for the first 50 customers, then breaks the first time someone asks for a "billing-admin who cannot delete projects" or "auditor who reads everything but writes nothing".

You add a fourth role, then a fifth. Then someone asks for "admin in workspace A but viewer in workspace B" and your enum cannot represent it. You start writing if user.role === 'admin' || (user.role === 'member' && project.ownerId === user.id) in 40 places. The fix is to separate three concerns from day one: who the user is, what permissions exist, and which roles bundle which permissions.

The schema: five tables, no shortcuts

Here is the canonical RBAC schema in Postgres. It scales from your first customer to a few thousand tenants without changes.

-- Tenants (workspaces, orgs, accounts; whatever you call them)
CREATE TABLE tenants (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name        TEXT NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Users live globally; one user can belong to many tenants
CREATE TABLE users (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email       CITEXT UNIQUE NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Permissions are the atomic verbs. Resource:action shape.
CREATE TABLE permissions (
  id          TEXT PRIMARY KEY,            -- e.g. 'projects:delete'
  resource    TEXT NOT NULL,               -- 'projects'
  action      TEXT NOT NULL,               -- 'delete'
  description TEXT
);

-- Roles are named bundles of permissions, scoped to a tenant
-- (or NULL tenant_id for global system roles like 'superadmin')
CREATE TABLE roles (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id   UUID REFERENCES tenants(id) ON DELETE CASCADE,
  name        TEXT NOT NULL,
  is_system   BOOLEAN NOT NULL DEFAULT false,
  UNIQUE (tenant_id, name)
);

-- Many-to-many: which permissions does each role grant?
CREATE TABLE role_permissions (
  role_id        UUID NOT NULL REFERENCES roles(id) ON DELETE CASCADE,
  permission_id  TEXT NOT NULL REFERENCES permissions(id) ON DELETE CASCADE,
  PRIMARY KEY (role_id, permission_id)
);

-- The crucial table: users get roles WITHIN a specific tenant
CREATE TABLE user_roles (
  user_id    UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  role_id    UUID NOT NULL REFERENCES roles(id) ON DELETE CASCADE,
  tenant_id  UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
  granted_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  granted_by UUID REFERENCES users(id),
  PRIMARY KEY (user_id, role_id, tenant_id)
);

CREATE INDEX user_roles_lookup ON user_roles (user_id, tenant_id);

Five things to notice. Permissions use a stable string ID (projects:delete) rather than an autoincrement integer, because you reference them in code. Roles carry a tenant_id so each tenant can name its own roles without colliding with another tenant's. The user_roles table includes tenant_id even though it is reachable through the role, because you query WHERE user_id = ? AND tenant_id = ? constantly and you want the index to make that fast. The granted_by column is your audit trail. And is_system distinguishes the seed roles you ship ('admin', 'member') from custom roles a tenant created themselves.

The check function: one path, every time

Centralize the permission check in one function. Every API handler, every job, every cell in the React UI calls the same function with the same signature. If you have two paths, you have a bug waiting to ship.

// lib/auth/can.ts
import { db } from "@/db";
import { cache } from "@/lib/cache"; // Redis or in-memory LRU

type Action = `${string}:${string}`; // e.g. 'projects:delete'

export async function can(
  userId: string,
  tenantId: string,
  action: Action,
  resource?: { ownerId?: string; tenantId?: string }
): Promise<boolean> {
  // 1. Resource-scope check first: if the resource belongs to another
  //    tenant, deny immediately. This is the multi-tenant gotcha.
  if (resource?.tenantId && resource.tenantId !== tenantId) return false;

  // 2. Look up the user's permissions in this tenant. Cache for 60s.
  const cacheKey = `perms:${userId}:${tenantId}`;
  let permissions = await cache.get<string[]>(cacheKey);

  if (!permissions) {
    const rows = await db.query<{ permission_id: string }>(
      `SELECT DISTINCT rp.permission_id
         FROM user_roles ur
         JOIN role_permissions rp ON rp.role_id = ur.role_id
        WHERE ur.user_id = $1 AND ur.tenant_id = $2`,
      [userId, tenantId]
    );
    permissions = rows.map((r) => r.permission_id);
    await cache.set(cacheKey, permissions, 60); // 60 second TTL
  }

  return permissions.includes(action);
}

That is the entire enforcement layer. Every handler does:

if (!(await can(user.id, ctx.tenantId, "projects:delete", project))) {
  throw new ForbiddenError();
}
await db.projects.delete(project.id);

A 60-second cache TTL is the right default. It keeps p99 latency under 5ms on the hot path and means a permission revocation propagates within a minute. If you need instant revocation (compliance reason, security incident), bust the cache key on user_roles writes.

DIY vs OPA vs Oso vs Permit.io

You will be tempted to reach for an authorization library on day one. Resist until you have at least one tenant in production. The schema above is a few hundred lines including migrations and tests, and you will understand your own authorization model far better if you build it before you outsource it.

That said, here is the honest comparison once you do start shopping. This table reflects what these tools actually do well, not what the marketing pages say.

ToolWhat it isWhen it winsWhen it loses
DIY (the schema above)Five tables, one can() functionSingle product, RBAC fits the model, team owns the codeCross-service auth, complex relationships, audit reports
Open Policy Agent (OPA)Generic policy engine, Rego languageMulti-service infra, Kubernetes, sidecar enforcementApplication-level authorization, developer ergonomics
Oso CloudHosted policy engine with Polar DSLApp-level authz, ReBAC + RBAC hybrid, fast iterationTight budgets, fully on-prem requirements
Permit.ioHosted authz with admin UI for non-engineersCustomer-facing role management, SaaS with self-serve adminTeams that want code-as-source-of-truth

OPA is the boring infrastructure choice. It is widely deployed, battle-tested in Kubernetes, and great for cross-service consistency. The trade-off is Rego, which is a learning curve and reads nothing like your application code.

Oso shines for ReBAC patterns (think Notion-style document sharing or GitHub-style repository permissions) and has a nicer DX than OPA for application authorization. Their Polar policy language compiles to SQL, so a list query like "show me every doc this user can read" stays performant. Pricing scales with usage.

Permit.io's superpower is the admin UI. If your customers (not your engineers) need to define their own roles and assign permissions, Permit.io ships that out of the box. It is the right pick when "tenant admin manages their own RBAC" is a feature. If your engineers own all role definitions, the admin UI is overhead you do not need.

A clean evaluation flow: start DIY, then if you are following API design best practices and your authorization logic is leaking into 30 endpoints, move to Oso or OPA. Move to Permit.io only when "self-serve role management" becomes a product requirement.

The multi-tenant gotcha (and how to avoid it)

This is the single most common RBAC bug in B2B SaaS: a user has the right role, the right permission, and the right action, but the resource they are touching belongs to a different tenant.

// BAD: checks role permission, never checks resource ownership
async function deleteProject(userId: string, projectId: string) {
  const project = await db.projects.findById(projectId);
  if (!(await can(userId, currentTenantId, "projects:delete"))) {
    throw new ForbiddenError();
  }
  await db.projects.delete(projectId);
  // Bug: project might belong to a different tenant entirely
}

The fix is two-fold. First, every query that returns a resource must be scoped by tenant_id in the WHERE clause: SELECT * FROM projects WHERE id = $1 AND tenant_id = $2. If the project does not belong to the current tenant, you get null, and your handler 404s. Second, every can() call passes the resource so the function can double-check tenant isolation, as in the example above.

Belt and suspenders. Postgres Row-Level Security (RLS) is a third layer if you want defense in depth. Supabase users get this for free; on raw Postgres it takes about a day to set up properly. The pattern that bites teams without RLS is a SQL injection or an ORM bug that bypasses application-level scoping; RLS catches that at the database.

The same logic applies if you are adding rate limiting to your API: scope the rate-limit key by tenant, not just by user, or one tenant's misuse degrades service for everyone.

Common pitfalls

A handful of patterns look correct in code review and break in production:

  • Wildcard permissions like "*:*" for the admin role. Looks clean. Means you cannot answer "who can delete a project?" without parsing every role definition. Enumerate permissions per role, even for admin.
  • Permission strings used directly in UI conditionals. if (user.permissions.includes("billing:read")) in 40 React components is the same scattered-logic problem you escaped on the backend. Wrap it: <Can do="billing:read">...</Can>.
  • Role-based UI hiding without backend enforcement. A hidden button is not a security control. Every backend handler must call can() regardless of whether the UI exposes the action.
  • Missing audit log on role changes. When a customer asks "who gave Bob admin access?" six months from now, granted_by and granted_at are the only two columns that save you.
  • Forgetting the cache invalidation path. A 60-second cache is fine until a security incident requires instant revocation. Plan the cache-bust on user_roles INSERT, UPDATE, and DELETE from day one. This matters more if you are working through SOC 2 audit preparation, where revocation timing is an explicit control.

The three signals you have outgrown RBAC

RBAC works until it does not. Here are the concrete leading indicators that you need ABAC or ReBAC layered on top:

  1. Role explosion. You have more distinct roles than tenants. If a 200-customer SaaS is running 800 roles, every conditional ("admin but only on Tuesdays from a managed device") spawned a new role. That is ABAC asking to be born.
  2. Resource-tree permissions. A folder grants access to nested files; a workspace grants access to nested projects; a parent comment grants access to its replies. RBAC cannot model the tree without flattening it. ReBAC (Oso, OpenFGA, SpiceDB) is the right primitive.
  3. Cross-tenant sharing. A user in tenant A needs read access to one specific document in tenant B. RBAC's tenant-scoped roles cannot express "this one resource is shared". ReBAC handles it natively as a relationship: viewer(user:alice, doc:contract-42).

If two of these three are true, stop adding roles. Start designing for ABAC or ReBAC. The migration is doable but it is real work, and the longer you wait the more code you have to refactor.

When you can skip RBAC entirely

A two-founder pre-revenue SaaS with one tenant and three internal users does not need this schema. A users.role enum with 'owner' and 'member' is fine. Build RBAC the week you sell to your first multi-seat customer, not before. The cost of premature authorization architecture is real: you spend a week designing for a future that may not arrive.

The trigger to invest is the day a customer asks "can my finance person see billing but not delete projects?" That question always comes. Be ready to ship the schema in a sprint when it does.

A practical next step

If you have already shipped a users.role enum and are feeling the cracks, the cleanest path is:

  1. Add the five tables alongside the existing column (do not drop it yet).
  2. Backfill: for every user, write a user_roles row matching their current enum value to a seeded role.
  3. Wrap every authorization check in can(), reading from the new tables.
  4. Once every handler is migrated and tested, drop the enum column.

The migration is a 1 to 2 week project for a senior engineer. If you do not have one in-house, every engineer on Cadence is AI-native by default (vetted on Cursor, Claude Code, and Copilot fluency before they unlock bookings), and a senior at $1,500 per week typically owns this kind of authz refactor end to end. Median time to first commit across the platform is 27 hours, with a 12,800-engineer pool, so you can have someone digging into your schema by the end of the week. The 48-hour free trial covers the discovery and migration plan before you commit.

You can also run your current auth setup through our ship-or-skip stack audit to get an honest grade on whether your authorization layer is the actual bottleneck or whether something else (rate limiting, DKIM and SPF setup, API design) is the higher-ROI fix this quarter.

Already nursing a users.role enum that everyone knows is a ticking bomb? Book a senior engineer on Cadence for the migration. Weekly billing, 48-hour free trial, replace the engineer any week if it is not a fit. The schema in this post takes 1 to 2 weeks to roll out cleanly.

FAQ

How long does it take to implement RBAC from scratch?

For a single product with one tenant model, plan one week for the schema, the can() function, and tests. Add another week for the admin UI to manage roles and another week if you are also moving from an enum-based model. Most teams ship in 2 to 3 weeks of focused work.

Should I use a library like CASL or build it myself?

Build the schema yourself. CASL, Casbin, and AccessControl are fine permission-check libraries, but they do not give you the database model, and the model is the part that matters. Once your schema is solid, the check function is small enough that wrapping it in a library is a preference, not a need.

How do I handle permission changes without bouncing user sessions?

Cache permissions in Redis with a short TTL (30 to 60 seconds) keyed by user_id plus tenant_id. On any user_roles write, delete the cache key. The user picks up new permissions on the next request, and you avoid invalidating their session token.

When should I add ABAC on top of RBAC?

When you find yourself creating roles whose only difference is a condition (time of day, device type, IP range, resource attribute). The third or fourth such role is the signal. Add ABAC as a thin policy layer on top of the RBAC check, not as a replacement.

Does RBAC work for multi-tenant SaaS?

Yes, if you scope every user-role assignment by tenant (the user_roles.tenant_id column in the schema above) and every resource query by tenant in the WHERE clause. The two together prevent the most common class of authorization bugs in B2B SaaS.

What is the difference between RBAC and ReBAC?

RBAC asks "what role does this user have?" ReBAC asks "what is the relationship between this user and this resource?" RBAC is faster to build and reason about; ReBAC handles document-sharing, folder hierarchies, and cross-tenant collaboration that RBAC cannot model cleanly. Most production systems use RBAC for the bulk of permissions and add ReBAC only for the resource trees that need it.

All posts