Python vs Ruby (YJIT) vs JS — which is cheaper to run for web workloads?

Latest runtimes, DB-backed web app. “Cheaper” = throughput, memory, and tail latency under realistic load. 8 stacks across Ruby, Python, and JavaScript (incl. Bun, Deno, NestJS, Rage). Run on 5 dedicated Fly performance-16x machines (16 vCPU each), so numbers are median [range] across hosts. Updated 2026-06-10. Two evidence streams: a controlled multi-host benchmark and a fact-checked web-research sweep.

TL;DR
  1. Async vs threaded matters more than the language. ~10× on I/O (Bun 79k vs Django 8k). Every async stack beats every threaded one.
  2. Bun wins I/O outright (79k median, on 16 vCPU). Among the ORM-based stacks, async Ruby (Rage, 48k) leads: 2× NestJS and roughly 4–6× the Python arms and Rails.
  3. More cores help the async stacks most. Doubling 8→16 vCPU scaled the async runtimes ~2× (Node, Rage, FastAPI) but the threaded ones only ~1.5–1.8×. At 16 cores, Rails edges Django on the ORM path (9.0k vs 8.1k), since ActiveRecord’s tax is ~0 while Django’s ORM costs 30%.
  4. The host matters as much as the runtime. The same stack swung 1.3–1.9× between Fly machines; the Intel-Xeon host ran ~1.5× faster than the AMD-EPYC ones. Absolute RPS is a range. The ordering held on every host.
  5. The ORM you pick can cost more than the language. ActiveRecord is nearly free (0–2%) and Django’s ORM is moderate (30%); Prisma and SQLAlchemy roughly halve throughput (48% / 57%).

Verdict: No stack wins every axis. Pick async first, then optimize your bottleneck: Bun for I/O throughput, Ruby+YJIT for templating, Python for RAM. Weigh the ORM, which can cost more than the runtime, and the cloud host, which moves the number as much as any of these.

Where each runtime shines

Each runtime’s per-call character across five axes (CPU + memory). Bigger = better (normalized so the best in each category = 100%). These traits are the same for every framework on that runtime; I/O throughput, the axis frameworks actually move, gets its own chart below. The shape tells the story: spiky where a runtime excels, pinched where it lags.

[Radar charts, one per runtime: Ruby · Bun · Node · Deno · Python]

Axes: Encode (json+jwt) · Template render · Crypto · Arithmetic · Memory. Measured on x86. Ruby spikes on Template (Erubi); Node/Deno (V8) on Encode + Arithmetic; Bun on Crypto (BoringSSL) but pinched elsewhere; Python is a Memory spike with decent Crypto.
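
The "best in each category = 100%" normalization can be sketched as below. This is a minimal sketch, assuming a simple best-over-value ratio for lower-is-better metrics; the exact formula behind the charts is not stated, and the function name is hypothetical. The input is the json-serialize row from the CPU table later in this report.

```python
# Assumed normalization (not confirmed by the source): for a lower-is-better
# metric such as ms per call, score = best / value * 100, so the fastest
# runtime lands on 100%.
def normalize_lower_is_better(values: dict[str, float]) -> dict[str, float]:
    best = min(values.values())
    return {name: round(best / v * 100, 1) for name, v in values.items()}

# json serialize, ms per call, taken from the CPU table in this report.
json_ms = {"Ruby": 0.036, "Bun": 0.047, "Node": 0.022, "Deno": 0.022, "Python": 0.106}

scores = normalize_lower_is_better(json_ms)
for name, score in scores.items():
    print(f"{name:<7}{score:>6.1f}%")
```

Under that assumption Node and Deno sit at 100% on this axis and the other runtimes shrink toward the center in proportion to how much slower they are.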

Where frameworks diverge: throughput as you’d actually write it

The radars above are runtime traits: per-call CPU and memory are the same for any framework on a given runtime. The axis a framework actually moves is DB-read concurrency. This chart shows idiomatic throughput: each stack fetches its row through the ORM you’d really use (ActiveRecord, Django ORM, SQLAlchemy, Prisma; the raw JS arms hand-map, which is idiomatic there). Median of 5 hosts, ~10× spread from Django to Bun.

Legend: JavaScript · Ruby · Python. Bar = throughput through the ORM, label = serving RAM (PSS). The raw JS arms (Bun/Deno/Node) have no ORM, so they keep raw speed. Among stacks that do use an ORM, async Ruby (Rage + ActiveRecord, 48k) is the clear leader: 2× NestJS (Prisma 23k) and 4–6× the FastAPI / Rails / Django cluster (8–13k). Prisma and SQLAlchemy sink Nest and FastAPI from their fast raw ceilings; Django’s ORM tax drops it to the floor. Bun leads outright (79k). NestJS+Prisma is the RAM hog (2.6 GB) while Django stays leanest (605 MB).

Part 1: Multi-host benchmark of 8 stacks across 3 languages, identical work

Eight stacks across Ruby, Python, and JavaScript, each doing provably identical work (verified byte-identical before benchmarking). Endpoints: one DB-backed read (I/O), the same read via each framework’s ORM, and five named CPU tasks.
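
A byte-identical check of this kind can be sketched by hashing each stack's response body and comparing against a reference. This is an illustrative sketch, not the harness's scripts/verify.sh; the function names and the sample bodies are hypothetical, and a real check would fetch the bodies over HTTP.

```python
import hashlib

def digest(body: bytes) -> str:
    """SHA-256 hex digest of a raw response body."""
    return hashlib.sha256(body).hexdigest()

def mismatched(bodies: dict[str, bytes], reference: str) -> list[str]:
    """Names of stacks whose body differs from the reference stack's body."""
    want = digest(bodies[reference])
    return [name for name, body in bodies.items() if digest(body) != want]

# Hypothetical captured bodies for one endpoint.
bodies = {
    "bun": b'{"id":42,"name":"widget-42"}',
    "rails": b'{"id":42,"name":"widget-42"}',
    "django": b'{"id": 42, "name": "widget-42"}',  # same JSON, different bytes
}
print(mismatched(bodies, reference="bun"))  # ['django']
```

Comparing raw bytes rather than parsed JSON is the stricter test: it catches differences in key order, whitespace, and number formatting that would otherwise change the serialization work each stack does.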

Stack              | Language / engine           | Model          | Server + driver · ORM
Bun.serve          | Bun 1.3.14 · JavaScriptCore | event loop     | native + Bun SQL · (hand-mapped)
Deno.serve         | Deno 2.6.6 · V8             | event loop     | native + node-postgres 8.21 · (hand-mapped)
node:http          | Node 26.0.0 · V8            | event loop     | native + node-postgres 8.21 · (hand-mapped)
NestJS 10          | Node 26.0.0 · V8            | event loop     | Express 5 + node-postgres 8.21 · Prisma 6.19.3
Rage 1.25          | Ruby 4.0.5 +YJIT            | async (fibers) | Iodine + pg 1.6 · ActiveRecord 8.1.3
FastAPI 0.136      | Python 3.14.5               | async          | uvicorn 0.49 + asyncpg 0.31 · SQLAlchemy 2.0.50
Rails 8.1.3 + Puma | Ruby 4.0.5 +YJIT            | threaded       | Puma 8.0 + pg 1.6 · ActiveRecord 8.1.3
Django 6.0.6       | Python 3.14.5               | threaded       | gunicorn 26.0 + psycopg3 3.3 · Django ORM

Rig: 5 Fly performance-16x machines (16 dedicated vCPU, 32 GB, Linux x86_64): 1 Intel Xeon 2.3 GHz, 4 AMD EPYC. Postgres on the same machine (loopback), 1000-row table. Load: oha 1.14, 200 concurrent connections, 16 workers/arm, 4 threads where applicable, 3×30 s rounds per host; reported number is the median of the 5 hosts. All arms run together in one session per host, so each host is a fair within-machine comparison. YJIT on for all Ruby arms; CPython GIL-on / JIT-off (production default). Every arm: 0 errors, output verified byte-identical.
Versions (pinned): Bun 1.3.14 · Node 26.0.0 · Deno 2.6.6 · Ruby 4.0.5 (+YJIT, 2026-05-20) · CPython 3.14.5 · Postgres 17.10. Rails 8.1.3 / Puma 8.0.2 · Rage 1.25.0 · FastAPI 0.136.3 / uvicorn 0.49.0 / uvloop 0.22.1 · Django 6.0.6 / gunicorn 26.0.0 · NestJS 10 / Express 5. Drivers: node-postgres 8.21.0, asyncpg 0.31.0, psycopg3 3.3.4, pg gem 1.6.3, Bun SQL. ORMs: ActiveRecord 8.1.3, SQLAlchemy 2.0.50, Django ORM, Prisma 6.19.3. Roda was dropped: its idiomatic Sequel ORM corrupts under fibers and the pg gem segfaults under Puma’s fork, so it had no fair multi-process config (see caveats).

Throughput: GET /widgets-orm/42, through each stack’s ORM

Headline = RPS through the ORM you’d ship (the raw JS arms hand-map, idiomatic there). range = min–max across 5 hosts; tax = throughput lost vs the raw driver; tail = p99÷p50, the SLA story. Median of 5 Fly performance-16x hosts, 200 conc.

Stack                            | RPS with ORM | range (5 hosts) | raw    | ORM tax | RAM PSS (MB) | tail p99/p50
Bun.serve (JSC)                  | 78,974       | 66–123k         | 78,726 | ~0%     | 1,069        | 5.6×
node:http (V8)                   | 57,359       | 49–76k          | 56,898 | ~0%     | 847          | 1.7×
Deno.serve (V8)                  | 55,302       | 48–64k          | 56,045 | ~1%     | 1,052        | 4.4×
Rage (Ruby, ActiveRecord)        | 47,927       | 40–60k          | 47,809 | ~0%     | 893          | 22×
NestJS (Prisma)                  | 23,312       | 19–29k          | 45,186 | 48%     | 2,581        | 2.9×
FastAPI (SQLAlchemy)             | 12,666       | 11–16k          | 29,437 | 57%     | 1,101        | 3.7×
Rails + Puma (Ruby, ActiveRecord)| 8,980        | 7–11k           | 9,209  | 2%      | 1,028        | 1.7×
Django (Django ORM)              | 8,086        | 7–10k           | 11,543 | 30%     | 605          | 3.6×
🏆 Bun 79k · Rage 48k leads the ORM-based stacks: 2× NestJS, roughly 4–6× the rest · Django 8k floor; Prisma/SQLAlchemy halve Nest/FastAPI

Below the raw JS arms, the ORM is the story. ActiveRecord is nearly free (Rails 2%, Rage ~0%), so async Ruby keeps 48k; Django’s ORM costs 30%; Prisma cuts Nest 48% and SQLAlchemy cuts FastAPI 57%, dropping them from fast raw ceilings (45k, 29k) to 23k and 13k. The raw JS arms’ ~0% tax confirms the test is clean: identical work, the delta is the ORM. Tails: Node and Rails steadiest (1.7×); Rage’s is the worst (22× its median). Its fiber scheduler trades a long tail for a high median.
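
The ORM tax is plain arithmetic on the two throughput columns, and can be rechecked with a few lines. A minimal sketch using the medians from the throughput table; the function name is hypothetical.

```python
# (RPS through the ORM, RPS on the raw driver), medians from the table above.
results = {
    "NestJS + Prisma": (23_312, 45_186),
    "FastAPI + SQLAlchemy": (12_666, 29_437),
    "Django ORM": (8_086, 11_543),
    "Rails + ActiveRecord": (8_980, 9_209),
}

def orm_tax(orm_rps: float, raw_rps: float) -> float:
    """Share of raw-driver throughput lost by going through the ORM."""
    return 1 - orm_rps / raw_rps

for stack, (orm_rps, raw_rps) in results.items():
    print(f"{stack:<22}{orm_tax(orm_rps, raw_rps):6.1%}")
```

The results land within rounding of the table's tax column, so the tax figures follow directly from the published medians rather than from a separate measurement.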

How Rage runs ActiveRecord async. Rage serves on Iodine, a fiber-based reactor. Plain ActiveRecord blocks a thread for the whole query, which would stall a reactor, so Rage ships a fiber-aware connection pool plus a fiber scheduler: a blocking pg call yields its fiber and Iodine serves other requests while Postgres works. That overlap is why Rage holds 48k where threaded Rails sits at 9k, on the same ActiveRecord API. Falcon (the other async Ruby server) hits the wall this sidesteps: Rails 7’s pool is fiber-safe (PR #44219), but libpq still blocks the thread, so without Rage’s scheduler integration the queries don’t overlap. The cost is the tail above: fibers multiplexed on one thread queue behind each other, so p99 stretches to 22× the median.
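
The effect of yielding during a database wait can be illustrated with a toy model. This is an analogy in Python's asyncio, not Rage's or Iodine's implementation: `asyncio.sleep` stands in for a query that yields while Postgres works, `time.sleep` stands in for a call that blocks its worker, and the 50 ms wait is an assumed value.

```python
import asyncio
import time

QUERY_SECONDS = 0.05  # assumed stand-in for time spent waiting on Postgres

async def handle_request() -> None:
    # Awaiting hands control back to the event loop, much as a fiber
    # yields on a blocking call under a fiber scheduler.
    await asyncio.sleep(QUERY_SECONDS)

async def serve_overlapped(n: int) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(handle_request() for _ in range(n)))
    return time.perf_counter() - start

def serve_blocking(n: int) -> float:
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(QUERY_SECONDS)  # a single worker is stuck for the whole wait
    return time.perf_counter() - start

overlapped = asyncio.run(serve_overlapped(20))
blocking = serve_blocking(20)
print(f"overlapped: {overlapped:.2f}s  blocking: {blocking:.2f}s")
```

With 20 requests the overlapped run finishes in roughly one wait, while the blocking run takes roughly twenty. A threaded server narrows that gap by adding threads, but each thread still holds a slot for the full duration of its query.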

The host you land on moves the number 1.3–1.9×

We ran the full 8-arm suite on 5 separate Fly performance-16x machines (1 Intel Xeon 2.3 GHz, 4 AMD EPYC), same image, same config. The Intel host ran ~1.5× faster than the AMD ones on every stack: Bun 66k–123k, Node 49k–76k, Rails 7k–11k. That swing is wider than several of the runtime gaps we’re measuring. The relative ordering was identical on all 5 hosts, so the rankings hold, but read any single absolute RPS as roughly ±30%. Every number here is the median of the 5. The lesson for cloud-cost benchmarks: the CPU you happen to be scheduled on can matter as much as the framework you chose.
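
The host-to-host swing can be recomputed from the min and max columns of the throughput table. A minimal sketch; the ranges are the rounded values reported there, so the ratios are approximate.

```python
# (min, max) RPS in thousands across the 5 hosts, from the throughput table.
host_range_k = {
    "Bun": (66, 123), "Node": (49, 76), "Deno": (48, 64), "Rage": (40, 60),
    "NestJS": (19, 29), "FastAPI": (11, 16), "Rails": (7, 11), "Django": (7, 10),
}

# Swing = fastest host divided by slowest host for the same stack.
swing = {stack: hi / lo for stack, (lo, hi) in host_range_k.items()}

for stack, ratio in sorted(swing.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stack:<8}{ratio:.2f}x")
print(f"overall: {min(swing.values()):.2f}x to {max(swing.values()):.2f}x")
```

The smallest and largest ratios come from Deno and Bun respectively, which is where the 1.3–1.9× figure comes from.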

More cores help the async stacks most

Doubling the machine from 8 to 16 vCPU (same image, workers scaled to cores) scaled the async runtimes close to linearly: Node 2.1×, Rage 2.1×, FastAPI 2.0×, Deno 1.9×, Bun 1.8×. The threaded stacks lagged: Rails 1.8×, Django 1.5×. So a bigger box widens the async lead rather than letting the threaded stacks catch up. One reversal: at 16 cores Rails passes Django on the ORM path (9.0k vs 8.1k), because ActiveRecord’s ~0% tax scales clean while Django’s 30% ORM tax does not. On raw I/O, Django still leads. Falling short of 2× is expected: the shared Postgres and memory bandwidth take the rest.
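
One way to read these factors is as scaling efficiency: the measured speedup divided by the increase in cores. A minimal sketch using the factors reported above; the efficiency framing is an added convenience, not a metric from the benchmark.

```python
# Speedup from doubling 8 -> 16 vCPU, as reported above.
speedup = {"Node": 2.1, "Rage": 2.1, "FastAPI": 2.0, "Deno": 1.9,
           "Bun": 1.8, "Rails": 1.8, "Django": 1.5}
CORE_RATIO = 16 / 8

# 1.0 means perfectly linear scaling with core count.
efficiency = {stack: factor / CORE_RATIO for stack, factor in speedup.items()}

for stack, eff in efficiency.items():
    print(f"{stack:<8}{speedup[stack]:.1f}x  {eff:.0%} of linear")
```

Values slightly above 100% are plausible within host-to-host noise; the clearer signal is Django at 75% of linear.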

CPU-bound: the tasks a real SaaS app actually runs

Timing CPU work over HTTP mostly measures queueing under load, so each runtime instead times the identical operation in-process, with no server, using each language’s real library (template engines Erubi/EJS/Jinja2, native crypto/json). Tasks are ordered by how often a JSON-API backend runs them: serialization and auth on every request, templates for server-rendered apps, password hashing only at login, arithmetic almost never (the cleanest VM signal, rarely the real load). The Winner column names each row’s winner.

Speed: ms per call (lower better)

Task / frequency                             | Ruby  | Bun (JSC) | Node (V8) | Deno (V8) | Python 3.14 | Winner
json serialize · every request               | 0.036 | 0.047     | 0.022     | 0.022     | 0.106       | V8 (Python last)
jwt HS256 sign · every authed request        | 0.015 | 0.004     | 0.004     | 0.015     | 0.011       | Bun/Node (all sub-15µs)
template HTML render · server-rendered apps  | 0.085 | 0.254     | 0.230     | 0.242     | 0.256       | Ruby (2.7×)
crypto PBKDF2 · login only                   | 13.5  | 8.4       | 15.8      | 13.1      | 10.5        | Bun (BoringSSL)
loop arithmetic · rare (VM reference)        | 11.0  | 22.2      | 2.67      | 2.66      | 16.5        | V8
🏆 Ruby wins template (Erubi); V8 wins json + arithmetic · Bun wins crypto, ties jwt · Python slowest json; Bun 8× slowest on the loop
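
An in-process timing loop of the kind described above can be sketched as follows. This is not the harness's scripts/cpu_micro.py; the function name, batch sizes, and payload are hypothetical, and only the general approach (time the library call directly, no server in the path) comes from the text.

```python
import json
import statistics
import time

def ms_per_call(fn, calls: int = 2_000, repeats: int = 5) -> float:
    """Median over `repeats` batches of the mean wall time per call, in ms."""
    fn()  # warm-up call so one-time setup is not timed
    batches = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(calls):
            fn()
        batches.append((time.perf_counter() - start) / calls * 1000)
    return statistics.median(batches)

# Hypothetical payload; the benchmark's canonical JSON body is 9098 B.
payload = {"widgets": [{"id": i, "name": f"widget-{i}", "price": i * 1.5} for i in range(100)]}

print(f"json serialize: {ms_per_call(lambda: json.dumps(payload)):.4f} ms/call")
```

Taking the median of several batches keeps a single GC pause or scheduler hiccup from skewing the figure, which matters when the operation takes tens of microseconds.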

Peak RAM: MB per task (lower better)

Task     | Ruby | Bun (JSC) | Node (V8) | Deno (V8) | Python 3.14 | Winner
json     | 52   | 87        | 57        | 66        | 21          | Python (Ruby 2× here)
jwt      | 26   | 112       | 60        | 84        | 21          | Python
template | 24   | 92        | 64        | 74        | 21          | Python (Ruby 2nd)
crypto   | 24   | 44        | 52        | 60        | 21          | Python
loop     | 24   | 44        | 57        | 65        | 21          | Python
🏆 Python 21 MB flat, every task · Ruby lean (~24 MB), except json (52) · Bun heaviest (44–112); Node/Deno 52–84

Two memory metrics, both measured. Per-task RAM (above): peak RSS of a process running one task in isolation, median of 5 hosts. Python is flat at its ~21 MB interpreter base; V8/JSC reserve large JIT code caches and GC heaps even for trivial work. Server RAM (throughput table): PSS (proportional set size) of the whole worker tree, which splits shared copy-on-write pages across forked workers, so it reports real physical RAM rather than a per-worker over-count. Forked Python (Django) is low partly from that sharing. One caveat: CPython’s refcounting slowly privatizes shared pages, so a long-lived process drifts above this under-load sample. Outputs verified canonical: json 9098 B, jwt 179-char token, crypto hex, loop 400001.
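
Summing PSS over a worker tree can be sketched as below. This assumes Linux, where /proc/<pid>/smaps_rollup exposes a `Pss:` line in kB; it is not the harness's measurement script, the function names are hypothetical, and the sample text uses made-up values in that file's layout.

```python
from pathlib import Path

def pss_kb(smaps_rollup_text: str) -> int:
    """Extract the Pss value (kB) from the text of /proc/<pid>/smaps_rollup."""
    for line in smaps_rollup_text.splitlines():
        if line.startswith("Pss:"):  # skips Pss_Anon, Pss_File, and similar
            return int(line.split()[1])
    raise ValueError("no Pss line found")

def tree_pss_mb(pids: list[int]) -> float:
    """Sum PSS over a worker tree (Linux only; needs permission to read /proc)."""
    total_kb = sum(pss_kb(Path(f"/proc/{pid}/smaps_rollup").read_text()) for pid in pids)
    return total_kb / 1024

# Illustrative excerpt in the smaps_rollup layout; the values are made up.
sample = "Rss:               61440 kB\nPss:               38912 kB\nPss_Anon:          30000 kB\n"
print(pss_kb(sample) / 1024, "MB")  # 38.0 MB
```

Because PSS divides each shared page among the processes mapping it, summing it across a master and its forked workers does not double-count copy-on-write pages the way summing RSS would.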
Template uses each language’s real engine (Ruby Erubi, JS EJS, Python Jinja2), not hand-rolled string-building. A naive Ruby concat with 4×gsub escaping measured the escape function rather than the engine; Erubi’s compiled output dropped it from last place to first. Native escaping differs (Erubi 7778 B with &quot;, EJS/Jinja2 7578 B with &#34;): same logical HTML.
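
How two engines can emit different byte counts for the same logical HTML is easy to show with the double quote, which has both a named and a numeric entity. A minimal sketch using Python's standard library; attributing the whole 200-byte gap to escaped quotes is an assumption, not something the benchmark states.

```python
import html

quote = '"'
named = html.escape(quote)    # the standard library emits the named entity
numeric = "&#34;"             # the numeric entity for the same character

print(named, len(named))      # &quot; 6
print(numeric, len(numeric))  # &#34; 5

# Both decode to the same character, so the logical HTML is identical.
assert html.unescape(named) == html.unescape(numeric) == quote

# If escaped double quotes account for the whole size gap (an assumption),
# the gap equals the number of quotes in the rendered page.
print(7778 - 7578)  # 200
```

The one-byte difference per quote is why output was compared as logical HTML for the template task rather than byte for byte.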

What the matrix shows

  1. Concurrency model dominates I/O. Every async arm beats both threaded arms; Bun (79k) is ~10× Django (8k). The lever is overlapping DB waits, which is language-agnostic.
  2. Bun (JavaScriptCore) wins I/O outright. The raw JS arms skip the ORM, so they keep raw speed (idiomatic for JS). On RAM, Django is leanest serving (605 MB); Bun’s reuse-port processes don’t share, so it sits mid-pack at 16 workers.
  3. Among stacks that use an ORM, async Ruby leads. Rage (fiber-aware ActiveRecord on Iodine) holds 48k: 2× NestJS/Prisma and roughly 4–6× FastAPI, Rails, and Django. Its one weakness is a long tail (p99 22× median).
  4. The ORM is the real lever below the JS arms. ActiveRecord is nearly free (0–2%) and Django’s ORM costs 30%, while Prisma cuts Nest 48% and SQLAlchemy cuts FastAPI 57%. Pick the ORM as carefully as the framework.
  5. Cores and host are levers too. Doubling to 16 vCPU scaled async ~2× and threaded ~1.5–1.8×. Between machines the same stack swung 1.3–1.9× (Intel ~1.5× AMD). Rankings held; absolute RPS is a range.
  6. CPU depends on the operation (x86, measured in-process): Ruby+YJIT wins templating, V8 (Node/Deno) wins json and arithmetic, Bun wins crypto. Python is leanest per task. Use the real library: a naive Ruby template measured the escape function rather than the engine.

Caveats

Part 2: Web research (multi-source, fact-checked)

A 100-agent deep-research sweep with adversarial verification. Meta-finding worth stating: no rigorous public head-to-head Ruby-vs-Python-vs-Node cloud-cost benchmark exists. The hard production numbers are Shopify/YJIT first-party (Ruby side). That gap is why the multi-host matrix earns its keep.

YJIT speedup on real Ruby/Rails web workloads

YJIT memory overhead

CPython 3.13/3.14: does the new JIT or free-threading help web apps?

Framework throughput context (TechEmpower R22/23)

Async micro-stacks (Bun, Starlette/FastAPI) rank far above full-stack frameworks on JSON; Django and Rails cluster together well below them. Consistent with the matrix: the async/sync split outweighs the language.

Bottom line for cost

  1. Choose the concurrency model first. Async/event-loop buys ~10× the throughput on I/O-bound JSON → far fewer, smaller instances. Dominant lever, language-agnostic.
  2. I/O through the ORM: Bun is cheapest (79k, JavaScriptCore). The raw JS arms lead because JS skips heavy ORMs. Among stacks that use an ORM, async Ruby (Rage, 48k) wins: 2× NestJS and roughly 4–6× FastAPI, Rails, and Django.
  3. The ORM can cost more than the language. Prisma cuts NestJS 48% and SQLAlchemy cuts FastAPI 57%; ActiveRecord is nearly free and Django’s ORM costs 30%. Prisma also makes Nest the RAM hog (2.6 GB).
  4. Cores and host both move the number. Doubling to 16 vCPU scaled the async stacks ~2× (threaded ~1.5–1.8×), so a bigger box widens the async lead. Between machines the same stack swung 1.3–1.9× (Intel ~1.5× AMD). Treat any single absolute as a range.
  5. CPU-bound: pick the runtime for the operation (x86). V8 (Node/Deno) wins arithmetic and json; Ruby+YJIT wins HTML templating; Bun wins crypto. Or push heavy compute to native libs and it barely matters.
  6. Memory: Python is leanest, 21 MB per CPU task and 605 MB serving (PSS, leanest of the arms at 16 workers). NestJS+Prisma is the RAM hog (2.6 GB). CPython 3.13/3.14’s JIT and free-threading don’t cut I/O web cost yet.
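
To turn throughput into a rough machine count, divide a target load by the usable throughput per machine. A back-of-envelope sketch: the median RPS values come from the throughput table, while the 100,000 RPS target and the 50% headroom factor are hypothetical inputs chosen only for illustration.

```python
import math

# Median RPS per 16-vCPU machine through the ORM, from the throughput table.
median_rps = {"Bun": 78_974, "Rage": 47_927, "NestJS": 23_312,
              "FastAPI": 12_666, "Rails": 8_980, "Django": 8_086}

TARGET_RPS = 100_000  # hypothetical peak load
HEADROOM = 0.5        # hypothetical: run each machine at 50% of measured peak

def machines_needed(rps_per_machine: float) -> int:
    """Machines required to serve TARGET_RPS at the chosen headroom."""
    return math.ceil(TARGET_RPS / (rps_per_machine * HEADROOM))

for stack, rps in median_rps.items():
    print(f"{stack:<8}{machines_needed(rps):>3} machines")
```

Under these assumptions the count runs from 3 machines for Bun to 25 for Django. Real endpoints do more than a single-row read, so the ratios between stacks are more transferable than the absolute counts.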

One-sentence answer: the async-vs-threaded choice matters more than the language, Bun leads I/O outright while async Ruby (Rage) leads the ORM-based stacks, the ORM can cost more than the runtime, and the cloud host you land on swings the number as much as any of these. So “cheapest” depends on your bottleneck (I/O, a CPU operation, the ORM, or RAM) and a bit on luck.

Reproduce

Open-source harness in the standalone runtime-bench repo (MIT): SPEC.md (the shared contract), apps/ (8 server apps + ORM endpoints), scripts/run_matrix.sh + servers.sh + run_one.sh (boot + warmup + oha + PSS memory), scripts/verify.sh (proves identical output), scripts/cpu_micro.{js,py,rb} + mem_micro.sh (per-call CPU and per-task RAM), setup.sh (provisions every stack from scratch), and a fly/ Dockerfile that runs the whole suite on a dedicated machine. DB langbench, 1000-row widgets. Run: SCEN=io bash scripts/run_matrix.sh · SCEN=orm ….