Python vs Ruby (YJIT) vs JS — which is cheaper to run for web workloads?
Latest runtimes, DB-backed web app. “Cheaper” = throughput, memory, and tail latency under realistic load. 8 stacks across Ruby, Python, and JavaScript (incl. Bun, Deno, NestJS, Rage). Run on 5 dedicated Fly performance-16x machines (16 vCPU each), so numbers are median [range] across hosts. Updated 2026-06-10. Two evidence streams: a controlled multi-host benchmark and a fact-checked web-research sweep.
- Async vs threaded matters more than the language. ~10× on I/O (Bun 79k vs Django 8k). Every async stack beats every threaded one.
- Bun wins I/O outright (79k median, on 16 vCPU). Among the ORM-based stacks, async Ruby (Rage, 48k) leads: 2× NestJS and roughly 4–6× the Python arms and Rails.
- More cores help the async stacks most. Doubling 8→16 vCPU scaled the async runtimes ~2× (Node, Rage, FastAPI) but the threaded ones only ~1.5–1.8×. At 16 cores, Rails edges Django on the ORM path (9.0k vs 8.1k), since ActiveRecord’s tax is ~0 while Django’s ORM costs 30%.
- The host matters as much as the runtime. The same stack swung 1.3–1.9× between Fly machines; the Intel-Xeon host ran ~1.5× faster than the AMD-EPYC ones. Absolute RPS is a range. The ordering held on every host.
- The ORM you pick can cost more than the language. ActiveRecord is nearly free (0–2%) and Django’s ORM is moderate (30%); Prisma and SQLAlchemy roughly halve throughput (48% / 57%).
Verdict: No stack wins every axis. Pick async first, then optimize your bottleneck: Bun for I/O throughput, Ruby+YJIT for templating, Python for RAM. Weigh the ORM, which can cost more than the runtime, and the cloud host, which moves the number as much as any of these.
Where each runtime shines
Each runtime’s per-call character across five axes (CPU + memory). Bigger = better (normalized so the best in each category = 100%). These traits are the same for every framework on that runtime; I/O throughput, the axis frameworks actually move, gets its own chart below. The shape tells the story: spiky where a runtime excels, pinched where it lags.
[Radar charts, one per runtime: Ruby · Bun · Node · Deno · Python]
Axes: Encode (json+jwt) · Template render · Crypto · Arithmetic · Memory. Measured on x86. Ruby spikes on Template (Erubi); Node/Deno (V8) on Encode + Arithmetic; Bun on Crypto (BoringSSL) but pinched elsewhere; Python is a Memory spike with decent Crypto.
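The "best in each category = 100%" scaling can be sketched as follows. This assumes a simple best/value ratio for lower-is-better metrics, which is a guess at the chart's method, and it uses the per-call times from the "Speed: ms per call" table further down. How the Encode axis combines json and jwt is not stated, so only single-task axes are shown.

```python
# Per-call times in ms from the "Speed: ms per call" table (lower is better).
TEMPLATE_MS = {"Ruby": 0.085, "Bun": 0.254, "Node": 0.230, "Deno": 0.242, "Python": 0.256}
LOOP_MS = {"Ruby": 11.0, "Bun": 22.2, "Node": 2.67, "Deno": 2.66, "Python": 16.5}


def normalize_lower_is_better(values: dict[str, float]) -> dict[str, float]:
    """Score each runtime as best/value * 100, so the fastest runtime gets 100.

    Assumption: this ratio is one plausible normalization, not confirmed as the chart's.
    """
    best = min(values.values())
    return {name: round(100 * best / value, 1) for name, value in values.items()}


template_scores = normalize_lower_is_better(TEMPLATE_MS)
loop_scores = normalize_lower_is_better(LOOP_MS)
print("template:", template_scores)
print("loop:", loop_scores)
```

Under this scaling Ruby scores 100 on Template with the other runtimes in the mid-30s, which is the "spike" the caption describes.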
Where frameworks diverge: throughput as you’d actually write it
The radars above are runtime traits: per-call CPU and memory are the same for any framework on a given runtime. The axis a framework actually moves is DB-read concurrency. This chart is the idiomatic throughput, each stack’s row fetched through the ORM you’d really use (ActiveRecord, Django ORM, SQLAlchemy, Prisma; the raw JS arms hand-map, which is idiomatic there). Median of 5 hosts, ~10× spread from Django to Bun.
■ JavaScript ■ Ruby ■ Python. Bar = throughput through the ORM, label = serving RAM (PSS). The raw JS arms (Bun/Deno/Node) have no ORM, so they keep raw speed. Among stacks that do use an ORM, async Ruby (Rage + ActiveRecord, 48k) is the clear leader: 2× NestJS (Prisma 23k) and 4–6× the FastAPI / Rails / Django cluster (8–13k). Prisma and SQLAlchemy sink Nest and FastAPI from their fast raw ceilings; Django’s ORM tax drops it to the floor. Bun leads outright (79k). NestJS+Prisma is the RAM hog (2.6 GB) while Django stays leanest (605 MB).
Part 1: Multi-host benchmark of 8 stacks across 3 languages, identical work
Eight stacks across Ruby, Python, and JavaScript, each doing provably identical work (verified byte-identical before benchmarking). Endpoints: one DB-backed read (I/O), the same read via each framework’s ORM, and five named CPU tasks.
- GET /widgets/:id → one indexed Postgres row → JSON (raw driver / hand-mapped)
- GET /widgets-orm/:id → same row via the framework’s ORM (the ORM-tax test, below)
- /cpu/json · /cpu/jwt · /cpu/template · /cpu/crypto · /cpu/loop: five named CPU tasks (below)
| Stack | Language / engine | Model | Server + driver · ORM |
|---|---|---|---|
| Bun.serve | Bun 1.3.14 · JavaScriptCore | event loop | native + Bun SQL · (hand-mapped) |
| Deno.serve | Deno 2.6.6 · V8 | event loop | native + node-postgres 8.21 · (hand-mapped) |
| node:http | Node 26.0.0 · V8 | event loop | native + node-postgres 8.21 · (hand-mapped) |
| NestJS 10 | Node 26.0.0 · V8 | event loop | Express 5 + node-postgres 8.21 · Prisma 6.19.3 |
| Rage 1.25 | Ruby 4.0.5 +YJIT | async (fibers) | Iodine + pg 1.6 · ActiveRecord 8.1.3 |
| FastAPI 0.136 | Python 3.14.5 | async | uvicorn 0.49 + asyncpg 0.31 · SQLAlchemy 2.0.50 |
| Rails 8.1.3 + Puma | Ruby 4.0.5 +YJIT | threaded | Puma 8.0 + pg 1.6 · ActiveRecord 8.1.3 |
| Django 6.0.6 | Python 3.14.5 | threaded | gunicorn 26.0 + psycopg3 3.3 · Django ORM |
Rig: 5 Fly performance-16x machines (16 dedicated vCPU, 32 GB, Linux x86_64): 1 Intel Xeon 2.3 GHz, 4 AMD
EPYC. Postgres on the same machine (loopback), 1000-row table. Load: oha 1.14, 200 concurrent connections, 16
workers/arm, 4 threads where applicable, 3×30 s rounds per host; reported number is the median of the 5
hosts. All arms run together in one session per host, so each host is a fair within-machine comparison. YJIT on for all Ruby
arms; CPython GIL-on / JIT-off (production default). Every arm: 0 errors, output verified byte-identical.
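A minimal sketch of what a byte-identical check can look like: hash each arm's response body and flag any arm that differs from the first. The function names and sample bodies are hypothetical; the harness's own scripts/verify.sh may work differently.

```python
import hashlib


def digest(body: bytes) -> str:
    """SHA-256 hex digest of a raw response body."""
    return hashlib.sha256(body).hexdigest()


def mismatched_arms(bodies: dict[str, bytes]) -> list[str]:
    """Return the arms whose body hash differs from the first arm's."""
    reference = digest(next(iter(bodies.values())))
    return [arm for arm, body in bodies.items() if digest(body) != reference]


# Hypothetical captured bodies for one endpoint.
sample = {
    "bun": b'{"id":42,"name":"widget-42"}',
    "rails": b'{"id":42,"name":"widget-42"}',
    "django": b'{"id": 42, "name": "widget-42"}',  # same JSON, different bytes
}
print(mismatched_arms(sample))  # ['django']
```

The Django sample shows why a byte-level check is stricter than comparing parsed JSON: default serializer spacing alone changes the payload size each arm has to write.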
Versions (pinned): Bun 1.3.14 · Node 26.0.0 · Deno 2.6.6 · Ruby 4.0.5 (+YJIT, 2026-05-20) · CPython 3.14.5 · Postgres 17.10.
Rails 8.1.3 / Puma 8.0.2 · Rage 1.25.0 · FastAPI 0.136.3 / uvicorn 0.49.0 / uvloop 0.22.1 · Django 6.0.6 / gunicorn 26.0.0 · NestJS 10 / Express 5.
Drivers: node-postgres 8.21.0, asyncpg 0.31.0, psycopg3 3.3.4, pg gem 1.6.3, Bun SQL. ORMs: ActiveRecord 8.1.3, SQLAlchemy 2.0.50, Django ORM, Prisma 6.19.3.
Roda was dropped: its idiomatic Sequel ORM corrupts under fibers and the pg gem segfaults under Puma’s fork, so it had no fair
multi-process config (see caveats).
Throughput: GET /widgets-orm/42, through each stack’s ORM
Headline = RPS through the ORM you’d ship (the raw JS arms hand-map, idiomatic there). range = min–max across 5 hosts; tax = throughput lost vs the raw driver; tail = p99÷p50, the SLA story. Median of 5 Fly performance-16x hosts, 200 conc.
| Stack | RPS with ORM | range 5 hosts | raw | ORM tax | RAM PSS | tail p99/p50 |
|---|---|---|---|---|---|---|
| Bun.serve JSC | 78,974 | 66–123k | 78,726 | ~0% | 1,069 | 5.6× |
| node:http V8 | 57,359 | 49–76k | 56,898 | ~0% | 847 | 1.7× |
| Deno.serve V8 | 55,302 | 48–64k | 56,045 | ~1% | 1,052 | 4.4× |
| Rage Ruby ActiveRecord | 47,927 | 40–60k | 47,809 | ~0% | 893 | 22× |
| NestJS Prisma | 23,312 | 19–29k | 45,186 | 48% | 2,581 | 2.9× |
| FastAPI SQLAlchemy | 12,666 | 11–16k | 29,437 | 57% | 1,101 | 3.7× |
| Rails + Puma Ruby ActiveRecord | 8,980 | 7–11k | 9,209 | 2% | 1,028 | 1.7× |
| Django Django ORM | 8,086 | 7–10k | 11,543 | 30% | 605 | 3.6× |
Below the raw JS arms, the ORM is the story. ActiveRecord is nearly free (Rails 2%, Rage ~0%), so async Ruby keeps 48k; Django’s ORM costs 30%; Prisma cuts Nest 48% and SQLAlchemy cuts FastAPI 57%, dropping them from fast raw ceilings (45k, 29k) to 23k and 13k. The raw JS arms’ ~0% tax confirms the test is clean: identical work, the delta is the ORM. Tails: Node and Rails steadiest (1.7×); Rage’s is the worst (22× its median). Its fiber scheduler trades a long tail for a high median.
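The tax column can be recomputed from the two RPS columns. A short sketch, assuming tax is simply the fraction of raw-driver throughput lost (values copied from the table above; the stack labels are shortened):

```python
# (RPS with ORM, raw RPS) from the throughput table.
RESULTS = {
    "Bun.serve": (78_974, 78_726),
    "node:http": (57_359, 56_898),
    "Deno.serve": (55_302, 56_045),
    "Rage": (47_927, 47_809),
    "NestJS": (23_312, 45_186),
    "FastAPI": (12_666, 29_437),
    "Rails + Puma": (8_980, 9_209),
    "Django": (8_086, 11_543),
}


def orm_tax_pct(orm_rps: int, raw_rps: int) -> float:
    """Throughput lost versus the raw driver, in percent.

    Run-to-run noise can make this slightly negative when the ORM costs nothing.
    """
    return 100 * (1 - orm_rps / raw_rps)


for stack, (orm, raw) in RESULTS.items():
    print(f"{stack:13s} {orm_tax_pct(orm, raw):5.1f}%")
```

The small negative values for Bun, Node, and Rage are noise around zero, which is why the table reports them as ~0%.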
Under Rage, each pg call yields its fiber and Iodine serves other requests while Postgres works. That overlap is why Rage holds 48k where threaded Rails sits at 9k, on the same ActiveRecord API. Falcon (the other async Ruby server) hits the wall this sidesteps: Rails 7’s pool is fiber-safe (PR #44219), but libpq still blocks the thread, so without Rage’s scheduler integration the queries don’t overlap. The cost is the tail above: fibers multiplexed on one thread queue behind each other, so p99 stretches to 22× the median.
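The tail column is p99÷p50. A minimal sketch of that ratio, using a nearest-rank percentile and made-up latencies. The sample values are hypothetical, chosen so that a few queued requests reproduce a 22× ratio, and oha's own percentile method may differ.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tail_ratio(latencies_ms: list[float]) -> float:
    """p99 divided by p50, the 'tail' figure in the throughput table."""
    return percentile(latencies_ms, 99) / percentile(latencies_ms, 50)


# Hypothetical: 97 requests served promptly, 3 that queued behind other fibers.
latencies = [2.0] * 97 + [44.0] * 3
print(tail_ratio(latencies))  # 22.0
```

A high median RPS and a long tail can coexist: only 3% of the requests here are slow, yet the ratio is 22×.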
The host you land on moves the number 1.3–1.9×
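The swing can be recovered from the range column of the throughput table as max÷min per stack. A quick sketch follows; the ranges are rounded to thousands in the table, so the ratios are approximate.

```python
# (min, max) across the 5 hosts, in thousands of RPS, from the range column.
RANGES_K = {
    "Bun.serve": (66, 123),
    "node:http": (49, 76),
    "Deno.serve": (48, 64),
    "Rage": (40, 60),
    "NestJS": (19, 29),
    "FastAPI": (11, 16),
    "Rails + Puma": (7, 11),
    "Django": (7, 10),
}

# Host-to-host swing: fastest host divided by slowest host for the same stack.
swings = {stack: hi / lo for stack, (lo, hi) in RANGES_K.items()}

for stack, swing in sorted(swings.items(), key=lambda item: item[1]):
    print(f"{stack:13s} {swing:.2f}x")
```

Deno.serve sits at the low end (about 1.3×) and Bun.serve at the high end (about 1.9×).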
More cores help the async stacks most
CPU-bound: the tasks a real SaaS app actually runs
Timing CPU work over HTTP mostly measures queueing under load, so each runtime instead times the identical operation in-process (no server), using each language’s real library (template engines Erubi/EJS/Jinja2, native crypto/json). Tasks are ordered by how often a JSON-API backend runs them: serialization and auth on every request, templates for server-rendered apps, password hashing only at login, arithmetic almost never (the cleanest VM signal, rarely the real load). The Winner column names each row’s winner.
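For a sense of what an in-process per-call timing looks like, here is a minimal Python sketch. The helper name, call counts, and payload are hypothetical; the harness's scripts/cpu_micro.py may be structured differently.

```python
import json
import statistics
import time


def ms_per_call(fn, calls: int = 1_000, rounds: int = 5) -> float:
    """Median across rounds of the mean wall-clock milliseconds per call."""
    per_round = []
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(calls):
            fn()
        per_round.append((time.perf_counter() - start) * 1000 / calls)
    return statistics.median(per_round)


# Hypothetical payload, not the harness's canonical one.
payload = {"items": [{"id": i, "name": f"widget-{i}", "tags": ["a", "b"]} for i in range(100)]}
print(f"json serialize: {ms_per_call(lambda: json.dumps(payload)):.3f} ms/call")
```

Taking the median of several rounds damps one-off scheduler hiccups, and no HTTP server or client is involved, so queueing cannot leak into the number.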
Speed: ms per call (lower better)
| Task / frequency | Ruby | Bun JSC | Node V8 | Deno V8 | Python 3.14 | Winner |
|---|---|---|---|---|---|---|
| json serialize · every request | 0.036 | 0.047 | 0.022 | 0.022 | 0.106 | V8 (Python last) |
| jwt HS256 sign · every authed request | 0.015 | 0.004 | 0.004 | 0.015 | 0.011 | Bun/Node (all sub-15µs) |
| template HTML render · server-rendered apps | 0.085‡ | 0.254 | 0.230 | 0.242 | 0.256 | Ruby (2.7×) |
| crypto PBKDF2 · login only | 13.5 | 8.4 | 15.8 | 13.1 | 10.5 | Bun (BoringSSL) |
| loop arithmetic · rare (VM reference) | 11.0 | 22.2 | 2.67 | 2.66 | 16.5 | V8 (4×) |
Peak RAM: MB per task (lower better)
| Task | Ruby | Bun JSC | Node V8 | Deno V8 | Python 3.14 | Winner |
|---|---|---|---|---|---|---|
| json | 52 | 87 | 57 | 66 | 21 | Python (Ruby 2× here) |
| jwt | 26 | 112 | 60 | 84 | 21 | Python |
| template | 24 | 92 | 64 | 74 | 21 | Python (Ruby 2nd) |
| crypto | 24 | 44 | 52 | 60 | 21 | Python |
| loop | 24 | 44 | 57 | 65 | 21 | Python |
Two memory metrics, both measured. Per-task RAM (above): peak RSS of a process running one task in isolation,
median of 5 hosts. Python is flat at its ~21 MB interpreter base; V8/JSC reserve large JIT code caches and GC heaps even for trivial
work. Server RAM (throughput table): PSS (proportional set size) of the whole worker tree, which splits
shared copy-on-write pages across forked workers, so it reports real physical RAM rather than a per-worker over-count. Forked Python
(Django) is low partly from that sharing. One caveat: CPython’s refcounting slowly privatizes shared pages, so a long-lived process
drifts above this under-load sample. Outputs verified canonical: json 9098 B, jwt 179-char token, crypto hex, loop 400001.
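On Linux, PSS for a process is exposed in /proc/<pid>/smaps_rollup, and because shared pages are already divided among their sharers, summing PSS over a worker tree does not double-count. A sketch under those assumptions follows. The helper names are hypothetical, the caller supplies the PIDs, and this is not necessarily how the harness samples memory.

```python
import re
from pathlib import Path


def parse_pss_kb(smaps_rollup_text: str) -> int:
    """Extract the 'Pss:' total in kB, ignoring Pss_Anon/Pss_File breakdown lines."""
    match = re.search(r"^Pss:\s+(\d+)\s+kB", smaps_rollup_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Pss line found")
    return int(match.group(1))


def tree_pss_mb(pids: list[int]) -> float:
    """Sum PSS over the given worker PIDs (Linux only)."""
    total_kb = sum(
        parse_pss_kb(Path(f"/proc/{pid}/smaps_rollup").read_text()) for pid in pids
    )
    return total_kb / 1024


# Made-up excerpt in the smaps_rollup layout, used here instead of a live process.
sample = "Rss:               52000 kB\nPss:               38000 kB\nPss_Anon:          30000 kB\n"
print(parse_pss_kb(sample))  # 38000
```

Summing RSS the same way would count every shared copy-on-write page once per worker, which is the over-count the PSS figure avoids.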
‡Template uses each language’s real engine (Ruby Erubi, JS EJS, Python Jinja2), not hand-rolled
string-building. A naive Ruby concat with 4×gsub escaping measured the escape function rather than the engine; Erubi’s
compiled output moved Ruby from last place to first. Native escaping differs (Erubi 7778 B with &quot;,
EJS/Jinja2 7578 B with &#34;): same logical HTML.
What the matrix shows
- Concurrency model dominates I/O. Every async arm beats both threaded arms; Bun (79k) is ~10× Django (8k). The lever is overlapping DB waits, which is language-agnostic.
- Bun (JavaScriptCore) wins I/O outright. The raw JS arms skip the ORM, so they keep raw speed (idiomatic for JS). On RAM, Django is leanest serving (605 MB); Bun’s reuse-port processes don’t share, so it sits mid-pack at 16 workers.
- Among stacks that use an ORM, async Ruby leads. Rage (fiber-aware ActiveRecord on Iodine) holds 48k: 2× NestJS/Prisma and roughly 4–6× FastAPI, Rails, and Django. Its one weakness is a long tail (p99 22× median).
- The ORM is the real lever below the JS arms. ActiveRecord is nearly free (0–2%) and Django’s ORM costs 30%, while Prisma cuts Nest 48% and SQLAlchemy cuts FastAPI 57%. Pick the ORM as carefully as the framework.
- Cores and host are levers too. Doubling to 16 vCPU scaled async ~2× and threaded ~1.5–1.8×. Between machines the same stack swung 1.3–1.9× (Intel ~1.5× AMD). Rankings held; absolute RPS is a range.
- CPU depends on the operation (x86, measured in-process): Ruby+YJIT wins templating, V8 (Node/Deno) win json and arithmetic, Bun wins crypto. Python is leanest per task. Use the real library: a naive Ruby template measured the escape function rather than the engine.
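The first point, overlapping DB waits, can be shown with a toy Python model. Sleeps stand in for the Postgres round-trip, the wait time and request count are hypothetical, and a real threaded server would run several such threads in parallel, so the absolute gap is exaggerated.

```python
import asyncio
import time

DB_WAIT_S = 0.02  # hypothetical time Postgres spends on one query
REQUESTS = 20


def handle_blocking() -> None:
    time.sleep(DB_WAIT_S)  # the thread is parked until the "DB" answers


async def handle_async() -> None:
    await asyncio.sleep(DB_WAIT_S)  # yields, so other requests start their own waits


def run_sequential() -> float:
    """One blocking thread: the waits add up."""
    start = time.perf_counter()
    for _ in range(REQUESTS):
        handle_blocking()
    return time.perf_counter() - start


async def _run_all() -> None:
    await asyncio.gather(*(handle_async() for _ in range(REQUESTS)))


def run_overlapped() -> float:
    """One event loop: the waits overlap."""
    start = time.perf_counter()
    asyncio.run(_run_all())
    return time.perf_counter() - start


sequential_s = run_sequential()
overlapped_s = run_overlapped()
print(f"one blocking thread: {sequential_s:.2f}s, one event loop: {overlapped_s:.2f}s")
```

Nothing in the model depends on the language: the saving comes entirely from not holding a thread while the database works.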
Caveats
- The raw JS arms have no ORM. Bun/Deno/Node hand-map rows because that’s idiomatic for JS; their headline isn’t penalized by an ORM the way Rails/Django/Nest/FastAPI are. NestJS+Prisma is the JS arm that pays the ORM tax. Read the throughput table with that in mind.
- Single box, loopback, tiny DB. Postgres runs on the same machine and the table is 1000 cached rows, so the DB round-trip is near-free. Real apps sit longer in the DB, which compresses the gaps between these runtimes.
- Roda dropped. Its idiomatic Sequel ORM corrupts under fibers and the pg gem segfaults under Puma’s fork, so it had no fair multi-process config. The broader point: async Ruby has no thread-or-fiber-safe ORM today. Rails 7’s pool is fiber-safe (PR #44219), but the pg driver still blocks, so an async ORM needs an async-native driver (which lacks an ORM) or runs threaded. Rage works by patching ActiveRecord for fibers.
- CPU measured in-process. At high concurrency the HTTP CPU numbers measure queueing rather than the work, so CPU is the per-call microbench instead. It is architecture-sensitive (these are x86; on ARM several rankings flip).
Part 2: Web research (multi-source, fact-checked)
A 100-agent deep-research sweep with adversarial verification. Meta-finding worth stating: no rigorous public head-to-head Ruby-vs-Python-vs-Node cloud-cost benchmark exists. The hard production numbers are Shopify/YJIT first-party (Ruby side). That gap is why the multi-host matrix earns its keep.
YJIT speedup on real Ruby/Rails web workloads
- Shopify production (Storefront Renderer, I/O- & DB-bound): measured 14.1% end-to-end, improving to 20%+; ~10% (3.2) → ~15% (3.3) → 20%+. >80M req/min on Black Friday 2025 on prerelease YJIT 3.4.
- Benchmarks overstate real apps: YJIT 3.4 ~92% faster than the interpreter on x86-64 headline benchmarks; micros 3–7×; railsbench ~65%; but component web benchmarks (activerecord, liquid-render) only ~1.19×.
- Independent: enabling YJIT typically cuts Rails response times 15–25%; one app +22% Puma RPS.
YJIT memory overhead
- Benchmark average ~21% more memory (Ruby 3.3); in production with copy-on-write across workers, <8% PSS on Shopify SFR. Matches the elevated Ruby RSS seen locally.
CPython 3.13/3.14: does the new JIT or free-threading help web apps?
- JIT (copy-and-patch): experimental, off by default through 3.14 (opt in with PYTHON_JIT=1); release notes publish no speedup figure. Independent: ~10–30% on CPU hot paths after warmup, ~0 for I/O-bound web servers. Consistent with our un-JIT’d CPython on the pure-arithmetic loop (16.5 ms vs V8’s 2.7 ms). A production deploy gets no JIT today.
- Free-threading (no-GIL): ~5–10% single-thread penalty now; little throughput gain for the process-per-core web model (GIL already releases during I/O). Its main web benefit is memory sharing rather than speed.
- Net: for I/O-bound web serving, CPython 3.14 ≈ 3.13 ≈ 3.12.
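To check which of these features a given deployment is actually running, CPython exposes two introspection hooks: sys._is_gil_enabled() (3.13+) and the sys._jit namespace (3.14+). Both are underscore-prefixed implementation details, so the sketch below guards each lookup. The function name and the fallback values are assumptions.

```python
import sys


def runtime_flags() -> dict[str, object]:
    """Report the interpreter version and whether the GIL and the JIT are enabled."""
    flags: dict[str, object] = {"version": sys.version.split()[0]}

    # sys._is_gil_enabled() exists on CPython 3.13+; assume the GIL is on if it is absent.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    flags["gil_enabled"] = bool(gil_check()) if gil_check else True

    # sys._jit is a CPython 3.14+ namespace; treat a missing namespace as "no JIT".
    jit = getattr(sys, "_jit", None)
    flags["jit_enabled"] = bool(jit.is_enabled()) if jit else False
    return flags


print(runtime_flags())
```

On a stock production build this is expected to report the GIL on and the JIT off, matching the configuration benchmarked in Part 1.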
Framework throughput context (TechEmpower R22/23)
Async micro-stacks (Bun, Starlette/FastAPI) rank far above full-stack frameworks on JSON; Django and Rails cluster together well below them. Consistent with the matrix: the async/sync split outweighs the language.
Bottom line for cost
- Choose the concurrency model first. Async/event-loop buys ~10× the throughput on I/O-bound JSON → far fewer, smaller instances. Dominant lever, language-agnostic.
- I/O through the ORM: Bun is cheapest (79k, JavaScriptCore). The raw JS arms lead because JS skips heavy ORMs. Among stacks that use an ORM, async Ruby (Rage, 48k) wins: 2× NestJS and roughly 4–6× FastAPI, Rails, and Django.
- The ORM can cost more than the language. Prisma cuts NestJS 48% and SQLAlchemy cuts FastAPI 57%; ActiveRecord is nearly free and Django’s ORM costs 30%. Prisma also makes Nest the RAM hog (2.6 GB).
- Cores and host both move the number. Doubling to 16 vCPU scaled the async stacks ~2× (threaded ~1.5–1.8×), so a bigger box widens the async lead. Between machines the same stack swung 1.3–1.9× (Intel ~1.5× AMD). Treat any single absolute as a range.
- CPU-bound: pick the runtime for the operation (x86). V8 (Node/Deno) win arithmetic and json; Ruby+YJIT wins HTML templating; Bun wins crypto. Or push heavy compute to native libs and it barely matters.
- Memory: Python is leanest, 21 MB per CPU task and 605 MB serving (PSS, leanest of the arms at 16 workers). NestJS+Prisma is the RAM hog (2.6 GB). CPython 3.13/3.14’s JIT and free-threading don’t cut I/O web cost yet.
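To turn throughput into a rough machine count, a back-of-envelope sketch follows. The target load and the 50% headroom factor are hypothetical. The per-machine RPS values are the ORM medians from the throughput table, which come from a loopback DB with a cached table, so real deployments would see smaller gaps.

```python
import math

# Median RPS through the ORM on one 16-vCPU machine, from the throughput table.
ORM_RPS = {
    "Bun.serve": 78_974,
    "Rage": 47_927,
    "NestJS": 23_312,
    "FastAPI": 12_666,
    "Rails + Puma": 8_980,
    "Django": 8_086,
}


def instances_needed(target_rps: float, per_instance_rps: float, headroom: float = 0.5) -> int:
    """Machines needed if each one runs at `headroom` of its benchmarked throughput."""
    return math.ceil(target_rps / (per_instance_rps * headroom))


TARGET_RPS = 100_000  # hypothetical peak load
for stack, rps in ORM_RPS.items():
    print(f"{stack:13s} {instances_needed(TARGET_RPS, rps):3d} x 16-vCPU machines")
```

With these assumptions the ~10× throughput gap becomes 3 machines for Bun versus 25 for Django.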
One-sentence answer: the async-vs-threaded choice matters more than the language, Bun leads I/O outright while async Ruby (Rage) leads the ORM-based stacks, the ORM can cost more than the runtime, and the cloud host you land on swings the number as much as any of these. So “cheapest” depends on your bottleneck (I/O, a CPU operation, the ORM, or RAM) and a bit on luck.
Reproduce
Open-source harness in the standalone runtime-bench repo (MIT): SPEC.md (the shared contract),
apps/ (8 server apps + ORM endpoints), scripts/run_matrix.sh + servers.sh + run_one.sh
(boot + warmup + oha + PSS memory), scripts/verify.sh (proves identical output), scripts/cpu_micro.{js,py,rb}
+ mem_micro.sh (per-call CPU and per-task RAM), setup.sh (provisions every stack from scratch), and a
fly/ Dockerfile that runs the whole suite on a dedicated machine. DB langbench, 1000-row widgets.
Run: SCEN=io bash scripts/run_matrix.sh · SCEN=orm ….