Python vs Ruby (YJIT) vs JS — which is cheaper to run for web workloads?

Latest runtimes, DB-backed web app. “Cheaper” = throughput, memory, and tail latency under realistic load. 8 stacks across Ruby, Python, and JavaScript (incl. Bun, Deno, NestJS, Rage). Run on 5 dedicated Fly performance-16x machines (16 vCPU each), so numbers are median [range] across hosts. Updated 2026-06-10. Two evidence streams: a controlled multi-host benchmark and a fact-checked web-research sweep.

TL;DR

Async vs threaded matters more than the language. ~10× on I/O (Bun 79k vs Django 8k). Every async stack beats every threaded one.
Bun wins I/O outright (79k median, on 16 vCPU). Among the ORM-based stacks, async Ruby (Rage, 48k) leads: 2× NestJS, ~5× the Python arms and Rails.
More cores help the async stacks most. Doubling 8→16 vCPU scaled the async runtimes ~2× (Node, Rage, FastAPI) but the threaded ones only ~1.5–1.8×. At 16 cores, Rails edges Django on the ORM path (9.0k vs 8.1k), since ActiveRecord’s tax is ~0 while Django’s ORM costs 30%.
The host matters as much as the runtime. The same stack swung 1.3–1.9× between Fly machines; the Intel-Xeon host ran ~1.5× faster than the AMD-EPYC ones. Absolute RPS is a range. The ordering held on every host.
The ORM you pick can cost more than the language. ActiveRecord is nearly free (0–2%) and Django’s ORM is moderate (30%); Prisma and SQLAlchemy roughly halve throughput (48% / 57%).

Verdict: No stack wins every axis. Pick async first, then optimize your bottleneck: Bun for I/O throughput, Ruby+YJIT for templating, Python for RAM. Weigh the ORM, which can cost more than the runtime, and the cloud host, which moves the number as much as any of these.

Where each runtime shines

Each runtime’s per-call character across five axes (CPU + memory). Bigger = better (normalized so the best in each category = 100%). These traits are the same for every framework on that runtime; I/O throughput, the axis frameworks actually move, gets its own chart below. The shape tells the story: spiky where a runtime excels, pinched where it lags.

[Radar charts, one per runtime: Ruby, Bun, Node, Deno, Python.]

Axes: Encode (json+jwt) · Template render · Crypto · Arithmetic · Memory. Measured on x86. Ruby spikes on Template (Erubi); Node/Deno (V8) on Encode + Arithmetic; Bun on Crypto (BoringSSL) but pinched elsewhere; Python is a Memory spike with decent Crypto.

Where frameworks diverge: throughput as you’d actually write it

The radars above are runtime traits: per-call CPU and memory are the same for any framework on a given runtime. The axis a framework actually moves is DB-read concurrency. This chart is the idiomatic throughput, each stack’s row fetched through the ORM you’d really use (ActiveRecord, Django ORM, SQLAlchemy, Prisma; the raw JS arms hand-map, which is idiomatic there). Median of 5 hosts, ~10× spread from Django to Bun.

Legend: bars are colored by language (JavaScript, Ruby, Python). Bar = throughput through the ORM, label = serving RAM (PSS). The raw JS arms (Bun/Deno/Node) have no ORM, so they keep raw speed. Among stacks that do use an ORM, async Ruby (Rage + ActiveRecord, 48k) is the clear leader: 2× NestJS (Prisma 23k) and 4–6× the FastAPI / Rails / Django cluster (8–13k). Prisma and SQLAlchemy sink Nest and FastAPI from their fast raw ceilings; Django’s ORM tax drops it to the floor. Bun leads outright (79k). NestJS+Prisma is the RAM hog (2.6 GB) while Django stays leanest (605 MB).

Part 1: Multi-host benchmark of 8 stacks across 3 languages, identical work

Eight stacks across Ruby, Python, and JavaScript, each doing provably identical work (verified byte-identical before benchmarking). Endpoints: one DB-backed read (I/O), the same read via each framework’s ORM, and five named CPU tasks.

GET /widgets/:id → one indexed Postgres row → JSON (raw driver / hand-mapped)
GET /widgets-orm/:id → same row via the framework’s ORM (the ORM-tax test, below)
/cpu/json · /cpu/jwt · /cpu/template · /cpu/crypto · /cpu/loop, five named CPU tasks (below)
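The "byte-identical" check can be pictured as hashing each stack’s response body and comparing digests. The sketch below is a minimal illustration under stated assumptions, not the harness’s scripts/verify.sh: the function names, the widget field names, and the sample bodies are all hypothetical.

```python
import hashlib
import json


def body_digest(body: bytes) -> str:
    """SHA-256 hex digest of a raw response body."""
    return hashlib.sha256(body).hexdigest()


def find_mismatches(bodies: dict[str, bytes]) -> list[str]:
    """Return the stacks whose body differs from the first stack's body."""
    reference_name = next(iter(bodies))
    reference = body_digest(bodies[reference_name])
    return [name for name, body in bodies.items() if body_digest(body) != reference]


# Hypothetical captured bodies for GET /widgets/42 (field names are made up).
row = {"id": 42, "name": "widget-42", "price_cents": 1999}
canonical = json.dumps(row, separators=(",", ":")).encode()
bodies = {
    "bun": canonical,
    "rails": canonical,
    # A stack that pretty-prints is flagged even though the JSON is equivalent.
    "django": json.dumps(row, indent=2).encode(),
}
print(find_mismatches(bodies))  # ['django']
```

Comparing bytes rather than parsed JSON is the stricter choice: it also catches differences in key order, whitespace, and number formatting, all of which change the serialization work being measured.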

Stack	Language / engine	Model	Server + driver · ORM
Bun.serve	Bun 1.3.14 · JavaScriptCore	event loop	native + Bun SQL · (hand-mapped)
Deno.serve	Deno 2.6.6 · V8	event loop	native + node-postgres 8.21 · (hand-mapped)
node:http	Node 26.0.0 · V8	event loop	native + node-postgres 8.21 · (hand-mapped)
NestJS 10	Node 26.0.0 · V8	event loop	Express 5 + node-postgres 8.21 · Prisma 6.19.3
Rage 1.25	Ruby 4.0.5 +YJIT	async (fibers)	Iodine + pg 1.6 · ActiveRecord 8.1.3
FastAPI 0.136	Python 3.14.5	async	uvicorn 0.49 + asyncpg 0.31 · SQLAlchemy 2.0.50
Rails 8.1.3 + Puma	Ruby 4.0.5 +YJIT	threaded	Puma 8.0 + pg 1.6 · ActiveRecord 8.1.3
Django 6.0.6	Python 3.14.5	threaded	gunicorn 26.0 + psycopg3 3.3 · Django ORM

Rig: 5 Fly performance-16x machines (16 dedicated vCPU, 32 GB, Linux x86_64): 1 Intel Xeon 2.3 GHz, 4 AMD EPYC. Postgres on the same machine (loopback), 1000-row table. Load: oha 1.14, 200 concurrent connections, 16 workers/arm, 4 threads where applicable, 3×30 s rounds per host; reported number is the median of the 5 hosts. All arms run together in one session per host, so each host is a fair within-machine comparison. YJIT on for all Ruby arms; CPython GIL-on / JIT-off (production default). Every arm: 0 errors, output verified byte-identical.
Versions (pinned): Bun 1.3.14 · Node 26.0.0 · Deno 2.6.6 · Ruby 4.0.5 (+YJIT, 2026-05-20) · CPython 3.14.5 · Postgres 17.10. Rails 8.1.3 / Puma 8.0.2 · Rage 1.25.0 · FastAPI 0.136.3 / uvicorn 0.49.0 / uvloop 0.22.1 · Django 6.0.6 / gunicorn 26.0.0 · NestJS 10 / Express 5. Drivers: node-postgres 8.21.0, asyncpg 0.31.0, psycopg3 3.3.4, pg gem 1.6.3, Bun SQL. ORMs: ActiveRecord 8.1.3, SQLAlchemy 2.0.50, Django ORM, Prisma 6.19.3. Roda was dropped: its idiomatic Sequel ORM corrupts under fibers and the pg gem segfaults under Puma’s fork, so it had no fair multi-process config (see caveats).

Throughput: `GET /widgets-orm/42`, through each stack’s ORM

Headline = RPS through the ORM you’d ship (the raw JS arms hand-map, idiomatic there). range = min–max across 5 hosts; tax = throughput lost vs the raw driver; tail = p99÷p50, the SLA story. Median of 5 Fly performance-16x hosts, 200 concurrent connections.

Stack	RPS with ORM	Range (5 hosts, RPS)	Raw RPS	ORM tax	RAM PSS (MB)	Tail p99/p50
Bun.serve JSC	78,974	66–123k	78,726	~0%	1,069	5.6×
node:http V8	57,359	49–76k	56,898	~0%	847	1.7×
Deno.serve V8	55,302	48–64k	56,045	~1%	1,052	4.4×
Rage Ruby ActiveRecord	47,927	40–60k	47,809	~0%	893	22×
NestJS Prisma	23,312	19–29k	45,186	48%	2,581	2.9×
FastAPI SQLAlchemy	12,666	11–16k	29,437	57%	1,101	3.7×
Rails + Puma Ruby ActiveRecord	8,980	7–11k	9,209	2%	1,028	1.7×
Django Django ORM	8,086	7–10k	11,543	30%	605	3.6×

🏆 Bun 79k · Rage 48k leads the ORM-based stacks: 2× NestJS, ~5× the rest · Django 8k floor; Prisma/SQLAlchemy halve Nest/FastAPI

Below the raw JS arms, the ORM is the story. ActiveRecord is nearly free (Rails 2%, Rage ~0%), so async Ruby keeps 48k; Django’s ORM costs 30%; Prisma cuts Nest 48% and SQLAlchemy cuts FastAPI 57%, dropping them from fast raw ceilings (45k, 29k) to 23k and 13k. The raw JS arms’ ~0% tax confirms the test is clean: identical work, the delta is the ORM. Tails: Node and Rails steadiest (1.7×); Rage’s is the worst (22× its median). Its fiber scheduler trades a long tail for a high median.
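As a worked check of the two derived columns, the sketch below recomputes the ORM tax (1 − ORM RPS ÷ raw RPS) from the table’s medians and shows how a p99÷p50 tail ratio falls out of latency samples. The RPS figures are copied from the table; the latency samples are synthetic, and the nearest-rank percentile is an assumption, since the text does not say which percentile method oha or the harness uses.

```python
def orm_tax(raw_rps: float, orm_rps: float) -> float:
    """Fraction of raw-driver throughput lost when going through the ORM."""
    return 1.0 - orm_rps / raw_rps


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile of an ascending list (p is an integer 1..100)."""
    rank = max(1, (p * len(sorted_values) + 99) // 100)  # integer ceil(p*n/100)
    return sorted_values[rank - 1]


def tail_ratio(latencies_ms: list[float]) -> float:
    """p99 divided by p50."""
    ordered = sorted(latencies_ms)
    return percentile(ordered, 99) / percentile(ordered, 50)


# (raw RPS, RPS with ORM) medians copied from the throughput table.
medians = {
    "NestJS + Prisma": (45_186, 23_312),
    "FastAPI + SQLAlchemy": (29_437, 12_666),
    "Django ORM": (11_543, 8_086),
    "Rails + ActiveRecord": (9_209, 8_980),
}
for stack, (raw, orm) in medians.items():
    print(f"{stack}: {orm_tax(raw, orm):.0%}")  # 48%, 57%, 30%, 2%

# Synthetic samples: 98 fast requests and 2 slow ones that queued.
samples = [2.0] * 98 + [44.0] * 2
print(tail_ratio(samples))  # 22.0
```

The synthetic samples show how a small share of queued requests is enough to produce a 22× tail while leaving the median untouched.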

How Rage runs ActiveRecord async. Rage serves on Iodine, a fiber-based reactor. Plain ActiveRecord blocks a thread for the whole query, which would stall a reactor, so Rage ships a fiber-aware connection pool plus a fiber scheduler: a blocking pg call yields its fiber and Iodine serves other requests while Postgres works. That overlap is why Rage holds 48k where threaded Rails sits at 9k, on the same ActiveRecord API. Falcon (the other async Ruby server) hits the wall that Rage sidesteps: Rails 7’s pool is fiber-safe (PR #44219), but libpq still blocks the thread, so without Rage’s scheduler integration the queries don’t overlap. The cost is the tail above: fibers multiplexed on one thread queue behind each other, so p99 stretches to 22× the median.
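The overlap described above can be illustrated with a Python asyncio analogy. This is not Rage’s or Iodine’s implementation: the database wait is simulated with a sleep, and the 20 ms figure and function names are hypothetical. It only shows why yielding during the wait lets one thread serve many requests in the time a blocking design serves one.

```python
import asyncio
import time

DB_WAIT_S = 0.02  # hypothetical Postgres round-trip, simulated with a sleep


async def handle_request(request_id: int) -> int:
    # Awaiting hands control back to the event loop, much as a fiber
    # yields on a blocking pg call under a fiber scheduler.
    await asyncio.sleep(DB_WAIT_S)
    return request_id


async def serve_one_at_a_time(n: int) -> float:
    """Each request waits for the previous one: no overlap."""
    start = time.perf_counter()
    for i in range(n):
        await handle_request(i)
    return time.perf_counter() - start


async def serve_overlapped(n: int) -> float:
    """All requests wait on the simulated DB concurrently."""
    start = time.perf_counter()
    await asyncio.gather(*(handle_request(i) for i in range(n)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    serial = await serve_one_at_a_time(10)
    overlapped = await serve_overlapped(10)
    return serial, overlapped


serial_s, overlapped_s = asyncio.run(main())
print(f"one at a time: {serial_s:.3f}s, overlapped: {overlapped_s:.3f}s")
```

The same single thread does both runs; only the scheduling differs. The long-tail cost follows from the same design: once the work between waits is CPU-bound, requests multiplexed on one thread queue behind each other.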

The host you land on moves the number 1.3–1.9×

We ran the full 8-arm suite on 5 separate Fly performance-16x machines (1 Intel Xeon 2.3 GHz, 4 AMD EPYC), same image, same config. The Intel host ran ~1.5× faster than the AMD ones on every stack: Bun 66k–123k, Node 49k–76k, Rails 7k–11k. That swing is wider than several of the runtime gaps we’re measuring. The relative ordering was identical on all 5 hosts, so the rankings hold, but read any single absolute RPS as roughly ±30%. Every number here is the median of the 5. The lesson for cloud-cost benchmarks: the CPU you happen to be scheduled on can matter as much as the framework you chose.
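The 1.3–1.9× swing can be recomputed from the range column of the throughput table as max ÷ min per stack. The inputs below are the table’s ranges, already rounded to the nearest thousand, so the ratios are approximate.

```python
# (min, max) RPS in thousands across the 5 hosts, from the throughput table.
ranges_k = {
    "Bun": (66, 123),
    "Node": (49, 76),
    "Deno": (48, 64),
    "Rage": (40, 60),
    "NestJS": (19, 29),
    "FastAPI": (11, 16),
    "Rails": (7, 11),
    "Django": (7, 10),
}

# Swing = fastest host divided by slowest host for the same stack.
swings = {stack: high / low for stack, (low, high) in ranges_k.items()}
for stack, swing in sorted(swings.items(), key=lambda item: item[1]):
    print(f"{stack}: {swing:.2f}x")

print(f"smallest {min(swings.values()):.1f}x, largest {max(swings.values()):.1f}x")
```

Deno has the tightest spread and Bun the widest, which brackets the 1.3–1.9× quoted in the heading.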

More cores help the async stacks most

Doubling the machine from 8 to 16 vCPU (same image, workers scaled to cores) scaled the async runtimes close to linearly: Node 2.1×, Rage 2.1×, FastAPI 2.0×, Deno 1.9×, Bun 1.8×. The threaded stacks lagged: Rails 1.8×, Django 1.5×. So a bigger box widens the async lead rather than letting the threaded stacks catch up. One reversal: at 16 cores Rails passes Django on the ORM path (9.0k vs 8.1k), because ActiveRecord’s ~0% tax scales cleanly while Django’s 30% ORM tax does not. On raw I/O, Django still leads. Falling short of a clean 2× is expected: the shared Postgres and memory bandwidth take the rest.
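The scaling factors can be read as parallel efficiency (speedup ÷ 2 for a doubling of cores). The sketch below also back-computes an implied 8-vCPU figure for Rails and Django, under the loud assumption that the quoted factors apply to the ORM endpoint; the text does not say which endpoint they were measured on, so treat those implied numbers as an estimate, not a measurement.

```python
IDEAL = 16 / 8  # doubling the cores would ideally double throughput

# Speedups from 8 to 16 vCPU, as quoted in the text.
speedup = {
    "Node": 2.1, "Rage": 2.1, "FastAPI": 2.0, "Deno": 1.9,
    "Bun": 1.8, "Rails": 1.8, "Django": 1.5,
}
efficiency = {stack: factor / IDEAL for stack, factor in speedup.items()}
for stack, value in efficiency.items():
    print(f"{stack}: {value:.0%} of linear")

# ASSUMPTION: the factors above apply to the ORM endpoint.
orm_rps_16 = {"Rails": 8_980, "Django": 8_086}  # medians from the throughput table
implied_rps_8 = {stack: rps / speedup[stack] for stack, rps in orm_rps_16.items()}
for stack, rps in implied_rps_8.items():
    print(f"{stack}: ~{rps:,.0f} RPS implied at 8 vCPU")
```

Under that assumption Django would have led Rails at 8 vCPU, which is consistent with the reversal described above.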

CPU-bound: the tasks a real SaaS app actually runs

CPU over HTTP measures queueing under load, so each runtime times the identical operation in-process, no server, using each language’s real library (template engines Erubi/EJS/Jinja2, native crypto/json). Tasks are ordered by how often a JSON-API backend runs them: serialization and auth on every request, templates for server-rendered apps, password hashing only at login, arithmetic almost never (the cleanest VM signal, rarely the real load). The Winner column names the fastest runtime in each row.
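An in-process timing loop of the kind described above can be sketched with the Python standard library. This is not the harness’s scripts/cpu_micro.py: the payloads, the iteration counts, the PBKDF2 round count, and the helper names are all hypothetical, and the HS256 signer is hand-written so the example needs no third-party JWT package.

```python
import base64
import hashlib
import hmac
import json
import time


def ms_per_call(fn, iterations: int) -> float:
    """Mean wall-clock milliseconds per call, after one warmup call."""
    fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) * 1000 / iterations


def b64url(data: bytes) -> bytes:
    """Base64url without padding, as JWS compact serialization requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def jwt_hs256(claims: dict, secret: bytes) -> bytes:
    """Sign claims as a compact HS256 JWT."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    payload = b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = header + b"." + payload
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return signing_input + b"." + b64url(signature)


# Hypothetical inputs; the benchmark's real payloads are not shown in the text.
record = {"id": 42, "name": "widget", "tags": ["a", "b", "c"]}
tasks = {
    "json": (lambda: json.dumps([record] * 100), 2_000),
    "jwt": (lambda: jwt_hs256({"sub": "42"}, b"secret"), 2_000),
    "crypto": (lambda: hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 10_000), 20),
}
for name, (fn, iterations) in tasks.items():
    print(f"{name}: {ms_per_call(fn, iterations):.4f} ms/call")
```

Timing the call in-process isolates the library cost from HTTP queueing, which is the point of this section; the absolute numbers depend on the machine and on the inputs chosen.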

Speed: ms per call (lower better)

Task / frequency	Ruby	Bun JSC	Node V8	Deno V8	Python 3.14	Winner
json serialize · every request	0.036	0.047	0.022	0.022	0.106	V8 (Python last)
jwt HS256 sign · every authed request	0.015	0.004	0.004	0.015	0.011	Bun/Node (all ≤15 µs)
template HTML render · server-rendered apps	0.085^‡	0.254	0.230	0.242	0.256	Ruby (2.7×)
crypto PBKDF2 · login only	13.5	8.4	15.8	13.1	10.5	Bun (BoringSSL)
loop arithmetic · rare (VM reference)	11.0	22.2	2.67	2.66	16.5	V8 (4×)

🏆 Ruby wins template (Erubi); V8 wins json + arithmetic · Bun wins crypto, ties jwt · Python slowest on json; Bun slowest on the loop (8× V8)

Peak RAM: MB per task (lower better)

Task	Ruby	Bun JSC	Node V8	Deno V8	Python 3.14	Winner
json	52	87	57	66	21	Python (Ruby 2× its baseline here)
jwt	26	112	60	84	21	Python
template	24	92	64	74	21	Python (Ruby 2nd)
crypto	24	44	52	60	21	Python
loop	24	44	57	65	21	Python

🏆 Python 21 MB flat, every task · Ruby lean (~24 MB), except json (52) · Bun heaviest at peak (44–112); Node/Deno 52–84

Two memory metrics, both measured. Per-task RAM (above): peak RSS of a process running one task in isolation, median of 5 hosts. Python is flat at its ~21 MB interpreter base; V8/JSC reserve large JIT code caches and GC heaps even for trivial work. Server RAM (throughput table): PSS (proportional set size) of the whole worker tree, which splits shared copy-on-write pages across forked workers, so it reports real physical RAM rather than a per-worker over-count. Forked Python (Django) is low partly from that sharing. One caveat: CPython’s refcounting slowly privatizes shared pages, so a long-lived process drifts above this under-load sample. Outputs verified canonical: json 9098 B, jwt 179-char token, crypto hex, loop 400001.
^‡Template uses each language’s real engine (Ruby Erubi, JS EJS, Python Jinja2), not hand-rolled string-building. A naive Ruby concat with 4×gsub escaping measured the escape function rather than the engine; Erubi’s compiled output moved Ruby from last place to first. Native escaping differs (Erubi 7778 B with &quot;, EJS/Jinja2 7578 B with &#34;): same logical HTML.
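The server-RAM figures are PSS sums over each stack’s worker tree. A minimal sketch of how such a sum can be read on Linux follows, assuming /proc/<pid>/smaps_rollup is available (kernel 4.14 or later). The function names and the sample content are hypothetical, and this is not the harness’s own measurement script.

```python
from pathlib import Path


def parse_pss_kb(smaps_rollup_text: str) -> int:
    """Extract the 'Pss:' value (in kB) from /proc/<pid>/smaps_rollup content."""
    for line in smaps_rollup_text.splitlines():
        if line.startswith("Pss:"):  # skips Pss_Anon:, Pss_File:, and similar
            return int(line.split()[1])
    raise ValueError("no Pss line found")


def tree_pss_mb(pids: list[int]) -> float:
    """Sum PSS over a set of worker PIDs (Linux only; needs read access to /proc)."""
    total_kb = sum(
        parse_pss_kb(Path(f"/proc/{pid}/smaps_rollup").read_text()) for pid in pids
    )
    return total_kb / 1024


# Abbreviated sample of smaps_rollup content; the values are made up.
sample = """\
55d0c0000000-7ffd5a7fe000 ---p 00000000 00:00 0    [rollup]
Rss:               41200 kB
Pss:               18350 kB
Pss_Anon:          12000 kB
"""
print(parse_pss_kb(sample))  # 18350
```

Because PSS divides each shared page by the number of processes mapping it, summing it across forked workers counts copy-on-write pages once, whereas summing RSS would count them once per worker.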

What the matrix shows

Concurrency model dominates I/O. Every async arm beats both threaded arms; Bun (79k) is ~10× Django (8k). The lever is overlapping DB waits, which is language-agnostic.
Bun (JavaScriptCore) wins I/O outright. The raw JS arms skip the ORM, so they keep raw speed (idiomatic for JS). On RAM, Django is leanest serving (605 MB); Bun’s reuse-port processes don’t share, so it sits mid-pack at 16 workers.
Among stacks that use an ORM, async Ruby leads. Rage (fiber-aware ActiveRecord on Iodine) holds 48k: 2× NestJS/Prisma and ~5× FastAPI, Rails, and Django. Its one weakness is a long tail (p99 22× median).
The ORM is the real lever below the JS arms. ActiveRecord is nearly free (0–2%) and Django’s ORM costs 30%, while Prisma cuts Nest 48% and SQLAlchemy cuts FastAPI 57%. Pick the ORM as carefully as the framework.
Cores and host are levers too. Doubling to 16 vCPU scaled async ~2× and threaded ~1.5–1.8×. Between machines the same stack swung 1.3–1.9× (Intel ~1.5× AMD). Rankings held; absolute RPS is a range.
CPU depends on the operation (x86, measured in-process): Ruby+YJIT wins templating, V8 (Node/Deno) wins json and arithmetic, Bun wins crypto. Python is leanest per task. Use the real library: a naive Ruby template measured the escape function rather than the engine.

Caveats

The raw JS arms have no ORM. Bun/Deno/Node hand-map rows because that’s idiomatic for JS; their headline isn’t penalized by an ORM the way Rails/Django/Nest/FastAPI are. NestJS+Prisma is the JS arm that pays the ORM tax. Read the throughput table with that in mind.
Single box, loopback, tiny DB. Postgres runs on the same machine and the table is 1000 cached rows, so the DB round-trip is near-free. Real apps sit longer in the DB, which compresses the gaps between these runtimes.
Roda dropped. Its idiomatic Sequel ORM corrupts under fibers and the pg gem segfaults under Puma’s fork, so it had no fair multi-process config. The broader point: async Ruby has no out-of-the-box thread-or-fiber-safe ORM today. Rails 7’s pool is fiber-safe (PR #44219), but the pg driver still blocks, so an async ORM needs an async-native driver (which lacks an ORM) or runs threaded. Rage works by patching ActiveRecord for fibers.
CPU measured in-process. At high concurrency the HTTP CPU numbers measure queueing rather than the work, so CPU is the per-call microbench instead. It is architecture-sensitive (these are x86; ARM flips several rankings).
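The loopback caveat above (longer DB time compresses the gaps) can be made concrete with a toy latency model: if a request costs framework overhead plus DB time, the ratio between two stacks shrinks as the DB share grows. The overhead values below are hypothetical, chosen only to give a 10× gap when the DB is free; they are not measurements from this benchmark.

```python
def throughput_gap(fast_overhead_ms: float, slow_overhead_ms: float, db_ms: float) -> float:
    """Ratio of per-request service times, (slow + db) / (fast + db).

    At fixed concurrency, throughput is inversely proportional to service
    time, so this is also the throughput advantage of the faster stack.
    """
    return (slow_overhead_ms + db_ms) / (fast_overhead_ms + db_ms)


# Hypothetical per-request overheads: 0.2 ms (fast stack) vs 2.0 ms (slow stack).
gaps = {db_ms: throughput_gap(0.2, 2.0, db_ms) for db_ms in (0.0, 1.0, 5.0, 20.0)}
for db_ms, gap in gaps.items():
    print(f"db = {db_ms:>4} ms -> gap {gap:.2f}x")
```

In this model a 10× gap with a free database falls to about 1.35× at 5 ms of DB time and under 1.1× at 20 ms. The model ignores CPU saturation and connection-pool limits, so it shows the direction of the effect rather than its size.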

Part 2: Web research (multi-source, fact-checked)

A 100-agent deep-research sweep with adversarial verification. Meta-finding worth stating: no rigorous public head-to-head Ruby-vs-Python-vs-Node cloud-cost benchmark exists. The hard production numbers are Shopify/YJIT first-party (Ruby side). That gap is why the multi-host matrix earns its keep.

YJIT speedup on real Ruby/Rails web workloads

Shopify production (Storefront Renderer, I/O- and DB-bound): measured 14.1% end-to-end; release over release ~10% (3.2) → ~15% (3.3) → 20%+. >80M req/min on Black Friday 2024 on prerelease YJIT 3.4.
Benchmarks overstate real apps: YJIT 3.4 ~92% faster than the interpreter on x86-64 headline benchmarks; micros 3–7×; railsbench ~65%; but component web benchmarks (activerecord, liquid-render) only ~1.19×.
Independent: enabling YJIT typically cuts Rails response times 15–25%; one app +22% Puma RPS.

YJIT memory overhead

Benchmark average ~21% more memory (Ruby 3.3); in production with copy-on-write across workers, <8% PSS on Shopify SFR. Matches the elevated Ruby RSS seen locally.

CPython 3.13/3.14: does the new JIT or free-threading help web apps?

JIT (copy-and-patch): experimental, off by default through 3.14 (PYTHON_JIT=1); release notes publish no speedup figure. Independent: ~10–30% on CPU hot paths after warmup, ~0 for I/O-bound web servers. Consistent with our un-JIT’d CPython on the pure-arithmetic loop (16.5 ms vs V8’s 2.7 ms). A production deploy gets no JIT today.
Free-threading (no-GIL): ~5–10% single-thread penalty now; little throughput for the process-per-core web model (GIL already releases during I/O). Its main web benefit is memory sharing rather than speed.
Net: for I/O-bound web serving, CPython 3.14 ≈ 3.13 ≈ 3.12.
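To confirm which of these modes a given deployment is actually running, CPython exposes a few introspection hooks. The sketch below uses sys._is_gil_enabled() (3.13+) and the sys._jit namespace (3.14+), guarded with getattr so it also runs on older versions; the function name and the returned keys are hypothetical.

```python
import sys
import sysconfig


def cpython_runtime_flags() -> dict[str, object]:
    """Report GIL and JIT state, tolerating versions that lack the hooks."""
    gil_check = getattr(sys, "_is_gil_enabled", None)  # present on 3.13+
    jit = getattr(sys, "_jit", None)                   # present on 3.14+
    return {
        "version": sys.version.split()[0],
        # True only on a free-threaded build.
        "free_threaded_build": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        # Older interpreters always run with the GIL.
        "gil_enabled": gil_check() if gil_check else True,
        "jit_available": jit.is_available() if jit else False,
        "jit_enabled": jit.is_enabled() if jit else False,
    }


flags = cpython_runtime_flags()
print(flags)
```

On a default production build this is expected to report the GIL enabled and the JIT not enabled, matching the "GIL-on / JIT-off" configuration used for the Python arms.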

Framework throughput context (TechEmpower R22/23)

Async micro-stacks (Bun, Starlette/FastAPI) rank far above full-stack frameworks on JSON; Django and Rails cluster together well below them. Consistent with the matrix: the async/sync split outweighs the language.

Bottom line for cost

Choose the concurrency model first. Async/event-loop buys ~10× the throughput on I/O-bound JSON → far fewer, smaller instances. Dominant lever, language-agnostic.
Idiomatic I/O: Bun is cheapest (79k, JavaScriptCore). The raw JS arms lead because they hand-map rows and skip the ORM. Among stacks that use an ORM, async Ruby (Rage, 48k) wins: 2× NestJS, ~5× FastAPI, Rails, and Django.
The ORM can cost more than the language. Prisma cuts NestJS 48% and SQLAlchemy cuts FastAPI 57%; ActiveRecord is nearly free and Django’s ORM costs 30%. Prisma also makes Nest the RAM hog (2.6 GB).
Cores and host both move the number. Doubling to 16 vCPU scaled the async stacks ~2× (threaded ~1.5–1.8×), so a bigger box widens the async lead. Between machines the same stack swung 1.3–1.9× (Intel ~1.5× AMD). Treat any single absolute as a range.
CPU-bound: pick the runtime for the operation (x86). V8 (Node/Deno) wins arithmetic and json; Ruby+YJIT wins HTML templating; Bun wins crypto. Or push heavy compute to native libs and it barely matters.
Memory: Python is leanest, 21 MB per CPU task and 605 MB serving (PSS, leanest of the arms at 16 workers). NestJS+Prisma is the RAM hog (2.6 GB). CPython 3.13/3.14’s JIT and free-threading don’t cut I/O web cost yet.
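To turn throughput into an instance count, divide the target load by the usable per-instance throughput and round up. The sketch below uses the median ORM-path RPS from the throughput table; the 100k RPS target, the 50% utilization cap, and the function name are hypothetical. Since the measured numbers are loopback peaks on a tiny cached table, real capacity per instance will be lower.

```python
import math


def instances_needed(target_rps: float, per_instance_rps: float, utilization: float = 0.5) -> int:
    """Instances required when each runs at `utilization` of its measured peak."""
    return math.ceil(target_rps / (per_instance_rps * utilization))


# Median RPS through the ORM on one 16-vCPU host, from the throughput table.
median_rps = {
    "Bun": 78_974,
    "Rage": 47_927,
    "NestJS": 23_312,
    "FastAPI": 12_666,
    "Rails": 8_980,
    "Django": 8_086,
}

TARGET_RPS = 100_000  # hypothetical load
counts = {stack: instances_needed(TARGET_RPS, rps) for stack, rps in median_rps.items()}
for stack, count in counts.items():
    print(f"{stack}: {count} instances")
```

With these assumptions the count ranges from 3 instances for Bun to 25 for Django. Rerunning with the low end of each host range instead of the median shows how much of the bill depends on which CPU the scheduler hands out.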

One-sentence answer: the async-vs-threaded choice matters more than the language, Bun leads I/O outright while async Ruby (Rage) leads the ORM-based stacks, the ORM can cost more than the runtime, and the cloud host you land on swings the number as much as any of these. So “cheapest” depends on your bottleneck (I/O, a CPU operation, the ORM, or RAM) and a bit on luck.

Reproduce

Open-source harness in the standalone runtime-bench repo (MIT): SPEC.md (the shared contract), apps/ (8 server apps + ORM endpoints), scripts/run_matrix.sh + servers.sh + run_one.sh (boot + warmup + oha + PSS memory), scripts/verify.sh (proves identical output), scripts/cpu_micro.{js,py,rb} + mem_micro.sh (per-call CPU and per-task RAM), setup.sh (provisions every stack from scratch), and a fly/ Dockerfile that runs the whole suite on a dedicated machine. DB langbench, 1000-row widgets. Run: SCEN=io bash scripts/run_matrix.sh · SCEN=orm ….