Breaking the tool that breaks APIs: performance-testing Hoppscotch with Passmark

A Passmark test suite deep-dive for the Breaking Apps Hackathon

The meta angle nobody asked for

There is something deeply satisfying about stress-testing a tool whose entire job is to send requests.

Hoppscotch is an open-source API testing platform, the kind of tool developers reach for to fire off a quick REST call, inspect a GraphQL schema, or debug a WebSocket handshake. It is, in essence, a tool for breaking other people's APIs. So naturally, I wondered: what does Hoppscotch look like when you turn the lens around?

This article documents the Passmark test suite I built for hoppscotch.io, the hosted public instance. I am not looking for crashes or injection vulnerabilities; I am looking at performance. Specifically: how fast does it load, how does it hold up under repeated use, and where does it quietly slow down in ways a casual user would never notice but a benchmark definitely will?

If you have written unit or integration tests before, the jump to performance testing with Passmark will feel natural. The main shift is that you are no longer asserting correctness; you are asserting speed, and the thresholds you set are as important as the tests themselves.

Let's get into it.

Setup: pointing Passmark at Hoppscotch

Passmark's web performance testing tools let you script browser interactions and measure timing at each step. For this suite I used Passmark's BenchTest to capture page-level metrics alongside manual DevTools baselines for cross-referencing.

Baseline environment

Before writing a single test, establish a clean baseline. All my runs used:

Chrome 147, no extensions
A stable broadband connection (tested at ~100 Mbps down)
Cache cleared between cold-load tests (Ctrl+Shift+Del → all time)
A warm-cache variant run immediately after for comparison
performance.getEntriesByType('navigation') and performance.memory for programmatic measurement

The delta between cold and warm load is one of the most telling numbers you will get. More on that later.

What I am targeting

The public hosted instance at https://hoppscotch.io is a Vue 3 progressive web app. It ships a fairly large JS bundle upfront — which makes the initial load an interesting target — and then does most of its work client-side after that. This architecture means:

First load is bundle-heavy and network-sensitive
Subsequent interactions should be fast (no round-trips for core UI)
Memory usage under repeated request-firing is a legitimate concern

That last point is what I was most curious about going in.

Test suite design

I structured the suite around eight flows, split into two tiers based on how reproducible they are:

Tier 1: deterministic flows (run every time)

These produce consistent, comparable numbers across runs.

T1-A: Cold page load to interactive. Navigate to hoppscotch.io from a cleared cache. Record time from navigation start to DOMContentLoaded, first contentful paint (FCP), and time to interactive (TTI). Threshold: TTI under 4 seconds on a standard connection.

T1-B: Warm page load. Same as T1-A but with cache populated. This isolates parsing and execution time from network time. You want the gap between T1-A and T1-B to be large: it tells you the app's caching strategy is doing its job.

T1-C: REST request send and response render. From a loaded app state, send a GET request to https://httpbin.org/get (a reliable public echo API). Record time from button click to response body rendering in the panel. Threshold: under 300ms excluding actual network round-trip.

T1-D: Environment variable panel open. Click the environment icon. Record time to the panel fully rendering. This is a simple interaction, but it fires Vue reactivity updates, so it is a good canary for UI thread health.

Tier 2: load-sensitive flows (vary by state)

These are more interesting but harder to isolate. Run them after establishing Tier 1 baselines.

T2-A: Large collection load. Import a collection JSON with 200+ requests (I generated a synthetic one; see the appendix). Record time from import confirmation to full collection tree rendering in the sidebar. This directly tests the app's ability to handle real-world workspace sizes.

T2-B: GraphQL explorer load. Switch to the GraphQL tab and point it at https://countries.trevorblades.com/. Record time from URL submission to schema fully populating the explorer panel. Threshold: under 3 seconds.

T2-C: WebSocket connection establishment. Switch to the Realtime tab and connect to wss://ws.postman-echo.com/raw (Hoppscotch's default WebSocket echo target). Record time from connect button click to Connected status. This tests a non-trivial code path most REST-focused tests skip entirely.

T2-D: Burst request firing. Send 10+ requests in rapid sequence to https://httpbin.org/get using Hoppscotch's manual repeat (click, wait for response, click again). Watch CPU and memory via performance.memory. The question is not whether the requests complete (they will) but whether the UI stays responsive throughout and whether memory recovers afterwards.

Results

Here is what I found. I ran each Tier 1 test 5 times and took the median. Tier 2 tests were run 3 times.
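
Taking the median rather than the mean keeps one slow run from skewing a result. A minimal sketch of that aggregation step, assuming each run's timing has been collected as a plain number of milliseconds. The `median` helper and the sample values are illustrative, not measurements from this suite:

```javascript
// Median of a list of timings in milliseconds.
// Copies the input so the caller's array is not reordered.
function median(samples) {
  if (samples.length === 0) {
    throw new Error("median() needs at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: the middle element. Even count: mean of the two middle elements.
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Illustrative values only, not data from the test runs.
const exampleRuns = [40, 10, 30, 50, 20];
console.log(median(exampleRuns)); // 30
```

With five runs per Tier 1 test and three per Tier 2 test, the count is always odd, so the even-count branch is only there for completeness.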

Page load (T1-A and T1-B)

Metric	Cold load	Warm load	Delta
DOMContentLoaded	978ms	505ms	−48%
First Contentful Paint	822ms	516ms	−37%
Load Complete	1,702ms	1,327ms	−22%
Time to Interactive	2,100ms	720ms	−66%

The cold TTI of 2,100ms comfortably clears the 4-second threshold I set. Across my test runs the DOMContentLoaded ranged from 835ms to 1,120ms; the table shows the median. This is notably faster than I expected going in: Hoppscotch's initial bundle parsing and hydration are well optimised for a Vue 3 SPA of this complexity. The FCP of 822ms means the user sees something almost immediately, even before the full app is interactive.

The warm load numbers are genuinely impressive. A 66% reduction in TTI with cache populated means Hoppscotch's service worker and asset caching are doing serious work. The service worker actually serves the main document from cache (the navigation entry shows a transferSize of 0 on warm load), which is why the warm DOMContentLoaded lands at about half a second. For a tool developers keep open in a tab all day, this matters more than cold load.
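
The Delta column can be reproduced from the medians in the table above. A small sketch of the arithmetic; `percentDelta` is a name introduced here for illustration:

```javascript
// Percentage change from the cold-load value to the warm-load value.
// A negative result means the warm load was faster.
function percentDelta(coldMs, warmMs) {
  return Math.round(((warmMs - coldMs) / coldMs) * 100);
}

// Medians copied from the page-load table.
const pageLoad = {
  "DOMContentLoaded": { cold: 978, warm: 505 },
  "First Contentful Paint": { cold: 822, warm: 516 },
  "Load Complete": { cold: 1702, warm: 1327 },
  "Time to Interactive": { cold: 2100, warm: 720 },
};

for (const [metric, { cold, warm }] of Object.entries(pageLoad)) {
  console.log(`${metric}: ${percentDelta(cold, warm)}%`);
}
```

This prints -48%, -37%, -22%, and -66%, matching the table.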

On a throttled Fast 3G connection in DevTools, the cold load climbs to over 6 seconds. That is not a Passmark concern, since I am testing against a standard connection, but it is worth flagging for developers on constrained networks.

REST request render (T1-C)

The first request to httpbin.org/get completed in 948ms median total round-trip (ranging from 888ms to 1,008ms across runs; Hoppscotch's built-in timing for the slowest run showed Status: 200 · OK, Time: 1008 ms, Size: 917 B). Subsequent requests to the same endpoint dropped to 320ms, presumably because connection reuse and DNS caching cut the overhead significantly.

The response panel rendering itself is effectively instant once the data arrives. The JSON body (22 lines of formatted output, including all the echoed request headers) renders with no perceptible delay, well under the 300ms threshold for render latency excluding network time.
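
One way to put a number on "render latency excluding network time" is to subtract the request's Resource Timing duration from the click-to-render time. The sketch below assumes the request appears as a resource entry in the page's performance timeline, which may not hold if requests are routed through a proxy or a browser extension. It also assumes `clickTime` and `renderTime` are `performance.now()` timestamps captured by your own instrumentation. Both function names are hypothetical:

```javascript
// Click-to-render time minus the time the request spent on the network.
// `entry` is a PerformanceResourceTiming-like object; only `startTime` and
// `responseEnd` are read, both in ms on the performance.now() clock.
function renderOverheadMs(clickTime, renderTime, entry) {
  const total = renderTime - clickTime;
  const network = entry.responseEnd - entry.startTime;
  // Clamp at zero: timer granularity can make the difference slightly negative.
  return Math.max(0, total - network);
}

// Browser-side lookup of the most recent resource entry for a URL prefix.
// Returns null when no matching entry has been recorded.
function latestEntryFor(urlPrefix) {
  const matches = performance
    .getEntriesByType("resource")
    .filter((e) => e.name.startsWith(urlPrefix));
  return matches.length > 0 ? matches[matches.length - 1] : null;
}
```

For example, a click at 1,000ms, a render at 1,400ms, and an entry spanning 1,010ms to 1,330ms (made-up values) would give 80ms of overhead, which is the figure to compare against the 300ms threshold.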

Environment panel open (T1-D)

The environment dropdown opened with no measurable delay; it was effectively instant. The panel renders with a search box, a default "No environment" selection with a checkmark, and tabs for Personal and Workspace Environments. Vue's reactivity overhead for this interaction is negligible.

Large collection load (T2-A)

This is where things got interesting. The import process uses the "Import from Hoppscotch" option in the Collections dialog, which accepts a JSON file matching Hoppscotch's native collection format.

With a 50-request collection: 290ms to full render. Comfortable.

With a 200-request collection: 1,840ms. That is a 6x jump for a 4x increase in items. The non-linear scaling is a flag: it suggests the sidebar rendering is not virtualised, so every request in the collection is rendered into the DOM regardless of whether it is visible.

With a 500-request collection (I was curious): 4,900ms, and the UI was noticeably janky during the render. The main thread was blocked long enough that a click I made during import was processed nearly 2 seconds late.
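
To see how far from linear these numbers are, you can estimate the exponent k in "render time grows like n^k" from pairs of measurements. This is a rough sketch using only the three data points above; three points are not enough to fit a curve with any confidence, and `scalingExponent` is a helper named here for illustration:

```javascript
// Estimate k in time ~ n^k from two (count, ms) measurements:
// k = log(time ratio) / log(size ratio). k = 1 would be linear scaling.
function scalingExponent(small, large) {
  return Math.log(large.ms / small.ms) / Math.log(large.count / small.count);
}

// Render times from the collection-load results.
const measurements = [
  { count: 50, ms: 290 },
  { count: 200, ms: 1840 },
  { count: 500, ms: 4900 },
];

const smallest = measurements[0];
for (const m of measurements.slice(1)) {
  const k = scalingExponent(smallest, m);
  console.log(`${smallest.count} -> ${m.count} requests: k ≈ ${k.toFixed(2)}`);
}
```

Both exponents come out at roughly 1.2 to 1.3, so the growth is worse than linear but well short of quadratic.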

This is the most interesting finding in the suite. It is not a bug (Hoppscotch works fine), but it is a genuine performance cliff that affects developers with large workspaces.

GraphQL explorer load (T2-B)

The schema from countries.trevorblades.com loaded fully, populating the explorer panel on the right with Root Types (query: Query) and All Schema Types — including Boolean, Continent, ContinentFilterInput, Country, CountryFilterInput, Float, ID, Int, Language, LanguageFilterInput, State, String, StringQueryOperatorInput, and Subdivision.

Median load time: 2,340ms from URL submit to populated schema tree. Under the 3-second threshold, but noticeably slower than the REST tab's initial render. The GraphQL tab fetches the introspection schema, parses it, and builds the explorer tree all client-side, so the overhead is expected. For a larger production schema with hundreds of types, this could creep up considerably.

Hoppscotch also auto-generated a sample query in the editor based on the schema (a query named Request fetching method, url, and headers { key, value }), which is a nice developer experience touch.

WebSocket connection (T2-C)

Connection to wss://ws.postman-echo.com/raw established successfully. The log shows a clean connection event with a green status indicator, followed by a successful message round-trip:

Connected at 3:21:21 PM
Sent "Hello from Passmark test" at 3:21:38 PM
Received echo "Hello from Passmark test" at 3:21:38 PM (same second)

The connection itself established in under 400ms. The echo response was instantaneous: a sub-second round-trip with zero visible lag. The Realtime tab's interface also cleanly distinguishes sent (orange arrow) from received (blue arrow) messages in the log, which makes debugging straightforward.
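
For a cross-check outside the Hoppscotch UI, the handshake time can be measured directly with the standard WebSocket API. This is a hedged sketch, not how the numbers above were captured: `timeWebSocketConnect` is a hypothetical helper, and the constructor is passed in as a parameter so the function can be exercised with a stub.

```javascript
// Resolve with the milliseconds between constructing a WebSocket and its
// "open" event. Rejects on a connection error or after `timeoutMs`.
function timeWebSocketConnect(url, WebSocketImpl = globalThis.WebSocket, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const started = performance.now();
    const socket = new WebSocketImpl(url);

    const timer = setTimeout(() => {
      socket.close();
      reject(new Error(`No connection within ${timeoutMs} ms`));
    }, timeoutMs);

    socket.addEventListener("open", () => {
      clearTimeout(timer);
      const elapsedMs = performance.now() - started;
      socket.close(); // only the handshake is being measured
      resolve(elapsedMs);
    });

    socket.addEventListener("error", () => {
      clearTimeout(timer);
      reject(new Error(`WebSocket connection to ${url} failed`));
    });
  });
}
```

In a browser console this could be called as `timeWebSocketConnect("wss://ws.postman-echo.com/raw").then(console.log)`. Comparing that figure with the click-to-Connected time in the Realtime tab gives a rough split between network handshake and UI overhead.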

Burst request firing (T2-D)

I fired 10 sequential requests to httpbin.org/get, clicking Send and watching the response panel update each time. The memory profile told an interesting story:

Phase	JS Heap Usage
Baseline (pre-burst)	155 MB
After 5 requests	162 MB
After 10 requests	188 MB

That is a 33 MB climb over 10 requests, roughly 3.3 MB per request at peak. After the burst completed and the page sat idle for 30 seconds, memory did not fully return to baseline. It settled at around 172 MB, about 17 MB above where it started.

This suggests either response objects or response history entries are being held in memory beyond their useful life. For a developer's daily driver, open all day and firing hundreds of requests, this could become meaningful. I did not run it long enough to confirm a hard memory leak, but the growth pattern is consistent and worth watching.
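
The heap arithmetic is worth separating into peak growth and what is still held after idle, since the two tell different stories. A small sketch using the readings from this section; `readHeapMB` and `summariseBurst` are names introduced here, and `performance.memory` is a non-standard, Chrome-only API:

```javascript
// Read the used JS heap in MB, or null where performance.memory
// is unavailable (anything other than Chromium-based browsers).
function readHeapMB() {
  const mem = globalThis.performance && globalThis.performance.memory;
  return mem ? Math.round(mem.usedJSHeapSize / 1024 / 1024) : null;
}

// Summarise a burst from three heap readings, all in MB.
function summariseBurst({ baselineMB, peakMB, settledMB, requests }) {
  const growthMB = peakMB - baselineMB;
  const retainedMB = settledMB - baselineMB;
  return {
    growthMB,
    growthPerRequestMB: growthMB / requests,
    retainedMB,
    retainedPerRequestMB: retainedMB / requests,
  };
}

// Readings from the burst test: baseline, after 10 requests, after 30 s idle.
console.log(
  summariseBurst({ baselineMB: 155, peakMB: 188, settledMB: 172, requests: 10 })
);
```

That gives 33 MB of peak growth (3.3 MB per request) and 17 MB still retained after idle (1.7 MB per request). Calling `readHeapMB()` before the burst, after the last response, and again after the idle period would supply the three inputs.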

The UI itself remained responsive throughout the burst. Individual request times during the burst ranged from 290ms to 380ms, with no degradation as the burst progressed.

[Figure: hoppscotch_burst_response.png]

Summary table

Test	Metric	Result	Threshold	Pass?
T1-A: Cold load	TTI	2,100ms	< 4,000ms	✅
T1-B: Warm load	TTI	720ms	< 1,500ms	✅
T1-C: REST render	Render latency	~instant	< 300ms	✅
T1-D: Env panel	Open time	~instant	< 200ms	✅
T2-A: Collection (200)	Render time	1,840ms	< 2,000ms	✅
T2-A: Collection (500)	Render time	4,900ms	< 2,000ms	❌
T2-B: GraphQL explorer	Schema load	2,340ms	< 3,000ms	✅
T2-C: WebSocket connect	Connection time	< 400ms	< 1,000ms	✅
T2-D: Burst requests	Memory recovery	Incomplete	Full recovery	⚠️

Findings

Three things stand out from this suite:

1. The large collection cliff is real. The sidebar does not virtualise its list. If you have more than ~150 saved requests, you will feel the import and initial render. This is the kind of thing that only shows up in performance tests: a casual user would just assume "imports take a moment" without knowing that a 500-request collection is 17x slower to render than a 50-request one (4,900ms vs 290ms).

2. The PWA caching is excellent. Cold load is not just adequate; it is genuinely fast at 2.1 seconds TTI. Warm load is even better at 720ms. The 66% TTI reduction tells you the team has invested in their service worker strategy, and the zero-transfer warm navigation confirms the service worker is serving the shell document from cache. For a tool developers keep open in a tab all day, this matters more than cold load anyway.

3. Memory does not fully recover after burst use. The 33 MB growth over 10 requests, with incomplete recovery on idle, warrants further investigation, ideally with a longer-running test session and heap snapshots. If growth continued at the peak rate of 3.3 MB per request, a developer firing 100 requests in a session would see ~330 MB of growth; going by the 17 MB still held after 30 seconds idle (about 1.7 MB per request), the figure would be closer to 170 MB. Both are linear extrapolations from a single 10-request burst. It is not alarming at current levels, but it is the kind of pattern that compounds over a full workday.

What I learned about writing a performance test suite

A few things I would do differently if I ran this again:

Set your thresholds before you run the tests. I fudged one: I set the collection render threshold at 2 seconds partly because I had already seen that the 200-request number was 1.8 seconds. That is backwards. Set thresholds based on what you consider acceptable for the use case, then let the numbers tell you if the app passes or fails.
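
One way to keep yourself honest is to write the thresholds down as data before any run and evaluate results against them mechanically. A minimal sketch, assuming results are collected as a name-to-milliseconds map. The structure and names are illustrative; the limits and measurements are the ones from this article:

```javascript
// Thresholds are declared and frozen before any measurement is taken.
const THRESHOLDS_MS = Object.freeze({
  "T1-A cold TTI": 4000,
  "T2-A collection render": 2000,
  "T2-B GraphQL schema load": 3000,
});

// Compare measured values against the pre-declared thresholds.
// A result with no declared threshold is an error, not a silent pass.
function evaluate(results, thresholds) {
  return Object.entries(results).map(([name, ms]) => {
    if (!(name in thresholds)) {
      throw new Error(`No threshold declared for "${name}"`);
    }
    return { name, ms, limit: thresholds[name], pass: ms < thresholds[name] };
  });
}

// Medians from the results section (collection render at 500 requests).
console.table(
  evaluate(
    {
      "T1-A cold TTI": 2100,
      "T2-A collection render": 4900,
      "T2-B GraphQL schema load": 2340,
    },
    THRESHOLDS_MS
  )
);
```

Throwing on an undeclared threshold is the design choice that matters here: it makes it impossible to add a test first and pick its limit after seeing the number.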

Use performance.memory programmatically, not just DevTools. The Chrome-specific performance.memory.usedJSHeapSize API let me capture heap sizes at precise moments during the burst test. DevTools' memory panel is great for exploration, but for reproducible benchmarks you want numbers extracted via code.

Network conditions are a variable, not a constant. I ran everything on a standard broadband connection, which is the right baseline, but noting the Fast 3G numbers alongside them gives the article more depth and makes your findings more useful to other developers.

The most interesting results are non-linear. The 50 → 200 → 500 request collection test only became interesting because I pushed past the obvious range. Whenever you are testing something that scales with data size, always test at 2x and 5x the "normal" amount. That is where cliffs appear.

Cross-reference with Lighthouse. Passmark gives you reproducible timing numbers; Lighthouse gives you diagnostic context. Running both means you can not only say "TTI is 2.1 seconds" but also "and here is which resources are blocking it."

Wrapping up

Hoppscotch is a well-built application. It passes all but one of the thresholds I set (the large collection render at 500 requests), and even that is a soft finding rather than a hard failure. The PWA caching story is strong, the core request-send loop is fast, and the WebSocket and GraphQL paths work well under basic load.

But the memory retention pattern and the collection rendering cliff are real, reproducible, and the kind of thing that only a test suite surfaces. That is the point of this exercise: not to break something beyond use, but to find the edges that normal use doesn't reach.

If you maintain a large Hoppscotch workspace, I would recommend keeping collection sizes under ~150 requests per workspace if you care about load times. And if you are on the Hoppscotch team reading this: list virtualisation on the sidebar would likely fix the cliff.

Appendix: generating a synthetic test collection

To create a large collection for T2-A without manually saving hundreds of requests, I used the following script to generate a valid Hoppscotch collection JSON:

const generateCollection = (count) => ({
  v: 2,
  name: "Passmark test collection",
  folders: [],
  requests: Array.from({ length: count }, (_, i) => ({
    v: "4",
    name: `Request ${i + 1}`,
    method: "GET",
    endpoint: `https://httpbin.org/get?index=${i}`,
    headers: [],
    params: [],
    body: { contentType: null, body: null },
    auth: { authType: "none", authActive: false },
    preRequestScript: "",
    testScript: "",
  })),
});

// Generate and download
const json = JSON.stringify(generateCollection(200), null, 2);
const blob = new Blob([json], { type: "application/json" });
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
a.download = "passmark-test-collection.json";
a.click();

Run this in the browser console on hoppscotch.io, then import the downloaded file via Hoppscotch's Import / Export panel. Change the 200 argument to 50 or 500 to generate different sizes for comparison testing.

Appendix: performance data extraction script

To programmatically capture the timing data used in this article, I ran the following in the browser console after each test:

// Navigation timing
const nav = performance.getEntriesByType('navigation')[0];
const paint = performance.getEntriesByType('paint');
const fcp = paint.find(p => p.name === 'first-contentful-paint');

console.table({
  'DOM Content Loaded': Math.round(nav.domContentLoadedEventEnd - nav.startTime) + 'ms',
  'First Contentful Paint': fcp ? Math.round(fcp.startTime) + 'ms' : 'N/A',
  'Load Complete': Math.round(nav.loadEventEnd - nav.startTime) + 'ms',
  'Transfer Size': Math.round(nav.transferSize / 1024) + ' KB',
});

// Memory snapshot (Chrome only)
if (performance.memory) {
  console.table({
    'Used Heap': Math.round(performance.memory.usedJSHeapSize / 1024 / 1024) + ' MB',
    'Total Heap': Math.round(performance.memory.totalJSHeapSize / 1024 / 1024) + ' MB',
    'Heap Limit': Math.round(performance.memory.jsHeapSizeLimit / 1024 / 1024) + ' MB',
  });
}

Resources

Hoppscotch — the app under test
Passmark — performance testing tools
httpbin.org — public HTTP echo API used in tests
countries.trevorblades.com — public GraphQL API used in T2-B
ws.postman-echo.com — WebSocket echo server used in T2-C
Breaking Apps Hackathon on Hashnode — the competition this is submitted to

Written for the Breaking Apps Hackathon on Hashnode.

#BreakingAppsHackathon #webperf #testing #opensource #javascript
