How to Reduce Server Response Time

Slow server response time delays everything and hurts LCP. The real causes, from unindexed queries to no caching to distant origins, and how to fix each one.

StackOptic Research Team · 17 May 2026 · 11 min read
Reducing server response time and TTFB with caching, query tuning and a CDN

Before a browser can render a single pixel of your page, it has to wait for your server to respond. If that wait is long, nothing else you do to the front end can fully rescue the page — a slow server sets a hard floor under your loading speed. That wait has a name, Time to First Byte (TTFB), and reducing it is one of the highest-leverage things a backend can do for performance. In short: server response time is how long your server takes to start sending a response; it is slowed by inefficient code, unoptimised database queries, missing caching, cold serverless starts and distant origins; and the fixes — caching, query tuning, a CDN and adequate hosting — remove or relocate that work. This guide maps each cause to its fix, with a reference table and a measurement workflow.

This guide is the server-side companion to "What is Time to First Byte (TTFB) and how to improve it", going deeper on the causes and remedies.

What "server response time" actually measures

Server response time is the time your server spends producing a response: accepting the connection, running your application code, querying databases or external services, rendering the HTML, and beginning to transmit it. From the client's side it is reported as TTFB, the interval between the browser sending the request and receiving the first byte of the response, so the measured figure also includes the network round trip to the server. Tools like PageSpeed Insights, WebPageTest and the Chrome DevTools Network panel all surface it (DevTools breaks a request into stages including "Waiting for server response", which is essentially TTFB).

Why it matters so much: TTFB is the first thing on the clock. Every later step — downloading resources, rendering, painting the largest element — can only begin after the first byte arrives. So a 1-second TTFB means the fastest possible page is already a second behind before the browser has done anything. This is also why TTFB is the first sub-part of Largest Contentful Paint: a slow server delays when the LCP element can even start to load, capping the metric no matter how lean the front end is.

A common rule of thumb is to aim for TTFB under about 200ms for cacheable content and to investigate anything consistently in the high hundreds of milliseconds — but treat these as directional, and always judge by real-user (field) data at the 75th percentile rather than one lucky test.
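
To make the "75th percentile, not one lucky test" point concrete, here is a minimal sketch that compares the best single sample with the 75th percentile of a set of TTFB measurements. The sample values are invented for illustration, and the nearest-rank method is just one simple way to compute a percentile; field tools may use a different interpolation.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical TTFB samples in milliseconds (illustrative, not measured data).
ttfb_ms = [120, 180, 150, 900, 200, 160, 140, 170]

p75 = percentile_nearest_rank(ttfb_ms, 75)
best = min(ttfb_ms)
print(f"best single test: {best} ms, p75: {p75} ms")
```

With these made-up numbers the best run is 120ms while the 75th percentile is 180ms, which is the figure a quarter of visits meet or exceed.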

The causes, mapped to fixes

Almost every slow server response traces to one or more of a short list of causes, and each has a distinct remedy. Diagnosing which applies is what makes the work efficient.

Cause | What is happening | Primary fix
--- | --- | ---
Slow application code | The app does too much work per request (heavy computation, inefficient logic, blocking calls) | Profile and optimise hot paths; cache results; do work asynchronously or offline
Unoptimised database queries | N+1 query patterns, missing indexes, fetching too much data | Add indexes, fix N+1, select only needed columns, cache query results
No caching | The server rebuilds the same page from scratch on every request | Full-page caching, object caching, opcode caching
Cold serverless starts | A serverless function must spin up before it can respond | Keep functions warm, reduce bundle/init size, use provisioned concurrency where available
Distant origin | The server is geographically far from the user, inflating round-trip latency | Put a CDN in front; cache at the edge; consider closer hosting regions
Underpowered / oversubscribed hosting | Shared or undersized hosting can't keep up under load | Move to adequate hosting; scale resources to traffic

The rest of the guide takes these in roughly the order that delivers the most improvement for the least effort on a typical site.

Fix 1: caching (usually the biggest win)

Caching is the single most effective lever for server response time, because the fastest work is the work you do not repeat. There are several layers, and they stack:

  • Full-page (page) caching. Store the fully rendered HTML of a page and serve that stored copy to subsequent visitors instead of regenerating it from scratch every time. For content that is the same for many users — articles, product pages, marketing pages — this can take TTFB from hundreds of milliseconds down to a handful, because the server is essentially handing over a pre-made file. It is the highest-impact cache for most content sites.
  • Object caching. Cache the results of expensive operations — a slow database query, an external API call, a computed value — so the next request reuses the result rather than recomputing it. This is ideal when a page is too dynamic to cache whole but contains pieces that are costly and reusable. Common stores include in-memory caches like Redis or Memcached.
  • Opcode caching. For interpreted languages, opcode caching stores the compiled form of your code so the server does not re-parse and recompile source on every request (PHP's OPcache is the classic example). It is usually a configuration setting rather than a code change, and on the right stack it is close to free performance.
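
As a rough illustration of object caching, the sketch below wraps an expensive operation in a small time-to-live cache. The `TTLCache` class, the `expensive_report` function and the `report:top` key are all hypothetical names made up for this example; in production the in-process dictionary would typically be replaced by a shared store such as Redis or Memcached.

```python
import time

class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for Redis/Memcached)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, ttl_seconds, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: skip the expensive work
        value = compute()            # cache miss: do the work once
        self._entries[key] = (now + ttl_seconds, value)
        return value

calls = {"count": 0}

def expensive_report():
    """Hypothetical slow operation (a heavy query or external API call)."""
    calls["count"] += 1
    return {"top_products": ["a", "b", "c"]}

cache = TTLCache()
first = cache.get_or_compute("report:top", 60, expensive_report)
second = cache.get_or_compute("report:top", 60, expensive_report)
```

The second call returns the stored value without running `expensive_report` again, which is the whole point: the cost is paid once per TTL window rather than once per request.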

The art of caching is invalidation: making sure users do not see stale content after something changes. The general approach is to cache aggressively but expire or purge entries when the underlying data updates (on publish, on edit, or after a sensible time-to-live). The closely related delivery-layer caching offered by a CDN is covered below and in "What is a CDN, and do you need one".
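
One way to picture purge-on-publish is the sketch below: a page cache that renders once, serves the stored copy afterwards, and drops the entry when the content changes. `PageCache`, `publish` and the `articles` dictionary are hypothetical stand-ins for whatever cache and content store your stack actually uses.

```python
class PageCache:
    """In-memory full-page cache keyed by URL path (illustrative only)."""

    def __init__(self):
        self._pages = {}

    def get(self, path, render):
        if path not in self._pages:
            self._pages[path] = render(path)   # render once, then reuse
        return self._pages[path]

    def purge(self, path):
        self._pages.pop(path, None)            # next request re-renders

articles = {"/post/1": "Old title"}            # hypothetical content store
renders = []

def render_article(path):
    renders.append(path)
    return f"<h1>{articles[path]}</h1>"

def publish(cache, path, title):
    """Update the content, then purge so readers never see the stale page."""
    articles[path] = title
    cache.purge(path)

page_cache = PageCache()
before = page_cache.get("/post/1", render_article)
page_cache.get("/post/1", render_article)      # served from cache
publish(page_cache, "/post/1", "New title")
after = page_cache.get("/post/1", render_article)
```

Three requests trigger only two renders: one for the original page and one after the publish event purged it.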

Fix 2: optimise the database

After caching, the database is the most common source of slow responses, because a single page can fire many queries and one slow query gates the whole response.

  • Add indexes. A query that scans an entire table for matching rows is slow and gets slower as data grows; an index lets the database jump straight to the relevant rows. Missing indexes on columns used in filtering and joining are one of the most frequent and most fixable causes of slow responses. Identify them with the database's slow-query log and EXPLAIN (or equivalent) to see which queries scan instead of seeking.
  • Fix N+1 query patterns. This classic problem is where code runs one query to fetch a list, then one additional query per item in that list — a page showing 50 items fires 51 queries instead of two. The fix is to fetch related data in a single query (eager loading / joins) rather than in a loop. N+1 is easy to introduce with ORMs and is often invisible until you watch the query count.
  • Fetch only what you need. Selecting every column and row when the page uses a few is wasted work and bandwidth between the app and the database. Request just the columns and rows required, and paginate large result sets.
  • Cache query results (object caching, above) for expensive queries whose data changes infrequently.
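
To see the scan-versus-seek difference for yourself, the following sketch uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The `orders` table, its columns and the index name are invented for the example, and the exact wording of the plan text varies between SQLite versions, so treat the printed output as indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def query_plan(sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Select only the columns the page needs, filtered by an unindexed column.
sql = "SELECT id, total FROM orders WHERE customer_id = ?"

plan_before = query_plan(sql, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(sql, (42,))

print("before:", plan_before)   # expect a full scan of the table
print("after: ", plan_after)    # expect a search using the new index
```

Other databases expose the same idea through their own `EXPLAIN` output; the habit of checking the plan before and after adding an index carries over.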

A few targeted indexes and one fixed N+1 pattern frequently cut server response time more than any amount of front-end tuning, which is why the database deserves early attention.
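
The N+1 pattern is easiest to recognise by counting queries. This sketch, again using `sqlite3` with made-up `posts` and `authors` tables, lists 50 posts twice: once with a query per post, once with a single join. The `run` helper is a hypothetical counter, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(i, f"author-{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO posts (id, author_id, title) VALUES (?, ?, ?)",
                 [(i, i, f"post-{i}") for i in range(1, 51)])

query_count = 0

def run(sql, params=()):
    """Execute a query and count it, so the N+1 cost is visible."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

def list_posts_n_plus_one():
    posts = run("SELECT id, author_id, title FROM posts ORDER BY id")
    result = []
    for _post_id, author_id, title in posts:
        # One extra query per post: this is the N in N+1.
        (name,) = run("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
        result.append((title, name))
    return result

def list_posts_joined():
    # One query fetches posts and their authors together.
    return run(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )

query_count = 0
slow = list_posts_n_plus_one()
n_plus_one_queries = query_count

query_count = 0
fast = list_posts_joined()
joined_queries = query_count
```

Both functions return the same rows, but the looped version issues 51 queries and the joined version issues one. With an ORM, the equivalent fix is usually its eager-loading option rather than hand-written SQL.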

Fix 3: put a CDN in front

A content delivery network helps server response time in two ways. First, for cacheable content it serves a cached copy from an edge server near the user, so the response never travels to your distant origin at all — slashing the network-latency portion of TTFB for far-away visitors. Second, by absorbing those requests at the edge, it offloads work from your origin, so the origin stays responsive under load and during spikes.

This addresses the "distant origin" cause directly: even a fast server in one region is slow for users on another continent purely because of the round trip, and an edge cache removes that distance for cacheable responses. CDNs also bring modern protocols (HTTP/2 and HTTP/3) and compression that further trim response time. The important caveat is that a CDN accelerates cacheable content; genuinely dynamic, per-user responses still reach your origin, so the CDN complements rather than replaces the caching and database work above.
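
Whether a CDN may cache a response is usually signalled through the `Cache-Control` header. The sketch below shows one plausible policy split between shared and per-user responses; the function name and the 300-second edge TTL are assumptions for illustration, and the right values depend on how quickly your content changes and how your CDN handles purges.

```python
def cache_headers(is_personalised, edge_ttl_seconds=300):
    """Pick response headers that tell a CDN whether it may cache the response."""
    if is_personalised:
        # Per-user responses must not be stored by shared caches.
        return {"Cache-Control": "private, no-store"}
    return {
        # s-maxage applies to shared caches such as a CDN; max-age=0 asks
        # browsers to revalidate rather than reuse their own copy.
        "Cache-Control": f"public, max-age=0, s-maxage={edge_ttl_seconds}",
    }

article_headers = cache_headers(is_personalised=False)
account_headers = cache_headers(is_personalised=True)
```

Keeping the edge lifetime (`s-maxage`) separate from the browser lifetime (`max-age`) lets the CDN absorb traffic while leaving you able to purge the edge copy when content changes.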

Fix 4: tackle cold starts (serverless)

If you run on a serverless platform, requests can hit a cold start: when no warm instance of your function exists, the platform must initialise one — loading the runtime, your code and dependencies — before it can respond, adding noticeable latency to that request. Mitigations include:

  • Reducing initialisation cost — smaller deployment bundles, fewer heavy dependencies loaded at startup, lazy-loading what is not needed immediately.
  • Keeping functions warm — provisioned concurrency or scheduled pings on platforms that support them, so a ready instance is usually available.
  • Caching in front so many requests are served without invoking the function at all, which both improves response time and reduces how often cold starts occur.

Cold starts are an architecture-specific cause, so the right fix depends on your platform — but the combination of leaner initialisation and caching covers most cases.
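
Reducing initialisation cost often comes down to deferring heavy setup until a request actually needs it. The sketch below is a platform-neutral illustration: `handler`, the event shape and `get_report_renderer` are hypothetical, and the `time.sleep` call merely stands in for a slow import or client construction. It assumes, as is common on serverless platforms, that module-level state survives between invocations on a warm instance.

```python
import functools
import time

init_log = []

@functools.lru_cache(maxsize=1)
def get_report_renderer():
    """Build a hypothetical heavy dependency on first use, then reuse it."""
    init_log.append("renderer initialised")
    time.sleep(0.05)             # stands in for a slow import or client setup
    return lambda data: f"report({data})"

def handler(event):
    """Hypothetical function entry point; the event shape is an assumption."""
    if event.get("path") == "/health":
        return "ok"              # cheap path never pays the init cost
    return get_report_renderer()(event.get("data"))
```

A cold instance answering `/health` responds without building the renderer at all, and a warm instance builds it only once however many report requests follow.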

Fix 5: efficient code and connections

Beyond databases, the application itself can be the bottleneck. Profile to find the hot paths — the code that runs on every request or consumes the most time — and optimise those rather than guessing. Move work that does not have to happen during the request (sending emails, generating thumbnails, syncing third parties) to background jobs, so the user's request returns quickly. Avoid blocking calls to slow external services on the critical request path, or cache their results.
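
Moving work off the request path can be sketched with nothing more than a queue and a worker thread. Everything named here (`handle_signup`, `send_welcome_email`, the in-process queue) is hypothetical; a real deployment would more likely use a durable job queue so work survives restarts, but the shape of the handler is the same.

```python
import queue
import threading

jobs = queue.Queue()
sent = []

def send_welcome_email(address):
    """Hypothetical slow side effect that the user does not need to wait for."""
    sent.append(address)

def worker():
    while True:
        job = jobs.get()
        if job is None:          # sentinel: stop the worker
            jobs.task_done()
            break
        func, args = job
        try:
            func(*args)
        finally:
            jobs.task_done()

def handle_signup(address):
    """Request handler: enqueue the slow work and return immediately."""
    jobs.put((send_welcome_email, (address,)))
    return {"status": "created"}

thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_signup("user@example.com")
jobs.join()                      # only so this demo can observe the result
```

The handler's response time no longer depends on how long the email takes to send; the trade-off is that failures in the background job need their own retry and monitoring.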

Connection-level details matter too. Keep-alive lets a client reuse a connection for multiple requests instead of paying setup costs each time; connection pooling to the database avoids opening a fresh database connection per request. And ensure compression (gzip or Brotli) is enabled so responses are smaller on the wire — though note this trims transfer time more than the pure "thinking" time of TTFB.
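
Connection pooling is usually provided by your database driver or framework, but a minimal sketch shows the idea: open a fixed number of connections up front and lend them out per request. The `ConnectionPool` class is hypothetical, and in-memory SQLite is only a stand-in for a networked database where connection setup is genuinely expensive.

```python
import contextlib
import queue
import sqlite3

class ConnectionPool:
    """Fixed-size pool that hands out already-open connections."""

    def __init__(self, size, connect):
        self.opened = 0
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    @contextlib.contextmanager
    def connection(self, timeout=5):
        conn = self._idle.get(timeout=timeout)  # wait for a free connection
        try:
            yield conn
        finally:
            self._idle.put(conn)                # return it for reuse

# SQLite in memory is only a stand-in for a networked database here.
pool = ConnectionPool(size=2, connect=lambda: sqlite3.connect(":memory:"))

results = []
for _ in range(10):                             # ten "requests"
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])
```

Ten requests are served by two connections opened once, instead of ten connections opened and closed in turn.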

Fix 6: adequate hosting

Sometimes the cause is simply that the server is underpowered or oversubscribed. Cheap shared hosting places many sites on one machine, so a neighbour's traffic spike can slow your responses; an undersized instance runs out of CPU or memory under modest load. If you have optimised code, queries and caching and TTFB is still high under load, the hosting itself may be the limit. Moving to a plan with adequate, dedicated resources, or scaling your instances to match real traffic, removes a latency floor that no amount of code tuning can lower. This ties back to the broader advice in "How to make your website load faster": adequate hosting is foundational, and underpowered hosting is a frequent, hidden cause of sluggish responses.

How to measure and diagnose

A repeatable workflow keeps this work evidence-led rather than speculative:

  1. PageSpeed Insights / WebPageTest report TTFB (PSI flags "Reduce initial server response time" with the measured value). WebPageTest's waterfall shows the wait before the first byte clearly, and lets you test from multiple locations to expose distance-related latency.
  2. Chrome DevTools → Network — click the HTML document request and read the Timing tab; "Waiting for server response" is your TTFB for that load. Compare a cached vs uncached load to see how much caching is (or is not) helping.
  3. Application profiling and the database slow-query log localise the cost inside the server — which functions and which queries dominate — so you fix the actual bottleneck.
  4. Field data (Chrome UX Report) gives real-user TTFB across the device and geographic mix that matters; judge success here, at the 75th percentile, not on a single fast test from near your server.
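
For a quick scripted check alongside the tools above, the sketch below approximates TTFB with Python's standard library. To stay self-contained it starts a local test server that deliberately waits 100ms before answering; `SlowHandler` and `measure_ttfb` are hypothetical names. The timer stops when the status line and headers have arrived, which is a close approximation of the first byte, and connection setup is excluded, so the figure is nearer to DevTools' "Waiting for server response" than to field TTFB.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    """Local test server that 'thinks' for 100 ms before responding."""

    def do_GET(self):
        time.sleep(0.1)
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

def measure_ttfb(host, port, path="/"):
    """Return seconds from sending the request to receiving the response head."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.connect()                     # connection setup is excluded here
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()      # returns once status and headers arrive
        ttfb = time.perf_counter() - start
        response.read()
        return ttfb
    finally:
        conn.close()

server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

ttfb_seconds = measure_ttfb("127.0.0.1", server.server_port)
print(f"TTFB: {ttfb_seconds * 1000:.0f} ms")

server.shutdown()
server.server_close()
```

Pointing `measure_ttfb` at a real host and repeating it from several locations gives a rough lab number, though real-user field data remains the figure to judge by.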

The key diagnostic question is "is this response cacheable or not?" For cacheable content, caching and a CDN do most of the work; for genuinely dynamic responses, the code and database optimisations are where the gains live. Knowing which bucket a slow page is in points you straight at the right fix.

A worked example

Suppose PageSpeed Insights reports a TTFB of around 900ms on a content-heavy page and flags "Reduce initial server response time". Profiling shows the page runs dozens of database queries — a classic N+1 pattern fetching related records in a loop — and there is no page caching, so every visit rebuilds the page from scratch. You make three changes. First, fix the N+1 by eager-loading the related data in a single query, cutting the query count from over fifty to a handful. Second, add an index on the column the main query filters by, turning a full table scan into a fast seek. Third, enable full-page caching for this template with sensible invalidation on publish, so repeat visitors get a pre-rendered copy. Re-measured, cached visits return in well under 100ms and even uncached visits are far quicker. Field TTFB drops, and because TTFB is the first phase of LCP, the page's LCP improves in step. This is the typical pattern — a query fix, an index and a cache — and it usually moves server response time more than anything done on the front end.

Common mistakes

  • Optimising the front end while ignoring a slow TTFB, which caps every other improvement.
  • Skipping caching and rebuilding identical pages on every request.
  • Missing indexes and N+1 queries left unexamined because the page "works" in development with little data.
  • Assuming a CDN fixes a slow origin — it accelerates cacheable content, but dynamic responses still hit the origin.
  • Testing only from near the server, hiding the distance latency that real, distant users experience.
  • Cheap, oversubscribed hosting treated as adequate when it is the actual ceiling.

Why this is worth doing

Server response time is unglamorous, but it pays off broadly. Unlike a fix to a single page's image, reducing TTFB benefits every page the server produces, and it improves the metric — LCP — that most directly reflects whether a page feels fast. Faster responses also let your infrastructure handle more traffic on the same resources, which can lower cost. And because TTFB feeds Core Web Vitals and thus Google's page-experience signals, the work pays a modest SEO dividend on top of the user-experience and capacity wins. Measure it, find whether your slow responses are cacheable or computational, and apply the matching fix — the gains tend to be large and site-wide.

Frequently asked questions

What is server response time?

Server response time is how long your server takes to process a request and begin sending the response back — the delay before the browser receives the first byte. It is measured as Time to First Byte (TTFB) and covers everything the server does: running your application code, querying databases, assembling the page and starting to transmit it. A slow server response delays every subsequent step of loading, so it sets a floor on how fast a page can possibly be.

What is a good server response time?

As a common rule of thumb, aim for a Time to First Byte under roughly 200 milliseconds for cacheable content, and treat anything consistently above several hundred milliseconds as worth investigating. Google's guidance ties TTFB to Largest Contentful Paint: because TTFB is the first phase of LCP, a high TTFB makes a good LCP (2.5 seconds or less at the 75th percentile) much harder to achieve. Measure at the 75th percentile of real users, not just a single fast test.

What causes slow server response time?

The usual culprits are slow application code doing too much work per request, unoptimised database queries — especially N+1 query patterns and queries missing indexes — a lack of caching so the server rebuilds the same page repeatedly, cold starts on serverless platforms where the function must spin up before responding, and an origin server located far from the user so network latency inflates the response. Often several of these combine, which is why measuring the breakdown matters.

How do I reduce server response time?

Start with caching, which usually has the biggest impact: full-page caching so pages are not rebuilt every request, object caching for expensive query results, and opcode caching for compiled code. Then optimise database queries by adding indexes and fixing N+1 patterns, put a CDN in front to serve cached responses from the edge, choose adequate (not oversubscribed) hosting, and keep connections alive. Re-measure TTFB after each change to confirm it helped.

How does server response time affect SEO and Core Web Vitals?

Server response time is the first component of Largest Contentful Paint, one of the Core Web Vitals Google uses in its page-experience signals. A slow TTFB delays when the largest element can begin rendering, so it directly worsens LCP regardless of how optimised your front end is. Reducing server response time therefore improves a metric that affects both user experience and search, and it benefits every page on the site rather than a single one.
