Case Study · 03

Web Shop Manager · Backend & Cloud Optimization

Cut the AWS bill, took p95 query latency down by two-thirds, and quietly walked a legacy PHP system toward a Go and TypeScript backend. No rewrite, no freeze, no customer-visible incident.

Role: Senior Backend Engineer & Infra Architect
Focus areas: AWS/RDS optimization, legacy modernization, backend performance
Engagement: Ongoing retainer, started as a 6-week audit
Stack: Go · TypeScript · PHP · Postgres · RDS · Redis · AWS
01

Context

A well-established ecommerce operator with a PHP monolith that had grown past the team's ability to change it confidently — slow queries, over-provisioned infrastructure, and a cost structure that didn't match the traffic.

The platform worked, but it was expensive to run and slow to change. AWS spend had grown with the business without being right-sized. Query performance degraded as the catalog grew. The team had talked about rewriting in Go for two years without being able to commit to it.

The engagement started as a 6-week performance and cost audit. It became an ongoing retainer when the initial changes — delivered without a freeze — produced results the team wanted to keep building on.

02

Problem

A monolith that was expensive, slow on key paths, and impossible to rewrite without stopping the business.

The top 30 endpoints accounted for most of the latency complaints. Query plans were unoptimized, the cache layer was inconsistently applied, and read replicas weren't being used for traffic that didn't need primary writes.

Infrastructure cost was the other half. Over-provisioned RDS instances, under-utilized reserved capacity, and no cost attribution by service or feature. The team knew they were spending too much but couldn't tell where.

Risk surface

Why it needed to be done: the cost of doing nothing was compounding.

The problems weren't critical. They were becoming structural. Each quarter of inaction made them harder to fix.

AWS spend scaling with headcount, not traffic

Infrastructure costs were growing because of over-provisioning decisions made at lower scale, not because the traffic required them. Each month of inaction was waste.

Query regressions blocking new features

Slow queries on the catalog and order paths were making new feature work risky. Developers were working around known slow paths instead of through them.

Rewrite risk accumulating

Two years of deferred modernization meant the gap between the PHP monolith and any new service was widening. The longer the wait, the larger the eventual rewrite.

Solution

What was built and how it fits together.

01 · Endpoint and query audit
A structured audit of the top 30 endpoints by latency and database cost. Query plans were read, indexes were added or changed where they had direct impact, and N+1 patterns were resolved first.

02 · Cache layer with sane invalidation
Redis was already in the stack but applied inconsistently. A small set of rules for what gets cached, for how long, and how invalidation is triggered — applied to the highest-volume read paths.

03 · Read-replica routing
Reads that don't need primary-write consistency are routed to read replicas. Implemented as a thin middleware change in the PHP layer — no service boundary required.

04 · Go edge service for hot paths
The catalog search and product detail paths — highest volume, most latency-sensitive — were extracted into a small Go service. The PHP monolith calls it; the team deploys it independently.

05 · TypeScript operator API
A new TypeScript service owns the operator-facing admin API. Typed, tested, and deployed separately from the PHP monolith. New admin features are built here, not added to PHP.

06 · Right-sizing and cost attribution
RDS instances right-sized based on actual load profiles, reserved capacity rationalized, and cost attribution added by service so future spend decisions have data behind them.

Key technical work

The pieces of the build that mattered most.

01

Top-30 endpoint and query audit

Systematic review of query plans for the highest-traffic endpoints. Index changes, query rewrites, and N+1 resolution — each change measured before and after.

EXPLAIN ANALYZE · Indexing · N+1 resolution
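The shape of a typical N+1 fix can be sketched as follows: instead of issuing one lookup per row, collect the IDs and fetch them in a single batch query (`WHERE id = ANY($1)` in Postgres). This is a minimal in-memory sketch, not the engagement's actual code; `product` and `fetchProductsBatch` are hypothetical names, and a map stands in for the database.

```go
package main

import "fmt"

// product stands in for a catalog row; in the real system this
// would come from Postgres.
type product struct {
	ID   int
	Name string
}

// fetchProductsBatch replaces N per-row lookups with one batched
// fetch. The map argument stands in for a single round-trip query
// of the form SELECT ... WHERE id = ANY($1).
func fetchProductsBatch(db map[int]product, ids []int) []product {
	seen := make(map[int]bool, len(ids))
	out := make([]product, 0, len(ids))
	for _, id := range ids {
		if seen[id] {
			continue // dedupe so the batch stays minimal
		}
		seen[id] = true
		if p, ok := db[id]; ok {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	db := map[int]product{1: {1, "mug"}, 2: {2, "cap"}}
	got := fetchProductsBatch(db, []int{1, 2, 1})
	fmt.Println(len(got)) // two products from a single round trip
}
```

The measurable win is the round-trip count: a page rendering 50 order lines goes from 51 queries to 2.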
02

Cache layer with sane invalidation

Redis applied to catalog reads and session-adjacent paths with explicit TTL strategy and event-driven invalidation. Cache hit rates measured per endpoint.

Redis · Cache strategy · Invalidation
03

Read-replica routing

A routing middleware layer in PHP that directs read-only queries to the replica pool. Transparent to the application layer, measurable in RDS metrics.

RDS replicas · Read routing · PHP middleware
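The core of such a router is a conservative classifier: plain SELECTs go to the replica pool, and anything that writes, locks rows, or needs primary consistency stays on the primary. A simplified sketch of that decision (the real middleware is PHP; `isReplicaSafe` and `route` are hypothetical names, and a production version would also account for transactions and session-level read-your-writes):

```go
package main

import (
	"fmt"
	"strings"
)

// isReplicaSafe reports whether a statement can be served from the
// replica pool. Deliberately conservative: anything that is not a
// plain SELECT goes to the primary.
func isReplicaSafe(sql string) bool {
	q := strings.ToUpper(strings.TrimSpace(sql))
	if !strings.HasPrefix(q, "SELECT") {
		return false
	}
	// SELECT ... FOR UPDATE takes row locks and must see primary state.
	if strings.Contains(q, "FOR UPDATE") {
		return false
	}
	return true
}

func route(sql string) string {
	if isReplicaSafe(sql) {
		return "replica"
	}
	return "primary"
}

func main() {
	fmt.Println(route("SELECT * FROM products WHERE id = 1")) // replica
	fmt.Println(route("UPDATE orders SET status = 'paid'"))   // primary
	fmt.Println(route("SELECT * FROM stock FOR UPDATE"))      // primary
}
```

Erring toward the primary is the safe default: a misrouted read costs a little load, while a misrouted write or locking read costs correctness.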
04

Go edge service for catalog paths

Extracted product detail and search into a Go service behind an internal load balancer. Deployed independently, with its own metrics and on-call runbook.

Go · Internal service · Terraform
05

TypeScript operator API

New operator-facing API in TypeScript with full type coverage, request validation, and structured logging. Deployed to ECS alongside the Go service.

TypeScript · ECS · OpenAPI
06

Right-sizing and cost attribution

RDS right-sizing based on peak vs. median load profiles, reserved instance re-evaluation, and a cost-per-feature tagging strategy so future optimization has a baseline.

AWS Cost Explorer · RDS sizing · Tagging
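Right-sizing against peak rather than provisioned capacity is a small calculation: take a high percentile of observed load and size the instance so that percentile lands at a target utilization. A sketch of the arithmetic (the thresholds and function names here are illustrative, not the engagement's actual policy):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile of CPU samples (0-100),
// using nearest-rank on the sorted copy.
func percentile(samples []float64, p float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	idx := int(p / 100 * float64(len(s)-1))
	return s[idx]
}

// recommendVCPUs sizes an instance so the observed p95 load lands
// at the target utilization, rounded up and floored at 2 vCPUs.
func recommendVCPUs(cpuSamples []float64, currentVCPUs int, targetUtil float64) int {
	p95 := percentile(cpuSamples, 95)
	needed := float64(currentVCPUs) * (p95 / 100) / targetUtil
	n := int(needed)
	if float64(n) < needed {
		n++ // round up: never size below the p95 requirement
	}
	if n < 2 {
		n = 2
	}
	return n
}

func main() {
	// A 16-vCPU instance whose CPU rarely exceeds ~20% is paying
	// for headroom it never uses.
	samples := []float64{12, 15, 18, 20, 22, 19, 14, 21}
	fmt.Println(recommendVCPUs(samples, 16, 0.6)) // 6
}
```

The same shape of calculation, fed from Cost Explorer and CloudWatch data per tagged service, is what turns "we spend too much" into a concrete downsize list.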
Business impact

What came out of it.

p95 latency · –66% (placeholder)
Across the top 30 endpoints after query optimization, cache layer, and read-replica routing. Measured before and after each change.

RDS cost · –41% (placeholder)
Monthly RDS spend after right-sizing and reserved capacity rationalization. No reduction in availability or read capacity.

Go service p95 · < 40 ms (placeholder)
Catalog search and product detail latency after extraction to Go. Previously averaging 280 ms on the PHP path.

Engagement · 18+ months (placeholder)
Ongoing retainer. Started as a 6-week audit; the team kept me on when the first changes held up in production.

Values marked placeholder are representative — replace with measured numbers from the live system once available.

Final result

A backend the team can keep evolving, instead of one they have to escape.

The PHP monolith is still in production. It carries less work each quarter, and the parts that mattered most are now on a typed Go and TypeScript surface, behind a properly sized AWS footprint. Latency is down, the bill is down, and the team has a clear migration path instead of a rewrite deadline.

p95 latency down 66% across top 30 endpoints
Monthly RDS cost reduced 41% with no capacity loss
Go service handling catalog paths under 40ms p95
TypeScript operator API replacing PHP admin layer
Cost attribution added so future decisions have data
Next engagement

Have a similar system to build or optimize?

If you have a legacy backend that's expensive to run and slow to change, send a few sentences. I'll respond directly within one business day.

Book a call · bilalasharf@gmail.com