CPU Cache Locality in Batch Field Extraction
Multiple `.map()` calls over the same array force the CPU to reload each object
I was optimizing a bulk insert function that extracted 18 fields from an array of calendar blocks. The code looked clean — one .map() per field — but it was iterating the entire array 18 times. For 100 blocks, that meant 1,800 property accesses across 18 separate passes. The functional style was hiding a real performance problem: cache locality.
The Problem
When you use multiple .map() calls to extract fields from the same array, the CPU has to reload each object from memory on every pass:
// BAD: k separate iterations over n items = O(k*n)
const ids = blocks.map((b) => b.id); // pass 1: block accessed
const titles = blocks.map((b) => b.title); // pass 2: same block re-accessed
const starts = blocks.map((b) => b.startDateTime); // pass 3: same block re-accessed
// ... 15 more fields

Each .map() call iterates the entire array. For 18 fields over 100 items, that is 18 * 100 = 1,800 property accesses across 18 separate passes. The issue isn't the raw number of accesses; it's that the CPU has to reload the same objects from slower cache levels (or main memory) on every pass.
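For reference, the snippets in this post assume a Block shape along these lines. The exact field list is hypothetical; only the three fields used in the examples are spelled out:

```typescript
// Hypothetical shape of a calendar block record. The real record has
// 18 fields; only the ones used in the snippets appear here.
interface Block {
  id: number;
  title: string | null;
  startDateTime: string | null;
  // ... 15 more fields in the real record
}

const blocks: Block[] = [
  { id: 1, title: "Standup", startDateTime: "2024-01-02T09:00:00Z" },
  { id: 2, title: null, startDateTime: null },
];
```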
Why Cache Locality Matters
Modern CPUs load data in cache lines (64 bytes on x86). When you access a Block object, the object (or a large portion of it) gets loaded into the L1 cache. Subsequent property accesses on the same object are essentially free: an L1 cache hit takes ~1 nanosecond, an L3 hit takes ~40 nanoseconds, and a miss all the way to main memory is on the order of 100 nanoseconds.
With multiple .map() calls, each pass evicts the objects from cache before the next pass needs them:
- Pass 1: Block loaded into L1, read .id, Block evicted
- Pass 2: Block reloaded into L1, read .title, Block evicted
- Repeat for each field
With a single for...of loop, you access all fields while the object is still hot:
- Block loaded into L1 once
- Read .id, .title, .startDateTime, … (all L1 hits)
- Move to next Block
The Solution: Single-Pass Extraction
Replace multiple .map() calls with a single loop that extracts all fields in one pass:
// GOOD: 1 iteration over n items = O(n)
const ids: number[] = [];
const titles: (string | null)[] = [];
const starts: (string | null)[] = [];
for (const block of blocks) {
// Block is hot in L1 cache — all reads are cache hits
ids.push(block.id);
titles.push(block.title);
starts.push(block.startDateTime);
}
Why for...of Over forEach
Both for...of and forEach are single-pass with identical cache locality behavior. The CPU sees the same sequential memory access pattern. The difference is in how V8’s TurboFan JIT compiler optimizes each construct:
// Also single-pass — same cache locality as for...of
blocks.forEach((block) => {
ids.push(block.id);
titles.push(block.title);
starts.push(block.startDateTime);
});

for...of uses the iterator protocol (Symbol.iterator + .next()). This looks heavier: it allocates an iterator and a {value, done} result object per step. But V8's escape analysis eliminates both allocations for arrays, since neither object escapes the loop scope.
forEach invokes a callback per element. TurboFan inlines Array.prototype.forEach itself, but the user-supplied callback is inlined speculatively. If the callback becomes polymorphic or too large, inlining fails silently and each iteration pays full function call overhead — a 20-40% slowdown.
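As a rough illustration of how a callback goes polymorphic (the two block shapes here are hypothetical, and this snippet shows the shape mismatch rather than proving a deoptimization): once the same callback has seen objects with different hidden classes, the inline cache at its property-load site is no longer monomorphic, and speculative inlining becomes less likely.

```typescript
// Hypothetical shapes: the shared callback sees two different hidden
// classes, so the inline cache at `x.id` becomes polymorphic. Whether
// TurboFan still inlines the callback is an engine heuristic, not
// something this code demonstrates directly.
type CalendarBlock = { id: number; title: string };
type ReminderBlock = { id: number; note: string; done: boolean };

const ids: number[] = [];
const recordId = (x: { id: number }) => {
  ids.push(x.id);
};

const calendar: CalendarBlock[] = [{ id: 1, title: "Standup" }];
const reminders: ReminderBlock[] = [{ id: 2, note: "Pay rent", done: false }];

calendar.forEach(recordId); // callback sees shape { id, title }
reminders.forEach(recordId); // same callback now sees { id, note, done }
// ids now holds [1, 2]
```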
| Aspect | for...of | forEach |
|---|---|---|
| Cache locality | Single-pass, sequential | Single-pass, sequential (identical) |
| V8 mechanism | Iterator + escape analysis | Callback inlining (speculative) |
| Early exit | break works | Cannot break out of forEach |
| Degradation risk | Predictable on arrays | 20-40% slower if callback not inlined |
for...of is the safer default for hot paths: more predictable optimization, supports break, and makes the single-pass intent explicit.
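One concrete consequence of the break support: a scan can stop as soon as it finds what it needs. A minimal sketch (the predicate and data are illustrative, not from the original code):

```typescript
// Early exit with for...of: `break` ends the iteration immediately.
// forEach offers no equivalent; it always visits every element.
const blocks = [
  { id: 1, title: "a" },
  { id: 2, title: "b" },
  { id: 3, title: "c" },
];

let found: { id: number; title: string } | undefined;
for (const block of blocks) {
  if (block.id === 2) {
    found = block;
    break; // remaining blocks are never touched
  }
}
```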
Real Impact
| Metric | Multi-map (18x) | Single for…of | Improvement |
|---|---|---|---|
| Iterations (100 items) | 1,800 | 100 | 18x fewer |
| Iterations (100K items) | 1,800,000 | 100,000 | 18x fewer |
| Cache behavior | Cold on each pass | Hot (temporal locality) | Significant at scale |
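The shape of these numbers can be sanity-checked with a rough micro-benchmark sketch. Field names and array size are illustrative, and absolute timings vary by machine and JIT warm-up; treat the output as directional only:

```typescript
// Micro-benchmark sketch: multi-pass .map() vs single-pass for...of.
// Uses the global performance.now() available in Node 16+ and browsers.
type Block = { id: number; title: string; startDateTime: string };

const blocks: Block[] = Array.from({ length: 100_000 }, (_, i) => ({
  id: i,
  title: `block-${i}`,
  startDateTime: `2024-01-01T00:00:${i % 60}Z`,
}));

function multiPass(bs: Block[]) {
  // One full pass per field: objects reloaded on every pass.
  const ids = bs.map((b) => b.id);
  const titles = bs.map((b) => b.title);
  const starts = bs.map((b) => b.startDateTime);
  return { ids, titles, starts };
}

function singlePass(bs: Block[]) {
  // One pass total: all fields read while the object is hot.
  const ids: number[] = [];
  const titles: string[] = [];
  const starts: string[] = [];
  for (const b of bs) {
    ids.push(b.id);
    titles.push(b.title);
    starts.push(b.startDateTime);
  }
  return { ids, titles, starts };
}

for (const [name, fn] of [
  ["multi-pass", multiPass],
  ["single-pass", singlePass],
] as const) {
  const t0 = performance.now();
  fn(blocks);
  console.log(`${name}: ${(performance.now() - t0).toFixed(2)} ms`);
}
```

Both functions produce identical output; only the access pattern differs, which is exactly what the table above is measuring.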
When This Optimization Matters
| Data Size | Multiple .map() OK? | Why |
|---|---|---|
| Under 100 | Yes | Fits entirely in L1 cache anyway |
| 100-1,000 | Marginal | Depends on object size |
| 1,000+ | No — use single pass | Cache eviction between passes |
| 10,000+ | Definitely not | O(k*n) becomes measurable |
For small arrays (under 100 items), the entire dataset fits in L1 cache regardless of access pattern, so the .map() approach is fine and more readable. The optimization only matters at scale — 1K+ items with large objects, where cache eviction between passes becomes measurable.
Also skip this optimization when you only need a single field (one .map() is already optimal), in non-hot code paths where readability matters more than nanoseconds, or when the downstream consumer accepts the full object array and doesn’t need separate field arrays.
Takeaway
When extracting multiple fields from a large array, replace multiple .map() calls with a single for...of loop. The optimization isn’t about reducing iteration count (though it does that too) — the bigger win is temporal locality: accessing all fields of an object while it’s hot in L1 cache, rather than re-fetching it from slower memory on every pass. Use for...of over forEach on hot paths for more predictable V8 optimization.