Receive Side Scaling and Offloading: Observing ISR/DPC Placement in Windows
This article is a work in progress.
Networking is one of the most important, yet often overlooked, aspects of performance.
In this article, we will look at a few combinations of Windows networking settings and observe how they affect interrupt delivery, miniport ISR/DPC execution, RSS behavior, TCP/IP task offload state, and logical processor placement.
The results shown below are based on our own testing on one setup. They should be treated as setup-specific observations, not as universal behavior for every NIC, driver, firmware version, Windows build, RSS profile, MSI-X configuration, or workload.
Terminology
Before looking at the experiments, here is a quick explanation of the main terms used throughout the article.
Interrupt
In simple terms, the CPU gets poked because the network card has something to say.
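ISR
An Interrupt Service Routine, or ISR, is the code that runs first when a device interrupt arrives. It is expected to do as little as possible, typically acknowledging the interrupt and queuing a DPC for the remaining work. In the classic NDIS miniport interrupt model, this is the miniport driver’s MiniportInterrupt callback.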
DPC
A Deferred Procedure Call, or DPC, is a deferred execution path used to continue work that was triggered by an interrupt. In the classic NDIS miniport interrupt model, NDIS calls the miniport driver’s MiniportInterruptDPC callback to complete deferred interrupt processing. You can think of DPCs as the workers that do the heavy lifting once the interrupt itself has been acknowledged.
Receive Side Scaling
Receive Side Scaling, also known as RSS, is a network-driver technology that distributes receive-side network processing across multiple processors. RSS uses a hash of incoming packet headers and an indirection table to associate received traffic with a processor, helping keep traffic for a given connection on a consistent processor while spreading receive work across the system.
Important: interrupt affinity and RSS processor selection are related but not identical. Interrupt affinity controls where the device interrupt may be delivered. RSS receive-processing placement can also depend on the RSS processor array, receive queues, hash result, indirection table, RSS profile, NUMA settings, MSI/MSI-X support, and NIC/driver behavior.
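To make the hash-and-indirection-table lookup described above more concrete, here is a minimal Python sketch. It assumes the Toeplitz hash that Microsoft documents for RSS, a TCP/IPv4 4-tuple as the hash input (source address, destination address, source port, destination port, in network byte order), and a 128-entry indirection table that alternates between logical processors 12 and 14. The key, the table contents, and the addresses are illustrative assumptions, not values read from our NIC; a real adapter does this in hardware with a key and table programmed by the driver.

```python
import ipaddress
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of `data` under `key`."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key is too short for this input length")
    key_int = int.from_bytes(key, "big")
    result = 0
    for bit_index in range(len(data) * 8):
        # Walk the input from its most significant bit to its least.
        if data[bit_index // 8] & (0x80 >> (bit_index % 8)):
            # XOR in the 32-bit key window that starts at this bit position.
            shift = key_bits - 32 - bit_index
            result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_input_tcp_ipv4(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build the 12-byte hash input for a TCP/IPv4 flow."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )


def select_processor(hash_value: int, indirection_table: list) -> int:
    """Index the table with the low-order bits of the hash (size must be a power of two)."""
    return indirection_table[hash_value & (len(indirection_table) - 1)]


# Hypothetical values for illustration only.
EXAMPLE_KEY = bytes(range(1, 41))   # arbitrary 40-byte key, not a real driver key
EXAMPLE_TABLE = [12, 14] * 64       # 128 entries alternating between two processors

flow = rss_input_tcp_ipv4("192.0.2.10", "192.0.2.20", 50000, 443)
cpu = select_processor(toeplitz_hash(EXAMPLE_KEY, flow), EXAMPLE_TABLE)
print(f"flow maps to logical processor {cpu}")
```

Because the hash depends only on the flow's addresses and ports, every packet of the same connection lands on the same table entry, which is why RSS keeps a connection on a consistent processor until the table itself is changed.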
Offloading
TCP/IP task offload allows the Microsoft TCP/IP transport to offload supported packet-processing tasks to a capable network adapter. This can reduce CPU-side per-packet work, but the exact behavior depends on the NIC, driver, firmware, Windows version, workload, and adapter configuration.
Examples include checksum offload, Large Send Offload v1/v2 (LSOv1/LSOv2), Receive Segment Coalescing (RSC), NVGRE task offload, and UDP Segmentation Offload (USO), depending on OS and device support.
The experiments below use two global states:
- Task offloading disabled with DisableTaskOffload = 1, meaning TCP/IP task offloads are globally disabled.
- Task offloading enabled with DisableTaskOffload = 0, meaning TCP/IP task offloads are enabled/allowed, assuming NIC and driver support.
This setting should not be interpreted as a simple universal switch where every packet calculation always moves between the CPU and NIC in the same way on every system.
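As a rough way to check which of the two states a machine is in, the following Python sketch reads the value from the registry location Microsoft documents for it (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters). The helper names are our own, the script only reads and never writes, and on non-Windows systems it simply reports that the value is unavailable.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "DisableTaskOffload"


def read_disable_task_offload():
    """Return the DWORD value, or None if it is absent or we are not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return int(value)
    except FileNotFoundError:
        # The key or the value does not exist.
        return None


def describe_task_offload(value):
    """Translate the raw value into the wording used in this article."""
    if value is None:
        return "DisableTaskOffload is not set or could not be read"
    if value == 0:
        return "task offloads allowed (subject to NIC and driver support)"
    if value == 1:
        return "task offloads globally disabled"
    return f"unexpected value {value}"


print(describe_task_offload(read_disable_task_offload()))
```

This only reports the global switch. Per-adapter offload settings live in the adapter's own advanced properties and can still differ from what the global value suggests.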
Experiments
Experiment 1: Interrupt affinity pinned to logical processor 14 with RSS enabled
Configuration:
- Interrupt affinity pinned to CPU 14
- RSS enabled
- 2 RSS queues
Observation:
NDIS DPCs were observed on logical processor 12 and logical processor 14.

The miniport ISR path was observed on logical processor 14.

NDIS DPC execution times on logical processor 12 and logical processor 14.

Interpretation:
With RSS enabled, receive-side DPC placement does not necessarily have to match only the manually selected interrupt-affinity processor. RSS can distribute receive processing across multiple processors. A logical processor that is outside the manual interrupt-affinity mask may still be part of the RSS processor array, receive queue mapping, or RSS indirection table.
For this reason, seeing DPC activity on logical processor 12 while the miniport ISR path was observed on logical processor 14 should be treated as a receive-processing placement observation, not automatically as a broken state. To prove that the placement is unexpected, the RSS processor array, indirection table, receive queue mapping, MSI/MSI-X mapping, NIC model, driver version, Windows build, and ETW/WPA call stacks should also be checked.
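The reasoning above boils down to a set-membership check, sketched here in Python. The function name is our own, and the RSS processor array of 12 and 14 is an assumption for this experiment, taken from the Get-NetAdapterRss output reported later in this article rather than captured during Experiment 1.

```python
def classify_dpc_placement(observed_cpus, affinity_cpus, rss_processor_array):
    """Label each CPU where DPCs were observed by which configured set contains it."""
    affinity = set(affinity_cpus)
    rss = set(rss_processor_array)
    report = {}
    for cpu in sorted(set(observed_cpus)):
        if cpu in affinity and cpu in rss:
            report[cpu] = "in interrupt-affinity mask and RSS processor array"
        elif cpu in affinity:
            report[cpu] = "in interrupt-affinity mask only"
        elif cpu in rss:
            report[cpu] = "in RSS processor array only"
        else:
            report[cpu] = "in neither set; needs further investigation"
    return report


# Experiment 1: DPCs seen on 12 and 14, interrupt affinity pinned to 14.
# The RSS processor array [12, 14] is assumed, see the note above.
for cpu, label in classify_dpc_placement([12, 14], [14], [12, 14]).items():
    print(f"logical processor {cpu}: {label}")
```

Only a CPU that falls into the last category is a strong hint that something other than interrupt affinity or RSS is steering the work.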
Experiment 2: Interrupt affinity pinned to logical processor 14 with RSS disabled
Configuration:
- Interrupt affinity pinned to logical processor 14
- RSS disabled
Observation:
NDIS ISRs were executed on logical processor 14.

DPC activity was also observed on logical processor 14.

With RSS disabled in this test, both ISR and DPC activity stayed on the pinned CPU.
Experiment 3: Interrupt affinity pinned to logical processor 14 with RSS disabled and task offload disabled (DisableTaskOffload = 1)
Configuration:
- Interrupt affinity pinned to CPU 14
- RSS disabled
- DisableTaskOffload = 1
Observation:
NDIS ISR routines were executed on CPU 14.

ISR execution time:

DPC execution time:

251,229 DPC samples attributed to NDIS.SYS lasted between 4 and 8 microseconds. That is about 60% of the NDIS.SYS DPC samples in this trace.
DPC execution also occurred on CPU 14.
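The share quoted above is a simple count per duration bucket divided by the total. A minimal Python sketch of that tally follows. The bucket boundaries, the half-open ranges, and the five sample rows are hypothetical; they are not our trace data and only show the arithmetic.

```python
from collections import Counter

# Duration buckets in microseconds, treated as half-open ranges [low, high).
BUCKETS_US = [(0, 2), (2, 4), (4, 8), (8, 16), (16, 32)]


def bucket_label(duration_us):
    """Return the label of the bucket that contains this duration."""
    for low, high in BUCKETS_US:
        if low <= duration_us < high:
            return f"{low}-{high} us"
    return f">= {BUCKETS_US[-1][1]} us"


def summarize(samples):
    """samples: iterable of (cpu, duration_us). Returns per-bucket counts, per-CPU counts, shares."""
    samples = list(samples)
    by_bucket = Counter(bucket_label(duration) for _cpu, duration in samples)
    by_cpu = Counter(cpu for cpu, _duration in samples)
    total = len(samples)
    shares = {label: count / total for label, count in by_bucket.items()} if total else {}
    return by_bucket, by_cpu, shares


# Hypothetical rows, not taken from the trace discussed in this article.
HYPOTHETICAL_SAMPLES = [(14, 3.1), (14, 4.5), (14, 5.0), (14, 7.9), (14, 9.2)]

buckets, cpus, shares = summarize(HYPOTHETICAL_SAMPLES)
for label, share in sorted(shares.items()):
    print(f"{label}: {buckets[label]} samples ({share:.0%})")
print("samples per CPU:", dict(cpus))
```

Tallying by CPU alongside the duration buckets makes it easy to confirm, in one pass, both where the DPCs ran and how long they took.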

Interpretation:
This part of the test was used to observe behavior with RSS disabled while global task offload was disabled. Since RSS was disabled, the receive-side work did not use RSS-based spreading in this configuration.
Experiment 4: DisableTaskOffload = 1 correlated with a change in observed RSS/DPC placement
Configuration:
- NIC interrupts pinned to CPU 14 and CPU 16
- RSS enabled
- 2 RSS queues
- DisableTaskOffload = 1
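Interrupt affinity for a set of logical processors is usually expressed as a bitmask with one bit per processor. As a small worked example, this Python sketch builds the mask for CPU 14 and CPU 16. It assumes both processors sit in the same processor group (at most 64 logical processors per mask) and that whatever tool applies the affinity expects a little-endian 64-bit value; neither detail is specific to the tooling we used.

```python
def affinity_mask(cpus):
    """Build a single-group affinity bitmask from logical processor numbers."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 64:
            raise ValueError(f"logical processor {cpu} does not fit in one 64-bit mask")
        mask |= 1 << cpu
    return mask


def cpus_from_mask(mask):
    """Inverse helper: list the logical processors whose bits are set."""
    return [bit for bit in range(64) if mask & (1 << bit)]


mask = affinity_mask([14, 16])
print(hex(mask))                          # 0x14000
print(mask.to_bytes(8, "little").hex())   # 0040010000000000
print(cpus_from_mask(mask))               # [14, 16]
```

Decoding a mask back into processor numbers is a quick sanity check that the configured affinity really is CPU 14 and CPU 16 and does not accidentally include CPU 12.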
Observation:
Interrupts were routed to the selected CPUs.

However, NDIS DPC execution was observed on CPU 12, even though CPU 12 was not part of the manually selected NIC interrupt-affinity mask.

Interpretation:
In this test, setting DisableTaskOffload = 1 appeared to change the observed DPC placement while RSS was enabled: interrupts were still routed to the selected CPUs, but NDIS DPC execution was observed on CPU 12.
This does not automatically prove that Windows or RSS was broken. Interrupt affinity and RSS receive-processing placement are not the same mechanism. CPU 12 may still have been part of the RSS processor array, RSS indirection table, receive queue mapping, or another driver-specific receive path. To call this definitively unexpected, those RSS and MSI-X details should be verified before and after changing DisableTaskOffload.
Reminder: restart the adapter or reboot after changing task-offload state
After changing global task-offload settings, restart the network adapter or reboot the system before testing. A full reboot is the safest option when using the registry directly, because it helps avoid stale driver, NDIS, or adapter state.
Experiment 5: DisableTaskOffload set back to 0
Configuration:
- NIC interrupt affinity configured manually
- RSS enabled
- DisableTaskOffload = 0
Observation:
DPCs were observed on the CPUs selected for this test, rather than on CPU 12.

The same placement was also observed for ISRs.

After setting DisableTaskOffload = 0 and restarting/reinitializing the networking stack, the observed ISR/DPC placement matched our intended CPU placement again in this test.
What does DisableTaskOffload do?
When set to 1, Windows disables task offloads from the TCP/IP transport. When set to 0, Windows enables task offloads from the TCP/IP transport, assuming the NIC, driver, firmware, and adapter settings support them.
Task offload can include work such as checksum offload, Large Send Offload v1/v2, Receive Segment Coalescing, NVGRE task offload, and UDP Segmentation Offload, depending on OS and device support. Moving work back to the CPU can increase CPU cycles and CPU load, but the real-world impact depends on the NIC, driver, firmware, workload, packet rate, traffic direction, adapter settings, and available CPU headroom.
Microsoft’s Windows Server network-adapter tuning guidance generally recommends enabling useful static offloads such as UDP checksums, TCP checksums, and LSO for microsecond-sensitive packet-processing scenarios. That guidance is useful, but it should not be treated as a universal rule for every client/gaming system, NIC, driver, firmware version, or workload.
In our 2024 CS2 testing, we observed that a specific frametime spike and small hitch before an enemy peek disappeared after changing this setting. This is anecdotal and setup-specific. It should not be treated as proof that DisableTaskOffload = 1 improves CS2 performance generally.
It is a useful clue, not a universal conclusion.
Experiment 6: Does DisableTaskOffload = 1 break RSS?
On this setup, after setting DisableTaskOffload to 1, Get-NetAdapterRss still reported RSS as enabled, with two receive queues and an RSS processor array containing logical processors 12 and 14. However, the IndirectionTable field was blank/not displayed in this output.
This should not be read as proof that DisableTaskOffload = 1 universally breaks RSS or prevents Windows from maintaining or using an RSS indirection table. Microsoft documents DisableTaskOffload as a TCP/IP task-offload switch, while RSS has its own configuration, processor selection, receive queues, and indirection table.
The narrower conclusion is that, in this test state, the RSS indirection table was not observable from the Get-NetAdapterRss output. Because logical processor 12 appears in the RSS processor array, DPC activity on logical processor 12 is not automatically unexpected from an RSS-placement perspective. To determine whether RSS behavior changed, collect Get-NetAdapterRss, RSS indirection-table output if available, receive queue mapping, MSI/MSI-X mapping, and ETW/WPA traces before and after changing DisableTaskOffload.
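A before/after comparison of this kind can be kept honest with a small diff over the fields of interest. The Python sketch below assumes the Get-NetAdapterRss output has already been captured into plain dictionaries, by hand or via an export. The field names follow the cmdlet's output, the "after" snapshot mirrors what we saw with DisableTaskOffload = 1, and the "before" indirection table is a hypothetical placeholder because we did not record one.

```python
FIELDS = (
    "Enabled",
    "NumberOfReceiveQueues",
    "Profile",
    "RssProcessorArray",
    "IndirectionTable",
)


def diff_snapshots(before, after, fields=FIELDS):
    """Return {field: (old, new)} for every compared field whose value changed."""
    changes = {}
    for field in fields:
        old, new = before.get(field), after.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes


# "before" is hypothetical; "after" mirrors the state described in this experiment.
before = {
    "Enabled": True,
    "NumberOfReceiveQueues": 2,
    "RssProcessorArray": [12, 14],
    "IndirectionTable": [12, 14, 12, 14],  # placeholder, not recorded in our test
}
after = {
    "Enabled": True,
    "NumberOfReceiveQueues": 2,
    "RssProcessorArray": [12, 14],
    "IndirectionTable": None,  # blank / not displayed
}

for field, (old, new) in diff_snapshots(before, after).items():
    print(f"{field}: {old} -> {new}")
```

A diff that shows only the indirection table changing supports the narrower reading: the table was not observable, while RSS itself still reported as enabled with the same queues and processors.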

RSSv2 note: on newer Windows/NDIS versions and drivers that support RSSv2, receive queue and indirection-table behavior can be more dynamic. Microsoft documents RSSv2 for NDIS 6.80 and later, including dynamic queue spreading and indirection table entry (ITE) movement. This means that observed receive-side processor placement can depend on whether the adapter/driver is using RSSv1-style behavior, RSSv2 behavior, VMQ/vRSS-related behavior, or another driver-specific path.
Further reading
- Microsoft: Introduction to Receive Side Scaling
- Microsoft: RSS with Message Signaled Interrupts
- Microsoft: TCP/IP Task Offload Overview
- Microsoft: Using Registry Values to Enable and Disable Task Offloading
- Microsoft: Set-NetOffloadGlobalSetting
- Microsoft: Network Adapter Performance Tuning
Summary
Based on our testing:
- With RSS disabled, ISR and DPC activity stayed on the manually selected interrupt CPU.
- With RSS enabled, receive-side DPC placement was not determined only by the manual interrupt-affinity mask.
- With DisableTaskOffload = 1, NDIS DPC execution was observed on CPU 12, even though NIC interrupts were configured for CPU 14 and CPU 16.
- With DisableTaskOffload = 0, observed ISR/DPC placement matched our intended CPU placement again in this test.
In this setup, setting DisableTaskOffload = 1 coincided with a change in the observed RSS/DPC placement. This does not prove that DisableTaskOffload = 1 universally breaks RSS or DPC placement.
The safer conclusion is that the global task-offload state appeared to affect receive-side DPC placement on this specific NIC/driver/Windows configuration. To make the result stronger, the RSS indirection table, receive queues, MSI-X mapping, NIC model, driver version, Windows build, and ETW/WPA trace data should be documented as well.
This article will be updated with further documentation later on.