Receive Side Scaling and Offloading: Observing ISR/DPC Placement in Windows

This article is a work in progress.

Networking is one of the most important, yet often overlooked, aspects of performance.

In this article, we will look at a few combinations of Windows networking settings and observe how they affect interrupt delivery, miniport ISR/DPC execution, RSS behavior, TCP/IP task offload state, and logical processor placement.

The results shown below are based on our own testing on one setup. They should be treated as setup-specific observations, not as universal behavior for every NIC, driver, firmware version, Windows build, RSS profile, MSI-X configuration, or workload.


Terminology

Before looking at the experiments, here is a quick explanation of the main terms used throughout the article.

Interrupt

When a device, such as a network card, receives a packet, it needs to notify the processor. The hardware signal it uses for that notification is called an interrupt.

In simple terms, the CPU gets poked because the network card has something to say.

ISR

An Interrupt Service Routine (ISR) is the first driver-provided code that runs in response to an interrupt. It handles the time-critical work quickly, then typically schedules a Deferred Procedure Call (DPC) to perform the heavier processing later.

DPC

A Deferred Procedure Call, or DPC, is a deferred execution path used to continue work that was triggered by an interrupt. In the classic NDIS miniport interrupt model, NDIS calls the miniport driver's MiniportInterruptDPC callback to complete deferred interrupt processing. You can think of DPCs as the workers that do the heavy lifting after the ISR has handled the time-critical part.

Receive Side Scaling

Receive Side Scaling, also known as RSS, is a network-driver technology that distributes receive-side network processing across multiple processors. RSS uses a hash of incoming packet headers and an indirection table to associate received traffic with a processor, helping keep traffic for a given connection on a consistent processor while spreading receive work across the system.
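To make the hash-plus-indirection-table idea concrete, here is a minimal Python sketch of the Toeplitz hash used by RSS, followed by an indirection-table lookup. The 40-byte key is the sample key from Microsoft's RSS hash verification documentation; real adapters use their own key. The 128-entry table alternating between logical processors 12 and 14 is hypothetical, and the sketch assumes the low-order hash bits select the table entry. Actual table size and contents are driver-specific.

```python
import ipaddress

# Sample 40-byte RSS secret key from Microsoft's RSS hash verification docs.
# Real adapters typically use their own (often random) key.
RSS_KEY = bytes([
    0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2,
    0x41, 0x67, 0x25, 0x3D, 0x43, 0xA3, 0x8F, 0xB0,
    0xD0, 0xCA, 0x2B, 0xCB, 0xAE, 0x7B, 0x30, 0xB4,
    0x77, 0xCB, 0x2D, 0xA3, 0x80, 0x30, 0xF2, 0x0C,
    0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,
])


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for every set input bit (MSB first), XOR in the
    32-bit window of the key that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key too short for this input")
    result = 0
    for bit_index in range(len(data) * 8):
        if data[bit_index // 8] & (0x80 >> (bit_index % 8)):
            shift = key_bits - 32 - bit_index
            result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def tcp_ipv4_hash_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """RSS input for TCP/IPv4: source address, destination address,
    source port, destination port, all in network byte order."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
    )


def pick_processor(hash_value: int, indirection_table: list[int]) -> int:
    """Index the indirection table with the low-order hash bits.
    Assumes the table size is a power of two."""
    return indirection_table[hash_value & (len(indirection_table) - 1)]


# Hypothetical 128-entry table spreading receive work over CPUs 12 and 14.
TABLE = [12, 14] * 64

if __name__ == "__main__":
    data = tcp_ipv4_hash_input("66.9.149.187", "161.142.100.80", 2794, 1766)
    h = toeplitz_hash(RSS_KEY, data)
    print(f"hash=0x{h:08x} -> logical processor {pick_processor(h, TABLE)}")
```

Because the same four-tuple always produces the same hash, a given connection keeps landing on the same table entry. That is the "consistent processor" property described above.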

Important: interrupt affinity and RSS processor selection are related but not identical. Interrupt affinity controls where the device interrupt may be delivered. RSS receive-processing placement can also depend on the RSS processor array, receive queues, hash result, indirection table, RSS profile, NUMA settings, MSI/MSI-X support, and NIC/driver behavior.
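A manual interrupt-affinity setting is usually expressed as a processor bitmask, while the RSS processor array is a separate list of processors. The Python sketch below uses hypothetical helpers, not part of any Windows tool, to convert between logical-processor numbers and a single-group 64-bit mask so the two can be compared side by side. It assumes processor group 0. The example uses the CPU 14/16 interrupt pinning from Experiment 4 and the processor array (12, 14) that Get-NetAdapterRss reported later in this article.

```python
def cpus_to_mask(cpus: set[int]) -> int:
    """Build a bitmask with bit n set for each logical processor n (group 0)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 64:
            raise ValueError(f"logical processor {cpu} is outside group 0")
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask: int) -> set[int]:
    """Return the logical processors whose bits are set in the mask."""
    return {bit for bit in range(64) if mask & (1 << bit)}


if __name__ == "__main__":
    affinity = cpus_to_mask({14, 16})      # interrupt CPUs from Experiment 4
    rss_processor_array = {12, 14}         # array reported by Get-NetAdapterRss
    print(f"affinity mask: 0x{affinity:X}")
    print("RSS CPUs outside the affinity mask:",
          sorted(rss_processor_array - mask_to_cpus(affinity)))
```

In this example, CPU 12 sits in the RSS processor array but not in the interrupt mask. This is exactly the kind of mismatch that can produce DPCs on a CPU you never pinned anything to.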

Offloading

TCP/IP task offload allows the Microsoft TCP/IP transport to offload supported packet-processing tasks to a capable network adapter. This can reduce CPU-side per-packet work, but the exact behavior depends on the NIC, driver, firmware, Windows version, workload, and adapter configuration.

Examples include checksum offload, Large Send Offload v1/v2 (LSOv1/LSOv2), Receive Segment Coalescing (RSC), NVGRE task offload, and UDP Segmentation Offload (USO), depending on OS and device support.

The DisableTaskOffload value (a REG_DWORD under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters) switches these offloads globally. The experiments below compare two states:

  • DisableTaskOffload = 1: TCP/IP task offloads are globally disabled.
  • DisableTaskOffload = 0: TCP/IP task offloads are enabled/allowed, assuming NIC and driver support.

This setting should not be interpreted as a simple universal switch where every packet calculation always moves between the CPU and NIC in the same way on every system.
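To check which state a machine is in, here is a minimal Python sketch that reads DisableTaskOffload with the standard-library winreg module. It assumes the value lives under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, as Microsoft documents. It also treats a missing value as 0, which this sketch assumes is the default. On non-Windows systems the reader simply returns None.

```python
import sys
from typing import Optional

TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def read_disable_task_offload() -> Optional[int]:
    """Return the DisableTaskOffload DWORD, 0 if the value is absent
    (assumed default), or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, "DisableTaskOffload")
        except FileNotFoundError:
            return 0
    return int(value)


def describe(value: Optional[int]) -> str:
    """Translate the raw value into the two states compared in this article."""
    if value is None:
        return "unknown (not Windows)"
    if value == 1:
        return "TCP/IP task offloads globally disabled"
    if value == 0:
        return "TCP/IP task offloads allowed (subject to NIC/driver support)"
    return f"unexpected value {value}"


if __name__ == "__main__":
    print(describe(read_disable_task_offload()))
```

The reader only reports the global switch. Per-adapter offload settings and actual NIC capabilities still need to be checked separately.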


Experiments

Experiment 1: Interrupt affinity pinned to logical processor 14 with RSS enabled

Configuration:

  • Interrupt affinity pinned to CPU 14
  • RSS enabled
  • 2 RSS queues

Observation:

NDIS DPCs were observed on logical processor 12 and logical processor 14.

NDIS DPCs on CPU 12 and CPU 14

The miniport ISR path was observed on logical processor 14.

NDIS DPC execution times on logical processor 12 and logical processor 14.

Interpretation:

With RSS enabled, receive-side DPC placement does not necessarily have to match only the manually selected interrupt-affinity processor. RSS can distribute receive processing across multiple processors. A logical processor that is outside the manual interrupt-affinity mask may still be part of the RSS processor array, receive queue mapping, or RSS indirection table.

For this reason, seeing DPC activity on logical processor 12 while the miniport ISR path was observed on logical processor 14 should be treated as a receive-processing placement observation, not automatically as a broken state. To prove that the placement is unexpected, the RSS processor array, indirection table, receive queue mapping, MSI/MSI-X mapping, NIC model, driver version, Windows build, and ETW/WPA call stacks should also be checked.
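One way to make that check repeatable is to classify every logical processor with observed DPC activity against both the interrupt-affinity set and the RSS processor array. The Python sketch below does that. The function and category names are hypothetical, and the inputs would come from your own ETW/WPA trace and Get-NetAdapterRss output.

```python
def classify_dpc_cpus(observed: set[int], affinity: set[int],
                      rss_array: set[int]) -> dict[int, str]:
    """Label each CPU with observed DPC activity.

    "affinity":    inside the manual interrupt-affinity set
    "rss-only":    outside the affinity set but inside the RSS processor
                   array, so not automatically unexpected
    "unexplained": in neither set; check queues, MSI-X mapping and stacks
    """
    labels = {}
    for cpu in sorted(observed):
        if cpu in affinity:
            labels[cpu] = "affinity"
        elif cpu in rss_array:
            labels[cpu] = "rss-only"
        else:
            labels[cpu] = "unexplained"
    return labels


if __name__ == "__main__":
    # Experiment 1 inputs; the RSS array {12, 14} is an assumption here.
    print(classify_dpc_cpus(observed={12, 14}, affinity={14}, rss_array={12, 14}))
```

Only the "unexplained" bucket is a candidate for a genuinely unexpected placement. "rss-only" CPUs are consistent with normal RSS spreading.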


Experiment 2: Interrupt affinity pinned to logical processor 14 with RSS disabled

Configuration:

  • Interrupt affinity pinned to logical processor 14
  • RSS disabled

Observation:

NDIS ISRs were executed on logical processor 14.

NDIS ISRs on CPU 14 with RSS disabled

DPC activity was also observed on logical processor 14.

NDIS DPCs on CPU 14 with RSS disabled

With RSS disabled in this test, both ISR and DPC activity stayed on the pinned CPU.


Experiment 3: Interrupt affinity pinned to logical processor 14 with RSS disabled and task offload disabled (DisableTaskOffload = 1)

Configuration:

  • Interrupt affinity pinned to CPU 14
  • RSS disabled
  • DisableTaskOffload = 1

Observation:

NDIS ISR routines were executed on CPU 14.

NDIS ISR routines on CPU 14

ISR execution time:

ISR execution time

DPC execution time:

251,229 NDIS.SYS DPC samples, about 60% of all NDIS.SYS DPC samples, lasted between 4 and 8 microseconds.

DPC execution happens on CPU 14.
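The 60% figure is the share of NDIS.SYS DPC samples that fall inside one duration bucket. If 251,229 samples are about 60%, the total is roughly 251,229 / 0.60 ≈ 419,000 NDIS.SYS DPC samples. The Python sketch below computes such a share from (module, duration in microseconds) pairs. Exporting those pairs from WPA, and the exact column names involved, is left as an assumption. The sketch also treats the bucket as half-open (4 ≤ d < 8), which is another assumption.

```python
def share_in_range(samples: list[tuple[str, float]], module: str,
                   low_us: float, high_us: float) -> tuple[int, float]:
    """Count samples for `module` with low_us <= duration < high_us and
    return (count, fraction of that module's samples)."""
    durations = [d for m, d in samples if m.lower() == module.lower()]
    if not durations:
        return 0, 0.0
    hits = sum(1 for d in durations if low_us <= d < high_us)
    return hits, hits / len(durations)


if __name__ == "__main__":
    # Made-up samples purely for illustration.
    demo = [("ndis.sys", 5.1), ("ndis.sys", 7.9), ("ndis.sys", 12.0),
            ("tcpip.sys", 6.0), ("NDIS.SYS", 3.2)]
    count, frac = share_in_range(demo, "NDIS.SYS", 4.0, 8.0)
    print(f"{count} samples, {frac:.0%} of NDIS.SYS DPC samples in 4-8 us")
```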

Interpretation:

This part of the test was used to observe behavior with RSS disabled while global task offload was disabled. Since RSS was disabled, the receive-side work did not use RSS-based spreading in this configuration.


Experiment 4: DisableTaskOffload = 1 correlated with a change in observed RSS/DPC placement

Configuration:

  • NIC interrupts pinned to CPU 14 and CPU 16
  • RSS enabled
  • 2 RSS queues
  • DisableTaskOffload = 1

Observation:

Interrupts were routed to the selected CPUs.

Interrupts routed to selected CPUs

However, NDIS DPC execution was observed on CPU 12, even though CPU 12 was not part of the manually selected NIC interrupt-affinity mask.

DPC executions on CPU 12

Interpretation:

In this test, setting DisableTaskOffload = 1 appeared to change the observed DPC placement while RSS was enabled. Interrupts were routed to the selected CPUs, but NDIS DPC execution was observed on CPU 12.

This does not automatically prove that Windows or RSS was broken. Interrupt affinity and RSS receive-processing placement are not the same mechanism. CPU 12 may still have been part of the RSS processor array, RSS indirection table, receive queue mapping, or another driver-specific receive path. To call this definitively unexpected, those RSS and MSI-X details should be verified before and after changing DisableTaskOffload.


Reminder: restart the adapter or reboot after changing task-offload state

After changing global task-offload settings, restart the network adapter or reboot the system before testing. A full reboot is the safest option when using the registry directly, because it helps avoid stale driver, NDIS, or adapter state.

Experiment 5: DisableTaskOffload set back to 0

Configuration:

  • NIC interrupt affinity configured manually
  • RSS enabled
  • DisableTaskOffload = 0

Observation:

DPCs were observed on the CPUs selected for this test rather than on CPU 12.

DPCs executed on configured affinity

The same placement was also observed with ISR routines.

ISR routines on configured affinity

After setting DisableTaskOffload = 0 and restarting/reinitializing the networking stack, the observed ISR/DPC placement matched our intended CPU placement again in this test.


What does DisableTaskOffload do?

When set to 1, Windows disables task offloads from the TCP/IP transport. When set to 0, Windows enables task offloads from the TCP/IP transport, assuming the NIC, driver, firmware, and adapter settings support them.

Task offload can include work such as checksum offload, Large Send Offload v1/v2, Receive Segment Coalescing, NVGRE task offload, and UDP Segmentation Offload, depending on OS and device support. Moving work back to the CPU can increase CPU cycles and CPU load, but the real-world impact depends on the NIC, driver, firmware, workload, packet rate, traffic direction, adapter settings, and available CPU headroom.

Microsoft’s Windows Server network-adapter tuning guidance generally recommends enabling useful static offloads such as UDP checksums, TCP checksums, and LSO for microsecond-sensitive packet-processing scenarios. That guidance is useful, but it should not be treated as a universal rule for every client/gaming system, NIC, driver, firmware version, or workload.

In our 2024 CS2 testing, we observed that a specific frametime spike and small hitch before an enemy peek disappeared after changing this setting. This is anecdotal and setup-specific. It should not be treated as proof that DisableTaskOffload = 1 improves CS2 performance generally.

It is a useful clue, not a universal conclusion.

Experiment 6: Does DisableTaskOffload = 1 break RSS?

On this setup, after setting DisableTaskOffload to 1, Get-NetAdapterRss still reported RSS as enabled, with two receive queues and an RSS processor array containing logical processors 12 and 14. However, the IndirectionTable field was blank/not displayed in this output.

This should not be phrased as proof that DisableTaskOffload = 1 universally breaks RSS or prevents Windows from maintaining or using an RSS indirection table. Microsoft documents `DisableTaskOffload` as a TCP/IP task-offload switch, while RSS has its own configuration, processor selection, receive queues, and indirection table.

The narrower conclusion is that, in this test state, the RSS indirection table was not observable from the Get-NetAdapterRss output. Because logical processor 12 appears in the RSS processor array, DPC activity on logical processor 12 is not automatically unexpected from an RSS-placement perspective. To determine whether RSS behavior changed, collect Get-NetAdapterRss, RSS indirection-table output if available, receive queue mapping, MSI/MSI-X mapping, and ETW/WPA traces before and after changing DisableTaskOffload.
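A simple way to do the before/after comparison is to save Get-NetAdapterRss output as JSON and diff the two snapshots. For example, you could run `Get-NetAdapterRss -Name "Ethernet" | ConvertTo-Json -Depth 3 | Set-Content -Encoding UTF8 before.json`, where the adapter name and file names are placeholders. The Python sketch below compares two such snapshots field by field. The property names mirror the Get-NetAdapterRss display and are assumptions, so check them against your own JSON.

```python
import json
import sys
from typing import Any


def load_snapshot(path: str) -> dict[str, Any]:
    # utf-8-sig tolerates the BOM that Windows PowerShell's UTF8 encoding adds.
    with open(path, encoding="utf-8-sig") as fh:
        data = json.load(fh)
    # Several adapters serialize as a list; keep the first for simplicity.
    return data[0] if isinstance(data, list) else data


def diff_snapshots(before: dict[str, Any], after: dict[str, Any],
                   fields: list[str]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (before, after)} for every listed field that changed."""
    changes = {}
    for field in fields:
        old, new = before.get(field), after.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes


# Assumed property names; verify against your own JSON output.
FIELDS = ["Enabled", "NumberOfReceiveQueues", "Profile",
          "RssProcessorArray", "IndirectionTable"]

if __name__ == "__main__" and len(sys.argv) == 3:
    before, after = load_snapshot(sys.argv[1]), load_snapshot(sys.argv[2])
    for field, (old, new) in diff_snapshots(before, after, FIELDS).items():
        print(f"{field}: {old!r} -> {new!r}")
```

Run it with the two snapshot paths as arguments. A change in IndirectionTable alone, with Enabled and NumberOfReceiveQueues unchanged, would match the narrower observation above.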


RSSv2 note: on newer Windows/NDIS versions and drivers that support RSSv2, receive queue and indirection-table behavior can be more dynamic. Microsoft documents RSSv2 for NDIS 6.80 and later, including dynamic queue spreading and indirection table entry (ITE) movement. This means that observed receive-side processor placement can depend on whether the adapter/driver is using RSSv1-style behavior, RSSv2 behavior, VMQ/vRSS-related behavior, or another driver-specific path.
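To see why moving indirection-table entries matters, consider a toy table. Every flow whose hash lands on a given entry follows that entry's processor, so retargeting one entry moves all of those flows without changing any hash. The Python sketch below illustrates that idea with made-up hashes and a hypothetical 8-entry table. It does not model how RSSv2 or any driver decides which entries to move.

```python
def cpu_for(hash_value: int, table: list[int]) -> int:
    """Low-order hash bits select the entry; table size must be a power of two."""
    return table[hash_value & (len(table) - 1)]


def move_entry(table: list[int], index: int, new_cpu: int) -> list[int]:
    """Return a copy of the table with one entry retargeted to new_cpu."""
    updated = list(table)
    updated[index] = new_cpu
    return updated


if __name__ == "__main__":
    table = [12, 14, 12, 14, 12, 14, 12, 14]   # hypothetical 8-entry table
    flows = {"flow A": 0x1003, "flow B": 0x2005, "flow C": 0x300B}  # made-up hashes
    moved = move_entry(table, 3, 16)           # retarget entry 3 to CPU 16
    for name, h in flows.items():
        entry = h & (len(table) - 1)
        print(f"{name}: entry {entry}, CPU {cpu_for(h, table)} -> {cpu_for(h, moved)}")
```

Flows A and C share entry 3 and move together, while flow B stays put. A trace taken before and after such a move would show DPCs shifting between processors with no configuration change on your side.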



Summary

Based on our testing:

  • With RSS disabled, ISR and DPC activity stayed on the manually selected interrupt CPU.
  • With RSS enabled, receive-side DPC placement was not determined only by the manual interrupt-affinity mask.
  • With DisableTaskOffload = 1, NDIS DPC execution was observed on CPU 12, even though NIC interrupts were configured for CPU 14 and CPU 16.
  • With DisableTaskOffload = 0, observed ISR/DPC placement matched our intended CPU placement again in this test.

In this setup, DisableTaskOffload = 1 changed the observed RSS/DPC placement behavior. This does not prove that DisableTaskOffload = 1 universally breaks RSS or DPC placement.

The safer conclusion is that the global task-offload state appeared to affect receive-side DPC placement on this specific NIC/driver/Windows configuration. To make the result stronger, the RSS indirection table, receive queues, MSI-X mapping, NIC model, driver version, Windows build, and ETW/WPA trace data should be documented as well.

This article will be updated with further documentation later on.

Related: You may also want to read our guide on Windows performance tuning.


CS2 FPS Optimization Guide: Windows, NVIDIA & Best Settings

This CS2 FPS optimization guide shows you how to optimize Windows, clean-install NVIDIA drivers, configure FPS caps, tune launch options, and choose the best CS2 settings for smoother gameplay. This is a very simple guide that anyone can follow.

The goal is simple: improve CS2 performance, reduce stutter, clean up unnecessary background load, and build a stable setup for competitive Counter-Strike 2.

Before You Start

Before we jump in, it is important that you create a system restore point. Do not worry: nothing we are doing today is irreversible.

Creating a Restore Point

Create a restore point

  1. Search for "restore point" in your Windows search bar.
  2. Click Create a restore point.
  3. Select your C drive, then click Configure in the bottom right.
  4. Select Turn on system protection, slide the bar to 10 GB, then click Apply and OK.
  5. Click Create, set a name for the restore point in the box, then click Create.
  6. Click OK, then restart your PC.

Windows Updates

Windows Update

  1. Search for "Check for Updates" in Windows Search.
  2. Open it and click Check for updates. Downloading and installing updates may take a few minutes, so please be patient.
  3. Once the updates are downloaded and installed, restart your PC.

FACEIT might still complain about missing updates. This is your sign to update to Windows 11 25H2. Here is how to do it.

Open CMD as administrator, then copy and paste the following text.

REG ADD "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" /f && REG ADD "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" /v TargetReleaseVersion /t REG_DWORD /d 1 /f && REG ADD "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" /v TargetReleaseVersionInfo /t REG_SZ /d 25H2 /f

You will see something like this.

Windows 11 update command for CS2 optimization

Then, scan for updates again and wait for 25H2 to be installed.

Clean Up Your PC

  1. Download and install Malwarebytes, a free and very popular anti-malware solution.
  2. Click the Scan button after the installation is complete.

Malwarebytes scan before CS2 FPS optimization

  3. The scan can take up to 30 minutes, depending on the speed of your PC and the number of files that need to be scanned.

Malwarebytes scan progress before CS2 FPS optimization

  4. Once complete, you will either see that there are no threats and you can move on, or you will see Threat Scan results.
  5. Click Quarantine if you have items in your Threat Scan results.
  6. Restart your PC.

Custom Windows Power Plan

  1. Create a folder on your C:\ drive and name it FPSHEAVEN_TUTORIAL.
  2. Download the FPSHEAVEN2026 power plan and unzip the files into the newly created folder. The link will automatically start the download.
  3. When you extract the ZIP to the FPSHEAVEN_TUTORIAL folder, it should look like this:

Extracted FPSHEAVEN power plan files for CS2 FPS boost

  4. Open Command Prompt and enter the following command: powercfg -import "C:\FPSHEAVEN_TUTORIAL\FPSHEAVEN2026.pow"
  5. Enter powercfg.cpl.
  6. In the newly opened window, select the FPSHEAVEN2026 power plan.
  7. Reboot your PC.

Windows power plan for CS2 FPS boost

Windows Cleanup

Next, we are going to clean up your Windows install by removing temporary files and running Disk Cleanup.

Delete Temp Files

Windows Temp folder cleanup for better CS2 performance

  1. Press Windows key + R on your keyboard to open Run.
  2. Enter C:\Windows\Temp and click OK.
  3. Click any file in the folder, then press CTRL+A to select everything.
  4. Press the Delete key.
  5. Tick the Do this for all current items box and click Skip.
  6. Repeat the process for the AppData Temp folder by entering %temp% into Run.

Some items may not delete. This is fine, and you can proceed to the next step.
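If you prefer to script the temp-folder cleanup, here is a minimal Python sketch that deletes the contents of one folder and skips anything Windows refuses to remove, which matches the "some items may not delete" behavior above. It is a hypothetical helper, not part of this guide's downloads. It deletes permanently, with no Recycle Bin, so double-check the path before pointing it at C:\Windows\Temp or your user temp folder.

```python
import os
import shutil
import sys
import tempfile


def clear_directory(path: str) -> tuple[int, int]:
    """Delete every file and subfolder inside `path` (not `path` itself).

    Returns (deleted, skipped). Locked or access-denied items are
    skipped instead of aborting the whole run.
    """
    deleted = skipped = 0
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    shutil.rmtree(entry.path)
                else:
                    os.remove(entry.path)
                deleted += 1
            except OSError:
                skipped += 1
    return deleted, skipped


if __name__ == "__main__" and "--yes" in sys.argv:
    # Current user's temp folder as Python resolves it (normally %TEMP%).
    # Run elevated and pass r"C:\Windows\Temp" instead for the system folder.
    done, left = clear_directory(tempfile.gettempdir())
    print(f"deleted {done} items, skipped {left}")
```

The deletion only runs when the script is started with a --yes argument, as a guard against running it by accident.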

Disk Cleanup

Disk cleanup

  1. Press Windows key + R on your keyboard to open Run.
  2. Paste C:\Windows\System32\cleanmgr.exe and click OK.
  3. Select the C:\ drive and click OK.
  4. Tick every box, click OK, then click Delete Files.

Cleaning System Files

  1. Press Windows key + R on your keyboard to open Run.
  2. Paste C:\Windows\System32\cleanmgr.exe and click OK.
  3. Click Clean up system files.
  4. Choose what you want to clean. Most people select everything.
  5. Click OK. This process might take a bit longer, so be patient.

Disable Startup Apps

Disabling unnecessary startup apps can reduce background load and make CS2 run smoother.

Task Manager startup apps for smoother CS2 gameplay

  1. Press CTRL+SHIFT+ESC to open Task Manager.
  2. Select the Startup tab.
  3. Right-click the programs you do not want running on startup, then click Disable.

Remove Unused Apps

To remove apps, we will use Geek Uninstaller. You can download it from the official website as a ZIP file. Extract it, move geek.exe to the FPSHEAVEN_TUTORIAL folder, then run it and double-click each app you do not need.

For example, I want to remove RealVNC.

Geek Uninstaller app removal before CS2 optimization

I will double-click it, follow the uninstallation process, and then clean the remaining entries.

Geek Uninstaller remaining entries cleanup

You can also remove Microsoft apps by clicking View and then Show Microsoft Store Apps.

Geek Uninstaller Microsoft Store apps menu

The removal logic is the same: find the app you do not want to keep and double-click it.

Geek Uninstaller Microsoft Store app list

GPU Driver Removal with DDU

A clean GPU driver setup is one of the most important parts of this CS2 FPS optimization guide. We will remove the old driver first, then install a clean driver package.

  1. Download DDU (Display Driver Uninstaller).
  2. Open the ZIP and run the EXE. Then, you will see this:

DDU extraction for clean GPU driver removal

  3. Paste the following text inside the bar and press EXTRACT.

DDU extract path for CS2 optimization

  4. Go to the FPSHEAVEN_TUTORIAL folder, find Display Driver Uninstaller, and run it.
  5. In the settings menu, make sure to check the settings highlighted in the image.

DDU settings for clean GPU driver removal

Display Driver Uninstaller for CS2 FPS optimization

Clean GPU Driver Install

Next, we are going to install new GPU drivers using a utility called NVCleanInstall. If you have an AMD or Intel GPU, simply install the latest Radeon drivers or Intel Arc drivers respectively, and skip this step along with the Profile Inspector step.

  1. Download and install NVCleanInstall, which lets you customize and install the NVIDIA driver package.
  2. Open the program and select Manually select a driver version.
  3. Select the latest driver and make sure it says 64-bit Desktop.
  4. If you are on a laptop, select 64-bit Notebook.
  5. Click Next, do not tick any options, and then click Next again.
  6. Copy the settings you see below on the Installation Tweaks screen, then click Next.

NVCleanInstall settings for CS2 FPS optimization

NVCleanInstall installation tweaks for CS2 performance

Click Next, then Install, and wait for the installation to complete. When the installation is complete, you will see an error. Ignore it and reboot your PC.

CS2 NVIDIA Settings with NVIDIA Profile Inspector

These CS2 NVIDIA settings are applied through NVIDIA Profile Inspector. The goal is to import a CS2-specific profile and avoid changing the NVIDIA 3D section afterward.

  1. Download NVIDIA Profile Inspector by orbmu2k and our CS2.nip file. Both files will auto-download.
  2. Extract both ZIP files to the FPSHEAVEN_TUTORIAL folder, open nvidiaProfileInspector, and click the button pointing at Import Profile(s).

CS2 NVIDIA settings import with NVIDIA Profile Inspector

  3. Find and select CS2.nip, which is located in the FPSHEAVEN_TUTORIAL folder.

CS2 NVIDIA Profile Inspector profile file

  4. You will see a Profiles successfully imported pop-up.
  5. Then click Apply Changes in the top right.

CS2 NVIDIA settings applied in Profile Inspector

  6. This profile has settings specific to CS2.
  7. DO NOT change anything in the NVIDIA 3D section. Doing so will revert what we just did.

FPS Cap

Important: Before you proceed here, please read the following paragraph carefully.

When we force a driver-level or RTSS-level framerate cap, we need to think about our goal. If our goal is to have the lowest latency possible, then we must not cap our framerate. It comes down to two options.

  1. Use -noreflex and a framerate limiter to get flat frametimes and a flat framerate.
  2. Use fps_max 0 and Reflex on to get as many frames as possible, which will result in lower latency.

The only reason you should cap your framerate this way is if you cannot sustain a high enough framerate or your 1% lows are poor.

Here is how you implement the NVIDIA framerate limiter.

  1. Open the NVIDIA Control Panel and navigate to Manage 3D Settings.
  2. Click Program Settings.
  3. Find CS2 in the drop-down menu and select it. If you cannot find it, click Add and locate the CS2.exe executable from the Steam directory. It is normally located at: C:\Program Files (x86)\Steam\steamapps\common\Counter-Strike Global Offensive\game\bin\win64\cs2.exe
  4. Scroll down, locate Max Frame Rate, and then select a value.

FPS CAP

Finding this value is relatively easy. Queue up for a competitive match and take note of your average framerate. Pick a value slightly below your average framerate. For example, if you average 280 FPS, lock the framerate to 250 FPS. Take note of your new average FPS. This can take some trial and error, so if you are not happy with how things feel, increase or decrease the cap by 10 FPS at a time to dial it in.
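The dial-in procedure above can be written down as a small calculation. This Python sketch assumes a starting cap about 10% below your average FPS, rounded down to a multiple of 10, which reproduces the 280 to 250 example. It then steps the cap in 10 FPS increments. Both numbers are assumptions to tune by feel, not fixed rules, and the helper names are hypothetical.

```python
def starting_cap(average_fps: float, headroom: float = 0.10, step: int = 10) -> int:
    """Start roughly `headroom` below the average, rounded down to `step`."""
    target = average_fps * (1.0 - headroom)
    return int(target // step) * step


def adjust_cap(cap: int, raise_cap: bool, step: int = 10) -> int:
    """Move the cap one step up or down during trial and error."""
    return cap + step if raise_cap else max(step, cap - step)


if __name__ == "__main__":
    cap = starting_cap(280)                  # the example from the guide
    print("start at", cap)
    print("one step up:", adjust_cap(cap, raise_cap=True))
```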

Remember to add -noreflex to your launch options and use fps_max 0 inside CS2 when using this limiter.

NVIDIA Scaling for CS2

Simple. If you are playing on a native resolution, use no scaling. If you are playing on a stretched resolution, set scaling to fullscreen and scale on the GPU.

NVIDIA scaling settings for stretched CS2 resolution

CS2 Launch Options

These CS2 launch options are not mandatory, but they can be useful depending on your system and FPS cap setup.

-noreflex only if you use a frame limiter like NVIDIA's or RTSS.

-threads 9 if you have 8 physical CPU cores.

The thread logic is easy: take the number of physical cores you have and add 1.

  • 14900K has 8 P-Cores, so -threads 9.
  • 9800X3D has 8 cores, so -threads 9.

-mainthreadpriority 2 raises the game's main thread to a higher priority. Keep in mind that if you are on an older CPU, this might cause some stuttering.

You can also use no launch options. They are not mandatory.
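The launch-option rules above are simple enough to express as a tiny function. This Python sketch is a hypothetical helper that assembles the string from three inputs. It encodes only the guide's rules: -noreflex only with an external limiter, -threads equal to physical cores plus one, and optional -mainthreadpriority 2. For hybrid Intel CPUs, pass the P-core count, as in the 14900K example.

```python
from typing import Optional


def cs2_launch_options(use_frame_limiter: bool,
                       physical_cores: Optional[int] = None,
                       raise_main_thread_priority: bool = False) -> str:
    """Build a CS2 launch-option string following this guide's rules."""
    options = []
    if use_frame_limiter:
        # Only with an external limiter (NVIDIA or RTSS); keep fps_max 0 in-game.
        options.append("-noreflex")
    if physical_cores:
        # Physical cores (P-cores on hybrid Intel CPUs) plus one.
        options.append(f"-threads {physical_cores + 1}")
    if raise_main_thread_priority:
        # May cause stutter on older CPUs, per the guide.
        options.append("-mainthreadpriority 2")
    return " ".join(options)


if __name__ == "__main__":
    print(cs2_launch_options(use_frame_limiter=True, physical_cores=8))
```

An empty result is a valid outcome, since the launch options are optional.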

CS2 Settings

Boot up CS2 and head to the video settings. This section covers the best CS2 settings for FPS while keeping the game playable and readable.

The screenshot below is tuned for performance while still looking decent. You do not have to copy it exactly, but one thing is non-negotiable: always set Dynamic Shadows to ALL. Skipping this actually hurts visibility.

For MSAA, 2X or CMAA2 both work fine. We personally run 4X because our system can handle it and it makes the game look less pixelated.

Best CS2 settings for FPS video settings screenshot

Too Much Work?

Let us tune your system, including Windows and BIOS settings, at fpsheaven.com/services. We recently introduced TinyBoost, our best value-for-money service. Let us take care of your system and help you focus on winning some ELO.

Join the Community

If you have questions about this guide, feel free to join our Discord and ask: https://discord.gg/DjCRn9WTK.