
The Fortinet FortiASIC architecture uses proprietary, hardware-accelerated Application-Specific Integrated Circuits (ASICs) rather than relying solely on general-purpose CPUs. By offloading Layer 3/4 routing and stateful session tracking to the Network Processor (NP7), encryption and decryption to the Content Processor (CP9), and deep threat inspection to the Security Processor (SP5), FortiASIC moves resource-intensive work off the general-purpose software path to deliver wire-speed security throughput with very low latency.
Key Takeaways
- Hardware-Driven Acceleration: Fortinet’s custom silicon strategy shifts packet processing from a sequential CPU software stack to a massively parallel hardware pipeline, eliminating the traditional performance penalty associated with deep security inspection.
- Specialized Chip Roles: The NP7 manages network layer traffic, stateful session verification, and IPsec VPN encapsulation at the physical interface level. The CP9 functions as an ultra-fast cryptographic co-processor, handling TLS/SSL decryption. The SP5 unifies network and content security into a single 7nm die optimized for edge deployments.
- Deterministic Low Latency: By offloading established sessions directly to the ASIC fast path, FortiGate appliances maintain sub-microsecond latency profiles, crucial for high-frequency trading, enterprise data centers, and 5G service provider environments.
- True SSL/TLS Deep Inspection: CP9 hardware offloading allows FortiGates to decrypt, inspect, and re-encrypt TLS 1.3 traffic at scale without collapsing firewall throughput or exhausting host memory.
Fortinet FortiASIC Architecture Deep Dive: How NP7, CP9, and SP5
Introduction: The Scale Crisis in Modern Enterprise Networking
Modern enterprise network architectures face a massive scaling challenge. As organizations accelerate their digital transformation initiatives, migrate to hybrid multi-cloud topologies, and transition to decentralized Zero Trust architectures, the demands placed on the enterprise perimeter have fundamentally shifted.
Three converging trends have broken traditional network security models:
- The Proliferation of Encrypted Traffic: Over 95% of web traffic is encrypted via TLS 1.2 and TLS 1.3. While essential for privacy, encryption creates a massive blind spot for security operations teams. Adversaries routinely hide malware payloads, command-and-control (C2) communications, and exfiltrated data within encrypted streams.
- The Evolution of the Threat Landscape: Simple stateful packet filtering is entirely inadequate against modern threats. Securing an enterprise network requires line-rate deep packet inspection (DPI), including intrusion prevention system (IPS) analytics, application control, sandboxing, and real-time antivirus processing.
- The Demand for Zero Latency: Modern business applications—ranging from high-frequency financial transactions and real-time industrial IoT controls to ultra-high-definition video streaming and 5G telecommunications infrastructure—require sub-millisecond, or even sub-microsecond, end-to-end latency.
When traditional, software-defined firewalls running on general-purpose commodity hardware try to solve these three challenges simultaneously, they run into a hardware bottleneck. A standard x86 CPU utilizes a Von Neumann architecture, where instructions and data share the same memory bus and are processed sequentially.
When an x86-based firewall forces an encrypted packet stream through software-defined TCP/IP stacks, context-switches the data between kernel space and user space, decrypts the payload via software libraries, evaluates thousands of signature state machines, and re-encrypts the payload, the host CPU quickly spikes to 100% utilization. This results in dropped packets, increased jitter, and a massive drop in overall firewall throughput—a phenomenon network engineers call the “SSL Performance Cliff.”
To maintain high throughput, many network administrators make a dangerous compromise: they disable deep packet inspection and SSL decryption entirely. This leaves the enterprise highly vulnerable.
Fortinet’s alternative approach bypasses the limitations of general-purpose compute hardware by using FortiASIC Architecture. By offloading compute-heavy mathematical and pattern-matching operations from the main CPU to custom, purpose-built Application-Specific Integrated Circuits (ASICs), Fortinet appliances deliver high-performance security without sacrificing network throughput or structural stability.
What Is FortiASIC Architecture? Custom Silicon vs. x86 Commodity CPUs
An Application-Specific Integrated Circuit (ASIC) is an integrated circuit customized for a particular use rather than intended for general-purpose compute tasks. While an x86 or ARM CPU must be able to run everything from spreadsheet software to database engines by executing general-purpose instruction sets, an ASIC contains hardwired logic gates configured specifically to perform a narrow set of math or data manipulation operations with optimal efficiency.
Fortinet’s proprietary custom silicon strategy focuses on spatial computing. Instead of a single CPU executing instructions in a sequential, time-sliced fashion, FortiASIC engines route data through dedicated physical pipelines. Each stage of the pipeline executes a specialized operation (such as a checksum calculation, hashing function, or regex signature match) in parallel at the hardware level, often within a single clock cycle.
This architectural divergence is highly visible when comparing a software-driven firewall to an ASIC-powered FortiGate platform:
Architectural Comparison: Custom Silicon vs. General-Purpose Compute
| Architectural Vector | Traditional x86 Software Firewalls | Fortinet FortiASIC Powered Architecture |
| --- | --- | --- |
| Compute Core | Intel Xeon / AMD EPYC General Purpose CPUs | Proprietary ASICs (NP7, CP9, SP5 Co-processors) |
| Processing Paradigm | Sequential, instruction-driven execution loops | Parallel, pipeline-driven spatial computing |
| Memory Architecture | Shared L1/L2/L3 Cache and system DRAM | Segregated TCAM, SRAM, and localized high-speed memory arrays |
| Context Switching | High overhead (Interrupt handling between Kernel & User space) | Non-existent on fast-path (Hardware-terminated packets) |
| Latency Profile | Variable (Jitter increases exponentially under high packet load) | Deterministic (Sub-microsecond latency under heavy load) |
| SSL/TLS Processing | Software-defined crypto libraries (OpenSSL) drawing host clock cycles | Dedicated cryptographic pipelines engineered into the silicon |
| Power Consumption | High Watts-per-Gigabit ratios due to inefficient compute loops | Highly optimized performance-per-watt efficiency |
By building its own proprietary silicon, Fortinet isolates the control plane from the data plane. The system CPU manages the operating system (FortiOS), handles administrative tasks (GUI, CLI, API management), processes updates, and runs routing protocol calculations (OSPF, BGP, RIP).
Meanwhile, the actual network traffic—the millions of packets flooding physical interfaces—is intercepted and processed entirely by the FortiASIC data plane engines. This prevents administrative operations from starving packet processing queues and ensures that sudden traffic spikes do not freeze management sessions.
Understanding the Three Core FortiASIC Engines
The FortiASIC portfolio is built upon three distinct custom silicon families, each engineered to accelerate a specific subset of security and network operations:
                    +---------------------------------------+
                    |             Control Plane             |
                    |         (Host CPU / FortiOS)          |
                    +---------------------------------------+
                                        ^
                                        | Control & Slow-Path
                                        v
+-------------------------------------------------------------------------------+
|                                  Data Plane                                   |
|                                                                               |
| +---------------------------+ +---------------------------+ +---------------+ |
| |     Network Processor     | |     Content Processor     | |   Security    | |
| |           (NP7)           | |           (CP9)           | |   Processor   | |
| +---------------------------+ +---------------------------+ |     (SP5)     | |
| | - L3/L4 Wire-Speed Forward| | - Asymmetric Crypto (RSA) | |               | |
| | - Stateful Session Lookup | | - Symmetric Crypto (AES)  | | - Converged   | |
| | - NAT & Hardware Routing  | | - TLS 1.3 Handshakes      | |   ASIC Matrix | |
| | - IPsec Tunnel Encaps.    | | - Data Compression        | | - IPS & AV    | |
| +---------------------------+ +---------------------------+ +---------------+ |
+-------------------------------------------------------------------------------+
NP7 Network Processor: The Layer 3/4 Wire-Speed Engine
The Network Processor (NP7) operates natively at the physical interface layer of the FortiGate firewall. Its primary mission is to provide line-rate forwarding and security enforcement for Layer 3 (Network) and Layer 4 (Transport) traffic.
When a packet arrives on an NP7-mapped physical interface (such as a 100GbE QSFP28 or 400GbE port), it bypasses the system CPU entirely if the session matches an offloaded hardware state.
Core Capabilities of the NP7:
- Wire-Speed IPv4/IPv6 Routing and NAT: The NP7 handles Source NAT (SNAT), Destination NAT (DNAT), and Carrier-Grade NAT (CGNAT) at scale. The hardware rewrites IP addresses and TCP/UDP ports natively within the hardware pipeline, avoiding packet buffer recirculation.
- Stateful Session Lookup & Policy Enforcement: The NP7 contains localized, high-speed Ternary Content-Addressable Memory (TCAM) lookup tables. When a packet arrives, the chip evaluates the session’s 5-tuple (Source IP, Destination IP, Source Port, Destination Port, Protocol) against the hardware-synchronized session table to apply security policies instantly.
- IPsec VPN Encapsulation/Decapsulation: The network processor features integrated cryptographic engines that handle IPsec ESP (Encapsulating Security Payload) framing, sequence number management, and anti-replay window validations at network line speeds.
- Advanced Network Protocol Acceleration: The NP7 natively handles encapsulation and decapsulation for modern overlay network tunnels, including VXLAN, GRE, NVGRE, and Geneve. This makes it an ideal fit for software-defined data centers (SDDC) and service provider edge environments.
The Internal NP7 Processing Pipeline
To understand how the NP7 achieves its massive throughput, we must trace a packet through its internal architectural pipeline:
- Ingress Interface & Parsing Engine: The packet enters the physical MAC layer. The NP7 parsing engine isolates the Layer 2 Ethernet frame header, validates the CRC checksum, and extracts Layer 3 and Layer 4 metadata (IP, TCP/UDP flags, VLAN tags).
- Session Lookup Matrix: The extracted 5-tuple is cross-referenced with the NP7’s hardware session table. If a match is found (indicating an existing, approved session), the packet advances directly to the NAT/Policy Engine. If no match exists, the packet is flagged as a “First Packet” and routed over the control plane to the FortiOS kernel for initial policy validation.
- Policy and Access Control Mapping: The hardware validates that the packet matches the defined access control matrices, verifying firewall policies, interface bindings, and zone configurations.
- NAT Engine: If the matching policy dictates Network Address Translation, the packet enters the NAT processing block. The IP header addresses and TCP/UDP checksums are rewritten in a single clock cycle.
- IPsec/Tunnel Acceleration Block: If the packet is destined for an IPsec tunnel, the NP7 fetches the corresponding Security Association (SA) keys, encrypts or decrypts the payload, appends the necessary outer headers, and increments the ESP sequence number for that SA.
- Traffic Shaping & Quality of Service (QoS): The packet passes through hardware-enforced token bucket filters and priority queues, ensuring bandwidth guarantees and rate limits are applied without consuming CPU memory buffers.
- Egress Interface Mapping: The packet is sent directly to the destination physical port’s MAC transmit buffer and placed onto the wire.
Real-World Execution Example: Imagine an enterprise data center processing a 40Gbps stream of database replication traffic (large MSS TCP packets) between two internal secure segments. Once FortiOS has checked the initial TCP handshake packets and marked the session as “Allowed,” all subsequent packets in that stream bypass the host CPU. The NP7 parses the headers, checks the session table, confirms no policy violations exist, and forwards the frames with latency on the order of microseconds. The host CPU does no per-packet work for this transfer, so its utilization is essentially unaffected.
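The traffic-shaping stage of the pipeline above relies on token bucket filters. The following Python sketch illustrates the general token bucket algorithm only; it is not Fortinet's hardware implementation, and the class name `TokenBucket` and its parameters are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Generic token bucket: tokens accrue at `rate` bytes/s up to `burst` bytes."""
    rate: float         # refill rate in bytes per second (illustrative parameter)
    burst: float        # bucket capacity in bytes (illustrative parameter)
    tokens: float = 0.0
    last: float = 0.0   # timestamp of the previous refill, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def allow(self, size: int, now: float) -> bool:
        """Return True if a packet of `size` bytes conforms at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size   # conforming packet consumes tokens
            return True
        return False              # non-conforming: drop or queue
```

With a 1,000 bytes/s rate and a 1,500-byte burst, a full-size frame passes immediately, and a later 500-byte packet conforms only after half a second of refill.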
CP9 Content Processor: The Cryptographic Workhorse
While the NP7 handles the network layer, the Content Processor (CP9) acts as a dedicated co-processor for data processing tasks. The CP9 is completely decoupled from the physical network interfaces; instead, it operates as a specialized compute asset connected to the system architecture via ultra-high-speed PCIe lanes or internal high-bandwidth system buses.
Core Capabilities of the CP9:
- Asymmetric Cryptographic Acceleration: Handles RSA, Diffie-Hellman (DH), and elliptic-curve (ECDSA/ECDH) operations, the signature and key-exchange steps that make up the most compute-intensive stages of the TLS/SSL handshake.
- Symmetric Cryptographic Engine: Accelerates bulk stream encryption and decryption using high-performance ciphers like AES-GCM, AES-CBC, and ChaCha20-Poly1305 at extreme speeds.
- Hardware-Driven Compression: Provides high-throughput GZIP and Deflate data compression/decompression, helping optimize WAN networks and reduce resource usage during file analysis.
How CP9 Accelerates SSL/TLS Deep Inspection
To analyze an encrypted session, a firewall must act as a transparent proxy. This process requires significant computational power, which the CP9 offloads efficiently:
[External Client] <--- (TLS Handshake 1) ---> [ FortiGate / CP9 ] <--- (TLS Handshake 2) ---> [Target Server]
                                                       |
                                             [Hardware Decryption]
                                                       |
                                              [Cleartext Payload]
                                                       |
                                                       V -> (To SP5/IPS Engine)
- Handshake Interception: When a client establishes a TLS session through the FortiGate, the connection is intercepted. The CP9 handles the mathematical computations required to negotiate the cryptographic handshake with the client, generating an ephemeral session key using ECDHE.
- Upstream Session Negotiation: Simultaneously, the FortiGate opens a secondary, independent TLS session to the intended destination server. The CP9 computes the server-side handshake, validates the upstream digital certificate chain, and establishes a separate set of symmetric encryption keys.
- Inline Stream Decryption: As the client transmits data, the CP9 pulls the encrypted ciphertext blocks directly from the data streams and applies the symmetric session key. It returns the raw, unencrypted plaintext payload to system memory buffers in real time.
- Security Handoff: The plaintext data stream is analyzed by the security engines (such as IPS or Antivirus). If no threats are found, the data is handed back to the CP9.
- Inline Stream Re-encryption: The CP9 takes the inspected plaintext payload, encrypts it using the upstream server session key, and packages it back into a standard TLS frame to be sent out across the network.
By handling the resource-intensive mathematical calculations required for these cryptographic operations in custom silicon, the CP9 prevents the firewall’s main CPU from becoming overwhelmed during heavy TLS 1.3 deep inspection tasks.
SP5 Security Processor: The Consolidated 7nm Frontier
Introduced as Fortinet’s latest custom silicon innovation, the Security Processor (SP5) represents a shift toward advanced consolidation. Manufactured on a 7nm semiconductor process, the SP5 combines network forwarding capabilities and deep content security processing onto a single, specialized system-on-a-chip architecture.
Core Architectural Capabilities of the SP5:
- Pattern Matching State Machines: Traditional CPUs struggle with pattern matching because they must sequentially scan byte arrays against large signature databases. The SP5 features hardwired deterministic finite automaton (DFA) and non-deterministic finite automaton (NFA) pattern matching engines. These engines scan packet payloads for thousands of known vulnerability and malware signatures simultaneously in a single pass.
- Hardware-Assisted Protocol Decoding: The SP5 contains custom logic to decode common application layer protocols (such as HTTP/2, SMTP, FTP, SMB, and DNS). This allows the chip to unpack complex, layered application structures and present isolated payloads directly to threat analysis engines without relying on slow software parsing routines.
- Integrated Sandboxing & Machine Learning Inference Hooks: The SP5 includes specialized execution blocks designed to accelerate on-box machine learning heuristics, allowing the firewall to block zero-day threats and anomalous traffic patterns at the network edge with minimal latency.
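Single-pass, multi-signature scanning of this kind is commonly modeled in software with the Aho-Corasick automaton, a DFA-style structure built from a set of patterns. The sketch below is an illustration under that assumption; the source does not state which automaton construction the SP5 uses, and the function names `build_automaton` and `scan` and the sample signatures are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for a list of byte patterns."""
    goto = [{}]      # goto[state][byte] -> next state
    fail = [0]       # failure link per state
    out = [set()]    # patterns that end at each state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].add(pat)
    # Breadth-first pass fills in failure links, shallowest states first.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] |= out[fail[t]]   # inherit matches that end at the suffix state
    return goto, fail, out


def scan(payload, automaton):
    """Scan `payload` once and return (offset, pattern) for every match."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, b in enumerate(payload):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

Each payload byte is read once regardless of how many patterns are loaded, which is the property the paragraph above attributes to hardware pattern matching: adding signatures grows the tables but not the number of passes over the data.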
How the SP5 Detects and Processes Threat Vectors:
- Intrusion Prevention System (IPS): The SP5 matches packet payloads against live vulnerability exploit signatures. Because the pattern matching engine is baked directly into the silicon, expanding the active IPS signature database has minimal impact on firewall throughput.
- Antivirus & File Demuxing: When a file (such as a PDF, PE executable, or archive) travels across the network, the SP5 intercepts the TCP stream, reassembles the fragments in cache memory, computes its cryptographic hash (MD5/SHA-256) at hardware speeds, and checks it against a localized blocklist while analyzing its composition for malicious byte sequences.
- URL Filtering and DNS Security: The SP5 intercepts outbound port 53 DNS queries and HTTP Host/SNI strings, validating safety status via a high-speed internal lookup cache linked to FortiGuard services. This system blocks access to malicious domains and command-and-control infrastructure without introducing noticeable network delays.
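The hash-and-blocklist step in the antivirus item above can be sketched in software as follows. This is a minimal illustration using Python's standard `hashlib`, not the SP5's implementation; the function names and the blocklist contents are hypothetical.

```python
import hashlib


def sha256_of_stream(chunks):
    """Hash a file incrementally as its reassembled chunks arrive."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def is_blocked(chunks, blocklist):
    """Return True if the reassembled file's SHA-256 is in the blocklist set."""
    return sha256_of_stream(chunks) in blocklist
```

Hashing incrementally means the file never has to be held in memory as a single buffer, and a set lookup keeps the blocklist check constant-time on average.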
Complete FortiGate Packet Flow Explained
To understand the practical value of FortiASIC architecture, let’s track the complete end-to-end lifecycle of a packet as it traverses a FortiGate Next-Generation Firewall configured with full Unified Threat Management (UTM) security profiles.
         [ Packet Inbound on Interface ]
                        |
                        v
            +-----------------------+
            |  NP7 Ingress Parsing  |
            +-----------------------+
                        |
              Is Session Offloaded?
                    /       \
              (Yes)            (No)
            /                       \
          v                           v
+-------------------+   +----------------------------+
| NP7 Fast Path     |   | Host CPU / FortiOS         |
| - Validates NAT   |   | - Slow Path First Packet   |
| - Policies OK     |   | - Creates Session Table    |
+-------------------+   +----------------------------+
          |                           |
          +-------------+-------------+
                        |
          Is UTM / Inspection Required?
                    /       \
              (Yes)            (No)
            /                       \
          v                           v
+-------------------+   +----------------------------+
| CP9 Engine        |   | NP7 Immediately Forward    |
| - Decrypts TLS    |   | to Egress Port             |
+-------------------+   +----------------------------+
          |
          v
+-------------------+
| SP5 Engine        |
| - Scan Signatures |
| - IPS / AV Check  |
+-------------------+
          |
          v
+-------------------+
| CP9 Engine        |
| - Re-encrypt TLS  |
+-------------------+
          |
          v
+-------------------+
| NP7 Egress Line   |
+-------------------+
Step 1: Physical Ingress & Hardware Parsing
A packet arrives via a fiber-optic interface into the FortiGate network layer. The NP7 Network Processor immediately intercepts the frame, parsing the layer 2, 3, and 4 headers to isolate the source, destination, ports, and protocol tags.
Step 2: The Fast-Path Session Decision
The NP7 queries its localized internal hardware session lookup table:
- The Fast Path (Offloaded): If the 5-tuple matches an established session that has already been approved by FortiOS, the NP7 executes necessary NAT translations and safety checks, bypassing the host CPU entirely.
- The Slow Path (Kernel Evaluation): If the packet represents a new connection (TCP SYN), it is routed to the FortiOS kernel. The system CPU evaluates the firewall policy engine, checks routing tables, validates user authentication states, and performs security zone assessments. If the session is approved, the host CPU writes a new entry into the NP7’s hardware session table, ensuring all subsequent packets in this stream use the fast path.
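The fast-path and slow-path decision described above can be modeled as a lookup keyed on the 5-tuple. The sketch below is a simplified software model, not FortiOS or NP7 code; `FiveTuple`, `SessionTable`, and the `policy_allows` callback are hypothetical names, and details such as reverse-direction entries and session expiry are omitted.

```python
from typing import Callable, Dict, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int  # IP protocol number, e.g. 6 for TCP


class SessionTable:
    """Toy model: approved sessions are installed so later packets skip policy."""

    def __init__(self, policy_allows: Callable[[FiveTuple], bool]) -> None:
        self._offloaded: Dict[FiveTuple, str] = {}  # stands in for the hardware table
        self._policy_allows = policy_allows         # stands in for slow-path policy
        self.slow_path_count = 0

    def handle(self, key: FiveTuple) -> str:
        if key in self._offloaded:
            return "fast-path"            # established session: no policy evaluation
        self.slow_path_count += 1         # first packet: full policy evaluation
        if self._policy_allows(key):
            self._offloaded[key] = "allow"
            return "slow-path:allow"
        return "slow-path:deny"           # denied sessions are never installed
```

Only the first packet of an approved flow pays for policy evaluation; every later packet with the same 5-tuple resolves with a single table lookup.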
Step 3: Redirection for Cryptographic Processing
If the firewall rules specify that the session requires Deep SSL Inspection, the NP7 does not forward the packet to the outbound port. Instead, it places the packet payload into a shared memory array and triggers the CP9 Content Processor. The CP9 steps in to decrypt the TLS-encapsulated payload, converting the ciphertext into cleartext within the hardware block.
Step 4: Deep Content Inspection (The Threat Engine)
The cleartext data stream is then analyzed by the SP5 Security Processor (or the host IPS engine via NTurbo architecture). The SP5 scans the unencrypted payload for malicious signatures, protocol anomalies, and malware indicators in a single processing pass.
Step 5: Session Re-encryption
Once the SP5 completes its inspection and confirms the payload is clean, it passes the cleartext data back to the CP9 Content Processor. The CP9 re-encrypts the inspected cleartext data using the appropriate outbound TLS keys, maintaining end-to-end encryption integrity for the connection.
Step 6: Hardware Egress Forwarding
The re-encrypted packet is returned to the NP7 Network Processor. The NP7 writes the correct destination MAC address into the Ethernet header, recomputes the IP header checksum, and transmits the frame out of the destination physical interface at wire speed.
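Whenever a header field such as the source address is rewritten, the IPv4 header checksum has to be brought back into agreement with the header. The sketch below shows the standard Internet checksum (RFC 1071) in Python; it illustrates the arithmetic only and is not NP7 code. The helper names `ipv4_checksum` and `rewrite_source` are hypothetical, and hardware may apply an incremental update instead of the full recomputation shown here.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, folded and inverted (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_source(header: bytes, new_src: bytes) -> bytes:
    """Replace the source address (bytes 12-15) and recompute the checksum."""
    h = bytearray(header)
    h[12:16] = new_src
    h[10:12] = b"\x00\x00"                   # checksum field must be zero while summing
    struct.pack_into("!H", h, 10, ipv4_checksum(bytes(h)))
    return bytes(h)
```

A header whose checksum field is correct sums to zero under `ipv4_checksum`, which gives a quick way to validate the result of a rewrite.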
How SSL/TLS Deep Inspection Works Inside FortiASIC
To successfully balance deep threat inspection with high network performance, FortiASIC architecture uses a specialized TLS Proxy Architecture. This split-session approach allows the firewall to intercept and analyze encrypted traffic efficiently:
[Client] <=== Encrypted Session 1 ===> [ FortiGate / CP9 TLS Proxy ] <=== Encrypted Session 2 ===> [Server]
                                                    ||
                                        [ Plaintext Cache Buffer ]
                                                    ||
                                         (SP5 Inline Scan Engine)
The Split-Session State Machine
- Client-Side TLS Session Creation: The client sends a ClientHello message targeting an external secure server. The FortiGate intercepts this frame. The CP9 processes the request and responds with a ServerHello, replacing the original server’s certificate with a dynamically generated certificate signed by the enterprise’s internal Certificate Authority (CA), which clients must trust as a root.
- Server-Side TLS Session Creation: Concurrently, the FortiGate’s proxy architecture sends its own ClientHello message to the actual upstream server. The upstream server responds with its legitimate public certificate, which the CP9 validates against the FortiGate’s store of trusted root CAs.
- Symmetric Key Isolation: Two distinct sets of symmetric encryption keys are established: Key Set A between the client and the FortiGate, and Key Set B between the FortiGate and the target server.
Hardware-Driven Decryption and Inspection Flow
- As the client transmits application payload data, it arrives encrypted with Key Set A.
- The CP9 Content Processor processes the incoming ciphertext blocks, applying Key Set A to extract the underlying cleartext application streams into a secure, isolated cache memory space.
- The SP5 Security Processor runs its signature matching routines directly against this plaintext cache buffer, checking for exploits, malicious commands, or unauthorized data transfers.
Secure Re-encryption Pipeline
- Assuming no threats are detected, the CP9 Content Processor processes the plaintext data from the cache buffer, encrypting it using Key Set B.
- This architecture ensures that unencrypted data is never sent over physical network cables; it exists only within protected, localized memory spaces inside the FortiGate’s processing core.
How IPsec VPN Acceleration Works
Organizations often rely on IPsec Virtual Private Networks (VPNs) to connect distributed branch offices, remote data centers, and cloud environments. However, processing heavy IPsec workflows in software can quickly overwhelm standard firewall CPUs. FortiASIC architecture addresses this by dividing the workload between its specialized processing engines:
+-------------------------------------------------------------------------+
|                      FortiGate IPsec Architecture                       |
|                                                                         |
|   +--------------------+                                                |
|   |      Host CPU      | <-- IKE Negotiation & Key Management           |
|   +--------------------+     (Phase 1 & Phase 2 Daemons)                |
|             |                                                           |
|             |  Pushes Cryptographic Keys & SA Policies                  |
|             v                                                           |
|   +--------------------+                                                |
|   |      NP7 ASIC      | <-- Bulk Packet Processing                     |
|   +--------------------+     - Encapsulates ESP Frames                  |
|                              - Validates Crypto Checksums               |
|                              - Performs Line-Rate Forwarding            |
+-------------------------------------------------------------------------+
Control Plane Management (Host CPU)
The main CPU handles the initial, administrative phases of the VPN connection. The IKE daemon (iked) manages the Phase 1 and Phase 2 negotiations, authenticates the remote peer using pre-shared keys or digital certificates, and agrees upon the encryption algorithms (such as AES-256-GCM).
Once negotiated, these parameters form a Security Association (SA). The host CPU writes these active SA cryptographic keys directly into the memory tables of the NP7 Network Processor.
Data Plane Acceleration (NP7 & CP9 Engines)
Once the SA keys are loaded into hardware, all subsequent VPN data traffic moves entirely to the ASIC fast path:
Outbound Packet Processing:
- Cleartext corporate network packets arrive at an NP7-managed internal interface.
- The NP7 identifies that the destination IP matches an active IPsec tunnel route.
- The NP7 fetches the corresponding encryption keys from its hardware table, encrypts the payload, appends the outer ESP header, increments the anti-replay sequence counters, and routes the encrypted packet out through the WAN interface.
Inbound Packet Processing:
- Incoming encrypted ESP packets arrive from the WAN interface.
- The NP7 matches the Security Parameter Index (SPI) value in the header against its hardware database.
- The network processor decrypts the payload, verifies the packet integrity, validates that the sequence number falls within the approved anti-replay window, strips the outer ESP header, and forwards the unencrypted inner packet to its destination local network segment.
This hardware acceleration allows the firewall to maintain high throughput and low latency even under heavy, multi-gigabit IPsec workloads.
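The anti-replay validation mentioned in the inbound steps is typically a sliding-window check over ESP sequence numbers, in the style described in RFC 4303. The Python sketch below models that check in software; it is not NP7 code, the class name `AntiReplayWindow` is hypothetical, and the 64-packet window size is an assumed value.

```python
class AntiReplayWindow:
    """Sliding-window replay check; bit i of `bitmap` marks (highest - i) as seen."""

    def __init__(self, size: int = 64) -> None:
        self.size = size      # window width in packets (assumed value)
        self.highest = 0      # highest sequence number accepted so far
        self.bitmap = 0

    def check_and_update(self, seq: int) -> bool:
        """Return True if `seq` is new and inside the window, recording it as seen.

        A real receiver would update the window only after the packet's
        integrity check has passed; that step is outside this sketch.
        """
        if seq == 0:
            return False                          # ESP sequence numbers start at 1
        if seq > self.highest:                    # advances the right edge of the window
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False                          # older than the window: reject
        if self.bitmap & (1 << offset):
            return False                          # already seen: replay
        self.bitmap |= 1 << offset                # late but new: accept and record
        return True
```

Out-of-order packets are accepted as long as they fall inside the window and have not been seen before, while duplicates and packets older than the window are rejected.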
NTurbo Acceleration Explained: Bridging the ASIC and User-Space Gap
When an enterprise enables advanced security features like proxy-based inspection, traffic cannot always stay entirely within the ASIC fast path. To prevent this traffic from falling back to slow software-defined processing loops, Fortinet uses NTurbo Acceleration.
+-------------------------------------------------------------------------+
|                           NTurbo Architecture                           |
|                                                                         |
|   +-----------------------------------------------------------------+   |
|   |                       User-Space Daemons                        |   |
|   |                    (IPS Engine / IPSHelper)                     |   |
|   +-----------------------------------------------------------------+   |
|                  ^                               |                      |
|      Direct Ring |                   Direct Ring |                      |
|      Buffer Push |                   Buffer Pull |                      |
|                  |                               v                      |
|   +-----------------------------------------------------------------+   |
|   |                      NP7 Network Processor                      |   |
|   +-----------------------------------------------------------------+   |
+-------------------------------------------------------------------------+
Historically, when a network processor identified that a packet needed deep content inspection, it passed that packet to the operating system’s kernel network stack. The kernel would then handle interrupt requests, perform context switches, and move the packet data up to user-space security daemons (such as the IPS engine). Once inspected, the packet would travel back down through the kernel stack to the network interface. This process introduced significant CPU overhead and network jitter.
NTurbo modifies this path by establishing a direct communication pipeline between the NP7 Network Processor and the user-space security engines:
How NTurbo Operates
- When a session is initialized, FortiOS flags it as requiring advanced IPS or Application Control processing.
- Instead of routing the traffic through the standard kernel network stack, the NP7 uses dedicated ring buffers to pass packet payloads directly to the user-space security engine (ipshelper).
- The IPS engine processes the payload, utilizing the CP9 or SP5 for pattern scanning and decryption tasks.
- Once approved, the data is pushed back to the NP7 via the NTurbo channel, allowing the network processor to handle final framing and egress forwarding.
By bypassing the kernel network stack, NTurbo reduces CPU context-switching overhead, allowing the firewall to maintain high throughput even during deep content inspection.
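The ring buffers referred to above are fixed-size circular queues shared between a producer and a consumer. The Python sketch below shows the data structure in its simplest single-threaded form; it is not the NTurbo implementation, the class name `RingBuffer` is hypothetical, and a real shared-memory ring between hardware and a user-space process would also need memory-ordering guarantees that are outside this illustration.

```python
class RingBuffer:
    """Fixed-capacity circular queue: the producer pushes, the consumer pops."""

    def __init__(self, capacity: int) -> None:
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0    # next slot to read
        self._tail = 0    # next slot to write
        self._count = 0   # number of occupied slots

    def push(self, item) -> bool:
        """Enqueue `item`; return False when the ring is full (caller drops or retries)."""
        if self._count == self._capacity:
            return False
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % self._capacity
        self._count += 1
        return True

    def pop(self):
        """Dequeue the oldest item, or return None when the ring is empty."""
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % self._capacity
        self._count -= 1
        return item
```

Because the slots are preallocated and reused, handing a packet across the ring involves no per-packet allocation, which is part of why this structure suits high-rate packet handoff.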
Hardware Offloading Operational Modes
| Operational Mode | Processing Architecture | Latency Impact | CPU Overhead | Use Cases |
| --- | --- | --- | --- | --- |
| Full Hardware Offload (Fast Path) | Processed entirely within the NP7 silicon. The host CPU never sees the traffic. | Lowest (Sub-microsecond) | Zero CPU Consumption | Standard L3/L4 routing, basic NAT, stateful ACLs, and IPsec VPN data streams. |
| Partial Offload (NTurbo Assisted) | Managed by the NP7, but payloads are passed directly to user-space security daemons. | Very Low (Deterministic Microseconds) | Minimal CPU Engagement | Intrusion Prevention (IPS), Application Control, and Flow-based Antivirus inspection. |
| CPU Fallback (Slow Path) | Handled entirely within the FortiOS Kernel. Packets are queued sequentially via CPU cores. | Highest (Variable Milliseconds) | Maximum CPU Consumption | Complex Protocol Helpers (SIP/H.323), Explicit Proxy configurations, and detailed DLP matching. |
SOC5 Architecture: System-on-a-Chip for the Modern Edge
In mid-range and entry-level security appliances, deployment challenges often involve space, power constraints, and thermal management rather than processing multi-terabit data streams. For these edge environments, Fortinet developed the SOC5 (System-on-a-Chip 5) architecture.
Instead of placing separate NP, CP, and system CPU chips onto a large motherboard, the SOC5 consolidates these components into a single silicon die. It combines:
- A multi-core, general-purpose RISC system CPU core array.
- A built-in Network Processor pipeline (derived from NP7 architecture).
- A built-in Content Processor engine (derived from CP9 architecture).
Key Advantages of the SOC5 Single-Die Layout:
- Lower Power Consumption: Eliminating external high-speed bus lines between separate chips significantly reduces energy consumption, making these appliances ideal for branch offices and remote locations.
- Compact Hardware Footprint: The consolidated design allows for smaller, fanless desktop form factors, which are well-suited for quiet office environments or space-constrained retail locations.
- Cost-Effective Performance: By simplifying the manufacturing process, the SOC5 architecture brings enterprise-grade hardware acceleration to entry-level appliances, enabling robust security scaling at a lower price point.
Why FortiASIC Outperforms Traditional General-Purpose Firewalls
To understand why FortiASIC custom silicon provides a significant advantage over software-based firewalls, we must look at how different vendor architectures handle identical security workloads.
- Palo Alto Networks Architecture: Palo Alto uses a “Single-Pass Parallel Processing” (SP3) model. This architecture is primarily software-driven, running on multi-core general-purpose Intel CPUs supplemented by off-the-shelf field-programmable gate arrays (FPGAs) for basic networking tasks. Because it relies heavily on general-purpose compute cores, intensive tasks like TLS 1.3 decryption and deep threat scanning must compete for CPU cycles. Under heavy traffic loads, this can lead to increased processing latency and reduced overall throughput.
- Cisco Secure Firewall Architecture: Cisco’s platform architecture relies primarily on general-purpose x86 CPU complexes running the Snort engine inside a virtualized software architecture. Because packet processing depends almost entirely on software-defined compute loops, throughput can drop significantly when enabling advanced security features like IPS or deep packet inspection.
- Check Point Architecture: Check Point uses a software-focused architecture designed to run on standard x86 open-server hardware or proprietary appliances built around generic computing platforms. Without specialized hardware acceleration chips, the system must process deep packet scanning and cryptographic workflows across shared CPU threads, which can limit performance efficiency under heavy loads.
Performance Vector Comparison
| Performance Vector | Fortinet FortiASIC Systems | Palo Alto SP3 (x86/FPGA) | Cisco Secure (x86/Software) | Check Point (x86/Software) |
| --- | --- | --- | --- | --- |
| Firewall Throughput per Dollar | Exceptional (Custom silicon reduces hardware costs) | Premium Pricing Profile | Moderate Value Ratio | Moderate Value Ratio |
| Latency Under Full UTM Load | Low (Microseconds) | High (Milliseconds) | High (Milliseconds) | High (Milliseconds) |
| SSL Inspection Impact | Minimal (Decoupled to CP9) | Significant Throughput Drop | Significant Throughput Drop | Significant Throughput Drop |
| Energy Efficiency (Watts/Gbps) | Very Efficient | High Power Draw | High Power Draw | High Power Draw |
Real-World Benefits of FortiASIC Across Enterprise Topologies
Enterprise Data Centers
In large data centers, firewalls must process massive volumes of east-west traffic without introducing network delays. FortiASIC architecture allows operators to enforce strict segmentation policies at wire speed, ensuring that high-throughput workloads like database replication and backup routines move efficiently without overloading the firewall’s processing cores.
Service Providers and 5G Infrastructures
Service providers handle millions of concurrent user sessions across carrier-grade configurations. The NP7 Network Processor features dedicated hardware acceleration for Carrier-Grade NAT (CGNAT) and specialized tunneling protocols like GTP-U. This enables mobile edge compute hubs to secure high-density 5G data traffic while maintaining ultra-low latency.
Decentralized Campus and Branch Topologies
Using System-on-a-Chip models like the SOC5, distributed organizations can deploy next-generation firewall capabilities across all remote locations. This allows branches to process secure, direct-to-internet SD-WAN traffic locally, bypassing the need to backhaul data to a central corporate data center.
Zero Trust Network Access (ZTNA) Architectures
A successful Zero Trust model requires continuous validation of every user and device session. This ongoing authentication, along with the associated encrypted handshakes and continuous policy evaluations, places a heavy demand on security infrastructure. FortiASIC engines handle these intensive validation processes in hardware, allowing organizations to implement granular access controls without degrading user experience or network performance.
Performance Optimization Best Practices for Network Architects
To maximize the value of FortiASIC hardware acceleration, network engineers should align their system configurations with the underlying silicon architecture.
1. Ensure Global Hardware Offloading is Enabled
Verify that the firewall is actively offloading traffic to the ASIC engines. Offloading is controlled through the NPU settings in the FortiOS CLI:
Bash
config system npu
set fastpath enable
set ge-asym-allow enable
end
To verify that specific firewall policies are actively offloading sessions to the hardware fast path, use the following diagnostic command:
Bash
diagnose sys session list
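In the output, check the NPU-related fields for each session. Offloaded sessions typically carry an npu info line with non-zero offload= values, while a populated no_ofld_reason field indicates that the session has fallen back to the slow-path CPU, which warrants further inspection of the policy configuration. Exact field names can vary by FortiOS version and processor model.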
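A short script can make this check repeatable across a large session dump. The sketch below is a minimal example, assuming each session record starts with a `session info:` line and carries a `no_ofld_reason:` field that is empty for offloaded sessions. The sample text is illustrative only, not captured from a real device, and the function name is hypothetical.

```python
import re

# Illustrative sample only: real `diagnose sys session list` output has many
# more fields, and field names can vary by FortiOS version and NP model.
SAMPLE_OUTPUT = """\
session info: proto=6 proto_state=01 duration=12 expire=3588 timeout=3600
hook=post dir=org act=snat 10.0.0.5:51514->203.0.113.10:443(198.51.100.1:51514)
npu info: flag=0x81/0x81, offload=9/9, ips_offload=0/0
no_ofld_reason:
session info: proto=6 proto_state=11 duration=40 expire=3560 timeout=3600
hook=post dir=org act=snat 10.0.0.6:40022->203.0.113.20:80(198.51.100.1:40022)
no_ofld_reason: redir-to-av
"""

FLOW_RE = re.compile(r"(\d+\.\d+\.\d+\.\d+:\d+)->(\d+\.\d+\.\d+\.\d+:\d+)")
REASON_RE = re.compile(r"(?m)^no_ofld_reason:[ \t]*(\S.*)$")


def find_non_offloaded(output):
    """Return one dict per session whose no_ofld_reason field is non-empty."""
    results = []
    # Split the dump into records, each beginning with "session info:".
    for block in re.split(r"(?m)^(?=session info:)", output):
        if not block.strip():
            continue
        reason = REASON_RE.search(block)
        if reason is None:
            continue  # empty reason: treat the session as offloaded
        flow = FLOW_RE.search(block)
        results.append({
            "src": flow.group(1) if flow else None,
            "dst": flow.group(2) if flow else None,
            "reason": reason.group(1).strip(),
        })
    return results


for session in find_non_offloaded(SAMPLE_OUTPUT):
    print(session["src"], "->", session["dst"], "not offloaded:", session["reason"])
```

Grouping the results by reason is usually the quickest way to see whether one policy or one inspection profile accounts for most of the slow-path traffic.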
2. Tune SSL Inspection Profiles Dynamically
Avoid applying deep inspection to trusted, low-risk traffic categories such as financial applications, healthcare systems, and official operating system update repositories. Decrypting this traffic consumes significant processing power for little added security value. Instead, configure custom bypass rules within your SSL inspection profiles:
[ Incoming Request ] ---> { Category Match: "Financial Services"? }
                                   /                 \
                                (Yes)               (No)
                                 /                     \
                                v                       v
                 [ Bypass Decryption ]     [ Route to CP9 for Deep Scan ]
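The decision flow above can be sketched in a few lines. This is a simplified model rather than FortiOS behavior: the category names, the `BYPASS_CATEGORIES` set, and both function names are assumptions chosen for illustration. The second helper estimates how much of a traffic mix would still reach the decryption engine once the bypass rules apply.

```python
# Hypothetical category names; a real deployment would use the categories
# defined in its own SSL inspection profile.
BYPASS_CATEGORIES = {"Financial Services", "Healthcare", "OS Updates"}


def inspection_action(category):
    """Return 'bypass' for exempt categories, otherwise 'deep-inspect'."""
    return "bypass" if category in BYPASS_CATEGORIES else "deep-inspect"


def decrypted_share(traffic_bytes):
    """Fraction of total bytes that would be sent for deep inspection.

    traffic_bytes maps a category name to the bytes observed for it.
    """
    total = sum(traffic_bytes.values())
    if total == 0:
        return 0.0
    inspected = sum(
        size for category, size in traffic_bytes.items()
        if inspection_action(category) == "deep-inspect"
    )
    return inspected / total


mix = {"Financial Services": 300, "Social Media": 500, "OS Updates": 200}
print(inspection_action("Financial Services"))  # bypass
print(decrypted_share(mix))                     # 0.5
```

Running the estimate against real category statistics before changing a profile gives a rough sense of how much decryption load a bypass rule actually removes.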
3. Align Interface Allocations with Bus Layouts
In high-density modular firewalls, physical interfaces are wired to specific ASIC units via internal buses. For optimal performance, avoid concentrating all high-throughput links on ports that share the same internal ASIC pipeline. Spreading high-capacity connections across distinct interface blocks balances the data load evenly across multiple network processors.
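A quick calculation can show whether a cabling plan concentrates load on one processor. The sketch below assumes a made-up port-to-processor map (`PORT_TO_NP`) and made-up traffic figures. The real wiring is model-specific and has to come from the hardware documentation for the appliance.

```python
from collections import defaultdict

# Hypothetical wiring of front-panel ports to network processors.
# The real mapping differs per appliance model.
PORT_TO_NP = {"port1": "np0", "port2": "np0", "port3": "np1", "port4": "np1"}


def load_per_np(port_gbps, port_to_np=PORT_TO_NP):
    """Sum the expected traffic (in Gbps) that lands on each processor."""
    load = defaultdict(float)
    for port, gbps in port_gbps.items():
        load[port_to_np[port]] += gbps
    return dict(load)


def peak_load(port_gbps, port_to_np=PORT_TO_NP):
    """Return the load on the busiest processor for a given port plan."""
    return max(load_per_np(port_gbps, port_to_np).values())


# Two plans carrying the same total traffic (90 Gbps).
concentrated = {"port1": 40, "port2": 40, "port3": 5, "port4": 5}
spread = {"port1": 40, "port2": 5, "port3": 40, "port4": 5}

print(load_per_np(concentrated))  # {'np0': 80.0, 'np1': 10.0}
print(load_per_np(spread))        # {'np0': 45.0, 'np1': 45.0}
```

Both plans move the same 90 Gbps, but the spread plan nearly halves the peak load on a single processor.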
Common Misconceptions About FortiASIC Architecture (Myths vs. Facts)
- Myth 1: FortiASICs only accelerate basic firewall throughput; enabling UTM features causes performance to drop like any other platform.
- Fact: FortiGates handle advanced security features using dedicated chips. While the NP7 manages basic firewall routing, tasks like deep content scanning and cryptographic processing are offloaded to the CP9 and SP5 engines, minimizing the performance impact on core network operations.
- Myth 2: Software-defined firewalls are inherently more secure because ASICs use fixed, unalterable hardware code.
- Fact: FortiASICs are built with programmable microcode architectures. When new security vulnerabilities appear or encryption standards evolve, Fortinet can update the firmware instructions on the chips via standard FortiOS updates, maintaining hardware-level speed without requiring new equipment.
- Myth 3: FortiGate hardware acceleration only works with IPv4 traffic.
- Fact: The NP7 and SP5 architectures offer full hardware parity for both IPv4 and IPv6 environments, accelerating routing, NAT options, and deep threat inspection uniformly across both protocol versions.
- Myth 4: If a single packet drops to the CPU slow path, the entire firewall’s performance degrades.
- Fact: FortiOS separates the control plane from the data plane. If an unusual or complex packet requires CPU intervention, it is isolated and processed independently, leaving the offloaded ASIC channels free to handle standard network traffic without interruption.
- Myth 5: Custom silicon chips consume more power than standard off-the-shelf components.
- Fact: Because ASICs are engineered specifically for security and networking workflows, they perform these tasks far more efficiently than general-purpose CPUs, resulting in lower power consumption and reduced thermal output per gigabit of throughput.
- Myth 6: Turning on SSL inspection always disables hardware acceleration for subsequent security rules.
- Fact: The CP9 handles decryption and then passes the unencrypted data directly to the SP5 or the IPS engine via efficient NTurbo channels, keeping the traffic within an accelerated path.
- Myth 7: FortiASIC chips cannot inspect custom or non-standard application ports.
- Fact: The underlying hardware acceleration focuses on the protocol structure and packet payload rather than the physical port number. You can map custom ports to standard protocol decoders within FortiOS to maintain full hardware scaling.
- Myth 8: Virtual FortiGate instances running in cloud environments perform identically to hardware appliances because they use the same FortiOS operating system.
- Fact: Cloud environments run on standard x86 hypervisors and lack Fortinet’s custom physical silicon. While virtual instances offer robust security features, they cannot match the high throughput and low latency profiles of hardware appliances equipped with physical NP7 and CP9 chips.
- Myth 9: Hardware acceleration prevents the firewall from generating detailed traffic logs.
- Fact: The NP7 and SP5 chips feature dedicated logging processors that compile session metrics and telemetry data at the hardware level, passing structured log files to management systems without impacting packet processing performance.
- Myth 10: FortiASIC technology is only useful for large enterprise environments with high-bandwidth connections.
- Fact: Small-to-medium businesses and remote offices benefit significantly from hardware acceleration. Using consolidated designs like the SOC5, smaller appliances can run full security and threat inspection profiles locally without causing network slowdowns or connectivity issues.
The Future of Fortinet Custom Silicon: Post-Quantum Crypto and AI Scaling
As network security demands continue to evolve, Fortinet’s custom silicon development focuses on addressing next-generation computing challenges:
1. Hardware Acceleration for Post-Quantum Cryptography (PQC)
The transition to quantum-resistant encryption algorithms (such as Kyber and Dilithium, standardized by NIST as ML-KEM and ML-DSA) will place heavier demands on security hardware than traditional RSA or ECC standards, in part because of their much larger keys, ciphertexts, and signatures. Future iterations of Fortinet’s content processors are being engineered with specialized mathematical processing pipelines designed to handle these complex post-quantum algorithms at network line speeds.
2. On-Chip Artificial Intelligence and Machine Learning Engines
As threat detection models shift from reactive, signature-based matching to real-time behavioral analysis, future security processors will integrate dedicated AI/ML inference blocks directly into the silicon. This design will allow firewalls to run advanced heuristic analysis and detect zero-day threats natively within the hardware pipeline, eliminating the delay of sending files to external cloud systems for verification.
3. Adapting to Terabit Networks
With data centers and service providers moving toward 400Gbps and 800Gbps infrastructure, future network processors will utilize advanced multi-die architectures and high-bandwidth memory (HBM) modules. This will allow the chips to manage massive, high-density session tables and enforce security policies across multi-terabit networks with consistent, low-latency performance.
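Some quick arithmetic shows why these link speeds push processing into hardware. The sketch below computes the worst-case packet rate and the time available per packet for minimum-size Ethernet frames. It assumes the standard 20 bytes of per-frame wire overhead (8-byte preamble and start delimiter plus a 12-byte inter-frame gap); the function names are illustrative.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_gbps, frame_bytes=64):
    """Maximum frames per second on a link for a given frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def per_packet_budget_ns(link_gbps, frame_bytes=64):
    """Time available to process one frame, in nanoseconds."""
    return 1e9 / packets_per_second(link_gbps, frame_bytes)


for speed in (400, 800):
    pps = packets_per_second(speed)
    budget = per_packet_budget_ns(speed)
    print(f"{speed} Gbps: {pps / 1e6:.1f} Mpps, {budget:.2f} ns per packet")
```

At 400 Gbps with 64-byte frames this works out to roughly 595 million packets per second, or about 1.68 ns per packet, and 800 Gbps halves that budget to about 0.84 ns. A single memory access on a general-purpose CPU can take longer than that, which is why session lookups at these rates rely on parallel pipelines and fast on-package memory.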
Conclusion and Actionable Architectural Blueprint
Fortinet’s FortiASIC architecture provides a highly efficient solution to the performance challenges facing modern enterprise networks. By offloading intensive routing, decryption, and threat scanning tasks from general-purpose CPUs to specialized, purpose-built silicon, FortiGate appliances allow organizations to implement comprehensive security controls without sacrificing network throughput or reliability.
                    [ Strategic Implementation Path ]
                                    |
     +------------------------------+------------------------------+
     |                                                             |
     v                                                             v
[ Core Data Center Hubs ]                              [ Branch Edge Locations ]
Deploy dedicated multi-chip platforms                  Utilize integrated SOC5 devices
(NP7 & CP9) to handle high-volume,                     to maintain local threat scanning
low-latency data streams.                              and direct cloud connectivity.
When designing and deploying your network security infrastructure, consider this architectural blueprint:
- Match Hardware to Workload Requirements: Deploy high-capacity appliances equipped with independent NP7 and CP9 chips at your core data hubs to handle demanding enterprise data streams. For remote offices and edge locations, look to integrated options like the SOC5 to maintain consistent threat scanning and secure connectivity across the entire organization.
- Implement Deep Inspection Intelligently: Avoid compromising on security by turning off critical visibility features. Instead, leverage your hardware’s cryptographic engines to run full SSL/TLS deep inspection while implementing smart bypass rules for trusted, low-risk corporate data sources.
- Keep Core Systems Clean: Maximize hardware offloading efficiency by keeping highly complex, un-accelerated software processes off your main traffic channels, ensuring your security environment remains responsive and stable even under heavy operational loads.
Technical FAQ Section
What is FortiASIC?
FortiASIC refers to Fortinet’s proprietary family of custom-designed Application-Specific Integrated Circuits. These specialized chips are engineered to process intensive networking and security tasks at the hardware level, significantly outperforming traditional software-based firewalls that rely entirely on general-purpose CPUs.
How does the NP7 processor work?
The NP7 (Network Processor 7) operates directly at the network interface layer. It handles Layer 3 and Layer 4 tasks like packet routing, Network Address Translation (NAT), and IPsec VPN traffic forwarding in hardware, allowing approved data streams to pass through the firewall at wire speed without involving the main CPU.
What does the CP9 processor do?
The CP9 (Content Processor 9) functions as a dedicated cryptographic co-processor. It handles resource-intensive encryption and decryption tasks, managing TLS/SSL handshakes and bulk data security processing to prevent these mathematical workflows from slowing down primary firewall operations.
What is the SP5 security processor?
The SP5 (Security Processor 5) is Fortinet’s latest 5nm custom chip. It combines network forwarding capabilities with advanced threat inspection components on a single silicon die, accelerating functions like Intrusion Prevention (IPS), application identification, and anti-malware scanning for edge network deployments.
How does SSL inspection work inside FortiASIC hardware?
When configured for deep inspection, the firewall acts as a secure intermediary proxy. The CP9 chip handles the intensive processing needed to decrypt incoming TLS data streams, presenting the cleartext data to the SP5 engine for threat scanning before re-encrypting the traffic and sending it safely to its destination.
Why are ASICs faster than standard CPUs for network security?
General-purpose CPUs execute packet-processing logic as sequences of software instructions, contending for shared caches and memory along the way. ASICs use hardwired pipelines that process many packets in parallel, so key tasks like route lookups and cryptographic operations complete in a small, fixed number of pipeline stages with minimal latency.
What is NTurbo?
NTurbo is an acceleration technology within FortiOS that creates a direct communication path between the network processor (NP) and user-space security engines. This allows the system to bypass the traditional operating system kernel network stack, reducing processing overhead and keeping latency low during deep content scanning.
What is hardware offloading?
Hardware offloading is the process of shifting resource-heavy data workflows from the firewall’s general-purpose host CPU to specialized co-processors like the NP7 and CP9, freeing up main system assets to focus on core administrative and management tasks.
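The idea can be illustrated with a toy model. The sketch below is a conceptual simplification and not how FortiOS or the NP7 is implemented: it assumes the first packet of a flow is evaluated by the CPU (the slow path), and that an accepted flow is then installed in a hardware session table so later packets skip the CPU. The class and method names are hypothetical.

```python
class SessionOffloadModel:
    """Toy model of slow-path session setup followed by fast-path forwarding."""

    def __init__(self, policy):
        self.policy = policy          # callable(flow) -> True if allowed
        self.hw_sessions = set()      # stands in for the hardware session table
        self.slow_path_packets = 0
        self.fast_path_packets = 0

    def handle_packet(self, flow):
        """Process one packet; flow is a (src, sport, dst, dport, proto) tuple."""
        if flow in self.hw_sessions:
            self.fast_path_packets += 1   # matched in hardware, CPU untouched
            return "fast-path"
        self.slow_path_packets += 1       # CPU evaluates the policy
        if self.policy(flow):
            self.hw_sessions.add(flow)    # install the session for later packets
            return "slow-path-accepted"
        return "dropped"


def allow_https(flow):
    """Example policy: permit TCP traffic to destination port 443 only."""
    return flow[3] == 443 and flow[4] == "tcp"


model = SessionOffloadModel(allow_https)
flow = ("10.0.0.5", 51514, "203.0.113.10", 443, "tcp")
for _ in range(5):
    model.handle_packet(flow)
print(model.slow_path_packets, model.fast_path_packets)  # 1 4
```

In this model only one of the five packets touches the CPU. For long-lived flows the ratio becomes even more lopsided, which is where the throughput benefit of offloading comes from.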
How does Fortinet accelerate IPsec VPN connections?
Once the main CPU establishes an encryption tunnel’s security parameters, the active cryptographic keys are written directly to the NP7 network processor. The NP7 then handles bulk packet encryption, decryption, and integrity verification at hardware speeds as data travels through the tunnel.
What is SOC5?
The SOC5 (System-on-a-Chip 5) is an integrated design that combines a multi-core system CPU, a network processor, and a content processor onto a single silicon die. This design reduces power consumption and hardware size, making it ideal for compact branch office and edge security appliances.
How many concurrent sessions can an NP7 processor support?
Depending on the specific FortiGate appliance model and memory layout, a single NP7 network processor can track and manage tens of millions of concurrent connections within its high-speed hardware session tables.
Does running full SSL inspection always degrade firewall performance?
On traditional firewalls that rely solely on software processing, deep inspection can cause performance to drop significantly. FortiGate appliances use specialized CP9 chips to offload these cryptographic workloads, allowing organizations to maintain high network throughput even under heavy inspection demands.
How does the SP5 detect advanced network threats?
The SP5 features dedicated pattern matching components that scan data streams for thousands of known vulnerability and exploit signatures simultaneously. This design allows the chip to analyze complex file structures and protocol variants in a single processing pass without adding noticeable delays.
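Matching many signatures in one pass is a well-known technique in software as well. The sketch below uses the classic Aho-Corasick automaton as a software analogue. It is not a description of the SP5's internal design, and the signature strings are made-up examples.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for non-empty byte patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # fail[state] -> fallback state on mismatch
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for b in pat:
            nxt = goto[state].get(b)
            if nxt is None:
                nxt = len(goto)
                goto[state][b] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pat)
    # Breadth-first pass: compute failure links and inherit outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(data, automaton):
    """Return (offset, pattern) for every match, reading data exactly once."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


signatures = [b"cmd.exe", b"/etc/passwd", b"exe"]  # illustrative only
automaton = build_automaton(signatures)
payload = b"GET /etc/passwd?x=cmd.exe HTTP/1.1"
for offset, pattern in scan(payload, automaton):
    print(offset, pattern)
```

The scan cost grows with the length of the payload plus the number of matches, not with the number of signatures, which is the property that makes single-pass inspection practical at scale.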
Why does Fortinet develop its own custom silicon?
Developing custom silicon allows Fortinet to optimize its hardware specifically for complex security and network workflows. This strategy enables their appliances to deliver higher throughput, lower processing latency, and better energy efficiency compared to firewalls built with standard, off-the-shelf processing components.
Is FortiASIC hardware more effective than a software-only firewall?
For high-capacity enterprise networks, data center boundaries, and traffic hubs with heavy encryption needs, hardware-accelerated appliances offer significantly higher throughput and more consistent low-latency performance than software-defined alternatives running on general-purpose computer platforms.
