AltiCoreHDL is a hardware IP core that maps models trained within the AltiCore mathematical framework directly into FPGA and ASIC logic. It is delivered as a synchronous, fixed-depth pipelined circuit with a minimal-control interface designed for maximum throughput and minimal fabric overhead.
AltiCoreHDL is not a conventional neural network and does not execute tensor-centric floating-point workloads. Inference is realized primarily through bitwise logic operations, shifts, and a bounded amount of integer DSP, compiled into a deterministic pipeline.
A fully pipelined configuration supports one accepted sample and one result per clock cycle per core (after pipeline fill).
AltiCoreHDL is delivered as a synthesizable RTL core with defined input and output pins. The product boundary is the core interface.
AltiCoreHDL makes no architectural assumptions about the external system beyond standard synchronous digital integration requirements (clock, reset, and pin-level timing). System-level concerns — interconnect arbitration, DMA behavior, host software, clock-domain crossings, buffering strategy, and end-to-end application latency — are out of scope for the core and are the responsibility of the integrator.
The core is intentionally designed to be simple to wrap (e.g., external FIFO buffering, framing, or adapters to standard streaming protocols). Reference wrappers will be developed on a demand-priority basis.
AltiCoreHDL uses a minimal-control interface consisting of clk, rst, en, and valid (see the signal table below).
This interface is backpressure-free (no ready signal). The pipeline advances every clock; if downstream logic cannot accept results at line rate, buffering/flow control must be implemented externally.
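Because the core cannot be stalled, any rate mismatch must be absorbed by the integrator's wrapper. The following is a minimal Python sketch (not part of the delivered IP; the class name, sizing, and overflow flag are illustrative assumptions) of an external result FIFO that captures valid results at line rate and flags overflow when the consumer falls behind:

```python
from collections import deque

class ResultFifo:
    """Hypothetical downstream FIFO wrapper for a backpressure-free core.

    The core emits results at line rate and has no ready input, so the
    wrapper must either be sized for the worst-case consumer stall or
    detect overflow. Depth and interface names here are illustrative.
    """
    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()
        self.overflowed = False

    def clock(self, valid, result, consumer_ready):
        """Model one clock edge: capture a valid result, optionally pop one."""
        if valid:
            if len(self.fifo) < self.depth:
                self.fifo.append(result)
            else:
                self.overflowed = True  # core cannot be stalled; data is lost
        if consumer_ready and self.fifo:
            return self.fifo.popleft()
        return None
```

A consumer that accepts results every other cycle, for example, needs either a FIFO deep enough for the burst or an en gating policy upstream of the core.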
The en token propagates through the pipeline alongside the computation. For a pipeline depth of L cycles, a sample accepted at cycle t (en=1) produces its corresponding valid=1 result at cycle t+L. This latency is data-independent at the core boundary: there are no data-dependent early exits and no variable-iteration loops.
The core is strictly feed-forward for inference: internal state consists only of pipeline storage and static parameters. There are no feedback paths or state updates that couple one sample to another. Bubble cycles (en=0) do not affect subsequent accepted samples (en=1) beyond inserting bubbles in the token stream.
The inference pipeline does not stall internally. The only stall condition is reset asserted. When reset is asserted, internal state and token pipeline state are cleared.
When valid=0, output pins are not meaningful for inference results and may change. Downstream logic must qualify sampling using valid.
Per instantiated core: peak acceptance rate up to 1 sample/cycle (sustained en=1), peak result rate up to 1 result/cycle (steady state, after pipeline fill), and latency of exactly L cycles from acceptance (en=1) to corresponding valid=1 result.
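The contract above can be modeled as a depth-L shift register carrying the en token (and, for illustration, the sample itself) alongside the computation. This Python sketch is a behavioral model only, with an identity datapath standing in for the actual compute stages, which are not described at this level:

```python
class TokenPipeline:
    """Behavioral model of the fixed-latency en/valid token contract.

    A depth-L shift register carries each cycle's (en, sample) pair;
    valid[t] = en[t - L] after reset, and bubbles (en=0) propagate
    without disturbing accepted samples. Illustrative model only.
    """
    def __init__(self, latency):
        # Each stage holds (token, payload); reset state is all bubbles.
        self.stages = [(0, None)] * latency

    def clock(self, en, sample=None):
        """Advance one cycle; return (valid, result) for this cycle."""
        valid, result = self.stages[-1]       # oldest stage exits the core
        self.stages = [(en, sample)] + self.stages[:-1]  # advance every clock
        return valid, result
```

Driving the model with an en pattern containing bubbles reproduces the L-cycle en-to-valid delay and the one-result-per-clock steady-state rate.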
AltiCoreHDL inference is implemented as a fixed-depth synchronous pipeline. The pipeline advances every clock, and output validity is carried by the token pipeline (en → valid).
Each instantiated AltiCoreHDL core uses dedicated resources for that core. There are no shared compute resources across samples and no internal arbitration that could introduce stalls or variable service time.
For FPGA implementations: the core uses on-chip resources only (logic, registers, DSP blocks, and/or on-chip memory for deterministic pipeline storage). The core does not require external DRAM or off-FPGA resources for inference operation.
For ASIC implementations: equivalent storage and arithmetic primitives are realized using on-chip resources (standard-cell and/or embedded memory macros as appropriate to the implementation).
AltiCoreHDL minimizes control overhead by using a token pipeline rather than backpressure networks or dynamic scheduling. This reduces: internal control logic, stall/bubble management complexity, and fabric used for handshake infrastructure.
The en/valid interface was selected to: maximize achievable throughput at the core boundary, minimize interface and internal control fabric, and allow integrators to add only the wrappers they need (FIFO buffering, framing, protocol adapters).
AltiCoreHDL supports configurable integer word sizes to match integration and PPA (power, performance, area) goals.
AltiCoreHDL uses minimal arithmetic relative to MAC-dominant tensor approaches typical of neural inference accelerators. It does not claim "no arithmetic"; instead, arithmetic is bounded and used where needed within an otherwise logic-centric compute profile.
AltiCoreHDL has been implemented and validated on shipping Xilinx FPGA platforms. Under sustained en=1 (steady state), observed throughput matches the token-pipeline contract: one valid result per clock per instantiated core.
| Platform | Device | Cores Instantiated | Clock (MHz) | Observed Throughput |
|---|---|---|---|---|
| Digilent Arty Z7 (battery-powered) | Zynq-7000 XC7Z020 | 3 | 100.0 | 300 M inferences/sec |
| Xilinx ZCU104 | Zynq UltraScale+ XCZU7EV MPSoC | 17 | 187.5 | 3.1875 B inferences/sec |
Throughput was validated at the core boundary using a logic analyzer observing the valid output token under continuous operation (en=1, steady state). The observed valid behavior confirmed one asserted valid per clock per core, consistent with the fixed pipeline contract.
These implementations are observable and can be independently validated by instrumenting valid and confirming the en → valid delay and sustained token rate at the selected clock frequency.
When multiple identical cores are instantiated and clocked together, aggregate peak throughput scales approximately linearly with the number of cores and the clock frequency, subject to device capacity, timing closure, and integrator routing/floorplanning constraints.
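The scaling arithmetic is one result per core per clock at steady state, so the aggregate peak rate is simply cores × clock frequency. A short check against the figures in the measurements table:

```python
def aggregate_throughput(cores, clock_hz):
    """Peak aggregate rate under the token contract:
    one result per core per clock at steady state."""
    return cores * clock_hz

# Figures from the measured-implementations table:
arty = aggregate_throughput(3, 100e6)        # Arty Z7: 300 M inferences/sec
zcu104 = aggregate_throughput(17, 187.5e6)   # ZCU104: 3.1875 B inferences/sec
```

Actual deployments may clock below the peak achievable frequency, so these figures are upper bounds at the stated clocks.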
AltiCoreHDL integrates into a development pipeline that maps models trained within the AltiCore framework to synthesizable RTL.
Algorithmic specifics and training methodology can be provided under NDA as part of technical diligence.
AltiCoreHDL is engineered for deployments prioritizing deterministic core timing and efficient hardware realization, including:

- Systems where predictable core timing and repeatable behavior support system assurance activities.
- Designs where dedicated, low-overhead inference cores can be integrated into cost-sensitive silicon.
- Low-overhead inference for continuous detection and wake-on-event patterns.
- Deployments where power/area constraints favor fixed pipelines and logic-centric computation.
AltiCoreHDL is a deterministic inference IP core implemented as a fixed-depth synchronous pipeline with a backpressure-free en/valid token interface. This design choice maximizes achievable throughput and minimizes fabric overhead, while enabling integrators to wrap the core as best suits their system.
Measured FPGA implementations demonstrate real, observable throughput consistent with the contract: one inference per core per clock with a fixed latency of exactly L cycles, where L is configuration dependent. The result is a production-oriented pathway for deploying AltiCore model intelligence in FPGA and ASIC environments where predictable core behavior and efficient implementation are primary requirements.
About EvoChip.ai
EvoChip.ai develops AI computing technology that transforms training data into deterministic hardware inference implementations.
Contact: ab@evochip.ai | Document Version: V3 | Publication Date: February 2026 | © 2026 EvoChip.ai. All rights reserved.
| Signal | Direction | Description |
|---|---|---|
| clk | Input | Synchronous clock |
| rst | Input | Synchronous/asynchronous reset (implementation defined); when asserted, core is held in reset and internal state is cleared |
| en | Input | Input token — en=1 indicates an accepted sample on input pins |
| valid | Output | Output token — valid=1 indicates output pins contain a meaningful result for a previously accepted input |
For fixed latency L: valid[t] = en[t−L] (after reset/pipeline fill)
If en=0, the cycle is a bubble: no accepted sample for that cycle, and a corresponding valid=0 bubble emerges after L cycles.
When valid=1, output pins correspond to the input sample presented on the cycle where the matching en=1 occurred (L cycles earlier).