Technical White Paper — EvoChip.ai
AltiCoreHDL
Hardware-Integrated Intelligence for Production Silicon
  • 1 inference per clock per core
  • ~100-cycle constant latency (typical)
  • 3.19 B peak inferences/sec (ZCU104)
  • 0 external DRAM required
Document: Version 3.0
Date: February 2026
Classification: Public Distribution
Issued by: EvoChip.ai
Table of Contents
1. Executive Summary
2. Product Boundary and Integration Scope
3. Interface and Timing Contract (en / valid)
4. Core Technical Architecture
5. Structural Design Principles
6. Demonstrated Implementations and Measured Performance
7. Ecosystem Integration
8. Target Applications
9. Conclusion
Appendix A — Interface Summary and Timing Diagram
Section 01
Executive Summary

AltiCoreHDL is a hardware IP core that maps models trained within the AltiCore mathematical framework directly into FPGA and ASIC logic. It is delivered as a synchronous, fixed-depth pipelined circuit with a minimal-control interface designed for maximum throughput and minimal fabric overhead.

AltiCoreHDL is not a conventional neural network and does not execute tensor-centric floating-point workloads. Inference is realized primarily through bitwise logic operations, shifts, and a bounded amount of integer DSP, compiled into a deterministic pipeline.

A fully pipelined configuration supports:

  • Up to 1 accepted inference per clock cycle per core (when en=1 each cycle)
  • Up to 1 valid result per clock cycle per core (steady state, after pipeline fill)
  • Fixed, cycle-constant core latency of exactly L cycles (L is configuration dependent; typically ~100)
Key Benefits
  • Cycle-constant core latency and order-preserving results at the IP boundary
  • Maximum throughput via a backpressure-free token pipeline (en/valid)
  • Minimal control fabric — no internal ready/valid networks, skid buffers, or arbitration
  • No floating-point dependency — logic-centric with bounded integer arithmetic
  • No external DRAM required for FPGA inference operation (on-chip storage only)
  • Wrapper-friendly — integrators can add FIFOs / framing / standard streaming adapters externally to suit their system

Section 02
Product Boundary and Integration Scope

AltiCoreHDL is delivered as a synthesizable RTL core with defined input and output pins. The product boundary is the core interface.

AltiCoreHDL makes no architectural assumptions about the external system beyond standard synchronous digital integration requirements (clock, reset, and pin-level timing). System-level concerns — interconnect arbitration, DMA behavior, host software, clock-domain crossings, buffering strategy, and end-to-end application latency — are out of scope for the core and are the responsibility of the integrator.

The core is intentionally designed to be simple to wrap (e.g., external FIFO buffering, framing, or adapters to standard streaming protocols). Reference wrappers will be developed on a demand-priority basis.


Section 03
Interface and Timing Contract (en / valid)
3.1 — Backpressure-Free Token Pipeline

AltiCoreHDL uses a minimal-control interface:

  • en (input enable token): qualifies the input pins on a given clock cycle. If en=1, that cycle's input sample is accepted. If en=0, the cycle is a bubble (no accepted sample).
  • valid (output token): asserts when the output pins contain a meaningful result corresponding to a previously accepted input sample.

This interface is backpressure-free (no ready signal). The pipeline advances every clock; if downstream logic cannot accept results at line rate, buffering/flow control must be implemented externally.
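One common way to provide that external flow control is credit-based gating of en: the wrapper asserts en only when the output FIFO could absorb, in the worst case, every result already in flight plus its current occupancy. The sketch below is illustrative only; the pipeline depth, FIFO capacity, and function name are hypothetical assumptions, not part of the delivered IP.

```python
# Illustrative external flow-control sketch (not part of the delivered IP).
# The core has no ready input, so a wrapper must guarantee the output FIFO
# can absorb every result already committed to the pipeline.

L = 100            # assumed core pipeline depth (configuration dependent)
FIFO_DEPTH = 256   # hypothetical external output FIFO capacity

def may_accept(in_flight: int, fifo_occupancy: int) -> bool:
    """Gate en: accept a new sample only if, in the worst case (no FIFO
    drain), all in-flight results plus the new one still fit in the FIFO."""
    return in_flight + fifo_occupancy + 1 <= FIFO_DEPTH
```

A real wrapper would track in_flight as accepted en tokens minus asserted valid tokens, and read fifo_occupancy from the FIFO itself.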

3.2 — Fixed-Latency Mapping

The en token propagates through the pipeline alongside the computation. For a pipeline depth of L cycles:

  • Each accepted input (en=1) produces exactly one corresponding output with valid=1 after exactly L cycles.
  • The token relationship is: valid[t] = en[t − L] (after reset/pipeline fill)

This latency is data-independent at the core boundary: there are no data-dependent early exits and no variable-iteration loops.
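The token relationship above can be checked against a minimal behavioral model of an L-stage token shift register. This is a sketch of the interface contract only, not the shipped RTL:

```python
def token_pipeline(en_stream, L):
    """Behavioral model of the en -> valid token contract:
    valid[t] = en[t - L], with valid = 0 during pipeline fill."""
    shift = [0] * L                      # L-stage token shift register
    valid_stream = []
    for en in en_stream:
        valid_stream.append(shift[-1])   # oldest token exits as valid
        shift = [en] + shift[:-1]        # pipeline advances every clock
    return valid_stream
```

With L = 4 and en = 1,0,1,1,0,0,1,0,0, this model reproduces the timing sketch in Appendix A: valid = 0,0,0,0,1,0,1,1,0 (en delayed by 4).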

3.3 — Independence and No Cross-Sample Coupling

The core is strictly feed-forward for inference: internal state consists only of pipeline storage and static parameters. There are no feedback paths or state updates that couple one sample to another. Bubble cycles (en=0) do not affect subsequent accepted samples (en=1) beyond inserting bubbles in the token stream.

3.4 — Stall Conditions

The inference pipeline does not stall internally. The only stall condition is reset asserted. When reset is asserted, internal state and token pipeline state are cleared.

3.5 — Output Meaning When valid=0

When valid=0, output pins are not meaningful for inference results and may change. Downstream logic must qualify sampling using valid.

3.6 — Throughput Statement (Precise)
  • Peak acceptance rate: 1 / cycle
  • Peak result rate (steady state): 1 / cycle
  • Latency (en=1 → valid=1): L cycles
  • Internal stalls: 0

Per instantiated core: peak acceptance rate up to 1 sample/cycle (sustained en=1), peak result rate up to 1 result/cycle (steady state, after pipeline fill), and latency of exactly L cycles from acceptance (en=1) to corresponding valid=1 result.


Section 04
Core Technical Architecture
4.1 — Fixed-Depth Synchronous Pipeline

AltiCoreHDL inference is implemented as a fixed-depth pipeline consisting of:

  • combinational logic stages (logic-centric primitives),
  • pipeline storage (registers and/or on-chip memory configured as deterministic pipeline storage),
  • bounded fixed-point/integer arithmetic where required, optionally mapped to device primitives (e.g., DSP blocks).

The pipeline advances every clock, and output validity is carried by the token pipeline (en → valid).

4.2 — Resource Model

Each instantiated AltiCoreHDL core uses dedicated resources for that core. There are no shared compute resources across samples and no internal arbitration that could introduce stalls or variable service time.

4.3 — Memory and External Dependencies

For FPGA implementations: the core uses on-chip resources only (logic, registers, DSP blocks, and/or on-chip memory for deterministic pipeline storage). The core does not require external DRAM or off-FPGA resources for inference operation.

For ASIC implementations: equivalent storage and arithmetic primitives are realized using on-chip resources (standard-cell and/or embedded memory macros as appropriate to the implementation).


Section 05
Structural Design Principles
5.1 — Minimal Control Fabric by Design

AltiCoreHDL minimizes control overhead by using a token pipeline rather than backpressure networks or dynamic scheduling. This reduces: internal control logic, stall/bubble management complexity, and fabric used for handshake infrastructure.

5.2 — Backpressure-Free Interface for Maximum Throughput and Minimal Fabric

The en/valid interface was selected to: maximize achievable throughput at the core boundary, minimize interface and internal control fabric, and allow integrators to add only the wrappers they need (FIFO buffering, framing, protocol adapters).

5.3 — Flexible Word-Length Support

AltiCoreHDL supports configurable integer word sizes to match integration and PPA goals:

  • configurable widths (e.g., 4-bit through 64-bit and beyond, implementation dependent),
  • alignment to device primitives and interface requirements,
  • precision/resource tradeoffs.
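To make the precision side of that tradeoff concrete: an n-bit two's-complement word represents the range [−2^(n−1), 2^(n−1) − 1], so each added bit doubles the representable range at the cost of wider datapaths. A small illustrative helper (not part of the IP deliverable):

```python
def signed_range(n_bits: int):
    """Representable range of an n-bit two's-complement integer."""
    return (-(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1)

# signed_range(4)  -> (-8, 7)
# signed_range(16) -> (-32768, 32767)
```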
5.4 — Arithmetic Positioning (Accurate Claim)

AltiCoreHDL uses minimal arithmetic relative to MAC-dominant tensor approaches typical of neural inference accelerators. It does not claim "no arithmetic"; instead, arithmetic is bounded and used where needed within an otherwise logic-centric compute profile.


Section 06
Demonstrated Implementations and Measured Performance

AltiCoreHDL has been implemented and validated on shipping Xilinx FPGA platforms. Under sustained en=1 (steady state), observed throughput matches the token-pipeline contract: one valid result per clock per instantiated core.

6.1 — Measured FPGA Implementations
Platform                              Device                            Cores  Clock (MHz)  Observed Throughput
Digilent Arty Z7 (battery-powered)    Zynq-7000 (Zynq-7020, "-20")      3      100.0        300 M inferences/sec
Xilinx ZCU104                         Zynq UltraScale+ XCZU7EV MPSoC    17     187.5        3.1875 B inferences/sec
6.2 — Measurement Method

Throughput was validated at the core boundary using a logic analyzer observing the valid output token under continuous operation (en=1, steady state). The observed valid behavior confirmed one asserted valid per clock per core, consistent with the fixed pipeline contract.

These implementations are observable and can be independently validated by instrumenting valid and confirming the en → valid delay and sustained token rate at the selected clock frequency.

6.3 — Interpreting Multi-Core Scaling

When multiple identical cores are instantiated and clocked together, aggregate peak throughput scales approximately linearly with:

  • number of cores, and
  • clock frequency,

subject to device capacity, timing closure, and integrator routing/floorplanning constraints.
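Under this linear model, aggregate peak throughput is simply cores × clock; the measured figures in Section 6.1 are consistent with it. The snippet below is a cross-check of the published numbers, not new data:

```python
def aggregate_peak(n_cores: int, f_clk_hz: float) -> float:
    """Aggregate peak throughput (inferences/sec) when each identical
    core sustains one accepted inference per clock (en=1 every cycle)."""
    return n_cores * f_clk_hz

# Arty Z7:  aggregate_peak(3, 100.0e6)  -> 300 M inferences/sec
# ZCU104:   aggregate_peak(17, 187.5e6) -> 3.1875 B inferences/sec
```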


Section 07
Ecosystem Integration

AltiCoreHDL integrates into a development pipeline that supports:

  1. Model development and training in AltiCore tooling
  2. Hardware mapping into AltiCoreHDL for FPGA-based characterization and deployment
  3. Production migration into ASIC/SoC implementations as required

Algorithmic specifics and training methodology can be provided under NDA as part of technical diligence.


Section 08
Target Applications

AltiCoreHDL is engineered for deployments prioritizing deterministic core timing and efficient hardware realization, including:

Safety- and compliance-sensitive systems

Where predictable core timing and repeatable behavior support system assurance activities.

High-volume production

Where dedicated, low-overhead inference cores can be integrated into cost-sensitive silicon.

Always-on monitoring (“sentinel”)

Low-overhead inference used for continuous detection and wake-on-event patterns.

Embedded and edge inference

Where power/area constraints favor fixed pipelines and logic-centric computation.


Section 09
Conclusion

AltiCoreHDL is a deterministic inference IP core implemented as a fixed-depth synchronous pipeline with a backpressure-free en/valid token interface. This design choice maximizes achievable throughput and minimizes fabric overhead, while enabling integrators to wrap the core as best suits their system.

Measured FPGA implementations demonstrate real, observable throughput consistent with the contract: one inference per core per clock, with a fixed latency of exactly L cycles (configuration dependent, typically ~100). The result is a production-oriented pathway for deploying AltiCore model intelligence in FPGA and ASIC environments where predictable core behavior and efficient implementation are primary requirements.

About EvoChip.ai

EvoChip.ai develops AI computing technology that transforms training data into deterministic hardware inference implementations.

Contact: ab@evochip.ai  |  Document Version: V3  |  Publication Date: February 2026  |  © 2026 EvoChip.ai. All rights reserved.


Appendix A
Interface Summary and Timing Diagram
Signal Summary (Core Boundary)
Signal   Direction   Description
clk      Input       Synchronous clock
rst      Input       Synchronous/asynchronous reset (implementation defined); when asserted, the core is held in reset and internal state is cleared
en       Input       Input token — en=1 indicates an accepted sample on input pins
valid    Output      Output token — valid=1 indicates output pins contain a meaningful result for a previously accepted input
Token Relationship

For fixed latency L:  valid[t] = en[t−L]  (after reset/pipeline fill)

Bubble Semantics

If en=0, the cycle is a bubble: no accepted sample for that cycle, and a corresponding valid=0 bubble emerges after L cycles.

Timing Sketch (Illustrative — L = 4 cycles)
Cycle:  0   1   2   3   4   5   6   7   8
en:     1   0   1   1   0   0   1   0   0
valid:  0   0   0   0   1   0   1   1   0   (= en delayed by 4)

When valid=1, output pins correspond to the input sample presented on the cycle where the matching en=1 occurred (L cycles earlier).

EvoChip.ai — AltiCoreHDL Technical White PaperVersion 3.0  |  February 2026  |  Hardware-Integrated Intelligence© 2026 EvoChip.ai. All rights reserved.