Technical White Paper
AltiCoreSWP
Cross-Platform Software Runtime for General-Purpose Compute
Document Version: 1.0
Date: February 2026
Classification: Public Distribution
Issued by: EvoChip.ai
Section 01
Executive Summary

AltiCoreSWP is the software deployment tier of the AltiCore product family, providing model training and inference on existing general-purpose compute systems. Built on the AltiCore mathematical framework, AltiCoreSWP is not a conventional neural-network runtime and does not rely primarily on tensor-centric floating-point execution. Instead, trained models are executed through logic-dominant operator chains implemented largely with hardware-native logical operations, with bounded arithmetic where required.

AltiCoreSWP is engineered for CPU-first deployment on standard compute infrastructure. It is intended to deliver high sustained inference throughput on supported systems without requiring an NPU, cloud inference service, or hardware modification to the host platform. Within the same framework, AltiCoreSWP supports model training and inference for supported software targets.

In a CPU-only benchmark across seven public datasets and two AVX2-capable x86 platforms, AltiCoreSWP delivered a large and consistent throughput advantage over the fastest neural-network CPU implementation tested (TensorFlow Lite with XNNPACK). Under the stated benchmark conditions, observed speedups were typically approximately 13× on the workstation-class CPU (range approximately 6.7× to 21×) and typically approximately 17× on the server-class CPU (range approximately 9.1× to 27.6×). These are measured benchmark results under defined conditions, not universal guarantees for every workload, model shape, or deployment environment.

Key Benefits
  • CPU-first deployment on existing infrastructure
  • High sustained inference throughput for structured decisioning workloads
  • Reduced arithmetic intensity relative to conventional neural-network CPU inference
  • Standard software integration through C/C++ and Python workflows
  • Training and inference within a common framework

Section 02
Product Boundary and Integration Scope

AltiCoreSWP is delivered as a software execution platform for supported general-purpose compute targets. Depending on deployment mode, the product boundary consists of:

  1. the AltiCore model training workflow and artifact generation process,
  2. the software runtime and/or compiled execution artifact used to execute trained AltiCore models, and
  3. the inference and, where applicable, training interfaces exposed to the surrounding application or service.

AltiCoreSWP makes no architectural assumptions about the surrounding software system beyond standard integration requirements. It may be integrated into standalone applications, libraries, services, or batch-processing pipelines as supported by the target platform.

System-level responsibilities remain outside the AltiCoreSWP product boundary and are the responsibility of the integrator. These include application orchestration, data acquisition, feature engineering, storage, security, service topology, batching strategy, network transport, process lifecycle management, concurrency policy, observability, and end-to-end application latency.


Section 03
Execution, Timing, and Measurement Contract

AltiCoreSWP is characterized primarily by sustained inference throughput on supported software targets. Unless otherwise stated, performance statements in this paper refer to CPU-only execution under defined benchmark conditions. Training performance, end-to-end application latency, and alternative platform realizations should be characterized separately.

3.1  Execution Model

AltiCoreSWP executes trained models on the host CPU through standard software deployment workflows. CPU inference does not require hardware modification, an external NPU, or an off-device inference service.

3.2  Measurement Boundary

In the demonstrated CPU benchmark, throughput is measured in inferences per second over the timed inference region only. Input preparation, model loading and initialization, output-buffer allocation, and warm-up are performed outside the timed region. Reported figures therefore represent sustained compute throughput under the stated test conditions, not end-to-end application throughput.
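The measurement boundary described above can be sketched as a minimal Python harness. The `infer` callable and the dataset are generic placeholders, not AltiCoreSWP APIs; the sketch only illustrates which activities fall inside and outside the timed region.

```python
import time

def measure_throughput(infer, rows, warmup=1000):
    """Measure sustained throughput over the timed inference region only.

    Model loading, input preparation, and output-buffer allocation are
    assumed to have completed before this function is called. Warm-up
    executes before the timer starts, mirroring the measurement boundary
    described above. `infer` is any single-row inference callable.
    """
    # Warm-up runs outside the timed region.
    for row in rows[:warmup]:
        infer(row)

    start = time.perf_counter()             # timed region begins
    for row in rows:
        infer(row)
    elapsed = time.perf_counter() - start   # timed region ends

    return len(rows) / elapsed              # inferences per second
```

Because one input row corresponds to one inference, the returned rows-per-second figure is directly an inferences-per-second figure under this boundary.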

3.3  Variability and Integration Responsibility

Observed performance depends on the specific trained model being executed, compiler and runtime configuration, batch size, thread settings, memory placement, host platform, operating-system scheduling, and surrounding application architecture. Because AltiCoreSWP runs on general-purpose compute systems, wall-clock latency is not cycle-constant, and final end-to-end performance remains a system-integration responsibility.

3.4  Platform Scope

The demonstrated benchmark results summarized in this paper are based on AVX2-capable x86 CPU systems. Other supported targets, optimizations, or deployment modes should be specified separately and should not be inferred from the CPU benchmark data alone.


Section 04
Core Technical Architecture
4.1  Logic-Dominant Computation

AltiCoreSWP executes trained models through logic-dominant operator chains rather than conventional tensor-centric neural-network execution. Runtime behavior relies primarily on hardware-native logical operations, with bounded arithmetic where required. This reduces arithmetic intensity relative to conventional neural-network CPU inference and aligns execution with standard general-purpose processors.
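As a purely illustrative toy (not AltiCoreSWP's actual operator set), the sketch below shows the general flavor of a logic-dominant decision stage: hardware-native bitwise operations on a packed word, with a single piece of bounded arithmetic (a population count) at the end.

```python
def logic_stage(x, mask, pattern, threshold):
    """Toy logic-dominant operator: bitwise match plus bounded arithmetic.

    Illustrative only; AltiCoreSWP's real operator chains are not
    documented here. XOR/NOT/AND perform the pattern match, and the
    only arithmetic is a bounded popcount-and-compare.
    """
    hits = ~(x ^ pattern) & mask              # bitwise match within the mask
    return bin(hits).count("1") >= threshold  # bounded arithmetic: popcount
```

Stages like this map onto ordinary integer units of a general-purpose CPU, which is the structural point of §4.1: most of the work is logical rather than floating-point.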

4.2  Compact Model Footprint and Reduced Arithmetic Work

At its current demonstrated stage, AltiCoreSWP shows its strongest value on compact, low-parameter models. In the seven-dataset CPU benchmark described in this paper, the selected AltiCoreSWP models were substantially smaller than the selected multilayer perceptron (MLP) baselines, using approximately 35× to 301× fewer parameters and approximately 40× to 343× fewer arithmetic operations per inference, depending on dataset.

These structural differences reduce the amount of work performed per inference and provide the technical basis for the throughput advantage observed in the benchmark. These figures are benchmark-specific and should not be interpreted as universal ratios for all model classes or deployment scenarios.

4.3  Variable Usage and Deployment Reach

As a consequence of its training and model-selection process, the AltiCoreSWP workflow identifies and retains the variables that contribute most to model performance, without requiring a separate explicit feature-selection preprocessing stage. In the benchmark suite, the selected AltiCoreSWP models used no more input variables than the compared MLP models and, in most datasets, used fewer.

Operationally, reduced input requirements can lower upstream data dependencies and simplify deployment in environments where signal availability, data-pipeline complexity, or integration cost are limiting factors.

4.4  Unified Training and Inference Framework

AltiCoreSWP supports both model training and inference within the same underlying framework. A model can be developed, trained, evaluated, and deployed within a common AltiCore workflow, while the final software realization is matched to the intended target environment and integration path.


Section 05
Deployment Workflow and Integration Model
5.1  Data Ingestion and Model Training

A labeled dataset is ingested through AltiCore tooling using supported software workflows, including file-based and API-mediated inputs where applicable. Model training is performed within the AltiCore framework on supported compute platforms.

5.2  Model Shape Selection

For AltiCoreSWP, model shape is a user-selected hyperparameter. Specific shape values are chosen according to the desired balance of model capability, complexity, resource consumption, and deployment objectives, in a manner analogous to shape-variant model selection in other algorithm families. Candidate model shapes may therefore be evaluated and compared within the AltiCore workflow before selecting the deployment model.
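The shape-selection loop described above can be sketched generically. `train_model` and `evaluate` below are hypothetical stand-ins for AltiCore workflow calls, whose real interfaces are not documented in this paper; only the sweep-and-compare pattern is the point.

```python
def select_shape(candidate_shapes, train_model, evaluate, data):
    """Evaluate candidate model shapes and return the best-scoring one.

    Hypothetical sketch: `train_model(data, shape=...)` and
    `evaluate(model, data)` are placeholders for the actual AltiCore
    training and evaluation steps.
    """
    scores = {}
    for shape in candidate_shapes:
        model = train_model(data, shape=shape)
        scores[shape] = evaluate(model, data)
    # Choose the shape with the highest evaluation score.
    return max(scores, key=scores.get)
```

In practice the score would fold in whatever balance of capability, complexity, and resource consumption the deployment objectives require, as the section above describes.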

5.3  Command-Line Training and Inference Tools

AltiCoreSWP provides command-line tools for both training and inference. These tools allow users to train models, execute inference, and evaluate performance without requiring custom application development, making them suitable for rapid evaluation, benchmarking, and development-free deployment workflows.

On supported platforms, the supplied command-line binaries are implemented as heavily optimized, multithreaded executables designed to utilize available parallel compute resources effectively out of the box.
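As an illustration of how such command-line tools are typically driven from scripts, the sketch below assembles command lines for a training and an inference run. The binary names (`alticore_train`, `alticore_infer`) and flags are illustrative placeholders, not documented AltiCoreSWP options.

```python
def build_train_cmd(dataset, model_out, threads):
    # Hypothetical training invocation; names and flags are placeholders.
    return ["alticore_train", "--data", dataset,
            "--out", model_out, "--threads", str(threads)]

def build_infer_cmd(model, dataset):
    # Hypothetical inference invocation; names and flags are placeholders.
    return ["alticore_infer", "--model", model, "--data", dataset]
```

A wrapper script would hand these argument lists to `subprocess.run(cmd, check=True)` or an equivalent process launcher, which is all the integration the out-of-the-box workflow requires.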

5.4  Software Realization and Application Integration

In addition to command-line execution, AltiCoreSWP integrates into standard software toolchains and may be used through C/C++ and other workflows, depending on the deployment path. The selected model is realized as an AltiCoreSWP software deployment artifact matched to the intended runtime environment.

Application-specific responsibilities — including feature acquisition, batching policy, service orchestration, security, observability, and end-to-end latency management — remain outside the AltiCoreSWP runtime boundary and are the responsibility of the integrator.

5.5  Deployment and Scale-Out

The resulting runtime can be deployed on supported workstation- and server-class systems using conventional software deployment methods. Horizontal scale-out across processes, services, or nodes is handled by the surrounding application and infrastructure architecture rather than by the AltiCoreSWP runtime itself.


Section 06
Demonstrated Implementations and Measured Performance

AltiCoreSWP's logic-dominant execution model translates into measured, order-of-magnitude throughput advantages on standard CPU infrastructure. In the AltiCoreSWP Benchmark Report v6.0, conducted jointly by EvoChip and SidePath, AltiCoreSWP was evaluated against multiple widely used neural-network CPU inference implementations across seven public datasets on two AVX2-capable x86 systems.

Across the demonstrated benchmark scope, AltiCoreSWP delivered higher throughput than the fastest neural-network CPU implementation tested in every dataset included in the report.

6.1  Demonstrated Benchmark Results
Platform           | CPU                   | Observed AltiCoreSWP Throughput | Relative Advantage
Workstation-class  | Intel Core i7-13700H  | ~225–361 M inf/sec              | typically ~13×, range ~6.7×–21×
Server-class       | Intel Xeon Gold 5416S | ~472–575 M inf/sec              | typically ~17×, range ~9.1×–27.6×

The peak observed speedup across the benchmark was 27.6×, on the server-class platform.
6.2  Out-of-the-Box Benchmark Realization

The demonstrated benchmark results were produced using the standard AltiCoreSWP-supplied command-line binaries for training and inference. No customer-specific development work, model-serving integration work, or benchmark-specific application engineering was required to obtain the reported AltiCoreSWP results.

The supplied command-line tools are heavily optimized for multithreaded execution and are designed to utilize available parallel compute resources on supported platforms out of the box.

6.3  Benchmark Reference

Detailed methodology, dataset-level results, baseline definitions, hardware configuration, and measurement procedures are documented in the AltiCoreSWP Benchmark Report v6.0. Performance claims in this white paper summarize the demonstrated results reported there.


Section 07
Ecosystem Integration

AltiCoreSWP is one deployment tier within the broader AltiCore product family. Alongside AltiCoreMCU and AltiCoreHDL, it is built on the same underlying AltiCore mathematical framework, allowing model development and training to proceed within a common technical foundation across software, constrained embedded, and hardware-integrated targets.

Within that ecosystem, each tier delivers the framework in a target-specific form:

  • AltiCoreSWP for general-purpose software training and inference,
  • AltiCoreMCU for constrained MCU-class deployment through validated runtime templates and static parameter blocks, and
  • AltiCoreHDL for FPGA and ASIC realization as deterministic hardware logic.

A model developed within AltiCore tooling can, where a corresponding supported deployment realization exists, be carried into the delivery tier that best matches the system constraint profile.


Section 08
Target Applications

AltiCoreSWP is presently best matched to CPU-first, high-throughput structured decisioning workloads where inference cost, deployment simplicity, and predictable compute behavior are primary requirements, including:

Risk screening and fraud detection

High-volume decisioning workloads where throughput directly affects capacity and cost per decision.

Industrial monitoring and operational classification

Structured inference tasks where sustained CPU throughput and deployment simplicity matter more than accelerator-centric execution.

On-premises and security-sensitive deployments

Environments where organizations prefer inference to remain inside standard CPU-based infrastructure without mandatory dependency on external AI accelerators or cloud inference services.

Capacity-bound inference services

Systems in which inference is a meaningful operating expense and higher inferences-per-second directly improve server utilization and unit economics.


Section 09
Conclusion

AltiCoreSWP is a CPU-first software runtime built on the AltiCore mathematical framework. By executing trained models through logic-dominant operator chains with reduced arithmetic intensity relative to conventional neural-network CPU inference, it provides a practical path to higher sustained throughput on existing general-purpose compute infrastructure.

In the demonstrated seven-dataset CPU benchmark summarized in this paper, AltiCoreSWP delivered a large and consistent throughput advantage over the fastest neural-network CPU implementation tested, with typical observed speedups of approximately 13× on the workstation-class platform and approximately 17× on the server-class platform, and a peak observed speedup of 27.6× under the stated benchmark conditions.

These results are configuration-specific, but they support the central conclusion: AltiCoreSWP provides a production-relevant software deployment pathway for structured workloads where CPU throughput, deployment simplicity, and unit economics are primary requirements.


Appendix A
Benchmark Scope and Measurement Summary
A.1  Purpose and Scope

This white paper summarizes performance results reported in the AltiCoreSWP Benchmark Report v6.0, a joint benchmark initiative conducted by EvoChip and SidePath. The benchmark measured maximum sustained CPU inference throughput, reported in inferences per second, across seven public datasets on two AVX2-capable x86 systems.

Throughout the benchmark, one input row corresponds to one inference. The benchmark objective was relative throughput comparison under CPU-only conditions. End-to-end application latency, GPU-accelerated neural-network performance, and full service-level deployment behavior were outside the benchmark scope.

A.2  Systems Under Test

The demonstrated benchmark was executed on two general-purpose CPU platforms:

  • Workstation-class system: Intel Core i7-13700H, 32 GB RAM
  • Server-class system: Intel Xeon Gold 5416S, 64 GB RAM

All evaluated methods used AVX2-capable CPU execution paths. Although the server platform supported AVX512, AVX512 was not used in the reported benchmark results.

A.3  Inference Methods Evaluated

For each dataset and hardware platform, AltiCoreSWP was compared against multiple CPU implementations of a neural-network baseline. The evaluated methods included:

  • Python TensorFlow/Keras baseline
  • Multithreaded Python TensorFlow/Keras
  • C++ TensorFlow Lite with XNNPACK
  • C++ TensorFlow Lite with application-managed RUY-based parallelism
  • Multithreaded AltiCoreSWP execution

Within the demonstrated benchmark scope, the fastest neural-network CPU implementation tested was C++ TensorFlow Lite with XNNPACK.

A.4  Timed Region and Throughput Definition

Throughput was measured only over the timed inference region. The following activities were performed outside the timed region: input preparation, model loading and initialization, output-buffer allocation, and warm-up execution.

Throughput was computed as: Throughput (inferences/sec) = N / t, where N = number of rows processed and t = elapsed time in seconds. Because each input row represents one inference, rows per second and inferences per second are equivalent in this benchmark.
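The definition above is trivially expressed and checked in code; the figures in the comment are illustrative round numbers, not benchmark data.

```python
def throughput(n_rows, elapsed_seconds):
    """Throughput (inferences/sec) = N / t.

    One input row corresponds to one inference in this benchmark,
    so rows per second and inferences per second coincide.
    """
    return n_rows / elapsed_seconds

# e.g. processing 1,000,000 rows in 0.5 s yields 2,000,000 inf/sec
```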

A.5  Replicate Protocol

Each unique hardware / dataset / method configuration was executed as five independent process launches. Reported benchmark results use the median throughput across those five runs for each configuration.

This protocol was intended to characterize sustained compute throughput under repeatable CPU-only test conditions rather than single-run best-case behavior.
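The replicate protocol reduces to a median over five per-launch throughput figures, which can be stated as a one-step sketch (the run values in the test are arbitrary illustrations, not benchmark data):

```python
from statistics import median

def reported_throughput(run_results):
    """Median throughput across five independent process launches,
    per the replicate protocol described above."""
    if len(run_results) != 5:
        raise ValueError("protocol requires exactly five runs")
    return median(run_results)
```

Using the median rather than the maximum is what makes the reported figures characterize sustained behavior instead of single-run best cases.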

A.6  Out-of-the-Box AltiCoreSWP Realization

The AltiCoreSWP results summarized in this white paper were produced using the standard AltiCoreSWP-supplied command-line binaries for training and inference.

No customer-specific development work, model-serving integration work, or benchmark-specific application engineering was required to obtain the reported AltiCoreSWP results. The supplied command-line tools are heavily optimized, multithreaded executables designed to utilize available parallel compute resources on supported platforms out of the box.

A.7  Summary of Reported Benchmark Results

Under the demonstrated benchmark conditions:

  • On the workstation-class platform, AltiCoreSWP sustained approximately 225 to 361 million inferences per second, depending on dataset, and was typically approximately 13× faster than the fastest neural-network CPU implementation tested, with observed speedups ranging from approximately 6.7× to 21×.
  • On the server-class platform, AltiCoreSWP sustained approximately 472 to 575 million inferences per second, depending on dataset, and was typically approximately 17× faster than the fastest neural-network CPU implementation tested, with observed speedups ranging from approximately 9.1× to 27.6×.
  • The peak observed speedup reported in the benchmark was 27.6×.
A.8  Interpretation Boundary

The benchmark establishes demonstrated sustained CPU inference throughput under the stated conditions. It does not characterize: end-to-end service latency, network transport, storage or database behavior, feature-pipeline overhead, application orchestration effects, or full deployment-level SLA performance.

Detailed methodology, hardware configuration, baseline definitions, dataset-level results, and measurement procedures are documented in the AltiCoreSWP Benchmark Report v6.0. Performance statements in this white paper summarize the demonstrated results reported there.