AltiCoreSWP is the software deployment tier of the AltiCore product family, providing model training and inference on existing general-purpose compute systems. Built on the AltiCore mathematical framework, AltiCoreSWP is not a conventional neural-network runtime and does not rely primarily on tensor-centric floating-point execution. Instead, trained models are executed through logic-dominant operator chains implemented largely with hardware-native logical operations, with bounded arithmetic where required.
AltiCoreSWP is engineered for CPU-first deployment on standard compute infrastructure. It is intended to deliver high sustained inference throughput on supported systems without requiring an NPU, a cloud inference service, or hardware modification to the host platform. The same framework also supports model training on supported software targets.
In a CPU-only benchmark across seven public datasets and two AVX2-capable x86 platforms, AltiCoreSWP delivered a large and consistent throughput advantage over the fastest neural-network CPU implementation tested (TensorFlow Lite with XNNPACK). Under the stated benchmark conditions, observed speedups were typically approximately 13× on the workstation-class CPU (range approximately 6.7× to 21×) and typically approximately 17× on the server-class CPU (range approximately 9.1× to 27.6×). These are measured benchmark results under defined conditions, not universal guarantees for every workload, model shape, or deployment environment.
AltiCoreSWP is delivered as a software execution platform for supported general-purpose compute targets. The exact product boundary depends on the selected deployment mode.
AltiCoreSWP makes no architectural assumptions about the surrounding software system beyond standard integration requirements. It may be integrated into standalone applications, libraries, services, or batch-processing pipelines as supported by the target platform.
System-level responsibilities remain outside the AltiCoreSWP product boundary and are the responsibility of the integrator. These include application orchestration, data acquisition, feature engineering, storage, security, service topology, batching strategy, network transport, process lifecycle management, concurrency policy, observability, and end-to-end application latency.
AltiCoreSWP is characterized primarily by sustained inference throughput on supported software targets. Unless otherwise stated, performance statements in this paper refer to CPU-only execution under defined benchmark conditions. Training performance, end-to-end application latency, and alternative platform realizations should be characterized separately.
AltiCoreSWP executes trained models on the host CPU through standard software deployment workflows. CPU inference does not require hardware modification, an external NPU, or an off-device inference service.
In the demonstrated CPU benchmark, throughput is measured in inferences per second over the timed inference region only. Input preparation, model loading and initialization, output-buffer allocation, and warm-up are performed outside the timed region. Reported figures therefore represent sustained compute throughput under the stated test conditions, not end-to-end application throughput.
Observed performance depends on the specific trained model being executed, compiler and runtime configuration, batch size, thread settings, memory placement, host platform, operating-system scheduling, and surrounding application architecture. Because AltiCoreSWP runs on general-purpose compute systems, wall-clock latency is not cycle-constant, and final end-to-end performance remains a system-integration responsibility.
The demonstrated benchmark results summarized in this paper are based on AVX2-capable x86 CPU systems. Other supported targets, optimizations, or deployment modes should be specified separately and should not be inferred from the CPU benchmark data alone.
AltiCoreSWP executes trained models through logic-dominant operator chains rather than conventional tensor-centric neural-network execution. Runtime behavior relies primarily on hardware-native logical operations, with bounded arithmetic where required. This reduces arithmetic intensity relative to conventional neural-network CPU inference and aligns execution with standard general-purpose processors.
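The operator-chain internals are not detailed in this paper. As a purely conceptual sketch of why logic-dominant execution reduces arithmetic intensity, compare a conventional float multiply-accumulate step with a bitwise step over packed inputs; all names and values below are illustrative only and are not AltiCoreSWP internals:

```python
# Conceptual contrast only: an arithmetic-dominant step (float
# multiply-accumulate) versus a logic-dominant step (bitwise AND
# plus a population count over bit-packed features). This is NOT
# the AltiCoreSWP operator chain, whose details are not public.

def arithmetic_score(x, w):
    # Conventional NN-style step: one multiply and one add per feature.
    return sum(xi * wi for xi, wi in zip(x, w))

def logic_score(x_bits, mask_bits):
    # Logic-dominant step: a single hardware-native AND plus a
    # popcount replaces many multiply-adds.
    return bin(x_bits & mask_bits).count("1")

x = [1.0, 0.0, 1.0, 1.0]
w = [0.5, 0.5, 0.5, 0.5]
x_bits = 0b1011   # the same input, packed one bit per feature
mask = 0b1001     # a hypothetical learned bit mask

print(arithmetic_score(x, w))      # → 1.5
print(logic_score(x_bits, mask))   # → 2
```

The point of the sketch is structural: the logical path touches the data once with cheap integer operations, while the arithmetic path performs a multiply and an add per feature.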
At its current demonstrated stage, AltiCoreSWP shows its strongest value on compact, low-parameter models. In the seven-dataset CPU benchmark described in this paper, the selected AltiCoreSWP models were substantially smaller than the selected multilayer perceptron (MLP) baselines, using approximately 35× to 301× fewer parameters and approximately 40× to 343× fewer arithmetic operations per inference, depending on dataset.
These structural differences reduce the amount of work performed per inference and provide the technical basis for the throughput advantage observed in the benchmark. These figures are benchmark-specific and should not be interpreted as universal ratios for all model classes or deployment scenarios.
As a consequence of its training and model-selection process, the AltiCoreSWP workflow naturally identifies and retains the input variables that contribute most to model performance, without requiring a separate feature-selection preprocessing stage. In the benchmark suite, the selected AltiCoreSWP models used no more input variables than the compared MLP models and, in most datasets, used fewer.
Operationally, reduced input requirements can lower upstream data dependencies and simplify deployment in environments where signal availability, data-pipeline complexity, or integration cost are limiting factors.
AltiCoreSWP supports both model training and inference within the same underlying framework. A model can be developed, trained, evaluated, and deployed within a common AltiCore workflow, while the final software realization is matched to the intended target environment and integration path.
A labeled dataset is ingested through AltiCore tooling using supported software workflows, including file-based and API-mediated inputs where applicable. Model training is performed within the AltiCore framework on supported compute platforms.
For AltiCoreSWP, model shape is a user-selected hyperparameter. Specific shape values are chosen according to the desired balance of model capability, complexity, resource consumption, and deployment objectives, in a manner analogous to shape-variant model selection in other algorithm families. Candidate model shapes may therefore be evaluated and compared within the AltiCore workflow before selecting the deployment model.
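Shape-variant selection of this kind can be sketched generically. The AltiCore tooling interface is not shown in this paper, so `train_and_score` below is a hypothetical stand-in that returns a validation accuracy and a parameter count for a candidate shape; the selection rule (smallest model within 1% of the best accuracy) is likewise an illustrative policy, not a product default:

```python
# Hedged sketch of shape-variant model selection. train_and_score()
# is a hypothetical placeholder: in practice it would invoke the
# training workflow for the given shape and return measured metrics.

def train_and_score(shape):
    # Placeholder metrics for illustration only.
    accuracy = {8: 0.91, 16: 0.93, 32: 0.935}[shape]
    params = shape * 100
    return accuracy, params

candidates = [8, 16, 32]
results = {s: train_and_score(s) for s in candidates}

# Illustrative policy: pick the smallest model within 1% accuracy
# of the best candidate, trading capability against complexity.
best_acc = max(acc for acc, _ in results.values())
chosen = min((s for s, (acc, _) in results.items()
              if acc >= best_acc - 0.01),
             key=lambda s: results[s][1])
print(chosen)  # → 16
```

The design point is that shape is evaluated empirically against deployment objectives rather than fixed by the framework.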
AltiCoreSWP provides command-line tools for both training and inference. These tools allow users to train models, execute inference, and evaluate performance without requiring custom application development, making them suitable for rapid evaluation, benchmarking, and development-free deployment workflows.
On supported platforms, the supplied command-line binaries are implemented as heavily optimized, multithreaded executables designed to utilize available parallel compute resources effectively out of the box.
In addition to command-line execution, AltiCoreSWP integrates into standard software toolchains and may be used through C/C++ and other workflows, depending on the deployment path. The selected model is realized as an AltiCoreSWP software deployment artifact matched to the intended runtime environment.
Application-specific responsibilities — including feature acquisition, batching policy, service orchestration, security, observability, and end-to-end latency management — remain outside the AltiCoreSWP runtime boundary and are the responsibility of the integrator.
The resulting runtime can be deployed on supported workstation- and server-class systems using conventional software deployment methods. Horizontal scale-out across processes, services, or nodes is handled by the surrounding application and infrastructure architecture rather than by the AltiCoreSWP runtime itself.
AltiCoreSWP's logic-dominant execution model translates into measured, order-of-magnitude throughput advantages on standard CPU infrastructure. In the AltiCoreSWP Benchmark Report v6.0, conducted jointly by EvoChip and SidePath, AltiCoreSWP was evaluated against multiple widely used neural-network CPU inference implementations across seven public datasets on two AVX2-capable x86 systems.
Across the demonstrated benchmark scope, AltiCoreSWP delivered higher throughput than the fastest neural-network CPU implementation tested in every dataset included in the report.
| Platform | CPU | Observed AltiCoreSWP Throughput | Relative Advantage |
|---|---|---|---|
| Workstation-class | Intel Core i7-13700H | ~225–361 M inf/sec | typically ~13×, range ~6.7×–21× |
| Server-class | Intel Xeon Gold 5416S | ~472–575 M inf/sec | typically ~17×, range ~9.1×–27.6× |
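As rough, order-of-magnitude arithmetic only, the baseline throughput implied by the table can be recovered by dividing the observed AltiCoreSWP throughput by the typical relative advantage. The figures below mix range endpoints with typical multipliers, so they are illustrative, not reported benchmark values:

```python
# Rough implied baseline (fastest NN CPU implementation) throughput,
# derived from the table's figures: observed / typical advantage.
# Illustrative arithmetic only, not reported benchmark data.
workstation_baseline = 225e6 / 13   # low end of workstation range
server_baseline = 472e6 / 17        # low end of server range

print(round(workstation_baseline / 1e6, 1))  # → 17.3 (M inf/sec)
print(round(server_baseline / 1e6, 1))       # → 27.8 (M inf/sec)
```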
The demonstrated benchmark results were produced using the standard AltiCoreSWP-supplied command-line binaries for training and inference. No customer-specific development work, model-serving integration work, or benchmark-specific application engineering was required to obtain the reported AltiCoreSWP results.
Detailed methodology, dataset-level results, baseline definitions, hardware configuration, and measurement procedures are documented in the AltiCoreSWP Benchmark Report v6.0. Performance claims in this white paper summarize the demonstrated results reported there.
AltiCoreSWP is one deployment tier within the broader AltiCore product family. Alongside AltiCoreMCU and AltiCoreHDL, it is built on the same underlying AltiCore mathematical framework, allowing model development and training to proceed within a common technical foundation across software, constrained embedded, and hardware-integrated targets.
Within that ecosystem, each tier delivers the framework in a target-specific form.
A model developed within AltiCore tooling can, where a corresponding supported deployment realization exists, be carried into the delivery tier that best matches the system constraint profile.
AltiCoreSWP is presently best matched to CPU-first, high-throughput structured decisioning workloads where inference cost, deployment simplicity, and predictable compute behavior are primary requirements, including:
- High-volume decisioning workloads where throughput directly affects capacity and cost per decision.
- Structured inference tasks where sustained CPU throughput and deployment simplicity matter more than accelerator-centric execution.
- Environments where organizations prefer inference to remain inside standard CPU-based infrastructure without mandatory dependency on external AI accelerators or cloud inference services.
- Systems in which inference is a meaningful operating expense and higher inferences-per-second directly improve server utilization and unit economics.
AltiCoreSWP is a CPU-first software runtime built on the AltiCore mathematical framework. By executing trained models through logic-dominant operator chains with reduced arithmetic intensity relative to conventional neural-network CPU inference, it provides a practical path to higher sustained throughput on existing general-purpose compute infrastructure.
In the demonstrated seven-dataset CPU benchmark summarized in this paper, AltiCoreSWP delivered a large and consistent throughput advantage over the fastest neural-network CPU implementation tested, with typical observed speedups of approximately 13× on the workstation-class platform and approximately 17× on the server-class platform, and a peak observed speedup of 27.6× under the stated benchmark conditions.
These results are configuration-specific, but they support the central conclusion: AltiCoreSWP provides a production-relevant software deployment pathway for structured workloads where CPU throughput, deployment simplicity, and unit economics are primary requirements.
This white paper summarizes performance results reported in the AltiCoreSWP Benchmark Report v6.0, a joint benchmark initiative conducted by EvoChip and SidePath. The benchmark measured maximum sustained CPU inference throughput, reported in inferences per second, across seven public datasets on two AVX2-capable x86 systems.
Throughout the benchmark, one input row corresponds to one inference. The benchmark objective was relative throughput comparison under CPU-only conditions. End-to-end application latency, GPU-accelerated neural-network performance, and full service-level deployment behavior were outside the benchmark scope.
The demonstrated benchmark was executed on two general-purpose CPU platforms: a workstation-class Intel Core i7-13700H system and a server-class Intel Xeon Gold 5416S system.
All evaluated methods used AVX2-capable CPU execution paths. Although the server platform supports AVX-512, AVX-512 was not used in the reported benchmark results.
For each dataset and hardware platform, AltiCoreSWP was compared against multiple CPU implementations of a neural-network baseline.
Within the demonstrated benchmark scope, the fastest neural-network CPU implementation tested was C++ TensorFlow Lite with XNNPACK.
Throughput was measured only over the timed inference region. The following activities were performed outside the timed region: input preparation, model loading and initialization, output-buffer allocation, and warm-up execution.
Throughput was computed as: Throughput (inferences/sec) = N / t, where N = number of rows processed and t = elapsed time in seconds. Because each input row represents one inference, rows per second and inferences per second are equivalent in this benchmark.
Each unique hardware / dataset / method configuration was executed as five independent process launches. Reported benchmark results use the median throughput across those five runs for each configuration.
This protocol was intended to characterize sustained compute throughput under repeatable CPU-only test conditions rather than single-run best-case behavior.
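The measurement protocol above can be sketched as follows. `run_inference` is a hypothetical stand-in for the timed inference call, and the five throughput numbers are illustrative, not benchmark data:

```python
import statistics
import time

def timed_inference_throughput(run_inference, rows, n_rows):
    # Setup work (model loading, buffer allocation, warm-up) happens
    # before this point and is excluded from the timed region, per
    # the benchmark protocol.
    start = time.perf_counter()
    run_inference(rows)                # timed inference region only
    elapsed = time.perf_counter() - start
    return n_rows / elapsed            # one input row == one inference

# Per the protocol, each hardware/dataset/method configuration is
# launched five times and the median throughput is reported.
# Illustrative numbers only:
five_run_throughputs = [2.21e8, 2.25e8, 2.19e8, 2.30e8, 2.24e8]
print(statistics.median(five_run_throughputs))  # → 224000000.0
```

Using the median of five independent process launches damps outliers from operating-system scheduling noise, which matches the report's stated goal of characterizing sustained rather than best-case throughput.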
The AltiCoreSWP results summarized in this white paper were produced using the standard AltiCoreSWP-supplied command-line binaries for training and inference.
No customer-specific development work, model-serving integration work, or benchmark-specific application engineering was required to obtain the reported AltiCoreSWP results. The supplied command-line tools are heavily optimized, multithreaded executables designed to utilize available parallel compute resources on supported platforms out of the box.
The benchmark establishes demonstrated sustained CPU inference throughput under the stated conditions. It does not characterize: end-to-end service latency, network transport, storage or database behavior, feature-pipeline overhead, application orchestration effects, or full deployment-level SLA performance.
Detailed methodology, hardware configuration, baseline definitions, dataset-level results, and measurement procedures are documented in the AltiCoreSWP Benchmark Report v6.0. Performance statements in this white paper summarize the demonstrated results reported there.