Technical White Paper
AltiCoreMCU
Embedded AI Runtime for Resource-Constrained Devices
Document Version: 1.0
Date: February 2026
Classification: Public Distribution
Issued by: EvoChip.ai
Section 01
Executive Summary

AltiCoreMCU is the MCU deployment tier of the AltiCore product family, providing hand-designed, validated inference runtime templates for supported MCU-class targets. Models are trained within the AltiCore mathematical framework, and the training process produces model parameter values that are compiled into a statically allocated parameter block matched to the selected runtime template. No new inference code is synthesized during deployment.

AltiCoreMCU is designed for resource-constrained processors where dynamic memory allocation, variable execution timing, and dependence on specialized accelerators are unacceptable. It is not a conventional neural-network runtime and does not rely on tensor-centric floating-point execution. Instead, inference is executed through logic-dominant operator chains implemented primarily with hardware-native bitwise operations, with bounded integer arithmetic where required.

For supported targets, AltiCoreMCU aligns to native word widths across 8-bit, 16-bit, 32-bit, 64-bit, and custom architectures, without requiring cloud connectivity, an external neural processing unit (NPU), or other dedicated AI accelerators for local inference. On compatible platforms, the same framework can also support on-device training or model-update flows where memory and compute headroom permit.

Current collateral cites example benchmark configurations with model-state / parameter RAM as low as 521 bytes and approximately 9,000 inferences per second at 16 MHz. These figures are configuration-specific and should be interpreted within the benchmark conditions and memory definitions stated in this paper.

Key Benefits
  • Zero dynamic memory allocation with statically bounded runtime memory
  • Deterministic execution profile suitable for real-time embedded integration
  • Validated runtime templates for standard embedded toolchains
  • Native word-width targeting across 8-bit, 16-bit, 32-bit, 64-bit, and custom architectures
  • No NPU, accelerator, or cloud dependency for local inference
  • Optional on-device training where memory and compute capacity permit

Section 02
Product Boundary and Integration Scope

AltiCoreMCU is delivered as a set of hand-designed, validated inference runtime templates for supported model shapes and target platforms. Model training does not synthesize new executable inference code. Instead, the training process produces model parameter values, which are compiled into a statically allocated parameter block matched to the selected runtime template.

The product boundary therefore consists of:

  1. the validated inference runtime template
  2. the static model parameter block consumed by that runtime
  3. the inference function interface exposed to the surrounding firmware

AltiCoreMCU makes no assumptions about the surrounding firmware architecture beyond standard embedded integration. It may be used in bare-metal systems or under an RTOS. System-level responsibilities including sensor acquisition, interrupt handling, peripheral I/O, DMA behavior, buffering strategy, watchdog policy, task scheduling, and end-to-end application latency remain outside the AltiCoreMCU runtime boundary and are the responsibility of the integrator.
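As an illustrative sketch of how this boundary might surface in firmware, the fragment below shows the three elements side by side. The names `alticore_params` and `alticore_infer`, the type signatures, and the stub body are assumptions for illustration only, not the actual product API; the real implementation body is supplied by the validated runtime template.

```c
#include <stdint.h>
#include <stddef.h>

/* (2) Static model parameter block: exported by the training flow and
 * compiled into the firmware image. Name, size, and layout are
 * illustrative only. */
static const uint8_t alticore_params[521] = { 0x01 };

/* (3) Inference function interface exposed to the surrounding firmware.
 * (1) The validated runtime template supplies the real body; this stub
 * is a placeholder so the sketch compiles. */
int32_t alticore_infer(const uint8_t *params, const uint8_t *input, size_t len)
{
    (void)params; (void)input; (void)len;
    return 0; /* placeholder result */
}
```

Everything outside this interface (acquisition, buffering, scheduling) remains integrator-owned application code.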


Section 03
Runtime Execution, Timing, and Memory Contract

Unless otherwise stated, this section describes baseline inference operation using a validated runtime template and its associated static parameter block. Optional on-device training or update flows, where supported, should be characterized separately.

3.1  Static Memory Guarantee

AltiCoreMCU inference uses zero dynamic memory allocation. The inference runtime does not require malloc(), free(), or heap allocation during inference. Trained model parameter values are compiled into static storage, and the runtime operates by referencing that statically allocated parameter block during execution.

For a given runtime template, model shape, and target build configuration, the required parameter-state memory can be established prior to deployment. This eliminates inference-time heap fragmentation risk and removes heap-exhaustion failure modes from the AltiCoreMCU inference path itself.

Because deployment varies the parameter data rather than the executable inference logic, the deployed code path remains within a pre-validated template set. Per-model variation is therefore confined to the statically compiled parameter block rather than model-specific generated source code.
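A minimal sketch of the static-memory pattern described above, with an assumed block name and the 521-byte example size from this paper used purely for illustration:

```c
#include <stdint.h>

/* Illustrative only: a trained parameter block compiled as const data so
 * it can live in flash/ROM, with its size fixed at build time. No
 * malloc()/free() appears anywhere on the inference path. */
#define MODEL_PARAM_BYTES 521u   /* known before deployment */

static const uint8_t model_params[MODEL_PARAM_BYTES] = {
    /* values exported by the AltiCore training flow */
    0
};

/* Compile-time guard: fail the build if the block outgrows its budget. */
_Static_assert(sizeof(model_params) == MODEL_PARAM_BYTES,
               "parameter block size must match the deployment budget");
```

Because the size is a compile-time constant, the required parameter-state memory can be read directly from the map file before deployment.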

3.2  Deterministic Timing

For a fixed runtime template, model shape, parameter block, compiler configuration, and target platform, AltiCoreMCU executes according to a static inference schedule. This supports repeatable execution-time characterization at the inference-function boundary and enables bounded integration into time-constrained embedded systems.

End-to-end real-time behavior remains dependent on platform-specific factors such as clocking, memory placement, wait states, cache behavior where applicable, and interrupt or preemption policy. Final scheduling remains a system-integration responsibility.
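Execution-time characterization at the inference-function boundary can be sketched as follows. This is a generic harness under stated assumptions: `read_cycles()` is a stand-in for a platform cycle counter or hardware timer (here backed by the C standard `clock()` so the sketch runs anywhere), and `alticore_infer_stub` is a hypothetical placeholder for the template's entry point.

```c
#include <stdint.h>
#include <time.h>

/* Stand-in for a platform cycle counter; a real target would read a
 * hardware timer or cycle-count register instead. */
static uint32_t read_cycles(void) { return (uint32_t)clock(); }

/* Hypothetical inference entry point; the real one comes from the
 * validated runtime template. */
static int32_t alticore_infer_stub(void) { return 0; }

/* Run n inferences and record the worst observed duration measured at
 * the inference-function boundary. */
static uint32_t worst_case_ticks(unsigned n)
{
    uint32_t worst = 0;
    for (unsigned i = 0; i < n; ++i) {
        uint32_t t0 = read_cycles();
        (void)alticore_infer_stub();
        uint32_t dt = read_cycles() - t0;
        if (dt > worst) worst = dt;
    }
    return worst;
}
```

With a fixed template, parameter block, and build configuration, repeated runs of such a harness should converge on a stable worst-case figure, consistent with the static inference schedule described above.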

3.3  No Cloud or NPU Dependency

AltiCoreMCU inference executes entirely on the host MCU or CPU. It requires no cloud connectivity, no off-device inference service, and no external NPU or AI coprocessor.

3.4  Validation Boundary

AltiCoreMCU deployment does not introduce newly synthesized inference code into the executable path. The deployed logic is selected from hand-designed, previously validated runtime templates; only the statically compiled model parameter block changes between trained models. This distinction is central to regression control, qualification, and controlled deployment in embedded environments.


Section 04
Core Technical Architecture
4.1  Logic-Dominant Computation

AltiCoreMCU implements inference through logic-dominant operator chains rather than conventional arithmetic-heavy neural-network execution. Runtime behavior relies primarily on hardware-native bitwise and discrete logic operations, with bounded integer arithmetic where required. This reduces arithmetic intensity and aligns execution with MCU-class processors operating under tight memory, timing, and power constraints.
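To make "logic-dominant" concrete, the fragment below shows a generic example of replacing multiply-accumulate with bitwise logic: a binarized dot product computed with XNOR and popcount. This is a well-known general technique offered only as an illustration of the style of computation; it is not the AltiCore formulation, which this paper does not disclose.

```c
#include <stdint.h>

/* Generic logic-dominant operator: a 32-element binarized dot product.
 * Inputs and weights are packed one bit per element into a 32-bit word.
 * Uses the GCC/Clang __builtin_popcount intrinsic. */
static int32_t binary_dot32(uint32_t input_bits, uint32_t weight_bits)
{
    uint32_t agree = ~(input_bits ^ weight_bits);   /* XNOR: matching bits */
    int32_t matches = (int32_t)__builtin_popcount(agree);
    /* Map matches in [0, 32] to a signed score in [-32, +32]. */
    return 2 * matches - 32;
}
```

One bitwise XNOR plus a population count stands in for 32 multiply-accumulate steps, which is why this style of operator maps well onto MCU-class processors without multipliers or accelerators.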

4.2  Template-Based Runtime Realization

AltiCoreMCU uses hand-designed and validated runtime templates for supported model shapes and target platforms. Training does not generate new inference code. Instead, it produces the parameter values required by the selected template.

At build time, the validated runtime template is combined with the trained static parameter block. During inference, the runtime reads that block to execute the model. This preserves a fixed, validated executable implementation while allowing deployed models to vary through parameterization alone.

4.3  Native Word-Width and Platform Matching

The underlying AltiCore mathematical framework is not bound to a single tensor datatype or fixed accelerator format. AltiCoreMCU can therefore be realized across supported 8-bit, 16-bit, 32-bit, 64-bit, and custom word-width targets using templates aligned to the native register and integration constraints of the host platform.
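One conventional way to express native word-width alignment in portable C is sketched below. The type name `ac_word_t` and the pointer-width-based selection are assumptions for illustration; a real template would fix the working word per validated platform rather than deriving it.

```c
#include <stdint.h>

/* Illustrative only: select the runtime's working word to match the
 * native register width of the target. */
#if UINTPTR_MAX >= 0xFFFFFFFFFFFFFFFFu
typedef uint64_t ac_word_t;   /* 64-bit targets */
#elif UINTPTR_MAX >= 0xFFFFFFFFu
typedef uint32_t ac_word_t;   /* 32-bit targets */
#elif UINTPTR_MAX >= 0xFFFFu
typedef uint16_t ac_word_t;   /* 16-bit targets */
#else
typedef uint8_t  ac_word_t;   /* 8-bit targets */
#endif

enum { AC_WORD_BITS = (int)(sizeof(ac_word_t) * 8) };
```

Keeping operator chains expressed in the native word type lets the same logic-dominant operators run one register wide on each supported platform.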

4.4  Optional Local Training

Where supported, and where memory and compute headroom permit, AltiCoreMCU can support on-device training or local model updates. These capabilities are configuration-dependent and should be specified separately from the baseline inference runtime in terms of RAM consumption, persistent state, execution time, and deployment constraints.


Section 05
Deployment Workflow and Integration Model
5.1  Data Ingestion and Model Training

A labeled training dataset is provided to the AltiCore training framework through CSV or API input. Training produces the parameter values associated with the target model shape within the AltiCore mathematical framework.

5.2  Runtime Template Selection

For the intended deployment target, a hand-designed and validated inference runtime template is selected based on the supported model shape and platform characteristics.

5.3  Static Parameter Block Integration

The trained model parameter values are exported and compiled as a statically allocated parameter block matched to the selected runtime template. No new inference source code is synthesized during this step.

5.4  Firmware Integration

The selected runtime template and its static parameter block are integrated into the target firmware using the standard embedded build flow and IDE toolchains. The runtime is invoked through a stable inference function interface, while application-specific concerns such as input acquisition, buffering, scheduling, and output handling remain outside the AltiCoreMCU runtime boundary.
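A bare-metal integration skeleton for the workflow above might look like the following. All names here (`acquire_sample`, `alticore_infer`, `model_params`) are assumptions for illustration, with stub bodies so the sketch compiles; the real inference entry point comes from the selected runtime template, and acquisition and output handling remain application code.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Static parameter block produced by steps 5.1-5.3 (illustrative). */
static const uint8_t model_params[521] = { 0 };

/* Application-owned input acquisition (stubbed here). */
static bool acquire_sample(uint8_t *buf, size_t len)
{ for (size_t i = 0; i < len; ++i) buf[i] = 0; return true; }

/* Stand-in for the runtime template's inference entry point. */
static int32_t alticore_infer(const uint8_t *p, const uint8_t *in, size_t n)
{ (void)p; (void)in; (void)n; return 0; }

/* One pass of the application loop: acquire, infer, return the result
 * (or -1 if acquisition failed). */
static int32_t app_step(void)
{
    uint8_t sample[16];
    if (!acquire_sample(sample, sizeof sample))
        return -1;
    return alticore_infer(model_params, sample, sizeof sample);
}
```

In a bare-metal build, `app_step()` would be called from the main loop or a timer-driven scheduler; under an RTOS it would typically run in its own task.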


Section 06
Demonstrated Implementations and Measured Performance
  • Min. parameter RAM: 521 B
  • Inferences/sec @ 16 MHz: ~9,000
  • Dynamic allocation: zero
  • CPU-only operation: no NPU

In benchmark example configurations, AltiCoreMCU has demonstrated model parameter / model-state RAM as low as 521 bytes and observed local inference throughput of approximately 9,000 inferences per second at 16 MHz. These measured results are consistent with the AltiCoreMCU execution profile of zero dynamic allocation, deterministic execution, and CPU-only operation without cloud connectivity or external NPUs.

These figures are benchmark results for specific model shapes, parameter blocks, runtime templates, and target-platform configurations. They are not universal guarantees for every AltiCoreMCU deployment. The 521-byte figure refers specifically to static model parameter / model-state memory, not total application memory consumption, and observed throughput will vary with the selected template, target MCU architecture, compiler settings, memory placement, clock frequency, and surrounding firmware integration conditions.
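As a rough consistency check on these figures (an approximation that assumes the quoted clock rate and throughput apply to the same benchmark configuration), the implied per-inference budget is:

```
16,000,000 cycles/s ÷ 9,000 inferences/s ≈ 1,778 cycles per inference
```

A budget on the order of a few thousand cycles per inference is plausible only for a compact, logic-dominant execution profile, which is consistent with the architecture described in Section 04.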

Taken together, these benchmark examples establish that AltiCoreMCU can deliver useful local inference within an extremely small static memory footprint, making it practical for MCU-class deployments where memory limits, execution predictability, and integration simplicity are primary design constraints.


Section 07
Ecosystem Integration

AltiCoreMCU is one deployment tier within the broader AltiCoreAI product family. Alongside AltiCoreSWP and AltiCoreHDL, it is built on the same underlying AltiCore mathematical framework, allowing model development and training to proceed within a common technical foundation across software, MCU, and hardware-integrated targets.

Within that ecosystem, each tier delivers the framework in a target-specific form: AltiCoreSWP for general-purpose software execution, AltiCoreMCU for constrained MCU-class deployment, and AltiCoreHDL for FPGA and ASIC implementation. For AltiCoreMCU specifically, deployment is realized through validated runtime templates bound to statically compiled model parameter blocks, preserving a controlled and reviewable executable path for embedded integration.

This shared foundation provides a coherent multi-target deployment pathway. A model developed within AltiCore tooling can, where a corresponding supported deployment realization exists, be carried into the delivery tier that best matches the system constraint profile: general-purpose software for development and evaluation, MCU deployment for resource-constrained embedded operation, and HDL implementation for deterministic hardware-integrated execution at larger scale.


Section 08
Target Applications

AltiCoreMCU is engineered for deployments prioritizing deterministic execution, static memory behavior, and low-overhead local inference, including:

Legacy platform extension and BOM preservation

Where existing MCU platforms need added local intelligence without processor replacement or Bill of Materials (BOM) change, including legacy 8-bit devices and modern embedded processors alike.

Always-on monitoring (“sentinel”)

Low-overhead local inference used for continuous detection and wake-on-event behavior, allowing a higher-power subsystem to remain idle until intervention is required.
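The sentinel pattern described above can be sketched as follows. All names (`sentinel_step`, `alticore_infer`, `assert_wake_line`, `WAKE_THRESHOLD`) are assumptions for illustration with stub bodies; on real hardware the wake line would be a GPIO or interrupt to the higher-power subsystem.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define WAKE_THRESHOLD 1

/* Stand-in for the runtime template's inference entry point: returns a
 * nonzero score when the first sample byte is nonzero (stub logic). */
static int32_t alticore_infer(const uint8_t *in, size_t n)
{ return (n > 0 && in[0] != 0) ? 1 : 0; }

/* Stand-in for asserting a GPIO wake line to the main subsystem. */
static bool wake_requested = false;
static void assert_wake_line(void) { wake_requested = true; }

/* One pass of the sentinel loop: classify the latest sample and wake the
 * higher-power subsystem only when the score crosses the threshold. */
static void sentinel_step(const uint8_t *sample, size_t n)
{
    if (alticore_infer(sample, n) >= WAKE_THRESHOLD)
        assert_wake_line();
}
```

Because inference cost and timing are statically bounded, the sentinel loop's duty cycle can be fixed at design time, which is what lets the rest of the system stay idle until intervention is required.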

Safety- and compliance-sensitive embedded systems

Where predictable execution timing, zero dynamic allocation, and repeatable runtime behavior support integration into real-time and assurance-sensitive products.

Constrained edge inference

Where latency, connectivity, cost, or operational simplicity favor local inference over cloud-dependent execution, especially for compact-model workloads on highly resource-constrained devices.


Section 09
Conclusion

AltiCoreMCU is a deterministic embedded inference runtime for MCU-class devices built on the AltiCore mathematical framework. In the deployment model described in this paper, trained models vary through statically compiled parameter blocks while inference executes through validated runtime templates, preserving a controlled executable boundary.

The core technical differentiators are local CPU execution, zero dynamic memory allocation, no cloud or NPU dependency, and a logic-dominant compute profile aligned to constrained processors across a wide range of MCU word sizes.

Current MCU collateral cites benchmark example configurations with parameter RAM as low as 521 bytes and observed throughput of approximately 9,000 inferences per second at 16 MHz. These figures are configuration-specific, but they support the core conclusion: AltiCoreMCU provides a practical path for deploying compact-model intelligence in embedded systems where memory bounds, predictable execution, and BOM stability are primary requirements.