The heptane computer processor is an out-of-order CPU that packs its
RISC-like program instructions into 256-bit bundles. I'm currently
developing it in Verilog and have completed the back end to the point where
memory reads and ALU operations execute, and L1 cache insertions work. Now
I'm working on the control unit and the load-store queue to support jumps
and stores. There will also be compare and jump instructions.
The core will have 2-way SMT. A trace cache will "straighten" branches so
that, even if taken branches occur more often than once per 9 instructions,
the processor still operates at full speed: instructions in the trace cache
are stored in execution order rather than instruction-pointer (IP) order.
However, if loops shorter than 9 instructions dominate execution, full
performance will only be achievable with 2 threads running under SMT.
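To make the 9-instruction threshold concrete, here is a rough throughput model. It is not the actual fetch logic. It assumes (hypothetically) that each thread supplies at most one iteration of a tight loop per cycle from the trace cache, and that both threads share the fetch and execute widths. The 12-wide fetch and 9-wide execute limits come from the text; the function name `sustained_ipc` is illustrative only.

```python
FETCH_WIDTH = 12  # instructions fetched per cycle (from the text)
EXEC_WIDTH = 9    # general-purpose instructions executed per cycle (from the text)


def sustained_ipc(loop_len: int, threads: int = 1) -> int:
    """Upper bound on instructions per cycle for a tight loop.

    Hypothetical model: each thread supplies at most one loop iteration
    (loop_len instructions) per cycle from the trace cache, and all
    threads share the fetch and execute widths.
    """
    if loop_len < 1 or threads not in (1, 2):
        raise ValueError("loop_len must be >= 1 and threads must be 1 or 2")
    supplied = loop_len * threads
    return min(supplied, FETCH_WIDTH, EXEC_WIDTH)


for n in (4, 6, 9, 16):
    # loop length, IPC with one thread, IPC with two SMT threads
    print(n, sustained_ipc(n, threads=1), sustained_ipc(n, threads=2))
```

Under these assumptions a 6-instruction loop is capped at 6 instructions per cycle with one thread but reaches the 9-wide execute limit with two. Very short loops (4 instructions here) stay below full speed even with SMT.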
The processor can fetch 12 instructions per cycle and execute up to 9
general-purpose instructions per cycle. It supports most x86_64 addressing
modes, so that both assembly code written for that architecture and
binary-translated code can run with near-native performance. The assembler
translates complex (CISC) instructions into sequences of simpler ones,
overwriting some of the extra registers if needed.
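As an illustration of this kind of cracking, the sketch below splits a two-operand ALU instruction with a memory operand into a load, an ALU operation and, for a memory destination, a store, routed through a scratch register. The scratch register name `r31`, the step mnemonics and the tuple representation are hypothetical; the real heptane encoding and register names are not described here.

```python
from dataclasses import dataclass
from typing import Optional

SCRATCH = "r31"  # hypothetical name for one of the extra (scratch) registers


@dataclass(frozen=True)
class Mem:
    """x86-style memory operand: [base + index*scale + disp]."""
    base: str
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0


def crack(op, dst, src):
    """Split a two-operand CISC ALU instruction into RISC-like steps.

    Register-register forms pass through unchanged. A memory destination
    becomes load -> op -> store via the scratch register; a memory source
    becomes load -> op.
    """
    if isinstance(dst, Mem):
        return [("load", SCRATCH, dst),
                (op, SCRATCH, src),
                ("store", dst, SCRATCH)]
    if isinstance(src, Mem):
        return [("load", SCRATCH, src),
                (op, dst, SCRATCH)]
    return [(op, dst, src)]


# addl %eax, 8(%rbx,%rcx,4)  (AT&T syntax: add eax into memory)
for step in crack("add", Mem("rbx", "rcx", 4, 8), "eax"):
    print(step)
```

Overwriting a scratch register is what lets the read-modify-write form become three independent RISC-like steps without disturbing any architectural x86 register.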
In practice, this means that most programs written in compiled languages,
even those with inline assembly, will compile without changes. There will
also be a dynamic translator that can both emulate a whole x86/x64 system
and run user-mode application code.
The processor is expected to reach clock frequencies above 2 GHz if
implemented on a modern fab process with static CMOS logic, and above 3 GHz
if implemented with dynamic logic and custom design.
It is designed to deliver significantly better performance per clock than
(big name company) processors on general-purpose instructions. On floating
point, its peak throughput equals that of the consumer version of that
company's CPU, but its sustainable throughput is higher because it has four
128-bit FPUs (2 multiply and 2 add) instead of two 256-bit ones.
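The peak-versus-sustained claim can be checked with simple arithmetic. This sketch assumes the comparison CPU's two 256-bit units are likewise split into one multiplier and one adder (an assumption; the text does not say), that each unit accepts one operation per cycle, and that an operation wider than a unit is split across units. It counts double-precision operations completed per cycle for 256-bit, 128-bit and scalar code.

```python
def dp_ops_per_cycle(units_per_type: int, unit_bits: int, op_bits: int) -> int:
    """Double-precision lanes completed per cycle, summed over MUL and ADD.

    Assumes one operation per unit per cycle: a narrower operation leaves
    the rest of the unit idle, a wider one is split across several units.
    """
    lanes_per_type = units_per_type * min(op_bits, unit_bits) // 64
    return 2 * lanes_per_type  # one MUL group plus one ADD group


HEPTANE = dict(units_per_type=2, unit_bits=128)  # 2 MUL + 2 ADD, 128-bit (from the text)
OTHER = dict(units_per_type=1, unit_bits=256)    # ASSUMED 1 MUL + 1 ADD, 256-bit

for op_bits in (256, 128, 64):
    print(op_bits,
          dp_ops_per_cycle(op_bits=op_bits, **HEPTANE),
          dp_ops_per_cycle(op_bits=op_bits, **OTHER))
```

Under these assumptions both designs peak at 8 double-precision operations per cycle on 256-bit code, but on 128-bit or scalar code the four narrower units complete twice as many operations per cycle.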
The processor supports 3 simultaneous loads and 2 stores to/from the L1
cache. As long as loads and stores are aligned to 4 bytes and are at least
4 bytes wide (except extended precision), there is no penalty for
misalignment or for accesses that split a cache line. Accesses that split a
page do, however, incur a penalty.
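A minimal sketch of that rule as a predicate, for example for a simulator or test bench. The 64-byte cache line and 4 KiB page sizes are assumptions (the text does not state them). Treating 10-byte extended-precision accesses as outside the penalty-free case, and assuming that accesses outside it are penalized only when they split a line, are also assumptions.

```python
LINE_BYTES = 64     # ASSUMED L1 cache-line size
PAGE_BYTES = 4096   # ASSUMED page size
EXT_PRECISION = 10  # x87 80-bit extended precision, excluded by the rule


def crosses(addr: int, size: int, block: int) -> bool:
    """True if the access [addr, addr + size) spans two blocks of this size."""
    return addr // block != (addr + size - 1) // block


def has_penalty(addr: int, size: int) -> bool:
    """Predict whether an L1 load/store pays a misalignment or split penalty."""
    if crosses(addr, size, PAGE_BYTES):
        return True  # split-page accesses are penalized (from the text)
    if addr % 4 == 0 and size >= 4 and size != EXT_PRECISION:
        return False  # guaranteed penalty-free case (from the text)
    # ASSUMPTION: outside the guaranteed case, only line-splitting accesses pay.
    return crosses(addr, size, LINE_BYTES)
```

For example, an 8-byte load at address 60 splits a cache line but is 4-byte aligned, so it is penalty-free, while the same load at address 4092 splits a page and is penalized.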
In native mode only, a SIMD gather load can fetch 2 data items in a single
load slot, as long as both items fit wholly within a 64-byte range, are in
single- or double-precision format, and are aligned to 4 bytes.
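The pairing conditions can be written as a small check. The text does not say whether the "64-byte range" means any 64-byte window or an aligned 64-byte block; this sketch assumes any window, and the function name `can_pair` is hypothetical.

```python
SINGLE, DOUBLE = 4, 8  # element sizes in bytes (single/double precision)


def can_pair(addr_a: int, addr_b: int, elem_size: int,
             native_mode: bool = True) -> bool:
    """Can two gather elements share one load slot?

    Conditions from the text: native mode only, single or double format,
    each element 4-byte aligned, both elements wholly inside a 64-byte
    range. ASSUMPTION: the range is any 64-byte window, not an aligned block.
    """
    if not native_mode or elem_size not in (SINGLE, DOUBLE):
        return False
    if addr_a % 4 or addr_b % 4:
        return False
    lo = min(addr_a, addr_b)
    hi = max(addr_a, addr_b) + elem_size  # exclusive end of the later element
    return hi - lo <= 64
```

With doubles, elements at offsets 0 and 56 just fit (they span exactly 64 bytes), while offsets 0 and 60 would need two load slots.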
The assembler will be open source and syntax-compatible with GNU as. It will
support a subset of x86_64/x86 as well as extra instructions native to the
processor. As described above, it translates CISC instructions into
sequences of simpler RISC-like instructions, using the extra registers as
scratch registers. There will also be support for hardware-accelerated AES
encryption.
Regarding the x86/x64 translator, one might expect it to be slow, because
previous attempts to emulate x86 on a wide-issue (VLIW) processor were slow.
However, the 8-instruction-wide Transmeta processor was in-order and had
only 2 arithmetic-logic units and 2 load-store units. The heptane processor
will have 3 load units, 2 store units and 6 arithmetic-logic units, and will
feature out-of-order execution, so the two processors aren't comparable.