The format of ChampSim
Warning
If you don't want to waste time like I did, simply read instruction.h. It's all there.
The adventure
ChampSim is a trace-based simulator, and despite the simple format of its binary traces it is surprisingly dependent on x86. One of the main maintainers (or the main one, I don't really know) has been pushing for a new trace format, but not much progress has been made on the issue.
This is an attempt to describe the current trace format and some of the considerations you have to take into account, especially if you are trying to trace an ISA different from x86.
constexpr std::size_t NUM_INSTR_DESTINATIONS = 2;
constexpr std::size_t NUM_INSTR_SOURCES = 4;
struct input_instr {
  // instruction pointer or PC (Program Counter)
  unsigned long long ip;

  // branch info
  unsigned char is_branch;
  unsigned char branch_taken;

  unsigned char destination_registers[NUM_INSTR_DESTINATIONS]; // output registers
  unsigned char source_registers[NUM_INSTR_SOURCES];           // input registers

  unsigned long long destination_memory[NUM_INSTR_DESTINATIONS]; // output memory
  unsigned long long source_memory[NUM_INSTR_SOURCES];           // input memory
};
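Because a trace is just this struct written back to back, the byte offsets matter more than the field names. A minimal sketch that pins down the layout and appends one record to a file. It assumes an 8-byte, 8-byte-aligned unsigned long long (true on x86-64 Linux), and make_record / write_record are hypothetical helper names, not ChampSim functions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

constexpr std::size_t NUM_INSTR_DESTINATIONS = 2;
constexpr std::size_t NUM_INSTR_SOURCES = 4;

struct input_instr {
  unsigned long long ip;
  unsigned char is_branch;
  unsigned char branch_taken;
  unsigned char destination_registers[NUM_INSTR_DESTINATIONS];
  unsigned char source_registers[NUM_INSTR_SOURCES];
  unsigned long long destination_memory[NUM_INSTR_DESTINATIONS];
  unsigned long long source_memory[NUM_INSTR_SOURCES];
};

// Assumption: 8-byte unsigned long long, so 8 + 1 + 1 + 2 + 4 + 16 + 32 = 64 bytes
// with no padding. These checks fail at compile time on an ABI where that is false.
static_assert(sizeof(input_instr) == 64, "unexpected record size");
static_assert(offsetof(input_instr, is_branch) == 8, "unexpected offset");
static_assert(offsetof(input_instr, branch_taken) == 9, "unexpected offset");
static_assert(offsetof(input_instr, destination_registers) == 10, "unexpected offset");
static_assert(offsetof(input_instr, source_registers) == 12, "unexpected offset");
static_assert(offsetof(input_instr, destination_memory) == 16, "unexpected offset");
static_assert(offsetof(input_instr, source_memory) == 32, "unexpected offset");

// Hypothetical helper: a record with every field zeroed except the PC.
// A zero register or memory slot means "unused".
input_instr make_record(unsigned long long ip) {
  input_instr r;
  std::memset(&r, 0, sizeof r);
  r.ip = ip;
  return r;
}

// Hypothetical helper: append one raw record to an already opened binary stream.
bool write_record(std::FILE* out, const input_instr& r) {
  assert(out != nullptr);
  return std::fwrite(&r, sizeof r, 1, out) == 1;
}
```

Writing the struct with fwrite only produces the bytes shown in the dumps below on a little-endian host.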
I was filling in all the fields and everything seemed to work, until I noticed my branches were not being detected as taken despite my marking them as such.
My branch instructions looked something like this:
d0 06 01 00 00 00 00 00 ┊ 01 01 00 00 1c 0f 00 00
The first 8 bytes are the program counter, and the next two are the is_branch and branch_taken flags.
After a lot of struggle and looking at the binary of some example traces, I saw that the instructions that really are detected as taken looked like this:
────────────────────────────────┐
                                ↓
d0 06 01 00 00 00 00 00 ┊ 01 01 1a 00 1c 0f 00 00
They differed from mine in the marked byte, but why? All of those instructions had an extra destination register; with hindsight it's pretty obvious... Why were they using register 26 (0x1a) so much on branch instructions? Because in the x86 register numbering ChampSim expects, it is the instruction pointer (ChampSim calls it REG_INSTRUCTION_POINTER; the flags register is 25) 🤦.
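Decoding the first 16 bytes of the record by hand shows which field the marked byte lands in: it is destination_registers[0], and it holds 0x1a, i.e. register 26. A small sketch, assuming the trace stores the PC in little-endian order as in the dumps above; record_header and decode_header are hypothetical names used only for illustration:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// The first 16 bytes of a trace record: PC, branch flags, register lists.
struct record_header {
  std::uint64_t ip;
  std::uint8_t is_branch;
  std::uint8_t branch_taken;
  std::uint8_t destination_registers[2];
  std::uint8_t source_registers[4];
};

// Hypothetical helper: decode the header byte by byte, so the result does not
// depend on the endianness or struct padding of the machine running it.
record_header decode_header(const std::array<std::uint8_t, 16>& raw) {
  record_header h{};
  // Bytes 0..7: PC, least significant byte first.
  for (int i = 7; i >= 0; --i) {
    h.ip = (h.ip << 8) | raw[static_cast<std::size_t>(i)];
  }
  h.is_branch = raw[8];
  h.branch_taken = raw[9];
  h.destination_registers[0] = raw[10]; // the byte marked in the dump
  h.destination_registers[1] = raw[11];
  for (std::size_t i = 0; i < 4; ++i) {
    h.source_registers[i] = raw[12 + i];
  }
  return h;
}
```

Fed the second dump, this gives ip = 0x106d0, both flags set, destination registers {26, 0} and source registers {28, 15, 0, 0}.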
Looking more closely at instruction.h, this is more apparent: the conversion from input_instr to ooo_model_instr has some logic to detect the kind of branch being executed, but what pains me the most is that it ignores the branch_taken value given by the trace in most cases. (I understand this is possibly a limitation of PIN or related tools, but it's still a pain.)
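A simplified sketch of that classification, paraphrased rather than copied from instruction.h, so check it against your ChampSim version. The register numbers 6, 25 and 26 are assumed to match ChampSim's REG_STACK_POINTER, REG_FLAGS and REG_INSTRUCTION_POINTER; branch_kind, branch_info, has_reg and classify are hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

constexpr std::size_t NUM_INSTR_DESTINATIONS = 2;
constexpr std::size_t NUM_INSTR_SOURCES = 4;

struct input_instr {
  unsigned long long ip;
  unsigned char is_branch;
  unsigned char branch_taken;
  unsigned char destination_registers[NUM_INSTR_DESTINATIONS];
  unsigned char source_registers[NUM_INSTR_SOURCES];
  unsigned long long destination_memory[NUM_INSTR_DESTINATIONS];
  unsigned long long source_memory[NUM_INSTR_SOURCES];
};

// Assumed values of the x86 registers ChampSim treats specially.
constexpr unsigned char REG_STACK_POINTER = 6;
constexpr unsigned char REG_FLAGS = 25;
constexpr unsigned char REG_INSTRUCTION_POINTER = 26;

enum class branch_kind { not_branch, direct_jump, indirect, conditional,
                         direct_call, indirect_call, ret, other };

struct branch_info {
  branch_kind kind;
  bool taken;
};

template <std::size_t N>
bool has_reg(const unsigned char (&regs)[N], unsigned char r) {
  return std::find(std::begin(regs), std::end(regs), r) != std::end(regs);
}

branch_info classify(const input_instr& in) {
  const bool writes_sp = has_reg(in.destination_registers, REG_STACK_POINTER);
  const bool writes_ip = has_reg(in.destination_registers, REG_INSTRUCTION_POINTER);
  const bool reads_sp = has_reg(in.source_registers, REG_STACK_POINTER);
  const bool reads_flags = has_reg(in.source_registers, REG_FLAGS);
  const bool reads_ip = has_reg(in.source_registers, REG_INSTRUCTION_POINTER);
  // Assumption: register 0 means "empty slot" and does not count as a read.
  const bool reads_other = std::any_of(
      std::begin(in.source_registers), std::end(in.source_registers), [](unsigned char r) {
        return r != 0 && r != REG_STACK_POINTER && r != REG_FLAGS && r != REG_INSTRUCTION_POINTER;
      });

  // Without a write to the instruction pointer the is_branch and branch_taken
  // bytes of the trace are ignored entirely.
  if (!writes_ip) return {branch_kind::not_branch, false};

  const bool trace_taken = in.branch_taken != 0;
  if (!reads_sp && !reads_flags && !reads_other) return {branch_kind::direct_jump, true};
  if (!reads_sp && !reads_flags && reads_other) return {branch_kind::indirect, true};
  if (!reads_sp && reads_ip && !writes_sp && reads_flags && !reads_other)
    return {branch_kind::conditional, trace_taken}; // trace value is honoured here
  if (reads_sp && reads_ip && writes_sp && !reads_flags && !reads_other)
    return {branch_kind::direct_call, true};
  if (reads_sp && reads_ip && writes_sp && !reads_flags && reads_other)
    return {branch_kind::indirect_call, true};
  if (reads_sp && !reads_ip && writes_sp) return {branch_kind::ret, true};
  return {branch_kind::other, trace_taken}; // and here
}
```

Under these assumptions, a record with is_branch and branch_taken set but no destination register 26 is not a branch at all, and the trace's branch_taken value only survives for conditional and uncategorised branches. For a non-x86 ISA this means mapping its PC, flags and stack pointer onto 26, 25 and 6 so the tracer's records fall into the intended category.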