
The format of ChampSim traces

Warning

If you don't want to waste time like I did, simply read instruction.h. It's all there.

The adventure

ChampSim is a trace-based simulator, and despite the simple format of its binary traces, it's surprisingly dependent on x86. One of the main maintainers (or the main one, I don't really know) has been pushing for a new trace format, but not much progress has been made on the issue.

This is an attempt to describe the current trace format and some of the things you have to take into account, especially if you are tracing an ISA other than x86.

constexpr std::size_t NUM_INSTR_DESTINATIONS = 2;
constexpr std::size_t NUM_INSTR_SOURCES = 4;

struct input_instr {
  // instruction pointer or PC (Program Counter)
  unsigned long long ip;

  // branch info
  unsigned char is_branch;
  unsigned char branch_taken;

  unsigned char destination_registers[NUM_INSTR_DESTINATIONS]; // output registers
  unsigned char source_registers[NUM_INSTR_SOURCES];           // input registers

  unsigned long long destination_memory[NUM_INSTR_DESTINATIONS]; // output memory
  unsigned long long source_memory[NUM_INSTR_SOURCES];           // input memory
};
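A trace is essentially these records written back to back as raw bytes (the example traces are usually xz-compressed on top of that), so it helps to pin the layout down. The sketch below is my own, not part of ChampSim: it checks the field offsets with static_assert and writes one record with fwrite. It assumes an LP64 host such as x86-64 Linux, where unsigned long long is 8 bytes with 8-byte alignment, which makes the record exactly 64 bytes with no padding. write_record is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

constexpr std::size_t NUM_INSTR_DESTINATIONS = 2;
constexpr std::size_t NUM_INSTR_SOURCES = 4;

// Same fields, same order as ChampSim's input_instr.
struct input_instr {
  unsigned long long ip;
  unsigned char is_branch;
  unsigned char branch_taken;
  unsigned char destination_registers[NUM_INSTR_DESTINATIONS];
  unsigned char source_registers[NUM_INSTR_SOURCES];
  unsigned long long destination_memory[NUM_INSTR_DESTINATIONS];
  unsigned long long source_memory[NUM_INSTR_SOURCES];
};

// Layout checks; these hold on LP64 hosts (assumption: 8-byte, 8-aligned unsigned long long).
static_assert(sizeof(input_instr) == 64, "one record is 64 bytes");
static_assert(offsetof(input_instr, is_branch) == 8, "flags start right after the ip");
static_assert(offsetof(input_instr, branch_taken) == 9, "taken flag is byte 9");
static_assert(offsetof(input_instr, destination_registers) == 10, "dest regs at bytes 10-11");
static_assert(offsetof(input_instr, source_registers) == 12, "source regs at bytes 12-15");
static_assert(offsetof(input_instr, destination_memory) == 16, "dest memory at bytes 16-31");
static_assert(offsetof(input_instr, source_memory) == 32, "source memory at bytes 32-63");

// Hypothetical helper: dump one record exactly as it sits in memory
// (host byte order). Returns true if the whole record was written.
bool write_record(std::FILE* out, const input_instr& instr) {
  return std::fwrite(&instr, sizeof instr, 1, out) == 1;
}
```

So the 16 bytes in the dumps that follow are only the first quarter of a record: everything before destination_memory.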

I was filling in all the fields and everything seemed to work, until I noticed that my branches were not being detected as taken despite being marked as such.

My branch instructions looked something like this:

d0 06 01 00 00 00 00 00 ┊ 01 01 00 00 1c 0f 00 00

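The first 8 bytes are the program counter (stored little-endian), and the next two are the is_branch and branch_taken flags. The remaining six bytes are the two destination registers and the four source registers.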
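To double-check a dump like that, a tiny decoder for the first 16 bytes is handy. This is a sketch under my own assumptions: the trace was written on a little-endian host, and record_head and decode_head are hypothetical names, not ChampSim types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// First 16 bytes of an input_instr record (hypothetical helper type).
struct record_head {
  std::uint64_t ip;
  std::uint8_t is_branch;
  std::uint8_t branch_taken;
  std::array<std::uint8_t, 2> destination_registers;
  std::array<std::uint8_t, 4> source_registers;
};

// Decode the head of a record from raw trace bytes.
// Assumes the trace was produced on a little-endian host (e.g. x86-64).
record_head decode_head(const std::uint8_t* bytes, std::size_t len) {
  assert(len >= 16 && "need at least the 16-byte head of a record");
  record_head h{};
  // Bytes 0-7: program counter, least significant byte first.
  for (int i = 7; i >= 0; --i) {
    h.ip = (h.ip << 8) | bytes[i];
  }
  h.is_branch = bytes[8];
  h.branch_taken = bytes[9];
  std::memcpy(h.destination_registers.data(), bytes + 10, 2);  // bytes 10-11
  std::memcpy(h.source_registers.data(), bytes + 12, 4);       // bytes 12-15
  return h;
}
```

Run on my dump above, it gives ip = 0x106d0, both flags set, no destination registers, and source registers 0x1c and 0x0f.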

After a lot of struggle and staring at the binary of some example traces, I saw that the instructions that were actually detected as taken looked like this:

─────────────────────────────────┐
                                 ↓
d0 06 01 00 00 00 00 00 ┊ 01 01 1a 00 1c 0f 00 00

They differed from mine in the marked byte, but why? All of those instructions had an extra destination register, 0x1a (26). With hindsight it's pretty obvious why register 26 shows up so much on branch instructions: it's the register ChampSim treats as the instruction pointer (REG_INSTRUCTION_POINTER; the flags register is 25), and a branch is recognized as an instruction that writes it 🤦.
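For a tracer targeting another ISA, the practical fix is to give each branch the register shape ChampSim looks for. A minimal sketch, assuming the special register ids defined in ChampSim's headers (REG_STACK_POINTER = 6, REG_FLAGS = 25, REG_INSTRUCTION_POINTER = 26; check them against your version). The per-kind patterns follow my reading of the classification in instruction.h, and kind and make_branch are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t NUM_INSTR_DESTINATIONS = 2;
constexpr std::size_t NUM_INSTR_SOURCES = 4;

struct input_instr {
  unsigned long long ip;
  unsigned char is_branch;
  unsigned char branch_taken;
  unsigned char destination_registers[NUM_INSTR_DESTINATIONS];
  unsigned char source_registers[NUM_INSTR_SOURCES];
  unsigned long long destination_memory[NUM_INSTR_DESTINATIONS];
  unsigned long long source_memory[NUM_INSTR_SOURCES];
};

// Special register ids ChampSim keys its branch detection on (verify for your version).
constexpr unsigned char REG_STACK_POINTER = 6;
constexpr unsigned char REG_FLAGS = 25;
constexpr unsigned char REG_INSTRUCTION_POINTER = 26;

// Hypothetical tracer-side categories; ChampSim itself only sees registers.
enum class kind { conditional, direct_jump, direct_call, ret };

// Build a branch record whose register usage matches the pattern expected
// for each kind. `taken` only matters for conditional branches.
input_instr make_branch(unsigned long long ip, kind k, bool taken) {
  input_instr r{};  // zero-initialized: register id 0 means "unused slot"
  r.ip = ip;
  r.is_branch = 1;
  r.branch_taken = 1;
  r.destination_registers[0] = REG_INSTRUCTION_POINTER;  // every branch writes the IP
  switch (k) {
    case kind::conditional:  // reads IP and FLAGS, nothing else
      r.source_registers[0] = REG_INSTRUCTION_POINTER;
      r.source_registers[1] = REG_FLAGS;
      r.branch_taken = taken ? 1 : 0;
      break;
    case kind::direct_jump:  // reads at most the IP
      r.source_registers[0] = REG_INSTRUCTION_POINTER;
      break;
    case kind::direct_call:  // reads and writes both IP and SP
      r.destination_registers[1] = REG_STACK_POINTER;
      r.source_registers[0] = REG_INSTRUCTION_POINTER;
      r.source_registers[1] = REG_STACK_POINTER;
      break;
    case kind::ret:  // reads SP (not IP), writes IP and SP
      r.destination_registers[1] = REG_STACK_POINTER;
      r.source_registers[0] = REG_STACK_POINTER;
      break;
  }
  return r;
}
```

The sketch sets branch_taken = 1 for every non-conditional kind because, as far as I can tell, ChampSim forces those to taken regardless of what the trace says.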

Looking more closely at instruction.h makes this even more apparent: the conversion from input_instr to ooo_model_instr has some logic that infers which kind of branch was executed from the registers the instruction reads and writes. What pains me the most is that it ignores the branch_taken flag from the trace in most cases; in my reading, only conditional branches (and branches that fit none of its categories) keep it. (I understand this is possibly a limitation of PIN or related tools, but it's still a pain.)
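To make that logic concrete, here is a simplified paraphrase of the classification as I read it. It is not ChampSim's actual code; branch_kind, decoded and classify are my own names, and the register ids are the ones from ChampSim's headers (check your version). It only looks at which special registers appear in the source and destination slots; register 0 marks an unused slot.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

constexpr unsigned char REG_STACK_POINTER = 6;
constexpr unsigned char REG_FLAGS = 25;
constexpr unsigned char REG_INSTRUCTION_POINTER = 26;

enum class branch_kind { none, direct_jump, indirect, conditional, direct_call, indirect_call, ret, other };

struct decoded {
  branch_kind kind;
  bool taken;
};

// Simplified paraphrase of the register-based branch detection.
// trace_taken is the branch_taken byte from the trace.
decoded classify(const unsigned char (&dst)[2], const unsigned char (&src)[4], bool trace_taken) {
  auto has = [](const auto& regs, unsigned char r) {
    return std::find(std::begin(regs), std::end(regs), r) != std::end(regs);
  };
  bool writes_sp = has(dst, REG_STACK_POINTER);
  bool writes_ip = has(dst, REG_INSTRUCTION_POINTER);
  bool reads_sp = has(src, REG_STACK_POINTER);
  bool reads_flags = has(src, REG_FLAGS);
  bool reads_ip = has(src, REG_INSTRUCTION_POINTER);
  // Any non-zero source that is not one of the three special registers.
  bool reads_other = std::any_of(std::begin(src), std::end(src), [](unsigned char r) {
    return r != 0 && r != REG_STACK_POINTER && r != REG_FLAGS && r != REG_INSTRUCTION_POINTER;
  });

  if (!reads_sp && !reads_flags && writes_ip && !reads_other) return {branch_kind::direct_jump, true};
  if (!reads_sp && !reads_ip && !reads_flags && writes_ip && reads_other) return {branch_kind::indirect, true};
  if (!reads_sp && reads_ip && !writes_sp && writes_ip && reads_flags && !reads_other)
    return {branch_kind::conditional, trace_taken};  // the only case that trusts the trace...
  if (reads_sp && reads_ip && writes_sp && writes_ip && !reads_flags && !reads_other) return {branch_kind::direct_call, true};
  if (reads_sp && reads_ip && writes_sp && writes_ip && !reads_flags && reads_other) return {branch_kind::indirect_call, true};
  if (reads_sp && !reads_ip && writes_sp && writes_ip) return {branch_kind::ret, true};
  if (writes_ip) return {branch_kind::other, trace_taken};  // ...besides this catch-all
  return {branch_kind::none, false};  // no IP write: never taken
}
```

Fed the two records from the dumps above, it reports no branch (so not taken) for mine and an indirect branch, forced to taken, for the one with register 26. It also hints at a trap for non-x86 ISAs: a RISC-V style conditional branch that lists its compared registers as sources (say 26, 10 and 11) doesn't match the conditional pattern and falls through to "other", which at least keeps the traced taken bit.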