Feature or enhancement
Proposal:
Not a user-visible feature per se, but I still think it warrants discussion.
The problem:
I've been working on fixing up the RISC-V support for perf and looking into adding s390x and ppc64le, but the process is quite tedious and the various connections around this infrastructure are fragile. So I've been thinking about how to make the process a little easier and more maintainable/readable.
The steps to add perf trampoline support for a new CPU architecture currently:
The DWARF CFI instructions must mirror the assembly. This synchronization has to be done manually: compile a C equivalent of the trampoline, run readelf, and hand-translate the output into DWRF macros. Getting it wrong can produce silent failures and broken stack unwinding.
With x86-64, aarch64, and RISC-V already there (RISC-V is broken at the moment), every addition makes the assembly file bigger and less readable, and elf_init_ehframe_perf() is turning into a wall of #ifdef blocks without much shared logic.
Proposal1:
Split assembly files.
Replace the single Python/asm_trampoline.S with per-arch files:
- Python/asm_trampoline_x86_64.S
- Python/asm_trampoline_aarch64.S
- Python/asm_trampoline_riscv64.S
- (maybe in the future: asm_trampoline_ppc64le.S, asm_trampoline_s390x.S)
configure.ac is modified accordingly to select which file to compile depending on the target arch. With each file self-contained, each arch can be reviewed, modified, and tested independently. This will also require a few more lines in configure and careful handling of the macOS case.
Proposal2 (followup):
Auto-generate DWARF data from compiled assembly.
A build-time script extracts the .eh_frame section from the compiled trampoline object and generates a header with the raw DWARF unwind data. At runtime, elf_init_ehframe_perf() copies this data and patches in the actual code address and size, replacing the current architecture-specific DWARF generation code entirely.
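To make the build-time half of this more concrete, here is a minimal sketch of the header-generation step. It assumes the raw .eh_frame bytes have already been pulled out of the trampoline object by some other means; the function name `render_eh_frame_header` and the array symbol `perf_trampoline_eh_frame` are hypothetical and not existing CPython names.

```python
def render_eh_frame_header(data: bytes,
                           symbol: str = "perf_trampoline_eh_frame") -> str:
    """Render raw .eh_frame bytes as a C header defining a byte array.

    Hypothetical helper: extraction of `data` from the object file is
    assumed to happen elsewhere.
    """
    if not data:
        raise ValueError("empty .eh_frame data")
    guard = symbol.upper() + "_H"
    lines = [
        "// Auto-generated; do not edit.",
        f"#ifndef {guard}",
        f"#define {guard}",
        "",
        f"static const unsigned char {symbol}[{len(data)}] = {{",
    ]
    # Emit 12 bytes per row so the generated header stays readable in diffs.
    for offset in range(0, len(data), 12):
        row = ", ".join(f"0x{b:02x}" for b in data[offset:offset + 12])
        lines.append(f"    {row},")
    lines += ["};", "", f"#endif  // {guard}", ""]
    return "\n".join(lines)
```

The runtime side would then only need to copy this array and patch the address and size fields, without any per-arch DWARF emission code.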
The JIT's Tools/jit/build.py already extracts DWARF CFI data from compiled objects to generate jit_unwind_info-.h for the GDB unwind path, so something similar could be used here.
Another possibility would be a dependency on readelf, but I don't think that this is something desirable.
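The "copy and patch" step can be modelled in a few lines. The real code would live in C inside elf_init_ehframe_perf(); the Python sketch below only illustrates the idea. It assumes 32-bit DWARF records, a single FDE, and 4-byte pc_begin and pc_range fields (as with a pcrel sdata4 pointer encoding), which may not hold for every toolchain or arch. `patch_first_fde` is a hypothetical name.

```python
import struct


def patch_first_fde(eh_frame: bytes, pc_begin: int, pc_range: int,
                    byteorder: str = "<") -> bytes:
    """Return a copy of `eh_frame` with the first FDE's pc_begin and
    pc_range overwritten.

    Assumptions (not verified against any real trampoline object):
    32-bit record lengths, pc_begin stored as a signed 4-byte value,
    pc_range as an unsigned 4-byte value. `byteorder` is a struct prefix
    ("<" little-endian, ">" big-endian, e.g. for s390x).
    """
    buf = bytearray(eh_frame)
    offset = 0
    while offset + 8 <= len(buf):
        length, cie_id = struct.unpack_from(byteorder + "II", buf, offset)
        if length == 0:  # zero-length terminator record
            break
        if length == 0xFFFFFFFF:
            raise ValueError("64-bit DWARF records are not handled here")
        if offset + 4 + length > len(buf):
            raise ValueError("truncated record")
        # In .eh_frame a CIE has id 0; any other value marks an FDE.
        if cie_id != 0:
            if length < 12:
                raise ValueError("FDE too short for pc_begin/pc_range")
            # pc_begin and pc_range directly follow the CIE pointer.
            struct.pack_into(byteorder + "iI", buf, offset + 8,
                             pc_begin, pc_range)
            return bytes(buf)
        offset += 4 + length  # the length field excludes itself
    raise ValueError("no FDE found")
```

With a pc-relative encoding, the caller would compute `pc_begin` as the code address minus the address at which the pc_begin field itself ends up, and `pc_range` as the code size.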
Pros and cons
Pros:
- Adding a new architecture requires exactly one new file (the assembly) and a few lines in configure.ac. No DWARF knowledge needed (assuming the DWARF extraction script is robust :) ).
- Eliminates possible bugs in the synchronization of DWARF data (or moves them to the extraction script; either way it should be simpler to deal with).
- jit_unwind.c's elf_init_ehframe_perf shrinks to only the arch-independent code.
Cons:
- PYTHON_FOR_REGEN must be available when the trampoline assembly changes (same constraint as JIT stencil generation). That practically means that perf support will now depend on PYTHON_FOR_REGEN, making it unavailable for "clean" builds.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response