-
Code Optimization - mostly C++
- Intel 64 and IA-32 Architectures Optimization Reference Manual
- Meet loop fracking
- Techniques for using mask registers in AVX-512 code
- LLVM for researchers
- Loop optimization: blocks are needed
- "Fall in, attention!" Aligning data
- Urban legends about slow virtual function calls
- Main problems affecting the performance of the computational kernel and the application, and how the compiler addresses them
- How to copy an array correctly, and what SFINAE has to do with it
- Linux Performance
-
g++ autovectorization tips and issues - SSE, AVX
-
Is Parallel Programming Hard, And, If So, What Can You Do About It?
-
Performance Monitor
- What are perf cache events meaning?
- perfmon2
- perf wiki
- Linux Performance Monitoring, any way to monitor per-thread?
- oprofile
- tl;dr
operf [ options ] [ --system-wide | --pid=<PID> | [ command [ args ] ] ]
A typical usage might look like this:
operf ./my_test_program my_arg
-
MMU - TLB - huge pages
- Virtual Memory
- Which CPUs support 1GB pages?
- How to use Intel Westmere 1GB pages on Linux?
- HugeTLB - Large Page Support in the Linux Kernel
- Hugepages
- Huge pages part 1 (Introduction)
- How Bad Can 1GB Pages Be?
- Multiple Page Size Support in the Linux Kernel
- mmap/munmap tips
- Writing an OS in Rust - Page Tables
-
Cache issues
-
Other Memory issues
-
New technologies from Intel: Cache Monitoring Technology (CMT), Memory Bandwidth Monitoring (MBM), Cache Allocation Technology (CAT), and Code and Data Prioritization (CDP) Technology
- Cache Monitoring Technology (CMT), Memory Bandwidth Monitoring (MBM), Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) Technology
- User space software for Intel(R) Resource Director Technology http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html
- Processors supporting Cache Allocation Technology (CAT)
- A few experiments with the Cache Allocation Technology
- CAT white paper
-
Timers
-
Videos from The 3rd annual JuliaCon 2016 (MIT)
-
Useful pieces of information
-
Assembler
-
Scientific Methodology and Performance Evaluation
- Scientific Methodology and Performance Evaluation for Computer Scientists
- Analytical GPU performance model based on Little’s law, which expresses kernel execution time in terms of latency bound, throughput bound, and achieved occupancy
- Understanding Latency Hiding on GPUs, PhD thesis by Vasily Volkov, EECS Berkeley
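
The Little's law reasoning named in the entries above can be illustrated with a little arithmetic. The following is a minimal sketch of the general law (work in flight = latency × throughput), not the model from the thesis itself. The function names and the units (cycles, operations per cycle) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Little's law: work in flight = latency * throughput.
// Hypothetical units: latency in cycles, throughput in operations per cycle.
// Returns how many operations must be in flight to sustain the peak rate.
inline double required_concurrency(double latency_cycles, double peak_ops_per_cycle) {
    assert(latency_cycles >= 0.0 && peak_ops_per_cycle >= 0.0);
    return latency_cycles * peak_ops_per_cycle;
}

// Throughput sustained with a given number of operations in flight:
// latency-bound (in_flight / latency) until the peak rate caps it.
inline double achieved_throughput(double in_flight, double latency_cycles,
                                  double peak_ops_per_cycle) {
    assert(latency_cycles > 0.0);
    return std::min(in_flight / latency_cycles, peak_ops_per_cycle);
}
```

For example, with an assumed latency of 400 cycles and a peak of 2 operations per cycle, `required_concurrency(400, 2)` gives 800 operations in flight. With only 200 in flight, `achieved_throughput(200, 400, 2)` gives 0.5 operations per cycle, which is the latency-bound regime.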
-
Binary Instrumentation
-
Review of binary instrumentation methods: Fuzzing binary-only programs with AFL++
-
Intel GTPin
-
DynamoRIO
-