[STATUS] Study
Min Chen edited this page Sep 16, 2020
In September 2020 I am effectively unemployed (on unpaid leave until the end of 2020), so I have more time to study and to think about the future of x265.
As a result, I expect x266 to come back soon. I have also retargeted x266 at mobile platforms for its first stages.
This blog records the preparation, ideas, concepts, and study notes from the development process, and you are welcome to discuss them with us. To keep things accurate, I write in both Chinese and English.
The goal is a high-performance codec: its compression rate may be lower than others (such as VTM), but it aims for the best cost/performance balance in engineering practice.
09/16/2020
- Suspended the encoder and explored the decoder path architecture using a DAG. It is the most important component of my codec, and I want to implement it on top of task scheduling.
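A minimal sketch of task scheduling over a DAG using dependency counters (Kahn's algorithm): each task tracks how many predecessors are still unfinished and becomes ready when that count reaches zero. The names `Dag`, `dag_edge`, `dag_run` and the fixed table sizes are hypothetical, and the loop runs ready tasks in order on one thread where a real scheduler would hand them to worker threads.

```c
#include <assert.h>

/* Hypothetical limits, for illustration only. */
#define MAX_TASKS 64
#define MAX_SUCC  8

typedef struct {
    int n_succ;
    int succ[MAX_SUCC];   /* tasks that depend on this one */
    int n_pred_left;      /* predecessors that have not finished yet */
} Task;

typedef struct {
    Task t[MAX_TASKS];
    int n;
} Dag;

static void dag_init(Dag *g, int n) {
    assert(n > 0 && n <= MAX_TASKS);
    g->n = n;
    for (int i = 0; i < n; i++) {
        g->t[i].n_succ = 0;
        g->t[i].n_pred_left = 0;
    }
}

/* Edge from -> to: task 'to' may start only after task 'from' has finished. */
static void dag_edge(Dag *g, int from, int to) {
    Task *f = &g->t[from];
    assert(f->n_succ < MAX_SUCC);
    f->succ[f->n_succ++] = to;
    g->t[to].n_pred_left++;
}

/* Runs every task whose predecessors are all done, writing the execution
   order to 'order'. Consumes the counters, so the DAG is single-use.
   Returns the number of tasks run (less than g->n if there is a cycle). */
static int dag_run(Dag *g, int *order) {
    int ready[MAX_TASKS], head = 0, tail = 0, done = 0;
    for (int i = 0; i < g->n; i++)
        if (g->t[i].n_pred_left == 0) ready[tail++] = i;
    while (head < tail) {
        int id = ready[head++];
        order[done++] = id;   /* a real scheduler would dispatch to a worker here */
        for (int k = 0; k < g->t[id].n_succ; k++) {
            int s = g->t[id].succ[k];
            if (--g->t[s].n_pred_left == 0) ready[tail++] = s;
        }
    }
    return done;
}
```

For a diamond (task 0 feeds 1 and 2, which both feed 3), the tasks run as 0, 1, 2, 3; tasks 1 and 2 are ready at the same time and could run in parallel.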
09/16/2020
- Hexagon does not support unaligned memory access, so I designed the low-level bitstream functions to work with a 64-bit cache.
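A minimal sketch of a bit reader that only issues aligned 64-bit loads and keeps unread bits in a 64-bit cache, assuming the stream has first been copied into an 8-byte-aligned, zero-padded word buffer. `BitReader`, `pack_words`, `load_be64` and `br_read` are hypothetical names; reads are limited to 32 bits at a time to keep the shifts well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint64_t *words;  /* 8-byte aligned word buffer */
    size_t nwords;          /* words available */
    size_t idx;             /* next word to load */
    uint64_t cache;         /* unread bits, MSB-aligned, low bits zero */
    unsigned left;          /* number of valid bits in cache */
} BitReader;

/* Copy a byte stream into an aligned word buffer, zero-padding the tail.
   'dst' must hold (n + 7) / 8 words. */
static void pack_words(uint64_t *dst, const uint8_t *src, size_t n) {
    memset(dst, 0, ((n + 7) / 8) * sizeof *dst);
    memcpy(dst, src, n);
}

static void br_init(BitReader *br, const uint64_t *words, size_t nwords) {
    br->words = words; br->nwords = nwords; br->idx = 0;
    br->cache = 0; br->left = 0;
}

/* One aligned 64-bit load; the bytes are then taken in memory order so the
   first stream byte lands in the most significant position. */
static uint64_t load_be64(const uint64_t *p) {
    uint64_t w = *p;
    const uint8_t *b = (const uint8_t *)&w;
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) r = (r << 8) | b[i];
    return r;
}

/* Read n bits (1..32), MSB first. */
static uint32_t br_read(BitReader *br, unsigned n) {
    assert(n >= 1 && n <= 32);
    uint64_t val;
    if (n <= br->left) {                 /* fast path: bits already cached */
        val = br->cache >> (64 - n);
        br->cache <<= n;
        br->left -= n;
    } else {                             /* split across two words */
        unsigned hi = br->left, lo = n - hi;
        val = hi ? br->cache >> (64 - hi) : 0;
        assert(br->idx < br->nwords);
        br->cache = load_be64(&br->words[br->idx++]);
        val = (val << lo) | (br->cache >> (64 - lo));
        br->cache <<= lo;
        br->left = 64 - lo;
    }
    return (uint32_t)val;
}
```

Reading 4, 8, 32, 24 and 8 bits from the bytes `01 23 45 67 89 AB CD EF FE DC ...` yields `0x0`, `0x12`, `0x3456789A`, `0xBCDEFF` and `0xED`; the 24-bit read takes 20 bits from the first word and 4 from the second.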
09/15/2020
- Spent a lot of time on the DAG (Directed Acyclic Graph); it will be my base architecture for linking CUs.
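One way to picture CUs linked as a DAG is a grid where each CU waits for already-decoded neighbours. The sketch below assumes the left, top and top-right neighbours as the dependency set (a common wavefront-style assumption, not necessarily the links x266 uses) and computes, for each CU, the earliest step at which it could run with unlimited workers, which is its longest-path depth in the DAG. `cu_num_deps`, `cu_steps` and the grid limits are hypothetical.

```c
#include <assert.h>

/* Hypothetical grid limits for the sketch. */
#define MAX_COLS 32
#define MAX_ROWS 32

static int imax(int a, int b) { return a > b ? a : b; }

/* Number of neighbours CU (x, y) waits for under the assumed
   dependency set: left, top and top-right. */
static int cu_num_deps(int x, int y, int cols) {
    int n = 0;
    if (x > 0) n++;                    /* left */
    if (y > 0) n++;                    /* top */
    if (y > 0 && x + 1 < cols) n++;    /* top-right */
    return n;
}

/* Earliest step per CU = longest path to it in the DAG. Row-major order is
   a valid topological order because every edge points down or right. */
static void cu_steps(int cols, int rows, int step[MAX_ROWS][MAX_COLS]) {
    assert(cols <= MAX_COLS && rows <= MAX_ROWS);
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < cols; x++) {
            int s = 0;
            if (x > 0) s = imax(s, step[y][x - 1] + 1);
            if (y > 0) s = imax(s, step[y - 1][x] + 1);
            if (y > 0 && x + 1 < cols) s = imax(s, step[y - 1][x + 1] + 1);
            step[y][x] = s;
        }
}
```

Under these dependencies the step works out to `x + 2*y`, so a 4x3 grid of 12 CUs finishes in 8 steps, and each row can start two CUs behind the row above it.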
09/14/2020
- Implemented template SPS and PPS.
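SPS and PPS syntax elements in the H.26x standards are written mostly with fixed-length `u(n)` and unsigned Exp-Golomb `ue(v)` descriptors, so a template parameter-set writer needs those two primitives. This is a minimal MSB-first bit writer sketch; `BitWriter`, `bw_put_bits` and `bw_put_ue` are hypothetical names, and no actual SPS/PPS fields are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *buf;
    size_t cap;   /* buffer size in bytes */
    size_t pos;   /* bits written so far */
} BitWriter;

static void bw_init(BitWriter *bw, uint8_t *buf, size_t cap) {
    memset(buf, 0, cap);   /* bits are OR-ed in, so start from zero */
    bw->buf = buf; bw->cap = cap; bw->pos = 0;
}

/* u(n): write the n low bits of v, most significant bit first. */
static void bw_put_bits(BitWriter *bw, uint64_t v, unsigned n) {
    assert(n <= 64);
    for (unsigned i = n; i-- > 0; ) {
        assert(bw->pos / 8 < bw->cap);
        if ((v >> i) & 1u)
            bw->buf[bw->pos / 8] |= (uint8_t)(0x80u >> (bw->pos % 8));
        bw->pos++;
    }
}

/* ue(v): (len - 1) zero bits followed by the len-bit value v + 1. */
static void bw_put_ue(BitWriter *bw, uint32_t v) {
    uint64_t x = (uint64_t)v + 1;
    unsigned len = 0;
    while ((x >> len) != 0) len++;   /* significant bits in v + 1 */
    bw_put_bits(bw, 0, len - 1);
    bw_put_bits(bw, x, len);
}
```

Writing `ue(0)`, `ue(1)`, `ue(2)`, `ue(3)` produces `1 010 011 00100`, i.e. the bytes `0xA6 0x40` with 12 valid bits.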
09/13/2020
- The DDR4 burst length is 8 words, which means one transaction is 64 or 128 bytes depending on the bus width, so designing internal data structures around 512-byte units may be better.
- Finished designing the data structure for the internal frame buffer: a 512-byte block that stores 16x16 YUV420 data organized as YUV420SP. I think this is more DDR4-friendly and better suited to wide SIMD such as AVX-512.
- Packed and prepared to move to a cheaper hotel tomorrow.
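A sketch of how such a 512-byte block could be laid out, assuming 8-bit samples and a 64-bit DDR4 channel (8 beats x 8 bytes = 64-byte bursts): 256 bytes of luma, 128 bytes of interleaved UV (YUV420SP), leaving 128 bytes that this sketch simply leaves unused. `Blk512`, `blk_pack` and the spare area are hypothetical; the actual x266 layout is not shown in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: 8-bit samples, 64-bit DDR4 channel. */
enum {
    BLK = 16,
    DDR4_BURST_LEN = 8,                        /* beats per burst (BL8) */
    BUS_BYTES = 8,                             /* bytes per beat */
    BURST_BYTES = DDR4_BURST_LEN * BUS_BYTES   /* 64 bytes */
};

typedef struct {
    _Alignas(64) uint8_t y[BLK][BLK];   /* 256 bytes of luma */
    uint8_t uv[BLK / 2][BLK];           /* 128 bytes: U0 V0 U1 V1 ... per row */
    uint8_t spare[128];                 /* unused here (padding or metadata) */
} Blk512;

_Static_assert(sizeof(Blk512) == 512, "block must be exactly 512 bytes");
_Static_assert(offsetof(Blk512, uv) == 256, "chroma follows luma");
_Static_assert(sizeof(Blk512) % BURST_BYTES == 0, "block is whole bursts");

/* Copy the 16x16 region at (bx, by) of a planar I420 frame into one block,
   interleaving U and V. Assumes the region lies inside the frame. */
static void blk_pack(Blk512 *b,
                     const uint8_t *y, const uint8_t *u, const uint8_t *v,
                     int stride_y, int stride_c, int bx, int by) {
    for (int r = 0; r < BLK; r++)
        for (int c = 0; c < BLK; c++)
            b->y[r][c] = y[(by + r) * stride_y + bx + c];
    for (int r = 0; r < BLK / 2; r++)
        for (int c = 0; c < BLK / 2; c++) {
            int off = (by / 2 + r) * stride_c + bx / 2 + c;
            b->uv[r][2 * c]     = u[off];
            b->uv[r][2 * c + 1] = v[off];
        }
}
```

With this layout one block is exactly 8 DDR4 bursts and also 8 AVX-512 registers (64 bytes each), so whole-block loads and stores never straddle a burst.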
09/12/2020
- Initial import of x266; trying to resume development of the x266-mobile encoder.
09/10/2020
- Read the documentation for the Qualcomm Hexagon v67 DSP
- The DSP is a 4-way VLIW architecture
- It has a 32-bit address space
- 32 general registers; R29-R31 are reserved:
  - R29: SP (stack pointer)
  - R30: FP (frame pointer)
  - R31: LR (link register)
- Registers can be paired as 64-bit registers, similar to TI DSPs
- Two hardware zero-overhead loops
- Two circular buffers
- Throughput: 4 ALU32, 2 LD/ST, 2 branch, and 1 other instruction
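The throughput limits above give a quick resource lower bound on packets (cycles) per loop iteration. This sketch assumes a simplified model taken directly from these notes: at most 4 instructions per packet (which also covers the 4 ALU32), at most 2 loads/stores, 2 branches and 1 "other" instruction. It ignores latencies and data dependencies, so a real schedule can need more packets. If the loop uses one of the hardware loops, the loop-closing branch needs no slot and should not be counted. `OpMix` and `min_packets` are hypothetical names.

```c
#include <assert.h>

/* Assumed per-packet limits, from the notes above. */
enum { PKT_SLOTS = 4, PKT_LDST = 2, PKT_BRANCH = 2, PKT_OTHER = 1 };

typedef struct {
    int alu32, ldst, branch, other;   /* instruction counts in one loop body */
} OpMix;

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Resource-only lower bound on packets per iteration. */
static int min_packets(OpMix m) {
    int total = m.alu32 + m.ldst + m.branch + m.other;
    int p = ceil_div(total, PKT_SLOTS);          /* 4 instructions per packet */
    int ls = ceil_div(m.ldst, PKT_LDST);         /* 2 loads/stores per packet */
    int br = ceil_div(m.branch, PKT_BRANCH);     /* 2 branches per packet */
    int ot = ceil_div(m.other, PKT_OTHER);       /* 1 "other" per packet */
    if (ls > p) p = ls;
    if (br > p) p = br;
    if (ot > p) p = ot;
    return p;
}
```

For example, a body with 6 ALU32 and 4 loads/stores is bounded by the total of 10 instructions (3 packets), while 2 ALU32 with 6 loads/stores is bounded by the LD/ST slots (also 3 packets), which suggests balancing memory and arithmetic work when writing DSP kernels.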