Compression

Emily3403 edited this page May 9, 2022 · 1 revision

Compression of videos into H265 Codec

Videos from ISIS tend to take up a lot of space. This has to do with the nature of the videos: they are usually screen recordings encoded for speed rather than size. That is a reasonable choice, since the hardware most professors use is fairly weak.

Of course, we can do better. The H265 codec offers a number of benefits over the older H264 codec, one of which is much better space efficiency.

Unfortunately, re-encoding these videos is expensive in terms of computational power. On the upside, encoding parallelizes very well, so you can take full advantage of 8, 16, or 32 cores.
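As a rough illustration of how such a re-encoding job could be spread across cores, the following Python sketch builds one ffmpeg command per video and runs several of them at once. The helper names (`build_ffmpeg_command`, `compress_all`), the `_h265` output suffix, the worker count, and the CRF and preset values are assumptions made for this example, not the settings the tool itself uses. `-c:v libx265`, `-crf`, `-preset`, `-c:a copy`, and `-n` are standard ffmpeg options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def build_ffmpeg_command(source: Path, crf: int = 28, preset: str = "medium") -> List[str]:
    """Build an ffmpeg call that re-encodes `source` to H265 (assumed settings)."""
    # Hypothetical naming scheme: lecture.mp4 -> lecture_h265.mp4
    target = source.with_name(source.stem + "_h265" + source.suffix)
    return [
        "ffmpeg", "-n",        # -n: never overwrite an existing output file
        "-i", str(source),
        "-c:v", "libx265",     # encode the video stream with x265
        "-crf", str(crf),      # quality target; higher values give smaller files
        "-preset", preset,     # trade-off between encoding speed and compression
        "-c:a", "copy",        # keep the audio stream unchanged
        str(target),
    ]


def compress_all(videos: List[Path], workers: int = 4) -> None:
    """Run several ffmpeg processes at once."""
    # Threads are enough here: the heavy work happens inside the ffmpeg processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda v: subprocess.run(build_ffmpeg_command(v), check=True), videos))
```

Since x265 already uses multiple threads per process, the number of parallel workers is something to tune per machine rather than a fixed rule.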

Expected efficiency

Okay, assuming you are compressing your videos, how much space can you save?

It depends, but you can expect to save around 70% to 90% of the disk space.

Is it worth it?

Yes, if you plan on archiving.

Since compressing videos is very CPU-intensive, the saved space might not outweigh the computer being barely usable in the meantime. So let's do a simple trade-off calculation:

Compressing 1 GiB of video takes roughly 25 minutes on my machine.

For the efficiency, let's assume 80%, and for the library size 50 GiB.

Now the following formula calculates how much

...
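Independently of that formula, the example numbers can be plugged into a back-of-the-envelope calculation. This sketch only multiplies the figures from the text (25 minutes per GiB, 80% efficiency, 50 GiB library). The function name and its return shape are made up for illustration, and it is not meant to stand in for the formula that is still to be documented.

```python
from typing import Tuple


def compression_tradeoff(library_gib: float, efficiency: float, minutes_per_gib: float) -> Tuple[float, float]:
    """Return (hours of encoding time, GiB of disk space saved)."""
    hours_spent = library_gib * minutes_per_gib / 60
    gib_saved = library_gib * efficiency
    return hours_spent, gib_saved


hours, saved = compression_tradeoff(library_gib=50, efficiency=0.80, minutes_per_gib=25)
print(f"{hours:.1f} h of encoding for {saved:.0f} GiB saved")
```

With these inputs it comes to roughly 20.8 hours of encoding for 40 GiB saved.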

TODO: Document this further

What about already decent videos?

Some videos are already encoded quite well. While ffmpeg is running, the new file size can be estimated. This estimate is fairly stable and accurate, though its quality depends on how much the information content fluctuates over the course of the video. The videos in the ISIS dataset fluctuate little: they are usually static screen recordings.
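One simple way such an estimate could be formed is linear extrapolation: divide the bytes written so far by the fraction of the video that has already been encoded. This is an assumption about the general approach, not necessarily how the tool computes it, and `estimate_final_size` and its parameters are hypothetical names.

```python
def estimate_final_size(bytes_written: int, seconds_encoded: float, total_seconds: float) -> float:
    """Linearly extrapolate the final output size from the progress so far."""
    if seconds_encoded <= 0 or total_seconds <= 0:
        raise ValueError("progress and duration must be positive")
    fraction_done = min(seconds_encoded / total_seconds, 1.0)
    return bytes_written / fraction_done


# 30 MB written after 15 of 60 minutes extrapolates to 120 MB.
estimate = estimate_final_size(30_000_000, 15 * 60, 60 * 60)
```

An extrapolation like this works best when the bitrate stays roughly constant, which fits the mostly static screen recordings described above.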

To estimate how well a video will compress, a score is calculated from a few factors:

  • Estimated file size / efficiency
  • How good the estimate is
  • How much of the file is already done

All of these are taken into account by the current scoring mechanism.

The formula for the score is given by

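from math import log, sqrt
from statistics import stdev
from typing import List

previous_estimates: List[float]  # size estimates seen so far for the current file
current_file_stdev = stdev(previous_estimates)

score = sqrt(1 + estimated_size) / log(1.65 + current_file_stdev) - (0.5 * percent_done ** 0.5)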

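The score consists of three parts: sqrt(1 + estimated_size) covers the estimated file size, log(1.65 + current_file_stdev) covers how good the estimate is, and 0.5 * percent_done ** 0.5 covers how much of the file is already done. Together they turn the three numbers into a single one on a consistent scale, so decisions can be based on a single cut-off point.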
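To make the formula easier to experiment with, here is a minimal runnable sketch of it. The function name `compute_score`, the fallback for fewer than two estimates, and the choice of the natural logarithm (`math.log`) are assumptions of this example. The units of `estimated_size` and `percent_done` are not stated in the text, so the sample call below simply treats both as fractions between 0 and 1.

```python
import math
from statistics import stdev
from typing import List


def compute_score(previous_estimates: List[float], estimated_size: float, percent_done: float) -> float:
    """Combine size estimate, estimate stability and progress into one number."""
    # stdev needs at least two samples; a single estimate is treated as
    # perfectly stable here (assumption of this sketch).
    current_file_stdev = stdev(previous_estimates) if len(previous_estimates) > 1 else 0.0

    size_term = math.sqrt(1 + estimated_size)
    stability_term = math.log(1.65 + current_file_stdev)
    progress_term = 0.5 * percent_done ** 0.5
    return size_term / stability_term - progress_term


# Hypothetical inputs: estimates hovering around 20% of the original size, halfway done.
score = compute_score([0.21, 0.19, 0.20], estimated_size=0.20, percent_done=0.5)
```

As written, the score grows with the estimated size and shrinks as either the standard deviation or the progress increases.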

If current_file_stdev is too high, it is not recorded. "Too high" means anything above 0.5, which is discarded.

Also, to dampen sudden jumps, a moving average is taken over the last few seconds.
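The two safeguards above could be sketched together as follows. This is one possible reading, not the tool's actual code: it assumes that a score is skipped whenever the standard deviation of the estimates exceeds 0.5, and it averages over a fixed number of recent samples rather than a time span. The class name `SmoothedScore` and the window size are hypothetical.

```python
from collections import deque
from statistics import mean, stdev
from typing import Deque, List, Optional

MAX_STDEV = 0.5   # cut-off taken from the text
WINDOW_SIZE = 5   # hypothetical: number of recent scores to average


class SmoothedScore:
    """Track recent scores, skipping those whose estimates are too unstable."""

    def __init__(self, window_size: int = WINDOW_SIZE) -> None:
        # A bounded deque drops the oldest score once the window is full.
        self.recent: Deque[float] = deque(maxlen=window_size)

    def update(self, score: float, previous_estimates: List[float]) -> Optional[float]:
        # Record the score only when the estimates are stable enough.
        unstable = len(previous_estimates) > 1 and stdev(previous_estimates) > MAX_STDEV
        if not unstable:
            self.recent.append(score)
        return self.current()

    def current(self) -> Optional[float]:
        # None until at least one score has been recorded.
        return mean(self.recent) if self.recent else None
```

The smoothed value from `current()` is what would then be compared against the single cut-off point.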

TODO: Document this further