Introduction
With its excellent compression performance, AV1 has attracted wide attention as a new-generation video coding standard since 2017. The industry, including Facebook and Netflix, has carried out evaluations of AV1, and its encoding complexity has fallen from nearly a thousand times that of VP9 in the early days to about a hundred times. To verify AV1's performance on short video, the Meitu audio and video team has, since November 2018, run a comprehensive AV1 evaluation on Meipai's top 500 short videos. The benchmark encoders are x264, x265 and VP9, the mainstream encoders used in production.
This article describes the evaluation process in detail and uses the experimental data to assess AV1's performance on short video.
Performance of AV1 in the experiment
Before going into the details, here is a summary of AV1's overall performance. The results show that AV1 has a clear advantage in compression efficiency, but its encoding time is still long:
Compared with x264 (high profile), x265 (main profile) and VP9, AV1 saves 36.0%, 26.9% and 31.8% of the bitrate, respectively, at the same quality, and its advantage grows as the video resolution increases;
In terms of encoding time, AV1 takes 395 times as long as x264 (high profile), 36 times as long as x265 (main profile) and 156 times as long as VP9.
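The article does not say how the per-video savings were aggregated. A common way to turn several rate/quality points per encoder (six CRF points here) into one "bitrate saving at the same quality" is the Bjøntegaard delta rate (BD-rate): fit log-bitrate as a cubic polynomial of the quality score (PSNR, SSIM or VMAF) for each encoder, average the gap between the two fits over the quality range both cover, and convert it back into a percentage. The pure-Python sketch below assumes that method; the function names are illustrative, not the team's code. In this sign convention a negative value means the test encoder needs less bitrate, so a 36.0% saving would show up as roughly -36.0.

```python
import math


def _fit_cubic(xs, ys):
    """Least-squares cubic fit; returns [c0, c1, c2, c3] for c0 + c1*x + c2*x^2 + c3*x^3."""
    n = 4
    # Augmented normal equations [(A^T A) | A^T y] for the Vandermonde matrix A.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(x ** i * y for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - rest) / m[r][r]
    return coef


def _poly_integral(coef, lo, hi):
    """Definite integral of sum(c_k * x^k) from lo to hi."""
    def prim(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coef))
    return prim(hi) - prim(lo)


def bd_rate(ref_kbps, ref_q, test_kbps, test_q):
    """BD-rate of test vs. reference in percent (negative: test saves bitrate)."""
    all_q = list(ref_q) + list(test_q)
    center = sum(all_q) / len(all_q)
    scale = (max(all_q) - min(all_q)) or 1.0
    # Normalise quality so the normal equations stay well conditioned; an affine
    # change of variable does not change the average gap over the interval.
    rq = [(q - center) / scale for q in ref_q]
    tq = [(q - center) / scale for q in test_q]
    p_ref = _fit_cubic(rq, [math.log(r) for r in ref_kbps])
    p_test = _fit_cubic(tq, [math.log(r) for r in test_kbps])
    # Only compare over the quality range covered by both curves.
    lo, hi = max(min(rq), min(tq)), min(max(rq), max(tq))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    gap = _poly_integral(p_test, lo, hi) - _poly_integral(p_ref, lo, hi)
    return (math.exp(gap / (hi - lo)) - 1.0) * 100.0
```

Usage would look like `bd_rate(x265_kbps, x265_vmaf, av1_kbps, av1_vmaf)`, with one bitrate and one quality score per CRF value. The fit needs at least four points per encoder.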
Research background
AV1 (AOMedia Video 1) has attracted wide attention as a new-generation video coding standard since 2017 because of its high compression performance. Tests by Moscow State University (MSU), Facebook and Netflix have confirmed that AV1 outperforms H.265 and VP9 and currently offers the highest compression ratio of any video coding standard. Its complexity, however, has been striking: in early 2018, AOM's libaom encoder [1] took nearly 1000 times as long to encode as the VP9 encoder at its on-demand (VOD) speed setting. Over the past two years, both AOM, the organization that defines AV1, and other companies such as Intel have worked to speed up AV1 encoding, and recent data suggest that AV1 is approaching practical use [2].
On the software and hardware side, more and more software and platforms support AV1 playback, including Mozilla Firefox, the Chromium browser engine, Microsoft Windows 10 and Android Q. Leading hardware vendors such as Google, Intel, Arm, Qualcomm, Samsung and Sony have also begun developing hardware decoders, so hardware AV1 decoding on mobile devices can be expected to spread rapidly in 2020 [3].
Comprehensive evaluation of AV1 coding performance
As AV1 gained momentum, we used libaom v1.0.0 in 2018 to run a comprehensive evaluation of AV1's compression efficiency and encoding complexity on Meipai's top 500 videos and videos from top Meipai creators. The benchmark encoders are x264, x265 and VP9, the mainstream encoders used in production, and the quality metrics are PSNR, SSIM and VMAF (phone model).
Selection of test sequences
The test sequences are drawn from Meipai's top 500 videos and from popular, high-quality videos by top creators; 523 videos took part in the evaluation. These videos have the following characteristics:
Most were shot on mobile phones; they include videos made from photos, videos recorded on phones, and videos produced and published by official accounts;
They have already been compressed, rather than being raw source material;
Most are SD/HD (480p/540p/720p) rather than the UHD content (4K and 8K) commonly used by official testing institutions;
Most have a frame rate of 30 fps;
Most have a 16:9 aspect ratio;
Most are between 10 s and 60 s long.
Complexity analysis of the test sequences
The compression efficiency of an encoder depends closely on the video content, so before evaluating the encoders we first analyze the complexity of each video. Following the method in ITU-T Recommendation P.910 [4] on subjective video quality assessment, we characterize each test sequence by its maximum spatial perceptual information (SI) and maximum temporal perceptual information (TI). Because some videos contain many scene cuts, we also compute the average SI and TI to give a more complete picture of sequence complexity.
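In P.910, SI for a frame is the spatial standard deviation of its Sobel-filtered luma plane, and TI is the spatial standard deviation of the pixel-wise difference between a frame and the previous one. Taking the maximum over time gives the "max" values, and taking the mean gives the "average" values. The sketch below is a minimal pure-Python version under a few assumptions: frames are already decoded into 2-D lists of luma values (decoding and trimming to the first seconds are not shown), the Sobel filter is applied to interior pixels only, and the population standard deviation is used. `sobel_magnitude` and `si_ti` are hypothetical names.

```python
import math
from statistics import pstdev


def sobel_magnitude(frame):
    """Sobel gradient magnitudes for the interior pixels of a 2-D luma frame."""
    h, w = len(frame), len(frame[0])
    mags = []
    for i in range(1, h - 1):
        up, mid, down = frame[i - 1], frame[i], frame[i + 1]
        for j in range(1, w - 1):
            # Horizontal and vertical 3x3 Sobel responses.
            gx = (up[j + 1] + 2 * mid[j + 1] + down[j + 1]
                  - up[j - 1] - 2 * mid[j - 1] - down[j - 1])
            gy = (down[j - 1] + 2 * down[j] + down[j + 1]
                  - up[j - 1] - 2 * up[j] - up[j + 1])
            mags.append(math.hypot(gx, gy))
    return mags


def si_ti(frames):
    """Max and mean SI/TI for a sequence of equally sized luma frames (>= 3x3)."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to compute TI")
    # SI: per-frame spatial std of the Sobel magnitude.
    si = [pstdev(sobel_magnitude(f)) for f in frames]
    # TI: per-frame spatial std of the difference from the previous frame.
    ti = [pstdev([c - p for row_c, row_p in zip(cur, prev)
                  for c, p in zip(row_c, row_p)])
          for prev, cur in zip(frames, frames[1:])]
    return {
        "si_max": max(si), "ti_max": max(ti),
        "si_mean": sum(si) / len(si), "ti_mean": sum(ti) / len(ti),
    }
```

A real implementation would use NumPy or an existing tool for speed, since this loops over every pixel in Python.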
Figures 1 and 2 show the SI-TI distributions of the Meipai top 500 videos (computed over the first 6 seconds of each video). Most of our videos fall within 40 ≤ SI ≤ 90 and TI ≤ 40, which corresponds to categories A (one person, mainly head and shoulders, limited detail and motion), B (one person with graphics and/or more detail) and C (more than one person) in ITU-T P.910. This is consistent with the kind of content typical of Meipai.
Figure 1 Scatter plot of SI-TI (max) for the Meipai top 500 videos
Figure 2 Scatter plot of SI-TI (average) for the Meipai top 500 videos
Encoder selection
For AV1, we use AOM's reference encoder, libaom. For H.264/AVC, H.265/HEVC and VP9, we use the corresponding encoder libraries through FFmpeg 4.0.2. Table 1 lists the encoder versions used in the experiment.
x264: FFmpeg 4.0.2 + libx264 (latest commit 303c484ec828ed0d8bfe743500e70314d026c3bd)
x265: FFmpeg 4.0.2 + libx265 (latest release 2.8)
VP9: FFmpeg 4.0.2 + libvpx (tag 1.7.0)
AV1: AOM source code (commit d14c5bb4f336ef1842046089849dee4a301fbbf0, v1.0.0)
Table 1 Encoder versions used in the experiment
Configuration of coding parameters
We evaluate each encoder under two rate-control modes: CRF and ABR. Table 2 lists the CRF values used for each encoder. Each video is encoded at 6 CRF values, and the bitrate produced by each CRF encode is then used as the target bitrate for the corresponding 2-pass ABR encode.
x264/x265: 19, 23, 27, 31, 35, 39
VP9/AV1: 27, 33, 39, 45, 51, 57
Table 2 CRF configuration
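The two-step procedure described above (encode at each CRF value in Table 2, then reuse the resulting bitrate as the 2-pass ABR target) can be sketched for x264 as follows. The flags are taken from the x264-high row of Table 3; the helper names, file names, and the choice to derive the bitrate from file size and duration are assumptions. A file-size estimate includes container overhead, so the video stream's bitrate as reported by ffprobe would be a tighter target. The functions only build argument lists, which could then be passed to `subprocess.run(cmd, check=True)`.

```python
# Hypothetical helpers; flags copied from the x264-high row of Table 3.
X264_COMMON = [
    "-c:v", "libx264", "-pix_fmt", "yuv420p", "-profile:v", "high",
    "-preset", "veryslow", "-threads", "1", "-refs", "4",
    "-g", "60", "-keyint_min", "60", "-sc_threshold", "0", "-an",
]
X264_CRF_VALUES = [19, 23, 27, 31, 35, 39]  # Table 2, x264/x265 row


def crf_command(src, crf, out):
    """Single CRF encode at one of the Table 2 values."""
    return ["ffmpeg", "-i", src, *X264_COMMON, "-crf", str(crf), "-f", "mp4", out]


def target_kbps(size_bytes, duration_s):
    """Average bitrate of the CRF output in kbit/s (includes container overhead)."""
    return size_bytes * 8 / duration_s / 1000


def abr_commands(src, kbps, out):
    """2-pass ABR encode targeting the bitrate measured from the CRF run."""
    rate = ["-b:v", f"{round(kbps)}k"]
    # Pass 1 only gathers statistics, so its output is discarded.
    pass1 = ["ffmpeg", "-i", src, *X264_COMMON, *rate,
             "-pass", "1", "-f", "null", "-"]
    pass2 = ["ffmpeg", "-i", src, *X264_COMMON, *rate,
             "-pass", "2", "-f", "mp4", out]
    return pass1, pass2


if __name__ == "__main__":
    for crf in X264_CRF_VALUES:
        print(" ".join(crf_command("input.mp4", crf, f"crf{crf}.mp4")))
```

For example, a 1,250,000-byte CRF output of a 20 s clip gives `target_kbps(1_250_000, 20.0) == 500.0`, so both ABR passes would use `-b:v 500k`.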
The exact command line used for each encoder is shown in Table 3.
x264-high
CRF: ffmpeg -i -c:v libx264 -pix_fmt yuv420p -profile:v high -preset veryslow -crf -threads 1 -refs 4 -g 60 -keyint_min 60 -sc_threshold 0 -an -f mp4
ABR pass 1: ffmpeg -i -c:v libx264 -pix_fmt yuv420p -profile:v high -preset veryslow -b:v -threads 1 -refs 4 -g 60 -keyint_min 60 -sc_threshold 0 -pass 1 -an -f null -
ABR pass 2: ffmpeg -i -c:v libx264 -pix_fmt yuv420p -profile:v high -preset veryslow -b:v -threads 1 -refs 4 -g 60 -keyint_min 60 -sc_threshold 0 -pass 2 -an -f mp4
x265-main
CRF: ffmpeg -i -c:v libx265 -pix_fmt yuv420p -profile:v main -preset veryslow -crf -x265-params pools=none:frame-threads=1:scenecut=0:keyint=60:min-keyint=60:ref=4 -an -f mp4
ABR pass 1: ffmpeg -i -c:v libx265 -pix_fmt yuv420p -profile:v main -preset veryslow -b:v -x265-params pools=none:frame-threads=1:scenecut=0:keyint=60:min-keyint=60:ref=4 -pass 1 -an -f null -
ABR pass 2: ffmpeg -i -c:v libx265 -pix_fmt yuv420p -profile:v main -preset veryslow -b:v -x265-params pools=none:frame-threads=1:scenecut=0:keyint=60:min-keyint=60:ref=4 -pass 2 -an -f mp4
VP9
CRF: ffmpeg -i -c:v libvpx-vp9 -pix_fmt yuv420p -crf -b:v 0 -speed 1 -tile-columns 0 -frame-parallel 0 -auto-alt-ref 1 -lag-in-frames 25 -g 60 -keyint_min 60 -an -f webm
ABR pass 1: ffmpeg -i -c:v libvpx-vp9 -pix_fmt yuv420p -speed 1 -b:v -tile-columns 0 -frame-parallel 0 -auto-alt-ref 1 -lag-in-frames 25 -g 60 -keyint_min 60 -pass 1 -an -f null -
ABR pass 2: ffmpeg -i -c:v libvpx-vp9 -pix_fmt yuv420p -speed 1 -b:v -tile-columns 0 -frame-parallel 0 -auto-alt-ref 1 -lag-in-frames 25 -g 60 -keyint_min 60 -pass 2 -an -f webm
AV1
aomenc --i420 --codec=av1 --cpu-used=1 --width= --height= --threads=0 --profile=0 --lag-in-frames=19 --auto-alt-ref=1 --min-q=0 --max-q=63 --kf-max-dist=60 --kf-min-dist=60 --drop-frame=0 --static-thresh=0 --bias-pct=50 --minsection-pct=0 --maxsection-pct=2000 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0 --undershoot-pct=100 --overshoot-pct=100 --tile-columns=0 --frame-parallel=0 --test-decode
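The AV1 row of Table 3 lists only the shared aomenc settings and does not show how the CRF and ABR modes were selected. In aomenc, a CRF-like mode is normally expressed as `--end-usage=q` with `--cq-level` (0 to 63, matching the AV1 values in Table 2), and a bitrate target as `--end-usage=vbr` with `--target-bitrate` (in kbps) and `--passes=2`. The sketch below assumes that mapping, which the article does not confirm, and appends output and input paths. `aomenc_command` is a hypothetical helper, and `--test-decode` from the table is left out because it only checks for encoder/decoder mismatch.

```python
# Shared flags from the AV1 row of Table 3 (--test-decode omitted).
AOMENC_COMMON = [
    "--i420", "--codec=av1", "--cpu-used=1", "--threads=0", "--profile=0",
    "--lag-in-frames=19", "--auto-alt-ref=1", "--min-q=0", "--max-q=63",
    "--kf-max-dist=60", "--kf-min-dist=60", "--drop-frame=0",
    "--static-thresh=0", "--bias-pct=50", "--minsection-pct=0",
    "--maxsection-pct=2000", "--arnr-maxframes=7", "--arnr-strength=5",
    "--sharpness=0", "--undershoot-pct=100", "--overshoot-pct=100",
    "--tile-columns=0", "--frame-parallel=0",
]


def aomenc_command(src_yuv, out_ivf, width, height, cq_level=None, target_kbps=None):
    """Build aomenc arguments for either constant quality or 2-pass VBR.

    Exactly one of cq_level / target_kbps must be given.
    """
    if (cq_level is None) == (target_kbps is None):
        raise ValueError("pass exactly one of cq_level or target_kbps")
    args = ["aomenc", *AOMENC_COMMON, f"--width={width}", f"--height={height}"]
    if cq_level is not None:
        # Assumed CRF equivalent: constant-quality mode at a Table 2 value.
        args += ["--end-usage=q", f"--cq-level={cq_level}"]
    else:
        # Assumed ABR equivalent: two-pass VBR at the bitrate from the CRF run.
        args += ["--end-usage=vbr", f"--target-bitrate={round(target_kbps)}",
                 "--passes=2"]
    return args + ["-o", out_ivf, src_yuv]
```

Because raw I420 input carries no header, the width and height must be supplied per video, which is why they are parameters here rather than part of the shared flag list.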