Liatris's Magic Palace: Rubiks: Practical 360-Degree Video Streaming for Smartphones

This post covers "Rubiks: Practical 360-Degree Video Streaming for Smartphones," a paper on how to deliver 360-degree video efficiently over bandwidth-limited networks.
It was published in the proceedings of MobiSys '18.

The authors are Jian He, Mubashir Adnan Qureshi, Lili Qiu, Jin Li, Feng Li, and Lei Han.

A typical 360-degree streaming system is structured as shown in the figure above.

The server sends the video over Wi-Fi or another wireless link to the VR headset, which receives the video data and plays it back.

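Unlike ordinary 2D video, however, a 360-degree video has to cover the entire sphere around the viewer, while the viewer only sees a fraction of it at any moment. As a result, it needs a resolution of at least 4K to 8K just to look about as good as an ordinary low-quality video.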
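To see why such high resolutions are needed, here is a back-of-the-envelope calculation. It assumes an equirectangular frame and a 90-degree horizontal field of view (both illustrative assumptions, not values taken from the paper) and computes how many horizontal pixels actually land inside the viewport.

```python
def pixels_in_view(frame_width_px: int, fov_deg: float) -> float:
    """Horizontal pixels that cover `fov_deg` of a 360-degree equirectangular frame."""
    return frame_width_px * fov_deg / 360.0


for name, width in [("4K", 3840), ("8K", 7680)]:
    # Assumed 90-degree horizontal FoV; real headsets vary.
    print(f"{name}: about {pixels_in_view(width, 90):.0f} px across a 90-degree view")
```

Under these assumptions, a 4K panorama leaves only about 960 pixels across the visible region, well below 1080p, and even 8K leaves about 1920, roughly Full HD.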

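Transmitting and decoding data at this resolution is very hard with typical Wi-Fi links and smartphone hardware. Rubiks is the method this paper proposes to reduce the amount of data that has to be sent while also improving quality.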

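Existing ways of streaming 360-degree video include streaming the whole video (the approach used by current services such as YouTube), FoV-based streaming, which sends only the tiles inside the user's field of view (FoV), and FoV+ streaming.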

FoV+ streaming sends not only the tiles in the FoV but also the tiles surrounding it.
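The difference between FoV and FoV+ streaming comes down to which tiles get requested. The following is a minimal sketch, assuming a hypothetical 6x6 equirectangular tile grid and a 90x90-degree viewport; the function `select_tiles` and its `margin_deg` parameter are illustrative names, not from the paper. Full streaming would simply request all `rows * cols` tiles.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (row, col) in a hypothetical rows x cols grid


def yaw_distance(a: float, b: float) -> float:
    """Smallest angle between two yaw values in degrees, handling wrap-around at +/-180."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_tiles(center_yaw: float, center_pitch: float,
                 rows: int = 6, cols: int = 6,
                 fov_yaw: float = 90.0, fov_pitch: float = 90.0,
                 margin_deg: float = 0.0) -> List[Tile]:
    """Tiles whose centers lie inside the viewport, optionally enlarged by a margin.

    margin_deg == 0 models plain FoV streaming; margin_deg > 0 models FoV+,
    which also fetches the tiles surrounding the viewport. Testing only tile
    centers (not partial overlap) is a simplification.
    """
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    half_w = fov_yaw / 2 + margin_deg
    half_h = fov_pitch / 2 + margin_deg
    selected = []
    for r in range(rows):
        for c in range(cols):
            yaw = -180.0 + (c + 0.5) * tile_w   # tile center yaw, in [-180, 180)
            pitch = 90.0 - (r + 0.5) * tile_h   # tile center pitch, top row first
            if (yaw_distance(yaw, center_yaw) <= half_w
                    and abs(pitch - center_pitch) <= half_h):
                selected.append((r, c))
    return selected


fov = select_tiles(0, 0)                       # plain FoV
fov_plus = select_tiles(0, 0, margin_deg=45)   # FoV+
print(len(fov), len(fov_plus))
```

For a viewport centered at yaw 0 and pitch 0, `select_tiles(0, 0)` returns 8 tiles and `select_tiles(0, 0, margin_deg=45)` returns 24, a superset of those 8; the extra 16 tiles are the bandwidth FoV+ spends as insurance against head-movement prediction errors.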

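However, these existing approaches generally either compromise on video quality or pay a price in decoding time or bandwidth. Streaming everything spends bandwidth and decoding effort on regions the user never looks at, while streaming only the FoV risks showing missing or low-quality regions when the head-movement prediction is wrong.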

Rubiks is the approach this paper proposes to solve this problem.

Rubiks is a hybrid scheme designed to strike a reasonable balance between quality and bandwidth in 360-degree video streaming.

About Rubiks

At its core, Rubiks is an FoV+-based 360-degree streaming scheme.

It builds on a tile-based approach that divides the 360-degree video into tiles for playback, and it also includes components for FoV prediction and video compression.

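The tile-based part consists of temporal splitting and spatial splitting.

As the names suggest, spatial splitting divides the 360-degree video into multiple tiles, while temporal splitting groups the frames, which would normally be sent one after another in time order, into units and transmits them in a rearranged order rather than strictly chronologically.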
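Below is a rough sketch of the two splitting steps. The spatial part just computes pixel rectangles for a tile grid. For the temporal part, the description above only says that frames are grouped and sent out of chronological order, so this sketch assumes a hypothetical interleaving scheme (layer k holds every L-th frame starting at frame k). The paper's actual layer construction may differ, and a real encoder would also have to respect HEVC's inter-frame dependencies.

```python
from typing import Dict, List, Tuple


def spatial_split(width: int, height: int, rows: int, cols: int) -> List[Tuple[int, int, int, int]]:
    """Pixel rectangles (x, y, w, h) for a rows x cols tiling of one frame.

    Assumes width and height are divisible by cols and rows.
    """
    tw, th = width // cols, height // rows
    return [(c * tw, r * th, tw, th) for r in range(rows) for c in range(cols)]


def temporal_split(num_frames: int, num_layers: int) -> Dict[int, List[int]]:
    """Hypothetical interleaved layering: layer k holds frames k, k+L, k+2L, ..."""
    return {k: list(range(k, num_frames, num_layers)) for k in range(num_layers)}


def transmission_order(layers: Dict[int, List[int]]) -> List[int]:
    """Send all of layer 0 first, then layer 1, and so on, instead of chronological order."""
    return [f for k in sorted(layers) for f in layers[k]]


tiles = spatial_split(3840, 1920, rows=6, cols=6)
print(len(tiles), tiles[7])
print(transmission_order(temporal_split(8, 2)))
```

For an 8-frame chunk split into 2 layers, the transmission order is 0, 2, 4, 6, 1, 3, 5, 7, so a client that only receives layer 0 still has frames spread across the whole chunk, just at half the frame rate.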

For video rate adaptation, Rubiks uses an optimization model driven by the network bandwidth, the predicted FoV, and the number of video chunks.
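The optimization can be illustrated with a much simpler stand-in. This sketch is not Rubiks' formulation: it assumes a single chunk, a per-tile probability of being viewed, and a fixed set of quality levels, and it greedily upgrades whichever tile gives the largest expected quality gain per extra bit until the bandwidth budget is used up. All names and parameters are hypothetical, and the greedy rule is a heuristic, not a guaranteed optimum.

```python
from typing import List, Sequence


def allocate_quality(view_prob: Sequence[float],
                     bits_per_level: Sequence[float],
                     quality_per_level: Sequence[float],
                     budget_bits: float) -> List[int]:
    """Greedy per-tile quality selection (simplified stand-in, not Rubiks' exact model).

    view_prob[i]          probability that tile i falls inside the future FoV
    bits_per_level[q]     size of one tile at quality level q (strictly ascending)
    quality_per_level[q]  quality score (e.g. SSIM) at level q (ascending)
    budget_bits           predicted throughput x time available for the download
    """
    n = len(view_prob)
    levels = [0] * n
    used = bits_per_level[0] * n
    if used > budget_bits:
        raise ValueError("budget cannot cover even the lowest quality for every tile")
    top = len(bits_per_level) - 1
    while True:
        best, best_ratio = -1, 0.0
        for i in range(n):
            q = levels[i]
            if q == top:
                continue  # already at the highest quality
            extra_bits = bits_per_level[q + 1] - bits_per_level[q]
            if used + extra_bits > budget_bits:
                continue  # this upgrade would exceed the budget
            gain = view_prob[i] * (quality_per_level[q + 1] - quality_per_level[q])
            ratio = gain / extra_bits  # expected quality gained per extra bit
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best < 0:
            return levels  # no affordable upgrade with positive expected gain
        used += bits_per_level[levels[best] + 1] - bits_per_level[levels[best]]
        levels[best] += 1


print(allocate_quality([0.9, 0.5, 0.1, 0.0], [1, 3, 6], [0.80, 0.90, 0.95], 10))
```

With a budget of 10, the result is `[1, 1, 1, 0]`: likely-viewed tiles are upgraded first, and the tile with zero viewing probability stays at the lowest level.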

According to the paper's measurements, the quantization parameter (QP), which controls compression, is closely related to SSIM, and SSIM in turn closely tracks video quality, so Rubiks uses SSIM as its quality metric.
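For reference, SSIM compares two images through their means, variances, and covariance. The sketch below computes a single global SSIM value over two flattened grayscale images using the standard constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2, where L is the pixel value range. Standard SSIM instead averages this score over local (often Gaussian-weighted) windows, so treat this as a simplification.

```python
from statistics import fmean
from typing import Sequence


def ssim_global(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window SSIM between two flattened grayscale images of equal size.

    Real SSIM slides a local window over the image and averages the
    per-window scores; this global version is a simplification.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2 pixels)")
    c1 = (0.01 * data_range) ** 2   # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizes the contrast/structure term
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) * (a - mx) for a in x])
    var_y = fmean([(b - my) * (b - my) for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (var_x + var_y + c2))


original = [10, 20, 30, 40, 50]
distorted = [12, 18, 33, 39, 55]   # e.g. after coarser quantization
print(ssim_global(original, original), ssim_global(original, distorted))
```

An identical image scores exactly 1; distortion such as coarser quantization (a higher QP) pushes the score below 1.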

The system architecture follows a traditional server-client model.

The server handles tiled video encoding and streams the data, while the client handles FoV prediction, network throughput prediction, and the optimization.

Client side structure

The client side predicts the FoV and network throughput, runs the optimization, and generates requests.

Tracking and predicting head position
Network throughput predictor - continually monitors the network throughput while downloading video data
Video request optimizer - runs the optimization given the predicted head position and throughput
Video downloader
Video decoder
Video player
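The two predictors in the list above can be sketched as follows. This assumes a harmonic-mean throughput predictor over the last few samples (a common choice in adaptive bitrate streaming) and a least-squares linear extrapolation of head yaw. The paper's exact predictors may differ, and the function names and window size are hypothetical.

```python
from statistics import harmonic_mean
from typing import Sequence


def predict_throughput(samples_mbps: Sequence[float], window: int = 5) -> float:
    """Harmonic mean of the last `window` throughput samples (assumed predictor).

    The harmonic mean is pulled down by low samples, which keeps the estimate
    conservative when throughput fluctuates.
    """
    recent = list(samples_mbps)[-window:]
    if not recent:
        raise ValueError("need at least one throughput sample")
    return harmonic_mean(recent)


def predict_yaw(times_s: Sequence[float], yaws_deg: Sequence[float], horizon_s: float) -> float:
    """Fit yaw = a + b * t by least squares and extrapolate `horizon_s` past the last sample.

    Assumes the yaw samples are already unwrapped (no jump from +180 to -180
    inside the window); the result is wrapped back into [-180, 180).
    """
    n = len(times_s)
    if n < 2 or n != len(yaws_deg):
        raise ValueError("need at least two (time, yaw) samples of equal length")
    mt = sum(times_s) / n
    my = sum(yaws_deg) / n
    sxx = sum((t - mt) ** 2 for t in times_s)
    sxy = sum((t - mt) * (y - my) for t, y in zip(times_s, yaws_deg))
    slope = sxy / sxx                      # angular velocity in degrees per second
    t_future = times_s[-1] + horizon_s
    yaw = my + slope * (t_future - mt)
    return (yaw + 180.0) % 360.0 - 180.0


print(predict_throughput([8.0, 6.0, 2.0, 7.0, 9.0]))
print(predict_yaw([0.0, 0.1, 0.2, 0.3], [0.0, 3.0, 6.0, 9.0], horizon_s=1.0))
```

A head turning at 30 degrees per second is predicted to be at about 39 degrees one second after the last sample at 9 degrees.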

Server side structure

The server side encodes the video and streams data in response to client requests.

Video layer & tile extractor - splits the video temporally (into layers) and spatially (into tiles)
Video encoder - encodes the video with HEVC (using the Kvazaar encoder)
Video database
Tile manager & video request handler - manages the video tiles and sends them to the client
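As a rough illustration of how the video database, tile manager, and request handler fit together, here is an in-memory stand-in. It assumes, hypothetically, that each encoded piece is keyed by chunk, tile, layer, and quality level; the real system's storage layout and wire protocol are not described here, and `TileRequest` and `TileStore` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TileRequest:
    """Hypothetical key identifying one encoded piece of video (hashable)."""
    chunk: int    # index of the video chunk in time
    tile: int     # spatial tile index (e.g. 0..35 for a 6x6 grid)
    layer: int    # temporal layer produced by the layer & tile extractor
    quality: int  # encoding quality level


class TileStore:
    """In-memory stand-in for the video database plus the tile manager/request handler."""

    def __init__(self) -> None:
        self._segments: Dict[TileRequest, bytes] = {}

    def put(self, key: TileRequest, data: bytes) -> None:
        """Store one encoded segment (in the real system, output of the HEVC encoder)."""
        self._segments[key] = data

    def handle(self, requests: List[TileRequest]) -> List[bytes]:
        """Return the requested segments in request order, or fail if any is missing."""
        missing = [r for r in requests if r not in self._segments]
        if missing:
            raise KeyError(f"segments not available: {missing}")
        return [self._segments[r] for r in requests]


store = TileStore()
store.put(TileRequest(chunk=0, tile=5, layer=0, quality=2), b"...encoded bytes...")
print(store.handle([TileRequest(0, 5, 0, 2)]))
```

Keeping the key immutable (`frozen=True`) is what lets it serve directly as a dictionary key.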

Video evaluation

Video quality was measured with SSIM, and the load on the network and the decoder was evaluated using 4K and 8K videos.

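The tile configuration (tile conf.) results shown above measure how long decoding takes when, out of 36 video tiles, the number of tiles given by each configuration is transmitted three times per second. According to the experiments, 25-20-20 tiles is effectively the limit; beyond that, decoding takes longer than playback.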
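The decoding limit above boils down to a simple check: decoding a chunk's tiles must finish before that chunk's playback time runs out. This sketch assumes a fixed per-tile decoding time and a given number of parallel decoders; the numbers in the example are made up for illustration and are not measurements from the paper.

```python
import math


def decode_fits(tiles_per_chunk: int, decode_ms_per_tile: float,
                chunk_duration_ms: float, parallel_decoders: int = 1) -> bool:
    """True if all tiles of one chunk can be decoded within the chunk's playback time.

    Assumes every tile takes the same time to decode and that up to
    `parallel_decoders` tiles are decoded at once.
    """
    rounds = math.ceil(tiles_per_chunk / parallel_decoders)
    return rounds * decode_ms_per_tile <= chunk_duration_ms


def max_tiles(decode_ms_per_tile: float, chunk_duration_ms: float,
              parallel_decoders: int = 1) -> int:
    """Largest tile count per chunk for which decode_fits(...) is still True."""
    return int(chunk_duration_ms // decode_ms_per_tile) * parallel_decoders


# Hypothetical numbers: 30 ms per tile, 1-second chunks, a single decoder.
print(max_tiles(30, 1000), decode_fits(34, 30, 1000))
```

With these made-up numbers the ceiling is 33 tiles per chunk; a 34th tile would make decoding fall behind playback, which is exactly the failure mode the tile configuration experiment probes.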