All about... Apple's 'cinematic mode' video capture


Shooting video on our phones is something we all do from time to time, usually when out and about with family - especially with kids, who you want to capture as photogenically as possible at every age, so that you can look back when they're older. And, off to one side in Apple's tweaked Camera application for the iPhone 13 range, is a new video capture mode that aims to please. With this in mind, and with a superb example clip from an iPhone 13 owner, here are my thoughts on the new system, including how it actually works.

So yes, ‘cinematic video’. I’ve been curious as to exactly what Apple has done here - I’m not convinced that most iPhone 13 reviewers really understand it. ‘Cinematic’ appears in the Camera UI as another mode (to the left of 'Video' in the mode carousel) and it’s limited to 1080p at 30fps, which gives a clue that this is a baby step. This somewhat archaic limit, in 2021, is down to the sheer amount of processing involved. No doubt next year’s A16 Bionic chip will herald ‘cinematic’ capture in 4K, but we’re limited for now.

Who is it for?

It’s all very well for Apple to wheel out famous directors to show this mode off, but speaking as a video professional of sorts - I’ve been shooting video on phones for 15 years - cinematic mode isn’t really aimed at anyone who already knows what they’re doing. 

Although I mainly do head shots, usually with precise focus, for The Phones Show, I’ve done some arty shoots too, and in each case I plan ahead with a storyboard - see my brief tutorial. I know what the focus of each video clip should be and I tap to focus as needed, knowing that focus is then not going to waver even as the framing changes.

Then it’s on to the next clip, with its own specific focus. In each case I rely on natural bokeh to do some separation of subject from background, but also on my experience to choose backgrounds that don’t distract from the foreground.

And the more professional a videographer is - more pro than me, anyway - the more they’ll use larger lenses, proper cameras, dollies, light sources and reflectors, and so on. The framing, focus, and bokeh in each case are imagined in the director’s or professional’s mind, and implemented by traditional means. Even sometimes on phones.

So real pros aren’t going to give an AI video director the time of day. But Apple has put in cinematic mode for the man in the street - for someone who fell in love with taking Portrait stills and who can now do something… similar, but with video.

How does it work?

The flagship 'effect', used over and over again in Apple's promo videos (shown off to exaggerated effect in the clip above), is the recognition of when a subject's 'gaze' switches, imitating what’s called ‘rack focus’ (moving focus from one subject to another in a way that looks deliberate and planned) - and this is a good place to start. Now, note that you can already tap to change focus when shooting video on most phones - I did a two-minute demo in the garden (horribly downsampled by Twitter, but you'll get the idea):

OK, a little snarky there, but do bear in mind that I was referring to just one aspect of 'cinematic video'. The change in focus with modern phone cameras is almost instant, helped on the iPhone 12 Pro Max used here by its LiDAR system. Impressively fast, you'd think. But that’s actually the problem, in terms of 'art'. Auto-focus and manual focus changes are so fast on modern phones that the visual effect can be jarring for the viewer. In professional video and, well, cinema, 'rack focus' is used to guide the viewer’s eyes from one thing to another: slowly, smoothly, and obviously.
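To make that concrete: the whole point of a rack focus is that the focal distance is interpolated over a second or so, not snapped. Here's a minimal Swift sketch of that easing idea - nothing Apple-specific, just the maths that a deliberate focus pull implies:

```swift
import Foundation

// A deliberate 'rack focus' interpolates focal distance over time rather than
// snapping. Smoothstep easing gives the slow-in, slow-out feel of a focus pull.
func focusDistance(at elapsed: TimeInterval,
                   from start: Double,       // starting focus distance, metres
                   to end: Double,           // target focus distance, metres
                   over duration: TimeInterval) -> Double {
    let t = max(0.0, min(1.0, elapsed / duration))  // clamp progress to 0...1
    let eased = t * t * (3.0 - 2.0 * t)             // smoothstep curve
    return start + (end - start) * eased
}

// Example: a one-second pull from 1.2 m to 4.0 m, sampled at 30 fps.
for frame in 0...30 {
    let d = focusDistance(at: Double(frame) / 30.0, from: 1.2, to: 4.0, over: 1.0)
    print(String(format: "frame %02d: %.2f m", frame, d))
}
```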

From my investigation, Apple’s Cinematic Mode is actually a bundle of five AI functions that work together to mimic professional video traits:

  • Subject recognition and tracking, typically human heads and shoulders

  • Focus locking on said subjects

  • Slow ‘rack focus’ changes from one identified subject to another

  • Artificial shallow depth of field, i.e. bokeh (think portrait mode, but for video)

  • Examining beyond the current video frame crop on the sensor, to identify subjects about to come into frame

Plus a real-time depth map is compiled, frame by frame, using LiDAR data (on the 'Pro' iPhone 13 devices) for nearby subjects, and this is all stored within the video file.
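Pulling those five functions together, here's a hypothetical per-frame loop - every type and name below is illustrative, my guess at how the pieces might interlock, not Apple's actual implementation:

```swift
import Foundation

// Hypothetical types; all names are illustrative, not Apple's actual API.
typealias DepthMap = [[Float]]   // per-pixel distance from the camera, metres

struct Subject {
    let id: Int
    let distance: Double         // metres, from the depth map
    let isGazingAtCamera: Bool   // simplistic stand-in for gaze detection
}

struct Frame {
    let subjects: [Subject]      // detected/tracked heads and shoulders,
                                 // including ones entering from off-crop
    let depth: DepthMap          // LiDAR-assisted, saved with the video
}

// Keep focus locked on the current subject; when another subject takes the
// 'gaze', hand focus over (slowly, via a rack as sketched above) and key the
// software bokeh to the returned focal distance.
func focalPlane(for frame: Frame, currentFocus: inout Subject?) -> Double {
    if let newTarget = frame.subjects.first(where: { $0.isGazingAtCamera }),
       newTarget.id != currentFocus?.id {
        currentFocus = newTarget   // trigger a slow rack to the new subject
    }
    return currentFocus?.distance ?? 2.0   // fall back to a mid-range plane
}
```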

Ignoring the 1080p limitation, there’s still an awful lot of kludging going on, mind you. Although actual lens focus can change according to subject, for most purposes cinematic mode uses 'hyperfocal' techniques, which means relying on most things in frame being - optically - in focus most of the time. How much this holds will depend on subjects and lighting conditions, but it's a fair starting point for most real-world video.
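For reference, 'hyperfocal' isn't magic - it's a standard optics formula: focus the lens at distance H = f²/(N·c) + f and everything from H/2 to infinity is acceptably sharp. In Swift, with rough, illustrative numbers for a phone-sized sensor:

```swift
// Standard hyperfocal distance formula (textbook optics, not Apple-specific):
//   H = f² / (N · c) + f
// f = actual focal length, N = f-number, c = circle of confusion (all in mm).
func hyperfocalDistance(focalLength f: Double,
                        aperture n: Double,
                        circleOfConfusion c: Double) -> Double {
    (f * f) / (n * c) + f   // result in mm
}

// Illustrative numbers only: a ~5.7 mm f/1.5 phone main camera, with a tiny
// sensor and hence a tiny acceptable blur circle (c ≈ 0.004 mm):
let h = hyperfocalDistance(focalLength: 5.7, aperture: 1.5, circleOfConfusion: 0.004)
print(String(format: "Hyperfocal ≈ %.1f m", h / 1000))
// ≈ 5.4 m, so everything from ~2.7 m to infinity stays optically sharp
```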

Then, according to the effects being implemented and the depth map, the relevant parts of the frame can be progressively blurred in software. Even at just 30fps that’s an awful lot of calculations for the phone’s chipset to crunch through if the blurring is to look natural enough to be convincing. Effectively it’s computing 30 Portrait mode photos per second (ouch)!
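That 'ouch' is easy to quantify. A back-of-envelope sum shows why the 1080p30 cap exists, and why 4K is roughly four times the work:

```swift
// Back-of-envelope: per-pixel depth/blur decisions the chipset must make.
let pixelsPer1080pFrame = 1920 * 1080            // ≈ 2.07 million pixels
let perSecond1080p = pixelsPer1080pFrame * 30    // ≈ 62 million decisions/s
let perSecond4K = 3840 * 2160 * 30               // ≈ 249 million, 4x the load
print(perSecond1080p, perSecond4K)               // 62208000 248832000
```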

The gaze focus transferral trick is cool, but you’re not going to find much use for that in daily home video. Much more interesting and useful will be the artificial shallow depth of field, helping your subjects really stand out, whatever the background. Danny Winget, a prominent YouTuber and videographer, shot his family in Disneyland with lots of cinematic video - I think you'll be as impressed as I was:

The effect is rarely perfect now, but it’s still striking and you just know that it’s going to get better with updates and with next year’s processors to power it all.

Given that this is all software and data, rather than real time optics, you can also tweak the centre of artificial focus for each bit of footage after the fact, thanks to the stored depth map per frame, and you can even adjust the virtual aperture to increase or decrease the level of bokeh.
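Conceptually, that after-the-fact editing is just re-running the blur with new parameters, since every frame already carries its depth map. A sketch of the idea (hypothetical names again, not Apple's editing API):

```swift
// Hypothetical sketch of post-capture refocusing. Because depth is stored per
// frame, 'changing focus' in edit is a re-render, not a re-shoot.
struct StoredFrame {
    let depths: [Double]   // per-pixel depth in metres, saved at capture time
}

// Compute how strongly each pixel should be defocused for a newly chosen
// focal plane and virtual aperture (a lower f-number means more bokeh);
// a real renderer would feed these radii into a variable-width blur pass.
func blurRadii(for frame: StoredFrame,
               focalPlane: Double,
               aperture: Double) -> [Double] {
    frame.depths.map { depth in
        let defocus = abs(depth - focalPlane) / max(depth, 0.1)
        return min(25.0, defocus * 40.0 / aperture)   // capped radius, pixels
    }
}

// Same footage, two different 'directorial' choices after the fact:
let frame = StoredFrame(depths: [1.2, 1.3, 4.0, 9.5])
let focusNear = blurRadii(for: frame, focalPlane: 1.2, aperture: 2.0)
let focusFar  = blurRadii(for: frame, focalPlane: 4.0, aperture: 2.0)
```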

The purist in me shudders at the thought of all this computation taking the place of hand-crafted optical planning, but the pragmatist in me agrees that many people will try this and love the portrait look of their video subjects.

In fact, when Apple has perfected this and has the oomph in their phones to do it at 4K, then I reckon this won’t be a separate shooting mode in Camera. It will be a toggle in the main Video mode. So you’d shoot as-is or with ‘Cinematic AI’ on, depending on subject.

Just a prediction.