
As a Videographer, Should You Shoot in 4:2:0, 4:2:2, or Raw? A Primer on Chroma Subsampling


Dedicated video cameras and even hybrid cameras now offer a plethora of formats for shooting video, but what is the difference between 4:2:0, 4:2:2, and raw, and how will your choice affect your footage? A grasp of the technical details can help you make an informed decision about which format to use.

The profusion of recording formats offered in your camera’s video menu can sometimes feel overwhelming, but if you take nothing else away from this article, here’s the essence of it: depending on factors like the resolution, color depth, and frame rate of your footage, operations such as encoding and decoding your video, editing it, or storing and retrieving it can be extremely data-intensive, time-consuming, and computationally demanding. The rather cryptically named video formats mentioned in the title of this article are a response to this problem, offering videographers a range of options for trading off the quality of their footage against the amount of information (or data) required to store it.

The Big Trade-Off: Picture Quality Versus Data Storage

In general, the highest quality video formats will require the most data for storage, while the lowest quality formats will require the least. The amount of data required to store your video footage can have important practical consequences. The larger datasets generated by the high-quality video formats create bigger files that come with some potential downsides. Larger volumes of digital storage media are required to store the footage, and the longer write times for these high-quality video files can also impose limits on your camera’s ability to capture footage. Choosing a higher quality video format could, for example, force you to shoot at a lower frame rate and/or resolution in order to allow the camera’s data pipeline to keep up.

And the problems don’t end at the camera.

Once you have transferred these larger files to your computer for editing, the time and computational resources required to read and process them are correspondingly greater, and your computer might even struggle to complete these tasks at all if it lacks the memory or processing power to handle such large datasets.

On the plus side, higher quality video formats will give you—well… higher quality footage (obviously)—but they can also offer an easier workflow and superior results within the editing suite, something we will discuss later.

A simple example of this kind of trade-off between picture quality and file size that everybody understands is bit depth. The pixels on a digital camera sensor will have a specified bit depth for encoding colors when recording video or stills. A pixel with an 8-bit color depth can record 2⁸, or 256, levels for each of the red, green, and blue (RGB) channels, for a total of about 16.7 million colors. A camera sensor that offers 12-bit color depth, by contrast, can record 4,096 levels per channel, or about 68.7 billion colors. The color rendition of the 12-bit sensor will obviously be far superior in most circumstances to that of the 8-bit sensor, but a 20-megapixel image captured with the 12-bit sensor will require 90 megabytes to store the color data, whereas the 8-bit sensor requires 60 megabytes (assuming no image compression in either case).
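To make those figures concrete, here is a minimal sketch of the arithmetic behind them. The function name is just illustrative, and it assumes decimal megabytes with no metadata or compression overhead.

```python
# Rough storage needed for the color data of an uncompressed image,
# ignoring metadata and container overhead (decimal megabytes assumed).
def uncompressed_color_mb(megapixels: float, bits_per_channel: int, channels: int = 3) -> float:
    total_bits = megapixels * 1_000_000 * channels * bits_per_channel
    return total_bits / 8 / 1_000_000  # bits -> bytes -> megabytes

print(uncompressed_color_mb(20, 8))   # 60.0 MB for the 8-bit, 20-megapixel case
print(uncompressed_color_mb(20, 12))  # 90.0 MB for the 12-bit, 20-megapixel case
```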

The 8-bit color image below (courtesy of Wikipedia) nicely illustrates the kind of trade-off between quality and file size that we have been discussing. In the blue background of the sky, you can clearly see an example of the banding phenomenon that can be caused by the use of a shallower bit depth for color.

It’s pretty easy to understand how the choice of bit depth for your video image can affect the trade-off between image quality and file size, but what about these other, more mysteriously named video formats like 4:2:2 and 4:2:0?

The 90-Year-Old Color System That We Still Use Today

In order to understand these video formats, we need to take a step beyond the world of simple RGB color and look at a different system for encoding color, one that arose in the late 1930s when television engineers were starting to think about the introduction of color broadcasts. In a manner analogous to the way Microsoft in 1985 needed its new Windows operating system to be backward-compatible with all of the PCs already running its earlier Disk Operating System (DOS), television engineers recognized that during the transition period, while color television was still being introduced, the new color broadcasts would also need to be compatible with the black and white television sets that most people were still using. In 1938, a French engineer named Georges Valensi came up with an ingenious system for separating the black and white component of the picture from the color components. For the new color broadcasts, existing black and white televisions would simply use the black and white component of the signal, while the new color televisions would reconstruct a full-color image from that black and white component in combination with two additional color channels.

Despite its age, Valensi’s scheme lives on today in the form of YCbCr encoding, which is the foundation of our modern video formats (including the 4:2:2 and 4:2:0 formats that we will discuss here). Instead of separating a picture into red, green, and blue channels, the YCbCr system separates it into two broad components referred to as luma and chroma. The luma component (the Y in YCbCr) is essentially the black and white portion of the picture, while the chroma component consists of two color difference channels: a blue difference channel (Cb) and a red difference channel (Cr).
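For the curious, here is a minimal sketch of that split for a single pixel. It assumes full-range RGB values between 0 and 1 and the widely used BT.601 luma weights, neither of which the article specifies; real video standards add offsets and use slightly different coefficients depending on the format.

```python
# Split one RGB pixel into luma (Y) and the two chroma difference
# channels (Cb, Cr), using the classic BT.601 luma weighting.
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: the "black and white" picture
    cb = 0.564 * (b - y)                   # blue difference channel
    cr = 0.713 * (r - y)                   # red difference channel
    return y, cb, cr

print(rgb_to_ycbcr(1.0, 0.5, 0.25))  # a warm orange pixel, for example
```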

The truly ingenious aspect of this system is that it directly exploits the manner in which the human eye responds differently to luminance (light and dark tones) and color—and it does this in a very clever way that allows us to encode accurate video images using less information.

Because the human eye is more sensitive to luminance than to color, it is possible to encode color information at lower resolution and still be able to reconstruct an accurate picture. In the case of a broadcast television signal, this corresponds to the use of less bandwidth for the chroma (color) components than for the luma (luminance) component. For a digital video image, we can exploit this same space-saving concept by using a smaller fraction of our data to encode the lower-resolution chroma components.

But as the infomercial goes—wait, there’s more…

Our eyes are also more sensitive to the central green region of the visible spectrum than they are to colors closer to the red or blue ends of the spectrum. This means we can further reduce the amount of data we need to store for each video frame by keeping less red and blue color information than green color information, and this is exactly what YCbCr allows us to do.

But at this point, you might be asking, “Where exactly is the green channel information? We have the luma (black and white) component and chroma channels for the red and blue color differences. Are we throwing the green channel away?”

The answer is no.

Because our eyes are more sensitive to green, the green color information is preserved within the higher-resolution luma component. When the green color information is needed for the reconstruction of the original image in RGB, it can be readily extracted from the luma data.
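As a sketch of how that extraction works, the inverse of the BT.601-style conversion above recovers red and blue from the difference channels first, and green then falls straight out of the luma equation. Again, the exact coefficients are an assumption for illustration, not something the article specifies.

```python
# Reconstruct RGB from Y, Cb, Cr: red and blue come from the difference
# channels, and green is solved for from the luma definition
# Y = 0.299 R + 0.587 G + 0.114 B.
def ycbcr_to_rgb(y: float, cb: float, cr: float) -> tuple[float, float, float]:
    r = y + 1.402 * cr                       # undo the red difference
    b = y + 1.772 * cb                       # undo the blue difference
    g = (y - 0.299 * r - 0.114 * b) / 0.587  # green recovered via the luma
    return r, g, b
```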

Just as an aside—this heightened sensitivity of the human eye to green is also reflected in the layout of the colored filters in the Bayer matrix (or X-Trans if you’re using Fuji) that likely sits in front of your digital camera sensor and is used to reconstruct color from the pure luminance image that the sensor sees. If you look at the diagram below, you will see that there are two green filters on the Bayer matrix for each red or blue filter—weighting the green component of the image more heavily in accordance with the natural color response of our eyes.

Chroma Subsampling: A Clever Hack To Save On Data Storage

Because our eyes are more sensitive to the luminance of an image than to its colors, we can sacrifice some resolution in the color information, particularly in the blue and red channels, without compromising the accuracy of our image too much. This allows us to further reduce the amount of data needed to store the image, with corresponding benefits when it comes to sidestepping the problems with large datasets that we have already discussed. One way to achieve this reduction is to keep the chroma information from only some of the pixels, a method known as chroma subsampling.

Consider this array of 8 color pixels in the original image.

We can separate out the luma and chroma components of this pixel array like this.

Before we go forward, it is important to note that the CbCr pixels are shown here as single pixels combining the Cb and Cr channels, but in the YCbCr system each of them would actually be encoded as two separate Cb and Cr samples.

You will notice that there are two rows of pixels with four pixels in each row, and this is where the names of the chroma subsampling formats 4:2:0 and 4:2:2 come from. The first number is the width of the pixel block across which we are sampling colors—in this case, 4. The second number is the number of pixels whose colors we will sample in the first row. The third number is the number of pixels whose colors we will sample in the second row.

This next image shows these subsampling protocols more clearly and will help us to understand the details of each protocol.

In the 4:2:0 format, we sample two pixels in the first row of the CbCr array, pixels 1 and 3, and no pixels at all from the second row. Then we set pixels 1 and 2 in the first row to the value of pixel 1, and pixels 3 and 4 to the value of pixel 3. Since we did not sample any pixels in the second row, we simply set the value of each pixel in the second row to the value of the pixel above it in the first row. Adding back the luma channel gives us the result that we see at the bottom of the diagram.

From the diagram, you can see that with 4:2:0 subsampling, we are sacrificing half of our chroma resolution vertically and half horizontally.

In the 4:2:2 format, we sample two pixels in the first row—pixels 1 and 3—and the same two pixels from the second row. Then we set pixels 1 and 2 in the first row to the value of pixel 1, and pixels 3 and 4 to the value of pixel 3—but this time, since we also sampled two pixels in the second row, we can perform the equivalent operation for the pixels in the second row.

From the diagram, you can see that with the 4:2:2 subsampling we are sacrificing half of our chroma resolution horizontally but retaining all of our original vertical resolution.

But what about that 4:4:4 protocol in the third column?

You will notice that with 4:4:4 subsampling, we are using all of the CbCr values in each row and are therefore sacrificing no color resolution at all. The 4:4:4 protocol is what we call a lossless video encoding format, and if you had not already guessed it, 4:4:4 subsampling is more commonly referred to as raw.
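To tie the three protocols together, here is a toy sketch of the sampling-and-copying step described above, applied to one chroma channel of the 2 x 4 pixel block from the diagram. The function name and the list-of-lists block layout are purely illustrative.

```python
# Subsample one chroma channel (Cb or Cr) of a 2x4 block, then rebuild
# the block by copying each sampled value into the positions it covers.
def subsample_block(chroma, scheme):
    top, bottom = chroma
    if scheme == "4:2:0":
        # Sample pixels 1 and 3 of the top row only; the bottom row
        # reuses the value directly above it.
        new_top = [top[0], top[0], top[2], top[2]]
        new_bottom = list(new_top)
    elif scheme == "4:2:2":
        # Sample pixels 1 and 3 of both rows; vertical detail is kept.
        new_top = [top[0], top[0], top[2], top[2]]
        new_bottom = [bottom[0], bottom[0], bottom[2], bottom[2]]
    elif scheme == "4:4:4":
        # Keep every chroma value: no subsampling at all.
        new_top, new_bottom = list(top), list(bottom)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [new_top, new_bottom]

cb = [[10, 20, 30, 40],
      [50, 60, 70, 80]]
print(subsample_block(cb, "4:2:0"))  # [[10, 10, 30, 30], [10, 10, 30, 30]]
print(subsample_block(cb, "4:2:2"))  # [[10, 10, 30, 30], [50, 50, 70, 70]]
print(subsample_block(cb, "4:4:4"))  # unchanged
```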

So Back to the Picture Quality Versus Data Storage Question

Let’s first look at how much data each of these subsampling protocols saves us when we’re encoding our video footage. The good news is that you don’t even need to memorize the numbers, because there’s a very easy rule of thumb for working them out just from the name of the protocol. I’ll give you that quick rule in a moment, but first let’s see where the numbers come from.

If we encode all 8 pixels using YCbCr (4:4:4) with a bit depth of 8, we need 8 bits for each luma pixel, 8 bits for each Cb pixel, and 8 bits for each Cr pixel, for a total of 192 bits to encode the full 8-pixel array. This is the storage requirement for the lossless raw format, which we can take as a baseline since we’re not saving any space using this protocol.

For 4:2:2, we only have four Cb and four Cr pixels instead of eight of each, so we can encode the full 8-pixel array using only 128 bits—a saving of one-third.

For 4:2:0, we only have two Cb and two Cr pixels instead of eight of each, so we can encode the full 8-pixel array using only 96 bits—a saving of one-half.

The quick and easy rule of thumb for figuring out how much each video format saves you is to add up the numbers in the protocol’s name and divide by 12. So 4:4:4 = 12/12 = 1, 4:2:2 = 8/12 = 0.67, and 4:2:0 = 6/12 = 0.5. Easy!
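Here is a quick sketch of that rule of thumb, checked against the bit counts we worked out above; the 192-bit baseline is the 8-pixel block at 8 bits per sample, and the function name is just illustrative.

```python
# Fraction of 4:4:4 storage needed by a scheme, taken from its name alone.
def storage_fraction(scheme: str) -> float:
    j, a, b = (int(n) for n in scheme.split(":"))
    return (j + a + b) / 12

for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    bits = round(192 * storage_fraction(scheme))  # 192 bits = 4:4:4 baseline for 8 pixels
    print(f"{scheme}: {storage_fraction(scheme):.2f} of raw, {bits} bits")
# 4:4:4: 1.00 of raw, 192 bits
# 4:2:2: 0.67 of raw, 128 bits
# 4:2:0: 0.50 of raw, 96 bits
```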

So what about picture quality?

With all of the talk about discarding color resolution, you might be tempted to think that 4:2:0 is some kind of quick and dirty protocol for capturing low quality video footage using a minimum of storage, but it might surprise you to learn that 4:2:0 is actually the standard for high quality digital video media like Blu-ray. If you consider an analogy from the world of still photography, we effectively discard a huge amount of information when we convert an image from its original raw format to a JPEG, but we can still make wall-sized prints from a JPEG image if the resolution is sufficient.

In truth, you would be hard-pressed to see much, if any, difference under most circumstances between video shot using the raw format and video shot using 4:2:0. The differences are definitely there if you’re determined to pixel peep, but they’re usually subtle—showing up mainly in scenes where the frame is divided by sharp edges at the boundaries of different colors. The image below shows a comparison between the three subsampling protocols discussed here, and in the magnified view, you can see traces of the subsampling artifacts for 4:2:0 and 4:2:2.

Aside from the higher quality of footage that it delivers, raw (4:4:4) video really shines when it comes to editing. To return to our still photography analogy, a lot of professional photographers shoot in raw even if they will ultimately deliver their images in a compressed format such as JPEG, because it gives them a great deal more flexibility and control during the editing process. The same kind of approach is often followed by professional videographers.

Raw video is uncompressed (or uses lossless compression), retaining full-resolution color for every pixel and avoiding any problems with compression artifacts at the editing stage. Since the video is unprocessed, the editor has a great deal more flexibility to manipulate and adjust the footage, for example by setting the white balance, recovering blown highlights or dark shadows, or applying color grading. For chroma key work, such as shooting against a green screen, or any kind of compositing in post-production, raw or 4:4:4 video is strongly preferred, because it helps you avoid artifacts such as color fringing and jagged edges that chroma subsampling can introduce along hard color boundaries.

Having read this article, it is my hope that the next time you dive into the video menu on your camera, you will find the array of format options a little less daunting, and that you will have a clearer idea of the consequences of choosing one format over another when it comes to storing and handling your footage.




