Sunday, November 17, 2024
Google search engine
HomeGuest BlogsPortrait Depth API: Turning a Single Image into a 3D Photo with...

Portrait Depth API: Turning a Single Image into a 3D Photo with TensorFlow.js

Editor’s note: This article was reposted from the TensorFlow Blog with permission. To learn more about Web ML, be sure to check out Jason Mayes’ session at ODSC Europe 2022 titled “Next Generation Web Apps: Create a Machine Learning Powered Smart Cam in the Browser with TensorFlow.js.”

depth map is essentially an image (or image channel) that contains information relating to the distance of the surfaces of objects in the scene from a given viewpoint (in this case, the camera itself) for every pixel in that image. Depth maps are a fundamental building block for a variety of computer graphics and computer vision applications, such as augmented realityportrait mode, and 3D reconstruction. Despite the recent advances in depth-sensing capabilities with ARCore Depth API, the majority of photographs on the web are still missing associated depth maps. This, combined with users from the web community expressing a growing interest in having depth capabilities within JavaScript to enhance existing web apps such as bringing images to live, applying real-time AR effects to a human face and body, or even reconstructing items for use in VR environments, helped shape the path for what you see today.

Today we are introducing the Depth API, the first depth estimation API from TensorFlow.js. With this new API, we are also introducing the first depth model for portraits, ARPortraitDepth, which estimates a depth map for a single portrait image. To demonstrate one of many possible usages of depth information, we also present a computational photography application, 3D photo, which utilizes the predicted depth and enables a 3D parallax effect on the given portrait image. Try the live demo below, everyone can easily make their social media profile photo 3D as shown below.

Try out the 3D portrait demo for yourself! 

ARPortraitDepth: Single Image Depth Estimation

At the core of the Portrait Depth API is a deep learning model, named ARPortraitDepth, that takes a single color portrait image as the input and produces a depth map. For the sake of computational efficiency, we adopt a lightweight U-Net architecture. As shown below, the encoder gradually downscales the image or feature map resolution by half, and the decoder increases the feature resolution to the same as the input. Deep learning features from the encoder are concatenated to the corresponding layers with the same spatial resolution in the decoders to bring high-resolution signals for depth estimation. During training, we force the decoder to produce depth predictions with increasing resolutions at each layer, and add a loss for each of them with the ground truth. This empirically helps the decoder to predict accurate depth by gradually adding details.

Abundant and diverse training data is critical for the machine learning model to achieve overall decent performance, e.g. accuracy and robustness. We synthetically render pairs of color and depth images with various camera configurations, e.g. focal length, camera pose, from 3D digital humans captured by a high-quality performance capture system, and run relighting augmentation with High Dynamic Range environment illumination maps to increase the realism and diversity of the color images, e.g. shadows on the face. We also collect real data using mobile phones equipped with a front-facing depth sensor, e.g. Google Pixel 4, where the depth quality, as the training ground truth, is not as accurate and complete as that in our synthetic data, but the color images are effective in improving the performance of our model when running on images in the wild.

Single image depth estimation pipeline.

The portrait depth model could enable a whole host of creative applications orientated around the human body that could drive next-generation web apps. We refer readers to ARCore Depth Lab for more inspiration. To enhance the robustness against background variation, in practice, we run an off-the-shelf body segmentation model with MediaPipe and TensorFlow.js before sending the image into the neural network of depth estimation.

For the 3D photo application, we created a high-performance rendering pipeline. It first generates a segmented mask using the TensorFlow.js existing body segmentation API. Next, we pass the masked portrait into the Portrait Depth API and obtain a depth map on the GPU. Eventually, we generate a depth mesh in three.js, with vertices arranged in a regular grid and displaced by re-projecting corresponding depth values (see the figure below for generating the depth mesh). Finally, we apply texture projection to the depth mesh and rotate the camera around the z-axis in a circle. Users can download the animations in GIF or WebM format.

Generating the depth mesh from the depth map for the 3D photo application.

The portrait depth API is currently offered as one variant of the new depth API.Portrait Depth API Installation

To install the API and runtime library, you can either use the <script> tag in your HTML file or use NPM.

Through script tag:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/body-segmentation"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/depth-estimation"></script>

Through NPM:

yarn add @tensorflow/tfjs-core @tensorflow/tfjs-backend-webgl
yarn add @tensorflow/tfjs-converter
yarn add @tensorflow-models/body-segmentation
yarn add @tensorflow-models/depth-estimation

To reference the API in your JS code, it depends on how you installed the library.

If installed through script tag, you can reference the library through the global namespace depthEstimation.

If installed through NPM, you need to import the libraries first:

import '@tensorflow/tfjs-backend-core';
import '@tensorflow/tfjs-backend-webgl';
import '@tensorflow/tfjs-converter';
import '@tensorflow-models/body-segmentation;
import * as depthEstimation from '@tensorflow-models/depth-estimation;

Try it yourself!

First, you need to create an estimator:

const model = depthEstimation.SupportedModels.ARPortraitDepth;
    estimator = await depthEstimation.createEstimator(model);


    const video = document.getElementById('video');
    const depthMap = await estimator.estimateDepth(video);

Once you have an estimator, you can pass in a video stream, static image, or TensorFlow.js tensors to estimate depth:

const video = document.getElementById('video');

    const estimationConfig = {
      minDepth: 0, // The minimum depth value outputted by the estimator.
      maxDepth: 1, // The maximum depth value outputted by the estimator.
    };

   const depthMap = await estimator.estimateDepth(video, estimationConfig);

How to use the output?

The depthMap result above contains depth values for each pixel in the image.

The depthMap is an object which stores the underlying depth values. You can then utilize the provided asynchronous conversion functions such as toCanvasImageSourcetoArray, and toTensor depending on the desired output type that you want for efficiency.

It should be noted that different models have different internal representations of data. Therefore converting from one form to another may be expensive. In the name of efficiency, you can call getUnderlyingType to determine what form the depth map is in already so you may choose to keep it in the same form for faster results.

The semantics of the depthMap are as follows: the depth map is the same size as the input image. For array and tensor representations, there is one depth value per pixel. For CanvasImageSource, the green and blue channels are always set to 0, whereas the red channel stores the depth value.

See below output snippet for example:

  {
    toCanvasImageSource(): ...
    toArray(): ...
    toTensor(): ...
    getUnderlyingType(): ...
  }

Browser Performance

Portrait Depth model

MacBook M1 Pro 2021.
(FPS)

iPhone 13 Pro
(FPS)

Desktop PC
Intel i9-10900K. Nvidia GTX 1070 GPU.
(FPS)

TFJS Runtime

With WebGL backend.

51

22

47

Acknowledgments

We would like to acknowledge our colleagues who participated in or sponsored the creation of the Portrait Depth API in TensorFlow.js: Na LiXiuxiu YuanRohit PandeyAbhishek KarSergio Orts EscolanoChristoph RhemannIdris AleemSean FanelloAdarsh KowdlePing YuAlex Olwal‎, Sarah HeimlichCecilia Abadie. We would also like to acknowledge the body segmentation model provided by MediaPipe, and The Relightables for high-quality synthetic data.

About the ODSC Europe 2022 Speaker:

Jason Mayes is the public face of TensorFlow.js, helping web engineers globally take their first steps with machine learning in JavaScript. He also combines his knowledge of the technical and creative worlds to develop innovative prototypes for Google’s largest customers and internal teams with over 15 years of experience working in web engineering and investigating emerging technologies.

He holds an MEng in Computer Science, is a member of the British Computing Society, and is a certified information privacy technologist. Jason loves sharing knowledge online which has attracted a global following. In his spare time he can be found walking the wings of flying aircraft being one of the few people in the world who has been trained in the art of wing walking.

RELATED ARTICLES

Most Popular

Recent Comments