DEVELOPER BLOG

Face It! – 6M: Generative 3D Human Characters – Synthetic Images Dataset for Vision AI

By SKY ENGINE AI   18 July 2024

Synthetic Data | 3D Generative AI | Human Characters | AI Dataset | Computer Vision



The number of human-related applications of computer vision and AI is growing by the day. From medicine to retail to manufacturing and security, AI-powered solutions are becoming more prevalent and will soon be present in nearly all aspects of our lives. Amid all this change, one constant remains: the need for high-quality data for training AI models. Face analysis use cases, such as face recognition, gaze estimation, face segmentation, and facial expression analysis, make this need even more demanding. We address this challenge with our approach to synthetic data for training computer vision AI models.

Why train computer vision models on synthetic data?

The seemingly easiest solution (and currently the most popular one) of using manually labeled real-world images to train deep learning models in computer vision has multiple drawbacks that are hard to eliminate. They include, among others:

  • Legal (and ethical) concerns. Privacy regulations such as GDPR prevent the use of most existing data containing real human faces (labeled or unlabeled). Synthetic data is free of such constraints.
  • Labeling issues caused by manual or semi-manual labeling, which is often imprecise and incomplete.
  • Low diversity of datasets. Face recognition AI models do not work properly in real life when trained and tested on datasets that lack variety in accessories, hairstyles, age, gender, skin color, etc., or on poorly balanced data collections.
  • Context bias. Models trained on real-world datasets don’t accurately reflect reality. This seems illogical but is easy to explain: some cases are so rare that collecting a large number of images representing them could take 100 years. Other cases are contaminated by outside artifacts because too few labeled images are available.
  • Lack of 3D information in ground truth. Labeling farms are not able to add information such as 3D annotations, 3D keypoints, 3D bounding boxes, and depth maps to the real-world images.
     

 Figure 1. Example characters from the FaceIt!-6M dataset.

Let us introduce you to FaceIt!-6M Synthetic Humans Dataset

At SKY ENGINE AI we endeavor to solve all those problems and make data scientists' lives easier at the same time. The Face It!-6M dataset is a collection of synthetic human characters (Figures 1-7), created especially for Vision AI tasks and designed to improve the outcomes of training computer vision AI models.

Figure 2. Example characters from the FaceIt!-6M dataset with labels (normal vector map and depth map).


 Figure 3. Example characters from the FaceIt!-6M dataset in varying context.

The dataset comprises 6,000,000 images of 15,000 unique characters and features a wide spectrum of ages, skin colors, and ethnicities. Additionally, each character can be varied by applying facial expressions, head poses, different head and face accessories (Figure 4), and hairstyles (both head and facial hair). Another distinguishing feature of this dataset is that all the labels are available for both the visible light (VIS) and near-infrared (NIR) modalities (Figure 5). For all the details of the Face It!-6M Synthetic Human dataset, see Table 1 below.


 Figure 4. Examples of head pose, hairstyle, and accessories from the FaceIt!-6M Synthetic Human Dataset.

 


 Figure 5. Example characters from the FaceIt!-6M dataset in near-infrared (NIR).

 


 Figure 6. Each VIS image has a twin in NIR and complex ground truth available.

Data balancing is the key

Figures 8-11 illustrate the statistical balance of data within the Face It!-6M Synthetic Human dataset. Images can be filtered according to the needs of a specific case, for example, to train or test models only on images where all characters wear glasses or where the head is in a certain position.
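The kind of filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FaceRecord` class, its field names, the file paths, and the `select` helper are all hypothetical and do not reflect the actual Face It!-6M annotation format or any SKY ENGINE AI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceRecord:
    """Hypothetical per-image metadata; field names are illustrative only."""
    image_path: str
    modality: str           # "VIS" or "NIR"
    accessories: frozenset  # e.g. frozenset({"eyewear"})
    pitch_deg: float        # head pitch in degrees (assumed unit)


def select(records, required_accessory=None, pitch_range=None, modality=None):
    """Return the records that match every filter that was supplied."""
    selected = []
    for rec in records:
        if required_accessory is not None and required_accessory not in rec.accessories:
            continue
        if pitch_range is not None and not (pitch_range[0] <= rec.pitch_deg <= pitch_range[1]):
            continue
        if modality is not None and rec.modality != modality:
            continue
        selected.append(rec)
    return selected


# Toy records, not taken from the dataset.
records = [
    FaceRecord("id0001/frame_000_vis.png", "VIS", frozenset({"eyewear"}), 5.0),
    FaceRecord("id0001/frame_000_nir.png", "NIR", frozenset({"eyewear"}), 5.0),
    FaceRecord("id0002/frame_017_vis.png", "VIS", frozenset(), -30.0),
    FaceRecord("id0003/frame_101_vis.png", "VIS", frozenset({"eyewear", "headwear"}), 22.0),
]

# All images in which the character wears glasses.
with_glasses = select(records, required_accessory="eyewear")

# Near-frontal VIS images only (pitch within +/- 10 degrees).
frontal_vis = select(records, pitch_range=(-10.0, 10.0), modality="VIS")
```

Keeping each filter optional lets the same helper build both narrow test subsets (one accessory, one pose range) and broad training subsets.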


Figure 7. Example faces in VIS and NIR modalities with varying facial expressions.

Figure 8. Distribution of head accessories.

Figure 9. Distribution of age and gender.

Figure 10. Statistical distribution of head pitch (movement up or down).

Figure 11. Distribution of the head pitch, yaw and roll movements.
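A balance check of the kind these distribution plots summarize might look like the sketch below. It assumes the categorical labels (here, age ranges) are available as a plain list of strings; the helper names and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def category_shares(labels):
    """Map each category to its share of the samples (shares sum to 1)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def imbalance_ratio(labels):
    """Largest category count divided by the smallest; 1.0 means perfectly balanced."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy labels, NOT taken from the dataset: four samples per age range,
# plus two extra samples in the youngest range.
age_ranges = ["18-25", "25-35", "35-45", "45-60", "60+"] * 4 + ["18-25", "18-25"]

shares = category_shares(age_ranges)
ratio = imbalance_ratio(age_ranges)
```

A ratio close to 1.0 suggests the categories are evenly represented; a large ratio flags a category that may need more samples or reweighting.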

Table 1. Face It!-6M Synthetic Humans dataset features:

Number of images: 6,000,000 high-quality images of human characters (faces)
Resolution: 512 x 512 pixels
Modality: VIS, NIR (50/50 split)
Total dataset volume: 8.52 TB

Available annotations:
  • Face bounding box
  • Gaze vectors (for each eye)
  • Head orientation
  • Gender
  • Expression – 30 labels
  • Skin tone – 11 categories
  • Ethnicity – African, Arabic, Asian, Caucasian, Hispanic, Indian
  • Age – 5 ranges: 18-25, 25-35, 35-45, 45-60, 60+
  • Segmentation mask with specified regions – neck, body, eyes (pupil, sclera, iris; left and right), ears (left and right), nose, hair, facial hair, lips (upper, lower), mouth interior, eyebrows (left, right), accessories (eyewear, headwear, earrings, necklace, face wear, eyebrow piercing, nose piercing, lip piercing, earbuds, bead)
  • Landmarks – 68 landmarks consistent with iBUG-68 plus pupil landmarks (70 landmarks in total); each landmark has a specified location (2D image space, 3D camera space, 3D world space) and visibility
  • Depth map

Scene variety:
  • 15,000 identities, 200 unique frames per identity (3 million unique frames in total)
  • Complex skin texture randomization (pores, imperfections, scars, makeup, moles, acne, wrinkles, lip color, and more)
  • Randomized textures of hair, eyes and accessories
  • Randomized lighting
  • Randomized background
  • Randomized hairstyles
  • Randomized BMI
  • Randomization constraints preserving ID consistency
  • Controlled scene complexity for each ID (from non-obstructed face looking directly at the camera to various head orientations with obstructing accessories)
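The headline figures in the table are consistent with each other, as the short calculation below shows. It assumes that every unique frame is rendered once in VIS and once in NIR (in line with the 50/50 modality split) and that the stated volume uses decimal terabytes; the average size per image is a derived estimate that would also include the ground truth stored with each image, not a number published with the dataset.

```python
IDENTITIES = 15_000
FRAMES_PER_IDENTITY = 200
MODALITIES = 2           # VIS and NIR, assumed one render of each per frame
TOTAL_VOLUME_TB = 8.52   # as stated in the table

# 15,000 identities x 200 frames = 3,000,000 unique frames
unique_frames = IDENTITIES * FRAMES_PER_IDENTITY

# Each frame in two modalities gives 6,000,000 images
total_images = unique_frames * MODALITIES

# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
avg_mb_per_image = TOTAL_VOLUME_TB * 1e12 / total_images / 1e6

print(f"{unique_frames:,} unique frames, {total_images:,} images")
print(f"about {avg_mb_per_image:.2f} MB per image, annotations included")
```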

 

Moving beyond the obvious

Potential uses of a dataset like Face It!-6M Synthetic Humans go well beyond the obvious ones, such as driver monitoring systems. Many technologies rely on the recognition and analysis of human faces, most notably in security and biometrics. Financial institutions use face recognition to improve user authentication, while airports deploy AI-powered systems that improve the boarding process. Facial landmark detection can be applied to detect deepfakes, thereby limiting the spread of misinformation locally and globally.

Perhaps lesser-known applications of face recognition and analysis technologies are found in healthcare, psychology, and marketing. Emotion recognition software can help individuals with autism spectrum disorders improve their social skills and thereby boost their quality of life. Gaze vector estimation and analysis, on the other hand, proves immensely helpful in modern sales and marketing analysis for understanding consumer behavior.

SKY ENGINE AI is at the forefront of synthetic data technologies

The Face It!-6M Synthetic Human Dataset was generated with Omnihuman, a feature of our Synthetic Data Cloud. This newest addition to our Platform is not yet available to our customers, but we can reveal a bit of the mystery. Omnihuman will allow full control of all aspects of synthetic data generation, including randomization of features and scenes. Data scientists will have a tool to select desired emotional expressions (based on the Facial Action Coding System, FACS), adjust head pose for roll, yaw, and pitch, and choose among multiple hairstyles and accessory variants. Moreover, data generation will be fast because it will support simultaneous rendering on multiple machines. Stay tuned!

If you would like to know more about how SKY ENGINE AI can help you with your face dataset needs, just drop us a line and we’ll get back to you. If you’d like to meet us in person, follow our social media for announcements about conferences we will be attending.

 

Learn more about the SKY ENGINE AI offering

For more information on synthetic data, tools, methods, and technology, check out the following resources: