SKY ENGINE AI PLATFORM

Insights from your computer vision datasets with unprecedented accuracy

SKY ENGINE AI – DEEP LEARNING IN VIRTUAL REALITY PLATFORM

Accelerate AI Transformation with High Accuracy
Deep Learning in Real World Applications

Train highly accurate AI models for computer vision
in the SKY ENGINE AI developers platform
without the hassle of data acquisition and labeling.

SKY ENGINE AI – PLATFORM OVERVIEW

Evolutionary AI Platform for Deep Neural Networks training in Virtual Reality for Computer Vision applications

SKY ENGINE AI – ECOSYSTEM DETAILS

GPU simulator with sets of Physics-based rendering shaders tailored to sensor fusion

  • Multispectral, physics-based rendering and simulation:
    • Visual light
    • NIR
    • Thermal
    • X-ray
    • Lidar
    • Radar
    • Sonar
    • Satellite
  • Render passes dedicated to deep learning
  • Animation and motion capture systems support
  • Determinism and advanced machinery for randomization strategies of scene parameters for active learning approach
  • GAN-based materials and images postprocessing
  • Support for Nvidia MDL and Adobe Substance textures
  • Data scientist friendly
  • Compatibility with popular CGI software like Blender, Maya or Houdini

Garden of neural networks, metrics and tools in SKY ENGINE AI Platform

Garden of specific neural network models, ready for parallel training, tested and optimised for the following tasks:

  • Object detection
  • Classification
  • Semantic Segmentation
  • Image Translation
  • Geometry Reasoning (3D pose and position estimation)
  • Key Points 2D/3D
  • Pose Estimation
  • Domain adaptation
  • Depth Estimation
  • Spectrogram classification

SKY ENGINE AI – FULL STACK WORKFLOW

Allows building optimal, customised AI models from scratch and training them in a virtual reality

 

 

SKY ENGINE AI – Working example: Rugby gameplay 3D Pose Estimation

[1]: from skyrenderer.utils.common import get_presentation_folder_from_assets
[2]: get_presentation_folder_from_assets('/dli/mount/assets', 'gtc03_assets')

1 Motivation

In rugby, as in any similar sport, building the correct playing strategy before the championship season is a key to success for any professional coach and club owner. While coaches strive to provide the best tips and point out mistakes during the game, they are still incapable of noticing every detail and behavioral pattern of both teams, even when rewatching the matches. To collect such data, analyze it and make inferences about team behavior, sophisticated AI algorithms can be used.

In particular, the types of the tasks we would like to solve for fostering the analysis of rugby team are the location of each player during the match and the 3D pose of each player on the field.

Having such information in real-time will provide necessary evidence for building better playing strategy.

2 Problem

Machine learning algorithms are getting more and more powerful in all kinds of classification problems, including image recognition. Increasingly sophisticated models, given enough correctly labeled data, are able to achieve superb performance and accuracy. In many cases a class of a problem has an efficient solution already discovered, but it cannot be applied - the only bottleneck is missing data.

The process of gathering and - especially - labeling data can be extremely expensive and time-consuming. The images must be manually analyzed by humans, whose labor in such repetitive tasks is not only slow and expensive, but also less accurate, compared to computers.

In addition, some cases require modern equipment to produce labeled data and highly qualified specialists to maintain the production process. This significantly increases the project cost or, in many cases, puts the project out of reach for stakeholders.

3 Solution

What if we could generate automatically the images suited perfectly for the task at hand with the complete and always correct ground truth built-in?

We would like to show our attempt to achieve exactly this, using football player pose recognition as an example. The goal is to train a model to accurately recognize football players and their poses, as human keypoints in 3D space, in real-life match footage like that shown below, having been trained exclusively on artificial, synthetic data. The images are rendered scenes that are fully controlled by our renderer, so all kinds of ground truth can be provided, depending on the model's requirements.

[3]: from skyrenderer.example_assistant.markdown_helpers import show_jupyter_picture, show_jupyter_movie
[4]: show_jupyter_picture('gtc03_assets/illustrations/football_frame_1.png')
[4]:
[5]: show_jupyter_picture('gtc03_assets/illustrations/football_frame_2.png')
[5]:
[6]: show_jupyter_picture('gtc03_assets/illustrations/football_frame_3.png')
[6]:

3.1 Agenda

  • Dependencies
  • Context Configuration
  • SKYENGINE RENDERER CONFIGURATION
    • The graphic assets
    • Assets configuration
    • Scene Tree Structure
    • Scene
    • Renderer Scenario
    • Renderer Datasource
  • TRAINING
  • EVALUATION
    • On Synthetic Data
    • On Real Data
[7]: from skyrenderer.core.logger_config import configure_logger
[8]: logger = configure_logger()

First let's visualize the GPUs available on the machine. Based on this we can select which GPUs will be used by rendering and learning. By default we use all available devices.

[9]: !gpustat

/bin/sh: 1: gpustat: not found

[10]: import torch
[11]: AVAILABLE_GPUS = list(range(torch.cuda.device_count()))
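
One way to act on this choice is to split the device list into a rendering group and a training group. The helper below is only a minimal sketch: `split_devices` and its `n_render` parameter are hypothetical names introduced for illustration and are not part of the Sky Engine API.

```python
from typing import List, Tuple


def split_devices(devices: List[int], n_render: int) -> Tuple[List[int], List[int]]:
    """Split GPU ids into (render_devices, train_devices).

    Hypothetical helper: the first `n_render` ids go to rendering and the
    rest to training. If the split would leave either group empty, both
    groups share all devices.
    """
    if n_render <= 0 or n_render >= len(devices):
        return list(devices), list(devices)
    return list(devices[:n_render]), list(devices[n_render:])


# Example with four GPU ids, as list(range(4)) would give on a 4-GPU machine.
render_gpus, train_gpus = split_devices([0, 1, 2, 3], n_render=1)
print(render_gpus, train_gpus)
```

On a single-GPU machine both groups fall back to the same device, which matches the default of using every available device for everything.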

4 Sky Engine renderer configuration

4.0.1 Context configuration

It is required to set the path where the assets (images, meshes, animations etc.) are stored. For convenience, the example assistant is configured. It will help with visualizations.

[12]: from skyrenderer.scene.renderer_context import RendererContext
from skyrenderer.scene.scene import SceneOutput
from skyrenderer.example_assistant.visualization_settings import VisualizationDestination
from skyrenderer.example_assistant.display_config import DisplayConfig
from skyrenderer.example_assistant.example_assistant import ExampleAssistant
[13]: root_paths_config = {
'assets_root': '/dli/mount/assets',
'cache_root': '/dli/mount/cache'
}
renderer_ctx = RendererContext(root_paths_config)
     2021-03-16 15:05:38,218 | skyrenderer.scene.renderer_context | INFO: Root paths:
- root path: /home/skyengine/.local/lib/python3.6/site-packages/skyrenderer
- assets path: /dli/mount/assets
- config path: /home/skyengine/.local/lib/python3.6/site-packages/skyrenderer/config
- optix sources path: /home/skyengine/.local/lib/python3.6/site-packages/skyrenderer/optix_sources/sources
- cache path: /dli/mount/cache
2021-03-16 15:05:38,480 | skyrenderer.service.service | INFO: Open GUI here: http://andariel.skyengine.ai:10001/index.html?service_addr=104.45.29.245:20000
[14]: display_config = DisplayConfig(visualization_destination=VisualizationDestination.DISPLAY,
 visualized_outputs=[SceneOutput.BEAUTY],
 cv_waitkey=0)
example_assistant = ExampleAssistant(context=renderer_ctx, display_config=display_config)

4.1 The graphic assets

In the Sky Engine pipeline the graphic assets, the building blocks for the scene, are prepared by a CG Artist using third-party software tools. Assets prepared for this scene:

  1. Geometries

    The main format used in Sky Engine for carrying information about the scene definition (models and their relative positions, or position ranges for randomization) is Alembic (.abc). Alembic, an exchange format developed by Sony Pictures Imageworks and Lucasfilm, is widely used in the industry and supported by most modern CG tools.

    For this scene an artist prepared:

    • Model of a rugby stadium,
    • Animation of a rugby player with keypoints,
    • Scene definition Alembic file - locators specifying positions of all the geometries, lights and camera. The player does not have a fixed position, it has a position range instead.
  2. Materials

    Sky Engine by default uses a metallic-roughness PBR shader. The input maps for the shader can come from files or from a Substance archive (.sbsar). Sky Engine's built-in support for Substance allows for parameter randomization and texture rendering on the fly in the background.

    For this scene an artist prepared:

    • Substance archive for rugby players,
    • Substance archive for parts of the stadium: base, grass, logos, crowd,
    • Files with maps for banners, bumpers and screen.
  3. Environmental mapping

    The background for this scene is a simple cloudy sky HDR.

4.1.1 Context configuration

For the Alembic assets prepared according to Sky Engine guidelines, the whole scene can be loaded and visualized without further configuration.

[15]: renderer_ctx.load_abc_scene('stadium')
     2021-03-16 15:05:38,525 | skyrenderer.core.asset_manager.asset_manager | INFO: Syncing git annex…
2021-03-16 15:05:40,209 | skyrenderer.core.asset_manager.asset_manager | INFO: Syncing git annex done.
[16]: renderer_ctx.setup()
logger.info(f'Scene\n{str(renderer_ctx)}')
     2021-03-16 15:05:54,532 | main | INFO: Scene
scene_tree:
top_node (count: 1)
|-- bumper_GEO_NUL_000 (count: 1)
| +-- bumper_GEO (count: 1)
|-- bumper_GEO_NUL_001 (count: 1)
| +-- bumper_GEO_0 (count: 1)
|-- bumper_GEO_NUL_002 (count: 1)
| +-- bumper_GEO_1 (count: 1)
|-- bumper_GEO_NUL_003 (count: 1)
| +-- bumper_GEO_2 (count: 1)
|-- light_L01_LIGHT_NUL (count: 1)
|-- light_L02_LIGHT_NUL (count: 1)
|-- light_L03_LIGHT_NUL (count: 1)
|-- light_L04_LIGHT_NUL (count: 1)
|-- player_GEO_NUL (count: 1)
| +-- player_GEO (count: 1)
|-- rugby_pitch_GEO_NUL_000 (count: 1)
| +-- rugby_pitch_GEO (count: 1)
|-- rugby_pitch_GEO_NUL_001 (count: 1)
| +-- rugby_pitch_GEO_0 (count: 1)
|-- screen_GEO_NUL (count: 1)
| +-- screen_GEO (count: 1)
|-- banners_GEO_NUL (count: 1)
| +-- banners_GEO (count: 1)
|-- crowd_GEO_NUL (count: 1)
| +-- crowd_GEO (count: 1)
|-- grass_baners_GEO_NUL (count: 1)
| +-- grass_baners_GEO (count: 1)
|-- grass_GEO_NUL (count: 1)
| +-- grass_GEO (count: 1)
|-- logo_adidas_GEO_NUL (count: 1)
| +-- logo_adidas_GEO (count: 1)
|-- stadium_base_GEO_NUL (count: 1)
| +-- stadion_base_GEO (count: 1)
|-- stadium_details_GEO_NUL (count: 1)
| +-- stadion_details_GEO (count: 1)
|-- stripes_GEO_NUL (count: 1)
| +-- stripes_GEO (count: 1)
|-- camera_CAM_NUL (count: 1)
| +-- camera_CAM (count: 1)
+-- camera_target_NUL (count: 1)
[17]: with example_assistant.get_visualizer() as visualizer:
  visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:06:09,938 | skyrenderer.utils.time_measurement | INFO: Render time: 15.40 seconds
[17]:

4.1.2 Materials

Each loaded object needs to have a material assigned.

[18]: from skyrenderer.scene.scene_layout.layout_elements_definitions import MaterialDefinition
from skyrenderer.basic_types.provider import SubstanceTextureProvider, FileTextureProvider
from skyrenderer.basic_types.procedure import PBRShader
[19]: player_textures = SubstanceTextureProvider(renderer_ctx, 'rugby_player')
renderer_ctx.set_material_definition('player_GEO', MaterialDefinition(player_textures))
[20]: stadium_base_textures = SubstanceTextureProvider(renderer_ctx, 'concrete')
stadium_base_params = PBRShader.create_parameter_provider(renderer_ctx, tex_scale=50)
renderer_ctx.set_material_definition('stadion_base_GEO',
  MaterialDefinition(stadium_base_textures, parameter_set=stadium_base_params))
[21]: crowd_textures = SubstanceTextureProvider(renderer_ctx, 'crowd')
crowd_params = PBRShader.create_parameter_provider(renderer_ctx, tex_scale=5)
renderer_ctx.set_material_definition('crowd_GEO', MaterialDefinition(crowd_textures, parameter_set=crowd_params))
[22]: grass_textures = SubstanceTextureProvider(renderer_ctx, 'grass')
renderer_ctx.set_material_definition('grass_GEO', MaterialDefinition(grass_textures))
[23]: grass_logos_textures = SubstanceTextureProvider(renderer_ctx, 'logos_grass')
renderer_ctx.set_material_definition('grass_baners_GEO', MaterialDefinition(grass_logos_textures))
[24]: banners_textures = FileTextureProvider(renderer_ctx, 'banners', 'stadium/banners')
renderer_ctx.set_material_definition('banners_GEO', MaterialDefinition(banners_textures))
[25]: screen_texture = FileTextureProvider(renderer_ctx, 'screen', 'stadium/screen')
renderer_ctx.set_material_definition('screen_GEO', MaterialDefinition(screen_texture))
[26]: bumpers_texture = FileTextureProvider(renderer_ctx, 'bumpers', 'stadium/bumpers')
renderer_ctx.set_material_definition('bumper_GEO.?.?$', MaterialDefinition(bumpers_texture), use_regex=True)
[27]: white_params = PBRShader.create_parameter_provider(renderer_ctx, 'white_params', material_color=(0.8, 0.8, 0.8))
renderer_ctx.set_material_definition('stripes_GEO', MaterialDefinition(parameter_set=white_params))
renderer_ctx.set_material_definition('rugby_pitch_GEO.?.?$', MaterialDefinition(parameter_set=white_params),
  use_regex=True)
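
The calls above with `use_regex=True` use patterns such as `'bumper_GEO.?.?$'` to cover several nodes at once. The sketch below checks which node names from the scene tree such a pattern selects. It assumes the pattern is applied from the start of the name, as `re.match` does; how Sky Engine applies the pattern internally is not shown here, and `matching_nodes` is a hypothetical helper.

```python
import re

# Node names copied from the scene tree printed earlier.
node_names = [
    'bumper_GEO', 'bumper_GEO_0', 'bumper_GEO_1', 'bumper_GEO_2',
    'bumper_GEO_NUL_000', 'banners_GEO', 'rugby_pitch_GEO', 'rugby_pitch_GEO_0',
]


def matching_nodes(pattern: str, names: list) -> list:
    """Return the names that `pattern` matches from their start (re.match semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.match(name)]


print(matching_nodes(r'bumper_GEO.?.?$', node_names))
print(matching_nodes(r'rugby_pitch_GEO.?.?$', node_names))
```

The two optional characters (`.?.?`) absorb suffixes such as `_0`, while the `$` anchor keeps longer names like `bumper_GEO_NUL_000` out of the selection.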
[28]: renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
  visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:06:32,080 | skyrenderer.utils.time_measurement | INFO: Render time: 7.86 seconds
[28]:

Let's replace the gray background with a sky.

[29]: from skyrenderer.basic_types.item_component import Background
from skyrenderer.basic_types.procedure import EnvMapMiss
from skyrenderer.basic_types.provider import HdrTextureProvider
[30]: renderer_ctx.define_env(Background(renderer_ctx,
  EnvMapMiss(renderer_ctx),
  HdrTextureProvider(renderer_ctx, 'light_sky')))
     2021-03-16 15:06:35,475 | skyrenderer.scene.renderer_context | WARNING: Setting background definition after setup.
[31]: renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
  visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:06:45,492 | skyrenderer.utils.time_measurement | INFO: Render time: 9.75 seconds
[31]:

4.1.3 Scene configuration

The Sky Engine renderer provides virtually endless possibilities to shuffle, multiply, randomize and organize the assets.

From one Alembic animation we are creating two teams of 20 players each.

[32]: renderer_ctx.layout().duplicate_subtree(renderer_ctx, 'player_GEO_NUL', suffix='team2')
renderer_ctx.layout().get_node('player_GEO_NUL').n_instances = 20
renderer_ctx.layout().get_node('player_GEO_NUL_team2').n_instances = 20
[33]: renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
  visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:06:53,184 | skyrenderer.utils.time_measurement | INFO: Render time: 6.84 seconds
[33]:

By default, all materials are drawn randomly. To create two proper teams we need to ensure that each team has the same shirt color which is different than the other team's color, while keeping all the other inputs (hair, skin color, socks color, shirt number etc.) random.

To achieve this, we need to put the players into separate randomization groups and define their drawing strategy. The Substance archive input that controls shirt color is called "Colors_select". It needs to be the same (synchronized) inside a randomization group and different between groups. All the other inputs are kept randomized by default.

[34]: from skyrenderer.randomization.strategy.input_drawing_strategy import SynchronizedInput
from skyrenderer.randomization.strategy.synchronization import Synchronization, SynchronizationDescription
from skyrenderer.randomization.strategy.drawing_strategy import DrawingStrategy
[35]: shirt_sync = SynchronizedInput(SynchronizationDescription(
in_strategy=Synchronization.DISTINCT_EQUAL_GROUPS))
player_material_strategy = DrawingStrategy(renderer_ctx, inputs_strategies={'Colors_select': shirt_sync})
renderer_ctx.instancers['player_GEO'].modify_material_definition(strategy=player_material_strategy)
renderer_ctx.instancers['player_GEO_team2'].modify_material_definition(randomization_group='team2',
  strategy=player_material_strategy)
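
The intent of this synchronization (one value shared inside a group, different values between groups) can be illustrated in plain Python. This is only a sketch of the idea, not the Sky Engine implementation; `draw_team_colors`, the group names and the number of available colors are assumptions made for the example.

```python
import random


def draw_team_colors(groups, n_colors, rng):
    """Assign one color index per group: equal inside a group, distinct between groups.

    `groups` maps a group name to its number of players. Illustrative only.
    """
    if len(groups) > n_colors:
        raise ValueError('need at least one color per group')
    # Sampling without replacement guarantees that groups get different colors.
    colors = rng.sample(range(n_colors), k=len(groups))
    return {name: [color] * size for (name, size), color in zip(groups.items(), colors)}


rng = random.Random(0)
teams = draw_team_colors({'default': 20, 'team2': 20}, n_colors=6, rng=rng)
print({name: set(values) for name, values in teams.items()})
```

Every other input (hair, skin color, shirt number and so on) would still be drawn independently per player.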
[36]: renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
  visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:08:15,715 | skyrenderer.utils.time_measurement | INFO: Render time: 4.22 seconds
[36]:

If you look closely at the picture above, you may notice that every player is in exactly the same pose. By default, Sky Engine plays animations from Alembic files frame by frame, so every instance shows the same frame; we need to randomize the frame number.

[37]: from skyrenderer.randomization.strategy.input_drawing_strategy import UniformRandomInput
[38]: player_geometry_strategy = DrawingStrategy(renderer_ctx, frame_numbers_strategy=UniformRandomInput())
renderer_ctx.instancers['player_GEO'].modify_geometry_definition(strategy=player_geometry_strategy)
[39]: renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
  visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:08:23,451 | skyrenderer.utils.time_measurement | INFO: Render time: 6.82 seconds
[39]:

During a rugby match, players are not distributed uniformly across the pitch - they tend to gather in groups. To make the scene look more natural, we can change the way the players' positions are drawn. Instead of drawing them uniformly, we can use a Gaussian distribution with random parameters. It is doubly random: first 𝜇 and 𝜎 are drawn, and then the players' positions are drawn from the Gaussian with these parameters.

[40]: from skyrenderer.randomization.strategy.input_drawing_strategy import RandomGaussianRandomInput
[41]: gauss_strategy = DrawingStrategy(renderer_ctx,
  default_input_strategy=RandomGaussianRandomInput(sigma_relative_limits=(0.1, 0.2)))
renderer_ctx.layout().get_node('player_GEO_NUL').modify_locus_definition(strategy=gauss_strategy)
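
The two-level draw can be sketched in one dimension with the standard library. The sketch assumes that `sigma_relative_limits` is a fraction of the width of the allowed position range and that positions are clamped to the range; both are assumptions for illustration, and `draw_positions` is a hypothetical helper rather than part of Sky Engine.

```python
import random


def draw_positions(n, low, high, sigma_relative_limits, rng):
    """Draw n 1D positions in [low, high] from a Gaussian with random mu and sigma."""
    span = high - low
    # First level of randomness: the distribution parameters themselves.
    mu = rng.uniform(low, high)
    sigma = span * rng.uniform(*sigma_relative_limits)
    # Second level: the positions, clamped so that nobody leaves the allowed range.
    return [min(high, max(low, rng.gauss(mu, sigma))) for _ in range(n)]


rng = random.Random(42)
xs = draw_positions(20, low=-50.0, high=50.0, sigma_relative_limits=(0.1, 0.2), rng=rng)
print(min(xs), max(xs))
```

Because 𝜇 changes from image to image, the cluster of players moves around the pitch, while the narrow 𝜎 keeps the players close together.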
[42]: renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
  for _ in range(5):
    visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:08:31,028 | skyrenderer.utils.time_measurement | INFO: Render time: 6.71 seconds
[42]:
     2021-03-16 15:08:37,221 | skyrenderer.utils.time_measurement | INFO: Render time: 5.84 seconds
[42]:
     2021-03-16 15:08:44,746 | skyrenderer.utils.time_measurement | INFO: Render time: 7.18 seconds
[42]:
     2021-03-16 15:08:48,757 | skyrenderer.utils.time_measurement | INFO: Render time: 3.66 seconds
[42]:
     2021-03-16 15:08:52,932 | skyrenderer.utils.time_measurement | INFO: Render time: 3.82 seconds
[42]:

This concludes configuration of materials, geometries and their positions.

4.1.4 Lights

The artist defined light positions in the Alembic scene definition. By default they have a constant intensity. We will randomize them.

[43]: from skyrenderer.basic_types.provider.provider_inputs import HSVColorInput
from skyrenderer.basic_types.lights import BasicLight
[44]: white_light_provider = BasicLight.create_parameter_provider(renderer_ctx,
  color=HSVColorInput(hue_range=(0, 0),
    saturation_range=(0, 0),
        value_range=(0.4, 1)))
renderer_ctx.set_light('light_L01_LIGHT_NUL', BasicLight(renderer_ctx, white_light_provider))
renderer_ctx.set_light('light_L02_LIGHT_NUL', BasicLight(renderer_ctx, white_light_provider))
renderer_ctx.set_light('light_L03_LIGHT_NUL', BasicLight(renderer_ctx, white_light_provider))
renderer_ctx.set_light('light_L04_LIGHT_NUL', BasicLight(renderer_ctx, white_light_provider))
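
To see why these HSV ranges produce a white light of varying intensity, the conversion can be reproduced with the standard `colorsys` module. This is a sketch of the color math only; `draw_white_light` is a hypothetical helper and does not touch the renderer.

```python
import colorsys
import random


def draw_white_light(value_range, rng):
    """Draw an RGB color with hue 0, saturation 0 and a random HSV value."""
    value = rng.uniform(*value_range)
    # With zero saturation the hue is irrelevant and all channels equal `value`.
    return colorsys.hsv_to_rgb(0.0, 0.0, value)


rng = random.Random(1)
r, g, b = draw_white_light((0.4, 1.0), rng)
print(r, g, b)
```

All three channels are equal, so only the brightness changes between renders, from 40% to 100%.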

4.1.5 Camera

Now we must configure the camera and its filters. The rendering process in Sky Engine is defined by a chain of render steps, where the output of one step is the input of the next. We define four render steps:

  • PinholeRenderStep - a simple pinhole camera with a randomized horizontal field of view (hfov, in degrees), which simulates random zoom,
  • Denoiser - the AI-accelerated OptiX denoiser,
  • Tonemapper - the OptiX tonemapper with randomized parameters: gamma and exposure,
  • GaussianBlurPostprocess - a random Gaussian blur. The training images should not be perfect; images with different degrees of blurring help the model generalize better.

Additionally, we're reducing the output size to match the deep learning model required input size.
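
The link between the horizontal field of view and zoom follows from the ideal pinhole model: for an image of width W pixels, the focal length in pixels is f = (W / 2) / tan(hfov / 2). The sketch below evaluates this for the two ends of the hfov range used in this example; `focal_length_px` is a hypothetical helper, not a renderer function.

```python
import math


def focal_length_px(width_px: int, hfov_deg: float) -> float:
    """Focal length in pixels of an ideal pinhole camera with the given horizontal FOV."""
    return (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


WIDTH = 1024
for hfov in (15, 35):
    print(hfov, round(focal_length_px(WIDTH, hfov), 1))
```

A narrower field of view gives a longer focal length, so drawing hfov at random is equivalent to drawing a random zoom level.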

[45]: from skyrenderer.render_chain import RenderChain, PinholeRenderStep, Denoiser, Tonemapper, GaussianBlurPostprocess
from skyrenderer.basic_types.provider.provider_inputs import IntInput, FloatInput
[46]: HEIGHT = 768
WIDTH = 1024
[47]: pinhole_params = PinholeRenderStep.create_hfov_parameter_provider(renderer_ctx,
  hfov=IntInput(min_value=15, max_value=35))
camera_step = PinholeRenderStep(renderer_ctx, origin_name='camera_CAM_NUL', target_name='camera_target_NUL',
  parameter_provider=pinhole_params)
[48]: denoiser = Denoiser(renderer_ctx)
[49]: tonemapper_params = Tonemapper.create_parameter_provider(renderer_ctx, gamma=FloatInput(min_value=2, max_value=5),
    exposure=FloatInput(min_value=0.7, max_value=1))
tonemapper = Tonemapper(renderer_ctx, parameter_provider=tonemapper_params)
[50]: gauss_blur_params = GaussianBlurPostprocess.create_random_parameter_provider(renderer_ctx, max_kernel_radius=7,
     max_sigma_x=2, max_sigma_y=0.7)
gauss_blur = GaussianBlurPostprocess(renderer_ctx, parameter_provider=gauss_blur_params)
[51]: renderer_ctx.define_render_chain(RenderChain([camera_step, denoiser, tonemapper, gauss_blur],
    width=WIDTH,
    height=HEIGHT))
[52]: renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
  for _ in range(5):
    visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:09:02,229 | skyrenderer.utils.time_measurement | INFO: Render time: 8.31 seconds
[52]:
     2021-03-16 15:09:07,964 | skyrenderer.utils.time_measurement | INFO: Render time: 5.50 seconds
[52]:
     2021-03-16 15:09:11,144 | skyrenderer.utils.time_measurement | INFO: Render time: 2.97 seconds
[52]:
     2021-03-16 15:09:14,030 | skyrenderer.utils.time_measurement | INFO: Render time: 2.61 seconds
[52]:
     2021-03-16 15:09:17,424 | skyrenderer.utils.time_measurement | INFO: Render time: 3.17 seconds
[52]:

4.1.6 Scene semantics

Last but not least, we must provide ground truth - information about scene semantics. This setup is designed for player detection with keypoints, so we must assign the semantic class only to players.

As mentioned before, the keypoints are already present in the player animation. By default, Sky Engine calculates all the information about keypoints if it receives them in the input assets, so we just need to visualize them to be sure everything is configured correctly. Green keypoints are visible, red ones are hidden.

[53]: renderer_ctx.set_semantic_class('player_GEO', 1)
renderer_ctx.set_semantic_class('player_GEO_team2', 1)
[54]: from skyrenderer.scene.scene import SceneOutput
example_assistant.visualized_outputs = {SceneOutput.BEAUTY, SceneOutput.SEMANTIC, SceneOutput.KEYPOINTS}
[55]: renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
  for _ in range(5):
    visualizer(renderer_ctx.render_to_numpy())
     2021-03-16 15:09:23,943 | skyrenderer.utils.time_measurement | INFO: Render time: 5.87 seconds
[55]:
[55]:
     2021-03-16 15:09:29,358 | skyrenderer.utils.time_measurement | INFO: Render time: 5.18 seconds
[55]:
[55]:
     2021-03-16 15:09:33,988 | skyrenderer.utils.time_measurement | INFO: Render time: 4.38 seconds
[55]:
[55]:
     2021-03-16 15:09:37,252 | skyrenderer.utils.time_measurement | INFO: Render time: 3.01 seconds
[55]:
[55]:
     2021-03-16 15:09:40,526 | skyrenderer.utils.time_measurement | INFO: Render time: 3.01 seconds
[55]:
[55]:

Everything is OK, so we can create a renderer datasource for training.

[56]: from skyengine.datasources.multi_purpose_renderer_data_source import MultiPurposeRendererDataSource
[57]: datasource = MultiPurposeRendererDataSource(renderer_context=renderer_ctx, images_number=20,
    cache_folder_name='rugby_presentation_new')

5 Training

Training configuration.

[58]: from deepsky.evaluators.sample_savers import ImageBboxKeyPointSaver, EvalHook
from deepsky.models.pose3d import get_pose_3d_model
from deepsky.trainers.trainer import DefaultTrainer
from deepsky.serializers.simple import SimpleSerializer
from skyengine.datasources.wrappers.mpose3d_wrapper import SEWrapperForDistancePose3D
from torch.utils.data import DataLoader
import torchvision.transforms as standard_transforms
[59]: import numpy as np
[60]: class Constants:
  TRAIN_BATCH_SIZE = 1
  VALID_BATCH_SIZE = 1
  NUM_WORKERS = 0
  TRAIN_SHUFFLE = True
  VALID_SHUFFLE = False
  DROP_LAST = True
  EPOCHS = 1
[61]: transform = standard_transforms.Compose([standard_transforms.ToPILImage(),
    standard_transforms.ToTensor()])
[62]: main_datasource = SEWrapperForDistancePose3D(datasource, imgs_transform=transform)
# split the dataset in train and test set
torch.manual_seed(79)
indices = torch.randperm(len(main_datasource)).tolist()
dataset = torch.utils.data.Subset(main_datasource, indices[:int(len(indices) * 0.9)])
dataset_test = torch.utils.data.Subset(main_datasource, indices[int(len(indices) * 0.9):])
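
The 90/10 split can be checked in isolation. The sketch below uses the standard `random` module instead of `torch.randperm` so that it runs without PyTorch, which means the shuffled order differs from the notebook's; `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_items: int, train_fraction: float, seed: int):
    """Shuffle range(n_items) reproducibly and split it into train and test index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Slice bounds must be integers, hence the explicit int() cast.
    cut = int(n_items * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_indices(20, 0.9, seed=79)
print(len(train_idx), len(test_idx))
```

With 20 images this gives 18 training samples and 2 test samples, which agrees with the batch counts logged further down for a batch size of 1.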
[63]: def collate_fn(batch):
  return tuple(zip(*batch))
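
This `collate_fn` transposes a batch: a list of (image, target) samples becomes a tuple of images and a tuple of targets, presumably because samples with a different number of players per image cannot be stacked into a single tensor. A toy run, with strings and dicts standing in for tensors and metadata:

```python
def collate_fn(batch):
    # Transpose a list of samples into a tuple of per-field tuples.
    return tuple(zip(*batch))


# Two fake samples: (image, target). Strings stand in for image tensors.
batch = [('img_a', {'boxes': 3}), ('img_b', {'boxes': 5})]
images, targets = collate_fn(batch)
print(images)
print(targets)
```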
[64]: train_data_loader = DataLoader(dataset,
                batch_size=Constants.TRAIN_BATCH_SIZE,
                num_workers=Constants.NUM_WORKERS,
                drop_last=Constants.DROP_LAST,
                shuffle=Constants.TRAIN_SHUFFLE,
                collate_fn=collate_fn)
[65]: valid_data_loader = DataLoader(dataset_test,
                batch_size=Constants.VALID_BATCH_SIZE,
                num_workers=Constants.NUM_WORKERS,
                drop_last=Constants.DROP_LAST,
                shuffle=Constants.VALID_SHUFFLE,
                collate_fn=collate_fn)
[66]: model = get_pose_3d_model(main_datasource.joint_num, backbone_pretrained=True)
model = model.cuda(0)
logger.info('Train length in batches {}'.format(len(train_data_loader)))
logger.info('Test length in batches {}'.format(len(valid_data_loader)))
     2021-03-16 15:21:34,651 | main | INFO: Train length in batches 18
2021-03-16 15:21:34,652 | main | INFO: Test length in batches 2
[67]: def keypoint_saver_transform(x):
  return x
[68]: key_point_saver = ImageBboxKeyPointSaver(keypoint_saver_transform, labels=['person'], colors_per_class=None,
                  use_labels=False, connections=main_datasource.CONNECTIONS)
[69]: evalbatch = {'keypoints_3D_image_saver': (key_point_saver, 1)}
[70]: def hook_func(evalhook, images_batch, predictions_batch, metas_batch):
  """ hook_func is the function which EvalHook instance will execute after method "update" call. You should
  define this function according to evalbatch input of evaluator inside Trainer """

  for img, preds, metas in zip(images_batch, predictions_batch, metas_batch):
    img = standard_transforms.ToPILImage()(img.cpu())
    evalhook.count()
    poses_coords = metas['poses_coords'].numpy()
    poses_coords[:, :, 2] = metas['poses_viz'].squeeze(2).numpy()
    keypoint_targets = {'boxes': metas['boxes'], 'keypoints': poses_coords, 'stamp': 'GTS'}
    pred_poses_coords = preds['pred_poses_coords'].cpu().numpy()
    pred_poses_coords[:, :, 2] = 1
    keypoint_preds = {'boxes': preds['boxes'].cpu(), 'keypoints': pred_poses_coords, 'stamp': 'PREDS'}
    keypoint_name = 'img{}.png'.format(evalhook.counter)
    keypoints_image_saver_object, freq = evalhook.keypoints_3D_image_saver
    keypoints_image_saver_object.update(
      img_name=keypoint_name, images=[img], preds=[keypoint_preds], gts=[keypoint_targets])
[71]: evaluator = EvalHook(evalbatch, hook_func)
[72]: logger.info(model)
[73]: optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[16, 64], gamma=0.5)
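
`MultiStepLR` multiplies the learning rate by `gamma` each time a milestone epoch is reached. The resulting schedule can be written in closed form and inspected without running any training; `multistep_lr` below is a hypothetical helper that mirrors this rule.

```python
from bisect import bisect_right


def multistep_lr(base_lr: float, milestones, gamma: float, epoch: int) -> float:
    """Learning rate after `epoch` scheduler steps: base_lr * gamma ** (milestones passed)."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


for epoch in (0, 15, 16, 63, 64):
    print(epoch, multistep_lr(1e-4, [16, 64], 0.5, epoch))
```

With `EPOCHS = 1` in this demonstration the first milestone is never reached, so the learning rate stays at 0.0001 throughout; the schedule only matters for longer runs.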
[74]: serializer = SimpleSerializer(
  train_dir='pose3D_tests', ckpt_dir='checkpoints')
[75]: trainer = DefaultTrainer(
  data_loader=train_data_loader, model=model, epochs=Constants.EPOCHS, save_freq=1,
  valid_data_loader=valid_data_loader, optimizer=optimizer, evaluator=evaluator, scheduler=scheduler,
  serializer=serializer)
     2021-03-16 15:21:39,627 | deepsky.serializers.simple | INFO: Checkpoint was not provided, start from epoch 1
2021-03-16 15:21:39,628 | deepsky.trainers.generic | INFO: Scheduler step will be applied per epoch

You have probably noticed that we use a tiny dataset: just 20 images. Normally it would contain at least a few thousand images, but we cannot wait a few hours for the training to finish here, and we want to present the full Sky Engine workflow.

[76]: trainer.train()

[epoch 1]: 100%|  | 18/18 [00:14<00:00, 1.26it/s, str=pose_head:7.064 loss_sum:7.064]
[epoch 1]: 100%|  | 2/2 [00:05<00:00, 2.71s/it, str=pose_head:6.501 loss_sum:6.501]
[inf]: 100%|  | 2/2 [00:00<00:00, 6.32it/s]

     2021-03-16 15:22:02,089 | deepsky.trainers.generic | INFO: {'epoch': 1,'train_loss': 7.064170413547092, 'val_loss': 6.500706195831299}
2021-03-16 15:22:02,709 | deepsky.serializers.simple | INFO: Saving checkpoint checkpoints/pose3D_tests/ckpt_epoch_1_train_loss-7.064_val_loss-6.501.pth.tar

After each epoch we save a checkpoint and run inference on a few samples to track the training progress. Images generated during a longer training run on a bigger dataset look as follows:

[77]: show_jupyter_picture('gtc03_assets/trained/img1.png')
[77]:
[78]: show_jupyter_picture('gtc03_assets/trained/img2.png')
[78]:

5.1 On real data

Let's load a 3D pose estimation model pretrained on a large synthetic dataset and run inference on a real rugby match.

[79]: device = torch.device('cuda')
[80]: resume_path = 'gtc03_assets/trained/pose3d.pth.tar'
checkpoint = torch.load(resume_path)
model_weights = checkpoint['state_dict']
model.load_state_dict(model_weights)
model = model.to(device)

We will also need a player detection model, which was trained on the same artificial data, with bounding boxes provided by the same renderer datasource.

[81]: from deepsky.models.maskrcnn import get_model_from_coco_pretrained
from deepsky.datasources.image_inference import ImageInferenceDatasource
from dem_rugby_helpers import bboxes_viz, plot_pose3D, make_patch, _image_to_3dbox_world, _bboxes_to_low_corner
from PIL import Image
[82]: detection_model = get_model_from_coco_pretrained(num_classes=3,
        anchor_sizes=((16,), (32,), (48,), (64,), (72,)),
        ratios=((0.5, 0.75, 1.0),),
        pretrained=False)
[83]: checkpoint = torch.load('gtc03_assets/trained/rugby_detection.pth.tar')
for k, v in sorted(checkpoint.items()):
  checkpoint[''.join(['_model.', k])] = checkpoint.pop(k)
detection_model.load_state_dict(checkpoint)
detection_model = detection_model.to(device)
real_dataset = ImageInferenceDatasource(dir='gtc03_assets/real_data', extension='png')
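
The loop in cell [83] renames every checkpoint key by prepending `_model.`, presumably so that the keys match the parameter names of the wrapped detection model. The same renaming can be written as a dict comprehension that leaves the loaded checkpoint untouched; `add_prefix` is a hypothetical helper, shown here on a toy dictionary.

```python
def add_prefix(state_dict: dict, prefix: str) -> dict:
    """Return a new dict with `prefix` prepended to every key; values are untouched."""
    return {prefix + key: value for key, value in state_dict.items()}


# Toy stand-in for a loaded checkpoint; real values would be tensors.
checkpoint = {'backbone.weight': 1, 'head.bias': 2}
renamed = add_prefix(checkpoint, '_model.')
print(sorted(renamed))
```

Building a new dictionary avoids mutating the checkpoint while it is being traversed, which makes the intent easier to read.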

Let's detect the players and visualize the results.

[84]: img, file_path = real_dataset[75]
orig_img = Image.open(file_path)
detection_model.eval()
with torch.no_grad():
  img = img.to(device)
  outputs = detection_model(img.unsqueeze(0))
out = outputs.pop()
bboxes = out['boxes'].cpu().detach().numpy()
labels = out['labels'].cpu().detach().numpy()
bboxes = bboxes[np.where(labels == 1)[0]]
[85]: bbox_image = bboxes_viz(orig_img, bboxes)
bbox_image
[85]:

Once the bounding boxes have been generated, we can crop the target objects and estimate their poses in 3D space. For data preprocessing we will use the same datasource that we used during training.

[86]: model.eval()
with torch.no_grad():
  results = model((img,), ({'boxes': torch.from_numpy(bboxes).int()},))
results = results.pop()
output_coords, output_bboxes = results['pred_poses_coords'].cpu(), \
        results['boxes'].cpu()
[87]: output_coords[:2]
[88]: n = 6
[89]: boxes = _bboxes_to_low_corner(output_bboxes)
crops = make_patch(img, boxes.int())
pil_img = standard_transforms.ToPILImage()(crops[n].squeeze(0).cpu())
[90]: coord = _image_to_3dbox_world(output_coords, boxes, 2000)
[91]: Image.open('inference_3d.png')
[91]:

 

SKY ENGINE – ADVANCING ARTIFICIAL INTELLIGENCE

Artificial Intelligence Evolved

The SKY ENGINE AI Platform lets you generate your data and train ML models and expand your use cases beyond the limitations of traditional AI.

The Sky Engine deep learning platform is designed to overcome the complex object recognition challenges of modern machine vision.

  • Instantly generate and visually inspect all of your data in SKY ENGINE Integral, regardless of scale
  • Leverage the blazing accurate physics-driven light propagation simulations, data generation and Python data science pipeline of SKY ENGINE Render
  • Reap the benefits of a full-stack AI platform and accelerate third-party BI and data science workflows with standard PyTorch and TensorFlow connectivity

Bridge Data Generation & Deep Learning

Sky Engine combines a physics simulations-driven image renderer directly integrated with the AI models training framework and is designed to generate images for training machine vision AI systems in virtual environments.

Sky Engine generates training data using virtual scenes. By changing parameters in the CGI scene, Sky Engine is able to generate a massive number of labelled images for AI vision training directly into Deep Learning pipeline with multi-GPU scaling.

SKY ENGINE AI Disrupts Industries

Our solutions are used in areas as diverse as healthcare (disease recognition from medical images, organ segmentation for radiation oncology planning) and sports analytics (processing video footage, for example in football).

Furthermore, Sky Engine provides efficient methods for defect discrimination in manufacturing and in agriculture, where they support food safety.

