SKY ENGINE AI is a simulation and deep learning cloud that generates fully annotated, synthetic data, provides advanced domain adaptation algorithms and trains AI computer vision algorithms at scale.
SKY ENGINE AI is a tool for developers: Data Scientists, ML/Software Engineers creating computer vision projects in any industry.
Synthetic Data Cloud for Deep Learning in a Virtual Reality for Computer Vision Developers
Deep Learning in the Metaverse full technology stack – Developer environment core modules
GPU simulator with sets of physics-based rendering shaders tailored to sensor fusion
AI-based image and video processor for domain adaptation
Garden of deep neural network architectures for 3D/4D training
Multi-GPU and network-level adaptive deep learning and task scheduler
GPU memory-level integration with PyTorch and TensorFlow
Large volumes of real-world training images are no longer required, which reduces data acquisition costs, because synthetic data is diverse and covers edge cases.
Synthetic data and digital twins come with labels, annotated instances and ground truths, reducing manual work.
AI business transformation can become a reality when massive synthetic training datasets are generated at a fraction of the cost of real-world data collection and labelling.
Training AI models on purely synthetic data with advanced domain adaptation, and testing digital twins in virtual reality, can greatly improve model performance.
Accelerate deployment of computer vision models by shortening training iteration cycles with an adaptive, data-centric evolutionary deep learning workflow.
Build trust with your customers and community by creating anonymized synthetic datasets, and work safely with data while preserving privacy.
+ Specific neural network models, ready for parallel training, tested and optimised for the following tasks:
Shape every aspect of your virtual environment, including camera characteristics, lighting, weather conditions, background, occlusions, colours, object placement, motion, and environment evolution, to build diverse and balanced datasets for training robust AI models that perform accurately in the real world.
Generate data, build optimal, customised AI models from scratch and train them in virtual reality.
Unleash the potential of computer vision at scale using a full-stack synthetic data cloud with evolutionary deep learning in virtual reality that enables massive, customised, well-balanced synthetic data simulation and generation for adaptive AI model training.
Create virtual environments, simulate and generate image and video datasets, and train AI models that perform well in reality.
What’s included:
Our data science team and computer vision experts will help you create high-quality datasets.
The number of human-related applications of computer vision and AI is growing by the day. From medicine to retail to manufacturing and security, AI-powered solutions are becoming more prevalent and will soon be...
SKY ENGINE AI and DRKVRS Partnership – 3D Generative AI for Games Development.
SKY ENGINE AI has raised its Series A round to help tech companies improve computer vision with Synthetic Data Cloud for AI developers.
from skyrenderer.utils.common import get_presentation_folder_from_assets
get_presentation_folder_from_assets('/dli/mount/assets', 'gtc03_assets')
In rugby, as in any similar sport, building the right playing strategy before the championship season is key to success for any professional coach and club owner. While coaches strive to give the best tips and point out mistakes during the game, they cannot notice every detail and behavioral pattern of both teams, even when rewatching the matches. Sophisticated AI algorithms can be used to collect such data, analyze it and draw inferences about team behavior.
In particular, the tasks we would like to solve to support the analysis of a rugby team are locating each player during the match and estimating the 3D pose of each player on the field.
Having such information in real time will provide the evidence needed to build a better playing strategy.
Machine learning algorithms are getting more and more powerful in all kinds of classification problems, including image recognition. Increasingly sophisticated models, given enough correctly labeled data, are able to achieve superb performance and accuracy. In many cases an efficient solution for a class of problems already exists but cannot be applied, because the only bottleneck is missing data.
The process of gathering and, especially, labeling data can be extremely expensive and time-consuming. The images must be manually analyzed by humans, whose labor in such repetitive tasks is not only slow and expensive, but also less accurate compared to computers.
In addition, some cases require modern equipment to produce labeled data and highly qualified specialists to maintain the production process. This significantly increases the project cost or, in many cases, puts the project out of reach for stakeholders.
What if we could automatically generate images perfectly suited to the task at hand, with complete and always correct ground truth built in?
We would like to show our attempt to achieve exactly this, using football player pose recognition as an example. The goal is to train a model to accurately recognize football players and their poses as human keypoints in 3D space in real-life match footage, like that shown below, having been trained exclusively on artificial, synthetic data. The images are rendered scenes that are fully controlled by our renderer, so all kinds of ground truth can be provided, depending on the model's requirements.
from skyrenderer.example_assistant.markdown_helpers import show_jupyter_picture, show_jupyter_movie
show_jupyter_picture('gtc03_assets/illustrations/football_frame_1.png')
show_jupyter_picture('gtc03_assets/illustrations/football_frame_2.png')
show_jupyter_picture('gtc03_assets/illustrations/football_frame_3.png')
from skyrenderer.core.logger_config import configure_logger
logger = configure_logger()
First let's visualize the GPUs available on the machine. Based on this we can select which GPUs will be used by rendering and learning. By default we use all available devices.
!gpustat
/bin/sh: 1: gpustat: not found
import torch
AVAILABLE_GPUS = list(range(torch.cuda.device_count()))
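The `gpustat` call above failed because the tool is not installed in this environment. As a hedged alternative, the sketch below queries `nvidia-smi`, which ships with the NVIDIA driver, and parses its CSV output. The helper names `parse_gpu_csv` and `list_gpus` are hypothetical and not part of Sky Engine, and the sample text is illustrative rather than captured from the machine used here.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'index, name, memory.total' CSV rows into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, memory = [field.strip() for field in line.split(',')]
        gpus.append({'index': int(index), 'name': name, 'memory': memory})
    return gpus


def list_gpus():
    """Return GPU descriptions, or an empty list when nvidia-smi is unavailable."""
    if shutil.which('nvidia-smi') is None:
        return []
    result = subprocess.run(
        ['nvidia-smi', '--query-gpu=index,name,memory.total', '--format=csv,noheader'],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


# Illustrative sample only, not real output from this notebook's machine.
sample = "0, Tesla V100, 16160 MiB\n1, Tesla V100, 16160 MiB\n"
sample_gpus = parse_gpu_csv(sample)
print([gpu['index'] for gpu in sample_gpus])
```

The list of indices returned this way could be used to restrict `AVAILABLE_GPUS` to a subset of devices.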
It is required to set the path where the assets (images, meshes, animations etc.) are stored. For convenience, the example assistant is configured. It will help with visualizations.
from skyrenderer.scene.renderer_context import RendererContext
from skyrenderer.scene.scene import SceneOutput
from skyrenderer.example_assistant.visualization_settings import VisualizationDestination
from skyrenderer.example_assistant.display_config import DisplayConfig
from skyrenderer.example_assistant.example_assistant import ExampleAssistant
root_paths_config = {
'assets_root': '/dli/mount/assets',
'cache_root': '/dli/mount/cache'
}
renderer_ctx = RendererContext(root_paths_config)
2021-03-16 15:05:38,218 | skyrenderer.scene.renderer_context | INFO: Root paths:
- root path: /home/skyengine/.local/lib/python3.6/site-packages/skyrenderer
- assets path: /dli/mount/assets
- config path: /home/skyengine/.local/lib/python3.6/site-packages/skyrenderer/config
- optix sources path: /home/skyengine/.local/lib/python3.6/site-packages/skyrenderer/optix_sources/sources
- cache path: /dli/mount/cache
2021-03-16 15:05:38,480 | skyrenderer.service.service | INFO: Open GUI here: http://andariel.skyengine.ai:10001/index.html?service_addr=104.45.29.245:20000
display_config = DisplayConfig(visualization_destination=VisualizationDestination.DISPLAY,
visualized_outputs=[SceneOutput.BEAUTY],
cv_waitkey=0)
example_assistant = ExampleAssistant(context=renderer_ctx, display_config=display_config)
In the Sky Engine pipeline the graphic assets, the building blocks for the scene, are prepared by a CG Artist using third-party software tools. Assets prepared for this scene:
Geometries
The main format used in Sky Engine for carrying information about the scene definition (models and their relative positions, or position ranges for randomization) is Alembic (.abc). The Alembic exchange format, developed by Sony Pictures Imageworks and Lucasfilm, is widely used in the industry and is supported by most modern CG tools.
For this scene an artist prepared:
Materials
Sky Engine by default uses a metallic-roughness PBR shader. The input maps for the shader can come from files or from a Substance archive (.sbsar). Sky Engine's built-in support for Substance allows for parameter randomization and on-the-fly texture rendering in the background.
For this scene an artist prepared:
Environmental mapping
The background for this scene is a simple cloudy sky HDR.
For the Alembic assets prepared according to Sky Engine guidelines, the whole scene can be loaded and visualized without further configuration.
renderer_ctx.load_abc_scene('stadium')
2021-03-16 15:05:38,525 | skyrenderer.core.asset_manager.asset_manager | INFO: Syncing git annex…
2021-03-16 15:05:40,209 | skyrenderer.core.asset_manager.asset_manager | INFO: Syncing git annex done.
renderer_ctx.setup()
logger.info(f'Scene\n{str(renderer_ctx)}')
2021-03-16 15:05:54,532 | main | INFO: Scene
scene_tree:
top_node (count: 1)
|-- bumper_GEO_NUL_000 (count: 1)
| +-- bumper_GEO (count: 1)
|-- bumper_GEO_NUL_001 (count: 1)
| +-- bumper_GEO_0 (count: 1)
|-- bumper_GEO_NUL_002 (count: 1)
| +-- bumper_GEO_1 (count: 1)
|-- bumper_GEO_NUL_003 (count: 1)
| +-- bumper_GEO_2 (count: 1)
|-- light_L01_LIGHT_NUL (count: 1)
|-- light_L02_LIGHT_NUL (count: 1)
|-- light_L03_LIGHT_NUL (count: 1)
|-- light_L04_LIGHT_NUL (count: 1)
|-- player_GEO_NUL (count: 1)
| +-- player_GEO (count: 1)
|-- rugby_pitch_GEO_NUL_000 (count: 1)
| +-- rugby_pitch_GEO (count: 1)
|-- rugby_pitch_GEO_NUL_001 (count: 1)
| +-- rugby_pitch_GEO_0 (count: 1)
|-- screen_GEO_NUL (count: 1)
| +-- screen_GEO (count: 1)
|-- banners_GEO_NUL (count: 1)
| +-- banners_GEO (count: 1)
|-- crowd_GEO_NUL (count: 1)
| +-- crowd_GEO (count: 1)
|-- grass_baners_GEO_NUL (count: 1)
| +-- grass_baners_GEO (count: 1)
|-- grass_GEO_NUL (count: 1)
| +-- grass_GEO (count: 1)
|-- logo_adidas_GEO_NUL (count: 1)
| +-- logo_adidas_GEO (count: 1)
|-- stadium_base_GEO_NUL (count: 1)
| +-- stadion_base_GEO (count: 1)
|-- stadium_details_GEO_NUL (count: 1)
| +-- stadion_details_GEO (count: 1)
|-- stripes_GEO_NUL (count: 1)
| +-- stripes_GEO (count: 1)
|-- camera_CAM_NUL (count: 1)
| +-- camera_CAM (count: 1)
+-- camera_target_NUL (count: 1)
with example_assistant.get_visualizer() as visualizer:
    visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:06:09,938 | skyrenderer.utils.time_measurement | INFO: Render time: 15.40 seconds
Each loaded object needs to have a material assigned.
from skyrenderer.scene.scene_layout.layout_elements_definitions import MaterialDefinition
from skyrenderer.basic_types.provider import SubstanceTextureProvider, FileTextureProvider
from skyrenderer.basic_types.procedure import PBRShader
player_textures = SubstanceTextureProvider(renderer_ctx, 'rugby_player')
renderer_ctx.set_material_definition('player_GEO', MaterialDefinition(player_textures))
stadium_base_textures = SubstanceTextureProvider(renderer_ctx, 'concrete')
stadium_base_params = PBRShader.create_parameter_provider(renderer_ctx, tex_scale=50)
renderer_ctx.set_material_definition('stadion_base_GEO',
MaterialDefinition(stadium_base_textures, parameter_set=stadium_base_params))
crowd_textures = SubstanceTextureProvider(renderer_ctx, 'crowd')
crowd_params = PBRShader.create_parameter_provider(renderer_ctx, tex_scale=5)
renderer_ctx.set_material_definition('crowd_GEO', MaterialDefinition(crowd_textures, parameter_set=crowd_params))
grass_textures = SubstanceTextureProvider(renderer_ctx, 'grass')
renderer_ctx.set_material_definition('grass_GEO', MaterialDefinition(grass_textures))
grass_logos_textures = SubstanceTextureProvider(renderer_ctx, 'logos_grass')
renderer_ctx.set_material_definition('grass_baners_GEO', MaterialDefinition(grass_logos_textures))
banners_textures = FileTextureProvider(renderer_ctx, 'banners', 'stadium/banners')
renderer_ctx.set_material_definition('banners_GEO', MaterialDefinition(banners_textures))
screen_texture = FileTextureProvider(renderer_ctx, 'screen', 'stadium/screen')
renderer_ctx.set_material_definition('screen_GEO', MaterialDefinition(screen_texture))
bumpers_texture = FileTextureProvider(renderer_ctx, 'bumpers', 'stadium/bumpers')
renderer_ctx.set_material_definition('bumper_GEO.?.?$', MaterialDefinition(bumpers_texture), use_regex=True)
white_params = PBRShader.create_parameter_provider(renderer_ctx, 'white_params', material_color=(0.8, 0.8, 0.8))
renderer_ctx.set_material_definition('stripes_GEO', MaterialDefinition(parameter_set=white_params))
renderer_ctx.set_material_definition('rugby_pitch_GEO.?.?$', MaterialDefinition(parameter_set=white_params),
use_regex=True)
renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
    visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:06:32,080 | skyrenderer.utils.time_measurement | INFO: Render time: 7.86 seconds
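The two `use_regex=True` calls above assign one material to several nodes at once. The sketch below checks, with Python's standard `re` module, which node names from the printed scene tree those patterns would select. It assumes the renderer matches from the start of the name (`re.match` semantics), which the source does not confirm; `matching_nodes` is a hypothetical helper.

```python
import re

# Node names copied from the scene tree printed earlier.
node_names = ['bumper_GEO', 'bumper_GEO_0', 'bumper_GEO_1', 'bumper_GEO_2',
              'bumper_GEO_NUL_000', 'rugby_pitch_GEO', 'rugby_pitch_GEO_0',
              'rugby_pitch_GEO_NUL_001']


def matching_nodes(pattern, names):
    """Return the names matched from their start by the pattern (assumed semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.match(name)]


# '.?.?$' allows at most two extra characters before the end of the name,
# so the geometry nodes match while the longer '_NUL_00x' parents do not.
bumpers = matching_nodes('bumper_GEO.?.?$', node_names)
pitch = matching_nodes('rugby_pitch_GEO.?.?$', node_names)
print(bumpers)
print(pitch)
```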
Let's replace the gray background with a sky.
from skyrenderer.basic_types.item_component import Background
from skyrenderer.basic_types.procedure import EnvMapMiss
from skyrenderer.basic_types.provider import HdrTextureProvider
renderer_ctx.define_env(Background(renderer_ctx,
EnvMapMiss(renderer_ctx),
HdrTextureProvider(renderer_ctx, 'light_sky')))
2021-03-16 15:06:35,475 | skyrenderer.scene.renderer_context | WARNING: Setting background definition after setup.
renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
    visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:06:45,492 | skyrenderer.utils.time_measurement | INFO: Render time: 9.75 seconds
The Sky Engine renderer provides virtually endless possibilities to shuffle, multiply, randomize and organize the assets.
From one Alembic animation we are creating two teams of 20 players each.
renderer_ctx.layout().duplicate_subtree(renderer_ctx, 'player_GEO_NUL', suffix='team2')
renderer_ctx.layout().get_node('player_GEO_NUL').n_instances = 20
renderer_ctx.layout().get_node('player_GEO_NUL_team2').n_instances = 20
renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
    visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:06:53,184 | skyrenderer.utils.time_measurement | INFO: Render time: 6.84 seconds
By default, all materials are drawn randomly. To create two proper teams we need to ensure that each team has the same shirt color which is different than the other team's color, while keeping all the other inputs (hair, skin color, socks color, shirt number etc.) random.
To achieve this, we need to put the players into separate randomization groups and define their drawing strategy. The Substance archive input that controls shirt color is called "Colors_select". It needs to be the same (synchronized) inside a randomization group and different between groups. All the other inputs are kept randomized by default.
from skyrenderer.randomization.strategy.input_drawing_strategy import SynchronizedInput
from skyrenderer.randomization.strategy.synchronization import Synchronization, SynchronizationDescription
from skyrenderer.randomization.strategy.drawing_strategy import DrawingStrategy
shirt_sync = SynchronizedInput(SynchronizationDescription(
in_strategy=Synchronization.DISTINCT_EQUAL_GROUPS))
player_material_strategy = DrawingStrategy(renderer_ctx, inputs_strategies={'Colors_select': shirt_sync})
renderer_ctx.instancers['player_GEO'].modify_material_definition(strategy=player_material_strategy)
renderer_ctx.instancers['player_GEO_team2'].modify_material_definition(randomization_group='team2',
strategy=player_material_strategy)
renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
    visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:08:15,715 | skyrenderer.utils.time_measurement | INFO: Render time: 4.22 seconds
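The synchronization rule described above (one value shared inside a randomization group, distinct values between groups) can be illustrated in plain Python. This is only a sketch of the idea, not Sky Engine's implementation: the functions, the colour palette and the group name `default` are all hypothetical.

```python
import random


def draw_group_colors(groups, palette, rng):
    """Draw one colour per group so that no two groups share a colour."""
    chosen = rng.sample(palette, k=len(groups))  # sampling without replacement
    return dict(zip(groups, chosen))


def assign_shirts(players_by_group, palette, seed=0):
    """Give every player the colour drawn for its group."""
    rng = random.Random(seed)
    group_color = draw_group_colors(list(players_by_group), palette, rng)
    return {player: group_color[group]
            for group, players in players_by_group.items()
            for player in players}


teams = {'default': ['player_{}'.format(i) for i in range(20)],
         'team2': ['player_team2_{}'.format(i) for i in range(20)]}
shirts = assign_shirts(teams, palette=['red', 'blue', 'green', 'yellow'])

colors_team1 = {shirts[p] for p in teams['default']}
colors_team2 = {shirts[p] for p in teams['team2']}
print(colors_team1, colors_team2)
```

Each set contains exactly one colour, and the two sets never overlap, which mirrors the "equal within a group, distinct between groups" behaviour described for the shirt input.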
If you look closely at the picture above, you may notice that every player is in exactly the same pose. By default, Sky Engine plays animations from Alembic files frame by frame, so we need to randomize this parameter.
from skyrenderer.randomization.strategy.input_drawing_strategy import UniformRandomInput
player_geometry_strategy = DrawingStrategy(renderer_ctx, frame_numbers_strategy=UniformRandomInput())
renderer_ctx.instancers['player_GEO'].modify_geometry_definition(strategy=player_geometry_strategy)
renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
    visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:08:23,451 | skyrenderer.utils.time_measurement | INFO: Render time: 6.82 seconds
During a rugby match, players are not distributed uniformly over the pitch; they tend to gather in groups. To make the scene look more natural, we can change the way the players' positions are drawn. Instead of drawing them uniformly, we can use a random Gaussian distribution. It is double-random, because first 𝜇 and 𝜎 are drawn, and then the players' positions are drawn randomly with these parameters.
from skyrenderer.randomization.strategy.input_drawing_strategy import RandomGaussianRandomInput
gauss_strategy = DrawingStrategy(renderer_ctx,
default_input_strategy=RandomGaussianRandomInput(sigma_relative_limits=(0.1, 0.2)))
renderer_ctx.layout().get_node('player_GEO_NUL').modify_locus_definition(strategy=gauss_strategy)
renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
    for _ in range(5):
        visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:08:31,028 | skyrenderer.utils.time_measurement | INFO: Render time: 6.71 seconds
2021-03-16 15:08:37,221 | skyrenderer.utils.time_measurement | INFO: Render time: 5.84 seconds
2021-03-16 15:08:44,746 | skyrenderer.utils.time_measurement | INFO: Render time: 7.18 seconds
2021-03-16 15:08:48,757 | skyrenderer.utils.time_measurement | INFO: Render time: 3.66 seconds
2021-03-16 15:08:52,932 | skyrenderer.utils.time_measurement | INFO: Render time: 3.82 seconds
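The double-random draw described above can be sketched in one dimension with the standard library. Several details are assumptions, not documented Sky Engine behaviour: 𝜇 is drawn uniformly over the allowed range, 𝜎 is drawn as a fraction of the range width (mirroring `sigma_relative_limits`), and positions are clamped to the range. The function name is hypothetical.

```python
import random


def draw_positions(n, low, high, sigma_relative_limits=(0.1, 0.2), rng=None):
    """Draw n positions from a Gaussian whose mu and sigma are themselves random."""
    if rng is None:
        rng = random.Random()
    extent = high - low
    # First level of randomness: the cluster centre and its spread.
    mu = rng.uniform(low, high)
    sigma = rng.uniform(*sigma_relative_limits) * extent
    # Second level: individual positions around mu, clamped to the range (assumption).
    return [min(high, max(low, rng.gauss(mu, sigma))) for _ in range(n)]


positions = draw_positions(20, low=0.0, high=100.0, rng=random.Random(7))
print(min(positions), max(positions))
```

Calling the function again with a different seed moves the whole cluster, which is why consecutive renders show the players gathered in different parts of the pitch.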
This concludes configuration of materials, geometries and their positions.
The artist defined light positions in the Alembic scene definition. By default they have a constant intensity. We will randomize them.
from skyrenderer.basic_types.provider.provider_inputs import HSVColorInput
from skyrenderer.basic_types.lights import BasicLight
white_light_provider = BasicLight.create_parameter_provider(renderer_ctx,
color=HSVColorInput(hue_range=(0, 0),
saturation_range=(0, 0),
value_range=(0.4, 1)))
renderer_ctx.set_light('light_L01_LIGHT_NUL', BasicLight(renderer_ctx, white_light_provider))
renderer_ctx.set_light('light_L02_LIGHT_NUL', BasicLight(renderer_ctx, white_light_provider))
renderer_ctx.set_light('light_L03_LIGHT_NUL', BasicLight(renderer_ctx, white_light_provider))
renderer_ctx.set_light('light_L04_LIGHT_NUL', BasicLight(renderer_ctx, white_light_provider))
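With hue and saturation both fixed at 0 and only the value randomized in (0.4, 1), the light stays white and only its brightness changes. The sketch below demonstrates this with the standard `colorsys` module; `draw_white_light` is a hypothetical helper and says nothing about how `HSVColorInput` samples internally.

```python
import colorsys
import random


def draw_white_light(value_range=(0.4, 1.0), rng=None):
    """Draw an RGB colour with hue = saturation = 0 and a random value (brightness)."""
    if rng is None:
        rng = random.Random()
    value = rng.uniform(*value_range)
    # With zero saturation the hue is irrelevant and all channels equal the value.
    return colorsys.hsv_to_rgb(0.0, 0.0, value)


r, g, b = draw_white_light(rng=random.Random(0))
print(r, g, b)
```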
Now we must improve the camera and its filters. The rendering process in Sky Engine is defined by a chain of render steps. The output of a step is an input of the next one. We define four render steps:
* PinholeRenderStep - simple pinhole camera with randomized horizontal field of view (hfov, in degrees), which simulates random zoom,
* Denoiser - AI-Accelerated Optix Denoiser,
* Tonemapper - Optix tonemapper with randomized parameters - gamma and exposure,
* GaussianBlurPostprocess - the train images should not be perfect. Images with different degree of blurring help the model to generalize better. Here we're using random Gaussian blur.
Additionally, we're reducing the output size to match the deep learning model required input size.
from skyrenderer.render_chain import RenderChain, PinholeRenderStep, Denoiser, Tonemapper, GaussianBlurPostprocess
from skyrenderer.basic_types.provider.provider_inputs import IntInput, FloatInput
HEIGHT = 768
WIDTH = 1024
pinhole_params = PinholeRenderStep.create_hfov_parameter_provider(renderer_ctx,
hfov=IntInput(min_value=15, max_value=35))
camera_step = PinholeRenderStep(renderer_ctx, origin_name='camera_CAM_NUL', target_name='camera_target_NUL',
parameter_provider=pinhole_params)
denoiser = Denoiser(renderer_ctx)
tonemapper_params = Tonemapper.create_parameter_provider(renderer_ctx, gamma=FloatInput(min_value=2, max_value=5),
exposure=FloatInput(min_value=0.7, max_value=1))
tonemapper = Tonemapper(renderer_ctx, parameter_provider=tonemapper_params)
gauss_blur_params = GaussianBlurPostprocess.create_random_parameter_provider(renderer_ctx, max_kernel_radius=7,
max_sigma_x=2, max_sigma_y=0.7)
gauss_blur = GaussianBlurPostprocess(renderer_ctx, parameter_provider=gauss_blur_params)
renderer_ctx.define_render_chain(RenderChain([camera_step, denoiser, tonemapper, gauss_blur],
                                             width=WIDTH,
                                             height=HEIGHT))
renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
    for _ in range(5):
        visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:09:02,229 | skyrenderer.utils.time_measurement | INFO: Render time: 8.31 seconds
2021-03-16 15:09:07,964 | skyrenderer.utils.time_measurement | INFO: Render time: 5.50 seconds
2021-03-16 15:09:11,144 | skyrenderer.utils.time_measurement | INFO: Render time: 2.97 seconds
2021-03-16 15:09:14,030 | skyrenderer.utils.time_measurement | INFO: Render time: 2.61 seconds
2021-03-16 15:09:17,424 | skyrenderer.utils.time_measurement | INFO: Render time: 3.17 seconds
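The idea of a render chain, where each step consumes the output of the previous one, can be sketched as simple function composition. The steps below are hypothetical stand-ins working on a flat list of pixel intensities; they are not the Optix denoiser or tonemapper and only illustrate the chaining.

```python
from functools import reduce


def run_chain(steps, initial):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, initial)


# Hypothetical stand-in steps on pixel intensities in [0, 1].
def exposure(pixels, gain=0.8):
    return [p * gain for p in pixels]


def gamma_correct(pixels, gamma=2.2):
    return [p ** (1.0 / gamma) for p in pixels]


def box_blur(pixels):
    # Repeat the edge pixels so the output keeps the input length.
    padded = [pixels[0]] + pixels + [pixels[-1]]
    return [sum(padded[i:i + 3]) / 3.0 for i in range(len(pixels))]


image = [0.0, 0.25, 1.0, 0.25, 0.0]
result = run_chain([exposure, gamma_correct, box_blur], image)
print(result)
```

Reordering the list changes the result, which is why the order of steps passed to the chain matters.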
Last but not least, we must provide ground truth - information about scene semantics. This setup is designed for player detection with keypoints, so we must assign the semantic class only to players.
As mentioned before, the keypoints are already present in the player animation. By default, Sky Engine calculates all the information about keypoints if it receives them in the input assets, so we just need to visualize them to be sure everything is configured correctly. Green keypoints are visible, red ones are hidden.
renderer_ctx.set_semantic_class('player_GEO', 1)
renderer_ctx.set_semantic_class('player_GEO_team2', 1)
from skyrenderer.scene.scene import SceneOutput
example_assistant.visualized_outputs = {SceneOutput.BEAUTY, SceneOutput.SEMANTIC, SceneOutput.KEYPOINTS}
renderer_ctx.setup()
with example_assistant.get_visualizer() as visualizer:
    for _ in range(5):
        visualizer(renderer_ctx.render_to_numpy())
2021-03-16 15:09:23,943 | skyrenderer.utils.time_measurement | INFO: Render time: 5.87 seconds
2021-03-16 15:09:29,358 | skyrenderer.utils.time_measurement | INFO: Render time: 5.18 seconds
2021-03-16 15:09:33,988 | skyrenderer.utils.time_measurement | INFO: Render time: 4.38 seconds
2021-03-16 15:09:37,252 | skyrenderer.utils.time_measurement | INFO: Render time: 3.01 seconds
2021-03-16 15:09:40,526 | skyrenderer.utils.time_measurement | INFO: Render time: 3.01 seconds
Everything is OK, so we can create a renderer datasource for training.
from skyengine.datasources.multi_purpose_renderer_data_source import MultiPurposeRendererDataSource
datasource = MultiPurposeRendererDataSource(renderer_context=renderer_ctx, images_number=20,
cache_folder_name='rugby_presentation_new')
Training configuration.
from deepsky.evaluators.sample_savers import ImageBboxKeyPointSaver, EvalHook
from deepsky.models.pose3d import get_pose_3d_model
from deepsky.trainers.trainer import DefaultTrainer
from deepsky.serializers.simple import SimpleSerializer
from skyengine.datasources.wrappers.mpose3d_wrapper import SEWrapperForDistancePose3D
from torch.utils.data import DataLoader
import torchvision.transforms as standard_transforms
import numpy as np
class Constants:
    TRAIN_BATCH_SIZE = 1
    VALID_BATCH_SIZE = 1
    NUM_WORKERS = 0
    TRAIN_SHUFFLE = True
    VALID_SHUFFLE = False
    DROP_LAST = True
    EPOCHS = 1
transform = standard_transforms.Compose([standard_transforms.ToPILImage(),
standard_transforms.ToTensor()])
main_datasource = SEWrapperForDistancePose3D(datasource, imgs_transform=transform)
# split the dataset in train and test set
torch.manual_seed(79)
indices = torch.randperm(len(main_datasource)).tolist()
dataset = torch.utils.data.Subset(main_datasource, indices[:int(len(indices) * 0.9)])
dataset_test = torch.utils.data.Subset(main_datasource, indices[int(len(indices) * 0.9):])
def collate_fn(batch):
    return tuple(zip(*batch))
train_data_loader = DataLoader(dataset,
                               batch_size=Constants.TRAIN_BATCH_SIZE,
                               num_workers=Constants.NUM_WORKERS,
                               drop_last=Constants.DROP_LAST,
                               shuffle=Constants.TRAIN_SHUFFLE,
                               collate_fn=collate_fn)
valid_data_loader = DataLoader(dataset_test,
                               batch_size=Constants.VALID_BATCH_SIZE,
                               num_workers=Constants.NUM_WORKERS,
                               drop_last=Constants.DROP_LAST,
                               shuffle=Constants.VALID_SHUFFLE,
                               collate_fn=collate_fn)
model = get_pose_3d_model(main_datasource.joint_num, backbone_pretrained=True)
model = model.cuda(0)
logger.info('Train length in batches {}'.format(len(train_data_loader)))
logger.info('Test length in batches {}'.format(len(valid_data_loader)))
2021-03-16 15:21:34,651 | main | INFO: Train length in batches 18
2021-03-16 15:21:34,652 | main | INFO: Test length in batches 2
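The 18 and 2 batches reported in the log follow from a 90/10 split of the 20 rendered images at batch size 1. The sketch below reproduces the arithmetic with the standard library only; `random.Random(...).shuffle` stands in for `torch.randperm`, so the actual index order differs from the notebook's. It also shows what the `tuple(zip(*batch))` collate function does to a batch of `(image, target)` pairs.

```python
import random


def split_indices(n_samples, train_fraction=0.9, seed=79):
    """Shuffle indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # stand-in for torch.randperm
    cut = int(n_samples * train_fraction)  # slice bounds must be integers
    return indices[:cut], indices[cut:]


def collate_fn(batch):
    """Turn [(img, target), ...] into ((img, ...), (target, ...))."""
    return tuple(zip(*batch))


train_idx, test_idx = split_indices(20)
print(len(train_idx), len(test_idx))  # 18 2

images, targets = collate_fn([('img0', {'id': 0}), ('img1', {'id': 1})])
print(images)
print(targets)
```

Keeping images and targets as tuples, rather than stacking them into one tensor, lets each sample carry a different number of boxes and keypoints.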
def keypoint_saver_transform(x):
    return x
key_point_saver = ImageBboxKeyPointSaver(keypoint_saver_transform, labels=['person'], colors_per_class=None,
use_labels=False, connections=main_datasource.CONNECTIONS)
evalbatch = {'keypoints_3D_image_saver': (key_point_saver, 1)}
def hook_func(evalhook, images_batch, predictions_batch, metas_batch):
    """ hook_func is the function which EvalHook instance will execute after method "update" call. You should
    define this function according to evalbatch input of evaluator inside Trainer """
    for img, preds, metas in zip(images_batch, predictions_batch, metas_batch):
        img = standard_transforms.ToPILImage()(img.cpu())
        evalhook.count()
        poses_coords = metas['poses_coords'].numpy()
        poses_coords[:, :, 2] = metas['poses_viz'].squeeze(2).numpy()
        keypoint_targets = {'boxes': metas['boxes'], 'keypoints': poses_coords, 'stamp': 'GTS'}
        pred_poses_coords = preds['pred_poses_coords'].cpu().numpy()
        pred_poses_coords[:, :, 2] = 1
        keypoint_preds = {'boxes': preds['boxes'].cpu(), 'keypoints': pred_poses_coords, 'stamp': 'PREDS'}
        keypoint_name = 'img{}.png'.format(evalhook.counter)
        keypoints_image_saver_object, freq = evalhook.keypoints_3D_image_saver
        keypoints_image_saver_object.update(
            img_name=keypoint_name, images=[img], preds=[keypoint_preds], gts=[keypoint_targets])
evaluator = EvalHook(evalbatch, hook_func)
logger.info(model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[16, 64], gamma=0.5)
serializer = SimpleSerializer(
train_dir='pose3D_tests', ckpt_dir='checkpoints')
trainer = DefaultTrainer(
    data_loader=train_data_loader, model=model, epochs=Constants.EPOCHS, save_freq=1,
    valid_data_loader=valid_data_loader, optimizer=optimizer, evaluator=evaluator, scheduler=scheduler,
    serializer=serializer)
2021-03-16 15:21:39,627 | deepsky.serializers.simple | INFO: Checkpoint was not provided, start from epoch 1
2021-03-16 15:21:39,628 | deepsky.trainers.generic | INFO: Scheduler step will be applied per epoch
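The scheduler configured above halves the learning rate at epochs 16 and 64. The sketch below reproduces that schedule in closed form without PyTorch, so the effect of `milestones` and `gamma` can be checked by hand; `multistep_lr` is a hypothetical helper name.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate after `epoch` scheduler steps: base_lr * gamma ** (milestones reached)."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


for epoch in (0, 15, 16, 63, 64):
    print(epoch, multistep_lr(1e-4, [16, 64], 0.5, epoch))
```

With `EPOCHS = 1`, as set in this notebook, neither milestone is reached and the learning rate stays at its initial value of 0.0001; the schedule only matters for longer training runs.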
You've probably noticed that we set a tiny dataset: just 20 images. Normally it would be at least a few thousand images, but we cannot wait a few hours for the training to finish here, and we want to present the full Sky Engine workflow.
trainer.train()
[epoch 1]: 100%| | 18/18 [00:14<00:00, 1.26it/s, str=pose_head:7.064 loss_sum:7.064]
[epoch 1]: 100%| | 2/2 [00:05<00:00, 2.71s/it, str=pose_head:6.501 loss_sum:6.501]
[inf]: 100%| | 2/2 [00:00<00:00, 6.32it/s]
2021-03-16 15:22:02,089 | deepsky.trainers.generic | INFO: {'epoch': 1,'train_loss': 7.064170413547092, 'val_loss': 6.500706195831299}
2021-03-16 15:22:02,709 | deepsky.serializers.simple | INFO: Saving checkpoint checkpoints/pose3D_tests/ckpt_epoch_1_train_loss-7.064_val_loss-6.501.pth.tar
After each epoch we save a checkpoint and run inference on example data to track the training progress. Images produced during a longer training run on a bigger dataset look as follows:
show_jupyter_picture('gtc03_assets/trained/img1.png')
show_jupyter_picture('gtc03_assets/trained/img2.png')
Let's load a 3D pose estimation model pretrained on a large synthetic dataset and run inference on a real rugby match.
device = torch.device('cuda')
resume_path = 'gtc03_assets/trained/pose3d.pth.tar'
checkpoint = torch.load(resume_path)
model_weights = checkpoint['state_dict']
model.load_state_dict(model_weights)
model = model.to(device)
We will also need a player detection model, which was trained on the same artificial data with bounding boxes provided by the same renderer datasource.
from deepsky.models.maskrcnn import get_model_from_coco_pretrained
from deepsky.datasources.image_inference import ImageInferenceDatasource
from dem_rugby_helpers import bboxes_viz, plot_pose3D, make_patch, _image_to_3dbox_world, _bboxes_to_low_corner
from PIL import Image
detection_model = get_model_from_coco_pretrained(num_classes=3,
anchor_sizes=((16,), (32,), (48,), (64,), (72,)),
ratios=((0.5, 0.75, 1.0),),
pretrained=False)
checkpoint = torch.load('gtc03_assets/trained/rugby_detection.pth.tar')
for k, v in sorted(checkpoint.items()):
checkpoint[''.join(['_model.', k])] = checkpoint.pop(k)
detection_model.load_state_dict(checkpoint)
detection_model = detection_model.to(device)
real_dataset = ImageInferenceDatasource(dir='gtc03_assets/real_data', extension='png')
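The loop above renames checkpoint keys by adding a `_model.` prefix, presumably so that they match the parameter names of the wrapped detection model. A non-mutating variant of the same renaming can be sketched with a dict comprehension; the key names and values below are hypothetical placeholders, not entries from the real checkpoint.

```python
def add_prefix(state_dict, prefix='_model.'):
    """Return a new dict whose keys carry the prefix; values are left untouched."""
    return {prefix + key: value for key, value in state_dict.items()}


# Placeholder entries standing in for real parameter tensors.
weights = {'backbone.conv1.weight': 'tensor-a', 'head.bias': 'tensor-b'}
renamed = add_prefix(weights)
print(sorted(renamed))
```

Building a new dictionary avoids modifying the checkpoint while its items are being traversed and leaves the original available for inspection.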
Let's detect players and visualize the results.
img, file_path = real_dataset[75]
orig_img = Image.open(file_path)
detection_model.eval()
with torch.no_grad():
    img = img.to(device)
    outputs = detection_model(img.unsqueeze(0))
out = outputs.pop()
bboxes = out['boxes'].cpu().detach().numpy()
labels = out['labels'].cpu().detach().numpy()
bboxes = bboxes[np.where(labels == 1)[0]]
bbox_image = bboxes_viz(orig_img, bboxes)
bbox_image
Once the bounding boxes have been generated, we can crop the target objects and estimate the pose in 3D space. For data preprocessing we will use the same datasource we used during training.
model.eval()
with torch.no_grad():
    results = model((img,), ({'boxes': torch.from_numpy(bboxes).int()},))
results = results.pop()
output_coords, output_bboxes = results['pred_poses_coords'].cpu(), \
    results['boxes'].cpu()
output_coords[:2]
n = 6
boxes = _bboxes_to_low_corner(output_bboxes)
crops = make_patch(img, boxes.int())
pil_img = standard_transforms.ToPILImage()(crops[n].squeeze(0).cpu())
coord = _image_to_3dbox_world(output_coords, boxes, 2000)
Image.open('inference_3d.png')
The SKY ENGINE AI Platform lets you generate your data, train ML models and expand your use cases beyond the limitations of traditional AI.
The Sky Engine deep learning platform is designed to overcome the complex object recognition challenges of modern machine vision.
Sky Engine combines a physics simulation-driven image renderer with a directly integrated AI model training framework and is designed to generate images for training machine vision AI systems in virtual environments.
Sky Engine generates training data using virtual scenes. By changing parameters in the CGI scene, Sky Engine is able to generate a massive number of labelled images for AI vision training and feed them directly into the deep learning pipeline with multi-GPU scaling.
Our solutions are used in areas as diverse as healthcare, for disease recognition from medical images or organ segmentation for radiation oncology planning, and sports analytics, for processing video footage of sports such as football.
Furthermore, Sky Engine provides highly efficient methods for detecting defects in manufacturing and in agriculture, where they support improved food safety.