Unlocking accurate warehousing and inventorying solutions with AI training in virtual environments and synthetically generated data

By SKY ENGINE AI   09 November 2021





Video Analytics


Synthetic Data


Deep Learning




Developing AI-driven computer vision (CV) solutions for warehousing and inventorying applications is complex: acquiring a varied, balanced, and accurate dataset can be costly, time-consuming, and complicated by privacy concerns. In this article, you will discover how SKY ENGINE AI’s platform, with its synthetic data generation tools and AI model training in virtual environments, enables the design of computer vision systems for warehousing while mitigating bottlenecks and problems with inventory accuracy.


Among the most challenging steps in developing reliable AI models for computer vision are acquiring training data, balancing datasets with adequate diversity, and labelling precisely. Accelerated AI model training in virtual environments with synthetic datasets offers a feasible solution to these challenges by reducing the need for expensive and time-consuming gathering and labelling of real data.

SKY ENGINE AI has been pioneering this approach to training AI models in virtual reality. SKY ENGINE AI can recreate the machine learning problem in a virtual world and generate synthetic data to train the AI models. Warehouse and storage interiors are among the areas where our customers have sought our expertise in creating synthetic training datasets and training computer vision models.

Because of privacy issues and ever-changing layouts, acquiring labelled datasets from real warehouses is especially difficult for warehouse interior applications. The difficulty is compounded by the need for training datasets with a high degree of variation in elements such as objects, humans, materials, colours, lighting, and fittings.

At SKY ENGINE AI, we have developed a complete toolchain of components for generating synthetic environments such as warehouses or storage facilities, and any real interior can be replicated to help our customers create AI models faster. We use procedural arrangement of object geometries and materials with various randomised factors, such as camera angles, lighting, time of day, external surroundings, and the introduction of customised items into the scene, to obtain highly balanced datasets and to ensure a high degree of diversity in the data itself.

[Image grid with three columns: outdoor weather & indoor lighting; object materials (walls, floor); object placement (packages, vehicles, shelves)]

Figure 2 – Scene randomisation: varying outdoor weather and indoor lighting; varying object materials (walls, floor); alternating object placement (packages, vehicles, shelves, staircase). Top row shows animation of the randomisations applied to the scene.
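The kind of scene randomisation described above can be sketched in a few lines of Python. This is only a minimal illustration of sampling independent scene parameters; it is not the SKY ENGINE AI API, and every name, category, and value range below (`SceneConfig`, `sample_scene`, the weather and material lists, the floor size) is a hypothetical assumption.

```python
import random
from dataclasses import dataclass

# All categories and ranges below are hypothetical, chosen for illustration only.
WEATHER = ["clear", "overcast", "rain", "fog"]
LIGHTING = ["daylight", "fluorescent", "dim", "emergency"]
WALL_MATERIALS = ["concrete", "brick", "painted_metal"]
FLOOR_MATERIALS = ["epoxy", "concrete", "tiles"]
OBJECT_TYPES = ["package", "vehicle", "shelf"]


@dataclass
class SceneConfig:
    weather: str
    lighting: str
    wall_material: str
    floor_material: str
    camera_yaw_deg: float
    placements: list  # (object_type, x_m, y_m) tuples


def sample_scene(rng, floor_size_m=(40.0, 20.0), n_objects=10):
    """Draw one randomised scene description from independent distributions."""
    width, depth = floor_size_m
    placements = [
        (rng.choice(OBJECT_TYPES), rng.uniform(0.0, width), rng.uniform(0.0, depth))
        for _ in range(n_objects)
    ]
    return SceneConfig(
        weather=rng.choice(WEATHER),
        lighting=rng.choice(LIGHTING),
        wall_material=rng.choice(WALL_MATERIALS),
        floor_material=rng.choice(FLOOR_MATERIALS),
        camera_yaw_deg=rng.uniform(0.0, 360.0),
        placements=placements,
    )


rng = random.Random(42)  # fixed seed so the same set of scenes can be regenerated
scenes = [sample_scene(rng) for _ in range(1000)]
```

Sampling each factor independently is the simplest way to spread a dataset over the combinations of weather, lighting, materials, and placement; a real pipeline would pass each description to a renderer and would likely add constraints, for example preventing objects from overlapping.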

The example synthetic dataset shown here includes images simulated in visible and infrared light, instance and semantic segmentation images with labels on the individual objects, normals, and bounding box labels (2D and 3D). Creating such a set of annotations requires no additional effort because all labels are also generated synthetically. The availability of multiple label types opens up a wide range of computer vision applications for AI models trained on this dataset.


Figure 3 – Ground truths, 6 images: (Left) 3D bounding boxes, 2D bounding boxes, semantic segmentation, (Right) depth map, normals, instance segmentation
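One reason synthetic labels cost so little is that, once a pixel-accurate ground truth exists, other label types can be computed from it. The sketch below derives 2D bounding boxes from an instance segmentation mask. It is an illustration under the assumption that the mask is a plain list of rows with 0 for background; it does not describe how the SKY ENGINE AI platform produces its labels, and `boxes_from_instance_mask` is a hypothetical helper.

```python
def boxes_from_instance_mask(mask):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for a 2D mask.

    `mask` is a list of rows; 0 is background, positive integers are instance ids.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue  # background pixel, belongs to no object
            if instance_id not in boxes:
                boxes[instance_id] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[instance_id]
                boxes[instance_id] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
print(boxes_from_instance_mask(mask))  # {1: (1, 0, 2, 1), 2: (3, 1, 4, 2)}
```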

Next, we'll look at a few warehousing and inventorying use cases in which creating a digital twin of the sensor and robot, and putting it through AI training in a virtual warehouse, solved the problem of insufficient and costly real data.

Warehouse automation

Robots, drones, and computer vision are gaining momentum in a variety of interior settings, including warehouses, to automate multiple inventory-related tasks. Robots and drones can be equipped with different sensors, such as smart cameras that use computer vision to increase the efficiency of these tasks.

However, these autonomous robots and drones continue to struggle with accurate navigation in GPS-denied environments and have a limited ability to distinguish between different objects or moving people. The latter issue makes it difficult for human workers and robots to work together seamlessly. The SKY ENGINE AI platform addresses these challenges. The simulation engine can generate a wide range of warehouse environments, including variants of products and packages, racks, and obstacles, with a broad set of randomised settings. This synthetic data is then used to teach the robot AI-driven active space reasoning without the use of GPS, allowing precise navigation in enclosed spaces with greatly improved obstacle avoidance. SKY ENGINE AI enables training of AI algorithms dedicated to 3D reasoning, depth estimation, or simultaneous localisation and mapping (SLAM).

Figure 4 – Autonomous AI-driven drone training in virtual reality for inventorying tasks in a GPS-denied environment.

People tracking

As computer vision gains traction in industrial settings, at SKY ENGINE AI we often help our customers improve the performance of their AI models deployed on smart cameras and other sensors. Such cameras can extract application-specific information from the captured images, generate event descriptions, and make decisions that feed an intelligent, automated system. They can detect people in the area and track the activities carried out by workers in order to optimize task performance and monitor worker safety. In this domain the SKY ENGINE AI platform has proven its value in training AI models to identify workers' pose and position. This requires a large dataset of images of people and workers in warehouse environments with varied poses, heights, skin colors, clothing, and so on. The SKY ENGINE AI platform, which was designed from its first line of code to help data scientists create AI computer vision models quickly, offers a broad range of built-in tools for skeletal geometry reasoning (3D pose and position estimation) to support the design of customer-specific solutions for warehousing.


Figure 5 – Simulated synthetic 3D workers in a warehouse environment. (Left) 3D bounding boxes, (Right) 3D pose estimation with keypoints.

Camera characteristics

In addition, SKY ENGINE AI is sensor-agnostic, providing tools that make it possible to re-create the characteristics of a custom real camera or sensor, including focal length, field of view (FOV) and the relationship between them, the amount of random noise, the modulation transfer function (MTF) of the lens setup, perspective, aspect ratio, available contrast, and more. Infrared sensors or thermal vision can be easily simulated, and synthetic data can be rapidly generated for such setups, because on our platform generation is a matter of milliseconds rather than minutes.


Figure 6 – Simulated warehouse interior in infrared (IR) light with a fisheye lens. (Left) IR image with 3D bounding boxes on the objects; (Right) Semantic segmentation on IR image.
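The relationship between focal length and field of view mentioned above can be made concrete for the simplest case. The sketch below assumes an ideal pinhole (rectilinear) camera, for which FOV = 2 · atan(sensor width / (2 · focal length)). It does not apply to a fisheye lens such as the one in Figure 6, which follows a different projection, and the function names and example sensor size are assumptions made for illustration.

```python
import math


def horizontal_fov_deg(focal_length_mm, sensor_width_mm):
    """Horizontal field of view of an ideal pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


def focal_length_mm(fov_deg, sensor_width_mm):
    """Inverse relation: focal length that yields the requested horizontal FOV."""
    return sensor_width_mm / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


# Example values (assumed): a 36 mm wide sensor behind an 18 mm lens.
fov = horizontal_fov_deg(focal_length_mm=18.0, sensor_width_mm=36.0)
print(round(fov, 1))  # 90.0
print(round(focal_length_mm(fov, sensor_width_mm=36.0), 1))  # 18.0
```

Matching a virtual camera to a real one usually starts from these two numbers, since a mismatch in FOV changes the apparent size of every object in the training images.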

Safety and security

Another field where computer vision has become important is in solutions aimed at improving worker safety or quality of work through automation and robotic technologies. Worker safety in warehouses is closely tied to operational excellence.

Warehouses are dynamically changing areas whose evolving layouts and object placement pose a real challenge for efficient indoor navigation. Obstructed pathways, improperly stacked pallets, dangerous driving, phone use, and disregard for walking zones are only some of the everyday occurrences that can lead to safety and quality incidents in a warehouse.

To ensure an accurate understanding of these areas, including the objects and moving humans, we have to train AI models to identify safe paths by detecting fittings, walls, stairs, pillars, free-standing packages, pallets, forklifts, racks, and more. In addition, such an AI-driven detector needs to be aware of the distances between these objects and obstacles so that it can alert workers if they get too close to, for example, hazardous appliances. To solve this challenge, one can create a variety of warehouse layouts (single or multi-floor) and randomise the type and position of objects. At this level of complexity, a well-balanced synthetic dataset needs to include several types of ground truth for AI training, namely segmentation masks, 2D and 3D bounding boxes, depth maps, and normals. Acquiring such a diverse and accurately annotated real dataset can be extremely difficult, as well as expensive and time-consuming.
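The distance awareness described above can be illustrated with a simple proximity check. This minimal sketch assumes that a detector already provides the centre of each object's 3D bounding box in metres; the object names, coordinates, and the 2 m threshold are hypothetical. It measures centre-to-centre distance, which is a simplification: a production system would more likely use the distance between box surfaces.

```python
import math
from itertools import combinations


def proximity_alerts(objects, min_distance_m):
    """Return (name_a, name_b, distance) for every pair closer than the threshold.

    `objects` maps an object name to the (x, y, z) centre of its 3D box in metres.
    """
    alerts = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(objects.items(), 2):
        distance = math.dist(pos_a, pos_b)  # Euclidean distance between centres
        if distance < min_distance_m:
            alerts.append((name_a, name_b, round(distance, 2)))
    return alerts


objects = {
    "worker_1": (2.0, 3.0, 0.0),
    "forklift_1": (3.0, 3.5, 0.0),
    "pallet_7": (12.0, 8.0, 0.0),
}
print(proximity_alerts(objects, min_distance_m=2.0))
# [('worker_1', 'forklift_1', 1.12)]
```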

The synthetic data generated in virtual warehouse environments can also serve as a realistic interior for developing AI models that recognize humans in storage facility settings. To accomplish this, we can place 3D models of workers in the scene and simulate their motion, pose, height, skin, hair, body weight, and clothing, including protective equipment such as helmets and jackets. AI models trained on this simulated synthetic data can be used to monitor workers’ physical activity and detect inconsistencies in their motion and agility. Computer vision systems equipped with these AI models could automatically raise an alarm when a worker is injured or remains inactive for a prolonged period. Likewise, such a CV system can detect whether workers follow the personal protective equipment rules, and it can help keep the warehouse floor tidy and its pathways clear of trash, packing materials, damaged goods, and abandoned equipment, which is critical for maintaining a safe warehouse working environment.


Figure 7 – Workers injury tracking: (Left) Falling - bounding box; (Right) Keypoints on the skeleton, 3D pose estimation
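As a rough illustration of how a prolonged period of inactivity might be flagged from pose estimates, the sketch below tracks the average movement of skeleton keypoints between consecutive frames. It assumes that each frame provides a list of 3D keypoints in metres in a fixed order; the threshold values and function names are hypothetical, and this is not a description of how the SKY ENGINE AI platform or any trained model detects injuries.

```python
import math


def mean_keypoint_displacement(prev_pose, curr_pose):
    """Average distance each keypoint moved between two consecutive frames."""
    total = sum(math.dist(p, c) for p, c in zip(prev_pose, curr_pose))
    return total / len(curr_pose)


def inactive_since(poses, motion_threshold_m, min_still_frames):
    """Return the index of the frame at which inactivity is flagged, or None.

    A frame counts as still when the mean keypoint displacement from the
    previous frame is below the threshold; any motion resets the counter.
    """
    still = 0
    for i in range(1, len(poses)):
        if mean_keypoint_displacement(poses[i - 1], poses[i]) < motion_threshold_m:
            still += 1
            if still >= min_still_frames:
                return i
        else:
            still = 0
    return None


# Two keypoints (head, hip) that move for three frames and then stop.
moving = [[(0.1 * t, 0.0, 1.7), (0.1 * t, 0.0, 1.0)] for t in range(3)]
poses = moving + [moving[-1]] * 4
print(inactive_since(poses, motion_threshold_m=0.02, min_still_frames=3))  # 5
```

In practice the frame count would be derived from the camera frame rate and the inactivity period considered unsafe, and the result would be combined with pose cues such as a fall.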

Inventorying optimization

Keeping accurate and up-to-date maps of the storage facility is one of the critical tasks in a warehouse. Inaccurate inventory causes problems such as improper stock levels and buildups of obsolete inventory. Picking problems also arise when pickers rely on inaccurate information, leading to inefficient processes. AI-driven automation is a key factor in solving these problems. Because AI-based monitoring with drones can be performed almost live, an automated system developed on the SKY ENGINE AI platform can provide real-time, accurate information about stock levels and composition. This is achieved with CV AI models that identify and measure the extent and area of various furniture and fittings and, by virtually referencing the products and packages placed on them, recognize the available space while keeping track of object quantities. The physically based rendering system in the SKY ENGINE AI software makes it possible to create environments with generative materials such as wood, metal, carbon fiber, glass, and cardboard, in varying shapes and sizes, which in turn makes it possible to detect existing inventory. Digital twin modeling can serve to predict customer demand in different channels, anticipate inventory requirements across a supply chain, and improve supplier and stock visibility to secure supply.

Synthetic warehouse with randomisations of lighting, objects placement, materials, colors, weather, daytime, etc. generated in SKY ENGINE AI platform

And there are many more applications

The examples of AI model training in virtual environments described here are only a fraction of what is possible with SKY ENGINE AI technology. Synthetic data can significantly improve AI models' performance in multiple computer vision applications. Additional solutions include reconstructing and mapping the current warehouse layout in order to reconfigure both the floor space and the vertical space available, so as to level out inbound and outbound goods flows and lower the risk of capacity bottlenecks.

Contact us to discuss your use cases and get access to the SKY ENGINE AI platform, or to get a tailored synthetic dataset for your warehouse computer vision applications. As we support several industries, a broad range of data customization is available, even for specific sensors. Check out the examples of synthetic data above to see the quality of the training datasets.