This study examines the effectiveness of using large language model-based personas to evaluate external Human-Machine Interfaces (eHMIs) in automated vehicles. Thirteen models (BakLLaVA, ChatGPT-4o, DeepSeek-VL2, Gemma 3 12B, Gemma 3 27B, Granite Vision 3.2, LLaMA 3.2 Vision, LLaVA-13B, LLaVA-34B, LLaVA-LLaMA-3, LLaVA-Phi3, MiniCPM-V, and Moondream) were used to simulate pedestrian perspectives. Each model assessed images of vehicles equipped with an eHMI and assigned a score from 0 (completely unwilling) to 100 (fully confident) reflecting its willingness to cross. Each model was run 15 times across the full set of images, both with and without prior conversational context, and the resulting confidence scores were compared with crowdsourced human ratings. The findings indicate that Gemma 3 27B performed best without chat history (r = 0.85), whereas ChatGPT-4o was superior when conversational history was included (r = 0.81). In contrast, DeepSeek-VL2 and BakLLaVA gave similar scores regardless of context, while LLaVA-LLaMA-3, LLaVA-Phi3, LLaVA-13B, and Moondream produced only limited-range outputs in both conditions.
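As a minimal illustration of the comparison step described above, the sketch below computes the Pearson correlation between per-image model confidence scores (averaged over repeated runs) and crowdsourced human ratings, once per prompting condition. The array names and values are hypothetical placeholders, not data from the study.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-image confidence scores (0-100) from one model,
# already averaged over the 15 repeated runs; values are illustrative only.
model_scores = {
    "without history": np.array([72, 18, 55, 90, 34, 61]),
    "with history":    np.array([70, 22, 50, 88, 30, 65]),
}

# Hypothetical crowdsourced human willingness-to-cross ratings for the same images.
human_ratings = np.array([68, 25, 48, 85, 33, 60])

# Correlate model scores with human ratings under each prompting condition.
for condition, scores in model_scores.items():
    r, p = pearsonr(scores, human_ratings)
    print(f"{condition}: r = {r:.2f} (p = {p:.3f})")
```

Under assumptions like these, the same routine would be repeated for every model to obtain the per-condition correlations reported above.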