I'm a
Biomedical Engineer · AI Engineer · Researcher · Data Analyst · Software Engineer
A multidisciplinary AI engineer with hands-on experience in machine learning,
computer vision, multimodal deep learning, and intelligent systems for healthcare and
engineering applications. Skilled in building end-to-end AI solutions—from data acquisition and model
development to deployment. Experienced in both research and industry projects, with a strong track record in developing AI-driven
diagnostic tools, automated manufacturing systems, and decision-support platforms. Passionate about bridging
engineering, data science, and real-world problem-solving to create impactful, reliable, and scalable technologies.
Bachelor of Science (BSc) in Systems and Biomedical Engineering
Cairo University
Sep 2020 - July 2025 · Cairo, Egypt
Completed a BSc in Systems and Biomedical Engineering, developing a broad and interdisciplinary foundation that integrates engineering principles with medical and biological sciences. The program strengthened competencies across AI, Medical Technology, Rehabilitation Engineering, Sports and Movement Analysis, Neuroscience, Digital Signal Processing, Computer Vision, Computer Graphics, Embedded Systems, and Software Development.
Completed a graduation project titled “Multimodal AI-based Detection and Tracking of Autistic Behavioral and Communication Patterns.” The project was recognized with 1st Place at the Dell Envision the Future Competition and was awarded the top graduation project in the class. For more details, refer to the Graduation Project section.
Artificial Intelligence · Software Development · Medical Devices · Medical Imaging · Digital Signal Processing · Embedded Systems · Robotics · Machine Learning · Deep Learning · Natural Language Processing · Computer Vision · Neuroscience · Rehabilitation Engineering · Sports and Movement Analysis · Computer Graphics · Biomedical Engineering · Multimodal AI · Internet of Things (IoT) · Data Analysis · Data Science · Bioinformatics · Human-Computer Interaction · Assistive Technologies · Behavioral Analysis
Figure 1. System block diagram of the proposed AI-based autism assessment platform, highlighting parent, expert, and trainee modules.
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition characterized by deficits in communication, social interaction, and the presence of restricted or repetitive behaviors, typically emerging in early childhood. Traditional diagnostic methods are time-consuming, costly, and reliant on expert observation, which limits access to early intervention. Existing AI-based tools often target isolated symptoms, lacking a comprehensive assessment approach. In this work, we present a multimodal AI-powered system that detects and tracks key behavioral and developmental ASD indicators (including stereotypical behaviors, graphomotor performance, gaze patterns, attention, emotions, blinking rate, and early developmental milestones typically observed via Q-CHAT-10 screening). The proposed system leverages vision transformers (ViT), large-scale learned encoders (LLE), and multimodal language-vision models. Moreover, the system is accessible via a user-friendly, bilingual web interface that allows parents to upload videos and images of their children for automated, remote, contactless, and objective assessments, which can be further validated by specialists. Extensive experimental results on multiple multimodal datasets demonstrate the feasibility and applicability of the proposed system. This approach enhances diagnostic objectivity, expands access to early screening, and supports timely, data-driven intervention for ASD, both at home and in clinical settings.
Autism Spectrum Disorder (ASD) · Neurodevelopmental Disorder · Multimodal Large Language Models (MLLMs) · Vision-Language Models (VLMs) · PaliGemma-2 · LLaVA · LLaVA-ASD · CogVLM2-Caption · CoNeTTE · LoRA · DeepSpeed · Whisper · Multi-scale Vision Transformer (MViT) · SWIN Transformer · Vision Transformer (ViT) · DINOv2 · ConvNeXt · MobileNet · CLIP · Region Proposal Network (RPN) · Mask R-CNN · YOLOv8 · Detectron2 · Intersection Over Union (IoU) · Identity Matching · Q-CHAT-10 · Gaze Large-scale Learned Encoder (Gaze-LLE) · Mutual Gaze · Joint Attention · Head Pose Estimation · Eye Aspect Ratio (EAR) · Mouth Aspect Ratio (MAR) · DeepFace · MediaPipe · RetinaFace · DBSCAN · Transfer Learning · Ensemble Learning · SHAP Analysis · Next.js · Django · Rhubarb-based Lipsync · ElevenLabs
Introduction
Autism spectrum disorder (ASD) affects individuals across a wide range of cognitive and behavioral profiles, from those requiring substantial support to those with High-Functioning Autism (HFA). Although ASD is not classified as a disease, it is associated with distinct neurological differences that impact development and behavior. Individuals with ASD often experience challenges in verbal and non-verbal communication, social engagement, emotion regulation, and adapting to environmental changes, all of which can significantly affect daily functioning.
Early intervention has been shown to substantially improve developmental outcomes, particularly when initiated during the critical early childhood period. However, achieving timely diagnosis remains a global challenge. In Egypt, a 2025 nationwide survey reported an ASD prevalence of 1.1% among children aged 1–12 years, while earlier screening initiatives identified approximately 3.3% of children as high-risk for ASD traits.
The current diagnostic process relies heavily on expert observation and clinical assessment, which are time-consuming, costly, and often inaccessible due to a shortage of specialists. This contributes to delayed diagnoses, depriving many children of critical early interventions.
Recent advancements in Artificial Intelligence (AI) and Machine Learning (ML) have introduced promising solutions to support ASD diagnosis. AI models have demonstrated efficacy in detecting atypical gaze patterns, reduced eye contact, motor irregularities, and stereotypical behaviors with high accuracy.
Additionally, tools like the Quantitative Checklist for Autism in Toddlers (Q-CHAT) have emerged as accessible screening mechanisms for identifying ASD risk at an early age.
Nonetheless, most existing AI-based systems focus on isolated symptoms, limiting their clinical utility in providing comprehensive assessments. To address these limitations, we propose a multimodal, AI-powered web platform designed to assess a wide spectrum of ASD-related behaviors and traits. The system integrates advanced models for gaze estimation, graphomotor analysis, attention and emotion recognition, and audiovisual behavior assessment within an interactive, bilingual interface (see Fig. 1). This platform enables objective, scalable, and accessible ASD screening to support both clinicians and caregivers.
Stereotypical behaviors (STB) are key indicators of ASD. Monitoring these behaviors is essential for early diagnosis and personalized care planning. However, these behaviors differ from general physical activities, making them difficult to detect using standard computer vision methods. Existing tools have neglected the automated recognition of such behaviors in real-world settings, highlighting a critical gap that we aim to address.
1.1. Datasets
To train and evaluate our model, we utilized two autism-specific datasets: the Self-Stimulatory Behaviors in the Wild for Autism Diagnosis (SSBD) dataset and the Expanded Stereotype Behavior Dataset (ESBD). These datasets present numerous challenges due to their real-world, uncontrolled recording environments. These include spatial variance caused by children's mobility, presence of objects used in stimming, sudden camera shifts, and short bursts of STB amid long video durations. Additionally, some data loss occurred due to unavailable URLs, which reduced the total number of videos.
SSBD and ESBD videos were shared by parents and caregivers on public platforms and categorized into three stimming behaviors: arm flapping, head banging, and spinning. Arm flapping often occurs when children struggle with communication or experience sensory overload. Head banging involves self-injurious actions such as hitting the head with hands or against solid objects, while spinning involves repetitive full body turning movements. All videos included were reviewed and validated by professional clinicians to ensure authenticity and diagnostic relevance.
Figure 1.1 Examples of Stereotypical Behaviors from SSBD and ESBD Datasets.
1.2. Preprocessing
These two datasets are noisy, and frames often contain a large portion of background or other subjects. To enhance recognition accuracy, we first preprocess the videos to obtain cleaner data that include only the target children performing ASD behaviors. To this end, we leverage Detectron2, one of the most popular object detection frameworks, and then apply pose estimation using the MediaPipe framework.
Figure 1.2. Preprocessing pipeline for stereotypical behaviour recognition using Detectron2 and MediaPipe pose estimation.
1.2.1. Child Detection
To detect and crop the target child from each video frame, we utilized Detectron2 which leverages a two-stage architecture:
The Region Proposal Network (RPN) operates as the first stage, scanning the entire image to generate bounding boxes around regions likely to contain objects (proposals). It uses convolutional layers to predict objectness scores and bounding box coordinates.
Figure 1.3. Region Proposal Network (RPN) Architecture for Object Detection.
The second stage involves Mask R-CNN, which refines these proposals by classifying the objects and generating pixel-wise masks. This allows precise localization of the child in each frame. Mask R-CNN adds a segmentation branch to the Faster R-CNN framework, enabling the network to produce a high-resolution binary mask for each region of interest (RoI).
Figure 1.4. Mask R-CNN Architecture for Instance Segmentation.
We used pre-trained weights from the COCO dataset for efficient transfer learning. Once the target child was detected, we cropped the region inside the bounding box, resized all frames to a fixed resolution, and discarded non-relevant background content to improve behavior-focused feature learning.
Figure 1.5. Example of child detection using Detectron2.
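For reference, a minimal sketch of this detection-and-cropping step is shown below, assuming the standard Detectron2 model-zoo Mask R-CNN configuration; the exact config file, score threshold, and crop resolution used in the project may differ.

```python
# Minimal sketch: person detection and cropping with Detectron2 (Mask R-CNN, COCO weights).
# The config, threshold, and crop size are illustrative, not the project's actual settings.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5        # confidence threshold for kept detections
predictor = DefaultPredictor(cfg)

def crop_persons(frame, out_size=(224, 224)):
    """Return fixed-size crops of every detected person (COCO class 0) in a frame."""
    outputs = predictor(frame)                     # RPN proposals refined by the Mask R-CNN heads
    inst = outputs["instances"].to("cpu")
    crops = []
    for box, cls in zip(inst.pred_boxes.tensor.numpy(), inst.pred_classes.numpy()):
        if cls == 0:                               # keep only the 'person' class
            x1, y1, x2, y2 = box.astype(int)
            crops.append(cv2.resize(frame[y1:y2, x1:x2], out_size))
    return crops
```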
1.2.2. Pose Estimation
To further enhance the understanding of body dynamics, we applied MediaPipe’s holistic pose estimation, extracting 3D landmarks (joints and connections) from each frame. The extracted skeletons were saved as JSON files to preserve temporal consistency and inter-joint relationships across frames.
Figure 1.6. MediaPipe Pose Estimation Framework for Skeletal Keypoint Detection.
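A minimal sketch of this landmark-extraction step is shown below, assuming MediaPipe's Python Holistic solution; the JSON field names and per-frame layout are illustrative, not the project's exact schema.

```python
# Sketch: per-frame skeleton extraction with MediaPipe Holistic, saved as JSON.
# The output layout (one list of landmark dicts per frame) is an illustrative assumption.
import cv2
import json
import mediapipe as mp

def extract_pose_to_json(video_path, json_path):
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap, frames = cv2.VideoCapture(video_path), []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks:
            frames.append([{"x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
                           for lm in result.pose_landmarks.landmark])
        else:
            frames.append(None)        # keep the slot to preserve temporal alignment
    cap.release()
    holistic.close()
    with open(json_path, "w") as f:
        json.dump(frames, f)
```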
1.3. Action Recognition
For the core task of action recognition, we employed MViT-v2, which demonstrated superior spatiotemporal learning capabilities compared to CNNs or other transformer architectures. After benchmarking multiple models, MViT-v2 was chosen for its hierarchical attention mechanisms and robust performance on complex video tasks.
MViT architecture learns visual representations by progressively increasing the channel resolution while reducing the spatiotemporal resolution across stages. Early layers process high-resolution, low-channel features for simple visual patterns, while deeper layers operate on coarser, high-channel representations to capture semantic information. This design is especially effective for dense, space-time video signals where subtle temporal cues, such as brief stimming behaviors, must be detected.
A significant advantage of MViT lies in its temporal sensitivity. Unlike Video-SWIN or ViViT models, which rely heavily on static appearance and maintain performance even on shuffled frames, MViT-v2 demonstrates a significant performance drop when frame order is disrupted. This confirms its strong utilization of temporal information, crucial for ASD behavior recognition.
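As a rough illustration of how such a model can be set up, the sketch below loads torchvision's MViT-v2-S with Kinetics-400 weights and swaps the classification head for the three STB classes; the actual training recipe (clip sampling, optimizer, augmentation) is not reproduced here.

```python
# Sketch: adapting torchvision's MViT-v2-S to the three STB classes.
# Hyperparameters and the pretraining checkpoint are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models.video import mvit_v2_s, MViT_V2_S_Weights

model = mvit_v2_s(weights=MViT_V2_S_Weights.KINETICS400_V1)       # Kinetics-400 pretraining
model.head = nn.Sequential(nn.Dropout(0.5), nn.Linear(768, 3))    # arm flapping / head banging / spinning

clips = torch.randn(2, 3, 16, 224, 224)     # (batch, channels, frames, height, width)
logits = model(clips)                       # -> (2, 3) class scores
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 2]))
loss.backward()                             # standard supervised fine-tuning step
```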
1.4. Experimental Results
This study is the first to explore the use of MViT-v2 for recognizing STB associated with ASD. On the ESBD and SSBD test splits, our approach achieved an accuracy of 96.55% and an F1-score of 96.52%, significantly surpassing prior benchmarks. These results affirm the viability of using multiscale spatiotemporal transformers and robust preprocessing techniques for ASD-related behavioral recognition.
Table 1.1. Comparison of action recognition methods on the ESBD and SSBD test splits.
Method | Acc (%) | Params (M)
3DCNN | 42.00 | ~33
Conv-LSTM | 74.00 | ~38
VideoMAE (ViT-B/16) | 87.40 | 303.9
Video-SWIN + Language | 90.04 | ~92
VST+L(d) (pretrained) | 94.43 | ~94
MViT + Data Augmentation | 91.72 | ~27
MViT + Pose Estimation | 93.59 | ~39
MViT + Detectron2 | 96.55 | ~37
Figure 1.8. Confusion Matrix for STB Recognition on ESBD and SSBD Test Splits.
Gaze Estimation and Eye Contact Detection
Contribution: 100%
Gaze behaviour analysis provides a non‑invasive approach to assess a child’s attention and social engagement. Our model targets critical visual attention markers such as mutual gaze and joint attention by combining face detection, tracking and gaze estimation without specialised eye‑tracking hardware.
Figure 2.1. Automated Gaze-Based Autism Screening Model Pipeline.
2.1. Preprocessing
2.1.1. Person Detection and Recognition
To ensure accurate behavioral analysis and isolate relevant individuals in diverse video scenes, our system integrates YOLOv8, a state-of-the-art real-time object detection model, to detect all persons in each video frame. Detected individuals are then processed through a dedicated Child Recognition module powered by a ViT, enabling the system to identify and focus exclusively on the child subject for subsequent analysis.
2.1.2. Person Tracking
Once the target child is identified, the system tracks them across frames so that behavioral measurements remain attached to the correct individual. Tracking combines two complementary cues: spatial overlap between detections in consecutive frames (IoU) and appearance-based identity matching, described below.
2.1.2.1. Intersection over Union (IoU)
At the core of our tracking mechanism is Intersection over Union (IoU), a geometric metric used to associate object detections across consecutive frames. IoU calculates the overlap between bounding boxes from the current and previous frames, helping determine if they represent the same individual.
Figure 2.2. Intersection over Union (IoU) Calculation between Two Bounding Boxes.
If the IoU score exceeds a defined threshold, the system considers the detection a continuation of the previous track. This approach is computationally efficient and well-suited for real-time applications. However, it may face limitations under fast motion or occlusion, where appearance-based cues become essential.
Figure 2.3. Computing Intersection over Unions for various bounding boxes.
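A minimal IoU implementation, for illustration (boxes in (x1, y1, x2, y2) format; the 0.5 threshold is an example value):

```python
# IoU between two axis-aligned boxes given as (x1, y1, x2, y2).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A detection continues an existing track when the overlap exceeds a chosen threshold.
same_track = iou((50, 40, 120, 200), (55, 42, 125, 210)) > 0.5
```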
2.1.2.2. Identity Matching
To enhance robustness in challenging scenarios, the system integrates identity matching techniques in addition to IoU. This involves extracting visual embeddings such as color histograms or deep appearance features from each bounding box and comparing them to those of previously tracked individuals. A similarity score is computed to assess whether the current detection belongs to an existing identity or a new one.
Figure 2.4. Identity Matching Equation.
Combining spatial (IoU-based) and appearance-based matching increases tracking accuracy, reduces identity switches, and ensures reliable association of behavioral data to the correct child throughout the video sequence.
Figure 2.5. Identity Matching Person Tracking Diagram of the in-context ID prediction process.
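Below is a small sketch of the appearance-matching step, assuming cosine similarity over per-person embedding vectors; the embedding type (color histogram vs. deep features) and the 0.7 threshold are illustrative assumptions.

```python
# Sketch: appearance-based identity matching against previously tracked individuals.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def match_identity(new_embedding, track_embeddings, threshold=0.7):
    """Return the ID of the most similar existing track, or None to start a new track."""
    best_id, best_sim = None, threshold
    for track_id, emb in track_embeddings.items():
        sim = cosine_similarity(new_embedding, emb)
        if sim > best_sim:
            best_id, best_sim = track_id, sim
    return best_id
```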
2.2. Gaze Estimation (Gaze-LLE)
Gaze-LLE is a transformer-based gaze estimation framework designed for minimal computational overhead and maximum flexibility. Its architecture consists of two primary components:
2.2.1. Frozen Scene Encoder
The backbone of Gaze-LLE is DINOv2, a self-supervised ViT model that produces high-quality, dense scene representations. DINOv2 is trained on large-scale unlabeled data via self-distillation (Distillation with No Labels) and excels in both image- and pixel-level vision tasks. In our implementation, the DINOv2 encoder is frozen, meaning its weights are not updated during training. This design choice is motivated by:
Reducing training complexity and memory requirements.
Preventing overfitting on small datasets by leveraging robust, pre-learned representations.
Figure 2.6. Gaze-LLE Framework Architecture - Frozen DINOv2 Backbone with Head Prompting and Transformer-Based Gaze Decoder for Heatmap Generation.
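For illustration, the sketch below shows a frozen DINOv2 ViT-S/14 backbone producing dense patch features on top of which a lightweight gaze decoder could be trained; the hub entry point and feature keys follow the public DINOv2 repository, while the input resolution is an example value.

```python
# Sketch: frozen DINOv2 ViT-S/14 backbone producing dense scene features.
import torch

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval().requires_grad_(False)          # frozen: no gradient updates during training

image = torch.randn(1, 3, 448, 448)            # H and W must be multiples of the 14-px patch size
with torch.no_grad():
    feats = backbone.forward_features(image)
patch_tokens = feats["x_norm_patchtokens"]     # (1, 32*32, 384) dense scene representation
# Only the lightweight gaze decoder (head prompt + transformer layers) is trained on top.
```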
2.2.2. Gaze Decoder Module with Head Prompting
The output of the scene encoder is passed to a transformer-based gaze decoder. This module incorporates head position embeddings (the bounding box of the child’s face) post-encoding, which is an innovation over the traditional approach of integrating them pre-encoding. This architecture enables Gaze-LLE to operate efficiently with only 2.8 million learnable parameters, making it highly suitable for real-time deployment.
Figure 2.7. Gaze Decoder Module with Head Prompting for Enhanced Gaze Estimation.
2.3. Social Gaze Metrics
The output of Gaze-LLE is not limited to raw gaze heatmaps. We extract several quantitative behavioral metrics from its predictions.
2.3.1. Mutual Gaze
Defined as the proportion of frames in which the child's gaze overlaps with the region of the observer’s (e.g., parent's or doctor's) face. Mutual gaze is a well-established early indicator of social-communicative development and is often diminished in children with ASD.
Figure 2.8. Mutual Gaze Analysis with Positional Prompting.
2.3.2. Joint Attention
Assessed using DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a clustering algorithm that groups gaze targets across frames. Consistent gaze clusters over time are interpreted as signs of shared focus such as looking at the same toy or character on the screen.
Joint attention is crucial in early language and social development and is frequently impaired in ASD.
Figure 2.9. Joint Attention Detection Using Gaze Clustering.
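A minimal sketch of this clustering step is shown below, using scikit-learn's DBSCAN on normalized gaze-target coordinates; the eps/min_samples values and the joint-attention ratio are illustrative choices, not the system's calibrated parameters.

```python
# Sketch: clustering per-frame gaze targets with DBSCAN to flag sustained shared focus.
import numpy as np
from sklearn.cluster import DBSCAN

gaze_points = np.array([[0.42, 0.55], [0.43, 0.54], [0.44, 0.56],   # frames fixating one toy
                        [0.90, 0.10], [0.41, 0.55], [0.43, 0.53]])  # plus one stray glance

labels = DBSCAN(eps=0.05, min_samples=3).fit_predict(gaze_points)    # -1 marks noise
largest_cluster = max((labels == c).sum() for c in set(labels) if c != -1)
joint_attention_ratio = largest_cluster / len(gaze_points)            # share of frames on one target
```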
Sensorimotor and Socioemotional Markers Analysis
Contribution: 100%
This model is designed to capture real-time behavioral, physiological, and emotional cues essential for assessing ASD in toddlers. It offers a scalable, accessible solution that uses a webcam to analyze a child's behavior during interactions with visual content on a screen.
3.1. Visual Content Selection
To facilitate the engagement of autistic children with meaningful and socially relevant visual content, we integrated selected episodes from the “Pablo” series into our platform. Pablo features a five-year-old autistic boy who navigates real-life situations with the help of imaginary animal friends. The series is specifically designed for autistic children, addressing social anxiety, emotional understanding, and communication through relatable storytelling and positive behavioral modeling.
We selected this series because it represents real-life social challenges and coping mechanisms in a way that is emotionally resonant and therapeutically appropriate. With this content embedded in our system's web interface and the camera running in the background, children interact with the content naturally, enabling the collection of spontaneous, valid behavioral responses.
Figure 3.1. Description of the Pablo channel used to present visual stimuli during Sensorimotor and Socioemotional Analysis.
3.2. Facial Landmarks Detection
Once the webcam is activated, frames are continuously captured and processed in real time. The MediaPipe Face Mesh framework is applied to detect and extract 468 3D facial landmarks from each detected face. These facial regions are then classified using a ViT model, which determines whether each detected face belongs to a child or an adult. Only child faces are retained for downstream analysis, ensuring that all subsequent markers relate specifically to the target subject.
Facial landmarks provide the spatial foundation for several core features. In particular, the centers of the left and right irises are computed by averaging the coordinates of four dedicated landmarks each. The left iris center is derived from landmarks 474-477, while the right iris center is calculated using landmarks 469-472.
Figure 3.2. Preprocessing steps prior to eye contact detection. Left to right: 1) Input image; 2) All faces are detected; 3) Child’s face is selected; 4) Facial landmarks are localized; and 5) Head pose is estimated.
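For illustration, the iris centers described above can be computed as below, assuming MediaPipe Face Mesh is run with refine_landmarks=True so that the iris indices are available:

```python
# Sketch: iris centers from MediaPipe Face Mesh landmarks (refine_landmarks=True).
# Indices follow the text above; coordinates are normalized image coordinates.
import numpy as np

LEFT_IRIS = [474, 475, 476, 477]
RIGHT_IRIS = [469, 470, 471, 472]

def iris_centers(face_landmarks):
    pts = np.array([[lm.x, lm.y] for lm in face_landmarks.landmark])
    return pts[LEFT_IRIS].mean(axis=0), pts[RIGHT_IRIS].mean(axis=0)
```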
3.3. Head Pose Estimation
Head orientation is estimated using reference points derived from the detected 3D facial landmarks, particularly the nose tip, chin, and eye corners, from which we compute the Euler angles (pitch, yaw, and roll). These angles represent the spatial orientation of the head and are tracked over the entire video session.
Figure 3.3. Real-time estimation of head orientation to assess visual attention and engagement.
Head pose variability is an important biomarker in ASD, as abnormal head motion patterns may indicate issues with attention regulation or sensory processing. This variability is captured by measuring changes in the Euler angles across time, thereby producing a quantitative marker of head movement stability.
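One common way to obtain such Euler angles is to fit a generic 3D face model to the selected landmarks with OpenCV's solvePnP; the sketch below follows that recipe with illustrative model points and a rough pinhole camera approximation, and may differ from the project's exact implementation.

```python
# Sketch: head pose via cv2.solvePnP against a generic 3D face model (values are illustrative).
import cv2
import numpy as np

MODEL_3D = np.array([   # generic face model (mm): nose tip, chin, eye corners, mouth corners
    (0.0, 0.0, 0.0), (0.0, -330.0, -65.0),
    (-225.0, 170.0, -135.0), (225.0, 170.0, -135.0),
    (-150.0, -150.0, -125.0), (150.0, -150.0, -125.0)], dtype=np.float64)

def head_pose(image_points_2d, frame_w, frame_h):
    """image_points_2d: (6, 2) float64 array of pixel coordinates in the same order as MODEL_3D."""
    focal = frame_w                                   # rough pinhole approximation
    cam = np.array([[focal, 0, frame_w / 2],
                    [0, focal, frame_h / 2],
                    [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(MODEL_3D, image_points_2d, cam, np.zeros(4))
    rot, _ = cv2.Rodrigues(rvec)
    angles, *_ = cv2.RQDecomp3x3(rot)                 # (pitch, yaw, roll) in degrees
    return angles
```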
3.4. Eye Blinking Detection
Blinking analysis is performed through the Eye Aspect Ratio (EAR), which quantifies changes in eyelid separation relative to eye width. Three distances are extracted from the eye contour landmarks: inner vertical distance, outer vertical distance, and horizontal eye width.
Figure 3.4. Eye Aspect Ratio (EAR) Calculation Using Eye Contour Landmarks.
The EAR is computed as the average of the two vertical distances divided by the horizontal distance. A lower EAR indicates eye closure, while a higher EAR corresponds to open eyes. By tracking EAR over time, we can identify blink events and calculate blink rate and duration, which are relevant markers of sensory processing in ASD.
The EAR is calculated using the following formula:
Figure 3.5. Eye Aspect Ratio (EAR) Calculation Equation.
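In code, this amounts to a few distance computations; the sketch below mirrors the formula above (mean of the two vertical distances over the horizontal width), with an illustrative 0.2 blink threshold.

```python
# EAR = (d_inner + d_outer) / (2 * d_horizontal); the 0.2 blink threshold is illustrative.
import numpy as np

def eye_aspect_ratio(inner_top, inner_bottom, outer_top, outer_bottom, left_corner, right_corner):
    v_inner = np.linalg.norm(np.subtract(inner_top, inner_bottom))
    v_outer = np.linalg.norm(np.subtract(outer_top, outer_bottom))
    h = np.linalg.norm(np.subtract(left_corner, right_corner))
    return (v_inner + v_outer) / (2.0 * h)

is_blinking = eye_aspect_ratio((0.31, 0.40), (0.31, 0.41), (0.35, 0.40), (0.35, 0.41),
                               (0.28, 0.405), (0.38, 0.405)) < 0.2
```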
3.5. Emotion Recognition
To analyze emotional responses, each detected child’s face is passed to the DeepFace model. It performs face detection, alignment, and embedding extraction using a deep CNN. The extracted features are compared to pre-trained templates to classify emotions into standard categories.
Figure 3.6. Visual representation of the DeepFace CNN pipeline used to extract emotional features from facial images.
This analysis tracks changes in the child's emotional expression while watching the visual content and provides insight into their affective engagement. The emotion pattern over time is especially useful in evaluating the child’s reaction to specific scenes and identifying moments of empathy, anxiety, or enjoyment.
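A minimal sketch of this step using the DeepFace library's built-in emotion action is shown below; the frame path and the enforce_detection setting are illustrative.

```python
# Sketch: per-frame emotion scoring with DeepFace ('emotion' is one of its built-in actions).
from deepface import DeepFace

# Recent DeepFace versions return a list with one result dict per detected face.
result = DeepFace.analyze(img_path="child_frame.jpg", actions=["emotion"],
                          enforce_detection=False)   # keep running when no face is found
dominant = result[0]["dominant_emotion"]             # e.g., 'happy', 'fear', 'neutral'
scores = result[0]["emotion"]                        # per-class scores, usable for trend plots
```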
3.6. Open Mouth Abnormality Detection
Open mouth posturing is another potential marker for ASD and is assessed using the Mouth Aspect Ratio (MAR). MAR is computed from lip landmarks extracted using the Dlib facial landmark detector. The ratio compares the vertical mouth opening to the horizontal lip distance. Consistently high MAR values are interpreted as abnormal open-mouth appearances, which may suggest neuromotor control issues.
Figure 3.7. Mouth Aspect Ratio (MAR) Calculation Using Lip Landmarks.
The MAR is calculated using the following formula:
Figure 3.8. Mouth Aspect Ratio (MAR) Calculation Equation.
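Analogously to the EAR, a simple MAR computation might look like the sketch below; the landmark pairs and the open-mouth threshold are illustrative assumptions rather than the project's exact choices.

```python
# MAR: vertical mouth opening over horizontal lip width; points and threshold are illustrative.
import numpy as np

def mouth_aspect_ratio(top_lip, bottom_lip, left_corner, right_corner):
    vertical = np.linalg.norm(np.subtract(top_lip, bottom_lip))
    horizontal = np.linalg.norm(np.subtract(left_corner, right_corner))
    return vertical / horizontal

open_mouth = mouth_aspect_ratio((0.50, 0.60), (0.50, 0.68), (0.44, 0.64), (0.56, 0.64)) > 0.6
```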
3.7. Output Metrics and System Integration
All extracted features are processed and saved in real-time, producing a comprehensive set of behavioral markers for each child session. These include blinking rate, head pose variability, gaze stability, eye contact duration, emotion fluctuation patterns, open mouth appearances, and an overall social engagement score.
Figure 3.9. Eye State, Eye Contact, Mouth State, and Blink State Distributions for a given session.
The model is deployed within a web-based platform where the child is presented with educational video stimuli chosen by the parent. The webcam runs unobtrusively in the background, allowing for a naturalistic interaction setting. In addition to providing immediate insights, this system doubles as a data collection tool for future studies, contributing to a scalable, multimodal framework for early ASD detection and assessment.
Audio‑Visual Behavior Analysis via VLMs
Contribution: literature, model selection, and preprocessing
4.1. Dataset Description
The dataset used in this model is the Audio-Visual Autism Spectrum Dataset (AV-ASD), which was created to support research on autism-related behaviors. It includes 928 video clips collected from 569 publicly available YouTube and Facebook videos, showing various behaviors in different settings.
AV-ASD has several advantages over previous autism-related datasets. It contains more clips, more behavior categories, and is the first dataset to include social behaviors. It also supports multi-label annotations, which is important because multiple autism-related behaviors can happen at the same time.
The annotation process was done in two steps. First, six student annotators labeled the behavior in each video. Then, a Speech-Language Pathologist (SLP) with 15 years of experience checked each clip again and labeled all behaviors present using a multi-label format, without marking time intervals.
Figure 4.1. Audio-Visual Autism Spectrum Dataset (AV-ASD) Overview: Distribution of Clips per Behavior Category.
4.2. Preprocessing
To prepare the data for the MLLM, three types of preprocessing steps are performed: one for image representation and two for audio representation.
4.2.1. Image Representation
The visual representation is obtained by uniformly sampling 9 frames from each video. These frames are then arranged in a 3×3 grid to form a single image representing the video clip.
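A minimal sketch of this sampling-and-tiling step is shown below; the tile size and row-major ordering are illustrative assumptions.

```python
# Sketch: uniformly sample 9 frames and tile them into a single 3x3 grid image.
import cv2
import numpy as np

def video_to_grid(video_path, tile=224):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    idxs = np.linspace(0, total - 1, 9).astype(int)   # 9 uniformly spaced frame indices
    tiles = []
    for i in idxs:                                    # assumes all sampled frames are readable
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        tiles.append(cv2.resize(frame, (tile, tile)))
    cap.release()
    rows = [np.hstack(tiles[r * 3:(r + 1) * 3]) for r in range(3)]
    return np.vstack(rows)                            # single (3*tile, 3*tile, 3) image
```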
4.2.2. Audio Representation
The first step in audio preprocessing involves generating an audio caption that describes the semantic content of the audio signal. This is achieved using CoNeTTE, an audio captioning model that follows an encoder–decoder architecture. The encoder is a modified ConvNeXt (CNext) model adapted from the vision domain to process audio signals, while the decoder is a Transformer network that generates the caption text by predicting the next word based on previous tokens and the audio representation.
Figure 4.2. CoNeTTE Audio Captioning Model Architecture: Modified ConvNeXt Encoder and Transformer Decoder.
4.2.3. Audio Transcription
The speech transcription stage transforms spoken language from video clips into structured text. This is accomplished using Whisper, a large-scale speech recognition model developed by OpenAI. In this work, the Whisper encoder processes each audio segment and extracts a fixed-length speech representation. To obtain the final speech feature vector, average pooling is applied across the encoder’s time dimension, producing a robust embedding that captures the semantic and acoustic content of the spoken utterances. These transcriptions serve as a textual input for downstream models, enabling the system to recognize and reason about autism-related behaviors involving non-typical language and other speech-linked social indicators.
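For illustration, the sketch below obtains both a transcription and a time-pooled encoder embedding with the Hugging Face Whisper implementation; the checkpoint size and pooling detail are assumptions consistent with the description above.

```python
# Sketch: Whisper transcription plus an average-pooled encoder embedding (transformers API).
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-base")   # checkpoint size is illustrative
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

def transcribe_and_embed(waveform_16k):
    """waveform_16k: 1-D float array sampled at 16 kHz."""
    inputs = processor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    ids = model.generate(inputs.input_features)                        # speech-to-text decoding
    text = processor.batch_decode(ids, skip_special_tokens=True)[0]
    enc = model.model.encoder(inputs.input_features).last_hidden_state
    embedding = enc.mean(dim=1)                                        # average-pool over time
    return text, embedding
```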
4.3. Instruction Tuning of PaliGemma-2
To improve autism behavior recognition, we propose instruction tuning of PaliGemma-2, a powerful VLM that combines visual and textual understanding. The model input includes a 3×3 grid of uniformly sampled video frames, along with a textual prompt enhanced by audio captions and speech transcriptions.
For training, we construct a paired instruction dataset {(Inst, y)}, where Inst denotes the multimodal input (image grid + enriched prompt) and y represents the multi-label behavioral annotation corresponding to autism-related categories. The prompt is designed to guide PaliGemma toward recognizing nuanced behaviors such as non-typical language, non-responsiveness, and repetitive motor actions.
Figure 4.4. PaliGemma Instruction Tuning for Multimodal Autism Behavior Recognition from Audio-Visual Inputs.
To enable efficient fine-tuning, we leverage LoRA (Low-Rank Adaptation), a parameter-efficient tuning method. The training process employs cross-entropy loss with class weighting to address label imbalance, and model performance is monitored using macro-averaged F1 scores. By integrating temporal, linguistic, and semantic cues, this tuning process empowers PaliGemma to capture the complex interplay between speech and behavior, leading to more accurate and explainable predictions for autism screening.
Figure 4.5. PaliGemma-2 Architecture for ASD Behavior Analysis: Multimodal Fusion of Image and Text Tokens with SigLIP Projection.
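A minimal sketch of the LoRA setup is shown below, using Hugging Face transformers and PEFT; the checkpoint id, target modules, ranks, and prompt template are illustrative assumptions, not the exact training configuration.

```python
# Sketch: attaching LoRA adapters to PaliGemma-2 (transformers + PEFT); values are illustrative.
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "google/paligemma2-3b-pt-224"            # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora_cfg)             # only the low-rank adapters are trainable
model.print_trainable_parameters()

grid_image = Image.new("RGB", (224, 224))           # stand-in for the 3x3 frame grid
prompt = "caption: <audio caption> transcript: <speech transcript> list the observed behaviors"
inputs = processor(text=prompt, images=grid_image, return_tensors="pt")
# During fine-tuning these inputs are paired with the multi-label targets y and trained
# with a (class-weighted) cross-entropy objective, as described above.
```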
4.4. Results
The proposed pipeline leverages PaliGemma-2, achieving higher F1-scores in several behavior categories compared to LLaVA-ASD, despite utilizing 10 billion fewer parameters. However, due to the complexity of the dataset and the inherent challenges of multi-label classification in large-scale multimodal models, the overall performance still leaves substantial room for improvement.
Table 4.1. F1-score comparison between the proposed PaliGemma-2 and LLaVA-ASD across AV-ASD dataset.
Table 4.2. Classification Performance on the Test Dataset (182 Videos Total).
Graphomotor Pattern Analysis
Graphomotor tasks not only reflect motor development but also reveal patterns of cognitive and expressive functioning. A growing body of research indicates that children with ASD often experience challenges in performing handwriting tasks, which can be detected and quantified using computational models.
This model was developed as part of a broader initiative to explore new diagnostic pathways in ASD assessment. Specifically, it investigates the cognitive distinctions embedded in children’s creative outputs, offering a promising avenue for non-invasive behavioral screening.
5.1. Dataset
The dataset used for this study was sourced from Kaggle, released in 2024, and includes a total of 1,115 labeled images. The images are divided across three types of graphomotor tasks: Drawings, Coloring, and Writing. Each sample is annotated as either ASD or non-ASD. For ASD-labeled data, an additional classification is provided to indicate severity levels: mild, moderate, or severe.
The dataset reflects a variety of expressive patterns and maturity levels that help characterize the motor and cognitive profiles of children. The insights derived from this work extend beyond detection, contributing to the understanding of the relationship between artistic expression and neurodevelopmental conditions.
Figure 5.1. Sample Coloring and Drawing Data from Children with ASD and Typically Developing (TD) Peers.
5.2. Preprocessing
To ensure optimal performance of the classification models, a standard preprocessing pipeline was implemented. Images were resized to a consistent resolution and normalized to match the input requirements of the different neural network architectures. Noise removal techniques and augmentation methods (rotation, flipping, and contrast adjustments) were selectively applied to enhance generalization and robustness.
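For illustration, such a pipeline can be expressed with torchvision transforms as below; the resolution, normalization statistics, and augmentation strengths are example values.

```python
# Sketch: resize / augment / normalize pipeline for the graphomotor images (values illustrative).
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```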
5.3. Classification Model
We implemented a two-stage classification framework: the first stage determines whether a child is classified as ASD, and if so, the second stage assesses severity by evaluating performance difficulty or atypicality. This approach achieves high accuracy and demonstrates strong diagnostic reliability.
Figure 5.2. Two-Stage Classification Framework for ASD Detection and Severity Assessment.
To implement this pipeline, we evaluated a diverse set of models. These models were trained and validated on the preprocessed dataset, with each model evaluated on both tasks (ASD detection and severity assessment). The multi-level inference process introduced a novel granularity in ASD assessment, allowing not only detection but also personalized characterization of motor-related impairments, which is currently underrepresented in existing diagnostic approaches.
5.4. Results
Model training was conducted using the standard supervised learning paradigm. For the MobileNet model, we used a cross-entropy loss function and the Adam optimizer, with the learning rate tuned based on validation performance. For the SVM and RF classifiers, grid search was applied to determine optimal hyperparameters.
Data were split into 80% training and 20% testing partitions. Each model underwent multiple iterations with 5-fold cross-validation to ensure statistical validity and reduce overfitting.
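A minimal sketch of this evaluation protocol for the classical models is shown below; the feature matrix is a synthetic stand-in and the parameter grids are illustrative.

```python
# Sketch: stratified 80/20 split, grid search, and 5-fold CV for the SVM and RF stages.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=64, random_state=0)  # stand-in for image features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

searches = {
    "SVM": GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}, cv=5),
    "Random Forest": GridSearchCV(RandomForestClassifier(),
                                  {"n_estimators": [100, 300], "max_depth": [None, 20]}, cv=5),
}
for name, search in searches.items():
    search.fit(X_train, y_train)                    # 5-fold cross-validation inside the grid search
    print(name, search.best_params_, round(search.score(X_test, y_test), 3))
```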
Across both stages of classification, our models demonstrated strong performance:
ASD Detection: Multiple models achieved over 95% accuracy, confirming the predictive value of visual and spatial features extracted from drawings, coloring, and handwriting.
Severity Classification: Similarly, classifiers achieved high accuracy in distinguishing between mild, moderate, and severe ASD cases, with the top-performing model maintaining over 95% accuracy.
Table 5.1. Performance of ASD Detection and Severity Estimation Models Using Graphomotor Tasks.
These results highlight the diagnostic potential of graphomotor analysis for early ASD detection. Unlike prior studies that focused only on handwriting, our framework offers enhanced detail and broader applicability in both research and clinical contexts. Additionally, expert reviews by child psychologists confirmed that variations in stroke consistency, pressure patterns, symmetry, and detail correlated with established ASD behavioral markers, reinforcing the validity of our computational approach.
Early Risk Screening via Quantitative Checklist for Autism in Toddlers
Contribution: 100%
In our system, we integrate Q-CHAT-10 analysis through a robust ensemble learning model capable of not only binary classification (ASD or non-ASD) but also estimating severity levels (mild, moderate, or severe). This enhances its applicability as a clinical decision support tool.
6.1. Dataset Description
We use a publicly available dataset which includes Q-CHAT-10 responses collected via online questionnaires.
Age Range: Toddlers aged 12–36 months.
Metadata: Includes age, gender, region, familial ASD history, and identity of the test respondent.
Labels: Each entry is annotated as ASD or non-ASD and includes an assigned severity level.
This dataset serves as a valuable resource for validating our AI-powered pipeline in culturally specific contexts.
6.2. Model Architecture
To ensure accuracy and generalizability, we adopted an ensemble machine learning approach. The ensemble output is determined through soft voting, aggregating probabilities from each model to improve robustness against noise and class imbalance. This architecture enables the system to:
Predict the likelihood of ASD based on Q-CHAT-10 responses.
Assign severity levels for ASD-positive cases using a separate calibrated classifier.
The model is trained using stratified k-fold cross-validation (k=5) to ensure reliable performance across different age groups and demographics.
Figure 6.1. Ensemble learning architecture for analysing Q‑CHAT‑10 responses and estimating ASD risk and severity.
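For illustration, the sketch below assembles a soft-voting ensemble and evaluates it with stratified 5-fold cross-validation; the member models, their settings, and the synthetic stand-in data are assumptions, not the deployed configuration.

```python
# Sketch: soft-voting ensemble over Q-CHAT-10 responses with stratified 5-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)  # stand-in for the 10 items

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=200)),
                ("gb", GradientBoostingClassifier())],
    voting="soft")                                   # average the members' predicted probabilities

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(ensemble, X, y, cv=cv).mean()) # mean accuracy across the 5 folds
```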
6.3. Model Interpretation
We also applied SHAP (SHapley Additive exPlanations) to interpret the model’s predictions and uncover key behavioral indicators most associated with ASD. The most impactful questions include:
A6 (Following gaze), A7 (Empathy and comfort-seeking), A9 (Gestures like waving), A1 (Response to name), A5 (Pretend play): High feature values (in red) push the prediction strongly toward ASD.
A2 (Eye contact), A3 (Pointing to request), A4 (Pointing to share interest), A8 (First words), A10 (Unusual staring): Still contribute, but less impact compared to the top ones.
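A minimal sketch of how such per-question SHAP importances can be computed for a tree-based ensemble member is shown below; the model, the synthetic stand-in data, and the class-index handling are illustrative.

```python
# Sketch: per-question SHAP importances for a tree-based classifier (illustrative setup).
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # stand-in for Q-CHAT-10 items
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

sv = shap.TreeExplainer(model).shap_values(X)
if isinstance(sv, list):          # older SHAP versions: one array per class
    sv = sv[1]
elif sv.ndim == 3:                # newer SHAP versions: (samples, features, classes)
    sv = sv[:, :, 1]
mean_abs = np.abs(sv).mean(axis=0)                   # global importance per question
for name, val in sorted(zip([f"A{i+1}" for i in range(10)], mean_abs), key=lambda t: -t[1]):
    print(name, round(float(val), 4))
```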
This project presents Nabta, a unified, multi-symptom AI-powered web-based platform designed for early screening, monitoring, and assessment of ASD symptoms in toddlers. The platform integrates a series of specialized deep learning models and multimodal assessment tools that collectively address behavioral, social, emotional, and developmental aspects of ASD. Each module is carefully developed based on domain-specific research and validated datasets, achieving state-of-the-art performance in key diagnostic indicators.
The system encompasses six core AI models: (1) a stereotypical behavior recognition model; (2) a gaze estimation model based on Gaze-LLE for gaze tracking and social interaction analysis; (3) a real-time sensorimotor and socioemotional markers analyzer for detecting gaze direction, head movements, emotions, mouth postures, and blinking patterns; (4) a social behavior analysis model leveraging VLMs to understand the child's verbal responses and engagement in structured video conversations; (5) a graphomotor pattern analysis model; and (6) a Q-CHAT model for questionnaire-based ASD screening. All these models are integrated into a culturally adaptive, user-friendly bilingual web interface that enables accessible interaction.
Through rigorous experimentation, the system has demonstrated high performance across all tasks, surpassing existing tools in the analysis of stereotypical behaviors and fine-motor tasks, while delivering robust gaze estimation. Designed to be fully contactless and non-invasive, the application performs reliably in real-world settings, equipping clinicians and parents with early, data-driven insights. Ultimately, Nabta offers a scalable, accessible, and impactful ASD screening solution, grounded in state-of-the-art AI innovation.
AI Engineer
Ezz Medical Industries
Dec 2024 - Sep 2025
Enhanced SRS automation using a RAG pipeline to reduce manual review time and improve requirement consistency.
Developed a fully functional Risk Management System powered by advanced AI techniques to ensure compliance and efficiency.
Designed a BOM tree system to streamline ventilator manufacturing workflows and improve FDA traceability compliance.
Automated inventory database structuring through an Agentic AI system, improving data accuracy and operational efficiency.
Developed a customized LLM solution based on the Qwen-3 model to accelerate standard-clause matching and improve ISO coverage optimization.
Gained hands-on experience in the IoT value chain, including device layers, connectivity protocols, data acquisition workflows, and cloud integration fundamentals.
Developed IoT applications using the MasterOfThings IoT AEP Platform, including designing dashboards, building automation rules, and deploying real-time monitoring pipelines.
Performed practical IoT laboratory work, including configuring sensors and microcontrollers, connecting devices to IoT platforms, and validating data transmission and device interoperability.
Built and tested end-to-end IoT scenarios combining hardware, networking, and cloud-based analytics for real-world use cases.
Internet of Things (IoT) · IoT Value Chain · Wireless Communication · MQTT · Cloud Integration · MasterOfThings IoT AEP · Real-Time Monitoring · IoT Automation · Connectivity Protocols · Smart Systems
PyQt · Qt · Flask · Django · FastAPI · Tailwind CSS · Flutter · Android Studio
Development Tools
Jupyter · Git · GitHub · VS Code · Docker · JetBrains · Anaconda
Exploring-Cognitive-Control-through-fMRI-Analysis: A Comprehensive Data Analysis Project
This project presents a comprehensive analysis of the Flanker Dataset using functional magnetic resonance imaging (fMRI) to investigate the neural basis of cognitive control. The study examines differences in BOLD responses between Congruent and Incongruent Flanker trials. A full analysis pipeline—including quality control, preprocessing, statistical modeling, and post-hoc evaluation—was conducted using the FSL software suite. The findings provide meaningful insights into the neural mechanisms underlying cognitive control and response inhibition.
Comprehensive Analysis of Gene Expression and Pathway Enrichment in Lung Squamous Cell Carcinoma (LUSC)
This project focuses on analyzing gene expression data to identify differentially expressed genes (DEGs) in Lung Squamous Cell Carcinoma (LUSC) and exploring their biological significance using Gene Set Enrichment Analysis (GSEA). The findings contribute to understanding LUSC mechanisms and identifying potential biomarkers and therapeutic targets.
This project presents a comprehensive analysis of climate change models using partial differential equations (PDEs). The study focuses on developing and solving mathematical models to simulate climate dynamics and predict future climate scenarios. Advanced numerical methods and computational techniques are employed to analyze the impact of various factors on climate change.
Hermes AI-Powered Mobile App for Automated Gait Pose Estimation and Analysis Using LLMs
Developed a mobile gait analysis app using MediaPipe Pose for 3D landmark detection and extracted temporal gait parameters from video input.
Integrated patient-specific data and utilized Gemini 2.0 Flash LLM to generate clinical recommendations, risk assessments, and personalized rehabilitation insights.
VisualMinds is a comprehensive repository showcasing advanced computer vision applications,
all developed from scratch using C++ and the Qt framework. Each application focuses on various
aspects of image processing, segmentation, and analysis, delivering robust desktop solutions.
DICOM Visualizer is a Python desktop application built with PyQt6 and VTK that facilitates the visualization of DICOM data in 3D. This tool empowers users to upload DICOM datasets and visualize them using two distinct rendering methods: Raycast Rendering and Surface Rendering. Additionally, users have the flexibility to fine-tune rendering parameters such as ambient, diffuse, specular, and specular power for Raycast Rendering, and ISO value for Surface Rendering. Furthermore, the application allows users to customize the color of the visualization.
Computer Graphics · DICOM · PyQt6 · VTK · 3D Visualization · Raycast Rendering · Surface Rendering · Medical Imaging · Volume Rendering · Rendering Parameters · Medical Data Visualization
This project involved simulating ultrasound pressure fields generated by a phased-array transducer using the MATLAB UltraSound Toolbox (MUST). The task required modeling focused, diverging, and multi-focus acoustic fields by calculating precise transmit delays for each piezoelectric element based on beamforming and steering principles. Pressure fields were computed across a 2D polar grid and analyzed using RMS pressure intensity metrics to examine beam shape, focal quality, and energy distribution. Through this simulation workflow, the project explored how phased-array beamforming controls acoustic field geometry, resolution, and field-of-view—key components in diagnostic imaging and therapeutic ultrasound applications.
Fat/Water Separation Using Dixon Technique in Multi-Spin Echo MRI for Mid-Thigh Imaging
This project focuses on implementing the Dixon MRI technique to separate fat and water signals in mid-thigh imaging using multi–spin-echo acquisitions.
The workflow involves generating in-phase and opposed-phase images, computing water-only and fat-only maps through mathematical decomposition, and analyzing the resulting tissue contrasts to evaluate muscle integrity and fat infiltration.
Data from MRI scanners were processed using deep-learning–accelerated reconstruction and visualized through Python-based tools.
MRI Scanners · MESE Sequences · Multi-Spin-Echo Imaging · Dixon Fat-Water Separation · Dual-Echo Signal Decomposition · Water–Fat Separation Modeling · Enhanced Musculoskeletal Contrast · T2 Mapping · Chemical Shift Imaging · In-Phase and Opposed-Phase Imaging · Streamlit
Simulate metal implants in CT data, introduce corresponding artifacts, and implement a metal artifact correction technique
The project simulates metal artifacts in CT imaging and applies a sinogram inpainting method to reduce them. Synthetic metal objects were added to CT images, projected into sinogram space, and reconstructed to visualize artifact formation. The correction approach identified metal-corrupted sinogram regions and replaced them using interpolation, followed by post-processing to enhance edges and contrast. The results show noticeably improved image quality, demonstrating that projection-domain inpainting is an effective strategy for metal artifact reduction.
Pandora BCI is a project built around EEG data. We built a deep learning model to predict patients' movements from their EEG signals, using the WAY-EEG-GAL dataset, and deployed the model to a Flask web server. A desktop application was built to visualize the EEG signals of patients (specific channels), and a mobile application was built to visualize the model itself and use it to predict a series of movements from an input EEG signals file. We also built an artificial arm based on an ESP8266 that can be controlled by the predicted movements as well as directly from the mobile application.
EEG-BCI · Deep Learning · Neural Networks · PyTorch · EEG Signal Processing · WAY-EEG-GAL Dataset · Mobile Application · Desktop Application · Flask · Electron.js · React Native · ESP8266 · Servo Motors · Internet of Things (IoT)
cv_do_you_know_me is a robust computer vision project that integrates multiple advanced features for facial and body part detection, real-time video processing, and facial recognition. This project is designed to explore the capabilities of various computer vision libraries and models, making it a comprehensive tool for developers and researchers interested in face detection, landmark tracking, and more.
Developed an autonomous drone system using an ESP32 microcontroller for real-time flight control and communication. Implemented the A* pathfinding algorithm to enable dynamic route planning and efficient obstacle avoidance in constrained environments. Designed and executed end-to-end mission simulations in Unity, integrating physics-based models to evaluate navigation accuracy, environmental interaction, and system reliability prior to field deployment.
Python · Unity · A* Algorithm · ESP32 · MPUs · PID Control · Autonomous Systems · Mission Simulation · QGroundControl · PX4 Autopilot · UAV Control · Obstacle Avoidance · Pathfinding · Drone Navigation · Sensor Fusion · Over-the-Air (OTA) Programming · Internet of Things (IoT) · Real-Time Systems
SonicCipher is an innovative project that leverages digital signal processing (DSP) and machine learning to create a unique and secure identification system. By combining voice fingerprinting and word fingerprinting, SonicCipher excels in speaker identification and word recognition.
Digital Signal Processing · Machine Learning · Speaker Identification · Word Recognition · Audio Fingerprinting · MFCC · Chroma Features · Spectral Contrast · Support Vector Classifier (SVC)
This project focuses on finetuning state-of-the-art Vision–Language Models, including PaliGemma
and Florence-2, to perform domain-specific multimodal tasks such as OCR-based electronic meter reading,
visual understanding, and image-to-text generation. By aligning image and text embeddings through
advanced encoders and projection layers, the models learn to interpret visual scenes and produce accurate
textual outputs for real-world applications.
Enhanced Skin Cancer Classification using Pre-trained CNN Models and Transfer Learning
This project presents a clinical decision support system for automated skin cancer classification using transfer learning with pre-trained CNNs.
Dermoscopic images from the HAM10000 dataset were preprocessed through resizing, augmentation, and class-imbalance correction before being trained on four deep learning architectures:
VGG16, ResNet50, DenseNet201, and MobileNetV2. Extensive experiments evaluated multiple training configurations, including weighted loss functions and augmented batches, with DenseNet201 achieving the highest performance,
reaching over 94% testing accuracy.
Skin Cancer Classification · HAM10000 Dataset · Transfer Learning · Convolutional Neural Networks · VGG16 · ResNet50 · DenseNet201 · MobileNetV2 · Data Augmentation · Weighted Loss Function · Dermoscopic Image Analysis · Clinical Decision Support System (CDSS)
This project analyzes the EEG Alcohol dataset to investigate neural patterns associated with alcohol consumption using multichannel electrophysiological recordings.
The dataset contains 1-second EEG segments collected from 64 scalp electrodes at 256 Hz for both alcoholic and control subjects under varying stimulus conditions.
The analysis pipeline includes data cleaning, artifact handling, temporal–spatial feature extraction, and classification of alcohol consumption status using machine-learning models.
Extracted features capture spectral power, statistical descriptors, and channel-specific activity patterns that differentiate alcoholic from non-alcoholic EEG profiles.
This project implements a Verilog-based Round Robin Arbiter to manage fair and deterministic access to a shared hardware resource among multiple requesters. Using clocked sequential logic and a rotating priority scheme, the arbiter guarantees starvation-free scheduling and balanced resource distribution. Simulation results confirm predictable behavior under varying load conditions, demonstrating the effectiveness of Round Robin arbitration in concurrent digital systems.
Round Robin Arbitration · Verilog HDL · Bus Access Scheduling · Clocked Sequential Logic · Rotating Priority Scheme · Starvation-Free Scheduling · Digital Circuits Design · Hardware Resource Management · Hardware Control Logic
This project implements a watchdog-supervised LED blinking system designed to ensure precise timing and robust fault recovery in embedded applications.
The LED toggles with a fixed 500 ms period, driven by periodic calls to the LED management component, while a watchdog timer with a 50 ms timeout supervises system activity.
Watchdog management monitors the execution frequency of the LED manager and refreshes the watchdog only when timing and aliveness conditions are satisfied.
This project involves the development of a prototype for the ADVIA 1800, a crucial piece of medical equipment used for high-throughput clinical chemistry analysis. The prototype is designed and built as part of the Medical Equipment 2 course, under the supervision of Dr. Eman Ayman, in collaboration with Siemens Healthineers.
Arduino · ESP8266 · Chemical Analyzer · Siemens Healthineers · Medical Equipment · Embedded Systems
TheraPlan: Integrated Radiotherapy Department Design and Planning System
This project presents a comprehensive, standards-compliant architectural and technical design for a modern radiotherapy department. It encompasses spatial layout, room-specific standards, medical equipment integration, safety infrastructure, IT systems, and cost planning, aligned with IAEA, NCRP, and IEC regulations.
Awarded a Certificate of Recognition for participation in the AI Empower Egypt “Hack for Impact” program, focusing on practical AI applications and impact-driven solutions.
Click the arrow inside each research box to access a drive folder containing the most influential and inspiring papers in that topic for me.
Special thanks to all authors whose exceptional research made this knowledge possible.
Ego-centric Vision
I analyze first-person video to study actions, attention, interaction patterns, and social cues relevant to real-world environments.
Anomaly Detection
I develop models that detect abnormal activities in surveillance, transportation, and clinical settings using temporal-spatial features.
AI in Mental Health
I apply machine learning to identify behavioral, emotional, and social markers that support early assessment and intervention.
AI in Medical Imaging
I work on AI systems for segmentation, classification, and disease detection in radiology, pathology, and biomedical imaging.
AI Architecture
I study the design and structure of modern AI systems, focusing on neural network architectures, model optimization,
multimodal pipelines, and the computational principles that enable scalable, efficient, and intelligent behavior.
Activity Recognition
I design models that classify physical, social, and daily human activities from video, motion sensors, or multimodal streams.
Video & Image Analysis
I analyze spatial-temporal patterns to extract behaviors, detect objects, and build visual understanding across diverse domains.
Human Behavior in Transportation
I study decision-making, travel patterns, and multimodal interactions using behavioral modeling and predictive analytics.
Machine Learning Instructor and Mentor
Google Developer Student Clubs (GDSC)
Jan 2025 – Oct 2025
Conducted weekly workshops on machine learning fundamentals, model development, and deployment.
Mentored students on hands-on projects including image classification, sentiment analysis, and recommendation systems.
Head of Biomedical Engineering Committee
IEEE EMBS
Nov 2023 – Aug 2025
Led a team of 12 biomedical engineering students to develop and deliver a comprehensive workshop and mentorship program for first-year and sophomore students.
Organized technical training sessions on medical devices, biomedical instrumentation, signal processing, and clinical technology fundamentals.
Oversaw curriculum planning, event scheduling, and committee operations to ensure smooth program execution.
Biomedical Engineering Committee Instructor
IEEE CUBS
Nov 2022 – Jul 2023
Delivered instructional sessions on biomedical instrumentation, biosignal analysis, and medical imaging fundamentals.
Awarded the “Best Instructor” Certificate in recognition of outstanding teaching quality, mentorship, and contribution to student development.
R&D Engineer
BEAT Student Club
Sep 2021 – Dec 2022
Contributed to research and development activities involving embedded systems, biosignal acquisition, and biomedical prototyping.
Assisted in preparing technical documentation, schematics, and prototype reports.
Embedded Systems Instructor
BEAT Student Club
Jun 2021 – Jul 2022
Taught embedded systems fundamentals, including microcontroller programming, sensor interfacing, and real-time system design.
Prepared structured mini-projects to help beginners move from theory to functional prototypes.
Let’s Work Together
Whether you want to collaborate, discuss ideas, or just chat over a good cup of coffee, feel free to reach out anytime.
Phone
(+1) 780-893-9977
Email
Mmabdela@ualberta.ca
Address
10453 87 Ave NW, Edmonton, AB, Canada