In this study, we developed an AI platform for the screening and management of strabismus, packaged as an applet that runs on any mobile device (cell phone, iPad, etc.). The platform uses corneal light reflection images for initial screening and maintains digital archives of patients' test results and doctors' prescriptions, facilitating a comprehensive understanding of each patient's condition. A patient-physician interaction module further enhances communication between doctors and patients. Collectively, these features enable effective management of strabismus patients from initial screening through long-term follow-up.
This retrospective and prospective cross-sectional study received ethical approval from the Ethics Committee of West China Hospital, Sichuan University. The photographs were drawn from two datasets covering a wide age range (median: 6 years; range: 1–74 years): an inpatient-based dataset and an outpatient-based dataset. The inpatient-based dataset includes photographs of patients before and after surgery, taken at the Department of Ophthalmology, West China Hospital, China, between January 2018 and May 2023. These images were captured with a 24-megapixel single-lens reflex autofocus digital camera (EOS M50 Mark II; Canon Inc., Tokyo, Japan) with a lens-mounted flash, from a distance of 33 cm. The photography protocol ensured a clear corneal reflection point in each eye, which is crucial for diagnosing strabismus. To enhance device diversity, we constructed an outpatient-based dataset of photographs from departmental colleagues, their families, and outpatients, captured between October 27, 2022, and March 11, 2023. These images were taken with various devices, including the EOS M50 Mark II (Canon Inc., Tokyo, Japan), Huawei (Huawei Inc., China) and OPPO (OPPO Inc., China) series phones, and products from other major mobile phone manufacturers. Devices were positioned 33 cm to 1 m from the patient, a range specified to the photo providers to ensure consistency, and the clarity of the corneal reflection point was verified in every image.
Given that different types of strabismus can co-exist in one patient (for example, exotropia and vertical deviation), we conducted separate statistical analyses for each specific strabismus type. Notably, a photograph of the same strabismic eye position could appear in multiple entries of the tabulated data when several strabismus types co-existed. The image preparation process is illustrated in Supplementary Material Fig. 1. The predefined exclusion criteria for the screening system were: (1) photographs with extraneous noise points, reflex points obscured by blepharoptosis, or absent reflex points; (2) photographs with indistinct or defocused reflex points. After this preliminary curation of ocular position photographs, a panel of experienced ophthalmologists evaluated the photographic quality. Before the photographs were captured, each patient underwent a rigorous cover test to elicit their strabismic condition. All photographs were accompanied by Hirschberg test measurements of the strabismus angle (°) and alternate prism cover test measurements in prism dioptres (PD, Δ). The PD (Δ) of each photograph was analyzed statistically, and a detailed statistical description of the entire dataset was completed to precisely characterize the extent of strabismus captured.
Intermittent exotropia is currently the most prevalent form of strabismus, with increasing incidence [21]. To optimize the screening system's applicability in real-world scenarios, this study included 1348 ocular position photographs featuring intermittent exotropia. The Hirschberg test assesses binocular alignment by observing the position of the corneal reflection point relative to the pupil. To ensure uniformity and objectivity, each photograph was reviewed by three senior pediatric ophthalmologists, each with over ten years of experience. If any of the three deemed a photograph unclassifiable (i.e., strabismus could not be visually detected), it was removed from the dataset, as shown in Supplementary Fig. 1.
To mitigate potential privacy breaches for photo owners during the training of the strabismus screening network, a dedicated ophthalmologist used algorithms to systematically crop the photographs within the ophthalmology department, producing images of the ocular region exclusively for training, as shown in Fig. 1.
In this study, we developed an AI platform centered on a strabismus screening module and complemented by patient management modules. The screening module draws on two corneal light reflection photograph datasets and comprises two stages: photo cropping and strabismus diagnosis. To avoid leaking users' facial information, an eye region cropping network built on the Dlib toolkit's facial feature point recognition model [22, 23] first crops the photographs to obtain eye region images. An algorithm then uniformly removes the irrelevant area between the eyes in each eye position photograph and seamlessly stitches and pads the selected eye regions. Finally, these processed eye region images are fed into the strabismus screening network to determine whether the individual in the image exhibits strabismus. A schematic of the overall screening system architecture is shown in Fig. 1.
In this study, we built the eye region cropping network on the Dlib toolkit's facial feature point recognition model [22, 23]. First, a facial feature point detector locates the eye region based on the location of the detected face. This detector, implemented with an Ensemble of Regression Trees (ERT) trained by Gradient Boosting Decision Trees (GBDT) [22], extracts 68 facial landmarks, as shown in Supplementary Fig. 2 [22]. It takes an image containing a facial region as input and outputs a set of facial feature points. Next, we take the x-coordinates of points 37 and 46 as the left and right boundaries of the crop. For the upper boundary, we take the minimum of the y-coordinates of points 38, 39, 44, and 45; for the lower boundary, we take the maximum of the y-coordinates of points 41, 42, 47, and 48. These coordinates define the eye region, and the original photograph is cropped to an image containing only the eye area. The algorithm then removes the unrelated area between the eyes in the ocular position photograph, retaining only the two-eye region. Finally, the cropped images are uniformly padded with black in the surrounding areas, ensuring a consistent resolution of 3686 × 850 for input into the neural network.
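The boundary computation above can be sketched as a small pure function. Note that dlib's shape predictor indexes the 68 landmarks from 0, so the paper's 1-based points 37–48 correspond to indices 36–47 here; in practice the landmark list would come from `dlib.shape_predictor`, but this minimal sketch only assumes a sequence of (x, y) pairs:

```python
def eye_region_bounds(landmarks):
    """Crop box (left, top, right, bottom) for the eye region.

    `landmarks` is a sequence of 68 (x, y) pairs in dlib's 0-based
    order, so the paper's points 37-48 map to indices 36-47 here.
    """
    left = landmarks[36][0]   # point 37: outer corner of the left eye
    right = landmarks[45][0]  # point 46: outer corner of the right eye
    # Upper bound: highest upper-eyelid point (points 38, 39, 44, 45).
    top = min(landmarks[i][1] for i in (37, 38, 43, 44))
    # Lower bound: lowest lower-eyelid point (points 41, 42, 47, 48).
    bottom = max(landmarks[i][1] for i in (40, 41, 46, 47))
    return left, top, right, bottom
```

The strip cropped at these coordinates would then be pasted onto a black canvas to reach the standardized 3686 × 850 input resolution.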
Because eye position photographs show varying degrees of facial tilt and differences in ambient lighting and contrast, it was imperative to enhance the model's ability to learn strabismus-specific eye region features and to improve its generalization. To this end, we applied data augmentation to the photographs, including random rotation and adjustments to brightness and saturation. First, we horizontally flip the left and right eyes with 50% probability. We then apply random rotations of up to 5° to both eyes to reflect the slight angular deviations common in clinical settings, aligning training with real-world scenarios. Finally, we adjust image brightness and saturation and apply standardization and normalization to each image. Because the training dataset contained more than three times as many strabismus images as normal gaze images, we did not augment the number of strabismus images; instead, we tripled the number of normal gaze images in the training dataset.
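The augmentation described above can be sketched as a per-image parameter sampler. This is a minimal sketch: the ±20% brightness/saturation jitter range is an assumed value, as the exact factors are not stated; in a PyTorch pipeline the same policy would typically be expressed with `torchvision.transforms`.

```python
import random

def sample_augmentation(rng=random):
    """Sample one set of augmentation parameters per training image.

    Returns a 50% horizontal flip, a rotation angle drawn uniformly
    from [-5, 5] degrees, and brightness/saturation factors (the
    +/-20% jitter range is an assumed value, not from the paper).
    """
    return {
        "hflip": rng.random() < 0.5,
        "angle_deg": rng.uniform(-5.0, 5.0),
        "brightness": rng.uniform(0.8, 1.2),
        "saturation": rng.uniform(0.8, 1.2),
    }
```

Sampling fresh parameters on every epoch means each oversampled copy of a normal gaze image is seen under a slightly different rotation and color jitter, which is what makes tripling the minority class useful rather than redundant.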
The dataset, comprising 5894 images, was split into training and validation sets by stratified random sampling at an 80%/20% ratio, and five-fold cross-validation was conducted. To ensure robustness and reliability, the fold with the best performance was selected as the final model, which was then evaluated on an independent test set of 200 normal images and 100 strabismus images. Performance metrics included accuracy, precision, specificity, sensitivity, F1-score, and the area under the curve (AUC). In addition, the Grad-CAM algorithm [24] was used to visualize the features learned by the DL model as heatmaps. The Vision Transformer (VIT_16_224) [25], pretrained on the ImageNet dataset, served as the network architecture. The ViT splits the input image into 16 × 16 pixel patches, which are transformed into fixed-length vectors and processed by the Transformer. The model comprises 12 transformer blocks, each using a Multi-Head Attention module with 12 heads, followed by an MLP module for classification. The Label-Distribution-Aware Margin loss was employed to address classification boundary issues and improve minority-class accuracy. Optimization used the Adaptive Moment Estimation (Adam) optimizer with an initial learning rate of 0.0001, beta 1 of 0.9, beta 2 of 0.999, an epsilon of 1e-7, and no learning rate decay. Images were resized to 224 × 224 pixels and normalized to [0, 1] before entering the ViT model, with a batch size of 16. Training ran for 120 epochs, with early stopping if the validation loss did not decrease for 60 epochs; the state at the lowest validation loss was retained as the final model. After training, the model was integrated into the AI platform for strabismus screening, available as a mobile applet on devices such as cell phones and iPads.
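The threshold-based metrics above follow directly from the binary confusion counts on the test set, with strabismus as the positive class; AUC additionally requires the model's continuous scores. A minimal sketch:

```python
def screening_metrics(tp, fp, tn, fn):
    """Binary screening metrics; strabismus is the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)        # PPV among positive calls
    sensitivity = tp / (tp + fn)      # recall on strabismus images
    specificity = tn / (tn + fp)      # recall on normal images
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }
```

For example, on a 300-image test set (100 strabismus, 200 normal), hypothetical counts of tp=90, fn=10, tn=190, fp=10 (illustrative only, not the study's results) would give accuracy ≈ 0.933, sensitivity 0.90, specificity 0.95, and F1 0.90.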