Validation of the NoSAS score for the screening of sleep-disordered breathing: a retrospective study in Egypt

Background This study was carried out to validate the NoSAS score and assess its performance in predicting clinically significant sleep-disordered breathing (SDB) in patients referred for a sleep study and to compare its performance with the recent No-Apnea score and the STOP-BANG questionnaire. Patients and methods This is a retrospective study of an existing database of consecutive outpatients who were referred for suspected SDB at the sleep lab of Chest Department in Alexandria Main University Hospital from October 2012 to December 2018. We enrolled patients aged at least 18 years who completed a full-night polysomnography. We defined clinically significant SDB as an apnea–hypopnea index (AHI) of at least 20 events/h. We assessed the validity of the NoSAS score and compared its performance with the No-Apnea score and the STOP-BANG questionnaire. Results After the exclusion of patients who did not fulfill our inclusion criteria, 362 out of 720 patients were enrolled. Only 5% were not diagnosed with SDB (AHI<5). Moderate-severe SDB was present in 82.4% of patients. Using a threshold of at least 8 at different AHI cut-offs (5, 10, 15, 20, 25, 30), the NoSAS score showed area under the curve (AUC) similar to the STOP-BANG Questionnaire only at AHI of at least 20 (AUC 0.77), whereas at the other AHI cut-offs (5, 10, 25, 30), the STOP-BANG Questionnaire showed higher AUC. At all AHI cut-offs, the NoSAS score was superior to the No-Apnea score. Conclusion Despite its simplicity, the NoSAS score is a valuable screening tool, especially when resources are limited.


Introduction
Sleep-disordered breathing (SDB), defined by an apnea-hypopnea index (AHI) greater than five events/h, was reported to be prevalent in 9% of women and 17-31% of men in three cohort studies in the USA [1][2][3]. This prevalence was later estimated to be around 34% in men and 17% in women aged 30-70 years [4]. If untreated, it can significantly impair quality of life, induce excessive sleepiness, and increase cardiovascular morbidity and mortality, thus representing an important public health problem [5]. Moreover, it has been estimated that in nearly 80% of individuals with SDB, the condition is undiagnosed [6]. Primary care physicians usually decide whether or not to refer patients for SDB evaluation. In-laboratory polysomnography (PSG) is the gold standard for diagnosing obstructive sleep apnea (OSA). However, as it is time-consuming and expensive, PSG is not a suitable routine screening method. Moreover, the growing awareness of sleep apnea has lengthened the already long waiting lists in sleep laboratories. To solve this problem, a number of screening questionnaires have been developed to identify patients with SDB who require further investigations [7][8][9]. SDB screening tools such as the Berlin questionnaire, the Epworth sleepiness scale (ESS), and the STOP-BANG questionnaire were examined in extensive validation studies [7][8][9]. Marti-Soler et al. [10] have proposed and validated a new screening tool, the 'NoSAS' score, on the basis of five items [neck circumference (NC), obesity, snoring, age, and sex] that enable the detection of individuals at risk of SDB. The score was developed in a population-based cohort in Switzerland and was subsequently validated in a Brazilian cohort [10] and in Chinese cohorts [11,12].
The aim of this study was to validate the NoSAS score and test its performance in predicting clinically significant SDB in patients referred for a sleep study in Egypt and to compare the performance of the NoSAS score with the recent No-Apnea score [13] and the well-known STOP-BANG questionnaire.
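Patients and methods
This was a retrospective study of an existing database of consecutive outpatients referred for suspected SDB to the sleep laboratory of the Chest Department at Alexandria Main University Hospital from October 2012 to December 2018. We included patients aged at least 18 years who underwent a full-night sleep study and had complete clinical data, anthropometric measurements, and ESS and STOP-BANG questionnaires in the sleep laboratory. We excluded patients younger than 18 years, those who did not complete 4 h of sleep time, and those who did not complete the questionnaires. The ethics committee in our institute approved the study.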
On the evening of the PSG, clinical data were collected from all patients: sex, age, weight, height, BMI, NC, and self-reported comorbidities (hypertension, diabetes mellitus, and other diseases). The information that we collected included all parts of the NoSAS score, the No-Apnea score, the ESS, and the STOP-BANG questionnaire, as well as data derived from the full PSG.
The STOP-BANG questionnaire included the following eight variables: snoring, tiredness/sleepiness, observed apnea, blood pressure, BMI, age, NC, and sex. Patients were classified as low risk when they scored less than 3 and as high risk when they scored at least 3. Sleepiness was evaluated using the ESS questionnaire, which was dichotomized into less than 11 and at least 11. The NoSAS and No-Apnea scores were subsequently calculated on the basis of available data (Appendix Tables 1 [10] and 2 [13]). The points for each variable are added for the NoSAS score, yielding a final score of 0-17 points (NC≥40 cm adds four points; BMI from 25 to 30 kg/m2 adds three points, BMI≥30 kg/m2 adds five points; snoring adds two points; age≥55 years adds four points; and male sex adds two points). The NoSAS score was considered positive if it was at least 8, according to Marti-Soler et al. [10]. The No-Apnea score [13] is a two-item simplified model. The sum of points for each item results in a range of 0-9 points. A cut-off of at least three is used to identify patients at high risk. The studied scores were evaluated against the AHI derived from the PSG. We tested the performance of the studied scores at AHI≥5, ≥10, ≥15, ≥20, ≥25, and ≥30. This was followed by comparison of the performance of the NoSAS score with that of the STOP-BANG questionnaire and the No-Apnea score. The PSG records were all scored manually according to the American Academy of Sleep Medicine scoring manuals [14]. Scorers were blinded to the questionnaires. The AHI was calculated. We defined clinically significant SDB as an AHI of 20 events/h of sleep or more, according to the initial analysis of the HypnoLaus Sleep Cohort [15]. The severity of SDB was categorized as follows: mild (5≤AHI<15 events/h), moderate (15≤AHI<30 events/h), and severe (AHI≥30 events/h).
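The NoSAS point scheme and the AHI severity categories described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in this section, not the authors' software: the names Patient, nosas_score, nosas_positive, and sdb_severity are hypothetical, and a BMI of exactly 30 kg/m2 is assumed to fall in the five-point band.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for the five NoSAS items."""
    neck_cm: float   # neck circumference in cm
    bmi: float       # body mass index in kg/m2
    snores: bool
    age: int         # years
    male: bool


def nosas_score(p: Patient) -> int:
    """Add the item points listed in the text; the result lies in 0-17."""
    score = 0
    if p.neck_cm >= 40:
        score += 4
    if p.bmi >= 30:          # assumption: exactly 30 counts as the upper band
        score += 5
    elif p.bmi >= 25:
        score += 3
    if p.snores:
        score += 2
    if p.age >= 55:
        score += 4
    if p.male:
        score += 2
    return score


def nosas_positive(p: Patient, threshold: int = 8) -> bool:
    """Score is positive at or above the threshold (8 in the text)."""
    return nosas_score(p) >= threshold


def sdb_severity(ahi: float) -> str:
    """Map an AHI (events/h) to the severity categories used in the text."""
    if ahi < 5:
        return "none"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

For example, a 60-year-old snoring man with NC 42 cm and BMI 31 kg/m2 receives every item's points, whereas a 40-year-old non-snoring woman with NC 36 cm and BMI 27 kg/m2 receives only the three BMI points.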

Compliance with ethical standards
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Statistics
The analysis was carried out using the IBM SPSS statistics program (version 20) (IBM Corp. Released 2011, IBM SPSS Statistics for Windows, Version 20.0. Armonk, NY: IBM Corp). Qualitative data were presented using number and percentage. Quantitative data were tested for normality using the Kolmogorov-Smirnov test. Normally distributed data were presented using mean and SD, whereas nonnormally distributed variables were presented using median and interquartile range. Comparison between groups was performed using the χ2-test, analysis of variance, and the Kruskal-Wallis test, as appropriate. Spearman correlation was determined at the 5% level of significance. To assess the discriminative performance of the screening tools, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used, and the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for a PSG-based AHI of at least 20/h; results were considered statistically significant at P≤0.05.
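The quantities named above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study: it computes the AUC through its equivalence with the normalized Mann-Whitney statistic (the probability that a randomly chosen patient with disease scores higher than one without, ties counting one half), and derives sensitivity, specificity, and the diagnostic odds ratio from their standard definitions. All function names and the toy data in the usage note are hypothetical.

```python
def auc_mann_whitney(scores, has_disease):
    """AUC as the fraction of (diseased, non-diseased) pairs ranked correctly."""
    pos = [s for s, d in zip(scores, has_disease) if d]
    neg = [s for s, d in zip(scores, has_disease) if not d]
    if not pos or not neg:
        raise ValueError("need at least one patient in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, has_disease, threshold):
    """Sensitivity and specificity when 'score >= threshold' is a positive test."""
    pairs = list(zip(scores, has_disease))
    tp = sum(1 for s, d in pairs if d and s >= threshold)
    fn = sum(1 for s, d in pairs if d and s < threshold)
    tn = sum(1 for s, d in pairs if not d and s < threshold)
    fp = sum(1 for s, d in pairs if not d and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive test with disease divided by odds without disease."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)
```

As a toy usage example, scores [3, 8, 9, 12, 6, 10] with disease labels [False, True, False, True, False, True] can be passed to auc_mann_whitney, and to sensitivity_specificity with a threshold of 8, to see how moving the threshold trades sensitivity against specificity while the AUC stays fixed.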

Results
We retrospectively reviewed the existing database of 720 consecutive outpatients who were referred for suspected SDB and had completed a full-night PSG at the sleep lab of the Chest Diseases Department of Alexandria Main University Hospital from October 2012 to September 2018. After the exclusion of patients who did not fulfill our inclusion and exclusion criteria, 362 patients were enrolled (Fig. 1). Using an AHI greater than five as the cut-off for diagnosis of SDB, 343 (95%) patients were found to have SDB. The patients were classified into three groups on the basis of the severity of SDB determined by the AHI, as shown in Table 1. Patients with moderate (25%) and severe (62%) degrees of OSA together constituted 87% of patients referred to our sleep lab for PSG. Demographic and anthropometric data, the mean values of the different scores studied for all patients, and the comparison of the severity groups are shown in Table 1.
Using a threshold of 8 or more, the performance (AUC) of the NoSAS score was assessed and compared with the No-Apnea score and the STOP-BANG questionnaire at different AHI cut-offs (5, 10, 15, 20, 25, 30), shown in Fig. 2. Using AHI at least 5 or 10 events/h as the standard diagnostic cut-off for SDB, the NoSAS score showed a lower AUC than the STOP-BANG questionnaire (0.73 and 0.75 for the NoSAS score versus 0.93 and 0.85 for STOP-BANG), but higher than the No-Apnea score (0.68 and 0.73, respectively). At AHI of at least 15, the NoSAS score showed better performance (AUC 0.77), but still slightly lower than the STOP-BANG Questionnaire (AUC 0.79 each). At AHI of at least 20 as the diagnostic cut-off for clinically significant SDB, the NoSAS score showed similar performance to the STOP-BANG questionnaire (AUC 0.77). At all cut-off values of AHI, the NoSAS score was superior to the No-Apnea score.

Table 1 notes: Qualitative data were presented as n (%). Quantitative data were presented as mean±SD if normally distributed and compared using analysis of variance, or as median (IQR) if not normally distributed and compared using the Kruskal-Wallis test. AHI, apnea-hypopnea index; ESS, Epworth sleepiness scale; IQR, interquartile range; NC, neck circumference; SDB, sleep-disordered breathing. *P≤0.05, significant.

As clinically significant SDB was defined as AHI of at least 20, we assessed the performance of the NoSAS score at this level in detail (performance parameters included AUC, sensitivity, specificity, PPV, and NPV) and compared it with the other two scores (Table 2). The STOP-BANG questionnaire showed the highest sensitivity, but the lowest specificity. All methods had a high sensitivity and NPV, but a relatively low specificity and PPV. No statistically significant difference was found between the AUCs of the studied scores (P>0.05). The NoSAS score showed the highest diagnostic odds ratio (8.23). It is worth noting that on changing the diagnostic threshold of the NoSAS score from at least 8 to at least 9, the sensitivity was 90%, the specificity increased to 51%, AUC was 0.77, PPV was 32%, and NPV was 99.2%. Figure 3 summarizes the ROC curves of the three scores studied in diagnosing clinically significant SDB at AHI of at least 20. The AUCs of the STOP-BANG questionnaire and the NoSAS score were nearly identical (0.77). The AUC for the No-Apnea score was 0.74. We found a significant correlation between all studied scores and severity of OSA as indexed by AHI (Table 3).

Discussion
Table 2 notes: Performance parameters include the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) calculated for apnea-hypopnea index≥20/h. *P>0.05, no statistically significant difference between areas under the curve of all studied scores.

Figure 3
Receiver operating characteristic (ROC) curves for the performance of the studied scores in diagnosing clinically significant sleep-disordered breathing at apnea-hypopnea index of at least 20.

In this study, after reviewing the existing clinical data of 720 consecutive patients, we enrolled 362 participants who fulfilled our inclusion criteria. We attribute the finding that the majority of our patients had severe OSA to our center being a tertiary referral hospital. The high prevalence of SDB among the patients studied is expected, as these patients were referred to the sleep lab. Duarte et al. [13] found similar prevalence rates of OSA in their studied cohorts (the prevalence of patients with AHI≥5 events/h was 77.9%).
Using different AHI cut-offs, the NoSAS score showed AUC similar to the STOP-BANG Questionnaire only at AHI of at least 20 (AUC 0.77), whereas at other AHI cut-offs, the STOP-BANG Questionnaire showed higher AUC. At all AHI cut-offs, the NoSAS score was superior to the No-Apnea score. We assessed all performance parameters at the AHI of at least 20 cut-off. The STOP-BANG questionnaire showed the highest sensitivity, but least specificity. There was no statistically significant difference between AUC of all the scores studied (P>0.05). The NoSAS score showed the highest diagnostic odds ratio (8.23). The high diagnostic accuracy could be attributed to the fact that the score relied mainly on objective items.
Our results on the performance of the NoSAS score in identifying clinically significant SDB [AUC 0.77; 95% confidence interval (CI): 0.709-0.831] are in agreement with those of the Marti-Soler et al. study [10], which showed that the NoSAS score identified patients with clinically significant SDB, with an AUC of 0.74 (95% CI: 0.72-0.76). In contrast to our findings, they [10] reported significantly better performance of the NoSAS score than the STOP-BANG questionnaire (AUC 0.67; 95% CI: 0.65-0.69; P<0.0001), whereas our study showed that the performance of the NoSAS score was similar to that of STOP-BANG, but not superior. Because the STOP-BANG questionnaire relies on subjective items, it may yield different results in different studies. Moreover, we explained the questionnaire items to the patients and recorded their answers. In agreement with our findings, a multiethnic Asian study reported that the NoSAS score performed similarly to the STOP-BANG questionnaire, but was not superior [11].
Two recent Chinese studies [12,16] compared the predictive value of the NoSAS score with other screening tools (ESS, the STOP-BANG questionnaire, and the Berlin questionnaire). The results of the first one [12] were in partial agreement with ours. They stated that using AHI of at least 5 as the cut-off for diagnosing SDB, the NoSAS score showed AUC=0.734, which is similar to our findings at this AHI cut-off, but the study reported that at this cut-off, the NoSAS was superior to the STOP-BANG questionnaire, which was not the case in this study; this variation may be because of the inclusion of subjective items in the STOP-BANG questionnaire. The results of the second Chinese study were almost in agreement with ours as the STOP-BANG questionnaire was superior to the NoSAS score for AUC at AHI of at least 5 or 10 events/h, but the NoSAS score was superior (AUC=0.707 at AHI cut-off ≥15 or 20 events/h) [16].
Guichard et al. [17] assessed the NoSAS score efficacy in identifying clinically significant SDB among participants with major depressive disorders (MDE). This study showed that the NoSAS score detected SDB in MDE participants with a sensitivity of 0.79, a specificity of 0.66, an NPV of 0.91, and a PPV of 0.41. The AUC was 0.72 for NoSAS and 0.66 for STOP-BANG; the difference in patient characteristics may cause bias. In the present study, using the same cut-off (AHI≥20), the NoSAS score showed an AUC of 0.77, with higher sensitivity (94%) and NPV (0.94), but lower specificity (34%) and PPV (0.32). The authors attributed the excellent performance of the NoSAS score among this specific population to the fact that the score relied on few items including only one 'subjective' item, making it suitable for patients suffering from major depression, with whom verbal contact is difficult [17].
This study showed low specificity of all the screening tools studied (34, 17, and 40% for the NoSAS, STOP-BANG, and the No-Apnea score, respectively). In our opinion, screening for a disease such as SDB, which has serious consequences [18,19] if undiagnosed, requires a screening tool with high sensitivity and high NPV so as not to miss true positive cases. Nevertheless, we found that using a higher threshold (≥9 points) improves the specificity of the NoSAS score (51%) without affecting the sensitivity significantly (90%). Hence, we recommend using a threshold of 9 points instead of 8 to increase the specificity of this score without markedly affecting its sensitivity. STOP-BANG had the highest sensitivity, but the lowest specificity; similar findings were reported by a study comparing STOP-BANG with four other screening scores [20].
The No-Apnea score, despite its simplicity, showed good performance, with an AUC of 0.74 for clinically significant SDB (95% CI: 0.674-0.810), which is not significantly different from that of the NoSAS score and the STOP-BANG questionnaire (AUC 0.77). The No-Apnea score was derived and validated in a recent study in which it did not show a significant difference compared with the STOP-BANG and the NoSAS score using AHI cut-offs ≥5, ≥15, and ≥30, as the No-Apnea score showed AUCs of 0.781 (95% CI: 0.757-0.805), 0.752 (95% CI: 0.731-0.773), and 0.752 (95% CI: 0.730-0.773), respectively [13]. In the present study, at these cut-off levels, the AUC of the No-Apnea score was 0.68, 0.75, and 0.67, respectively. We recommend further studies to confirm the good performance of this simple score.
It is evident from these results that the traditional STOP-BANG questionnaire is preferable for detecting mild degrees of SDB, whereas for detecting clinically significant SDB, the three screening tools studied perform nearly equally (the No-Apnea score showed a slightly lower AUC). As we practice in a country with limited resources, our target population is patients with clinically significant SDB in whom associated comorbidities represent a major problem [21], as the consequences of untreated SDB incur great expense. If two scores show the same performance in terms of AUC, sensitivity, and specificity, we choose the simpler score with fewer items. Furthermore, recent studies [22,23] have reported that a high NoSAS score was associated with increased arterial stiffness and reduced renal function in a large cohort of healthy individuals.
It is worth noting that the results of studies of SDB screening tools in specific populations cannot be generalized. In high-risk pregnancy, the Berlin and STOP-BANG questionnaires proved to be of limited usefulness in the first trimester in a previous study [24], but their predictive values were acceptable as the pregnancy progressed. Another study in patients with cardiovascular diseases showed that the STOP-BANG and Berlin questionnaires failed to differentiate between patients with and without coexisting SDB [25]. Similar results were obtained by a study carried out in patients after acute or subacute stroke [26]. However, the NoSAS score proved to be superior to the STOP-BANG questionnaire in detecting SDB in patients with MDE [17]. Another example that indicates the risk of generalizing a screening tool without validation in the target patient population is the wide variation in the sensitivity of the Berlin questionnaire, which was 86% in primary care patients [27] and 57-68% in sleep laboratory patients [7].
This study has several strengths. To our knowledge, no one has studied the NoSAS score or the No-Apnea score among Egyptian patients, unlike the STOP-BANG questionnaire, which was previously studied in Egypt [28]. In this study, in addition to investigating the ability of the NoSAS score to detect clinically significant SDB, we assessed its performance (AUC) at different AHI cut-offs 5, 15, and 30. In-laboratory PSG (the gold standard diagnostic tool) was used to evaluate the accuracy of the different scores. The staff who manually scored the PSG were blinded to the scores used in the study. Nevertheless, the present study has limitations: patient selection was based on patients referred for a sleep study; therefore, the results cannot be generalized. Finally, it is important to note that to calculate PPV and NPV, we needed to know the actual prevalence of SDB in our population, and as we lack these epidemiological data, we used prevalence rates reported in the USA by large cohort studies [2][3][4]6]. National epidemiological studies in different ethnic groups are greatly needed to determine the prevalence of SDB in different populations.
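The dependence of PPV and NPV on an externally supplied prevalence follows from Bayes' theorem and can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' calculation: the function name ppv_npv is hypothetical, and the prevalence used in the usage note is an arbitrary illustrative value, not a figure taken from the study or from the cited cohorts.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Predictive values from test accuracy and an assumed disease prevalence.

    All three arguments are proportions strictly between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0 < value < 1:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence               # diseased, test positive
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy, test positive
    true_neg = specificity * (1 - prevalence)         # healthy, test negative
    false_neg = (1 - sensitivity) * prevalence        # diseased, test negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv
```

Calling ppv_npv(0.94, 0.34, 0.25), for instance, combines the sensitivity and specificity reported here for the NoSAS score with a purely hypothetical prevalence of 25%; repeating the call with other prevalence values shows how strongly both predictive values shift with the population being screened.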

Conclusion
The NoSAS score is a valuable screening tool despite its simplicity, especially when resources are limited. We recommend its use instead of the STOP-BANG questionnaire to detect clinically significant SDB. We also recommend further testing of a threshold of at least 9 rather than at least 8. Although we cannot generalize our results, and considering that PSG remains the gold standard for diagnosing SDB, the NoSAS score can help primary care physicians in their decision to refer patients to sleep labs. The NoSAS score may also help to prioritize patients on long waiting lists. Future studies in different countries and different clinical populations are crucial before wider use of this screening tool.

Appendix table notes: No-Apnea score: the points for each variable are added, totaling a final score of 0-9 points. NoSAS score: the points for each variable are added, totaling a final score of 0-17 points.