The process of assessing and allocating nursing staff, as well as evaluating performance, relies heavily on nursing workload, which is strongly associated with patient safety outcomes. Nevertheless, most previous studies have utilized cross-sectional data collection methods, which limit the precision of workload prediction. Static workload models do not incorporate longitudinal changes in influential factors, potentially resulting in delayed or erroneous nursing management decisions and ultimately causing imbalances in nurses’ workload.
Aim
To employ machine learning algorithms to facilitate the dynamic prediction of nursing workload on the basis of patient characteristics.
Methods
This prospective cohort quantitative study was conducted between March 2019 and August 2021 in two general hospitals located in China. Data on the characteristics of 133 patients over the course of 1139 hospital days, as well as direct nursing time, were collected. A longitudinal investigation of nursing workload was carried out, applying multiple linear regression to identify measurable factors that significantly impact nursing workload. Additionally, machine learning methods were applied to dynamically predict the nursing time needed for each patient.
Results
The mean direct nursing workload varied greatly across hospitalizations. Significant factors contributing to increased care needs included complications, comorbidities, body mass index (BMI), income, history of past illness, simple clinical score (SCS), and activities of daily living (ADL). The predictive performance improved through machine learning, with the random forest model demonstrating the best performance (root mean square error (RMSE): 1148.38; coefficient of determination (R2): 0.74; mean square error (MSE): 1318774.64).
Conclusions
The variability in nursing workload during hospitalization is influenced primarily by patient self-care capacity, complications, and comorbidities. The random forest algorithm, a machine learning algorithm, effectively handles a wide range of features, such as patient characteristics, complications, comorbidities, and other factors. This algorithm has demonstrated good performance in predicting workload.
Implications for nursing management
This study introduces a quantitative model designed to evaluate nursing workload throughout the duration of hospitalization. By employing the model, nursing managers can consider multiple factors that impact workload comprehensively, resulting in enhanced comprehension and interpretation of workload variations. Through the application of a random forest algorithm for workload prediction, nursing managers can anticipate and estimate workload in a proactive and precise manner, thereby facilitating more efficient human resource planning.
Notes
Yulei Song and Xueqing Zhang contributed equally to this work and should be considered co-first authors.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Introduction
The demand for medical and nursing care has significantly increased as a result of an ageing population, rising costs, and labour shortages [1]. Given that nurses constitute a significant portion of healthcare expenditures [2], maintaining optimal nurse staffing that aligns with quality care standards and enhances patient outcomes is crucial. The assignment of nurses is influenced by the recognized significance of managing nursing workload, which has been shown to impact patient outcomes [3] and is closely associated with staff engagement and performance [4]. Previous studies have examined nursing workload through various approaches, such as the nurse-to-patient ratio [5], the daily hours dedicated to patient care [6], and the classification of nursing tasks on the basis of patient complexity [7]. However, few studies have comprehensively measured longitudinal patient care workload from admission to discharge. This research gap could result in inaccurate measurements of care workload, an unbalanced distribution of workload, and delays in decision-making by managers.
A substantial body of research spanning nearly ten years has clarified a diverse array of characteristic factors that impact nursing workload, with patient-related factors being widely regarded as the most influential. Romano et al. [8] posited that the type of admission can contribute to the nursing workload for critically ill patients. Furthermore, Oostveen’s findings [9] revealed that medications, complications, medical specialties, comorbidities, surgical procedures, and hospital stay lengths significantly affect the provision of nursing care. Seago et al. [10] reported that an increase in patient days, a higher case mix index, and higher technology scores may significantly impact workload. Additionally, increases in patient admissions, discharges, and transfers have been shown to intensify nursing workload and contribute to unstable work environments [11]. Previous studies have extensively investigated factors that influence workload to develop static workload models. However, importantly, the nursing time required for patients during their entire hospitalization period can vary on the basis of factors such as condition, diagnosis, and assigned tasks. This time requirement is complex, dynamic, and diverse, making it difficult to measure and capture through cross-sectional surveys alone.
To forecast workload, one approach involves quantifying the influence of patient characteristics or other measurable factors on workload. Currently, most studies use regression modelling to predict workload. A team of American researchers employed linear regression modelling to explore correlations between nurses’ perceived workload and real-time Emergency Severity Index (ESI) data obtained from the Patient Tracking Computer System (PTCS) in the emergency department. The resulting fitted model yielded coefficient of determination (R2) values ranging from 60 to 64%, and patient acuity derived from the ESI was the strongest predictor of the workload score (WLS) (r = 0.7991) [12]. Oetelaar et al. used linear mixed-effects modelling to examine relationships between patient characteristics and length of care. Nine patient characteristics, including “partial assistance bathing, mobilization,” “full assistance bathing, mobilization, care for incontinent patients,” “full assistance with meals, drip feed, TPN,” “two or more IV/drip/drain,” “psychosocial support,” “extensive wound care, fistula, VAC bandages,” “new stoma,” “isolation measures” or “one-on-one care” and nursing proficiency, were significantly related to the nursing workload, with these variables explaining 36% of the variance [13]. However, the indicators of workload in these two studies were not comprehensive. Alternatively, workload can be predicted through patient classification systems. The RAFAELA system in Finland combines the Oulu patient classification instrument, a system for recording daily nursing time, and the Professional Assessment of Optimal Nursing Care Intensity Level questionnaire to measure nursing intensity [14]. Hoi et al. [15] developed a workload intensity measurement system (WIMS) by defining 28 nursing diagnoses to conduct a work sampling study on nurse activities, determining important nursing diagnoses for each ward and the daily nursing time required for each diagnosis. However, these fixed quantitative methods do not provide accurate workload predictions, as they do not consider dynamic predictors such as patient characteristics and changes, nor do they account for variations in nursing workflow throughout the stay.
Nursing workload forecasting involves numerous influencing factors with complex correlations, which are often not simple linear relationships [16]. In contrast to linear regression models, machine learning techniques such as decision trees (DTs) and random forest (RF) models can capture nonlinear relationships and complex patterns. DTs build a tree-like structure through feature selection and partitioning. This approach is intuitive and highly interpretable, handles multiple data types, and can capture nonlinear relationships [17]. An RF model aggregates multiple decision trees, which gives it higher accuracy and stronger resistance to overfitting than a single tree [18]. Extreme gradient boosting (XGBoost) is an advanced gradient boosting tree algorithm with efficient optimization algorithms and powerful regularization techniques that is capable of automatically learning complex relationships [19]. Through meticulous hyperparameter tuning and cross-validation, machine learning model configurations can be optimized to improve accuracy and reduce overfitting, leading to better workload prediction results. The integration of machine learning models into care management has demonstrated significant promise and has already been implemented in various applications, such as the prediction of surgery durations and the estimation of disease occurrence probabilities. Jiao et al. extracted variables such as patient age, physical status, scheduled duration, procedure name, and surgical diagnosis and predicted the continuous probability distribution of the duration of surgical cases via mixture density networks (MDNs), DTs, RFs, gradient boosted decision trees (GBTs), and Bayesian statistical methods. The results revealed that MDNs achieved the best performance, and the scheduled duration and procedure name were the most important factors [20]. Chou et al. used a neural network model to predict diabetes on the basis of eight different characteristics, namely, pregnancy history, blood glucose level, diastolic blood pressure, skinfold thickness, insulin level, body mass index, diabetes pedigree function, and age. The results revealed that the two types of augmented decision trees performed the best [21]. However, the application of machine learning in predicting nursing workload remains unexplored. Therefore, investigating the potential of machine learning in dynamically predicting nursing workload is warranted.
Most static workload models cannot incorporate longitudinal variations in influencing factors, which may potentially result in delayed or erroneous nursing management decisions and consequent imbalances in nurses’ workload. In this study, we initially assessed workload requirements for various hospitalization durations in the gastroenterology ward, considering fluctuations in patient-related influencing factors. We subsequently applied machine learning algorithms to dynamically forecast the necessary nursing workload for patients. The objective of this study was to assess the correlations between patient characteristics and the amount of direct nursing time, which was used as a proxy for nursing workload, in gastroenterology wards. By gathering data from longitudinal surveys, we aimed to identify measurable factors that significantly influence nursing workload. Ultimately, our goal was to select a suitable machine learning model and use three common machine learning techniques, DT, RF, and XGBoost, to build a prediction model that can dynamically predict the nursing workload for individual patients.
Methods
Design
This prospective cohort study was conducted from March 2019 to August 2021 in two general hospitals in China. First, the factors affecting workload and the workload survey tool were identified. Data on daily nursing workload and factors influencing patients during hospitalization were collected longitudinally. Finally, a workload prediction model was constructed on the basis of influencing factors through various machine learning algorithms. The aim was to comprehensively understand the factors influencing the nursing workload of patients in the digestive ward and assess the feasibility of workload prediction.
Setting and participants
This study was conducted in two general hospitals in China: one of the hospitals has two gastroenterology wards, and the study included a total of three gastroenterology wards (ranging from 20 to 30 beds). The bed utilization rate was approximately 80%, and the length of hospital stay ranged from 3 to 25 days.
A cluster sampling method was employed, and the sample included all patients admitted to the digestive ward from July to August 2020 during the study period. The inclusion criteria were as follows: (1) patients who provided informed consent and (2) patients who were newly admitted to the gastroenterology ward. The exclusion criteria were as follows: (1) patients who voluntarily withdrew; (2) patients who were transferred to another department or hospital; and (3) patients who were readmitted.
Data collection tools
Nursing time measurement questionnaire
The purpose of this study was to observe the correlation between patient characteristics and nursing time to predict workload. According to the Nursing Intervention Classification (NIC), direct nursing intervention is a treatment conducted through interaction with patients, direct social action, and consultation, whereas indirect nursing intervention refers to treatments carried out externally to the patient [22]. Considering that direct nursing time is more closely related to patient characteristics and provides more predictive accuracy, this study focused only on measuring direct nursing time. Direct nursing care in this study is defined as tasks related to patient interaction, including medication management, patient hygiene, nutrition, and excretion. Activities related to unit management, such as ordering supplies and attending meetings, are not within the scope of this survey. In this study, direct nursing time includes the preparation, implementation, documentation, and final disposal steps associated with direct nursing interventions.
Initially, the Nursing Time Measurement Questionnaire referenced the International Classification for Nursing Practice (ICNP), specifically, the Nursing Interventions section, to develop a comprehensive list of nursing activities [23]. This list was then refined to include all the nursing activities pertinent to gastroenterology. A preliminary one-week survey was subsequently conducted in gastroenterology wards. On the basis of the actual clinical circumstances observed during this presurvey, any overlooked activities were incorporated, while superfluous activities were excluded. The final nursing time measurement questionnaire included six categories (basic care, therapeutic care, specialized care, condition observation, health education, and Chinese medicine care) covering a total of 116 nursing activities, with the name, frequency, and duration recorded for each. It was used to observe the time of direct care to gastroenterology ward patients on a daily basis.
Initially, 15 factors influencing nursing workload, including age, comorbidities, and severity of illness, were incorporated, drawing on studies by Oostveen [9] and Bai et al. [24]. A panel of 10 gastroenterology care management experts was subsequently convened to evaluate the measurability and significance of each determinant via a Likert scale ranging from 0 (very low) to 10 (very high). Factors were retained if a minimum of 80% of the experts deemed them both measurable and significant. Following this evaluation, twelve variables were retained for consideration, namely, sex, average monthly income, marital status, educational attainment, body mass index, comorbidities, complications, past medical history, surgical history, total duration of hospital stay, self-care ability, and severity of illness. Conversely, two factors–type of admission and type of discharge–were excluded because of their negligible impact on the nursing workload, especially considering that patients with gastrointestinal diseases are typically admitted on an outpatient basis and subsequently discharged to home postrecovery. In addition, age was removed since it is already incorporated into the SCS scale assessment.
We assessed patient self-care via the Activities of Daily Living (ADL) scale [25], a 10-item scale that quantitatively measures an individual’s level of functional independence with respect to various ADLs by assigning weights to each of 10 functions, such as personal care, mobility, transfers, bathing, and feeding. The cumulative score ranges from 0, signifying total dependence, to 100, indicating complete independence. This tool, which has been validated and implemented for Chinese patients [26], was employed in our assessment.
We used the simple clinical score (SCS) [27] to assess patients’ disease severity, including 14 clinical indicators such as age, temperature, systolic blood pressure, respiration, and pulse. The score ranges from 0 to 33, with higher scores indicating more severe disease and poorer prognostic outcomes. Song et al. confirmed the reliability of its use with Chinese patients [28].
Data collection
A time-and-motion study was conducted to evaluate nursing workload. Each hospital established an observation study team to train investigators, clarify the purpose and methodology of the study, and define the specific scope of care activities. On the basis of the patients’ inclusion criteria, trained observers recorded data through questionnaires and maintained continuous observation until patients were discharged, transferred, or deceased. The research team employed the Nursing Workload Influencing Factors Questionnaire as a tool to document the sociodemographic characteristics and physical condition of patients. Additionally, the ADL and SCS scale scores were utilized to capture data regarding patients’ self-care capabilities and illness severity. The aforementioned questionnaire was updated daily, with any changes recorded promptly. The duration of each nursing activity was measured via the Nursing Time Measurement Questionnaire. Timing of each nursing task began with the preparation of the item and continued until its final disposal. Any missing data were retrieved from medical and nursing records or obtained directly from the patients during their hospital stay. To guarantee the survey’s quality, each ward was assigned a quality supervisor responsible for verifying the completeness and coherence of the questionnaire, addressing any issues promptly, and ensuring its overall quality.
Statistical analysis
The general information and multiple linear regression analyses were conducted via IBM Statistical Product and Service Solutions (IBM SPSS) software (version 26.0; SPSS Inc.). Continuous variables, including nursing hours and patient age, were characterized using means and standard deviations. Conversely, categorical variables, such as sex and income, were depicted using percentages. Analysis of variance (ANOVA) was employed to compare nursing hours across various stages of hospitalization, whereas multiple linear regression analysis was used for the preliminary screening of predictor variables. To assess the presence of multicollinearity among the variables that remained in the model, the variance inflation factor (VIF) was employed, with VIF values of ≥ 5 indicating its presence. A two-sided alpha of 0.05 was used to determine statistical significance.
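The VIF criterion can be illustrated with a short calculation. The sketch below is written in Python rather than SPSS and relies on the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The function names are hypothetical, and the VIF values are those listed in Table 3.

```python
def vif_from_r2(r_squared):
    """Variance inflation factor from the R^2 of regressing one predictor on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r2_from_vif(vif):
    """Invert the definition: share of a predictor's variance explained by the other predictors."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


VIF_THRESHOLD = 5.0  # cut-off stated in the text (VIF >= 5 signals multicollinearity)

# VIF values as listed in Table 3.
reported_vif = {
    "BMI": 1.249,
    "Economic income": 1.285,
    "History of past illness": 1.494,
    "Complications": 1.489,
    "Comorbidities": 1.185,
    "SCS score": 2.217,
    "ADL score": 1.805,
}

for name, vif in reported_vif.items():
    flagged = vif >= VIF_THRESHOLD
    print(f"{name}: VIF={vif:.3f}, implied R_j^2={r2_from_vif(vif):.3f}, flagged={flagged}")
```

Under this definition, the threshold of 5 corresponds to a predictor for which 80% of the variance is explained by the other predictors; none of the values in Table 3 reaches that level.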
Machine learning modelling was performed via R software (version 4.2.2), employing three widely used supervised learning models: DTs, RF, and XGBoost. The R package “classification and regression tree (CART)” was employed for further data analysis.
The construction of the DT model was facilitated by the application of the CART algorithm [29]. To enhance the performance of the model, a grid of the hyperparameter cp (complexity parameter) was established, encompassing two distinct values: 0.001 and 0.01. Through tenfold cross-validation, cp = 0.001 was determined as the configuration that enables more precise control over the tree’s complexity, thereby augmenting the predictive efficacy of the model.
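The tenfold cross-validation used for tuning can be sketched as follows. This is a minimal illustration in Python, not the R workflow used in the study. The function name `k_fold_indices`, the seed, and the sample size of 100 are assumptions chosen for the example, and the source does not state whether folds were formed by patient or by patient-day.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so fold membership is random
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical example: 100 observations split into ten folds.
splits = list(k_fold_indices(100, k=10))
for training, validation in splits:
    # Each model is fitted on nine folds and evaluated on the held-out fold.
    print(len(training), len(validation))
```

Each candidate hyperparameter value would be fitted on every training part and scored on the matching validation part; the value with the best average score is retained.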
The RF algorithm, a robust nonparametric supervised learning methodology, was examined in this study, with a focus on hyperparameter tuning [30]. The number of candidate features randomly sampled at each split (mtry) was tuned over the values 3, 5, 7, 9, and 11. Through a tenfold cross-validation process, the optimal value was mtry = 7, meaning that seven randomly selected features are considered at each split of each tree. This selection is expected to increase the accuracy and stability of the model.
The XGBoost algorithm, which optimizes model performance by sequentially developing multiple weak learners (utilizing CART trees) [31], was also employed in this study. The impact of various hyperparameters on the model performance was investigated, including eta (learning rate), max_depth (maximum depth of the tree), and nrounds (number of estimators). The potential values were as follows: 0.001, 0.01, and 0.1 for eta; 1, 3, and 5 for max_depth; and 100 and 500 for nrounds. By utilizing a tenfold cross-validation method, the optimal hyperparameter combination was determined to be nrounds = 500, max_depth = 3, and eta = 0.1. This configuration was selected because it effectively balances model complexity and ensures the model’s ability to learn relevant information.
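The size of this search can be made concrete by enumerating the grid. The sketch below uses Python with the candidate values quoted above; the variable names are hypothetical, and no model is fitted here.

```python
from itertools import product

# Candidate values quoted in the text for the XGBoost search.
xgb_grid = {
    "eta": [0.001, 0.01, 0.1],
    "max_depth": [1, 3, 5],
    "nrounds": [100, 500],
}

# Every combination of the three hyperparameters (full grid search).
candidates = [dict(zip(xgb_grid, values)) for values in product(*xgb_grid.values())]

N_FOLDS = 10  # tenfold cross-validation, as described in the text
n_model_fits = len(candidates) * N_FOLDS

# Combination reported as optimal in the text.
selected = {"eta": 0.1, "max_depth": 3, "nrounds": 500}

print(len(candidates), n_model_fits, selected in candidates)
```

A full grid over these values contains 18 combinations, so tenfold cross-validation implies 180 model fits before the final model is refitted with the selected combination.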
The predictive efficacy of the model was assessed via the mean square error (MSE), root mean square error (RMSE), and R² [32]. The MSE, which accounts for both the prediction accuracy and bias, is inversely related to the predictive ability of the model; a smaller MSE value signifies superior predictive ability. The RMSE is the square root of the MSE and expresses the discrepancy between the predicted and actual values in the units of the outcome, with a smaller RMSE indicating enhanced model prediction performance. The comparative fit of various models to the identical dataset was evaluated via R², which ranges from 0 to 1. The model’s fit to the data improves as the R² value approaches 1.
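The three metrics can be written out explicitly. The following Python sketch uses the standard definitions; the function names and the four example values are illustrative assumptions, not data from the study.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, expressed in the units of the outcome."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted nursing times.
observed = [30.0, 45.0, 60.0, 75.0]
predicted = [35.0, 40.0, 65.0, 70.0]

print(mse(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```

With this definition, R² computed on held-out data can fall below 0 when a model predicts worse than the mean of the observed values, so the 0 to 1 range applies to models that do at least as well as that baseline.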
Results
Three general gastroenterology wards in two tertiary hospitals in China were investigated. The sample consisted of 133 participants (79 adults and 54 elderly individuals), predominantly male (59.4%), with a mean age of 58.71 (± 14.09) years. Of these participants, 93.23% were married. Among them, 50 patients (37.59%) had concurrent chronic diseases, 85 patients (63.91%) had a previous medical history, 62 patients (46.62%) underwent surgery, and 14 (10.53%) patients experienced complications during hospitalization (Table 1). Among the 1139 hospitalization days recorded for these 133 patients, the average ADL score was 93.83 (± 16.53), and the SCS score was 2.40 (± 2.17).
Table 1
Demographic characteristics of the participants (n = 133)
| Variable | Frequency | Percent |
|---|---|---|
| Age, mean (SD) | 58.71 (14.09) | |
| ADL, mean (SD) | 93.83 (16.53) | |
| SCS, mean (SD) | 2.40 (2.17) | |
| Sex | | |
| Male | 79 | 59.40 |
| Female | 54 | 40.60 |
| Marital status | | |
| Single or divorced | 9 | 6.77 |
| Married | 124 | 93.23 |
| Average monthly income (in RMB) | | |
| Less than 1000 | 4 | 3.00 |
| 1000–1999 | 5 | 3.76 |
| 2000–2999 | 46 | 34.59 |
| 3000–4999 | 56 | 42.11 |
| More than 5000 | 22 | 16.54 |
| Education level | | |
| Illiterate | 0 | 0.00 |
| Primary or lower secondary education | 54 | 40.60 |
| Junior high school or college degree | 67 | 50.38 |
| University degree or above | 12 | 9.02 |
| BMI | | |
| Less than 18.5 | 6 | 4.51 |
| 18.5–23.9 | 15 | 11.28 |
| 24–27 | 59 | 44.36 |
| 28–32 | 43 | 32.33 |
| More than 32 | 10 | 7.52 |
| Comorbidities | | |
| Yes | 50 | 37.59 |
| No | 83 | 62.41 |
| Complications | | |
| Yes | 14 | 10.53 |
| No | 119 | 89.47 |
| History of past illness | | |
| Yes | 85 | 63.91 |
| No | 48 | 36.09 |
| Surgical intervention | | |
| Yes | 62 | 46.62 |
| No | 71 | 53.38 |
A total of 1139 hospital days were examined in this study, and the average patient stay was 8.56 ± 2.53 days. The analysis of variance results revealed that there was a statistically significant difference (p = 0.000, F = 142) in the required direct nursing time among patients with different hospital stays. Specifically, patients with hospital stays of 8–10 days had the longest mean direct nursing time (61.54 ± 33.05) minutes, whereas those on the day of discharge received the shortest mean direct nursing time (31.22 ± 26.28) minutes, as illustrated in Table 2. Furthermore, Fig. 1 depicts the trend in average direct nursing time across different hospital stays.
Table 2
Distribution of direct nursing hours on different hospitalization days
| Days of hospital stay | Person-days (n) | Direct nursing time range (min) | Direct nursing time, \(\overline{X}\) ± S (min) |
|---|---|---|---|
| Date of admission | 133 | 11.70–235.50 | 47.35 ± 30.86 |
| Admission 2–4 days | 392 | 12.30–240.83 | 54.55 ± 33.83 |
| Admission 5–7 days | 310 | 11.98–306.18 | 55.20 ± 36.16 |
| Admission 8–10 days | 135 | 13.30–241.40 | 61.54 ± 33.05 |
| Admission 11–15 days | 36 | 12.01–304.50 | 56.50 ± 48.31 |
| Date of discharge | 133 | 10.30–166.00 | 31.22 ± 26.28 |

F value: 142.00
P value: 0.00*
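The person-day counts in Table 2 can be cross-checked against the totals quoted in the text. The Python sketch below takes the counts and stage means from Table 2; the pooled mean it computes is simple weighted arithmetic for illustration and is not a figure reported by the authors.

```python
# (stage, person-days, mean direct nursing time in minutes), as listed in Table 2.
table2 = [
    ("Date of admission", 133, 47.35),
    ("Admission 2-4 days", 392, 54.55),
    ("Admission 5-7 days", 310, 55.20),
    ("Admission 8-10 days", 135, 61.54),
    ("Admission 11-15 days", 36, 56.50),
    ("Date of discharge", 133, 31.22),
]

N_PATIENTS = 133  # sample size stated in the text

total_person_days = sum(n for _, n, _ in table2)
mean_stay_days = total_person_days / N_PATIENTS

# Person-day-weighted mean of the stage means (illustrative, not reported in the source).
pooled_mean_minutes = sum(n * m for _, n, m in table2) / total_person_days

print(total_person_days, round(mean_stay_days, 2), round(pooled_mean_minutes, 2))
```

The counts sum to 1139 person-days, which corresponds to the stated average stay of 8.56 days for 133 patients.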
Fig. 1
Trends in mean direct nursing time across hospital days
Categorical variables were converted to dummy variables prior to being entered into the multiple linear regression model. As indicated in Table 3, several factors, including BMI, economic income, history of past illness, complications, comorbidities, SCS scores, and ADL scores, were significantly associated with nursing workload (P < 0.05). These indicators collectively accounted for 36.7% of the variability observed in total nursing workload. The standardized regression coefficients show that ADL scores exerted the most substantial influence on the allocation of direct nursing time, followed by complications, comorbidities and a history of past illness.
Table 3
Linear regression analysis of the factors associated with the nursing workload during hospitalization
| Variables | Standardized coefficient (Beta) | Standard error | 95% CI | t | Sig. | VIF |
|---|---|---|---|---|---|---|
| BMI | -0.075 | 17.059 | -81.912 to -14.969 | -2.840 | 0.005 | 1.249 |
| Economic income | 0.129 | 60.594 | 170.820 to 408.600 | 4.781 | 0.000 | 1.285 |
| History of past illness | 0.191 | 126.992 | 588.460 to 1086.493 | 6.596 | 0.000 | 1.494 |
| Complications | 0.284 | 124.262 | 973.211 to 1460.840 | 9.794 | 0.000 | 1.489 |
| Comorbidities | 0.260 | 281.275 | 104.493 to 1208.254 | 2.334 | 0.000 | 1.185 |
| SCS score | 0.115 | 33.772 | 43.581 to 176.107 | 3.252 | 0.001 | 2.217 |
| ADL score | -0.355 | 4.005 | -53.485 to -36.768 | -11.143 | 0.000 | 1.805 |

Adjusted R square: 0.367
Seven statistically significant factors identified through multiple linear regression analysis were included as predictors in machine learning prediction to enhance predictive performance. The R language was utilized to randomly assign 80% of the patients to the training set, whereas the remaining 20% were assigned to the test set. Among the three models used, the RF model achieved the highest performance, with an RMSE of 1148.38 and an MSE of 1318774.64, both of which were the smallest, indicating good predictive ability. The R2 value was 0.74, representing the highest degree of fit to the data among the three models. The performance parameters of each model are shown in Table 4. Figure 2 shows the distribution of the data, with the x-axis representing the true values and the y-axis representing the predicted values. The more concentrated around the diagonal the scatter points are, the better the prediction performance. The figure clearly shows that the RF and XGBoost methods exhibited better prediction performance than the DT method did.
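The 80/20 assignment by patient can be sketched as follows. This Python example is an illustration, not the authors' R code. It assumes that all hospital days of a patient follow that patient into the same set, and the function name `split_patients`, the seed, and the toy records are hypothetical.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=42):
    """Randomly assign whole patients to a training set and a test set."""
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    n_train = round(train_fraction * len(unique_ids))
    return set(unique_ids[:n_train]), set(unique_ids[n_train:])


# Hypothetical patient-day records: (patient_id, hospital_day) for 133 patients.
records = [(pid, day) for pid in range(1, 134) for day in range(1, 4)]
train_ids, test_ids = split_patients(pid for pid, _ in records)

# Every daily record follows its patient, so no patient appears in both sets.
train_rows = [row for row in records if row[0] in train_ids]
test_rows = [row for row in records if row[0] in test_ids]

print(len(train_ids), len(test_ids), len(train_rows), len(test_rows))
```

Splitting by patient rather than by day keeps repeated measurements of the same person out of both sets at once, which would otherwise make test performance look better than it is.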
Table 4
Model performance
| Model | RMSE | R2 | MSE |
|---|---|---|---|
| DT | 1366.49 | 0.63 | 1867304.93 |
| RF | 1148.38 | 0.74 | 1318774.64 |
| XGBoost | 1175.55 | 0.73 | 1381909.64 |
Abbreviations: DT, decision tree; RF, random forest; XGBoost, extreme gradient boosting; RMSE, root mean square error; R2, coefficient of determination; MSE, mean square error
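Because the RMSE is the square root of the MSE, the two columns of Table 4 can be checked against each other. The short Python sketch below uses the values from Table 4; the tolerance of 0.005 reflects rounding to two decimals and is an assumption of this example.

```python
import math

# (RMSE, MSE) pairs as listed in Table 4.
table4 = {
    "DT": (1366.49, 1867304.93),
    "RF": (1148.38, 1318774.64),
    "XGBoost": (1175.55, 1381909.64),
}

for model, (reported_rmse, reported_mse) in table4.items():
    implied_rmse = math.sqrt(reported_mse)
    consistent = abs(implied_rmse - reported_rmse) < 0.005  # two-decimal rounding
    print(f"{model}: implied RMSE={implied_rmse:.2f}, consistent={consistent}")

# The model with the smallest RMSE is the best performer under this metric.
best_model = min(table4, key=lambda name: table4[name][0])
print(best_model)
```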
Fig. 2
Machine learning test set prediction results. Abbreviations: R2, coefficient of determination
Discussion
The current study revealed a statistically significant variation in the average number of direct care hours across various stages of the 1139 inpatient days (p < 0.05). This discrepancy can be attributed to variances in the nursing care workflow among patients with disease regression at different stages of their hospital stay. For example, upon admission, nurses dedicate a substantial amount of time to documentation and care coordination, whereas on the day of discharge, the nursing workload diminishes owing to reduced treatment requirements for the patient. Therefore, longitudinal tracking of nursing hours can truly reflect the clinical work and actual needs of patients and improve the accuracy of prediction.
A linear regression model was developed to explain the need for care on the basis of patient characteristics across hospitalization days. Significant factors contributing to increased care needs included complications, comorbidities, BMI, income, history of past illness, SCS score, and ADL score. However, no statistically significant correlations were observed between nursing demand and patients’ sex, marital status, educational attainment, or surgical intervention. This lack of association may stem from the limited number of unmarried patients and the distributions of sex, educational attainment and surgical intervention within the sample. This concentration is likely specific to gastroenterology wards, where such trends are common.
The most important factor related to nursing workload, according to this study, is the ADL score. A reduced patient self-care capacity directly leads to an increase in all types of nursing time and longer nursing observation times. A previous study presented a comparison of the variance in nursing workload explained by the ICF Core Set categories with the ADL score, and the results showed that the ADL score can explain the variance in the nursing workload of patients with neurological, cardiopulmonary, and musculoskeletal conditions [33]. Although the classification methods used for patients differed between the two studies, the predictive value of the ADL score was verified in both studies. As such, the ADL score may emerge as the preferred factor for efficiently predicting workload, given its greater flexibility and ease of application.
The second most important factor related to nursing workload, complications, has also been confirmed in other studies. The influence of complications on the demand for care primarily stems from the time required for diagnostic observation or nursing interventions, such as surgical procedures to address complications. The findings in a study conducted by Gijsen et al. [34] align with those in the present study, suggesting that patient complications can significantly affect the requirements for patient care. Additionally, the presence of multiple comorbidities during hospitalization has a substantial effect on the nursing workload. The number of comorbidities is likely to be correlated with nursing evaluations and medication administration. For example, patients frequently present with comorbidities such as hypertension and diabetes, necessitating an increased frequency of measurements throughout the day. The patient’s income, medical history, and SCS score also have some impact on nursing workload. Although there is a lack of comparable evidence, this limited influence may be attributed to the fact that medical history and patients’ income are associated primarily with heightened treatment and medication utilization, consequently increasing the demand for care. Furthermore, previous studies [35, 36] have shown that the severity of illness represented by the SCS score increases nursing time, as disease severity leads to increased treatment complexity and length of hospitalization, thereby increasing nursing workload. The results of a study by Myny et al. [33] indicate that a BMI exceeding 30 is a factor affecting workload, whereas Huang et al. [38] found that increases in the number of obese patients and in the severity of their obesity require more nursing time. Oetelaar et al. [13] suggested that even if a patient’s BMI is normal, additional care time, such as extended bathing time, may be required because of abnormal body proportions.
The lower the BMI in this study was, the longer the duration of nursing care, which is inconsistent with the results of other studies. This discrepancy may be related to malnutrition, weakened immunity, increased risk of various infections and pressure ulcers in patients with low BMIs, and nursing staff may need to spend more time on pressure ulcer prevention and wound care. In addition, only 133 patients were included in this study. Therefore, the inconsistent results may also be due to the small sample size, resulting in certain biases in the results.
The findings also revealed that machine learning models can achieve performance comparable or superior to that of linear regression. Multiple linear regression is inadequate for dynamic prediction in this setting, as it has a limited explanatory power of only 36.7%. This limitation arises from the assumption of a linear relationship between the dependent and independent variables. Consequently, we compared diverse machine learning techniques, including RFs and DTs, with the objective of capturing nonlinear associations between features and outcomes that may contribute to prediction accuracy. The RF yielded the best results, with an RMSE of 1148.38, an R2 of 0.74, and an MSE of 1318744.64. Key predictive features of the model included the ADL score, complications, comorbidities, income, SCS, history of past illness and BMI. Rodney et al. applied machine learning algorithms to predict whether patients’ surgery would end at 5 pm and whether patients would be discharged at 7 pm; the balanced bagging classifier performed best [39]. Jia et al. identified patients suitable for outpatient arthroplasty by applying machine learning models, and the best performing classifier was a balanced RF [40]. These studies suggest that machine learning models can perform well in predicting various types of health data. However, more research is needed to verify and improve the accuracy of machine learning in predicting nursing workload.
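To make the reported evaluation metrics concrete, the following minimal sketch shows how MSE, RMSE, and R2 are conventionally computed from observed and predicted nursing times. It is an illustration only and is not the code used in this study; the function names and the five observed/predicted values are hypothetical, chosen purely for demonstration.

```python
import math


def mse(y_true, y_pred):
    """Mean square error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the MSE, in the units of y."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily direct nursing times (arbitrary time units), NOT study data.
observed = [3000.0, 4500.0, 5200.0, 6100.0, 8000.0]
predicted = [3400.0, 4300.0, 5600.0, 5700.0, 7400.0]

print("MSE :", mse(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("R2  :", r2(observed, predicted))
```

Because the RMSE is the square root of the MSE, the two metrics rank models identically; the RMSE is usually easier to interpret because it is expressed in the same units as nursing time, whereas R2 expresses the share of variation in workload that the model accounts for.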
Relevance to clinical practice
The findings of this study suggest that patient characteristics are the primary determinants of nurses’ workload and that these characteristics change continuously over the course of hospitalization. To gain a comprehensive understanding of the fluctuating patterns in workload during hospitalization, it is imperative to thoroughly examine potential influencing factors. Additionally, quantifying nursing workload enables nursing managers to obtain more precise and dependable workload estimates. Finally, nursing managers can better anticipate future workloads by employing machine learning techniques such as RFs, which can be retrained as new data are collected. Consequently, clinical nursing teams can optimize workflows and allocate resources more effectively, thereby adapting to evolving patient requirements and medical settings.
Conclusions
The variation in nursing workload during hospitalization is influenced primarily by patient self-care capacity, complications, and comorbidities. While previous studies have identified factors impacting nursing workload, this study contributes to the literature by examining workload trends over time and applying RFs for workload prediction. The findings of this study can be used to assist nursing managers in effectively allocating workload and staffing. Furthermore, to maintain its applicability, the model should be continuously updated and improved on the basis of changes in the clinical department’s medical environment and patient characteristics.
Implications for nursing management
This study introduces a quantitative model for assessing nursing workload throughout the duration of hospitalization. This framework allows nursing managers to account for a relatively comprehensive set of factors that impact workload, resulting in a deeper understanding and interpretation of workload variations. By employing an RF algorithm for workload prediction, nursing managers can anticipate workload levels in advance and with greater precision, facilitating more efficient human resource planning. This approach can provide objective data support for manpower allocation, thereby improving the efficiency and quality of care delivery.
Limitations
Our study was conducted in a gastroenterology ward, making it uncertain whether the study results can be readily applied to different settings, such as emergency or ICU wards. This study focuses on the effects of patient factors on care time and does not consider other known factors that influence nursing time, such as unit-related characteristics or ward team factors. In addition, the actual time spent on nursing activities is recorded here as the nursing workload, but it is possible that, subjectively or even objectively, the time available was insufficient to deliver care of adequate quality.
Acknowledgements
All authors express gratitude for the support of the patients, nurses, and study assistants involved in the study.
Declarations
Ethics approval and consent to participate
Ethical approval for this study was granted by the Institutional Review Board (IRB) of Nanjing Hospital of Chinese Medicine, which is located in Nanjing City. The specific approval for this research is documented under the reference number KY2020172. Informed consent was obtained from all participating nurses and patients by the research team before commencing ward observations. As this study did not involve any therapeutic intervention or psychological influence on the participants, written informed consent forms were not signed; instead, verbal consent was obtained from all the participants after the study was explained to them, as per the ethical committee’s requirements for relevant research. Those who refused to be observed were excluded from the study.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.