A dynamic early warning model based on LSTM-AM for property-based elderly care resource shortages in urban-rural integrated counties in China

Study area and data

Research area

This study takes as its study area six urban-rural integration pilot counties (cities and districts) in Xuchang City, Henan Province: Yuzhou City, Changge City, Yanling County, Xiangcheng County, Weidu District and Jian’an District. It focuses on dynamic early warning of resource shortages in the property-based elderly care service model and constructs a resource shortage early warning system adapted to the characteristics of urban-rural integration through innovations at three levels: spatio-temporal data fusion, optimization of the LSTM algorithm, and design of the early warning mechanism. Xuchang City was chosen for two reasons. First, as the only national urban-rural integration development pilot area in Henan Province, it has a relatively balanced urban-rural population structure (53.55% urban and 46.45% rural) and is characterized by “two-way population mobility” and “rapid demand change”. Second, the six pilot counties (cities and districts) have basically established dynamic databases covering both urban and rural areas, such as property operation and maintenance logs and residents’ health records, and have integrated multi-source data through the national pilot project of “online and offline integration of government services”, so the data are reasonably complete.

Data types

The research data cover three dimensions: demand side, supply side, and spatio-temporal characteristics¹⁰. On the demand side, elderly population density is calculated by combining data from the public security household registration system with area data from the geographic information system (GIS); electronic health records of the elderly are obtained to assess their health status and medical care needs; and data from the property-based elderly care service platform and the community service hotline are collected and analyzed to calculate the frequency of service requests. On the supply side, the number of elderly care facilities within properties is counted through on-site visits, caregiver staffing data are obtained through questionnaires and interviews, and operation and maintenance cost data are obtained by analyzing financial statements. For spatio-temporal characteristics, a map data interface combined with administrative boundary data is used to generate GIS population heat maps, and the service response time series of the property elderly care service platform is recorded.

The time span is from January 2018 to December 2024, covering a period of seven years. Among these, dynamic indicators such as service request frequency and property response efficiency are sampled monthly; facility numbers and nursing staff allocation are updated quarterly; and indicators such as population aging rate and service coverage rate are statistically compiled annually.

For missing values in continuous time series data such as elderly population density and service request frequency, time series linear interpolation is used to fill in the gaps. The calculation formula is as follows:

$$x_t = x_{t-1} + \left(x_{t+1} - x_{t-1}\right) \times \frac{t - t_{-1}}{t_{+1} - t_{-1}}$$

(1)

where $x_t$ is the missing value at time t, $x_{t-1}$ and $x_{t+1}$ are the nearest observed values before and after it, and $t_{-1}$ and $t_{+1}$ are their observation time points.
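Formula (1) can be sketched with NumPy as below, assuming each monthly series is stored as parallel arrays of time points and values with `NaN` marking gaps; the function name `interpolate_missing` is illustrative and not taken from the study. `np.interp` evaluates exactly the straight line between the two neighbouring observations used in Formula (1).

```python
import numpy as np


def interpolate_missing(times, values):
    """Fill NaN gaps by linear interpolation between neighbouring observations.

    For each missing x_t, np.interp evaluates
    x_{t-1} + (x_{t+1} - x_{t-1}) * (t - t_{-1}) / (t_{+1} - t_{-1}),
    where t_{-1} and t_{+1} are the nearest observed time points.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    filled = values.copy()
    # np.interp needs increasing observed time points. Gaps before the first or
    # after the last observation are held at the end value, since Formula (1)
    # itself needs neighbours on both sides.
    filled[~observed] = np.interp(times[~observed], times[observed], values[observed])
    return filled


# Hypothetical monthly service-request counts with two missing months.
months = [1, 2, 3, 4, 5, 6]
requests = [10, np.nan, 14, 15, np.nan, 19]
print(interpolate_missing(months, requests))  # [10. 12. 14. 15. 17. 19.]
```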

Model construction

Indicator design

Core indicators

To move beyond the single dimension of the traditional “service completion rate”, Property Response Efficiency (PRE) is defined. It adds a timeliness weight so that the indicator reflects how time-sensitive elderly people’s needs for emergency services are, and so that the main causes of resource shortages can be identified. The calculation formula is as follows:

$$\:PRE=\left(\frac{Number\:of\:Completed\:Services}{Total\:Number\:of\:requests}\right)\times\:Timeliness\:Coefficient$$

(2)

Number of Completed Services: the number of work orders recorded by the property platform that have been successfully closed (excluding work orders canceled or timed out due to insufficient resources). Total Number of Requests: the number of all service requests in the same period (including unfinished work orders).

Timeliness Coefficient: reflects how promptly the service is delivered, calculated as:

$$\:Timeliness\:Coefficient=1-\frac{Average\:{Re}sponse\:Time}{Service\:Commitment\:Time}$$

(3)

The timeliness coefficient is assigned differently based on service type. For emergency services, the response time commitment is set to within 30 min, with a timeliness coefficient of 1.2. For each additional 10-minute delay in response, the coefficient decreases by 0.1, with a minimum of 0.5. For standard services, the response time commitment is within 24 h, with a timeliness coefficient of 0.8. The coefficient remains at 0.8 for delays up to 12 h, and decreases by 0.2 for delays exceeding 12 h. For scheduled services, if completed within the scheduled time window, the coefficient is 1.0. If completed early, the coefficient increases by 0.1, and if delayed, it decreases by 0.1, with a maximum of 1.2 and a minimum of 0.5.
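The service-type rules above and Formula (2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the function names are hypothetical; "each additional 10-minute delay" is read as each started 10-minute block beyond the commitment; the 0.2 reduction for standard services is applied once; and, because the text does not say how per-order coefficients combine into one PRE value, they are averaged here.

```python
import math
from typing import Iterable, Optional, Tuple


def _clamp(value: float, lo: float = 0.5, hi: float = 1.2) -> float:
    return max(lo, min(hi, value))


def timeliness_coefficient(service_type: str, minutes: float,
                           window: Optional[Tuple[float, float]] = None) -> float:
    """Rule-based timeliness coefficient (hypothetical helper).

    `minutes` is the response time for emergency/standard services, or the
    completion time for scheduled services, compared with `window` = (start, end).
    """
    if service_type == "emergency":
        delay = minutes - 30.0  # commitment: respond within 30 min
        if delay <= 0:
            return 1.2
        # Assumption: every started 10-minute block of delay costs 0.1.
        return round(_clamp(1.2 - 0.1 * math.ceil(delay / 10.0)), 2)
    if service_type == "standard":
        delay = minutes - 24 * 60.0  # commitment: respond within 24 h
        # Assumption: the 0.2 reduction is applied once for delays over 12 h.
        return 0.8 if delay <= 12 * 60.0 else 0.6
    if service_type == "scheduled":
        if window is None:
            raise ValueError("scheduled services need a (start, end) window")
        start, end = window
        if minutes < start:
            return _clamp(1.0 + 0.1)  # completed early
        if minutes <= end:
            return 1.0  # completed within the window
        return _clamp(1.0 - 0.1)  # completed late
    raise ValueError(f"unknown service type: {service_type}")


def property_response_efficiency(completed: int, total: int,
                                 coefficients: Iterable[float]) -> float:
    """PRE = (completed / total) x timeliness coefficient, Formula (2).

    Assumption: per-order coefficients are averaged into one coefficient.
    """
    coefficients = list(coefficients)
    if total <= 0 or not coefficients:
        raise ValueError("need at least one request and one coefficient")
    return completed / total * (sum(coefficients) / len(coefficients))


# Hypothetical work orders: (service type, response time in minutes).
orders = [("emergency", 25), ("emergency", 45), ("standard", 30 * 60)]
coeffs = [timeliness_coefficient(kind, t) for kind, t in orders]
print(coeffs)  # [1.2, 1.0, 0.8]
print(property_response_efficiency(completed=3, total=4, coefficients=coeffs))
```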

Auxiliary indicators

The Facility Accessibility Index (FAI) is the core index in this study for measuring how conveniently elderly residents of urban-rural integration areas can reach property-based elderly care service facilities¹¹. It assesses the spatial fairness of resource allocation by quantifying the coupling between facility service capacity and geographic distance¹². The formula is as follows:

$$\:FA{I}_{i}={\sum\:}_{j=1}^{n}\frac{{S}_{j}}{{d}_{ij}^{\beta\:}}$$

(4)

where $FAI_i$ is the facility accessibility index of region i, with higher values indicating easier access to elderly care services for elderly people in region i; $S_j$ is the service capacity of facility j (e.g., the number of beds, the number of caregivers, or the average daily number of service visits at elderly care stations); $d_{ij}$ is the actual access cost from region i to facility j, measured as walking or driving time (in minutes) computed from road network data; and $\beta$ is the distance decay coefficient (β = 1.5 in this study), which reflects how quickly service utility decays with increasing distance.
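Formula (4) vectorizes naturally over regions and facilities. The sketch below assumes a regions-by-facilities travel-time matrix in minutes and uses β = 1.5 as stated; the function name and the example figures are hypothetical. A zero travel time would make the index undefined, so the sketch rejects it; how the study handles that case is not stated.

```python
import numpy as np


def facility_accessibility_index(capacity, travel_time, beta=1.5):
    """FAI_i = sum_j S_j / d_ij**beta, Formula (4).

    capacity:    S_j for each facility, shape (n_facilities,)
    travel_time: d_ij in minutes, shape (n_regions, n_facilities)
    """
    capacity = np.asarray(capacity, dtype=float)
    travel_time = np.asarray(travel_time, dtype=float)
    if np.any(travel_time <= 0):
        raise ValueError("travel times must be positive to avoid division by zero")
    # Broadcasting divides each facility's capacity by its decayed distance
    # from every region, then sums over facilities.
    return (capacity / travel_time ** beta).sum(axis=1)


# Hypothetical example: two regions, two facilities (capacity = beds).
beds = [40, 20]
minutes = [[10, 20],
           [5, 40]]
print(facility_accessibility_index(beds, minutes))
```

Region 1 scores higher here because it is only 5 minutes from the larger facility, which the β = 1.5 decay rewards strongly.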

Demand Fluctuation Entropy (DFE) is a quantitative indicator based on information entropy theory that measures the uncertainty and complexity of demand for property-based elderly care services over time; a higher entropy value indicates more intense demand fluctuation¹³. The calculation formula is as follows:

$$\:DFE=-{\sum\:}_{t=1}^{T}{p}_{t}{{lnp}}_{t}$$

(5)

where $p_t$ is the share of total service requests that falls in time window t, and each time window is one calendar month.
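A minimal NumPy sketch of Formula (5), assuming the input is a sequence of monthly request counts; the function name is hypothetical. Months with zero requests are skipped, following the usual convention 0 · ln 0 = 0.

```python
import numpy as np


def demand_fluctuation_entropy(monthly_requests):
    """DFE = -sum_t p_t ln p_t, with p_t the share of requests in month t (Formula (5))."""
    counts = np.asarray(monthly_requests, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("at least one service request is required")
    p = counts / total
    p = p[p > 0]  # months with no requests contribute 0 (0 * ln 0 = 0)
    return float(-(p * np.log(p)).sum())


# Hypothetical 12-month request counts for one county.
print(demand_fluctuation_entropy([120, 95, 110, 130, 160, 140, 100, 90, 105, 150, 170, 180]))
```

Because $p_t$ is a share, DFE over T months lies between 0 (all requests in one month) and ln T (equal requests every month).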

LSTM-AM network architecture construction

Given the dynamic nature and spatio-temporal heterogeneity of resource shortages in property-based elderly care services in urban-rural integration areas, this study constructs a Long Short-Term Memory network with an Attention Mechanism (LSTM-AM)¹⁴. By jointly modeling spatio-temporal features and adaptively weighting multi-source data, the architecture is designed to improve both the accuracy of resource gap prediction and the interpretability of the results. It consists of four parts: an input layer, a spatio-temporal attention fusion layer, stacked LSTM layers, and a dynamic threshold output layer (see Fig. 2).

Input layer design

The input data mainly consists of two types of matrices, i.e., the time-series feature matrix and the spatial feature matrix.

First, the time-series feature matrix integrates indicators from three categories (demand side, supply side and service effectiveness) into the time-series input $X_t \in \mathbb{R}^{T \times N}$, where T is the number of time steps and N = 8 is the number of features.

Second, the spatial feature matrix combines the GIS heat map and the facility accessibility index (FAI) into a regional grid encoding $G_s \in \mathbb{R}^{S \times D}$, where S is the number of spatial grids and D = 3 is the grid attribute dimension. A graph convolutional network (GCN) then extracts the spatial dependencies among the grids and outputs the spatial embedding matrix $E_s \in \mathbb{R}^{S \times d}$, where d is the embedding dimension¹⁵.
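The text does not specify the GCN variant or how grid adjacency is defined. As one possible reading, the sketch below applies a single graph convolution layer with the symmetric normalization of Kipf and Welling, $E_s = \mathrm{ReLU}(D^{-1/2}(A + I)D^{-1/2} G_s W)$, to a hypothetical adjacency in which grids sharing a border are connected. The weight matrix `W` and the grid attributes are random placeholders; in the model `W` would be learned.

```python
import numpy as np


def gcn_layer(adjacency, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # D^-1/2 stored as a vector
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, a_norm @ features @ weight)


rng = np.random.default_rng(0)
S, D, d = 4, 3, 8  # grids, grid attributes (D = 3), hypothetical embedding size
# Hypothetical adjacency: four grids in a row, neighbours share a border.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
G_s = rng.random((S, D))      # placeholder grid attributes
W = rng.normal(size=(D, d))   # learnable in practice, random here
E_s = gcn_layer(adjacency, G_s, W)
print(E_s.shape)  # (4, 8)
```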

Spatio-temporal attention fusion layer

In order to accurately capture the spatio-temporal heterogeneity presented by the resource shortage, a dual-path attention mechanism is specially designed.

Temporal attention path: for the temporal feature matrix $X_t$, the self-attention weight matrix $A_t \in \mathbb{R}^{T \times T}$ is computed to characterize the strength of dependency between different time steps:

$$\:{A}_{t}=S\text{oftmax}\left(\frac{{X}_{t}{W}_{Q}{\left({X}_{t}{W}_{K}\right)}^{T}}{\sqrt{{d}_{k}}}\right)$$

(6)

where $W_Q$ and $W_K$ are learnable query and key projection matrices and $\sqrt{d_k}$ is the scaling factor, with $d_k$ the key dimension. Weighted aggregation with a learnable value projection $W_V$ then yields the temporally enhanced features $X'_t = A_t X_t W_V$.
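Formula (6) and the aggregation step are standard scaled dot-product self-attention. The NumPy sketch below uses random projection matrices and hypothetical sizes (T = 12 monthly steps, $d_k$ = 16, output width 16); only N = 8 comes from the text.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention(X_t, W_Q, W_K, W_V):
    """A_t = softmax(Q K^T / sqrt(d_k)) as in Formula (6); X'_t = A_t X_t W_V."""
    Q, K = X_t @ W_Q, X_t @ W_K
    d_k = K.shape[1]
    A_t = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (T, T), each row sums to 1
    return A_t, A_t @ X_t @ W_V


rng = np.random.default_rng(0)
T, N, d_k, d_1 = 12, 8, 16, 16  # hypothetical sizes except N = 8
X_t = rng.normal(size=(T, N))
W_Q, W_K = rng.normal(size=(N, d_k)), rng.normal(size=(N, d_k))
W_V = rng.normal(size=(N, d_1))
A_t, X_enh = temporal_attention(X_t, W_Q, W_K, W_V)
print(A_t.shape, X_enh.shape)  # (12, 12) (12, 16)
```

Row i of `A_t` shows how strongly month i draws on every other month, which is where the interpretability claim for the attention weights comes from.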

Spatial attention path: based on the spatial embedding $E_s$, the inter-region interaction weight matrix $A_s \in \mathbb{R}^{S \times S}$ is calculated to reflect how resource shortages propagate across regions:

$$A_s(i,j) = \sigma\left(e_i^{T} W_s\, e_j + b_s\right)$$

(7)

where $e_i$ and $e_j$ are the embedding vectors of regions i and j (rows of $E_s$), $W_s$ and $b_s$ are a learnable weight matrix and bias, and σ is the sigmoid function.

The temporal feature matrix $X_t \in \mathbb{R}^{T \times N}$ is processed through the temporal attention path (Formula 6) to produce the temporally enhanced features $H_t = A_t X_t W_V \in \mathbb{R}^{T \times d_1}$. The spatial grid encoding $G_s$ is embedded by the GCN to obtain the spatial feature matrix $E_s \in \mathbb{R}^{S \times d}$, which is then processed through the spatial attention path (Formula 7) to calculate regional interaction weights and capture the spatial correlation of “high demand-low supply” regions, producing the spatially enhanced features $H_s = A_s E_s \in \mathbb{R}^{S \times d_2}$ (so $d_2 = d$). The temporally and spatially enhanced features are then concatenated into a spatio-temporal joint matrix:

$$H = \left[H_t \,\|\, H_s\right] \in \mathbb{R}^{T \times S \times (d_1 + d_2)}$$

(8)

By separating temporal fluctuations from spatial diffusion effects, the dual-path attention mechanism addresses the weakness of the traditional LSTM in modeling spatial heterogeneity.
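The spatial path of Formula (7) and the concatenation of Formula (8) can be sketched in NumPy as below. Formula (8) joins a T × d₁ matrix and an S × d₂ matrix into a T × S × (d₁ + d₂) tensor without saying how the two are aligned; this sketch assumes each is broadcast along the other's axis before concatenation. All sizes and weights are hypothetical, and `H_t` is a random stand-in for the temporal attention output.

```python
import numpy as np


def spatial_attention(E_s, W_s, b_s):
    """Formula (7): A_s(i, j) = sigmoid(e_i^T W_s e_j + b_s); returns A_s and H_s = A_s E_s."""
    scores = E_s @ W_s @ E_s.T + b_s  # entry (i, j) is e_i^T W_s e_j + b_s
    A_s = 1.0 / (1.0 + np.exp(-scores))
    return A_s, A_s @ E_s


def fuse(H_t, H_s):
    """Concatenate temporal (T, d_1) and spatial (S, d_2) features into (T, S, d_1 + d_2).

    Assumption: each feature set is broadcast along the other axis first.
    """
    T, d_1 = H_t.shape
    S, d_2 = H_s.shape
    H_t_b = np.broadcast_to(H_t[:, None, :], (T, S, d_1))
    H_s_b = np.broadcast_to(H_s[None, :, :], (T, S, d_2))
    return np.concatenate([H_t_b, H_s_b], axis=-1)


rng = np.random.default_rng(1)
T, S, d_1, d = 12, 4, 16, 8
E_s = rng.normal(size=(S, d))        # spatial embeddings, as from the GCN
W_s = rng.normal(size=(d, d)) * 0.1
A_s, H_s = spatial_attention(E_s, W_s, b_s=0.0)
H_t = rng.normal(size=(T, d_1))      # stand-in for the temporal attention output
H = fuse(H_t, H_s)
print(A_s.shape, H.shape)  # (4, 4) (12, 4, 24)
```

Unlike the softmax in Formula (6), the sigmoid scores each region pair independently, so a row of `A_s` need not sum to 1.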

Stacking LSTM layers

In this study, a two-layer bidirectional LSTM (Bi-LSTM)¹⁶ is used for deep mining of long-term dependencies in time series data. The LSTM unit implements selective memory and forgetting of key temporal features through a gating mechanism. Its core gating formula is as follows:

$$\:{\text{f}}_{\text{t}}{=\sigma(}{\text{W}}_{\text{f}}{\cdot[}{\text{h}}_{\text{t-1}}\text{,}{\text{x}}_{\text{t}}\text{]+}{\text{b}}_{\text{f}}\text{)}$$

(9)

where $f_t$ is the forget gate, which controls how much historical information is retained; $W_f$ is the weight matrix; $b_f$ is the bias term; and σ is the sigmoid activation function.

$$\:{\text{i}}_{{\text{t}}} {\text{ = }}\sigma {\text{(W}}_{{\text{i}}} \cdot [{\text{h}}_{{{\text{t}} – {\text{1}}}} {\text{,x}}_{{\text{t}}} ]{\text{ + b}}_{{\text{i}}} {\text{)}}$$

(10)

$i_t$ is the input gate, which controls how much new information is written; σ is the sigmoid activation function; $W_i$ is the weight matrix; and $b_i$ is the bias term.

$$\begin{gathered} \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\ C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t \end{gathered}$$

(11)

$\tilde{C}_t$ is the candidate cell state; $C_t$ is the cell state; tanh is the hyperbolic tangent activation function; $W_C$ is the weight matrix; and $b_C$ is the bias term.

$$\begin{gathered} o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\ h_t = o_t \cdot \tanh(C_t) \end{gathered}$$

(12)

$o_t$ is the output gate, which determines the current hidden state; σ is the sigmoid activation function; $h_t$ is the hidden state at time t; $W_o$ is the weight matrix; and $b_o$ is the bias term.

In compact form, the hidden state update of the l-th LSTM layer is:

$${\text{h}}_{{\text{t}}}^{{\text{l}}} {\text{ = o}}_{{\text{t}}}^{{\text{l}}} \cdot {\text{tanh(W}}^{{\text{l}}} {\text{x}}_{{\text{t}}} {\text{ + U}}^{{\text{l}}} {\text{h}}_{{{\text{t}} – {\text{1}}}}^{{\text{l}}} {\text{ + b}}^{{\text{l}}} {\text{)}}$$

(13)

where $W^l$ and $U^l$ are the weight matrices of layer l and $b^l$ is its bias term. The bidirectional structure concatenates the hidden states of the forward and backward LSTMs, $[\overrightarrow{h_t}, \overleftarrow{h_t}]$, which strengthens the modeling of periodic demand fluctuations and policy lag effects.
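The gate equations (9) to (12) and the forward/backward concatenation can be traced with a plain NumPy sketch. It is a forward pass only, with random weights and hypothetical sizes (hidden size 16, a generic (T, 8) input sequence rather than the fused spatio-temporal tensor); dropout, layer normalization and training are omitted. Two bidirectional layers are stacked, the second consuming the 2 × 16 outputs of the first.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step implementing Formulas (9)-(12)."""
    z = np.concatenate([h_prev, x_t])             # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])        # forget gate, Formula (9)
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])        # input gate, Formula (10)
    c_tilde = np.tanh(p["W_C"] @ z + p["b_C"])    # candidate state, Formula (11)
    c_t = f_t * c_prev + i_t * c_tilde            # cell state, Formula (11)
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])        # output gate, Formula (12)
    h_t = o_t * np.tanh(c_t)                      # hidden state, Formula (12)
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    """Random weights for one direction of one layer (learned in practice)."""
    p = {}
    for gate in ("f", "i", "C", "o"):
        p[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p


def run_lstm(X, p, n_hidden):
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    outputs = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return np.stack(outputs)                      # (T, n_hidden)


def bilstm_layer(X, p_fwd, p_bwd, n_hidden):
    """Concatenate forward and backward hidden states per time step."""
    h_fwd = run_lstm(X, p_fwd, n_hidden)
    h_bwd = run_lstm(X[::-1], p_bwd, n_hidden)[::-1]  # restore forward time order
    return np.concatenate([h_fwd, h_bwd], axis=1)     # (T, 2 * n_hidden)


rng = np.random.default_rng(0)
T, n_in, n_hidden = 12, 8, 16
X = rng.normal(size=(T, n_in))
layer1 = bilstm_layer(X, init_params(n_in, n_hidden, rng),
                      init_params(n_in, n_hidden, rng), n_hidden)
layer2 = bilstm_layer(layer1, init_params(2 * n_hidden, n_hidden, rng),
                      init_params(2 * n_hidden, n_hidden, rng), n_hidden)
print(layer1.shape, layer2.shape)  # (12, 32) (12, 32)
```

In a framework such as PyTorch, the same two-layer bidirectional structure is available as `torch.nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True)`; the study does not state which framework it used.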

Dynamic threshold output layer

The output layer mainly contains two branches, the resource gap prediction branch and the risk level calibration branch:

Prediction branch: the hidden state $h_t^{(L)}$ of the top (L-th) LSTM layer is mapped through a fully connected layer to the predicted resource gap $\hat{y}_t$:

$$\:{\widehat{y}}_{t}={W}_{o}{h}_{t}^{\left(L\right)}+{b}_{o}$$

(14)

Threshold calibration branch: based on the historical error distribution (MAE, RMSE) and policy constraints, the warning threshold $\:{\tau\:}_{t}$ is dynamically adjusted by the formula:

$$\:{\tau\:}_{t}={\mu\:}_{e}+\alpha\:\cdot\:{\sigma\:}_{e}+\gamma\:\cdot\:{I}_{policy}$$

(15)

where $\mu_e$ and $\sigma_e$ are the mean and standard deviation of the prediction error within a sliding window, α = 1.5 is the tolerance factor, γ is the policy adjustment factor, and $I_{policy}$ is the policy indicator. When $\hat{y}_t \ge \tau_t$, a warning of the corresponding level is triggered and a facility augmentation plan is generated (e.g., the number of new beds $\Delta B = \left\lceil \hat{y}_t / \beta \right\rceil$, where β here denotes the service efficiency of a single bed, not the distance decay coefficient of Formula 4). Replacing the static warning threshold with one that adapts to the error distribution and to policy changes is intended to make warnings more timely.
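Formula (15) and the bed-augmentation rule can be sketched as below. The source fixes only α = 1.5; the window length (12 months), the value of γ, the 0/1 encoding of $I_{policy}$, the use of absolute errors, the population standard deviation, and the bed efficiency of 3.0 in the example are all hypothetical choices for illustration.

```python
import math

import numpy as np


def dynamic_threshold(errors, window=12, alpha=1.5, gamma=0.0, policy_flag=0):
    """Formula (15): tau_t = mu_e + alpha * sigma_e + gamma * I_policy.

    errors: past prediction errors, most recent last; only the last `window` are used.
    policy_flag: hypothetical 0/1 encoding of I_policy.
    """
    recent = np.asarray(errors[-window:], dtype=float)
    mu_e = recent.mean()
    sigma_e = recent.std()  # population standard deviation (ddof=0), an assumption
    return float(mu_e + alpha * sigma_e + gamma * policy_flag)


def new_beds(y_hat, bed_efficiency):
    """Delta B = ceil(y_hat / beta), where beta is the service efficiency of one bed."""
    return math.ceil(y_hat / bed_efficiency)


past_errors = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical absolute errors
tau = dynamic_threshold(past_errors)
print(tau)  # 8.0
y_hat = 10.0  # hypothetical predicted resource gap
if y_hat >= tau:
    print("warning triggered; add", new_beds(y_hat, bed_efficiency=3.0), "beds")
```

Here the window errors have mean 5 and standard deviation 2, so τ = 5 + 1.5 × 2 = 8; a predicted gap of 10 crosses it and yields ⌈10 / 3⌉ = 4 new beds.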

Model training and optimization

The loss function adopts the joint loss of mean square error (MSE) and dynamic threshold constraints, with the specific expression:

$$L = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}_t - y_t\right)^2 + \lambda \cdot \max\left(0, \tau_t - \hat{y}_t\right)$$

(16)

where λ = 0.3 is the threshold violation penalty coefficient. The model is optimized with the Adam algorithm, using an initial learning rate of 10⁻³, a first-order moment decay coefficient $\beta_1 = 0.9$, and a second-order moment decay coefficient $\beta_2 = 0.999$. Dropout (p = 0.2) and Layer Normalization are applied to prevent overfitting.
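A NumPy sketch of the joint loss in Formula (16) with λ = 0.3. Formula (16) writes the hinge term for a single τ_t; averaging it over the T steps, as done here, is an assumption. As written, the hinge term is positive whenever a prediction falls below its threshold.

```python
import numpy as np


def joint_loss(y_hat, y, tau, lam=0.3):
    """Formula (16): MSE plus lam * max(0, tau_t - y_hat_t).

    Assumption: the hinge term is averaged over the T steps, like the MSE term.
    """
    y_hat, y, tau = (np.asarray(a, dtype=float) for a in (y_hat, y, tau))
    mse = np.mean((y_hat - y) ** 2)
    hinge = np.mean(np.maximum(0.0, tau - y_hat))  # positive when y_hat is below tau_t
    return float(mse + lam * hinge)


print(joint_loss(y_hat=[3.0, 5.0], y=[2.0, 5.0], tau=[4.0, 4.0]))
```

If the model were implemented in PyTorch (the study does not name its framework), the optimizer settings above would correspond to `torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))`, and the regularizers to `torch.nn.Dropout(p=0.2)` and `torch.nn.LayerNorm`.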

Validation method design

Based on the above model, in order to comprehensively assess the performance of the constructed LSTM-AM model in the early warning of the shortage of resources for property elderly care services in counties in urban-rural integration zones, this study carries out the validation in three aspects: controlled experiments, assessment indexes, and robustness tests.

Controlled experiment design

To demonstrate the advantages of the LSTM-AM model in resource shortage warning, it is compared with traditional time series forecasting models. First, ARIMA (autoregressive integrated moving average model), Prophet (a time series forecasting model based on additive decomposition into trend, seasonality, and holiday effects), and a conventional LSTM (without the attention mechanism) are selected as control models. Second, all four models are trained and used for prediction separately in the same experimental environment, using the collected data on property-based elderly care services in urban-rural integration area counties. To ensure fair and reliable results, each model follows its own parameter settings and training procedure. Finally, by comparing each model’s performance in resource gap prediction, we analyze how much the LSTM-AM model improves on the traditional models in capturing complex spatio-temporal features and adapting to dynamic demand changes.

In addition, this paper compares LSTM-AM with existing state-of-the-art (SOTA) approaches, including ST-GCN (Spatio-Temporal Graph Convolutional Network)¹⁷ and the Spatio-Temporal Transformer¹⁸. ST-GCN uses graph convolutions to extract spatial dependencies and temporal convolutions for sequence modeling, and is widely used in spatio-temporal data prediction. The Spatio-Temporal Transformer uses self-attention combined with positional encoding to capture long-range spatio-temporal correlations.

Design of assessment indicators

This study measures the prediction performance of the model from three perspectives: mean absolute error, root mean square error, and warning accuracy.

First, the mean absolute error (MAE), which is used to measure the average absolute value of the error between the predicted value and the true value, with the formula:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

(17)

where $\:{y}_{i}$ is the true value, $\:{\widehat{y}}_{i}$ is the predicted value, and n is the number of samples. The smaller the value of MAE, the smaller the average deviation of the model prediction results from the true value, and the higher the prediction accuracy.

Second, the Root Mean Square Error (RMSE) is the square root of the mean squared prediction error; because errors are squared before averaging, samples with larger errors receive greater weight. The formula is:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

(18)

RMSE can reflect the fluctuation of the model’s prediction more sensitively, and the lower the value of RMSE, the better the overall prediction stability of the model.

Third, the warning accuracy (F1-score), considering the balance between Precision and Recall in the warning task, the F1-score is adopted as the evaluation index with the formula:

$$\:F1=2\times\:\frac{{Pr}ecision\times\:{Re}call}{{Pr}ecision+{Re}call}$$

(19)

where Precision is the proportion of samples predicted positive that are actually positive, and Recall is the proportion of actually positive samples that are predicted positive. The F1-score ranges from 0 to 1; the closer it is to 1, the better the model’s early warning performance, meaning it performs well both in correctly identifying the risk of resource shortages (Precision) and in covering all actual shortages (Recall).
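The three assessment indicators in Formulas (17) to (19) can be computed as below. The function names and example values are hypothetical, and returning 0 when Precision or Recall is undefined (no predicted or no actual positives) is a convention chosen for this sketch.

```python
import numpy as np


def mae(y, y_hat):
    """Formula (17)."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))


def rmse(y, y_hat):
    """Formula (18)."""
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))


def f1_score(actual, predicted):
    """Formula (19) for binary shortage labels (True = shortage / warning)."""
    actual = np.asarray(actual, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = np.sum(actual & predicted)
    fp = np.sum(~actual & predicted)
    fn = np.sum(actual & ~predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(2 * precision * recall / (precision + recall)) if precision + recall else 0.0


# Hypothetical monthly resource gaps and warning labels.
print(mae([10, 12, 8, 15], [11, 12, 6, 15]))  # 0.75
print(rmse([10, 12, 8, 15], [11, 12, 6, 15]))
print(f1_score([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
```

In the example the single error of 2 pulls RMSE (about 1.12) above MAE (0.75), which is the extra weight on large errors described for Formula (18). scikit-learn provides equivalent functions in `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `f1_score`).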

Robustness test

To test the reliability of the model in complex and unstable data environments, we conducted robustness tests on the LSTM-AM model. Gaussian noise following $\mathcal{N}(\mu = 0, \sigma^2 = 0.0225)$, i.e., with mean μ = 0 and standard deviation σ = 0.15, is introduced into the original dataset, adding random perturbations of ±15% to the input features¹⁹. The noise injection covers demand-side raw data, supply-side feature indicators, and spatio-temporal interaction features.
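The text gives both a Gaussian with σ = 0.15 and "±15%" perturbations without saying whether the noise is additive or relative. The sketch below assumes relative noise, multiplying each feature by (1 + ε) with ε ~ N(0, 0.0225), and offers an optional hypothetical `clip` argument that bounds the perturbation to ±15%. The feature matrix is random placeholder data.

```python
import numpy as np


def inject_noise(X, sigma=0.15, clip=None, seed=0):
    """Multiply each entry by (1 + eps), eps ~ N(0, sigma^2) with sigma^2 = 0.0225.

    Assumption: noise is relative; pass clip=0.15 to bound perturbations to +/-15%.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(loc=0.0, scale=sigma, size=np.shape(X))
    if clip is not None:
        eps = np.clip(eps, -clip, clip)
    return np.asarray(X, dtype=float) * (1.0 + eps)


# Hypothetical feature matrix: 12 months x 8 indicators.
X = np.random.default_rng(42).uniform(1.0, 100.0, size=(12, 8))
X_noisy = inject_noise(X, clip=0.15)
print(np.abs(X_noisy / X - 1.0).max())
```

With clipping enabled, the largest relative change printed stays within 15% (up to floating-point rounding); without it, roughly a third of entries would move by more than one σ.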

The LSTM-AM model and the control models are then retrained on the noisy dataset and used for prediction, and their results are compared with their performance on the original data by observing changes in evaluation metrics such as MAE, RMSE and F1-score. If a model maintains relatively stable performance under noise, i.e., its evaluation metrics fluctuate little, it is robust: it can tolerate the data errors and uncertainties likely to arise in practice and is more reliable in real early warning scenarios for property-based elderly care resource shortages in urban-rural integration zones. This provides a theoretical and practical basis for applying the model to property-based elderly care services in urban-rural integration areas.

Ethical and methodological compliance statements

(1) All the methods in this study were carried out in accordance with relevant guidelines and regulations.

(2) All experimental protocols were approved by the Ethics Committee of North China University of Water Resources and Electric Power.

(3) This study confirms that informed consent was obtained from all subjects or their legal guardian(s).
