Machine learning is applied across business sectors ranging from advertising and manufacturing to automotive and healthcare. Healthcare in particular offers plenty of opportunities for AI implementations, and one of them is supporting the decision-making processes of individual staff members and entire organizations.
In this article, we present the process of creating a system equipped with an intelligent inference module for the healthcare industry. Specifically, we will analyze the problem of queueing patients who are waiting for a specialist medical examination. The task of the machine learning module will be to suggest changes to the order of patients in the queue.
To train the model, we will use data from the medical records of former patients, which lets us identify patients who require testing. This will reduce the response time and minimize the possibility of queue operator error. A trained machine learning model makes each suggestion with hundreds of thousands of cases in “memory,” completing the task in a fraction of a second and relieving the system operator.
Note: A decision support system cannot replace the work of the human operator. As the name suggests, it’s a support system, not a decision-making system. Its effectiveness depends on the quality and quantity of the data used for training and on the effectiveness of the tools used to create the algorithm. A person with domain knowledge is needed at every stage to supervise the suggestions of the machine learning model.
Imagine a situation in which a hospital has specialized diagnostic equipment. Such equipment is very expensive, so there is not much of it, and the queue for examinations grows longer every day.
We assume that the medical equipment in question is used to diagnose a bone disease: osteoporosis. We know from research that the disease develops in stages rather than presenting clear-cut, binary symptoms. Detecting it early allows treatment to start quickly, which significantly increases the chances of recovery or minimizes potential damage.
Of course, we want to prioritize the patients most likely to have the disease. But how do you know if a person is more vulnerable than others? Access to specialists capable of making such a diagnosis is very limited. A doctor’s time devoted to someone who has a negligible likelihood of developing the disease is wasted time.
One solution to this problem is to reduce, as far as possible, the time between the referral for an examination and the visit to a specialist.
An example process of treatment qualification.
The application’s location in the process.
The place where the system’s help is most useful is the initial classification, so we will embed our module in this area.
This is the first step in the process outlined below:
Outline of work stages.
To start developing such a machine learning module, we need examination results along with the corresponding measurements. We also need a large amount of historical data to train the model. If the training results are satisfactory, we will build the classification algorithm on that basis.
We can divide the project into five stages.
An engineer working on an algorithm needs to understand the problem really well. Without this knowledge of the problem and the environment in which it occurs, solving it is impossible.
This stage includes reading the literature and consulting with a specialist able to describe the problem – in our case, that would likely be a medical doctor, a researcher, or a medical worker.
At this stage, we analyze disease data. We should be able to deduce which examinations are usually performed and what additional information we have at our disposal – for example, the age and gender of the patient.
The quality of the data set dictates the next steps. If it turns out that our data set isn’t of sufficiently high quality, model training will have to be postponed in favor of collecting good-quality data.
To estimate the probability of disease occurrence correctly, the input data must carry information about the factors that indicate it. Without the option to parameterize the patient’s health data, we won’t be able to create a well-functioning classifier.
However, if the amount of medical data and their quality are satisfactory, we can proceed to the next stage – training the classifier model.
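As a rough illustration of the data-quality check described above, the sketch below reports the share of missing values per field. The records, the field names (`age`, `sex`, `bmi`), and the helper functions are our assumptions for the example; they do not come from a real medical data set.

```python
# Hypothetical patient records; None marks a missing measurement.
records = [
    {"age": 67, "sex": "F", "bmi": 21.4},
    {"age": 54, "sex": "M", "bmi": None},
    {"age": None, "sex": "F", "bmi": 24.0},
    {"age": 71, "sex": "F", "bmi": 19.8},
]


def missing_share(rows, field):
    """Return the fraction of rows in which `field` is absent or None."""
    if not rows:
        raise ValueError("no records to inspect")
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def completeness_report(rows, fields):
    """Map every field name to its share of missing values."""
    return {field: missing_share(rows, field) for field in fields}


report = completeness_report(records, ["age", "sex", "bmi"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

What share of missing values is acceptable is a decision for the team and the domain experts; the sketch only makes the gaps visible.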
At this stage, machine learning engineers typically check which variables correlate with the examination result. For this purpose, we use the appropriate mathematical tools, compare their results, and select the most promising variables.
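One simple tool for this kind of check is the Pearson correlation coefficient. The sketch below ranks a few candidate variables by the strength of their correlation with a binary examination result. The variable names and all values are made up for illustration, and a real analysis would use more data and more than one measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: 1 means the examination confirmed the disease.
outcome = [0, 0, 0, 1, 1, 1]
candidates = {
    "age": [45, 50, 55, 68, 72, 75],
    "bmi": [27, 26, 28, 21, 20, 22],
    "height_cm": [170, 160, 165, 168, 162, 166],
}

# Rank variables by the absolute value of their correlation with the outcome.
ranked = sorted(
    ((name, pearson(values, outcome)) for name, values in candidates.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

Correlation alone does not show that a variable causes the disease, which is one more reason to review such rankings with medical professionals.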
When we notice a trend in the results, we save them and present them visually so medical professionals can analyze them. This stage is the most important one in the entire process. It allows us to extract hidden trends by analyzing massive volumes of data in various configurations.
During such analysis, we may discover that combining several specific parameters with appropriate thresholds allows us to predict the disease with a probability of, for example, 80%. If the results are statistically significant, we can conclude that we have found variables that indicate the disease risk. Knowing these variables is what enables us to create a system that recommends the order of examinations.
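A threshold rule of this kind can be checked against historical cases by counting how often flagged patients actually turned out to be ill. In the sketch below, the `Case` type, the thresholds (age of at least 65, BMI below 22), and the eight cases are hypothetical and chosen only to show the calculation; they are not clinical criteria, and the sketch does not test statistical significance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    age: int
    bmi: float
    positive: bool  # True if the examination confirmed the disease


def rule(case, min_age=65, max_bmi=22.0):
    """Hypothetical rule: flag older patients with a low BMI."""
    return case.age >= min_age and case.bmi < max_bmi


def positive_rate_when_flagged(cases, predicate):
    """Share of flagged cases that were confirmed; None if nothing is flagged."""
    flagged = [case for case in cases if predicate(case)]
    if not flagged:
        return None
    return sum(1 for case in flagged if case.positive) / len(flagged)


history = [
    Case(70, 20.5, True),
    Case(68, 21.0, True),
    Case(75, 19.8, True),
    Case(66, 21.9, True),
    Case(72, 21.5, False),
    Case(50, 20.0, False),
    Case(69, 26.0, False),
    Case(45, 27.0, False),
]

rate = positive_rate_when_flagged(history, rule)
print(f"confirmed among flagged patients: {rate:.0%}")
```

With so few cases the figure says little; in practice, the same calculation would run on the full historical data set and be followed by a significance test.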
After conducting the research, verifying its results, and receiving comments from experts with domain knowledge, we may decide to continue the work (we have enough knowledge) or go back to the data collection stage. If our analyses turn out to be correct, we proceed to create the model. For this purpose, we use… the very variables mentioned above.
We further investigate the variable combinations that account for the greatest percentage of the variance. This is how we can choose the most appropriate classification algorithm. For this purpose, we use a set of tools and machine learning methods that will be presented in the next article in this series.
We then test the created module. Because the recommendation system is to operate in a field as serious as medicine, the model needs to be tested as thoroughly as possible.
When operating the application, we obtain new results that allow us to refine the “knowledge” of our machine learning model. The system collects data: we know which people were referred for examinations with a higher level of certainty regarding the disease, and we also have the examination result.
This gives us information that acts as feedback for the entire system. The model can develop over time as it acquires knowledge used to modify its parameters via a training mechanism.
We need to run a series of thorough tests after each model retraining to monitor the results and quickly spot any anomalies that may have originated in corrupted training data or other problems – for example, overfitting.
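One of the simplest checks after retraining compares accuracy on the training data with accuracy on held-out validation data, and with the previous model version. The sketch below is a minimal example of such a check; the function names and the tolerances (`max_gap`, `max_drop`) are our assumptions and would need to be agreed on for a real project.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the known examination results."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    hits = sum(1 for p, y in zip(predictions, labels) if p == y)
    return hits / len(labels)


def retraining_check(train_acc, val_acc, previous_val_acc,
                     max_gap=0.10, max_drop=0.02):
    """Return a list of warnings for a freshly retrained model.

    max_gap:  largest tolerated difference between training and validation
              accuracy before we suspect overfitting (assumed value).
    max_drop: largest tolerated loss of validation accuracy compared with
              the previous model version (assumed value).
    """
    issues = []
    if train_acc - val_acc > max_gap:
        issues.append("possible overfitting")
    if previous_val_acc - val_acc > max_drop:
        issues.append("validation accuracy regressed")
    return issues


issues = retraining_check(train_acc=0.99, val_acc=0.75, previous_val_acc=0.85)
for issue in issues:
    print("WARNING:", issue)
```

A non-empty list would block the new model from replacing the current one until someone has looked at the training data.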
The model we created doesn’t have to be a standalone program wrapped in many additional features. The machine learning model should be treated as a self-contained service whose only function is to return a result for the patient data it receives.
We treat it as a separate application that can be freely embedded in a microservices or hexagonal architecture. Separating the algorithm that way allows engineers to work on its development without interfering with other parts of the system.
If a given medical unit already has a queuing system, such a service can be integrated into the running system. We equip it with interfaces that allow the service to communicate with other parts of the application. The input interface expects the required set of data about the patient, while the output interface returns the result as the probability of disease occurrence.
Example service interface.
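In code, such an interface might look like the sketch below. All names (`PatientData`, `RiskResult`, `RiskModel`, `RiskService`) and the patient fields are hypothetical; the sketch shows only the shape of the input and output, not the interface of any existing system. `FixedModel` is a stand-in that returns a constant so that the example runs without a trained classifier.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PatientData:
    """Input interface: the set of patient data the service expects."""
    patient_id: str
    age: int
    sex: str
    bmi: float


@dataclass(frozen=True)
class RiskResult:
    """Output interface: the probability of disease occurrence."""
    patient_id: str
    probability: float


class RiskModel(Protocol):
    """Anything that can turn patient data into a probability."""
    def predict_probability(self, patient: PatientData) -> float: ...


class RiskService:
    """Self-contained service: patient data in, probability out."""

    def __init__(self, model: RiskModel) -> None:
        self._model = model

    def assess(self, patient: PatientData) -> RiskResult:
        probability = self._model.predict_probability(patient)
        if not 0.0 <= probability <= 1.0:
            raise ValueError("model returned a value outside [0, 1]")
        return RiskResult(patient.patient_id, probability)


class FixedModel:
    """Stand-in model that always returns the same value; not a classifier."""

    def __init__(self, probability: float) -> None:
        self._probability = probability

    def predict_probability(self, patient: PatientData) -> float:
        return self._probability


service = RiskService(FixedModel(0.3))
result = service.assess(PatientData("p-1", age=70, sex="F", bmi=20.5))
print(result)
```

Because the service depends only on the `RiskModel` protocol, a retrained model can replace the old one without changes to the rest of the system.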
The queue management system decides what to do with the information about the probability of disease occurrence.
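As one possible policy, the queue management system could propose an order in which patients with a higher probability come first and, among equal probabilities, those who have waited longer. The sketch below assumes a hypothetical `QueueEntry` type and made-up values; the result is a suggestion for the operator, not a final decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueEntry:
    patient_id: str
    days_waiting: int
    probability: float  # value returned by the machine learning service


def suggest_order(queue):
    """Return a suggested order without modifying the original queue.

    Higher probability first; ties are broken by longer waiting time.
    """
    return sorted(queue, key=lambda e: (-e.probability, -e.days_waiting))


current_queue = [
    QueueEntry("a", days_waiting=10, probability=0.2),
    QueueEntry("b", days_waiting=3, probability=0.8),
    QueueEntry("c", days_waiting=7, probability=0.8),
    QueueEntry("d", days_waiting=1, probability=0.5),
]

suggestion = suggest_order(current_queue)
print([entry.patient_id for entry in suggestion])
```

A real policy would probably also cap the waiting time so that low-risk patients are not postponed indefinitely.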
Above, we have described a machine learning model that could significantly improve the detection of osteoporosis at an early stage. It is a good example of how IT systems can improve people’s quality of life, in this case through healthcare.
This article is an introduction to a series on data processing. In the following installments, we will describe how to start this type of project. We will share our experience in collecting requirements, using tools, and taking a holistic approach to creating this type of model.
Remember that the content we present isn’t a ready-made recipe for creating a project. It’s a collection of experiences and best practices that we want to share as a company working with complex AI projects on a daily basis.
The article was prepared by developers from the Data Processing & AI team at 4soft: Michał Będkowski and Bartosz Rogulski.
Michał Będkowski
Michał is a software architect and CTO of 4soft. A robotics engineer by education, he has worked in the specialized software industry for 15 years. He gained experience working on a wide range of projects, from medical applications for Merck to creating systems for data analysis in the gaming industry.
Bartosz Rogulski
Bartosz is a Python Developer at 4soft. He calls himself “a scribbler in love with machine learning and self-learning.” He holds a Bachelor’s degree in Neurobiology from Jagiellonian University.