Abstract:
Claim verification has remained a manual procedure because, by its nature, it requires human observation,
which poses a barrier to the insurance sector. The cost of health insurance has been rising steadily over time.
Most Tanzanians cannot afford healthcare because they are not covered by insurance, and with a low per capita
income, insurance is expensive. Administrative costs are high because of poor operational
systems that are prone to mistakes and false claims. The Tanzanian insurance industry faces the challenge
of discerning which insurance claims are legitimate and which are fraudulent. This research is the
first of its kind to use a classifier and obtain a classification result with the goal of increasing the
efficiency with which a subject matter expert validates or rejects healthcare claims. The research mainly
aimed to establish a Machine-Learning algorithm that can be used to assess insurance claims in Tanzania.
From the literature reviewed, the research identified a gap, which it sets out to address: there is
currently no all-encompassing Machine-Learning method for identifying fraudulent medical
insurance claims. Different authors offer their own recommendations, but these are not consistent
across the literature, which makes it more difficult to evaluate the algorithms efficiently.
The purpose of this project is to close this research gap
by developing a Machine-Learning algorithm for the detection of fraudulent medical insurance claims
and then testing how well it performs its intended function. The research also
relied on two theories: the Winner's Curse theory and the Fraud Management Lifecycle Theory, which details fraud
management in the health sector.
The exploratory or interpretative research design was the basis for this research project's methodology. The
research is mainly dependent on secondary data retrieved from the NHIF-Tanzania database. The data were
acquired from the claims processing pipeline when they were priced and ready for finalisation and payment,
and this was the data introduction step. Scikit-learn is the main data analysis tool used. The main aim
of the research is to assess the candidate approaches and establish which are the most effective in assessing
insurance claims.
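As an illustration only, the comparison described above could be sketched with scikit-learn as follows. The synthetic dataset, the three candidate classifiers, the F1 metric, and the helper name `compare_classifiers` are all assumptions made for this sketch; they are not the NHIF data or the exact procedure followed in the research.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(seed=0):
    """Score a few candidate classifiers with 5-fold cross-validation."""
    # Synthetic stand-in for claims data (assumption, not NHIF records):
    # roughly 10% of samples belong to the positive ("fraudulent") class.
    X, y = make_classification(
        n_samples=600,
        n_features=10,
        n_informative=5,
        weights=[0.9, 0.1],
        random_state=seed,
    )

    # Hypothetical candidate set; the source names only a Naive Bayes model.
    candidates = {
        "naive_bayes": GaussianNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
    }

    # Mean F1 on the positive class, which is more informative than plain
    # accuracy when fraudulent claims are rare.
    return {
        name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
        for name, model in candidates.items()
    }


scores = compare_classifiers()
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean F1 = {score:.3f}")
```

The ranking this prints applies to the synthetic data only and says nothing about which model performs best on real claims.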
This research presents a Machine-Learning model that may be used to build a system that automates claim
evaluation, reducing the time and effort necessary to handle medical claims. The model is trained on prior
data patterns and predicts whether a claim is accurate or needs review, eliminating mistakes caused by manual
operations. The research has established that the Naive Bayes classifier model is the most efficient Machine-Learning algorithm used to detect medical insurance claim fr