The growing amount of data collected across different areas has made the extraction of interesting knowledge from large datasets an attractive and challenging task for industry, security, healthcare, and economic applications.
However, in real-life applications, data change over time or represent different temporal situations, and this temporal information is usually included in the data collected. If these temporal components are not properly taken into account, the knowledge extracted may not be useful because there is no way to know if it is applicable at the present time or at some point in the future.
Useful knowledge such as “After receiving a radiation treatment for 7 days, cancer patients suffer from both nausea and magnesium deficiency” may be extracted from databases of cancer patients to assist healthcare providers in making decisions. Such knowledge may also be overlooked when it holds only during a particular time period or around special events or emergencies such as sports games, hurricanes, or, as we are currently witnessing, a pandemic, which requires larger supplies of personal protective equipment for hospitals and individuals.
Association discovery is one of the most common data mining techniques for extracting knowledge from large databases. Association rules identify dependencies between the elements or values in a database and are defined as implications of the form A → B, where the sets of facts A and B have no elements in common.
For example, the rule mask → gloves could be extracted from the equipment repository of a hospital, indicating that most medical staff who wear a mask also wear gloves. This rule may help hospital managers to plan equipment orders, but it does not convey information about the demand in each time period. Over the last few years, many methods that explicitly consider temporal information have been proposed in the literature for mining temporal association rules (TARs) from databases.
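To make the mask → gloves example concrete, the following minimal Python sketch scores the rule over a whole dataset and then separately for each time period. It is an illustration only, not a method from the review: the records are invented toy data, the function names `rule_metrics` and `metrics_by_period` are hypothetical, and support and confidence are the standard rule measures from the association rule literature rather than terms used in this article.

```python
from collections import defaultdict

# Invented toy data: (time period, set of items used in one shift).
records = [
    ("2020-01", {"mask", "gloves"}),
    ("2020-01", {"gown"}),
    ("2020-01", {"gloves"}),
    ("2020-01", {"gown", "gloves"}),
    ("2020-04", {"mask", "gloves"}),
    ("2020-04", {"mask", "gloves", "gown"}),
    ("2020-04", {"mask"}),
    ("2020-04", {"mask", "gloves"}),
]


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    transactions = list(transactions)
    n_total = len(transactions)
    # Transactions containing every item of the antecedent (A).
    n_a = sum(1 for t in transactions if antecedent <= t)
    # Transactions containing every item of both A and B.
    n_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ab / n_total if n_total else 0.0
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence


def metrics_by_period(records, antecedent, consequent):
    """Score the same rule separately inside each time period."""
    grouped = defaultdict(list)
    for period, items in records:
        grouped[period].append(items)
    return {
        period: rule_metrics(items, antecedent, consequent)
        for period, items in sorted(grouped.items())
    }


overall = rule_metrics((items for _, items in records), {"mask"}, {"gloves"})
per_period = metrics_by_period(records, {"mask"}, {"gloves"})
print("overall:", overall)
for period, metrics in per_period.items():
    print(period, metrics)
```

In this toy data, the rule covers half of all records overall, but only a quarter of the records in the first period and three quarters in the second. A single overall figure hides that difference, which is the kind of information temporal approaches aim to recover.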
In a recent review published in WIREs Data Mining and Knowledge Discovery, researchers from the University of Granada and the University of Jaén, Spain, survey the current state of the TAR field to help students and researchers more easily locate articles related to the temporal problems they are trying to solve.
Their study shows that TARs can be applied successfully to a wide variety of real-world problems. This is the case “in industry, security, medicine, and healthcare, with a number of application proposals in the areas of medicine and healthcare particularly noteworthy,” the authors explained.
Several free and open-source software tools are available for applying TARs to different problems, but few researchers share the source code associated with their proposals. For this reason, the authors recommend that researchers “share the source code of their proposals, since this would have a very positive impact, both on the development of better algorithms and on their applicability to new subjects and the industry.”
The authors also explore different approaches to handling temporal information, such as new proposals for solving temporal big data problems (e.g., in the Internet of Things) that make use of the MapReduce paradigm, deep learning, and fog computing, and the inclusion of other dimensions (such as space) in the knowledge extraction process to obtain more complete and accurate information. Another approach is the extraction of high-utility TARs, in which the relevance of a rule is determined by some unit profit to be maximized or minimized, for instance, the price of the drugs available for a medical treatment.
“This leads to the development of a group of potentially enhanced techniques, since the discovery of TARs allows us to obtain models with a greater predictive and descriptive power, providing an additional degree of interestingness,” the authors concluded.
Article written by Alberto Segura‐Delgado, María José Gacto, Rafael Alcalá, and Jesús Alcalá‐Fdez
Reference: Alberto Segura-Delgado et al. ‘Temporal association rule mining: An overview considering the time variable as an integral or implied component,’ WIREs Data Mining and Knowledge Discovery (2020). DOI: 10.1002/widm.1367