A SIMILARITY METRIC FOR CATEGORICAL DISTRIBUTIONS THAT ACCOUNTS FOR THE KINSHIP OF DIFFERENT CATEGORIES
DOI:
https://doi.org/10.31649/1997-9266-2023-167-2-49-57
Keywords:
categorical distribution, kinship categories, similarity metric, Czekanowski metric, pose detection, reviewer recommendation, generalized Pareto distribution
Abstract
Estimating the similarity of two objects is a common problem in pattern recognition, clustering, and classification. Examples include reviewer recommendation, analysis of similar text documents, human pose detection in video, clustering of species distributions, and recommendations in online shops. When attributes are categorical, an object is described as a distribution of membership degrees over categories. Similarity metrics for such distributions are usually defined as a superposition of the objects' per-category similarities, most often their sum. Each category is thus treated independently and in isolation from the others. In some practical problems, however, the categories are related by kinship. It is therefore expedient to consider not only the direct similarity of objects, between identical categories, but also the indirect similarity, that is, cross-similarity through kindred categories. This paper proposes a similarity metric for two categorical distributions that accounts for the kinship of different categories. The metric has two components. The first is the Czekanowski metric, which defines the direct similarity of categorical distributions as the sum, over categories, of the intersection of the two objects' membership degrees. The residuals left after the intersection are accounted for in the second component, defined as the element-wise product of two matrices: the matrix of compositions of the residual membership degrees of the two distributions, and the matrix of pairwise kinship between categories. Kinship indices are assumed to be known for every pair of categories. With a large number of categories, the total noisy contribution of weakly kindred categories becomes prominent. It is therefore proposed to filter out this noise and account only for the contribution of strongly kindred categories.
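The two-component construction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several details here are assumptions made only for illustration: the "composition" of residuals is taken to be an outer product, the element-wise product with the kinship matrix is summed into a scalar, and weak kinship is filtered with a hard cutoff. The function name `kinship_similarity` and the `threshold` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def kinship_similarity(p, q, kinship, threshold=0.5):
    """Sketch of a two-component similarity between categorical distributions.

    p, q      -- membership degrees over the same categories
    kinship   -- square matrix of pairwise kinship indices (assumed known)
    threshold -- hypothetical cutoff below which kinship is treated as noise
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    kinship = np.asarray(kinship, dtype=float)

    # First component: Czekanowski-style direct similarity,
    # the sum of per-category intersections (minima).
    overlap = np.minimum(p, q)
    direct = overlap.sum()

    # Residual membership degrees left after the intersection.
    # For every category at least one of the two residuals is zero.
    res_p = p - overlap
    res_q = q - overlap

    # Assumption: residuals are composed pairwise with an outer product.
    residual_matrix = np.outer(res_p, res_q)

    # Assumption: noise filtering keeps only kinship at or above the cutoff.
    strong_kinship = np.where(kinship >= threshold, kinship, 0.0)

    # Second component: element-wise product of the two matrices, summed.
    indirect = (residual_matrix * strong_kinship).sum()

    return direct, indirect, direct + indirect


# Toy data: categories 1 and 2 are strongly kindred (0.8).
K = np.array([
    [1.0, 0.2, 0.1],
    [0.2, 1.0, 0.8],
    [0.1, 0.8, 1.0],
])
direct, indirect, total = kinship_similarity([0.5, 0.5, 0.0], [0.5, 0.0, 0.5], K)
```

In this toy case the two objects share only the first category directly, while their remaining mass sits in two different but strongly kindred categories, so the second component raises the overall similarity. Raising `threshold` above 0.8 would drop that contribution and leave only the direct component.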
License
This work is licensed under a Creative Commons Attribution 4.0 International License.