Text classification: an approach using machine learning





Classification, Machine Learning, Algorithms, Information, Information Science


Text classification has served as a foundation for organizing knowledge across a wide range of fields, since grouping items into categories guides how each domain is divided. In the digital information age, with large volumes of data spread across cloud computing environments, information technologies are essential for classifying this data. Within this framework, Information Science plays a pivotal role in the production, organization, transmission, and use of information across diverse fields, including computer science, mathematics, and artificial intelligence. When information is appropriately classified with the support of technology, it can be made available to society more effectively. The primary aim of this article is to examine text classification using Machine Learning. The research is exploratory, follows an experimental method, and uses a quantitative approach for data analysis. After applying the Euclidean distance, a distance matrix and a hierarchical clustering were produced, along with a word cloud highlighting significant terms from the documents.
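The abstract names the main steps of the experiment (Euclidean distance, a distance matrix, hierarchical clustering, and a word cloud) without detailing them. The Python sketch below illustrates how such steps commonly fit together; it is not the authors' implementation. It assumes raw term-count vectors, single-linkage agglomerative clustering, a minimal hypothetical stopword list, and three hypothetical toy documents (`DOCS`); all function names are illustrative. No word cloud is drawn: `top_terms` only returns the term frequencies that a word-cloud tool would use to scale words.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; the article's actual documents are not given here.
DOCS = [
    "machine learning classifies text",
    "machine learning classifies documents",
    "information science organizes knowledge",
]

# Hypothetical, deliberately minimal stopword list.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequency_vectors(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = [[c[t] for t in vocab] for c in counts]
    return vocab, vectors


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances between documents."""
    n = len(vectors)
    return [[euclidean(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def single_linkage(dist):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Cluster distance is the minimum pairwise document distance (single linkage,
    an assumption; the abstract does not name a linkage criterion).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


def top_terms(docs, k=5):
    """Corpus-wide term frequencies (stopwords removed): the input to a word cloud."""
    total = Counter(t for d in docs for t in tokenize(d) if t not in STOPWORDS)
    return total.most_common(k)


if __name__ == "__main__":
    vocab, vectors = term_frequency_vectors(DOCS)
    dist = distance_matrix(vectors)
    for row in dist:
        print(" ".join(f"{d:.3f}" for d in row))
    for a, b, d in single_linkage(dist):
        print(f"merge {a} + {b} at distance {d:.3f}")
    print(top_terms(DOCS, k=3))
```

With these toy documents, documents 0 and 1 share three of their four terms, so they merge first at a distance of √2 ≈ 1.414, and document 2 joins afterwards at √8 ≈ 2.828. In practice, TF-IDF weighting and a library routine such as SciPy's `linkage` would usually replace the raw counts and the hand-written loop, which is only practical for small corpora.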

Author Biographies

Edberto Ferneda, Universidade Estadual Paulista (Unesp)

Habilitation (Livre-Docência) in Information Retrieval (2016). Postdoctorate from the Federal University of Paraíba (2013). PhD in Communication Sciences (Information Science) from the University of São Paulo (2003). Master's in Informatics from the Federal University of Paraíba (1997). Degree in Data Processing from the former Educational Foundation of Bauru (1985). Currently an Associate Professor in the Department of Information Science at São Paulo State University "Júlio de Mesquita Filho" (UNESP), Marília Campus. Works in Information Science, mainly in Automatic Indexing and Information Retrieval. CNPq Research Productivity Fellow, Level 2.

Leonardo Botega, Universidade Estadual Paulista (Unesp)

PhD in Computer Science from the Federal University of São Carlos (UFSCar), with a postdoctorate from the University of São Paulo (USP). Permanent member of the Postgraduate Program in Information Science at UNESP-Marília and collaborating member of the Postgraduate Program in Computer Science at UNESP-Bauru/Prudente. Collaborating researcher at the Institute of Computing at UNICAMP. Data Product Manager at the company PISMO. Leader of the Human-Computer Interaction Group (GIHC) at UNESP. Reviewer for journals in the areas of data fusion, critical decision-making systems, semantic web, and information systems. Has academic and professional experience in data and information fusion, data mining, data and information quality, the semantic web, management of critical data, and critical decision-making systems. Has published in national and international conferences and journals, and has supervised undergraduate, master's, and doctoral research funded by CAPES, CNPq, and FAPESP scholarships.


