RCAAP Rss Feeder
Brief summary:
Information is among the most valuable resources today, and almost every action leaves a data trace. Used correctly, this data yields new knowledge and predictions, but doing so requires specialized technologies such as Big Data. The work described in this dissertation focuses on three methods of Big Data analysis: descriptive analysis, correlation analysis, and predictive analysis. Its purpose is to explore how these methods apply in practice to a dataset containing information about IPB and Erasmus students. The following tasks were performed: collecting data from international students about their university practices and mobility, and conducting a descriptive analysis of general characteristics by year, course, gender, place of residence, degree, number of subjects studied, and grade point average. Correlation heatmaps were constructed for the variables in the dataset, and the dependencies between them were analyzed. The most important contribution of this work is the practical application of three machine learning algorithms (linear regression, ridge regression, and random forest) to predict the number of Erasmus students for the next year. Machine learning algorithms build a model from sample data, known as "training data," to make predictions or decisions without being explicitly programmed to do so.
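To make the predictive step concrete, here is a minimal sketch of two of the three named algorithms, linear regression and ridge regression, fitted to a single feature (the year). It is not the dissertation's implementation: the yearly counts are invented for illustration, the function names `fit_ridge_1d` and `predict` are hypothetical, and the penalty strength `alpha=10.0` is an arbitrary choice. Random forest is left out because it is not practical to write in a few lines without a library.

```python
from statistics import mean


def fit_ridge_1d(xs, ys, alpha=0.0):
    """Fit y ~ intercept + slope * x with an L2 penalty on the slope.

    alpha = 0 reduces to ordinary least squares (plain linear regression).
    The intercept is not penalized.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Centered sums: covariance-like and variance-like terms.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / (s_xx + alpha)  # the penalty shrinks the slope toward 0
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Evaluate a fitted (intercept, slope) model at x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data, NOT taken from the dissertation.
years = [2015, 2016, 2017, 2018, 2019]
students = [100, 110, 120, 130, 140]

ols = fit_ridge_1d(years, students, alpha=0.0)     # linear regression
ridge = fit_ridge_1d(years, students, alpha=10.0)  # ridge regression

ols_2020 = predict(ols, 2020)
ridge_2020 = predict(ridge, 2020)
print(ols_2020)    # 150.0
print(ridge_2020)  # 135.0
```

On this toy data the ridge forecast is pulled toward the historical mean of 120, which shows the shrinkage effect that distinguishes ridge from plain linear regression. For a real dataset with several features, a library such as scikit-learn (which also provides `RandomForestRegressor`) would be the more usual choice.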