IJRIT International Journal of Research in Information Technology, Volume 3, Issue 4, April 2015, Pg. 390-394

International Journal of Research in Information Technology (IJRIT) www.ijrit.com

ISSN 2001-5569

Student Result and Placement Prediction Using Naïve Bayes

Atish Shankar Ghone, Department of Computer Engg., Pimpri Chinchwad College of Engineering, Pune
Jitesh M. Sapariya, Department of Computer Engg., Pimpri Chinchwad College of Engineering, Pune
Avinash U. Jadhav, Department of Computer Engg., Pimpri Chinchwad College of Engineering, Pune
K. Rajeshwari, Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune

Abstract— Student data classification is a topic of growing interest in data mining research. Sorting student records into meaningful categories remains challenging because the datasets contain a large number of features. Among the many existing classification approaches, Bayesian classification is a probabilistic learning method, and naive Bayes classifiers are among the most successful known algorithms for learning to classify text documents. The aim of this paper is to use naive Bayes classification of student data to predict student performance: the extracted information can be used to find meaningful patterns or rules that predict a student's future results at the college or university level. Teachers can also use the model for future planning, to decide which kind of support each student should receive.
Keywords— student data classification, naive Bayes classifier, data mining, density function for normal distribution, Laplace estimator.

I. INTRODUCTION
The volume of student information grows from year to year, and classifying it accurately according to our needs is a critical step toward educational success and students' futures. Students admitted to a college come from different backgrounds, so identifying each student's performance manually, and deciding manually which kind of support each student needs, is a very time-consuming process. There have been many attempts to address this challenge, and automatic data classification has recently attracted growing interest in data mining research. Several approaches have been developed for such problems, such as decision trees and neural networks; among them, we use the naive Bayes classifier because of its simplicity in classifying the dataset. Naive Bayes is a simple technique for constructing classifiers: models that assign class labels, drawn from some finite set, to problem instances represented as vectors of feature values. A naive Bayes classifier assumes that the value of a particular feature is independent of the value of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter; a naive Bayes classifier treats each of these features as contributing independently to the probability that the fruit is an apple, regardless of any correlations between color, roundness, and diameter.
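The independence assumption in the fruit example can be illustrated with a minimal Python sketch. The class names, feature names, priors, and likelihood values below are hypothetical numbers invented for illustration; they are not estimated from any data in this paper.

```python
# Hypothetical class priors and per-feature likelihoods P(feature | class),
# invented purely to illustrate the naive Bayes independence assumption.
PRIORS = {"apple": 0.5, "cherry": 0.5}
LIKELIHOODS = {
    "apple": {"red": 0.7, "round": 0.9, "about_3_in_diameter": 0.8},
    "cherry": {"red": 0.9, "round": 0.9, "about_3_in_diameter": 0.01},
}


def naive_bayes_score(label: str, features: list[str]) -> float:
    """Prior times the product of per-feature likelihoods.

    Multiplying the likelihoods is only valid under the naive assumption
    that the features are independent given the class.
    """
    score = PRIORS[label]
    for feature in features:
        score *= LIKELIHOODS[label][feature]
    return score


def classify(features: list[str]) -> str:
    """Return the class with the highest (unnormalized) score."""
    return max(PRIORS, key=lambda label: naive_bayes_score(label, features))


print(classify(["red", "round", "about_3_in_diameter"]))  # apple
```

With all three features the apple score is 0.5 × 0.7 × 0.9 × 0.8 = 0.252 against 0.00405 for cherry; dropping everything except "red" flips the prediction to cherry (0.45 vs. 0.35), which shows how each feature contributes its own independent factor.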

II. LITERATURE SURVEY
Nedungadi and Remya [9] propose three enhancements to the Bayesian Knowledge Tracing (BKT) model: the fully personalized P-BKT model; the PC-BKT model, which is the P-BKT model with students clustered into high, medium, and low knowledge groups; and PC-BKT(EP), which is the PC-BKT model using empirical probabilities to fit the parameters. The P-BKT model has higher student prediction accuracy than both BKT and PP-BKT because it has individual student priors for each skill. To deal with the cold-start problem, in which students work on items with skills that have not been encountered yet, they introduce a capability matrix and dynamic clustering based on it, and show through extensive data analysis that this model mitigates the cold-start issue [9].

The second referenced paper [8] proposes a new student model with a number of unique features for the automatic scoring of reading skills, as demonstrated by isolated words read aloud. The authors first explain why this is not simply a pronunciation evaluation or verification task, and then propose new features that should be useful in such a model: cues that expert teachers would use when judging reading ability, such as pronunciation evidence and background information about the child or the test. They then describe a hypothesized Bayesian network structure that accounts for the potential conditional dependencies among all these features and reflects the way expert teachers might conceive of a student's cognitive state when reading. They also propose a method for automatically refining this network using greedy forward selection of the conditional dependencies [8].

Educational data mining (EDM) is a field that applies machine-learning, statistical, and data-mining algorithms to the different types of educational data [1]. Its main objective is to analyze data in order to resolve educational research issues. EDM is concerned with developing methods to explore the unique types of data in educational settings and, using these methods, to better understand students and the settings in which they learn. The extracted information can be used to find meaningful patterns for students in real-world problem scenarios monitored at the college level, and the model can also be used for future planning of student selection criteria at the college level. The objective of prediction is to estimate the unknown value of a variable that describes the student, such as performance, knowledge, score, or mark.
This value can be numerical/continuous (a regression task) or categorical/discrete (a classification task). Regression analysis finds the relationship between a dependent variable and one or more independent variables. In classification, individual items are placed into groups based on quantitative information about one or more characteristics inherent in the items, and on a training set of previously labeled items. Data mining tools for educational research are being developed and used in many countries. In countries like India, demand for educational data mining has increased over the last few years, because educational databases grow every year and knowledge must be discovered from them to take important decisions and plan remedial measures. Many commercial data mining software packages are available with various levels of sophistication and cost. One of the popular data mining tools is WEKA [5]; since it is open source and supports many data mining algorithms, it has been used for preprocessing and classification of student data. WEKA 3.6.6 [5] has been used to classify students using decision tree and naïve Bayesian algorithms.

III. SELECTIVE NAÏVE BAYES CLASSIFIER
This section formally states the assumptions and notation and recalls the naive Bayes and selective naive Bayes approaches.

Assumptions and Notation
Let $X = (X_1, X_2, \ldots, X_K)$ be the vector of the $K$ explanatory variables and $Y$ the class variable. Let $\lambda_1, \lambda_2, \ldots, \lambda_J$ be the $J$ class labels of $Y$. Let $N$ be the number of instances and $D = (D_1, D_2, \ldots, D_N)$ the labeled database containing the instances

$$D_n = \left(x^{(n)}, y^{(n)}\right). \qquad (1)$$

Let $\mathcal{M} = \{M_m\}$ be the set of all the potential selective naive Bayes models. Each model $M_m$ is described by $K$ parameter values, one per explanatory variable, indicating whether that variable is selected in the model. Let us denote by $P(\lambda_j)$ the prior probabilities $P(Y = \lambda_j)$ of the class values, and by $P(X_k \mid \lambda_j)$ the conditional probability distributions $P(X_k \mid Y = \lambda_j)$ of the explanatory variables given the class values. We assume that the prior probabilities $P(\lambda_j)$ and the conditional probability distributions $P(X_k \mid \lambda_j)$ are known once the preprocessing is performed.

In this approach, the class-conditional probabilities are estimated using MODL discretization [6] for numeric variables and the MODL grouping method [6] for categorical variables, where MODL stands for minimum optimized description length and refers to the minimum description length (MDL) principle [3] as a model selection technique. More specifically, the MODL preprocessing methods exploit a maximum a posteriori (MAP) technique [3] to select the most probable discretization (respectively, value grouping) model given the input data. The choice of the prior distribution of the models is optimized for the task of data preparation, and the search algorithms are heavily optimized. Using the Bayes-optimal MODL preprocessing methods to estimate the conditional probabilities has proved very efficient at detecting irrelevant variables [6]. In the experiments, the $P(\lambda_j)$ are estimated by counting, and the $P(X_k \mid \lambda_j)$ are computed from the contingency tables resulting from the preprocessing of the explanatory variables.
The conditional probabilities are smoothed using an m-estimate with parameters $m = J/N$ and $p = 1/J$ in order to avoid zero probabilities.

Naive Bayes Classifier
The naive Bayes classifier assigns to each instance the class value having the highest conditional probability

$$P(\lambda_j \mid X) = \frac{P(\lambda_j)\, P(X \mid \lambda_j)}{P(X)}. \qquad (2)$$

Using the assumption that the explanatory variables are independent conditionally on the class variable, we get

$$P(\lambda_j \mid X) = \frac{P(\lambda_j) \prod_{k=1}^{K} P(X_k \mid \lambda_j)}{P(X)}. \qquad (3)$$

In classification problems, Equation (3) is sufficient to predict the most probable class given the input data, since $P(X)$ is the same for every class. In problems where a prediction score is needed, the class conditional probability can be estimated using

$$P(\lambda_j \mid X) = \frac{P(\lambda_j) \prod_{k=1}^{K} P(X_k \mid \lambda_j)}{\sum_{i=1}^{J} P(\lambda_i) \prod_{k=1}^{K} P(X_k \mid \lambda_i)}. \qquad (4)$$

The naive Bayes classifier is poor at predicting the true class conditional probabilities, since the independence assumption is usually violated in real data. Nevertheless, for the present dataset we consider it a well-suited model.
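Equations (3) and (4) can be sketched in Python as follows. This is a minimal illustration that assumes the conditional probabilities P(X_k | λ_j) are already available as lookup tables (for example, read off the contingency tables produced by preprocessing); the MODL discretization and the smoothing with m = J/N and p = 1/J are not reproduced here, and the labels and table values in the usage example are hypothetical.

```python
from collections import Counter
from math import prod


def estimate_priors(labels):
    """Estimate P(lambda_j) by counting class frequencies."""
    n = len(labels)
    return {label: count / n for label, count in Counter(labels).items()}


def class_posteriors(priors, conditionals, x):
    """Equation (4): normalized naive Bayes scores for instance x.

    conditionals[label][k] maps each value of variable X_k to P(X_k = value | label).
    Each numerator is Equation (3) without P(X); their sum over all classes is P(X).
    """
    scores = {
        label: priors[label] * prod(conditionals[label][k][value] for k, value in enumerate(x))
        for label in priors
    }
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


# Hypothetical training labels and conditional tables for two discretized variables.
labels = ["pass"] * 6 + ["fail"] * 4
conditionals = {
    "pass": [{"high": 0.8, "low": 0.2}, {"good": 0.7, "poor": 0.3}],
    "fail": [{"high": 0.25, "low": 0.75}, {"good": 0.5, "poor": 0.5}],
}
posteriors = class_posteriors(estimate_priors(labels), conditionals, ("high", "good"))
print(max(posteriors, key=posteriors.get))  # pass
```

Because the denominator of Equation (4) is the same for every class, its arg max matches that of Equation (3); the normalization only matters when a calibrated score rather than a class label is required.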

IV. PROPOSED WORK


We designed a system that predicts a student's final-exam result using naive Bayes with the probability density function of the normal distribution. We use a live dataset whose attributes are each student's marks in each subject and attendance in each of those subjects. Applying naive Bayes yields the probability that the student will pass or fail, and hence the predicted result.

Figure 1. Proposed System 1

As shown in Figure 1, the first proposed system uses a training dataset whose attributes are the students' unit-test marks in five subjects and their attendance in those five subjects, with the class label pass or fail. It then accepts a test dataset with the same attributes (the marks in each subject and the corresponding attendance) but without the label, and applies the naive Bayes classifier to predict the result: the probability of passing is compared with the probability of failing, and whichever is greater is the predicted result.

Figure 2. Proposed System 2


As shown in Figure 2, the second proposed system uses a training dataset whose attributes are the percentage marks of BE students in five semesters, with the class label placed or not placed. It then accepts a test dataset with the same attributes, the percentage marks of TE students in their five semesters, and applies the naive Bayes classifier: the probability of being placed is compared with the probability of not being placed, and whichever is greater is the predicted result. The same algorithm is used for both result prediction and placement prediction.

V. NAÏVE BAYES ALGORITHM USED
In the developed system, all attributes in the training dataset are numerical, so the special case of naive Bayes for normally distributed (Gaussian) attributes is used. The steps are as follows.

1. For each attribute and each class, calculate the mean:
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i,$$
where $x_i$, $i = 1, \ldots, n$, is the $i$th measurement and $n$ is the number of measurements.
2. Calculate the standard deviation:
$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^2}.$$
3. For the test data, calculate f(attribute = value | pass) and f(attribute = value | fail) using the probability density function of the normal distribution:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
4. Calculate the prior probabilities of the nominal class attribute: P(pass) = (number of pass instances)/n and P(fail) = (number of fail instances)/n.
5. If any probability of a nominal attribute or any f value is 0, apply the Laplace estimator: (the attribute's probability or f value + 1)/(number of pass or fail instances + μ), where μ here is a preset constant, not the mean from step 1.
6. Calculate the final probabilities, where E denotes the test instance (the evidence):
   a. P(pass | E) = (product of all f values of the attributes for pass × P(pass)) / P(E)
   b. P(fail | E) = (product of all f values of the attributes for fail × P(fail)) / P(E)
7. Compare the final probabilities of pass and fail; whichever is greater is the result. If the pass probability is greater, display the result as pass; otherwise, display fail.
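The steps above can be sketched in Python for the pass/fail case. This is a minimal illustration, not the authors' implementation: the two attributes (a unit-test mark and an attendance percentage) and all values are hypothetical, the Laplace correction of step 5 is omitted, and a small floor on σ (an assumption, `SIGMA_FLOOR`) is used instead to avoid division by zero when an attribute is constant within a class.

```python
import math
from statistics import mean, stdev

SIGMA_FLOOR = 1e-6  # assumption: guards against a zero standard deviation


def normal_pdf(x, mu, sigma):
    """Step 3: density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def train(rows, labels):
    """Steps 1, 2 and 4: per-class prior plus (mean, sample std) for each attribute."""
    model = {}
    for label in set(labels):
        class_rows = [row for row, y in zip(rows, labels) if y == label]
        columns = zip(*class_rows)  # one tuple of values per attribute
        model[label] = {
            "prior": len(class_rows) / len(rows),
            "params": [(mean(col), max(stdev(col), SIGMA_FLOOR)) for col in columns],
        }
    return model


def predict(model, x):
    """Steps 6 and 7: normalized posteriors and the class with the larger one."""
    scores = {}
    for label, stats in model.items():
        score = stats["prior"]
        for value, (mu, sigma) in zip(x, stats["params"]):
            score *= normal_pdf(value, mu, sigma)
        scores[label] = score
    evidence = sum(scores.values())  # P(E)
    posteriors = {label: s / evidence for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical training data: (unit-test mark, attendance %) per student.
rows = [(20, 85), (18, 90), (22, 80), (19, 88), (8, 50), (10, 60), (7, 55), (9, 45)]
labels = ["pass"] * 4 + ["fail"] * 4
model = train(rows, labels)
print(predict(model, (17, 82))[0])  # pass
print(predict(model, (9, 52))[0])   # fail
```

Since the placement system uses the same algorithm, the same `train` and `predict` functions would apply unchanged to rows of five semester percentages labeled placed or notplaced.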

VI. RESULT
For the first proposed system in this mini project, we used a live dataset as the training dataset, with the attributes five subject marks and the attendance for those subjects. The system outputs the final probabilities of pass and fail for the test data entered by the user, compares them, and displays whichever is greater. For the second proposed system, the training dataset consists of five semester percentage marks labeled placed or not placed, and the test data is the percentage marks of TE students in five semesters. The system outputs the final probabilities of placed and not placed for the test data entered by the user, compares them, and displays whichever is greater. The accuracy of the proposed systems is 90%, obtained by applying 10-fold cross-validation to the testing datasets.
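The 10-fold evaluation mentioned above can be sketched as follows, assuming standard k-fold cross-validation: the data are shuffled, split into k disjoint folds, and each fold is held out once while a classifier is trained on the rest. The `fit`/`predict` callables are hypothetical placeholders for any classifier, such as the Gaussian naive Bayes procedure of Section V; the majority-class stand-in and the labels in the usage example are hypothetical and only exercise the loop.

```python
import random
from collections import Counter


def k_fold_accuracy(rows, labels, fit, predict, k=10, seed=0):
    """Fraction of instances classified correctly when each one is held out exactly once."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict(model, rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Hypothetical stand-in classifier: always predicts the training set's majority class.
def fit_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


labels = ["pass"] * 7 + ["fail"] * 3
rows = [[i] for i in range(len(labels))]
print(k_fold_accuracy(rows, labels, fit_majority, predict_majority))  # 0.7
```

Pooling the correct predictions over all folds, as here, is one common convention; averaging the per-fold accuracies gives the same figure when all folds have equal size.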


VII. CONCLUSION
We developed a system that predicts students' future results, which can help students improve their academic performance; it is a good application of the naive Bayes classifier. We also developed a second system that predicts students' future placements, which is important for students as they work to improve their skills for placements.

REFERENCES
[1] Cristobal Romero, Sebastian Ventura, "Educational Data Mining: A Review of the State of the Art", IEEE, November 2010.
[2] Jiawei Han, Micheline Kamber, "Data Mining", Second Edition, Elsevier, 2008.
[3] A. B. M. Shawkat Ali, Saleh A. Wasimi, "Data Mining: Methods and Techniques", Cengage Learning, 2009.
[4] Yafei Sun, Zhishu Li, Lei Zhang, Shuxiong Qiu, Yang Chen, "Evaluating Data Mining Tools for Authentic Emotion Classification", 2010 International Conference on Intelligent Computation Technology and Automation (ICICTA), IEEE, pp. 228-232.
[5] Weka manual, freely available at www.cs.waikato.ac.nz/ml/weka/index_documentation.html.
[6] Marc Boullé, "MODL: A Bayes optimal discretization method for continuous attributes", Springer Science & Business Media, LLC, 2006.
[7] Abhilasha Dangi, Sumit Srivastav, "Educational Data Classification using Selective Naïve Bayes for Quota Categorization" (base paper), 2014 IEEE International Conference on MOOC, Innovation and Technology in Education (MITE).
[8] Joseph Tepperman, Sungbok Lee, Shrikanth (Shri) Narayanan, Abeer Alwan, "A Generative Student Model for Scoring Word Reading Skills", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 2, February 2011.
[9] Prema Nedungadi, M. S. Remya, "Predicting Students' Performance on Intelligent Tutoring System - Personalized Clustered BKT (PC-BKT) Model", IEEE, 2014 (978-1-4799-3922-0/14).
