
An Improved Expectation Maximization based Semi-Supervised Email Classification using Naïve Bayes and K- Nearest Neighbor

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 101 - Number 6
Year of Publication: 2014
Authors:
Hiral Padhiyar
Purvi Rekh
10.5120/17689-8652

Hiral Padhiyar and Purvi Rekh. Article: An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor. International Journal of Computer Applications 101(6):7-11, September 2014. Full text available. BibTeX

@article{key:article,
	author = {Hiral Padhiyar and Purvi Rekh},
	title = {Article: An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {101},
	number = {6},
	pages = {7-11},
	month = {September},
	note = {Full text available}
}

Abstract

With the development of the Internet and the emergence of a large number of text resources, automatic text classification has become a research hotspot. Email is one of the fastest and cheapest means of communication and has become a part of everyday life for millions of people, changing the way we work and collaborate. Email accounts for a large percentage of total Internet traffic, and email data is growing rapidly, creating a need for automated analysis. In many security informatics applications it is important to detect deceptive communication in email. Each iteration of standard EM-based semi-supervised learning has two steps: first, the classifier constructed in the previous iteration predicts the labels of all unlabeled samples; then, a new classifier is constructed from the new training sample set. This work proposes an EM-based semi-supervised learning algorithm using Naïve Bayes in which unlabeled documents are divided into two parts, reliable and misclassified. An ensemble technique is used to add only the reliable unlabeled documents to the training set. In addition, the unlabeled documents are preprocessed before the Naïve Bayes and K-NN classifiers are trained in the first step of EM, which reduces preprocessing time. With the proposed approach, classifier accuracy is expected to increase and execution time to decrease.
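
The iterative procedure described in the abstract can be illustrated with a small sketch. The abstract does not say how the ensemble decides that a document is reliable, so the code below assumes a simple rule: an unlabeled document is reliable when Naïve Bayes and K-NN predict the same label. It also uses hard label assignments (closer to self-training than to EM with soft posteriors), whitespace tokenization as the preprocessing step, and Jaccard similarity for K-NN. All function names and these choices are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()

def train_nb(docs, labels):
    # Collect per-class document counts and word counts (multinomial model).
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    # Pick the class with the highest log posterior, using Laplace smoothing.
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def predict_knn(docs, labels, tokens, k=3):
    # Majority vote among the k training documents most similar to the query.
    query = set(tokens)

    def jaccard(other_tokens):
        other = set(other_tokens)
        union = query | other
        return len(query & other) / len(union) if union else 0.0

    ranked = sorted(zip(docs, labels), key=lambda p: jaccard(p[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def semi_supervised_fit(labeled, unlabeled, k=3, max_iter=10):
    train_docs = [tokenize(text) for text, _ in labeled]
    train_labels = [label for _, label in labeled]
    # Preprocess the unlabeled pool once, before the iterations start.
    pool = [tokenize(text) for text in unlabeled]
    for _ in range(max_iter):
        model = train_nb(train_docs, train_labels)
        reliable, remaining = [], []
        for tokens in pool:
            nb_label = predict_nb(model, tokens)
            knn_label = predict_knn(train_docs, train_labels, tokens, k)
            # Assumed reliability rule: both classifiers agree.
            if nb_label == knn_label:
                reliable.append((tokens, nb_label))
            else:
                remaining.append(tokens)
        if not reliable:
            break  # nothing new to add, so the training set has stabilized
        for tokens, label in reliable:
            train_docs.append(tokens)
            train_labels.append(label)
        pool = remaining
    # Rebuild the final classifier on the enlarged training set.
    return train_nb(train_docs, train_labels), train_docs, train_labels
```

Calling `semi_supervised_fit(labeled, unlabeled)` with `labeled` as a list of `(text, label)` pairs and `unlabeled` as a list of strings returns the final Naïve Bayes model together with the enlarged training set. Documents on which the two classifiers disagree stay in the pool and never influence training, which is the point of separating reliable documents from potentially misclassified ones.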
