S Poonkuzhali, P Sudhakar and K Sarukesi. Article: Signed-With-Weight Technique for Mining Web Content Outliers. IJCA Special Issue on International Conference on Communication, Computing and Information Technology ICCCMIT(2):40-45, February 2013. Full text available. BibTeX
@article{key:article,
  author  = {S. Poonkuzhali and P. Sudhakar and K. Sarukesi},
  title   = {Article: Signed-With-Weight Technique for Mining Web Content Outliers},
  journal = {IJCA Special Issue on International Conference on Communication, Computing and Information Technology},
  year    = {2013},
  volume  = {ICCCMIT},
  number  = {2},
  pages   = {40-45},
  month   = {February},
  note    = {Full text available}
}
Web outlier mining aims to find web pages that differ significantly from the other web documents in the same category. Most existing algorithms for web content outlier mining were developed for structured documents, whereas the WWW contains mostly unstructured and semi-structured documents. Moreover, the false positive rate of the existing algorithms for mining web content outliers is more than 30%. Therefore, there is a need for a technique that mines web outliers from unstructured and semi-structured document types with a lower false positive rate. This paper concentrates on mining web content outliers, that is, on extracting the dissimilar web documents from a group of documents of the same domain. The proposed work implements a novel mathematical approach, based on a signed-with-weight technique for mining web content outliers, which retrieves the top n outlier web documents from both structured and unstructured web documents. The results show that the performance of this approach, in terms of precision and recall, is more than 90%. Also, the false positive rate of this algorithm is less than 15%.
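The abstract reports precision, recall, and false positive rate without stating how they are computed. The following is a minimal sketch of the standard definitions applied to a top-n outlier result. It does not reproduce the signed-with-weight technique itself, which the abstract does not specify. The function name `evaluate_outlier_detection`, the `OutlierMetrics` type, and the document identifiers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutlierMetrics:
    precision: float
    recall: float
    false_positive_rate: float


def evaluate_outlier_detection(predicted, actual, all_docs):
    """Score a set of documents flagged as outliers against ground truth.

    predicted: documents the detector flagged as outliers (e.g. its top n)
    actual:    documents that are truly outliers
    all_docs:  every document in the evaluated collection
    (All three are hypothetical inputs, not data from the paper.)
    """
    predicted, actual, all_docs = set(predicted), set(actual), set(all_docs)

    tp = len(predicted & actual)             # outliers correctly flagged
    fp = len(predicted - actual)             # normal documents wrongly flagged
    fn = len(actual - predicted)             # outliers that were missed
    tn = len(all_docs - predicted - actual)  # normal documents left alone

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero on degenerate inputs.
        return numerator / denominator if denominator else 0.0

    return OutlierMetrics(
        precision=ratio(tp, tp + fp),
        recall=ratio(tp, tp + fn),
        false_positive_rate=ratio(fp, fp + tn),
    )


if __name__ == "__main__":
    docs = [f"d{i}" for i in range(1, 11)]  # ten hypothetical documents
    metrics = evaluate_outlier_detection(
        predicted={"d1", "d2", "d4"},
        actual={"d1", "d2", "d3"},
        all_docs=docs,
    )
    print(metrics)
```

Under these definitions, a false positive rate below 15% means that fewer than 15 of every 100 non-outlier documents are wrongly reported as outliers.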