An Improved Technique for Web Page Classification in Respect of Domain Specific Search

International Journal of Computer Applications

Volume 102 - Number 4

Year of Publication: 2014

Authors:

Vivek Chandra

Nidhi Saxena

DOI: 10.5120/17801-8615

Vivek Chandra and Nidhi Saxena. An Improved Technique for Web Page Classification in Respect of Domain Specific Search. International Journal of Computer Applications 102(4):7-10, September 2014. Full text available.

BibTeX

@article{key:article,
	author = {Vivek Chandra and Nidhi Saxena},
	title = {Article: An Improved Technique for Web Page Classification in Respect of Domain Specific Search},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {102},
	number = {4},
	pages = {7-10},
	month = {September},
	note = {Full text available}
}

Abstract

A domain-specific crawler, as distinct from a general web search engine, focuses on a specific segment of web content. Search engines built on such crawlers are also called vertical or topical search engines. Common vertical search engines serve shopping, the automotive industry, legal information, medical information, scholarly literature, and travel. Examples include Trulia.com, Mocavo.com and Yelp. In contrast to general-purpose Web search engines, which attempt to index large portions of the World Wide Web using a web crawler, vertical search engines typically use a domain-specific crawler that attempts to index only Web pages relevant to a pre-defined topic or set of topics. Vertical search offers several potential benefits over general search: greater precision due to its limited scope, the ability to leverage domain knowledge including taxonomies and ontologies, and support for specific, unique user tasks. This paper analyzes three machine learning techniques used for Web page classification, namely ANN, SVM and Hi-SVM, and suggests suitable improvements. A crawling framework has been designed and developed that allows flexible addition of new classifiers. This crawler has been used to classify web content for a few domains. The crawlers themselves are implemented as multithreaded objects that run concurrently. The results show that Hi-SVM is a better choice for guiding a topical crawler than the Support Vector Machine (SVM) and the neural network (ANN), and the comparative analysis of the three classifiers found Hi-SVM to be the most efficient.
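
The framework described in the abstract, a crawler that accepts interchangeable classifiers and runs its workers as concurrent threads, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names FocusedCrawler, keyword_classifier, fetch_toy and TOY_WEB are hypothetical, the "web" is an in-memory dictionary, and a simple keyword-ratio scorer stands in for the ANN, SVM and Hi-SVM classifiers evaluated in the paper.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB: Dict[str, Tuple[str, List[str]]] = {
    "a": ("used cars and automotive dealers", ["b", "c"]),
    "b": ("car engine repair and automotive parts", ["d"]),
    "c": ("holiday recipes and baking tips", ["e"]),
    "d": ("automotive industry news", []),
    "e": ("more cake recipes", []),
}

# Any callable mapping page text to a relevance score can be plugged in.
Classifier = Callable[[str], float]


def keyword_classifier(keywords: Set[str]) -> Classifier:
    """Stand-in classifier (assumption): fraction of words that are topic keywords."""
    def score(text: str) -> float:
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(w in keywords for w in words) / len(words)
    return score


def fetch_toy(url: str) -> Tuple[str, List[str]]:
    """Stand-in fetcher (assumption): looks the page up in TOY_WEB."""
    return TOY_WEB[url]


class FocusedCrawler:
    """Worker threads share one frontier; links are followed only from relevant pages."""

    def __init__(self, classifier: Classifier, fetch=fetch_toy,
                 threshold: float = 0.1, workers: int = 4) -> None:
        self.classifier = classifier
        self.fetch = fetch
        self.threshold = threshold
        self.workers = workers
        self.frontier: "queue.Queue" = queue.Queue()
        self.seen: Set[str] = set()
        self.relevant: Set[str] = set()
        self.lock = threading.Lock()

    def _worker(self) -> None:
        while True:
            url = self.frontier.get()
            try:
                if url is None:  # sentinel: no more work
                    return
                text, links = self.fetch(url)
                if self.classifier(text) >= self.threshold:
                    with self.lock:
                        self.relevant.add(url)
                        new_links = [l for l in links if l not in self.seen]
                        self.seen.update(new_links)
                    for link in new_links:
                        self.frontier.put(link)
            finally:
                self.frontier.task_done()

    def crawl(self, seeds: Iterable[str]) -> Set[str]:
        seeds = list(seeds)
        with self.lock:
            self.seen.update(seeds)
        for seed in seeds:
            self.frontier.put(seed)
        threads = [threading.Thread(target=self._worker) for _ in range(self.workers)]
        for t in threads:
            t.start()
        self.frontier.join()          # wait until every queued URL is processed
        for _ in threads:
            self.frontier.put(None)   # one sentinel per worker
        for t in threads:
            t.join()
        return self.relevant


if __name__ == "__main__":
    crawler = FocusedCrawler(keyword_classifier({"automotive", "car", "cars"}))
    print(sorted(crawler.crawl(["a"])))
```

Because the crawler depends only on a callable that scores text, swapping in a trained model would mean passing a different function to the constructor; the threading and frontier logic would stay the same.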

Index Terms

Computer Science

Web Services

Keywords

ANN, SVM, HiSVM, VSM, ROC, REC, POS, WSD, SOE
