K-Means Clustering Algorithm based on Entity Resolution

International Journal of Computer Applications

Volume 108 - Number 6

Year of Publication: 2014

Authors:

B. Vinay Kumar

B. Raghu Ram

B. Hanmanthu

10.5120/18919-0254

Vinay B Kumar, Raghu B Ram and B Hanmanthu. Article: K-Means Clustering Algorithm based on Entity Resolution. International Journal of Computer Applications 108(6):41-44, December 2014. Full text available. BibTeX

@article{key:article,
	author = {B. Vinay Kumar and B. Raghu Ram and B. Hanmanthu},
	title = {Article: K-Means Clustering Algorithm based on Entity Resolution},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {108},
	number = {6},
	pages = {41-44},
	month = {December},
	note = {Full text available}
}

Abstract

Entity resolution (ER) is the problem of recognizing which entries in a database refer to the same entity. ER has to be run in a way that keeps the running time low while still producing good results. This paper investigates how the running time of ER can be reduced, with a minimum amount of work, using the k-means clustering algorithm. In this approach, entries are clustered according to how well they match. We introduce a k-means clustering technique to maximize the number of matching entries identified using a limited amount of work. We illustrate the potential gains of this entity resolution approach using k-means.
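The abstract does not say how entries are represented or compared, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each entry is a short string, uses letter frequencies as a stand-in feature vector, clusters the entries with plain k-means, and then compares entries only within the same cluster. The function names (`featurize`, `kmeans`, `resolve`), the sample records, the string-similarity measure, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations


def featurize(record):
    """Map a record string to a 26-dimensional letter-frequency vector (assumed features)."""
    counts = [0.0] * 26
    for ch in record.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def resolve(records, k, threshold=0.8):
    """Compare records only within their k-means cluster.

    Returns the set of matching index pairs and the number of comparisons made.
    """
    labels = kmeans([featurize(r) for r in records], k)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    matches, comparisons = set(), 0
    for members in clusters.values():
        for i, j in combinations(members, 2):
            comparisons += 1
            sim = SequenceMatcher(None, records[i].lower(), records[j].lower()).ratio()
            if sim >= threshold:  # hypothetical match rule
                matches.add((i, j))
    return matches, comparisons


# Illustrative, made-up records; not data from the paper.
records = [
    "John Smith, 12 Oak Street",
    "Jon Smith, 12 Oak St.",
    "Maria Garcia, 45 Pine Avenue",
    "Maria Garcia, 45 Pine Ave",
    "Wei Zhang, 7 Lake Road",
    "Zhang Wei, 7 Lake Rd",
]

matches, comparisons = resolve(records, k=3)
all_pairs = len(records) * (len(records) - 1) // 2
print("comparisons within clusters:", comparisons, "of", all_pairs, "possible pairs")
print("matched index pairs:", sorted(matches))
```

Because comparisons are restricted to pairs inside a cluster, the number of comparisons can never exceed the all-pairs count, which is where any saving in running time would come from. The trade-off in a sketch like this is that true matches placed in different clusters are never compared, so the choice of features and of `k` affects how many matches are found.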

Index Terms

Computer Science

Algorithms

Keywords

Data cleaning, Entity resolution, K-means Clustering Algorithm
