K-Means Clustering Algorithm based on Entity Resolution

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 108 - Number 6
Year of Publication: 2014
Authors:
B. Vinay Kumar
B. Raghu Ram
B. Hanmanthu
DOI: 10.5120/18919-0254

B. Vinay Kumar, B. Raghu Ram and B. Hanmanthu. K-Means Clustering Algorithm based on Entity Resolution. International Journal of Computer Applications 108(6):41-44, December 2014.

@article{key:article,
	author = {B. Vinay Kumar and B. Raghu Ram and B. Hanmanthu},
	title = {K-Means Clustering Algorithm based on Entity Resolution},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {108},
	number = {6},
	pages = {41-44},
	month = {December},
	note = {Full text available}
}

Abstract

Entity resolution (ER) is the problem of recognizing which entries in a database refer to the same real-world entity, so that they can be grouped into the same cluster. ER must be run in a way that keeps the running time low while still producing good results. This paper investigates how the k-means clustering algorithm can reduce the running time of ER with a minimum amount of work. In this approach, entries are clustered according to how well they match. We use k-means clustering to maximize the number of matching entries identified with a limited amount of work. We illustrate the potential gains of this entity resolution approach using k-means.
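The abstract does not spell out the procedure, so the following is only a minimal sketch of one way k-means could limit the work done by ER: entries are first grouped by k-means, and the expensive pairwise matching is then run only inside each cluster. Everything concrete here is an assumption made for illustration and is not taken from the paper: the character-bigram features, the Jaccard similarity with a 0.5 threshold, the sample records, and the function names `vectorize`, `kmeans`, `jaccard`, and `resolve`.

```python
import random
from collections import Counter
from itertools import combinations


def bigrams(text):
    """Return the multiset of character bigrams of a lower-cased string."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def vectorize(records):
    """Map each record to a bigram-count vector over a shared vocabulary."""
    grams = [bigrams(r) for r in records]
    vocab = sorted(set().union(*grams))
    return [[g[v] for v in vocab] for g in grams]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def jaccard(a, b):
    """Jaccard similarity of the bigram sets of two strings."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def resolve(records, k, threshold=0.5):
    """Compare only pairs that share a k-means cluster.

    Returns (matches, compared): the index pairs judged to match and
    the number of pairwise comparisons actually performed.
    """
    labels = kmeans(vectorize(records), k)
    matches, compared = [], 0
    for i, j in combinations(range(len(records)), 2):
        if labels[i] != labels[j]:
            continue  # skip cross-cluster pairs: this is the saved work
        compared += 1
        if jaccard(records[i], records[j]) >= threshold:
            matches.append((i, j))
    return matches, compared


# Hypothetical sample data, for illustration only.
records = ["john smith", "jon smith", "mary jones", "marie jones"]
matches, compared = resolve(records, k=2)
print(matches, compared)
```

With `k=1` every pair is compared, which is the exhaustive baseline. Larger `k` reduces the number of comparisons, at the risk of missing matches whose entries land in different clusters, so the choice of `k` trades running time against the number of matches found.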
