Growing parallel paths for entity-page discovery

Published in WWW, 2011

Citation: Weninger, T., Fumarola, F., Lin, C. X., Barber, R., Han, J., & Malerba, D. (2011, March). Growing parallel paths for entity-page discovery. In Proceedings of the 20th international conference companion on World wide web (pp. 145-146). ACM.

In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and one entity-page on it (e.g., a department's Web site and one faculty member's homepage), we seek all entity-pages of the same type (e.g., the homepages of all faculty members in the department). To do this, we propose a Web structure mining method that grows parallel paths through the Web graph and the DOM trees of its pages. We show that by following these parallel paths we can efficiently discover all entity-pages of the same type. Finally, we demonstrate the accuracy of our method with a case study spanning several domains.
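
To make the idea of "parallel paths" more concrete, here is a minimal Python sketch under simplifying assumptions of our own; it is an illustration, not the authors' algorithm. It assumes the site is already crawled and parsed, and that every hyperlink is labeled with its DOM path (the sequence of tag names from the page root to the anchor). The sketch finds one link path from the site root to the seed entity-page, records the DOM path of each hop, and then grows parallel paths by following, at every hop, all links that share that DOM path. The pages reached at the end are the candidate entity-pages of the same type. The names `Page`, `Link`, `path_to_seed`, and `grow_parallel_paths` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumption: a DOM path is the tag sequence from <html> down to the <a> element.
DomPath = tuple[str, ...]


@dataclass
class Link:
    dom_path: DomPath
    target: str


@dataclass
class Page:
    url: str
    links: list[Link] = field(default_factory=list)


def path_to_seed(site: dict[str, Page], root: str, seed: str):
    """BFS over the link graph; return the DOM paths of the links on a
    shortest root -> seed path, or None if the seed is unreachable."""
    prev = {root: None}  # url -> (parent url, DOM path of the link used)
    queue = deque([root])
    while queue:
        url = queue.popleft()
        if url == seed:
            break
        page = site.get(url)
        if page is None:  # link target outside the crawled set
            continue
        for link in page.links:
            if link.target not in prev:
                prev[link.target] = (url, link.dom_path)
                queue.append(link.target)
    if seed not in prev:
        return None
    hops = []
    node = seed
    while prev[node] is not None:  # walk back from the seed to the root
        parent, dom_path = prev[node]
        hops.append(dom_path)
        node = parent
    return list(reversed(hops))


def grow_parallel_paths(site: dict[str, Page], root: str, seed: str) -> set[str]:
    """At each hop, follow every link whose DOM path matches the seed path's hop."""
    hops = path_to_seed(site, root, seed)
    if hops is None:
        return set()
    frontier = {root}
    for dom_path in hops:
        frontier = {
            link.target
            for url in frontier
            if url in site
            for link in site[url].links
            if link.dom_path == dom_path
        }
    return frontier


# Hypothetical toy department site.
NAV = ("html", "body", "nav", "a")
LIST = ("html", "body", "ul", "li", "a")
FOOT = ("html", "body", "footer", "a")
site = {
    "dept": Page("dept", [Link(NAV, "faculty"), Link(FOOT, "contact")]),
    "faculty": Page("faculty", [Link(LIST, "alice"), Link(LIST, "bob"),
                                Link(LIST, "carol"), Link(FOOT, "contact")]),
}
print(sorted(grow_parallel_paths(site, "dept", "alice")))  # ['alice', 'bob', 'carol']
```

In this toy example, the seed homepage `alice` is reached through a navigation link and then a list item, so the other list items (`bob`, `carol`) are returned while the footer link to `contact` is not. Matching on exact DOM paths is the simplest possible choice; a practical system would likely need a looser notion of structural similarity.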
