Web entity extraction based on entity attribute classification

Chuan-Xi Li; Peng Chen; Ru-Jing Wang; Ya-Ru Su

doi:10.1117/12.920237

Published: 12 January 2012

Proceedings Volume 8350, Fourth International Conference on Machine Vision (ICMV 2011): Computer Vision and Image Analysis; Pattern Recognition and Basic Technologies; 835014 (2012) https://doi.org/10.1117/12.920237
Event: Fourth International Conference on Machine Vision (ICMV 11), 2011, Singapore, Singapore

Abstract

Large amounts of entity data are continuously published on web pages, and extracting these entities automatically for further applications is valuable. Rule-based entity extraction yields promising results; however, it is labor-intensive and scales poorly. This paper proposes a web entity extraction method based on entity attribute classification, which avoids manual annotation of samples. First, web pages are segmented into blocks by the Vision-based Page Segmentation (VIPS) algorithm, and a binary classifier built with LibSVM is trained to retrieve the candidate blocks that contain entity content. Second, the candidate blocks are partitioned into candidate items, LibSVM classifiers annotate the attributes of the items, and the annotation results are aggregated into an entity. Results show that the proposed method performs well at extracting agricultural supply and demand entities from web pages.
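The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the VIPS segmentation is assumed to have already produced text blocks, the trained LibSVM classifiers are replaced by simple hypothetical keyword functions (`is_candidate_block`, `label_item`), and the attribute names and the one-item-per-line splitting rule are invented for the example.

```python
from typing import Callable, Dict, List

# Assumption: blocks are plain-text strings already produced by a page
# segmenter such as VIPS; the segmentation itself is not shown here.


def is_candidate_block(block: str) -> bool:
    """Hypothetical stand-in for the binary block classifier (LibSVM in the paper)."""
    keywords = ("supply", "demand", "price", "contact")
    return any(k in block.lower() for k in keywords)


def label_item(item: str) -> str:
    """Hypothetical stand-in for the attribute classifiers (LibSVM in the paper)."""
    text = item.lower()
    if "price" in text:
        return "price"
    if "contact" in text:
        return "contact"
    if "supply" in text or "demand" in text:
        return "type"
    return "other"


def split_items(block: str) -> List[str]:
    """Partition a candidate block into items (assumed here: one item per line)."""
    return [line.strip() for line in block.splitlines() if line.strip()]


def extract_entities(
    blocks: List[str],
    block_clf: Callable[[str], bool] = is_candidate_block,
    attr_clf: Callable[[str], str] = label_item,
) -> List[Dict[str, str]]:
    """Stage 1: keep candidate blocks. Stage 2: label items, aggregate per block."""
    entities = []
    for block in blocks:
        if not block_clf(block):
            continue  # block judged not to contain entity content
        entity: Dict[str, str] = {}
        for item in split_items(block):
            label = attr_clf(item)
            if label != "other":
                # Keep the first item seen for each attribute label.
                entity.setdefault(label, item)
        if entity:
            entities.append(entity)
    return entities


blocks = [
    "Home | About | Login",
    "Supply: fresh apples\nPrice: 3 yuan/kg\nContact: Mr. Wang",
]
entities = extract_entities(blocks)
```

Passing the classifiers in as callables keeps the pipeline structure separate from the models, so trained classifiers could be substituted for the keyword stand-ins without changing `extract_entities`.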

Citation

Chuan-Xi Li, Peng Chen, Ru-Jing Wang, and Ya-Ru Su "Web entity extraction based on entity attribute classification", Proc. SPIE 8350, Fourth International Conference on Machine Vision (ICMV 2011): Computer Vision and Image Analysis; Pattern Recognition and Basic Technologies, 835014 (12 January 2012); https://doi.org/10.1117/12.920237
