Information Extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this involves processing human-language texts by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as information extraction. Because the problem is difficult, current approaches to IE focus on narrowly restricted domains.
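
As a rough illustration of what "structured information from unstructured text" can mean in a narrowly restricted domain, here is a minimal Python sketch that pulls a few fields out of one sentence of patent metadata using regular expressions. The field names, the patterns, and the `extract_record` helper are assumptions made for this example only; real IE systems generally rely on far more robust NLP techniques than hand-written patterns.

```python
import re

# Sample input: a run of patent metadata written as plain text.
TEXT = (
    "Information extraction from a database Invented by Sergey Brin "
    "Assigned to Google US Patent 6,678,681 Granted January 13, 2004 "
    "Filed: March 9, 2000"
)

# A date written like "January 13, 2004".
DATE = r"[A-Z][a-z]+ \d{1,2}, \d{4}"

# Hypothetical field names mapped to hand-written patterns.
# Each pattern captures exactly one group: the field's value.
PATTERNS = {
    "inventor": r"Invented by (.+?) Assigned to",
    "assignee": r"Assigned to (.+?) US Patent",
    "patent_number": r"US Patent ([\d,]+)",
    "granted": r"Granted (" + DATE + r")",
    "filed": r"Filed:? (" + DATE + r")",
}


def extract_record(text):
    """Return a dict of field -> extracted value (None when not found)."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record


if __name__ == "__main__":
    for field, value in extract_record(TEXT).items():
        print(f"{field}: {value}")
```

The patterns only work because the input follows one fixed layout, which is a small-scale version of why IE approaches tend to focus on restricted domains.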
Posts about Information Extraction
  • Direct Answers: Extracting Text from Pages Citations

    … as really interesting, please let me know in the comments. Thanks, and I hope you find something really interesting in these. The key modules involved in TextRunner: from “Open Information Extraction from the Web.” [1] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction from the web. (pdf) In Proceedings…

    Bill Slawski / SEO by the Sea - 22 readers
  • New Panda Update; New Panda Patent Application

    … Information extraction Others I’ve linked to the patent filing if you want to go through it, and discuss different aspects of it. While it’s possible that it discusses a separate and different content scoring algorithm other than an update to Panda, the timing is interesting, and it’s worth thinking about. …

    Bill Slawski / SEO by the Sea in SEO Google - 16 readers
  • Google First Semantic Search Invention was Patented in 1999

    … experience, and filed again as a non-provisional patent: Information extraction from a database. Invented by Sergey Brin. Assigned to Google. US Patent 6,678,681. Granted January 13, 2004. Filed: March 9, 2000. Abstract: Techniques for extracting information from a database are provided. A database such as the Web is searched for occurrences of tuples…

    Bill Slawski / SEO by the Sea in Google - 10 readers