Natural Language Processing and Information Retrieval from Unstructured Documents

A Context-Based Free Text Interpreter (with Kurfess, F.)

Natural language is a fundamental aspect of human behavior. In the area of NLP and information retrieval, one of the most challenging topics is the study of the context in which words and sentences convey their meaning. One of my research projects is a context-based free text interpreter (CFTI), a computer-based natural language understanding (NLU) system that can handle text as generated and used by humans within a given context. It tracks the contextual meaning of words and phrases during (and after) the development of an ontology for that context, and subsequently uses this information as a knowledge base for interpreting free text sentences. Two existing language tools, Link Grammar and WordNet, are examined and incorporated into the system.

Read the full paper here...

Facing the Challenges of Managing Unstructured Documents (with Cheng, C.)

Over the last ten years, the increased availability of documents in digital form has contributed to the sheer volume of knowledge and information available to computer users. The World Wide Web has become the biggest digital library available, with more than 1 billion unique indexable web pages [1]. It is increasingly difficult to retrieve valuable information from these documents because of their dynamic nature, fast growth rate, and lack of structure. More importantly, these unstructured documents are useful only if information can be retrieved from them effectively and efficiently. This paper presents the results of a literature survey of current state-of-the-art research and commercial applications.

Read the full paper here...

Ontology-based Semantic Classification of Unstructured Documents (with Cheng, C. and Kurfess, F.)

As more and more knowledge and information become available through computers, a critical capability of systems supporting knowledge management is the classification of documents into categories that are meaningful to the user. In a step beyond the use of keywords, we developed a system that analyzes the sentences contained in unstructured or semi-structured documents and uses an ontology reflecting the domain knowledge to classify the documents semantically. An experimental system has been implemented for the analysis of small documents in combination with a limited ontology; an extension to larger sets of documents and extended ontologies, together with an application to practical tasks, is the focus of ongoing work.
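
To make the idea of sentence-level, ontology-driven classification concrete, here is a minimal sketch. It is not the system described in the paper: the toy ontology, the category names, and the functions `split_sentences` and `classify` are all hypothetical, and simple word overlap stands in for the sentence analysis the abstract refers to.

```python
import re
from collections import Counter

# Hypothetical toy ontology (illustrative only): category -> concept terms.
ONTOLOGY = {
    "finance": {"bank", "loan", "interest", "credit"},
    "medicine": {"patient", "diagnosis", "treatment", "dose"},
}


def split_sentences(text):
    """Split text into sentences on ., ! and ? (a deliberately naive rule)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def classify(text, ontology=ONTOLOGY):
    """Score each category by counting ontology concepts found per sentence.

    Returns (best_category, scores). best_category is None when no
    concept from any category occurs in the text.
    """
    scores = Counter()
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for category, concepts in ontology.items():
            scores[category] += len(words & concepts)
    if not scores or max(scores.values()) == 0:
        return None, dict(scores)
    best = max(scores, key=scores.get)
    return best, dict(scores)


if __name__ == "__main__":
    print(classify("The bank approved the loan. Interest rates are low."))
```

A real ontology would also carry relations between concepts (for example, is-a and part-of links), which this flat term list leaves out.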

Read the full paper here...

Providing Context for Free Text Interpretation (with Assal, H.)

Providing general-purpose natural language processing (NLP) is difficult because natural language sentences are ambiguous, and the ambiguity can be resolved only if the context is well defined. For applications in a specific domain, this problem can be alleviated by offering a model of the application domain as the context for interpreting natural language statements. This paper presents an architecture and an implementation of this concept, and discusses future extensions.
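
As a rough illustration of how a domain model can act as the context that resolves ambiguity, consider the sketch below. It is an assumption-laden toy and not the architecture presented in the paper: the `DOMAIN_MODELS` table, its entries, and the `interpret` function are hypothetical.

```python
# Hypothetical domain models (illustrative only): each maps an ambiguous
# word to the sense it takes inside that application domain.
DOMAIN_MODELS = {
    "logistics": {"port": "harbor facility", "crane": "lifting machine"},
    "networking": {"port": "numbered communication endpoint"},
}


def interpret(sentence, domain, models=DOMAIN_MODELS):
    """Pair each word with its sense in the given domain.

    Words the domain model does not cover are paired with None.
    """
    model = models.get(domain, {})
    result = []
    for token in sentence.lower().split():
        word = token.strip(".,;:!?")
        result.append((word, model.get(word)))
    return result


if __name__ == "__main__":
    print(interpret("The ship reached the port.", "logistics"))
    print(interpret("The server opened the port.", "networking"))
```

The same word, "port", receives a different interpretation depending on which domain model is supplied, which is the effect the abstract attributes to a well-defined context.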

Read the full paper here...