
Test Case - Mapping Between OmniClass and IfcXML

Business / Engineering Problem
In most industries, practitioners use more than one terminology classification or data model structure. For instance, in the architecture, engineering, and construction (AEC) industry, several ontologies describe the semantics of building models, such as the Industry Foundation Classes (IFC), the CIMsteel Integration Standards (CIS/2), and the OmniClass Construction Classification System. For model rebuilding and data exchange purposes, comparison and mapping between heterogeneous ontologies in the same industry is often unavoidable.
Ontology comparison and mapping is commonly performed manually by domain experts who are familiar with one or more industry-specific taxonomies. This can be time-consuming, inefficient, and hard to scale, especially when the experts start from scratch. Automated comparison and mapping based on ontology structure and the linguistic similarity between concepts has therefore grown in popularity in recent years. This research proposes a different approach to ontology mapping: using a document corpus as a medium for semantic similarity comparison.

Figure: Ontology Mapping
Test Case
In the AEC industry today, the demand for Building Information Modeling (BIM) has led to the establishment of various description and classification standards to facilitate data exchange. OmniClass and IfcXML are two of the most commonly used data models for building and construction. OmniClass, consisting of 15 tables, categorizes elements and concepts in the AEC industry and provides a rich vocabulary that practitioners can use in legal documents. It contains a set of object data elements that represent the parts of buildings or processes, together with the relevant information about those parts. IfcXML, which specializes in representing CAD models and work processes, is frequently used by practitioners to build information-rich product and process models and serves as a data format for interoperability among different software. It is a single XML schema file composed of concept terms that are highly hierarchical and cross-linked. This test case focuses on the mapping between OmniClass and IfcXML.
Figure: OmniClass Construction Classification System
Main Idea and Implementation
Based on the intuition that related terms should appear in the same paragraphs or sections, concept comparison and matching by co-occurrence is proposed to map different sets of terms in heterogeneous ontologies. The number of co-occurrences of two concepts in the corpus reveals how close the two topics are and serves as a means of evaluating the relatedness between them.
Mapping Methodology
Preprocessing of the two ontologies, OmniClass and IfcXML, is the necessary first step. The entity concept terms of both ontologies are extracted, unique IDs and suffixes are removed from the concepts, and duplicate concepts are discarded. All of the preprocessed concept terms of OmniClass and IfcXML are then matched against each section of the International Building Code (IBC) XML files. The concept tags <OMNICLASS> and <IFCXML> are inserted into the sections whose text matches the concepts in stemmed form.
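The preprocessing and matching steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the function names, the toy suffix-stripping stemmer (a stand-in for a real stemmer), the placeholder OmniClass code format, and the sample section text are all hypothetical.

```python
import re

def naive_stem(word):
    """Crude suffix stripper; a stand-in for a real stemmer (assumption)."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize_ifc(name):
    """Drop the 'Ifc' prefix and split camel case: 'IfcCurtainWall' -> 'curtain wall'."""
    name = re.sub(r"^Ifc", "", name)
    words = re.findall(r"[A-Z][a-z]*|[a-z]+", name)
    return " ".join(w.lower() for w in words)

def normalize_omniclass(entry):
    """Strip a leading numeric ID (hypothetical format) and lowercase the term."""
    return re.sub(r"^[\d\s\-]+", "", entry).strip().lower()

def stem_tokens(text):
    """Tokenize on letters only and stem every token."""
    return [naive_stem(t) for t in re.findall(r"[A-Za-z]+", text)]

def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))

def tag_sections(sections, concepts):
    """Map each section ID to the sorted list of concepts matched in stemmed form."""
    tagged = {}
    for sec_id, text in sections.items():
        tokens = stem_tokens(text)
        tagged[sec_id] = sorted(
            c for c in concepts if contains_phrase(tokens, stem_tokens(c))
        )
    return tagged

# Made-up section text; only the section number 2209.2 comes from the example in this page.
sections = {
    "2209.2": "Steel decking shall support concrete slabs.",
    "1000.1": "Exits shall be provided.",
}
# Sets discard duplicate concepts after normalization.
omniclass = {normalize_omniclass(e) for e in ["00-00 00 00 Concrete", "00-00 00 01 Steel Decking"]}
ifcxml = {normalize_ifc(n) for n in ["IfcSlab", "steel", "IfcSlab"]}

omni_tags = tag_sections(sections, omniclass)
ifc_tags = tag_sections(sections, ifcxml)
```

Matching on stemmed token sequences rather than raw substrings lets "slabs" match "slab" and "decking" match "deck" without also matching unrelated words that merely contain the same letters.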
In the illustrated example, the concepts "Concrete" and "Steel Decking" from OmniClass and the concepts "IfcSlab" and "steel" from IfcXML are all matched to the same section, 2209.2, of the IBC. This suggests that they may be related in some respect. Their relatedness can be further confirmed by considering whether they also co-occur in other sections.
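Counting co-occurrences across sections, as the example above suggests, might look like the following sketch. The function name and the second section ID ("X.1") are hypothetical, and the tags are toy data rather than results from the IBC corpus.

```python
from collections import Counter
from itertools import product

def cooccurrence_counts(omni_tags, ifc_tags):
    """Count, for every (OmniClass, IfcXML) concept pair, the sections tagged with both."""
    counts = Counter()
    for sec_id in omni_tags.keys() & ifc_tags.keys():
        for pair in product(omni_tags[sec_id], ifc_tags[sec_id]):
            counts[pair] += 1
    return counts

# Toy tags: section ID -> concepts matched to that section.
omni_example = {"2209.2": ["Concrete", "Steel Decking"], "X.1": ["Steel Decking"]}
ifc_example = {"2209.2": ["IfcSlab", "steel"], "X.1": ["IfcSlab"]}

pair_counts = cooccurrence_counts(omni_example, ifc_example)
```

In this toy data, "Steel Decking" and "IfcSlab" share two sections while "Concrete" and "IfcSlab" share only one, so the former pair would be ranked as more closely related.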
Relatedness Analysis
The number of sections in which two concepts co-occur, and the number of times the two concepts are matched to each of those sections, indicate the semantic similarity between the two concepts. Three relatedness measures are used for concept comparison between OmniClass and IfcXML: the cosine similarity measure, the Jaccard similarity coefficient, and the market basket model. Cosine similarity measures the similarity between two n-dimensional vectors by the cosine of the angle between them. The Jaccard similarity coefficient is a set-theoretic statistic that measures the extent of overlap between two n-dimensional vectors relative to their union. The market basket model is a probabilistic data-mining technique for finding item-to-item correlations.
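As a rough sketch of the three measures, each concept can be represented as a mapping from section ID to the number of matches in that section. The representation, the function names, and the sample counts are assumptions. In particular, the page does not say which market basket statistic is used, so rule confidence (the fraction of sections containing one concept that also contain the other) is shown here only as one common choice.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two sparse count vectors keyed by section ID."""
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    """Shared sections divided by all sections in which either concept appears."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0

def confidence(a, b):
    """Market-basket-style confidence: share of a's sections that also contain b."""
    return len(a.keys() & b.keys()) / len(a) if a else 0.0

# Hypothetical match counts per section for two concepts.
roof_decking = {"2209.2": 2, "1504.1": 1}
ifc_slab = {"2209.2": 1, "1607.3": 1}

scores = {
    "cosine": cosine(roof_decking, ifc_slab),
    "jaccard": jaccard(roof_decking, ifc_slab),
    "confidence": confidence(roof_decking, ifc_slab),
}
```

Note that cosine similarity uses the match counts, whereas the Jaccard and confidence sketches only look at whether a concept appears in a section at all. Confidence is also asymmetric, so swapping the two arguments can change the score.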
Related Concepts That Rarely Co-occur
In most circumstances, related concepts can be captured by treating each section as an independent dimension in the co-occurrence comparison. However, some related concepts rarely co-occur in the same sections. Examples are concepts in an Is-A relationship (e.g. "concrete" and "building materials") and concepts within the same scope (e.g. "steel" and "concrete"). The hierarchical structure of the corpus is therefore taken into account in order to capture concepts that are related but do not co-occur.
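One way the corpus hierarchy could be taken into account is to propagate each match to the enclosing sections with a reduced weight, so that concepts in sibling subsections overlap at the parent level. The page does not describe the actual mechanism, so everything in this sketch is an assumption: the dot-separated section numbering, the function names, the discount factor of 0.5, and the sample data.

```python
def ancestors(section_id):
    """Enclosing sections, nearest first: '2209.2.1' -> ['2209.2', '2209']."""
    parts = section_id.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]

def expand_with_ancestors(section_counts, weight=0.5):
    """Copy the counts and add each ancestor, discounted by `weight` per level."""
    expanded = dict(section_counts)
    for sec_id, count in section_counts.items():
        w = count
        for anc in ancestors(sec_id):
            w *= weight
            expanded[anc] = expanded.get(anc, 0.0) + w
    return expanded

# Hypothetical matches: the two concepts never share a section directly.
steel = {"2209.2": 1}
concrete = {"2209.5": 1}

steel_expanded = expand_with_ancestors(steel)
concrete_expanded = expand_with_ancestors(concrete)
shared = steel_expanded.keys() & concrete_expanded.keys()
```

After expansion, both concepts carry weight in the parent section 2209, so a similarity measure computed on the expanded vectors is no longer zero for this pair.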
Besides identical concepts such as "curtain walls" from OmniClass and "IfcCurtainWall" from IfcXML, this co-occurrence analysis also captures related concepts that conventional term-matching techniques cannot match, for instance "roof decking" from OmniClass and "IfcSlab" from IfcXML. In the test case, the market basket model outperforms the other two relatedness analysis approaches in terms of both root mean square error (RMSE) and F-measure, a combination of precision and recall. In particular, the market basket model shows the highest recall and a moderately high precision.
Root Mean Squared Error (RMSE)
Evaluation results of the three measures using RMSE
Evaluation results of the three measures using F-Measure
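For reference, the two evaluation metrics named above can be computed as in the sketch below. The function names and sample data are hypothetical, and the balanced F1 form (the harmonic mean of precision and recall) is assumed, since the page does not state which weighting of the F-measure was used.

```python
import math

def rmse(predicted, reference):
    """Root mean square error between predicted and reference relatedness scores."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))

def precision_recall_f1(predicted_pairs, true_pairs):
    """Precision, recall, and balanced F-measure over sets of concept pairs."""
    tp = len(predicted_pairs & true_pairs)
    precision = tp / len(predicted_pairs) if predicted_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Made-up mappings: two of three predicted pairs agree with the reference mapping.
predicted = {("roof decking", "IfcSlab"), ("curtain walls", "IfcCurtainWall"), ("concrete", "IfcDoor")}
reference = {("roof decking", "IfcSlab"), ("curtain walls", "IfcCurtainWall"), ("steel", "IfcBeam")}

error = rmse([1.0, 0.0], [0.0, 0.0])
precision, recall, f1 = precision_recall_f1(predicted, reference)
```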
This research proposes a new approach to comparing and mapping heterogeneous ontologies in order to achieve interoperability between data models. It enables information exchange and sharing among project stakeholders. Once the mapping between ontologies is complete, data can also be updated and checked for consistency even when the data sources use different data models.
Should you have any comments or suggestions, please contact Jack Cheng at cpcheng@stanford.edu.
All rights reserved. Engineering Informatics Group, Stanford University.