With its unique classification scheme, Content Search Alpha lets you find parts regardless of how they were categorized when they were added to the system. How, you may ask? Read on...
Mike Haley is the Software Architect for Content Search Alpha on Autodesk Labs. Mike shared his insights into the taxonomy that organizes Content Search Alpha.
A taxonomy is just a fancy word for a classification scheme. The Content Search solution uses a unique taxonomy developed as part of the project. Essentially, the Content Search taxonomy solves two problems:
First, when manufacturers make and sell parts, and distributors and aggregators market them, they can use a variety of taxonomies to categorize and describe those parts. Some of these may be industry standards, whereas others may be proprietary. One thing you can be sure of, though: almost everyone will have a different view on the best taxonomy to use. The problem is even more severe in the manufacturing world, where there are many more competing standards.
Second, when users (e.g., architects and engineers) look for parts, they are often trained to think in terms of a particular taxonomy to find parts in the right category. Again, this taxonomy may be an industry standard or their own company's standard. In extreme cases, it could even be the user's own personal way of organizing things.
The outcome is that a user searching for, say, casement windows using taxonomy A in a traditional system may miss many of the really good windows because they were categorized in taxonomy B. Although semantically the user wants all casement windows, he or she will probably get only those that happened to be categorized in the taxonomy being used for the search. Over time, this problem gets worse for a system like Content Search Alpha: as it ingests data from more disciplines and serves users from those disciplines, the number of taxonomies explodes. For Content Search Alpha, we needed a way to rationalize this and make the system adaptable to this constantly evolving situation.
To solve this, Content Search Alpha has an internal taxonomy that we call the Canonical Taxonomy (CanTax for short). No user or data publisher ever sees this taxonomy. It is purely internal, yet it is the taxonomy ultimately used to classify every part Content Search Alpha deals with. In addition to CanTax, Content Search Alpha supports any number of other taxonomies, typically industry standards or widely adopted proprietary schemes, which we call Mapping Taxonomies. The current version of Content Search Alpha has three Mapping Taxonomies: Master Format 2004, UniFormat II, and OmniClass 1.0. More Mapping Taxonomies can be added as the technology preview matures.
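To make the split between the internal CanTax and the user-facing Mapping Taxonomies concrete, here is a minimal Python sketch of a taxonomy registry. The taxonomy names come from the post; the constant and function names (`CANONICAL`, `MAPPING_TAXONOMIES`, `is_mapping_taxonomy`, `user_visible_taxonomies`, `add_mapping_taxonomy`) are hypothetical, and the real system's storage is not described here.

```python
# Hypothetical registry of taxonomies. The names come from the post, but the
# storage format and function names are illustrative assumptions.
CANONICAL = "CanTax"  # internal only: never exposed to users or publishers

MAPPING_TAXONOMIES = {"Master Format 2004", "UniFormat II", "OmniClass 1.0"}


def is_mapping_taxonomy(taxonomy: str) -> bool:
    """True if parts or queries expressed in this taxonomy can be translated."""
    return taxonomy in MAPPING_TAXONOMIES


def user_visible_taxonomies() -> list:
    """Taxonomies offered for browsing; CanTax is deliberately absent."""
    return sorted(MAPPING_TAXONOMIES)


def add_mapping_taxonomy(taxonomy: str) -> None:
    """Register another Mapping Taxonomy as the preview matures."""
    if taxonomy == CANONICAL:
        raise ValueError("CanTax is internal and cannot be a Mapping Taxonomy")
    MAPPING_TAXONOMIES.add(taxonomy)
```

In this sketch, `user_visible_taxonomies()` lists only the Mapping Taxonomies, so CanTax never reaches a browsing UI, while supporting a new standard is a single registration call.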
The database that stores all of these taxonomies also contains mappings from each Mapping Taxonomy to the Canonical Taxonomy. For example, the OmniClass category “23.30.20.17.21.14 - Casement Windows” maps to “08.14 - Windows.Casement Windows” in CanTax. Note that these mappings go only one way: from a Mapping Taxonomy to CanTax, not back.
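The one-way mapping can be pictured as a lookup table keyed by (taxonomy, code). This sketch assumes a simple in-memory dictionary; the names `TO_CANTAX`, `CANTAX_NAMES`, and `to_canonical` are hypothetical, and only the OmniClass entry is taken from the example above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical one-way mapping table: (Mapping Taxonomy, category code) -> CanTax code.
# Only the OmniClass entry comes from the post's example.
TO_CANTAX: Dict[Tuple[str, str], str] = {
    ("OmniClass 1.0", "23.30.20.17.21.14"): "08.14",  # Casement Windows
}

# Hypothetical display names for CanTax codes (internal use only).
CANTAX_NAMES: Dict[str, str] = {"08.14": "Windows.Casement Windows"}


def to_canonical(taxonomy: str, code: str) -> Optional[str]:
    """Translate a Mapping Taxonomy category into its CanTax equivalent.

    Returns None when no mapping exists. No reverse function is provided,
    because the mapping only runs from Mapping Taxonomies into CanTax.
    """
    return TO_CANTAX.get((taxonomy, code))
```

Because the table is keyed only on the Mapping Taxonomy side, there is no reverse lookup. One plausible reason for keeping it one-way, though the post does not say so, is that categories from several taxonomies can point at the same CanTax category, so a reverse mapping would not be unique.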
When Content Search Alpha indexes new part data from a source, it first determines which taxonomy the source uses to categorize its parts. If it is one of the known taxonomies (i.e., a Mapping Taxonomy), each part is re-categorized using the equivalent Canonical Taxonomy category, and that CanTax category is what gets indexed.
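The indexing step might look roughly like the following sketch, which assumes each incoming part is a dict with `id` and `category` keys and that the source's taxonomy has already been determined. The mapping table, the `UF-EXAMPLE` code (a placeholder, not a real UniFormat II code), and `index_parts` are hypothetical. The point it illustrates is that parts are filed under their CanTax code, not their original code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical one-way mapping: (Mapping Taxonomy, category code) -> CanTax code.
# "UF-EXAMPLE" is a placeholder, not a real UniFormat II code.
TO_CANTAX: Dict[Tuple[str, str], str] = {
    ("OmniClass 1.0", "23.30.20.17.21.14"): "08.14",
    ("UniFormat II", "UF-EXAMPLE"): "08.14",
}


def index_parts(
    taxonomy: str, parts: Iterable[dict], index: Dict[str, List[str]]
) -> List[str]:
    """Re-categorize each part into CanTax and add its ID to an inverted index.

    `taxonomy` is the source's taxonomy, assumed to be detected beforehand.
    Returns the IDs of parts that could not be mapped.
    """
    skipped: List[str] = []
    for part in parts:
        cantax_code = TO_CANTAX.get((taxonomy, part["category"]))
        if cantax_code is None:
            skipped.append(part["id"])
            continue
        # File the part under its CanTax code, not its original category code.
        index.setdefault(cantax_code, []).append(part["id"])
    return skipped
```

How the real system handles parts from an unknown taxonomy or an unmapped category is not covered in the post; this sketch simply reports them as skipped.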
On the flip side, when a user browses Content Search Alpha using a particular taxonomy (e.g., Master Format 2004) and selects a category, the server converts that category into the equivalent Canonical Taxonomy category and runs the search against the index using the converted category.
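The query side mirrors the indexing side. In this hypothetical sketch, `browse` translates the user's selected category into CanTax and then looks it up in an index that is already keyed by CanTax codes. `MF-EXAMPLE` is a placeholder, not a real Master Format 2004 number, and the index contents are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical one-way mapping; "MF-EXAMPLE" is a placeholder, not a real
# Master Format 2004 code.
TO_CANTAX: Dict[Tuple[str, str], str] = {
    ("Master Format 2004", "MF-EXAMPLE"): "08.14",
    ("OmniClass 1.0", "23.30.20.17.21.14"): "08.14",
}

# Hypothetical index, keyed by CanTax code as produced at indexing time.
INDEX: Dict[str, List[str]] = {"08.14": ["window-a", "window-b"]}


def browse(taxonomy: str, code: str) -> List[str]:
    """Resolve the user's selected category to CanTax, then search the index."""
    cantax_code: Optional[str] = TO_CANTAX.get((taxonomy, code))
    if cantax_code is None:
        return []  # no mapping for this category: nothing to search in this sketch
    return list(INDEX.get(cantax_code, []))
```

Since both the index and the query are expressed in CanTax, selecting the Master Format 2004 category or the OmniClass category returns the same parts.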
The end result is that three different publishers could provide casement window parts using three distinct taxonomies, and a user could find all three sets of casement windows using any one of those taxonomies, or even a fourth one. This lets the user work in a familiar context without sacrificing the breadth of results.
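Putting both halves together, this sketch simulates the three-publisher scenario. The publishers, part IDs, and the Master Format 2004 and UniFormat II codes are all hypothetical placeholders; only the OmniClass code comes from the post.

```python
from typing import Dict, List, Tuple

# Hypothetical mappings. Only the OmniClass code is taken from the post; the
# Master Format 2004 and UniFormat II codes are placeholders.
TO_CANTAX: Dict[Tuple[str, str], str] = {
    ("OmniClass 1.0", "23.30.20.17.21.14"): "08.14",
    ("Master Format 2004", "MF-EXAMPLE"): "08.14",
    ("UniFormat II", "UF-EXAMPLE"): "08.14",
}

# Three hypothetical publishers, each categorizing casement windows differently:
# name -> (taxonomy, category code, part IDs)
PUBLISHERS: Dict[str, Tuple[str, str, List[str]]] = {
    "publisher-1": ("OmniClass 1.0", "23.30.20.17.21.14", ["p1-window"]),
    "publisher-2": ("Master Format 2004", "MF-EXAMPLE", ["p2-window"]),
    "publisher-3": ("UniFormat II", "UF-EXAMPLE", ["p3-window"]),
}


def build_index() -> Dict[str, List[str]]:
    """Ingest every publisher, filing parts under their CanTax category."""
    index: Dict[str, List[str]] = {}
    for taxonomy, code, part_ids in PUBLISHERS.values():
        cantax_code = TO_CANTAX[(taxonomy, code)]
        index.setdefault(cantax_code, []).extend(part_ids)
    return index


def search(index: Dict[str, List[str]], taxonomy: str, code: str) -> List[str]:
    """Browse with any Mapping Taxonomy; results come from all publishers."""
    cantax_code = TO_CANTAX.get((taxonomy, code))
    if cantax_code is None:
        return []
    return sorted(index.get(cantax_code, []))
```

Browsing by any of the three taxonomies returns all three windows, because every part and every query meets in the same CanTax category (`08.14`).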
Exploring new ways to help customers find the right designs is alive in the lab.