Natural language processing algorithms and information extraction methods have proven to be valuable tools for helping humans structure, aggregate, and manage large amounts of textual information in several domains. Patent claims, although subject to a number of rigid constraints and therefore pressed into predictable structures, are written in a highly domain-specific and almost artificial language on which common information extraction and retrieval methods tend to perform poorly. This work presents a rule-based approach for decomposing patent claims into smaller parts that provide a basis for further analysis. Because claims are drafted according to very precise syntactic and semantic rules, they contain a large number of recurring grammatical patterns. A set of rules based on linguistic analysis is used to identify and extract these patterns. The extracted claim parts are organized in a tree structure in order to retain the information on how they relate to each other. An algorithm is proposed for automatically reorganizing and then visualizing this tree structure to improve the readability of claims. The evaluation of the method shows that rule-based patent claim decomposition is feasible and provides promising results in terms of reducing the length and complexity of patent claims.
The evaluation also shows that the decomposition method can ease the application and improve the performance of existing information extraction tools.
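To make the idea of rule-based decomposition into a tree more concrete, the following is a minimal illustrative sketch, not the rule set used in this work. It assumes three hypothetical surface-level rules (a transition phrase such as "comprising", semicolon-separated elements, and "wherein" clauses) expressed as regular expressions, whereas the approach described above relies on rules derived from linguistic analysis. The names `ClaimNode`, `decompose`, and `render` are invented for this example.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClaimNode:
    """One extracted claim part; children are the parts that depend on it."""
    text: str
    children: List["ClaimNode"] = field(default_factory=list)


# Hypothetical rules (assumptions for illustration, not the paper's rules).
TRANSITION = re.compile(r"\b(comprising|consisting of|including)\s*:?\s*", re.IGNORECASE)
ELEMENT_SEP = re.compile(r"\s*;\s*(?:and\s+)?")
WHEREIN = re.compile(r",?\s*\bwherein\b\s*", re.IGNORECASE)


def decompose(claim: str) -> ClaimNode:
    """Split a claim into preamble, elements, and 'wherein' refinements."""
    claim = claim.strip().rstrip(".")
    parts = TRANSITION.split(claim, maxsplit=1)
    if len(parts) < 3:
        # No transition phrase found: keep the claim as a single node.
        return ClaimNode(claim)
    preamble, transition, body = parts
    root = ClaimNode(f"{preamble.strip(' ,')} {transition}")
    for element in ELEMENT_SEP.split(body):
        if not element.strip():
            continue
        head, *refinements = WHEREIN.split(element.strip())
        node = ClaimNode(head.strip())
        # Each 'wherein' clause becomes a child of the element it refines.
        node.children = [ClaimNode("wherein " + r.strip()) for r in refinements if r.strip()]
        root.children.append(node)
    return root


def render(node: ClaimNode, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one claim part per line."""
    lines = ["  " * depth + node.text]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    example = ("A device comprising: a sensor; and a controller, "
               "wherein the controller is coupled to the sensor.")
    print("\n".join(render(decompose(example))))
```

In this sketch, indentation stands in for the visualization step: each level of the tree is printed one step further to the right, so that a long claim is read as a sequence of short, related parts.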