PATENTSCOPE is a free patent search system offered by the World Intellectual Property Organization (WIPO). Users can search 52 million patent documents, including 2.9 million published international patent applications (PCT) and many patent collections from national IP authorities.
The system is continually enhanced, for example through the addition of new patent collections, new functionality or additional user interface languages.
The most recent major step in this evolution was the addition of chemical search capabilities, accomplished using InfoChem’s text- and image-mining technologies. An automated workflow was developed and put into operation that allows real-time, multi-modal chemical text annotation and image recognition.
This talk addresses the technical challenges encountered, such as OCR quality, scalability, performance and parallelization.