
Quality of Classification


dpulido

Presentation Transcript


  1. Quality of Classification

  2. What to achieve?
     Recall = (# retrieved relevant documents) / (# existing relevant documents) = 1
     • Optimum: all documents pertaining to a specific technical area (concept) are found by classification search
     • Priority 1: for concepts defined in IPC, documents have all appropriate symbols
     • Priority 2 (efficiency): documents have no inappropriate symbols
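The recall ratio on this slide can be sketched in a few lines of Python. The document identifiers are hypothetical, and reading Priority 2 (no inappropriate symbols) as what information retrieval calls precision is an interpretation, not something the slide states.

```python
def recall(retrieved, relevant):
    """Share of the existing relevant documents that the search retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when no relevant documents exist")
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Share of the retrieved documents that are relevant (assumed reading of Priority 2)."""
    retrieved = set(retrieved)
    if not retrieved:
        raise ValueError("precision is undefined when nothing was retrieved")
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical identifiers: four relevant documents exist, the search finds
# three of them plus one irrelevant document.
relevant_docs = {"D1", "D2", "D3", "D4"}
retrieved_docs = {"D1", "D2", "D3", "D5"}

print(recall(retrieved_docs, relevant_docs))     # 0.75
print(precision(retrieved_docs, relevant_docs))  # 0.75
```

The optimum on the slide corresponds to `recall(...) == 1.0`: every relevant document carries the symbol being searched.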

  3. Phenomenology of quality issues
     • document is unclassified
     • has wrong / inappropriate classification
     • has outdated / invalid classification
     • non-exhaustive / incomplete classification
       > appropriate symbols are missing
       > given symbols are not specific enough
     • varying classifications of family members
     • excessive classification

  4. Different aspects
     • individual document / publication
       - classification by publishing IPO
       - and by other IPOs, e.g. EPO > ECLA, DPMA > "ICP", JPO, … ?
       > examiners create their own search files
     • different publication levels:
       - unexamined (unsearched) applications
       - granted patents
     • families: in MCD, reclassification at family level
     • data in different databases

  5. Unclassified documents
     Published before 1.1.2006: many documents in MCD still unclassified / not reclassified:
     • 92% of all documents in MCD*
     • 87% of all documents of EPO members
     Published after 1.1.2006:
     • 97% of all documents in MCD
     • 91% of all WO
     Each week, 6-8% of WO publications are not classified at all
     *cf. IPC/CE/40/4

  6. Unclassified WO documents

  7. Unclassified WO documents
     • Publication week 50 (13.12.2007): 260 of 3272 (7.9%)
     • By ISA: EP 218 (84%), KR 27 (10%), AU 5, US 5, RU 2, SE 2, CA 1
     • By Receiving Office: US 177, IB 31, EP 26, GB 9, KR 3, DE 2, FR 2, IL 2, …
     Lesson: There are still many documents without any valid classification
     > Top priority: All documents should have at least one valid classification
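The percentages quoted for publication week 50 can be re-derived from the raw counts. The following is a small check using only the figures on the slide; the variable names are illustrative.

```python
# Counts of unclassified WO documents per ISA, as listed on the slide.
isa_counts = {"EP": 218, "KR": 27, "AU": 5, "US": 5, "RU": 2, "SE": 2, "CA": 1}

total_published = 3272   # WO publications in week 50
total_unclassified = 260  # of which unclassified


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The ISA breakdown accounts for every unclassified document.
assert sum(isa_counts.values()) == total_unclassified

print(percent(total_unclassified, total_published))  # 7.9
for isa, count in isa_counts.items():
    print(isa, count, f"{100 * count / total_unclassified:.0f}%")
```

The Receiving Office list on the slide is truncated, so its counts are not expected to sum to 260 and are left out of the check.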

  8. Wrong classification (courtesy of M. Meier (Audi))
     • A61N 1/00 Electrotherapy; Circuits therefor

  9. Wrong classification (courtesy of M. Meier (Audi))
     • B60K Arrangement or mounting of propulsion units or of transmissions in vehicles
     Lesson: Completely wrong classifications do occur

  10. Wrong classification
     • Example: WO2007126503
       - ISR: G01L 19/02
       - Espacenet: G10L 19/02
     Lesson: Typos may occur; flaws of concordance tables
     Wrong classifications:
     > difficult to investigate because difficult to find
     > feedback by users needed
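A discrepancy like G01L versus G10L is the kind of error a simple cross-check between two data sources could flag. The sketch below is one possible heuristic, not a method described in the presentation: it reports whether two symbols differ only by a swap of two adjacent characters. The function names are hypothetical.

```python
def normalize(symbol):
    """Remove whitespace and upper-case the symbol so formatting differences do not matter."""
    return "".join(symbol.split()).upper()


def is_adjacent_transposition(symbol_a, symbol_b):
    """True if the two symbols differ only by swapping two neighbouring characters."""
    a, b = normalize(symbol_a), normalize(symbol_b)
    if len(a) != len(b) or a == b:
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


# Symbols for WO2007126503 as quoted on the slide.
print(is_adjacent_transposition("G01L 19/02", "G10L 19/02"))  # True
# A genuine difference of opinion (different group) is not flagged as a typo.
print(is_adjacent_transposition("G01L 19/02", "G01L 19/04"))  # False
```

Such a check only narrows down candidates; deciding which of the two symbols is correct still needs the document itself.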

  11. Outdated / invalid classification
     • Business methods: G06F 17/60 > G06Q [2006.01]
       - in Espacenet: 0 WO docs with a:G06F17/60
       - in Patentscope: 1506 WO docs with G06F17/60
       - e.g. WO2007004271 reclassified in Espacenet only to ECLA
     Lesson: Classification data may be different in different databases
     • in Espacenet: many documents outside the PCT minimum documentation are not reclassified
       - e.g. CZ, UY, NZ, AR
     • not all PCT minimum documentation is reclassified
       - e.g. only 678 of 14543 KR docs reclassified in ECLA/IPC
     Lesson: Reclassification following revision is still incomplete

  12. Outdated / invalid classification
     • Traditional medicine: A61K 35/78 > A61K 36/.. [2006.01]
       - in Espacenet: 10413 docs still have 35/78 as ECLA
       - only 7412 thereof have 36/..
     Lesson: Reclassification to valid IPC incomplete
     • Further example: WO1998039019
       - in Espacenet: A61K 36/02 as IPC-AL, A61K 35/80 as ECLA
       - in Patentscope: A61K 35/80 as IPC
     Lesson: Classification data may be different in different databases

  13. Varying classifications in family
     • Example: Aircraft cargo loading logistics system
       - US 2005246132 A1 (3.11.2005)
       - US 7100827 B2 (5.9.2006)
       - DE 102005019194 A1 (24.11.2005)
       - FR 2871269 A1 (9.12.2005)
     Lesson: Classification of granted patents may be very different
     Lesson: Assessment of main classification varies

  14. Varying classifications in family
     Lesson: classification data from subsequent publications may not be in MCD
     Lesson: some reclassification data may not be in MCD; they exist as ECLA only

  15. Varying classifications of single document
     • Example: WO2007126503
       - ECLA: G01L 19/00B (roll up to IPC: G01L 19/00)
       - IPC: G01L 19/02
     Lesson: different views of different classifiers
     • US7258017 B1 (granted family member)
       - IPC: G01L 19/04
     Lesson: classification of granted patents may be different
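The roll-up from ECLA to IPC mentioned in the example (G01L 19/00B to G01L 19/00) can be illustrated with a small sketch. It assumes, as a simplification, that an ECLA symbol is an IPC-style symbol optionally followed by a suffix that starts with a letter; real ECLA suffixes may be richer than this pattern covers, and the function name is hypothetical.

```python
import re

# Simplified, assumed pattern: subclass, main group, subgroup, optional letter-led suffix.
_ECLA_PATTERN = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d+)/(\d+)([A-Z][A-Z0-9]*)?$")


def roll_up_to_ipc(ecla_symbol):
    """Drop the letter-led ECLA suffix and return the underlying IPC-style symbol."""
    match = _ECLA_PATTERN.match(ecla_symbol.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised symbol: {ecla_symbol!r}")
    subclass, main_group, subgroup, _suffix = match.groups()
    return f"{subclass} {main_group}/{subgroup}"


print(roll_up_to_ipc("G01L 19/00B"))  # G01L 19/00
print(roll_up_to_ipc("G01L 19/02"))   # G01L 19/02 (no suffix, unchanged)
```

For WO2007126503 the rolled-up ECLA symbol (G01L 19/00) still differs from the IPC symbol given (G01L 19/02), which is the point the slide makes about differing views of classifiers.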

  16. Current problems in classification (I): IPC consistency (by courtesy of H. Wongel)
     • KR20070005367 A (Prio.: KR20050060661): Multifocal lens and manufacture method thereof; IPC (AL): G02B3/10
     • JP2007017937 A (Prio.: KR20050060661): Multifocal lens and method for manufacturing the same; IPC (AL): G02F1/13; G02B3/14; G02F1/1334
     • US2007008599 A (Prio.: KR20050060661): Multifocal lens and method for manufacturing the same; IPC (AL): G02B5/32
     • CN1892258 A (Prio.: KR20050060661): Multifocal lens and method for manufacturing the same; IPC (AL): G02B3/10
     • EP1742100 A1 (Prio.: KR20050060661): Multifocal lens and method for manufacturing the same; IPC (AL): G02F1/1334
     Lesson: classifiers may have different views of the subject matter to be classified, or interpret IPC groups differently
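To make the inconsistency in this family concrete, the symbols listed above can be pooled and counted per family member. This is only an illustrative sketch using the data from the slide; the data structure and variable names are assumptions.

```python
from collections import Counter

# IPC (AL) symbols per family member, copied from the slide.
family = {
    "KR20070005367 A": ["G02B3/10"],
    "JP2007017937 A": ["G02F1/13", "G02B3/14", "G02F1/1334"],
    "US2007008599 A": ["G02B5/32"],
    "CN1892258 A": ["G02B3/10"],
    "EP1742100 A1": ["G02F1/1334"],
}

# Count in how many family members each symbol occurs.
counts = Counter(symbol for symbols in family.values() for symbol in set(symbols))

pooled = sorted(counts)  # union of all symbols in the family
shared_by_all = [s for s, n in counts.items() if n == len(family)]

for symbol in pooled:
    print(f"{symbol}: {counts[symbol]} of {len(family)} members")
print("shared by all members:", shared_by_all)
```

Five offices assign five distinct symbols to the same invention, no symbol is shared by all members, and the most frequent ones appear in only two of five publications.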

  17. Non-exhaustive classification
     • Example: Secondary scheme A01P [2006.01]
       - "Biocidal, pest repellant, … activity of chemical compounds": not in ECLA!
     • Espacenet:
     Lesson: incompatibility of IPC and ECLA may cause non-exhaustive classification

  18. Non-exhaustive classification
     • Example: A61K 36/..
       - ECLA: 22440 documents
       - IPC: only 17847 thereof have a:A61K 36/..
     Lesson: relevant classifications may not be given / available as IPC
     • Example: EP1881839
       - ECLA: A61K 36/487
       - IPC: A61K 36/00
     • Example: C12Q 1/68
       - Espacenet: > 100,000 docs
       - ECLA: > 40 subgroups
       - IPC: 0 subgroups
     Lesson: classifications could be more specific

  19. Causes / sources of deficiencies
     • "wrong" or varying intellectual classification:
       - rules too complicated
       - drawbacks of classification scheme (too much overlap)
       - interpretation of subject matter
       - differing national practice
       - lack of expertise or diligence; time pressure
     • granted claims may differ
     • incompatibility ECLA - IPC; USPC concordance tables
     • lack or delay of reclassification:
       - insufficient resources for intellectual reclassification
     • data exchange / management problems
     • data input (typos)

  20. Options for improvement
     • on IPO level:
       - allocate resources
       - adapt / harmonize classification practice / training
       - develop classification assistance tools
     • on user level:
       - knowing deficiencies > adapt search strategies
     • on IPC level:
       - improve user-friendliness (e.g. definitions)
       - simplify IPC scheme, rules
     More liberal approach when classifying? Is one more symbol better than one symbol missing?
     Do we need to be worried about varying classifications?

  21. Options for improvement
     On MCD / database level:
     • crosscheck content of databases
     • pooling / compiling of classification data (in one searchable field / on family level?) of
       - classification data of family members
       - subsequent publications
       - other sources (DE: ICP, …)
     • processing such compilations of classifications of different origin, e.g.:
       compare classification of subsequent publications (A, B, ..)
       > create "trusted" classifications (e.g. class (A) = class (B))?
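The idea of "trusted" classifications can be sketched as a set comparison between the symbols of an A publication and those of the corresponding B publication. The presentation only raises this as a question, so the rule below (a symbol is trusted if both publications carry it) is one possible reading, and the symbol sets are hypothetical.

```python
def trusted_symbols(class_a, class_b):
    """Symbols assigned to both the A and the B publication (assumed 'trusted' rule)."""
    return set(class_a) & set(class_b)


def disputed_symbols(class_a, class_b):
    """Symbols assigned to only one of the two publications."""
    return set(class_a) ^ set(class_b)


# Hypothetical symbol sets for an application (A) and its granted patent (B).
class_a = {"G01L 19/02", "G01L 19/00"}
class_b = {"G01L 19/02", "G01L 19/04"}

print(sorted(trusted_symbols(class_a, class_b)))   # ['G01L 19/02']
print(sorted(disputed_symbols(class_a, class_b)))  # ['G01L 19/00', 'G01L 19/04']
```

A pooled search field would use the union of both sets to maximise recall, while the trusted subset could be ranked higher or used for quality statistics.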

  22. Learn from / go Web 2.0?
     • "Folksonomy", "social tagging", "cooperative, collaborative classification"
       > include broader user community? e.g. any searcher?
       > implement feedback channels?

  23. Are you satisfied with classification in A61N 1/00? Yes / No
     Would you like to suggest further classifications: .......................
     Click opens
     Submit

  24. Learn from / go Web 2.0?
     • "Folksonomy", "social tagging", "cooperative, collaborative classification"
       > include broader user community
       > compile varying views, i.e. classifications
     • process such data; create "trusted" classifications
     • broader participation in scheme development, in particular definitions?
     Tagging of IPC entries?
     Thank you

  25. Top priority: all documents should have at least one valid classification
     • Priority 1: documents have all appropriate symbols
     • Priority 2: documents have no inappropriate symbols
     • More liberal approach when classifying? Is one more symbol better than one symbol missing?
     • Do we need to be worried about varying classifications?
     • Include broader user community? e.g. any searcher?
     • Implement feedback channels?
     • Create "trusted" classifications (e.g. class (A) = class (B))?
