This document summarizes work to develop an improved search filter to more rapidly identify reports of randomized controlled trials (RCTs) in Embase for inclusion in the Cochrane Central Register of Controlled Trials (CENTRAL). Methods included developing and validating a sensitive search filter using reference sets of known RCTs. The updated 2015 filter identified RCTs with over 97.6% sensitivity compared to the previous Cochrane filter. Future work includes exploring text mining and crowdsourcing to further improve identification of RCTs for inclusion in CENTRAL.
Improving Access to RCT Reports from Embase
1. Providing Consultancy &
Research in Health Economics
Julie Glanville, York Health Economics Consortium, UK
Gordon Dooley, Metaxis, UK
Anna Noel-Storr, Cochrane Dementia and Cognitive Improvement Group
Ruth Foxlee, Cochrane Editorial Unit
October 2015
Improving rapid access to reports of
RCTs from Embase: innovative
methods to enhance the Cochrane
Central Register of Controlled Trials
(CENTRAL)
2. Providing Consultancy &
Research in Health Economics
Presentation Overview
Background
Objectives
Methods
Results
The future
3. Background
Cochrane systematic reviews rely on the efficient identification of
research evidence, specifically evidence from randomised
controlled trials (RCTs) and quasi randomised studies.
The largest single source of RCTs is the Cochrane Central Register
of Controlled Trials (CENTRAL)
CENTRAL was mainly populated with records from Medline, but
also contained records from Embase
Collaboration identified the need for improved rapid identification of
trials from Embase for inclusion in CENTRAL
4. The project and its
objectives
The Cochrane Collaboration commissioned the Embase update
project in March 2013
Project is undertaken by a consortium of three organisations
the Cochrane Dementia and Cognitive Improvement Group
Metaxis, UK
York Health Economics Consortium, University of York, UK
Objectives
To identify reports of RCTs and controlled clinical trials from
Embase for more rapid availability in CENTRAL
Today I will report on the development of the bespoke search filter
to identify the trials
5. Methods, 1
We developed and validated a sensitive search filter to identify
reports of RCTs
A reference standard of 10,000 randomly selected relevant Embase
reports of RCTs and quasi-RCTs already available in CENTRAL
was compiled.
published 2000-2010
Used Simstat and WordStat to identify terms, phrases and
grouped terms within that reference standard set of records
which could be tested in filters
6. Methods, 2: techniques for
identifying candidate terms
The frequency of terms which appeared in more than 10 records.
Terms were analysed by their location within a record: title,
abstract, EMTREE headings. Also all terms (independent of their
location within a record) were analysed by frequency.
The WordStat phrase finding option was used to identify phrases
which appeared in more than 10 records.
Case occurrence and term frequency–inverse document frequency
(tf*idf) were tested.
WordStat clustering option to identify terms which form groups, i.e.
words which often appear in close proximity to each other.
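The tf*idf weighting described above can be sketched in a few lines of Python. This is a generic illustration with made-up record fragments, not the project's actual Simstat/WordStat pipeline:

```python
import math
from collections import Counter

def tfidf(records):
    """Compute tf*idf scores for each term in each record.

    tf rises with a term's count in a record; idf discounts terms
    that appear across many records in the set.
    """
    n = len(records)
    tokenized = [rec.lower().split() for rec in records]
    # Document frequency: in how many records does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({term: count * math.log(n / df[term])
                       for term, count in tf.items()})
    return scores

# Hypothetical abstract fragments standing in for Embase records.
records = [
    "randomized controlled trial of aspirin",
    "randomized crossover study of placebo",
    "case report of aspirin toxicity",
]
scores = tfidf(records)
# "randomized" appears in two of three records, so it scores lower
# than a term unique to one record, such as "crossover".
```

A term present in every record (here "of") scores zero, which is exactly the offsetting effect described in the notes: very common words are down-weighted regardless of how often they occur within a single record.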
7. Methods, 3: testing and
validation
Draft strategies were tested on a second set of 10,000 randomly
selected Embase RCT records from CENTRAL
The best candidate filter was validated against a third set of 10,000
randomly selected Embase RCT records from CENTRAL
We also assessed the performance of the filter against the previous
Cochrane filter
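Validation against a reference standard reduces to a set intersection: sensitivity is the fraction of known RCT records that the filter retrieves. A minimal sketch, using hypothetical accession numbers rather than real Embase records:

```python
def sensitivity(retrieved_ids, reference_ids):
    """Fraction of reference-standard records that the filter retrieves."""
    reference = set(reference_ids)
    found = reference & set(retrieved_ids)
    return len(found) / len(reference)

# Hypothetical accession numbers: the filter finds 9,760 of a
# 10,000-record reference standard, i.e. 97.6% sensitivity.
reference = [f"rec{i}" for i in range(10_000)]
retrieved = reference[:9_760] + ["recX1", "recX2"]  # plus non-reference hits
print(sensitivity(retrieved, reference))  # 0.976
```

Note that extra retrieved records outside the reference standard do not affect sensitivity; they only affect precision and the number needed to read.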
8. Methods, 4: 2015 revision
Cochrane 2014 strategy was revised and a range of exclusion
terms were added
These were identified from the rejected studies
Subject terms and also animal terms
The impact of the exclusions was tested
The revised strategy was adopted from February 2015 onwards
In summer 2015 we developed an additional filter specifically to
remove animal studies presented as conference papers
9. Results (reference standard 3)
The validated search filter identifies reports of RCTs in
Embase with over 97.6% sensitivity
97.6% in records published in 2002 (reference standard 3)
100% in records published in 2010 (reference standard 3)
Number needed to read
156 (records published in 2001)
400 (records published in 2010)
10. Embase Filter (Ovid
interface) 2014
1. Randomized controlled trial/
2. Controlled clinical study/
3. 1 or 2
4. Random$.ti,ab.
5. randomization/
6. intermethod comparison/
7. placebo.ti,ab.
8. (compare or compared or comparison).ti.
9. ((evaluated or evaluate or evaluating or
assessed or assess) and (compare or
compared or comparing or
comparison)).ab.
10. (open adj label).ti,ab.
11. ((double or single or doubly or singly) adj
(blind or blinded or blindly)).ti,ab.
12. double blind procedure/
13. parallel group$1.ti,ab.
14. (crossover or cross over).ti,ab.
15. ((assign$ or match or matched or
allocation) adj5 (alternate or group$1 or
intervention$1 or patient$1 or subject$1
or participant$1)).ti,ab.
16. (assigned or allocated).ti,ab.
17. (controlled adj7 (study or design or
trial)).ti,ab.
18. (volunteer or volunteers).ti,ab.
19. human experiment/
20. trial.ti.
21. or/4-20
22. 21 not 3
11. Process
An analysis of the records retrieved resulted in a tiered
record assessment process
The most obvious RCT reports are fast-tracked into CENTRAL
Animal studies are set to one side for team assessment
The less obvious RCT records are assessed for relevance by
internet crowdsourcing
Record screening software written by Metaxis
Between two and six people assess whether a record is really a
report of an RCT
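The slides do not specify how the two-to-six crowd assessments are combined into a decision. One plausible escalation rule, purely an assumption for illustration, is to stop as soon as the minimum number of assessors agree unanimously, and otherwise keep collecting votes up to the maximum and fall back to a two-thirds majority:

```python
def consensus(votes, min_votes=2, max_votes=6):
    """Combine yes/no crowd votes into 'RCT', 'Reject' or 'Unsure'.

    Assumed rule (not stated in the source): stop early if the first
    min_votes assessors all agree; otherwise collect up to max_votes
    and require a two-thirds majority, else return 'Unsure'.
    """
    tally = []
    for vote in votes[:max_votes]:
        tally.append(vote)
        if len(tally) >= min_votes and len(set(tally)) == 1:
            return "RCT" if tally[0] else "Reject"
    yes = sum(tally)
    if yes >= 2 * len(tally) / 3:
        return "RCT"
    if (len(tally) - yes) >= 2 * len(tally) / 3:
        return "Reject"
    return "Unsure"

print(consensus([True, True]))                  # two assessors agree: RCT
print(consensus([True, False, False, False]))   # clear majority: Reject
```

Records that never reach a clear majority end up in the 'Unsure' pile, which matches the "Screened Unsure" counts reported later in the deck.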
12. Performance against
original Cochrane filter
The Cochrane 2014 filter found 71,448 records that were not
retrieved by the original Cochrane filter:
1000 of the most recent records were obtained for assessment.
9.1% were possibly reports of CCTs or RCTs
If this percentage is extrapolated to the 71,448 unique records retrieved by
the Cochrane 2014 filter, then around 6,500 extra reports of RCTs might be
identified by this filter
The original filter found 988/1000 records that were not retrieved by
the Cochrane 2014 filter: all of these records were downloaded.
3% of these records were possibly reports of controlled clinical trials
The records found by both filters totalled 33,360.
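The roughly 6,500 figure above is a straightforward extrapolation of the 9.1% sample rate to the full set of unique records:

```python
unique_records = 71_448   # found only by the Cochrane 2014 filter
sample_rct_rate = 0.091   # 9.1% of a 1,000-record sample were possible RCTs/CCTs

# Extrapolate the sampled rate to the whole unique-record set.
estimated_extra_rcts = unique_records * sample_rct_rate
print(round(estimated_extra_rcts))  # ~6502, reported as roughly 6,500
```

The estimate assumes the 1,000 most recent records are representative of all 71,448, which is why the slide hedges with "might be identified".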
13. Cochrane 2015 filter
The following two slides show the search
terms which are excluded from the results
of the Cochrane 2014 filter
1. Cochrane 2014 filter
2. Exclusions (2015)
3. 1 NOT 2
14. Cochrane 2015 filter
exclusions, 1
(random$ adj sampl$ adj7 ("cross section$" or questionnaire$1 or
survey$ or database$1)).ti,ab. not (comparative study/ or controlled
study/ or randomi?ed controlled.ti,ab. or randomly assigned.ti,ab.)
(5813)
Cross-sectional study/ not (randomized controlled trial/ or controlled
clinical study/ or controlled study/ or randomi?ed controlled.ti,ab. or
control group$1.ti,ab.) (100831)
(((case adj control$) and random$) not randomi?ed controlled).ti,ab.
(10405)
(Systematic review not (trial or study)).ti. (44089)
(nonrandom$ not random$).ti,ab. (11950)
"Random field$".ti,ab. (1294)
(random cluster adj3 sampl$).ti,ab. (703)
15. Cochrane 2015 filter
(review.ab. and review.pt.) not trial.ti. (480641)
"we searched".ab. and (review.ti. or review.pt.) (13032)
"update review".ab. (64)
(databases adj4 searched).ab. (11423)
(rat or rats or mouse or mice or swine or porcine or murine or sheep
or lambs or pigs or piglets or rabbit or rabbits or cat or cats or dog
or dogs or cattle or bovine or monkey or monkeys or trout or
marmoset$1).ti. and animal experiment/ (819059)
Animal experiment/ not (human experiment/ or human/) (1669138)
((In vitro or invitro) not (invivo or "in vivo")).ti. (239064)
or/1-14 (2553242)
16. Embase processing
January 2014-end Jan 2015 using Cochrane 2014 filter
February 2015-July 2015 using revised filter
                                      Jan 2014 to    Feb 2015 to
                                      31 Jan 2015    July 2015 inclusive
Total retrieved                            153610          78516
Records sent directly into CENTRAL          54282           9607
Screened RCT or CCT                          4324           4515
Screened Reject                             94095          63900
Screened Unsure                               909            494
17. Study identification:
precision
                                      Jan 2014 to    Feb 2015 to
                                      31 Jan 2015    July 2015 inclusive
Precision: all records                     38.15%         17.99%
Precision: screened records                 4.55%          7.01%
NNR: all RCT/CCT records                     2.62           5.56
NNR: screened records only                  21.97          14.26
January 2014-end Jan 2015 using Cochrane 2014 filter
February 2015-July 2015 using revised filter
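The all-records precision and NNR figures follow directly from the processing counts on the previous slide, taking records sent directly into CENTRAL plus screened RCT/CCT records as the relevant set:

```python
def precision_and_nnr(relevant, total):
    """Precision = relevant / total; NNR (number needed to read) = 1 / precision."""
    precision = relevant / total
    return precision, 1 / precision

# Jan 2014 - Jan 2015: 54,282 direct-to-CENTRAL + 4,324 screened RCT/CCT
# out of 153,610 records retrieved.
p, nnr = precision_and_nnr(54_282 + 4_324, 153_610)
print(f"{p:.2%}, NNR {nnr:.2f}")  # 38.15%, NNR 2.62

# Feb 2015 - Jul 2015: 9,607 + 4,515 out of 78,516.
p, nnr = precision_and_nnr(9_607 + 4_515, 78_516)
print(f"{p:.2%}, NNR {nnr:.2f}")  # 17.99%, NNR 5.56
```

The revised 2015 filter trades lower overall precision for a smaller screening burden per identified trial among the screened records, as the NNR rows show.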
18. Summary
Many next steps including exploring text mining options
We have achieved improved currency of Embase record
availability in CENTRAL
There will be fewer irrelevant and duplicate records
Searchers will be able to identify more RCTs more
accurately than previously by a rapid search of
CENTRAL
19. We need help!
Please visit our project website
http://www.metaxis.com/embasepublic/
Feel free to join the crowd!
http://www.metaxis.com/embase/login.php
20. Providing Consultancy &
Research in Health Economics
http://tinyurl.com/yhec-facebook
http://twitter.com/YHEC1
http://www.minerva-network.com/
Thank you
julie.glanville@york.ac.uk
Telephone: +44 1904 324832
Website: www.yhec.co.uk
Editor's notes
The frequency of terms which appeared in more than 10 records. Terms were analysed by their location within a record: title, abstract, EMTREE headings. Also all terms (independent of their location within a record) were analysed by frequency.
The WordStat phrase finding option was used to identify phrases which appeared in more than 10 records.
Case occurrence and term frequency–inverse document frequency (tf*idf) were tested. Case occurrence is the frequency of presence of terms in the body of records. The tf*idf statistic reflects how important a word is to a record in a set of records. The tf*idf value increases proportionally to the number of times a word appears in the record, but is offset by the frequency of the word in the set of records. This helps to take account of the fact that some words are more common than others. The highest frequency terms (Randomized controlled trial/ and Controlled study/), which provided the highest number of relevant records, were identified and removed from the analysis. The yields of terms from the case occurrence analysis and the tf*idf analysis were then compared to identify whether different terms would be highlighted by each approach.
WordStat clustering option to identify terms which form groups, i.e. words which often appear in close proximity to each other.
Each of these analyses generated candidate terms which were then tested in candidate filters, to ascertain how many of the gold standard records they could identify in Ovid Embase. All of the gold standard records were identified in Embase by searching for their unique identifier.
In March 2013 the contract to identify Embase records was awarded to a consortium made up of Metaxis Ltd, the Cochrane Dementia and Cognitive Improvement Group, and York Health Economics Consortium (YHEC). Searches covering January 2011 to December 2013 identified 33,564 unique Embase records and these were published in CENTRAL, January 2014 Issue 1. All these records were identified from a search in Embase (via Ovid SP) using the Emtree terms Randomized Controlled Trial or Controlled Clinical Trial. It is estimated that two thirds of records eligible for CENTRAL (according to CERT guidance) from the backlog have been captured and fed into CENTRAL by this search; work to identify the remaining third (i.e. records not indexed with the RCT or CCT term) is ongoing. The estimates are based on a 'gold standard' set of records (made up of large random samples of 1000 Embase records already in CENTRAL across all years). The record set added in the January 2014 Issue 1 did not include conference publications; work on these is also ongoing.
Records based on a newly developed highly sensitive search strategy will be fed into CENTRAL from January 2014 on a monthly basis. The search strategy currently is:
1 Random$.ti,ab.
2 randomization/
3 intermethod comparison/
4 placebo.ti,ab.
5 (compare or compared or comparison).ti.
6 ((evaluated or evaluate or evaluating or assessed or assess) and (compare or compared or comparing or comparison)).ab.
7 (open adj label).ti,ab.
8 ((double or single or doubly or singly) adj (blind or blinded or blindly)).ti,ab.
9 double blind procedure/
10 parallel group$1.ti,ab.
11 (crossover or cross over).ti,ab.
12 ((match or matched or allocation) adj5 (alternate or group$1 or intervention$1 or patient$1 or subject$1 or participant$1)).ti,ab.
13 (assigned or allocated).ti,ab.
14 (controlled adj7 (study or design or trial)).ti,ab.
15 (volunteer or volunteers).ti,ab.
16 human experiment/
17 trial.ti.
18 or/1-17
19 18 NOT tier 1 results [RCT/ OR CCT/]
20 19 not conference abstract.pt.
21 (mammal/ or marine species/ or nonhuman/ or bird/ or animal experiment/ or exp rodent/ or cattle/) not human/
22 20 not 21
Animal studies
Hi, using the animal studies you sent me I have devised the following. It performs just a little better than the one Anna devised. However, adding in the NOT human/ does mean that it fails to spot many animal studies which Embase has also tagged with HUMAN. My strategy is line 17, Anna's is line 18. Mine finds all of Anna's and some extras. The test on the 33 records you sent me on Sunday shows that the filter removes 9 but misses 24; these are studies tagged with Human as well.
Not sure whether we want to propose another two-tier approach: use line 17 and then run it again without the 'not human/' to give those records a quick eyeball rather than submitting them to the reviewers? If we did that, my strategy removes 32 in total and 1 slips through (a paper about food chemistry).
Ideally we need to test this some more on more result sets.
1 exp experimental organism/ 302378
2 animal tissue/ 726327
3 animal cell/ 697222
4 exp animal disease/ 139836
5 exp carnivore disease/ 25207
6 exp bird/ 97046
7 exp experimental animal welfare/ 2288
8 exp animal husbandry/ 34998
9 animal behavior/ 52234
10 exp animal cell culture/ 8989
11 exp mammalian disease/ 78395
12 exp mammal/ 11005108
13 exp marine species/ 3468
14 nonhuman/ 2884891
15 animal.hw. 2256671
16 or/1-15 12120404
17 16 not human/ 2718150
18 (mammal/ or marine species/ or nonhuman/ or bird/ or animal experiment/ or exp rodent/ or cattle/) not human/ 2421425
19 17 not 18 296725
20 18 not 17 0
21 ("2014092492" or "2014102544" or "2014093898" or "23179110" or "2014105508" or "23402514" or "23263675" or "2014093590" or "2014032170" or "2014086649" or "2014086497" or "2014090976" or "2014098222" or "2014100105" or "2014103920" or "23547003" or "23531829" or "23531823" or "2014034485" or "24043704" or "2014037245" or "2014106987" or "2014103562" or "2014091297" or "2014090223" or "2014094000" or "23775276" or "2014083973" or "2014034971" or "2014099859" or "2014081346" or "2014034212" or "2014081091").an. 33
22 17 and 21 9
23 21 not 22 24