#37 — Medline import fails
| State | Unconfirmed |
|---|---|
| Version | 0.8.0 |
| Area | Functionality |
| Issue type | Bug |
| Severity | Medium |
| Submitted by | unset |
| Submitted on | Sep 03, 2007 |
| Responsible | none |
| Target release | none |
| Last modified | Jan 08, 2009 by Matthew Wilkes |
Until recently we could import Pubmed citations by pasting the Medline-formatted entry into the 'import' tab of a bibliography folder. This has suddenly stopped working. Perhaps Pubmed made a slight change to their format.
Details:
No import happens but the following traceback appears in the Error log of the site:
Traceback (innermost last):
Module ZPublisher.Publish, line 115, in publish
Module ZPublisher.mapply, line 88, in mapply
Module ZPublisher.Publish, line 41, in call_object
Module Products.CMFFormController.FSControllerPageTemplate, line 96, in __call__
Module Products.CMFFormController.BaseControllerPageTemplate, line 39, in _call
Module Products.CMFFormController.ControllerBase, line 243, in getNext
Module Products.CMFFormController.Actions.TraverseTo, line 36, in __call__
Module ZPublisher.mapply, line 88, in mapply
Module ZPublisher.Publish, line 41, in call_object
Module Products.CMFFormController.FSControllerPythonScript, line 107, in __call__
Module Products.CMFFormController.Script, line 141, in __call__
Module Products.CMFCore.FSPythonScript, line 108, in __call__
Module Shared.DC.Scripts.Bindings, line 311, in __call__
Module Shared.DC.Scripts.Bindings, line 348, in _bindAndExec
Module Products.CMFCore.FSPythonScript, line 164, in _exec
Module None, line 65, in bibliography_import
- <FSControllerPythonScript at /sfiles/bibliography_import used for /sfiles/mycoplasma/myco_bibliography>
- Line 65
AttributeError: 'str' object has no attribute 'get'
Steps to reproduce:
1) Select an article from Pubmed and choose the Medline view
2) Cut and paste the Medline view into the import tab of a bibliography folder
3) Select 'Medline' format and 'import'
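The reproduction can probably be narrowed down outside Plone. The following is only a minimal sketch, assuming (based on the later comment and patch in this ticket) that the parser's format check is a plain substring search for 'PMID- ' and that current Pubmed output puts a tab between the tag and the hyphen. The function names and sample records are invented for illustration and are not part of CMFBibliography.

```python
def check_format_strict(source):
    """Pre-patch style check: requires the exact text 'PMID- '."""
    return source.find('PMID- ', 0, 1000) > -1


def check_format_permissive(source):
    """Patched style check: any 'PMID' within the first 1000 characters."""
    return source.find('PMID', 0, 1000) > -1


# Invented sample records; the tab placement in new_style is an assumption.
old_style = 'PMID- 12345678\nTI  - An invented title\n'
new_style = 'PMID\t- 12345678\nTI\t- An invented title\n'

for label, sample in (('old', old_style), ('new', new_style)):
    print(label, check_format_strict(sample), check_format_permissive(sample))
```

If the assumption about the tab holds, the strict check rejects the new-style record while the permissive one accepts both, which would be consistent with nothing being imported.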
Added by unset on Sep 27, 2007 02:46 PM
I had a deeper look at this issue and it turns out that Pubmed has indeed changed the Medline format:
The Medline parser in CMFBibliography expects records to start like this 'PMID-' or this 'AU -'. Instead, the current Medline output uses tabs to separate the tag from the '-'. I've adapted tool/parsers/medline.py to be more permissive and created a patch for it. Paste the following lines into a new file medline.py.patch and apply it with 'patch -p0 < medline.py.patch'.
--- __medline.py 2005-06-01 13:40:06.000000000 +0200
+++ medline.py 2007-09-27 16:32:26.000000000 +0200
@@ -17,6 +17,10 @@
import re
+def extractKey(rawkey):
+ """adapt to new Pubmed format"""
+ return rawkey.split('-')[0].strip()
+
class MedlineParser(BibliographyParser):
"""
@@ -34,7 +38,7 @@
id = 'medline',
title = "Medline parser",
delimiter = '\n\n',
- pattern = r'(^.{0,4}- )',
+ pattern = r'(^.{0,5}- )',
flag = re.M):
"""
initializes including the regular expression patterns
@@ -54,11 +58,12 @@
# vanilla test for 'PMID- ' in the sub-string 'source[0, 100]'
## rr: can definitively be improved
- if source.find('PMID- ', 0, 1000) > -1:
+ if source.find('PMID', 0, 1000) > -1:
return 1
else:
return 0
+
def parseEntry(self, entry):
"""
parses a single entry
@@ -71,7 +76,7 @@
tokens = self.pattern.split(entry)
checkAU = 0
- if 'FAU - ' not in tokens:
+ if 'FAU\t-' not in tokens:
checkAU = 1
nested = []
@@ -81,24 +86,25 @@
# some defaults
result['note'] = 'automatic medline import'
- for key, value in nested:
- if key == 'PT - ' and value.find('Journal Article')> -1:
+ for k, value in nested:
+ key = extractKey(k)
+ if key == 'PT' and value.find('Journal Article')> -1:
result['publication_type'] = 'ArticleReference'
- elif key == 'TI - ':
+ elif key == 'TI':
title = value.replace('\n', ' ').replace(' ', '').strip()
result['title'] = title
- elif key == 'AB - ':
+ elif key == 'AB':
tmp = value.replace('\n', ' ').replace(' ', '')
result['abstract'] = tmp.replace(' ', '').replace(' ', '')
- elif key == 'PMID- ': result['pmid'] = str(value).strip()
- elif key == 'TA - ': result['journal'] = str(value).strip()
- elif key == 'VI - ': result['volume'] = str(value).strip()
- elif key == 'IP - ': result['number'] = str(value).strip()
- elif key == 'PG - ': result['pages'] = str(value).strip()
- elif key == 'DP - ':
+ elif key == 'PMID': result['pmid'] = str(value).strip()
+ elif key == 'TA': result['journal'] = str(value).strip()
+ elif key == 'VI': result['volume'] = str(value).strip()
+ elif key == 'IP': result['number'] = str(value).strip()
+ elif key == 'PG': result['pages'] = str(value).strip()
+ elif key == 'DP':
result['publication_year'] = value[:4]
result['publication_month'] = value[5:].replace('\n','').replace('\r','')
- elif key == 'FAU - ':
+ elif key == 'FAU':
raw = value.replace('\n', '').split(', ')
lname = raw[0]
fnames = raw[1].split(' ',1)
@@ -113,7 +119,7 @@
}
result.setdefault('authors',[]).append(adict)
- elif checkAU and key == 'AU - ':
+ elif checkAU and key == 'AU':
raw = value.replace('\n', '').split()
lname = raw[0]
fnames = raw[1]
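The idea behind the patch, comparing bare tag names instead of tags with fixed padding, can be tried in isolation. This is a rough sketch rather than the patched parser: extract_key mirrors the extractKey helper from the patch, while TAG_PATTERN, split_entry and the sample record are hypothetical and assume that a tag is 2 to 4 uppercase letters followed by optional spaces or tabs and a hyphen.

```python
import re

# Hypothetical pattern, not the one used in the patch: a tag of 2-4 uppercase
# letters at the start of a line, optional spaces/tabs, a hyphen, then
# optional spaces/tabs.
TAG_PATTERN = re.compile(r'^([A-Z]{2,4})[ \t]*-[ \t]*', re.M)


def extract_key(rawkey):
    """Reduce a raw tag (with its hyphen and any padding) to the bare name."""
    return rawkey.split('-')[0].strip()


def split_entry(entry):
    """Return (key, value) pairs for one Medline-style record."""
    parts = TAG_PATTERN.split(entry)
    # parts[0] is the text before the first tag; after that the list
    # alternates between captured tag names and their values.
    keys = parts[1::2]
    values = parts[2::2]
    return [(key, value.strip()) for key, value in zip(keys, values)]


# Invented record using a tab between tag and hyphen (assumed new format).
sample = 'PMID\t- 12345678\nTI\t- An invented title\nFAU\t- Doe, Jane\n'
print(split_entry(sample))
print(extract_key('PMID- '), extract_key('AU\t- '))
```

Because the separator handling lives in one pattern, the same function should also accept space-padded tags such as 'TI  - ', so a further small change in padding would not require touching every comparison.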
Added by unset on Sep 27, 2007 02:49 PM
I had a deeper look at this issue and it turns out that Pubmed has indeed changed the Medline format:
The Medline parser in CMFBibliography expects records to start like this 'PMID-' or this 'AU -'. Instead, the current Medline output uses tabs to separate the tag from the '-'. I've adapted tool/parsers/medline.py to be more permissive and created a patch for it. Apply it with 'patch -p0 < medline.py.patch'. I am also attaching the modified medline.py (based on branch 0.8).