Imagine you are given a long BibTeX file that needs to be presented on a website. To keep the presentation of the data flexible, a MySQL database is to be used. This rules out alternatives such as bibtex2html, which creates static HTML pages.
A Python script that parses the BibTeX file and writes it into the database can be downloaded here. Pybtex is used to conveniently parse the BibTeX file. The database layout for the different publication types (article, inproceedings, incollection, etc.) is hardcoded into the script, but should be easy to adapt to your needs.
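The general idea can be sketched as follows. This is not the downloadable script, only a minimal illustration under a few assumptions: the `LAYOUT` table, the column names, and the helper names `load_entries` and `store_entries` are hypothetical, and `sqlite3` stands in for MySQL so that the sketch runs without a database server.

```python
import sqlite3

# Hypothetical hardcoded layout: which BibTeX fields become columns per type.
LAYOUT = {
    "article": ["title", "journal", "year"],
    "inproceedings": ["title", "booktitle", "year"],
}


def load_entries(path):
    """Parse a .bib file with Pybtex into (key, type, fields) tuples."""
    from pybtex.database import parse_file  # third-party: pip install pybtex

    bib = parse_file(path)
    rows = []
    for key, entry in bib.entries.items():
        fields = dict(entry.fields)
        # Authors live in entry.persons, not in entry.fields.
        authors = entry.persons.get("author", [])
        fields["author"] = "; ".join(str(person) for person in authors)
        rows.append((key, entry.type, fields))
    return rows


def store_entries(conn, rows):
    """Create one table per publication type and insert the parsed entries."""
    for entry_type, columns in LAYOUT.items():
        cols = ", ".join(f"{c} TEXT" for c in ["author"] + columns)
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {entry_type} "
            f"(bibkey TEXT PRIMARY KEY, {cols})"
        )
    for key, entry_type, fields in rows:
        if entry_type not in LAYOUT:
            continue  # types without a hardcoded layout are skipped
        columns = ["author"] + LAYOUT[entry_type]
        placeholders = ", ".join("?" for _ in range(len(columns) + 1))
        values = [key] + [fields.get(c) for c in columns]
        conn.execute(
            f"INSERT OR REPLACE INTO {entry_type} "
            f"(bibkey, {', '.join(columns)}) VALUES ({placeholders})",
            values,
        )
    conn.commit()
```

A call such as `store_entries(sqlite3.connect("pubs.db"), load_entries("pubs.bib"))` would fill the tables (both file names are placeholders). With a MySQL driver the statements would need small changes. The placeholder is typically `%s` instead of `?`, and `INSERT OR REPLACE` is SQLite syntax.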

This blog is supposed to be a collection of random, unrelated, little ideas, thoughts, and discoveries, which I assume to be helpful to a negligible part of the world's population and wish to share out of pure altruism. If posts appear really weird, maybe you have the wrong kind of humor. Many of the posts are science/technology related. If you are opposed to that, stop reading here! Comments, criticism, corrections, amendments, questions are always welcome.
2014-07-22
2013-12-06
Matching Bibtex and HTML
Recently I was given two very long lists of scientific publications: one as a BibTeX file and another as a table in an HTML file. Some of the publications in the BibTeX file were missing from the HTML table, and the task was to find out which ones. An additional challenge was that both lists had been created manually by different people, so author names, titles, etc. did not match character by character. Words with special characters, e.g. 'Jörg', would be spelled as 'J\"org' in BibTeX and 'Jörg' in the HTML table.
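One way to reduce such spelling differences before comparing is to bring both sides into a common form. The following is only a sketch and not necessarily what the script does: the function names and the small accent table are assumptions, only a handful of LaTeX accent commands are covered, and the HTML side is assumed to possibly contain tags and entities such as `&ouml;`.

```python
import html
import re
import unicodedata

# Small, incomplete subset of LaTeX accent commands mapped to combining marks.
COMBINING = {'"': "\u0308", "'": "\u0301", "`": "\u0300", "^": "\u0302", "~": "\u0303"}
ACCENT = re.compile(r"""\\(["'`^~])\s*\{?([A-Za-z])\}?""")


def normalize_bibtex(text):
    """Turn LaTeX accents like \\"o into Unicode, drop braces, fold case."""
    text = ACCENT.sub(lambda m: m.group(2) + COMBINING[m.group(1)], text)
    text = text.replace("{", "").replace("}", "")
    return unicodedata.normalize("NFC", text).casefold()


def normalize_html(text):
    """Strip tags, decode entities, collapse whitespace, fold case."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    return unicodedata.normalize("NFC", " ".join(text.split())).casefold()
```

After this step, `normalize_bibtex('J\\"org')` and `normalize_html("J&ouml;rg")` yield the same string, so the later comparison no longer trips over the accent.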
A simple script that helps with this tedious problem can be downloaded here. The script reads the .bib and the .html file and compares the title field of every BibTeX entry with every row in the HTML table. The difflib package is used to perform "approximate (sub)string matching": for each pair it calculates a similarity score from 0.0 (no match at all) to 1.0 (the identical string is contained as a substring).
Finally, the script generates a report listing all publications that are most probably missing.
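A score with these properties can be built from difflib's `SequenceMatcher`. The sketch below is one possible way to do it, not necessarily the metric used in the script: the function names and the threshold of 0.85 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def containment_score(title, row_text):
    """Fraction of the title's characters covered by matching blocks (0.0 to 1.0)."""
    if not title:
        return 0.0
    matcher = SequenceMatcher(None, title.lower(), row_text.lower(), autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(title)


def probably_missing(titles, rows, threshold=0.85):
    """Return (title, best score) for titles with no sufficiently similar row."""
    report = []
    for title in titles:
        best = max((containment_score(title, row) for row in rows), default=0.0)
        if best < threshold:
            report.append((title, round(best, 2)))
    return report
```

If a title occurs verbatim inside a table row, the whole title forms a single matching block and the score is 1.0. Titles whose best score stays below the threshold end up in the report. The threshold trades false alarms against missed entries and would need tuning on real data.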