Weird Invention of the Day: BibTeX

2014-07-22

Presenting a BibTeX Bibliography on a Website

Imagine you are given a long BibTeX file that needs to be presented on a website. To allow some flexibility in presenting the data, the entries are to be stored in a MySQL database. This rules out alternatives such as bibtex2html, which creates static HTML pages.

A Python script that parses the BibTeX file and writes its entries into the database can be downloaded here. Pybtex is used to conveniently parse the BibTeX file. The database layout for the different publication types (article, inproceedings, incollection, etc.) is hardcoded in the script, but should be easy to adapt to your needs.
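To give an idea of what a hardcoded layout per publication type can look like, here is a minimal sketch. It is not the downloadable script: the table and column names in LAYOUT are my own assumptions, sqlite3 from the standard library stands in for MySQL so that the example runs without a server, and the entries are made-up dictionaries in place of what a parser such as Pybtex would deliver.

```python
import sqlite3

# Hypothetical layout: one table per publication type, listing the
# BibTeX fields stored for that type. Adapt to your own needs.
LAYOUT = {
    "article": ["author", "title", "journal", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
}


def create_tables(conn):
    """Create one table per publication type defined in LAYOUT."""
    for table, columns in LAYOUT.items():
        column_sql = ", ".join(f"{name} TEXT" for name in columns)
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            f"(citekey TEXT PRIMARY KEY, {column_sql})"
        )


def insert_entry(conn, citekey, entry_type, fields):
    """Insert one parsed entry; fields missing from the entry become NULL."""
    columns = LAYOUT[entry_type]
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    values = [citekey] + [fields.get(name) for name in columns]
    conn.execute(
        f"INSERT INTO {entry_type} (citekey, {', '.join(columns)}) "
        f"VALUES ({placeholders})",
        values,
    )


# Made-up entries in the shape a parser might deliver them:
# (citation key, publication type, field dictionary).
entries = [
    ("doe2010", "article",
     {"author": "Doe, Jane", "title": "An Example",
      "journal": "J. Ex.", "year": "2010"}),
    ("roe2011", "inproceedings",
     {"author": "Roe, Rick", "title": "Another Example",
      "booktitle": "Proc. Ex."}),
]

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
create_tables(conn)
for citekey, entry_type, fields in entries:
    insert_entry(conn, citekey, entry_type, fields)
conn.commit()
```

Keeping the layout in a single dictionary means that supporting another publication type, for example incollection, only requires one more line. The table and column names are interpolated into the SQL text, so they should only ever come from the hardcoded layout and never from the BibTeX file itself.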

2013-12-06

Matching BibTeX and HTML

Recently I was given two very long lists of scientific publications: one as a BibTeX file and the other as a table in an HTML file. Some of the publications in the BibTeX file were missing from the HTML table, and the task was to find out which ones. An additional challenge was that both lists had been created manually by different people, so author names, titles, etc. did not match character by character. Words with special characters, e.g. 'Jörg', would be spelled as 'J\"org' in BibTeX and 'Jörg' in the HTML table.
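One way to reduce such spelling differences before comparing anything is to bring both sides into a common form. The following is only a sketch under my own assumptions: the helper names latex_to_unicode and normalize are made up for this example, and only a handful of LaTeX accent commands are covered.

```python
import html
import re
import unicodedata

# Unicode combining characters for a few LaTeX accent commands (not exhaustive).
ACCENTS = {
    '"': "\u0308",  # umlaut / diaeresis, as in \"o
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
}

# Matches \"o as well as \"{o}; the trailing brace also covers {\"o}.
ACCENT_RE = re.compile(r"""\\(["'`^~])\s*\{?([A-Za-z])\}?""")


def latex_to_unicode(text):
    """Replace simple LaTeX accent commands with the accented character."""
    def replace(match):
        accent, letter = match.groups()
        # Compose the letter and the combining accent into one character.
        return unicodedata.normalize("NFC", letter + ACCENTS[accent])

    text = ACCENT_RE.sub(replace, text)
    # Drop the remaining grouping braces, e.g. around capitalised words.
    return text.replace("{", "").replace("}", "")


def normalize(text):
    """Common form for comparison: decoded, lower-case, single spaces."""
    text = latex_to_unicode(html.unescape(text))
    return " ".join(text.lower().split())
```

With this, normalize('J\\"org') and normalize('Jörg') give the same string, and a name written with an HTML entity such as 'J&ouml;rg' ends up in that form as well.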

A simple script that helps with this tedious problem can be downloaded here. The script reads the .bib and the .html file and compares the title field of every BibTeX entry with every row in the HTML table. The package difflib is used to perform "approximate (sub)string matching". For each comparison, it calculates a value from 0.0 (no match at all) to 1.0 (the identical string is contained as a substring).
Finally, the script generates a report that lists all the publications that are most probably missing.
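The comparison can be sketched roughly as follows. This is not the downloadable script but one possible metric built on difflib.SequenceMatcher: the fraction of the title's characters that fall into matching blocks with the row. The function names, the threshold of 0.8 and the sample data are assumptions made for this example.

```python
from difflib import SequenceMatcher


def containment_score(title, row):
    """Fraction of the title's characters that SequenceMatcher matches in row.

    The score is 1.0 if the title occurs verbatim in the row and 0.0 if
    the two strings share no characters.
    """
    if not title:
        return 0.0
    # autojunk=False keeps frequent characters from being ignored in long rows.
    matcher = SequenceMatcher(None, title, row, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(title)


def find_missing(titles, rows, threshold=0.8):
    """Return (title, best score) for titles without a good match in any row."""
    missing = []
    for title in titles:
        best = max(
            (containment_score(title.lower(), row.lower()) for row in rows),
            default=0.0,
        )
        if best < threshold:
            missing.append((title, best))
    return missing


# Made-up sample data: BibTeX titles and the text of the HTML table rows.
titles = ["A Survey of Graph Matching", "Quantum Widgets"]
rows = [
    "J. Doe: A survey of graph matching. Journal of Examples, 2010",
    "A. Roe: XYZ, 2011",
]

for title, score in find_missing(titles, rows):
    print(f"probably missing: {title} (best score {score:.2f})")
```

Because every title is compared with every row, the running time grows with the product of both list lengths, which is acceptable for a one-off check. Very long rows tend to raise the score, since scattered single characters also count as matches, so the threshold needs some tuning against the actual data.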