Imagine you want to do some very important research, but you are struggling to identify a research gap in the current state of the art. Moreover, let's assume you have the intuition that a research gap can be found by combining two concepts from two different fields.
For instance, you might just have read two textbooks: one about freshwater aquarium fish and one about chemicals dissolved in water. Now you want to combine concepts from these two fields. To do this, you need an estimate of 'how much' research has been done on the effect of chemical X on fish Y.
To get a rough estimate of 'how much' research has already been done, Google Scholar can be used. For every search, it reports an approximate number of publications that match your search terms. With this you can build a matrix like the following:
The rows correspond to the keywords from one category (here: different types of fish) and the columns correspond to the other category (here: different chemicals). The color indicates the approximate number of publications on Google Scholar that contain both keywords.
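The matrix construction itself is straightforward. The sketch below assumes a hypothetical `count_fn` that returns Google Scholar's approximate hit count for a query; here it is stubbed with a dictionary, while the actual script drives Lynx to fetch the counts.

```python
import numpy as np

def build_matrix(rows, cols, count_fn):
    """Return a len(rows) x len(cols) array of hit counts for keyword pairs."""
    m = np.zeros((len(rows), len(cols)))
    for i, fish in enumerate(rows):
        for j, chem in enumerate(cols):
            # query both keywords as exact phrases
            m[i, j] = count_fn(f'"{fish}" "{chem}"')
    return m

# Stub standing in for a real Scholar query; the numbers are invented.
counts = {'"guppy" "copper"': 1200, '"guppy" "ammonia"': 3400,
          '"tetra" "copper"': 150, '"tetra" "ammonia"': 800}
m = build_matrix(["guppy", "tetra"], ["copper", "ammonia"], counts.get)
```

Plotting `m` as a heat map (e.g. with Matplotlib's `imshow`) then yields a figure like the one above.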
Certainly you cannot gain ultimate wisdom from this. Two keywords might just be a nonsensical pairing, or the keywords might be used in many publications but in a context totally different from what you anticipated. However, it provides a quick and simple way to figure out whether you are entering a crowded field or not.
The script that was used to produce this plot can be downloaded here. The text-based web browser Lynx needs to be installed to run it.

This blog is supposed to be a collection of random, unrelated, little ideas, thoughts, and discoveries, which I assume to be helpful to a negligible part of the world's population and wish to share out of pure altruism. If posts appear really weird, maybe you have the wrong kind of humor. Many of the posts are science/technology related. If you are opposed to that, stop reading here! Comments, criticism, corrections, amendments, and questions are always welcome.
2013-11-21
2013-11-01
Sorting Papers by Keywords
Imagine you are given an inhumanly big electronic pile of publications to read and an early deadline. Even reading the abstracts will cost you a considerable amount of time, and most of the papers are not at all related to what you are up to. How do you select the papers to read first?
A simple approach might be the following: Assume you can come up with a set of keywords with an accompanying quality factor. The quality factor indicates how interested you are in a given keyword. A very important keyword might be given a quality factor of 1.0, and a more general keyword might have a quality factor of just 0.1.
With this set of keywords and quality factors, it is quite easy to compute a score for every publication. For every paper and every keyword, the number of occurrences of the keyword is counted and the score of the document is increased according to the quality factor. The papers can then be sorted by score, which gives you the priority in which to read them. While this may not be a masterpiece of information retrieval, it is still a simple and quick way to find relevant information.
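As a sketch, the scoring rule is just a weighted sum of keyword counts. The keywords, weights, and paper texts below are invented for illustration; the actual script uses R and the tm package.

```python
def score(text, weights):
    """Weighted sum of keyword occurrences: sum(count(k) * quality(k))."""
    t = text.lower()
    return sum(t.count(k) * q for k, q in weights.items())

# Hypothetical keyword/quality-factor pairs -- not the ones from the script.
weights = {"scheduling": 1.0, "energy": 0.1}
papers = {
    "a.pdf": "Energy-aware scheduling: scheduling under a power budget",
    "b.pdf": "A survey of energy models",
}
ranked = sorted(papers, key=lambda p: score(papers[p], weights), reverse=True)
# "a.pdf" contains "scheduling" twice and "energy" once, so it scores 2.1
# and is read before "b.pdf" (score 0.1).
```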
A simple R script to create a table with paper scores can be downloaded here. The text mining package tm is used, which reads .pdf files conveniently.
The keywords/quality factor pairs need to be provided in an extra file just like the paths to the publications. The script creates a simple .html file for convenient viewing of the scored paper list.
2013-09-13
'Synchronizing' Podcasts with a Portable Device
Do you also like listening to podcasts?
I do, and my usual use case is the following: First I discover a new podcast on the web. Then I use a program like gpodder or Miro to download all the episodes, which end up in one plain directory on the hard drive. Finally, I want to 'synchronize' the episodes with a portable device.
A lot of the time the portable device will have less storage than the total size of the downloaded files, or it is not desirable to fill up the device with just one podcast. So 'synchronizing' should copy only some episodes at a time to the portable device and remember which episodes have already been copied. After listening to/watching an episode, it can be deleted on the portable device to free space for new episodes, which should then be copied during the next synchronization. No capabilities beyond playback and deleting episodes are assumed on the side of the portable device. My use case is somewhat similar to the use cases discussed here.
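A minimal sketch of the bookkeeping, assuming an sqlite table records which episodes have already been copied; the actual copy step (and any free-space check on the device) is left as a placeholder:

```python
import sqlite3

def sync(con, episodes, limit):
    """Copy up to `limit` not-yet-copied episodes and remember them."""
    con.execute("CREATE TABLE IF NOT EXISTS copied (name TEXT PRIMARY KEY)")
    done = {row[0] for row in con.execute("SELECT name FROM copied")}
    new = [e for e in episodes if e not in done][:limit]
    # a real script would shutil.copy() each episode to the device here
    con.executemany("INSERT INTO copied VALUES (?)", [(e,) for e in new])
    con.commit()
    return new

con = sqlite3.connect(":memory:")  # a real script would use a file on disk
first = sync(con, ["ep1", "ep2", "ep3"], limit=2)   # copies ep1, ep2
second = sync(con, ["ep1", "ep2", "ep3"], limit=2)  # copies only ep3
```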
Labels:
gpodder,
gpodder synchronization,
podcast,
sqlite,
synchronize
2013-09-01
Remarks on Presentations
Here are some simple suggestions that I find useful for improving slides for (scientific) presentations. Many of the suggestions are subjective, and no claim of exhaustiveness is made. Therefore feel free to disagree (and comment).
The suggestions can be downloaded here as pdf and odp.
2013-08-22
Very Simple Thermal Simulation Of Tiled Multi-Core System
Let's assume a tiled multi-core architecture, e.g. a two-dimensional grid of n×m compute tiles. Several tasks might be running on every tile, which results in an increase in temperature of the corresponding tiles. Idle tiles, on the other hand, cool down.
The following video shows the result of a very simple thermal simulation with a 2x2 grid and a single task. The toy "thermal management" migrates the task to the tile with the minimum average temperature whenever the average temperature of its current tile exceeds a threshold.
The video was created using this script (Python+Numpy+Matplotlib+ffmpeg). A toy model of heat conduction is used. Although the temperature model is not physically accurate, the simulation might still be useful to quickly evaluate more sophisticated thermal management strategies.
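A toy update rule in the spirit of the simulation might look like this; the constants and the exact decay model are assumptions, not taken from the script:

```python
import numpy as np

AMBIENT = 20.0

def step(temps, task, heat=8.0, decay=0.9, threshold=60.0):
    """One time step: all tiles relax toward ambient temperature, the
    active tile heats up, and the task migrates to the coolest tile
    when its host tile exceeds the threshold."""
    temps = AMBIENT + (temps - AMBIENT) * decay
    temps[task] += heat
    if temps[task] > threshold:
        task = np.unravel_index(np.argmin(temps), temps.shape)
    return temps, task

temps = np.full((2, 2), AMBIENT)  # 2x2 grid, all tiles at ambient
task = (0, 0)                     # single task starts on the top-left tile
for _ in range(200):
    temps, task = step(temps, task)
```

With these constants the active tile heats toward an equilibrium above the threshold, so the task keeps migrating and the heat spreads over the grid, which is what the video visualizes.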
2013-07-31
Ear training with Anki
This post was created in collaboration with Thomas Fischbach after a discussion about whether a person with only relative pitch could acquire perfect pitch by practicing identifying musical notes with a flash card program.
No conclusive answer to this question will be given in this post, but you may try for yourself using the following script. It can be used to create an Anki deck with sounds. Anki is an excellent flash card program (similar to Mnemosyne). csound is a software synthesizer used for sound generation. Installation instructions are provided with the script.
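For reference, the pitch side of such a deck is simple to compute. The sketch below gives equal-temperament frequencies from MIDI note numbers (the note set is an arbitrary example; the actual script drives csound for the synthesis):

```python
def midi_to_hz(note, a4=440.0):
    """Equal-temperament frequency for a MIDI note number (A4 = 69 = 440 Hz)."""
    return a4 * 2 ** ((note - 69) / 12)

# Each card could pair a note name with its synthesized tone.
notes = {"C4": 60, "A4": 69, "A5": 81}
freqs = {name: round(midi_to_hz(n), 2) for name, n in notes.items()}
# A4 -> 440.0 Hz, A5 -> 880.0 Hz (one octave up), C4 -> ~261.63 Hz
```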
2013-07-01
Dark Frame Analysis
To improve noisy video shot in low-light conditions, it is useful to measure the distribution of the noise. Therefore I recorded several minutes of video in a setup where no light could enter the camera (see dark-frame subtraction). A script was used to simply sum up a large number of recorded frames. This 'noise accumulation frame' was used for further analysis.
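The accumulation step can be sketched in a few lines of Numpy (a guess at what the script does, with synthetic Gaussian frames standing in for real camera frames):

```python
import numpy as np

def accumulate(frames):
    """Average many dark frames: per-frame noise averages out, while any
    fixed pattern remains. Averaging is just the sum scaled by N."""
    acc = np.zeros_like(frames[0], dtype=np.float64)
    for f in frames:
        acc += f
    return acc / len(frames)

# Synthetic stand-in for decoded video frames (e.g. read via OpenCV).
rng = np.random.default_rng(0)
frames = [rng.normal(3.6, 0.5, (4, 4)) for _ in range(1000)]
mean_frame = accumulate(frames)
# The standard error shrinks roughly as 1/sqrt(N), so the averaged
# values cluster tightly around the true mean of 3.6.
```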
The camera used is a consumer-grade Panasonic HC-V500. Some strange effects will be unveiled further down, and you might be interested in testing whether your camera shows them as well. The simple script that was used to create the plots can be downloaded here (requires Scipy, Numpy, Opencv and Matplotlib).
Unfortunately there is no 1:1 correspondence between the pixels that end up in the video file and the physical pixels on the CMOS sensor. It is therefore unknown whether the effects seen further down are a result of the sensor or of the image processing in the camera, especially video compression. It would be preferable to take still images at the highest possible resolution in an automated way, but at least my video camera does not have this feature.
The distribution of the blue-channel in the noise accumulation frame looks like this:
The other channels (red, green) are almost indistinguishable from this. The following plot shows the distribution of red+green+blue channel:
It can be seen that a big part of the noise is approximately normally distributed. However, if you look at the noise accumulation frame directly, some structure is visible which looks a bit like the electric field of a quadrupole:
Even though there is no direct correspondence between the pixels in the video file and the pixels on the CMOS sensor, "hot pixels" are still present (why?). These can be seen easily by looking at details of the picture above. Keep in mind that mu is around 3.61 and sigma is around 0.47, so all values above 5 should be extremely unlikely. The plots below simply show the same as the plot above, subdivided into 4 parts:
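The 'values above 5 should be extremely unlikely' argument suggests a simple automatic check: flag every pixel more than a few sigma above the frame mean. A sketch on synthetic data (the threshold factor is a choice for illustration, not taken from the post):

```python
import numpy as np

def hot_pixels(frame, k=4.5):
    """Return coordinates of pixels more than k standard deviations
    above the frame mean."""
    mu, sigma = frame.mean(), frame.std()
    return np.argwhere(frame > mu + k * sigma)

# Synthetic noise frame with the statistics reported above.
rng = np.random.default_rng(1)
frame = rng.normal(3.61, 0.47, (64, 64))
frame[10, 20] = 8.0          # plant one obvious hot pixel
hits = hot_pixels(frame)     # the planted pixel is among the flagged ones
```

With k around 3 a purely Gaussian frame of this size would still produce a handful of false positives, which is why a larger factor is used here.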