2014-06-24

Ordering Pictures by Dissimilarity

Imagine you want to post some pictures on the web, e.g. for a social media post or a picture gallery. Sometimes the context dictates how the pictures have to be ordered, e.g. chronologically, but sometimes you are free to order the pictures so that the presentation becomes more interesting.

The question is whether it is possible to automatically find an ordering for the pictures that makes the presentation more interesting to the viewer. The idea of this post is that the pictures should be ordered to maximize the dissimilarity between consecutive pictures. It should catch the viewer's attention if the visual impression of two consecutive pictures is as different as possible.

A metric is required to formalize 'dissimilarity'. Different options exist here, but one approach is to look at the histogram of every picture and quantify the dissimilarity between two pictures as the statistical distance between their histograms.
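
To sketch the idea, the pairwise dissimilarities and a greedy ordering could be computed with OpenCV roughly as follows. This is only a minimal sketch: the grayscale histogram, its binning and the greedy strategy are illustrative choices, and the downloadable script below is the reference:

import cv2

def histogram(path):
    # normalized grayscale histogram as a simple visual fingerprint
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([img], [0], None, [64], [0, 256])
    return cv2.normalize(hist, hist)

def order_by_dissimilarity(paths):
    hists = [histogram(p) for p in paths]
    order = [0]  # start with the first picture
    remaining = set(range(1, len(paths)))
    while remaining:
        last = hists[order[-1]]
        # greedily pick the remaining picture whose histogram is farthest
        # from the previous one (older OpenCV versions name the constant
        # cv2.cv.CV_COMP_BHATTACHARYYA instead)
        nxt = max(remaining, key=lambda i: cv2.compareHist(
            last, hists[i], cv2.HISTCMP_BHATTACHARYYA))
        order.append(nxt)
        remaining.remove(nxt)
    return [paths[i] for i in order]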

A script that implements this idea can be downloaded here (it needs OpenCV). The Bhattacharyya distance is used as one example of a statistical distance. A picture set in 'boring order' and the same set sorted as outlined above are shown below:

Lenna in 'boring order'

Lenna in 'interesting order'

2014-05-30

DDNS and IPv6?

Ever tried setting up DDNS with IPv6? I recently had to learn that there is a remarkable range of tools and services that do not work with IPv6.

ddclient does not support IPv6, nor does inadyn.
inadyn-mt claims to support IPv6, but if that is true, it is at least hard to configure.

Moreover, not all DDNS services offer IPv6 support; freedns.afraid.org is one notable exception that does. This service also allows setting the IPv6 address via a URL.

A pragmatic solution to get DDNS working with IPv6 is to run the following Python script periodically as a cronjob:
 
#!/usr/bin/env python
''' update ipv6 record on freedns.afraid.org '''

import netifaces
import subprocess
import sys

iface_name = "eth0"
pwd_hash = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr"

try:
    addrs = netifaces.ifaddresses(iface_name)
    ipv6_str = addrs[netifaces.AF_INET6][0]['addr']
    # skip link-local addresses (fe80:...)
    if ipv6_str.startswith("fe80:"):
        raise ValueError("link-local address")
except (ValueError, KeyError, IndexError):
    sys.exit("could not determine ipv6 address")

subprocess.call(
    [
        "wget",
        "-q",
        "--read-timeout=0.0",
        "--waitretry=5",
        "--tries=400",
        "https://freedns.afraid.org/dynamic/update.php?" + pwd_hash
            + "&address=" + ipv6_str
    ]
)

The script can also be downloaded here. It requires the netifaces package, which can be installed using pip; wget needs to be installed as well. The interface name and the password hash have to be adjusted manually.
$> sudo apt-get install python-pip
$> sudo pip install netifaces

I suggest simply copying the script to /opt, making it executable and adding a cronjob for it.
$> sudo crontab -e

add the following line to the crontab
 */30 * * * *    /opt/ipv6_update.py

and restart the cron daemon
$> sudo /etc/init.d/cron restart

If you are using a network manager, e.g. wicd or network-manager, the Python script above can be hooked in there instead. For wicd, the script has to be copied to
/etc/wicd/scripts/postconnect

2014-05-06

Experiences from running a Tor intermediate relay

In this post I want to share some basic experiences from running a Tor intermediate relay for 5 months.


Resource Usage

Fortunately, Tor's hardware requirements are modest. An old ASUS EeePC 1101HA netbook with an Intel Atom Z520 CPU @ 1.33GHz and 1GB of RAM was used for the Tor node and found to be sufficient. The node was operated on a simple cable internet connection (10Mbit/s downstream, 1Mbit/s upstream). Amazingly, despite the cheap hardware, the system ran stably and no crashes occurred.

Tor's memory usage was about 210MB, and the memory usage of the entire system was only about 250MB.

The bandwidth usage was around 100kBytes/s on average, and a little more than 1GByte/day was uploaded and downloaded. The bandwidth is mostly consumed in bursts, since Tor is designed to reduce latency; the system then idles so as not to exceed the configured bandwidth limit. The relay was also used to mirror directory information.

The CPU temperature was around 27°C, only slightly above ambient temperature. This conveniently reduces aging and fan noise.


Power Consumption

An attempt was made to reduce power consumption by
  • uninstalling unnecessary services (e.g. CUPS)
  • configuring DPMS to turn off the screen quickly, e.g. after 10s:  $> xset dpms 10 0 0
  • alternatively, turning off the screen using vbetool dpms off, if no X server is present; strangely, the screen will automatically turn itself back on after some time, and a cron job is required to keep it turned off permanently (see the example below)
  • shutting down Wifi and Bluetooth (Fn+... on the keyboard)
  • using powertop for further optimizations
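A cron entry like the following in root's crontab keeps the screen off (the interval and the vbetool path are illustrative):
 */5 * * * *    /usr/sbin/vbetool dpms off
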
Assuming a (probably too high) power consumption of 15W and an electricity price of 0.2€/kWh, this act of altruism increases the energy bill by around 26€ per year (15W × 8760h/year ≈ 131kWh/year; 131kWh × 0.2€/kWh ≈ 26€). For a future relay, I intend to use an old smartphone or tablet computer, which will reduce power consumption significantly.


Configuration

The configuration was mostly standard. Some minor things are worth mentioning here:

To be able to use the Tor relay from the local network, add e.g.:
SocksPort 192.168.178.42:910
SocksPolicy accept 192.168.178.0/16
It seems Tor will not start automatically after a reboot if these lines are in the config file.

To view connection information (eg. open circuits) from arm:
DisableDebuggerAttachment 0

To fix the exit node to a specific country:
ExitNodes de 


Software 

Not much software was required:
  • Debian 7 (Wheezy) was used as the OS.
  • wicd-curses was used as a network manager for convenient configuration of the network interface.
  • Tor was installed from the Debian repositories.
  • X was necessary for DPMS (energy management); alternatively, vbetool is sufficient to turn the screen off.
  • Openbox was used as a resource-friendly window manager for convenience.
  • The hardware clock of the EeePC seems to drift considerably; therefore it was necessary to install ntpd to keep the system clock in sync.
  • some other optional tools were installed (arm, htop, powertop, sensors, unattended-upgrades)
Statistics about the Tor node can be obtained through Atlas and arm. Note that arm crashes after some time if left running.

arm needs to be started as the user running Tor, e.g.
$> sudo -u yourusername-tor arm

2014-03-18

Wget, Cookies and Firefox

Did you ever want to automatically (mass-)download data from a website where a login is required, e.g. a wiki or a social network? If the website stores a session cookie on your computer, it might be possible to download the content in an automated fashion using Wget.

It is possible to pass Wget a cookie file as a parameter. This might look like the following:
wget --keep-session-cookies --load-cookies=cookies.txt -p -k https://someurl.org/protected/site_01.htm

An example of a cookie file might look as follows (use tabs instead of spaces!):
# HTTP cookie file.
someurl.org  TRUE  /  FALSE  1391671828  someurlUserID  42
someurl.org  TRUE  /  FALSE  1391671828  someurlUserName Peter
someurl.org  TRUE  /  FALSE  1391671828  someurlToken  d3d3fdsere
someurl.org  TRUE  /  FALSE  -1  someurl_session  g8furfv99dmp1

After logging in on the respective website, you can conveniently view the necessary cookies in Firefox.

date can be used to convert the expiration time of the cookies shown in Firefox to the Unix timestamp format used in Wget cookie files, e.g. by issuing:
date -d "Wed 12 Mar 2014 01:31:42 PM CET" +%s
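
Instead of copying the values by hand, the cookies can also be read directly from Firefox's cookies.sqlite. The following is a minimal sketch; the profile path and the domain are placeholders, and the moz_cookies column names may differ between Firefox versions:

#!/usr/bin/env python
''' export Firefox cookies for one domain to a Wget-compatible cookies.txt '''

import sqlite3

db_path = "/home/user/.mozilla/firefox/XXXXXXXX.default/cookies.sqlite"  # placeholder
domain = "someurl.org"  # placeholder

conn = sqlite3.connect(db_path)
rows = conn.execute(
    "SELECT host, path, isSecure, expiry, name, value "
    "FROM moz_cookies WHERE host LIKE ?", ("%" + domain,))

with open("cookies.txt", "w") as f:
    f.write("# HTTP cookie file.\n")
    for host, path, is_secure, expiry, name, value in rows:
        # fields: domain, include subdomains, path, secure, expiry, name, value
        f.write("\t".join([host, "TRUE", path,
                           "TRUE" if is_secure else "FALSE",
                           str(expiry), name, value]) + "\n")

Note that Firefox may keep the database locked while it is running, so it can be necessary to copy cookies.sqlite first.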

2014-01-10

Installing Tizen SDK on Ubuntu with OpenJDK

Today I tried installing the Tizen SDK (tizen-sdk-ubuntu64-v2.2.71.bin) on Ubuntu 12.04. The installation script exited complaining that it requires Oracle JDK instead of OpenJDK ("OpenJDK is not supported. Try again with Oracle JDK.").

This is a bit annoying, because OpenJDK comes with Ubuntu by default and Oracle JDK is not in the repositories anymore. It seems the installer's requirement is merely a policy rather than an actual technical requirement. The installation succeeds even with OpenJDK if the following lines are commented out in the installation script:

# check the default java as OpenJDK ##
if [ "ubuntu" = "${OS_NAME}" ] ; then
    CHECK_OPENJDK=`java -version 2>&1 | egrep -e OpenJDK`
    if [ -n "${CHECK_OPENJDK}" ] ; then
        echo "${CE} OpenJDK is not supported. Try again with Oracle JDK. ${CN}"
        exit 1
    fi
fi

The installation and basic usage of the IDE seem to work without problems after this. The OpenJDK version used was 1.6.0_27.

2013-12-28

Fun with File Systems

Imagine you have a data logging application that writes data to disk continuously. Since the application is not very stable, you want it to write out the data in small files, so that not too much data is lost if the application crashes. This creates the need to find a good trade-off between file size and file system in order to avoid wasting too much disk space on file system overhead.

An approach to measure file system overhead and to quickly explore the design space of different file systems and file sizes is as follows:
  • Create a ramdisk and, inside this ramdisk, create a bulk file of a given size (using dd).
  • For all combinations of file size and file system:
    • Format the bulk file with the desired file system (using mkfs) and mount it.
    • Continuously write files of a fixed size to the mounted bulk file until an exception occurs, and record how many files could be written (using some script; a sketch of this step is shown below).
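
The counting step could look like the following minimal sketch, assuming the formatted bulk file is already mounted at mount_point:

import errno
import os

def count_files(mount_point, file_size):
    ''' write file_size-byte files into mount_point until the file
        system is full and return how many files could be written '''
    payload = b"\0" * file_size
    count = 0
    try:
        while True:
            with open(os.path.join(mount_point, "f%08d" % count), "wb") as f:
                f.write(payload)
            count += 1
    except (IOError, OSError) as e:
        if e.errno != errno.ENOSPC:  # only 'no space left on device' is expected
            raise
    return count

The relative overhead can then be estimated as, e.g., 1 - (number of files * file size) / (bulk file size).
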
Operations on the mounted bulk file are very fast, since the bulk file resides in a ramdisk. An experiment using this approach was conducted for a bulk file of 1 GiB. The considered file systems were ntfs, exfat, vfat, ext2, ext3 and ext4. File sizes were varied from 1 byte to 2^20 bytes. A plot summarizing the relative file system overhead for different file sizes and file systems is shown below:
From this figure it can be seen that the file system overhead is excessive for small file sizes. ext2, ext3 and ext4 behave almost identically in terms of overhead. The minimal overhead in this experiment is observed for vfat at a file size of 65536 bytes per file. Strangely, exfat is always outperformed by ntfs.

The scripts that were used to conduct this experiment can be downloaded here.

2013-12-19

Creating Tagclouds with PyTagCloud

Tag clouds are a nice way to visualize textual information. They provide a colorful overview of the frequent terms of a text, and they might also tell you something about its writing style.

For instance, the following is a tag cloud of the famous paper "Cramming more components onto integrated circuits" by Gordon Moore. The script that was used to create it can be downloaded here.
The script uses PyTagCloud, which gets most of the job done. Cloning the git repository, building and installing is straightforward. Do not forget to have pygame installed.

Nice tag clouds cannot be created fully automatically. To create beautiful tag clouds, natural language text usually needs a bit of preprocessing. The script provided above uses NLTK for stop word removal and for calculating term frequencies. Moreover, it might be necessary to manually change term frequencies or to remove certain terms entirely.
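
A minimal sketch of this pipeline is shown below (the input file name is a placeholder, and the NLTK stop word list and punkt tokenizer have to be downloaded once via nltk.download()):

from collections import Counter

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from pytagcloud import create_tag_image, make_tags

text = open("moore1965.txt").read().lower()  # placeholder file name

# stop word removal and term frequencies via NLTK
stop = set(stopwords.words("english"))
words = [w for w in word_tokenize(text) if w.isalpha() and w not in stop]
counts = Counter(words).most_common(60)

# lay out the 60 most frequent terms and export them to a .png image
tags = make_tags(counts, maxsize=80)
create_tag_image(tags, "tagcloud.png", size=(900, 600))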

PyTagCloud supports exporting the tag cloud to .png images. Exporting to HTML/CSS is almost possible as well, but this feature seems a little broken at the time of this writing: PyTagCloud does not export correctly whether a term should be rotated or not, resulting in tag clouds with overlapping terms.