This thesis program is the result of my work from Dec 2014 until around Mar 2015. It is a parallelisation of swarm, an existing single-linkage clustering algorithm for metagenomics data. The program arranges the complete hierarchical clustering workflow into a set of fine-grained computations (referred to as row calculations) that can be run in parallel. The concept of "subseed" was re-used from swarm to skip most of the redundant comparisons (referred to as economic search).

The code for global pairwise alignment and the k-mer filtering scheme were also re-used from swarm, since they had proven to be very efficient thanks to their SIMD utilisation.

Based on tests run on a dataset of medium-length sequences (381 nucleotides on average), the average speedup achieved through the parallelisation is 11 times when using 16 threads. This result indicates a successful parallelisation technique: according to Amdahl's law, it means roughly 97% of the code was parallelised.

4 level economic search

sequence distance triangle inequality

Is write to disk faster than write to memory? I don't think so


Not long after being featured in IT World, this paper went viral on Slashdot.

The code was written in two languages, Java and Python, and the paper claimed to prove that "in-memory DOES NOT always work faster than disk operation". I couldn't help but read the code, because that cannot be true.

First of all, the code does not prove that memory operations are slower than disk operations.

The first part of the code repeatedly concatenates the String "1", one character at a time, using `concatString += addString`, which is a notoriously inefficient way to build a String in Java (mentioned in every Java programming book). This has nothing to do with memory operations being slow.

Second of all, what "Single Write to Disk Time" measures is actually not an in-memory operation. It is just the time needed to write one very big String object (length 1,000,000) into a file: the CPU fetches buffers of data from RAM and then passes them to the file.

The part "Write to Disk Time", which is claimed to be faster, is where "1" is written to a BufferedWriter 1,000,000 times. Of course that is faster: since "1" is used a million times it stays in the CPU cache, and fetching from the CPU cache is very fast because the data is already in the CPU.

It is no surprise that reading "1" from the CPU cache 1,000,000 times is faster than reading a String containing 1,000,000 "1"s from RAM.

So, no, disk operation cannot be faster than memory operation.

On the other hand, this is also a reminder of how important it is to validate any research paper before citing or referring to it.

Setting up Valgrind on Mac OS Yosemite


Valgrind is a popular tool for debugging memory leaks in C++. With it you can see allocations that were never freed/deleted, as well as other potential bugs in code that Valgrind doesn't like.

Installing Valgrind on older Mac OS versions is very simple and can be done from brew. However, the most recent Mac OS 10.10 (Yosemite) is not yet officially supported by Valgrind, so installation must be done manually from the latest SVN trunk, which is also very simple.

svn checkout svn://svn.valgrind.org/valgrind/trunk

Then go into the trunk directory and build it:

cd trunk
./autogen.sh
./configure
make
sudo make install

While running autogen.sh one might get an error saying that aclocal was not found:
./autogen.sh: line 6: aclocal: command not found
error: while running 'aclocal'

In this case automake needs to be installed before running the script. The easiest way to do it is to install it from brew:

brew install automake

To start using valgrind:

valgrind --leak-check=full --show-reachable=yes --log-file=leak.log ./your-program [args]

For less detailed information and log to stdout:

valgrind --leak-check=yes ./your-program [args]

How to Set Up GDB on Mac OS


The last time I wrote anything in C++ was 10 years ago, during lab sessions at university, and I recently decided to write my thesis project in C++ for many reasons. Writing it again after 10 years wasn't so easy, and I got this error message on the screen when running the code:

Segmentation fault: 11

Not so helpful, is it? It means that a pointer-related error happened somewhere in the program.

In Java (and probably many other modern programming languages too) a complete stack trace is usually printed, at least to stdout, even when the programmer fails to catch the exception. In C and C++, debugging something like this is done with a tool called GDB, which can be easily installed on Linux. On Mac it can be installed from brew, followed by a simple step of granting Keychain access to gdb.

So to install:

  1. brew tap homebrew/dupes
  2. brew install gdb
  3. Open /Applications/Utilities/Keychain Access and create a certificate in "System". Things to set here are (in order):
    • Identity type: Self signed root
    • Certificate type: code signing
    • Let me override defaults
    • Store in "System"
  4. Get info of the just created certificate and set everything to "Always trust"
  5. codesign -s [name of certificate] $(which gdb)
Start using GDB:
  1. gdb [name of executable]
  2. run
    • at this point the program will run and print the line number where the error happened; if more information is needed, use backtrace.
  3. backtrace 
  4. catch throw (see help catch for details), then re-run to break where exceptions are thrown, including ones that were not caught in the code.
  5. quit

C2 Cubic Piecewise Interpolation with Different Parameterization Methods


In the previous geometric modelling class assignment, we were given some points in two dimensions and asked to implement a C2 cubic interpolation with 3 different parameterization methods.

Here I will try to briefly explain everything related to the image above, as much as I can in layman's terms.


Remember junior high, when mathematics was very basic and we were given 2 points $(x_1, y_1); (x_2,y_2)$ and asked to find the line passing through them? That is what interpolation means. Only here we want a spline instead of a line, which means polynomial equations are involved. A line $f(x)=ax+b$ is itself a degree-one polynomial.


Piecewise means that the interpolation is done separately, considering only 2 points at a time, and then those pieces are connected into one complete spline.

C2 Continuity

Above we also saw the term C2 cubic interpolation. C2 means that we want the spline segments to be connected in such a way that their second derivatives agree at the joins. Here I call the spline between 2 points a spline segment.
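Concretely, for two adjacent cubic segments with Bézier control points $b_0,\dots,b_3$ and $c_0,\dots,c_3$, and assuming uniform parameterization, matching the segments up to the second derivative at the join gives these conditions:

```latex
% C0: the segments share the join point
c_0 = b_3
% C1: first derivatives agree at the join
c_1 - c_0 = b_3 - b_2
% C2: second derivatives agree at the join
c_2 - 2c_1 + c_0 = b_3 - 2b_2 + b_1
```

With non-uniform parameterizations (chordal, centripetal) the same conditions hold with the differences scaled by the parameter interval lengths.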


For the cubic part we may refer to the Bézier curve, where cubic indicates that each curve segment has 4 control points. Here is the equation for the Bézier representation:
$p(u)=\sum_{i=0}^n c_i B^n_i(u)$ where $B^n_i(u)=\binom{n}{i}u^i (1-u)^{n-i}$
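For a cubic segment ($n=3$), the curve can be evaluated with de Casteljau's algorithm, i.e. repeated linear interpolation of the 4 control points. This is a sketch with my own naming, equivalent to the Bernstein form:

```cpp
#include <array>

struct Point { double x, y; };

// Evaluate a cubic Bezier segment p(u), u in [0,1], with de Casteljau's
// algorithm: repeatedly interpolate adjacent control points with weight
// u until a single point remains.
Point bezier3(std::array<Point, 4> c, double u) {
    for (int level = 3; level > 0; --level) {
        for (int i = 0; i < level; ++i) {
            c[i].x = (1.0 - u) * c[i].x + u * c[i + 1].x;
            c[i].y = (1.0 - u) * c[i].y + u * c[i + 1].y;
        }
    }
    return c[0];  // p(u)
}
```

At $u=0$ and $u=1$ this returns the first and last control points, which is why a piecewise curve interpolates the given points when the endpoints of each segment are placed on them.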


The image above was drawn using 3 different parameterization methods:
  • uniform for $\mu=0$
  • chordal for $\mu=1$
  • centripetal for $\mu=0.5$
The $\mu$ is used to calculate the length of each interval in the parameter sequence $t_0, t_1, \dots, t_n$ via $t_{i+1}-t_i=\|x_{i+1}-x_i\|^\mu$
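Since the three methods differ only in that exponent, computing the parameter sequence takes just a few lines. A sketch with my own naming:

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Parameter values t_0..t_n from t_{i+1} - t_i = ||x_{i+1} - x_i||^mu:
// mu = 0 gives uniform, mu = 1 chordal, mu = 0.5 centripetal.
std::vector<double> knots(const std::vector<Point>& pts, double mu) {
    std::vector<double> t{0.0};
    for (std::size_t i = 0; i + 1 < pts.size(); ++i) {
        double dx = pts[i + 1].x - pts[i].x;
        double dy = pts[i + 1].y - pts[i].y;
        t.push_back(t.back() + std::pow(std::hypot(dx, dy), mu));
    }
    return t;
}
```

With $\mu=0$ every interval has length 1 regardless of how far apart the points are, which is exactly what causes the uniform spline to overshoot near closely spaced points.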

To make it easier to imagine how these $\mu$ values affect the shape of the spline, I made up a little story, viewing the points as checkpoints in a racing game where the driver is required to pass them in order to complete the race. Here we have 3 different drivers with different driving styles.

Racer #1 drives at the same speed all the way; this is why he makes a sharp turn when he has to pass points that are close to each other. This causes the ^ section, which is called a cusp.

Then there is racer #2, who adjusts his speed drastically from moment to moment based on how far away the next point is. If the next point is near he slows down; if it is still far away he speeds up. This causes the last turn to be a late apex.
In racing, a driver will often use a "late apex," turning into the corner a little later than normal in order to straighten out the last part of the corner. This allows the driver to accelerate earlier and harder, gaining maximum speed down the next straight.
Lastly, racer #3 also adjusts his speed based on how far away the next point is, but without extreme changes along the way. So here we can see that the path he takes is the one passing those points elegantly.

The question now is: which racer completes the trip earliest?
They all complete it in the same amount of time!