Friday, November 28, 2008

Not so much of an increase in parallel performance

A few days ago I was celebrating the dramatic improvement in computing
performance from using the "snow" package in R
(http://somerandomwalksofmind.blogspot.com/2008/11/my-first-parallel-program.html).
But later, when I posted the question on the R mailing list
(http://www.r-project.org/mail.html), some people pointed out that in
fact I had not achieved such a more-than-doubling improvement in performance.

The trick, according to Stefan Evert
(http://www.nabble.com/More-than-doubling-performance-with-snow-td20654005.html),
is to look at the elapsed time in the system.time() output. In that
case the boost in speed is not so large. For example:

> library(snow)
>
> cc <- makePVMcluster(2)
>
> n.size <- 1000
>
> temp <- NULL
> for(i in 1:10){
+ x <- list(matrix(rnorm(n.size^2),n.size))
+ temp <- c(temp,x)
+ }
>
> system.time(t.1 <- clusterApply(cc,temp,"solve"))
user system elapsed
2.980 0.548 21.909
> system.time(t.2 <- lapply(temp,"solve"))
user system elapsed
24.290 0.636 25.058
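
To make the difference concrete, here is a small sketch that computes the speedup both ways from the timings printed above. The variable names are my own, and the explanation in the comment is my reading of the situation: the "user" figure presumably covers only the CPU time of the master R process, not the work done in the PVM worker processes.

```r
# Timings copied from the system.time() output above.
parallel <- c(user = 2.980, system = 0.548, elapsed = 21.909)
serial   <- c(user = 24.290, system = 0.636, elapsed = 25.058)

# Speedup as the ratio of serial time to parallel time.
speedup_user    <- serial[["user"]] / parallel[["user"]]
speedup_elapsed <- serial[["elapsed"]] / parallel[["elapsed"]]

round(speedup_user, 2)     # 8.15, misleading: worker CPU time is likely not counted
round(speedup_elapsed, 2)  # 1.14, the gain in wall-clock time
```

Judged by elapsed time, the parallel run is only about 14% faster than the serial one.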

So to gain a further increase in execution speed using snow I may
need a computer with many more cores than my current dual-core
laptop.

Sunday, November 23, 2008

My first parallel program

My new laptop computer is a Lenovo Thinkpad T400 with an Intel Core 2 Duo
processor, so I was naturally tempted to experiment with parallel
programming. The result was more than exciting. I tried the
following program:

library(snow)
cc <- makePVMcluster(2)
n.loop <- 10
n.size <- 1000
temp <- NULL
for(i in 1:n.loop){
x <- list(matrix(rnorm(n.size^2),n.size))
temp <- c(temp,x)
}

system.time(t.1 <- clusterApply(cc,temp,"solve"))
system.time(t.2 <- sapply(temp,"solve"))

The serial program took 23.9 seconds while the parallel one took only
3.0 seconds. That's more than double the performance!

Then I changed n.loop to 100 and n.size to 100. This time serial
processing outperformed parallel processing.

I guess this is a result of the trade-off between communication
time and computation time. When the computational task is
relatively heavy, the time spent on communication between processing
units is relatively insignificant, so the parallel method works
better. When the computational task is light, proportionally more time
is wasted sending messages, and the parallel way takes longer.
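
The trade-off can be illustrated with a toy cost model. This is only a sketch under my own assumptions: each task pays a fixed communication overhead, the computation is split evenly across the workers, and every number below is made up rather than measured.

```r
# Hypothetical cost model: every task pays a fixed communication overhead,
# and the computation itself is split evenly across the workers.
parallel_time <- function(n_tasks, task_time, overhead, n_workers) {
  n_tasks * overhead + n_tasks * task_time / n_workers
}
serial_time <- function(n_tasks, task_time) n_tasks * task_time

# Heavy tasks (made-up numbers): 10 tasks of 2.4 s, 0.1 s overhead each.
serial_time(10, 2.4)               # 24
parallel_time(10, 2.4, 0.1, 2)     # 13

# Light tasks (made-up numbers): 100 tasks of 0.002 s, same overhead.
serial_time(100, 0.002)            # 0.2
parallel_time(100, 0.002, 0.1, 2)  # 10.1
```

With heavy tasks the overhead is a small part of the total, while with light tasks it dominates and the parallel version loses.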

Monday, November 17, 2008

Ranking living organisms for environmental protection

I have to admit that my thoughts were wicked in today's English as a
Second Language class. During the discussion session we faced a
controversy over the trade-off between environmental protection
and economic concerns. The textbook Raise the Issue gave a very
extreme case: in Tennessee the construction of a dam was suspended
in the name of protecting an endangered species of snail. I was very
surprised by this. The dam may benefit many people in
terms of flood control and hydroelectricity production, and the
latter is in turn good for the environment.

Then it occurred to me that no consensus can be reached if we keep
agonizing over these choices. Wildlife and human welfare can hardly
both be fully preserved at the same time, so
we have to devise a cut-off point, as we simply cannot save every living
thing with our limited effort. An ideal cut-off line has, on one side,
the wildlife that is protected and, on the other, the wildlife whose
protection is not assigned top priority.

A very intuitive solution to this is to rank living organisms
according to their genetic similarity to humans. This argument is
well founded if we consider the ultimate goal of environmental
protection to be self-protection: the closer we are to some
non-human life form, the more closely our fates are related. The
deterioration of the habitats of these life forms may pose an immediate
threat to human society.

Another issue arises immediately: how to find the cut-off point.
Consider the biosphere as a complicated food-chain graph,
where each node stands for a species and each edge stands for a
dependency relationship. We may simply find, within our budget, the set
of nodes with the greatest total similarity to humans and the smallest
dependency on nodes that are not included in the set.
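
As a toy illustration of this selection, the sketch below tries every subset of four imaginary species by brute force. Everything in it is an assumption of mine: the species, their similarity scores and protection costs, the dependency edges, the budget, and the way the two goals are combined into one score (total similarity minus a penalty for each dependency on an unprotected species).

```r
# Toy data, entirely made up for illustration.
species    <- c("A", "B", "C", "D")
similarity <- c(A = 0.9, B = 0.7, C = 0.4, D = 0.1)  # similarity to humans
cost       <- c(A = 3,   B = 2,   C = 2,   D = 1)    # cost of protection
# dep[i, j] = 1 means species i depends on species j.
dep <- matrix(0, 4, 4, dimnames = list(species, species))
dep["A", "C"] <- 1
dep["B", "D"] <- 1
budget  <- 5
penalty <- 0.5  # assumed weight per dependency on an unprotected species

best <- NULL
best_score <- -Inf
for (mask in 0:(2^length(species) - 1)) {
  # Decode the subset: bit k of mask says whether species k is chosen.
  chosen <- (mask %/% 2^(seq_along(species) - 1)) %% 2 == 1
  if (sum(cost[chosen]) > budget) next
  # Count dependencies that point from chosen species to unchosen ones.
  outside <- sum(dep[chosen, !chosen, drop = FALSE])
  score <- sum(similarity[chosen]) - penalty * outside
  if (score > best_score) {
    best_score <- score
    best <- species[chosen]
  }
}
best  # "A" "C"
```

In this toy case the best set is A together with C, the species A depends on, rather than the two most human-like species A and B. Brute force only works for a handful of species; a real food web would need a proper optimization method.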

This may give some rational criteria for environmental issues,
and save the time and energy spent on debates and campaigns. Some may
argue that living things that are distant from humans may also be
important. However, they may also get protected if they live within
the protected area of the larger life forms that are listed for
protection.

Of course, putting the idea above into practice is no simple task. While
current developments in biological science can provide a phylogenetic
tree to measure similarity, the graph of direct dependence may not be
easily obtained. Besides, the budget mentioned above is difficult to
estimate. Thus it may take the joint effort of (bio)statisticians,
biologists, and economists to achieve the ultimate results.

Saturday, October 4, 2008

Finally, my computer is running again.

I was surprised by the monthly bill from Emory. Two and a half months of housing charges were billed together this month and surpassed my monthly income. To make things worse, my computer had been crashing from time to time after I made some modifications to the partitions. I thought that maybe my BenQ Joybook was too old to keep me company any longer and that I needed to get a new computer. But the financial burden I'm facing made that very difficult. So I decided to give it one last shot before adding to my credit card balance.

I first ran the "free" command in a terminal and discovered that the swap had size zero. How stupid! Some modification may have stopped the system from automatically mounting the swap partition. Then I swapon'ed using the partition manager of Ubuntu (qparted).

To solve this problem completely I googled "ubuntu auto swapon" and came across this page:

https://answers.launchpad.net/ubuntu/+question/34437.

Then I realized that it was a UUID problem, so I edited fstab to match the output of blkid.

Finally I rebooted. Now it's running OK again. :)

Friday, June 27, 2008

Commencement

The 2008 Commencement of Fudan University was held at Zhengda Sports Center this morning, but I had to skip the ceremony for a Real Analysis exam. I feel so upset that I had to miss this precious moment for the exam of a course that is not required by my department.

Many people told me that Real Analysis will have a very good impact on a later academic career in statistics. But I strongly doubt this now, partly because I have finally discovered what one may have to give up to gain such an advantage in one's knowledge structure. And paying too much attention to mathematical details may even distract a person from seeing statistical problems in a broader view.

After all, I won't have graduated from Fudan until a couple of days from now, when I finish my Stochastic Processes exam.

Thursday, May 1, 2008

Some thoughts on cloud computing

Thanks to the help of Sun Grid Engine (SGE), I can now have a good time with my girlfriend while several jobs run in parallel on PICB's mini-cluster. The concept of grid computing seemed extremely difficult to me when I first encountered it in an Oracle advertisement. However, using SGE is really simple for end users like me: I just have to learn simple commands like qsub, which submits a program to the grid engine, and qdel, which tells the engine to delete a job. The operations are simple and the output files are stored directly in my home folder, where I can take a look the next day.

This reminds me of the articles I read about cloud computing in Business Week a couple of months ago, which seemed too far away at the very beginning. In cloud computing, as the media say, the user just needs to enter the jobs at a remote terminal, and the results will come. It is super stable, super scalable, super cost-effective, and even super environment-friendly, as the computing center may have an optimized cooling system.

However, I am still concerned about whether the cloud will assign my jobs, which are mostly computation-intensive, to a few of its fastest CPUs, or just randomly throw them to one of its nodes. Maybe the cloud will not know this until the program has started. An alternative solution is to specify the number of threads manually, which is obviously not what we want to see.

This may make the Linux system even stronger, because Linux, which is free, seems an ideal operating system for this kind of multi-node task, and the system is growing rapidly with the help of the GNU GPL.

And finally, this kind of computing service will surely find its way into family entertainment and personal computing. I guess individual families are unlikely to deal directly with computing service vendors. But maybe some Comcast-of-tomorrow company will do this, giving out a new kind of uniform OS, which I hope will be open source.

But for individuals to use this kind of service, either (1) they have weak computers or (2) their programs are too demanding. While the first condition seems very unlikely in countries like the United States, it may hold in developing countries, which may outsource the cloud computing to industry giants like IBM. Then these countries just need to develop their OLPC and build good internet connections. The second is even simpler. Just think: between the ages of 10 and 25, how many of your friends' computer upgrades were caused by Blizzard?

Thursday, April 24, 2008

Performance of loop functions in R

I've recently paid special attention to loop functions in R because of my internship jobs at PICB, which usually take a couple of days to run on the server. A colleague told me that apply() and similar functions may help speed up the programs. I doubted this at the very beginning. However, I implemented many of my programs using replicate() and the code seems much nicer.

Then I came across a post saying that apply is even slower than for loops. This pushed me to do some simulations of my own. A simulation study comparing sapply and a for loop showed that the former performs better than the latter. Maybe we need to choose the right *apply function to get the best performance.

This also brought me an alternative to the loops over vectors that I usually meet in practical programming. Sometimes I use Vectorize() on scalar functions so that they process the whole vector. This also yields tidy code.

Finally, the simulation results show that the performance ranks as follows (best first):

sapply > Vectorize > for
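
A comparison of this kind could be sketched as below. This is not the simulation I ran: the scalar function f and the input size are stand-ins chosen for illustration, and the timings (and possibly the ranking) will depend on the machine and the R version.

```r
# Hypothetical scalar function standing in for the real per-element work.
f <- function(x) if (x > 0) sqrt(x) else 0

x <- runif(1e5, -1, 1)

with_for <- function(x) {
  out <- numeric(length(x))  # preallocate the result vector
  for (i in seq_along(x)) out[i] <- f(x[i])
  out
}
with_sapply    <- function(x) sapply(x, f)
with_vectorize <- Vectorize(f)

# All three approaches should agree before their timings are compared.
stopifnot(identical(with_for(x), with_sapply(x)),
          identical(with_for(x), with_vectorize(x)))

# Elapsed (wall-clock) seconds for each approach.
system.time(with_for(x))[["elapsed"]]
system.time(with_sapply(x))[["elapsed"]]
system.time(with_vectorize(x))[["elapsed"]]
```

Repeating each timing several times and comparing the averages would give a fairer picture than a single run.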

I guess I should consult the R mailing list for a better understanding of performance evaluation and improvement.