Saturday, July 30, 2016

Take a peep at the Genetic Algorithm (a.k.a. mimicking naïve evolution)

Pre

Since things haven't been that busy these days, I finally got a chance to take a quick look at the Genetic Algorithm (GA) and understand its basics. Before this, I was overwhelmed by side projects as well as the internship. The side projects are mainly Node apps built with HTML, JavaScript and CSS. More specifically, one is a React app and the other is an Angular app. Doing these two projects simultaneously seems like overkill. Anyway, that's the whole point of life, right?

Let's get right into it


You know, a peep doesn't give you the whole picture. I just watched two great videos that explain the algorithm quite well:

  1. An open lecture from MIT: https://www.youtube.com/watch?v=kHyNqSnzP8Y
  2. A sort of book-sharing session on The Nature of Code (a good book, by the way): https://www.youtube.com/watch?v=6l6b78Y4V7Y
After these two videos, I got the full picture of GA. However, it becomes more and more complicated as the problem size grows and as you add more controls to improve accuracy and diversity.

There are a few terms that will get stuck in your head if you start looking into GA.

Fitness

A mathematical function (or mapping) that evaluates how well each individual in the current population performs.
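To make this concrete, here is a minimal Python sketch of a fitness function. It assumes (hypothetically) that each individual is a string and the goal is to match a fixed target phrase, in the spirit of the "monkeys typing Shakespeare" demo. Fitness is computed per individual; the population as a whole is then judged by looking at all of its scores.

```python
from typing import List

# Hypothetical target phrase; any fixed string would work.
TARGET = "to be or not to be"


def fitness(candidate: str, target: str = TARGET) -> float:
    """Return the fraction of positions where candidate matches target (0.0 to 1.0)."""
    if len(candidate) != len(target):
        raise ValueError("candidate and target must have the same length")
    matches = sum(1 for c, t in zip(candidate, target) if c == t)
    return matches / len(target)


def evaluate(population: List[str]) -> List[float]:
    """Score every individual in the population with the fitness function."""
    return [fitness(individual) for individual in population]
```

For example, `evaluate(["to be or not to be", "to be or not to bx"])` scores the exact phrase 1.0 and the second string 17/18, since only its last character differs.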

Genotype and Phenotype

Genotype is the data structure (say, a binary string) that encodes an individual in your population, just like the chromosomes (or DNA) in our bodies that determine the phenotype.

Phenotype is the physical expression of the genotype. Think of traits such as iris color, hair color, and height in humans.
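A tiny, made-up mapping from genotype to phenotype may help. In this Python sketch, all traits and bit layouts are hypothetical: the genotype is a 6-bit string whose first 4 bits encode height and whose last 2 bits encode hair color. A GA only ever manipulates the bits; the decoded traits are what you would actually "see".

```python
from dataclasses import dataclass

# Hypothetical trait table indexed by the last two genotype bits.
HAIR_COLORS = ["black", "brown", "blond", "red"]


@dataclass
class Phenotype:
    height_cm: int   # hypothetical trait decoded from bits 0-3
    hair_color: str  # hypothetical trait decoded from bits 4-5


def decode(genotype: str) -> Phenotype:
    """Map a 6-bit genotype string (e.g. "101001") to a Phenotype."""
    if len(genotype) != 6 or set(genotype) - {"0", "1"}:
        raise ValueError("genotype must be a 6-character bit string")
    height_bits = int(genotype[:4], 2)  # 0..15
    hair_bits = int(genotype[4:], 2)    # 0..3
    # Spread the 16 possible height values over 150..195 cm in 3 cm steps.
    return Phenotype(height_cm=150 + height_bits * 3,
                     hair_color=HAIR_COLORS[hair_bits])
```

For instance, `decode("101001")` yields a 180 cm individual with brown hair: `1010` is 10 (150 + 30) and `01` is 1.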

Evolution

The whole point of GA is to mimic evolution inside your own problem space. The basic steps are:
  1. Initialize the state by randomly generating some "creatures"
  2. Evaluate them with the fitness function
  3. Select part of the population to survive according to some selection function (think of a simple probability where individuals with higher fitness are more likely to be selected) [Natural Selection]
  4. Reproduce by crossing over the genotypes
  5. Mutate the genotypes with a small probability
  6. With this new population, go back to step #2 and loop until an acceptable solution is found
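The six steps above can be sketched as a short loop. This is a minimal Python sketch, not a definitive implementation: it assumes the toy "OneMax" problem (maximize the number of 1 bits in a bit string), and the population size, mutation rate, roulette-wheel selection and single-point crossover are illustrative choices rather than the only options.

```python
import random

GENOME_LENGTH = 20    # hypothetical problem size
POPULATION_SIZE = 30
MUTATION_RATE = 0.01  # per-bit flip probability (step 5)
MAX_GENERATIONS = 200


def fitness(genome):
    # OneMax: count the 1 bits; the optimum is a genome of all ones.
    return sum(genome)


def random_genome(rng):
    return [rng.randint(0, 1) for _ in range(GENOME_LENGTH)]


def select(population, scores, rng):
    # Fitness-proportionate (roulette-wheel) selection of two parents.
    # Adding 1 keeps every weight positive, even for all-zero genomes.
    weights = [s + 1 for s in scores]
    return rng.choices(population, weights=weights, k=2)


def crossover(a, b, rng):
    # Single-point crossover: head of parent a, tail of parent b.
    point = rng.randint(1, GENOME_LENGTH - 1)
    return a[:point] + b[point:]


def mutate(genome, rng):
    # Flip each bit independently with probability MUTATION_RATE.
    return [1 - g if rng.random() < MUTATION_RATE else g for g in genome]


def evolve(seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(POPULATION_SIZE)]  # step 1
    for generation in range(MAX_GENERATIONS):
        scores = [fitness(g) for g in population]                     # step 2
        best = max(range(POPULATION_SIZE), key=lambda i: scores[i])
        if scores[best] == GENOME_LENGTH:                              # step 6: solved
            return generation, population[best]
        next_population = []
        for _ in range(POPULATION_SIZE):
            parent_a, parent_b = select(population, scores, rng)       # step 3
            child = crossover(parent_a, parent_b, rng)                 # step 4
            next_population.append(mutate(child, rng))                 # step 5
        population = next_population                                   # back to step 2
    scores = [fitness(g) for g in population]
    best = max(range(POPULATION_SIZE), key=lambda i: scores[i])
    return MAX_GENERATIONS, population[best]
```

Calling `evolve(seed=0)` returns the generation at which an all-ones genome appeared (or `MAX_GENERATIONS` if none did) together with the best genome found. Without elitism the best individual can be lost between generations; carrying the top few over unchanged is a common tweak.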
Actually, GA is nothing super fancy, though the name itself sounds quite fancy and professional. I will try to build a simple demo in the following weeks, dig further into it, and see if I can find anything special.

Post

After reading through these materials, I felt that computer scientists are quite keen on mimicking the whole world (maybe we should say God?). Look at what we have today: neural networks, which may become competitive with human brains in the coming years. So... should there be a line between this kind of mimicking and nature? In the video, someone asked an interesting question: what if we applied the same algorithm back to real life? Would that be possible? Oh yes, that's why we have survived until now. Should we stop it once it grows? We may rather say "Stop it before it starts".


Wednesday, July 20, 2016

De-the-bug, oh yeah!

Let the bug crawl!

"A bug is never just a mistake. It represents something bigger. An error of thinking that makes you who you are." - Elliot @ Mr. Robot

Today, we spent half a day debugging. This was a tough, really, really tough bug. Let me tell you the story:

TL;DR

1. Last week, I found that the performance of one of the Power8 clusters running Spark was abnormal. It was even slower than an Intel Xeon machine with "worse" specs.

2. Since the Power8 machine has more cores, more memory, and even a higher clock speed than the Xeon machine, we had no idea what was causing this.

3. First, we tested with a simple Spark application, but no luck. We dug into each stage step by step to examine the duration, write time, serialization delay and the like.


4. After that, we focused on the JVM. We suspected that the JVM could cause a performance difference, since IBM Java was used on the Power8 and Oracle Java on the Xeon. However, a simple Java sorting program still performed differently after we installed OpenJDK on both machines.


5. As we still had no clue on the Java side, we turned to something more fundamental: C.


6. Guess what? After playing with the optimization flags and running all sorts of CPU benchmarks, we concluded that even though the Power8 has "higher" specs than the Xeon, the Xeon still outperforms it thanks to its well-rounded functionality and optimized instruction set.

Oh, well... my suggestion is... don't buy anything from a company that makes you wonder whether it has already gone out of business. :)

Saturday, July 16, 2016

A new journey at Levyx, Inc. and more

Turn a new page

After spamming all the companies and startups nearby, I finally got a position at Levyx, Inc. This is a great place to work, hmm... not only to work but to learn a lot as well. A great place, nice people, awesome food and a free environment make me enjoy being here so much!


Diving into Big Data

One of the main reasons I chose to work here is that it is an awesome opportunity to learn things I couldn't learn on my own, in terms of resources, environment, and timing.

Before starting, I was told to get familiar with Scala, a functional programming language that runs on the JVM. The reason is that the framework we are using now, Spark, is written in Scala (though it also supports Python and R natively).


Spark is a framework that emphasizes parallel processing of data. When you start working with Spark, you will be overwhelmed by the term "RDD", which stands for "Resilient Distributed Dataset". Thanks to the power of RDDs, Spark can deliver high performance in data processing as well as in other workloads such as machine learning and MapReduce-style jobs.

One more thing: I also got access to some "powerful" machines for profiling. By "powerful" I mean 60+ cores and 200GB+ of memory with 3TB+ of SSD. I will find a better way to utilize (or torture) these machines. 😎

Oh, by the way...

I am still in California for this internship. Yes, I will go back in September. See you guys there~~~