This is not gambling :)
Today I decided to resume one of my old projects: data crawling and mining on the Hong Kong Jockey Club! From my past experience, it is quite hard to predict the result of a horse race because there are too many variables, such as the horses and the riders. So I decided to change the subject from horse racing to Mark Six...
First, let me introduce the "Mark Six" (六合彩):
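The concept is like Powerball. There are 49 balls in total. In each draw, 7 balls are picked at random by the machine, and one of them is the "special number" (officially the extra number). To win first prize, you need to match the 6 main numbers; the special number only matters for some of the lower prizes. So, by simple math, the probability of first prize is 1/C(49,6) = 1/13,983,816. Guessing all 7 balls correctly would be 1/C(49,7) = 1/85,900,584.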
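The counting behind these odds can be checked in a few lines of Python. This is a minimal sketch, assuming the balls are drawn without replacement and the order does not matter; the variable names are mine. It prints the number of ways to choose 7 of 49 balls and, for comparison, 6 of 49.

```python
from math import comb

TOTAL_BALLS = 49

# Number of ways to choose 6 balls out of 49 (order does not matter).
six_of_49 = comb(TOTAL_BALLS, 6)

# Number of ways to choose 7 balls out of 49.
seven_of_49 = comb(TOTAL_BALLS, 7)

print(f"6 of 49: 1 in {six_of_49:,}")    # 1 in 13,983,816
print(f"7 of 49: 1 in {seven_of_49:,}")  # 1 in 85,900,584
```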
This time, I will first crawl the data from the official website of Mark Six, then do some data mining on it and try to predict the result based on the findings. Besides the basic probability calculations, I will also try to predict it with a neural network by representing the data as a series of binary numbers.
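One possible reading of "a series of binary numbers" is a 49-element vector of 0s and 1s per draw, with a 1 at the position of every ball that came out. The sketch below assumes that encoding; the function names and the example draw are made up for illustration and are not taken from the real data.

```python
from typing import List

TOTAL_BALLS = 49


def encode_draw(numbers: List[int]) -> List[int]:
    """Return a 49-element 0/1 vector with a 1 at each drawn ball number."""
    vector = [0] * TOTAL_BALLS
    for n in numbers:
        if not 1 <= n <= TOTAL_BALLS:
            raise ValueError(f"ball number out of range: {n}")
        vector[n - 1] = 1  # ball 1 maps to index 0
    return vector


def decode_draw(vector: List[int]) -> List[int]:
    """Turn a 0/1 vector back into a sorted list of ball numbers."""
    return [i + 1 for i, bit in enumerate(vector) if bit]


# Made-up draw, only to show the shape of the encoding.
example = encode_draw([3, 11, 19, 27, 35, 42, 49])
print("".join(str(bit) for bit in example))
print(decode_draw(example))
```

A fixed-length vector like this can be fed directly to a network input layer, but it drops the distinction between the main numbers and the special number; keeping that would need a second vector or an extra flag.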
Data Crawling
To crawl the data, I took advantage of htmlparser2 and shelljs. They are really handy, and I built the prototype in a short time because the tools are so easy to use. (That's why I like JavaScript very much 🤓) Shelljs executes the curl commands and passes the standard output to the HTML parser. Then the parser generates the database from all the entries it finds. So far I have crawled 3000 draws from the site, from 1993 to now. That's a special feeling. 🤑
Next, I will start the data processing part in Python (most likely with sklearn). Hope that I could be a millionaire someday~~~
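The parsing half of the crawling step can be sketched in Python as well. This is not the htmlparser2/shelljs prototype described above: it swaps in the standard library's html.parser, and it parses an inline sample string instead of curl output. The table layout, the draw IDs, and the numbers in SAMPLE_HTML are all hypothetical; the real results page will look different.

```python
from html.parser import HTMLParser
from typing import Dict, List


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []            # start a fresh row
        elif tag == "td":
            self._in_cell = True
            self._row.append("")      # placeholder for this cell's text

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical markup: first cell is a draw ID, the rest are ball numbers.
SAMPLE_HTML = """
<table>
  <tr><td>draw-1</td><td>3</td><td>11</td><td>19</td><td>27</td><td>35</td><td>42</td><td>49</td></tr>
  <tr><td>draw-2</td><td>1</td><td>8</td><td>15</td><td>22</td><td>29</td><td>36</td><td>43</td></tr>
</table>
"""

parser = RowCollector()
parser.feed(SAMPLE_HTML)

# Build a small in-memory "database": draw ID -> list of ball numbers.
draws: Dict[str, List[int]] = {
    row[0]: [int(cell) for cell in row[1:]] for row in parser.rows
}
print(draws)
```

In a real run the string passed to feed() would be the page body fetched from the site, and the dictionary would be written out to a file or database instead of printed.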
