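A while back, shinchen suggested that we "learn Mahout together", and the response was enthusiastic; some people even proposed starting an open source machine learning project in C++. In the spirit of "not reinventing the wheel", I googled "机器学习开源项目" (open source machine learning projects) and found that there are few roundups on this topic in Chinese. I did, however, come across an excellent collection of open source machine learning tools from abroad: MLOSS (Machine Learning Open Source Software). The site currently lists more than 400 open source machine learning packages, covering many languages and algorithm implementations, and it is a real treasure trove for anyone interested in machine learning or data mining. For the background and goals of MLOSS, here is the description quoted from its official website: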
Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for a wide range of applications. Inspired by similar efforts in bioinformatics (BOSC) or statistics (useR), our aim is to build a forum for open source software in machine learning.
If you want more background about why open source software is important for machine learning, read our position paper about the need for open source software in machine learning.
If you have written machine learning software, consider adding it to the projects at mloss.org.
In case your machine learning software can be considered a useful, mature piece of work consider a submission to the JMLR track for machine learning open source software.
Our goal is to support a community creating a comprehensive open source machine learning environment. Ultimately, open source machine learning software should be able to compete with existing commercial closed source solutions. To this end, it is not enough to bring existing and freshly developed toolboxes and algorithmic implementations to people's attention. More importantly the MLOSS platform will facilitate collaborations with the goal of creating a set of tools that work with one another. Far from requiring integration into a single package, we believe that this kind of interoperability can also be achieved in a collaborative manner, which is especially suited to open source software development practices.