GraphLab

GraphLab collaborative filtering library: efficient probabilistic matrix/tensor factorization on multicore

This webpage explains how to use the GraphLab collaborative filtering library, which implements multiple matrix decomposition algorithms. The algorithms are described in the following papers:
Probabilistic matrix/tensor factorization:
A) Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, Jaime G. Carbonell, Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. In Proceedings of SIAM Data Mining, 2010. html (source code is also available).

B) Salakhutdinov and Mnih, Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo. In International Conference on Machine Learning, 2008. pdf, project website. This paper applies as well, since our code implements matrix factorization as a special case of tensor factorization.

C) Alternating least squares: Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. Proceedings of the 4th international conference on Algorithmic Aspects in Information and Management. Shanghai, China pp. 337-348, 2008. pdf

D) SVD++ algorithm: Koren, Yehuda. "Factorization meets the neighborhood: a multifaceted collaborative filtering model." In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 426-434. ACM, 2008. http://portal.acm.org/citation.cfm?id=1401890.1401944

E) SGD (stochastic gradient descent) algorithm: Yehuda Koren, Robert Bell, Chris Volinsky. Matrix Factorization Techniques for Recommender Systems. In IEEE Computer, Vol. 42, No. 8 (07 August 2009), pp. 30-37.
F) Takács, G., Pilászy, I., Németh, B., and Tikk, D. (2009). Scalable Collaborative Filtering Approaches for Large Recommender Systems. Journal of Machine Learning Research, 10, 623-656.

G) For Lanczos algorithm (SVD) see: wikipedia.

H) For NMF (non-negative matrix factorization) see: Lee, D.D., and Seung, H.S. (2001), 'Algorithms for Non-negative Matrix Factorization', Adv. Neural Info. Proc. Syst. 13, 556-562.

I) For weighted alternating least squares: Hu, Y.; Koren, Y.; Volinsky, C. Collaborative Filtering for Implicit Feedback Datasets. IEEE International Conference on Data Mining (ICDM 2008), IEEE (2008).
J) Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-Class Collaborative Filtering. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE Computer Society, Washington, DC, USA, 502-511.

K) For sparse factor matrices see: Xi Chen, Yanjun Qi, Bing Bai, Qihang Lin and Jaime Carbonell. Sparse Latent Semantic Analysis. In SIAM International Conference on Data Mining (SDM), 2011.

D. Needell, J. A. Tropp CoSaMP: Iterative signal recovery from incomplete and inaccurate samples Applied and Computational Harmonic Analysis, Vol. 26, No. 3. (17 Apr 2008), pp. 301-321.

L) For SVD see Wikipedia

M) For time-SVD++, see Yehuda Koren. 2009. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, 447-456. DOI=10.1145/1557019.1557072


N) For bias-SVD
Y. Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Equation (5), pdf.

O) For RBM:
G. Hinton. A Practical Guide to Training Restricted Boltzmann Machines. University of Toronto Tech report UTML TR 2010-003 pdf.

This document describes the popular GraphLab collaborative filtering library. The library has been downloaded thousands of times so far and is widely used in both industry and academia.

Solutions based on this library won 5th place in KDD CUP 2011 track 1 (out of more than 1000 participants) and contributed a small part of the 1st place solution in the same contest.

The GraphLab collaborative filtering library was written by Danny Bickson. The goal is to factorize a user/item matrix into two lower-dimensional matrices. In other words, we build a linear model of the data, which can later be used to predict values for user/item pairs not seen before.
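
To make the factorization idea concrete, here is a minimal sketch in plain Python. It is not the GraphLab API or implementation: the function names (`predict`, `factorize`), the parameter values, and the toy ratings are all assumptions chosen for illustration. It fits two small factor matrices U (users) and V (items) with generic stochastic gradient descent and L2 regularization, so that the dot product of a user row and an item row approximates each observed rating.

```python
import random


def predict(U, V, u, i):
    """Predicted value for user u and item i: dot product of their factor rows."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def factorize(ratings, num_users, num_items, rank=2,
              epochs=2000, lr=0.02, reg=0.02, seed=0):
    """Fit ratings[(u, i)] ~ dot(U[u], V[i]) with plain SGD (illustrative only)."""
    rng = random.Random(seed)
    # Small random starting values; all-zero factors would never move.
    U = [[rng.uniform(-0.5, 0.5) for _ in range(rank)] for _ in range(num_users)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(rank)] for _ in range(num_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - predict(U, V, u, i)
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                # Gradient step on squared error plus an L2 penalty.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V


# Hypothetical toy data: 3 users x 3 items, 6 of the 9 entries observed.
ratings = {(0, 0): 5, (0, 1): 3, (1, 0): 4,
           (1, 2): 1, (2, 1): 2, (2, 2): 5}

U, V = factorize(ratings, num_users=3, num_items=3)

# Root mean squared error over the observed entries.
train_rmse = (sum((r - predict(U, V, u, i)) ** 2
                  for (u, i), r in ratings.items()) / len(ratings)) ** 0.5
print("training RMSE:", train_rmse)

# A user/item pair that was never observed still gets a prediction.
print("prediction for user 0, item 2:", predict(U, V, 0, 2))
```

The same two matrices serve both purposes described above: they reproduce the observed entries approximately, and they yield a value for any unobserved user/item pair. The algorithms listed in the references differ mainly in how the factors are fitted (for example alternating least squares or MCMC sampling instead of SGD).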

Please post usage questions and bug reports in our Google group.

News

  • 18 May 2012: The first GraphLab workshop is coming! More than 80 participants so far from about 50 companies and 12 universities. Mark your calendar: July 9 in San Francisco.
  • 15 Dec 2011: GraphLab receives an additional NSF grant of 200,000 CPU hours on the BlackLight/Kraken supercomputers. read more.
  • 11 Nov 2011: time-SVD++ is now implemented as part of the GraphLab collaborative filtering library. read more.
  • 1 Oct 2011: GraphLab now supports the Eigen linear algebra package. read more.
  • 23 Aug 2011: Released an optimized version for the BlackLight supercomputer, which runs 5 times faster! read more.
  • 21 Aug 2011: Our paper "Efficient Multicore Collaborative Filtering" appeared today at the ACM KDD CUP workshop 2011. pptx slides.
  • 16 June 2011: GraphLab-based matrix factorization code finished in 5th place (out of more than 1000 research groups) in Yahoo! KDD CUP 2011. read more.
  • 6 June, 2011: more than 300 unique installations of the GraphLab collaborative filtering library!
  • 5 April, 2011: Updated instructions on how to install and run GraphLab matrix factorization with Yahoo! KDD CUP data are available here.

    PMF Documentation / User Manual