XGBoost: A Scalable Tree Boosting System
Tianqi Chen, Carlos Guestrin

Citation
Tianqi Chen, Carlos Guestrin. "XGBoost: A Scalable Tree Boosting System". Technical report, LearningSys, December, 2015.

Abstract
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
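The abstract names a sparsity-aware algorithm but does not spell it out. As a rough illustration only, the sketch below shows one way such a split search can work for a single feature: only the non-missing entries are sorted and scanned, and each candidate threshold is scored twice, once with all missing rows sent right and once with them sent left. The function names, the regularization constant lam, the choice of threshold value, and the omission of the complexity penalty are assumptions made for this example; this is not XGBoost's actual implementation.

```python
from typing import List, Optional, Tuple


def score(g: float, h: float, lam: float) -> float:
    """Structure score G^2 / (H + lam) of one node (lam is an assumed L2 term)."""
    return g * g / (h + lam)


def best_split_with_default(
    values: List[Optional[float]],
    grad: List[float],
    hess: List[float],
    lam: float = 1.0,
) -> Tuple[float, Optional[float], str]:
    """Return (gain, threshold, default_direction) for one feature.

    Rows with value < threshold go left, the rest go right; rows whose
    value is None (missing) follow default_direction.
    """
    g_total, h_total = sum(grad), sum(hess)
    parent = score(g_total, h_total, lam)
    # Only non-missing entries are sorted and scanned.
    present = sorted(
        (v, g, h) for v, g, h in zip(values, grad, hess) if v is not None
    )
    best: Tuple[float, Optional[float], str] = (0.0, None, "left")

    # Pass 1: ascending scan fills the left child with present rows,
    # so every missing row implicitly lands in the right child.
    g_left = h_left = 0.0
    for i in range(1, len(present)):
        g_left += present[i - 1][1]
        h_left += present[i - 1][2]
        if present[i][0] == present[i - 1][0]:
            continue  # cannot split between equal values
        gain = 0.5 * (
            score(g_left, h_left, lam)
            + score(g_total - g_left, h_total - h_left, lam)
            - parent
        )
        if gain > best[0]:
            best = (gain, present[i][0], "right")

    # Pass 2: descending scan fills the right child with present rows,
    # so every missing row implicitly lands in the left child.
    g_right = h_right = 0.0
    for i in range(len(present) - 1, 0, -1):
        g_right += present[i][1]
        h_right += present[i][2]
        if present[i][0] == present[i - 1][0]:
            continue
        gain = 0.5 * (
            score(g_total - g_right, h_total - h_right, lam)
            + score(g_right, h_right, lam)
            - parent
        )
        if gain > best[0]:
            best = (gain, present[i][0], "left")

    return best
```

For instance, with values [1.0, 1.0, 5.0, 5.0, None, None], gradients [-1, -1, 1, 1, 1, 1] and unit Hessians, the missing rows resemble the high-value rows, so the sketch picks threshold 5.0 with default direction "right". The cost of the scan depends on the number of non-missing entries rather than the total number of rows, which is the point of treating sparsity explicitly.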

Citation formats  
  • BibTeX
    @techreport{ChenGuestrin15_XGBoostScalableTreeBoostingSystem,
        author = {Tianqi Chen and Carlos Guestrin},
        title = {XGBoost: A Scalable Tree Boosting System},
        institution = {LearningSys},
        month = {December},
        year = {2015},
        abstract = {Tree boosting is a highly effective and widely
                  used machine learning method. In this paper, we
                  describe a scalable end-to-end tree boosting
                  system called XGBoost, which is used widely by
                  data scientists to achieve state-of-the-art
                  results on many machine learning challenges. We
                  propose a novel sparsity-aware algorithm for
                  sparse data and weighted quantile sketch for
                  approximate tree learning. More importantly, we
                  provide insights on cache access patterns, data
                  compression and sharding to build a scalable tree
                  boosting system. By combining these insights,
                  XGBoost scales beyond billions of examples using
                  far fewer resources than existing systems.},
        URL = {http://terraswarm.org/pubs/767.html}
    }
    

Posted by Tianqi Chen on 30 Mar 2016.
Groups: tools

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.