GraphLab(2): Machine Learning for Big Data in the Cloud
Carlos Guestrin

Citation
Carlos Guestrin. "GraphLab(2): Machine Learning for Big Data in the Cloud". Tutorial, May, 2013.

Abstract
Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle "Big Data." Unfortunately, implementing efficient parallel ML algorithms is challenging. Existing high-level parallel abstractions such as MapReduce and Pregel are insufficiently expressive to achieve the desired performance, while low-level tools such as MPI are difficult to use, leaving ML experts repeatedly solving the same design challenges. In this talk, I will describe the GraphLab framework, which naturally expresses asynchronous, dynamic graph computations that are key for state-of-the-art ML algorithms. When these algorithms are expressed in our higher-level abstraction, GraphLab will effectively address many of the underlying parallelism challenges, including data distribution, optimized communication, and guaranteeing sequential consistency, a property that is surprisingly important for many ML algorithms. On a variety of large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop. In recent months, GraphLab has received thousands of downloads, and is being actively used by a number of startups, companies, research labs and universities.

Electronic downloads
Internal. This publication has been marked by the author for TerraSwarm-only distribution, so electronic downloads are not available without logging in.
Citation formats  
  • HTML
    Carlos Guestrin. <a
    href="http://www.terraswarm.org/pubs/57.html"
    ><i>GraphLab(2): Machine Learning for Big Data in
    the Cloud</i></a>, Tutorial, May, 2013.
  • Plain text
    Carlos Guestrin. "GraphLab(2): Machine Learning for Big
    Data in the Cloud". Tutorial, May, 2013.
  • BibTeX
    @tutorial{Guestrin13_GraphLab2MachineLearningForBigDataInCloud,
        author = {Carlos Guestrin},
        title = {GraphLab(2): Machine Learning for Big Data in the
                  Cloud},
        month = {May},
        year = {2013},
        abstract = {Today, machine learning (ML) methods play a
                  central role in industry and science. The growth
                  of the Web and improvements in sensor data
                  collection technology have been rapidly increasing
                  the magnitude and complexity of the ML tasks we
                  must solve. This growth is driving the need for
                  scalable, parallel ML algorithms that can handle
                  "Big Data." Unfortunately, implementing efficient
                  parallel ML algorithms is challenging. Existing
                  high-level parallel abstractions such as MapReduce
                  and Pregel are insufficiently expressive to
                  achieve the desired performance, while low-level
                  tools such as MPI are difficult to use, leaving ML
                  experts repeatedly solving the same design
                  challenges. In this talk, I will describe the
                  GraphLab framework, which naturally expresses
                  asynchronous, dynamic graph computations that are
                  key for state-of-the-art ML algorithms. When these
                  algorithms are expressed in our higher-level
                  abstraction, GraphLab will effectively address
                  many of the underlying parallelism challenges,
                  including data distribution, optimized
                  communication, and guaranteeing sequential
                  consistency, a property that is surprisingly
                  important for many ML algorithms. On a variety of
                  large-scale tasks, GraphLab provides 20-100x
                  performance improvements over Hadoop. In recent
                  months, GraphLab has received thousands of
                  downloads, and is being actively used by a number
                  of startups, companies, research labs and
                  universities.},
        URL = {http://terraswarm.org/pubs/57.html}
    }
    

Posted by Mila MacBain on 6 May 2013.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.