VIDEOS: Facebook’s @Scale 2014, Data Track

by David Cohen

Growing Facebook on Mobile, a Realtime Analytics Story - @Scale 2014 - Data

Anshul Jaiswal, Engineering Manager at Facebook; Weizhe Shi, Software Engineer at Facebook; and Will Wirth, Product Analyst at Facebook

Facebook has grown tremendously on mobile. In this talk we’ll discuss how realtime mobile analytics helped accelerate that growth by enabling faster feedback loops during product iteration cycles. We’ll discuss the challenges involved in logging data on mobile, processing it, and analyzing it in realtime. We’ll present the systems we developed to solve these problems, along with specific examples of how we used them to improve our mobile app's performance, stability, and engagement.

Data Platform Architecture, Evolution, and Philosophy at Netflix - @Scale 2014 - Data

Justin Becker, Senior Software Engineer at Netflix, and Kurt Brown, Director of Data Platform at Netflix

The data platform at Netflix has evolved from a traditional BI stack to a modern, cloud-based architecture. Current production components include Hadoop 2, Pig (on Tez), Hive, Presto, Teradata (Cloud), and S3 as our central data hub, along with lots of custom tooling (including technologies Netflix has open sourced, such as Inviso, Lipstick, and Aegisthus). We'll run through our data platform evolution, what our current architecture looks like and why, and the philosophical principles that drive how we get things done.

Zen: Pinterest's Graph Storage Service - @Scale 2014 - Data

Xun Liu, Engineer at Pinterest, and Raghavendra Prabhu, Engineering Manager at Pinterest

Zen is a storage service built at Pinterest that offers a graph data model on top of HBase and potentially other storage backends. Zen was originally conceived and built in summer 2013 and has since grown to be one of our most widely used storage solutions, powering the home feed, interest graph, notifications, messages, and other upcoming features. In this talk, we'll go over the design motivation for Zen and briefly describe the data model, API, and internals.

Building Scalable Caching Systems via McRouter - @Scale 2014 - Data

Rajesh Nishtala, Engineer at Facebook, and Ricky Ramirez, Engineer at Reddit

Modern large-scale web infrastructures rely heavily on distributed caching (e.g., memcached) to process user requests. The problems that McRouter addresses are not specific to Facebook but are common to distributed caching systems in general. As a result, Instagram and Reddit have also adopted McRouter as the primary communication layer to their cache tiers.

Structured Data at Box: How We're Building for Scale - @Scale 2014 - Data

Tamar Bercovici, Senior Engineering Manager at Box

Serving hundreds of thousands of businesses and tens of millions of users, and storing metadata for billions of files, Box's database and caching tier are at the very core of their stack. The talk gives an overview of Box's current MySQL-based sharded database architecture, including a few of the unique design choices they made and how they’ve panned out for Box.

Facebook's A/B Platform: Interactive Analysis in Realtime - @Scale 2014 - Data

Itamar Rosenn, Engineering Manager at Facebook

This talk presents Facebook's platform for streamlined and automated A/B test analysis. We will discuss the system architecture, how it enables interactive analysis on near-realtime results, and the challenges we've faced in scaling the system to meet our customers' needs.

MySQL for Messaging - @Scale 2014 - Data

Harrison Fisk, Data Performance Engineer at Facebook

The original Facebook messaging system was designed and built as an email/message/SMS hybrid system. As Facebook evolved to be a mobile-first company, the mobile-to-mobile message use case came to dominate this system. To improve that experience, Facebook has created a new architecture which uses MySQL on flash as a queueing system.

Scaling YouTube's Backend: The Vitess Trade-offs - @Scale 2014 - Data

Sugu Sougoumarane, Software Engineer at YouTube

If your website or mobile service becomes hugely successful, you will quickly realize that one of the hardest parts to scale is storage. You will also find yourself making various trade-offs related to transactions, consistency, availability, and durability. We faced these challenges at YouTube, which gave rise to the development of Vitess, an open source project.
