Publish/subscribe (pub/sub) is a popular communication paradigm in the design of large-scale distributed systems. We are witnessing an increasingly widespread use of pub/sub for a wide array of applications both in industry and academia. For instance, the pub/sub paradigm is used for RSS feed notifications, financial data dissemination and business process management. Pub/sub has also been used in social interaction message notifications such as in Spotify. Social network interactions have grown exponentially in recent years to the order of billions of notifications generated by millions of users every day. However, there are a number of critical challenges yet to be addressed to design a pub/sub system that can scale massively.
Pub/sub systems are generally deployed in centralized datacenters or using federated organizations of cooperatively managed servers. However, an increasingly higher number of pub/sub applications are being deployed in P2P environments due to their ability to provide scalable and robust decentralized solutions. The design of a system with a goal to support notifications at massive scale from social interactions has several challenges. For one, such a large-scale system has to possess a distinctively high number of desirable characteristics all at once in order to be a viable practical solution. However, we show that the existing state-of-the-art solutions provide only a subset of these characteristics. In this thesis, we propose PolderCast, a P2P topic-based pub/sub system that is fault-tolerant, robust, scalable and fast in terms of dissemination latency while attaining a low communication overhead. We do an extensive experimental analysis of PolderCast using Twitter and Facebook traces and show that PolderCast performs well under realistic churn compared to the widely used pub/sub system Scribe.
Understanding the challenges faced by a real pub/sub system and getting insights from the workload it drives are critical to design a pub/sub system. Yet there is a serious lack of detailed study of a large-scale pub/sub system and its workload. In this thesis, we present an overview of a pub/sub system used to drive social interaction at Spotify. We then present a detailed analysis of traces from a real deployment of Spotify pub/sub. We further analyze the Twitter traces we collected via public APIs provided by Twitter. The analysis of these traces provides several interesting observations and conclusions which can benefit pub/sub designers.
Inspired by the peer-assisted solution used by Spotify to stream music, we explore a similar solution to provide a scalable dissemination of notification events to the users. The task of distributing the workload among user peers and datacenter servers prompts a fundamental problem: How to select a subset of the pub/sub workload to be served by datacenter servers in a manner to maximize satisfaction requirements of users under resource constraints? In this thesis we provide, to the best of our knowledge, the first formal treatment of the above problem by introducing two metrics that capture subscriber satisfaction in the presence of limited resources. This allows us to formulate the problem as two new flavors of maximum coverage optimization problems. Unfortunately, both variants of the problem prove to be NP-hard. By subsequently providing formal approximation bounds and heuristics, we show however, that efficient approximations can be attained. We validate our approach using real-world traces from Spotify and show that our solutions can be executed periodically in real- time in order to adapt to workload variations.
One of the fundamental challenges which remains to be addressed in deploying pub/sub systems on a datacenter or a cloud infrastructure is efficient and cost- effective resource allocation that would allow delivery of notifications to all subscribers. Specifically, the challenge is to answer the following three fundamental questions: Given a pub/sub workload, (1) what is the minimum amount of resources needed to meet satisfaction requirements of all the subscribers, (2) what is a cost-effective way to allocate resources for the given workload, and (3) what is the cost of hosting it on a public Infrastructure-as-a-Service (IaaS) provider like Amazon EC2. We formulate the problem to address these questions and provide an efficient solution. We do an extensive evaluation of the solution using real traces from Twitter and Spotify. With evidence from the empirical results we show that our solution can be used as a tool to allocate resources on datacenters and cloud so as to minimize infrastructure costs.