Is Pinterest hosted on AWS?

Introduction

Pinterest has carved out a unique place among social media apps.

When we open the Pinterest app, it is hard to resist browsing the high-quality images and scrolling through related ones. System lovers like me, though, immediately notice the smooth recommendation algorithm that keeps suggesting new images.

Pinterest calls itself a Visual Discovery Search Engine: everything revolves around images, so the input is an image and the output is one or more images. On the surface it looks smooth and easy, but in any system, the simpler things look in the UI, the more complex the backend usually is. Every image you see on Pinterest is recommended by AI and machine learning models running on its servers.

Note that AI and ML programs generally need more computing power to process images than to process text, so Pinterest's servers are always doing heavy computation.

Another thing we have to notice is that Pinterest requires huge storage space as it deals with images.

For all these computation and storage requirements, Pinterest is almost entirely dependent on Amazon Web Services.

Pinterest is one of the few big companies that has relied on AWS from day one. As we saw in an earlier article, Netflix also uses AWS heavily, but it initially ran its own infrastructure and moved to AWS later. Read all the details here.

Brief History of Pinterest

We will not spend much time on the history of Pinterest. It was founded in 2009 by Ben Silbermann, Paul Sciarra, and Evan Sharp.

Ben worked at Google before launching Pinterest, and Evan Sharp was one of the main developers in Pinterest's early days.

An interesting thing is, Pinterest was inspired by an app called Tote, created by the same founders.

Ben Silbermann said that for the first 5 thousand users, he offered his contact number and also offered to meet some of them.

How does Pinterest use Amazon Web Services?

Using S3 for storing images

Pinterest is all about images, so storing them is one of its most important jobs.

Pinterest uses Amazon S3 (Simple Storage Service) to store images. It needs the images stored in S3 to be fetched quickly and with low latency.

We all know the quality and scale of the S3 APIs, which are extremely reliable and fast.

So S3 serves images for Pinterest users.

S3 holds not only billions of images (pins) but also videos, backups, and logs.

Pinterest stores multiple versions of every image, such as high, medium, and thumbnail sizes, for different viewing experiences.

Whenever a user uploads an image, Pinterest puts it in a queue such as SQS or Kafka, then processes it (for example, compressing or resizing it), and then stores the result in S3.
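The upload flow above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-process queue stands in for SQS or Kafka, a dictionary stands in for an S3 bucket, and the version widths and key layout are hypothetical, not Pinterest's real values. No actual image bytes are resized; the sketch only tracks dimensions.

```python
import queue

# Hypothetical target widths for each stored version; not Pinterest's real values.
VERSION_WIDTHS = {"high": 1200, "medium": 600, "thumbnail": 150}

upload_queue = queue.Queue()   # stand-in for SQS or Kafka
object_store = {}              # stand-in for an S3 bucket: key -> (width, height)


def enqueue_upload(pin_id, width, height):
    """Producer: the upload handler only records the job and returns quickly."""
    upload_queue.put({"pin_id": pin_id, "width": width, "height": height})


def scaled_size(width, height, target_width):
    """Scale down to target_width, keeping the aspect ratio; never upscale."""
    if width <= target_width:
        return width, height
    return target_width, max(1, round(height * target_width / width))


def process_uploads():
    """Worker: drain the queue and store one object per version."""
    while not upload_queue.empty():
        job = upload_queue.get()
        for name, target in VERSION_WIDTHS.items():
            key = f"pins/{job['pin_id']}/{name}.jpg"
            object_store[key] = scaled_size(job["width"], job["height"], target)
        upload_queue.task_done()


enqueue_upload("pin123", 2400, 1600)
process_uploads()
print(object_store["pins/pin123/thumbnail.jpg"])  # (150, 100)
```

The point of the queue is that the upload request can return as soon as the job is recorded, while the slower resizing work happens in the background.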

Pinterest divides images into multiple classes

  1. Recent or Popular Pins: This is hot data, so these images use S3 Standard
  2. Old Pins: For older pins, Pinterest uses S3 Intelligent-Tiering
  3. Archived: Pins that are rarely accessed use the S3 Glacier storage classes

By classifying images this way, Pinterest reduces its S3 costs.
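As a rough sketch of the classification above, the function below maps a pin's last access time to an S3 storage class name. The storage class names are the ones the S3 API uses, but the 30-day and 365-day thresholds are assumptions made up for illustration; the actual rules are not public. In practice this kind of transition is usually configured with S3 lifecycle rules rather than hand-written code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical age thresholds; the real ones are not public.
HOT_DAYS = 30
ARCHIVE_DAYS = 365


def choose_storage_class(last_accessed, now):
    """Map a pin's last access time to an S3 storage class name."""
    age = now - last_accessed
    if age <= timedelta(days=HOT_DAYS):
        return "STANDARD"
    if age <= timedelta(days=ARCHIVE_DAYS):
        return "INTELLIGENT_TIERING"
    return "GLACIER"


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(choose_storage_class(now - timedelta(days=3), now))    # STANDARD
print(choose_storage_class(now - timedelta(days=90), now))   # INTELLIGENT_TIERING
print(choose_storage_class(now - timedelta(days=800), now))  # GLACIER
```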

EC2 for Compute

The main logic of Pinterest is hosted and executed on EC2.

The initial version of Pinterest was written in Python, mainly using the Django framework.

But as Pinterest gained users and shifted more focus to image processing, Python alone was not sufficient. Pinterest started using Java, and for performance-critical parts it uses Go.

All this code runs on EC2.

EC2 is responsible for user login, feeds, and query handling.

Use of Autoscaling in EC2

Pinterest uses autoscaling features. Whenever traffic increases, AWS automatically adds more servers so that the app runs smoothly.
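To make the idea concrete, here is a minimal sketch of the arithmetic behind a target-tracking style of scaling: keep a metric (for example average CPU) near a target by resizing the fleet in proportion. The function name, the limits, and the numbers are assumptions for illustration; this is not AWS's exact algorithm and not Pinterest's configuration.

```python
import math


def desired_capacity(current_instances, current_metric, target_metric,
                     min_size=2, max_size=100):
    """Scale the fleet in proportion to how far the metric is from its target."""
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("instance count and target must be positive")
    raw = math.ceil(current_instances * current_metric / target_metric)
    # Clamp to the group's configured minimum and maximum size.
    return max(min_size, min(max_size, raw))


# 10 servers at 80% CPU with a 50% target: grow to 16.
print(desired_capacity(10, current_metric=80.0, target_metric=50.0))  # 16
# 10 servers at 20% CPU with a 50% target: shrink to 4.
print(desired_capacity(10, current_metric=20.0, target_metric=50.0))  # 4
```

The same rule scales the fleet down when traffic drops, which is where the cost saving comes from.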

Number of EC2 Instances

Pinterest has not disclosed the official number of EC2 instances it runs, and the number changes constantly because of autoscaling, but experts suggest that Pinterest uses tens of thousands of EC2 instances.

We must also remember that EC2 is responsible for ML inference and training as well.

It uses different types of EC2 instances, for example general-purpose instances for APIs and feed generation.

For heavy computation such as recommendation and ranking algorithms, it uses compute-optimized instances.

For more intensive AI and ML processing, it uses GPU instances as well.

Amazon CloudFront

CloudFront is a global network of edge servers.

It provides efficient caching for its customers and is one of the most powerful CDNs in the world.

Pinterest uses CloudFront heavily. Let's look at the main flow of how Pinterest uses CloudFront, in simple terms.

User opens the pin feed -> App requests the image -> Request goes to the nearest CloudFront edge server -> If the image is cached, it is returned immediately -> If not, CloudFront fetches it from S3.

A CDN like CloudFront is essential for Pinterest. If every request went to S3, images would take longer to load and costs would rise, because S3 charges both for API requests and for bandwidth.
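The request flow described above can be sketched as a tiny cache in front of an origin. The class below is a toy model, not CloudFront's implementation: a dictionary plays the role of S3, there is no expiry or eviction, and the key name is hypothetical.

```python
class EdgeCache:
    """Toy edge server: serves from memory, falls back to the origin on a miss."""

    def __init__(self, origin):
        self.origin = origin          # stand-in for S3: key -> bytes
        self.cache = {}
        self.origin_requests = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key], "hit"
        self.origin_requests += 1
        body = self.origin[key]       # raises KeyError if the object is missing
        self.cache[key] = body
        return body, "miss"


origin = {"pins/pin123/medium.jpg": b"...image bytes..."}
edge = EdgeCache(origin)
for _ in range(1000):
    edge.get("pins/pin123/medium.jpg")
print(edge.origin_requests)  # 1
```

A thousand views of the same pin cost a single trip to the origin, which is why a popular image is cheap to serve once it is cached at the edge.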

Amazon DynamoDB

DynamoDB is a NoSQL database product by AWS; it is a key-value database.

Pinterest stores images in S3, but images and pins also have metadata.

Apart from metadata, Pinterest has to store user-related data as well.

For all this data, Pinterest uses DynamoDB. It provides fast access because it is a NoSQL key-value store and avoids some of the overhead of a traditional relational DBMS.
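To show what "key-value" means here, the sketch below keeps pin metadata in a dictionary keyed by pin ID. It is an in-memory stand-in, not the DynamoDB API, and the attribute names (owner_id, title, image_key) are hypothetical.

```python
# In-memory stand-in for a key-value table; attribute names are hypothetical.
pin_metadata = {}


def put_pin(pin_id, **attributes):
    """Write the whole item under its key, as a key-value store would."""
    pin_metadata[pin_id] = {"pin_id": pin_id, **attributes}


def get_pin(pin_id):
    """Single-key lookup; returns None when the key is absent."""
    return pin_metadata.get(pin_id)


put_pin("pin123", owner_id="user42", title="Oak bookshelf",
        image_key="pins/pin123/high.jpg")
print(get_pin("pin123")["image_key"])  # pins/pin123/high.jpg
```

The metadata item only points at the image through its storage key; the image bytes themselves stay in object storage.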

Amazon Aurora

While DynamoDB is NoSQL, Amazon Aurora is a SQL database product by AWS.

AWS advertises higher throughput for Aurora than for standard MySQL and PostgreSQL.

An Aurora cluster has one writer instance and can add reader replicas, so it can handle heavy read and write traffic.

Pinterest also needs some SQL data, where relationships matter for business logic and reports.
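One way an application can take advantage of separate writer and reader instances is to route each query by type. The sketch below is an assumption-laden illustration: the endpoint names are made up, the SELECT check is deliberately naive, and in a real Aurora setup the cluster's reader endpoint already spreads connections across replicas.

```python
import itertools

# Hypothetical endpoint names; real Aurora endpoints look different.
WRITER = "writer.db.example.internal"
READERS = ["reader-1.db.example.internal", "reader-2.db.example.internal"]

_next_reader = itertools.cycle(READERS)


def pick_endpoint(sql):
    """Send SELECTs to a reader (round robin); everything else to the writer."""
    words = sql.lstrip().split(None, 1)
    is_read = bool(words) and words[0].upper() == "SELECT"
    return next(_next_reader) if is_read else WRITER


print(pick_endpoint("SELECT * FROM boards WHERE owner_id = 42"))
print(pick_endpoint("UPDATE boards SET title = 'Decor' WHERE id = 7"))
```

Reads usually far outnumber writes in a feed-style product, so moving them off the writer leaves it free for updates.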

Amazon ElastiCache

Amazon ElastiCache provides in-memory caching with Memcached and Redis.

In any system design, caching in RAM is very important. We should not query the database every time for frequently used data, because querying and processing take time, and higher response time is always bad for user experience. When products like Memcached and Redis are available, it makes sense to use them.
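A common way to apply this is the cache-aside pattern: check the cache first, and only go to the database on a miss. The sketch below uses a plain dictionary in place of Memcached or Redis, and the names (CacheAside, slow_lookup) and the 60-second TTL are assumptions chosen for illustration.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, load from the database and remember it."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}             # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]           # fresh cache hit
        value = self.load_from_db(key)
        self.entries[key] = (value, self.clock() + self.ttl)
        return value


db_calls = []


def slow_lookup(user_id):
    db_calls.append(user_id)          # record each trip to the "database"
    return {"user_id": user_id, "name": "example"}


cache = CacheAside(slow_lookup, ttl_seconds=60.0)
cache.get("user42")
cache.get("user42")
print(len(db_calls))  # 1
```

The TTL bounds how stale a cached value can get: once it expires, the next read goes back to the database.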

 

So we have covered the major AWS services used by Pinterest. Because Pinterest is a massive system that relies heavily on AWS, it also uses other AWS services beyond the ones discussed here.

 

Before we conclude, it is important to discuss what services Pinterest has built on top of AWS.

Big companies like Pinterest always create their own custom systems because their requirements are unique. Pinterest has created the following services on top of AWS

  1. Feed Generation System
  2. Visual Search System
  3. Image Processing Pipeline
  4. Real-Time Event Processing System

 

Thanks for reading!

 

 

 

DeepSystemStuff is not a normal Blog

We work on the WWH framework

For any concept related to Computer Science, we focus on

What (W): What the concept is, not a big picture, but a detailed description

Why (W): You won’t find many blogs that focus on why the concept exists, but we do

How (H): We focus on how the concept works under the hood.

Stay with DSS and you will get frequent system updates from our Chief Author, Nadeem Shaikh, who has achieved 1 million Quora views in 6 months and 6 million distribution of his answers. See his Quora Profile here

 
