How TikTok Works: Decoding System Design & Architecture with Recommendation System
TikTok is not merely a mobile app; it is a revolution.
The first non-Meta app to reach 1 billion users and 3 billion downloads, TikTok has stunned the tech world with its success.
As we speak, more than 1.04 billion users are watching short videos and creating content on this app, showcasing what happens when the right technology meets the right audience.
Generating more than $20 billion in ad revenue every year, TikTok is a phenomenon that seldom happens.
Nothing short of magic!
In this blog, we will discuss the system design and architecture of TikTok to understand how it works. We will also look at TikTok’s famous recommendation system, which generates an insane amount of engagement and keeps users on the app for longer.
But before that, let’s have an overview of TikTok and a brief on its amazing history.
The Origin Of TikTok
ByteDance, a Chinese Internet company, launched Douyin, a short-video app, in 2016 exclusively for the Chinese market. Within a year, Douyin amassed more than 100 million users and recorded 1 billion views.
Buoyed by this success, ByteDance launched TikTok for the global market a year later, in 2017.
Around the same time, ByteDance acquired Musical.ly, an application that enabled users to create lip-sync and comedy videos based on existing videos. In 2018, the features and database of Musical.ly were integrated into TikTok, which sparked a movement of user-generated videos, and things haven’t been the same since.
Celebrities such as Jennifer Lopez, Jessica Alba, Will Smith, and Justin Bieber then joined TikTok to showcase their videos, and the user base exploded soon after.
In 2021, about four years after its global launch, TikTok crossed 1 billion users, becoming the fastest app ever to reach this threshold and the first non-Meta app to do so.
Much of TikTok’s success is attributed to an engagement rate of 18-20% across different nations and demographics.
On average, a typical TikTok user opens the app 19 times a day and spends 23.3 hours on it every month, which is among the highest figures in the world.
In 2022, more than 1 billion hours of content were accessed on the app, and it ranked among the top 5 most popular applications on Earth.
At a brand valuation of $85 billion and a market valuation of $200 billion, TikTok was the most valuable startup in the world in 2023.
What Exactly Is TikTok?
TikTok is a short-video app that allows users to create, share, and discover short videos ranging from 15 seconds to one minute. While users can create videos up to 10 minutes long, only 60-second videos are allowed to be uploaded.
The app’s main users are young, with more than 60% aged 18-25. It caters to the need for swift entertainment with a continuous stream of short, engaging videos, closely targeted to each user’s preferences and choices.
TikTok is based on the architecture and UI of Douyin, which mainly caters to the Chinese audience, while TikTok caters to a global audience.
The primary monetization model of TikTok is advertising, complemented by affiliate marketing, sponsored content, live gifts, and more.
TikTok System Design: Key Components
TikTok’s architecture is built on three primary components: Big Data Frameworks, Machine Learning, and Microservices. This combination allows TikTok to efficiently process vast amounts of data and deliver personalized content to users in real time.
1. Big Data Frameworks
Big data frameworks are essential for processing the enormous volumes of data generated by TikTok users. These frameworks enable:
Real-Time Data Processing: TikTok utilizes technologies such as Apache Kafka for real-time data streaming, allowing for immediate processing of user interactions.
Data Storage: The platform employs distributed databases to store user profiles, videos, and engagement metrics, ensuring quick access and retrieval.
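To make the idea of real-time processing concrete, here is a minimal sketch of a sliding-window counter that tracks which videos are receiving the most interactions right now. It is an in-memory stand-in for a stream consumer, not TikTok’s actual code: the class name, the event shape, and the 60-second window are all assumptions made for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per video over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, video_id), oldest first
        self.counts = Counter()

    def add(self, timestamp, video_id):
        self.events.append((timestamp, video_id))
        self.counts[video_id] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_video = self.events.popleft()
            self.counts[old_video] -= 1
            if self.counts[old_video] == 0:
                del self.counts[old_video]

    def top(self, n):
        return self.counts.most_common(n)


# Hypothetical stream of (timestamp_in_seconds, video_id) "like" events.
stream = [(0, "v1"), (1, "v2"), (2, "v1"), (61, "v3"), (62, "v3")]
counter = SlidingWindowCounter(window_seconds=60)
for ts, vid in stream:
    counter.add(ts, vid)
trending = counter.top(2)
```

In a production setup the loop over `stream` would be replaced by a consumer reading from a streaming platform such as Kafka, and the counts would typically live in a shared store rather than in process memory.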
2. Machine Learning
Machine learning is at the heart of TikTok’s recommendation system. The platform employs various algorithms to analyze user behavior and preferences, enabling hyper-personalized content delivery. Key aspects include:
Deep Learning Models: TikTok uses neural networks to analyze video content and user interactions, improving the accuracy of recommendations.
Candidate Generation and Ranking: The recommendation process consists of a candidate generation stage, where a subset of videos is selected, followed by a fine ranking stage that determines the most relevant videos for each user.
3. Microservices Architecture
TikTok employs a microservices architecture to enhance scalability and maintainability. This design allows different system components to be developed, deployed, and scaled independently.
Key features include:
Service Mesh: TikTok uses service mesh technologies to manage communication between microservices, ensuring seamless data exchange and reducing latency.
Containerization: The platform leverages Kubernetes for container orchestration, facilitating efficient resource management and deployment of services.
Deep-Diving Into TikTok System Architecture
At its core, TikTok’s architecture is a distributed system designed for high scalability, low latency, and real-time processing. It employs a microservices architecture, allowing for independent scaling and development of various components. The system can be broadly divided into the following layers:
Client Layer: Mobile apps and web interfaces
API Gateway Layer: Handles incoming requests and routes them to appropriate services
Application Layer: Consists of various microservices
Data Storage Layer: Manages persistent data storage
Content Delivery Network: Ensures fast video delivery worldwide
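The role of the API gateway layer can be illustrated with a small sketch: it checks each request and forwards it to the service that owns the path prefix. The route table, handler names, and token check below are hypothetical and only show the general pattern, not TikTok’s real endpoints.

```python
def user_service(request):
    return {"service": "user", "path": request["path"]}


def feed_service(request):
    return {"service": "feed", "path": request["path"]}


def content_service(request):
    return {"service": "content", "path": request["path"]}


# Hypothetical mapping from path prefix to the owning microservice.
ROUTES = {
    "/users": user_service,
    "/feed": feed_service,
    "/videos": content_service,
}


def gateway(request):
    # Cross-cutting concerns (here, a trivial auth check) live in the gateway.
    if "token" not in request:
        return {"status": 401, "error": "missing token"}
    path = request["path"]
    for prefix, handler in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return {"status": 200, "body": handler(request)}
    return {"status": 404, "error": "no route"}


ok = gateway({"path": "/feed/for-you", "token": "abc"})
unknown = gateway({"path": "/feedback", "token": "abc"})
denied = gateway({"path": "/feed/for-you"})
```

Keeping authentication and routing in one place means the services behind the gateway can stay focused on their own functionality.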
Key Components of TikTok’s Architecture:
Frontend
TikTok’s frontend is primarily mobile-based, with apps for iOS and Android. These clients are responsible for:
Video playback and rendering
User interface and interactions
Local caching for improved performance
Video recording and editing features
The frontend communicates with the backend services through RESTful APIs and WebSocket connections for real-time features.
Backend Services
TikTok’s backend is composed of numerous microservices, each responsible for specific functionalities:
User Service: Manages user profiles, authentication, and social connections
Content Service: Handles video uploads, metadata, and content management
Feed Service: Generates personalized video feeds for users
Interaction Service: Processes likes, comments, and shares
Search Service: Enables content and user discovery
Analytics Service: Collects and processes user behavior data
Notification Service: Manages push notifications and in-app alerts
These services are likely implemented using a combination of technologies, possibly including:
Programming Languages: Go, Java, Python
Frameworks: Spring Boot, gRPC
Message Queues: Apache Kafka, RabbitMQ
Data Storage
TikTok’s data storage requirements are immense and diverse. The architecture likely incorporates:
Relational Databases: For structured data like user profiles and video metadata
Possible technologies: MySQL, PostgreSQL
NoSQL Databases: For handling unstructured data and scaling horizontally
Possible technologies: Cassandra, MongoDB
In-Memory Databases: For caching and real-time data processing
Possible technologies: Redis, Memcached
Object Storage: For storing video files and other large media
Possible technologies: Amazon S3, Google Cloud Storage
Content Delivery Network (CDN)
To ensure low-latency video delivery worldwide, TikTok employs a robust CDN. This network of distributed servers caches content closer to end-users, significantly reducing load times. TikTok likely uses a combination of third-party CDN providers and its own edge network to optimize content delivery based on geographical locations.
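The caching behaviour of a CDN edge server can be sketched as a small least-recently-used (LRU) cache placed in front of the origin. This is a simplified illustration: the `EdgeCache` class, the capacity of two items, and the `origin` function are assumptions, and real CDNs use far more elaborate eviction and invalidation policies.

```python
from collections import OrderedDict


class EdgeCache:
    """A tiny LRU cache standing in for one CDN edge node."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self.store = OrderedDict()   # video_id -> data, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, video_id):
        if video_id in self.store:
            self.store.move_to_end(video_id)   # mark as most recently used
            self.hits += 1
            return self.store[video_id]
        self.misses += 1
        data = self.fetch_from_origin(video_id)
        self.store[video_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)     # evict least recently used
        return data


origin_calls = []


def origin(video_id):
    # Hypothetical origin fetch; records each call so we can count them.
    origin_calls.append(video_id)
    return f"bytes-of-{video_id}"


edge = EdgeCache(capacity=2, fetch_from_origin=origin)
for vid in ["a", "b", "a", "c", "b"]:
    edge.get(vid)
```

Every hit is a request served near the user without touching the origin, which is where the reduction in load time comes from.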
Video Processing Pipeline
When a user uploads a video to TikTok, it goes through a sophisticated processing pipeline:
Upload: The video is uploaded to TikTok’s object storage system.
Transcoding: The video is converted into multiple formats and resolutions for different devices and network conditions.
Feature Extraction: AI models analyze the video content, extracting features like objects, scenes, and audio characteristics.
Thumbnail Generation: Attractive thumbnails are automatically created.
Content Moderation: AI and human moderators check for inappropriate content.
Indexing: The video and its metadata are indexed for quick retrieval and search.
This pipeline is likely implemented using a combination of serverless functions and batch processing systems, allowing for efficient scaling based on upload volumes.
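The stages listed above can be sketched as a chain of small functions that each enrich a video record. Everything here is a placeholder: the storage keys, the three renditions, the caption-based "features", and the one-word blocklist are assumptions chosen to keep the example runnable, whereas a real pipeline would call transcoders, ML models, and human review queues.

```python
def upload(video):
    video["stored_at"] = f"object-store/{video['id']}.mp4"   # hypothetical key
    return video


def transcode(video):
    video["renditions"] = [f"{video['id']}_{res}.mp4"
                           for res in ("360p", "720p", "1080p")]
    return video


def extract_features(video):
    # Placeholder for AI analysis: we only tokenize the caption.
    video["features"] = {"caption_tokens": video["caption"].lower().split()}
    return video


def generate_thumbnail(video):
    video["thumbnail"] = f"{video['id']}_thumb.jpg"
    return video


BLOCKED_WORDS = {"spam"}   # toy moderation rule


def moderate(video):
    flagged = BLOCKED_WORDS & set(video["features"]["caption_tokens"])
    video["status"] = "rejected" if flagged else "approved"
    return video


def index(video, search_index):
    # Only approved videos become searchable.
    if video["status"] == "approved":
        for token in video["features"]["caption_tokens"]:
            search_index.setdefault(token, []).append(video["id"])
    return video


def process(video, search_index):
    for stage in (upload, transcode, extract_features,
                  generate_thumbnail, moderate):
        video = stage(video)
    return index(video, search_index)


search_index = {}
ok = process({"id": "v1", "caption": "Cat dancing"}, search_index)
bad = process({"id": "v2", "caption": "Buy spam now"}, search_index)
```

Because each stage only reads and writes the video record, stages can be moved to separate workers or queues without changing their logic.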
Real-time Features and Streaming
TikTok’s real-time features, such as live streaming and instant notifications, require a different architectural approach. These components likely utilize:
WebSocket connections for bi-directional communication
RTMP (Real-Time Messaging Protocol) for live video streaming
Pub/Sub systems for distributing real-time events
Technologies like WebRTC might be employed for peer-to-peer communications in features like TikTok Live.
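A Pub/Sub system for real-time events can be reduced to a very small core: subscribers register interest in a topic, and every published event is fanned out to them. The sketch below is in-memory and single-process, and the topic naming scheme is hypothetical; a real deployment would push the events over WebSocket connections through a message broker.

```python
from collections import defaultdict


class PubSub:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver `event` to every subscriber; return how many received it."""
        callbacks = self.subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


bus = PubSub()
viewer_a, viewer_b = [], []   # each list stands in for one viewer's connection
bus.subscribe("live:room42:comments", viewer_a.append)
bus.subscribe("live:room42:comments", viewer_b.append)

delivered = bus.publish("live:room42:comments", {"user": "sam", "text": "hello"})
undelivered = bus.publish("live:room99:comments", {"user": "sam", "text": "anyone?"})
```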
Scalability Strategies By TikTok
To accommodate its rapidly growing user base, TikTok employs several scalability strategies:
Horizontal Scaling: The system is designed to scale horizontally, allowing additional servers to be added as user demand increases. This approach helps distribute the load effectively.
Caching Mechanisms: TikTok uses in-memory caches, such as Redis, to store frequently accessed data, reducing latency and improving performance.
Load Balancing: Load balancers distribute incoming traffic across multiple servers, preventing any single server from becoming overwhelmed during peak usage times.
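One common technique that supports horizontal scaling of caches and stateful services is consistent hashing, where adding a server moves only a fraction of the keys. Whether TikTok uses this exact scheme is an assumption; the sketch below simply illustrates the property, with hypothetical node names and 100 virtual points per node.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self.ring = []   # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several virtual points to even out the load.
        for i in range(self.replicas):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def get_node(self, key):
        # The first point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self.ring, (_hash(key), ""))
        return self.ring[idx % len(self.ring)][1]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"video:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-4")   # scale out by one server
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only the keys captured by the new node change owner; with naive `hash(key) % num_servers` placement, most keys would move whenever the server count changes.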
TikTok’s Recommendation System: How It Works
TikTok’s recommendation system stands as a paragon of efficiency and effectiveness in the world of content curation. Operating at enormous scale, it is often cited as one of the most effective recommendation systems in the industry.
The challenge lies in its non-stationary training data, where user interests can pivot rapidly, coupled with the ever-expanding universe of users, videos, and advertisements.
Architecture Overview
The system follows a multi-stage process, differentiating itself through its unique approach to each component:
User Interaction: The process begins when a user opens the TikTok app, triggering a request to populate the video feed.
Service Request: This action prompts a request to the TikTok service.
Recommendation Engine Activation: The service then calls upon the recommendation engine for feed ranking.
Candidate Generation: In this crucial first stage, a subset of approximately 100 relevant videos is selected from a pool of hundreds of millions. This stage employs two key components:
a) Deep Retrieval Model
b) Simple Linear Model
Fine Ranking: The second stage involves a meticulous ranking of the selected candidates, ensuring the most engaging content appears at the top.
Content Delivery: The final ranked list is transmitted to the user’s device.
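The two-stage flow described above can be sketched with toy data: a cheap score narrows the catalog to a few candidates, and a richer score orders the survivors. The two-dimensional embeddings, the freshness bonus, and the penalty for already-seen videos are invented for this example and are not TikTok’s actual features or models.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec, catalog, k):
    """Stage 1: cheap score over the whole catalog, keep the top k."""
    return heapq.nlargest(k, catalog,
                          key=lambda v: dot(user_vec, v["embedding"]))


def fine_rank(user, candidates):
    """Stage 2: a richer (here hand-written) score on the few survivors."""
    def score(video):
        affinity = dot(user["vector"], video["embedding"])
        freshness = 1.0 / (1.0 + video["age_hours"])
        seen_penalty = 1.0 if video["id"] in user["seen"] else 0.0
        return affinity + 0.5 * freshness - seen_penalty
    return sorted(candidates, key=score, reverse=True)


catalog = [
    {"id": "cooking-1", "embedding": [0.9, 0.1], "age_hours": 1},
    {"id": "cooking-2", "embedding": [0.8, 0.2], "age_hours": 0},
    {"id": "sports-1", "embedding": [0.1, 0.9], "age_hours": 0},
    {"id": "cooking-3", "embedding": [1.0, 0.0], "age_hours": 48},
]
user = {"vector": [1.0, 0.0], "seen": {"cooking-3"}}

candidates = generate_candidates(user["vector"], catalog, k=3)
feed = [v["id"] for v in fine_rank(user, candidates)]
```

Here the first stage drops the sports video, and the second stage pushes the already-seen `cooking-3` to the bottom even though it had the highest stage-one score.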
Candidate Generation Stage
The Deep Retrieval Model:
Unlike traditional recommender systems that rely on feature matching or latent representation comparisons, TikTok’s approach is revolutionary. Instead of iterating over all possible items – a computationally expensive process given TikTok’s vast video library – the Deep Retrieval model directly generates candidates for a given user.
Model Architecture:
The model utilizes a multi-layer perceptron (MLP) with a tree-structured output layer. This structure enables the model to map users to items through a series of binary decisions, creating a path through the tree. Each leaf node in this tree corresponds to a set of items, allowing for efficient retrieval.
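The idea of retrieving items by following the most likely paths through a layered structure can be sketched with beam search. This is a toy under stated assumptions: the probability table stands in for the MLP output for one user, each layer offers two choices (matching the binary decisions described above), and the path-to-item mapping is made up.

```python
import math


def beam_search(step_log_probs, depth, beam_size):
    """Return the `beam_size` most likely paths of length `depth`.

    `step_log_probs(prefix)` returns the log-probabilities of each possible
    next node given the path prefix (a tuple of node indices).
    """
    beams = [((), 0.0)]
    for _ in range(depth):
        expanded = []
        for prefix, score in beams:
            for node, lp in enumerate(step_log_probs(prefix)):
                expanded.append((prefix + (node,), score + lp))
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_size]   # keep only the best partial paths
    return beams


# Hypothetical per-user probabilities of the next node given the prefix.
TABLE = {(): [0.7, 0.3], (0,): [0.4, 0.6], (1,): [0.9, 0.1]}


def toy_model(prefix):
    return [math.log(p) for p in TABLE[prefix]]


# Hypothetical mapping from complete paths (leaves) to sets of videos.
PATH_TO_ITEMS = {
    (0, 0): ["v2"],
    (0, 1): ["v7", "v9"],
    (1, 0): ["v4"],
    (1, 1): ["v5"],
}

beams = beam_search(toy_model, depth=2, beam_size=2)
retrieved = [item for path, _ in beams for item in PATH_TO_ITEMS[path]]
```

The search touches only a handful of paths instead of scoring every video, which is what makes retrieval over a very large library feasible.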
Training Methodology:
The discrete nature of mapping items to paths means the mapping itself cannot be learned by gradient descent alone. Instead, the model follows a likelihood maximization principle, akin to clustering problems, and is trained with an Expectation-Maximization (EM) style algorithm that alternates between two steps:
Expectation Step: With the item-to-path mapping held fixed, the network parameters are updated by backpropagating the loss.
Maximization Step: The item-to-path mapping is updated, keeping only the highest-probability paths found via beam search.
This approach allows the model to learn parameters that effectively represent user-item pairs in the data.
Fine Ranking Stage
While the candidate selection prioritizes latency and recall, ensuring all relevant videos are included even at the cost of some irrelevance, the fine ranking stage optimizes for precision.
Key Characteristics:
Latency Tolerance: With only ~100 videos to rank, this stage can afford more computational intensity.
Model Complexity: Larger, more sophisticated models with higher predictive performance are employed.
Precision Focus: The goal is to ensure all top-ranked videos are highly relevant to the user.
Technical Implementation
Deep Neural Networks: Likely employing Transformer-based architectures, such as BERT-style encoders, for contextual understanding.
Feature Engineering: Incorporating user behavior data, video metadata, and temporal factors.
Multi-objective Optimization: Balancing user engagement, creator fairness, and platform health metrics.
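A simple way to picture multi-objective optimization is a weighted sum of per-objective predictions. The weights and the predicted values below are invented for illustration; how TikTok actually combines its objectives is not public, so treat this purely as a sketch of the general idea.

```python
# Hypothetical weights for the three objectives named above.
WEIGHTS = {"engagement": 0.6, "creator_fairness": 0.25, "platform_health": 0.15}


def combined_score(objectives, weights=WEIGHTS):
    return sum(weights[name] * objectives[name] for name in weights)


# Hypothetical model predictions per candidate, each in [0, 1].
candidates = {
    "viral-clip": {"engagement": 0.9, "creator_fairness": 0.2, "platform_health": 0.5},
    "new-creator": {"engagement": 0.7, "creator_fairness": 0.9, "platform_health": 0.8},
    "clickbait": {"engagement": 0.95, "creator_fairness": 0.3, "platform_health": 0.1},
}

ranked = sorted(candidates,
                key=lambda vid: combined_score(candidates[vid]),
                reverse=True)
by_engagement_only = sorted(candidates,
                            key=lambda vid: candidates[vid]["engagement"],
                            reverse=True)
```

Ranking on engagement alone would put the clickbait video first, while the combined score moves it to the bottom and promotes the video from the new creator.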
The synergy between the fast, recall-oriented candidate generation and the precise, engagement-maximizing fine ranking creates TikTok’s addictive user experience. This two-stage approach allows TikTok to handle its massive scale while still delivering personalized, engaging content to each user in real-time.
User Engagement Mechanisms
TikTok’s success in generating user engagement can be attributed to several key mechanisms:
1. Personalized Feeds
The “For You” page is a hallmark of TikTok’s user experience, showcasing a personalized feed of videos tailored to individual preferences. This is achieved through:
User Profiling: TikTok collects data on user interactions, such as likes, shares, and watch time, to build comprehensive user profiles.
Recommendation Algorithms: Advanced algorithms and logic, as shared above, analyze user behavior and content characteristics, ensuring that users are consistently presented with videos that align with their interests.
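User profiling from interaction signals can be sketched as a running interest score per topic, where older signals fade and stronger actions count for more. The signal weights and the decay factor of 0.9 are assumptions picked for this example, not values TikTok has published.

```python
# Hypothetical strength of each interaction type.
SIGNAL_WEIGHTS = {"watch_complete": 1.0, "like": 2.0, "share": 3.0, "skip": -1.0}


def update_profile(profile, topic, signal, decay=0.9):
    """Fade all existing interests, then add the new signal to its topic."""
    updated = {t: score * decay for t, score in profile.items()}
    updated[topic] = updated.get(topic, 0.0) + SIGNAL_WEIGHTS[signal]
    return updated


profile = {}
for topic, signal in [("cooking", "like"), ("sports", "skip"), ("cooking", "share")]:
    profile = update_profile(profile, topic, signal)

top_interest = max(profile, key=profile.get)
```

The decay lets the profile follow shifting interests, since a topic the user stops engaging with gradually loses weight.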
2. Real-Time Interactivity
TikTok’s architecture supports real-time interactions, allowing users to engage with content instantly. Features such as live streaming and real-time comments enhance the social aspect of the platform, encouraging users to participate actively.
3. Content Creation Tools
The app provides a suite of editing tools, effects, and filters that empower users to create high-quality videos easily. This focus on user-generated content fosters creativity and encourages users to spend more time on the platform.
Lessons for Developers: What We Can Learn from TikTok
TikTok’s architecture offers valuable insights for developers and companies building scalable applications:
Embrace Microservices: Allows for independent scaling and development of components
Prioritize User Experience: Design systems that deliver content quickly and seamlessly
Leverage AI and Machine Learning: Personalization is key to user engagement
Optimize for Mobile: Consider mobile-first architectures for global reach
Plan for Scale: Design systems that can handle rapid growth from the start
Balance Real-time and Batch Processing: Combine both for efficient data handling
Invest in Content Delivery: A robust CDN is crucial for media-heavy applications
Prioritize Security and Privacy: Build trust with users through robust security measures
Conclusion
TikTok’s system design and architecture represent a masterclass in building scalable, engaging, and performant applications. By combining cutting-edge technologies, innovative algorithms, and a deep understanding of user behavior, TikTok has created a platform that continues to captivate millions worldwide.
At TechAhead, we’re passionate about pushing the boundaries of what’s possible in mobile app development. By understanding and applying the lessons from platforms like TikTok, we can help our clients create applications that not only meet but exceed user expectations in today’s dynamic digital landscape.
Source URL: https://www.techaheadcorp.com/blog/decoding-tiktok-system-design-architecture/