
Design a Distributed Job Scheduler – System Design Interview



    Ashish Pratap Singh
    Sep 12, 2024

    A distributed job scheduler is a system designed to manage, schedule, and execute tasks (referred to as “jobs”) across multiple computers or nodes in a distributed network.


    Distributed job schedulers are used for automating and managing large-scale tasks like batch processing, report generation, and orchestrating complex workflows across multiple nodes.

    In this article, we will walk through the process of designing a scalable distributed job scheduling service that can handle millions of tasks and ensure high availability.


    1. Requirements Gathering

    Before diving into the design, let’s outline the functional and non-functional requirements.

    Functional Requirements:

    1. Users can submit one-time or periodic jobs for execution.

    2. Users can cancel the submitted jobs.

    3. The system should distribute jobs across multiple worker nodes for execution.

    4. The system should provide monitoring of job status (queued, running, completed, failed).

    5. The system should prevent the same job from being executed multiple times concurrently.

    Non-Functional Requirements:

    • Scalability: The system should be able to schedule and execute millions of jobs.

    • High Availability: The system should be fault-tolerant with no single point of failure. If a worker node fails, the system should reschedule the job to other available nodes.

    • Latency: Jobs should be scheduled and executed with minimal delay.

    • Consistency: Job results should be consistent, ensuring that jobs are executed once (or with minimal duplication).

    Additional Requirements (Out of Scope):

    1. Job prioritization: The system should support scheduling based on job priority.

    2. Job dependencies: The system should handle jobs with dependencies.


    2. High-Level Design

    At a high level, our distributed job scheduler will consist of the following components:


    1. Job Submission Service

    The Job Submission Service is the entry point for clients to interact with the system.

    It provides an interface for users or services to submit, update, or cancel jobs via APIs.

    This layer exposes a RESTful API that accepts job details such as:

    • Job name

    • Frequency (One-time, Daily)

    • Execution time

    • Job payload (task details)

    It saves job metadata (e.g., execution_time, frequency, status = pending) in the Job Store (a database) and returns a unique Job ID to the client.
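
    A minimal sketch of such a handler, assuming a Flask-style HTTP endpoint (job_store and the field names are illustrative placeholders, not a prescribed implementation):

    import uuid
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/jobs", methods=["POST"])
    def submit_job():
        body = request.get_json()
        job = {
            "job_id": str(uuid.uuid4()),  # unique Job ID returned to the client
            "name": body["name"],
            "frequency": body.get("frequency", "ONE_TIME"),  # e.g., ONE_TIME or DAILY
            "execution_time": body["execution_time"],  # unix timestamp of requested run
            "payload": body.get("payload", {}),
            "status": "pending",
        }
        job_store.save(job)  # persist metadata in the Job Store (placeholder object)
        return jsonify({"job_id": job["job_id"]}), 201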

    2. Job Store

    The Job Store is responsible for persisting job information and maintaining the current state of all jobs and workers in the system.

    The Job Store contains the following database tables:

    Job Table

    This table stores the metadata of the job, including job id, user id, frequency, payload, execution time, retry count and status (pending, running, completed, failed).
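
    As a rough sketch, those columns could map to a record like this (types and defaults are assumptions for illustration):

    from dataclasses import dataclass

    @dataclass
    class Job:
        job_id: str
        user_id: str
        frequency: str        # e.g., "ONE_TIME" or "DAILY"
        payload: dict
        execution_time: int   # unix timestamp of the requested run
        retry_count: int = 0
        status: str = "pending"  # pending | running | completed | failed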


    Job Execution Table

    Jobs can be executed multiple times in case of failures.

    This table tracks the execution attempts for each job, storing information like execution id, start time, end time, worker id, status and error message.

    If a job fails and is retried, each attempt will be logged here.


    Job Schedules

    The Schedules Table stores scheduling details for each job, including the next_run_time.

    • For one-time jobs, the next_run_time is the same as the job’s execution time, and the last_run_time remains null.

    • For recurring jobs, the next_run_time is updated after each execution to reflect the next scheduled run.

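    To make that update rule concrete, here is a small sketch of advancing next_run_time after an execution (assuming only the one-time and daily frequencies described above):

    from datetime import datetime, timedelta, timezone

    def advance_next_run(frequency: str, next_run_time: int):
        """Return the next scheduled run after an execution, or None for one-time jobs."""
        if frequency == "ONE_TIME":
            return None  # one-time jobs are not rescheduled
        if frequency == "DAILY":
            current = datetime.fromtimestamp(next_run_time, tz=timezone.utc)
            return int((current + timedelta(days=1)).timestamp())
        raise ValueError(f"unknown frequency: {frequency}")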

    Worker Table

    The Worker Node Table stores information about each worker node, including its ip address, status, last heartbeat, capacity and current load.


    3. Scheduling Service

    The Scheduling Service is responsible for selecting jobs for execution based on their next_run_time in the Job Schedules Table.

    It periodically queries the table for jobs scheduled to run at the current minute:

    SELECT * FROM JobSchedulesTable WHERE next_run_time = 1726110000;

    Once the due jobs are retrieved, they are pushed to the Distributed Job Queue for worker nodes to execute.

    Simultaneously, the status in Job Table is updated to SCHEDULED.
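
    Putting the two steps together, a simplified polling loop might look like this (db and queue are stand-ins for the Job Store and the Distributed Job Queue, not real client APIs):

    import time

    POLL_INTERVAL_SECONDS = 60  # the service wakes up once per minute

    def scheduling_loop(db, queue):
        while True:
            now_minute = int(time.time()) // 60 * 60  # truncate to the current minute
            due_jobs = db.query(
                "SELECT * FROM JobSchedulesTable WHERE next_run_time = ?", (now_minute,)
            )
            for job in due_jobs:
                queue.push(job)  # hand the job off to the Distributed Job Queue
                db.execute(
                    "UPDATE JobTable SET status = 'SCHEDULED' WHERE job_id = ?",
                    (job["job_id"],),
                )
            time.sleep(POLL_INTERVAL_SECONDS)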

    4. Distributed Job Queue

    The Distributed Job Queue (e.g., Kafka, RabbitMQ) acts as a buffer between the Scheduling Service and the Execution Service, ensuring that jobs are distributed efficiently to available worker nodes.

    It holds the jobs and allows the Execution Service to pull them and assign them to worker nodes.

    5. Execution Service

    The Execution Service is responsible for running the jobs on worker nodes and updating the results in the Job Store.

    It consists of a coordinator and a pool of worker nodes.

    Coordinator

    A coordinator (or orchestrator) node takes responsibility for:

    • Assigning jobs: Distributes jobs from the queue to the available worker nodes.

    • Managing worker nodes: Tracks the status, health, capacity, and workload of active workers.

    • Handling worker node failures: Detects when a worker node fails and reassigns its jobs to other healthy nodes.

    • Load balancing: Ensures the workload is evenly distributed across worker nodes based on available resources and capacity.

    Worker Nodes

    Worker nodes are responsible for executing jobs and updating the Job Store with the results (e.g., completed, failed, output).

    • When a worker is assigned a job, it creates a new entry in the Job Execution Table with the job’s status set to running and begins execution.

    • After execution is finished, the worker updates the job’s final status (e.g., completed or failed) along with any output in both the Jobs and Job Execution Table.

    • If a worker fails during execution, the coordinator re-queues the job in the distributed job queue, allowing another worker to pick it up and complete it.

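    Taken together, the steps above amount to a worker loop along these lines (queue, db, and run_job are stand-ins for the job queue client, the Job Store, and the task-specific logic):

    def worker_loop(queue, db, worker_id: str):
        while True:
            job = queue.pull()  # blocks until the coordinator assigns a job
            execution_id = db.insert_execution(
                job_id=job["job_id"], worker_id=worker_id, status="running"
            )
            try:
                output = run_job(job["payload"])  # execute the actual task
                db.finish_execution(execution_id, status="completed", output=output)
                db.update_job_status(job["job_id"], "completed")
            except Exception as exc:
                db.finish_execution(execution_id, status="failed", error=str(exc))
                db.update_job_status(job["job_id"], "failed")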


    3. System API Design

    Here are some of the important APIs we can have in our system.

    1. Submit Job (POST /jobs)

    2. Get Job Status (GET /jobs/{job_id})

    3. Cancel Job (DELETE /jobs/{job_id})

    4. List Pending Jobs (GET /jobs?status=pending&user_id=u003)

    5. Get Jobs Running on a Worker (GET /job/executions?worker_id=w001)


    4. Deep Dive into Key Components

    4.1 SQL vs NoSQL

    To choose the right database for our needs, let’s consider some factors that can affect our choice:

    • We need to store millions of jobs every day.

    • Read and write query volumes are roughly equal.

    • Data is structured with fixed schema.

    • We don’t require ACID transactions or complex joins.

    Both SQL and NoSQL databases could meet these needs, but given the scale and nature of the workload, a NoSQL database like DynamoDB or Cassandra could be a better fit, especially when handling millions of jobs per day and supporting high-throughput writes and reads.

    4.2 Scaling Scheduling Service

    The Scheduling Service periodically checks the Job Schedules Table every minute for pending jobs and pushes them to the job queue for execution.

    For example, the following query retrieves all jobs due for execution at the current minute:

    SELECT * FROM JobSchedulesTable WHERE next_run_time = 1726110000;

    Optimizing reads from JobSchedulesTable:

    Since we are querying JobSchedulesTable using the next_run_time column, it’s a good idea to partition the table on the next_run_time column to efficiently retrieve all jobs that are scheduled to run at a specific minute.

    If the number of jobs in any minute is small, a single node is enough.

    However, during peak periods, such as when 50,000 jobs need to be processed in a single minute, relying on one node can lead to delays in execution.

    The node may become overloaded and slow down, creating performance bottlenecks.

    Additionally, having only one node introduces a single point of failure.

    If that node becomes unavailable due to a crash or other issue, no jobs will be scheduled or executed until the node is restored, leading to system downtime.

    To address this, we need a distributed architecture where multiple worker nodes handle job scheduling tasks in parallel, all coordinated by a central node.

    But how can we ensure that jobs are not processed by multiple workers at the same time?

    The solution is to divide jobs into segments. Each worker processes only a specific subset of jobs from the JobSchedulesTable by focusing on assigned segments.

    This is achieved by adding an extra column called segment.

    The segment column logically groups jobs (e.g., segment=1, segment=2, etc.), ensuring that no two workers handle the same job simultaneously.

    A coordinator node manages the distribution of workload by assigning different segments to worker nodes.

    It also monitors the health of the workers using heartbeats or health checks.


    In cases of worker node failure, the addition of new workers, or spikes in traffic, the coordinator dynamically rebalances the workload by adjusting segment assignments.

    Each worker node queries the JobSchedulesTable using both next_run_time and its assigned segments to retrieve the jobs it is responsible for processing.

    Here’s an example of a query a worker node might execute:

    SELECT * FROM JobSchedulesTable WHERE next_run_time = 1726110000 AND segment IN (1, 2);

    4.3 Handling failure of Jobs

    When a job fails during execution, the worker node increments the retry_count in the JobTable.

    • If the retry_count is still below the max_retries threshold, the worker retries the job from the beginning.

    • Once the retry_count reaches the max_retries limit, the job is marked as failed and will not be executed again, with its status updated to failed.

    Note: After a job fails, the worker node should not immediately retry the job, especially if the failure was caused by a transient issue (e.g., network failure).

    Instead, the system retries the job after a delay, which increases exponentially with each subsequent retry (e.g., 1 minute, 5 minutes, 10 minutes).
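
    As a sketch, the backoff delay could be computed like this (the base delay and cap are example values, and the exact schedule need not be a strict doubling):

    def retry_delay_seconds(retry_count: int, base: int = 60, cap: int = 3600) -> int:
        """Exponential backoff: 1 min, 2 min, 4 min, ... capped at 1 hour."""
        return min(base * (2 ** retry_count), cap)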

    4.4 Handling failure of Worker nodes in Execution Service

    Worker nodes are responsible for executing jobs assigned to them by the coordinator in the Execution Service.

    When a worker node fails, the system must detect the failure, reassign the pending jobs to healthy nodes, and ensure that jobs are not lost or duplicated.

    There are several techniques for detecting failures:

    • Heartbeat Mechanism: Each worker node periodically sends a heartbeat signal to the coordinator (every few seconds). The coordinator tracks these heartbeats and marks a worker as “unhealthy” if it doesn’t receive a heartbeat for a predefined period (e.g., 3 consecutive heartbeats missed).

    • Health Checks: In addition to heartbeats, the coordinator can perform periodic health checks on each worker node. The health checks may include CPU, memory, disk space, and network connectivity to ensure the node is not overloaded.
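
    The heartbeat bookkeeping on the coordinator side can be sketched in a few lines (the interval and threshold are example values matching the description above):

    import time

    HEARTBEAT_INTERVAL = 5  # seconds between heartbeats
    MISSED_BEATS_THRESHOLD = 3  # mark a worker unhealthy after 3 missed heartbeats

    last_seen = {}  # worker_id -> timestamp of the last heartbeat received

    def record_heartbeat(worker_id: str) -> None:
        last_seen[worker_id] = time.time()

    def unhealthy_workers() -> list:
        cutoff = time.time() - HEARTBEAT_INTERVAL * MISSED_BEATS_THRESHOLD
        return [w for w, ts in last_seen.items() if ts < cutoff]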

    Once a worker failure is detected, the system needs to recover and ensure that jobs assigned to the failed worker are still executed.

    There are two main scenarios to handle:

    Pending Jobs (Not Started)

    For jobs that were assigned to a worker but not yet started, the system needs to reassign these jobs to another healthy worker.

    The coordinator should re-queue them to the job queue for another worker to pick up.

    In-Progress Jobs

    Jobs that were being executed when the worker failed need to be handled carefully to prevent partial execution or data loss.

    One technique is to use job checkpointing, where a worker periodically saves the progress of long-running jobs to a persistent store (like a database). If the worker fails, another worker can restart the job from the last checkpoint.
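
    A sketch of checkpoint-based execution for a long-running job (the checkpoints object is a stand-in for a persistent store):

    def run_with_checkpoints(job_id: str, items: list, checkpoints, process):
        """Process items in order, persisting progress so another worker can resume."""
        start = checkpoints.get(job_id, default=0)  # resume from the last saved offset
        for i in range(start, len(items)):
            process(items[i])
            if (i + 1) % 100 == 0:  # persist progress periodically, not on every item
                checkpoints.save(job_id, i + 1)
        checkpoints.save(job_id, len(items))  # record that the job fully completed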

    If a job was partially executed but not completed, the coordinator should mark the job as “failed” and re-queue it to the job queue for retry by another worker.

    4.5 Addressing Single Points of Failure

    We are using a coordinator node in both the Scheduling and Execution service.

    To prevent the coordinator from becoming a single point of failure, deploy multiple coordinator nodes with a leader-election mechanism.

    This ensures that one node is the active leader, while others are on standby. If the leader fails, a new leader is elected, and the system continues to function without disruption.

    • Leader Election: Use a consensus algorithm like Raft or Paxos to elect a leader from the pool of coordinators. Tools like Zookeeper or etcd are commonly used for managing distributed leader elections.

    • Failover: If the leader coordinator fails, the other coordinators detect the failure and elect a new leader. The new leader takes over responsibilities immediately, ensuring continuity in job scheduling, worker management, and health monitoring.

    • Data Synchronization: All coordinators should have access to the same shared state (e.g., job scheduling data and worker health information). This can be stored in a distributed database (e.g., Cassandra, DynamoDB). This ensures that when a new leader takes over, it has the latest data to work with.
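
    The election itself is typically delegated to Zookeeper or etcd, but the core idea can be sketched as a lease acquired with an atomic compare-and-set (store.compare_and_set below is a hypothetical primitive standing in for what those systems provide):

    import time

    LEASE_TTL = 10  # seconds a leadership lease stays valid without renewal

    def try_become_leader(store, node_id: str) -> bool:
        """Attempt to acquire or renew the leader lease atomically."""
        now = time.time()
        current = store.get("leader")  # e.g., {"node": "c1", "expires": 1726110000.0}
        if current is None or current["expires"] < now or current["node"] == node_id:
            # Atomically replace the old value; fails if another node won the race.
            return store.compare_and_set(
                "leader",
                expected=current,
                new={"node": node_id, "expires": now + LEASE_TTL},
            )
        return False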

    4.6 Rate Limiting

    Rate Limiting at the Job Submission Level

    If too many job submissions are made to the scheduling system at once, the system may become overloaded, leading to degraded performance, timeouts, or even failure of the scheduling service.

    Implement rate limits at the client level to ensure no single client can overwhelm the system.

    For example, restrict each client to a maximum of 1,000 job submissions per minute.
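
    A fixed-window limiter for that rule can be sketched as follows (in a real deployment this counter would live in a shared store such as Redis rather than in process memory):

    import time
    from collections import defaultdict

    LIMIT = 1000  # max job submissions per client
    WINDOW = 60  # per 60-second window

    counters = defaultdict(int)  # (client_id, window) -> submissions seen

    def allow_submission(client_id: str) -> bool:
        window = int(time.time()) // WINDOW
        key = (client_id, window)
        if counters[key] >= LIMIT:
            return False  # reject: the client exceeded its per-minute quota
        counters[key] += 1
        return True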

    Rate Limiting at the Job Queue Level

    Even if the job submission rate is controlled, the system might be overwhelmed if the job queue (e.g., Kafka, RabbitMQ) is flooded with too many jobs, which can slow down worker nodes or lead to message backlog.

    Limit the rate at which jobs are pushed into the distributed job queue. This can be achieved by implementing queue-level throttling, where only a certain number of jobs are allowed to enter the queue per second or minute.

    Rate Limiting at the Worker Node Level

    If the system allows too many jobs to be executed simultaneously by worker nodes, it can overwhelm the underlying infrastructure (e.g., CPU, memory, database), causing performance degradation or crashes.

    Implement rate limiting at the worker node level to prevent any single worker from taking on too many jobs at once.

    Set maximum concurrency limits on worker nodes to control how many jobs each worker can execute concurrently.
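
    Inside a worker process, one simple way to enforce such a cap is a bounded semaphore (the limit is an example value):

    import threading

    MAX_CONCURRENT_JOBS = 8  # per-worker concurrency cap (example value)
    slots = threading.BoundedSemaphore(MAX_CONCURRENT_JOBS)

    def execute_job(job, run_job):
        with slots:  # blocks when the worker is already at capacity
            run_job(job)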


    Thank you for reading!

    If you found it valuable, hit a like ❤️ and consider subscribing for more such content every week.

    If you have any questions or suggestions, leave a comment.

    This post is public so feel free to share it.



    P.S. If you’re finding this newsletter helpful and want to get even more value, consider becoming a paid subscriber.

    As a paid subscriber, you’ll receive an exclusive deep dive every week, access to a comprehensive system design learning resource, and other premium perks.

    Get full access to AlgoMaster

    There are group discounts, gift options, and referral bonuses available.


    Check out my YouTube channel for more in-depth content.

    Follow me on LinkedIn, X and Medium to stay updated.

    Check out my GitHub repositories for free interview preparation resources.

    I hope you have a lovely day!

    See you soon,
    Ashish
