Learn how it handles RPC, task plugins, fault tolerance & scheduling. Perfect for devs looking to optimize or extend their workflows!Learn how it handles RPC, task plugins, fault tolerance & scheduling. Perfect for devs looking to optimize or extend their workflows!

Dissecting the Master Server: How DolphinScheduler Powers Workflow Scheduling

In modern data-driven enterprises, a workflow scheduling system is the "central nervous system" of the data pipeline. From ETL tasks to machine learning training, from report generation to real-time monitoring, almost all critical business processes rely on a stable, efficient, and scalable scheduling engine.

The author believes that Apache DolphinScheduler 3.1.9 is a stable and widely used version, so this article focuses on this version, analyzing the relevant processes related to the Master service startup, deeply exploring its core source code, architectural design, module division, and key implementation mechanisms. The goal is to help developers understand how the Master works and lay a foundation for further secondary development or performance optimization.

This series of articles is divided into three parts: the Master Server startup process, the Worker server startup process, and related process diagrams. This is the first part.

1. Core Overview of Master Server Startup

  • Code Entry: org.apache.dolphinscheduler.server.master.MasterServer#run
 public void run() throws SchedulerException {         // 1. Initialize rpc server         this.masterRPCServer.start();          // 2. Install task plugin         this.taskPluginManager.loadPlugin();          // 3. Self-tolerant         this.masterRegistryClient.start();         this.masterRegistryClient.setRegistryStoppable(this);          // 4. Master scheduling         this.masterSchedulerBootstrap.init();         this.masterSchedulerBootstrap.start();          // 5. Event execution service         this.eventExecuteService.start();         // 6. Fault tolerance mechanism         this.failoverExecuteThread.start();          // 7. Quartz scheduling         this.schedulerApi.start();         ...     } 

1.1 RPC Startup:

  • Description: Registers processors for relevant commands, such as task execution, task execution results, task termination, etc.
  • Code Entry: org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer#start
 public void start() {          ...         // Task execution request processor         this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_RUNNING, taskExecuteRunningProcessor);         // Task execution result request processor         this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_RESULT, taskExecuteResponseProcessor);         // Task termination request processor         this.nettyRemotingServer.registerProcessor(CommandType.TASK_KILL_RESPONSE, taskKillResponseProcessor);         ...         this.nettyRemotingServer.start();         logger.info("Started Master RPC Server...");     } 

1.2 Task Plugin Initialization:

  • Description: Task-related template operations such as creating tasks, parsing task parameters, and retrieving task resource information. This plugin needs to be registered on the API, Master, and Worker sides. The role in Master is to set the data source and UDF information.

1.3 Self-Tolerant (Master Registration):

  • Description: Registers the master’s information to the registry (using Zookeeper as an example), and listens for changes in the registration of itself, other masters, and all worker nodes to perform fault-tolerant processing.
  • Code Entry: org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient#start
public void start() {         try {             this.masterHeartBeatTask = new MasterHeartBeatTask(masterConfig, registryClient);             // 1. Register itself to the registry;             registry();             // 2. Listen to the connection state with the registry;             registryClient.addConnectionStateListener(                     new MasterConnectionStateListener(masterConfig, registryClient, masterConnectStrategy));             // 3. Listen to the status of other masters and workers in the registry and perform fault-tolerant work             registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_NODE, new MasterRegistryDataListener());         } catch (Exception e) {             throw new RegistryException("Master registry client start up error", e);         }     } 

1.4 Master Scheduling:

  • Description: A scanning thread that periodically checks the command table in the database and performs different operations based on command types. This is the core logic for workflow startup, instance fault tolerance, etc.
  • Code Entry: org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap#run
public void run() {         while (!ServerLifeCycleManager.isStopped()) {             try {                 // If the server is not in running status, it cannot consume commands                 if (!ServerLifeCycleManager.isRunning()) {                     logger.warn("The current server {} is not at running status, cannot consume commands.", this.masterAddress);                     Thread.sleep(Constants.SLEEP_TIME_MILLIS);                 }                  // Handle workload overload (CPU/memory)                 boolean isOverload = OSUtils.isOverload(masterConfig.getMaxCpuLoadAvg(), masterConfig.getReservedMemory());                 if (isOverload) {                     logger.warn("The current server {} is overload, cannot consume commands.", this.masterAddress);                     MasterServerMetrics.incMasterOverload();                     Thread.sleep(Constants.SLEEP_TIME_MILLIS);                     continue;                 }                  // Get commands from the database                 List<Command> commands = findCommands();                 if (CollectionUtils.isEmpty(commands)) {                     Thread.sleep(Constants.SLEEP_TIME_MILLIS);                     continue;                 }                  // Convert commands to process instances and handle the workflow logic                 List<ProcessInstance> processInstances = command2ProcessInstance(commands);                 if (CollectionUtils.isEmpty(processInstances)) {                     Thread.sleep(Constants.SLEEP_TIME_MILLIS);                     continue;                 }                  // Handle workflow instance execution                 processInstances.forEach(processInstance -> {                     try {                         LoggerUtils.setWorkflowInstanceIdMDC(processInstance.getId());                         WorkflowExecuteRunnable workflowRunnable = new WorkflowExecuteRunnable(processInstance, ...);                         processInstanceExecCacheManager.cache(processInstance.getId(), workflowRunnable);                         workflowEventQueue.addEvent(new WorkflowEvent(WorkflowEventType.START_WORKFLOW, processInstance.getId()));                     } finally {                         LoggerUtils.removeWorkflowInstanceIdMDC();                     }                 });             } catch (InterruptedException interruptedException) {                 logger.warn("Master schedule bootstrap interrupted, close the loop", interruptedException);                 Thread.currentThread().interrupt();                 break;             } catch (Exception e) {                 logger.error("Master schedule workflow error", e);                 ThreadUtils.sleep(Constants.SLEEP_TIME_MILLIS);             }         }     } 

1.5 Event Execution Service:

  • Description: Responsible for polling the event queue of the workflow instance. Events such as workflow submission failures or task state changes are handled here.
  • Code Entry: org.apache.dolphinscheduler.server.master.runner.EventExecuteService#run
public void run() {         while (!ServerLifeCycleManager.isStopped()) {             try {                 // Handle workflow execution events                 workflowEventHandler();                 // Handle stream task execution events                 streamTaskEventHandler();                 TimeUnit.MILLISECONDS.sleep(Constants.SLEEP_TIME_MILLIS_SHORT);             } ...         }     } 

1.6 Fault Tolerance Mechanism:

  • Description: Responsible for Master and Worker fault tolerance.
  • Code Entry: org.apache.dolphinscheduler.server.master.service.MasterFailoverService#checkMasterFailover
    public void checkMasterFailover() {         List<String> needFailoverMasterHosts = processService.queryNeedFailoverProcessInstanceHost()                 .stream()                 .filter(host -> localAddress.equals(host) || !registryClient.checkNodeExists(host, NodeType.MASTER))                 .distinct()                 .collect(Collectors.toList());         if (CollectionUtils.isEmpty(needFailoverMasterHosts)) {             return;         }          for (String needFailoverMasterHost : needFailoverMasterHosts) {             failoverMaster(needFailoverMasterHost);         }     } 

Conclusion:

The article provides an in-depth look at the Apache DolphinScheduler 3.1.9 Master service startup process, fault tolerance mechanisms, and the overall architecture. Further articles will explore the Worker startup process and interactions between Master and Worker.

Market Opportunity
Brainedge Logo
Brainedge Price(LEARN)
$0.00865
$0.00865$0.00865
+0.11%
USD
Brainedge (LEARN) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Q4 2025 May Have Marked the End of the Crypto Bear Market: Bitwise

Q4 2025 May Have Marked the End of the Crypto Bear Market: Bitwise

The fourth quarter of 2025 may have quietly signaled the end of the crypto bear market, according to a new report from digital asset manager Bitwise, even as prices
Share
CryptoNews2026/01/22 15:06
CEO Sandeep Nailwal Shared Highlights About RWA on Polygon

CEO Sandeep Nailwal Shared Highlights About RWA on Polygon

The post CEO Sandeep Nailwal Shared Highlights About RWA on Polygon appeared on BitcoinEthereumNews.com. Polygon CEO Sandeep Nailwal highlighted Polygon’s lead in global bonds, Spiko US T-Bill, and Spiko Euro T-Bill. Polygon published an X post to share that its roadmap to GigaGas was still scaling. Sentiments around POL price were last seen to be bearish. Polygon CEO Sandeep Nailwal shared key pointers from the Dune and RWA.xyz report. These pertain to highlights about RWA on Polygon. Simultaneously, Polygon underlined its roadmap towards GigaGas. Sentiments around POL price were last seen fumbling under bearish emotions. Polygon CEO Sandeep Nailwal on Polygon RWA CEO Sandeep Nailwal highlighted three key points from the Dune and RWA.xyz report. The Chief Executive of Polygon maintained that Polygon PoS was hosting RWA TVL worth $1.13 billion across 269 assets plus 2,900 holders. Nailwal confirmed from the report that RWA was happening on Polygon. The Dune and https://t.co/W6WSFlHoQF report on RWA is out and it shows that RWA is happening on Polygon. Here are a few highlights: – Leading in Global Bonds: Polygon holds 62% share of tokenized global bonds (driven by Spiko’s euro MMF and Cashlink euro issues) – Spiko U.S.… — Sandeep | CEO, Polygon Foundation (※,※) (@sandeepnailwal) September 17, 2025 The X post published by Polygon CEO Sandeep Nailwal underlined that the ecosystem was leading in global bonds by holding a 62% share of tokenized global bonds. He further highlighted that Polygon was leading with Spiko US T-Bill at approximately 29% share of TVL along with Ethereum, adding that the ecosystem had more than 50% share in the number of holders. Finally, Sandeep highlighted from the report that there was a strong adoption for Spiko Euro T-Bill with 38% share of TVL. He added that 68% of returns were on Polygon across all the chains. Polygon Roadmap to GigaGas In a different update from Polygon, the community…
Share
BitcoinEthereumNews2025/09/18 01:10
BlackRock Increases U.S. Stock Exposure Amid AI Surge

BlackRock Increases U.S. Stock Exposure Amid AI Surge

The post BlackRock Increases U.S. Stock Exposure Amid AI Surge appeared on BitcoinEthereumNews.com. Key Points: BlackRock significantly increased U.S. stock exposure. AI sector driven gains boost S&P 500 to historic highs. Shift may set a precedent for other major asset managers. BlackRock, the largest asset manager, significantly increased U.S. stock and AI sector exposure, adjusting its $185 billion investment portfolios, according to a recent investment outlook report.. This strategic shift signals strong confidence in U.S. market growth, driven by AI and anticipated Federal Reserve moves, influencing significant fund flows into BlackRock’s ETFs. The reallocation increases U.S. stocks by 2% while reducing holdings in international developed markets. BlackRock’s move reflects confidence in the U.S. stock market’s trajectory, driven by robust earnings and the anticipation of Federal Reserve rate cuts. As a result, billions of dollars have flowed into BlackRock’s ETFs following the portfolio adjustment. “Our increased allocation to U.S. stocks, particularly in the AI sector, is a testament to our confidence in the growth potential of these technologies.” — Larry Fink, CEO, BlackRock The financial markets have responded favorably to this adjustment. The S&P 500 Index recently reached a historic high this year, supported by AI-driven investment enthusiasm. BlackRock’s decision aligns with widespread market speculation on the Federal Reserve’s next moves, further amplifying investor interest and confidence. AI Surge Propels S&P 500 to Historic Highs At no other time in history has the S&P 500 seen such dramatic gains driven by a single sector as the recent surge spurred by AI investments in 2023. Experts suggest that the strategic increase in U.S. stock exposure by BlackRock may set a precedent for other major asset managers. Historically, shifts of this magnitude have influenced broader market behaviors as others follow suit. Market analysts point to the favorable economic environment and technological advancements that are propelling the AI sector’s momentum. The continued growth of AI technologies is…
Share
BitcoinEthereumNews2025/09/18 02:49