
Best approach for scheduling recurring tasks in PHP?


Hey, community!

I'm developing a system in PHP, hosted on a Linux server, and I need to implement recurring tasks, such as executing a job every 7 days or monthly. Since I don’t have much experience with this functionality, I’d like some suggestions on the best approaches.

  • Is there any recommended library for this?

  • Are cron jobs the best solution, or are there more efficient alternatives?

  • How can I handle large volumes of data in these executions to avoid overloading the server?

I’d really appreciate any tips or experiences you can share!

12 replies

79418268
0

Thank you for the information! I will rewrite my question in English.

79418553
5

Cron jobs on a Linux box are the go-to solution for this kind of task.

Handling large volumes of data is usually a matter of processing things in batches. You can also sleep() between iterations of the loop that processes the data, which gives the server a chance to do other work for a moment.
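
A minimal sketch of that batch-plus-sleep pattern, where fetchBatch() and processRow() are placeholders for your own data access and business logic:

<?php
// Process data in fixed-size batches, pausing between batches so the
// server can handle other work in the meantime.
$batchSize = 500;

while (true) {
    $rows = fetchBatch($batchSize); // e.g. SELECT ... LIMIT 500 with a cursor or offset
    if (count($rows) === 0) {
        break; // nothing left to process
    }
    foreach ($rows as $row) {
        processRow($row);
    }
    sleep(2); // give the server a breather between batches
}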

79435385
1

In any case, Linux cron jobs or Windows Task Scheduler are the foundation; beyond that, it depends on the scale of the work:

  • For large data volumes or enterprise-grade jobs: use batch processing and a queue system such as RabbitMQ via php-amqplib (https://github.com/php-amqplib/php-amqplib)

  • For framework-based projects: use Laravel's scheduler or Symfony's Messenger

  • For small jobs: use a third-party library like dragonmantank/cron-expression (see the sketch after this list)

  • For simple jobs: use Linux cron jobs or Windows Task Scheduler directly to run PHP scripts.
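
For the dragonmantank/cron-expression option, a small sketch of typical usage (installed via Composer; the schedule string is just an example):

<?php
require 'vendor/autoload.php';

use Cron\CronExpression;

$cron = new CronExpression('0 0 * * 1'); // every Monday at midnight

if ($cron->isDue()) {
    // run the task now
}

echo $cron->getNextRunDate()->format('Y-m-d H:i:s'), PHP_EOL;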

79445195
0

If you deal with very time-consuming tasks, you should add a timeout to your crontab entry. Here is an example:

*/10 * * * * /usr/bin/timeout -s SIGINT 3800 /usr/bin/php /opt/myScript.php

Here, timeout sends SIGINT to the script after 3800 seconds, which helps avoid overloading the server.

79450016
2
  • Cron jobs are simple and robust but can become hard to monitor

  • systemd timers are more reliable, can handle failures, and come with logs

  • Scheduler libraries like Laravel's scheduler or Crunz add more flexibility and management options (see the sketch after this list)

  • If you need to handle large amounts of data or concurrent workloads, some kind of queuing system makes sense, e.g. Laravel queues, Beanstalkd + Supervisor (fast), or Redis queues + PHP workers (large volumes)
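
To illustrate the scheduler-library option, this is roughly what Laravel's scheduler looks like (Laravel 10-style Kernel; the two command names are placeholders):

<?php
// app/Console/Kernel.php (Laravel 10 style; command names are placeholders)
namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule): void
    {
        $schedule->command('reports:weekly')->weekly();   // Sundays at 00:00
        $schedule->command('billing:monthly')->monthly(); // 1st of the month at 00:00
    }
}

A single crontab entry then drives all scheduled tasks:

* * * * * cd /path-to-your-project && php artisan schedule:run >> /dev/null 2>&1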

79462902
0

Hello

1. You can use a cron job like this (note that */7 in the day-of-month field fires on days 1, 8, 15, 22, and 29 of each month, which is only approximately every 7 days):

0 0 */7 * * command_to_run

2. You can use tmux on Linux: create a session and run your PHP code in it as a long-lived process. Such a script can run for more than a year; inside the code you must sleep between iterations. This approach can handle more data without downtime.
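
A sketch of such a long-lived script, where runScheduledWork() is a placeholder for the actual job:

<?php
// Long-running worker meant to be started inside a tmux session.
set_time_limit(0); // no execution time limit (already the CLI default)

while (true) {
    runScheduledWork();      // placeholder for the actual job
    sleep(7 * 24 * 60 * 60); // wait 7 days until the next run
}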

79480525
0

As others have already mentioned, crond or systemd is a common and simple solution to run PHP code regularly. Always be mindful of the user permissions under which the code is executed. Make sure the code does NOT run as "root" because anyone who gains write access to your PHP code could easily take over the entire server!

If you're not using a framework that implements an event queue, it’s also a good idea to design cron jobs so that they can handle not being executed exactly at the scheduled time or not running error-free. If a job runs incompletely or not at all, a later run should be able to catch up on the missed tasks.

Let me give an example: Suppose we have a forum where users receive “reputation points” at the end of the month if they posted at least once during the previous month.

A simple approach would be to run the job exactly once at the end of the month, selecting all users via an SQL query who posted at least once in the last month, and adding 10 reputation points to each of them.

However, if for some reason the job fails after processing the first 5 out of 20 users, only the first 5 would receive their points, leaving the remaining 15 without any points. If the job were to be restarted, those first 5 users would receive the points again, as there’s no way to distinguish who has already received their points for that month and who hasn’t.

A better approach would be to extend the data structure by adding a field in the database that tracks the last time points were successfully awarded to each user. The SQL query could then be modified to select users who posted in the previous month AND for whom the new field “date_last_monthly_reputation_granted” is either NULL or contains a date older than the current month.

When awarding the 10 points, the date in the new field is updated at the same time. This way, if a failure occurs and the job is restarted, it can continue processing from where it left off.

If you're dealing with large datasets, it might also be a good idea not to select and process all users who meet the criteria at once. Instead, you could select the next 100 users, for example, and run the job every few minutes. This ensures that users are processed gradually without PHP trying to load all users into memory at once.
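
A sketch combining the catch-up field with batching, using PDO; apart from date_last_monthly_reputation_granted from the example above, the table and column names (users, reputation, last_post_date) and the connection details are made up:

<?php
// Idempotent monthly job: take the next batch of users who posted within
// the last month and have not yet been granted points this month, then
// award the points and record the grant date in one UPDATE per user, so
// a restarted job simply skips users that were already processed.
$pdo = new PDO('mysql:host=localhost;dbname=forum', 'user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$select = $pdo->query(
    "SELECT id FROM users
     WHERE last_post_date >= NOW() - INTERVAL 1 MONTH
       AND (date_last_monthly_reputation_granted IS NULL
            OR date_last_monthly_reputation_granted < DATE_FORMAT(NOW(), '%Y-%m-01'))
     LIMIT 100"
);

$update = $pdo->prepare(
    "UPDATE users
     SET reputation = reputation + 10,
         date_last_monthly_reputation_granted = NOW()
     WHERE id = :id"
);

foreach ($select as $row) {
    $update->execute(['id' => $row['id']]);
}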

Additionally, it's important to handle concurrency to prevent multiple instances of the same job running simultaneously, which could lead to data corruption or inconsistent results. One simple solution is to use a file-based locking mechanism like flock, which ensures that only one instance of the job is running at any given time. By locking a specific file at the start of the job and releasing it upon completion, you can prevent overlapping executions. If another instance of the job tries to run while the file is locked, it will either wait or terminate, depending on how you've configured it. This ensures that the job processes data safely and sequentially, even in environments where the cron scheduler might trigger multiple instances.
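
A minimal sketch of that flock() pattern (the lock file path is arbitrary):

<?php
// Open (or create) the lock file and try to take an exclusive lock.
// LOCK_NB makes the call non-blocking: a second instance exits right away
// instead of waiting for the first one to finish.
$fp = fopen('/tmp/monthly_job.lock', 'c');
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    fwrite(STDERR, "Another instance is already running." . PHP_EOL);
    exit(1);
}

// ... do the actual work here ...

flock($fp, LOCK_UN); // release the lock
fclose($fp);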

However, this introduces another challenge: monitoring for stale jobs or lockfiles. If a job hangs or crashes without releasing the lock, the system could be blocked indefinitely. Proper monitoring is essential to catch such scenarios and clean up stale lockfiles. In general, monitoring cron jobs is a story in itself. It’s a good idea to redirect STDOUT and STDERR to a log file for easy tracking of job execution. Additionally, setting up another cron job or using logrotate to regularly truncate and manage that log file helps prevent it from growing indefinitely and consuming excessive disk space.
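
For example, a crontab entry that appends both streams to a log file (all paths here are placeholders):

0 3 * * * /usr/bin/php /opt/myScript.php >> /var/log/myScript.log 2>&1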

79483650
0
package org.socgen.ibi.effectCalc.jdbcConn

import com.typesafe.config.Config
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions._
import java.sql.{Connection, DriverManager, Statement}
import org.socgen.ibi.effectCalc.logger.EffectCalcLogger
import org.socgen.ibi.effectCalc.common.MsSqlJdbcConnectionInfo

class EffectCalcJdbcConnection(config: Config) {
  
  // Constants
  private val BatchSize = "10000"
  private val LoadDateTimeCol = "loaddatetime"
  private val ColumnToFilter = List(LoadDateTimeCol)
  
  // Database configuration
  private val url: String = config.getString("ibi.db.jdbcURL")
  private val user: String = config.getString("ibi.db.user")
  private val pwd: String = config.getString("ibi.db.password")
  private val driverClassName: String = config.getString("ibi.db.driverClass")
  private val databaseName: String = config.getString("ibi.db.stage_ec_sql")
  private val dburl = s"$url;databasename=$databaseName"

  private val dfMsqlWriteOptions = new MsSqlJdbcConnectionInfo(dburl, user, pwd)

  // Initialize connection
  private lazy val conn: Connection = {
    Class.forName(driverClassName)
    DriverManager.getConnection(dburl, user, pwd)
  }

  // Column type mappings
  private val columnTypeMappings = Map(
    "results" -> ColumnTypeMapping(
      stringColumns = List("accounttype", "baseliiaggregategrosscarryoffbalance" /* ... other columns ... */),
      decimalColumns = List("alloctakeovereffect", "closinggrosscarryingamounteur" /* ... other columns ... */),
      integerColumns = List()
    ),
    "stocks" -> ColumnTypeMapping(
      stringColumns = List("accountaggregategrosscarryoffbalance" /* ... other columns ... */),
      decimalColumns = List("grosscarryingamounteur", "provisionamounteur"),
      integerColumns = List("forbearancetype" /* ... other columns ... */)
    )
  )

  case class ColumnTypeMapping(
    stringColumns: List[String],
    decimalColumns: List[String],
    integerColumns: List[String]
  )

  private def truncateTable(table: String): String = s"TRUNCATE TABLE $table;"

  private def getTableColumns(table: String, connection: Connection): List[String] = {
    val columnStartingIndex = 1
    val statement = s"SELECT TOP 0 * FROM
$table"
    val resultSetMetaData = connection.createStatement().executeQuery(statement).getMetaData
    (columnStartingIndex to resultSetMetaData.getColumnCount).toList
      .map(resultSetMetaData.getColumnName)
      .filterNot(ColumnToFilter.contains(_))
  }

  private def getNumPartitions(df: DataFrame): Int = {
    val numExecutors = df.sparkSession.conf.get("spark.executor.instances").toInt
    val numExecutorsCores = df.sparkSession.conf.get("spark.executor.cores").toInt
    numExecutors * numExecutorsCores
  }

  private def castColumns(df: DataFrame, tableType: String): DataFrame = {
    val mapping = columnTypeMappings(tableType)
    
    val selectWithCast = df.columns.map(column => {
      val columnLower = column.toLowerCase
      if (mapping.stringColumns.contains(columnLower))
        col(column).cast(StringType)
      else if (mapping.decimalColumns.contains(columnLower))
        col(column).cast(DecimalType(30, 2))
      else if (mapping.integerColumns.contains(columnLower))
        col(column).cast(IntegerType)
      else col(column)
    })

    df.select(selectWithCast: _*)
  }

  private def writeToTable(
    df: DataFrame,
    tableName: String,
    tableType: String
  ): Unit = {
    try {
      // Truncate table
      val stmt = conn.createStatement()
      stmt.executeUpdate(truncateTable(tableName))
      EffectCalcLogger.info(s"TABLE $
tableName TRUNCATED", this.getClass.getName)

      // Calculate partitions
      val numPartitions = getNumPartitions(df)
      EffectCalcLogger.info(s"Using
$numPartitions partitions", this.getClass.getName)

      // Prepare DataFrame
      val processedDf = castColumns(df, tableType)

      // Get column order and write
      val orderOfColumnsInSQL = getTableColumns(tableName, conn)
      
      EffectCalcLogger.info(s"Starting writing to $
tableName table", this.getClass.getName)
      
      processedDf
        .select(orderOfColumnsInSQL.map(col): _*)
        .coalesce(numPartitions)
        .write
        .mode(SaveMode.Append)
        .format("jdbc")
        .options(dfMsqlWriteOptions.configMap ++ Map(
          "dbTable" -> tableName,
          "batchsize" -> BatchSize
        ))
        .save()

      EffectCalcLogger.info(s"Writing to
$tableName table completed", this.getClass.getName)
    } catch {
      case e: Exception =>
        EffectCalcLogger.error(s"Exception while pushing to $tableName:${e.getMessage}", this.getClass.getName)
        throw e
    } finally {
      conn.close()
    }
  }

  def pushToResultsSQL(resultsDf: DataFrame): Unit = {
    val resultsTable = config.getString("ibi.db.stage_ec_sql_results_table")
    writeToTable(resultsDf, resultsTable, "results")
  }

  def pushToStockSQL(stockDf: DataFrame): Unit = {
    val stockTable = config.getString("ibi.db.stage_ec_sql_stocks_table")
    writeToTable(stockDf, stockTable, "stocks")
  }
}
79492701
0

Hey!

For recurring tasks in PHP on a Linux server, cron jobs are one of the most reliable and widely used solutions. You can set up a cron job to run scripts at specific intervals, like every 7 days or monthly, using simple cron expressions.

If you prefer a more flexible approach, you might consider Laravel Task Scheduler, which provides an intuitive way to manage scheduled tasks within your application. Another option is using queue systems like Redis with Laravel Queues or Supervisord to handle background jobs efficiently.

When dealing with large volumes of data, you can optimize execution by:

  • Processing data in batches to reduce memory usage.

  • Using database indexing to speed up queries.

  • Implementing job queues to distribute the workload over time instead of executing everything at once.

79542435
0

In my opinion it depends mainly on your stack and on whether an "admin" can or should be able to configure the tasks.

My favourite setup is: Symfony app server + Symfony worker server (Docker).

In this case my best practice would be dukecity/command-scheduler-bundle. I then write Symfony CLI commands. The bundle's internal script (check the docs) is called by a cron schedule every minute and checks whether a job has to be executed. The CLI command (on the main/frontend server) then only sends a message to the background workers; check out Symfony's Messenger for this, I use RabbitMQ most of the time. The workers do the actual job on a different VM, so the main server has no performance issues.
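
A rough sketch of such a CLI command, assuming a hypothetical GenerateMonthlyReport message class and a configured Messenger transport; only MessageBusInterface, the AsCommand attribute, and messenger:consume are standard Symfony:

<?php
// src/Command/DispatchMonthlyReportCommand.php
// The command name and the GenerateMonthlyReport message are made up;
// the pattern is: the frontend server only dispatches, workers consume.
namespace App\Command;

use App\Message\GenerateMonthlyReport;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Messenger\MessageBusInterface;

#[AsCommand(name: 'app:dispatch-monthly-report')]
class DispatchMonthlyReportCommand extends Command
{
    public function __construct(private readonly MessageBusInterface $bus)
    {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        // Hand the job to the queue (e.g. RabbitMQ); this returns immediately.
        $this->bus->dispatch(new GenerateMonthlyReport());

        return Command::SUCCESS;
    }
}

On the worker VM, php bin/console messenger:consume <transport> then picks the message up and does the heavy lifting.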

79542482
0

There are very good starting points at https://github.com/systemsdk/docker-apache-php-symfony

Systemsdk are doing a very good job; it's incredible.