
Best approach for scheduling recurring tasks in PHP?


Hey, community!

I'm developing a system in PHP, hosted on a Linux server, and I need to implement recurring tasks, such as executing a job every 7 days or monthly. Since I don’t have much experience with this functionality, I’d like some suggestions on the best approaches.

  • Is there any recommended library for this?

  • Are cron jobs the best solution, or are there more efficient alternatives?

  • How can I handle large volumes of data in these executions to avoid overloading the server?

I’d really appreciate any tips or experiences you can share!

12 replies

79418268
0

Thank you for the information! I will rewrite my question in English.

79418553
5

Cron jobs on a Linux box are the go-to solution for this kind of task.

Handling large volumes of data is usually a matter of processing things in batches. You can also sleep() between iterations of the loop that processes the data, which gives the server a chance to do other work for a moment.
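
A minimal sketch of that batch-plus-sleep pattern, where fetchBatch() and processRow() are placeholders for your own data access and business logic:

<?php
// Process data in fixed-size batches, pausing between batches so the
// server can handle other work in the meantime.
$batchSize = 500;

while (true) {
    $rows = fetchBatch($batchSize); // e.g. SELECT ... LIMIT 500 with a cursor or offset
    if (count($rows) === 0) {
        break; // nothing left to process
    }
    foreach ($rows as $row) {
        processRow($row);
    }
    sleep(2); // give the server a breather between batches
}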

79435385
1

In any case, Linux cron jobs or Windows Task Scheduler are the foundation; beyond that, it depends on the scale of the work:

  • For large data volumes or enterprise-grade jobs: use batch processing and a queue system such as RabbitMQ via php-amqplib (https://github.com/php-amqplib/php-amqplib)

  • For framework-based projects: use Laravel's scheduler or Symfony's Messenger

  • For small jobs: use a third-party library like dragonmantank/cron-expression (see the sketch after this list)

  • For simple jobs: use Linux cron jobs or Windows Task Scheduler directly to run PHP scripts.
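
For the dragonmantank/cron-expression option, a small sketch of typical usage (installed via Composer; the schedule string is just an example):

<?php
require 'vendor/autoload.php';

use Cron\CronExpression;

$cron = new CronExpression('0 0 * * 1'); // every Monday at midnight

if ($cron->isDue()) {
    // run the task now
}

echo $cron->getNextRunDate()->format('Y-m-d H:i:s'), PHP_EOL;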

79445195
0

If you deal with very time-consuming tasks, you should add a timeout to your crontab entry. Here is an example:

*/10 * * * * /usr/bin/timeout -s SIGINT 3800 /usr/bin/php /opt/myScript.php

Here, timeout sends SIGINT to the script after 3800 seconds, which helps avoid overloading the server.

79450016
2
  • Cron jobs are simple and robust but can become hard to monitor

  • systemd timers are more reliable, can handle failures, and come with logs

  • Scheduler libraries like Laravel's scheduler or Crunz add more flexibility and management options (see the sketch after this list)

  • If you need to handle large amounts of data or concurrent workloads, some kind of queuing system makes sense, e.g. Laravel queues, Beanstalkd + Supervisor (fast), or Redis queues + PHP workers (large volumes)
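
To illustrate the scheduler-library option, this is roughly what Laravel's scheduler looks like (Laravel 10-style Kernel; the two command names are placeholders):

<?php
// app/Console/Kernel.php (Laravel 10 style; command names are placeholders)
namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule): void
    {
        $schedule->command('reports:weekly')->weekly();   // Sundays at 00:00
        $schedule->command('billing:monthly')->monthly(); // 1st of the month at 00:00
    }
}

A single crontab entry then drives all scheduled tasks:

* * * * * cd /path-to-your-project && php artisan schedule:run >> /dev/null 2>&1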

79462902
0

Hello

1. You can use a cron job like this (note that */7 in the day-of-month field fires on days 1, 8, 15, 22, and 29 of each month, which is only approximately every 7 days):

0 0 */7 * * command_to_run

2. You can use tmux on Linux: create a session and run your PHP code in it as a long-lived process. Such a script can run for more than a year; inside the code you must sleep between iterations. This approach can handle more data without downtime.
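
A sketch of such a long-lived script, where runScheduledWork() is a placeholder for the actual job:

<?php
// Long-running worker meant to be started inside a tmux session.
set_time_limit(0); // no execution time limit (already the CLI default)

while (true) {
    runScheduledWork();      // placeholder for the actual job
    sleep(7 * 24 * 60 * 60); // wait 7 days until the next run
}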

79480525
0

As others have already mentioned, crond or systemd is a common and simple solution to run PHP code regularly. Always be mindful of the user permissions under which the code is executed. Make sure the code does NOT run as "root" because anyone who gains write access to your PHP code could easily take over the entire server!

If you're not using a framework that implements an event queue, it’s also a good idea to design cron jobs so that they can handle not being executed exactly at the scheduled time or not running error-free. If a job runs incompletely or not at all, a later run should be able to catch up on the missed tasks.

Let me give an example: Suppose we have a forum where users receive “reputation points” at the end of the month if they posted at least once during the previous month.

A simple approach would be to run the job exactly once at the end of the month, selecting all users via an SQL query who posted at least once in the last month, and adding 10 reputation points to each of them.

However, if for some reason the job fails after processing the first 5 out of 20 users, only the first 5 would receive their points, leaving the remaining 15 without any points. If the job were to be restarted, those first 5 users would receive the points again, as there’s no way to distinguish who has already received their points for that month and who hasn’t.

A better approach would be to extend the data structure by adding a field in the database that tracks the last time points were successfully awarded to each user. The SQL query could then be modified to select users who posted in the previous month AND for whom the new field “date_last_monthly_reputation_granted” is either NULL or contains a date older than the current month.

When awarding the 10 points, the date in the new field is updated at the same time. This way, if a failure occurs and the job is restarted, it can continue processing from where it left off.

If you're dealing with large datasets, it might also be a good idea not to select and process all users who meet the criteria at once. Instead, you could select the next 100 users, for example, and run the job every few minutes. This ensures that users are processed gradually without PHP trying to load all users into memory at once.
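
A sketch combining the catch-up field with batching, using PDO; apart from date_last_monthly_reputation_granted from the example above, the table and column names (users, reputation, last_post_date) and the connection details are made up:

<?php
// Idempotent monthly job: take the next batch of users who posted within
// the last month and have not yet been granted points this month, then
// award the points and record the grant date in one UPDATE per user, so
// a restarted job simply skips users that were already processed.
$pdo = new PDO('mysql:host=localhost;dbname=forum', 'user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$select = $pdo->query(
    "SELECT id FROM users
     WHERE last_post_date >= NOW() - INTERVAL 1 MONTH
       AND (date_last_monthly_reputation_granted IS NULL
            OR date_last_monthly_reputation_granted < DATE_FORMAT(NOW(), '%Y-%m-01'))
     LIMIT 100"
);

$update = $pdo->prepare(
    "UPDATE users
     SET reputation = reputation + 10,
         date_last_monthly_reputation_granted = NOW()
     WHERE id = :id"
);

foreach ($select as $row) {
    $update->execute(['id' => $row['id']]);
}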

Additionally, it's important to handle concurrency to prevent multiple instances of the same job running simultaneously, which could lead to data corruption or inconsistent results. One simple solution is to use a file-based locking mechanism like flock, which ensures that only one instance of the job is running at any given time. By locking a specific file at the start of the job and releasing it upon completion, you can prevent overlapping executions. If another instance of the job tries to run while the file is locked, it will either wait or terminate, depending on how you've configured it. This ensures that the job processes data safely and sequentially, even in environments where the cron scheduler might trigger multiple instances.
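
A minimal sketch of that flock() pattern (the lock file path is arbitrary):

<?php
// Open (or create) the lock file and try to take an exclusive lock.
// LOCK_NB makes the call non-blocking: a second instance exits right away
// instead of waiting for the first one to finish.
$fp = fopen('/tmp/monthly_job.lock', 'c');
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    fwrite(STDERR, "Another instance is already running." . PHP_EOL);
    exit(1);
}

// ... do the actual work here ...

flock($fp, LOCK_UN); // release the lock
fclose($fp);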

However, this introduces another challenge: monitoring for stale jobs or lockfiles. If a job hangs or crashes without releasing the lock, the system could be blocked indefinitely. Proper monitoring is essential to catch such scenarios and clean up stale lockfiles. In general, monitoring cron jobs is a story in itself. It’s a good idea to redirect STDOUT and STDERR to a log file for easy tracking of job execution. Additionally, setting up another cron job or using logrotate to regularly truncate and manage that log file helps prevent it from growing indefinitely and consuming excessive disk space.
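
For example, a crontab entry that appends both streams to a log file (all paths here are placeholders):

0 3 * * * /usr/bin/php /opt/myScript.php >> /var/log/myScript.log 2>&1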

79483650
0
package org.socgen.ibi.effectCalc.jdbcConn

import com.typesafe.config.Config
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions._
import java.sql.{Connection, DriverManager, Statement}
import org.socgen.ibi.effectCalc.logger.EffectCalcLogger
import org.socgen.ibi.effectCalc.common.MsSqlJdbcConnectionInfo

class EffectCalcJdbcConnection(config: Config) {
  
  // Constants
  private val BatchSize = "10000"
  private val LoadDateTimeCol = "loaddatetime"
  private val ColumnToFilter = List(LoadDateTimeCol)
  
  // Database configuration
  private val url: String = config.getString("ibi.db.jdbcURL")
  private val user: String = config.getString("ibi.db.user")
  private val pwd: String = config.getString("ibi.db.password")
  private val driverClassName: String = config.getString("ibi.db.driverClass")
  private val databaseName: String = config.getString("ibi.db.stage_ec_sql")
  private val dburl = s"$url;databasename=$databaseName"

  private val dfMsqlWriteOptions = new MsSqlJdbcConnectionInfo(dburl, user, pwd)

  // Initialize connection
  private lazy val conn: Connection = {
    Class.forName(driverClassName)
    DriverManager.getConnection(dburl, user, pwd)
  }

  // Column type mappings
  private val columnTypeMappings = Map(
    "results" -> ColumnTypeMapping(
      stringColumns = List("accounttype", "baseliiaggregategrosscarryoffbalance" /* ... other columns ... */),
      decimalColumns = List("alloctakeovereffect", "closinggrosscarryingamounteur" /* ... other columns ... */),
      integerColumns = List()
    ),
    "stocks" -> ColumnTypeMapping(
      stringColumns = List("accountaggregategrosscarryoffbalance" /* ... other columns ... */),
      decimalColumns = List("grosscarryingamounteur", "provisionamounteur"),
      integerColumns = List("forbearancetype" /* ... other columns ... */)
    )
  )

  case class ColumnTypeMapping(
    stringColumns: List[String],
    decimalColumns: List[String],
    integerColumns: List[String]
  )

  private def truncateTable(table: String): String = s"TRUNCATE TABLE $table;"

  private def getTableColumns(table: String, connection: Connection): List[String] = {
    val columnStartingIndex = 1
    val statement = s"SELECT TOP 0 * FROM
$table"
    val resultSetMetaData = connection.createStatement().executeQuery(statement).getMetaData
    (columnStartingIndex to resultSetMetaData.getColumnCount).toList
      .map(resultSetMetaData.getColumnName)
      .filterNot(ColumnToFilter.contains(_))
  }

  private def getNumPartitions(df: DataFrame): Int = {
    val numExecutors = df.sparkSession.conf.get("spark.executor.instances").toInt
    val numExecutorsCores = df.sparkSession.conf.get("spark.executor.cores").toInt
    numExecutors * numExecutorsCores
  }

  private def castColumns(df: DataFrame, tableType: String): DataFrame = {
    val mapping = columnTypeMappings(tableType)
    
    val selectWithCast = df.columns.map(column => {
      val columnLower = column.toLowerCase
      if (mapping.stringColumns.contains(columnLower))
        col(column).cast(StringType)
      else if (mapping.decimalColumns.contains(columnLower))
        col(column).cast(DecimalType(30, 2))
      else if (mapping.integerColumns.contains(columnLower))
        col(column).cast(IntegerType)
      else col(column)
    })

    df.select(selectWithCast: _*)
  }

  private def writeToTable(
    df: DataFrame,
    tableName: String,
    tableType: String
  ): Unit = {
    try {
      // Truncate table
      val stmt = conn.createStatement()
      stmt.executeUpdate(truncateTable(tableName))
      EffectCalcLogger.info(s"TABLE $
tableName TRUNCATED", this.getClass.getName)

      // Calculate partitions
      val numPartitions = getNumPartitions(df)
      EffectCalcLogger.info(s"Using
$numPartitions partitions", this.getClass.getName)

      // Prepare DataFrame
      val processedDf = castColumns(df, tableType)

      // Get column order and write
      val orderOfColumnsInSQL = getTableColumns(tableName, conn)
      
      EffectCalcLogger.info(s"Starting writing to $
tableName table", this.getClass.getName)
      
      processedDf
        .select(orderOfColumnsInSQL.map(col): _*)
        .coalesce(numPartitions)
        .write
        .mode(SaveMode.Append)
        .format("jdbc")
        .options(dfMsqlWriteOptions.configMap ++ Map(
          "dbTable" -> tableName,
          "batchsize" -> BatchSize
        ))
        .save()

      EffectCalcLogger.info(s"Writing to
$tableName table completed", this.getClass.getName)
    } catch {
      case e: Exception =>
        EffectCalcLogger.error(s"Exception while pushing to $tableName:${e.getMessage}", this.getClass.getName)
        throw e
    } finally {
      conn.close()
    }
  }

  def pushToResultsSQL(resultsDf: DataFrame): Unit = {
    val resultsTable = config.getString("ibi.db.stage_ec_sql_results_table")
    writeToTable(resultsDf, resultsTable, "results")
  }

  def pushToStockSQL(stockDf: DataFrame): Unit = {
    val stockTable = config.getString("ibi.db.stage_ec_sql_stocks_table")
    writeToTable(stockDf, stockTable, "stocks")
  }
}
79492701
0

Hey!

For recurring tasks in PHP on a Linux server, cron jobs are one of the most reliable and widely used solutions. You can set up a cron job to run scripts at specific intervals, like every 7 days or monthly, using simple cron expressions.

If you prefer a more flexible approach, you might consider Laravel Task Scheduler, which provides an intuitive way to manage scheduled tasks within your application. Another option is using queue systems like Redis with Laravel Queues or Supervisord to handle background jobs efficiently.

When dealing with large volumes of data, you can optimize execution by:

  • Processing data in batches to reduce memory usage.

  • Using database indexing to speed up queries.

  • Implementing job queues to distribute the workload over time instead of executing everything at once.

79542435
0

In my opinion it depends mainly on your stack and on whether an "admin" can or should be able to configure the tasks.

My favourite setup is: Symfony app server + Symfony worker server (Docker).

In this case my best practice would be dukecity/command-scheduler-bundle. I then write Symfony CLI commands. The bundle's internal script (check the docs) is called by a cron schedule every minute and checks whether a job has to be executed. The CLI command (on the main/frontend server) then only sends a message to the background workers; check out Symfony's Messenger for this, I use RabbitMQ most of the time. The workers do the actual job on a different VM, so the main server has no performance issues.
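
A rough sketch of such a CLI command, assuming a hypothetical GenerateMonthlyReport message class and a configured Messenger transport; only MessageBusInterface, the AsCommand attribute, and messenger:consume are standard Symfony:

<?php
// src/Command/DispatchMonthlyReportCommand.php
// The command name and the GenerateMonthlyReport message are made up;
// the pattern is: the frontend server only dispatches, workers consume.
namespace App\Command;

use App\Message\GenerateMonthlyReport;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Messenger\MessageBusInterface;

#[AsCommand(name: 'app:dispatch-monthly-report')]
class DispatchMonthlyReportCommand extends Command
{
    public function __construct(private readonly MessageBusInterface $bus)
    {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        // Hand the job to the queue (e.g. RabbitMQ); this returns immediately.
        $this->bus->dispatch(new GenerateMonthlyReport());

        return Command::SUCCESS;
    }
}

On the worker VM, php bin/console messenger:consume <transport> then picks the message up and does the heavy lifting.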

79542482
0

There are very good starting points at https://github.com/systemsdk/docker-apache-php-symfony

Systemsdk are doing a very good job; it's incredible.