mysql - PHP-FPM crashes when having too many users while doing a heavy job


Question: 

I have a Server running Apache/2.2.22 (Debian), PHP 5.6.17 as FPM and MySQL 5.6.25.

The project runs on a CMS called Redaxo (I don't think it matters much, but I'll mention it anyway). Redaxo has some functions that take a while (e.g. deleting the cache and rebuilding it takes 1-2 minutes). If other users visit the website during that time, FPM crashes with a 500 Internal Server Error, and I have to reload the page several times until the error disappears and the process has finished.

I noticed that this only happens when too many users are on the site at the same time and a heavy operation is running.

10 users at the same time just surfing = No Problem
10 users at the same time just surfing, while cache deletion = 500 Error for everyone.

I checked this by disallowing the website for everyone except me (.htaccess deny/allow with ip). Then I did the heavy operation and had no problem. As soon as multiple people were on the site again, the problem was there again.

What could it be? What information do you need from me?

These values are set (not commented) in the php-fpm.conf

[global]
pid = /run/php5-fpm.pid
error_log = /var/log/php5-fpm.log
emergency_restart_threshold = 0
include=/etc/php5/fpm/pool.d/*.conf

These values are set (not commented) in the project specific fpm.conf

[projectname]
user = projectname
group = projectname

listen = /var/run/php5-fpm-projectname.sock
listen.owner = projectname
listen.group = projectname
listen.mode = 0660

pm = dynamic
pm.max_children = 150
pm.start_servers = 10
pm.min_spare_servers = 10
pm.max_spare_servers = 30

chdir = /

php_value[upload_max_filesize] = 128M
php_value[max_post_size] = 128M
php_value[max_execution_time] = 180
php_value[memory_limit] = 256M

The script that fails does a lot of MySQL work and file creation, if that helps. It's pretty big, though, so I'm not sure whether I should post it here, or whether it is even the problem.

The apache error log says either this

[Tue Feb 09 10:54:01 2016] [error] [client {IP}] (104)Connection reset by peer: FastCGI: comm with server "/fcgi-bin-php5-fpm-projectnmae" aborted: read failed
[Tue Feb 09 10:54:01 2016] [error] [client {IP}] FastCGI: incomplete headers (0 bytes) received from server "/fcgi-bin-php5-fpm-projectnmae"

or this

[Tue Feb 09 11:00:46 2016] [error] [client {IP}] FastCGI: incomplete headers (0 bytes) received from server "/fcgi-bin-php5-fpm-projectname"
[Tue Feb 09 11:00:48 2016] [error] [client {IP}] (104)Connection reset by peer: FastCGI: comm with server "/fcgi-bin-php5-fpm-projectname" aborted: read failed

The fpm-log says the following. Always different timings of course

[10-Feb-2016 09:40:59] WARNING: [pool projectname] child 10970 exited on signal 7 (SIGBUS) after 50.186611 seconds from start
[10-Feb-2016 09:40:59] NOTICE: [pool projectname] child 11092 started

Sometimes there's a warning like this in it

[09-Feb-2016 11:00:41] WARNING: [pool projectname] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 6 total children
[09-Feb-2016 11:00:42] WARNING: [pool projectname] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 7 total children

Here is some more debug information

[18-Feb-2016 17:42:01] WARNING: [pool projectname] child 9088 exited on signal 7 (SIGBUS) after 70.130564 seconds from start
[18-Feb-2016 17:42:01] NOTICE: [pool projectname] child 9205 started
[18-Feb-2016 17:43:55] WARNING: [pool projectname] child 9099 said into stderr: "NOTICE: PHP message: PHP Notice:  Undefined offset: 1181 in /var/www/projectname/htdocs/redaxo/include/classes/class.ooarticle.inc.php on line 44"
[18-Feb-2016 17:43:55] WARNING: [pool projectname] child 9099 said into stderr: "NOTICE: PHP message: PHP Warning:  Invalid argument supplied for foreach() in /var/www/projectname/htdocs/redaxo/include/classes/class.ooredaxo.inc.php on line 134"
[18-Feb-2016 17:43:55] WARNING: [pool projectname] child 9099 exited on signal 7 (SIGBUS) after 183.838886 seconds from start
[18-Feb-2016 17:43:55] NOTICE: [pool projectname] child 9330 started
[18-Feb-2016 17:44:00] WARNING: [pool projectname] child 9101 exited on signal 7 (SIGBUS) after 188.987954 seconds from start
[18-Feb-2016 17:44:00] NOTICE: [pool projectname] child 9336 started



4 Answers: 

This might just be the effect of a locking issue on your MySQL server.

Try connecting to your MySQL host while the slowdown is happening.

  • If you can't connect, you have hit the maximum number of concurrent connections allowed for your MySQL server or for your user.

  • If you can connect, look at what the MySQL command "show processlist" returns. There are two likely outcomes:

    • Many threads in the state "Waiting for query cache lock": this means you need to change your MySQL server configuration (an oversized query cache can cause this).

    • A single request is taking all the resources, in which case you will have to optimize it.
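The check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the rows of "show processlist" have already been fetched as dictionaries (for example through a dict-style cursor), and the sample rows below are made up rather than taken from the server in the question.

```python
from collections import Counter

QUERY_CACHE_LOCK = "Waiting for query cache lock"


def summarize_states(rows):
    """Count how many threads are in each State (empty states become '(none)')."""
    return Counter((row.get("State") or "(none)") for row in rows)


def longest_running(rows):
    """Return the non-sleeping row with the highest Time, or None if there is none."""
    active = [row for row in rows if row.get("Command") != "Sleep"]
    return max(active, key=lambda row: row.get("Time") or 0, default=None)


# Made-up sample in the shape of SHOW PROCESSLIST output.
sample = [
    {"Id": 1, "Command": "Query", "Time": 42, "State": QUERY_CACHE_LOCK},
    {"Id": 2, "Command": "Query", "Time": 40, "State": QUERY_CACHE_LOCK},
    {"Id": 3, "Command": "Query", "Time": 95, "State": "Sending data"},
    {"Id": 4, "Command": "Sleep", "Time": 300, "State": ""},
]

states = summarize_states(sample)
print(states[QUERY_CACHE_LOCK], "threads waiting on the query cache lock")
print("longest-running thread id:", longest_running(sample)["Id"])
```

Many threads in the first count point to the query cache case; one thread with a very high Time points to the single expensive request.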

 

Unless you have plenty of RAM available (like over 16GB), I suspect you are running out of resources and that this is causing the 500 error.

Your configuration allows up to 150 PHP-FPM processes, each of which may use up to 256MB of memory. In the worst case that lets PHP-FPM alone use about 37.5GB (150 × 256MB = 38,400MB), and if that much isn't available, requests will fail with the 500 error.

Work out how much memory each service on the machine may use, then configure accordingly. Does this CMS really need up to 256MB of memory? Could it run with less (like 32MB)? If MySQL and Apache (or Nginx) run on this same server, set aside the memory each of them needs, then choose pm.max_children and php_value[memory_limit] so that PHP-FPM fits into what is left.
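As a rough sketch of that calculation: the 150 and 256 below come from the pool configuration in the question, while the 8GB of total RAM and the 3GB reserved for MySQL, Apache and the OS are purely hypothetical figures chosen for illustration. Note that memory_limit is a ceiling per process, not what each child actually uses, so measuring real usage would give a better per-child figure.

```python
def worst_case_mb(max_children, memory_limit_mb):
    """Upper bound on PHP-FPM memory if every child hits its memory_limit."""
    return max_children * memory_limit_mb


def max_children_for(total_ram_mb, reserved_mb, per_child_mb):
    """Largest pm.max_children that fits in the RAM left after other services."""
    budget = total_ram_mb - reserved_mb
    if budget <= 0 or per_child_mb <= 0:
        return 0
    return budget // per_child_mb


# Values from the question's pool configuration.
print("worst case:", worst_case_mb(150, 256), "MB")

# Hypothetical machine: 8 GB RAM, 3 GB kept for MySQL, Apache and the OS.
print("max_children that fits:", max_children_for(8192, 3072, 256))
```

With these assumed numbers the pool would have to be far smaller than 150 children, or the per-process limit would have to come down.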

Please note that a lack of resources is system-wide, so if your PHP processes use all the available memory, MySQL might end up crashing because it runs out of resources too (this might be the reason for the record not found).

If you can say how much memory you have available, I can help you out in configuring these numbers.

It would also be good to know how much memory is available before you start the cache deletion and how much while it is running. It might indeed be using too much memory and starving the other processes (and if it runs through PHP-CLI, it may have no memory limit at all).

 

I've been eyeing this for a few days now and have finally decided to add my 2 cents' worth.

I've been using FPM for a long time and it's a great thing but to get a scalable configuration with it is another story. There's a whole lot that can be going wrong causing your problem but I have one suspicion.

I want to focus on the PHP errors showing up in your output because they indicate something is going wrong that shouldn't be. I am wondering if, while you are clearing your cache and users browse the site, they are simultaneously pulling incomplete data because some information is deleted or in the process of being rebuilt. You could even be seeing a situation where cache is being deleted, and new stuff is being cached at the same time. I haven't looked at the CMS code for cache deletion but the PHP errors you showed seem to indicate some invalid data is being fetched in the process.

One thing to try would be to explicitly lock the tables before the cache deletion and release them afterwards. This way, users can't read or write data while things are being deleted. In whatever script you call to clear the cache, try adding a query LOCK TABLES articles WRITE, othertable WRITE, anyothertable WRITE before the work starts and UNLOCK TABLES once it has finished. The write locks prevent other sessions (users) from reading or updating those tables while the cache is being cleared.

Users are impatient. If they try to load a page and it doesn't give them any feedback, they often reload, or go back and click other links. This can increase the number of FPM processes: if 10 users refresh 5 times, you now have 50 extra processes running and hanging as well, which makes things worse.
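To illustrate the lock-then-release pattern, here is a minimal Python sketch. It is not Redaxo's code: the table names are the placeholder names from the paragraph above, the helper names are hypothetical, and RecordingCursor is a stand-in that only records statements, so that the example runs without a database. With a real DB-API cursor the same context manager would send LOCK TABLES first and UNLOCK TABLES last, even if the cache rebuild raises an error.

```python
from contextlib import contextmanager


def lock_statement(tables):
    """Build a LOCK TABLES ... WRITE statement for trusted table names."""
    if not tables:
        raise ValueError("need at least one table to lock")
    return "LOCK TABLES " + ", ".join(f"`{name}` WRITE" for name in tables)


@contextmanager
def write_locked(cursor, tables):
    """Hold write locks on the given tables for the duration of the block."""
    cursor.execute(lock_statement(tables))
    try:
        yield cursor
    finally:
        # Always release the locks, even if the work inside the block failed.
        cursor.execute("UNLOCK TABLES")


class RecordingCursor:
    """Stand-in for a real DB-API cursor; it only records the statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cursor = RecordingCursor()
with write_locked(cursor, ["articles", "othertable"]):
    # The cache deletion and rebuild queries would run here.
    cursor.execute("SELECT COUNT(*) FROM `articles`")

print(cursor.statements)
```

Keep in mind that other sessions wait while the locks are held, so the requests of visitors will block until the rebuild is done.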

-- Other stuff

Increase ProxyTimeout or Timeout in Apache. If you have a script that can run for a while, Apache will terminate the connection if it doesn't get any data back in a certain amount of time (which can be okay). If it takes 5 minutes to clear the cache and nothing is sent back by PHP until it finishes, and Apache has a timeout of 120 seconds, it will drop the connection before it completes resulting in a timeout like you are seeing. I have many sites that can do things for up to 10 minutes, so my Timeout in Apache is 600 seconds. This allows the PHP requests to finish without things breaking.

Something else I noticed is that you are using unix domain sockets for FPM communication. This can be okay, but they don't scale well on very busy sites. I'd suggest using a TCP socket instead: listen = 127.0.0.1:9000. You'll then need to modify Apache to connect using TCP instead of a domain socket.

Set listen.backlog so connections can be queued when busy. You'll probably also need to adjust the kernel value net.core.somaxconn using sysctl since it's usually pretty low.
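The reason the kernel value matters is that Linux silently caps the backlog passed to listen() at net.core.somaxconn. A small Python sketch of that relationship follows; the requested backlog of 1024 is a hypothetical value, not a recommendation, and the sketch assumes a positive listen.backlog.

```python
def effective_backlog(listen_backlog, somaxconn):
    """Queue length the kernel will actually honour for a listening socket."""
    return min(listen_backlog, somaxconn)


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Read the kernel limit on Linux; return None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


requested = 1024  # hypothetical listen.backlog value
kernel_cap = read_somaxconn()
if kernel_cap is None:
    print("could not read net.core.somaxconn on this system")
else:
    print("effective backlog:", effective_backlog(requested, kernel_cap))
```

If the effective value comes out lower than the requested one, raising listen.backlog alone has no effect until net.core.somaxconn is raised as well.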

Apache MPM: Switch to MPM worker if you aren't using it already. Since you're using FPM, worker is a very efficient MPM for Apache, much better than prefork (often the default). Make sure to tune it to your needs (i.e. set the number of servers and threads and MaxClients, which Apache 2.4 renames to MaxRequestWorkers, appropriately).

-- Closing

I don't think there's anything too complicated going on here. The first thing I would look at is ensuring a cache delete can finish uninterrupted. Even if this means users see a maintenance page for a couple of minutes, or their requests are blocked for a short time until it completes, that's a small price to pay if it avoids 500s and errors.

I honestly think there is a problem with deleting the cache and people browsing that is affecting the process and making things take longer than necessary or break.

Let me know if you have any questions or feel free to contact me.

 

Every time the server hangs, you may see a different error, depending on whether PHP and/or Apache reach their limits.

If your host is Unix/Linux, could you check the output of the command $ top while the CMS is doing one of its heavy jobs?

If you see that memory is exhausted, a large part of the swap is in use and the CPU is maxed out, try adjusting memory_limit in php.ini to distribute the resources. But you probably need more resources, both memory and CPU.

If memory and CPU are not busy, maybe you have assigned less memory to PHP than it needs. You could run more php-fpm workers, increase the memory limit per process, and so on; see http://linuxbsdos.com/2015/02/17/how-to-reduce-php-fpm-php5-fpm-ram-usage-by-about-50/. Also take a look at the Apache memory and CPU configuration.
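If watching top by hand is awkward during a 1-2 minute job, the same memory and swap figures can be read from /proc/meminfo on Linux. The following is a minimal Python sketch; the sample text is made up so that the example runs anywhere, and on the real server you would pass in the contents of /proc/meminfo, read once before and once during the heavy job.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_used_fraction(info):
    """Fraction of swap currently in use (0.0 if no swap is configured)."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


# Made-up sample; on the server use open("/proc/meminfo").read() instead.
sample = """MemTotal:        8174512 kB
MemFree:          201344 kB
SwapTotal:       2097148 kB
SwapFree:         524287 kB
"""

info = parse_meminfo(sample)
print("free memory (kB):", info["MemFree"])
print("swap used: {:.0%}".format(swap_used_fraction(info)))
```

Very little free memory combined with heavily used swap during the job would support the suspicion that the machine is running out of memory.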

 
