Performance issues (many users)

Hi,

We’ve got a customised version of CTFd. When subjected to a high number of concurrent users, performance suffers quite badly. We also see high CPU usage on the Docker hosts.

We’ve looked into the problem (with cProfile and other techniques) and it looks likely that our use of Redis is at least partly responsible: the profiler shows an unreasonable amount of time being spent in the Redis client. I appreciate that the author of redis-py may be a more appropriate person to ask, but I thought I’d ask here as well, in case anyone has knowledge gained from running large-scale CTFd-powered events.
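To show the kind of measurement I mean, here is a minimal sketch of isolating cache-client time with cProfile. It is not our actual code: `FakeCache`, `handle_request` and `profile_handler` are hypothetical names, and the `time.sleep` call only stands in for a network round trip to Redis.

```python
import cProfile
import io
import pstats
import time


class FakeCache:
    """Hypothetical stand-in for a Redis-backed cache client."""

    def __init__(self, round_trip_seconds=0.001):
        self.round_trip_seconds = round_trip_seconds
        self.store = {}

    def get(self, key):
        # Simulated round trip; a real client would talk to Redis here.
        time.sleep(self.round_trip_seconds)
        return self.store.get(key)


def handle_request(cache, keys):
    """Hypothetical handler: a few cache gets, then an empty response."""
    for key in keys:
        cache.get(key)
    return ""


def profile_handler(requests=50):
    """Profile repeated handler calls and return the report as text."""
    cache = FakeCache()
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(requests):
        handle_request(cache, ["config", "session", "user"])
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the cache client's share is easy to spot.
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_handler())
```

In the report, the cumulative time of `get` relative to `handle_request` gives the share of each request spent waiting on the cache.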

For example, we set up an endpoint that does nothing but a few cache gets and then returns an empty response. With 500 concurrent users hitting it, response times climb to around 7 seconds. Under lighter loads they sit at around 0.1 seconds.
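For anyone who wants to try a comparable measurement, this is a rough sketch of a thread-based load generator using only the standard library. It is an illustration, not our test harness: `measure`, `timed_call` and `fake_endpoint` are hypothetical names, and `fake_endpoint` is a placeholder that should be replaced with a real HTTP request to the endpoint under test.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(func):
    """Return how long a single call to func takes, in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def measure(func, concurrent_users, calls_per_user=5):
    """Call func from many threads at once and summarise the latencies."""

    def user_session(_):
        return [timed_call(func) for _ in range(calls_per_user)]

    # One worker thread per simulated user.
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        sessions = list(pool.map(user_session, range(concurrent_users)))

    latencies = [t for session in sessions for t in session]
    return {
        "calls": len(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


def fake_endpoint():
    # Placeholder: swap in a request to the cache-only endpoint.
    time.sleep(0.001)


if __name__ == "__main__":
    for users in (10, 100, 500):
        print(users, measure(fake_endpoint, users))
```

Comparing the median and maximum latency across the different user counts shows whether response time stays flat or grows with concurrency.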

Has anyone seen behaviour like this, or does anyone have advice on the use of Redis (or anything else) that might help?

Thanks,
Jeremy