Why is it needed?
•Old LVE-statistics stores averages as integers, as a percentage of CPU usage. If a user uses 100% of the CPU for 1 second within an hour, that is only 1-2% when averaged over a minute and 0% when averaged over 5 minutes. Because data in old LVE-statistics is aggregated into 1-hour intervals, such a peak load is not recorded, so data must be stored with much higher precision.
•100% CPU usage in old LVE-statistics means “all cores”. On a 32-core server, usage is barely visible for most users, because they are limited to 1 core (one fully used core shows up as about 3%).
•Old LVE-statistics does not provide a way to determine the cause of LVE faults, i.e., which processes were running when a user hit the LVE limits.
•Notifications in old LVE-statistics are not accurate because they are based on average values for CPU, IO, and IOPS.
•Old LVE-statistics functionality is hard to extend.
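The precision problem described above comes down to simple arithmetic. The following is a minimal sketch, not the actual LVE-statistics code: the function name `integer_average_cpu` is hypothetical, and truncation to a whole number is an assumption (rounding would give 2% instead of 1% for the one-minute window, which matches the "1-2%" range).

```python
def integer_average_cpu(busy_seconds: float, window_seconds: int) -> int:
    """Average CPU usage over a window, kept as a whole-number percentage.

    Hypothetical helper: truncation is assumed here for illustration.
    """
    exact_percent = 100.0 * busy_seconds / window_seconds
    return int(exact_percent)  # the fractional part is lost


if __name__ == "__main__":
    burst_seconds = 1.0  # one second at 100% CPU
    windows = [("1 minute", 60), ("5 minutes", 300), ("1 hour", 3600)]
    for label, window in windows:
        exact = 100.0 * burst_seconds / window
        stored = integer_average_cpu(burst_seconds, window)
        print(f"{label}: exact {exact:.3f}%, stored as {stored}%")
```

Under these assumptions, a one-second burst already disappears in the 5-minute average, long before it reaches the 1-hour aggregate.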
Major improvements and features
•increased precision of statistics;
•CPU usage is calculated in terms of % of a single core (100% usage means one core);
•lvestats-server emulates and tracks faults for CPU, IO, IOPS;
•lvestats-server saves “snapshots” of a user’s processes and queries for each “incident” (a new lve-read-snapshot utility was added for viewing them);
•improved notifications about hitting LVE limits (more informative and without false positives);
•ability to add custom plugins;
•MySQL and PostgreSQL support;
•more attractive, scalable, interactive charts;
•snapshots include HTTP-requests.
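To illustrate the difference between the two CPU scales mentioned above ("all cores" versus a single core), here is a small sketch. The function names are hypothetical and are not part of lvestats-server. The only assumption is that the two scales differ by a factor equal to the number of cores.

```python
def single_core_to_all_cores(percent_of_one_core: float, cores: int) -> float:
    """Convert usage expressed as % of one core to % of the whole server."""
    return percent_of_one_core / cores


def all_cores_to_single_core(percent_of_all_cores: float, cores: int) -> float:
    """Convert usage expressed as % of the whole server to % of one core."""
    return percent_of_all_cores * cores


if __name__ == "__main__":
    cores = 32
    # A user limited to 1 core who uses that core fully:
    whole_server = single_core_to_all_cores(100.0, cores)
    print(f"{whole_server}% of all cores, i.e. 100% of a single core")
```

On the single-core scale, a user who reaches a 1-core limit reads as 100% regardless of how many cores the server has.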
What features will be implemented in the future?
•Notifications for control panels other than cPanel.
•Burstable limits/server health: a plugin would monitor server health (load average, memory, idle CPU) and automatically decrease or increase limits accordingly.
•Reseller limits: a plugin would analyze usage per group of users (a reseller’s usage) and take actions.
•Suspend/notify plugin: would detect that a user has been throttled for 10 minutes and then suspend the user, send a notification, or increase limits.