Our team at Yota puts all its effort into PCRF product development. Policy and Charging Rules Function (PCRF) is an advanced policy management solution that enables the operator to dynamically control network elements and apply real-time policies based on service, subscriber and usage context. The performance of our application takes a huge part of our energy, and one of the key metrics we definitely care about is CPU load. Below we would like to demonstrate the progress we’ve achieved from version 3.5.2 to version 3.6.1 in this field (CPU load percentage is plotted along the Y axis):
Most of the improvements touched the interaction between the database and the PCRF logic. Smarter statement usage, intelligent data read/write operations and cached information gave us this profit. To analyze and improve DB statement usage, our team developed a special utility called tt_perf_info. Its idea is to measure the time spent in various parts of statement execution: this instrument collects the fetch time, the execution time, the number of calls and the percentage of the total time. The measurements are based on simple counters set inside the PCRF code around each statement; typical statements look like this:
- Example of a simple update statement that increments a usage accumulator value:
UPDATE FIRST 1 accum SET value = (value + :increment) WHERE accum_id=:accum_id AND subscriber_id=:subscriber_id RETURNING scheme_id, value, st_level, next_reset, last_reset INTO :scheme_id, :value, :st_level, :next_reset, :last_reset
- Example of a simple select statement that fetches session blob data:
SELECT FIRST 1 value FROM session_data_b WHERE obj_id=:obj_id
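The per-statement counters themselves live in the PCRF code; a minimal sketch of what they accumulate (class and field names here are illustrative, not the real implementation) could look like:

```python
import time
from collections import defaultdict

class StmtStats:
    """Per-statement counters, in the spirit of tt_perf_info (a sketch)."""
    def __init__(self):
        self.exec_time = defaultdict(float)   # total execute time, seconds
        self.fetch_time = defaultdict(float)  # total fetch time, seconds
        self.calls = defaultdict(int)

    def timed(self, name, phase, fn, *args):
        """Run fn and charge its wall time to the named statement and phase."""
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        if phase == "execute":
            self.exec_time[name] += elapsed
            self.calls[name] += 1
        else:
            self.fetch_time[name] += elapsed
        return result

    def report(self):
        """Yield (statement, calls, total time, percent of whole) rows."""
        total = sum(self.exec_time.values()) + sum(self.fetch_time.values())
        by_cost = sorted(self.calls,
                         key=lambda n: -(self.exec_time[n] + self.fetch_time[n]))
        for name in by_cost:
            spent = self.exec_time[name] + self.fetch_time[name]
            yield (name, self.calls[name], spent,
                   100.0 * spent / total if total else 0.0)
```

The dumped table below is this kind of report, sorted by total time spent.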
The statistics can be collected during a PCRF run and dumped in a table format, which has changed a bit between versions 3.5.2 and 3.6.1. Examples of the tt_perf_info output are given below with the top 15 statements (time is given in microseconds). The test used for statistics collection is one with usage calculation (CCR-I, CCR-U (with usage), CCR-T (with usage)).
3.5.2 top 15
3.6.1 top 15 (empty cells are just 0.0 values in this format)
Step 1: reducing commits
One can notice here that the pcrf.commit statement rate decreased from 12006 CPS to 1199 CPS. This happened due to the first optimization step: the statements were updated to commit only when actual changes are made. For example, a changed-rows count check was added for UPDATE statements, and the commit is performed only when this count is non-zero. DELETE statements are handled in the same way.
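The changed-rows check can be sketched with Python's DB-API (sqlite3 stands in for the real database here; the function name is ours, not PCRF's):

```python
def update_and_commit(conn, sql, params):
    """Run an UPDATE and commit only if rows actually changed (a sketch).

    conn is any DB-API connection; cur.rowcount holds the number of
    rows modified by the statement, which is the check added in 3.6.1.
    """
    cur = conn.execute(sql, params)
    if cur.rowcount > 0:   # something really changed: the commit is worth it
        conn.commit()
        return True
    return False           # no changes, no commit
```

An UPDATE that matches no rows thus costs no commit at all, which is where the 12006 → 1199 CPS drop came from.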
Step 2: eliminating MERGEs
MERGE requests were used broadly in PCRF 3.5.2. But one day we realized that too many whole-table locks were being taken, and that these MERGE requests were the cause. As an optimization, the MERGE statement was replaced with a get-update-insert combination: to merge some information into a DB table, first try to read it; if it exists, call an UPDATE statement, and if not, call an INSERT. Moreover, no transaction wraps this sequence, so the whole algorithm is fully lockless in this sense. If the INSERT or UPDATE fails, the whole combination is called again. Our experience shows that this approach really works for our case and strongly reduces table locks.
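The get-update-insert retry loop can be sketched like this (sqlite3 and the `kv` table are illustrative; the real code works against PCRF's own tables):

```python
def merge(conn, key, value, attempts=3):
    """Lock-free replacement for MERGE: get, then UPDATE or INSERT.

    Runs without a wrapping transaction; a race with a concurrent
    writer makes one step fail, and the whole combination is retried.
    """
    for _ in range(attempts):
        row = conn.execute("SELECT 1 FROM kv WHERE k = ?", (key,)).fetchone()
        try:
            if row is not None:
                cur = conn.execute("UPDATE kv SET v = ? WHERE k = ?",
                                   (value, key))
                if cur.rowcount:     # row may have been deleted meanwhile
                    return True
            else:
                conn.execute("INSERT INTO kv (k, v) VALUES (?, ?)",
                             (key, value))
                return True
        except Exception:            # e.g. unique-key violation from a
            pass                     # concurrent INSERT: retry from the get
    return False
```

Each step takes only a row-level lock (or none), which is why the whole-table locks taken by MERGE disappear.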
Step 3: usage configuration caching
The next steps concerned data storage in the DB and caching it in the application. PCRF subscribes to and collects users’ traffic usage counters. In version 3.5.2 the traffic accumulator configuration was stored entirely in the DB and managed through external REST methods. The scheme was quite complicated, with a many-to-many DB relationship (a monitoring key can match several accumulators and vice versa). Several sources of accumulator values (PCEFs, DPIs) can also be supported and united into one value inside the PCRF. In version 3.6.1 all the configuration information was moved to a text file in XML format, which a periodic process re-reads whenever changes are made to it. Per-session accumulator information is stored in the session blob that already exists in the DB and is read from there on every incoming Diameter request for the given session. A checksum of the configuration information is calculated and stored in this blob. While handling usage counters, PCRF checks this sum and updates the session blob only if needed. Reading/writing a blob from one DB table is times faster than collecting information from various DB tables, so the reduction of the time spent on reading from the DB was considerable.
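The checksum-driven refresh can be sketched as follows (the blob layout, field names and the use of JSON/MD5 are our assumptions for illustration; the real blob format is PCRF-internal):

```python
import hashlib
import json

def config_checksum(config):
    """Stable checksum of the accumulator configuration (a sketch)."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.md5(blob).hexdigest()

def handle_usage(session_blob, current_config):
    """Rewrite per-session accumulator data only when the config changed."""
    checksum = config_checksum(current_config)
    if session_blob.get("config_checksum") != checksum:
        # Configuration changed since this blob was written: rebuild the
        # per-session accumulator set and remember the new checksum.
        session_blob["accumulators"] = {a["id"]: 0
                                        for a in current_config["accumulators"]}
        session_blob["config_checksum"] = checksum
        return True    # blob must be written back to the DB
    return False       # nothing to update, no DB write
```

In the common case the checksum matches and the session blob is left untouched, so the request is served with a single blob read.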
Step 4: Lua Engine calls optimization
The checksum idea found such great support in our team that we also optimized the Lua Engine call that produces the subscriber policies. We built a structure with all the data that potentially influences the Lua call result (and therefore the policies) and calculated a hash of it. In version 3.5.2 PCRF called the Lua Engine for nearly every CCR-I, CCR-U and RAR request. In version 3.6.1 PCRF checks this hash first and calls Lua only when something has changed.
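A minimal sketch of this hash-guarded call (the class and the pickle/SHA-1 hashing are our illustration; the real engine is Lua invoked from PCRF):

```python
import hashlib
import pickle

class PolicyCache:
    """Call the (here simulated) policy engine only when its inputs change."""
    def __init__(self, engine):
        self.engine = engine       # the expensive call, e.g. the Lua Engine
        self.last_hash = None
        self.last_policies = None

    def policies_for(self, inputs):
        # Hash the whole structure of data that can influence the policies.
        h = hashlib.sha1(pickle.dumps(inputs)).digest()
        if h != self.last_hash:
            self.last_policies = self.engine(inputs)   # only on change
            self.last_hash = h
        return self.last_policies
```

For a steady stream of CCR-U requests with unchanged inputs, the engine is invoked once and every later request is answered from the cached result.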
Step 5: moving the network configuration
After picking out the components that use the DB most often, we identified the PCRF business logic and the networking part. For the latter, the networking configuration had been stored in the DB since a very early version. After moving all the peer and connection data to shared memory, we separated the business logic, which works with the DB, from the network component, which works with shared memory. Of course, some periodic processes were added to refresh the shm data, but on the whole it now works times faster due to the reduced DB locking.
Step 6: Diameter command parsing optimization
A big variety of Diameter commands is used in PCRF, and they can be either short or long, with many AVPs inside. But not all parts of the application need all the information from a Diameter command. Some use only the header with destination/origin host/realm, or just several identification fields like subscriber_id, session_id, etc. To make the Diameter parser wise but elegant, we added a mask of the fields that need to be read from the command, and this mask differs between the various PCRF components. Furthermore, to cut down the cost of storing AVP data, we nearly removed the memory copying and replaced it with pointer setup. So only one copy of the Diameter message is ever stored, and many structures with various pointers into it are used inside the PCRF components.
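The field-mask idea can be sketched like this (the flag names, AVP list and message shape are our illustration, not the real parser):

```python
# Field masks: each component requests only the AVPs it needs.
F_HEADER     = 1 << 0   # origin/destination host and realm
F_SESSION_ID = 1 << 1
F_SUBSCRIBER = 1 << 2
F_USAGE      = 1 << 3

# Which mask bit each AVP belongs to (illustrative subset).
AVP_FLAGS = {
    "Origin-Host": F_HEADER,
    "Destination-Realm": F_HEADER,
    "Session-Id": F_SESSION_ID,
    "Subscription-Id": F_SUBSCRIBER,
    "Used-Service-Unit": F_USAGE,
}

def parse(message, mask):
    """Decode only the AVPs selected by the mask; skip the rest.

    message is an iterable of (avp-name, raw value) pairs. In the real
    code the selected entries hold pointers into the single stored copy
    of the Diameter message rather than copies of the AVP data.
    """
    wanted = {}
    for name, value in message:
        if AVP_FLAGS.get(name, 0) & mask:
            wanted[name] = value
    return wanted
```

A routing component would pass `F_HEADER`, the session logic `F_SESSION_ID | F_SUBSCRIBER`, and so on, so long commands with many AVPs are skimmed rather than fully decoded.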
Step 7: time caching
When PCRF started working this quickly, we began to notice the heavy logging process too often. Some say that you can sacrifice the log information for overall performance, but we could not accept that. Analyzing the logging process and the information stored at the various logging levels, we found out that the most popular and frequent item present there is … date and time! Of course every single line of the log has it, and in version 3.5.2 this string was constructed anew for every log entry. With a simple optimization, the date-and-time string is now cached with one-second precision and re-calculated only when the next second arrives. This made a great impact on logging performance, because PCRF handles so many events at a time that hundreds of log entries carry the same time mark.
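The optimization boils down to a few lines (the format string and function name are illustrative):

```python
import time

_cached = (None, "")   # (whole second, formatted timestamp string)

def log_timestamp(now=None):
    """Format the log time, rebuilding the string only once per second."""
    global _cached
    now = time.time() if now is None else now
    sec = int(now)
    if _cached[0] != sec:
        # New second: format once and reuse for every entry in it.
        _cached = (sec, time.strftime("%Y-%m-%d %H:%M:%S",
                                      time.localtime(sec)))
    return _cached[1]
```

All log entries within the same second share one formatting call instead of hundreds.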
Many other architectural changes were also made between versions 3.5.2 and 3.6.1, and a large part of new functionality was added there, but PCRF still holds its performance.