Our team is developing a PCRF (Policy and Charging Rules Function) product, which is a significant component of an operator's network. We have a successful installation in the Yota production network, so the two characteristics we care about most from version to version are performance and stability under high load. To get there we have come a long and interesting way, from unit testing through functional regression testing to several kinds of performance testing. Here we would like to share some of our PCRF testing “secrets”. Below we describe the three main steps of PCRF testing that we follow while stabilising each version (unit testing, the obvious step zero, is left out).
1. Regression testing
We follow the Continuous Integration methodology: all builds are done in Jenkins, and the unit tests and regression tasks are started from it automatically after each build. At present we have:
|Component||Smoke tests||Regression tests|
Smoke tests are a selection from the regression set. This small set is intended for the initial evaluation of a new version, so it covers the main functionality and leaves out the negative tests and all non-default and non-trivial configurations. We also keep independent sets for PCRF and DDF (the Data Distribution Function node, a centralized storage used by all the local PCRFs). The reports in Jenkins are presented in the following way:
Our QA department prepares functional tests using Robot Framework, with their own library written in Python as a back-end. To make the testers' life easier when working with Diameter (a binary protocol) in Python, we have implemented a C shared library that supports all Diameter applications known to PCRF. It is built together with PCRF, so any change in the Diameter protocol library is delivered to the QA team immediately. The library is called libdiameter_converted.so, and it translates an XML representation into a Diameter binary message and back. A test can therefore work with a Python object that is converted to Diameter binary data and back through the XML step.
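To give a feeling for what such an XML-to-binary translation involves, here is a minimal sketch in pure Python. It is not the API of libdiameter_converted.so: the XML layout, the function names and the tiny AVP dictionary are assumptions made for illustration only. The byte layout follows the Diameter base protocol (20-byte message header, AVPs padded to a 4-byte boundary), and vendor-specific AVPs are deliberately not handled.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical mini-dictionary: AVP code -> data type (real dictionaries are much larger).
AVP_TYPES = {263: "utf8",   # Session-Id
             415: "u32",    # CC-Request-Number
             416: "u32"}    # CC-Request-Type

# Hypothetical XML layout; 272 is Credit-Control, 16777238 is the Gx application id.
EXAMPLE_XML = """
<message flags="0x80" code="272" app="16777238" hbh="1" e2e="1">
  <avp code="263" flags="0x40">pcef;1</avp>
  <avp code="416" flags="0x40">1</avp>
  <avp code="415" flags="0x40">0</avp>
</message>
"""

def encode_avp(code, flags, data):
    # AVP header without Vendor-ID: code (4), flags (1), length (3).
    length = 8 + len(data)              # the length field excludes padding
    padding = (-len(data)) % 4          # pad the payload to a 4-byte boundary
    return struct.pack("!IB", code, flags) + length.to_bytes(3, "big") + data + b"\x00" * padding

def xml_to_diameter(xml_text):
    root = ET.fromstring(xml_text)
    body = b""
    for avp in root.findall("avp"):
        code = int(avp.get("code"))
        if AVP_TYPES[code] == "u32":
            data = struct.pack("!I", int(avp.text))
        else:
            data = avp.text.encode("utf-8")
        body += encode_avp(code, int(avp.get("flags"), 0), data)
    length = 20 + len(body)             # the message length includes the header
    header = (bytes([1]) + length.to_bytes(3, "big")            # version, length
              + bytes([int(root.get("flags"), 0)])              # command flags
              + int(root.get("code")).to_bytes(3, "big")        # command code
              + struct.pack("!III", int(root.get("app")),
                            int(root.get("hbh")), int(root.get("e2e"))))
    return header + body

def diameter_to_xml(raw):
    length = int.from_bytes(raw[1:4], "big")
    app, hbh, e2e = struct.unpack("!III", raw[8:20])
    root = ET.Element("message", {"flags": hex(raw[4]),
                                  "code": str(int.from_bytes(raw[5:8], "big")),
                                  "app": str(app), "hbh": str(hbh), "e2e": str(e2e)})
    pos = 20
    while pos < length:
        code, flags = struct.unpack("!IB", raw[pos:pos + 5])
        avp_len = int.from_bytes(raw[pos + 5:pos + 8], "big")
        data = raw[pos + 8:pos + avp_len]
        node = ET.SubElement(root, "avp", {"code": str(code), "flags": hex(flags)})
        if AVP_TYPES[code] == "u32":
            node.text = str(struct.unpack("!I", data)[0])
        else:
            node.text = data.decode("utf-8")
        pos += avp_len + (-avp_len) % 4  # skip the padding as well
    return ET.tostring(root, encoding="unicode")
```

Calling `xml_to_diameter(EXAMPLE_XML)` yields the bytes of a CCR-like message, and `diameter_to_xml` turns them back into the same XML layout, which is the round trip a test can rely on.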
Regression tests recreate the full PCRF environment, emulating PCEF (Policy and Charging Enforcement Function) equipment, various DPIs (Deep Packet Inspection), AF (Application Function), provisioning from BSS (Business Support System) and O&M console operations. An example of one of our test cases together with its Robot Framework implementation can be found here.
The QA team also calculates test coverage, which is 64% for now. This figure includes many Diameter messages that are present in the library but not used by the PCRF application, as well as some debugging utilities that are compiled together with PCRF and are not covered by tests yet. Unit-test coverage is much better and is shown in the picture:
Note: on the Free PCRF image one can meanwhile start the Seagull functional tests with the following set of commands:
2. Seagull performance tests
To measure performance on the primary scenarios we started with Seagull, a free, GPL-licensed multi-protocol traffic generator that is popular for many uses, including Diameter testing (it is compatible with RFC 3588). Its significant benefit is that it is optimized for performance scenarios, and supporting a new Diameter application is a matter of editing an XML file.
The Seagull scenarios we have run are the following:
– one PCEF GxCCR-I/GxCCA-I -> GxCCR-T/GxCCA-T
– one PCEF GxCCR-I/GxCCA-I -> GxCCR-U/GxCCA-U -> GxCCR-T/GxCCA-T
– one PCEF GxCCR-I/GxCCA-I -> (GxRAR/GxRAA) * n times -> GxCCR-T/GxCCA-T
– one PCEF GxCCR-I/GxCCA-I -> GxCCR-U/GxCCA-U with usage monitoring enabled -> GxCCR-T/GxCCA-T
An interesting report produced with Seagull for the previous PCRF version, 3.5.2, can be found here.
Next, here are some numbers for version 3.6.0, which was released recently:
– 1 000 000 subscribers (IMSIs)
– 50 unique services in PCRF dictionary
– 50 unique policies in PCRF dictionary
– 2 services for each subscriber
– 2 attributes for each service
– servers used: HP ProLiant DL360 G5 with 2 x Intel Xeon E5420 (2500 MHz), 8 GB RAM
– Lua script runs for policy selection for each subscriber and analyzes subscriber profile, subscriber services, service attributes
– performance run – 11000 TPS
– errors during the performance run – 0
3. Self-implemented bench tool
Seagull is perfect for simple scenarios (especially when only one external node communicates with PCRF), but there are several more complicated cases whose performance also needs to be measured. They are:
– PCEF and DPI both establish sessions to PCRF for one subscriber.
– PCEF, DPI and AF (Application Function) establish sessions to PCRF for one subscriber.
– Usage monitoring performance cases with PCEF and DPI sessions established.
– Session validation scenario based on GxRAR sent from PCRF to PCEF and DPI sessions established.
– Session re-validation scenario based on GxCCR-U with revalidation event trigger sent from PCEF and DPI to PCRF.
In all the cases above the order of the messages is not strictly determined. And although the messages travel over different connections, they depend on each other in terms of both order and data.
After failing to build all of these complicated scenarios, with their branching logic and variable message order, in Seagull, we decided to implement our own benchmarking solution. This was quite easy, since we had already implemented the Diameter library, the transport library and all the statistics code as independent components with a high re-use factor. The scenarios were implemented in C/C++ and are based on a finite automaton with some non-deterministic parts to handle the variable message order.
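The idea of an automaton that tolerates variable message order can be sketched as follows. This is an illustration in Python, not our C/C++ bench code: the class name, the step layout and the message labels are assumptions. Each step holds a set of (connection, message) events that may arrive in any order, and the automaton moves to the next step only when the whole set has been seen.

```python
class ScenarioAutomaton:
    """Per-subscriber scenario: ordered steps, unordered events inside a step."""

    def __init__(self, steps):
        self.steps = [frozenset(step) for step in steps]
        self.index = 0
        self.pending = set(self.steps[0]) if self.steps else set()

    @property
    def done(self):
        return self.index >= len(self.steps)

    def feed(self, connection, message):
        # Accept an event only if it belongs to the current step.
        if self.done:
            raise ValueError("scenario already finished")
        event = (connection, message)
        if event not in self.pending:
            raise ValueError(f"unexpected {event} at step {self.index}")
        self.pending.discard(event)
        if not self.pending:            # the whole step was seen: advance
            self.index += 1
            if not self.done:
                self.pending = set(self.steps[self.index])


# Hypothetical scenario: PCEF and DPI sessions for one subscriber.
STEPS = [
    {("PCEF", "CCA-I"), ("DPI", "CCA-I")},   # both sessions established
    {("PCEF", "RAR"), ("DPI", "RAR")},       # PCRF pushes a change to both
    {("PCEF", "CCA-T"), ("DPI", "CCA-T")},   # both sessions terminated
]
```

With this layout the answers for PCEF and DPI may arrive in either order within a step, while a message from a later step that arrives too early is reported as an error.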
All the tests below were run on the following configuration:
– 1 000 000 subscribers (IMSIs)
– 100 unique services in PCRF dictionary, 50 for one PCEF dialect and 50 for the other
– 100 unique policies in PCRF dictionary, 50 for one PCEF dialect and 50 for the other
– 2 services for each subscriber at a time
– 2 attributes for each service
– 250 000 subscribers have 1-3 usage accumulators assigned
– servers used: ordinary hosts with a 4-core Intel Core i7 processor at 2.93 GHz, 8 GB RAM
– Lua script runs for policy selection for each subscriber and analyzes PCEF dialect, subscriber profile, subscriber services, service attributes, subscriber accumulator
Scenarios and results summary:
|Scenario name||Scenario description||TPS (messages handled per second)||Simultaneous sessions count||Max CPU % across all PCRF processes|
|double_procera||PCEF and DPI (Procera) both establish sessions to PCRF for one subscriber, usage monitoring is switched on.||9000||500 000||80-85%|
|double_procera with revalidation||The same as double_procera scenario but with revalidation GxCCR messages included.||8500||500 000||80%|
|rx_simple||PCEF, DPI and AF (Application Function) establish sessions to PCRF for one subscriber.||11000||500 000||90-95%|
It should be mentioned that no errors of any kind occurred while these scenarios were running, and each of these load scenarios can run on the PCRF cluster for an unlimited time without any degradation.
Yota PCRF team