Hi @rbielski ,
I noticed that in my two consecutive runs (without rebuilding) Approach 2 showed execution times of 9.50 ms and 75 ms, whereas for the same runs the captured execution times for Approach 1 were 3.88 ms and 3.84 ms.
According to the nsys reports I noticed the following:
1st run:

```
** OS Runtime Summary (osrt_sum):
 Time (%)  Total Time (ns)  Num Calls    Avg (ns)       Med (ns)    Min (ns)    Max (ns)   StdDev (ns)   Name
 --------  ---------------  ---------  ------------  -------------  --------  -----------  ------------  -----
     63.3    2,675,490,863         35  76,442,596.1  100,108,943.0   255,806  339,113,956  61,035,452.3  poll
     34.0    1,435,584,138        611   2,349,564.9    1,360,573.0     1,002   37,396,376   3,639,243.4  ioctl
```
2nd run:

```
** OS Runtime Summary (osrt_sum):
 Time (%)  Total Time (ns)  Num Calls    Avg (ns)      Med (ns)    Min (ns)    Max (ns)   StdDev (ns)   Name
 --------  ---------------  ---------  ------------  ------------  --------  -----------  ------------  -----
     78.7    1,494,585,725         24  62,274,405.2  47,293,382.0   275,764  230,479,141  58,195,470.8  poll
     16.2      307,073,339        607     505,886.9      35,768.0     1,002   33,598,222   1,838,594.1  ioctl
```
So ioctl takes around 4.7x more total time in one run than in the other (1.44 s vs. 0.31 s), which might be one of the reasons for the noted discrepancy between the Approach 2 execution times.
I would assume there is potential stalling between kernel launch and synchronization, and there may be gaps between the starts of successive iterations (I am iterating these 3 kernels 14 times).
This brings us back to the main question: what would be the right approach here to make sure the code does not show this kind of 8x difference in total elapsed time?
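For reference, this is roughly how I am thinking the steady-state measurement could be structured (a sketch only; `kernelA`/`kernelB`/`kernelC`, `grid`, `block`, and the argument lists are hypothetical placeholders for my three kernels): do an untimed warm-up pass so one-time costs (context setup, module load, allocator warm-up, which show up as `ioctl` time in nsys) stay out of the timed region, then time all 14 iterations with CUDA events and synchronize only once at the end, instead of after every iteration.

```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

// Untimed warm-up pass: absorbs one-time initialization costs.
kernelA<<<grid, block>>>(/* args */);
kernelB<<<grid, block>>>(/* args */);
kernelC<<<grid, block>>>(/* args */);
cudaDeviceSynchronize();

cudaEventRecord(start);
for (int i = 0; i < 14; ++i) {
    // No cudaDeviceSynchronize() inside the loop: launches on the same
    // stream queue asynchronously and run back to back on the device,
    // so host-side gaps between iterations do not inflate the timing.
    kernelA<<<grid, block>>>(/* args */);
    kernelB<<<grid, block>>>(/* args */);
    kernelC<<<grid, block>>>(/* args */);
}
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("14 iterations: %.3f ms\n", ms);
```

If the real code must synchronize between iterations (e.g. the host reads results each pass), that constraint would of course change the picture, but it would also explain per-iteration gaps in the timeline.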
Thank you in advance for the help!
Edit: Runs 3 and 4 also resulted in completely different execution times for Approach 2: 22 ms and 35 ms.
Also, would it be healthier to combine all these kernels into a single kernel?
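To illustrate what I mean by combining them (purely a sketch; `stageA`/`stageB`/`stageC` are hypothetical device functions standing in for the bodies of my three kernels): if the stages are element-wise over the same data with no grid-wide synchronization needed between them, fusing would pay one launch per iteration instead of three and keep the intermediate values in registers.

```cuda
// Hypothetical fused version of three element-wise kernels.
__global__ void fusedKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        v = stageA(v);  // body of kernelA as a __device__ function
        v = stageB(v);  // body of kernelB
        v = stageC(v);  // body of kernelC
        data[i] = v;    // one global read and one global write per element
    }
}
```

I realize this only works if kernelB does not need results from other blocks of kernelA; if a device-wide barrier is required between stages, the kernels would have to stay separate (or use cooperative groups / CUDA graphs instead).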