Abnormal statistics when FPU support is enabled #45

Open
jserv opened this issue Aug 23, 2013 · 5 comments

Comments

@jserv
Member

jserv commented Aug 23, 2013

Since commit 27b9fb2, the F9 microkernel has FPU support. However, it has a side effect: the sampling statistics look abnormal, as shown below:

-------TOP------
 4209 [ schedule_select          ]
 1548 [ softirq_execute          ]
 1544 [ svc_handler              ]
 1113 [ thread_current           ]
  867 [ thread_isrunnable        ]
  440 [ kernel_thread            ]
  436 [ __svc_handler            ]
   72 [ L4_Ipc                   ]

Clearly, the symbol L4_Ipc should not fall this far down the ranking.

@ghost ghost assigned georgekang Aug 23, 2013
@georgekang
Member

It can be solved by the following patch. However, I still don't have a good explanation for why it works.

diff --git a/include/platform/irq.h b/include/platform/irq.h
index 792f36b..53e7f4c 100644
--- a/include/platform/irq.h
+++ b/include/platform/irq.h
@@ -171,7 +171,8 @@ static inline int irq_number(void)
        {                                                               \
                irq_enter();                                            \
                sub();                                                  \
-               request_schedule();                                     \
+               if(NO_PREEMPTED_IRQ)                            \
+                       request_schedule();                             \
                irq_return();                                           \
        }
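
The thread does not show how NO_PREEMPTED_IRQ is defined. As a minimal sketch, assuming it tests the RETTOBASE bit of the Cortex-M ICSR register (bit 11, set when the running exception is the only active one) and assuming request_schedule() pends PendSV through PENDSVSET (bit 28), the patched epilogue can be modelled on a host machine. `should_request_schedule`, `irq_epilogue`, and the plain `icsr` argument are hypothetical stand-ins for the real register access:

```c
#include <stdint.h>

/* Bit positions in the Cortex-M Interrupt Control and State Register. */
#define ICSR_RETTOBASE (UINT32_C(1) << 11) /* no other exception is active */
#define ICSR_PENDSVSET (UINT32_C(1) << 28) /* writing 1 pends PendSV */

/*
 * Assumed meaning of NO_PREEMPTED_IRQ: true only when this IRQ did not
 * preempt another exception handler.
 */
int should_request_schedule(uint32_t icsr)
{
    return (icsr & ICSR_RETTOBASE) != 0;
}

/*
 * Hypothetical model of the patched macro tail: returns the ICSR value
 * after the (conditional) scheduling request.
 */
uint32_t irq_epilogue(uint32_t icsr)
{
    if (should_request_schedule(icsr))
        icsr |= ICSR_PENDSVSET;
    return icsr;
}
```

Under these assumptions, a nested IRQ leaves PendSV untouched and only the outermost IRQ asks for a reschedule.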

@jserv
Member Author

jserv commented Aug 24, 2013

This implies that the above change effectively reverts all of the PendSV usage introduced by @arcbbb.

@georgekang
Member

I think it might be a timing issue. It might also mean that the cost of a context switch is high.

The following is my sampling result.

 7034 [ no_fp                    ]
 1010 [ schedule_select          ]
  621 [ softirq_execute          ]
  504 [ L4_Ipc                   ]
  386 [ syscall_handler          ]
...
----------------

According to the sampling result on my board, the most frequently sampled address is 0x80018e0,
the return instruction of the context switch. We can see that as soon as IRQs are re-enabled,
the PendSV handler is preempted immediately.

0800189a <no_fp>:
 800189a:   4610        mov r0, r2
 800189c:   f002 faf4   bl  8003e88 <thread_switch>
 80018a0:   682b        ldr r3, [r5, #0]
 80018a2:   695a        ldr r2, [r3, #20]
 80018a4:   4696        mov lr, r2
 80018a6:   691a        ldr r2, [r3, #16]
 80018a8:   4610        mov r0, r2
 80018aa:   699a        ldr r2, [r3, #24]
 80018ac:   4612        mov r2, r2
 80018ae:   f00e 040f   and.w   r4, lr, #15
 80018b2:   f094 0f09   teq r4, #9
 80018b6:   bf0c        ite eq
 80018b8:   f380 8808   msreq   MSP, r0
 80018bc:   f380 8809   msrne   PSP, r0
 80018c0:   f103 021c   add.w   r2, r3, #28
 80018c4:   4610        mov r0, r2
 80018c6:   e890 0ff0   ldmia.w r0, {r4, r5, r6, r7, r8, r9, sl, fp}
 80018ca:   f382 8814   msr CONTROL, r2
 80018ce:   f8d3 2080   ldr.w   r2, [r3, #128]  ; 0x80
 80018d2:   b122        cbz r2, 80018de <no_fp+0x44>
 80018d4:   f103 0340   add.w   r3, r3, #64 ; 0x40
 80018d8:   4618        mov r0, r3
 80018da:   ec90 8b10   vldmia  r0, {d8-d15}
 80018de:   b662        cpsie   i
 80018e0:   4770        bx  lr
 80018e2:   f85d eb04   ldr.w   lr, [sp], #4
 80018e6:   4770        bx  lr

I think the root cause of this issue is the same as that of issue #40. With FPU support
patched in, the cost of a context switch can exceed one tick, so the handler is preempted
and sampled by Kprobe (ktimer) immediately after IRQs are re-enabled.
To solve it, we should improve context switch performance.
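
The skew described above can be reproduced with a toy model. This is only a sketch under stated assumptions: a hypothetical loop of eight one-cycle instruction slots, four of which run with IRQs masked, and a tick that is delivered at the first unmasked slot after it becomes pending. None of the names or numbers come from the F9 sources.

```c
#include <assert.h>
#include <string.h>

#define PROGRAM_LEN 8 /* hypothetical loop of 8 one-cycle instruction slots */
#define MASK_BEGIN  2 /* slots 2..5 model the context switch with IRQs off */
#define MASK_END    5

/*
 * Toy model: a timer tick becomes pending every tick_period cycles, and the
 * sample is attributed to the first slot that executes with IRQs enabled.
 */
void simulate_sampling(int tick_period, int total_cycles,
                       unsigned hist[PROGRAM_LEN])
{
    int pending = 0;

    assert(tick_period > 0);
    memset(hist, 0, PROGRAM_LEN * sizeof(hist[0]));

    for (int cycle = 0; cycle < total_cycles; cycle++) {
        int slot = cycle % PROGRAM_LEN;
        int masked = slot >= MASK_BEGIN && slot <= MASK_END;

        if (cycle % tick_period == 0)
            pending = 1;
        if (pending && !masked) {
            hist[slot]++; /* the sample lands here, not where the tick fired */
            pending = 0;
        }
    }
}
```

With `tick_period = 7` and `total_cycles = 56`, the ticks fire once on every slot, yet slot 6 (the first slot after the masked region, the analogue of `bx lr` right after `cpsie i`) collects 5 of the 8 samples and slots 2 to 5 collect none.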

@georgekang
Member

Here is a workaround.
However, this patch has one drawback: it breaks the encapsulation of mempool.
Besides, disabling IRQs during the context switch (6f51800) is still necessary.

diff --git a/include/memory.h b/include/memory.h
index 43b313d..c274e4f 100644
--- a/include/memory.h
+++ b/include/memory.h
@@ -111,7 +111,13 @@ void memory_init(void);

 memptr_t mempool_align(int mpid, memptr_t addr);
 int mempool_search(memptr_t base, size_t size);
-mempool_t *mempool_getbyid(int mpid);
+
+extern mempool_t memmap[];
+inline mempool_t *mempool_getbyid(int mpid)
+{
+       return (mpid != -1)?(memmap + mpid):NULL;
+}
+

 int map_area(as_t *src, as_t *dst, memptr_t base, size_t size,
                map_action_t action, int is_priviliged);
diff --git a/kernel/memory.c b/kernel/memory.c
index 5d826c7..74f4055 100644
--- a/kernel/memory.c
+++ b/kernel/memory.c
@@ -44,7 +44,7 @@
  * Memory map of MPU.
  * Translated into memdesc array in KIP by memory_init
  */
-static mempool_t memmap[] = {
+mempool_t memmap[] = {
        DECLARE_MEMPOOL_2("KTEXT", kernel_text,
                MP_KR | MP_KX | MP_NO_FPAGE, MPT_KERNEL_TEXT),
        DECLARE_MEMPOOL_2("UTEXT", user_text,
@@ -129,14 +129,6 @@ int mempool_search(memptr_t base, size_t size)
        return -1;
 }

-mempool_t *mempool_getbyid(int mpid)
-{
-       if (mpid == -1)
-               return NULL;
-
-       return memmap + mpid;
-}
-
 void memory_init()
 {
        int i = 0, j = 0;
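
A plain `inline` function defined in a C header depends on the compiler's inline semantics (C99 versus GNU89) for whether an out-of-line definition exists at link time. As a hedged variant of the accessor above, not the patch itself, the sketch below uses `static inline`, which avoids that question. The reduced `mempool_t` and the two-entry `memmap` are hypothetical stand-ins for the kernel's real definitions:

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's mempool_t. */
typedef struct {
    const char *name;
} mempool_t;

/* Would live in memory.c; non-static so that the header accessor can see it. */
mempool_t memmap[] = {
    { "KTEXT" },
    { "UTEXT" },
};

/*
 * Header-side accessor. static inline gives each translation unit its own
 * copy, so no separate out-of-line definition is required.
 */
static inline mempool_t *mempool_getbyid(int mpid)
{
    return (mpid != -1) ? (memmap + mpid) : NULL;
}
```

The trade-off is unchanged from the patch: callers avoid a function call on the context switch path, but `memmap` is no longer private to memory.c.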

@arcbbb
Member

arcbbb commented Aug 31, 2013

As @georgekang mentioned, dynamic probing on ktimer is expensive.
For PC sampling, I think we can use a static probe instead.
I have set up an experiment with a static probe on ktimer: https://github.com/arcbbb/f9-kernel/tree/test-sampling
The result looks normal.

## KDB ##
-------TOP------
 3672 [ L4_Ipc                   ]
 1373 [ kernel_thread            ]
 1224 [ softirq_execute          ]
 1069 [ __svc_handler            ]
  765 [ schedule_select          ]
  610 [ syscall_handler          ]
  304 [ thread_map_search        ]
  154 [ thread_current           ]
  153 [ __ping_thread            ]
  153 [ dbg_printf               ]
  153 [ pendsv_handler           ]
  153 [ do_ipc                   ]
  152 [ sched_slot_dispatch      ]
  152 [ sys_ipc                  ]
  152 [ ipc_read_mr              ]
    1 [ __pong_thread            ]
----------------

However, I haven't yet come up with a good way to calculate the stack pointer flexibly, so I just hard-coded it.
It will also take some work to create a static probe framework like trace events in Linux.
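
As a rough sketch of what the body of such a static probe could look like (not the code in the test-sampling branch), the function below reads the stacked return address from a Cortex-M exception frame and counts it against a symbol table. The `probe_symbol` type and both function names are hypothetical. How the handler obtains `frame` (MSP or PSP, plus whatever the handler prologue pushed) is the hard-coded part mentioned above and is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Word index of the stacked return address in a Cortex-M exception frame,
 * which is laid out as r0, r1, r2, r3, r12, lr, pc, xpsr.
 */
#define FRAME_PC_INDEX 6

struct probe_symbol {       /* hypothetical symbol table entry */
    uint32_t start;         /* first address of the function */
    const char *name;
    unsigned hits;          /* number of samples attributed to it */
};

/* Find the last entry starting at or below pc; table must be sorted by start. */
struct probe_symbol *probe_lookup(struct probe_symbol *table, size_t n,
                                  uint32_t pc)
{
    struct probe_symbol *best = NULL;

    for (size_t i = 0; i < n && table[i].start <= pc; i++)
        best = &table[i];
    return best;
}

/* Static probe body: frame points at the hardware-stacked exception frame. */
void probe_sample(struct probe_symbol *table, size_t n, const uint32_t *frame)
{
    struct probe_symbol *sym = probe_lookup(table, n, frame[FRAME_PC_INDEX]);

    if (sym)
        sym->hits++;
}
```

For example, with a table holding `no_fp` at 0x0800189a and `thread_switch` at 0x08003e88 (addresses taken from the disassembly earlier in this thread), a frame whose stacked PC is 0x080018e0 is counted against `no_fp`.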
