Mutex vs. Semaphores – Part 1: Semaphores

Nytro · August 14, 2012

Mutex vs. Semaphores – Part 1: Semaphores

It never ceases to amaze me how often I see postings in newsgroups, etc. asking the difference between a semaphore and a mutex. Probably what baffles me more is that over 90% of the time the responses given are either incorrect or missing the key differences. The most often quoted response is that of the “The Toilet Example © Copyright 2005, Niclas Winquist” . This summarises the differences as:

A mutex is really a semaphore with value 1

No, no and no again. Unfortunately this kind of talk leads to all sorts of confusion and misunderstanding (not to mention companies like Wind River Systems redefining a mutex as a “Mutual-Exclusion Semaphore” – now where is that wall to bang my head against?).

Firstly we need to clarify some terms and this is best done by revisiting the roots of the semaphore. Back in 1965, Edsger Dijkstra, a Dutch computer scientist, introduced the concept of a binary semaphore into modern programming to address possible race conditions in concurrent programs. His very simple idea was to use a pair of function calls to the operating system to indicate entering and leaving a critical region. This was achieved through the acquisition and release of an operating system resource called a semaphore. In his original work, Dijkstra used the notation of P & V, from the Dutch words Prolagen (P), a neologism coming from To try and lower, and Verhogen (V) To raise, To increase.

With this model the first task arriving at the P(S) [where S is the semaphore] call gains access to the critical region. If a context switch happens while that task is in the critical region, and another task also calls on P(S), then that second task (and any subsequent tasks) will be blocked from entering the critical region by being put in a waiting state by the operating system. At a later point the first task is rescheduled and calls V(S) to indicate it has left the critical region. The second task will now be allowed access to the critical region.

A variant of Dijkstra’s semaphore was put forward by another Dutchman, Dr. Carel S. Scholten. In his proposal the semaphore can have an initial value (or count) greater than one. This enables building programs where more than one resource is being managed in a given critical region. For example, a counting semaphore could be used to manage the parking spaces in a robotic parking system. The initial count would be set to the initial free parking places. Each time a place is used the count is decremented. If the count reaches zero then the next task trying to acquire the semaphore would be blocked (i.e. it must wait until a parking space is available). Upon releasing the semaphore (A car leaving the parking system) the count is incremented by one.

Scholten’s semaphore is referred to as the General or Counting Semaphore, Dijkstra’s being known as the Binary Semaphore.

Pretty much all modern Real-Time Operating Systems (RTOS) support the semaphore. For the majority, the actual implementation is based around the counting semaphore concept. Programmers using these RTOSs may use an initial count of 1 (one) to approximate to the binary semaphore. One of the most notable exceptions is probably the leading commercial RTOS VxWorks from Wind River Systems. This has two separate APIs for semaphore creation, one for the Binary semaphore (semBCreate) and another for the Counting semaphore (semCCreate).

Hopefully we now have a clear understanding of the difference between the binary semaphore and the counting semaphore. Before moving onto the mutex we need to understand the inherent dangers associated with using the semaphore. These include:

Accidental release
Recursive deadlock
Task-Death deadlock
Priority inversion
Semaphore as a signal

All these problems occur at run-time and can be very difficult to reproduce; making technical support very difficult.

Accidental release

This problem arises mainly due to a bug fix, product enhancement or cut-and-paste mistake. In this case, through a simple programming mistake, a semaphore isn’t correctly acquired but is then released.

When the counting semaphore is being used as a binary semaphore (initial count of 1 – the most common case) this then allows two tasks into the critical region. Each time the buggy code is executed the count is increment and yet another task can enter. This is an inherent weakness of using the counting semaphore as a binary semaphore.

Deadlock

Deadlock occurs when tasks are blocked waiting on some condition that can never become true, e.g. waiting to acquire a semaphore that never becomes free. There are three possible deadlock situations associated with the semaphore:

Recursive Deadlock
Deadlock through Death
Cyclic Deadlock (Deadly Embrace)

Here we shall address the first two, but shall return to the cyclic deadlock in a later posting.

Recursive Deadlock

Recursive deadlock can occur if a task tries to lock a semaphore it has already locked. This can typically occur in libraries or recursive functions; for example, the simple locking of malloc being called twice within the framework of a library. An example of this appeared in the MySQL database bug reporting system: Bug #24745 InnoDB semaphore wait timeout/crash – deadlock waiting for itself

Deadlock through Task Death

What if a task that is holding a semaphore dies or is terminated? If you can’t detect this condition then all tasks waiting (or may wait in the future) will never acquire the semaphore and deadlock. To partially address this, it is common for the function call that acquires the semaphore to specify an optional timeout value.

Priority Inversion

The majority of RTOSs use a priority-driven pre-emptive scheduling scheme. In this scheme each task has its own assigned priority. The pre-emptive scheme ensures that a higher priority task will force a lower priority task to release the processor so it can run. This is a core concept to building real-time systems using an RTOS. Priority inversion is the case where a high priority task becomes blocked for an indefinite period by a low priority task. As an example:

An embedded system contains an “information bus”
Sequential access to the bus is protected with a semaphore.
A bus management task runs frequently with a high priority to move certain kinds of data in and out of the information bus.
A meteorological data gathering task runs as an infrequent, low priority task, using the information bus to publish its data. When publishing its data, it acquires the semaphore, writes to the bus, and release the semaphore.
The system also contains a communications task which runs with medium priority.
Very infrequently it is possible for an interrupt to occur that causes the (medium priority) communications task to be sch
eduled while the (high priority) information bus task is blocked waiting for the (low priority) meteorological data task.
In this case, the long-running communications task, having higher priority than the meteorological task, prevents it from running, consequently preventing the blocked information bus task from running.
After some time has passed, a watchdog timer goes off, notices that the data bus task has not been executed for some time, concludes that something has gone drastically wrong, and initiate a total system reset.

This well reported event actual sequence of events happened on NASA JPL’s Mars Pathfinder spacecraft.

Semaphore as a Signal

Unfortunately, the term synchronization is often misused in the context of mutual exclusion. Synchronization is, by definition “To occur at the same time; be simultaneous”. Synchronization between tasks is where, typically, one task waits to be notified by another task before it can continue execution (unilateral rendezvous). A variant of this is either task may wait, called the bidirectional rendezvous. This is quite different to mutual exclusion, which is a protection mechanism. However, this misuse has arisen as the counting semaphore can be used for unidirectional synchronization. For this to work, the semaphore is created with a count of 0 (zero).

Note that the P and V calls are not used as a pair in the same task. In the example, assuming Task1 calls the P(S) it will block. When Task 2 later calls the V(S) then the unilateral synchronization takes place and both task are ready to run (with the higher priority task actually running). Unfortunately “misusing” the semaphore as synchronization primitive can be problematic in that it makes debugging harder and increase the potential to miss “accidental release” type problems, as an V(S) on its own (i.e. not paired with a P(S)) is now considered legal code.

In the next posting I shall look at how the mutex address most of the weaknesses of the semaphore.

Posted on September 7th, 2009

Sursa: Sticky Bits

innocentia · August 23, 2012

Cred c? afirma?ia corect? este 'Un mutex poate fi implementat cu un semafor cu valoarea ini?ial? 1'.

Nytro · August 23, 2012

Cred c? afirma?ia corect? este 'Un mutex poate fi implementat cu un semafor cu valoarea ini?ial? 1'.

Nu, asa stiam si eu, dar am cautat mai mult de o pagina de "documentatie", si nu e chiar asa. In plus, am tot vazut zeci de pareri care se contrazic, iar diferenta in primul rand consta in modul in care se folosesc.

Apoi, modul de implementare este total diferit, mai ales in functie de sistemele de operare. Pe Windows de exemplu, se pot folosi sectiuni critice in loc de mutexi, si e de preferat asta, deoarece sectiunile critice sunt implementate user-land, iar mutexul este un obiect global, inter-process, gestionat de kernel, iar aceste sysenter-uri in kernel sunt mai consumatoare de timp. Pe Linux la fel, lucrurile stau putin diferit, dar nu pot sa iti spun cu certitudine cum, deoarece am citit mai multe pareri si se contrazic, unele afirmand ca semafoarele sunt implementate user-land, ceea ce mi se pare o prostie din moment ce exista explicit syscall-uri special pentru lucrul cu semafoare, iar altii spuneau ca mutex-ul chiar este construit pe baza de semafor, in timp ce altii afirmau ca nu au nicio legatura, dar ca mutex-ul este implementat userland.

Cand ajung la munca cred ca fac un mic "benchmark test".

innocentia · August 23, 2012

Da, sec?iunile critice sunt mai rapide, dar nu pentru c? totul se întîmpl? în userland (nu po?i implementa un mutex corect f?r? s? ai acces privilegiat) ci pentru c? mutex-ul Win32 e mult mai 'heavy'. E cross-process, are un nume, are un handle windows, are un nume... e cu totul altceva. Sec?iunile critice sunt, în Win32 API, mutexul.

În general, pentru implementarea unui mutex po?i folosi un semafor. E îns? complicat s? implementezi un mutex recursiv cu semafor, ?i vei avea nevoie de dou? semafoare. Nu e un lucru facil s? faci s? mearg? un mutex cum trebuie

PS: pluralul substantivelor masculine care se termin? în x se face cu c?i, deci, mutec?i.

Sign In

Mutex vs. Semaphores – Part 1: Semaphores

Recommended Posts

Nytro

innocentia

Nytro

innocentia

Join the conversation

Browse

Activity

Pages