Programming with boost signals and slots

by rk on 05/12/2016

Signals and slots are a wonderful way to organize a big application. Programmers who have worked on huge C/C++ projects might have felt the pain of tight coupling in the code: tight coupling makes it harder to keep track of the function call hierarchy and brings in hard compile time dependencies between the parts of the application. Huge projects like video games and operating system kernels quickly become unmaintainable piles of crap if we blindly follow structured/object oriented designs.

Slots break the tight coupling and make applications more modular; independent pieces of code can be coupled loosely and made to work with each other smoothly. Signals and slots are a lightweight publish/subscribe mechanism, except that the publishing and subscribing happen inside a single operating system process. Similar to topic based publish/subscribe, slots register handlers to be invoked when a signal happens; the slots framework links the slots to their corresponding signals, and when a signal is raised the corresponding slot handlers are invoked with the arguments used to raise the signal. Signals can also be raised from inside a library and handled by slots registered in the application.

Qt uses signals and slots for its GUI event handling.
GLIB also has a similar mechanism.
Many protocol libraries use this approach to publish events to state machines.

There are various signals/slots libraries available in C++. I am going to use the boost::signals2 library, and I will also be using boost::asio to implement the indirect invocation example. I will base the examples on the hello world example given in the boost signals2 documentation.

Direct invocation:
In the direct invocation approach, raising a signal synchronously invokes the corresponding slots. Direct invocation is of very little use to us, since any medium to large scale application is written in a dispatch based style where the events from different channels are handled in a dispatch loop.

Code:

#include<iostream>
#include<boost/signals2/signal.hpp>

static void
handleInputText(std::string text)
{
    std::cout<<"handleInputText()"<<" text provided: "<<text;
    return;
}

//let us have a GUI event which is raised when the user is done entering text at the console.
int main(int ac, char **av)
{
    boost::signals2::signal<void(std::string)> textEntered;
    textEntered.connect(&handleInputText);
    std::string hello("hello world");
    textEntered(hello);
    return 0;
}

Indirect invocation:
In the indirect invocation approach, a dispatcher loop akin to boost::asio::io_service::run() is used to dispatch the signal to the corresponding slots along with its arguments. Dispatcher based invocation helps us integrate the slots mechanism with an existing channel multiplexing loop in the application. The indirect invocation method is also thread safe: a signal can be raised from a worker thread and the slot invoked in the main thread of the application.

Code:

#include<iostream>
#include<thread>
#include<unistd.h> //for sleep()
#include<boost/signals2/signal.hpp>
#include<boost/asio.hpp>

static boost::asio::io_service svc;
static boost::signals2::signal<void(std::string)> textEntered;

static void
handleInputText(std::string text)
{
    std::cout<<"\nhandleInputText()"<<" text provided: "<<text;
    return;
}

static void 
worker()
{
    sleep(2); //simulate some work before raising the signal.
    //post() guarantees the lambda will be invoked in the thread
    //that runs the io_service, i.e. the main thread.
    //raise the signal from there.
    svc.post([](){
            std::cout<<"\nRaising signal.";
            std::string hello("hello world");
            textEntered(hello);
            });
    return;
}

//let us have a GUI event which is raised when the user is 
//done entering text at the console.
int main(int ac, char **av)
{
    try
    {
        boost::asio::io_service::work work(svc);
        textEntered.connect(&handleInputText);
        std::thread w(std::bind(&worker));
        svc.run_one();
        w.join();
    }
    catch(std::exception &ex)
    {
        std::cerr<<"\nmain() exited with exception:"<<ex.what();
    }
    return 0;
}

References:

http://www.boost.org/doc/libs/1_57_0/doc/html/signals2.html


Designing better multithreaded applications

by rk on 05/12/2016

Why do we need multithreading ? 
A not so recent Herb Sutter article suggests that the hardware has changed and is changing at a very fast pace. It is not about clock cycles any more, it is about the number of cores on the die. At the time of writing this article I am holding a smartphone which has 4 cores and a GPU chip on it. A single threaded application will run on a single core, leaving the other 3 cores idle; to use all 4 cores efficiently we need to introduce 3 more threads into our application. Designing applications as multithreaded also makes them more responsive.

Sometimes multithreading simplifies the design of the application to a great extent. Ever seen code full of callbacks? Pick any nodejs project and you can see how hard it is to understand the control flow. Having said all this, I want to reiterate that it is hard to get multithreading right. It is about designing your application, not just coding it. Today there is a plethora of wonderful tools available to make multithreading easier, tools which can detect data races and memory corruption in your program with complete stack trace information.

Here I will try to dump the knowledge gained through my experience designing and implementing multithreaded software. This article is by no means complete and is a living document; from time to time I will revisit it and add/update the content.

Message passing:
The best form of threading out there; lots of time critical systems use this model. Every thread has its own data structures which are not shared with other threads. Whenever there is a need to communicate with another thread, a message is built and sent to it. Every thread has a mailbox into which messages are dropped. Threads pick up the messages from their mailbox at their own pace and process them; messages can be requests from other threads to which responses need to be sent after processing, and response messages are dropped into the mailbox of the requestor thread. The famous Mach microkernel follows this threading model; mailboxes in Mach are called ports.
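
A minimal sketch of such a mailbox built with C++11 primitives; the Mailbox class and its method names are my own illustration, not from any library:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <iostream>

//A mailbox is a queue guarded by a mutex, with a condition
//variable to wake the owning thread when a message arrives.
class Mailbox
{
    std::queue<std::string> messages;
    std::mutex lock;
    std::condition_variable mailArrived;
    public:
    void drop(std::string msg)
    {
        std::lock_guard<std::mutex> guard(lock);
        messages.push(std::move(msg));
        mailArrived.notify_one();
    }
    std::string pick()
    {
        std::unique_lock<std::mutex> guard(lock);
        mailArrived.wait(guard, [this]{ return !messages.empty(); });
        std::string msg = messages.front();
        messages.pop();
        return msg;
    }
};

int main()
{
    Mailbox box;
    std::thread owner([&box](){ std::cout<<"got: "<<box.pick(); });
    box.drop("hello"); //the main thread drops a message, no state is shared.
    owner.join();
    return 0;
}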

Lock free:
Data structures are said to be lock free if "some" concurrent operations are guaranteed to finish in a finite number of steps. Even if a single thread is blocked, the other threads in the system are able to make progress. Scalability of the system is impacted, but lock free data structures are simpler to implement than wait free data structures.

Wait free:
Data structures are said to be wait free if "every" concurrent operation is guaranteed to finish in a finite number of steps. Wait free data structures guarantee that no single thread will ever be blocked. Wait free data structures are harder to implement and get right.
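
As a small illustration, a shared counter built on std::atomic never blocks any thread; every increment finishes in a finite number of steps:

#include <atomic>
#include <thread>
#include <iostream>

static std::atomic<long> counter(0);

static void work()
{
    for(int i = 0; i < 100000; i++)
        counter.fetch_add(1); //an atomic read-modify-write, no mutex involved.
}

int main()
{
    std::thread a(work), b(work);
    a.join(); b.join();
    std::cout<<"counter:"<<counter.load();
    return 0;
}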

Deadlock and livelock:
A multithreaded system can sometimes enter a deadlock. A deadlock is a situation where there is a cyclic dependency between threads: a thread is waiting for a resource locked by another thread, which in turn is waiting for a resource to be freed by the first thread. Both threads are stuck waiting for each other and the progress of the system comes to a grinding halt. This typically happens when mutexes are used to implement the locking mechanism between threads.

A livelock is the flip side of a deadlock: in a deadlock the threads are blocked but are not consuming any CPU cycles, whereas in a livelock the threads continuously check for the availability of the resources, burning CPU cycles all the while. This typically happens when spinlocks are used to implement the locking mechanism between threads.

Read/write locks instead of locks:
A general lock is a "write" lock; a write lock is also called an "exclusive" lock. Most of the time programmers use an exclusive lock for all purposes, which is the wrong way of using locks. For example, suppose we were to implement a datastore shared between multiple threads. The datastore supports 2 operations: a fetch operation to read an object and a store operation to update the value of an object.

Issuing a fetch operation from 2 threads at the same time will not corrupt the datastore, since the datastore is not being modified. Using an exclusive lock inside the fetch operation would reduce scalability by a factor of two, so it is advised to use a read/shared lock inside the fetch operation. A read lock is called a shared lock because it can be shared between threads which are only reading.

Issuing a store operation from 2 threads at the same time will corrupt the datastore and leave it in an inconsistent state, so it is advised to use an exclusive lock inside the store operation. A write lock is called an exclusive lock because it cannot be shared between threads which are writing.
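
A sketch of such a datastore using boost::shared_mutex (link with -lboost_thread; the standard library gained an equivalent only later, in C++14/17). The Datastore class itself is just an illustration:

#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>
#include <map>
#include <string>

class Datastore
{
    std::map<int, std::string> objects;
    mutable boost::shared_mutex lock;
    public:
    std::string fetch(int id) const
    {
        //shared lock: any number of readers can hold it at once.
        boost::shared_lock<boost::shared_mutex> guard(lock);
        std::map<int, std::string>::const_iterator itr = objects.find(id);
        return itr == objects.end() ? std::string() : itr->second;
    }
    void store(int id, const std::string &value)
    {
        //exclusive lock: a writer gets the structure to itself.
        boost::unique_lock<boost::shared_mutex> guard(lock);
        objects[id] = value;
    }
};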

Software transactional memory:
Transactional memory is a recent entry to the concurrency scene and focuses on simplifying the job of the programmer. Transactional memory borrows its concepts from databases and is easier to understand and use. All the operations which are transactional are enclosed in a transactional block, a programming language construct; all the operations in the block are retried upon detecting any inconsistency in the variable values. At the time of writing there are no production ready implementations of software transactional memory.

Read your library documentation for multithreading/reentrancy support:
Before using any library, read through its documentation to check whether it can be used in a multithreaded application. Libraries which are not re-entrant are not fit to be used in multithreaded applications. Unix man pages often warn about library functions which are not re-entrant; the variants suffixed with "_r" should be used instead, or every non re-entrant call should be protected with a mutex.

Libraries which are not thread safe should not be used in multithreaded applications; using one results in corruption of the private state of the library, leading to hard to debug crashes.

Mixing signals with multithreading:
Signals should never be casually mixed with multithreading; it is unclear to which thread a signal will be delivered. Signals should be blocked in all threads except one thread in which the signals will be handled. Modern Linux kernels provide a file descriptor based facility called signalfd which can be used to handle the pending signal notifications to the process.
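
A minimal Linux specific sketch of the signalfd approach: block the signals, then read the notifications from the descriptor in the one designated thread (in a real multithreaded program use pthread_sigmask before spawning the other threads):

#include <sys/signalfd.h>
#include <signal.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGTERM);
    sigaddset(&mask, SIGINT);
    //block the signals so they are reported only through the descriptor;
    //threads created after this inherit the mask.
    sigprocmask(SIG_BLOCK, &mask, nullptr);
    int sfd = signalfd(-1, &mask, 0);
    //the descriptor can also be handed to select/poll/epoll.
    struct signalfd_siginfo si;
    if(read(sfd, &si, sizeof(si)) == (ssize_t)sizeof(si))
        fprintf(stderr, "got signal %u\n", si.ssi_signo);
    close(sfd);
    return 0;
}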

Using tcmalloc or jemalloc to reduce malloc contention:
malloc and free are used for dynamic memory allocation. The classic malloc implementations were written at a time when multithreading was not common, and such a malloc becomes a bottleneck since it is not thread aware: every malloc call acquires a global lock for which all the other threads contend. Modern implementations like tcmalloc and jemalloc use a thread local memory cache from which every malloc request is served; when the implementation runs out of memory in the local cache it fetches more memory from the global cache. This avoids the contention for the global lock and brings huge performance improvements.
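
No application code changes are needed to try this; you can link against the replacement allocator, or preload it into an existing binary. A sketch of both (the library path below is an assumption, check where your distribution installs tcmalloc):

clang++ app.cc -ltcmalloc -o app
LD_PRELOAD=/usr/lib/libtcmalloc.so ./app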

Thread cancellation right approach:
The right way to cancel a thread is to ask it to exit. The right way to implement the thread routine is to use an atomic flag, owned by the thread, which when set will make the thread exit. The thread routine can be implemented as in the sketch below, where thread B sets the flag that thread A polls:

        //Thread A runs routine(); thread B asks it to exit by setting the flag.
        #include <atomic>

        static std::atomic<bool> askedToExit(false);

        static void routine()
        {
            while(!askedToExit.load()){
                //Thread code.
                // ...
            }
        }
        //elsewhere, in thread B: askedToExit.store(true); then join thread A.

The thread routine also exits upon executing the last statement of the routine. If you are using a programming language which supports exceptions, the thread might also exit on an unhandled exception.

Thread specific data:
Every one of us has used global variables at one time or another, and we were taught in our programming classes that they are BAD !!! and EVIL !!!. We were told the same about goto statements, but nevertheless every production codebase contains goto statements and global variables. Why all this crap?

The scope of a global variable is the whole program; global variables can be accessed inside every function in the program. Global variables become tricky the moment we bring threads into the picture: all the threads will be accessing and modifying the same global variable. I will demonstrate this with the help of an example.

System calls on unix leave their error status in a global variable called "errno" (defined in errno.h). Suppose 2 system calls fail in 2 threads; both threads will try to set the errno variable to their respective error codes, and depending on the interleaving of the threads, one thread overwrites the error code set by the other. To prevent this we need 2 instances of the global variable errno, one per thread. For exactly this reason we have the thread specific data feature: declare the global variable as thread specific data and each thread gets its own private instance. The GCC compiler provides the keyword "__thread" to declare a variable as thread specific.
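
A small sketch with the __thread keyword (C++11 standardizes this as thread_local); each thread gets its own copy of lastError, just like a per thread errno:

#include <thread>
#include <iostream>

static __thread int lastError = 0; //one instance per thread, like errno.

static void fail(int code)
{
    lastError = code; //does not disturb any other thread's copy.
    std::cout<<"this thread saw error:"<<lastError<<"\n";
}

int main()
{
    std::thread a(fail, 1);
    std::thread b(fail, 2);
    a.join();
    b.join();
    return 0;
}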

Using thread sanitizer and memory sanitizer:
Recent compiler developments brought some wonderful tools for debugging multithreaded applications. The Clang compiler implemented flags like -fsanitize=thread and -fsanitize=memory to instrument application code. Compile/link your applications with these flags to detect race conditions and memory corruption in multithreaded applications. The flags instrument the application code to log the accesses to every memory location and every thread spawn; upon running, the application produces a log of all the anomalies found during execution. Lots of google chrome bugs were reported by these wonderful tools and fixed. At the time of writing this article, some g++ versions also support these options to a limited extent.

Using pipeline model with lock free queues:
The pipeline model is one of the best approaches to multithreading: the application is divided into multiple stages of processing connected by queues; each stage picks its input from an input queue and enqueues its output into another queue to be processed by the next stage. Each stage can be implemented by a thread. Queues connecting the stages should be lock free to avoid contention between threads. The pipeline model is easy to debug and is less prone to race conditions.
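
A minimal sketch of a two stage pipeline connected by a single producer/single consumer lock free queue from boost::lockfree; the stage logic is made up for illustration:

#include <boost/lockfree/spsc_queue.hpp>
#include <thread>
#include <iostream>

//stage 1 produces numbers, stage 2 consumes them; the queue is lock free.
static boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024>> q;

static void stage1()
{
    for(int i = 0; i < 10; i++)
        while(!q.push(i)); //spin if the queue is full.
}

static void stage2()
{
    int v; int seen = 0;
    while(seen < 10)
        if(q.pop(v)){ std::cout<<v<<" "; seen++; }
}

int main()
{
    std::thread s1(stage1), s2(stage2);
    s1.join(); s2.join();
    return 0;
}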

Avoid dynamic thread creation:
Thread creation can be a costly affair, both time wise and memory wise, depending on the operating system. A thread pool should be implemented to which jobs can be assigned. Most latency sensitive applications like video games and CAD simulations implement thread pools to avoid these delays.

Fibers:
Two approaches are commonly followed when implementing highly scalable servers, servers which need to handle tens or hundreds of thousands of network connections. The threading model creates a thread per connection. The nonblocking model uses IO multiplexing techniques to multiplex between simultaneous network connections. The threading model is easier to implement but takes up more resources than the nonblocking model, while applications developed with the nonblocking design end up as a pile of callbacks, and code which uses callbacks is harder to debug and understand.

A middle ground exists which is not much mentioned in the literature: fibers. Fibers are like threads but they are not real operating system threads. Fibers combine the best of both worlds; they are much cheaper and scale well. Fibers have their own scheduler and are scheduled based on the network input. This style of threading can be seen in programming languages like Erlang and Go.

Use timed locks and timed waits:

Every locking mechanism provides a timeout option and an option to block forever; for example, pthread_mutex_lock blocks indefinitely waiting for the lock. Instead use pthread_mutex_timedlock, which returns a timeout error if it could not get the lock in the given time period. Using a timed lock improves the debuggability of the application: if any thread could not acquire a lock for any reason, the application fails gracefully, giving the programmer a chance to debug. All the pthread APIs have timed equivalents; read your programmer's manual today.
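
A sketch with pthread_mutex_timedlock, which takes an absolute CLOCK_REALTIME deadline; the 2 second budget is arbitrary:

#include <pthread.h>
#include <time.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

int lockedWork(void)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2; //wait at most 2 seconds for the lock.
    int rc = pthread_mutex_timedlock(&lock, &deadline);
    if(rc != 0){
        //ETIMEDOUT lands here: log and fail gracefully instead of hanging.
        fprintf(stderr, "could not acquire lock: %d\n", rc);
        return -1;
    }
    //...critical section...
    pthread_mutex_unlock(&lock);
    return 0;
}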

RCU mechanism to increase system concurrency: 

RCU expands to Read-Copy-Update. It is helpful in cases where there are many more readers than writers, i.e. data structures that are more often read than modified. The Linux kernel uses this mechanism to increase concurrent access to its memory management and networking protocol layer data structures. The fundamental concept is to maintain 2 versions of the data, an old version and a new version; the new version is the recently written data, and the data is read through a reference. Each time a reader is done it drops its reference, and when the last reader of the old version is done, the old version can be reclaimed and the links updated to point to the new data. Concurrency improves by letting multiple reads happen simultaneously. While this is implemented in the kernel and works well, its use in user space systems is debatable. There are libraries available for creating RCU capable data structures; I have no authority to comment on their quality, since I have never used them.
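
In user space, a crude approximation of this read mostly pattern can be sketched with the atomic shared_ptr operations from C++11: readers grab a reference to the current version, the writer publishes a fresh copy, and an old version is reclaimed when its last reader drops the reference. This is only a sketch of the idea, not real RCU:

#include <memory>
#include <string>
#include <iostream>

static std::shared_ptr<const std::string> config =
        std::make_shared<const std::string>("v1");

std::shared_ptr<const std::string> readConfig()
{
    //readers take a reference to whatever version is current.
    return std::atomic_load(&config);
}

void updateConfig(const std::string &fresh)
{
    //the writer builds a new version and publishes it; old readers
    //keep their old version alive until they drop the reference.
    std::atomic_store(&config, std::make_shared<const std::string>(fresh));
}

int main()
{
    std::shared_ptr<const std::string> seen = readConfig();
    updateConfig("v2");
    std::cout<<"reader still sees:"<<*seen<<" current:"<<*readConfig();
    return 0;
}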

Wrapping long operations in threads:

A complex server application interacts with many external servers during its course of execution; it might talk to an accounting server, an authorization server, or any other server. External servers may sit on different machines in the LAN and might take a while to respond to a request, depending on the load and the network traffic. If we call into the external server from the main thread, the main thread is blocked until the response arrives, and our server cannot afford to block as it degrades the overall system throughput. All blocking operations like disk access, external server access etc. should be performed inside a native thread, and the thread should be assigned a callback to be performed after it completes its task. Thread pools should be used to dispatch such jobs. C++11 provides a feature called std::async which does exactly this: it runs the given function inside a native thread. Alternatively, several libraries provide thread pools for this purpose.
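
A sketch with std::async; fetchBalance here stands in for any blocking call to an external server:

#include <future>
#include <string>
#include <iostream>

//stands in for a blocking request to an external accounting server.
static std::string fetchBalance(std::string user)
{
    return user + ":100.00";
}

int main()
{
    //std::launch::async forces execution on a separate native thread.
    std::future<std::string> f =
            std::async(std::launch::async, &fetchBalance, std::string("john"));
    //...the main thread keeps serving other requests here...
    std::cout<<"balance "<<f.get(); //collect the result when needed.
    return 0;
}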

Logging:
The logging library being used should be thread safe, and log messages should be prefixed with a thread id.

References:
http://www.gotw.ca/publications/concurrency-ddj.htm
http://en.wikipedia.org/wiki/Mach_%28kernel%29
http://www.1024cores.net/
http://rethinkdb.com/blog/lock-free-vs-wait-free-concurrency/
http://concurrencyfreaks.blogspot.in/2013/05/lock-free-and-wait-free-definition-and.html
http://stackoverflow.com/questions/4211180/examples-illustration-of-wait-free-and-lock-free-algorithms
https://www.justsoftwaresolutions.co.uk/threading/non_blocking_lock_free_and_wait_free.html
http://www.boost.org/doc/libs/1_57_0/doc/html/lockfree.html
http://lwn.net/Articles/262464/


Building distributed systems IV: a template for microservices and more

by rk on 05/12/2016

I was just wondering how to name this post; for now I am not able to think of anything better. Microservices seem to be all the rage these days. I guess I cannot do a better job than Martin Fowler (thanks for a wonderfully written article), who describes them at an architectural level. In this article I will try to develop a template for a microservice and show how you can quickly build one using existing libraries and frameworks. I will be focusing on UNIX based systems.

Microservices:
Microservices were always there; it is the classic UNIX philosophy where you design your system as a group of processes exchanging data with each other, every process designed to do a single well defined task. In the UNIX architecture the output of one process is sent to the input of another process through a PIPE. This is good for text processing systems, but today's systems are more complex and have complicated requirements; a server can be multithreaded and can be talking to many different external servers on behalf of a single request.

Advantages of microservice pattern:
It is always good to have modular code rather than one huge pile. Modular code is easy to maintain and debug.

Loose coupling:
Parts of the system will be loosely coupled and there will be fewer dependencies between subsystems; it is advised to prefer loose coupling to tight coupling when designing systems.

Memory protection:
The operating system enforces a boundary between the processes in the system; processes cannot step on each other's memory, and a crash is contained in a particular process without impacting other processes. The system as a whole keeps progressing.

Multicore:
Splitting the system into multiple processes improves the utilization of the extra cores on the hardware.

Programming language independence:
The development team need not rely on expertise in one programming language; different services can be implemented in different programming languages. Development is accelerated when there is programming language independence.


Remote procedure calls:
In the modern architecture we replace PIPEs with remote procedure calls; a service calls into another service if it needs a piece of data or some functionality. A remote procedure call involves marshalling the parameters of the request into a flat buffer and sending it to the provider service; the provider unmarshalls the message and executes the requested operation, and depending on the result of the operation the provider sends back data or an error.

Libraries like apache thrift make it super easy to implement remote procedure call based services; the library provides many different servers, ranging from a simple one to a thread pooled nonblocking server.


Service channel:
A service channel is the channel on which a service accepts requests and sends responses back to the callers. It is the main channel through which most of the data flows in and out of the service. On unix based systems, stream based unix domain sockets can be used to implement this channel. If you are developing a mission critical application, you can use posix message queues to decouple the transport from the service; incoming messages can stay queued while the service is recovering from a crash or going through an upgrade.
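
A sketch of the server end of such a channel using a stream unix domain socket; the socket path is arbitrary and error handling is omitted for brevity:

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstring>

int main()
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/myservice.sock", sizeof(addr.sun_path) - 1);
    unlink(addr.sun_path); //remove a stale socket from a previous run.
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 16);
    int client = accept(fd, nullptr, nullptr); //one caller, for brevity.
    char buf[512];
    ssize_t n = read(client, buf, sizeof(buf));
    if(n > 0) write(client, buf, n); //echo the request back as a response.
    close(client);
    close(fd);
    return 0;
}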

Control channel: 
A node level service is required to monitor the health of every service running on the node; let us call it the service manager. The service manager often needs to talk to the services for different reasons. For example, it can send health check messages to see if a service is running fine, or it can send an upgrade message to a service when there is a software update. To avoid mixing these with the service messages, we keep a separate channel for sending control messages to the service.

Event channel:
Events happen all the time in systems: a node may arrive, a node may go down, a new user may log in, a user may log out. Services need to be aware of the events happening in the system so that they can make better decisions. For example, a load balancer can start assigning connections to a new node, or a high availability manager can initiate a switch over on detecting a node departure. It is good to maintain a channel for events separate from the control and service channels.

Timers:
We live in time and so do the services; services need to keep track of different activities in the system. A health manager maintains timers for each service in the system, a connection timer is maintained to check activity on a connection, and a response timer is started after sending a request. Timers fire when their interval runs out, and callbacks are executed when timers fire.
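
A sketch of such a timer using boost::asio; the 5 second health check interval is made up:

#include <boost/asio.hpp>
#include <iostream>

int main()
{
    boost::asio::io_service svc;
    boost::asio::deadline_timer healthTimer(svc, boost::posix_time::seconds(5));
    //the callback is executed when the interval runs out.
    healthTimer.async_wait([](const boost::system::error_code &ec){
            if(!ec) std::cout<<"health check timer fired.";
            });
    svc.run(); //the dispatch loop drives the timers.
    return 0;
}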

Messaging:
Communication in a cluster is hard and expensive; as the scale of the cluster increases, so does the latency of a message. Care should be taken to limit communication with entities on other nodes in the cluster. All messages should be ordered if you are following the replicated state machine approach, as the events should be delivered to the active and standby services in the exact same order. There should be a node level messaging manager which is responsible for delivering messages to other nodes in the cluster. If you are sending huge data between nodes in the cluster, then you are doing it wrong; messages in the cluster should be short and reserved for important things. A simple topic based publish/subscribe mechanism will suffice for communicating important updates to other nodes; products like hazelcast, which include inbuilt publish-subscribe, can be used.

Complex things like distributed locking, semaphores and reliable publish-subscribe mechanisms can be built on top of reliable ordered group messaging.

Service monitoring:
Services crash all the time, and there can be a variety of reasons: badly implemented code, running out of resources through leaked memory or file descriptors etc. Services need to be monitored for their health; a node level process keeps track of the health of every service running on the node. A periodic health check message is sent to each service, and if the service is running fine it returns a response to the service manager. If the service does not respond in a stipulated time, it is terminated and restarted by the service manager.

Pairing services:
Not all applications are the same; some are highly time critical, and the impact of a mail server crash is different from the crash of a trading application. Some applications are so critical to the business that you cannot afford to be out of service even for a brief period of time; for example, a trading application crash might result in the loss of millions of dollars even if it is unavailable for a few seconds, and a mission control application crash might result in loss of lives. Such services need to be highly available to avoid business disruption. Pairs of services running on different nodes, possibly on different LAN segments, can be paired to protect each other; as soon as a service crashes for some reason, its twin can start serving requests from there on.

Shared data:
In a distributed system there is always data of a global nature; global data, as the name says, is global to the whole system. There can also be data which is local to the node but shared between the services running on it. Data of this nature can be put into a run time cache shared between the services; this avoids unnecessary interprocess communication and messaging for frequently accessed data. The cache can be constructed in shared memory and read only access provided to the services, guarded by a lock. Boost C++ libraries provide the boost interprocess module for implementing data structures in shared memory. Updates to the common data can result in messages to other nodes.

Links:

https://thrift.apache.org/

http://hazelcast.org/

http://martinfowler.com/articles/microservices.html

http://opensource.com/business/14/12/containers-microservices-and-orchestrating-whole-symphony


Creating data structures in shared memory with boost interprocess

by rk on 05/12/2016

Boost interprocess:
In this edition of the boost tutorial series, I am going to demonstrate the features of the boost interprocess library. Boost interprocess is another powerful library from boost which needs to be present in a system builder's toolkit. Boost interprocess provides facilities for communication and synchronization between unrelated processes in the system: semaphores, message queues and shared memory. All these facilities can already be found in any unix/windows operating system, but boost abstracts away the idiosyncrasies and semantic differences between operating systems, and we get the added benefit of writing the code once and running it on different operating systems. One more powerful feature of boost interprocess is that it provides facilities to create data structures in shared memory.

Shared memory backgrounder:
Shared memory is the fastest means of interprocess communication between 2 processes on unix: a common memory region is mapped into both process address spaces, and after the mapping is complete, the memory can be written like any other regular memory location. But it is different from regular memory in that 2 processes might be accessing the shared memory simultaneously; while one process is reading from the memory, another might be writing to it. To avoid inconsistent behavior, every piece of shared memory needs to be protected by a lock.

On POSIX based systems there are a variety of ways in which processes can share a lock.

Semaphores as a locking mechanism:
Semaphores are system wide data structures available to all the processes in the system; they reside in the kernel, and any read/write access to them goes through the kernel. Semaphores persist even if the process which created them crashes. Semaphores can be counting or binary; for a simple lock, a binary semaphore can be used.
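
A sketch of a binary POSIX named semaphore used as a cross process lock (link with -lrt on older systems); the semaphore name is arbitrary:

#include <semaphore.h>
#include <fcntl.h>

int main()
{
    //a binary semaphore (initial value 1) shared system wide by name.
    sem_t *lock = sem_open("/session_lock", O_CREAT, 0660, 1);
    sem_wait(lock); //acquire: value drops to 0, other processes block.
    //...touch the shared resource...
    sem_post(lock); //release: value goes back to 1.
    sem_close(lock);
    return 0;
}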

Creating mutexes in shared memory:
The POSIX standard defines a way to create mutexes in shared memory: the mutex is created with the regular pthread_mutex calls, but the flag PTHREAD_PROCESS_SHARED should be set on the mutex attributes while initializing it. The memory being used for the mutex should come from the shared memory region.
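
A sketch of the attribute setup; the mutex must live in memory mapped by all the participating processes, here an anonymous shared mapping inherited across fork() for illustration:

#include <pthread.h>
#include <sys/mman.h>

int main()
{
    //place the mutex in memory shared across processes.
    pthread_mutex_t *m = (pthread_mutex_t *)mmap(nullptr, sizeof(pthread_mutex_t),
            PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    //the flag that makes the mutex usable from different processes.
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    //children created with fork() can now lock/unlock m.
    return 0;
}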

File locking using flock():
File locking can be used between processes: a file is created with a well known name and locked using the flock() system call. Lots of daemons do this to avoid multiple instances of themselves running. A file lock can be shared or exclusive, and a file lock is unlocked when the holder process dies.
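
A sketch of the single instance pattern with flock(); the lock file path is arbitrary:

#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("/tmp/mydaemon.lock", O_CREAT | O_RDWR, 0660);
    //LOCK_NB makes the call fail instead of blocking if another
    //instance already holds the exclusive lock.
    if(flock(fd, LOCK_EX | LOCK_NB) < 0){
        fprintf(stderr, "another instance is running, exiting.\n");
        return -1;
    }
    //...daemon work; the lock vanishes automatically if we die...
    flock(fd, LOCK_UN);
    close(fd);
    return 0;
}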

While there are many options available, the efficiency of the locking mechanism depends very much on the operating system; a detailed discussion is out of the scope of this post.

Creating data structures in shared memory:
Creating data structures in shared memory has advantages. Unlike with threads, data structures in shared memory can be accessed and modified by unrelated processes in the system. There is no marshalling/unmarshalling and no transfer of data between the processes. Processes must cooperate with each other on the use of the shared memory; there should be a well defined protocol between the participating processes. While sharing memory between processes has its advantages, it also comes with its share of disadvantages: all the problems of threading are inherited by shared memory. I will discuss some problems and their solutions in the sections below.

Typical use cases of shared memory datastructures:
Message queues can be implemented in shared memory for sharing data between processes. This is much faster and more efficient as the data need not go through the kernel. You just need a buffer scheme like mbuf or skbuff, but with the buffers coming from shared memory: write the data to a shared memory buffer and send the address of the buffer to the receiving process.

Databases can keep the data read from the persistent store in shared memory and share those structures read only with all the processes that have opened the database. This is much faster than sharing the data by messaging.

Run time information to be shared between different processes in the system can be put in shared memory. For example, every server maintains session information for all the logged in users; quite often this information needs to be accessed by other services as well. To avoid a copy per daemon, it can be kept in shared memory and read only access provided to all the daemons.

What are these names ?
You might have observed already that every data structure has a name. Shared memory is mapped at different addresses in different processes, so the address of a variable differs from process to process depending on the mapping address. While we could map the shared memory at the same address in every process, that is not the right way to do things. To overcome this randomness, the library maintains the address of each variable relative to the mapping address, indexed by its name.

Translation of addresses:
We all know that linked data structures like trees, lists etc. maintain pointers to their next/child nodes as links. But the pointers for shared memory data structures are a bit different: they cannot be dereferenced directly, because the mapping address differs from process to process. They need to be translated relative to the local mapping before a process can use them. The translation is very simple: take the offset from the mapping address and add it to the local mapping address of the process. Boost defines a smart pointer called offset_ptr which does exactly this. I have provided a boost link in the references for more details on the offset_ptr implementation.
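
A small sketch of offset_ptr in action; the segment and object names are made up, and the compile line is the same as for the session store example below:

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/offset_ptr.hpp>
#include <boost/interprocess/shared_memory_object.hpp>
#include <iostream>

using namespace boost::interprocess;

int main()
{
    managed_shared_memory seg(open_or_create, "xlate_demo", 4096);
    int *value = seg.find_or_construct<int>("value")(42);
    //an offset_ptr stores the distance to the pointee instead of a raw
    //address, so it stays valid at whatever address the segment is mapped.
    offset_ptr<int> *link = seg.find_or_construct<offset_ptr<int>>("link")();
    *link = value;
    std::cout<<"value through offset_ptr:"<<**link;
    shared_memory_object::remove("xlate_demo");
    return 0;
}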

Problems with data structures in shared memory:
A rogue or badly implemented process can corrupt the shared memory and lead to crashes of other sane, stable processes in the system. The problems pertaining to random behavior apply to the shared memory approach as well: one process can corrupt the memory, and a different process which accesses the memory later might crash. This reduces the debuggability of the system in the event of a crash.

Process crashing while holding the lock:
When a process crashes holding the lock to shared memory, it stalls the progress of the other processes in the system. Robust mutexes are provided to prevent this problem. All the locks should be timed, and a mechanism needs to be present to verify the consistency and integrity of the data structures in shared memory; this can be triggered after detecting a crash.

Robust mutexes:
Robust mutexes return an error to the next locker if the holder crashes while holding the lock; some operating system kernels support this, and Linux has implemented robust mutexes. Boost interprocess has no support for robust mutexes at the time of writing this article.

Wait free and lock free algorithms:
Similar to multithreading, wait free and lock free algorithms can be implemented on top of shared memory as well.

Code:
In this example we will see how to use boost interprocess and other locking primitives to store the session information of a hypothetical user. I will show how to store strings in shared memory and use read/write locks to access the information in the data structure. The session data store returns the name of a user given the id of the user.

#include <boost/interprocess/sync/sharable_lock.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_upgradable_mutex.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/map.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/string.hpp>
#include <functional>
#include <utility>
#include <iostream>
#include <cassert>
#include <string>

using namespace boost::interprocess;

static boost::interprocess::permissions perms(0660);
static managed_shared_memory ocache(open_or_create, "session_info", 1024*512, nullptr, perms);

//Typedefinitions for the character and string allocation in the shared memory.
typedef boost::interprocess::allocator<char, boost::interprocess::managed_shared_memory::segment_manager> shmCharAllocatorT;
typedef boost::interprocess::basic_string<char, std::char_traits<char>, shmCharAllocatorT> shmStringT;
typedef boost::interprocess::allocator<shmStringT, boost::interprocess::managed_shared_memory::segment_manager> shmStringAllocatorT;

typedef std::pair<const int, shmStringT> nameTupleT;
typedef boost::interprocess::allocator<nameTupleT, managed_shared_memory::segment_manager> nameTupleAllocatorT;
typedef map<int, shmStringT, std::less<int>, nameTupleAllocatorT> nameMapT;
static boost::interprocess::named_upgradable_mutex unameMapMutex(boost::interprocess::open_or_create, "uname_cache_lock");

static shmStringAllocatorT shmStringAllocator(ocache.get_segment_manager());
static nameTupleAllocatorT nameTupleAllocator(ocache.get_segment_manager());
static nameMapT *unameMap = nullptr; //xlate uid to uname

void
createUnameMap(void)
{
	assert(!unameMap);
	try 
	{
        unameMap = ocache.find_or_construct<nameMapT> ("unameMap")(std::less<int>(), nameTupleAllocator);
	}
	catch(boost::interprocess::interprocess_exception &ex)
	{
		std::cerr<<__FUNCTION__<<"() caught boost interprocess exception."<<ex.what(); 
		throw ex;
	}
	catch(std::exception &ex)
	{
		std::cerr<<__FUNCTION__<<"() caught standard exception."<<ex.what(); 
		throw ex;
	}
	return;
}

void 
addUnameAndUid(int uid, std::string _uname)
{
	try 
	{
        boost::interprocess::scoped_lock<boost::interprocess::named_upgradable_mutex> lock(unameMapMutex);
		if (!unameMap) createUnameMap();
		shmStringT uname(_uname.c_str(), shmStringAllocator);
		unameMap->insert(nameMapT::value_type(std::make_pair(uid, uname)));
	}
	catch(boost::interprocess::interprocess_exception &ex)
	{
		std::cerr<<__FUNCTION__<<"() caught boost interprocess exception."<<ex.what(); 
		throw ex;
	}
	catch(std::exception &ex)
	{
		std::cerr<<__FUNCTION__<<"() caught standard exception."<<ex.what(); 
		throw ex;
	}
	return;
}

std::string
getUnameForUid(int uid)
{
	try 
	{
        boost::interprocess::sharable_lock<boost::interprocess::named_upgradable_mutex> lock(unameMapMutex);
        if (!unameMap) createUnameMap();
        std::string uname;
        //map lookup by key instead of a linear scan.
        nameMapT::iterator itr = unameMap->find(uid);
        if (itr != unameMap->end())
            uname = itr->second.c_str();
        return uname;
	}
	catch(boost::interprocess::interprocess_exception &ex) 
	{
		std::cerr<<__FUNCTION__<<"() caught boost interprocess exception."<<ex.what(); 
		throw ex;
	}
	catch(std::exception &ex)
	{
		std::cerr<<__FUNCTION__<<"() caught standard exception."<<ex.what(); 
		throw ex;
	}
}

int main(int ac, char **av)
{
    int id = 1000;
    std::string name = "john";
    addUnameAndUid(id, name);
    std::cout<<"name of id:"<<id<<" is:"<<getUnameForUid(id);
    return 0;
}

Compile with:

clang++ -std=c++11  ocache.cc -I/usr/local/include -L/usr/local/lib -lboost_thread -lboost_system -lpthread -lrt -o ocache

Explanation of code:
String allocator:
Normal strings allocate memory from the process local malloc allocator when extra memory is needed, but the strings we create need to be present in shared memory, so we create a char allocator and then a string allocator; the string allocator lets the strings allocate from the shared memory segment when extra memory is needed.

Our name map has an integer as the key and a shared memory string data type as the value.

We create another allocator for the tuples used to construct the name map. Our data structure lives in shared memory, so we need a name to get a reference to it; we name it "unameMap".

boost::interprocess::named_upgradable_mutex provides a mutex which can be used as a read/write lock. Using a read/write lock improves the parallelism of the system by letting multiple readers navigate the data structure in parallel.

References:
http://www.boost.org/doc/libs/1_57_0/doc/html/interprocess/offset_ptr.html
http://www.boost.org/doc/libs/1_57_0/doc/html/boost/interprocess/named_upgradable_mutex.html


boost intrusive containers

by rk on 05/12/2016

All of us have used linked lists at one time or another in our careers; if you are working on low level software like networking stacks or a kernel, chances are you use them on a daily basis. The most familiar definition would be that objects are chained using link fields in them, where link fields are just references/pointers to objects of a similar type. Data structures take a different twist when we move from C to C++: the C++ programming landscape is ruled by the STL (standard template library). The STL provides container data structures to store objects, but the problem with the STL is that every object needs an extra copy and an extra heap allocation to be stored in a container; it is not the object itself but a copy of it that is stored in the container.

Boost intrusive containers bring the C style chaining of objects to C++; they provide generic algorithms for all the familiar data structures like singly linked lists, doubly linked lists and red-black trees to chain the objects.

In this post I will explain how to program with boost intrusive containers, using the definition of a sample class. Sample class objects will be chained in an rbtree structure; I will implement addition, deletion and fetching from the data structure.

Code:

#include <iostream>
#include <boost/intrusive/set.hpp>

class example : public boost::intrusive::set_base_hook<boost::intrusive::optimize_size<true>>
{
    int id;
    public:
    example(int _id): id(_id) { return; }
    ~example() { return; }
    friend bool operator < (const example &a, const example &b) { return a.id < b.id; }
    friend bool operator > (const example &a, const example &b) { return a.id > b.id; }
    friend bool operator == (const example &a, const example &b) { return a.id == b.id; }
    int getId(){ return id; }
};

typedef boost::intrusive::set<example, boost::intrusive::compare<std::greater<example>>> exampleTable;
exampleTable exTbl;

static void addEx(example &ex) { exTbl.insert(ex); } //set is ordered, so insert rather than push_back.
static void remEx(example &ex) { exTbl.erase(exampleTable::s_iterator_to(ex)); }

static example*
getEx(int id)
{
    for(exampleTable::iterator itr = exTbl.begin(); itr != exTbl.end(); itr++)
        if(itr->getId() == id) return &(*itr);
    return nullptr;
}

int main(int ac, char **av)
{
    example e1(10);
    addEx(e1);
    example *e2 = getEx(10);
    std::cout<<"id:"<<e2->getId();
    remEx(*e2);
    return 0;
}

References:

http://www.boost.org/doc/libs/1_57_0/doc/html/intrusive.html


what are extended attributes and how are they useful to linux programmers ?

by rk on 05/12/2016

Introduction:
Usually a linux file system stores the attributes of a file in a stat record. The stat record of the file is defined by POSIX as below (this is on linux; other systems might have more or fewer fields).

struct stat {
               dev_t     st_dev;     /* ID of device containing file */
               ino_t     st_ino;     /* inode number */
               mode_t    st_mode;    /* protection */
               nlink_t   st_nlink;   /* number of hard links */
               uid_t     st_uid;     /* user ID of owner */
               gid_t     st_gid;     /* group ID of owner */
               dev_t     st_rdev;    /* device ID (if special file) */
               off_t     st_size;    /* total size, in bytes */
               blksize_t st_blksize; /* blocksize for filesystem I/O */
               blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
               time_t    st_atime;   /* time of last access */
               time_t    st_mtime;   /* time of last modification */
               time_t    st_ctime;   /* time of last status change */
           };

The fields are all self explanatory and you can read them with the respective system calls (man 2 stat). For an ordinary user this information is more than enough; a typical user will be interested only in the last access fields and the size of the file.

But for system builders, there often arises a need to store some extra "user defined" information along with the file in the file system. Suppose a user is building a document management system and wants to provide a field "state" for each document; the state field can be "open", "locked", "review" etc. As we can see from the stat structure above, there is no field to store this piece of information.

Without extended attributes there is only one choice: store this field along with the file name in a database, where the database is indexed by the file name and the record can be retrieved. But there is a problem with this solution; every time the file system is altered, the database needs to be kept in sync with it.

Enter extended attributes. Extended attributes provide a facility to store a "blob" of information along with the file, in the file system itself. I will now show the power of extended attributes with an example.

In the below example I will store a user defined attribute of a file.

Enabling the extended attributes on the system:

Some systems do not ship the extended attribute tooling by default. On ubuntu you can install the tools and the development headers needed to build the example (package names may differ on other distributions):

sudo apt-get install attr libattr1-dev

APIs and code:

We will accept 2 arguments, the first one being the file name and the second being the state to attach to the file.

Save the code in a file and name it xattr.c:

//file name xattr.c
//----------------------------------------
#include <string.h>
#include <stdio.h>
#include <sys/types.h>
#include <attr/xattr.h>

int
main(int ac, char **av)
{
    if(ac < 3)
    {
        fprintf(stderr,"\nNeed 3 arguments to run: usage ./attr <file_name> <state='open','closed','locked'>");
        return -1;
    }
    int rc  = setxattr(av[1], "user.file.state", av[2], strlen(av[2]), 0);
    if(rc < 0)
    {
        perror("setxattr failed:"); 
        return -1; 
    }
    char buf[1024] = {'\0'};
    rc = getxattr(av[1], "user.file.state", buf, sizeof(buf));
    if(rc < 0)
    {
        perror("getxattr failed:"); 
        return -1; 
    }
    fprintf(stderr,"state: %s", buf);
    return 0;
}

build it with:

make xattr

run it with:

./xattr file_name "locked"

Note:

The names of user extended attributes on linux should always start with the "user." prefix, otherwise the file system will keep returning an error. Always ensure your attribute names look like "user.file.xxxx".


implementing circular queues using boost circular buffers.

by rk on 05/12/2016

In this post I will try to demonstrate the use of boost circular buffers. Circular buffers are generally called circular queues; circular queues are used in many different application scenarios like logging, protocol stacks etc. A circular buffer is a bounded buffer into which new items are added at the tail and old items are removed from the head. Due to the bounded nature of the circular buffer, the oldest items are dropped from the buffer when the queue is full.

Boost circular buffer is a template library and can store any item; the general usage of circular queues is to store references or pointers to items in the queue rather than the items themselves. This way we avoid an extra copy of the object and a dynamic allocation. Instead of creating a new data type I will store the standard library string data type.

Why smart pointers ?
I have used smart pointers in the examples; if we had used a regular pointer we would have caused a memory leak when the queue gets full, since the pointers are overwritten without invoking delete. We have to use a smart pointer like unique_ptr or shared_ptr; I will demonstrate examples using both.

Code using shared ptr:

#include <boost/circular_buffer.hpp>
#include <iostream>
#include <memory> //for std::shared_ptr
#include <string>

static boost::circular_buffer<std::shared_ptr<std::string>> cq(3);

int main(int ac, char **av)
{
    cq.push_back(std::make_shared<std::string>("hello"));
    cq.pop_front();
    return 0;
}

Code using unique_ptr:

#include <boost/circular_buffer.hpp>
#include <iostream>
#include <memory> //for std::unique_ptr
#include <string>

static boost::circular_buffer<std::unique_ptr<std::string>> cq(3);

int main(int ac, char **av)
{
    std::unique_ptr<std::string> p1(new std::string("hello"));
    cq.push_back(std::move(p1));
    cq.pop_front();
    return 0;
}

As you can see, I have to use std::move to store unique pointers in the queue. unique_ptr, as the name says, cannot be copied; there should always be a single owner of the pointer, so we use std::move to transfer ownership without copying. After the move operation the moved-from unique pointer should never be used.

Performance differences:
There is a slight performance difference between the shared_ptr and unique_ptr versions, with unique_ptr being more performant. This is due to the atomic counter inside shared_ptr which is used for reference counting; when the reference count becomes 0 the destructor is invoked and the memory is freed.

References:

http://www.boost.org/doc/libs/1_57_0/doc/html/circular_buffer.html


recursively walking directory with boost file system library

by rk on 05/12/2016

This post demonstrates how easy it is to list all the files in a directory and all its children by visiting them recursively. The boost recursive directory iterator recursively walks down all the child directories and visits each file. Using the boost file system library, developers can write portable software which runs on different operating systems and handles the file system differences; we know microsoft handles file names differently than the unix operating systems. The developer also need not worry about handling unicode file names on linux, the boost file system library handles that too.

To get a more detailed overview of the boost filesystem library and all its facilities, please do visit www.boost.org.

#include <iostream>
#include <boost/filesystem.hpp>

using namespace std;

int 
main(int ac, char **av)
{
    boost::filesystem::path _startAt(".");
    boost::filesystem::recursive_directory_iterator _walker(_startAt);
    while(_walker != boost::filesystem::recursive_directory_iterator()){
        std::cout<<"\n"<<_walker->path().string();
        ++_walker;
    }
    return 0;
}

Compile with: 

clang++ dir.cc -I/usr/local/include -L/usr/local/lib -lboost_filesystem -lboost_system -o dir

As always, I have omitted the error and exception handling for brevity. Please do not use the code as is in production.


How not to use bind ?

by rk on 05/12/2016

Arguably bind is one of the most powerful C++ features you will ever encounter. People who have programmed for a long time in the C programming language know how much pain it is to declare callbacks and pass them around. The problem with C style callbacks is that the number and types of arguments for the callback are fixed and, once declared, you cannot change them. Programmers get around this by using a void* pointer: every time, the programmer has to create a temporary structure, pack the arguments into a structure variable, and then pass the reference of that variable to the callback.

Bind just allows you to pass the arguments as is; it creates a temporary function object of a generic kind and passes that function object to the callback site. Instead of the programmer packing the arguments, the library does it for you. What we get at the end is clean, maintainable and understandable code.

What is wrong with bind ? 
There is nothing wrong with bind in general, but programmers can use it in a wrong way, leading to unpredictable bugs and errors. Using bind in a wrong way can lead to memory corruption and race conditions in case you are using it with threads. I will demonstrate how to use bind with a C++11 threading example, discuss the potential problems with the code, and show how to fix it.

Wrong way:

#include<iostream>
#include<functional>
#include<thread>
#include<unistd.h>

void threadFunc(std::string *sptr)
{
    std::cout<<*sptr;
    return;
}

int main(int ac, char **av)
{
    try
    {
        std::string *sptr = new std::string("hello world");
        std::thread T(std::bind(&threadFunc, sptr));
        T.join();
    }
    catch(std::exception &ex)
    {
        std::cerr<<"failed with exception:"<<ex.what();
    }
    return 0;
}

Explanation:
Not many people are aware of the implementation of bind: bind actually creates a generic function object and stores the arguments of the callback in it. Since we are using threads in our code, depending on how the scheduler interleaves the threads, the memory pointed to by the pointer could get deleted, or the pointer could be reassigned to another location.
If the memory pointed to by the pointer gets freed, dereferencing it in the thread corrupts the heap.
If the pointer gets reassigned, we are left with a memory leak.
These cases arise because we are storing a copy of the raw pointer in the bind object, which is passed to the callback at a later point in time. Sharing raw pointers between threads is always a terrible idea.

Right way:

#include<iostream>
#include<functional>
#include<thread>
#include<memory>
#include<unistd.h>

void threadFunc(std::shared_ptr<std::string> sptr)
{
    std::cout<<*sptr;
    return;
}

int main(int ac, char **av)
{
    try
    {
        std::shared_ptr<std::string> sptr = std::make_shared<std::string>(std::string("hello world"));
        std::thread T(std::bind(&threadFunc, sptr));
        T.join();
    }
    catch(std::exception &ex)
    {
        std::cerr<<"failed with exception:"<<ex.what();
    }
    return 0;
}

Explanation:
We fix the above code by replacing the raw pointer with a shared pointer. Shared pointers can be shared between threads; they maintain a reference count of the number of copies, and the memory is freed when the reference count becomes zero.

We could also have used std::unique_ptr instead of shared_ptr; the only difference would be that we need to move the object being passed to the thread routine. I will explain this in a later post.

Raw pointers in general are a bad idea and should be replaced with smart pointers; C++11 and boost provide a variety of smart pointers. Raw pointers should be used only when the scope of the pointer is very well defined and limited in nature.


setting up home directories on nfs server for ldap users

by rk on 05/12/2016

In my previous blog I discussed how to create ldap users on ubuntu. Creating just a user in ldap is not enough; we need to create a home directory for the user on the nfs server, otherwise the user will get a home directory on the client desktop and all of his data and configs will be stored on that computer. If he uses another client desktop he won't see the old files and configs on the new computer. This is not the right way to set up ldap users; the home directory should be available from all the client desktops, and there should be only one home directory created per user, on the nfs server.
So let's see how a home directory is created on the nfs server for each user.

 

NOTE:
Currently nfs4 does not work as documented on ubuntu 11.10; I ran into very deep trouble trying it and wasted many days. The idmapd daemon will just not work and there is very little documentation available for it. My advice is to stay away from nfs4.

Let's say all of our ldap users are in a group called “eng” with gid 9999.

On the server, do the following.
Become root with the “su” command:
#su
Create a directory called “/homes” on the nfs server with the command:
#mkdir /homes

Let's say we have a user “dick” with uid 1010.
Create a home directory for him in the “/homes/” folder and then, using the “chown” command, make “dick” the owner of the directory and its contents:

#mkdir /homes/dick
#chown -R 1010:9999 /homes/dick

Mark the folder “/homes” as exported by listing it in the /etc/exports file like below:

[root@helen /]# cat /etc/exports
/homes 192.168.0.2/16(fsid=0,rw,insecure,no_subtree_check,sync)

fsid=0 is required because the folder being exported is in the “/” directory.

Run the “exportfs” command with the “-r” option:
#exportfs -r

Now on the client side (ubuntu 11.10):

Set the nfs folder on the nfs server to be automounted on the client during bootup; this can be done by putting the details in the /etc/fstab file. Put a line like the one below in /etc/fstab:
192.168.0.2:/homes    /homes  nfs    nfsvers=3,sync 0 0

Before putting the line in /etc/fstab, first try to mount the nfs directory by hand with the command:
mount  192.168.0.2:/homes -t nfs -o nfsvers=3  /mnt/nfs
The /mnt/nfs folder should be created on the client machine if not already present.
Check whether there are any errors.

Now reboot the server and the client. After the user logs in, check the user's current directory with the “pwd” command, then try to create some files in the folder and write to them.
