IJRIT International Journal of Research in Information Technology, Volume 1, Issue 4, April 2013, Pg. 7-16

International Journal of Research in Information Technology (IJRIT)

www.ijrit.com

ISSN 2001-5569

Adroit Memory Allocator

1 Rudresh Bagade, 2 Sanjog Laddha, 3 Akshay Ukey, 4 Nilesh Diwate, 5 Prof. S.A. Tiwakar
UoP, India


Abstract

Adroit is a drop-in replacement for malloc() and other standard memory allocation functions, intended for use in persistent multithreaded applications that perform concurrent memory allocations. It was created after we observed very poor multithreaded performance from the stock threaded heap managers on Solaris, Linux and other popular UNIX and Unix-like operating systems. It is very stable and delivers high performance, although its memory and startup overhead make it less suitable for small, short-lived applications. On some platforms, Adroit's single-threaded performance even exceeds that of non-threaded stock malloc implementations.

1. Introduction

The basic purpose of a memory allocator is to allocate the amount of memory requested by a process, let that process take control of that memory area and perform its own computations on it, and finally release the allocated memory, which is called deallocation. Memory allocation performance in single-threaded and multithreaded environments is an important aspect of any application. Some allocators, such as malloc() in POSIX-compatible operating systems, work best with single-threaded applications. The goal of Adroit is to present a new memory allocator that builds on the state of the art to provide scalable concurrent allocation for multithreaded applications. Adroit is designed to provide scalable support for multiprocessor computer systems. The C library's malloc memory allocator is now a potential bottleneck for multithreaded applications running on multiprocessor systems. Existing serial memory allocators do not scale well for multithreaded applications, and existing concurrent allocators lack one or more of the following features, all of which are needed to attain scalable and memory-efficient allocator performance:

1.1 Speed. A memory allocator should perform memory operations (i.e., malloc and free) about as fast as a state-of-the-art serial memory allocator. This feature guarantees good allocator performance even when a multithreaded program executes on a single processor.

1.2 Scalability. As the number of processors in the system grows, the performance of the allocator must scale linearly with the number of processors to ensure scalable application performance.

1.3 False sharing avoidance. The allocator should not introduce false sharing of cache lines, in which threads on distinct processors inadvertently share data on the same cache line.

1.4 Low fragmentation. We define fragmentation as the maximum amount of memory allocated from the operating system divided by the maximum amount of memory required by the application. Excessive fragmentation can degrade performance by causing poor data locality, leading to paging.
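To make requirement 1.3 concrete, the following C11 sketch pads per-thread data so that each thread's counter occupies its own cache line; this is the same idea an allocator applies when it avoids handing neighbouring bytes of one cache line to different threads. The 64-byte line size, the padded_counter type and the run_counters helper are assumptions for illustration only (64 bytes is common on x86-64 but not universal); this is not code from Adroit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumed cache-line size; 64 bytes is typical on x86-64 but not universal. */
#define CACHE_LINE 64
#define NTHREADS 4

/* Hypothetical per-thread counter aligned (and therefore padded) to a full
   cache line, so counters written by different threads never share a line. */
typedef struct {
    _Alignas(CACHE_LINE) unsigned long value;
} padded_counter;

static padded_counter counters[NTHREADS];

/* Worker: increments only its own counter, i.e. only its own cache line. */
static void *bump(void *arg) {
    padded_counter *c = arg;
    for (int i = 0; i < 1000000; i++)
        c->value++;
    return NULL;
}

/* Runs NTHREADS workers, each on its own padded counter; returns the total. */
unsigned long run_counters(void) {
    pthread_t tids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        counters[i].value = 0;
        int rc = pthread_create(&tids[i], NULL, bump, &counters[i]);
        assert(rc == 0);
        (void)rc;
    }
    unsigned long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tids[i], NULL);
        total += counters[i].value;
    }
    return total;
}
```

Without the alignment, all four 8-byte counters would fit in a single 64-byte line, and every increment by one thread would invalidate that line in the other cores' caches even though no data is logically shared.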

2. Proposed work

Adroit is a drop-in replacement for malloc() and other standard memory allocation functions, intended for use in persistent multithreaded applications that perform concurrent memory allocations. The algorithmic strategy chosen during the project's design phase is one of the most important factors in the development cycle. Because an algorithm is a precise list of precise steps, the order of computation is always critical to its functioning. Two strategies are used:

2.1 Divide and Conquer
2.2 Best Fit

2.1 DIVIDE AND CONQUER
A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same (or related) type until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem. Divide and conquer algorithms are naturally suited to execution on multiprocessor machines, especially shared-memory systems where the communication of data between processors does not need to be planned in advance, because distinct sub-problems can be executed on different processors.

2.2 BEST FIT
Best fit allocates the smallest free block among those that are large enough for the new process. With this method, the OS has to search the entire list, or it can keep the list sorted by size and stop at the first entry whose size is at least the size of the new process. This algorithm produces the smallest leftover block. However, it requires more time, either to search the entire list or to keep it sorted. If the list is kept sorted, merging the area released when a process terminates with neighboring free blocks becomes complicated.
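The best-fit search described in 2.2 can be sketched over an unsorted singly linked free list as follows. The free_block layout and the best_fit_take function are hypothetical names chosen for this illustration, not Adroit's internal API; the sketch stops early on an exact fit, since no smaller block can beat it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-list node describing one free block of `size` bytes. */
typedef struct free_block {
    size_t size;
    struct free_block *next;
} free_block;

/* Best fit over an unsorted list: scan the blocks and keep the smallest one
   whose size is at least `request`. The chosen block is unlinked from the
   list (updating *head if needed) and returned; NULL means nothing fits. */
free_block *best_fit_take(free_block **head, size_t request) {
    assert(head != NULL);
    free_block **best_link = NULL;          /* link that points at the best block */
    for (free_block **link = head; *link != NULL; link = &(*link)->next) {
        size_t sz = (*link)->size;
        if (sz >= request && (best_link == NULL || sz < (*best_link)->size))
            best_link = link;
        if (sz == request)
            break;                          /* an exact fit cannot be beaten */
    }
    if (best_link == NULL)
        return NULL;
    free_block *best = *best_link;
    *best_link = best->next;                /* unlink the chosen block */
    best->next = NULL;
    return best;
}
```

For a free list holding blocks of 100, 40, 64 and 300 bytes, a 50-byte request takes the 64-byte block. Returning the block whole keeps the sketch short; a real allocator would normally split off the unused remainder and put it back on the free list.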

Fig 1. Standard approaches by simple allocator [diagram: one Memory block and four CPUs]


Fig 2. Our approach with the Adroit allocator

3. Literature Survey

In this phase, we had to become thoroughly familiar with memory management and its various techniques. It was necessary to understand how a memory allocator works, how it is implemented, and which functions must be implemented to make it work. There are many custom multithreaded memory allocators, such as Hoard, jemalloc, ptmalloc, phkmalloc and dlmalloc. However, several of them were designed at a time when multiprocessor systems were rare and support for multithreading was spotty. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. After studying various allocators, we concluded that a much more efficient memory allocator was needed. This led us to use the thread library, to add a global heap alongside the global pool, and to adopt fine-grained locking based on the size of the memory requested by a thread. Using a single-threaded malloc() in a multithreaded application can degrade performance. When memory is allocated concurrently in multiple threads, all the threads must wait in a queue while malloc() handles one request at a time. Even a few extra threads can slow performance this way, causing a problem known as heap contention. System libraries implement various approaches to ease the bottleneck of a single-threaded malloc().
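Fine-grained locking by requested size, as mentioned above, can be illustrated with a minimal sketch in which each size class has its own mutex and free list, so threads requesting different sizes do not queue on a single heap lock. The class boundaries (16 to 2048 bytes), the block_header layout and the names sc_malloc and sc_free are assumptions for this example; it draws fresh memory from the system malloc() and never returns class blocks to the system, and it does not model Adroit's global heap or global pool.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical size classes: 16, 32, 64, ..., 2048 bytes. Larger requests
   bypass the classes and go straight to the system allocator. */
#define NUM_CLASSES 8
#define MIN_CLASS 16

typedef struct block_header {
    int cls;                        /* size class index, or -1 for a large block */
    struct block_header *next;      /* free-list link while the block is free */
} block_header;

typedef struct {
    pthread_mutex_t lock;           /* one lock per size class */
    block_header *free_list;
} size_class;

#define CLASS_INIT { PTHREAD_MUTEX_INITIALIZER, NULL }
static size_class classes[NUM_CLASSES] = {
    CLASS_INIT, CLASS_INIT, CLASS_INIT, CLASS_INIT,
    CLASS_INIT, CLASS_INIT, CLASS_INIT, CLASS_INIT
};

/* Maps a request size to its class index, or -1 if it is too large. */
static int class_of(size_t n) {
    size_t cap = MIN_CLASS;
    for (int i = 0; i < NUM_CLASSES; i++, cap *= 2)
        if (n <= cap)
            return i;
    return -1;
}

void *sc_malloc(size_t n) {
    int c = class_of(n);
    if (c < 0) {                            /* large block: no class lock needed */
        block_header *big = malloc(sizeof *big + n);
        if (big == NULL) return NULL;
        big->cls = -1;
        return big + 1;
    }
    size_class *sc = &classes[c];
    pthread_mutex_lock(&sc->lock);          /* only this size class is locked */
    block_header *h = sc->free_list;
    if (h != NULL)
        sc->free_list = h->next;            /* reuse a previously freed block */
    pthread_mutex_unlock(&sc->lock);
    if (h == NULL) {                        /* class empty: get fresh memory */
        h = malloc(sizeof *h + ((size_t)MIN_CLASS << c));
        if (h == NULL) return NULL;
    }
    h->cls = c;
    return h + 1;                           /* user memory follows the header */
}

void sc_free(void *p) {
    if (p == NULL) return;
    block_header *h = (block_header *)p - 1;
    assert(h->cls >= -1 && h->cls < NUM_CLASSES);
    if (h->cls < 0) { free(h); return; }
    size_class *sc = &classes[h->cls];
    pthread_mutex_lock(&sc->lock);
    h->next = sc->free_list;                /* push onto this class's free list */
    sc->free_list = h;
    pthread_mutex_unlock(&sc->lock);
}
```

A 24-byte request and a 30-byte request both fall into the 32-byte class, so freeing the first and then making the second returns the same block, while a thread allocating 256-byte objects locks a different mutex and never waits on them.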

Paper referred: An Optimal Memory Allocation for Application-Specific Multiprocessor System-on-Chip (2001)
Description: This paper focuses on a memory allocation step based on an integer linear programming model. The effectiveness of the approach is illustrated with a packet routing switch example. Keywords: multiprocessor SoC, shared memory, abstraction levels, memory allocation, integer linear programming, code transformation. The paper deals with SoCs. Since some variables in the model are Boolean, the resolution step can be slow depending on the number of such variables.

Paper referred: Developing Multithreaded Applications: A Platform Consistent Approach (2009)
Description: The objective of this paper is to provide guidelines for developing efficient multithreaded applications across Intel-based symmetric multiprocessors (SMP) and/or systems with Hyper-Threading Technology.

Paper referred: Hoard: A Scalable Memory Allocator for Multithreaded Applications (2000)
Description: Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. The authors note that although their hashing method has so far proven to be an effective mechanism for assigning threads to heaps, they plan to develop an efficient method that can adapt to the situation when two concurrently executing threads map to the same heap.

Table 1
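The Hoard entry in Table 1 mentions assigning threads to per-processor heaps by hashing. A minimal C11 sketch of such a mapping is shown below: each thread receives a small id from an atomic counter on first use, and a multiplicative hash maps that id onto one of NUM_HEAPS heaps. The counter-based id, the hash constant and the heap count are assumptions made for this illustration; it is not Hoard's or Adroit's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_HEAPS 8                          /* assumed number of per-processor heaps */

static atomic_uint next_thread_id = 0;
static _Thread_local unsigned my_thread_id = 0;   /* 0 means "not assigned yet" */

/* Gives each thread a small, stable id the first time it asks. */
static unsigned current_thread_id(void) {
    if (my_thread_id == 0)
        my_thread_id = atomic_fetch_add(&next_thread_id, 1) + 1;
    return my_thread_id;
}

/* Hashes the calling thread's id onto one of NUM_HEAPS heaps. Two threads can
   still land on the same heap, which is the case the Hoard authors mention. */
unsigned heap_index_for_current_thread(void) {
    uint32_t h = (uint32_t)current_thread_id() * 2654435761u;  /* multiplicative hash */
    unsigned idx = (unsigned)((h >> 16) % NUM_HEAPS);           /* use higher-order bits */
    assert(idx < NUM_HEAPS);
    return idx;
}
```

Because the id is cached in thread-local storage, a thread keeps using the same heap for all of its allocations, which is what lets per-heap locks stay mostly uncontended.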

4. Functional Requirements

The functional components of Adroit are:
Header files
4.1 Supporting C files
4.2 Make file

The header files contain the function declarations. When a multithreaded application includes the appropriate header file, it can invoke the functions declared there, whose definitions are in the corresponding C files. This split between .h and .c files carries out memory allocation and deallocation efficiently. In doing so, the system uses mutexes and other locks from the pthread.h library, together with heaps of multiple size classes, to gain maximum efficiency. GNU Make is a utility that automatically builds executable programs and libraries from source code by reading files called Makefiles, which specify how to derive the target program. Make can decide where to start through topological sorting. Although integrated development environments and language-specific compiler features can also be used to manage the build process in modern systems, make remains widely used, especially on Unix-based platforms. Shell scripts are used to automate these jobs and make execution convenient.

5. Software Requirements

5.1 Operating System: Any POSIX-compliant system
5.2 GNU toolchain:
5.2.1 GNU make: automation tool for compilation and build
5.2.2 GNU Compiler Collection (GCC): suite of compilers for several programming languages
5.2.3 GNU Binutils: suite of tools including the linker, assembler and other tools
5.2.4 GNU Debugger (GDB): code debugging tool
5.3 Any capable shell (preferably Bash)
5.4 Cscope: Cscope is a developer's tool for browsing source code. It has an impeccable UNIX pedigree, having been originally developed at Bell Labs back in the days of the PDP-11. Cscope provides a console-mode, text-based interface that allows software engineers and developers to search source code.

6. Tracking and control mechanism

6.1 Quality Assurance and Control: A provision in the GNU toolchain setup detects bugs and automatically reports them to the developers responsible for fixing them. The reporting procedure is set up while processing and installing the Makefile.

6.2 Change Management and Control: If the management or control of the project changes, this report should be sufficient for new members, or members who have changed roles, to understand how the entire project works. Because every member's participation in every phase is important, all members are asked to read the report.

7. How to use

Using Adroit is simple. As a drop-in malloc() replacement, it is simply compiled into your application, and no code changes are necessary. Adroit implements the standard malloc API calls as thin wrappers around private functions; that is, malloc() in the Adroit library just calls ad_malloc(). Applications using Adroit also need to link in the POSIX pthreads library. On most Unix systems this simply involves linking with "-lpthread", but the exact procedure may vary from system to system.
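As an illustration of the "no code changes" point (our own example, not taken from the Adroit sources), the following ordinary pthreads program calls only the standard malloc() and free(). Under the drop-in model described above, it could be linked against Adroit unchanged, together with -lpthread. It also exercises the producer-consumer pattern mentioned in the literature survey: blocks are allocated in one thread and freed in another. The queue type and the run_producer_consumer helper are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define QCAP 16                               /* capacity of the bounded queue */

typedef struct {
    int *slots[QCAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue;

static queue q = { .lock = PTHREAD_MUTEX_INITIALIZER,
                   .not_empty = PTHREAD_COND_INITIALIZER,
                   .not_full = PTHREAD_COND_INITIALIZER };

static void put(int *item) {
    pthread_mutex_lock(&q.lock);
    while (q.count == QCAP)
        pthread_cond_wait(&q.not_full, &q.lock);
    q.slots[q.tail] = item;
    q.tail = (q.tail + 1) % QCAP;
    q.count++;
    pthread_cond_signal(&q.not_empty);
    pthread_mutex_unlock(&q.lock);
}

static int *get(void) {
    pthread_mutex_lock(&q.lock);
    while (q.count == 0)
        pthread_cond_wait(&q.not_empty, &q.lock);
    int *item = q.slots[q.head];
    q.head = (q.head + 1) % QCAP;
    q.count--;
    pthread_cond_signal(&q.not_full);
    pthread_mutex_unlock(&q.lock);
    return item;
}

static void *producer(void *arg) {
    int n = *(int *)arg;
    for (int i = 1; i <= n; i++) {
        int *item = malloc(sizeof *item);     /* allocated in the producer thread */
        if (item == NULL) abort();
        *item = i;
        put(item);
    }
    put(NULL);                                /* sentinel: no more items */
    return NULL;
}

static void *consumer(void *arg) {
    long *sum = arg;
    for (int *item; (item = get()) != NULL; ) {
        *sum += *item;
        free(item);                           /* freed in a different thread */
    }
    return NULL;
}

/* Runs one producer and one consumer over n items; returns the sum consumed. */
long run_producer_consumer(int n) {
    assert(n >= 0);
    long sum = 0;
    pthread_t p, c;
    if (pthread_create(&p, NULL, producer, &n) != 0) abort();
    if (pthread_create(&c, NULL, consumer, &sum) != 0) abort();
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

Nothing in this program refers to Adroit; whichever allocator supplies malloc() and free() at link time serves all of its requests.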

Fig 3

Fig 4

Fig 5

Fig 6

Fig 7

Fig 8

8. Future scope

In a future version, we have considered giving the process a thread dedicated to migrating idle memory to the global pool more aggressively. So far this does not seem likely to be of great use in most situations. The Adroit allocator aims to address the problems faced when using a single-threaded allocator such as malloc(). Memory can be divided into heaps of different sizes so that each request is served from a suitable heap, which relieves the bottleneck, and Adroit can handle multithreading by determining an appropriate order of execution based on heap size, priority and the time needed to execute each process.
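The idea of migrating idle memory to the global pool could look roughly like the step function below, which a dedicated background thread might call periodically. The pool layout, the keep threshold and the name migrate_idle are hypothetical, and the thread loop itself is omitted; this is a sketch of the future-work idea, not an existing Adroit feature.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical free-block node shared by local caches and the global pool. */
typedef struct node { struct node *next; } node;

typedef struct {
    pthread_mutex_t lock;
    node *head;
    size_t count;
} pool;

/* Moves free blocks from a thread's local cache to the global pool until at
   most `keep` remain locally; returns how many blocks were moved. Locks are
   always taken in the order local, then global, to avoid deadlock. */
size_t migrate_idle(pool *local, pool *global, size_t keep) {
    assert(local != global);
    size_t moved = 0;
    pthread_mutex_lock(&local->lock);
    pthread_mutex_lock(&global->lock);
    while (local->count > keep) {
        node *n = local->head;               /* pop from the local cache */
        local->head = n->next;
        local->count--;
        n->next = global->head;              /* push onto the global pool */
        global->head = n;
        global->count++;
        moved++;
    }
    pthread_mutex_unlock(&global->lock);
    pthread_mutex_unlock(&local->lock);
    return moved;
}
```

Keeping a few blocks locally (the keep threshold) preserves fast, uncontended reuse for the owning thread while still returning surplus memory for other threads to use.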

9. Conclusion

In this paper, we have introduced the Adroit memory allocator. Adroit improves on previous memory allocators by simultaneously providing four features that are important for scalable application performance: speed, scalability, false sharing avoidance, and low fragmentation. In addition, we show that Adroit's performance and fragmentation are robust with respect to its primary parameter, the empty fraction. Since scalable application performance clearly requires scalable architecture and runtime system support, Adroit takes a key step in this direction.

10. References

[1] M. Masmano, I. Ripoll, and A. Crespo (Universidad Politécnica de Valencia, Spain), “Dynamic storage allocation for real-time embedded systems”, IEEE.

[2] Samy Meftali, Ferid Gharsalli, Frederic Rousseau, and Ahmed A. Jerraya (TIMA Laboratory, Grenoble, France), “An Optimal Memory Allocation for Application-Specific Multiprocessor System-on-Chip”, ACM.

[3] “Developing Multithreaded Applications: A Platform Consistent Approach”, Intel.

[4] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson, “Hoard: A Scalable Memory Allocator for Multithreaded Applications”, IEEE.

[5] Yair Sadel, “Optimizing C Multithreaded Memory Management Using Thread-Local Storage”, School of Computer Science, Tel-Aviv University, Israel.

[6] Jason Evans, “A Scalable Concurrent malloc(3) Implementation for FreeBSD”, The FreeBSD Project.

[7] B. W. Kernighan and D. M. Ritchie, The C Programming Language, 2nd Edition, Prentice Hall Software Series.

[8] Peter van der Linden, Expert C Programming: Deep C Secrets, Prentice Hall, Englewood Cliffs, NJ.

[9] Eric Foster-Johnson, John C. Welch, and Micah Anderson, Beginning Shell Scripting, Wiley Publishing, Inc., Indianapolis, IN.

[10] Richard Stones and Neil Matthew, Beginning Linux Programming, 4th Edition, Wiley Publishing, Inc., Indianapolis, IN.

[11] Robert Love, Linux System Programming, O’Reilly Media, Inc., Sebastopol, CA.

[12] Daniel P. Bovet and Marco Cesati, Understanding the Linux Kernel, 3rd Edition, O’Reilly Media, Inc.

[13] Andrew Tanenbaum, Modern Operating Systems, Pearson Education, Inc. and Dorling Kindersley Publishing, Inc.

[14] Maurice J. Bach, The Design of the Unix Operating System, Pearson Education, Inc. and Dorling Kindersley Publishing, Inc.
