Python integer objects implementation

May 15, 2011

This article describes how integer objects are managed by Python internally.

An integer object in Python is represented internally by the structure PyIntObject. Its value is an attribute of type long.

typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;

To avoid allocating a new integer object each time a new integer object is needed, Python allocates a block of free unused integer objects in advance.

The following structure is used by Python to allocate integer objects, also called PyIntObjects. Once this structure is initialized, the integer objects are ready to be used when new integer values are assigned to objects in a Python script. This structure is called “PyIntBlock” and is defined as:

struct _intblock {
    struct _intblock *next;
    PyIntObject objects[N_INTOBJECTS];
};
typedef struct _intblock PyIntBlock;

When a block of integer objects is allocated by Python, the objects have no value assigned to them yet. We call them free integer objects ready to be used. A value will be assigned to the next free object when a new integer value is used in your program. No memory allocation will be required when a free integer object’s value is set so it will be fast.

The integer objects inside the block are linked together back to front using their internal pointer called ob_type. As noted in the source code, this is an abuse of this internal pointer so do not pay too much attention to the name.

Each block of integers contains the number of integer objects which can fit in a block of 1K bytes, about 40 PyIntObject objects on my 64-bit machine. When all the integer objects inside a block are used, a new block is allocated with a new list of integer objects available.

A singly-linked list is used to keep track of the integers blocks allocated. It is called “block_list” internally.

Python integer object internals

A specific structure is used to refer small integers and share them so access is fast. It is an array of 262 pointers to integer objects. Those integer objects are allocated during initialization in a block of integer objects we saw above. The small integers range is from -5 to 256. Many Python programs spend a lot of time using integers in that range so this is a smart decision.

#define NSMALLPOSINTS           257
#define NSMALLNEGINTS           5
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

Python integer object internals

The integer object representing the integer -5 is at the offset 0 inside the small integers array. The integers object representing -4 is at offset 1 …

What happens when an integer is defined in a Python script like this one?

>>> a=1
>>> a
1

When you execute the first line, the function PyInt_FromLong is called and its logic is the following:

if integer value in range -5,256:
    return the integer object pointed by the small integers array at the 
    offset (value + 5).
else:
    if no free integer object available:
        allocate new block of integer objects 
    set value of the next free integer object in the current block 
    of integers.
    return integer object

With our example: integer 1 object is pointed by the small integers array at offset: 1+5 = 6. A pointer to this integer object will be returned and the variable “a” will be pointing to that integer object.

Python integer object internals

Let’s a look at a different example:

>>> a=300
>>> a
300

300 is not in the range of the small integers array so the next free integer object’s value is set to 300.

Python integer object internals

If you take a look at the file intobject.c in the Python 2.6 source code, you will see a long list of functions taking care of operations like addition, multiplication, conversion… The comparison function looks like this:

static int
int_compare(PyIntObject *v, PyIntObject *w)
{
    register long i = v->ob_ival;
    register long j = w->ob_ival;
    return (i < j) ? -1 : (i > j) ? 1 : 0;
}

The value of an integer object is stored in its ob_ival attribute which is of type long. Each value is placed in a register to optimize access and the comparison is done between those 2 registers. -1 is returned if the integer object pointed by v is less than the one pointed by w. 1 is returned for the opposite and 0 is returned if they are equal.

That’s it for now. I hope you enjoyed the article. Please write a comment if you have any feedback.

tags:
posted in Uncategorized by Laurent Luce

Follow comments via the RSS Feed | Leave a comment | Trackback URL

8 Comments to "Python integer objects implementation"

  1. Andrea Bisognin wrote:

    interesting post!
    i find very fascinating to look at how my favourite language manage data structures.

  2. Alexi Zaviruha wrote:

    Great! Thanks!

  3. Michal Young wrote:

    Thanks for this and your other posts on Python data structure implementation. I’m teaching an intro CS course this term, using Python, and decided to spend a lecture on “opening up the machine” for my students. These articles are invaluable for my lecture prep.

  4. Kirill Elagin wrote:

    “The small integers range is from -5 to 257”. Not 257 but 256. Zero is counted as a positive integer.

  5. Laurent Luce wrote:

    @Kirill: Thanks! I corrected it.

  6. Chaobin wrote:

    Hi, nice article.

    Also, I was linked to this article with a search trying to find out how Python makes up the ability to represent huge integers.

  7. Ricky Stewart wrote:

    So looking at the PyInt_fromLong logic, I don’t see any way for the virtual machine to reuse integer objects. For example, if I refer to the literal number “300″ multiple times in my Python script, will the virtual machine re-create a “300″ integer object every time, or will it reuse the first 300 object I created? 300 is not within the span of the “small integer” array.

  8. Laurent Luce wrote:

    @Ricky: In the case of 300, it will use a free integer object from the linked-list each time.

Leave Your Comment

 
Powered by Wordpress and MySQL. Theme by Shlomi Noach, openark.org