<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Python dictionary implementation</title>
	<atom:link href="http://www.laurentluce.com/posts/python-dictionary-implementation/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.laurentluce.com/posts/python-dictionary-implementation/</link>
	<description>Technical blog on web technologies</description>
	<lastBuildDate>Tue, 16 Apr 2013 20:05:20 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: David</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-54935</link>
		<dc:creator>David</dc:creator>
		<pubDate>Sun, 27 Jan 2013 11:46:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-54935</guid>
		<description><![CDATA[@Andreas Paul and @Laurent Luce, thank you both. I know where my problem is.]]></description>
		<content:encoded><![CDATA[<p>@Andreas Paul and @Laurent Luce, thank you both. I know where my problem is.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shriram</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-52231</link>
		<dc:creator>Shriram</dc:creator>
		<pubDate>Thu, 17 Jan 2013 12:10:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-52231</guid>
		<description><![CDATA[Thank you so much, Laurent, for writing this post.
I have some knowledge of hash tables in Java, and I learnt about the techniques and their implementation via wiki and other sources.
Then I started using dict in Python.
It was good until I had my own question about how it&#039;s implemented.
I came across this post but didn&#039;t read it entirely.
I jumped straight to the comments, which contain excellent questions.
Thank you for patiently answering them!
It was a pleasure to read: enjoyable and very entertaining :)]]></description>
		<content:encoded><![CDATA[<p>Thank you so much, Laurent, for writing this post.<br />
I have some knowledge of hash tables in Java, and I learnt about the techniques and their implementation via wiki and other sources.<br />
Then I started using dict in Python.<br />
It was good until I had my own question about how it&#8217;s implemented.<br />
I came across this post but didn&#8217;t read it entirely.<br />
I jumped straight to the comments, which contain excellent questions.<br />
Thank you for patiently answering them!<br />
It was a pleasure to read: enjoyable and very entertaining <img src='http://www.laurentluce.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andreas Paul</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-31699</link>
		<dc:creator>Andreas Paul</dc:creator>
		<pubDate>Sat, 15 Sep 2012 13:37:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-31699</guid>
		<description><![CDATA[Okay, I can answer myself.
When I run my script on a 64bit machine it works just like it&#039;s supposed to. :)

The problem was indeed that I was running it on a 32-bit machine, where the hash function returns a negative number for the char &#039;z&#039;. That&#039;s the reason why perturb never reaches 0 and not every table slot is probed.

And the hash function returns a different number, because the char pointer size is 4 bytes on 32bit and 8 bytes on 64bit.]]></description>
		<content:encoded><![CDATA[<p>Okay, I can answer myself.<br />
When I run my script on a 64bit machine it works just like it&#8217;s supposed to. <img src='http://www.laurentluce.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>The problem was indeed that I was running it on a 32-bit machine, where the hash function returns a negative number for the char &#8216;z&#8217;. That&#8217;s the reason why perturb never reaches 0 and not every table slot is probed.</p>
<p>And the hash function returns a different number, because the char pointer size is 4 bytes on 32bit and 8 bytes on 64bit.</p>
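<p>The effect described above can be reproduced with a minimal Python sketch. It assumes the probing recurrence quoted elsewhere in this thread (i = (i &lt;&lt; 2) + i + perturb + 1, followed by perturb &gt;&gt;= 5); the helper names and the sample hash value -5 are hypothetical. Python, like a signed C integer on most platforms, shifts negative numbers arithmetically, so a negative perturb settles at -1 instead of 0. As far as I can tell, CPython&#8217;s dictobject.c avoids this by holding perturb in an unsigned size_t, which the sketch imitates by masking the hash to 32 bits.</p>

```python
def shifts_until_zero(perturb, shift=5, limit=100):
    """Count the right shifts needed for perturb to reach 0.

    Returns None if 0 is not reached within `limit` shifts.
    """
    for count in range(limit):
        if perturb == 0:
            return count
        perturb >>= shift
    return None


def probe_with_perturb(i, perturb, mask, steps):
    """Return the slots visited when perturb stays fixed (hypothetical helper)."""
    slots = []
    for _ in range(steps):
        i = (i << 2) + i + perturb + 1
        slots.append(i & mask)
    return slots


signed_hash = -5                          # hypothetical negative hash value
unsigned_hash = signed_hash & 0xFFFFFFFF  # same bits read as unsigned 32-bit

print(shifts_until_zero(signed_hash))     # a signed shift never reaches 0
print(shifts_until_zero(unsigned_hash))   # an unsigned shift does
print(probe_with_perturb(3, -1, 7, 4))    # perturb stuck at -1, 8-slot table
```

<p>With perturb stuck at -1 the recurrence reduces to i = 5 * i, which in an 8-slot table only alternates between slots 7 and 3.</p>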
]]></content:encoded>
	</item>
	<item>
		<title>By: Andreas Paul</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-31693</link>
		<dc:creator>Andreas Paul</dc:creator>
		<pubDate>Sat, 15 Sep 2012 11:34:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-31693</guid>
		<description><![CDATA[Hi Laurent,

Thanks for this great article; it really helps in understanding Python&#039;s dict implementation.

I tried to implement the probing algorithm in Python:

https://gist.github.com/3724747

As you can see I&#039;m trying to insert several &#039;z&#039; chars into the table, which results in multiple collisions.
What I don&#039;t understand is this: after the 5th &#039;z&#039; gets inserted, why does the probing algorithm never terminate and never reach one of the empty table slots 1, 2 or 4?
When perturb reaches -1 (I&#039;m on a 32bit OS btw) the probing slot just toggles between slot 3 and 7.

Can this somehow be the correct behaviour, because more than 5 consecutive collisions is very unlikely?
Do you see any mistakes?

BTW @David
Your calculated sequence is the sequence for a table with 32 slots, which makes the mask 31 and the first slot lookup hash(&#039;z&#039;) &amp; 31 = 27.
Laurent, on the other hand, used 3 for the initial slot, which came from the hash(&#039;z&#039;) &amp; 7 example, but then used the mask 31 from the 32-slot table example.]]></description>
		<content:encoded><![CDATA[<p>Hi Laurent,</p>
<p>Thanks for this great article; it really helps in understanding Python&#8217;s dict implementation.</p>
<p>I tried to implement the probing algorithm in Python:</p>
<p><a href="https://gist.github.com/3724747" rel="nofollow">https://gist.github.com/3724747</a></p>
<p>As you can see I&#8217;m trying to insert several &#8216;z&#8217; chars into the table, which results in multiple collisions.<br />
What I don&#8217;t understand is this: after the 5th &#8216;z&#8217; gets inserted, why does the probing algorithm never terminate and never reach one of the empty table slots 1, 2 or 4?<br />
When perturb reaches -1 (I&#8217;m on a 32bit OS btw) the probing slot just toggles between slot 3 and 7.</p>
<p>Can this somehow be the correct behaviour, because more than 5 consecutive collisions is very unlikely?<br />
Do you see any mistakes?</p>
<p>BTW @David<br />
Your calculated sequence is the sequence for a table with 32 slots, which makes the mask 31 and the first slot lookup hash(&#8216;z&#8217;) &amp; 31 = 27.<br />
Laurent, on the other hand, used 3 for the initial slot, which came from the hash(&#8216;z&#8217;) &amp; 7 example, but then used the mask 31 from the 32-slot table example.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Want it faster? Hash Table is the answer! &#171; My Developed World</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-30445</link>
		<dc:creator>Want it faster? Hash Table is the answer! &#171; My Developed World</dc:creator>
		<pubDate>Thu, 30 Aug 2012 06:43:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-30445</guid>
		<description><![CDATA[[...] see how Python implements Hash Table, have a look at this link [...]]]></description>
		<content:encoded><![CDATA[<p>[...] see how Python implements Hash Table, have a look at this link [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Laurent Luce</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-21994</link>
		<dc:creator>Laurent Luce</dc:creator>
		<pubDate>Sat, 02 Jun 2012 02:26:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-21994</guid>
		<description><![CDATA[@David: I simulated the probing loop with the following Python code and I am getting the same result as previously noted.

i = 3
mask = 31
perturb = hash(&#039;z&#039;)
while True:
    i = (i &lt;&lt; 2) + i + perturb + 1
    slot = i &amp; mask
    print(slot)
    perturb &gt;&gt;= 5]]></description>
		<content:encoded><![CDATA[<p>@David: I simulated the probing loop with the following Python code and I am getting the same result as previously noted.</p>
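<p>i = 3<br />
mask = 31<br />
perturb = hash(&#039;z&#039;)<br />
while True:<br />
&nbsp;&nbsp;&nbsp;&nbsp;i = (i &lt;&lt; 2) + i + perturb + 1<br />
&nbsp;&nbsp;&nbsp;&nbsp;slot = i &amp; mask<br />
&nbsp;&nbsp;&nbsp;&nbsp;print(slot)<br />
&nbsp;&nbsp;&nbsp;&nbsp;perturb &gt;&gt;= 5</p>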
]]></content:encoded>
	</item>
	<item>
		<title>By: David</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-17338</link>
		<dc:creator>David</dc:creator>
		<pubDate>Sat, 21 Apr 2012 03:16:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-17338</guid>
		<description><![CDATA[Hi,

&quot;Just out of curiosity, let’s look at the probing sequence when the table size is 32 e.g. mask = 31
3 -&gt; 11 -&gt; 19 -&gt; 29 -&gt; 5 -&gt; 6 -&gt; 16 -&gt; 31 -&gt; 28 -&gt; 13 -&gt; 2…&quot;

By my calculation, the sequence is:
3 11 21 29 30 8 11 24 25 30 23 20 5 26 3 16 17 22 15 12 29 18 27 8 9 14 7 4 ...

Can you please confirm your results?
Thanks!]]></description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>&#8220;Just out of curiosity, let’s look at the probing sequence when the table size is 32 e.g. mask = 31<br />
3 -&gt; 11 -&gt; 19 -&gt; 29 -&gt; 5 -&gt; 6 -&gt; 16 -&gt; 31 -&gt; 28 -&gt; 13 -&gt; 2…&#8221;</p>
<p>By my calculation, the sequence is:<br />
3 11 21 29 30 8 11 24 25 30 23 20 5 26 3 16 17 22 15 12 29 18 27 8 9 14 7 4 &#8230;</p>
<p>Can you please confirm your results?<br />
Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-17332</link>
		<dc:creator>David</dc:creator>
		<pubDate>Sat, 21 Apr 2012 01:43:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-17332</guid>
		<description><![CDATA[Hi,

The following algorithm is from function lookdict_string(PyDictObject *mp, PyObject *key, long hash);
i is the current slot index
set perturb to hash
forever loop:
    set i to (i &lt;&lt; 2) + i + perturb + 1
    set slot index to i &amp; mask
    if slot is free:
        return it
    right shift perturb by 5 bits

I want to implement this algorithm in my own program to store a large number (about 13 M) of key/value pairs. (The keys are strings.) But I find that this algorithm fails to find a free slot after inserting 70,000 key/value pairs, though I set the hash table size to 30 M.

So my question is, does Python support a dictionary at this scale (10M entries)?

Thanks!]]></description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>The following algorithm is from function lookdict_string(PyDictObject *mp, PyObject *key, long hash);<br />
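i is the current slot index<br />
set perturb to hash<br />
forever loop:<br />
&nbsp;&nbsp;&nbsp;&nbsp;set i to (i &lt;&lt; 2) + i + perturb + 1<br />
&nbsp;&nbsp;&nbsp;&nbsp;set slot index to i &amp; mask<br />
&nbsp;&nbsp;&nbsp;&nbsp;if slot is free:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return it<br />
&nbsp;&nbsp;&nbsp;&nbsp;right shift perturb by 5 bits</p>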
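<p>A minimal Python sketch of the loop above, not the CPython source. It assumes the shift binds first, as in (i &lt;&lt; 2) + i + perturb + 1, that the table length is a power of two with at least one free slot, and that the hash is treated as an unsigned 64-bit value so that perturb eventually reaches 0. The names probe_slots and find_free_slot are hypothetical.</p>

```python
PERTURB_SHIFT = 5


def probe_slots(hash_value, mask):
    """Yield slot indices in probing order for one hash value."""
    # Assumption: read the hash as unsigned 64-bit so right shifts end at 0.
    perturb = hash_value & 0xFFFFFFFFFFFFFFFF
    i = hash_value & mask
    yield i
    while True:
        # Only the low bits of i matter, so masking here keeps i small.
        i = ((i << 2) + i + perturb + 1) & mask
        yield i
        perturb >>= PERTURB_SHIFT


def find_free_slot(table, hash_value):
    """Return the index of the first empty (None) slot in probing order.

    The table length must be a power of two and one slot must be free.
    """
    mask = len(table) - 1
    for slot in probe_slots(hash_value, mask):
        if table[slot] is None:
            return slot


# Eight keys that all hash to 3 still fill every slot of an 8-slot table.
table = [None] * 8
used = []
for n in range(8):
    slot = find_free_slot(table, 3)
    table[slot] = n
    used.append(slot)
print(used)
```

<p>Once perturb is 0 the recurrence reduces to i = 5 * i + 1 modulo the table size, which cycles through every slot of a power-of-two table, so the search ends as long as one slot is free.</p>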
<p>I want to implement this algorithm in my own program to store a large number (about 13 M) of key/value pairs. (The keys are strings.) But I find that this algorithm fails to find a free slot after inserting 70,000 key/value pairs, though I set the hash table size to 30 M.</p>
<p>So my question is, does Python support a dictionary at this scale (10M entries)?</p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Laurent Luce</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-5511</link>
		<dc:creator>Laurent Luce</dc:creator>
		<pubDate>Mon, 15 Aug 2011 02:35:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-5511</guid>
		<description><![CDATA[@Sean Holdsworth: Thanks for reporting the power/xor mistake. If you are referring to the string&#039;s hash value, it is cached in the string object structure using the attribute ob_shash. See Include/stringobject.h.]]></description>
		<content:encoded><![CDATA[<p>@Sean Holdsworth: Thanks for reporting the power/xor mistake. If you are referring to the string&#8217;s hash value, it is cached in the string object structure using the attribute ob_shash. See Include/stringobject.h.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://www.laurentluce.com/posts/python-dictionary-implementation/comment-page-1/#comment-4963</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Tue, 19 Jul 2011 07:51:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.laurentluce.com/?p=249#comment-4963</guid>
		<description><![CDATA[In your pseudocode for string_hash you translate ^ as power, but of course you mean xor.]]></description>
		<content:encoded><![CDATA[<p>In your pseudocode for string_hash you translate ^ as power, but of course you mean xor.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
