<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Beyond Search &#187; association-rule</title>
	<atom:link href="http://www.guwendong.com/post/tag/association-rule/feed" rel="self" type="application/rss+xml" />
	<link>http://www.guwendong.com</link>
	<description>The easiest road grows harder the further you walk it; the hardest road grows easier the further you walk it.</description>
	<lastBuildDate>Sun, 05 Feb 2012 17:55:54 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Story of Beer and Diapers</title>
		<link>http://www.guwendong.com/post/2009/beer_and_nappies.html</link>
		<comments>http://www.guwendong.com/post/2009/beer_and_nappies.html#comments</comments>
		<pubDate>Wed, 28 Oct 2009 16:55:41 +0000</pubDate>
		<dc:creator>guwendong</dc:creator>
				<category><![CDATA[Recommender Systems]]></category>
		<category><![CDATA[association-rule]]></category>
		<category><![CDATA[resys]]></category>

		<guid isPermaLink="false">http://www.guwendong.com/post/2009/beer_and_nappies.html</guid>
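		<description><![CDATA[Prompted by this discussion in the Resys Group, friends have once again been asking me about the story of beer and diapers.
Long long ago, there was a story that went like this.
In a supermarket, something curious happened: diapers and beer were displayed and sold side by side, and this odd arrangement made the sales of both go up. This is not a joke but a true case from Walmart stores in the United States, and retailers have loved retelling it ever since. It turned out that American wives often asked their husbands to pick up diapers for the baby on the way home from work, and after buying the diapers, the husbands would also grab the beer they liked to drink.
I first heard this story in a data mining course during my first year of graduate school. When my advisor finished telling it, it struck me as something truly magical.
Before that, the person I admired most was a guy named Taylor. With a little notebook in his left hand, a pencil in his right, and a stopwatch around his neck, he stood there day and night, observing, observing, observing... Observing what? Shoveling coal. Yes, really, shoveling coal. With this method alone, he single-handedly founded time and motion study, which had a profound influence on industry, and on Japanese industry in particular, and he came to be honored as the father of industrial engineering.
Looking up to Taylor, I thought: if I used Master Taylor's method, could I run a shopping-motion study and become the father of the supermarket? I was overjoyed at my flash of inspiration: I fucking couldn&#8217;t be happier! But on second thought, how many pencils would I wear down to stubs, and how many stopwatches would I break? I had no venture capital, so it was never going to happen.
Back in the dorm after class, I buried myself in association rule algorithms for several days. Once I considered myself enlightened, I started showing off and quizzing everyone I met.
Me: Do you know “Brother Beer”?
~shakes head~
Me: Oh, you don't.
Me: Then do you know “Brother Diaper”?
~keeps shaking head~
Me: What, you don't know him either!
Me: Forget it, go back to Mars.
After asking enough people for long enough, I finally realized that “Brother Beer” and “Brother Diaper” were the real Martians: trapped in the Spin, they would not reach Earth any time soon. Not very reliable.
Now it has become real: someone has even written a whole book about these two gentlemen. Once everyone on Earth knows them, spring will truly have come for data mining practitioners.
In fact, back in class my advisor had already told us that this story very likely contains some embellishment, and that it exists in several versions. But for data mining to develop, reach industry, and become a business, it needed a simple, easy-to-understand story. It is like teaching honesty: everyone immediately thinks of “The Boy Who Cried Wolf”, because it is simple, easy to understand, easy to accept, and easy to remember. A story does not have to be true, as long as its conclusion makes the point.
A friend of mine joined IBM after finishing his bachelor's degree in 2001. One passage in his cover letter to IBM left a deep impression on me. I don't remember the exact words, but it went roughly like this:
“I love writing programs.
While my classmates were locked in battle on the vast StarCraft battlefields, or huddled around Ms. Ai Iijima living life to the fullest, I was alone, going deep into MFC. Every elegant line of code was like a beautiful musical note, drawing me in so deeply that I could not pull myself away.
Learning is like a space adventure: the deeper you go, the more you appreciate its breadth and depth.”
Let us all take this to heart.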

© guwendong for Beyond Search, 2009.
Permalink: http://www.guwendong.com/post/2009/beer_and_nappies.html
tags: association-rule, resys &#124; Join the discussion
<table class="wumii-related-items" cellspacing="0" cellpadding="2" border="0" width="100%" style="clear: both;">
    
    <tr>
        <td ><b><font size="-1"  style="display: block !important; padding: 20px 0 5px !important;">您可能也喜欢：</font></b></td>
    </tr>
    
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.resyschina.com%2F2010%2F03%2Fresys-ibmcrl.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">Resys 第四次线下活动 – IBM CRL 专场 (@resyschina)</font>
                    </a>
                </td>
            </tr>
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.resyschina.com%2F2010%2F03%2Fresys-ibmcrl-xlvector.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">Resys Group 第四次活动总结 (by xlvector) (@resyschina)</font>
                    </a>
                </td>
            </tr>
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2010%2Fresys_china.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">Resys China 创刊号</font>
                    </a>
                </td>
            </tr>
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2010%2Fresys_douban.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">迟来的感谢：Resys 豆瓣活动</font>
                    </a>
                </td>
            </tr>
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">推荐系统：关联规则(1)</font>
                    </a>
                </td>
            </tr>
    
    <tr>
        <td  align="right">
            <a style="text-decoration: none !important;" href="http://www.wumii.com/widget/relatedItems.htm" target="_blank" title="无觅相关文章插件">
                <font size="-1" color="#bbbbbb" style="display: block !important; font-family: arial !important; padding: 5px 0 !important; font-size: 12px !important; color: #bbb !important;">无觅</font>
            </a>
        </td>
    </tr>
</table>]]></description>
			<content:encoded><![CDATA[<p>因 <a title="Resys Group" href="https://groups.google.com/group/resys">Resys Group</a> 里的这个<a title="mazon的推荐，想不通" href="https://groups.google.com/group/resys/browse_thread/thread/e76205f7cf683926#">讨论</a>而起，又有朋友找我问起了啤酒和尿布的故事～</p>
<p>Long long ago，有这么一个故事。<br />
在一家超市里，有一个有趣的现象：尿布和啤酒赫然摆在一起出售。但是这个奇怪的举措却使尿布和啤酒的销量双双增加了。这不是一个笑话，而是发生在美国沃尔玛连锁店超市的真实案例，并一直为商家所津津乐道。原来，美国的妇女们经常会嘱咐她们的丈夫下班以后要为孩子买尿布，而丈夫在买完尿布之后又要顺手买回自己爱喝的啤酒。</p>
<p>第一次听到这个故事，是在研一的数据挖掘课程上。当时导师讲完之后，我感觉这是个非常神奇的事情。</p>
<p>这之前我最崇拜的是一位名叫<a title="F. W. Taylor" href="http://en.wikipedia.org/wiki/Frederick_Winslow_Taylor">泰勒</a>的大哥，此人左手一个小本，右手一支铅笔，脖子上挂着个秒表，没日没夜地，站在那里观察啊，观察啊，观察啊……观察什么啊？挖煤。对，确实是挖煤。就是这个办法，他凭借一己之力，开创了对工业界，尤其是对日本工业界影响深远的时间动作研究，被后人尊称为工业工程之父。</p>
<p>仰望着泰勒，我想，我要是也用泰勒大师的办法，是不是也能搞一个购物动作研究，成为超级市场之父呢？我为自己的灵光乍现而欢呼雀跃，<a title="“F”家族口语禁忌语之完美诠释" href="http://www.yeeyan.com/articles/view/96632/64989">I fucking couldn&#8217;t be happier</a>！可转念又一想，那我得写秃多少支铅笔，按坏多少个秒表啊。我没有风险投资，这事儿干不了。</p>
<p>下课回到宿舍，我是埋头狂啃了几天<a href="http://www.guwendong.com/post/2007/association_rule.html" title="关联规则">关联规则</a>算法。在自认为得道成仙之后，我便开始四处招摇，逢人便问。<br />
我：你认识“啤酒兄”吗？<br />
～摇头～<br />
我：哦，不认识。<br />
我：那你认识“尿布兄”吗？<br />
～继续摇头～<br />
我：什么，这个也不认识！<br />
我：那算了，你还是回火星去吧。<br />
时间长了，问得多了，我才弄明白，原来“啤酒兄”和“尿布兄”才是真正的火星人，在<a href="http://www.douban.com/subject/3041463/" title="时间回旋">时间回旋</a>里面，一时半会儿到不了地球，很不靠谱。</p>
<p>现在这事儿靠谱了，都有人专门为这两位仁兄<a href="http://www.douban.com/subject/3283973/" title="著书立传">著书立传</a>了。等地球人都认识了他们，数据挖掘从业者的春天就真来了。</p>
<p>其实当年在课上，导师已经和我们说了，<a title="这个故事" href="http://www.google.com/search?hl=en&amp;q=beer+and+nappies">这个故事</a>多半是有一些些杜撰的成分在里面的，并且这个故事其实是有多个版本的。但数据挖掘技术需要发展，需要进入业界，需要产业化，就必须有一个简单易懂的故事。就好像一提到进行诚实教育，大家自然就会想到“狼来了”，它通俗、易懂、好接受、容易记忆。故事不一定真实，但结论足够说明问题。</p>
<p>我有一位朋友，01 年本科毕业去了 IBM。在他给 IBM 的求职信上，有一段话让我印象深刻。原话记不住了，大约是这样：<br />
“我喜欢写程序。<br />
当我的同学们在浩渺的星际争霸战场上鏖战，或者围坐在饭岛爱老师身边激昂人生的时候，我却孤独地在 MFC 中深入浅出，每一行优雅的代码，都仿佛美丽的音符一般，让我深陷其中无法自拔。<br />
学习就像太空冒险，越是深入，越能体会到他的博大精深。”</p>
<p>让我们共勉。</p>
<hr id="rss-footer" />
<small>© guwendong for <a href="http://www.guwendong.com">Beyond Search</a>, 2009.<br/>
Permalink: <a href="http://www.guwendong.com/post/2009/beer_and_nappies.html">http://www.guwendong.com/post/2009/beer_and_nappies.html</a><br/>
tags: <a href="http://www.guwendong.com/post/tag/association-rule" rel="tag">association-rule</a>, <a href="http://www.guwendong.com/post/tag/resys" rel="tag">resys</a> | <a href="http://www.guwendong.com/post/2009/beer_and_nappies.html#comments">Join the discussion</a>
</small><br><table class="wumii-related-items" cellspacing="0" cellpadding="2" border="0" width="100%" style="clear: both;">
    
    <tr>
        <td ><b><font size="-1"  style="display: block !important; padding: 20px 0 5px !important;">您可能也喜欢：</font></b></td>
    </tr>
    
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.resyschina.com%2F2010%2F03%2Fresys-ibmcrl.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">Resys 第四次线下活动 – IBM CRL 专场 (@resyschina)</font>
                    </a>
                </td>
            </tr>
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.resyschina.com%2F2010%2F03%2Fresys-ibmcrl-xlvector.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">Resys Group 第四次活动总结 (by xlvector) (@resyschina)</font>
                    </a>
                </td>
            </tr>
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2010%2Fresys_china.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">Resys China 创刊号</font>
                    </a>
                </td>
            </tr>
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2010%2Fresys_douban.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">迟来的感谢：Resys 豆瓣活动</font>
                    </a>
                </td>
            </tr>
            <tr>
                <td style="margin: 0 !important; padding: 0 !important; line-height: 20px !important;">
                    <img border="0" src="http://static.wumii.com/images/widget/widget_solidPoint.gif">
                    <a target="_blank" style="text-decoration: none !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fbeer_and_nappies.html">
                        <font size="-1" color="#333333" style="line-height: 1.65em; font-size: 12px !important;">推荐系统：关联规则(1)</font>
                    </a>
                </td>
            </tr>
    
    <tr>
        <td  align="right">
            <a style="text-decoration: none !important;" href="http://www.wumii.com/widget/relatedItems.htm" target="_blank" title="无觅相关文章插件">
                <font size="-1" color="#bbbbbb" style="display: block !important; font-family: arial !important; padding: 5px 0 !important; font-size: 12px !important; color: #bbb !important;">无觅</font>
            </a>
        </td>
    </tr>
</table>]]></content:encoded>
			<wfw:commentRss>http://www.guwendong.com/post/2009/beer_and_nappies.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Recommender Systems: Association Rules (3) - The FP-Growth Algorithm</title>
		<link>http://www.guwendong.com/post/2008/fpgrowth_algorithm.html</link>
		<comments>http://www.guwendong.com/post/2008/fpgrowth_algorithm.html#comments</comments>
		<pubDate>Thu, 14 Aug 2008 05:28:50 +0000</pubDate>
		<dc:creator>guwendong</dc:creator>
				<category><![CDATA[Recommender Systems]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[association-rule]]></category>
		<category><![CDATA[fpgrowth]]></category>

		<guid isPermaLink="false">http://www.guwendong.com/post/2008/fpgrowth_algorithm.html</guid>
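		<description><![CDATA[After Rakesh Agrawal proposed the Apriori algorithm in 1994, association rule mining became far more practical. Its inherent commercial value quickly made it a very hot research area, and new algorithms kept appearing. One of the more practical of these is the FP-Growth algorithm proposed by Professor Jiawei Han, which is described in detail in the 2000 paper “Mining Frequent Patterns without Candidate Generation”. If you read this paper, I strongly recommend reading its references too: nearly every important paper on association rule mining published before 2000 is cited there.
The core of FP-Growth is the construction of the FP-Tree (Frequent Pattern Tree). This special data structure is why FP-Growth performs significantly better than Apriori. Look closely at how the FP-Tree is built, though, and you will see that it works much like the Prefix Tree (trie) commonly used in string processing. By merging shared prefix paths, the FP-Tree compresses the transaction data, which makes it possible to hold the frequent-item information in memory. Mining then works by traversing the tree instead of repeatedly scanning the transaction records, which is the most time-consuming part of Apriori, and this greatly improves efficiency. For the detailed theory, see the paper above; here I will just translate its example.
Suppose a database DB contains 5 transactions and the minimum support threshold (min support threshold) is 3, i.e. an item must appear in at least 3 transactions. The FP-Tree is then built as follows: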
1. Scan the database once to find all frequent items, discarding any item whose frequency is below the minimum support. This scan also yields each item's frequency for use in later steps. After this step we have the following frequent items: { (c:4), (f:4), (a:3), (b:3), (m:3), (p:3) [...]]]></description>
			<content:encoded><![CDATA[<p>在 1994 年 <a id="wrzu" href="http://rakesh.agrawal-family.com/" target="_blank" title="Rakesh Agrawal">Rakesh Agrawal</a> 提出了 <a id="wrzu0" href="http://www.guwendong.com/post/2007/apriori_algorithm.html" target="_blank" title="Apriori Algorithm">Apriori</a> 算法之后，关联规则挖掘技术的可用性得到了很大的提高。而且因为关联规则挖掘与生俱来的商业意义，使得它迅速成为了一个非常热门的研究领域，新的算法也不断地涌现出来。这其中，实用性比较强的一个算法，是由<a title="Jiawei Han" target="_blank" href="http://www-faculty.cs.uiuc.edu/%7Ehanj/" id="lmib">韩家玮</a>教授提出的 FP-Growth 算法。FP-Growth 算法在 2000 年发表的这个 paper 《<span id="wrzu1" class="m"><span id="wrzu2" class="l"><a title="Mining Frequent Patterns without Candidate Generation" target="_blank" href="http://citeseer.ist.psu.edu/han99mining.html" id="z1dk">Mining Frequent Patterns without Candidate Generation</a></span></span>》里有详细的介绍。读这篇 paper，我个人建议一定要同时把引文也都看一看，2000 年之前与关联规则挖掘相关的重要 paper，基本上都在里面了。</p>
<p>FP-Growth 算法的核心是 FP-Tree（Frequent Pattern Tree，频繁模式树）的构建，这个特殊的数据结构，是 FP-Growth 算法与 <a id="wrzu5" href="http://www.guwendong.com/post/2007/apriori_algorithm.html" target="_blank" title="Apriori Algorithm">Apriori</a> 算法相比，性能显著提高的原因所在。不过，仔细分析一下 FP-Tree 的实现，可以发现它与字符串处理算法中常用的 <a title="Prefix Tree" target="_blank" href="http://www.answers.com/topic/trie" id="utp.">Prefix Tree</a> 算法，有着异曲同工之妙。FP-Tree 通过合并一些重复路径，实现了数据的压缩，从而使得将频繁项集加载到内存中成为可能。之后以树遍历的操作，替代了 <a id="wrzu6" href="http://www.guwendong.com/post/2007/apriori_algorithm.html" target="_blank" title="Apriori Algorithm">Apriori</a> 算法中最耗费时间的事务记录遍历，从而大大提高了运算效率。详细的理论讲解可以阅读上面的论文，我这里还是把其中的例子翻译一下。</p>
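<p>To make the prefix-tree analogy concrete, here is a tiny counted trie in Python. It is an illustrative sketch of my own, not code from the paper, and the three words are made up: each node records how many inserted keys pass through it, so a shared prefix is stored only once. The FP-Tree applies the same idea to transactions whose items have been put into one fixed order.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    count: int = 0                                # keys that pass through this node
    children: dict = field(default_factory=dict)  # symbol -> TrieNode


def insert(root, key):
    """Walk down from the root, creating nodes as needed and counting each visit."""
    node = root
    for symbol in key:
        if symbol not in node.children:
            node.children[symbol] = TrieNode()
        node = node.children[symbol]
        node.count += 1


def count_nodes(node):
    """Number of nodes below `node`, i.e. how many symbols the trie really stores."""
    return sum(1 + count_nodes(child) for child in node.children.values())


words = ["car", "cart", "cat"]   # made-up keys for illustration
root = TrieNode()
for w in words:
    insert(root, w)

print(count_nodes(root), "nodes for", sum(len(w) for w in words), "letters")
print(root.children["c"].count)
```
</pre>
<p>The three words contain 10 letters in total but need only 5 nodes, because “ca” is shared by all three words and “car” by two of them.</p>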
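<p>Suppose a database DB contains 5 transactions and the minimum support threshold (min support threshold) is 3, i.e. an item must appear in at least 3 transactions. The FP-Tree is then built as follows:</p>
<p>1. Scan the database once to find all frequent items, discarding any item whose frequency is below the minimum support. This scan also yields each item's frequency for use in later steps. After this step we have the following frequent items: { (c:4), (f:4), (a:3), (b:3), (m:3), (p:3) }, where the number after “:” is the item's frequency. The list is sorted first by frequency, from highest to lowest, and then alphabetically to break ties. This ordering matters a great deal: the items in every transaction will be rearranged in this same order, which is what makes it possible to merge shared paths effectively.</p>
<p>The processed database records are:</p>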
<table border="1" cellpadding="3" cellspacing="0" height="202" width="463">
<tbody>
<tr>
<td width="33%">TID</td>
<td width="33%">原始事务数据</td>
<td width="33%">处理后数据</td>
</tr>
<tr>
<td>100</td>
<td>f, a, c, d, g, i, m, p</td>
<td>c, f, a, m, p</td>
</tr>
<tr>
<td>200</td>
<td>a, b, c, f, l, m, o</td>
<td>c, f, a, b, m</td>
</tr>
<tr>
<td>300</td>
<td>b, f, h, j, o</td>
<td>f, b</td>
</tr>
<tr>
<td>400</td>
<td>b, c, k, s, p</td>
<td>c, b, p</td>
</tr>
<tr>
<td>500</td>
<td>a, f, c, e, l, p, m, n</td>
<td>c, f, a, m, p</td>
</tr>
</tbody>
</table>
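<p>As a rough sketch of step 1 (my own Python, not code from the paper), the first scan and the reordering could look like this. The names <code>DB</code>, <code>MIN_SUPPORT</code>, <code>frequent_items</code> and <code>reorder</code> are illustrative choices; items are counted once per transaction, and frequency ties are broken alphabetically as described above.</p>
<pre>
```python
from collections import Counter

# The five transactions from the example (TID -> items).
DB = {
    100: ["f", "a", "c", "d", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}
MIN_SUPPORT = 3  # absolute count, as in the example


def frequent_items(db, min_support):
    """First scan: count each item once per transaction and keep the frequent ones,
    sorted by descending frequency, then alphabetically to break ties."""
    counts = Counter(item for items in db.values() for item in set(items))
    frequent = [(item, n) for item, n in counts.items() if n >= min_support]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


def reorder(db, order):
    """Drop infrequent items and rewrite every transaction in the global order."""
    rank = {item: i for i, (item, _) in enumerate(order)}
    return {
        tid: sorted((x for x in items if x in rank), key=rank.__getitem__)
        for tid, items in db.items()
    }


order = frequent_items(DB, MIN_SUPPORT)
print(order)
for tid, items in reorder(DB, order).items():
    print(tid, items)
```
</pre>
<p>The printed transactions match the last column of the table above; for example, TID 300 becomes ['f', 'b'].</p>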
<p>
2. Scan the database a second time and build the FP-Tree from the transactions processed in step 1.</p>
<div id="wrzu57" style="margin-left: 40px;">1) 取出第一条事务数据，构建 FP-Tree 的第一条路径，{ c, f, a, m, p }。注意其中项的排序与第一步中得到的频繁项集合的排序是一致的。<br />
2) 取出第二条事务数据，{ c, f, a, b, m }，不难发现，它与第一条路径共享了部分数据{ c, f, a }。因此，可以重复利用已有的路径，只需要将其计数加 1，即{ (c:2), (f:2), (a:2) }。而对于后面不同的部分，我们创建新的路径，{ (b:1), (m:1) }，其中，b 为 a 的子节点，m 为 b 的子节点。<br />
3) 取出第三条事务数据，{ f, b }，发现没有重复路径存在。但 f 点是存在的，因此，可以重复利用 f 点，新建一个 b 节点，作为 f 的子节点，得到路径{ {f:3}, (b:1) }。注意，之前已经存在的 b 节点无法重复使用，因为其父节点为 a。<br />
4) 取出第四条事务数据，{ c, b, p }，发现没有重复路径存在。因此，从现有 c 点出发，构建一条新路径{ (c:3), (b:1), (p:1) }。<br />
5) 取出第五条事务数据，{ c, f, a, m, p }，同上原理构建路径，{ (c:4), (f:4), (a:3), (m:2), (p:2) }。</p>
</div>
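<p>The insertion rule of step 2 can be sketched in Python as below. This is a hypothetical minimal implementation rather than the paper's: <code>FPNode</code>, <code>build_fp_tree</code> and <code>show</code> are names I made up, and the paper's header table with node-links is reduced to a plain dictionary of node lists. Every transaction starts at an artificial null root, and a node is reused only when the whole prefix from the root matches, so { f, b } ends up in its own branch rather than under c.</p>
<pre>
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class FPNode:
    item: Optional[str]                              # None marks the artificial "null" root
    count: int = 0
    parent: Optional["FPNode"] = field(default=None, repr=False)  # for walking prefix paths upward
    children: dict = field(default_factory=dict)     # item -> FPNode


def build_fp_tree(transactions):
    """Insert each already-ordered transaction, always starting from the null root.

    A node is reused only if the whole prefix from the root matches; otherwise a new
    branch is created. Returns the root and a header dict: item -> nodes carrying it.
    """
    root = FPNode(item=None)
    header = {}
    for items in transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:                        # no shared prefix from here on
                child = FPNode(item=item, parent=node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


def show(node, depth=0):
    """Print one "item:count" line per node, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


# The processed transactions from the table above, in TID order (100 to 500).
processed = [
    ["c", "f", "a", "m", "p"],
    ["c", "f", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["c", "f", "a", "m", "p"],
]
root, header = build_fp_tree(processed)
show(root)
```
</pre>
<p>With these five transactions the printed tree has two top-level branches, c:4 and f:1, and the counts along c → f → a → m → p are 4, 3, 3, 2, 2.</p>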
<p>
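After two database scans, the FP-Tree is complete. Because the transaction { f, b } does not start with c, even this small example has two top-level branches, one starting at c and one at f, and in general there can be many such subtrees. For this reason an artificial super-root node is added, usually labeled root&lt;null&gt;, under which all branches hang; it also makes the tree easy to traverse. The figure below helps illustrate the whole process.</p>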
<p><img id="wrzu64" style="width: 274px; height: 540px;" src="http://docs.google.com/File?id=ddjqwpms_106gd347xf2"></p>
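<p>Once the FP-Tree is built, frequent itemsets are mined from the tree rather than from the database: the paper does this recursively, collecting each item's conditional pattern base (the prefix paths that lead to its nodes) and building a conditional FP-Tree from it. Association rules that meet a given confidence threshold are then derived from those frequent itemsets. The proofs, and the performance comparison with <a id="wrzu65" href="http://www.guwendong.com/post/2007/apriori_algorithm.html" target="_blank" title="Apriori Algorithm">Apriori</a>, are explained very clearly in the paper for anyone interested.</p>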
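<p>For the mining step, here is a compact, unoptimized sketch of the FP-growth recursion followed by rule generation. It is written for illustration only and assumes the raw transactions from the example with an absolute minimum support of 3; the helper names (<code>build_tree</code>, <code>fp_growth</code>, <code>rules</code>) are hypothetical, and tree nodes are plain dictionaries to keep it short.</p>
<pre>
```python
from collections import Counter, defaultdict
from itertools import combinations


def build_tree(weighted_paths, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to its tree nodes,
    frequent maps each frequent item to its weighted support.
    """
    counts = Counter()
    for items, n in weighted_paths:
        for item in items:
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = {"item": None, "count": 0, "parent": None, "children": {}}
    header = defaultdict(list)
    for items, n in weighted_paths:
        # Keep frequent items only, ordered by descending support, then alphabetically.
        ordered = sorted((i for i in items if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node["children"].get(item)
            if child is None:
                child = {"item": item, "count": 0, "parent": node, "children": {}}
                node["children"][item] = child
                header[item].append(child)
            child["count"] += n
            node = child
    return header, frequent


def fp_growth(weighted_paths, min_support, suffix=()):
    """Yield (itemset, support); each itemset is a frequent pattern from these paths plus `suffix`."""
    header, frequent = build_tree(weighted_paths, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path, parent = [], node["parent"]
            while parent["item"] is not None:
                path.append(parent["item"])
                parent = parent["parent"]
            if path:
                base.append((path, node["count"]))
        # Recurse on the conditional FP-tree built from that base.
        yield from fp_growth(base, min_support, itemset)


def rules(patterns, min_confidence):
    """Yield (lhs, rhs, confidence) for rules lhs -> rhs split from one frequent itemset."""
    for itemset, support in patterns.items():
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), k)):
                confidence = support / patterns[lhs]
                if confidence >= min_confidence:
                    yield lhs, itemset - lhs, confidence


DB = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
patterns = dict(fp_growth([(t, 1) for t in DB], min_support=3))
print(len(patterns), patterns[frozenset("cfam")])
for lhs, rhs, conf in rules(patterns, min_confidence=1.0):
    print(sorted(lhs), "->", sorted(rhs), conf)
```
</pre>
<p>On this data the sketch finds 18 frequent itemsets, the largest being { c, f, a, m } with support 3; a rule such as { p } → { c } has confidence 3 / 3 = 1.0. Supports come straight from the tree counts, so the database itself is never rescanned during mining.</p>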
<p>Association rule algorithm series<br />
1) <a title="Introduction to Association Rules" target="_blank" href="http://www.guwendong.com/post/2007/association_rule.html" id="xa8e">Introduction to Association Rules</a> <br />
2) <a title="The Apriori Algorithm" target="_blank" href="http://www.guwendong.com/post/2007/apriori_algorithm.html" id="mug0">The Apriori Algorithm</a> <br />
3) The FP-Growth Algorithm (admittedly a long time after the previous post)</p>
<hr id="rss-footer" />
<small>© guwendong for <a href="http://www.guwendong.com">Beyond Search</a>, 2008.<br/>
本文网址：<a href="http://www.guwendong.com/post/2008/fpgrowth_algorithm.html">http://www.guwendong.com/post/2008/fpgrowth_algorithm.html</a><br/>
tags: <a href="http://www.guwendong.com/post/tag/algorithm" rel="tag">algorithm</a>, <a href="http://www.guwendong.com/post/tag/association-rule" rel="tag">association-rule</a>, <a href="http://www.guwendong.com/post/tag/fpgrowth" rel="tag">fpgrowth</a> | <a href="http://www.guwendong.com/post/2008/fpgrowth_algorithm.html#comments">参与讨论</a>
</small><br><table class="wumii-related-items" cellspacing="0" cellpadding="3" border="0"  style="clear: both;">
    
    <tr>
        <td colspan="5"><b><font size="-1"  style="display: block !important; padding: 20px 0 5px !important;">您可能也喜欢：</font></b></td>
    </tr>
    
        <tr>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important;">
                    <a target="_blank" title="如何从无到有建立推荐系统" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.alibuybuy.com%2Fposts%2F12756.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/04/08/5265314.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">如何从无到有建立推荐系统 (@alibuybuy)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统应用研究：网上书店" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.alibuybuy.com%2Fposts%2F26579.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2010/12/26/1593614.jpg" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统应用研究：网上书店 (@alibuybuy)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(1)" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/images/blogWidget/wordpress_default.gif" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统：关联规则(1)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(2)" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/images/blogWidget/wordpress_default.gif" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统：关联规则(2)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="Social Media Algorithm: Hacker News" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fsocial_media_algorithm_hackernews.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/01/05/1801986.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Social Media Algorithm: Hacker News</font>
                    </a>
                </td>
        </tr>
    
    <tr>
        <td colspan="5" align="right">
            <a style="text-decoration: none !important;" href="http://www.wumii.com/widget/relatedItems.htm" target="_blank" title="无觅相关文章插件">
                <font size="-1" color="#bbbbbb" style="display: block !important; font-family: arial !important; padding: 5px 0 !important; font-size: 12px !important; color: #bbb !important;">无觅</font>
            </a>
        </td>
    </tr>
</table>]]></content:encoded>
			<wfw:commentRss>http://www.guwendong.com/post/2008/fpgrowth_algorithm.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>推荐系统：关联规则(2)</title>
		<link>http://www.guwendong.com/post/2007/apriori_algorithm.html</link>
		<comments>http://www.guwendong.com/post/2007/apriori_algorithm.html#comments</comments>
		<pubDate>Tue, 10 Jul 2007 18:50:05 +0000</pubDate>
		<dc:creator>guwendong</dc:creator>
				<category><![CDATA[推荐系统]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[apriori]]></category>
		<category><![CDATA[association-rule]]></category>

		<guid isPermaLink="false">http://www.guwendong.com/post/2007/apriori_algorithm.html</guid>
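		<description><![CDATA[The Apriori Algorithm is the most influential foundational algorithm in association rule mining. It was proposed in 1994 by Rakesh Agrawal and Ramakrishnan Srikant and is described in detail in "Fast Algorithms for Mining Association Rules". In the more than ten years since, many researchers have built improvements around Apriori. Compared with 1994, however, today's Internet applications handle tens or even hundreds of times more data, and Apriori-based algorithms have increasingly run into excessive computational cost. Still, when it comes to the masters and their contributions, all we can do is look up in admiration.
Apriori is a breadth-first (level-wise) algorithm: it scans the database multiple times to find the frequent itemsets, i.e. those whose support is at least the minimum support. Its theoretical basis is the monotonicity of frequent itemsets, which has two sides: every subset of a frequent itemset is frequent, and every superset of an infrequent itemset is infrequent. I won't go into the dry theory here; interested readers can consult the paper. Instead, I have adapted the paper's example into tables, which should be easy to follow. A database DB contains 4 transactions, and the minimum support (min support) is 0.5. The frequent itemsets are computed as follows: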
[...]<table class="wumii-related-items" cellspacing="0" cellpadding="3" border="0"  style="clear: both;">
    
    <tr>
        <td colspan="5"><b><font size="-1"  style="display: block !important; padding: 20px 0 5px !important;">You may also like:</font></b></td>
    </tr>
    
        <tr>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important;">
                    <a target="_blank" title="NRelate的内容推荐工具蒸蒸日上" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Ftech2ipo.com%2F43090%2F&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2012/02/05/14781524.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">NRelate的内容推荐工具蒸蒸日上 (@tech2ipo)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="如何从无到有建立推荐系统" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.alibuybuy.com%2Fposts%2F12756.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/04/08/5265314.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">如何从无到有建立推荐系统 (@alibuybuy)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(3) —— FP-Growth 算法" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/01/04/1802009.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统：关联规则(3) —— FP-Growth 算法</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(1)" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/images/blogWidget/wordpress_default.gif" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统：关联规则(1)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="Social Media Algorithm: Hacker News" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fsocial_media_algorithm_hackernews.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/01/05/1801986.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Social Media Algorithm: Hacker News</font>
                    </a>
                </td>
        </tr>
    
    <tr>
        <td colspan="5" align="right">
            <a style="text-decoration: none !important;" href="http://www.wumii.com/widget/relatedItems.htm" target="_blank" title="无觅相关文章插件">
                <font size="-1" color="#bbbbbb" style="display: block !important; font-family: arial !important; padding: 5px 0 !important; font-size: 12px !important; color: #bbb !important;">无觅</font>
            </a>
        </td>
    </tr>
</table>]]></description>
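			<content:encoded><![CDATA[<p>The <a href="http://en.wikipedia.org/wiki/Apriori_algorithm" title="Apriori Algorithm">Apriori Algorithm</a> is the most influential foundational algorithm in association rule mining. It was proposed in 1994 by <a href="http://rakesh.agrawal-family.com/" title="Rakesh Agrawal">Rakesh Agrawal</a> and Ramakrishnan Srikant and is described in detail in "<a href="http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf" title="Fast Algorithms for Mining Association Rules">Fast Algorithms for Mining Association Rules</a>". In the more than ten years since, many researchers have built improvements around Apriori. Compared with 1994, however, today's Internet applications handle tens or even hundreds of times more data, and Apriori-based algorithms have increasingly run into excessive computational cost. Still, when it comes to the masters and their contributions, all we can do is look up in admiration.</p>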
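<p>Apriori is a breadth-first (level-wise) algorithm: it scans the database multiple times to find the frequent itemsets, i.e. those whose support is at least the minimum support. Its theoretical basis is the monotonicity of frequent itemsets, which has two sides: every subset of a frequent itemset is frequent, and every superset of an infrequent itemset is infrequent. I won't go into the dry theory here; interested readers can consult the paper. Instead, I have adapted the paper's example into the tables below, which should be easy to follow.<br />A database DB contains 4 transactions, and the minimum support (min support) is 0.5. The frequent itemsets are computed as follows:<br />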
<table width="90%" cellspacing="0" cellpadding="3" border="0" align="center">
<tbody>
<tr>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               TID                        </td>
<td width="50%" style="font-weight:bold">               Items                        </td>
</tr>
<tr>
<td width="50%">               100                        </td>
<td width="50%">               A, C, D                        </td>
</tr>
<tr>
<td width="50%">               200                        </td>
<td width="50%">               B, C, E                        </td>
</tr>
<tr>
<td width="50%">               300                        </td>
<td width="50%">               A, B, C, E                        </td>
</tr>
<tr>
<td width="50%">               400                        </td>
<td width="50%">               B, E                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
<td style="text-align:center">Scan DB</td>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               Itemset                        </td>
<td width="50%" style="font-weight:bold">               Support                        </td>
</tr>
<tr>
<td width="50%">               {A}                        </td>
<td width="50%">               2 (0.5)                        </td>
</tr>
<tr>
<td width="50%">               {B}                        </td>
<td width="50%">               3 (0.75)                        </td>
</tr>
<tr>
<td width="50%">               {C}                        </td>
<td width="50%">               3 (0.75)                        </td>
</tr>
<tr>
<td width="50%">               {D}                        </td>
<td width="50%">               1 (0.25)                        </td>
</tr>
<tr>
<td width="50%">               {E}                        </td>
<td width="50%">               3 (0.75)                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
<td style="text-align:center">Keep itemsets<br />meeting<br />min support</td>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               Itemset                        </td>
<td width="50%" style="font-weight:bold">               Support                        </td>
</tr>
<tr>
<td width="50%">               {A}                        </td>
<td width="50%">               2                        </td>
</tr>
<tr>
<td width="50%">               {B}                        </td>
<td width="50%">               3                        </td>
</tr>
<tr>
<td width="50%">               {C}                        </td>
<td width="50%">               3                        </td>
</tr>
<tr>
<td width="50%">               {E}                        </td>
<td width="50%">               3                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
</tr>
<tr>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               Itemset                        </td>
</tr>
<tr>
<td width="50%">               {A, B}                        </td>
</tr>
<tr>
<td width="50%">               {A, C}                        </td>
</tr>
<tr>
<td width="50%">               {A, E}                        </td>
</tr>
<tr>
<td width="50%">               {B, C}                        </td>
</tr>
<tr>
<td valign="top">               {B, E}                        </td>
</tr>
<tr>
<td valign="top">               {C, E}                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
<td style="text-align:center">Scan DB</td>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               Itemset                        </td>
<td width="50%" style="font-weight:bold">               Support                        </td>
</tr>
<tr>
<td width="50%">               {A, B}                        </td>
<td width="50%">               1 (0.25)                        </td>
</tr>
<tr>
<td width="50%">               {A, C}                        </td>
<td width="50%">               2 (0.5)                        </td>
</tr>
<tr>
<td width="50%">               {A, E}                        </td>
<td width="50%">               1 (0.25)                        </td>
</tr>
<tr>
<td width="50%">               {B, C}                        </td>
<td width="50%">               2 (0.5)                        </td>
</tr>
<tr>
<td width="50%">               {B, E}                        </td>
<td width="50%">               3 (0.75)                        </td>
</tr>
<tr>
<td valign="top">               {C, E}                        </td>
<td valign="top">               2 (0.5)                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
<td style="text-align:center">Keep itemsets<br />meeting<br />min support</td>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               Itemset                        </td>
<td width="50%" style="font-weight:bold">               Support                        </td>
</tr>
<tr>
<td width="50%">               {A, C}                        </td>
<td width="50%">               2                        </td>
</tr>
<tr>
<td width="50%">               {B, C}                        </td>
<td width="50%">               2                        </td>
</tr>
<tr>
<td width="50%">               {B, E}                        </td>
<td width="50%">               3                        </td>
</tr>
<tr>
<td width="50%">               {C, E}                        </td>
<td width="50%">               2                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
</tr>
<tr>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               Itemset                        </td>
</tr>
<tr>
<td width="50%">               {A, B, C}                        </td>
</tr>
<tr>
<td valign="top">               {A, B, E}                        </td>
</tr>
<tr>
<td valign="top">               {A, C, E}                        </td>
</tr>
<tr>
<td valign="top">               {B, C, E}                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
<td style="text-align:center">Scan DB</td>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               Itemset                        </td>
<td width="50%" style="font-weight:bold">               Support                        </td>
</tr>
<tr>
<td width="50%">               {A, B, C}                        </td>
<td width="50%">               1 (0.25)                        </td>
</tr>
<tr>
<td width="50%">               {A, B, E}                        </td>
<td width="50%">               1 (0.25)                        </td>
</tr>
<tr>
<td width="50%">               {A, C, E}                        </td>
<td width="50%">               1 (0.25)                        </td>
</tr>
<tr>
<td width="50%">               {B, C, E}                        </td>
<td width="50%">               2 (0.5)                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
<td style="text-align:center">Keep itemsets<br />meeting<br />min support</td>
<td>
<div>
<table width="100%" cellspacing="0" cellpadding="3" border="1">
<tbody>
<tr>
<td width="50%" style="font-weight:bold">               Itemset                        </td>
<td width="50%" style="font-weight:bold">               Support                        </td>
</tr>
<tr>
<td width="50%">               {B, C, E}                        </td>
<td width="50%">               2                        </td>
</tr>
</tbody>
</table></div>
<p>            </td>
</tr>
</tbody>
</table>
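<p>The level-wise search above can be sketched in Python as follows. This is a minimal illustration written for this post's 4-transaction example. It is not the paper's or Weka's implementation, and the function names (<code>count_support</code>, <code>generate_candidates</code>, <code>apriori</code>) are hypothetical. Candidates are joined by pairwise union instead of the paper's sorted-prefix join. Once the prune step has dropped every candidate with an infrequent subset, both joins yield the same candidates. With this prune step, {A, B, C}, {A, B, E} and {A, C, E} in the third round each contain an infrequent 2-itemset ({A, B} or {A, E}). They would be discarded before the third scan, so only {B, C, E} actually needs to be counted, although the supports shown for them in the table are correct.</p>

```python
from itertools import combinations

# The example database from the post: TID -> set of items.
DB = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MIN_SUPPORT = 0.5  # minimum support as a fraction of all transactions


def count_support(candidates, transactions):
    """One scan of the database: count the transactions containing each candidate."""
    counts = {c: 0 for c in candidates}
    for items in transactions:
        for c in candidates:
            if c.issubset(items):
                counts[c] += 1
    return counts


def generate_candidates(frequent, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
    Prune: drop any candidate with an infrequent (k-1)-subset, because every
    subset of a frequent itemset must itself be frequent.
    """
    joined = set()
    for a, b in combinations(frequent, 2):
        union = a.union(b)
        if len(union) == k:
            joined.add(union)
    return {
        c for c in joined
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    }


def apriori(db, min_support):
    """Return a list of {itemset: count} dicts, one per itemset size."""
    transactions = list(db.values())
    min_count = min_support * len(transactions)
    candidates = {frozenset([item]) for items in transactions for item in items}
    levels, k = [], 1
    while candidates:
        counts = count_support(candidates, transactions)  # one DB scan per level
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        levels.append(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return levels


for size, level in enumerate(apriori(DB, MIN_SUPPORT), start=1):
    for itemset, count in sorted(level.items(), key=lambda kv: sorted(kv[0])):
        print(size, sorted(itemset), count)
# The last line printed is: 3 ['B', 'C', 'E'] 2
```

<p>Each pass of the <code>while</code> loop is one scan of DB. The example therefore takes three scans, matching the three rows of tables above.</p>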
<p>As the example shows, on massive datasets the Apriori algorithm has two problems:
<ol>
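<li>It has to scan the database repeatedly, once per itemset size, so its time cost is high;</li>
<li>It generates a large number of candidate itemsets along the way, so its space cost is also very high.</li>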
</ol>
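<p>Some arithmetic shows the scale of the space cost. Every pair of frequent 1-itemsets becomes a candidate 2-itemset, so n frequent items yield C(n, 2) = n(n-1)/2 candidates. The snippet below reproduces the 6 candidates that the example builds from its 4 frequent items and shows how fast the number grows. It needs Python 3.8+ for <code>math.comb</code>, and the larger item counts are illustrative rather than measured.</p>

```python
from math import comb


def candidate_pairs(n_frequent_items):
    """Number of candidate 2-itemsets Apriori generates from n frequent 1-itemsets."""
    # Every pair survives the prune step, since both of its 1-subsets are frequent.
    return comb(n_frequent_items, 2)


for n in (4, 1_000, 10_000):
    print(n, candidate_pairs(n))
# 4 -> 6, 1000 -> 499500, 10000 -> 49995000
```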
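<p>The <a href="http://www.google.com/search?complete=1&amp;hl=zh-CN&amp;q=improved+apriori&amp;btnG=Google+%E6%90%9C%E7%B4%A2&amp;meta=" title="google 上的 Apriori 算法改进">improvements</a> made to Apriori have mostly been aimed at these two problems. Examples include merging and compressing transactions in a single pass before scanning the DB, partitioning the data, and sampling it.</p>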
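<p>Here is a minimal sketch of one way to compress transactions, usually called <em>transaction reduction</em>. It is only one of the possible improvements, and the function name <code>reduce_transactions</code> is hypothetical. It assumes the frequent k-itemsets from the current pass are available as a set of <code>frozenset</code>s. Every candidate (k+1)-itemset has only frequent k-subsets. So a transaction that contains no frequent k-itemset cannot contain any candidate, and later scans can skip it without changing any support count.</p>

```python
def reduce_transactions(transactions, frequent_k):
    """Drop transactions that contain no frequent k-itemset.

    Any candidate (k+1)-itemset has only frequent k-subsets, so a transaction
    with no frequent k-itemset cannot contain a candidate and would not change
    any support count in the next scan.
    """
    return [
        items for items in transactions
        if any(itemset.issubset(items) for itemset in frequent_k)
    ]


# The 4 transactions from the example; after the third pass only {B, C, E} is frequent.
transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(reduce_transactions(transactions, {frozenset("BCE")}))
# keeps only the transactions of TIDs 200 and 300
```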
<p><a href="http://www.cs.waikato.ac.nz/%7Eml/weka/" title="an open-source BI project">Weka</a> includes a Java implementation of Apriori that is well worth reading.</p>
<p>It looks like <a href="http://en.wikipedia.org/wiki/Main_Page" title="wikipedia">Wikipedia</a> has been unblocked, heh!</p>
<p>Coming next: Association Rules (3), on the FP-Growth algorithm.</p>
<hr id="rss-footer" />
<small>© guwendong for <a href="http://www.guwendong.com">Beyond Search</a>, 2007.<br/>
Permalink: <a href="http://www.guwendong.com/post/2007/apriori_algorithm.html">http://www.guwendong.com/post/2007/apriori_algorithm.html</a><br/>
tags: <a href="http://www.guwendong.com/post/tag/algorithm" rel="tag">algorithm</a>, <a href="http://www.guwendong.com/post/tag/apriori" rel="tag">apriori</a>, <a href="http://www.guwendong.com/post/tag/association-rule" rel="tag">association-rule</a> | <a href="http://www.guwendong.com/post/2007/apriori_algorithm.html#comments">Join the discussion</a>
</small><br><table class="wumii-related-items" cellspacing="0" cellpadding="3" border="0"  style="clear: both;">
    
    <tr>
        <td colspan="5"><b><font size="-1"  style="display: block !important; padding: 20px 0 5px !important;">You may also like:</font></b></td>
    </tr>
    
        <tr>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important;">
                    <a target="_blank" title="NRelate的内容推荐工具蒸蒸日上" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Ftech2ipo.com%2F43090%2F&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2012/02/05/14781524.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">NRelate的内容推荐工具蒸蒸日上 (@tech2ipo)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="如何从无到有建立推荐系统" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.alibuybuy.com%2Fposts%2F12756.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/04/08/5265314.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">如何从无到有建立推荐系统 (@alibuybuy)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(3) —— FP-Growth 算法" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/01/04/1802009.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统：关联规则(3) —— FP-Growth 算法</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(1)" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/images/blogWidget/wordpress_default.gif" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统：关联规则(1)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="Social Media Algorithm: Hacker News" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fsocial_media_algorithm_hackernews.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/01/05/1801986.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Social Media Algorithm: Hacker News</font>
                    </a>
                </td>
        </tr>
    
    <tr>
        <td colspan="5" align="right">
            <a style="text-decoration: none !important;" href="http://www.wumii.com/widget/relatedItems.htm" target="_blank" title="无觅相关文章插件">
                <font size="-1" color="#bbbbbb" style="display: block !important; font-family: arial !important; padding: 5px 0 !important; font-size: 12px !important; color: #bbb !important;">无觅</font>
            </a>
        </td>
    </tr>
</table>]]></content:encoded>
			<wfw:commentRss>http://www.guwendong.com/post/2007/apriori_algorithm.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>推荐系统：关联规则(1)</title>
		<link>http://www.guwendong.com/post/2007/association_rule.html</link>
		<comments>http://www.guwendong.com/post/2007/association_rule.html#comments</comments>
		<pubDate>Thu, 28 Jun 2007 14:57:03 +0000</pubDate>
		<dc:creator>guwendong</dc:creator>
				<category><![CDATA[推荐系统]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[association-rule]]></category>

		<guid isPermaLink="false">http://www.guwendong.com/post/2007/association_rule.html</guid>
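		<description><![CDATA[You can't talk about recommender systems without talking about association rules. Association-rule-based recommendation is an entry-level recommendation technique and currently one of the most widely used forms of recommendation.
Take the newly launched "Mayi" (蚂蚁) as an example. Open its page for The Tipping Point (《引爆流行》), scroll down a little, and you will see "Members who like this item also like". Douban has something similar: on its page for The Tipping Point it reads "People who like The Tipping Point also like". Look alike, don't they? Don't be fooled by the presentation, though, because the two are built on completely different techniques. As I have discussed before, Douban's uses the Item-Based method, while Mayi's is most likely the association-rule method. That is a guess on my part, but not a random one. If you are curious, refresh those two The Tipping Point pages a few times and see how the contents of the two recommendation areas differ.
Association rules originated in data mining, where they are used to discover (interesting or useful) associations between itemsets in large datasets. Association rule mining is an important research topic in data mining, and it has received even more attention in recent years because it is widely used in industry. Rakesh Agrawal is a leading figure in the field. His 1993 paper with Tomasz Imieliński and Arun Swami, "Mining Association Rules between Sets of Items in Large Databases", is one of the most cited works in the area. To the disappointment of Google fans, though, he currently works at Microsoft's search lab! ^_^
The classic example of association rules is market basket analysis. In one supermarket there was a curious sight: diapers and beer displayed side by side. Yet this odd arrangement increased sales of both. The story is told not as a joke but as a real case from a Walmart store in the US, and retailers love to retell it. Reportedly, American wives often asked their husbands to buy diapers for the baby on the way home from work, and after buying the diapers the husbands would also pick up the beer they liked. As a result, beer and diapers were quite often bought together. Sounds cool, doesn't it? That's right: this is the power of technology!
But as with every other classic story, it sounds exciting and is hard to pull off. Anyone who has actually mined association rules knows the feeling: digging one truly interesting rule out of a vast set of records is extremely hard. (How hard, you ask? Think of Zhu Guanghu, then head coach of China's national football team ~~~)
Several metrics are defined to measure how effective a mined association rule is. The most classic are support and confidence. Simply put:
Support is the fraction of all sales orders that contain both product A and product B.
Confidence (for the rule "A implies B") is the fraction of the orders containing product A that also contain product B.
Of course, "product" and "order" are generic concepts here; what they refer to concretely depends on the problem at hand.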
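The two definitions can be written as a small Python sketch. The orders below are a hypothetical toy basket, not data from any real store, and the function names are illustrative. Confidence is computed for the rule "A implies B", which is why swapping A and B can change it.

```python
# Hypothetical toy orders: each order is the set of products it contains.
ORDERS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"beer"},
]


def support(orders, a, b):
    """Fraction of all orders that contain both product a and product b."""
    both = sum(1 for order in orders if a in order and b in order)
    return both / len(orders)


def confidence(orders, a, b):
    """For the rule a => b: fraction of orders containing a that also contain b."""
    with_a = [order for order in orders if a in order]
    if not with_a:
        return 0.0  # a never occurs, so the rule is undefined; report 0
    return sum(1 for order in with_a if b in order) / len(with_a)


print(support(ORDERS, "diapers", "beer"))     # 2 of 5 orders -> 0.4
print(confidence(ORDERS, "diapers", "beer"))  # 2 of the 3 diaper orders
print(confidence(ORDERS, "beer", "diapers"))  # 2 of the 4 beer orders -> 0.5
```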

© guwendong for Beyond Search, 2007.
Permalink: http://www.guwendong.com/post/2007/association_rule.html
tags: algorithm, association-rule &#124; Join the discussion
<table class="wumii-related-items" cellspacing="0" cellpadding="3" border="0"  style="clear: both;">
    
    <tr>
        <td colspan="5"><b><font size="-1"  style="display: block !important; padding: 20px 0 5px !important;">You may also like:</font></b></td>
    </tr>
    
        <tr>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important;">
                    <a target="_blank" title="Goodlook.me：世界上最聪明的服装推荐系统" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Ftech2ipo.com%2F2010%2F11%2Fgoodlook-me%25EF%25BC%259A%25E4%25B8%2596%25E7%2595%258C%25E4%25B8%258A%25E6%259C%2580%25E8%2581%25AA%25E6%2598%258E%25E7%259A%2584%25E6%259C%258D%25E8%25A3%2585%25E6%258E%25A8%25E8%258D%2590%25E7%25B3%25BB%25E7%25BB%259F%2F&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/07/21/18336869.jpg" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Goodlook.me：世界上最聪明的服装推荐系统 (@tech2ipo)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统resys小组线下活动见闻2009-08-22" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.wentrue.net%2Fblog%2F%3Fp%3D559&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2010/12/27/1608365.jpg" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统resys小组线下活动见闻2009-08-22 (@wentrue)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(3) —— FP-Growth 算法" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/01/04/1802009.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统：关联规则(3) —— FP-Growth 算法</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(2)" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/images/blogWidget/wordpress_default.gif" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">推荐系统：关联规则(2)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="Social Media Algorithm: StumbleUpon" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fsocial_media_algorithm_stumbleupon.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/08/20/23627877.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Social Media Algorithm: StumbleUpon</font>
                    </a>
                </td>
        </tr>
    
</table>]]></description>
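			<content:encoded><![CDATA[<p>Any discussion of recommender systems has to include <a href="http://en.wikipedia.org/wiki/Association_rule_learning" title="关联规则">association rules</a>. Recommendation based on association rules is an entry-level recommendation technique, and it is also the most widely used form of recommendation today.</p>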
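<p>Take the newly launched “<a href="http://www.mayi.com/" title="蚂蚁">Mayi</a>” as an example: open its page for “<a href="http://goods.mayi.com/view/83794/" title="mayi上的引爆流行">The Tipping Point</a>”, scroll down a little, and you will see a box titled “Members who liked this item also liked”. Douban has something similar: on its page for “<a href="http://www.douban.com/subject/1014964/" title="douban上的引爆流行">The Tipping Point</a>”, the box reads “People who like The Tipping Point also like”. They look alike, right? Don't be fooled by the presentation, though: the two are built on completely different techniques. As I wrote earlier, Douban uses the <a href="http://my.donews.com/clickstone/2006/12/29/lopauzjdjbrgestuvipycpibuheaoiuzwkzr/" title="推荐系统：协同过滤 之 Item-based Collaborative Filtering">Item-Based</a> collaborative filtering method, while Mayi's box is most likely based on association rules. That is a guess, admittedly, but not a wild one: if you are curious, refresh both “The Tipping Point” pages and compare how the contents of the two recommendation boxes differ.</p>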
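<p>To make the association-rule side of this comparison concrete, here is a minimal Python sketch of an “also liked” box, assuming each basket is simply the set of items one customer liked or bought. The function name also_liked, its thresholds, and the toy book data are hypothetical illustrations, not Mayi's actual implementation. For a given item A, the sketch ranks every other item B by the confidence of the rule A → B, that is, the fraction of baskets containing A that also contain B.</p>

```python
from collections import Counter


def also_liked(baskets, item, min_support=0.0, min_confidence=0.0, top_n=5):
    """Rank other items B by the confidence of the rule item -> B.

    baskets: list of sets, one set of items per customer (assumed format).
    Returns up to top_n (B, confidence) pairs, highest confidence first.
    """
    total = len(baskets)
    with_item = [basket for basket in baskets if item in basket]
    if not with_item:
        return []
    # How many baskets contain both `item` and each other item.
    co_counts = Counter(
        other for basket in with_item for other in basket if other != item
    )
    ranked = []
    for other, count in co_counts.items():
        support = count / total              # share of all baskets with both items
        confidence = count / len(with_item)  # share of item's baskets that also have other
        if support >= min_support and confidence >= min_confidence:
            ranked.append((other, confidence))
    # Highest confidence first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


# Hypothetical toy data: each set is what one customer liked.
baskets = [
    {"tipping_point", "blink"},
    {"tipping_point", "blink", "long_tail"},
    {"tipping_point", "freakonomics"},
    {"blink", "long_tail"},
]

for name, conf in also_liked(baskets, "tipping_point"):
    print(f"{name}: {conf:.2f}")
```

<p>Raising min_confidence (for example to 0.5) keeps only rules that hold in at least half of the baskets containing the item; in this toy data only blink survives.</p>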
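<p>Association rules originated in data mining, where they are used to discover (interesting or useful) associations between itemsets in large volumes of data. Association rule mining is an important research topic in data mining in its own right, and in recent years it has drawn even more attention because of its wide use in industry. <a href="http://rakesh.agrawal-family.com/" title="Rakesh Agrawal">Rakesh Agrawal</a> is a leading authority on association rules; his 1993 paper “<a href="http://rakesh.agrawal-family.com/papers/sigmod93assoc.pdf" title="Mining Association Rules between Sets of Items in Large Databases">Mining Association Rules between Sets of Items in Large Databases</a>” is the most-cited work on the subject. Google fans may be disappointed to learn, however, that he currently works at Microsoft's search lab! ^_^</p>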
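<p>The classic example of association rules is market basket analysis. In one supermarket, something curious was going on: diapers and beer were displayed for sale side by side, and this odd arrangement actually increased the sales of both. This is not a joke but a real case from an American Walmart store, and retailers have been retelling it with relish ever since. The explanation: American wives often asked their husbands to pick up diapers for the baby after work, and once the husbands had bought the diapers they would grab some of their favorite beer as well, so beer and diapers ended up in the same basket quite often. Sounds cool, doesn't it? Right: that is the power of technology!</p>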
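<p>At its core, market basket analysis starts by counting which items end up in the same basket. The Python sketch below uses made-up baskets, not any real Walmart data, and the helper name count_pairs is hypothetical. It counts, for every unordered pair of items, how many baskets contain both; a pair like beer and diapers that shows up together often is the raw signal an association rule is built from.</p>

```python
from collections import Counter
from itertools import combinations


def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    pair_counts = Counter()
    for basket in baskets:
        # set() drops duplicate items; sorted() gives every pair one canonical
        # order, so ("beer", "diapers") and ("diapers", "beer") are counted together.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Made-up shopping baskets, for illustration only.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["diapers", "milk"],
    ["beer", "chips"],
    ["diapers", "beer", "milk"],
]

for pair, count in count_pairs(baskets).most_common(3):
    print(pair, count)
```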
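<p>But as with every other classic story, it sounds exciting and is hard to pull off! Anyone who has actually mined association rules will know the feeling: extracting one genuinely compelling rule from a vast sea of records is extremely hard. (How hard, you ask? Ask Zhu Guanghu, coach of the Chinese national football team~~~)</p>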
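<p>Mined association rules are usually assessed with a few metrics that measure how strong they are; the most classic are support and confidence. Briefly, for a rule “A implies B”:</p>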
<ol>
<li>
    Support is the fraction of all orders that contain both item A and item B.
  </li>
<li>
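    Confidence is the fraction of orders containing item A that also contain item B, i.e. the support of A and B together divided by the support of A.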
  </li>
</ol>
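<p>Both definitions translate directly into code. The following is a minimal Python sketch, assuming each order is represented as a collection of item names; the helper names count_containing, support and confidence, and the toy orders, are illustrative assumptions rather than part of any particular library. Confidence is computed from raw counts, which is equivalent to the support of A and B together divided by the support of A.</p>

```python
def count_containing(orders, items):
    """Number of orders that contain every item in `items`."""
    items = set(items)
    return sum(1 for order in orders if items.issubset(order))


def support(orders, items):
    """Fraction of all orders that contain every item in `items`."""
    return count_containing(orders, items) / len(orders)


def confidence(orders, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: among orders that
    contain the antecedent, the fraction that also contain the consequent."""
    base = count_containing(orders, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so treat its confidence as 0
    both = count_containing(orders, set(antecedent) | set(consequent))
    return both / base


# Made-up orders, for illustration only.
orders = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]

print(support(orders, {"diapers", "beer"}))       # 3 of 5 orders
print(confidence(orders, {"diapers"}, {"beer"}))  # 3 of the 4 diaper orders
print(confidence(orders, {"milk"}, {"diapers"}))  # 2 of the 2 milk orders
print(confidence(orders, {"diapers"}, {"milk"}))  # 2 of the 4 diaper orders
```

<p>The last two lines show that confidence is directional: every milk order here also contains diapers (confidence 1.0), but only half of the diaper orders contain milk (0.5), whereas the support of the pair is the same in both directions.</p>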
<p>
Of course, “items” and “orders” here are generic concepts; what they actually refer to depends on the problem at hand.</p>
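<p>One way to see this generality: the same counting logic works whatever an “order” is, as long as the raw data can be grouped into baskets. The minimal Python sketch below assumes a hypothetical event log of dicts with user_id, session_id and item fields, and a hypothetical helper build_baskets; the choice of grouping key decides whether a basket means everything one user interacted with or everything in one shopping session.</p>

```python
from collections import defaultdict


def build_baskets(events, key):
    """Group events into baskets: one set of items per distinct value of `key`.

    events: iterable of dicts with an "item" field plus grouping fields
    (a hypothetical log format). key: e.g. "user_id" or "session_id".
    """
    baskets = defaultdict(set)
    for event in events:
        baskets[event[key]].add(event["item"])
    return dict(baskets)


# Hypothetical event log.
events = [
    {"user_id": "u1", "session_id": "s1", "item": "beer"},
    {"user_id": "u1", "session_id": "s1", "item": "diapers"},
    {"user_id": "u1", "session_id": "s2", "item": "chips"},
    {"user_id": "u2", "session_id": "s3", "item": "beer"},
]

print(build_baskets(events, "user_id"))     # baskets per user
print(build_baskets(events, "session_id"))  # baskets per shopping session
```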
<hr id="rss-footer" />
<small>© guwendong for <a href="http://www.guwendong.com">Beyond Search</a>, 2007.<br/>
Permalink: <a href="http://www.guwendong.com/post/2007/association_rule.html">http://www.guwendong.com/post/2007/association_rule.html</a><br/>
tags: <a href="http://www.guwendong.com/post/tag/algorithm" rel="tag">algorithm</a>, <a href="http://www.guwendong.com/post/tag/association-rule" rel="tag">association-rule</a> | <a href="http://www.guwendong.com/post/2007/association_rule.html#comments">Join the discussion</a>
</small><br><table class="wumii-related-items" cellspacing="0" cellpadding="3" border="0"  style="clear: both;">
    
    <tr>
        <td colspan="5"><b><font size="-1"  style="display: block !important; padding: 20px 0 5px !important;">You may also like:</font></b></td>
    </tr>
    
        <tr>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important;">
                    <a target="_blank" title="Goodlook.me：世界上最聪明的服装推荐系统" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Ftech2ipo.com%2F2010%2F11%2Fgoodlook-me%25EF%25BC%259A%25E4%25B8%2596%25E7%2595%258C%25E4%25B8%258A%25E6%259C%2580%25E8%2581%25AA%25E6%2598%258E%25E7%259A%2584%25E6%259C%258D%25E8%25A3%2585%25E6%258E%25A8%25E8%258D%2590%25E7%25B3%25BB%25E7%25BB%259F%2F&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/07/21/18336869.jpg" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Goodlook.me: The World's Smartest Clothing Recommender System (@tech2ipo)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统resys小组线下活动见闻2009-08-22" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.wentrue.net%2Fblog%2F%3Fp%3D559&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2010/12/27/1608365.jpg" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Notes from the Resys Recommender Systems Group Offline Meetup, 2009-08-22 (@wentrue)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(3) —— FP-Growth 算法" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2008%2Ffpgrowth_algorithm.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/01/04/1802009.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Recommender Systems: Association Rules (3): The FP-Growth Algorithm</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="推荐系统：关联规则(2)" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fapriori_algorithm.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/images/blogWidget/wordpress_default.gif" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Recommender Systems: Association Rules (2)</font>
                    </a>
                </td>
                <td width="102" valign="top" style="padding: 5px !important; margin: 0 !important; border-left: 1px solid #DDDDDD !important;">
                    <a target="_blank" title="Social Media Algorithm: StumbleUpon" style="text-decoration: none !important; cursor: pointer !important;" href="http://app.wumii.com/ext/redirect.htm?url=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2009%2Fsocial_media_algorithm_stumbleupon.html&from=http%3A%2F%2Fwww.guwendong.com%2Fpost%2F2007%2Fassociation_rule.html">
                        <img style="margin: 0 !important; padding: 2px !important; border: 1px solid #DDDDDD !important; width: 96px !important; height: 96px !important;" src="http://static.wumii.com/site_images/2011/08/20/23627877.png" width="96px" height="96px" /><br />
                        <font size="-1" color="#333333" style="display: block !important; line-height: 15px !important; width: 102px !important; font: 12px/15px arial !important; height: 60px !important; margin: 3px 0 0 0 !important; padding: 0 !important; overflow: hidden !important;">Social Media Algorithm: StumbleUpon</font>
                    </a>
                </td>
        </tr>
    
</table>]]></content:encoded>
			<wfw:commentRss>http://www.guwendong.com/post/2007/association_rule.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

