Implementing Data Mining Association Rules in Oracle

2020-05-22 00:00:00 rules, transactions, association, contains, data mining

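Heh, a few days ago I got hold of the book 数据挖掘基础教程 (an introductory data mining textbook). My impression is that some of its algorithms rest on statistical principles, and statistics of that kind can be computed in Oracle.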

For an introduction to association rules in data mining, see: http://baike.baidu.com/view/1076817.htm?fr=ala0_1

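An association rule is an implication of the form X→Y, where X and Y are sets of items that share no item (X ∩ Y = ∅). X and Y are called the antecedent (left-hand side, LHS) and the consequent (right-hand side, RHS) of the rule, respectively.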

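The support of an association rule in D is the percentage of transactions in D that contain both X and Y, i.e. a probability: support = (X^Y)/D, where X^Y is the number of transactions containing both X and Y, and D is the total number of transactions.

The confidence is the percentage of transactions containing X that also contain Y, i.e. a conditional probability: confidence = (X^Y)/X, where X is the number of transactions containing X.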

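An association rule is considered interesting if it meets both the minimum support threshold and the minimum confidence threshold.

Given a minimum support α = n and a minimum confidence β = m, comparing X^Y/D with α and (X^Y)/X with β shows whether the association holds.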
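As a quick worked illustration of the two ratios, using purely hypothetical counts (they are not taken from the data in this post): suppose D holds 10 transactions, 5 of them contain X, and 3 contain both X and Y.

```latex
\[
\mathrm{support}(X \to Y) = \frac{|X \wedge Y|}{|D|} = \frac{3}{10} = 0.3,
\qquad
\mathrm{confidence}(X \to Y) = \frac{|X \wedge Y|}{|X|} = \frac{3}{5} = 0.6
\]
```

With, say, α = 0.25 and β = 0.5, both thresholds are met, so this rule would count as interesting.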

The raw data used
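The original table is not reproduced in the text. As a rough sketch, a `purchase` table compatible with the queries in this post could look like the following. The column names `tranid` and `tranobject` come from the queries themselves; the data types and the sample rows are assumptions made up for illustration only.

```sql
-- Hypothetical schema: column names come from the queries in this post;
-- the data types and the rows below are illustrative assumptions.
create table purchase (
  tranid     number       not null,  -- transaction id
  tranobject varchar2(20) not null   -- one purchased item per row
);

-- Made-up sample rows: three transactions over items A, B, C
insert into purchase (tranid, tranobject) values (1, 'A');
insert into purchase (tranid, tranobject) values (1, 'B');
insert into purchase (tranid, tranobject) values (2, 'A');
insert into purchase (tranid, tranobject) values (2, 'B');
insert into purchase (tranid, tranobject) values (2, 'C');
insert into purchase (tranid, tranobject) values (3, 'B');
insert into purchase (tranid, tranobject) values (3, 'C');
commit;
```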

The data after denormalization

The items to be counted


Code example

-- Create a view of the distinct purchased items
create view distinct_trans as select distinct tranobject from purchase;


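-- Build the list of purchased items for each transaction
-- This can be done with the wm_concat function (an undocumented function)
create view all_trans as
SELECT tranid, MAX(tranobjects) tranobjects  -- each shorter running list is a prefix of the full one, so MAX keeps the full list
FROM (SELECT tranid,
             WMSYS.WM_CONCAT(tranobject) OVER (PARTITION BY tranid ORDER BY tranobject) tranobjects
      FROM purchase)
GROUP BY tranid;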


-- Alternatively, use the sys_connect_by_path function
create or replace view all_trans as
select tranid, substr(tranobjects, 2) tranobjects  -- strip the leading comma
from (select distinct tranid,
             FIRST_VALUE(tranobjects) OVER (PARTITION BY tranid ORDER BY levels desc) AS tranobjects  -- keep the longest path
      from (select tranid,
                   sys_connect_by_path(tranobject, ',') tranobjects,  -- item combinations within each transaction
                   level levels
            from purchase
            -- the comparison is truncated in the source; "> prior tranobject" is assumed here
            connect by tranid = prior tranid and tranobject > prior tranobject
           )
     );
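As a further alternative to the two approaches above, `LISTAGG` can build the same per-transaction list in a single aggregate. This is only a sketch: `LISTAGG` is documented from Oracle 11g Release 2 onward, whereas `WM_CONCAT` is undocumented and is not present in more recent Oracle releases. It assumes each item appears at most once per transaction in `purchase`; otherwise duplicates would show up in the list.

```sql
-- Sketch only: same output shape as all_trans (tranid, comma-separated items),
-- with items in ascending order inside each transaction.
create or replace view all_trans as
select tranid,
       listagg(tranobject, ',') within group (order by tranobject) as tranobjects
from purchase
group by tranid;
```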



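-- Enumerate the combinations of all purchased items, i.e. the candidates for X^Y
create view all_zuhe as
select substr(sys_connect_by_path(tranobject, ','), 2) zuhe
from (select distinct tranobject from purchase)
-- the comparison is truncated in the source; "> prior tranobject" is assumed here
connect by nocycle tranobject > prior tranobject;

select * from all_zuhe;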


-- Keep only the valid pairs of combinations, i.e. the X and Y itemsets
create view full_zuhe as
select a.zuhe X, b.zuhe Y
from all_zuhe a, all_zuhe b
where instr(a.zuhe, b.zuhe) = 0 and instr(b.zuhe, a.zuhe) = 0
  -- X and Y must not share any item
  and not exists (select 1 from distinct_trans c
                  where instr(a.zuhe, c.tranobject) > 0 and instr(b.zuhe, c.tranobject) > 0);

select * from full_zuhe;


-- For each (X, Y) pair, count the transactions containing XY, Y and X
-- Note: instr only matches when the items appear contiguously in tranobjects
create or replace view tongji as
select xy, xy_total, x, x_total, y, y_total, transtotal
from (select y || ',' || x xy,
             (select count(*) from all_trans a
              where instr(a.tranobjects, c.x || ',' || c.y) > 0
                 or instr(a.tranobjects, c.y || ',' || c.x) > 0) xy_total,  -- transactions containing xy
             y,
             (select count(*) from all_trans b where instr(b.tranobjects, c.y) > 0) y_total,  -- transactions containing y
             x,
             (select count(*) from all_trans b where instr(b.tranobjects, c.x) > 0) x_total,  -- transactions containing x
             d.transtotal  -- total number of transactions
      from full_zuhe c, (select count(distinct tranid) transtotal from purchase) d
      order by xy_total desc, x_total desc
     );

select * from tongji where xy_total >= 3 and y_total >= 3;
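The counts in tongji can also be turned into the support and confidence ratios defined earlier, so that the filter is expressed against α and β rather than raw counts. A minimal sketch, assuming the tongji view above exists; the thresholds 0.3 and 0.6 are placeholder values standing in for α and β, not values from the original post.

```sql
-- Sketch only: support = xy_total / transtotal, confidence = xy_total / x_total.
-- nullif guards against combinations of X that occur in no transaction.
-- 0.3 and 0.6 are placeholder thresholds for alpha and beta.
select x,
       y,
       xy_total / transtotal         as support,
       xy_total / nullif(x_total, 0) as confidence
from tongji
where xy_total / transtotal >= 0.3
  and xy_total / nullif(x_total, 0) >= 0.6
order by support desc, confidence desc;
```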


