Sequential Pattern Mining with Wildcards

Sequential pattern mining is an important research task in many domains, such as biological science. In this paper, we study the problem of mining frequent patterns from sequences with wildcards. The user can specify the gap constraints with flexibility. Given a subject sequence, a minimal support t...

Bibliographic Details
Published in: 2010 22nd IEEE International Conference on Tools with Artificial Intelligence, Vol. 1, pp. 241-247
Main Authors: Fei Xie, Xindong Wu, Xuegang Hu, Jun Gao, Dan Guo, Yulian Fei, Ertian Hua
Format: Conference Proceeding
Language: English
Published: IEEE, 1 October 2010
ISBN: 1424488176, 9781424488179
ISSN: 1082-3409
DOI: 10.1109/ICTAI.2010.42

Abstract: Sequential pattern mining is an important research task in many domains, such as biological science. In this paper, we study the problem of mining frequent patterns from sequences with wildcards, where the user can specify the gap constraints flexibly. Given a subject sequence, a minimal support threshold, and a gap constraint, we aim to find the frequent patterns whose support in the sequence is no less than the given threshold. We design an efficient mining algorithm, MAIL, which uses the candidate occurrences of a pattern's prefix to compute the pattern's support and thereby avoids rescanning the sequence. We present two pruning strategies to improve the completeness and the time efficiency of MAIL. Experiments show that MAIL mines 2 times more patterns than one of its peers and is on average 12 times faster than another peer.
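
The support computation described in the abstract can be illustrated with a small sketch. This is not the MAIL algorithm from the paper; it is a simple greedy heuristic, written under several assumptions: the gap constraint is taken to be a pair `[min_gap, max_gap]` bounding the number of wildcards between consecutive pattern characters, and support is counted under the one-off condition listed in the record's subject terms, interpreted here as "each sequence position is used by at most one occurrence". The function names `one_off_support` and `_extend` are hypothetical.

```python
from typing import List, Optional


def _extend(seq: str, pattern: str, min_gap: int, max_gap: int,
            used: List[bool], positions: List[int]) -> Optional[List[int]]:
    """Depth-first search that tries to complete a partial occurrence.

    `positions` holds the sequence indices matched so far. Returns the
    full list of indices, or None if no completion exists.
    """
    k = len(positions)
    if k == len(pattern):
        return positions
    last = positions[-1]
    # The next character must follow `last` with between min_gap and
    # max_gap wildcard positions in between.
    lo = last + min_gap + 1
    hi = min(last + max_gap + 1, len(seq) - 1)
    for p in range(lo, hi + 1):
        if not used[p] and seq[p] == pattern[k]:
            found = _extend(seq, pattern, min_gap, max_gap, used, positions + [p])
            if found is not None:
                return found
    return None


def one_off_support(seq: str, pattern: str, min_gap: int, max_gap: int) -> int:
    """Greedy, leftmost-first count of non-overlapping occurrences.

    Assumption: an occurrence claims every position it matches, so no
    position is shared between occurrences. The greedy choice yields a
    lower bound on the best achievable count, not necessarily the maximum.
    """
    if not pattern:
        return 0
    used = [False] * len(seq)
    count = 0
    for start, ch in enumerate(seq):
        if used[start] or ch != pattern[0]:
            continue
        occ = _extend(seq, pattern, min_gap, max_gap, used, [start])
        if occ is not None:
            for p in occ:
                used[p] = True
            count += 1
    return count


# Pattern "AC" with 0 to 2 wildcards between A and C.
print(one_off_support("AACC", "AC", 0, 2))  # 2
```

A pattern would then be reported as frequent when this count is no less than the support threshold. The sketch rescans the sequence for every pattern, which is the cost the abstract says MAIL avoids by reusing the candidate occurrences of the prefix.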
Author_xml – sequence: 1
  surname: Fei Xie
  fullname: Fei Xie
  email: xiefei9815057@sina.com
  organization: Coll. of Comput. Sci. & Info. Eng., Hefei Univ. of Tech., Hefei, China
– sequence: 2
  surname: Xindong Wu
  fullname: Xindong Wu
  email: xwu@cs.uvm.edu
  organization: Coll. of Comput. Sci. & Info. Eng., Hefei Univ. of Tech., Hefei, China
– sequence: 3
  surname: Xuegang Hu
  fullname: Xuegang Hu
  organization: Coll. of Comput. Sci. & Info. Eng., Hefei Univ. of Tech., Hefei, China
– sequence: 4
  surname: Jun Gao
  fullname: Jun Gao
  organization: Coll. of Comput. Sci. & Info. Eng., Hefei Univ. of Tech., Hefei, China
– sequence: 5
  surname: Dan Guo
  fullname: Dan Guo
  organization: Coll. of Comput. Sci. & Info. Eng., Hefei Univ. of Tech., Hefei, China
– sequence: 6
  surname: Yulian Fei
  fullname: Yulian Fei
  organization: Coll. of Comput. Sci. & Info. Eng., Zhejiang Gongshang Univ., Hangzhou, China
– sequence: 7
  surname: Ertian Hua
  fullname: Ertian Hua
  organization: Coll. of Comput. Sci. & Info. Eng., Zhejiang Gongshang Univ., Hangzhou, China
Discipline Computer Science
EndPage 247
ExternalDocumentID 5670041
Genre orig-research
PageCount 7
Subject Terms: Algorithm design and analysis, Bioinformatics, candidate occurrence pruning, Complexity theory, DNA, Genomics, one-off condition, Pattern matching, Postal services, sequential pattern mining, wildcard
URI https://ieeexplore.ieee.org/document/5670041