Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Bibliographic Details
Published in IEEE Transactions on Multimedia, Vol. 15, No. 7, pp. 1553–1568
Main Authors Evangelopoulos, Georgios, Zlatintsi, Athanasia, Potamianos, Alexandros, Maragos, Petros, Rapantzikos, Konstantinos, Skoumas, Georgios, Avrithis, Yannis
Format Journal Article
Language English
Published New York, NY: IEEE, 01.11.2013
Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
ISSN 1520-9210
1941-0077
DOI 10.1109/TMM.2013.2267205


Abstract Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual or linguistic saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The individual saliency streams, obtained from modality-dependent cues, are integrated in a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying prevailing sensory events. The multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
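The fusion-and-selection pipeline outlined in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the paper's actual audio, visual, and text features are replaced by pre-computed per-frame saliency arrays, and the fusion weights and segment length are hypothetical choices standing in for the fusion schemes the paper evaluates.

```python
import numpy as np

def normalize(s):
    """Scale a saliency stream to [0, 1] so modalities are comparable."""
    s = np.asarray(s, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

def fuse_saliency(audio, visual, text, weights=(0.4, 0.4, 0.2)):
    """Weighted linear fusion into a single multimodal saliency curve.

    The weights here are hypothetical; the paper compares several
    fusion schemes rather than fixing one set of weights.
    """
    streams = [normalize(audio), normalize(visual), normalize(text)]
    return sum(w * s for w, s in zip(weights, streams))

def top_segments(curve, seg_len, k):
    """Rank fixed-length segments by mean saliency and keep the top k,
    returned in temporal order (a stand-in for summary selection)."""
    n = len(curve) // seg_len
    scores = [curve[i * seg_len:(i + 1) * seg_len].mean() for i in range(n)]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(order)

# Toy usage: 100 "frames", summarize to the 2 most salient 10-frame segments.
rng = np.random.default_rng(0)
curve = fuse_saliency(rng.random(100), rng.random(100), rng.random(100))
print(top_segments(curve, seg_len=10, k=2))
```

In this sketch the multimodal curve is a convex combination of the normalized per-modality streams, so a segment scores highly only when at least one modality is strongly salient there; the paper's bottom-up summarizer builds on the same idea of selecting the peaks of the fused curve.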
Author Zlatintsi, Athanasia
Evangelopoulos, Georgios
Rapantzikos, Konstantinos
Avrithis, Yannis
Skoumas, Georgios
Potamianos, Alexandros
Maragos, Petros
Author_xml – sequence: 1
  givenname: Georgios
  surname: Evangelopoulos
  fullname: Evangelopoulos, Georgios
  email: gevag@cs.ntua.gr
  organization: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
– sequence: 2
  givenname: Athanasia
  surname: Zlatintsi
  fullname: Zlatintsi, Athanasia
  email: nzlat@cs.ntua.gr
  organization: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
– sequence: 3
  givenname: Alexandros
  surname: Potamianos
  fullname: Potamianos, Alexandros
  email: potam@telecom.tuc.gr
  organization: Department of Electronics and Computer Engineering, Technical University of Crete, Chania, Greece
– sequence: 4
  givenname: Petros
  surname: Maragos
  fullname: Maragos, Petros
  email: maragos@cs.ntua.gr
  organization: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
– sequence: 5
  givenname: Konstantinos
  surname: Rapantzikos
  fullname: Rapantzikos, Konstantinos
  email: rap@image.ntua.gr
  organization: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
– sequence: 6
  givenname: Georgios
  surname: Skoumas
  fullname: Skoumas, Georgios
  email: gskoumas@dblab.ece.ntua.gr
  organization: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
– sequence: 7
  givenname: Yannis
  surname: Avrithis
  fullname: Avrithis, Yannis
  email: iavr@image.ntua.gr
  organization: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
BackLink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=27861117$$DView record in Pascal Francis
BookMark eNp9kUFr3DAQRkVJoUnae6EXQynkUG9mRpZsH7ehaQtZcsi2VzOVx6DgtVPJLkl-feTukkMOPc1IvKcR852oo2EcRKn3CCtEqM-3m82KAPWKyJYE5pU6xrrAHKAsj1JvCPKaEN6okxhvAbAwUB6rdjP3k9-NLffZDfdeBveQ8dBml3P045B1Y8g2418v2c2823Hwjzwt9184SpulZj0H7j9nv3ycl7qoW7mf0iFbT5MMC_1Wve64j_LuUE_Vz8uv24vv-dX1tx8X66vcFQRTTgU7RudM57CoDbZUd8LG8W8Ddd06bnVlKyvAzmiNbSW6pBbQmLKQzmp9qs72796F8c8scWp2Pjrpex5knGODhS2MoaqChH58gd6OcxjS7xJVkLZEhhL16UBxdNx3gQfnY3MXfFrFQ0NlZRGxTBzsORfGGIN0zwhCs8TTpHiaJZ7mEE9S7AvF-enfbqfAvv-f-GEvehF5nmMNlZpIPwHs-J12
CODEN ITMUF8
CitedBy_id crossref_primary_10_1109_ACCESS_2023_3308967
crossref_primary_10_1109_TMM_2018_2839523
crossref_primary_10_1016_j_imavis_2021_104216
crossref_primary_10_1016_j_inffus_2023_02_028
crossref_primary_10_1109_TMM_2022_3141256
crossref_primary_10_1016_j_neucom_2021_04_072
crossref_primary_10_1111_psyp_70036
crossref_primary_10_1080_00087041_2024_2436687
crossref_primary_10_1109_TMM_2019_2918730
crossref_primary_10_1142_S0219649222500666
crossref_primary_10_1007_s11042_017_4807_6
crossref_primary_10_1016_j_patrec_2018_01_002
crossref_primary_10_1109_TMM_2017_2703939
crossref_primary_10_1109_TIP_2019_2936112
crossref_primary_10_4018_IJMDEM_2019070101
crossref_primary_10_1109_ACCESS_2017_2776344
crossref_primary_10_1186_s13640_017_0194_1
crossref_primary_10_1109_TMM_2018_2829162
crossref_primary_10_1049_ipr2_13310
crossref_primary_10_1016_j_actpsy_2024_104206
crossref_primary_10_1007_s11042_018_5969_6
crossref_primary_10_1016_j_engappai_2022_105667
crossref_primary_10_1007_s11042_016_4061_3
crossref_primary_10_1016_j_aca_2024_343302
crossref_primary_10_1109_TMM_2018_2876046
crossref_primary_10_1111_cgf_13654
crossref_primary_10_1007_s10994_021_06112_5
crossref_primary_10_1109_TMM_2020_2987682
crossref_primary_10_7498_aps_66_109501
crossref_primary_10_1016_j_neucom_2024_128270
crossref_primary_10_1145_3322240
crossref_primary_10_1109_MMUL_2018_2883127
crossref_primary_10_3390_app10093056
crossref_primary_10_1007_s42835_020_00461_2
crossref_primary_10_1016_j_ipm_2014_12_001
crossref_primary_10_1007_s10462_023_10429_z
crossref_primary_10_1016_j_neucom_2016_08_129
crossref_primary_10_1109_TMM_2018_2859590
crossref_primary_10_1016_j_engappai_2024_108844
crossref_primary_10_1016_j_knosys_2021_106970
crossref_primary_10_1016_j_patrec_2018_07_016
crossref_primary_10_1016_j_inffus_2021_04_016
crossref_primary_10_1145_2632267
crossref_primary_10_1109_TAFFC_2023_3265653
crossref_primary_10_1016_j_datak_2023_102150
crossref_primary_10_1109_TMM_2022_3157993
crossref_primary_10_1145_3617833
crossref_primary_10_1155_2016_7437860
crossref_primary_10_1109_TPAMI_2018_2798607
crossref_primary_10_1109_TITS_2016_2601655
crossref_primary_10_1109_TCSS_2024_3411486
crossref_primary_10_1145_3508361
crossref_primary_10_1109_TKDE_2021_3080293
crossref_primary_10_1371_journal_pone_0228579
crossref_primary_10_1007_s11704_021_0611_6
crossref_primary_10_1109_TMM_2023_3249481
crossref_primary_10_1109_LSP_2017_2775212
crossref_primary_10_1016_j_eswa_2019_01_003
crossref_primary_10_1016_j_jvcir_2024_104279
crossref_primary_10_1109_TMM_2019_2935678
crossref_primary_10_3389_fnins_2023_1173704
crossref_primary_10_1109_TIP_2016_2615289
crossref_primary_10_3389_frobt_2015_00028
crossref_primary_10_1007_s12559_015_9326_z
crossref_primary_10_3390_e24060764
crossref_primary_10_1109_TCSVT_2018_2844780
crossref_primary_10_1155_2019_3581419
crossref_primary_10_1142_S0219749923500041
crossref_primary_10_1109_JPROC_2021_3117472
crossref_primary_10_1007_s10844_016_0441_4
crossref_primary_10_1016_j_imavis_2021_104267
crossref_primary_10_1007_s12193_018_0268_0
crossref_primary_10_1007_s00521_024_09908_3
crossref_primary_10_1145_3656580
crossref_primary_10_1109_TMM_2017_2777665
crossref_primary_10_1109_TMM_2019_2940851
crossref_primary_10_1145_3445794
crossref_primary_10_1109_ACCESS_2022_3216890
crossref_primary_10_1109_TMM_2020_3006372
crossref_primary_10_1145_3584700
crossref_primary_10_3233_JIFS_223752
crossref_primary_10_1080_08839514_2025_2462382
crossref_primary_10_1016_j_procs_2015_03_209
crossref_primary_10_1109_TCSVT_2022_3203421
crossref_primary_10_1098_rstb_2016_0101
crossref_primary_10_1145_2996463
crossref_primary_10_1007_s11042_015_3210_4
crossref_primary_10_1007_s11042_021_10977_y
crossref_primary_10_1016_j_image_2015_08_004
crossref_primary_10_3390_ai1040030
crossref_primary_10_1016_j_inffus_2020_08_006
crossref_primary_10_1016_j_asoc_2016_03_022
crossref_primary_10_1016_j_image_2016_03_005
crossref_primary_10_1007_s00530_022_01040_3
crossref_primary_10_1109_TNNLS_2022_3161314
crossref_primary_10_3390_app11115260
crossref_primary_10_1016_j_jvcir_2017_02_005
crossref_primary_10_1109_ACCESS_2019_2955637
crossref_primary_10_1109_JTEHM_2018_2863386
crossref_primary_10_1109_TIP_2020_2966082
crossref_primary_10_1109_TCDS_2021_3094974
crossref_primary_10_1038_s42256_022_00488_2
crossref_primary_10_1016_j_image_2019_05_001
crossref_primary_10_1109_TAFFC_2024_3354382
crossref_primary_10_1145_3347712
crossref_primary_10_1109_TMM_2018_2794265
crossref_primary_10_1109_TKDE_2018_2848260
crossref_primary_10_1007_s11042_016_3577_x
crossref_primary_10_1364_JOSAA_34_000814
crossref_primary_10_1049_ipr2_12960
crossref_primary_10_1016_j_iintel_2023_100061
crossref_primary_10_1016_j_dsp_2018_03_010
crossref_primary_10_1016_j_neucom_2023_03_013
crossref_primary_10_1109_TPAMI_2023_3325770
crossref_primary_10_1109_TMM_2018_2844689
crossref_primary_10_3390_ijgi10100636
crossref_primary_10_1007_s41095_015_0015_3
crossref_primary_10_1109_TMM_2019_2929943
crossref_primary_10_1109_TMM_2019_2947352
crossref_primary_10_1016_j_jksuci_2022_09_005
crossref_primary_10_3389_fneur_2024_1444795
crossref_primary_10_1007_s00498_017_0207_8
crossref_primary_10_1109_TPAMI_2022_3171983
crossref_primary_10_1007_s11042_014_2126_8
Cites_doi 10.1371/journal.pbio.1000129
10.1109/TASL.2010.2047756
10.1109/TIP.2009.2030969
10.1038/35058500
10.1145/345508.345566
10.1109/79.888862
10.3389/fnhum.2010.00168
10.1007/978-0-387-71305-2_5
10.1613/jair.1523
10.1109/ICASSP.2011.5946961
10.1109/TASL.2009.2014795
10.1109/76.809162
10.1109/TCSVT.2007.890857
10.1109/LSP.2005.853050
10.1016/j.jvcir.2007.04.002
10.1037/0033-295X.113.4.766
10.1109/TPAMI.2009.27
10.1162/neco.2007.19.10.2780
10.1016/j.patrec.2010.02.005
10.1016/S0042-6989(01)00250-4
10.1109/MSP.2006.1621451
10.1109/TPAMI.2011.53
10.1007/978-3-642-00958-7_37
10.1146/annurev.neuro.30.051606.094256
10.1023/A:1012460413855
10.1109/TMM.2005.854410
10.1007/978-3-540-30586-6_70
10.1109/TCSVT.2004.841694
10.1146/annurev.ne.13.030190.000325
10.1109/ICCV.2001.937645
10.1016/0306-4573(88)90021-0
10.1109/TASL.2006.872625
10.1007/11526346_1
10.1109/78.258071
10.1145/265563.265572
10.1109/ICME.2004.1394309
10.1016/j.jvcir.2010.01.007
10.1109/ICIP.2010.5650991
10.1109/CVPR.1997.609414
10.1016/j.cub.2005.09.040
10.1016/j.tics.2004.08.008
10.1145/1198302.1198305
10.1109/TIP.2011.2156803
10.1121/1.414997
10.1002/9781118219546.ch21
10.1109/ICASSP.2009.4960393
10.1109/CVPR.2009.5206596
10.1109/34.730558
10.1007/s12559-011-9097-0
10.1007/s00530-010-0182-0
10.1007/3-540-36127-8_20
10.1038/nrn1411
10.1016/j.conb.2007.07.011
10.1167/9.3.5
10.1145/215206.215333
10.1006/csla.1998.0043
10.1109/78.277799
10.1109/CVPR.2009.5206525
10.1109/WIIAT.2008.175
10.1109/TNN.2004.832710
10.1146/annurev.ne.18.030195.001205
ContentType Journal Article
Copyright 2015 INIST-CNRS
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Nov 2013
Copyright_xml – notice: 2015 INIST-CNRS
– notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Nov 2013
DBID 97E
RIA
RIE
AAYXX
CITATION
IQODW
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7U5
F28
FR3
DOI 10.1109/TMM.2013.2267205
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Pascal-Francis
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Solid State and Superconductivity Abstracts
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
Solid State and Superconductivity Abstracts
Engineering Research Database
ANTE: Abstracts in New Technology & Engineering
DatabaseTitleList Technology Research Database

Technology Research Database
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
Applied Sciences
EISSN 1941-0077
EndPage 1568
ExternalDocumentID 3100663611
27861117
10_1109_TMM_2013_2267205
6527322
Genre orig-research
GroupedDBID -~X
0R~
29I
4.4
5GY
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
AENEX
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
EBS
EJD
HZ~
H~9
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
O9-
OCL
P2P
PQQKQ
RIA
RIE
RNS
TN5
VH1
ZY4
AAYXX
CITATION
ABTAH
IQODW
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7U5
F28
FR3
ID FETCH-LOGICAL-c420t-24aca1cc5fc14951d29fea5cab5099dcad38686e0ac5331d8e372d015574ef633
IEDL.DBID RIE
ISSN 1520-9210
IngestDate Mon Sep 29 06:41:06 EDT 2025
Mon Jun 30 04:22:39 EDT 2025
Wed Apr 02 07:21:45 EDT 2025
Wed Oct 01 01:33:21 EDT 2025
Thu Apr 24 23:00:07 EDT 2025
Wed Aug 27 06:27:50 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 7
Keywords Space time correlation
Tracking
multimodal saliency
Video signal
Modeling
Audiovisual
multistream processing
Linguistics
Selection criterion
text saliency
Multimodal interface
Audiovisual equipment
audio saliency
Pattern extraction
video summarization
Textual data
Streaming
Computer vision
Abstract
movie summarization
Attention
Model driven architecture
Text
Annotation
Grammatical inference
fusion
Dimension reduction
Cognitive theory
Hearing
visual saliency
Multimodality
Bottom up method
Stimulus salience
Data fusion
Feature extraction
Visual information
Visual attention
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
CC BY 4.0
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c420t-24aca1cc5fc14951d29fea5cab5099dcad38686e0ac5331d8e372d015574ef633
Notes ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
content type line 23
PQID 1442362252
PQPubID 75737
PageCount 16
ParticipantIDs proquest_journals_1442362252
crossref_citationtrail_10_1109_TMM_2013_2267205
pascalfrancis_primary_27861117
crossref_primary_10_1109_TMM_2013_2267205
proquest_miscellaneous_1464552880
ieee_primary_6527322
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2013-11-01
PublicationDateYYYYMMDD 2013-11-01
PublicationDate_xml – month: 11
  year: 2013
  text: 2013-11-01
  day: 01
PublicationDecade 2010
PublicationPlace New York, NY
PublicationPlace_xml – name: New York, NY
– name: Piscataway
PublicationTitle IEEE transactions on multimedia
PublicationTitleAbbrev TMM
PublicationYear 2013
Publisher IEEE
Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: Institute of Electrical and Electronics Engineers
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref57
ref13
ref56
ref12
ref59
ref15
ref58
ref14
ref53
ref52
ref55
ref54
gong (ref22) 2000
ref17
ref16
derpanis (ref64) 2005
ref18
koch (ref6) 1985; 4
ref51
ref50
ref46
ref45
ref48
ref47
ref42
ref41
ref43
jurafsky (ref70) 2008
ref49
ref8
ref7
ref9
ref4
ref3
ref5
zhuang (ref19) 1998
ref40
ref34
ref37
ref36
ref31
ref74
ref30
ref33
schmid (ref69) 1994
ref32
ref2
ref1
ref39
deng (ref67) 2000
zlatintsi (ref75) 2012
coensel (ref44) 2010
ponceleon (ref26) 1999
guo (ref35) 2010; 19
ref71
ref73
ref72
gao (ref38) 2009; 31
ref68
ref24
ref23
ma (ref10) 2005; 7
ref25
ref63
ref66
ref21
uchihashi (ref20) 1999
ref28
ref27
ngo (ref11) 2005; 15
ref29
hering (ref62) 1964
ref60
pellom (ref65) 2001
yuille (ref61) 1996
References_xml – ident: ref5
  doi: 10.1371/journal.pbio.1000129
– start-page: 1294
  year: 2012
  ident: ref75
  article-title: A saliency-based approach to audio event detection and summarization
  publication-title: Proc 20th Eur Signal Process Conf
– ident: ref68
  doi: 10.1109/TASL.2010.2047756
– volume: 19
  start-page: 185
  year: 2010
  ident: ref35
  article-title: A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression
  publication-title: IEEE Trans Image Process
  doi: 10.1109/TIP.2009.2030969
– ident: ref7
  doi: 10.1038/35058500
– ident: ref55
  doi: 10.1145/345508.345566
– ident: ref14
  doi: 10.1109/79.888862
– ident: ref9
  doi: 10.3389/fnhum.2010.00168
– ident: ref43
  doi: 10.1007/978-0-387-71305-2_5
– ident: ref54
  doi: 10.1613/jair.1523
– ident: ref73
  doi: 10.1109/ICASSP.2011.5946961
– ident: ref45
  doi: 10.1109/TASL.2009.2014795
– ident: ref21
  doi: 10.1109/76.809162
– ident: ref12
  doi: 10.1109/TCSVT.2007.890857
– ident: ref49
  doi: 10.1109/LSP.2005.853050
– ident: ref18
  doi: 10.1016/j.jvcir.2007.04.002
– ident: ref31
  doi: 10.1037/0033-295X.113.4.766
– volume: 31
  start-page: 989
  year: 2009
  ident: ref38
  article-title: Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition
  publication-title: IEEE Trans Pattern Anal Mach Intell
  doi: 10.1109/TPAMI.2009.27
– start-page: iii
  year: 2005
  ident: ref64
  article-title: Three-dimensional $n$th derivative of Gaussian separable steerable filters
  publication-title: Proc IEEE Int Conf Image Process
– ident: ref72
  doi: 10.1162/neco.2007.19.10.2780
– ident: ref46
  doi: 10.1016/j.patrec.2010.02.005
– ident: ref29
  doi: 10.1016/S0042-6989(01)00250-4
– ident: ref15
  doi: 10.1109/MSP.2006.1621451
– ident: ref30
  doi: 10.1109/TPAMI.2011.53
– ident: ref51
  doi: 10.1007/978-3-642-00958-7_37
– ident: ref2
  doi: 10.1146/annurev.neuro.30.051606.094256
– start-page: 806
  year: 2000
  ident: ref67
  article-title: Large-vocabulary speech recognition under adverse acoustic environments
  publication-title: Proc Int Conf Spoken Language Process
– ident: ref36
  doi: 10.1023/A:1012460413855
– volume: 7
  start-page: 907
  year: 2005
  ident: ref10
  article-title: A generic framework of user attention model and its application in video summarization
  publication-title: IEEE Trans Multimedia
  doi: 10.1109/TMM.2005.854410
– start-page: 866
  year: 1998
  ident: ref19
  article-title: Adaptive key frame extraction using unsupervised clustering
  publication-title: Proc IEEE Int Conf Image Process
– year: 2008
  ident: ref70
  publication-title: Speech and Language Processing
– ident: ref52
  doi: 10.1007/978-3-540-30586-6_70
– volume: 15
  start-page: 296
  year: 2005
  ident: ref11
  article-title: Video summarization and scene detection by graph modeling
  publication-title: IEEE Trans Circuits Syst Video Technol
  doi: 10.1109/TCSVT.2004.841694
– start-page: 174
  year: 2000
  ident: ref22
  article-title: Video summarization using singular value decomposition
  publication-title: Proc IEEE Conf Comput Vis Pattern Recognit
– ident: ref1
  doi: 10.1146/annurev.ne.13.030190.000325
– ident: ref23
  doi: 10.1109/ICCV.2001.937645
– start-page: 44
  year: 1994
  ident: ref69
  article-title: Probabilistic part-of-speech tagging using decision trees
  publication-title: Proc Int Conf New Methods Language Process
– ident: ref50
  doi: 10.1016/0306-4573(88)90021-0
– ident: ref48
  doi: 10.1109/TASL.2006.872625
– ident: ref27
  doi: 10.1007/11526346_1
– ident: ref60
  doi: 10.1109/78.258071
– ident: ref24
  doi: 10.1145/265563.265572
– ident: ref28
  doi: 10.1109/ICME.2004.1394309
– ident: ref17
  doi: 10.1016/j.jvcir.2010.01.007
– start-page: 123
  year: 1996
  ident: ref61
  publication-title: Bayesian Decision Theory and Psychophysics
– ident: ref39
  doi: 10.1109/ICIP.2010.5650991
– ident: ref25
  doi: 10.1109/CVPR.1997.609414
– ident: ref8
  doi: 10.1016/j.cub.2005.09.040
– ident: ref59
  doi: 10.1016/j.tics.2004.08.008
– ident: ref16
  doi: 10.1145/1198302.1198305
– ident: ref32
  doi: 10.1109/TIP.2011.2156803
– ident: ref47
  doi: 10.1121/1.414997
– ident: ref74
  doi: 10.1002/9781118219546.ch21
– start-page: 383
  year: 1999
  ident: ref20
  article-title: Video Manga: generating semantically meaningful video summaries
  publication-title: Proc 7th ACM Int Conf Multimedia
– ident: ref13
  doi: 10.1109/ICASSP.2009.4960393
– start-page: 887
  year: 2010
  ident: ref44
  article-title: A model of saliency-based auditory attention to environmental sound
  publication-title: Proc 20th Int Congress Acoust
– ident: ref34
  doi: 10.1109/CVPR.2009.5206596
– ident: ref63
  doi: 10.1109/34.730558
– ident: ref41
  doi: 10.1007/s12559-011-9097-0
– ident: ref71
  doi: 10.1007/s00530-010-0182-0
– ident: ref57
  doi: 10.1007/3-540-36127-8_20
– year: 2001
  ident: ref65
  publication-title: SONIC: The University of Colorado Continuous Speech Recognizer
– ident: ref33
  doi: 10.1038/nrn1411
– ident: ref3
  doi: 10.1016/j.conb.2007.07.011
– volume: 4
  start-page: 219
  year: 1985
  ident: ref6
  article-title: Shifts in selective visual attention: towards the underlying neural circuitry
  publication-title: Human Neurobiol
– ident: ref37
  doi: 10.1167/9.3.5
– ident: ref56
  doi: 10.1145/215206.215333
– ident: ref66
  doi: 10.1006/csla.1998.0043
– start-page: 199
  year: 1999
  ident: ref26
  article-title: CueVideo automated multimedia indexing and retrieval
  publication-title: Proc ACM Int l Conf Multimedia
– ident: ref58
  doi: 10.1109/78.277799
– ident: ref40
  doi: 10.1109/CVPR.2009.5206525
– ident: ref53
  doi: 10.1109/WIIAT.2008.175
– year: 1964
  ident: ref62
  publication-title: Outlines of a Theory of the Light Sense
– ident: ref42
  doi: 10.1109/TNN.2004.832710
– ident: ref4
  doi: 10.1146/annurev.ne.18.030195.001205
SSID ssj0014507
Score 2.5340168
SourceID proquest
pascalfrancis
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 1553
SubjectTerms Algorithms
Applied sciences
Artificial intelligence
Attention
audio saliency
Biological and medical sciences
Computational modeling
Computer science; control theory; systems
Cues
Data processing. List processing. Character string processing
Exact sciences and technology
Feature extraction
Fundamental and applied biological sciences. Psychology
fusion
Memory organisation. Data processing
Modulation
Motion pictures
movie summarization
multimodal saliency
multistream processing
Pattern recognition. Digital image processing. Computational geometry
Perception
Psychology. Psychoanalysis. Psychiatry
Psychology. Psychophysiology
Semantics
Software
Speech and sound recognition and synthesis. Linguistics
Streaming media
Streams
Task analysis
text saliency
video summarization
Vision
Visual
visual saliency
Visualization
Waveforms
Title Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention
URI https://ieeexplore.ieee.org/document/6527322
https://www.proquest.com/docview/1442362252
https://www.proquest.com/docview/1464552880
Volume 15
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1941-0077
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014507
  issn: 1520-9210
  databaseCode: RIE
  dateStart: 19990101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE