HiBid: A Cross-Channel Constrained Bidding System With Budget Allocation by Hierarchical Offline Deep Reinforcement Learning

Online display advertising platforms serve numerous advertisers by providing real-time bidding (RTB) for billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under given financial constraints, i.e., tota...

Bibliographic Details
Published in	IEEE Transactions on Computers, Vol. 73, no. 3, pp. 815-828
Main Authors	Wang, Hao; Tang, Bo; Liu, Chi Harold; Mao, Shangqin; Zhou, Jiahong; Dai, Zipeng; Sun, Yaqi; Xie, Qianlong; Wang, Xingxing; Wang, Dong
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2024
Subjects	Advertising; Budgets; Constraints; Costs; cross-channel bidding; Data augmentation; Deep learning; deep reinforcement learning; Real-time bidding systems; Real-time systems; Reinforcement learning; Resource management; Strategy; Training
ISSN	0018-9340 (print); 1557-9956 (electronic)
DOI	10.1109/TC.2023.3343111

Abstract	Online display advertising platforms serve numerous advertisers by providing real-time bidding (RTB) for billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under given financial constraints, such as total budget and cost-per-click (CPC). Unlike existing work, which mainly focuses on single-channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called "HiBid", consisting of a high-level planner equipped with an auxiliary loss for non-competitive budget allocation, and a data-augmentation-enhanced low-level executor that adapts its bidding strategy to the allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfactory ratio, and return-on-investment (ROI). HiBid has also been deployed on the Meituan advertising platform, where it already serves tens of thousands of advertisers every day.
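The two-level idea in the abstract (a planner that splits the budget across channels, and an executor that picks bids while respecting a CPC limit) can be illustrated with a small sketch. This is not the authors' method: the record gives no algorithmic detail, so the proportional split, the `Candidate` fields, the feasibility rule, and the fallback to the lowest bid are all assumptions chosen for illustration, and the `q_value` numbers stand in for estimates a trained model would supply.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One hypothetical bidding action for a channel (illustrative fields only)."""
    bid_ratio: float        # assumed action: multiplier applied to a base bid
    q_value: float          # stand-in for a learned value estimate
    expected_cost: float    # predicted spend if this action is taken
    expected_clicks: float  # predicted clicks if this action is taken


def allocate_budget(total_budget: float, scores: Dict[str, float]) -> Dict[str, float]:
    """High-level step (assumed rule): split the budget in proportion to channel scores.

    The shares always sum to the total budget, so channels do not compete
    for the same money once the split is made.
    """
    total = sum(scores.values())
    if total <= 0:
        # No usable scores: fall back to an even split.
        share = total_budget / len(scores)
        return {channel: share for channel in scores}
    return {channel: total_budget * s / total for channel, s in scores.items()}


def select_action(cands: List[Candidate], spent: float, clicks: float,
                  cpc_limit: float, budget_left: float) -> Candidate:
    """Low-level step (assumed rule): best-valued action that keeps CPC within the limit.

    An action is feasible if it fits the remaining budget and the projected
    cumulative cost per click stays at or below cpc_limit.
    """
    feasible = []
    for c in cands:
        if c.expected_cost > budget_left:
            continue
        new_cost = spent + c.expected_cost
        new_clicks = clicks + c.expected_clicks
        if new_clicks > 0 and new_cost / new_clicks <= cpc_limit:
            feasible.append(c)
    if feasible:
        return max(feasible, key=lambda c: c.q_value)
    # Nothing satisfies the constraints: choose the most conservative bid.
    return min(cands, key=lambda c: c.bid_ratio)


if __name__ == "__main__":
    budgets = allocate_budget(100.0, {"feed": 3.0, "search": 1.0})
    options = [
        Candidate(0.8, 1.0, 10.0, 10.0),
        Candidate(1.2, 2.0, 30.0, 12.0),
        Candidate(1.0, 1.5, 18.0, 11.0),
    ]
    chosen = select_action(options, spent=0.0, clicks=0.0,
                           cpc_limit=2.0, budget_left=budgets["search"])
    print(budgets, chosen.bid_ratio)
```

In this toy run the highest-valued candidate is skipped because its cost exceeds the channel's allocated budget, which is the kind of interaction between allocation and bidding that a hierarchical design is meant to capture.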
Author_xml
– 1. Wang, Hao (ORCID 0009-0004-0199-0488), School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
– 2. Tang, Bo (ORCID 0000-0001-7129-0250), Meituan, Beijing, China
– 3. Liu, Chi Harold (ORCID 0000-0002-0252-329X), liuchi02@gmail.com, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
– 4. Mao, Shangqin (ORCID 0000-0002-3247-0483), Meituan, Beijing, China
– 5. Zhou, Jiahong (ORCID 0000-0002-1319-2369), Meituan, Beijing, China
– 6. Dai, Zipeng (ORCID 0000-0002-2479-9801), School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
– 7. Sun, Yaqi (ORCID 0009-0001-3961-7652), Meituan, Beijing, China
– 8. Xie, Qianlong (ORCID 0000-0002-5400-7924), Meituan, Beijing, China
– 9. Wang, Xingxing (ORCID 0000-0001-5495-0827), Meituan, Beijing, China
– 10. Wang, Dong (ORCID 0000-0002-1964-3984), Meituan, Beijing, China
CODEN	ITCOB4
CitedBy_id	crossref_primary_10_1109_TKDE_2024_3523472
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Discipline	Engineering Computer Science
EISSN	1557-9956
EndPage	828
Genre	orig-research
GrantInformation_xml	National Natural Science Foundation of China (funder ID 10.13039/501100001809), grants U23A20310 and U21A20519
IsPeerReviewed	true
IsScholarly	true
Issue	3
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
PageCount	14
PublicationDate	2024-03-01
PublicationTitle	IEEE Transactions on Computers
PublicationTitleAbbrev	TC
PublicationYear	2024
Publisher	The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage	815
URI	https://ieeexplore.ieee.org/document/10360353 https://www.proquest.com/docview/2924036328
Volume	73