HiBid: A Cross-Channel Constrained Bidding System With Budget Allocation by Hierarchical Offline Deep Reinforcement Learning

Bibliographic Details
Published in IEEE Transactions on Computers, Vol. 73, no. 3, pp. 815-828
Main Authors Wang, Hao; Tang, Bo; Liu, Chi Harold; Mao, Shangqin; Zhou, Jiahong; Dai, Zipeng; Sun, Yaqi; Xie, Qianlong; Wang, Xingxing; Wang, Dong
Format Journal Article
Language English
Published New York IEEE 01.03.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 0018-9340
EISSN 1557-9956
DOI 10.1109/TC.2023.3343111

Abstract Online display advertising platforms serve numerous advertisers by providing real-time bidding (RTB) at the scale of billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under given financial constraints, such as total budget and cost-per-click (CPC). Unlike existing works, which mainly focus on single-channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called "HiBid", consisting of a high-level planner equipped with an auxiliary loss for non-competitive budget allocation, and a data-augmentation-enhanced low-level executor that adapts its bidding strategy to the allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfactory ratio, and return on investment (ROI). We have also deployed HiBid on the Meituan advertising platform, where it already serves tens of thousands of advertisers every day.
Author_xml – sequence: 1
  givenname: Hao
  orcidid: 0009-0004-0199-0488
  surname: Wang
  fullname: Wang, Hao
  organization: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
– sequence: 2
  givenname: Bo
  orcidid: 0000-0001-7129-0250
  surname: Tang
  fullname: Tang, Bo
  organization: Meituan, Beijing, China
– sequence: 3
  givenname: Chi Harold
  orcidid: 0000-0002-0252-329X
  surname: Liu
  fullname: Liu, Chi Harold
  email: liuchi02@gmail.com
  organization: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
– sequence: 4
  givenname: Shangqin
  orcidid: 0000-0002-3247-0483
  surname: Mao
  fullname: Mao, Shangqin
  organization: Meituan, Beijing, China
– sequence: 5
  givenname: Jiahong
  orcidid: 0000-0002-1319-2369
  surname: Zhou
  fullname: Zhou, Jiahong
  organization: Meituan, Beijing, China
– sequence: 6
  givenname: Zipeng
  orcidid: 0000-0002-2479-9801
  surname: Dai
  fullname: Dai, Zipeng
  organization: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
– sequence: 7
  givenname: Yaqi
  orcidid: 0009-0001-3961-7652
  surname: Sun
  fullname: Sun, Yaqi
  organization: Meituan, Beijing, China
– sequence: 8
  givenname: Qianlong
  orcidid: 0000-0002-5400-7924
  surname: Xie
  fullname: Xie, Qianlong
  organization: Meituan, Beijing, China
– sequence: 9
  givenname: Xingxing
  orcidid: 0000-0001-5495-0827
  surname: Wang
  fullname: Wang, Xingxing
  organization: Meituan, Beijing, China
– sequence: 10
  givenname: Dong
  orcidid: 0000-0002-1964-3984
  surname: Wang
  fullname: Wang, Dong
  organization: Meituan, Beijing, China
CODEN ITCOB4
CitedBy_id crossref_primary_10_1109_TKDE_2024_3523472
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Discipline Engineering
Computer Science
EISSN 1557-9956
EndPage 828
ExternalDocumentID 10_1109_TC_2023_3343111
10360353
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: U23A20310; U21A20519
  funderid: 10.13039/501100001809
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PublicationCentury 2000
PublicationDate 2024-03-01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on computers
PublicationTitleAbbrev TC
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 815
SubjectTerms Advertising
Budgets
Constraints
Costs
cross-channel bidding
Data augmentation
Deep learning
deep reinforcement learning
Real-time bidding systems
Real-time systems
Reinforcement learning
Resource management
Strategy
Training
Title HiBid: A Cross-Channel Constrained Bidding System With Budget Allocation by Hierarchical Offline Deep Reinforcement Learning
URI https://ieeexplore.ieee.org/document/10360353
https://www.proquest.com/docview/2924036328
Volume 73