Big data : concepts, technology and architecture

"This book offers comprehensive coverage of Big Data tools, terminologies and technologies for researchers, business professionals and graduates. This book begins with an overview of what Big Data is and emphasizes all the key concepts of big data end to end. Big Data concepts, technologies, te...

Bibliographic Details
Main Authors: Balusamy, Balamurugan (Author); R, Nandhini Abirami (Author); Kadry, Seifedine, 1977- (Author); Gandomi, Amir Hossein (Author)
Format: eBook
Language: English
Published: Hoboken, NJ : John Wiley and Sons, Inc., 2021.
Edition: First edition.
Subjects: Big data; Data mining
ISBN: 9781119701859
1119701856
9781119701866
1119701864
9781119701873
1119701872
9781119701828
1119701821
Physical Description: 1 online resource (xii, 356 pages) : illustrations (some color)

LEADER 10731cam a2200517 i 4500
001 kn-on1159644350
003 OCoLC
005 20240717213016.0
006 m o d
007 cr cn|||||||||
008 200527t20212021njua o 001 0 eng
040 |a DLC  |b eng  |e rda  |c DLC  |d OCLCF  |d OCLCO  |d OCLCQ  |d EBLCP  |d YDX  |d N$T  |d UKAHL  |d UKMGB  |d IEEEE  |d YDX  |d OCLCO  |d OCLCA  |d OCLCQ  |d OCLCO  |d OCLCL  |d OCLCA 
020 |a 9781119701859  |q electronic book 
020 |a 1119701856  |q electronic book 
020 |a 9781119701866  |q electronic book 
020 |a 1119701864  |q electronic book 
020 |a 9781119701873  |q electronic book 
020 |a 1119701872  |q electronic book 
020 |z 9781119701828  |q hardcover 
020 |z 1119701821  |q hardcover 
024 7 |a 10.1002/9781119701859  |2 doi 
035 |a (OCoLC)1159644350  |z (OCoLC)1240670884  |z (OCoLC)1241452124 
042 |a pcc 
100 1 |a Balusamy, Balamurugan,  |e author. 
245 1 0 |a Big data :  |b concepts, technology and architecture /  |c Balamurugan Balusamy, Nandhini Abirami. R, Seifedine Kadry, and Amir H. Gandomi. 
250 |a First edition. 
264 1 |a Hoboken, NJ :  |b John Wiley and Sons, Inc.,  |c 2021. 
264 4 |c ©2021 
300 |a 1 online resource (xii, 356 pages) :  |b illustrations (some color) 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Includes index. 
505 0 |a Big Data: Concepts, Technology and Architecture -- Book Description -- 1.1 Understanding Big Data -- 1.2 Evolution of Big Data -- 1.3 Failure of Traditional database in handling Big Data -- 1.3 (a) Data Mining Vs Big Data -- 1.4 3 V's of Big Data -- 1.4.1 Volume -- 1.4.2 Velocity -- 1.4.3 Variety -- 1.5 Sources of Big Data -- 1.6 Different Types of Data -- 1.6.1 Structured Data -- 1.6.2 Unstructured Data -- 1.6.3 Semi-Structured Data -- 1.7 Big Data Infrastructure -- 1.8 Big Data Life Cycle -- 1.8.1 Big Data Generation -- 1.8.2 Data Aggregation -- 1.8.3 Data Preprocessing -- 1.7.3 Big Data Analytics -- 1.7.4 Visualizing Big Data -- 1.8 Big Data Technology -- 1.8.1 Challenges faced by Big Data technology -- 1.8.1 Heterogeneity and incompleteness -- 1.8.2 Volume and velocity of the Data -- 1.8.3 Data Storage -- 1.8.4 Data Privacy -- 1.9 Big Data Applications -- 1.10 Big Data Use Cases -- 1.9.1 Healthcare -- 1.9.2 Telecom -- 1.9.3 Financial Services -- Chapter 1 refresher -- Conceptual short Questions with answers -- Frequently asked Interview questions -- Chapter Objective -- Big Data Storage Concepts -- 2.1 Cluster computing -- 2.1.1 Types of cluster -- 2.1.1.1 High availability cluster -- 2.1.1.2 Load balancing cluster -- 2.1.2 Cluster structure -- 2.3 Distribution Models -- 2.3.1 Sharding -- 2.3.2 Data Replication -- 2.3.2.1 Master-Slave model -- 2.3.2.2 Peer-to-Peer model -- 2.3.3 Sharding and Replication -- 2.4 Distributed file system -- 2.5 Relational and Non Relational Databases -- CoursesOffered -- Figure 2.12 Data divided across multiple related tables -- 2.4.2 RDBMS Databases -- 2.4.3 NoSQL Databases -- 2.4.4 NewSQL Databases -- 2.5 Scaling Up and Scaling Out Storage -- Chapter 2 refresher -- Conceptual short questions with answers -- Chapter Objective -- 3.1 Introduction to NoSQL -- 3.2 Why NoSQL -- 3.3 CAP theorem -- 3.4 ACID -- 3.5 BASE -- 3.6 Schemaless Database -- 3.7 NoSQL (Not Only SQL) -- 3.7.1 NoSQL Vs RDBMS -- 3.7.2 Features of NoSQL database -- 3.7.3 Types of NoSQL Technologies -- 3.7.3.1 Key-Value store database -- 3.7.3.2 Column-store database -- 3.7.3.3 Document Oriented Database -- 3.7.3.4 Graph-oriented Database -- 3.7.4 NoSQL Operations -- 3.9 Migrating from RDBMS to NoSQL -- Chapter 3 refresher -- Conceptual short questions with answers -- Chapter Objective -- 4.1 Data Processing -- 4.2 Shared Everything Architecture -- 4.2.1 Symmetric multiprocessing architecture -- 4.2.2 Distributed Shared memory -- 4.3 Shared nothing architecture -- 4.4 Batch Processing -- 4.5 Real-Time Data Processing -- 4.6 Parallel Computing -- 4.7 Distributed Computing -- 4.8 Big Data Virtualization -- 4.8.1 Attributes of Virtualization -- 4.8.1.1 Encapsulation -- 4.8.1.2 Partitioning -- 4.8.1.3 Isolation -- 4.8.2 Big Data Server Virtualization -- 4.9 Introduction -- 4.10 Cloud computing types -- 4.11 Cloud Services -- 4.12 Cloud Storage -- 4.12.1 Architecture of GFS -- 4.12.1.1 Master -- 4.12.1.2 Client -- 4.13 Cloud Architecture -- Cloud Challenges -- Chapter 4 Refresher -- Conceptual short questions with answers -- Chapter Objective -- 5.1 Apache Hadoop -- 5.1.1 Architecture of Apache Hadoop -- 5.1.2 Hadoop Ecosystem Components Overview -- 5.2 Hadoop Storage -- 5.2.1 HDFS (Hadoop Distributed File System) -- 5.2.2 Why HDFS? -- 5.2.3 HDFS Architecture -- 5.2.4 HDFS Read/Write Operation -- 5.2.5 Rack Awareness -- 5.2.6 Features of HDFS -- 5.2.6.1 Cost-effective -- 5.2.6.2 Distributed storage -- 5.2.6.3 Data Replication -- 5.3 Hadoop Computation -- 5.3.1 MapReduce -- 5.3.1.1 Mapper -- 5.3.1.2 Combiner -- 5.3.1.3 Reducer -- 5.3.1.4 JobTracker and TaskTracker -- 5.3.2 MapReduce Input Formats -- 5.3.3 MapReduce Example -- 5.3.4 MapReduce Processing -- 5.3.5 MapReduce Algorithm -- 5.3.6 Limitations of MapReduce -- 5.4 Hadoop 2.0 -- 5.4.1 Hadoop 1.0 limitations -- 5.4.2 Features of Hadoop 2.0 -- 5.4.3 Yet Another Resource Negotiator (YARN) -- 5.4.3 Core components of YARN -- 5.4.3.1 ResourceManager -- 5.4.3.2 NodeManager -- 5.4.4 YARN Scheduler -- 5.4.4.1 FIFO scheduler -- 5.4.4.2 Capacity Scheduler -- 5.4.4.3 Fair Scheduler -- 5.4.5 Failures in YARN -- 5.4.5.1 ResourceManager failure -- 5.4.5.2 ApplicationMaster failure -- 5.4.5.3 NodeManager Failure -- 5.4.5.4 Container Failure -- 5.3 HBASE -- 5.4 Apache Cassandra -- 5.5 SQOOP -- 5.6 Flume -- 5.6.1 Flume Architecture -- 5.6.1.1 Event -- 5.6.1.2 Agent -- 5.7 Apache Avro -- 5.8 Apache Pig -- 5.9 Apache Mahout -- 5.10 Apache Oozie -- 5.10.1 Oozie Workflow -- 5.10.2 Oozie Coordinators -- 5.10.3 Oozie Bundles -- 5.11 Apache Hive -- Hive Architecture -- Hadoop Distributions -- Chapter 5 refresher -- Conceptual short questions with answers -- Frequently asked Interview Questions -- Chapter Objective -- 6.1 Terminologies of Big Data Analytics -- Data Warehouse -- Business Intelligence -- Analytics -- 6.2 Big Data Analytics -- 6.2.1 Descriptive Analytics -- 6.2.2 Diagnostic Analytics -- 6.2.3 Predictive Analytics -- 6.2.4 Prescriptive Analytics -- 6.3 Data Analytics Lifecycle -- 6.3.1 Business case evaluation and Identify the source data -- 6.3.2 Data preparation -- 6.3.3 Data Extraction and Transformation -- 6.3.4 Data Analysis and visualization -- 6.3.5 Analytics application -- 6.4 Big Data Analytics Techniques -- 6.4.1 Quantitative Analysis -- 6.4.3 Statistical analysis -- 6.4.3.1 A/B testing -- 6.4.3.2 Correlation -- 6.4.3.3 Regression -- 6.5 Semantic Analysis -- 6.5.1 Natural Language Processing -- 6.5.2 Text Analytics -- 6.7 Big Data Business Intelligence -- 6.7.1 Online Transaction Processing (OLTP) -- 6.7.2 Online Analytical Processing (OLAP) -- 6.7.3 Real-Time Analytics Platform (RTAP) -- 6.6 Big Data Real Time Analytics Processing -- 6.7 Enterprise Data Warehouse -- Chapter 6 Refresher -- Concept 
506 |a Full text is available only from IP addresses of computers at Tomas Bata University in Zlín or via remote access for staff and students 
520 |a "This book offers comprehensive coverage of Big Data tools, terminologies and technologies for researchers, business professionals and graduates. This book begins with an overview of what Big Data is and emphasizes all the key concepts of big data end to end. Big Data concepts, technologies, terminologies and storing, processing and analysis techniques and much more -- are all logically organized and reinforced by diagrams and case studies. This book refines readers' understanding of Big Data with in-depth analysis of key concepts. The case studies provided in this book give insight on key concepts. The initial chapters of the book shed light on various characteristics of Big Data that distinguish it from traditional Database Management systems. Big Data Analytics are covered in detail in a separate chapter. Hadoop, the heart of Big Data is handled in the Big Data processing chapter and a deep understanding of its concepts is provided"--  |c Provided by publisher. 
590 |a Knovel  |b Knovel (All titles) 
650 0 |a Big data. 
650 0 |a Data mining. 
655 7 |a elektronické knihy  |7 fd186907  |2 czenas 
655 9 |a electronic books  |2 eczenas 
700 1 |a R, Nandhini Abirami,  |e author. 
700 1 |a Kadry, Seifedine,  |d 1977-  |e author. 
700 1 |a Gandomi, Amir Hossein,  |e author. 
776 0 8 |i Print version:  |a Balusamy, Balamurugan.  |t Big data  |b First edition.  |d Hoboken, NJ : John Wiley and Sons, Inc., 2021.  |z 9781119701828  |w (DLC) 2020024528 
856 4 0 |u https://proxy.k.utb.cz/login?url=https://app.knovel.com/hotlink/toc/id:kpBDCTA001/big-data-concepts?kpromoter=marc  |y Full text