Clojure data analysis cookbook : dive into data analysis with Clojure through over 100 practical recipes for every stage of the analysis and collection process
This book is for those with a basic knowledge of Clojure, who are looking to push the language to excel with data analysis.
| Main Author | |
|---|---|
| Format | Electronic eBook |
| Language | English |
| Published | Birmingham, UK : Packt Publishing, 2015. |
| Edition | Second edition. |
| Subjects | |
| Online Access | Full text |
| ISBN | 9781784399955 1784399957 1784390291 9781784390297 |
| Physical Description | 1 online resource (1 volume) : illustrations |
Table of Contents:
- Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Importing Data for Analysis; Introduction; Creating a new project; Reading CSV data into Incanter datasets; Reading JSON data into Incanter datasets; Reading data from Excel with Incanter; Reading data from JDBC databases; Reading XML data into Incanter datasets; Scraping data from tables in web pages; Scraping textual data from web pages; Reading RDF data; Querying RDF data with SPARQL; Aggregating data from different formats; Chapter 2: Cleaning and Validating Data.
- Introduction; Cleaning data with regular expressions; Maintaining consistency with synonym maps; Identifying and removing duplicate data; Regularizing numbers; Calculating relative values; Parsing dates and times; Lazily processing very large data sets; Sampling from very large data sets; Fixing spelling errors; Parsing custom data formats; Validating data with Valip; Chapter 3: Managing Complexity with Concurrent Programming; Introduction; Managing program complexity with STM; Managing program complexity with agents; Getting better performance with commute; Combining agents and STM.
- Maintaining consistency with ensure; Introducing safe side effects into the STM; Maintaining data consistency with validators; Monitoring processing with watchers; Debugging concurrent programs with watchers; Recovering from errors in agents; Managing large inputs with sized queues; Chapter 4: Improving Performance with Parallel Programming; Introduction; Parallelizing processing with pmap; Parallelizing processing with Incanter; Partitioning Monte Carlo simulations for better pmap performance; Finding the optimal partition size with simulated annealing; Combining function calls with reducers.
- Parallelizing with reducers; Generating online summary statistics for data streams with reducers; Using type hints; Benchmarking with Criterium; Chapter 5: Distributed Data Processing with Cascalog; Introduction; Initializing Cascalog and Hadoop for distributed processing; Querying data with Cascalog; Distributing data with Apache HDFS; Parsing CSV files with Cascalog; Executing complex queries with Cascalog; Aggregating data with Cascalog; Defining new Cascalog operators; Composing Cascalog queries; Transforming data with Cascalog; Chapter 6: Working with Incanter Datasets; Introduction.
- Loading Incanter's sample datasets; Loading Clojure data structures into datasets; Viewing datasets interactively with view; Converting datasets to matrices; Using infix formulas in Incanter; Selecting columns with ; Selecting rows with ; Filtering datasets with where; Grouping data with group-by; Saving datasets to CSV and JSON; Projecting from multiple datasets with join; Chapter 7: Statistical Data Analysis with Incanter; Introduction; Generating summary statistics with rollup; Working with changes in values; Scaling variables to simplify variable relationships.