Mahmoud Parsian

About the Author

Mahmoud Parsian, Ph.D. in Computer Science, is a practicing software professional with 35+ years of experience as a developer, designer, architect, and author. For the past 20 years, he has worked with server-side Java, databases, MapReduce, Spark, PySpark, and distributed computing. Dr. Parsian currently leads Illumina's Big Data team, which is focused on large-scale genome analytics and distributed computing. He leads the development of scalable genomics algorithms and DNA pipelines using Java, Python, MapReduce, Hadoop, HBase, PySpark, Spark, Snowflake, and open source tools. Dr. Parsian is an adjunct faculty member at Santa Clara University, where he teaches Big Data Modeling & Analytics and Machine Learning.

He is the author of five books: https://github.com/mahmoudparsian/

1. Data Algorithms with Spark (2022)
• O'Reilly book: https://www.oreilly.com/library/view/data-algorithms-with/9781492082378/
• GitHub: https://github.com/mahmoudparsian/data-algorithms-with-spark

Foreword by Dr. Matei Zaharia (Original Creator of Apache Spark) for Data Algorithms with Spark, by Mahmoud Parsian:
• https://github.com/mahmoudparsian/data-algorithms-with-spark/blob/master/images/data-alg-foreword2.pdf

"When I started the Apache Spark project a decade ago, one of my main goals was to make it easier for a wide range of users to implement parallel algorithms. New algorithms acting on large-scale data are having a profound impact in all areas of computing, and I wanted to help developers implement new algorithms and reason about their performance without having to build a distributed system from scratch.

I am therefore very excited to see this new book by Dr. Mahmoud Parsian on data algorithms with Spark. Dr. Parsian has extensive research and practical experience with large-scale data-parallel algorithms, including developing new algorithms for bioinformatics as the lead of Illumina’s big data team. In this book, he introduces Spark through its Python API, PySpark, and shows how to implement a wide range of useful algorithms efficiently using Spark’s distributed computing primitives. He also explains the workings of the underlying Spark engine and how to optimize your algorithms through techniques such as controlling data partitioning. This book will be a great resource for both readers looking to implement existing algorithms in a scalable fashion and readers who are developing new, custom algorithms using Spark.

I am also thrilled that Dr. Parsian has included working code examples for all the algorithms he discusses, using real-world problems where possible. These will serve as a great starting point for readers who want to implement similar computations. Whether you intend to use these algorithms directly or build your own, custom algorithms using Spark, I hope that you enjoy this book as an introduction to the open-source engine, its inner workings, and the modern parallel algorithms that are having such a broad impact across computing."

Matei Zaharia
Assistant Professor of Computer Science, Stanford
Chief Technologist, Databricks
Original Creator of Apache Spark

2. Data Algorithms (2015)
• O'Reilly book: https://www.oreilly.com/library/view/data-algorithms/9781491906170/
• GitHub: https://github.com/mahmoudparsian/data-algorithms-book

3. PySpark Algorithms (2019)
• Amazon: https://www.amazon.com/PySpark-Algorithms-version-Mahmoud-Parsian-ebook/dp/B07WQHTVCJ/ref=sr_1_1
• GitHub: https://github.com/mahmoudparsian/pyspark-algorithms

4. JDBC Recipes: https://link.springer.com/book/10.1007/978-1-4302-0061-1

5. JDBC Metadata Recipes: https://link.springer.com/book/10.1007/978-1-4302-0134-2
