Data Science
Python数据科学手册
Python Data Science Handbook: Essential Tools for Working with Data
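Author: Jake VanderPlas; Translators: 陶俊杰 / 陈小莉; Publisher: 人民邮电出版社 (Posts & Telecom Press), 2018-01
This book is a reference for data-intensive science and research, and for the computational and statistical methods behind them. It has five chapters, each introducing one or two key Python data science packages. It begins with IPython and Jupyter, which provide the computing environment data scientists need. Chapter 2 covers NumPy, whose ndarray object lets Python efficiently store and manipulate large arrays. Chapter 3 focuses on Pandas, whose DataFrame object lets Python efficiently store and manipulate labeled, columnar data. Chapter 4 features Matplotlib, which gives Python a wide range of data visualization capabilities. Chapter 5 centers on Scikit-Learn, a library that provides efficient, clean Python implementations of the most important machine learning algorithms.
The book is intended for data science researchers with a programming background who plan to use open-source Python tools to analyze, manipulate, visualize, and learn from data.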
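To make the ndarray/DataFrame distinction in that summary concrete, here is a minimal sketch (not taken from the book) using only standard NumPy and pandas calls; the temperature values and city labels are made-up illustration data. The ndarray holds raw numbers and supports vectorized arithmetic, while the DataFrame wraps such arrays with column names so rows can be selected by label and condition.

```python
import numpy as np
import pandas as pd

# A NumPy ndarray stores homogeneous values in one contiguous block,
# so arithmetic is applied element-wise without a Python-level loop.
temps_c = np.array([21.5, 23.0, 19.8, 25.1])
temps_f = temps_c * 9 / 5 + 32  # vectorized Celsius -> Fahrenheit

# A pandas DataFrame adds labeled columns on top of such arrays.
df = pd.DataFrame(
    {"city": ["A", "B", "C", "D"], "temp_c": temps_c, "temp_f": temps_f}
)

# Boolean mask plus label-based column selection, no explicit loop.
warm = df.loc[df["temp_c"] > 20, ["city", "temp_c"]]
print(warm)
print("mean temp_f:", df["temp_f"].mean())
```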
R语言入门与实践
Hands-On Programming with R
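Author: Garrett Grolemund (US); Translator: 冯凌秉; Publisher: 人民邮电出版社 (Posts & Telecom Press), 2016-06
The book is built around three carefully designed practice projects that weave in the core skills every data scientist needs. It teaches readers how to store data in a computer's memory, how to transform data values in memory when necessary, and how to write their own programs in R and use them for data analysis and simulation. Readers learn valuable programming skills from a world-class RStudio instructor and can use those skills to become capable data scientists.
R数据科学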
R for Data Science: Visualize, Model, Transform, Tidy, and Import Data
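Authors: Hadley Wickham (New Zealand) / Garrett Grolemund (US); Translator: 陈光欣; Publisher: 人民邮电出版社 (Posts & Telecom Press), 2018-07
The goal of this book is to teach readers the most important data science tools and so lay a solid foundation for doing data science. After reading it, you will have mastered the essentials of R and be able to use a variety of tools to solve a range of data science problems. Every chapter follows the same order: it opens with engaging examples that give you an overview of the chapter, then dives into the details. Each section comes with exercises to help you practice what you have learned.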
ggplot2: Elegant Graphics for Data Analysis (Use R!)
Author: Hadley Wickham; Publisher: Springer, 2016-06
This new edition of the classic book by ggplot2 creator Hadley Wickham highlights compatibility with knitr and RStudio. ggplot2 is a data visualization package for R that helps users create data graphics, including those that are multi-layered, with ease. With ggplot2, it's easy to:
produce handsome, publication-quality plots with automatic legends created from the plot specification
superimpose multiple layers (points, lines, maps, tiles, box plots) from different data sources with automatically adjusted common scales
add customizable smoothers that use powerful modeling capabilities of R, such as loess, linear models, generalized additive models, and robust regression
save any ggplot2 plot (or part thereof) for later modification or reuse
create custom themes that capture in-house or journal style requirements and that can easily be applied to multiple plots
approach a graph from a visual perspective, thinking about how each component of the data is represented on the final plot
This book will be useful to everyone who has struggled with displaying data in an informative and attractive way. Some basic knowledge of R is necessary (e.g., importing data into R). ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.
Understanding Advanced Statistical Methods
Authors: Peter Westfall / Kevin S. S. Henning; Publisher: Chapman and Hall/CRC, 2013-05
Providing a much-needed bridge between elementary statistics courses and advanced research methods courses, Understanding Advanced Statistical Methods helps students grasp the fundamental assumptions and machinery behind sophisticated statistical topics, such as logistic regression, maximum likelihood, bootstrapping, nonparametrics, and Bayesian methods. The book teaches students how to properly model, think critically, and design their own studies to avoid common errors. It leads them to think differently not only about math and statistics but also about general research and the scientific method. With a focus on statistical models as producers of data, the book enables students to more easily understand the machinery of advanced statistics. It also downplays the "population" interpretation of statistical models and presents Bayesian methods before frequentist ones. Requiring no prior calculus experience, the text employs a "just-in-time" approach that introduces mathematical topics, including calculus, where needed. Formulas throughout the text are used to explain why calculus and probability are essential in statistical modeling. The authors also intuitively explain the theory and logic behind real data analysis, incorporating a range of application examples from the social, economic, biological, medical, physical, and engineering sciences. Enabling your students to answer the why behind statistical methods, this text teaches them how to successfully draw conclusions when the premises are flawed. It empowers them to use advanced statistical methods with confidence and develop their own statistical recipes. Ancillary materials are available on the book's website.
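Bootstrapping is one of the methods that description lists. The sketch below is an illustration under simple assumptions, not the book's own code: it estimates the standard error of a sample mean by resampling with replacement, using only the Python standard library. The helper name `bootstrap_se` and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement.

    `bootstrap_se` is a hypothetical helper written for this illustration.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Each resample has the same size as the data, drawn with replacement.
        resample = rng.choices(data, k=n)
        replicates.append(stat(resample))
    # The spread of the replicates approximates the sampling variability of stat.
    return statistics.stdev(replicates)


sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]  # made-up data
se = bootstrap_se(sample)
# For the mean, compare with the textbook formula s / sqrt(n).
analytic = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE = {se:.3f}, s/sqrt(n) = {analytic:.3f}")
```

For the mean the two numbers should be close; the bootstrap becomes genuinely useful for statistics such as the median, where no simple formula exists. Pass `stat=statistics.median` to try it.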
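概率图模型:原理与技术 (Probabilistic Graphical Models: Principles and Techniques)
Authors: Daphne Koller (US) / Nir Friedman (Israel); Translators: 王飞跃 / 韩素青; Publisher: 清华大学出版社 (Tsinghua University Press), 2015-03
Probabilistic graphical models combine probability theory with graph theory and are currently a very active research direction in machine learning. This book discusses in detail the representation, inference, and learning of directed graphical models (also called Bayesian networks) and undirected graphical models (also called Markov networks), and gives a comprehensive summary of recent progress in this frontier of artificial intelligence. To aid understanding, the book contains many definitions, theorems, proofs, and algorithms with pseudocode, interspersed with supporting material such as examples, skill boxes, case study boxes, and concept boxes. In addition, Chapter 2 introduces the core of probability theory and graph theory, and the appendices cover supplementary material such as information theory, algorithmic complexity, and combinatorial optimization, providing a complete foundation for learning and applying probabilistic graphical models.
The book can serve as a textbook and reference for students, teachers, and researchers at universities and research institutes working in artificial intelligence, machine learning, pattern recognition, signal processing, and related fields.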
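As a small illustration of representation and inference in a directed model (a Bayesian network), the sketch below writes the joint distribution of a three-node network as a product of conditional probability tables and answers a query by summing out the hidden variable (inference by enumeration). The network structure Rain -> Wet <- Sprinkler and every probability in it are hypothetical numbers chosen for the example, not taken from the book.

```python
from itertools import product

# Hypothetical CPDs for a three-node network: Rain -> Wet <- Sprinkler.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.1, False: 0.9}
# P(Wet=True | Rain, Sprinkler), keyed by (rain, sprinkler).
p_wet_given = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.01,
}


def joint(rain, sprinkler, wet):
    """Chain rule for this network: P(R) * P(S) * P(W | R, S)."""
    pw = p_wet_given[(rain, sprinkler)]
    return p_rain[rain] * p_sprinkler[sprinkler] * (pw if wet else 1 - pw)


def posterior_rain_given_wet():
    """P(Rain=True | Wet=True), summing the joint over the hidden Sprinkler."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den


print(f"P(Rain | Wet) = {posterior_rain_given_wet():.3f}")
```

Enumeration is exponential in the number of hidden variables; it is shown here only because it makes the factorization explicit, not as a practical inference algorithm for large networks.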
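== Preface ==
We are very pleased to see our book Probabilistic Graphical Models translated into Chinese. We have learned that the topics it covers have attracted great interest in China. Many Chinese readers have written to us explaining how important the book has been to their studies and hoping for a more accessible version. With the publication of numerous papers authored or co-authored by Chinese scholars at institutions in China and abroad, Chinese researchers have come to play a very important role in the field of probabilistic graphical models, and these papers have contributed greatly to its development. We believe the Chinese edition will help many Chinese readers learn and master the foundations of this important subject. It will also further strengthen Chinese scholars' ability to apply the ideas of probabilistic graphical models and to contribute to the field.
The translation was led by Professor Wang Feiyue (王飞跃), with support from Professor Wang Jue (王珏) and his many assistants and collaborators. It was a landmark effort spanning five years, and I am deeply grateful to everyone on the team who contributed to it. I especially want to take this opportunity to thank Professor Wang Jue, a pioneer of machine learning in China. He was a crucial driving force behind this translation; without his support and the help of his many outstanding students in machine learning, the work might still be unfinished. Sadly, Professor Wang Jue died of cancer in December 2014 at the age of 66 and did not live to see the result of his efforts. His ideas, however, live on in the work of his students and in the publication of this book.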
Daphne Koller
(Translated by 王晓 (Wang Xiao), State Key Laboratory of Management and Control for Complex Systems)
Time Series Analysis
Authors: George E. P. Box / Gwilym M. Jenkins; Publisher: Wiley, 2008-06
A modernized new edition of one of the most trusted books on time series analysis. Since publication of the first edition in 1970, Time Series Analysis has served as one of the most influential and prominent works on the subject. This new edition maintains its balanced presentation of the tools for modeling and analyzing time series and also introduces the latest developments that have occurred in the field over the past decade through applications from areas such as business, finance, and engineering. The Fourth Edition provides a clearly written exploration of the key methods for building, classifying, testing, and analyzing stochastic models for time series as well as their use in five important areas of application: forecasting; determining the transfer function of a system; modeling the effects of intervention events; developing multivariate dynamic models; and designing simple control schemes. Along with these classical uses, modern topics are introduced through the book's new features, which include:
A new chapter on multivariate time series analysis, including a discussion of the challenges that arise with their modeling and an outline of the necessary analytical tools
New coverage of forecasting in the design of feedback and feedforward control schemes
A new chapter on nonlinear and long memory models, which explores additional models for application such as heteroscedastic time series, nonlinear time series models, and models for long memory processes
Coverage of structural component models for the modeling, forecasting, and seasonal adjustment of time series
A review of the maximum likelihood estimation for ARMA models with missing values
Numerous illustrations and detailed appendices supplement the book, while extensive references and discussion questions at the end of each chapter facilitate an in-depth understanding of both time-tested and modern concepts.
With its focus on practical, rather than heavily mathematical, techniques, Time Series Analysis, Fourth Edition is suited to courses at the upper-undergraduate and graduate levels. The book is also an invaluable reference for applied statisticians, engineers, and financial analysts.
Chinese edition: 时间序列分析:预测与控制 (Time Series Analysis: Forecasting and Control)
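The simplest member of the ARMA family named above is the zero-mean AR(1) model x_t = phi * x_{t-1} + e_t. The sketch below simulates such a series, estimates phi by regressing x_t on x_{t-1}, and produces point forecasts. It is an assumption-laden illustration, not the book's procedure: it uses conditional least squares rather than the full maximum likelihood estimation the book discusses, and the function names are hypothetical.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with Gaussian noise e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def fit_ar1(x):
    """Conditional least-squares estimate of phi: regress x_t on x_{t-1} (no intercept)."""
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(a * a for a in x[:-1])
    return num / den


def forecast_ar1(x_last, phi, steps):
    """h-step-ahead point forecasts for a zero-mean AR(1): phi**h * x_last."""
    return [phi ** h * x_last for h in range(1, steps + 1)]


series = simulate_ar1(phi=0.7, n=500)
phi_hat = fit_ar1(series)
print(f"phi_hat = {phi_hat:.3f}")
print("forecasts:", [round(v, 3) for v in forecast_ar1(series[-1], phi_hat, 3)])
```

The forecasts decay geometrically toward the series mean (zero here), which is the characteristic behavior of a stationary AR(1) with |phi| < 1.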
Computer Age Statistical Inference
Authors: Bradley Efron / Trevor Hastie; Publisher: Cambridge University Press, 2016-07
The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence. 'Big data', 'data science', and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.
Clarifies both traditional methods and current, popular algorithms (e.g. neural nets, random forests)
Written by two world-leading researchers
Addressed to all fields that work with data
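The jackknife, one of the topics listed above, estimates a statistic's standard error by recomputing it with each observation left out in turn. The sketch below is a standard-library illustration with a hypothetical helper name and made-up data, not code from the book. For the sample mean, the jackknife estimate coincides with the familiar s / sqrt(n), which makes a convenient sanity check.

```python
import statistics


def jackknife_se(data, stat=statistics.mean):
    """Jackknife standard error of `stat` (hypothetical helper for illustration).

    Var_jack = (n - 1) / n * sum_i (theta_(i) - theta_bar)^2,
    where theta_(i) is `stat` computed with observation i removed.
    """
    n = len(data)
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    var = (n - 1) / n * sum((t - center) ** 2 for t in leave_one_out)
    return var ** 0.5


sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]  # made-up data
print(f"jackknife SE of mean = {jackknife_se(sample):.4f}")
print(f"s / sqrt(n)          = {statistics.stdev(sample) / len(sample) ** 0.5:.4f}")
```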
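精通数据科学:从线性回归到深度学习 (Mastering Data Science: From Linear Regression to Deep Learning)
Author: 唐亘; Publisher: 人民邮电出版社 (Posts & Telecom Press), 2018-05
Data science is a broad discipline that draws on knowledge and skills from three areas: statistical analysis, machine learning, and computer science. This book introduces the discipline comprehensively and systematically, in an accessible way.
The book has 13 chapters. The first three introduce the problems data science sets out to solve, Python as the everyday IT tool, and the mathematical foundations of the discipline. Chapters 4-7 discuss data models and cover three main topics: first, linear regression and logistic regression, the most classic models in statistics; second, stochastic gradient descent, the method computers use to estimate model parameters and the basis for implementing models in practice; and third, lessons from econometrics, mainly concerning feature extraction and model stability. Chapters 8-10 discuss algorithmic models, that is, the classic models of machine learning, covering supervised learning, generative models, and unsupervised learning in turn. The two most cutting-edge areas of data science today are big data and artificial intelligence: Chapter 11 introduces distributed machine learning, an important topic in big data, and the final two chapters discuss neural networks and deep learning.
The book is accessible and combines theory with practice. It is suitable as a study text for data scientists and data engineers and for beginners with a strong interest in data science, and it can also serve as a textbook for students and teachers of computer science, mathematics, and related majors at universities and at training schools.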
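The description singles out stochastic gradient descent as the basis for estimating model parameters. Below is a minimal sketch, assuming a one-feature linear model y ≈ w * x + b and squared-error loss; the function name, learning rate, epoch count, and the synthetic data around y = 3x + 2 are all hypothetical choices for illustration, not taken from the book.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on 0.5 * squared error.

    Hypothetical helper: one parameter update per training example,
    visiting the examples in a fresh random order every epoch.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            err = (w * xs[i] + b) - ys[i]  # residual for a single example
            # Gradient of 0.5 * err**2 is err * x for w and err for b.
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b


# Hypothetical noisy data around y = 3x + 2.
data_rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [3 * x + 2 + data_rng.gauss(0, 0.1) for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")
```

A fixed learning rate keeps the code short; with noisy data the parameters keep jittering slightly around the least-squares solution instead of settling exactly, which is why decaying learning rates are common in practice.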
Learning From Data
Authors: Yaser S. Abu-Mostafa / Malik Magdon-Ismail; Publisher: AMLBook, 2012-03
Machine learning allows computational systems to adaptively improve their performance with experience accumulated from the observed data. Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title "Learning From Data", which faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover.
Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems.
Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable him or her to venture out and explore further techniques and theories, or perhaps to contribute their own.
The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.
Statistical Rethinking
Author: Richard McElreath; Publisher: Chapman and Hall/CRC, 2015
Statistical Rethinking: A Bayesian Course with Examples in R and Stan builds readers’ knowledge of and confidence in statistical modeling. Reflecting the need for even minor programming in today’s model-based statistics, the book pushes readers to perform step-by-step calculations that are usually automated. This unique computational approach ensures that readers understand enough of the details to make reasonable choices and interpretations in their own modeling work.
The text presents generalized linear multilevel models from a Bayesian perspective, relying on a simple logical interpretation of Bayesian probability and maximum entropy. It covers everything from the basics of regression to multilevel models. The author also discusses measurement error, missing data, and Gaussian process models for spatial and network autocorrelation.
By using complete R code examples throughout, this book provides a practical foundation for performing statistical inference. Designed for both PhD students and seasoned professionals in the natural and social sciences, it prepares them for more advanced or specialized statistical modeling.
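In the spirit of performing step-by-step calculations that are usually automated, here is a sketch of grid approximation for a Bayesian posterior: the posterior for a binomial probability p is computed on a grid of candidate values as likelihood times prior, then normalized. The book's own examples use R and Stan; this Python version, its helper name, the flat prior, and the data (6 successes in 9 trials) are assumptions made for illustration.

```python
from math import comb


def grid_posterior(successes, trials, n_grid=101):
    """Grid approximation of the posterior for a binomial probability p
    under a flat prior on [0, 1] (hypothetical helper for illustration)."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1.0] * n_grid  # flat prior
    likelihood = [comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
                  for p in grid]
    unnormalized = [lik * pr for lik, pr in zip(likelihood, prior)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]


# Hypothetical data: 6 successes in 9 trials.
grid, post = grid_posterior(6, 9)
p_mode = grid[max(range(len(post)), key=post.__getitem__)]
post_mean = sum(p * w for p, w in zip(grid, post))
print(f"posterior mode ~ {p_mode:.2f}, posterior mean ~ {post_mean:.3f}")
```

With a flat prior the exact posterior is Beta(7, 4), whose mean is 7/11 ≈ 0.636, so the grid result can be checked against a closed form; the value of grid approximation is that it works the same way for priors with no closed-form posterior.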