First draft of “The Gate of Full-Stack Data” completed


“The Gate of Full-Stack Data” (working title) is the accumulation of the working skills I have built up over the last five years. I wrote the first article for the official account eight months ago, and I have lost count of the lonely nights I have spent on it since.

Writing an article is hard work, and writing a book is harder still.

The original plan was seven chapters of seven articles each, 7 × 7 = 49 articles in total. In the course of writing, though, part of that plan was abandoned: either I felt that with my current skills I could not write anything distinctive, or I had no good application scenario, or I lacked the motivation to write, or... In short, a reason can always be found, just as men can always find reasons for cheating.

As a result, there are still seven chapters, but only six articles are left in each, for a total of 42 articles. Six is actually an excellent number: it is exactly the number of lines in each hexagram of the I Ching. Besides, friends who have read Lu Ding Ji (The Deer and the Cauldron) will know that it features a book called the Sutra of Forty-Two Chapters. So perhaps 42 is not a bad number either.

The material set aside for now may be compiled into another volume in three to five years, if my skills across the whole field have improved by then.

If this book ends up being called “The Gate of Full-Stack Data” (the final title still has to be discussed with the editor at the publishing house), then perhaps the next one will be called “The Road of Full-Stack Data”. Just as with climbing a mountain, you have to find a road to go up.

The current word count is about 120,000. Because some of the code and results are captured directly as screenshots, the counted total comes out somewhat lower than the real amount of content.

Back to the point: quantity may not matter much, and I believe readers care more about the quality of a book. That is why the first draft has not been submitted yet. What comes next is the start of the second draft.

There is still a lot to do in the second draft, and many details need to be corrected. For now, I can think of the following:

  1. Correct typos and punctuation to make the articles readable.

  2. Add more charts to make the articles clearer.

  3. To improve readability, perhaps add more banter, so that the points come across through idle chatter and metaphor rather than blunt statements.

  4. Polish the introduction of each chapter. Besides its six main articles, every chapter is planned to have an introduction.

  5. Unify the style of the articles, including the environments used and the demonstration data. This is a big job.

  6. Adjust the content of some articles and add as many subheadings as possible to make the articles better organized. Also revise some of the descriptions so that the whole book reads more smoothly.

  7. (Finish the above first, and then do the rest. Don’t set too many goals.)

Of course, if you have any better suggestions, please leave a comment or reply on the official account to let me know.

The outline of the first draft is attached below. If you want to help review the draft or participate in improvement, please contact me.

01 Linux, light of freedom (6 / 6)

0x10 [introduction] Linux introduction
0x11 [draft] Linux foundation, starting from scratch
0x12 [draft] grep and sed, text processing
0x13 [draft] data Langya stick, camouflager awk
0x14 [first draft] shell shortcut key, Emacs entry
0x15 [first draft] originated from Linux: once on Mac, ruined for life
0x16 [draft] integrator, cluster installation

02 Python, data analysis (6 / 6)

0x20 [introduction] Python introduction
0x21 [first draft] Tao is natural, Python comes out of the hole
0x22 [draft] Anaconda, IPython
0x23 [draft] beautiful, Python tool
0x24 [draft] SQL skills, necessary MySQL
0x25 [draft] pandas, data frame
0x26 [first draft] Zeppelin, unifying the world

03 big data, no exception (6 / 6)

0x30 [introduction] big data introduction
0x31 [first draft] living in the world, Hadoop
0x32 [draft] the beauty of divide and conquer, MapReduce
0x33 [first draft] Hive foundation, honeycomb and warehouse
0x34 [first draft] Hive has deep experience
0x35 [draft] SQL and NoSQL, Sqoop as media
0x36 [first draft] nothing big, ecological framework

04 machine learning, human out of control (6 / 6)

0x40 [introduction] Introduction to machine learning
0x41 [draft] sklearn, machine learning
0x42 [draft] model evaluation, cross validation
0x43 [draft] data mining, consistent cleaning
0x44 Chinese vector, bag of words model
0x45 [first draft] close to Zhu zhechi, blind date KNN
0x46 [first draft] self taught data, leading the dance

05 algorithm prediction, Divination (6 / 6)

0x50 [introduction] algorithm Introduction
0x51 It’s silly, naive, naive
0x52 [draft] the tree of Bodhi
0x53 [draft] random beauty, random forest
0x54 [draft] isolated forest, mining anomalies
0x55 [draft] self encoder, the gate of depth
0x56 [first draft] collective wisdom, out of control philosophy

06 Spark, fast but not broken (6 / 6)

0x60 [introduction] Spark introduction
0x61 [draft] PySpark, alliance of the strong
0x62 [draft] RDD operator, the soul of operation
0x63 [first draft] father of artifact, Scala enters the world
0x64 [draft] distributed SQL
0x65 [draft] DataFrame
0x66 Machine heart, learning ML library

07 data science, full stack wisdom (6 / 6)

0x70 [introduction] data field introduction
0x71 [first draft] data scientist, necessary shell
0x72 [first draft] beginning of analysis, descriptive statistics
0x73 [draft] big data analysis, seven basic skills
0x74 [draft] data geek, position information
0x75 Data science, seven skills
0x76 Color is empty, blank and null