Data is the foundation of a data scientist's work, so it is important to know several ways of loading data for analysis. Here, we will introduce five techniques for loading data in Python and provide code examples for your reference.
As a beginner, you may know only one way to read data: the pandas.read_csv function (usually for files in CSV format). It is one of the most mature and powerful functions, but the other methods are very helpful and will certainly come in handy sometimes.
The methods I want to discuss are:
- Manual function
- loadtxt function
- genfromtxt function
- read_csv function
- Pickle
The dataset we will load is called 100 Sales Records (the file 100 Sales Records.csv).
We will use the numpy, pandas, and pickle packages, so import them.
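The imports might look like this; the aliases np and pd are only the usual convention, not a requirement:

```python
import pickle  # standard library: saves and restores Python objects

import numpy as np   # arrays plus the loadtxt and genfromtxt loaders
import pandas as pd  # data frames plus read_csv
```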
1. Manual Function
This is the most difficult method because you have to design a custom function that loads the data for you. You have to work with Python's ordinary file-handling concepts and use them to read a .csv file.
Let's do this on the 100 Sales Records file.
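Here is a minimal sketch of such a function, following the steps explained below. The tiny sample file and its column names are made up so that the snippet runs on its own; point load_csv at your own copy of the dataset instead.

```python
import os
import tempfile

import pandas as pd


def load_csv(filepath):
    """Read a comma-separated file by hand and return it as a data frame."""
    data = []         # one list of values per data row
    col = []          # column names, taken from the first row
    checkcol = False  # turns True once the header row has been stored
    with open(filepath) as f:
        for line in f.readlines():
            # Drop the trailing newline, then split the line on commas.
            values = line.replace("\n", "").split(",")
            if not checkcol:
                col = values
                checkcol = True
            else:
                data.append(values)
    return pd.DataFrame(data=data, columns=col)


# Hypothetical stand-in for "100 Sales Records.csv".
sample_path = os.path.join(tempfile.mkdtemp(), "sample_sales.csv")
with open(sample_path, "w") as f:
    f.write("Region,Units Sold,Unit Price\n")
    f.write("Europe,120,9.5\n")
    f.write("Asia,80,12.0\n")

my_data = load_csv(sample_path)
print(my_data.head())
```

Note that this simple version keeps every value as a string and would split a quoted field that contains a comma, so it only suits plain files.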
Well, what is this? The code looks a bit complicated! Let's break it down step by step so that you can understand what is going on and apply similar logic to read your own .csv files.
Here, I created a load_csv function, which takes the path of the file to be read as an argument.
I have a list called data that will hold my CSV file data, while another list, col, will hold my column names. After manually checking the CSV, I know that the column names are in the first row, so in the first iteration I have to store the first row in col, and the remaining rows go into data.
To detect the first iteration, I use a Boolean variable called checkcol, which starts as False. While it is False, in the first iteration, the first row is stored in col and checkcol is then set to True, so every later row is appended to the data list.
The main logic here is to iterate through the file using Python's readlines() function. This function returns a list of all the lines in the file.
When reading the header, each line ends with the newline character \n, the line terminator, so I remove it with the str.replace function.
Because this is a .csv file, the values are separated by commas, so I split each string on , using str.split(","). For the first iteration, I store the first row, which contains the column names, in a list called col. Then I append all the remaining rows to the data list.
To make the data nicer to read, I return it as a data frame, because a data frame is easier to read than a numpy array or a Python list.
Advantages and disadvantages
The important benefit is that you have full flexibility and control over the file structure, and you can read and store the data in any format and manner you want.
You can also use your own logic to read files that do not have a standard structure.
An important drawback is that it is complex to write, especially for standard file types that are otherwise easy to read. You have to hard-code the logic, which takes repeated trial and error.
Use it only when the file is not in a standard format, or when you want flexibility and need to read the file in a way that libraries cannot provide.
2. Numpy.loadtxt function
This is a built-in function in numpy, the well-known numerical library for Python. It is a very simple function for loading data, and it is most useful for reading data that all has the same data type.
It is hard to use when the data is more complex, but it is really powerful when the file is simple.
For data of a single type, we use a dummy dataset. Let's jump to the code.
Here, we simply call the loadtxt function with the delimiter set to ',' because this is a CSV file.
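A minimal sketch of that call, assuming a small numeric-only file. The file written here (numeric_sample.csv) is a made-up stand-in for the dummy dataset:

```python
import os
import tempfile

import numpy as np

# Hypothetical numeric-only file: eight rows, three columns.
numeric_path = os.path.join(tempfile.mkdtemp(), "numeric_sample.csv")
with open(numeric_path, "w") as f:
    for i in range(8):
        f.write(f"{i},{i * 2},{i * 0.5}\n")

# delimiter="," tells loadtxt that the values are comma-separated.
df = np.loadtxt(numeric_path, delimiter=",")
print(df[:5, :])  # first five rows only
```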
Now, if we print df, we will see our data in a tidy numpy array that is ready to use.
Because the data is large, we printed only the first five rows.
Advantages and disadvantages
An important advantage of this function is that you can quickly load data from a file into a numpy array.
The disadvantage is that the data cannot contain different data types or missing values.
3. Numpy.genfromtxt function
We will use the dataset from the first example, 100 Sales Records.csv, to show that this function can handle multiple data types.
Let's jump to the code.
To see the result more clearly, we can view it in data frame format.
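A minimal sketch of this first attempt, using a made-up three-line sample in place of the real file (sales_path and the column names are illustrative only):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for "100 Sales Records.csv".
sales_path = os.path.join(tempfile.mkdtemp(), "sample_sales.csv")
with open(sales_path, "w", encoding="utf-8") as f:
    f.write("Region,Units Sold,Unit Price\n")
    f.write("Europe,120,9.5\n")
    f.write("Asia,80,12.0\n")

# The default dtype is float, so anything that is not a number becomes nan.
data = np.genfromtxt(sales_path, delimiter=",")
print(pd.DataFrame(data))
```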
What is this? Every column with a string data type has come back as nan instead of its values. How do we deal with it?
Just add another parameter, dtype, and set it to None, which means that genfromtxt has to work out the data type of each column itself. The data is then not all converted to a single dtype.
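A small sketch of that change, again on a made-up sample file (sales_path and the column names are illustrative only):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for "100 Sales Records.csv".
sales_path = os.path.join(tempfile.mkdtemp(), "sample_sales.csv")
with open(sales_path, "w", encoding="utf-8") as f:
    f.write("Region,Units Sold,Unit Price\n")
    f.write("Europe,120,9.5\n")
    f.write("Asia,80,12.0\n")

# dtype=None asks genfromtxt to infer a type for each column on its own.
# The header line is still read as an ordinary row of data.
data = np.genfromtxt(sales_path, delimiter=",", dtype=None, encoding="utf-8")
frame = pd.DataFrame(data)
print(frame)
```

The text values now survive instead of turning into nan, but because the header line is still part of the data in this sample, every column ends up holding text.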
This is much better than the first attempt, but the column headings appear here as a row of data. To make them column headings, we have to add another parameter, names, and set it to True so that the first row is used as the column headings.
df3 = np.genfromtxt('100 Sales Records.csv', delimiter=',', dtype=None, names=True, encoding='utf-8')
We can then print it.
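The same call on a made-up sample file, so that it runs without the real dataset (sales_path and the column names are illustrative only):

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for "100 Sales Records.csv".
sales_path = os.path.join(tempfile.mkdtemp(), "sample_sales.csv")
with open(sales_path, "w", encoding="utf-8") as f:
    f.write("Region,Units Sold,Unit Price\n")
    f.write("Europe,120,9.5\n")
    f.write("Asia,80,12.0\n")

# names=True takes the field names from the first line of the file.
df3 = np.genfromtxt(sales_path, delimiter=",", dtype=None,
                    names=True, encoding="utf-8")
print(df3)              # one record per data row
print(df3.dtype.names)  # field names read from the header
```

The result is a structured array, so a column is selected by field name, for example df3["Region"]. By default genfromtxt replaces spaces in header names with underscores, so "Units Sold" becomes the field Units_Sold.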
4. Pandas.read_csv function
Pandas is a very popular data manipulation library, and its very commonly used read_csv() function is one of its most important and mature features. It can easily read any .csv file and help us work with it. Let's try it on the 100 Sales Records dataset.
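A minimal sketch, with a made-up sample file standing in for the real dataset (sales_path and the column names are illustrative only):

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for "100 Sales Records.csv".
sales_path = os.path.join(tempfile.mkdtemp(), "sample_sales.csv")
with open(sales_path, "w", encoding="utf-8") as f:
    f.write("Region,Units Sold,Unit Price\n")
    f.write("Europe,120,9.5\n")
    f.write("Asia,80,12.0\n")

# One call reads the header, splits the columns and infers their types.
pdDf = pd.read_csv(sales_path)
print(pdDf.head())
```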
This function is easy to use, which is why it is so popular. You can compare it with our previous code and check for yourself.
Guess what? We are done. It really is that simple and easy to use. pandas.read_csv also provides many other parameters for adjusting how the dataset is read. For example, our convertcsv.csv file has no column names, so we can tell read_csv that the file has no header row.
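One way to do this is the header=None argument. The sketch below uses a made-up headerless file rather than convertcsv.csv itself:

```python
import os
import tempfile

import pandas as pd

# Hypothetical headerless file: the first line is already data.
headerless_path = os.path.join(tempfile.mkdtemp(), "headerless_sample.csv")
with open(headerless_path, "w", encoding="utf-8") as f:
    f.write("1,2,3\n")
    f.write("4,5,6\n")

# header=None stops read_csv from treating the first line as column names;
# the columns are labelled 0, 1, 2, ... instead.
newdf = pd.read_csv(headerless_path, header=None)
print(newdf.head())
```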
We can see that it has read the headerless csv file. You can view all the other parameters in the official documentation.
5. Pickle
When your data is not in a good, human-readable format, you can use pickle to save it in a binary format. You can then easily reload it using the pickle library.
We will take the 100 Sales Records CSV file and first save it in pickle format so that we can then read it back.
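A minimal sketch of the save step. The small data frame below is a made-up stand-in for the one read from the CSV file, and pkl_path is an illustrative name:

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for the data frame read from the CSV file.
pdDf = pd.DataFrame({"Region": ["Europe", "Asia"], "Units Sold": [120, 80]})

# "wb" opens the file for writing in binary mode, which pickle requires.
pkl_path = os.path.join(tempfile.mkdtemp(), "test.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump(pdDf, f)
```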
This creates a new file, test.pkl, which contains the pandas data frame pdDf.
Now, to open it with pickle, we just need to use the pickle.load function.
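A minimal sketch of the round trip. It writes a made-up data frame first so that it runs on its own; d4 and pkl_path are illustrative names:

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical data frame, saved first so that there is a file to load.
pdDf = pd.DataFrame({"Region": ["Europe", "Asia"], "Units Sold": [120, 80]})
pkl_path = os.path.join(tempfile.mkdtemp(), "test.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump(pdDf, f)

# "rb" opens the file for reading in binary mode.
with open(pkl_path, "rb") as f:
    d4 = pickle.load(f)

print(d4.head())
```

Only unpickle files that you trust, because loading a pickle can run arbitrary code.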
Here, we have successfully loaded the data from a pickle file as a pandas.DataFrame.
You now know five different ways to load data files in Python, which can help you load datasets in different ways as you work with everyday projects.