How to Set a Random Seed in Python
1. Introduction
When using Python for data analysis and machine learning, random numbers are often required. They serve a variety of purposes, such as generating simulated data, sampling data, and splitting data for model training. However, computer-generated random numbers are actually pseudo-random numbers: a deterministic algorithm produces them from an initial value called the random seed. If you do not set a seed, Python chooses one from an operating-system entropy source (or the current time), so the random number sequence differs each time the program is run.
For experimental reproducibility, we may want to obtain the same random number sequence each time the program is run. In this case, setting a random seed is necessary. This article will detail how to set a random seed in Python and provide example code.
2. The random module
In Python, you can use the random module to generate pseudorandom numbers. This module provides functions for generating random integers, generating random floating-point numbers, and randomly selecting elements from a sequence.
To use the random module, you must first import it:
import random
3. Setting a random seed
To set a random seed, use the random.seed() function. This function takes the seed as its argument, usually an integer (strings, bytes, and floats are also accepted). Running the program multiple times with the same seed, and with the same sequence of calls, produces the same sequence of random numbers.
The sample code is as follows:
import random
# Set the random seed to 1
random.seed(1)
# Generate a random integer
print(random.randint(1, 100))
# Generate a random floating-point number
print(random.random())
# Randomly select an element from a sequence
print(random.choice(['apple', 'banana', 'orange']))
The output has the following form (the exact values depend on the Python version):
17
0.13436424411240122
banana
Because the seed is fixed, running the program again on the same Python version prints the same three values.
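Reproducibility can also be checked inside a single run by seeding twice and comparing the results. The following is a minimal sketch; the helper name draw_sequence is hypothetical and used only for illustration.

```python
import random

def draw_sequence(seed, count=5):
    """Seed the global generator, then draw `count` integers from 1 to 100."""
    random.seed(seed)
    return [random.randint(1, 100) for _ in range(count)]

first = draw_sequence(1)
second = draw_sequence(1)  # same seed: same sequence
other = draw_sequence(2)   # different seed: in general a different sequence

print(first == second)  # True
print(first)
print(other)
```

Each call to random.seed() resets the generator's internal state, which is why the two sequences drawn with seed 1 match.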
4. Application Scenarios
Next, we will introduce several typical application scenarios and explain why setting a random seed is necessary.
4.1 Data Simulation
In data analysis and machine learning tasks, it is sometimes necessary to generate simulated data for experiments. If each run generates different simulated data, a difference in results between two models or algorithms cannot be separated from a difference in the data they were given.
By setting a random seed, we can ensure that the simulated data generated each time is identical, allowing for accurate comparisons.
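As an illustration, the sketch below generates a small simulated dataset of normally distributed readings. The function name simulate_measurements and its parameters (n, mean, stddev) are assumptions made for this example, not part of the random module.

```python
import random

def simulate_measurements(seed, n=5, mean=10.0, stddev=2.0):
    """Return n simulated readings drawn from a normal distribution."""
    random.seed(seed)  # fix the seed so the simulated data is reproducible
    return [random.gauss(mean, stddev) for _ in range(n)]

dataset_a = simulate_measurements(seed=42)
dataset_b = simulate_measurements(seed=42)

print(dataset_a == dataset_b)  # True
```

Two models evaluated on dataset_a and dataset_b would therefore see exactly the same input.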
4.2 Data Sampling
In data analysis, it is often necessary to sample large datasets, and random sampling is one common method. If each run draws a different sample, results from different runs or different sampling methods are hard to compare, because the sample itself has changed.
By setting a random seed, we can ensure that the samples generated each time are identical, allowing for accurate comparisons.
The sample code is as follows:
import random
# Set the random seed to 1
random.seed(1)
# Random sampling
sample = random.sample(range(100), 10)
print(sample)
The output is a list of ten values such as the following (the exact values depend on the Python version):
[80, 75, 59, 8, 26, 71, 15, 47, 51, 33]
As you can see, the samples obtained each time the program is run are the same.
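random.seed() sets the state of a single generator shared by the whole program, so any other code that draws random numbers in between can shift the sample. One way around this is a dedicated random.Random instance with its own seed. The sketch below assumes a hypothetical helper named draw_sample.

```python
import random

def draw_sample(population, k, seed):
    """Draw k distinct items using a private, seeded generator."""
    rng = random.Random(seed)  # independent of the module-level generator
    return rng.sample(population, k)

population = list(range(100))

sample_1 = draw_sample(population, 10, seed=1)
random.random()  # unrelated use of the global generator in between
sample_2 = draw_sample(population, 10, seed=1)

print(sample_1 == sample_2)  # True
```

Because the private generator is created fresh from the seed on each call, the unrelated call in the middle has no effect on the sample.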
4.3 Model Training
In machine learning tasks, it is often necessary to divide the dataset into training and test sets. If the split changes from run to run, a change in the evaluation score may come from the split rather than from the model, which makes results hard to compare.
By fixing the seed used for the split, we can ensure that every run produces the same partition. Note that random.seed() only affects Python's random module. scikit-learn draws its random numbers from NumPy, so the seed must be passed through the random_state parameter of train_test_split.
The sample code is as follows:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Build a placeholder dataset (1000 rows, 10 features); substitute your own data here
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
# Split the dataset, fixing the seed to 1 via random_state
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print(X_train.shape, X_test.shape)
The running results are as follows:
(800, 10) (200, 10)
As you can see, the training and test set partitions are the same each time you run the program.
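To show the idea behind a seeded split without any third-party library, the sketch below shuffles row indices with a seeded generator and cuts them into two parts. It is a simplified illustration, not scikit-learn's implementation, and the name split_indices is hypothetical.

```python
import random

def split_indices(n_samples, test_size=0.2, seed=1):
    """Shuffle row indices with a seeded generator and split them."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_size)
    # First n_test shuffled indices form the test set, the rest the training set
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))  # 800 200

# The same seed reproduces exactly the same partition
print(split_indices(1000) == (train_idx, test_idx))  # True
```

Changing the seed argument gives a different but equally reproducible partition, which is useful when an evaluation should be repeated over several fixed splits.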
5. Summary
This article detailed how to set a random seed in Python. By setting a random seed, you can ensure that the random number sequence generated each time you run the program is the same, thus achieving experimental reproducibility. Setting a random seed is essential in application scenarios such as data simulation, data sampling, and model training.