How do you split a 1d array in python?


The numpy.split() function divides an array into subarrays along a specified axis. It takes three parameters.

numpy.split(ary, indices_or_sections, axis)

Where,

1. ary: Input array to be split.

2. indices_or_sections: Can be an integer, indicating the number of equal-sized subarrays to be created from the input array. If this parameter is a 1-D array, the entries indicate the points at which a new subarray is to be created.

3. axis: The axis along which to split. Default is 0.

Example

import numpy as np
a = np.arange(9)

print('First array:')
print(a)
print()

print('Split the array in 3 equal-sized subarrays:')
b = np.split(a, 3)
print(b)
print()

print('Split the array at positions indicated in 1-D array:')
b = np.split(a, [4, 7])
print(b)

Its output is as follows −

First array:
[0 1 2 3 4 5 6 7 8]

Split the array in 3 equal-sized subarrays:
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

Split the array at positions indicated in 1-D array:
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8])]
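
The example above only uses a 1-D array and an evenly divisible length. As a rough sketch of the other two behaviours described in the parameter list, the snippet below splits a small 2-D array along each axis and shows that an integer which does not divide the length evenly is rejected. The array name grid and the flag uneven_ok are made up for this illustration.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns

# axis=0 (the default) cuts between rows
rows = np.split(grid, 3)           # three arrays of shape (1, 4)

# axis=1 cuts between columns
cols = np.split(grid, 2, axis=1)   # two arrays of shape (3, 2)

# an integer that does not divide the axis length evenly raises ValueError
try:
    np.split(np.arange(10), 3)
    uneven_ok = True
except ValueError:
    uneven_ok = False

print([r.shape for r in rows])
print([c.shape for c in cols])
print(uneven_ok)
```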

How to randomly sample NumPy arrays in Python without scikit-learn or Pandas.

Although there are packages such as sklearn and Pandas that manage trivial tasks like randomly selecting and splitting samples, there may be times when you need to perform these tasks without them.

In this article we will learn how to randomly select and manage data in NumPy arrays for machine learning without scikit-learn or Pandas.

Split and Stack Arrays

In machine learning, a common way to think about data structures is to have features and targets. In a simple case, let’s say we have data about animals that are either dogs or cats. The task at hand is to prepare an array for machine learning without the use of helpful libraries.

In this example, consider a spreadsheet-like array where each row is an observation and each column has data about that observation. The rows represent samples and the columns contain data about each sample. Finally, the last column is the target, or label, for each sample.

Figure 1 — One way to think about features and targets in an array for machine learning. Image from the author, credit Justin Chae

Suppose we are getting started on a machine learning project that predicts cats and dogs. The array might have a few columns and rows or thousands (or millions!). Whatever the case, the major steps are going to be the same: split and stack.

Split Dataset

You may need to split a dataset for two distinct reasons. First, split the entire dataset into a training set and a testing set. Second, split the feature columns from the target column. For example, split 80% of the data into train and 20% into test, then split the features from the target column within each subset.

import numpy as np

# given a one-dimensional array
one_d_array = np.array([1,2,3,4,5,6,7,8,9,10])
# randomly select without replacement
train = np.random.choice(one_d_array, size=8, replace=False)
print(train)
""" output (one possible draw; it varies from run to run)
[ 3 5 10 9 6 8 4 7]
"""

Moreover, instead of always picking the first 80% of samples as they appear in the array, it helps to randomly select subsets. As a result, when we split, we actually want to randomly select and then split.

To randomly select, the first thing you might reach for is np.random.choice(). For example, to randomly sample 80% of an array, we can pick 8 out of 10 elements randomly and without replacement. As shown above, we are able to randomly select from a 1D array of numbers.

Random sampling is especially desired if the first half of the data contains all cats, since it prevents us from training on only cats and no dogs.
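
Because the draw changes on every run, it can help to make it repeatable while experimenting. The following is a minimal sketch, assuming NumPy 1.17 or later, where np.random.default_rng() is available. The seed value 42 and the names n_train and test are arbitrary choices for this illustration and are not part of the original example.

```python
import numpy as np

# a seeded generator gives the same "random" draw on every run
rng = np.random.default_rng(42)  # 42 is an arbitrary seed

one_d_array = np.arange(1, 11)   # the values 1 through 10

# sample 80% of the elements without replacement
n_train = int(len(one_d_array) * 0.8)
train = rng.choice(one_d_array, size=n_train, replace=False)

# everything that was not drawn becomes the test set
test = np.setdiff1d(one_d_array, train)

print(train)
print(test)
```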

# an example array of data extended from Figure 1
# with shape (10, 4)

animals = np.array([[1,0,1,0],
[1,1,0,1],
[1,0,1,0],
[1,1,0,1],
[1,1,0,1],
[1,0,1,0],
[1,1,0,1],
[1,0,1,0],
[1,0,1,0],
[1,1,0,1]])
train = np.random.choice(animals, size=8, replace=True)
print(train)
""" output
ValueError Traceback (most recent call last)
in ()
----> 1 train = np.random.choice(animals, size=8, replace=True)
2 print(train)
mtrand.pyx in numpy.random.mtrand.RandomState.choice()
ValueError: a must be 1-dimensional
"""

Oops: np.random.choice() only works on 1D arrays. As a result, it cannot sample rows from our animals array and raises a ValueError. How can we work around this?

First option. Turn the problem sideways and instead of sampling the array directly, sample the array’s index, then split the array by index.

Figure 2 — Randomly sample the index of integers, then use the result to select from the array. Image from the author, credit Justin Chae

If the array has 10 rows, the idea is to randomly select numbers from 0 through 9 and then index the array by the resulting lists of numbers.

# number of rows in the data
n_data = animals.shape[0]
# get n_samples based on percentage of n_data
n_samples = int(n_data * .8)
# make n_data a list of indices from 0 to n - 1
n_data = list(range(n_data))
# randomly select from range of n_data as indices
idx_train = np.random.choice(n_data, n_samples, replace=False)
# the indices that were not selected form the test set
idx_test = list(set(n_data) - set(idx_train))
print('indices')
print(idx_train, idx_test)
print('test array')
print(animals[idx_test, ])
""" output of split indices and the smaller test array
indices
[5 4 6 3 2 1 7 0] [8, 9]
test array
[[1 0 1 0]
[1 1 0 1]]
"""
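
The index-sampling idea can be wrapped in a small reusable function. The sketch below is one possible way to do it, not the author's code: the helper name split_rows, its parameters, and the stand-in data array are hypothetical, and it uses a random permutation of the row indices in place of np.random.choice() plus a set difference.

```python
import numpy as np

def split_rows(data, train_fraction=0.8, seed=None):
    """Hypothetical helper: randomly split the rows of a 2D array."""
    rng = np.random.default_rng(seed)
    n_rows = data.shape[0]
    n_train = int(n_rows * train_fraction)
    # a random ordering of all row indices, each appearing exactly once
    shuffled_idx = rng.permutation(n_rows)
    idx_train = shuffled_idx[:n_train]
    idx_test = shuffled_idx[n_train:]
    # index the rows; the input array itself is left unchanged
    return data[idx_train], data[idx_test]

# stand-in data with 10 rows and 4 columns
data = np.arange(40).reshape(10, 4)
train, test = split_rows(data, train_fraction=0.8, seed=0)
print(train.shape, test.shape)
```

Slicing one permutation in two guarantees that no row lands in both subsets.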

Second option. If the goal is to return random subsets of an array, another way to accomplish the goal is to first shuffle the array and then sample it. Note that unlike some of the other methods, np.random.shuffle() performs the operation in place. Given the shuffled array, slice and dice it however you want to return subsets.

Figure 3 — Randomly shuffle the entire array, select from the array. Image from the author, credit Justin Chae

With this second method, since the array is shuffled, simply taking the first 80% of rows represents a random sample.

# shuffle the same array as before, in place
np.random.shuffle(animals)
# slice the first 8 rows (80%) as train and the rest as test
trn = animals[:8, ]
tst = animals[8:, ]
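
Since the shuffle happens in place, the original row order is lost afterwards. If that order still matters, one alternative is np.random.permutation(), which returns a shuffled copy and leaves its input alone. This is a sketch with stand-in data; the names original and shuffled are made up for the illustration.

```python
import numpy as np

animals = np.arange(40).reshape(10, 4)  # stand-in data, 10 rows
original = animals.copy()               # kept only to verify nothing changed

# permutation returns a new array with the rows in random order
shuffled = np.random.permutation(animals)

trn = shuffled[:8]   # first 80% of the shuffled rows
tst = shuffled[8:]   # remaining 20%

print(np.array_equal(animals, original))  # the input is untouched
print(trn.shape, tst.shape)
```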

Split Array

Previously, we split the entire dataset row-wise, but what about splitting the array column-wise? In the example animals array, columns 0, 1, and 2 are the features and column 3 is the target. Sure, we could just return column 3 by its index, but what if we have 5 or 100 features? In this case, negative indexing is a wonderful friend.

# negative index to slice the last column
# works, no matter how many columns

trgts = animals[:,-1]
print(trgts)
""" output is a flattened version of the last column
[0 1 1 1 0 0 1 0 1 0]
"""

In the example above, the negative index slices the last column off, but the result is now a 1D array. In some cases, this is desirable; however, the features and targets arrays then have different numbers of dimensions, which is a problem if we want to put them back together again. Instead, we can slice the last column as a range (-1: rather than -1) to preserve the 2D shape.

# len data as indices of n_data
n_data = animals.shape[0]
# n_samples based on percentage of n_data
n_samples = int(n_data * .8)
# m_samples as the difference (the rest)
m_samples = n_data - n_samples
# slice n_samples up until the last column as feats
train_feats = animals[:n_samples, :-1]
# slice n_samples of only the last column as trgts
train_trgts = animals[:n_samples, -1:]
# ... repeat for m_samples
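
For completeness, here is a sketch of the full four-way split, including the test portion that the snippet above leaves as an exercise. The names test_feats and test_trgts are assumed here to mirror the training names. The array is the unshuffled example from earlier.

```python
import numpy as np

animals = np.array([[1,0,1,0],
                    [1,1,0,1],
                    [1,0,1,0],
                    [1,1,0,1],
                    [1,1,0,1],
                    [1,0,1,0],
                    [1,1,0,1],
                    [1,0,1,0],
                    [1,0,1,0],
                    [1,1,0,1]])

n_data = animals.shape[0]
n_samples = int(n_data * .8)

# first n_samples rows: all but the last column, then only the last column
train_feats = animals[:n_samples, :-1]
train_trgts = animals[:n_samples, -1:]
# remaining rows, split the same way
test_feats = animals[n_samples:, :-1]
test_trgts = animals[n_samples:, -1:]

print(train_feats.shape, train_trgts.shape)  # (8, 3) (8, 1)
print(test_feats.shape, test_trgts.shape)    # (2, 3) (2, 1)
# compare: a plain -1 index drops the column dimension
print(animals[:, -1].shape)                  # (10,)
```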

Stack Array

At this point, we’ve shuffled and split the dataset and split the features from targets. Now, how about putting everything back together again? To stack left-right and up-down, we can use np.hstack() and np.vstack().

Figure 4— A single array split four ways to train and test with features and targets. Image from the author, credit Justin Chae

To put Humpty Dumpty back together again, stack horizontally and then stack vertically.

# combine side-by-side
train = np.hstack((train_feats, train_trgts))
test = np.hstack((test_feats, test_trgts))
# combined up-down, returns original array
orig = np.vstack((train, test))
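
A quick way to convince yourself that nothing was lost is to run the round trip on stand-in data and compare the result with the input. The sketch below also shows why the targets were kept 2D: np.hstack() refuses to join a 2D features array with a 1D targets array. The names data and flat_ok are made up for this illustration.

```python
import numpy as np

data = np.arange(40).reshape(10, 4)  # stand-in for the shuffled array

# split four ways, keeping the targets 2D with -1:
train_feats, train_trgts = data[:8, :-1], data[:8, -1:]
test_feats, test_trgts = data[8:, :-1], data[8:, -1:]

# stack side-by-side, then top-to-bottom
train = np.hstack((train_feats, train_trgts))
test = np.hstack((test_feats, test_trgts))
orig = np.vstack((train, test))
print(np.array_equal(orig, data))  # True

# with a 1D targets array the horizontal stack fails
try:
    np.hstack((train_feats, data[:8, -1]))
    flat_ok = True
except ValueError:
    flat_ok = False
print(flat_ok)  # False
```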

Structure Arrays

If you can’t or don’t use Pandas and only have NumPy, there are some ways to leverage the power of NumPy with the ease of Pandas without actually importing Pandas. But how?

Figure 5— Make a NumPy array structured with column names instead of just indices. Image from the author, credit Justin Chae

I found there are some cases where it is important to track the actual name of the feature (or the column) throughout the program. One way to do this is to pass around a list of names with the array, but it is a lot to keep track of. Instead, I found it extremely helpful to transform the array to be structured.

With a structured array, column index 0 is also indexed by the word ‘fur’ and so on.

import numpy as np
# import a helper module that ships with numpy
import numpy.lib.recfunctions as rfn
# column names as a list of strings
col_names = ['fur', 'meow', 'bark', 'label']
# an array
animals = np.array([[1,0,1,0],
[2,1,0,1],
[3,0,1,0],
[4,1,0,1]])
# necessary to set the datatype for each column
# set n dtypes to integer based on col_names

dtypes = np.dtype([(n, 'int') for n in col_names])
# use the recfunctions module to convert the array to structured
structured = rfn.unstructured_to_structured(animals, dtypes)
print(structured['fur'])
""" output
[1 2 3 4]
"""

For more on operations with structured arrays, see Joining Structured Arrays. I discovered these methods via Stack Overflow at https://stackoverflow.com/questions/55577256/numpy-how-to-add-column-names-to-numpy-array.
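
Once the columns have names, the same names can drive filtering and the features/targets split. The following is a sketch under the same setup as the structured example. The variable names label_one and feats are made up here, and rfn.structured_to_unstructured() is the counterpart function from the same recfunctions module.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

col_names = ['fur', 'meow', 'bark', 'label']
animals = np.array([[1,0,1,0],
                    [2,1,0,1],
                    [3,0,1,0],
                    [4,1,0,1]])
dtypes = np.dtype([(n, 'int') for n in col_names])
structured = rfn.unstructured_to_structured(animals, dtypes)

# filter rows by a named column instead of a column index
label_one = structured[structured['label'] == 1]
print(label_one['fur'].tolist())  # [2, 4]

# select the feature columns by name and convert back to a plain 2D array
feats = rfn.structured_to_unstructured(structured[['fur', 'meow', 'bark']])
print(feats.shape)  # (4, 3)
```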

Summary

In this story, I present some of the NumPy functions that I learned and relied on while taking a university course in machine learning. The course restricted the use of pre-built libraries such as scikit-learn and Pandas to reinforce specific learning objectives. These are just some of my notes that I hope are helpful to others seeking a few tips and tricks on NumPy with machine learning.

Thanks for reading, hope it works for you. Let me know if I can make any improvements or cover new topics.

How do you split an array in Python?

Use NumPy's array_split() function: pass in the array you want to split and the number of sub-arrays you want.
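
As a small illustration of that answer, the sketch below splits ten elements into three parts. Since ten is not divisible by three, np.array_split() returns pieces of near-equal size. The names a, parts and sizes are arbitrary.

```python
import numpy as np

a = np.arange(10)

# ask for 3 sub-arrays; the leftover element goes to the first piece
parts = np.array_split(a, 3)
sizes = [len(p) for p in parts]

print([p.tolist() for p in parts])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(sizes)                        # [4, 3, 3]
```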

How do you split an array into two?

To divide an array into two, we need three arrays: the original and two more to hold the results. One simple example takes an array of consecutive numbers and stores its values in two separate arrays according to whether they are even or odd.
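
In NumPy, one way to sketch the even/odd example is with a boolean mask, which avoids writing an explicit loop. This is an illustrative approach with made-up names (numbers, is_even, evens, odds), not necessarily the method the answer above had in mind.

```python
import numpy as np

numbers = np.arange(1, 11)   # the consecutive numbers 1 through 10

# a boolean mask marks which positions hold even values
is_even = numbers % 2 == 0

evens = numbers[is_even]     # values where the mask is True
odds = numbers[~is_even]     # values where the mask is False

print(evens.tolist())  # [2, 4, 6, 8, 10]
print(odds.tolist())   # [1, 3, 5, 7, 9]
```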

How do you split an array?

Note that the split() method on strings is a different operation: it splits a string into a list of substrings and returns the new list without changing the original string. If (" ") is used as the separator, the string is split between words. To split a NumPy array, use numpy.split() or numpy.array_split() instead.

How do you split an array into smaller arrays in Python?

How to split NumPy arrays:
split(): Splits an array into multiple sub-arrays of equal size.
array_split(): Splits an array into multiple sub-arrays of equal or near-equal size.
hsplit(): Splits an array into multiple sub-arrays horizontally (column-wise).
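
As a rough illustration of the column-wise case, np.hsplit() can separate the feature columns from the last column in one call. The three-row array below is a shortened stand-in for the animals data, and the names feats and trgts are made up for the example.

```python
import numpy as np

animals = np.array([[1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [1, 0, 1, 0]])

# split column-wise at column index 3:
# columns 0-2 go to the first piece, column 3 to the second
feats, trgts = np.hsplit(animals, [3])

print(feats.shape)  # (3, 3)
print(trgts.shape)  # (3, 1)
```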