Numpy#
AKA Numeric Python
Numpy advantages:
optimized storage in memory
50 times faster than a python list
allows mathematical operators that don´t work with lists
Numpy data Types#
A numpy array can hold just one data type at the same time, even tough it still support the following data types:
If you want to see the data type that is holding your array, use the dtype method
import numpy as np
arr = np.array([1,2,3,4,5])
print(arr.dtype)
int32
change the data type of an array#
we might want to transform these data types into float
note that the numbers are originally int64, let´s turn them into float64
this is convenient if you are working with tensorflow, pytorch and Neural Networks. code will run faster
import numpy as np
arr2 = np.array([1,2,3,4,5], dtype = "float64")
print(arr2.dtype)
print(arr2)
float64
[1. 2. 3. 4. 5.]
another way to redefine the data type given an array
this will turn data into float64
note: to turn data into float64 use the method:
array_name.astpye(np.float64)
#import numpy as np
#creating the array
arr3 = np.array([1,2,3,4,5])
#changing the array data type
arr3 = arr3.astype(np.float64)
#now print the array and its data type
print(arr3)
print(arr3.dtype)
[1. 2. 3. 4. 5.]
float64
to turn data into boolean use the method:
array_name.astype(np.bool_)
#import numpy as np
#creating the array
arr4 = np.array([0,1,2,3,4,5])
#changing the array data type
arr4 = arr4.astype(np.bool_)
#now print the array and its data type
print(arr4)
print(arr4.dtype)
print("note that the only false value is that one corresponding to 0")
[False True True True True True]
bool
note that the only false value is that one corresponding to 0
to turn data into string use the method:
array_name.astype(np.string_)
#import numpy as np
#creating the array
arr5 = np.array([0,1,2,3,4,5])
#changing the array data type
arr5 = arr5.astype(np.string_)
#now print the array and its data type
print(arr5)
print(arr5.dtype)
[b'0' b'1' b'2' b'3' b'4' b'5']
|S11
can we do the opposite?
from string to integer
btw use the method: array_name.astype(np.int8)
#import numpy as np
#creating the array of strings
arr6 = np.array(["0","1","2","3","4","5"])
#changing the array data type
arr6 = arr6.astype(np.int8)
#now print the array and its data type
print(arr6)
print(arr6.dtype)
[0 1 2 3 4 5]
int8
note that the strings CANNOT have letters, just numbers, otherwise it will shoot an error
Numpy data type & definition#
In numpy, the main data type is an array. The arrays are defined as follows:
a = np.array([1, 2, 3])
arrays are built from python lists
Operators & indexing#
Mathematical operators in numpy#
suppose you have a list and you want to calculated each element to the power of two
x = [1, 2, 3, 4, 5]
print(x)
[1, 2, 3, 4, 5]
if you try x**2 it will shoot you an error, i promise.
To solve these problems, you can use Numpy arrays
# first step
import numpy as np
# this is how you transform a list into an array
x = np.array(x)
#now we use the ** operator on the array without getting an error
print(type(x))
print(x**2)
<class 'numpy.ndarray'>
[ 1 4 9 16 25]
Numpy arrays can only contain one data type!!!!!!!!!!!!!!!!
#import numpy as np
a = np.array([1, "is", True])
print(type(a))
print(type(a[0])) #first element
print("note that everything will be turned into a string")
print(type(a[2])) #third element
<class 'numpy.ndarray'>
<class 'numpy.str_'>
note that everything will be turned into a string
<class 'numpy.str_'>
Performable operations with Numpy Arrays#
#import numpy as np
firstlist = [1, 2, 3]
numpylist = np.array(firstlist)
print(firstlist + firstlist)
print(numpylist + numpylist)
[1, 2, 3, 1, 2, 3]
[2 4 6]
Note that the list is doing an append, while the array is summing the arrays element by element (and that makes more sense)
Numpy Indexing#
In Numpy, you can index as in a list.
Structure
Array[conditions]
example:
#import numpy as np
#creating a 1D array
crazylist = [1,2,3,4,5,6,7,8,9,10,11,12]
crazyarray = np.array(crazylist)
#let´s print the first element of the array
print(crazyarray[0])
#let´s slice from the 2nd to the 5th elements
print(crazyarray[1:6])
#slice from second object to the last
print(crazyarray[1:])
#slice from the beginning until the sixth element
print(crazyarray[:7])
1
[2 3 4 5 6]
[ 2 3 4 5 6 7 8 9 10 11 12]
[1 2 3 4 5 6 7]
If put a condition to an array, you will create an array of booleans (booleans will be true if condition is met)
example:
crazyarray < 3 will create a booleans array in which the numbers that are less than 3 will be true
#import numpy as np
crazylist = [1,2,3,4,5,6,7,8,9,10,11,12]
crazyarray = np.array(crazylist)
#let´s print an array of conditions
conditionals = crazyarray<3
print(conditionals)
[ True True False False False False False False False False False False]
Subset with conditions#
And finally, if you subset that list of booleans, you can get the actual elements from the array that meet the condition.
import numpy as np
crazylist = [1,2,3,4,5,6,7,8,9,10,11,12]
crazyarray = np.array(crazylist)
#let´s say that i want to print all the numbers less than 6
#this is the condition
crazyarray < 6
#this is the actual array that i want to print (subset the condition)
print(crazyarray[crazyarray < 6])
#if we want to set multiple conditions
print(crazyarray[(crazyarray>3) & (crazyarray<7)])
[1 2 3 4 5]
[4 5 6]
a third argument inside the indexing box, would indicate the number of steps:
#import numpy as np
crazylist = [1,2,3,4,5,6,7,8,9,10,11,12]
crazyarray = np.array(crazylist)
#three steps sequence
print(crazyarray[::3])
[ 1 4 7 10]
2D Numpy arrays (just arrays with rows and columns)#
Let´s create a numpy array from two lists, the lists have n objects. Then the array will have two rows and n columns.
The Structure to create a 2D np.array is:
np.array( [ [row_1] , [row_2], [row_3], [row_n] ] ).
creating a 2D array#
example: create an array of 3 rows and 3 columns with numbers from 1 to 9
my_array = np.array( [ [1,2,3] , [4,5,6] , [7,8,9] ] )
print(my_array)
#now print the number of rows and columns
print("this array has the following amount of rows and columns respectively:")
print(my_array.shape)
[[1 2 3]
[4 5 6]
[7 8 9]]
this array has the following amount of rows and columns respectively:
(3, 3)
another example:
import numpy as np
np_height = [1.73, 1.68, 1.71, 1.89, 1.79]
np_weight = [65.4, 59.2, 63.6, 88.4, 68.7]
#this is the 2D array
np_bigarray = np.array([np_height, np_weight])
print(np_bigarray)
print("rows and columns respectively are:")
print(np_bigarray.shape)
[[ 1.73 1.68 1.71 1.89 1.79]
[65.4 59.2 63.6 88.4 68.7 ]]
rows and columns respectively are:
(2, 5)
how to subset in a 2D array#
structure:
arrayname[rowindex, colindex]
or
arrayname[rowindex][colindex]
example:
#import numpy as np
np_height = [1.73, 1.68, 1.71, 1.89, 1.79]
np_weight = [65.4, 59.2, 63.6, 88.4, 68.7]
np_bigarray = np.array([np_height, np_weight])
b = np_bigarray[0,0] #note that this will be element located in first row and first col
alsob = np_bigarray[0][0] #same thing but with another structure
print(b)
print(alsob)
1.73
1.73
row-col subsetting#
Now your mission is print the element from row 2 and col 3.
remember that the array is:
[ [1.73, 1.68, 1.71, 1.89, 1.79], [65.4, 59.2, 63.6, 88.4, 68.7] ]
#import numpy as np
np_height = [1.73, 1.68, 1.71, 1.89, 1.79]
np_weight = [65.4, 59.2, 63.6, 88.4, 68.7]
np_bigarray = np.array([np_height, np_weight])
element = np_bigarray[1,2] #note that this will be element located in row 2 and col 3
print(element)
if element == 63.6:
print("Good job")
else:
print("F bro")
63.6
Good job
full row subsetting#
Let´s subset a full row:
array: [ [1.73, 1.68, 1.71, 1.89, 1.79], [65.4, 59.2, 63.6, 88.4, 68.7] ]
#import numpy as np
np_height = [1.73, 1.68, 1.71, 1.89, 1.79]
np_weight = [65.4, 59.2, 63.6, 88.4, 68.7]
np_bigarray = np.array([np_height, np_weight])
#first row
print(np_bigarray[0])
[1.73 1.68 1.71 1.89 1.79]
full column subsetting#
structure[ : , columnindex]
note that “:” means take all rows, then columnindex is just the index of the column that you wish
#import numpy as np
np_height = [1.73, 1.68, 1.71, 1.89, 1.79]
np_weight = [65.4, 59.2, 63.6, 88.4, 68.7]
np_bigarray = np.array([np_height, np_weight])
#first column
print(np_bigarray[ : , 0])
#third and fourth column
print(np_bigarray[ : , 2:4])
# 2:4 means:
#"take from column index 2 (which is actual column 3) to column index 3 (which is actual column 4)"
#btw the column index 4 is not included
[ 1.73 65.4 ]
[[ 1.71 1.89]
[63.6 88.4 ]]
Numpy statistical functions#
np.mean(array) #this can tell you the mean of your data np.median(array) #this can tell you the median of your data np.corrcoef(array[col1], [col2]) #this can tell you if col1 and col2 are correlated np.std(array) #standart deviation np.var(array) #variance
array = np.array([
[1.64, 71.78],
[1.37, 63.35],
[1.6 , 55.09],
[2.04, 74.85],
[2.04, 68.72],
[2.01, 73.57]
])
#calculate mean from first column
print("mean:")
print(np.mean(array[:, 0]))
#calculate median from first column
print("median:")
print(np.median(array[:, 0]))
#calculate correlation between first and second col
print("correlation:")
print(np.corrcoef(array[:, 0], array[:, 1] ))
#calculate stdev from first column
print("stdev:")
print(np.std(array[:, 0]))
mean:
1.7833333333333332
median:
1.8249999999999997
correlation:
[[1. 0.64924262]
[0.64924262 1. ]]
stdev:
0.2608107018935807
Copy an array (BE CAREFUL)#
supose we have an array of the numbers from 1 to 10
my_array = np.arange(0, 11)
print(my_array)
[ 0 1 2 3 4 5 6 7 8 9 10]
and suddenly we just want to work with the first 5 elements
sub_array = my_array[0:6]
print(sub_array)
[0 1 2 3 4 5]
ok, now i would like to modify the elements of sub_array
sub_array[:] = 0
print(sub_array)
[0 0 0 0 0 0]
nice! now, i just made the changes in my sub array
was my original array modified as well?
let’s see
print(my_array)
[ 0 0 0 0 0 0 6 7 8 9 10]
This is not what we want!!!!! the original array was modified!!!!
This can seriously damage your dataset
the solution is
.copy()
my_array_copy = my_array.copy()
print("before transformation:")
print("my_array", my_array)
print("my_array_copy", my_array_copy)
#let's transform my_array
my_array[:] = 0
print("after transformation:")
print("my_array", my_array)
print("my_array_copy", my_array_copy)
before transformation:
my_array [ 0 0 0 0 0 0 6 7 8 9 10]
my_array_copy [ 0 0 0 0 0 0 6 7 8 9 10]
after transformation:
my_array [0 0 0 0 0 0 0 0 0 0 0]
my_array_copy [ 0 0 0 0 0 0 6 7 8 9 10]
Numpy Useful Methods#
Generate data with a normal distribution#
you can generate data with a normal distribution with np.random.normal()
structure: np.random.normal (mean, sd, sample_size)
import numpy as np
#let´s create an array of height with mean 1.75 and sd = 0.20 for 5000 people
height = np.random.normal(1.75, 0.20, 5000)
print(height)
[1.83487471 1.59298401 1.97363294 ... 1.82536185 1.07024984 1.82003362]
Generate a random number between 0 and 1#
#with np.random.rand() you generate a random number between 0 and 1
import numpy as np
print(np.random.rand())
#if you add an argument n, it will generate an array containing n random numbers between 0 and 1
print( np.random.rand(5) )
#if you add two arguments m,n. It will generate an array of random numbers between 0 & 1 and
#m columns x n rows
print( np.random.rand(5,7) )
0.7217664441405948
[0.41969204 0.51746925 0.47584953 0.28610208 0.76432213]
[[0.18397203 0.738653 0.83709204 0.49924172 0.34224397 0.42791657
0.92612386]
[0.08081806 0.1280082 0.19119259 0.25129989 0.58912103 0.92936342
0.97576107]
[0.91017151 0.35406689 0.7114037 0.84794622 0.9467253 0.00920893
0.72588155]
[0.3991321 0.12905765 0.4995936 0.50861837 0.70815928 0.97624481
0.82421081]
[0.33757498 0.33519419 0.05755154 0.72333293 0.53912569 0.55919818
0.54597557]]
Generate a random number between a and b#
import numpy as np
#random between a & b
print( np.random.randint(1,20) )
#1 = a & 20 = b
#random ARRAY between a & b
print( np.random.randint(1,20, (5,5)) )
#the last tuple (5,5) just indicates the dimensions of the array
6
[[17 3 10 5 19]
[14 5 4 16 1]
[12 19 1 2 14]
[ 1 7 11 17 10]
[10 3 17 1 18]]
Concatenate arrays#
import numpy as np
a = np.array([[1,2], [3,4]])
b = np.array([5,6])
b = np.expand_dims(b, axis=0)
c = np.concatenate( (a, b), axis=0 )
print(a)
print(b)
print(c)
[[1 2]
[3 4]]
[[5 6]]
[[1 2]
[3 4]
[5 6]]
merge two arrays into a 2D array#
import numpy as np
#let´s create an array of height with mean 1.75 and sd = 0.20 for 5000 people
height = np.random.normal(1.75, 0.20, 5000)
#let´s create an array of weight with mean 60.32 and sd = 15 for 5000 people
weight = np.random.normal(60.32, 15, 5000)
np_city = np.column_stack((height, weight))
print(np_city)
[[ 1.64934121 55.92081081]
[ 1.65947285 89.04802018]
[ 1.82978068 74.56895188]
...
[ 1.69891834 59.34915382]
[ 1.64297808 46.20537603]
[ 1.72259225 58.23620619]]
Generate an array given an interval#
the method np.arange is used to create an array given a numeric interval.
Structure:
np.arange(leftlimit, rightlimit, steps)
#Let´s create an array from 0 to 100
import numpy as np
a = np.arange(0, 101)
print(a)
#now let´s create an array from 0 to 100 every 2 steps
b = np.arange(0, 101, 2)
print(b)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100]
[ 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
72 74 76 78 80 82 84 86 88 90 92 94 96 98 100]
Create useful arrays#
import numpy as np
#this is a zero matrix
zerosmatrix = np.zeros((10, 10)) #the arguments are the dimensions, 10 rows & 10 cols
print(zerosmatrix)
#this is a ones matrix
onesmatrix = np.ones((10, 10)) #the arguments are the dimensions, 10 rows & 10 cols
print(onesmatrix)
#and now let´s get an equally distributed matrix
eqdistmat = np.linspace(0, 11, 6) #arguments mean (from cero, to 11, 6 numbers)
print(eqdistmat)
#you can also create the idenity matrix
identitymatrix = np.eye(3) #the identity matrix will be 3x3 in this case
print(identitymatrix)
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
[ 0. 2.2 4.4 6.6 8.8 11. ]
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
Know the array dimensions#
# you can use the np.shape() method to know the dimensions of the array.
import numpy as np
#let´s create a 1D array
firstarray = np.array([1,2,3,4,5])
print(firstarray.shape)
# notice that this has 5 columns and 1 row
#now let´s do a 2D array
secondarray = np.array(
[
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]
]
)
print(secondarray.shape)
#note that this array is 3 rows & 4 cols
(5,)
(3, 4)
Reshape an array#
import numpy as np
# let´s modify the dimensions of an array
originalarray = np.array(
[
[1,2],
[3,4],
[5,6]
]
)
print(originalarray)
#let´s reshape to 1 row and 6 col
firstmodarray = originalarray.reshape(1,6)
print(firstmodarray)
#let´s reshape to 2 rows and 3 cols
secmodarray = np.reshape(originalarray,(2,3), "C")
#IMPORTANT: third argument indicates "organize as the C language does"
#C is the base of python
print(secmodarray)
#let´s reshape to 2 rows and 3 cols
thirdmodarray = np.reshape(originalarray,(2,3), "F")
#IMPORTANT: third argument indicates "organize as the Fortran language does"
print(thirdmodarray)
#let´s reshape to 2 rows and 3 cols
thirdmodarray = np.reshape(originalarray,(2,3), "A")
#IMPORTANT: third argument indicates "organize as the language with the most optimum storage"
#in my device
print(thirdmodarray)
[[1 2]
[3 4]
[5 6]]
[[1 2 3 4 5 6]]
[[1 2 3]
[4 5 6]]
[[1 5 4]
[3 2 6]]
[[1 2 3]
[4 5 6]]
Numpy main functions#
.max() .min() .argmax() & .argmin()#
import numpy as np
#let's create an array
arr = np.random.randint(1, 20, 10)
print("array:")
print(arr)
#let's create a matrix
matrixx = arr.reshape(2, 5)
print("matrix:")
print(matrixx)
#.max(): in an array, returns the largest number in our array
print("array maximum element:")
print(arr.max())
#.max(): in a matrix, returns the largest number in the given the dimension
print(".max() column level:")
print(matrixx.max(0))
print(".max() row level:")
print(matrixx.max(1))
#.argmax(): returns the index that contains the largest number given the dimension
print(".argmax() column level:")
print(matrixx.argmax(0))
print(".argmax() row level:")
print(matrixx.argmax(1))
array:
[19 8 17 8 16 9 4 18 12 13]
matrix:
[[19 8 17 8 16]
[ 9 4 18 12 13]]
array maximum element:
19
.max() column level:
[19 8 18 12 16]
.max() row level:
[19 18]
.argmax() column level:
[0 0 1 1 0]
.argmax() row level:
[0 2]
.ptp()#
import numpy as np
#let's create an array
arr = np.random.randint(1, 20, 10)
print("array:")
print(arr)
#let's create a matrix
matrixx = arr.reshape(2, 5)
print("matrix:")
print(matrixx)
#ptp():tells you what is the peak to peak distance (meaning maxvalue - minvalue)
print("given the array", arr, ", the .ptp() method returns:" )
print(arr.ptp())
print("which equals", arr.max(), "-", arr.min())
#in a matrix
#column level
print(".ptp() in column level:")
print(matrixx.ptp(0))
#row level
print(".ptp() in row level:")
print(matrixx.ptp(1))
array:
[15 13 1 7 3 10 13 15 2 9]
matrix:
[[15 13 1 7 3]
[10 13 15 2 9]]
given the array [15 13 1 7 3 10 13 15 2 9] , the .ptp() method returns:
14
which equals 15 - 1
.ptp() in column level:
[ 5 0 14 5 6]
.ptp() in row level:
[14 13]
.percentile()#
import numpy as np
#let's create an array
arr = np.random.randint(1, 20, 10)
print("array:")
print(arr)
#let's create a matrix
matrixx = arr.reshape(2, 5)
print("matrix:")
print(matrixx)
#percentile zero
print(np.percentile(arr, 0))
#percentile fifty
print(np.percentile(arr, 50))
#percentile 100
print(np.percentile(arr, 100))
array:
[14 9 1 7 5 8 6 9 9 7]
matrix:
[[14 9 1 7 5]
[ 8 6 9 9 7]]
1.0
7.5
14.0