Fastest way of creating and sorting the timestamp data with Python?

Issue

Let's say I have two arrays. The first holds the timestamps and the second holds the data.

timeStamp = ['0001','0002','0003',...,'9999']

data = [6234,2372,1251,...,5172]

What would be the best way to store them? And let's say I would like to sort the data from the smallest to the largest number while keeping the timestamp values attached to them?

Solution

There are multiple ways of doing this. Let's take the following data:

timeStamp = [9,1,2,3,9999]
data = [1245, 6234,2372,1251,5172]

Using base Python and zip

This is the default way of handling data in plain Python, specifically lists. The zip function pairs up two or more lists element-wise, producing tuples (in Python 3 it returns an iterator of tuples rather than a list). You can then use sorted with a lambda function as the key to sort the pairs by the element at a specific position.

l = zip(timeStamp, data)  # pair the two lists element-wise as (timeStamp, data) tuples
print(sorted(l, key=lambda x: x[0]))  # sort by position 0, the timestamp
[(1, 6234), (2, 2372), (3, 1251), (9, 1245), (9999, 5172)]
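
The question asks for the data values sorted from smallest to largest, while the example above sorts by timestamp. Here is a minimal sketch of the data-ordered variant, using the same sample lists. The names pairs_by_data, sorted_timeStamp and sorted_data are only illustrative, and itemgetter(1) is used in place of a lambda to pick the second element of each pair.

```python
from operator import itemgetter

timeStamp = [9, 1, 2, 3, 9999]
data = [1245, 6234, 2372, 1251, 5172]

# Pair each timestamp with its value, then sort the pairs by the value (position 1).
pairs_by_data = sorted(zip(timeStamp, data), key=itemgetter(1))
print(pairs_by_data)
# [(9, 1245), (3, 1251), (2, 2372), (9999, 5172), (1, 6234)]

# Unzip back into two parallel lists if separate sequences are needed.
sorted_timeStamp, sorted_data = map(list, zip(*pairs_by_data))
```

Because each timestamp travels inside the same tuple as its value, the pairing cannot drift apart during the sort.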

Using numpy and argsort

NumPy allows you to work with multidimensional arrays. For 2 lists, you can simply np.stack them together to create a 2D array, with the timestamps in the first row and the data in the second.

To sort, call argsort() on the first row (the timestamps), which returns the indexes that would put that row in sorted order. You can then use these indexes to reorder the columns of the original 2D array, so each data value stays with its timestamp.

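import numpy as np

arr = np.stack([timeStamp, data])  # row 0: timestamps, row 1: data
arr[:, arr[0].argsort()]  # reorder columns by the sorted order of the timestamp row
array([[   1,    2,    3,    9, 9999],
       [6234, 2372, 1251, 1245, 5172]])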
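
To order by the data values instead, as the question asks, the same indexing idea can be applied to the second row. This is a small sketch on the same sample lists; order and by_data are illustrative names, not part of the original answer.

```python
import numpy as np

timeStamp = [9, 1, 2, 3, 9999]
data = [1245, 6234, 2372, 1251, 5172]

# Shape (2, 5): row 0 holds the timestamps, row 1 holds the data.
arr = np.stack([timeStamp, data])

# Indexes that would sort the data row in ascending order.
order = arr[1].argsort()

# Apply the same column order to both rows so the pairs stay aligned.
by_data = arr[:, order]
print(by_data.tolist())
# [[9, 3, 2, 9999, 1], [1245, 1251, 2372, 5172, 6234]]
```

Note that np.stack produces a single dtype for the whole array, so this layout suits cases where timestamps and data are both numeric.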

Using pandas DataFrames and sort_values

Finally, a convenient way to work on multiple lists together is to treat them as columns in a DataFrame. Pandas provides a handy framework for column/row arranged data, which is very useful here because you can also use column names to identify each array/column.

The sort_values method lets you sort the whole DataFrame by a column name, keeping every row intact.

import pandas as pd

df = pd.DataFrame(zip(timeStamp, data), columns=['timeStamp','data'])
print(df.sort_values('timeStamp'))
   timeStamp  data
1          1  6234
2          2  2372
3          3  1251
0          9  1245
4       9999  5172
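
Sorting by the data column, which is what the question asks for, only needs a different column name. Here is a short sketch on the same sample lists; by_data is an illustrative name, and reset_index(drop=True) is an optional step that renumbers the rows after sorting.

```python
import pandas as pd

timeStamp = [9, 1, 2, 3, 9999]
data = [1245, 6234, 2372, 1251, 5172]

# Build the frame from a dict so each list becomes a named column.
df = pd.DataFrame({'timeStamp': timeStamp, 'data': data})

# Sort rows by the data value; each timestamp stays on the same row as its value.
by_data = df.sort_values('data').reset_index(drop=True)
print(by_data)
```

sort_values returns a new DataFrame by default, so df itself keeps the original order.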