Pandas: Random integer between values in two columns
How can I create a new column that holds, for each row, a random integer between the values of two other columns in that row?

Example df:

    import pandas as pd
    import numpy as np

    data = pd.DataFrame({'start': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                         'end': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})
    data = data.iloc[:, [1, 0]]  # swap the order of the two columns

Now I am trying something like this:

    data['rand_between'] = data.apply(lambda x: np.random.randint(data.start, data.end))

or

    data['rand_between'] = np.random.randint(data.start, data.end)

But of course it doesn't work, because data.start is a Series, not a number. How can I use numpy.random with data from columns as a vectorized operation?
Best answer
You are close. You need to specify axis=1 so that apply processes the data row by row, and change data.start/data.end to x.start/x.end so that randint receives scalars:

    data['rand_between'] = data.apply(lambda x: np.random.randint(x.start, x.end), axis=1)

Another possible solution:

    data['rand_between'] = [np.random.randint(s, e) for s, e in zip(data['start'], data['end'])]

    print(data)

       start  end  rand_between
    0      1   10             8
    1      2   20             3
    2      3   30            23
    3      4   40            35
    4      5   50            30
    5      6   60            28
    6      7   70            60
    7      8   80            14
    8      9   90            85
    9     10  100            83
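The question asks for a vectorized call, whereas apply with axis=1 and the list comprehension both call randint once per row. As a rough sketch, assuming a NumPy version that provides np.random.default_rng (1.17 or later), Generator.integers accepts array-like low and high bounds and broadcasts them element by element. The seed 0 and the rebuilt frame are there only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question (column order does not matter here).
data = pd.DataFrame({
    'start': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'end': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Seeded generator so the sketch is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(0)

# One call draws a value per row: low and high are arrays of equal length,
# and each result lies in the half-open interval [start, end).
data['rand_between'] = rng.integers(
    data['start'].to_numpy(),
    data['end'].to_numpy(),
)

print(data)
```

As with np.random.randint, the upper bound is exclusive by default; passing endpoint=True to rng.integers should make the end value itself a possible draw.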