Pandas: run a calculation on each row of a DataFrame using several columns and store the result in a new column

2022-04-15 · Tags: python, pandas, dataframe, distance, haversine

Problem description

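I'm trying to calculate the distance between two locations, and I already have the latitude and longitude of both. My CSV has 4 columns (LAT1, LON1, LAT2, LON2). How can I apply the code below so that the distance it computes is stored in a 5th column named 'Distance'?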

import math
from math import sin, cos, sqrt, atan2, radians

# approximate radius of earth in km
R = 6373.0

# Test coordinates in degrees, converted to radians
lat1 = radians(25.2296756)
lon1 = radians(36.0122287)
lat2 = radians(51.406374)
lon2 = radians(20.9251681)

dlon = lon2 - lon1
dlat = lat2 - lat1

a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))

distance = R * c

print("Result:", distance)
print("Should be:", 3181.11, "km")
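For reference, the code above is the haversine formula for great-circle distance. Writing latitudes as φ and longitudes as λ (both in radians) and R for the Earth's radius, it computes:

```latex
\begin{aligned}
\Delta\varphi &= \varphi_2 - \varphi_1, \qquad \Delta\lambda = \lambda_2 - \lambda_1,\\
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\Delta\lambda}{2}\right),\\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right),\\
d &= R\,c.
\end{aligned}
```

Here a, c and d correspond to the variables a, c and distance in the code, with R = 6373.0 km as the approximate radius used there.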

DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Normalised': {(0, 'London,', 'United', 'Kingdom'): '-',
  (1, 'Johannesburg,', 'South', 'Africa'): '-',
  (2, 'London,', 'United', 'Kingdom'): '-',
  (3, 'Johannesburg,', 'South', 'Africa'): '-',
  (4, 'London,', 'United', 'Kingdom'): '-'},
 'City': {(0, 'London,', 'United', 'Kingdom'): 'New',
  (1, 'Johannesburg,', 'South', 'Africa'): 'London,',
  (2, 'London,', 'United', 'Kingdom'): 'New',
  (3, 'Johannesburg,', 'South', 'Africa'): 'London,',
  (4, 'London,', 'United', 'Kingdom'): 'Singapore,'},
 'Pair': {(0, 'London,', 'United', 'Kingdom'): 'York,',
  (1, 'Johannesburg,', 'South', 'Africa'): 'United',
  (2, 'London,', 'United', 'Kingdom'): 'York,',
  (3, 'Johannesburg,', 'South', 'Africa'): 'United',
  (4, 'London,', 'United', 'Kingdom'): 'Singapore'},
 'Departure': {(0, 'London,', 'United', 'Kingdom'): 'United',
  (1, 'Johannesburg,', 'South', 'Africa'): 'Ki...',
  (2, 'London,', 'United', 'Kingdom'): 'United',
  (3, 'Johannesburg,', 'South', 'Africa'): 'Ki...',
  (4, 'London,', 'United', 'Kingdom'): 'SIN'},
 'Code': {(0, 'London,', 'United', 'Kingdom'): 'Stat.',
  (1, 'Johannesburg,', 'South', 'Africa'): 'JNB',
  (2, 'London,', 'United', 'Kingdom'): 'Stat',
  (3, 'Johannesburg,', 'South', 'Africa'): 'JNB',
  (4, 'London,', 'United', 'Kingdom'): 'LHR'},
 'Arrival': {(0, 'London,', 'United', 'Kingdom'): 'LHR',
  (1, 'Johannesburg,', 'South', 'Africa'): 'LHR',
  (2, 'London,', 'United', 'Kingdom'): 'LHR',
  (3, 'Johannesburg,', 'South', 'Africa'): 'LHR',
  (4, 'London,', 'United', 'Kingdom'): '1.3'},
 'Code.1': {(0, 'London,', 'United', 'Kingdom'): 'JFK',
  (1, 'Johannesburg,', 'South', 'Africa'): '-26.1',
  (2, 'London,', 'United', 'Kingdom'): 'JFK',
  (3, 'Johannesburg,', 'South', 'Africa'): '-26.1',
  (4, 'London,', 'United', 'Kingdom'): '103.98'},
 'Departure_lat': {(0, 'London,', 'United', 'Kingdom'): 51.5,
  (1, 'Johannesburg,', 'South', 'Africa'): 28.23,
  (2, 'London,', 'United', 'Kingdom'): 51.5,
  (3, 'Johannesburg,', 'South', 'Africa'): 28.23,
  (4, 'London,', 'United', 'Kingdom'): 51.47},
 'Departure_lon': {(0, 'London,', 'United', 'Kingdom'): -0.45,
  (1, 'Johannesburg,', 'South', 'Africa'): 51.47,
  (2, 'London,', 'United', 'Kingdom'): -0.45,
  (3, 'Johannesburg,', 'South', 'Africa'): 51.47,
  (4, 'London,', 'United', 'Kingdom'): -0.45},
 'Arrival_lat': {(0, 'London,', 'United', 'Kingdom'): 40.64,
  (1, 'Johannesburg,', 'South', 'Africa'): -0.45,
  (2, 'London,', 'United', 'Kingdom'): 40.64,
  (3, 'Johannesburg,', 'South', 'Africa'): -0.45,
  (4, 'London,', 'United', 'Kingdom'): np.nan},
 'Arrival_lon': {(0, 'London,', 'United', 'Kingdom'): -73.79,
  (1, 'Johannesburg,', 'South', 'Africa'): np.nan,
  (2, 'London,', 'United', 'Kingdom'): -73.79,
  (3, 'Johannesburg,', 'South', 'Africa'): np.nan,
  (4, 'London,', 'United', 'Kingdom'): np.nan}})

Solution

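You can define a custom function for the distance calculation, then use .apply() with axis=1 to call that function on each row and get the distance for that row.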

1. Define a custom function for the distance calculation:

import math
from math import sin, cos, sqrt, atan2, radians

def get_distance(in_lat1, in_lon1, in_lat2, in_lon2):
    # approximate radius of earth in km
    R = 6373.0

    lat1 = radians(in_lat1)
    lon1 = radians(in_lon1)
    lat2 = radians(in_lat2)
    lon2 = radians(in_lon2)

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance = R * c

    return distance

2. Use .apply() with axis=1 to call the function on each row and get that row's distance:

df['Distance'] = df.apply(
    lambda x: get_distance(x['Departure_lat'], x['Departure_lon'],
                           x['Arrival_lat'], x['Arrival_lon']),
    axis=1,
)
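The two steps above can be combined into a self-contained sketch that uses the column names from the question's CSV (LAT1, LON1, LAT2, LON2) instead of the Departure_*/Arrival_* names. The two-row frame is made up for illustration: row 0 holds the question's test coordinates, and row 1 has a deliberately missing longitude to show that a NaN input just produces NaN in Distance instead of raising an error.

```python
import math

import pandas as pd


def get_distance(in_lat1, in_lon1, in_lat2, in_lon2):
    """Haversine distance in km between two points given in degrees."""
    R = 6373.0  # approximate Earth radius in km, as in the answer
    lat1, lon1, lat2, lon2 = map(math.radians, (in_lat1, in_lon1, in_lat2, in_lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c


# Illustrative data with the question's column names; row 1 is missing LON2.
df = pd.DataFrame({
    "LAT1": [25.2296756, 51.5],
    "LON1": [36.0122287, -0.45],
    "LAT2": [51.406374, 40.64],
    "LON2": [20.9251681, float("nan")],
})

# axis=1 passes each row to the lambda as a Series indexed by column name.
df["Distance"] = df.apply(
    lambda row: get_distance(row["LAT1"], row["LON1"], row["LAT2"], row["LON2"]),
    axis=1,
)
print(df)
```

Row 0 should come out at about 3181.11 km, the value the question expects. If rows with missing coordinates should be excluded rather than kept as NaN, they can be filtered out afterwards with df.dropna(subset=["Distance"]).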

Demo

Input DataFrame

        City  Departure_lat  Departure_lon  Arrival_lat  Arrival_lon
0  CityName1      25.229676      36.012229    51.406374    20.925168

Output

        City  Departure_lat  Departure_lon  Arrival_lat  Arrival_lon    Distance
0  CityName1      25.229676      36.012229    51.406374    20.925168  3181.11039
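This is not part of the original answer: because .apply(axis=1) calls a Python function once per row, it can get slow on large DataFrames. A common alternative is to write the same haversine formula with NumPy functions (np.radians, np.sin, np.cos, np.arctan2, np.sqrt), which operate on whole columns at once. The function name haversine_np is hypothetical. The sample rows are the demo row above plus a London (LHR) to New York (JFK) pair taken from the posted DataFrame.

```python
import numpy as np
import pandas as pd


def haversine_np(lat1, lon1, lat2, lon2, R=6373.0):
    """Vectorised haversine distance in km; inputs in degrees (scalars, arrays or Series)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return R * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


df = pd.DataFrame({
    "Departure_lat": [25.229676, 51.5],
    "Departure_lon": [36.012229, -0.45],
    "Arrival_lat": [51.406374, 40.64],
    "Arrival_lon": [20.925168, -73.79],
})

# One call computes the whole column; NaN inputs propagate to NaN outputs.
df["Distance"] = haversine_np(df["Departure_lat"], df["Departure_lon"],
                              df["Arrival_lat"], df["Arrival_lon"])
print(df)
```

The formula and radius are unchanged, so the results should match the .apply() version up to floating-point rounding.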
