关联独立事件的序列 - 计算时间交叉点

2022-01-10 00:00:00 sql powerbi sql-server dax

我们正在构建一个 PowerBI 报告解决方案,我(以及 Stack)解决了

如果你不完全理解度量表达式,这里是T-SQL翻译:

选择dt.VehicleID,dt.ReasonCodeID,dt.LocationID,SUM(dt.Diff) [总天数]从(选择案件当 a.StartDateTime <= b.StartDateTime AND a.EndDateTime >= b.EndDateTime -- 在范围内然后 DATEDIFF(天,b.StartDateTime,b.EndDateTime)当 a.StartDateTime >b.StartDateTime AND a.EndDateTime >= b.EndDateTime -- |-----|*****|....|THEN DATEDIFF(DAY, a.StartDateTime, b.EndDateTime)当 a.StartDateTime <= b.StartDateTime AND a.EndDateTime <= b.EndDateTime -- |...|****|-----|然后 DATEDIFF(天,b.StartDateTime,a.EndDateTime)当 a.StartDateTime >= b.StartDateTime AND a.EndDateTime <= b.EndDateTime -- |---|****|-----然后 DATEDIFF(天,a.StartDateTime,a.EndDateTime)结束差异,a.车辆ID,b.ReasonCodeID,a.LocationID --a.StartDateTime, a.EndDateTime, b.StartDateTime, b.EndDateTimeFROM LocationEventsCROSS JOIN VehicleStatusEvents b其中 a.VehicleID = b.VehicleID和((a.StartDateTime <= b.EndDateTime)AND (a.EndDateTime >= b.StartDateTime))) dt按 dt.VehicleID 分组,dt.ReasonCodeID,dt.LocationID

注意在 T-SQL 中你也可以使用 INNER JOIN 操作符.

如果这有帮助,请告诉我.

We are building a PowerBI reporting solution and I (well Stack) solved one problem and the business came up with a new reporting idea. Not sure of the best way to approach it as I know very little about PowerBI and the business seems to want quite complex reports.

We have two sequences of events from separate data sources. They both contain independent events occurring to vehicles. One describes what location a vehicle is within - the other describes incident events which have a reason code for the incident. The business wants to report on time spent in each location for each reason. Vehicles can change location totally independent of the incident events occurring - and events actually are datetime and occur at random points throughtout day. Each type of event has a startime/endtime and a vehicleID.

Vehicle Location Events

+------------------+-----------+------------+-----------------+----------------+
| LocationDetailID | VehicleID | LocationID |  StartDateTime  |  EndDateTime   |
+------------------+-----------+------------+-----------------+----------------+
|                1 |         1 |          1 |        2012-1-1 |      2016-1-1  |
|                2 |         1 |          2 |        2016-1-1 |      2016-4-1  |
|                3 |         1 |          1 |        2016-4-1 |      2016-11-1 |
|                4 |         2 |          1 |        2011-1-1 |      2016-11-1 |
+------------------+-----------+------------+-----------------+----------------+

Vehicle Status Events

+---------+---------------+-------------+-----------+--------------+
| EventID | StartDateTime | EndDateTime | VehicleID | ReasonCodeID |
+---------+---------------+-------------+-----------+--------------+
|       1 | 2012-1-1      | 2013-1-1    |         1 |            1 |
|       2 | 2013-1-1      | 2015-1-1    |         1 |            3 |
|       3 | 2015-1-1      | 2016-5-1    |         1 |            4 |
|       4 | 2016-5-1      | 2016-11-1   |         1 |            2 |
|       5 | 2015-9-1      | 2016-2-1    |         2 |            1 |
+---------+---------------+-------------+-----------+--------------+

Is there anyway I can correlate the two streams together and calculate total time per Vehicle per ReasonCode per location? This would seem to require me to be able to relate the two events - so a change of location may occur part way through a given ReasonCode.

Calculation Example ReasonCodeID 4

  • VehicleID 1 is in location ID 1 from 2012-1-1 to 2016-1-1 and 2016-4-1 to 2016-11-1
  • VehicleID 1 is in location ID 2 from 2016-1-1 to 2016-4-1
  • VehcileID 1 has ReasonCodeID 4 from 2015-1-1 to 2016-5-1

Therefore first Period in location 1 intersects with 365 days of ReasonCodeID 4 (2015-1-1 to 2016-1-1). 2nd period in location 1 intersects with 30 days (2016-4-1 to 2016-5-1). In location 2 intersects with 91 days of ReasonCodeID 4(2016-1-1 to 2016-4-1

Desired output would be the below.

+-----------+--------------+------------+------------+
| VehicleID | ReasonCodeID | LocationID | Total Days |
+-----------+--------------+------------+------------+
|         1 |            1 |          1 |        366 |
|         1 |            3 |          1 |        730 |
|         1 |            4 |          1 |        395 |
|         1 |            4 |          2 |         91 |
|         1 |            2 |          1 |        184 |
|         2 |            1 |          1 |        154 |
+-----------+--------------+------------+------------+

I have created a SQL fiddle that shows the structure here

Vehicles have related tables and I'm sure the business will want them grouped by vehicle class etc but if I can understand how to calculate the intersection points in this case that would give me the basis for rest of reporting.

解决方案

I think this solution requires a CROSS JOIN implementation. The relationship between both tables is Many to Many which implies the creation of a third table that bridges LocationEvents and VehicleStatusEvents tables so I think specifying the relationship in the expression could be easier.

I use a CROSS JOIN between both tables, then filter the results only to get those rows which VehicleID columns are the same in both tables. I am also filtering the rows that VehicleStatusEvents range dates intersects LocationEvents range dates.

Once the filtering is done I am adding a column to calculate the count of days between each intersection. Finally, the measure sums up the days for each VehicleID, ReasonCodeID and LocationID.

In order to implement the CROSS JOIN you will have to rename the VehicleID, StartDateTime and EndDateTime on any of both tables. It is necessary for avoiding ambigous column names errors.

I rename the columns as follows:

VehicleID : LocationVehicleID and StatusVehicleID
StartDateTime : LocationStartDateTime and StatusStartDateTime
EndDateTime : LocationEndDateTime and StatusEndDateTime

After this you can use CROSSJOIN in the Total Days measure:

Total Days =
SUMX (
    FILTER (
        ADDCOLUMNS (
            FILTER (
                CROSSJOIN ( LocationEvents, VehicleStatusEvents ),
                LocationEvents[LocationVehicleID] = VehicleStatusEvents[StatusVehicleID]
                    && LocationEvents[LocationStartDateTime] <= VehicleStatusEvents[StatusEndDateTime]
                    && LocationEvents[LocationEndDateTime] >= VehicleStatusEvents[StatusStartDateTime]
            ),
            "CountOfDays", IF (
                [LocationStartDateTime] <= [StatusStartDateTime]
                    && [LocationEndDateTime] >= [StatusEndDateTime],
                DATEDIFF ( [StatusStartDateTime], [StatusEndDateTime], DAY ),
                IF (
                    [LocationStartDateTime] > [StatusStartDateTime]
                        && [LocationEndDateTime] >= [StatusEndDateTime],
                    DATEDIFF ( [LocationStartDateTime], [StatusEndDateTime], DAY ),
                    IF (
                        [LocationStartDateTime] <= [StatusStartDateTime]
                            && [LocationEndDateTime] <= [StatusEndDateTime],
                        DATEDIFF ( [StatusStartDateTime], [LocationEndDateTime], DAY ),
                        IF (
                            [LocationStartDateTime] >= [StatusStartDateTime]
                                && [LocationEndDateTime] <= [StatusEndDateTime],
                            DATEDIFF ( [LocationStartDateTime], [LocationEndDateTime], DAY ),
                            BLANK ()
                        )
                    )
                )
            )
        ),
        LocationEvents[LocationID] = [LocationID]
            && VehicleStatusEvents[ReasonCodeID] = [ReasonCodeID]
    ),
    [CountOfDays]
)

Then in Power BI you can build a matrix (or any other visualization) using this measure:

If you don't understand completely the measure expression, here is the T-SQL translation:

SELECT
    dt.VehicleID,
    dt.ReasonCodeID,
    dt.LocationID,
    SUM(dt.Diff) [Total Days]
FROM 
(
    SELECT
        CASE
            WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime >= b.EndDateTime  -- Inside range
               THEN DATEDIFF(DAY, b.StartDateTime, b.EndDateTime)
            WHEN a.StartDateTime > b.StartDateTime AND a.EndDateTime >= b.EndDateTime  -- |-----|*****|....|
               THEN DATEDIFF(DAY, a.StartDateTime, b.EndDateTime)
            WHEN a.StartDateTime <= b.StartDateTime AND a.EndDateTime <= b.EndDateTime  -- |...|****|-----|
               THEN DATEDIFF(DAY, b.StartDateTime, a.EndDateTime)
            WHEN a.StartDateTime >= b.StartDateTime AND a.EndDateTime <= b.EndDateTime  -- |---|****|-----
               THEN DATEDIFF(DAY, a.StartDateTime, a.EndDateTime)
        END Diff,
        a.VehicleID,
        b.ReasonCodeID,
        a.LocationID --a.StartDateTime, a.EndDateTime, b.StartDateTime, b.EndDateTime
    FROM LocationEvents a
        CROSS JOIN VehicleStatusEvents b
    WHERE a.VehicleID = b.VehicleID
        AND 
        (
            (a.StartDateTime <= b.EndDateTime)
                AND (a.EndDateTime >= b.StartDateTime)
        )
) dt
GROUP BY dt.VehicleID,
         dt.ReasonCodeID,
         dt.LocationID

Note in T-SQL you could use an INNER JOIN operator too.

Let me know if this helps.

相关文章