OnTime
This dataset contains data from the Bureau of Transportation Statistics.
Creating a table
Import from Raw Data
Downloading data:
Loading data with multiple threads:
(If you run into memory shortages or other issues on your server, remove the -P $(nproc) part.)
Import from a saved copy
Alternatively, you can import data from a saved copy with the following query:
The snapshot was created on 2022-05-29.
Queries
Q0.
Q1. The number of flights per day from the year 2000 to 2008
Q2. The number of flights delayed by more than 10 minutes, grouped by the day of the week, for 2000-2008
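The shape of this aggregation can be sketched on a toy table. The snippet below is only an illustration run in SQLite through Python, not the query used against the real dataset. The column names (Year, DayOfWeek, DepDelay) and all rows are assumptions made up for the example.

```python
import sqlite3

# Synthetic rows for illustration only. The column names (Year, DayOfWeek,
# DepDelay) are assumptions and are not taken from the text above.
rows = [
    (2000, 1, 25),  # day 1, delayed 25 minutes: counted
    (2000, 1, 5),   # day 1, delayed 5 minutes: not counted
    (2004, 5, 11),  # day 5: counted
    (2004, 5, 40),  # day 5: counted
    (2008, 5, 90),  # day 5: counted
    (2008, 7, 10),  # exactly 10 minutes is not "more than 10"
    (2009, 5, 60),  # outside the 2000-2008 range
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ontime (Year INTEGER, DayOfWeek INTEGER, DepDelay INTEGER)"
)
conn.executemany("INSERT INTO ontime VALUES (?, ?, ?)", rows)

# Filter first (delay threshold and year range), then count per day of week.
delayed_by_weekday = conn.execute(
    """
    SELECT DayOfWeek, count(*) AS c
    FROM ontime
    WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
    GROUP BY DayOfWeek
    ORDER BY c DESC
    """
).fetchall()

print(delayed_by_weekday)
```

Days with no qualifying flights do not appear in the result at all, because the filter removes their rows before grouping.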
Q3. The number of delays by airport for 2000-2008
Q4. The number of delays by carrier for 2007
Q5. The percentage of delays by carrier for 2007
Better version of the same query:
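The text does not show what makes one version better than another. As a hedged illustration, a percentage like this can be computed either by joining a count of delayed flights to a count of all flights, or in a single pass by averaging the boolean condition. The sketch below compares the two on a toy SQLite table through Python. The column names (Year, Carrier, DepDelay) and the rows are assumptions made up for the example.

```python
import sqlite3

# Synthetic rows for illustration only. Column names are assumptions.
rows = [
    (2007, "AA", 20), (2007, "AA", 5), (2007, "AA", 30), (2007, "AA", 0),
    (2007, "DL", 15), (2007, "DL", 3), (2007, "DL", 2), (2007, "DL", 1),
    (2006, "AA", 50),  # different year, ignored by both queries
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ontime (Year INTEGER, Carrier TEXT, DepDelay INTEGER)")
conn.executemany("INSERT INTO ontime VALUES (?, ?, ?)", rows)

# Two-step form: count delayed flights and all flights separately, then join.
two_step = conn.execute(
    """
    SELECT d.Carrier, 100.0 * d.c / t.c AS pct
    FROM (SELECT Carrier, count(*) AS c FROM ontime
          WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier) AS d
    JOIN (SELECT Carrier, count(*) AS c FROM ontime
          WHERE Year = 2007 GROUP BY Carrier) AS t
      ON d.Carrier = t.Carrier
    ORDER BY pct DESC
    """
).fetchall()

# Single-pass form: the comparison evaluates to 0 or 1, so its average
# is the fraction of delayed flights.
single_pass = conn.execute(
    """
    SELECT Carrier, avg(DepDelay > 10) * 100 AS pct
    FROM ontime
    WHERE Year = 2007
    GROUP BY Carrier
    ORDER BY pct DESC
    """
).fetchall()

print(two_step)
print(single_pass)
```

The single-pass form scans the table once and needs no join. It also keeps carriers with zero delayed flights in the result (at 0 percent), whereas the inner join in the two-step form drops them.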
Q6. The previous query for a broader range of years, 2000-2008
Better version of the same query:
Q7. The percentage of flights delayed by more than 10 minutes, by year
Better version of the same query:
Q8. The most popular destinations by the number of directly connected cities for various year ranges
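Counting directly connected cities amounts to counting distinct origin cities per destination. A minimal sketch on a toy SQLite table through Python follows. The helper `top_destinations`, the column names (Year, OriginCityName, DestCityName), and the rows are all hypothetical and exist only for this illustration.

```python
import sqlite3

# Synthetic rows for illustration only. Column names are assumptions.
rows = [
    (2000, "Boston", "Chicago"),
    (2001, "Denver", "Chicago"),
    (2002, "Boston", "Chicago"),   # repeated pair, counted once
    (2003, "Seattle", "Chicago"),
    (2000, "Boston", "Denver"),
    (2005, "Chicago", "Denver"),
    (2011, "Seattle", "Denver"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ontime (Year INTEGER, OriginCityName TEXT, DestCityName TEXT)"
)
conn.executemany("INSERT INTO ontime VALUES (?, ?, ?)", rows)


def top_destinations(conn, first_year, last_year, limit=10):
    """Hypothetical helper: rank destinations by distinct origin cities."""
    return conn.execute(
        """
        SELECT DestCityName, count(DISTINCT OriginCityName) AS u
        FROM ontime
        WHERE Year >= ? AND Year <= ?
        GROUP BY DestCityName
        ORDER BY u DESC, DestCityName
        LIMIT ?
        """,
        (first_year, last_year, limit),
    ).fetchall()


# The same query can be rerun for different year ranges.
print(top_destinations(conn, 2000, 2010))
print(top_destinations(conn, 2005, 2011))
```

The secondary sort on the destination name is only there to make ties deterministic in this toy example.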
Q9.
Q10.
Bonus:
You can also explore the data in the Playground.
This performance test was created by Vadim Tkachenko. See:
- https://www.percona.com/blog/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/
- https://www.percona.com/blog/2009/10/26/air-traffic-queries-in-luciddb/
- https://www.percona.com/blog/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/
- https://www.percona.com/blog/2014/04/21/using-apache-hadoop-and-impala-together-with-mysql-for-data-analysis/
- https://www.percona.com/blog/2016/01/07/apache-spark-with-air-ontime-performance-data/
- http://nickmakos.blogspot.ru/2012/08/analyzing-air-traffic-performance-with.html