Analysis of airport utilization using PANDAS

I am trying to find a vectorized (or otherwise more efficient) solution to an iteration problem. The only solution I have found requires row-by-row iteration over a DataFrame with multiple loops. The actual data file is huge, so my current solution is impractical. I have included line profiler output at the very end, if you’d like to have a look.

The main issue seems to be the overhead of frequent pandas access for filtering operations, especially when creating the temporary DataFrame dfTemp. Any ideas on modifying the algorithm and minimizing pandas usage are welcome. I am also open to NumPy/Cython/Numba implementations.

Problem Statement

An airport has two landing strips side by side. Each plane lands (arrival time), taxis on one of the landing strips for a while, then takes off (departure time). I am looking for solutions to two cases:

  1. Build a list of events where there is more than one plane present on a single strip at a time. Do not include subsets of events (e.g. do not show [3,4] if there is a valid [3,4,5] case). The list should store the indices of the actual DataFrame rows. See function findSingleEvents() for a solution for this case (runs in around 5 ms).

  2. Build a list of events where there is at least one plane on each strip at a time. Do not count subsets of an event; only record the event with the maximum number of planes (e.g. do not show [3,4] if there is a [3,4,5] case). Do not count events that occur entirely on a single strip. The list should store the indices of the actual DataFrame rows. See function findMultiEvents() for a solution for this case (runs in around 15 ms).

Code with sample input

    from __future__ import division, print_function  # __future__ imports must come first

    import itertools

    import numpy as np
    import pandas as pd

    data = [{'PLANE': 0, 'STRIP': 1, 'ARRIVAL': 85.00, 'DEPARTURE': 86.00},
            {'PLANE': 1, 'STRIP': 1, 'ARRIVAL': 87.87, 'DEPARTURE': 92.76},
            {'PLANE': 2, 'STRIP': 2, 'ARRIVAL': 88.34, 'DEPARTURE': 89.72},
            {'PLANE': 3, 'STRIP': 1, 'ARRIVAL': 88.92, 'DEPARTURE': 90.88},
            {'PLANE': 4, 'STRIP': 2, 'ARRIVAL': 90.03, 'DEPARTURE': 92.77},
            {'PLANE': 5, 'STRIP': 2, 'ARRIVAL': 90.27, 'DEPARTURE': 91.95},
            {'PLANE': 6, 'STRIP': 2, 'ARRIVAL': 92.42, 'DEPARTURE': 93.58},
            {'PLANE': 7, 'STRIP': 2, 'ARRIVAL': 94.42, 'DEPARTURE': 95.58}]

    df = pd.DataFrame(data, columns=['PLANE', 'STRIP', 'ARRIVAL', 'DEPARTURE'])

    def findSingleEvents(df):
        events = []
        for row in df.itertuples():
            # Create a temporary DataFrame of all rows that overlap this row
            dfTemp = df[(row.DEPARTURE > df.ARRIVAL) & (row.ARRIVAL < df.DEPARTURE)]
            if len(dfTemp) > 1:
                # Convert index values to integers from long
                current_event = [int(v) for v in dfTemp.index.tolist()]
                # Loop backwards to remove elements that do not comply
                for i in reversed(current_event):
                    if (dfTemp.loc[i].ARRIVAL > dfTemp.DEPARTURE).any():
                        current_event.remove(i)
                events.append(current_event)
        # Remove duplicate events
        events = list(map(list, set(map(tuple, events))))
        return events

    def findMultiEvents(df):
        events = []
        for row in df.itertuples():
            # Create a temporary DataFrame of all rows that overlap this row
            dfTemp = df[(row.DEPARTURE > df.ARRIVAL) & (row.ARRIVAL < df.DEPARTURE)]
            if len(dfTemp) > 1:
                # Convert index values to integers from long
                current_event = [int(v) for v in dfTemp.index.tolist()]
                # Loop backwards to remove elements that do not comply
                for i in reversed(current_event):
                    if (dfTemp.loc[i].ARRIVAL > dfTemp.DEPARTURE).any():
                        current_event.remove(i)
                # Skip events that occur on only one strip.
                # current_event holds index labels, so select with .loc
                if len(df.loc[current_event].STRIP.unique()) > 1:
                    events.append(current_event)
        # Remove duplicate events
        events = list(map(list, set(map(tuple, events))))
        return events

    print(findSingleEvents(df[df.STRIP == 1]))
    print(findSingleEvents(df[df.STRIP == 2]))
    print(findMultiEvents(df))

Output

    [[1, 3]]
    [[4, 5], [4, 6]]
    [[1, 3, 4, 5], [1, 4, 6], [1, 2, 3]]
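The overlap test that builds dfTemp on every iteration can, in principle, be evaluated once for all pairs of planes with NumPy broadcasting. The following is a minimal sketch under that assumption, not a full replacement for the functions above: it only produces the pairwise overlap matrix and does not yet remove subsets or check strips. The function name overlap_matrix is hypothetical, and the arrays repeat the sample data.

```python
import numpy as np

# Sample arrival and departure times, copied from the post's data
arrival = np.array([85.00, 87.87, 88.34, 88.92, 90.03, 90.27, 92.42, 94.42])
departure = np.array([86.00, 92.76, 89.72, 90.88, 92.77, 91.95, 93.58, 95.58])


def overlap_matrix(arrival, departure):
    """Return a boolean matrix m where m[i, j] is True when planes i and j overlap.

    Hypothetical helper: it applies the same strict test as the dfTemp filter,
    (row.DEPARTURE > df.ARRIVAL) & (row.ARRIVAL < df.DEPARTURE), to all pairs at once.
    """
    overlaps = (departure[:, None] > arrival[None, :]) & (arrival[:, None] < departure[None, :])
    # A plane trivially overlaps itself, so clear the diagonal
    np.fill_diagonal(overlaps, False)
    return overlaps


m = overlap_matrix(arrival, departure)
print(np.argwhere(m[1]).ravel())  # indices of planes that overlap plane 1
```

The matrix needs memory proportional to the square of the number of planes, so for a huge file this would likely have to be applied in chunks or combined with sorting.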

Line Profiler Logs

    %lprun -f findSingleEvents findSingleEvents(df[df.STRIP==1])

    Timer unit: 2.85099e-07 s

    Total time: 0.0172055 s
    File: <ipython-input-33-220dd9d5b99b>
    Function: findSingleEvents at line 1

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
         1                                           def findSingleEvents(df):
         2         1          9.0      9.0      0.0      events = []
         3         4       8702.0   2175.5     14.4      for row in df.itertuples():
         4         3      31604.0  10534.7     52.4          dfTemp = df[(row.DEPARTURE>df.ARRIVAL) & (row.ARRIVAL<df.DEPARTURE)]
         5         3         65.0     21.7      0.1          if len(dfTemp)>1:
         6         6        334.0     55.7      0.6              current_event = [int(v) for v in dfTemp.index.tolist()]
         7         6         50.0      8.3      0.1              for i in reversed(current_event):
         8         4      19537.0   4884.2     32.4                  if (dfTemp.loc[i].ARRIVAL > dfTemp.DEPARTURE).any():
         9                                                               current_event.remove(i)
        10         2         12.0      6.0      0.0              events.append(current_event)
        11         1         33.0     33.0      0.1      events = map(list, set(map(tuple, events)))
        12         1          3.0      3.0      0.0      return events

    %lprun -f findMultiEvents findMultiEvents(df)

    Timer unit: 2.85099e-07 s

    Total time: 0.0532152 s
    File: <ipython-input-28-97265d757453>
    Function: findMultiEvents at line 1

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
         1                                           def findMultiEvents(df):
         2         1         18.0     18.0      0.0      events = []
         3         9      21661.0   2406.8     11.6      for row in df.itertuples():
         4         8      60694.0   7586.8     32.5          dfTemp = df[(row.DEPARTURE>df.ARRIVAL) & (row.ARRIVAL<df.DEPARTURE)]
         5         8        145.0     18.1      0.1          if len(dfTemp)>1:
         6        32       1208.0     37.8      0.6              current_event = [int(v) for v in dfTemp.index.tolist()]
         7        32        152.0      4.8      0.1              for i in reversed(current_event):
         8        26      87007.0   3346.4     46.6                  if (dfTemp.loc[i].ARRIVAL > dfTemp.DEPARTURE).any():
         9         6         67.0     11.2      0.0                      current_event.remove(i)
        10         6      15636.0   2606.0      8.4              if len(df.iloc[current_event].STRIP.unique()) > 1:
        11         6         38.0      6.3      0.0                  events.append(current_event)
        12         1         27.0     27.0      0.0      events = map(list, set(map(tuple, events)))
        13         1          2.0      2.0      0.0      return events
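The two cases in the problem statement can also be approached without filtering the DataFrame once per row. The following is a minimal sketch, not the author's method: it swaps in a sweep over the sorted arrival and departure times and records the set of planes on the ground at each departure that follows at least one new arrival. The function name find_events and its require_both_strips flag are hypothetical, the sample tuples repeat the data above, and the sketch assumes every departure is strictly later than its arrival.

```python
# (index, strip, arrival, departure), copied from the post's sample data
planes = [
    (0, 1, 85.00, 86.00),
    (1, 1, 87.87, 92.76),
    (2, 2, 88.34, 89.72),
    (3, 1, 88.92, 90.88),
    (4, 2, 90.03, 92.77),
    (5, 2, 90.27, 91.95),
    (6, 2, 92.42, 93.58),
    (7, 2, 94.42, 95.58),
]


def find_events(planes, require_both_strips=False):
    """Return maximal sets of planes that are on the ground at the same time.

    Hypothetical helper. With require_both_strips=True, sets whose planes
    all share one strip are dropped (the second case in the problem statement).
    """
    timeline = []
    strip_of = {}
    for idx, strip, arrival, departure in planes:
        strip_of[idx] = strip
        # kind 0 = departure, 1 = arrival, so departures sort first on equal
        # times; this mirrors the strict inequalities in the original filter
        timeline.append((departure, 0, idx))
        timeline.append((arrival, 1, idx))
    timeline.sort()

    active = set()   # planes currently on the ground
    grew = False     # True if a plane arrived since the last departure
    events = []
    for _, kind, idx in timeline:
        if kind == 1:
            active.add(idx)
            grew = True
        else:
            # The active set is maximal only if it grew since the last departure
            if grew and len(active) > 1:
                strips = {strip_of[i] for i in active}
                if not require_both_strips or len(strips) > 1:
                    events.append(sorted(active))
            active.discard(idx)
            grew = False
    return events


print(find_events([p for p in planes if p[1] == 1]))
print(find_events([p for p in planes if p[1] == 2]))
print(find_events(planes, require_both_strips=True))
```

On the sample data this sketch yields the same events as the output listed above, although in a different order. Its cost is dominated by the sort, so it may scale better than the nested filtering, but that has not been measured here.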

What does “$E$ is not bounded above” mean? I am confused. “Principles of Mathematical Analysis” by Walter Rudin Theorem 3.17.

I am reading Walter Rudin’s “Principles of Mathematical Analysis”.

There are the following definition and theorem and its proof in this book.

Definition 3.16:

Let $\{s_n\}$ be a sequence of real numbers. Let $E$ be the set of numbers $x$ (in the extended real number system) such that $s_{n_k} \rightarrow x$ for some subsequence $\{s_{n_k}\}$. This set $E$ contains all subsequential limits, plus possibly the numbers $+\infty$, $-\infty$.

Put
$$ s^* = \sup E, $$
$$ s_* = \inf E. $$

Theorem 3.17:

Let $\{s_n\}$ be a sequence of real numbers. Let $E$ and $s^*$ have the same meaning as in Definition 3.16. Then $s^*$ has the following two properties:

(a) $s^* \in E$.

(b) If $x > s^*$, there is an integer $N$ such that $n \geq N$ implies $s_n < x$.

Moreover, $s^*$ is the only number with the properties (a) and (b).

Of course, an analogous result is true for $s_*$.

Proof:

(a)
If $s^* = +\infty$, then $E$ is not bounded above; hence $\{s_n\}$ is not bounded above, and there is a subsequence $\{s_{n_k}\}$ such that $s_{n_k} \to +\infty$.

If $s^*$ is real, then $E$ is bounded above, and at least one subsequential limit exists, so that (a) follows from Theorems 3.7 and 2.28.

If $s^* = -\infty$, then $E$ contains only one element, namely $-\infty$, and there is no subsequential limit. Hence, for any real $M$, $s_n > M$ for at most a finite number of values of $n$, so that $s_n \to -\infty$.

This establishes (a) in all cases.

I cannot understand the following argument:

(a)
If $s^* = +\infty$, then $E$ is not bounded above; hence $\{s_n\}$ is not bounded above, and there is a subsequence $\{s_{n_k}\}$ such that $s_{n_k} \to +\infty$.

What does “$E$ is not bounded above” mean?
On p. 12, Rudin wrote “It is then clear that $+\infty$ is an upper bound of every subset of the extended real number system”.
And $E$ is a subset of the extended real number system, so it seems that $E$ always has the upper bound $+\infty$.
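One possible reading, offered as a sketch rather than as Rudin's own wording: in this step “bounded above” appears to mean bounded above by a real number. Under that assumption, the argument can be spelled out as follows, where $M$ denotes an arbitrary real number:

```latex
\begin{align*}
s^* = \sup E = +\infty
  &\implies \text{no real } M \text{ satisfies } x \le M \text{ for all } x \in E \\
  &\implies \text{for every real } M \text{ there is an } x \in E \text{ with } x > M \\
  &\implies \text{for every real } M,\ s_n > M \text{ for infinitely many } n \\
  &\implies \{s_n\} \text{ is not bounded above.}
\end{align*}
```

The third step uses that each $x \in E$ is the limit of some subsequence $\{s_{n_k}\}$, so $s_{n_k} > M$ for all large $k$. On this reading, $+\infty$ is still an upper bound of $E$ in the extended real number system, as stated on p. 12; the claim is only that no finite upper bound exists.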

Professional Keyword Research and Analysis for $3

Welcome to my SEO services. I will do all the work 100% manually. You can hire me for website keyword research as well as Amazon, YouTube, eBay and Walmart keyword research. I also provide a bonus competitor analysis. The report I provide includes:

1. 300+ keywords
2. Trends (rising/stable/dropping)
3. Avg. monthly search volume (global)
4. Avg. monthly search volume (local)
5. Pay per click (PPC)
6. Cost per click (CPC)
7. allintitle
8. Exact keyword in title
9. allinurl
10. Root domain/homepage
11. Ranking difficulty (out of 100)
12. Web 2.0/wiki/Q&A/news portal/social domain/subdomain/.gov/.edu

If any of these steps are unclear, please watch this video: https://www.youtube.com/watch?v=wcEVkAqpho4

You get 100% VIP 24/7 support and unlimited review. Finally, if you hire me, I will do my best for you.

by: somon122
Category: Onsite SEO & Research


What are some advanced techniques of UX Competitor Analysis?

What are some advanced techniques of UX competitor analysis, and how many competitors must be analyzed to get relevant results and enough data to create a solid product? Also, how complex should the analysis be? My main concern is ending up with a never-ending list of data that is hard to interpret.

According to Nielsen Norman Group’s “User Experience Careers” survey report, 61% of UX professionals prefer to do competitive analysis for their projects. The benefits of carrying out this type of analysis are obvious, but there are few resources on how complex the research should be and how to find the information that really matters.

Besides the unique features, how do you identify user loyalty and engagement in competitors’ apps and find out whether their approach really works?

How can you take advantage of this method when your product needs to be first on the market, not only in a small niche, and when the business operates in a relatively new domain?

How does wireless keyboards’ encryption prevent frequency analysis?

I’ve found very little information on this topic after much googling. The only partial answer I found was by Microsoft: Microsoft_AES_Technical_Factsheet

By adding random data to each message, each message is unique even if the same letters are typed over and over. This prevents frequency analysis from finding identical messages to track.

which makes sense but leaves me wanting to know:

A) How random data can be added to the messages without causing noise in the signal, and

B) How other manufacturers prevent frequency analysis. I could find nothing regarding Logitech, for example.

Is it simply an industry standard to ‘just add noise’, so that it can safely be assumed to be implemented?
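The quoted statement can be illustrated with a toy sketch. It assumes the random data is added to the digital message before encryption rather than to the radio signal, and it uses an HMAC-SHA256 pad in place of AES purely so that the example runs with the Python standard library. It does not describe how any specific keyboard works, and the function names and the 8-byte nonce length are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 8  # hypothetical length of the per-message random data


def encrypt_keystroke(key, keystroke):
    """Toy encryption of a short keystroke message (at most 32 bytes)."""
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh random data for every message
    pad = hmac.new(key, nonce, hashlib.sha256).digest()[:len(keystroke)]
    body = bytes(a ^ b for a, b in zip(keystroke, pad))
    return nonce + body  # the receiver needs the nonce to rebuild the pad


def decrypt_keystroke(key, message):
    """Invert encrypt_keystroke using the nonce carried in the message."""
    nonce, body = message[:NONCE_LEN], message[NONCE_LEN:]
    pad = hmac.new(key, nonce, hashlib.sha256).digest()[:len(body)]
    return bytes(a ^ b for a, b in zip(body, pad))


key = b"k" * 16
first = encrypt_keystroke(key, b"a")
second = encrypt_keystroke(key, b"a")
print(first != second)                # the same key press gives different messages
print(decrypt_keystroke(key, first))  # the receiver still recovers b"a"
```

In this sketch the random bytes are ordinary payload data, so they do not disturb the signal; an observer sees different ciphertexts for repeated key presses and cannot count identical messages.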

On-page SEO, keyword research, competitor analysis for $10

If you’re looking for a highly qualified professional SEO expert, then you have come to the right person. I’ve been working as an SEO professional for 7 years now, which places me in the best position to handle your project. I look forward to working with you not just on this project but on future projects too, so you can rely on whatever finished work you get from me.

I provide keyword research services in the following areas: keyword research, competitor analysis, and so on.

What you will get from me after the work is finished: a keyword list, your actual competitor list, monthly search volume, and so on.

If you have any requirements outside the services listed above or further questions, feel free to contact me for support. I’ll get back to you as soon as possible. I look forward to working with you.

by: Arheem
Category: Onsite SEO & Research


Rudin’s functional analysis Theorem 3.18, second part.

Just a follow up to the following two questions:

Rudin's functional analysis 3.18, every originally bounded subset of a locally convex space is weakly bounded.

Theorem 3.18, Rudin's functional analysis

The second question contains almost the whole proof; I’m trying to understand the second part.

Since $E$ is weakly bounded, there corresponds to each $\Lambda \in X^*$ a number $\gamma(\Lambda) < \infty$ such that
$$ |\Lambda x| \leq \gamma(\Lambda) \qquad (x \in E). $$

It’s not entirely clear to me how we get such a bound. If $E$ is weakly bounded, then for any arbitrary union of finite intersections of preimages under linear functionals (call such a set $\Omega$) there is a $t > 0$ such that
$$ E \subset t \Omega. $$
I suppose the bound $|\Lambda x| \leq \gamma(\Lambda)$ comes from choosing
$$ \gamma(\Lambda) = \sup_{x \in E} |\Lambda x|, $$
but why does such a supremum exist? How do I use the weak boundedness condition to prove that it exists?

The rest of the proof seems clear to me.
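A hedged sketch of how the bound may follow, assuming the usual definition that $E$ is weakly bounded when every weak neighborhood $V$ of $0$ satisfies $E \subset tV$ for some $t > 0$. The set $V_\Lambda$ and the number $t_\Lambda$ are names introduced here for illustration only:

```latex
\begin{align*}
V_\Lambda &= \{ x \in X : |\Lambda x| < 1 \}
  \quad \text{(a weak neighborhood of } 0 \text{, since } \Lambda \in X^*\text{)}, \\
E \subset t_\Lambda V_\Lambda
  &\implies |\Lambda x| < t_\Lambda \quad (x \in E), \\
\gamma(\Lambda) &= \sup_{x \in E} |\Lambda x| \le t_\Lambda < \infty .
\end{align*}
```

The middle line uses linearity: if $x = t_\Lambda v$ with $v \in V_\Lambda$, then $|\Lambda x| = t_\Lambda |\Lambda v| < t_\Lambda$. Under these assumptions the supremum exists because the set $\{ |\Lambda x| : x \in E \}$ is bounded above by $t_\Lambda$.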

Can you wipe a USB flash drive so securely that it’s impossible to recover deleted files through forensic analysis?

Internal SSDs can be wiped with TRIM, but USB sticks are external SSDs. They’re apparently difficult to wipe securely enough to make forensic analysis of the device impossible. This question has been asked before, about 4 or 5 years ago, and the information is probably outdated. I am looking for up-to-date advice on how to securely wipe files on a USB drive from bootable media.

Is it possible to have a setup that can securely wipe a USB drive with TRIM? Does anyone know of any programs that will securely shred files on a USB drive, making them impossible to recover via forensic recovery software?

Token scanner for a programming language (lexical analysis)

This DFA is a token scanner for a programming language. I would like to add the keywords of the programming language (if, else, end, etc.) to the DFA so that the lexical analyzer can recognize them.

The question is: do I have to convert the entire given DFA to an ε-NFA (which will result in more states), add the keywords (with ε-transitions too), and then convert back to a DFA, or do I just have to add the keywords so that the DFA becomes an NFA and then convert back to a DFA?
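A common alternative in hand-written scanners, sketched here as an assumption rather than as a statement about this particular DFA (the diagram is not available): leave the automaton unchanged, let it recognize keywords as ordinary identifiers, and look the lexeme up in a keyword table afterwards. The identifier pattern and the names KEYWORDS, IDENT, and classify are hypothetical; the keyword list is taken from the question.

```python
import re

# Keywords named in the question; a real language would list all of them
KEYWORDS = {"if", "else", "end"}

# Assumed identifier shape: a letter or underscore, then letters, digits, underscores
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify(lexeme):
    """Classify a lexeme that the scanner has already isolated."""
    if IDENT.fullmatch(lexeme) is None:
        return "OTHER"
    # Keywords match the identifier pattern, so a table lookup separates them
    return "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"


for word in ["if", "iffy", "end", "42"]:
    print(word, classify(word))
```

With this design the DFA gains no new states for keywords, at the cost of one set lookup per identifier-shaped lexeme.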

Big O analysis for problem where number of items searched is unknown

Consider this problem: you are searching an array of elements and are comparing the square of the current element to some number K. Essentially, you are looking to see if the square root of K is in the array. With this algorithm, chances are, you will not search the entire array because you will either find the square root or you will find that the square root is not in the array.

As such, you are searching only a fraction of the array, which, let's say, has M elements. Does this mean that the big O is still O(M), even though you are not searching all of the elements?
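A minimal sketch of the search described above, assuming (this is not stated in the question) that the array is sorted in ascending order and holds non-negative numbers, which is what would allow the scan to stop early. The function name and the comparison counter are hypothetical and exist only to make the number of inspected elements visible.

```python
def contains_square_root(values, k):
    """Scan a sorted list of non-negative numbers for an element whose square is k.

    Returns (found, elements_inspected).
    """
    inspected = 0
    for value in values:
        inspected += 1
        square = value * value
        if square == k:
            return True, inspected   # found the square root of k
        if square > k:
            return False, inspected  # later elements are larger, so stop early
    return False, inspected          # reached the end without stopping early


values = [1, 2, 3, 4, 5]
print(contains_square_root(values, 9))   # stops at the third element
print(contains_square_root(values, 10))  # stops once a square exceeds 10
print(contains_square_root(values, 36))  # inspects every element
```

In the last call every one of the M elements is inspected, so under these assumptions the worst-case cost still grows linearly with M even though many searches stop earlier.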