Value flow (and economics) in stacked reinforcement learning systems: agent as reinforcement environment for other agents?

There is an evolving notion of stacked reinforcement learning systems, e.g. https://www.ijcai.org/proceedings/2018/0103.pdf – where one RL system executes the actions of a second RL system, which in turn executes the actions of a third, and the reward and value flow back down the stack.

So, one can consider an RL system RLn with:

  • S – space of states;
  • A = {Ai, Ainner, Ao} – space of actions, which can be divided into three subspaces:
    • Ai – actions that RLn exposes to other RL systems (like RLn-1, RLn-2, etc.) and for which RLn provides rewards to those systems;
    • Ainner – actions that RLn executes internally, of its own accord;
    • Ao – actions that RLn imports from other RL systems (like RLn+1, RLn+2, etc.), executes on those systems, and from which RLn receives rewards that can in turn be used to provide the rewards for Ai actions.

So, one can consider a network (possibly hierarchical) of RL systems. My questions are:

  • is there some literature that considers such stacks/networks of reinforcement learning systems/environments?
  • is there economic research about value flow, accumulation of wealth, starvation, survival processes, and evolution in such stacks/networks of RL systems/environments?

Essentially, my question is about RL environments that can function in the role of agents for other environments, and about agents that can function in the role of environments for other agents. What do computer science/machine learning/artificial intelligence call such systems, what methods are used to research them, and how are the concepts of economics/governance/preference/utility used in the research of the evolution of such systems?
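To make the "agent as environment" idea concrete, here is a minimal sketch of an object that is simultaneously an environment for the layer above it (it exposes reset/step, in the style of a Gym environment) and an agent acting on the layer below it. All the class and method names, and the 50/50 split of value, are my own hypothetical illustration, not an established API:

class StackedRLSystem:
    def __init__(self, lower_env, policy):
        self.lower = lower_env   # RLn+1: the environment this system acts on (Ao)
        self.policy = policy     # maps an imported action to an action on self.lower
        self.wealth = 0.0        # value received from below, available to pay upward

    def reset(self):
        return self.lower.reset()

    def step(self, action_from_above):
        # Ai: an action chosen by the layer above (RLn-1) arrives here.
        own_action = self.policy(action_from_above)                       # Ainner
        obs, reward_from_below, done, info = self.lower.step(own_action)  # Ao
        self.wealth += reward_from_below
        reward_up = 0.5 * reward_from_below  # pay part of the value upward for Ai
        self.wealth -= reward_up
        return obs, reward_up, done, info

Whether such a system accumulates wealth or starves then depends on the spread between what it earns below and what it pays above, which is exactly the economic question posed here.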

Agent-based model in Python

My model is based on this game https://en.wikipedia.org/wiki/Ultimatum_game. I won’t go into the intuition behind it, but generally speaking it functions as follows:

  1. The game consists of an n x n lattice on which an agent is placed at each node.

  2. During each time step, each player on each node plays against a random neighbour by playing a particular strategy.

  3. Each of their strategies (a value between 1 and 9) has a propensity attached to it (which is initially randomly assigned and is just some number). The propensity in turn determines the probability of playing that strategy: the probability is the propensity of that strategy divided by the sum of the propensities of all strategies (for example, propensities of 2, 3, and 5 give probabilities of 0.2, 0.3, and 0.5).

  4. If a game results in a positive payoff, then the payoffs from that game get added to the propensities for those strategies.

  5. These propensities then determine the probabilities for their strategies in the next time step, and so on.

  6. The simulation ends after time step N is reached.

For games with large lattices and many time steps, my code runs really slowly. I ran cProfile to check where the bottleneck(s) are, and as I suspected the update_probabilities and play_rounds functions seem to be taking up a lot of time. I want to be able to run the game with a grid size of about 40×40 for 100,000+ time steps, but right now that is not happening.

So what would be a more efficient way to calculate and update the probabilities/propensities of each player in the grid?

(For those who might be wondering, get_error_term just returns the propensity adjustments for the strategies adjacent to a strategy that receives a positive payoff; for example, if strategy 8 works, then the propensities of 7 and 9 also get adjusted upwards, and this is calculated by said function. And the first for loop inside update_probabilities just makes sure that the sum of the propensities doesn’t grow without bound.)

import numpy as np
import random
from random import randint
from numpy.random import choice
from numpy.random import multinomial
import cProfile

mew = 0.001
error = 0.05

def create_grid(row, col):
    return [[0 for j in range(col)] for i in range(row)]

def create_random_propensities():
    propensities = {}
    pre_propensities = [random.uniform(0, 1) for i in range(9)]
    a = np.sum(pre_propensities)
    for i in range(1, 10):
        propensities[i] = (pre_propensities[i - 1]/a) * 10
    return propensities

class Proposer:
    def __init__(self):
        self.propensities = create_random_propensities()
        self.probabilities = []
        self.demand = 0

    def pick_strat(self, n_trials):
        results = multinomial(n_trials, self.probabilities)
        i, = np.where(results == max(results))
        if len(i) > 1:
            return choice(i) + 1
        else:
            return i[0] + 1

    def calculate_probability(self, dict_data, index, total_sum):
        return dict_data[index]/total_sum

    def calculate_sum(self, dict_data):
        return sum(dict_data.values())

    def initialize(self):
        init_sum = self.calculate_sum(self.propensities)
        for strategy in range(1, 10):
            self.probabilities.append(self.calculate_probability(self.propensities, strategy, init_sum))
        self.demand = self.pick_strat(1)

    def update_strategy(self):
        self.demand = self.pick_strat(1)

    def update_probabilities(self):
        # decay keeps the sum of propensities from growing without bound
        for i in range(9):
            self.propensities[1 + i] *= 1 - mew
        propensity_sum = self.calculate_sum(self.propensities)
        for i in range(9):
            self.probabilities[i] = self.calculate_probability(self.propensities, 1 + i, propensity_sum)

    def update(self):
        self.update_probabilities()
        self.update_strategy()

class Responder:
    def __init__(self):
        self.propensities = create_random_propensities()
        self.probabilities = []
        self.max_thresh = 0

    def pick_strat(self, n_trials):
        results = multinomial(n_trials, self.probabilities)
        i, = np.where(results == max(results))
        if len(i) > 1:
            return choice(i) + 1
        else:
            return i[0] + 1

    def calculate_probability(self, dict_data, index, total_sum):
        return dict_data[index]/total_sum

    def calculate_sum(self, dict_data):
        return sum(dict_data.values())

    def initialize(self):
        init_sum = self.calculate_sum(self.propensities)
        for strategy in range(1, 10):
            self.probabilities.append(self.calculate_probability(self.propensities, strategy, init_sum))
        self.max_thresh = self.pick_strat(1)

    def update_strategy(self):
        self.max_thresh = self.pick_strat(1)

    def update_probabilities(self):
        # decay keeps the sum of propensities from growing without bound
        for i in range(9):
            self.propensities[1 + i] *= 1 - mew
        propensity_sum = self.calculate_sum(self.propensities)
        for i in range(9):
            self.probabilities[i] = self.calculate_probability(self.propensities, 1 + i, propensity_sum)

    def update(self):
        self.update_probabilities()
        self.update_strategy()

class Agent:
    def __init__(self):
        self.prop_side = Proposer()
        self.resp_side = Responder()
        self.prop_side.initialize()
        self.resp_side.initialize()

    def update_all(self):
        self.prop_side.update()
        self.resp_side.update()

class Grid:
    def __init__(self, rowsize, colsize):
        self.rowsize = rowsize
        self.colsize = colsize

    def make_lattice(self):
        return [[Agent() for j in range(self.colsize)] for i in range(self.rowsize)]

    @staticmethod
    def get_moore_neighbours(grid, row, col):
        neighbours = []
        for x, y in (
                (row - 1, col), (row + 1, col), (row, col - 1),
                (row, col + 1), (row - 1, col - 1), (row - 1, col + 1),
                (row + 1, col - 1), (row + 1, col + 1)):
            if not (0 <= x < len(grid) and 0 <= y < len(grid[x])):
                # out of bounds
                continue
            else:
                neighbours.append(grid[x][y])
        return neighbours

    @staticmethod
    def von_neumann_neighbourhood(array, row, col, wrapped=True):
        neighbours = set()

        if row + 1 <= len(array) - 1:
            neighbours.add(array[row + 1][col])

        if row - 1 >= 0:
            neighbours.add(array[row - 1][col])

        if col + 1 <= len(array[0]) - 1:
            neighbours.add(array[row][col + 1])

        if col - 1 >= 0:
            neighbours.add(array[row][col - 1])

        # if wrapped is on, conditions for out of bound points
        if row - 1 < 0 and wrapped:
            neighbours.add(array[-1][col])

        if col - 1 < 0 and wrapped:
            neighbours.add(array[row][-1])

        if row + 1 > len(array) - 1 and wrapped:
            neighbours.add(array[0][col])

        if col + 1 > len(array[0]) - 1 and wrapped:
            neighbours.add(array[row][0])
        return neighbours

def get_error_term(pay, strategy):
    index_strat_2, index_strat_8 = 2, 8
    if strategy == 1:
        return (1 - (error/2)) * pay, error/2 * pay, index_strat_2
    if strategy == 9:
        return (1 - (error/2)) * pay, error/2 * pay, index_strat_8
    else:
        return (1 - error) * pay, error/2 * pay, 0

class Games:
    def __init__(self, n_rows, n_cols, n_rounds):
        self.rounds = n_rounds
        self.rows = n_rows
        self.cols = n_cols
        self.lattice = Grid(self.rows, self.cols).make_lattice()
        self.lookup_table = np.full((self.rows, self.cols), False, dtype=bool)

    def reset_look_tab(self):
        self.lookup_table = np.full((self.rows, self.cols), False, dtype=bool)

    def run_game(self):
        n = 0
        while n < self.rounds:
            for r in range(self.rows):
                for c in range(self.cols):
                    if n != 0:
                        self.lattice[r][c].update_all()
                        self.lookup_table[r][c] = True
                    self.play_rounds(self.lattice, r, c)
            self.reset_look_tab()
            n += 1

    def play_rounds(self, grid, row, col):
        neighbours = Grid.von_neumann_neighbourhood(grid, row, col)
        neighbour = random.choice(list(neighbours))  # random.sample on a set is deprecated in newer Python
        # linear search over the whole lattice to find the neighbour's coordinates
        neighbour_index = [(ix, iy) for ix, irow in enumerate(self.lattice) for iy, i in enumerate(irow) if i is neighbour]
        if not self.lookup_table[neighbour_index[0][0]][neighbour_index[0][1]]:
            neighbour.update_all()
        player = grid[row][col]
        coin_toss = randint(0, 1)

        if coin_toss == 1:
            if player.prop_side.demand <= neighbour.resp_side.max_thresh:
                payoff, adjacent_payoff, index = get_error_term(player.prop_side.demand, player.prop_side.demand)
                if player.prop_side.demand == 1 or player.prop_side.demand == 9:
                    player.prop_side.propensities[player.prop_side.demand] += payoff
                    player.prop_side.propensities[index] += adjacent_payoff
                else:
                    player.prop_side.propensities[player.prop_side.demand] += payoff
                    player.prop_side.propensities[player.prop_side.demand - 1] += adjacent_payoff
                    player.prop_side.propensities[player.prop_side.demand + 1] += adjacent_payoff
            else:
                return 0, 0

        if coin_toss != 1:
            if neighbour.prop_side.demand <= player.resp_side.max_thresh:
                payoff, adjacent_payoff, index = get_error_term(10 - neighbour.prop_side.demand, player.resp_side.max_thresh)
                if player.resp_side.max_thresh == 1 or player.resp_side.max_thresh == 9:
                    player.resp_side.propensities[player.resp_side.max_thresh] += payoff
                    player.resp_side.propensities[index] += adjacent_payoff
                else:
                    player.resp_side.propensities[player.resp_side.max_thresh] += payoff
                    player.resp_side.propensities[player.resp_side.max_thresh - 1] += adjacent_payoff
                    player.resp_side.propensities[player.resp_side.max_thresh + 1] += adjacent_payoff
            else:
                return 0, 0

#pr = cProfile.Profile()
#pr.enable()

my_game = Games(1, 2, 50000) # (rowsize, colsize, n_steps)
my_game.run_game()

#pr.disable()
#pr.print_stats(sort='time')
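For scale, here is a hedged sketch of the kind of restructuring that removes the per-player Python loops (a sketch of the idea, not a drop-in replacement for the classes above): keep all propensities in a single NumPy array of shape (rows, cols, 2, 9) — one 9-vector per role per agent — so the decay and renormalisation done in update_probabilities become two whole-array operations per time step:

import numpy as np

rows, cols = 40, 40
mew = 0.001
rng = np.random.default_rng()

# propensities[r, c, role, s]: role 0 = proposer, role 1 = responder,
# s = strategy index 0..8 (strategy value s + 1)
propensities = rng.uniform(0, 1, size=(rows, cols, 2, 9))
propensities /= propensities.sum(axis=-1, keepdims=True)
propensities *= 10

def update_probabilities(propensities):
    # Decay every propensity and renormalise, for all agents at once.
    propensities *= 1 - mew
    return propensities / propensities.sum(axis=-1, keepdims=True)

def pick_strats(probabilities):
    # Sample one strategy (1-9) per agent per role from its own distribution,
    # using the vectorised inverse-CDF (cumulative-sum) trick.
    cum = probabilities.cumsum(axis=-1)
    u = rng.uniform(size=probabilities.shape[:-1] + (1,))
    return (u > cum).sum(axis=-1) + 1  # shape (rows, cols, 2), values 1..9

probabilities = update_probabilities(propensities)
strats = pick_strats(probabilities)  # demands and thresholds for everyone at once

The per-game payoff updates would then become scatter-adds into the same array (e.g. np.add.at), so nothing has to loop over per-player dictionaries at all.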

How to make an immediate change to the Azure agent's temporary disk (format and mount point) without a reboot?

I tried editing these lines in /etc/waagent.conf:

ResourceDisk.MountPoint=/tempo   # instead of /mnt
ResourceDisk.Format=y            # instead of n

and I even removed this line from /etc/fstab:

/dev/disk/cloud/azure_resource-part1    /mnt    auto    defaults,nofail,x-systemd.requires=cloud-init.service,comment=cloudconfig       0       2 

and restarted the service via

sudo service walinuxagent restart 

but the lsblk output is still:

sdb      8:16   0    4G  0 disk
└─sdb1   8:17   0    4G  0 part /mnt

Only after a reboot (sudo reboot) does lsblk show that the mount has changed:

sdb      8:16   0    4G  0 disk
└─sdb1   8:17   0    4G  0 part /tempo

I am using the Ubuntu 16.04 LTS image. How can I make this mount change without a reboot?
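For what it's worth, since the resource disk is an ordinary partition (/dev/sdb1 in the lsblk output above), one approach that should move the mount immediately — assuming nothing is holding files open under /mnt — is to remount it by hand and then restart the agent so it picks up the new configuration:

sudo umount /mnt
sudo mkdir -p /tempo
sudo mount /dev/sdb1 /tempo
sudo service walinuxagent restart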

Where/How are SNMP v3 passwords stored on the agent?

Security-wise, SNMP v3 improves over v2 by using users for polling agents or for agents sending trap data to the management system. But what if the agent is compromised? Will the attacker be able to get the password of the user used for SNMP data transfer? As far as I know, the user/password credentials for SNMP v3 are stored on the agent, hence my question: where and how are SNMP v3 user and password credentials stored on the agent (on different operating systems, with the most common implementations of SNMP agents, i.e. net-snmp on Linux, ? on Windows, Cisco IOS?, …)?
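To illustrate what "stored on the agent" looks like in at least the net-snmp case (details for other agents may differ): a user is typically bootstrapped with a plaintext createUser directive in the agent's persistent data file, which net-snmp rewrites on its next start:

# Example bootstrap entry in net-snmp's persistent store
# (/var/lib/snmp/snmpd.conf on Debian/Ubuntu) -- plaintext, read once:
createUser monitor SHA "authpassphrase" AES "privpassphrase"
# On the next (re)start the agent deletes this line and writes a
# "usmUser" entry in its place, containing keys localized to the
# agent's engine ID rather than the plaintext passphrases.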

How to change the User Agent with Selenium in Python in the Firefox browser

Please tell me why this code does not take the elements (User Agent strings) from the txt file in order: it takes the first element and then never changes the User Agent again inside the loop, while I need it to take the next element from the file each time, and so on.

import time
from selenium import webdriver

UA = open('useragents.txt').read().split('\n')
for i in range(3):
    useragent = UA.pop(0)
    profile = webdriver.FirefoxProfile()
    profile.set_preference("general.useragent.override", useragent)
    driver = webdriver.Firefox(firefox_profile=profile,
                               executable_path="D:/WebDrivers/geckodriver-v0.24.0-win64/geckodriver.exe")
    time.sleep(3)
    driver.get("http://atomurl.net/myip/")
    time.sleep(5)
    driver.close()
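In case it helps isolate the problem, here is a variant of the same loop (my sketch, same logic otherwise) that skips blank lines from the file, iterates over every entry instead of a fixed range(3), and uses driver.quit() so each Firefox session is torn down completely before the next profile is created:

import time
from selenium import webdriver

GECKO = "D:/WebDrivers/geckodriver-v0.24.0-win64/geckodriver.exe"

with open('useragents.txt') as f:
    useragents = [line.strip() for line in f if line.strip()]

for useragent in useragents:
    profile = webdriver.FirefoxProfile()
    profile.set_preference("general.useragent.override", useragent)
    driver = webdriver.Firefox(firefox_profile=profile, executable_path=GECKO)
    try:
        driver.get("http://atomurl.net/myip/")
        time.sleep(5)
    finally:
        driver.quit()  # quit() tears the whole session down, unlike close()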

Flight compensation with agent

I recently traveled to London from Warsaw with LOT Polish Airlines, with a connecting flight in Zurich (both flights were on the same ticket, sharing the same booking reference number). The first flight was delayed by 40 minutes, because of which I was denied boarding on my connecting flight. The airline offered me another flight, which left 5 hours later, and gave me one food coupon. I reached my destination more than 6 hours later than scheduled. I am trying to get compensation from the airline.

I have contacted the Swiss Federal Office of Civil Aviation (FOCA, which is the national EC261 enforcement body), but they replied with a complicated procedure and a lot of legal jargon. Hence, I am thinking of applying for this compensation through an agent. A few of my colleagues suggested names like AirHelp, flightright, refund.me, etc. However, I am a little unsure about their authenticity, procedures, and reviews. If any of you have used such services, can you please guide me regarding this? Or suggest some alternative way that avoids a lengthy legal process.

Cheap multi-agent path planning algorithms

I am trying to make a tower defence game where some critters are trying to make their way through a 2D grid-like obstacle course. The idea is that these critters cannot walk into the obstacles or into each other along the way, and they would like to get to the other side as soon as possible.

I just need a cheap, non-optimal solution that can path all the agents from one side of the map to the other, moving in the four grid directions (up, down, left, right) while not walking into each other or into the obstacles.

I am not looking for an optimal solution; one that looks reasonably, greedily close to optimal would be nice (i.e. not one that refuses to move the 2nd agent until the 1st agent has completed its entire path, etc.).

I’m thinking that something with potential fields plus a little bit of A* is as much as I am willing to compute. If there’s anything simple to implement, I’d be happy to know!
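For reference, the cheapest version of the potential-field idea I can think of is a flow field (this is my own hedged sketch, under the assumption that all critters share one set of goal cells): run a single BFS from the goal side over the walkable cells to get a distance field, then each tick move every critter to an adjacent cell with a smaller distance, letting it wait when that cell is occupied:

from collections import deque

def bfs_distance_field(walkable, goals):
    # walkable: 2D list of bools; goals: iterable of (row, col) exit cells.
    # Returns dist[r][c] = grid distance to the nearest goal (None = unreachable).
    rows, cols = len(walkable), len(walkable[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r, c in goals:
        dist[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and walkable[nr][nc] and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def step_agents(agents, dist, occupied):
    # agents: list of (row, col) on reachable cells; occupied: set of those cells.
    # Each agent steps to its lowest-distance free neighbour, or waits this tick.
    for i, (r, c) in enumerate(agents):
        best = None
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < len(dist) and 0 <= nc < len(dist[0])
                    and dist[nr][nc] is not None
                    and (nr, nc) not in occupied
                    and (best is None or dist[nr][nc] < dist[best[0]][best[1]])):
                best = (nr, nc)
        if best is not None and dist[best[0]][best[1]] < dist[r][c]:
            occupied.discard((r, c))
            occupied.add(best)
            agents[i] = best

The field is computed once and only recomputed when the obstacles change, so each tick costs O(number of agents), and blocked critters naturally queue behind each other instead of colliding.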