How to do supervised learning without knowing the target

I’m working on an optimization algorithm that I think could be considered machine learning, but I’m not sure. Basically, I have a model that I want to optimize by adjusting its parameters. I don’t have any target data to compare my model’s output against, so I can’t compute a loss function. However, I can generate information about whether increasing or decreasing a parameter would improve my model.

This information consists of positive and negative numbers, which I’ll call votes, and they are very noisy. I know I should increase a parameter if its average vote is positive and decrease it otherwise. I can generate as many votes as I like if I invest the computational resources, and I need to generate enough of them to overcome the noise. The only way I know the parameters are optimal is when the average vote is zero.
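A minimal sketch of the voting loop described above, for a single parameter. Everything named here is hypothetical: `sample_vote` stands in for the real (expensive) vote generator and is faked as `target - theta` plus Gaussian noise, so its mean is zero exactly at the optimum. The decaying step size `step0 / n` is an assumption of this sketch, not something from the post; it is the classic Robbins–Monro schedule from stochastic approximation, which lets the noise average out over many updates.

```python
import random


def sample_vote(theta, rng, target=3.0, noise=5.0):
    # Hypothetical noisy vote: positive means "increase theta",
    # negative means "decrease theta". Its expected value (target - theta)
    # is zero exactly at the optimum, mimicking the situation in the post.
    return (target - theta) + rng.gauss(0.0, noise)


def optimize(theta0, n_rounds=2000, batch_size=10, step0=1.0, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_rounds + 1):
        # Average a batch of noisy votes to estimate the direction
        # (and rough size) of the improvement.
        avg_vote = sum(sample_vote(theta, rng) for _ in range(batch_size)) / batch_size
        # Move the parameter along the average vote with a step that
        # shrinks as 1/n, so later noisy batches perturb theta less.
        theta += (step0 / n) * avg_vote
    return theta


print(optimize(0.0))  # should land close to the hypothetical target of 3.0
```

The total cost is `n_rounds * batch_size` votes, so `batch_size` and `step0` are exactly the knobs that trade votes per update against how far each update moves the parameter.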

I’m having a hard time tuning my algorithm. In particular, it’s hard to decide how many votes I should collect before shifting the parameters, and then how far I should shift them. I believe the best choices depend heavily on the circumstances. I’m wondering whether my algorithm falls under a known class of optimization algorithms with a literature I could study, or whether anyone knows of any literature that may be useful for what I’m working on.