Backstory (You can skip)
I work with Qt almost exclusively, following the design philosophy of "develop everything as an API".
The practice relevant to this question is tracking class members and emitting a signal whenever a member's value changes.
For example, here is how I generally treat class members under this philosophy. Each member has at least four components:
public:
    QUrl url() const;               // returns a copy of m_Url
public slots:
    void setUrl(const QUrl &url);   // the designated setter for m_Url
signals:
    void urlChanged();              // emitted whenever setUrl() is used
private:
    QUrl m_Url;                     // the object itself, with its state protected
This has worked very well for me so far: as soon as I need to add functionality that depends on m_Url, I need only do this:
connect(this, &MyClass::urlChanged, &foo, &MyOtherClass::doBar);
And this is dependable because fundamentally,
Now, I must say that HTML and web development were not my original forays into programming, and as such I did not understand why these signals would exist:
But there was no signal for htmlChanged. Instead, all you have is:
This is a function that grabs the current state, so I would have to call it manually on a timer whenever I wanted to check for a state change.
So that is what I did, and I then discovered that even a static web page's HTML, or at least that of the page I was working with, was in constant flux. The changes were not meaningful at all, which I take to be why the Qt developers included no such signal.
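The polling approach described here can be sketched without Qt as follows. This is only an illustration: HtmlWatcher and poll() are hypothetical names that do not belong to any Qt API, and the snapshot string is assumed to come from whatever call retrieves the page's current HTML.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical, Qt-free stand-in for an "htmlChanged" notification.
class HtmlWatcher
{
public:
    using Callback = std::function<void(const std::string &)>;

    explicit HtmlWatcher(Callback onChanged)
        : m_onChanged(std::move(onChanged)) {}

    // Call this from the polling timer with the latest snapshot.
    // Returns true (and invokes the callback) only when the snapshot
    // differs from the previous one. The very first snapshot counts
    // as a change.
    bool poll(const std::string &snapshot)
    {
        if (m_hasSnapshot && snapshot == m_last)
            return false;
        m_last = snapshot;
        m_hasSnapshot = true;
        if (m_onChanged)
            m_onChanged(m_last);
        return true;
    }

private:
    Callback m_onChanged;
    std::string m_last;
    bool m_hasSnapshot = false;
};
```

Because it compares whole strings, this reports every byte-level difference, which is exactly why it fires constantly on a page whose markup is being animated.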
The heart of the issue
So I know what you are saying: “meaningful is a subjective and situational term”, so let me define what I mean by it.
This is a snippet of HTML, say out of 10,000 lines, showing the change that I was picking up on a web page:
<div class="chat-bubble right" style="top: 0px; opacity: 1;">
changes into this
<div class="chat-bubble right" style="top: 0px; opacity: 0.997396;">
and goes back again in a constant loop. The fact that it returns to the original state in a constant loop is what makes this change not meaningful, and it is not something that I know how to keep track of intelligently. My original strategy of detecting every change in the HTML string by parsing it on an interval seems to me to be a very bad approach.
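Taken literally, this definition of "not meaningful" (a state that the page keeps returning to) can be sketched as a filter that reports a snapshot only the first time it is seen. NoveltyFilter and isNovel() are hypothetical names, and this is a Qt-free illustration of the definition rather than a recommended solution.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical filter: a snapshot is treated as meaningful only if this
// exact state has never been observed before.
class NoveltyFilter
{
public:
    // Returns true the first time a snapshot is seen, false on any repeat.
    bool isNovel(const std::string &snapshot)
    {
        // insert() reports through .second whether the element was new.
        return m_seen.insert(snapshot).second;
    }

private:
    // Stores full snapshots for exactness. Storing hashes instead would
    // save memory at the cost of possible collisions.
    std::unordered_set<std::string> m_seen;
};
```

The sketch has obvious limits: memory grows with every distinct state, an animation that passes through many intermediate values still reports each of them once, and a page that legitimately returns to an earlier state is ignored.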
The fundamental questions
How do I intelligently discover and keep track of meaningful changes in HTML? For example, let's say that only one value changes in a 100,000-line HTML document. Do I reparse the entire document, compare, and find the change, or is there a way to avoid all of that and fish out the difference?
What API design philosophies, or existing APIs, should I employ when interfacing with HTML that will give me the correct signals for when to do code injection or scrape data?
The answer ideally needs to take into account that the backend is Qt/C++ oriented, so that I can interface easily with the other libraries I depend on. I can do things like inject jQuery into the web page, or track printfs and the like. If it matters, the engine I am currently using is based on Chromium: https://wiki.qt.io/QtWebEngine
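To make the first question concrete: one way to "fish out the difference" between two snapshots without reparsing either of them might be to trim their common prefix and suffix, which leaves the smallest region that covers every change. ChangedSpan and findChangedSpan() are hypothetical names, and the sketch assumes both snapshots are already available as plain strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical result type: the region in which two snapshots differ.
struct ChangedSpan {
    std::size_t offset;     // index at which the snapshots start to differ
    std::size_t oldLength;  // length of the differing region in the old snapshot
    std::size_t newLength;  // length of the differing region in the new snapshot
};

ChangedSpan findChangedSpan(const std::string &oldHtml, const std::string &newHtml)
{
    const std::size_t shorter = std::min(oldHtml.size(), newHtml.size());

    // Length of the common prefix.
    std::size_t prefix = 0;
    while (prefix < shorter && oldHtml[prefix] == newHtml[prefix])
        ++prefix;

    // Length of the common suffix, never overlapping the prefix.
    std::size_t suffix = 0;
    while (suffix < shorter - prefix
           && oldHtml[oldHtml.size() - 1 - suffix] == newHtml[newHtml.size() - 1 - suffix])
        ++suffix;

    return { prefix,
             oldHtml.size() - prefix - suffix,
             newHtml.size() - prefix - suffix };
}
```

For the two chat-bubble lines in the example, the span covers only the opacity value. This is still a linear scan over the whole string, and when several distant values change at once the span stretches across everything in between, so it narrows down where to look rather than replacing a real diff.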