I live in Sweden, but this applies to all other countries as well.
I have a general interest in, and fascination with, statistics and working with data in databases. By far the biggest obstacle has nothing to do with technically dealing with the database software, writing SQL queries, or designing databases. Rather, the #1 problem is:
Nobody wants to provide useful data!
I have spent a significant part of the last 20 years searching for databases/data files of all kinds. Time and time again, I end up at a “contact us for pricing” webpage, or a “Buy now for only $4,799!” text. Oddly, this does not just apply to commercial entities, but also to authorities.
Even though the Swedish government has been talking about “open data” and “free information for all” for a very long time, the actual reality is that virtually none of that juicy data is available for you and me to grab and use. Instead, they have multiple layers of “red tape”, requiring you to pay through the nose for any kind of access, and in many cases, you aren’t even allowed to pay for it unless you run a major corporation with special ties to the government. It’s really bizarre.
The data they do allow you to look at is meaningless/shallow statistics, rarely if ever provided in a format which can be reasonably parsed by a computer and fed into my database for further analysis. The so-called “open data for everyone” often consists of nothing more than a bunch of formatted PDFs, useless for my purposes.
I’m not interested in static columns showing how many new people were born in 2020. I want a list of those people, with their names, genders, race, blood type, etc.
I realize that not all data can be open without heavy abuse inevitably resulting from it. However, at least the Swedish government has this idea of “public records”, where you are theoretically allowed to request all kinds of data. The problem is that they only allow you to do this in person, over the phone or via e-mail, and you have to do it manually and only request at most three (3) “units” each time. In practice, this makes it useless unless.
If this information is allegedly “public”, why are they so unwilling to actually make it available? I could send an e-mail to a Swedish government entity right now, requesting all kinds of information (including their full social security number) for a given person, and they would respond within 24 hours with it, no questions asked. I’ve done it many times. However, if I ask them for a Swedish_people.csv file with every person registered in Sweden and the same information I can request manually for up to three people at a time, they will refuse.
Major corporations are able to pay a lot of money to get access to their government APIs, but it costs a fortune and they wouldn’t let me buy access to it even if I had the money (because I don’t run a major company).
It doesn’t make any sense to me. I wonder why they have these double standards, and how they can possibly charge money for “public” records.
A dream of mine would be to be able to do:
SELECT name, email_address, physical_address, passport_photo FROM people WHERE current_city = $1 AND gender = $2 AND age >= $3 AND age <= $4 AND civil_status = $5 ORDER BY distance_from_me DESC;
Of course, this is completely unrealistic, but you get the idea. I wish to have actual, curated records from (semi) trusted sources rather than having to play with the few, measly databases which are freely available to the public at no charge.
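To make the idea concrete, here is a rough sketch in Python of what working with such a file could look like if it existed. Everything in it is an assumption for illustration: the column names, the invented sample rows standing in for a downloadable CSV, and the helper names load_people and find_people. It uses SQLite with ? placeholders in place of $1 to $5, and it sorts by age because the made-up data has no distance column.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a downloadable people.csv file.
# Every name and value here is invented for illustration.
SAMPLE_CSV = """name,current_city,gender,age,civil_status
Anna Example,Stockholm,F,34,single
Bo Example,Stockholm,M,41,married
Cia Example,Uppsala,F,29,single
Dan Example,Stockholm,M,38,single
"""


def load_people(conn, csv_text):
    """Create a people table and fill it from CSV text."""
    conn.execute(
        "CREATE TABLE people ("
        "name TEXT, current_city TEXT, gender TEXT, "
        "age INTEGER, civil_status TEXT)"
    )
    # DictReader yields one dict per row, keyed by the header line,
    # which matches the named placeholders in the INSERT below.
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO people VALUES "
        "(:name, :current_city, :gender, :age, :civil_status)",
        rows,
    )


def find_people(conn, city, gender, min_age, max_age, civil_status):
    """Run a parameterized query; values are bound, never spliced into the SQL."""
    cur = conn.execute(
        "SELECT name FROM people "
        "WHERE current_city = ? AND gender = ? "
        "AND age >= ? AND age <= ? AND civil_status = ? "
        "ORDER BY age DESC",
        (city, gender, min_age, max_age, civil_status),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
load_people(conn, SAMPLE_CSV)
result = find_people(conn, "Stockholm", "M", 30, 50, "single")
print(result)
```

With a real, regularly updated file, the only part that would change is where the CSV text comes from; the loading and querying would stay the same.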
A perfect example of something very basic would be the telephone book. Back in the day, they sent out a complete book of every single person’s name, telephone number and address to every household in the entire country. This was standard practice all over the world, I believe. A digital version of that would be a .csv file which I could just download from a government website at a static URL, always kept updated. Nope. Nothing like that. I’m forced to use these third-party, commercial websites where I get to enter individual people’s names and send this information to the company in question. They are paying the government a lot of money to get this information, even though it could be made available for virtually no cost at all.
Why, since they used to provide this information in physical form, is it now unthinkable in the digital age?