- Let us go back to the data security topic. One of your latest articles on Big Data begins with the lines: "Every exchange with social media, every digital process, and every connected device generates large amounts of data that will be used by different companies". Could you tell me about the current situation with data use in the country?
- I'd like to tell you about one interesting case: how a small company, let's call it "X", became one of the sellers of "raw" personal data within two years. It started with social phishing, or identification services. The main task of such a service is to get data about your social network profiles. All you have to do is visit a site where a special code is installed, and the site will receive the information in your social network profiles: name, date of birth, phone number, and e-mail.
How does it work?
There are several possible scenarios. Let us look at the main ones:
"Invisible" buttons, or clickjacking: you visit the site of some company and browse its pages. Suddenly a manager of that company calls or texts you with an offer to buy something, although you never left your phone number.
Your personal data was stolen on that site, and you cannot protect yourself from that.
When you click the mouse, no matter whether on a link or on blank space, a script slips an invisible element from the social network under your cursor (for example, a button for joining a group or authorizing a malicious application). You click on the element and thereby give permission to view your personal data. Such services identify between 15% and 60% of visitors.
The question arises: "Where do they get your phone number and e-mail from?" I assure you: no one is hacking you. 80% of the services that work on the clickjacking principle use the social networks VKontakte, Mail.ru or Odnoklassniki.
Databases of phone numbers and e-mail addresses matched to a VKontakte ID (your unique number or nickname) are sold on the Internet. The program simply looks up the ID in the database and uploads the phone number and e-mail to the personal account.
However, this mechanism does not work in all social networks. For example, it does not work with Facebook, so another strategy is used: special software checks a previously generated database containing every combination of numbers.
An example of how this works on Facebook: take 16 number codes (702, 701, 747 and so on) and for each of them go through the numbers from 0 to 10 million, which gives 160,000,000 numbers. Since the density of users per phone number is high enough, the list can be divided into groups of 50 numbers. As a result, we get 160,000,000 / 50 = 3,200,000 blocks. A request to the Facebook server takes up to 200 ms on average (depending on the channel), so one client can make 5 requests per second in a single thread: 3,200,000 / 5 / 60 sec / 60 min ≈ 177.8 hours, which is slightly longer than a week. The same goes for e-mail addresses: the specified range of e-mails is "parsed" (automatically processed in order to obtain the necessary data - Ed.) with the help of requests.
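Editor's note: the time estimate above is plain arithmetic, and a short Python sketch makes it easy to check. It only reproduces the calculation and makes no network requests. The function name and parameter names are illustrative assumptions; the figures are the ones quoted in the interview.

```python
def estimate_scan_hours(codes, numbers_per_code, block_size, request_ms):
    """Return (blocks, hours) for the estimate quoted in the interview.

    All inputs are the speaker's figures; nothing here is measured.
    """
    total_numbers = codes * numbers_per_code       # 16 * 10,000,000
    blocks = total_numbers // block_size           # groups of 50 numbers
    total_seconds = blocks * request_ms / 1000     # one request per block
    return blocks, total_seconds / 3600


blocks, hours = estimate_scan_hours(
    codes=16, numbers_per_code=10_000_000, block_size=50, request_ms=200
)
print(f"{blocks:,} blocks, {hours:.1f} hours, {hours / 24:.1f} days")
```

With the interview's figures this gives 3,200,000 blocks and roughly 177.8 hours, or about 7.4 days, for a single thread.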
In Facebook's privacy settings (Facebook > Settings > Privacy) there is a "Who can find me?" option, where you can choose Friends or Friends of Friends. I do not want to upset you, but these settings will not protect you. There are large networks of bots that are professionally engaged in data collection today.
Odnoklassniki: the same thing can be done as on Facebook, by substituting a certificate.
YouTube: your ID (your nickname) can also be pulled from YouTube via clickjacking. The nickname itself does not carry any information. But in a combined database, data from social networks and other sources is linked together, which helps to identify the real person behind the nickname on YouTube.
What do the search systems say?
Search engines detect sites where such methods of obtaining personal data have been noticed and lower them in the search results, or even ban websites with malicious software. Yandex has published a section that describes how to deal with such sites. Google has also begun to identify them: there have been cases where such portals did not pass the check in the AdWords advertising network. But many services bypass the search engine robots without much difficulty.
- And what happened to that little “X” Company?
- They moved on and began to collect information about users from all over the Internet. For example, you take a phone number or e-mail and run it through large websites such as Krysha, OLX and Kolesa, load all the numbers into a database and get all the personal data as output. Later, these guys learned to collect geotags; in total, data on 140 parameters were collected from open sources. In fact, no one hacked anything: there had been large data leaks before, and they just collected everything in one place, on GetContact. This is one of the largest "black" data providers in the territory of the former CIS. You pay 150 tenge and get a full portrait of a person.
There are databases of personal cards with data on you. Where from? There are people and companies that sell this information both individually and in batches, by millions of lines. The records include the social networks where you are registered, name, birthday, occupation, education, relatives, interests, city of residence, the messengers you use, time spent on social networks, and much more. Usually the databases are bought by companies to refine the portrait of their target audience, or simply to enrich their client base by phone number.
In general, databases are traded constantly in Kazakhstan. For example, the client base of one large Russian training company, which includes clients from Kazakhstan, is freely accessible, and you can easily download it. Of course, it does not contain the IIN, but finding that out is not a problem. If you enter a person's name on certain websites, this data appears, and with the IIN you can find out information in law firms and a lot of other information. Many companies use this data to build a client base, and for them it is not a question of ethics. Different companies may well have called you and offered their services. But, in fact, people themselves constantly "generate" different data and violate copyrights, for instance, by posting Marvel movies on their pages.
As for data monitoring, we take Kazakhstan and segment all users: for example, 200 thousand people wrote about politics on such and such topics. Monitoring social networks by keywords, for example on Facebook, costs about 350 thousand tenge per month. There are comprehensive intelligence services that monitor the entire Internet and all social networks. We do not have such a complete system in our country yet; we use Russian systems, for example, Brand Analytics, YouScan, etc. Such services cost about 200-600 thousand. It all depends on the number of keywords entered and the frequency of mentions.
- In the West, the issue of people's security on the web, fraud cases, etc., is being raised more and more often. In this case, how can users protect themselves from the use of their data, or is that already impossible without giving up the Internet completely?
- Again, people themselves upload personal data and photos publicly: where they travel, whom they meet, and so on. Even if your account is closed, a special "bot" can send you a friend request and copy your data, but only if you add it to your friends. And if you leave a comment on an open page, special services can read that as well. I think that in a year and a half from now programmers will learn to read pictures.
- What are the basic safety standards?
- I, for example, got myself another phone number, created another e-mail and put those in all my social networks. There is no other way out. You should always keep in mind that you are being watched, whatever you do on the Internet. And your data can be used for different purposes.
- How legitimate is the data collection these guys have carried out?
- For example, in Kazakhstan, according to the Law on Personal Data, if a person cannot be identified from the data, it is not considered personal data. If there is a name and a photo, it is personal information. Many companies simply encode these parameters (name, IIN), calling them, for example, cell E75. So even if an inspection comes, it will be almost impossible to prove that this information is personal.
- How often and who buys this data?
- Quite often, and mostly by different companies. If, for example, a client came to them and left his phone number, then by entering it into the database they bought they can learn everything about him: his whole portrait, financial state, preferences, etc. You can certainly go further and create a product, but that is already a matter of ethics and of the company's mission.
- But since the data on users' social network pages are completely open and can be viewed by everyone, companies can safely use them to understand whether a person is their potential client or not and, if he is, how to influence him. However, there have been precedents when, in response to offers of certain services made to individuals on the basis of information on their pages, there were claims of unethical use of their personal information. Although, I repeat, it was completely open. How do you think companies should act, and where does the ethical line lie?
- I would not recommend doing this. It is better to take a more ethical route. For example, use e-mail or retargeting, where online advertising is shown to users who have already viewed the advertised product on the advertiser's website. You can use special programs to upload potential customers' phone numbers to Facebook or Yandex, and your ads will be shown only to them. You first need to "warm up" the client: give him something free of charge, discounts, or gifts. For example, Tinkoff works this way in Russia: it micro-segments the audience and works with each segment individually. In the West they understand that it is much easier to retain a client than to attract a new one; that is why they work more on their service and reputation, trying to build trust and plan for the long term, at least 10 years ahead. They will not work with a "black base", because it is simply unethical.
- What recommendations would you give the companies on the correct use of Big Data?
- When you decide to implement Big Data, you need to clearly understand what information is available and what results it allows you to achieve.
- Do you think Kazakhstan should take similar measures, following the European Union's example of protecting the data of its citizens, and how adequate would these measures be for our current situation? I would like to remind you that on May 25 this year the General Data Protection Regulation (GDPR) entered into force in the European Union countries. For instance, according to these rules, every user now has the right to delete all the data accumulated about him, that is, the "right to be forgotten". And if his demand is not fulfilled within 72 hours, the executors may be fined 2% of their annual worldwide income or 10 million euros, whichever is greater. In case of illegal transfer of a person's data to another country or violation of the principles of data processing, the data protection authority may impose a fine of 20 million euros.
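Editor's note: the "whichever is greater" rule, as it is described in the question above, can be written as a one-line formula. The Python sketch below only illustrates that description; the function name, parameter names and sample income figures are hypothetical, and the sketch is not a statement of how any authority actually calculates a fine.

```python
def fine_ceiling_eur(annual_income_eur, rate_percent=2, fixed_amount_eur=10_000_000):
    """Greater of a percentage of annual worldwide income and a fixed amount.

    The defaults (2% and 10 million euros) are the figures quoted in the
    question; integer arithmetic is used to avoid rounding surprises.
    """
    percentage_part = annual_income_eur * rate_percent // 100
    return max(percentage_part, fixed_amount_eur)


# Hypothetical companies: for the smaller one the fixed amount dominates,
# for the larger one the percentage does.
print(fine_ceiling_eur(100_000_000))
print(fine_ceiling_eur(1_000_000_000))
```

For a company with an annual income below 500 million euros, 2% is less than 10 million, so the fixed amount is the larger of the two.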
Link to the first part of the interview: http://lifeinsurance.kz/ekspert/big-data-obratnaya-storona-medali