The social danger of Internet Activism

As I write this article, it is June 2018. I feel it is about time for an old activist like me to chime in on the phenomenon of Internet Activism, also known as Clicktivism, Slacktivism or Hashtagtivism, depending on who writes about it.

Let’s first give some context to this social activism. When I was a high school student, during the Italian 1970s, we had the so-called Years of Lead. The police were using tanks and armoured vehicles to kill students and workers during demonstrations by running them over, provoking our violent response on many occasions and driving the country to the brink of civil war by the end of the decade, when groups on all sides and police forces were shooting at each other, with deaths on all sides.

Back then, being an activist had a definite meaning, no matter which side one was on. Being socially active implied leafleting, putting up posters, becoming an active organiser of public events and marches, setting up demonstrations, occupying schools and so on. It was a very social experience, with many meetings every day with real people who were your friends as well. Being socially active meant a fully immersive experience made of pizza, movies, meetings and demonstrations. We achieved a lot, paying a hefty price on all sides involved, but we are still proud of what was accomplished: the 150 hours of paid study leave for metalworkers (the right to study), later slowly extended to all categories; the right of police forces to have a union; the introduction of elected representatives in the military; the right of assembly and the institution of student representatives on collegial bodies in all Italian schools; and so many other laws passed because of the people’s pressure on the state.

Well, that was back in the 1970s in Italy. Actually, it remained true until not long ago, but then, at the beginning of the new millennium, something happened: electronic mailing lists started to appear on the horizon of social activism. It was not a big thing at first, mostly a way to carry on discussions and debates among activists; then petitions began to circulate via email as well. All good: it was a way to extend collaboration among organisers. But it was also the beginning of a different trend, something that took actual form with a series of new companies launching around the mid-2000s: Bebo, MySpace, Reddit, YouTube, Facebook, Twitter. Don’t get me wrong: we had our networking already, with plenty of BBSes, FidoNet and more, based on voluntary hubs and dial-up connections syncing the boards every few hours, and even the first online petition in 1999, but it was nothing compared to what happened with the spread of Internet availability. We went from a few million users worldwide to billions: that is what I call a game changer.

The foundation of Avaaz in 2007 turned the online petition into something that inspired the whole Internet. Online petitions had already been tested by the UK government (2006) and the Scottish Parliament (1999), and the United States even created its own petition platform, “We the People”, in 2011. But Avaaz had something more: it was independent, not for profit, international, open to everybody.

The growing popularity of social networking and of the hashtag made virtual activism very easy and popular, and this should be a good thing, at least in principle. So let’s look at what has changed in the kind of activism I was involved in back in the 70s. Well… that’s the thing: it has almost disappeared. If only a tenth of those who click “Like” showed up at a real demonstration we could really shake the world, but nobody is showing up any longer! These days the most successful demonstrations count only a few thousand participants… pretty useless.

If you wonder why governments let you post so freely against them instead of shooting at you as they did in the 70s, here is your answer: they do not fear virtual turmoil! Virtual activism has never scored as big as real activism did. That is why, when people do demonstrate, the police are always there to violently disband any meaningful demonstration: with such small numbers gathering in real life, it is easy. Meanwhile, real soldiers are sent to kill real people in proxy wars; we can see plenty of clicks and petitions to stop them, but almost nobody actually gets out there to confront the system, and this is the real problem.

Social media and online petitions are great tools to raise awareness about a problem, but to solve it one must be ready to be active the way we were in the 70s. Otherwise, it is all meaningless! Look at how Israel feels free to use snipers to kill unarmed civilians whose only guilt is trying to get the world’s attention on the ongoing genocide in Palestine.

The whole world is flying the Palestinian flag, but nobody takes any action to force governments to end that genocide; in Ireland, the police are actually arresting people for waving the Palestinian flag during public events! The same happens with the forgotten Tibetan genocide, not to mention US soldiers shooting radioactive bullets in Afghanistan: significant virtual turmoil, no real-life action, so nothing changes.

It’s sad to see people feel satisfied with just clicking “Like” or re-sharing a post. It should begin with those virtual actions, not end there! When you act virtually you are helping to raise awareness, but that is an empty act if nothing tangible follows, and the result is right in front of us.

So, if you have read up to this closing sentence, maybe there is hope for you: now stop reading and go out there, do something in real life to make a difference. It’s up to you: act now!

General Data Protection Regulation (GDPR) practical impact on software architecture

1   Introduction

On 27 April 2016 the GDPR was adopted, and it was then published in the Official Journal of the European Union on 4 May 2016 [1]. The clock started on 24 May 2016, counting down to 25 May 2018, when the regulation takes the full force of law in all Member States of the European Union (EU).

The first thing to clarify is that the GDPR is a European regulation, not a directive. The difference is as follows:

Directive: defines what the EU Member States shall implement in their legislation. A directive is received and implemented in many different flavours by each Member State, as each government sees fit.

Regulation: a legal instrument that replaces any conflicting legislation on the same matter in all EU Member States. It has the force of law without any intervention by the governments of the Member States.

This difference is substantial, and still, in March 2018, many Security Officers told me that the 25th of May was not that important because “…anyway our government has to legislate to implement it so that it can take effect”. The feeling is that most companies are in denial about the GDPR coming into force and bringing fines of up to €20 million, or 4% of worldwide annual turnover, whichever is higher, for those found non-compliant.

Once they understand that it is happening, and in just a month, the denial moves on to what is necessary to become compliant, looking for any possible excuse not to intervene properly.

This new form of denial is by far the more dangerous, because it introduces false confidence about how the legal text must be interpreted.

In this paper we begin a very practical and pragmatic journey through those parts of the legal instrument that affect software architecture. I will avoid all IT-related issues, leaving them for a different study, as they merit a separate analysis due to the complexity of making an IT infrastructure GDPR-compliant.

While reading this paper, there is one thing that must be in your mind every time you ask “do I really have to do this?”: imagine being in a courtroom, on the stand, being sued for having breached the GDPR, and the plaintiff’s attorney asks you the big question: “Did you or did you not implement the State of the Art in Data Protection? Could you or could you not, in your professional opinion, have done more to prevent the data leakage?” Remember that in a court of law you cannot lie unless you want to bear heavy consequences, and the GDPR is all about implementing and maintaining the State of the Art in Data Protection in our solutions. Also remember that the plaintiff will hire a consultant to prove that you did not implement that State of the Art in your solution, so you had better think twice every time you are tempted to dismiss a data protection standard.

2   Data Protection

When we talk about Data Protection, we often do not stop to think about its actual meaning, nor about its relationship with Privacy. Moreover, there is a tendency to protect the data only after a breach, and this approach is no longer acceptable under the GDPR.

2.1 Privacy Vs Data Protection

The first notion we must digest is the difference between Privacy and Data Protection. There is often some confusion about these concepts, so the best option is to start our journey by clarifying them both.

2.1.1 Privacy

We, the Data Subjects, have a right to privacy, granted by various legal instruments in many different flavours depending on the jurisdiction, but the overall takeaway for us is that Privacy is a Right. All the legal instruments establishing this fundamental right, from international conventions down to local bylaws, have tried to guarantee in some form the protection of our privacy.

2.1.2 Data Protection

There is only one way to guarantee our right to privacy: protecting our data. This is a simple concept, easy to digest: Data Protection is the means by which Privacy is guaranteed.

Now, a means is normally realised through tools, and that applies in our case. Data Protection is implemented in software architecture and design by selecting tools, patterns and standards that facilitate it, and where a tool is missing we create one.

2.2 Data Protection by Design and by Default

This mandate, established by Article 25 of the GDPR, is the most crucial piece of the legislation for us to understand and fully digest. This is also where almost all the business actors I have talked with (product owners, account managers, CEOs and so on) tend to look for an escape route. Let us dig into the first sentence of Art. 25(1):

Taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing, the controller shall…

Do we all notice that part that reads “…the cost of implementation…”? The mantra business people keep reciting goes like this: “It costs too much to protect these data using the state of the art of technology! The GDPR itself says so!” Well, that is why we have recitals in all complex legislation: to help us understand the passages that could lead to misinterpretation of the law and consequent fines. Recital 26 clarifies this point about which costs to consider, and it makes sense from a technical perspective as well:

To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.

As we can see, the costs to consider are those representing the effort required for an attacker to “identify the natural person”. We technical people use this approach all the time: our goal is always to ensure that breaking our security measures is so costly and so laborious that no person in his or her right mind would even start trying to break in. Recital 83 uses a simpler form, as do Art. 25(1), Art. 17(2) and Art. 32(1).

Naturally, as the reader will have noticed, I am playing devil’s advocate here. It certainly makes sense to consider the actual cost of implementing the necessary measures to protect the data, but this applies to balancing the choices: reducing cost by choosing, among the adequate levels of security, one that can be implemented at the lowest possible cost.

Nowhere in the GDPR are we given the choice to opt out of protecting the data because, in our view, it costs too much: such an attitude would cost the company heavily should it be sued for non-compliance.

So, now that we have clarified that pain point, let us understand what Data Protection “by Design and by Default” means.

2.2.1 By Design

The wording of Art. 25 is all about establishing by design all the necessary measures to protect personal data. This clarifies that, before discussing any implementation, the design of the solution must preemptively consider how to protect the data, specifically by applying the six principles established in Article 5(1):

  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimisation
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality

The technical means to achieve such compliance often mentioned in the GDPR are:

  • encryption
  • pseudonymization
  • security of processing

We, the Architects, are tasked with architecting and designing our solutions to implement those principles, not moving forward unless we are satisfied that we have done all we can, given the context and the state of the art of the technology.

2.2.2 By Default

Stating “by default” may seem redundant, but it is another very important concept established by the legislator.

In our designs, all behaviours must default to the strictest Data Protection rules. Quite often in the past, the default behaviour of software was the minimum set, the basics. Now we are called to ensure that, wherever we do not know enough or lack information at any point in our solution and its processes, we apply the strictest rules, to guarantee that no data will ever be processed or leaked by mistake.

The Privacy by Design approach is characterized by proactive rather than reactive measures. It anticipates and prevents privacy invasive events before they happen. PbD does not wait for privacy risks to materialize, nor does it offer remedies for resolving privacy infractions once they have occurred – it aims to prevent them from occurring. In short, Privacy by Design comes before-the-fact, not after. [2]

3 Data Processing

Processing the data deserves a special mention. Let us first make clear what processing means, especially when legal interpretation comes into play.

If I note a person’s name and address on a piece of paper, I am processing that person’s data. If I put that note in my wallet, I am again processing those data. When it comes to computing, the notion is easy, taking the definition from the Oxford dictionary:

Operate on data by means of a program.

Any operation involving the data in any way constitutes Data Processing, not just the use of the data in a computational algorithm. This is stressed in multiple passages of the GDPR: we must limit data processing to the strict minimum necessary to carry out the operations that have been authorised by the data subject.

In principle, if for a given operation I need just the first name and last name, I am supposed to retrieve only those from the database, not a full record including email and address: I am supposed to create a data view that satisfies that minimalistic data access.
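
As a minimal sketch, assuming a hypothetical IQueryable<User> exposed as db.Users (the entity and field names are invented for this example), the projection could look like this in C#:

    using System.Linq;

    // Data minimisation at query level: project only the fields that the
    // operation needs instead of materialising the full record.
    // `db.Users` is a hypothetical IQueryable<User> over the users table.
    var names = db.Users
                  .Where(u => u.IsActive)
                  .Select(u => new { u.FirstName, u.LastName }) // no email, no address
                  .ToList();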

The more data we move around in memory, the more data we put at risk should an attacker be watching our memory operations. This brings us to data encryption, but first it is worth clarifying the meaning of State of the Art.

3.1 State of the Art

We have seen that the GDPR mandates the use of the state of the art of current technology. This is mentioned in four places: Recitals 78 and 83, and Articles 25(1) and 32(1).

The definition of State of the Art is a moving target: it changes over time. The GDPR establishes that Data Protection must be guaranteed by implementing the state of the art of technology at the time of processing. Therefore we now have a legal obligation to keep our software solutions current and fully up to date: implementing today’s state of the art in data protection does not guarantee that the solution will not be obsoleted by new findings in three months’ time.

3.2 Database encryption

The fact that we must architect, design, implement and maintain our solutions using the state of the art of technology has a huge impact on the database we use.

There are two main levels of DB encryption: at rest and in memory. Encryption at rest was the state of the art until live DB encryption was introduced, which de facto obsoleted the at-rest-only option, because live encryption encompasses both at-rest and in-memory data encryption.

As I write this paper, there are only two databases that I know of featuring both levels of encryption: Microsoft SQL Server (starting with the 2016 version) and Oracle Database Server with the Transparent Data Encryption module. So, at this point in time, any other database seems to be non-GDPR-compliant. This represents a big problem for us, because many solutions are based on non-compliant databases with no known roadmap towards compliance. In truth, some free DBs will never have enough funding to become compliant, so they seem destined to be dropped from any solution dealing with European citizens’ personal data.

The situation is different for NoSQL databases: to my knowledge, none of them currently offers both at-rest and in-memory encryption. Therefore encryption at rest is the state-of-the-art feature for now, and it covers many of the most commonly used NoSQL DBs, like MongoDB (Enterprise only, with the WiredTiger engine) and Cosmos DB.

3.3 Database structure and pseudonymization

Once we have selected the correct database, we have to think about its structure. When we talk about security, we all know the old saying about not keeping all your eggs in one basket, and it applies quite well in this case.

Designing databases in Third Normal Form (3NF) or Boyce-Codd Normal Form (BCNF) no longer seems to be enough: the principle of pseudonymization established in the GDPR cannot easily be implemented with our usual DB design practices.

The principle of pseudonymization is known to us under the more familiar term of steganography: what we must do is hide information by replacing it with a pseudonym, hiding the data in plain sight in such a way that the attacker sees different but meaningful-looking information in its place.

Let us take this number, imagining it is a valid social security number:

1228475

Storing this as steganographic information can be represented, for instance, as a list of US cities:

Rome, Denver, Washington, Arlington, Lebanon, Madison, Greenville

This results from applying the following translation table:

Original   1st occurrence   2nd occurrence
0          Clayton          Auburn
1          Rome             Hudson
2          Denver           Washington
3          Springfield      Franklin
4          Lebanon          Clinton
5          Greenville       Bristol
6          Fairview         Salem
7          Madison          Georgetown
8          Arlington        Ashland
9          Dover            Oxford

Obviously, we need more columns to cover further digit occurrences, but this gives you a practical example of using steganography to apply pseudonyms to data, to the end of making them useless to an intruder.

This implies that the column where we store the social security number should be given an innocuous name in the DB, something like VisitedPlaces, so we close the circle of fooling the intruder.
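
As a minimal sketch of this idea, here is a possible C# encoder/decoder built on the translation table above (two occurrence columns only, exactly as in the example; real data would need more):

    using System;
    using System.Collections.Generic;

    static class CityCodec
    {
        // Translation table from the example: CityTable[digit][occurrence].
        static readonly string[][] CityTable =
        {
            new[] { "Clayton",     "Auburn"     }, // 0
            new[] { "Rome",        "Hudson"     }, // 1
            new[] { "Denver",      "Washington" }, // 2
            new[] { "Springfield", "Franklin"   }, // 3
            new[] { "Lebanon",     "Clinton"    }, // 4
            new[] { "Greenville",  "Bristol"    }, // 5
            new[] { "Fairview",    "Salem"      }, // 6
            new[] { "Madison",     "Georgetown" }, // 7
            new[] { "Arlington",   "Ashland"    }, // 8
            new[] { "Dover",       "Oxford"     }  // 9
        };

        // "1228475" -> "Rome, Denver, Washington, Arlington, Lebanon, Madison, Greenville"
        public static string Encode(string digits)
        {
            var seen = new int[10]; // occurrences of each digit so far
            var cities = new List<string>();
            foreach (char c in digits)
            {
                int d = c - '0';
                cities.Add(CityTable[d][seen[d]]); // column chosen by occurrence count
                seen[d]++;
            }
            return string.Join(", ", cities);
        }

        public static string Decode(string encoded)
        {
            var reverse = new Dictionary<string, int>();
            for (int d = 0; d < 10; d++)
                foreach (string city in CityTable[d])
                    reverse[city] = d;
            var digits = new List<char>();
            foreach (string city in encoded.Split(new[] { ", " }, StringSplitOptions.None))
                digits.Add((char)('0' + reverse[city]));
            return new string(digits.ToArray());
        }
    }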

We must also tackle the issue of where we store the translation table. Storing it in the same DB as the data it applies to would be an error; therefore we should put the translation tables in separate storage, possibly an encrypted NoSQL DB.

One more thing is needed to secure the steganographic system: remember to replace the translation table regularly. To make this operation efficient, you need a column recording which translation table was used to encode the value; this can be any conventional value. For instance, if you replace the table every week, you can store the week number. This is necessary because replacing the translation table entails parsing all the records in the table, which may be a long operation executed at low priority. If the table is large, at some point some values will still be encoded with last week’s table, and we must be able to tell, unless we lock the whole table while replacing the encoder — and that would be bad practice, especially for big tables parsed at low priority.

3.4 Issues with in-memory data

We know that an intruder is often nothing but a memory observer: a program that monitors the RAM looking for data it can recognise and snatch, often referred to as a memory sniffer or memory scraper. Against these attacks we have the DB’s in-memory encryption, but at some point we do have to retrieve the data to use them computationally, and that is the moment we become vulnerable again. This happens even sooner when using a NoSQL DB because, as we have seen, they only offer encryption at rest.

It is evident that we cannot use the data unless we have them in clear text, so at some point we will be vulnerable anyway if a sniffer is watching our process. The trick is therefore to keep the data in clear text for the shortest possible time. If we need the data across multiple processes, we must keep them in an encrypted memory area and decrypt them only for the microseconds necessary for our computation.

Today’s technology allows us to encrypt and decrypt on the fly at a time cost that is insignificant from a human user’s perspective, so this is the best approach we can apply using today’s state-of-the-art cryptographic libraries.
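
A minimal sketch of this approach in C# (a hypothetical helper of mine, not a library API) could be:

    using System;
    using System.Security.Cryptography;

    // Holds a secret encrypted in RAM; the clear text exists only for the
    // duration of the computation passed to Use().
    sealed class InMemorySecret : IDisposable
    {
        private readonly Aes _aes = Aes.Create(); // key and IV live only in this process
        private readonly byte[] _cipher;

        public InMemorySecret(byte[] clearText)
        {
            using (var enc = _aes.CreateEncryptor())
                _cipher = enc.TransformFinalBlock(clearText, 0, clearText.Length);
            Array.Clear(clearText, 0, clearText.Length); // wipe the caller's copy
        }

        public T Use<T>(Func<byte[], T> compute)
        {
            byte[] clear;
            using (var dec = _aes.CreateDecryptor())
                clear = dec.TransformFinalBlock(_cipher, 0, _cipher.Length);
            try { return compute(clear); }                   // the vulnerable microseconds
            finally { Array.Clear(clear, 0, clear.Length); } // wipe immediately
        }

        public void Dispose() => _aes.Dispose();
    }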

3.5 Sensitive data

Sensitive data are listed in Article 9(1):

“racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation”

When we need to process this type of data we must ensure better and stronger cryptography: the RSA key pair must be no less than 2048 bits, and symmetric block cipher keys no less than 256 bits (e.g. AES-256).

The most important thing about data, and especially about sensitive data, is to ask ourselves: do we really need to process them? Are these data necessary to carry out the service? Do we have the data subject’s explicit consent to process his or her sensitive data?

Do remember that we can no longer collect data with generic or implicit consent: we must ensure that the data subject is fully aware that we are processing these data, and (s)he must explicitly consent to the processing using a form that also clarifies the scope and timeframe of the data processing. Moreover, we must securely store the consent, and if we need to process the data for a longer period, new consent is necessary, especially for sensitive data.
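
A minimal sketch of what such a stored consent record could carry (the field names are purely illustrative):

    using System;

    // Illustrative consent record: what was consented to, when, and until when.
    public sealed class ConsentRecord
    {
        public Guid     DataSubjectId      { get; set; }
        public string   ProcessingScope    { get; set; } // what the subject consented to
        public string   ConsentFormVersion { get; set; } // the exact form text that was shown
        public DateTime GrantedUtc         { get; set; }
        public DateTime ExpiresUtc         { get; set; } // processing beyond this needs new consent
        public bool     Revoked            { get; set; }
    }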

3.6 True random number generators

Every time we mention cryptography, we imply the generation of random numbers, and those shall be cryptographically secure.

The state of the art for cryptographically secure random number generation is hardware cards using shot noise, photoelectric effects and other electron- or photon-based systems.

These cards should be compliant with NIST SP 800-90B [3] and SP 800-90C (still in draft) [4], like the ComScire PQ4000KS [5] or the SwiftRNG Pro [6]. Given the low cost and wide availability these true random number generators have reached, there is no longer any justification for avoiding the technology: we can safely state that the era of pseudo-random numbers based on algorithms instead of dedicated hardware is over, hence we must adopt this technology to be compliant with the GDPR.

3.7 Integrity and security of processing

Four principal elements must be considered to guarantee data integrity and the security of its processing:

  • Always implement referential integrity constraints at the database level
  • Never physically delete any data before they reach the limit of their storage period: use logical deletion only (a boolean column such as Is_Deleted; see the sketch after this list)
  • Define a clear backup policy and comply with it
  • Use Test-Driven Development and have a QA team working in isolation from the development team
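
A minimal sketch of the logical-deletion point (entity and column names are illustrative):

    using System;

    // Rows are flagged as deleted, never physically removed before the end
    // of their retention period.
    public class CustomerRecord
    {
        public int       Id         { get; set; }
        public bool      Is_Deleted { get; set; } // logical delete flag
        public DateTime? DeletedUtc { get; set; } // when the flag was set
    }

    public static class LogicalDelete
    {
        // "Deleting" only flips the flag; physical removal happens in a
        // separate retention-expiry job.
        public static void Delete(CustomerRecord record)
        {
            record.Is_Deleted = true;
            record.DeletedUtc = DateTime.UtcNow;
        }
    }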

None of the four points above is more important than the others: all have a considerable impact on our ability to guarantee that the system we architect, design and implement is as robust as the GDPR requires.

If it is proven in a court of law that the product has weak data integrity and that there was no standard QA during the phases of the software lifecycle, your company will most likely be found in breach of the GDPR.

3.8 Securely storing passwords

Figure 1 – Sample algorithm for secure password generation

Passwords have always been a serious and underestimated problem when it comes to storing them safely. Most users rotate the same few passwords across all their password-protected resources. This is understandable behaviour: otherwise the average user would end up managing some twenty passwords, and the public still has little confidence in the password managers on the market.

Because of this habit, if one password is hacked the user must deal with a breach across many services, not just the one that suffered the hack. It is our responsibility to make passwords hard to hack, so we cannot simply hash them, as is common practice in most architectures: there are too many excellent brute-force hash solvers available, and the computing power available to any hacker grows exponentially. We cannot leave passwords exposed by hashing them with a standard algorithm, not counting online brute-force resources with dictionary helpers like https://hashkiller.co.uk/, https://crackstation.net/ and hundreds more, almost all freely available online.

So, what if we must play the role of the identity provider and store users’ credentials? I offer here (Figure 1) a sample solution, an algorithm I published a few years back. The first password hashing always happens on the client side, so that the original password chosen by the user is never sent over the wire in clear text. In the context of a signup process we have both a username and a password; therefore, in the algorithm, I propose to derive the salt from the username in a creative way, like, for instance, the following (taken from my previous article).

Imagine a Mr John Doe signing up and the email being the username as well (for simplicity):

john.doe@somedomain.com

moreover, he decides to use as a password

MyPassword

The first step on the client where our new user is signing up is to compute the salt. To do so, we first note the ASCII value of the first letter of the username:

j = 106

106 being an even number, we arbitrarily decide to pick from the username all the characters placed in odd positions, so we get our raw salt:

jh.o@oeoancm

So, let us now get the MD5 hash of this salt to get a better form of it to use:

c4211ead299f2bd80a3465ab9be18c05

Now we add the salt to the password, getting the following

MyPassword * c4211ead299f2bd80a3465ab9be18c05

I am adding “ * ” in between just as added complexity and for better presentation in this sample. We now have something complex enough: let us now take the SHA512 hash of it, represented as a lower-case hexadecimal string:

723b6f133818b87215e9f476350d06e55815c8de00fc13af58aa94dee4c66398b441b2a1060dd1e2f7f82dd3ab2a420ee4245b943fc7721ab59b765f84eefa26

This is a typical hex string, with all alphabetic characters in lower case. At this point let us make it more interesting: again, because “j” is an even number, we arbitrarily decide that all characters in odd positions shall be upper case, getting the following password:

723b6f133818B87215E9F476350d06E55815C8De00Fc13Af58Aa94DeE4C66398B441B2A1060dD1E2F7F82dD3Ab2a420eE4245b943fC7721aB59b765f84EeFa26

This represents the actual password the client software will send to the server.

As you can see from Figure 1, on the server we do something similar, but not identical: any other creative rule will do, like picking the 11th letter of the hashed password and doing something based on its ASCII value being even or odd, or any other rule you come up with.

The best approach is to create an algorithm unique to your company, so that even someone gaining access to the database will find it impossible to reverse-compute the original value the user picked as a password.
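
For illustration, here is a compact C# sketch of the client-side steps of this example (the even/odd rules are the arbitrary choices made above; a real implementation should invent its own and keep them secret):

    using System;
    using System.Security.Cryptography;
    using System.Text;

    static class ClientPasswordHasher
    {
        public static string Compute(string username, string password)
        {
            username = username.ToLowerInvariant();
            bool even = username[0] % 2 == 0; // ASCII value of the first letter

            // 1. Raw salt: keep 1-based odd positions if even, even positions otherwise.
            var rawSalt = new StringBuilder();
            for (int i = 0; i < username.Length; i++)
                if ((i % 2 == 0) == even)
                    rawSalt.Append(username[i]);

            // 2. MD5 the raw salt and append it to the password with " * ".
            string salted = password + " * " + Md5Hex(rawSalt.ToString());

            // 3. SHA512 of the salted password, hex encoded.
            char[] hash = Sha512Hex(salted).ToCharArray();

            // 4. Casing rule: upper-case the same positions used for the salt.
            for (int i = 0; i < hash.Length; i++)
                if ((i % 2 == 0) == even)
                    hash[i] = char.ToUpperInvariant(hash[i]);

            return new string(hash);
        }

        static string Md5Hex(string s)
        {
            using (var md5 = MD5.Create())
                return ToHex(md5.ComputeHash(Encoding.UTF8.GetBytes(s)));
        }

        static string Sha512Hex(string s)
        {
            using (var sha = SHA512.Create())
                return ToHex(sha.ComputeHash(Encoding.UTF8.GetBytes(s)));
        }

        static string ToHex(byte[] bytes)
        {
            var sb = new StringBuilder(bytes.Length * 2);
            foreach (byte b in bytes) sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }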

3.9 About software libraries and access

I have referred to algorithms for steganography, in-memory encryption and password protection, and you will end up with many more before your solution is complete. This brings up another problem: the security of the implemented algorithms.

It goes without saying that the libraries offering the Data Protection functionality must be heavily obfuscated, so that the cost of disassembling them is too high for any hacker or organisation to afford. But what about the most dangerous person we can deal with? One might ask who that is; the worst, in terms of dangerousness, is the disgruntled employee.

Let’s say that Mr Doodle is a chief analyst in our data analysis department and he gets fired due to a workforce reduction. He is angry and has access to the data, so he dumps all the databases we manage on Torrent. Well, because we implemented the measures mentioned before, from a Data Protection perspective we do not care much: we will incur no consequences in respect of the GDPR, since all data are steganographically protected and no clear password is stored anywhere, so nobody can even be identified. This is true, however, only if our Mr Doodle has no access to the algorithms we use and no access to the source code of their implementation. Otherwise, he can push those onto Torrent as well, and then we will be in some serious trouble.

It is very important that we take all relevant precautions to ensure that no single person has full knowledge of the Data Protection algorithms or access to the full implementation source code. This implies that the development of the implementations must happen atomically, with different teams developing different parts of them, leveraging functionality offered by other obfuscated libraries, so that it would be impossible to figure out what the Data Protection code is doing. This means that if the algorithms leak, only the architect who designed them or the custodian of the files could be the source of the leak, making it simple to identify the culprit.

Here as well I am sure you are thinking “Isn’t this too much?”, and again I invite you to picture yourself on the stand in a court of law being asked “Did you or did you not implement the State of the Art in Data Protection? Could you or could you not, in your professional opinion, have done more to prevent the data leakage?”.

Setting up the correct processes and access levels is a simple thing; therefore, there is no justification for any medium-sized or larger company not to put these security measures in place. A small company should take it just as seriously, but adapt this to a reasonable infrastructure considering the available means.

3.10 Data erasure (Right to be Forgotten)

Another technical difficulty comes from the right to be forgotten, established in Article 17 of the GDPR. This problem has no simple solution: we must have a data backup plan implemented to be compliant with Article 32 of the GDPR, so when a data subject requests to be forgotten (the full erasure of all his or her personal data) we have a problem.

The solution, in this case, cannot be purely technical, because the cost of safely erasing data from backups is too high. The GDPR allows exceptions as long as the user consented to them in the first place; so, in my opinion, when we collect the data subject’s consent we should include a clear, explicitly accepted clause clarifying that, should consent be revoked by exercising the Right to be Forgotten, the data on backup media will remain and will be destroyed along with the backup media when the retention time limit is reached.

A clear process must be in place so that, when a backup is restored, none of the erased data are restored along with it. It is reasonable to fully restore the backup and immediately execute a batch process that deletes the data that must stay erased.
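
A minimal sketch of that post-restore step (the erasure log and the deletion action are hypothetical application components):

    using System;
    using System.Collections.Generic;

    public static class PostRestoreErasure
    {
        // Replays all pending "right to be forgotten" requests against a
        // freshly restored database, before it goes back online.
        public static void Apply(IEnumerable<Guid> erasureLog,
                                 Action<Guid> deleteAllDataFor)
        {
            foreach (Guid dataSubjectId in erasureLog)
                deleteAllDataFor(dataSubjectId); // application-specific erasure
        }
    }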

I know that this is not a clean solution, but the alternative would be to process all our backups in a sandboxed environment: restore them, erase the data in question, make a clean backup and destroy the original one. The cost of this operation would be unreasonable in most cases.

3.11 Right to Data Portability

The GDPR establishes the data subjects’ right to have their data exported in a portable standard (Article 20). It goes even further by establishing that, where technically feasible, if the data subject needs the data to be ported to another controller, that process shall be executed (Article 20(2)).

In simple terms, this means that if I have a Google Plus account and I want all to be moved to Facebook, where Facebook has an endpoint to allow such a data transfer, Google will be compelled by the GDPR to execute the transfer of all the data to Facebook.

This doesn’t present any technical problem in itself, but because all the data are scrambled and encrypted, we must put a process in place to allow the exercise of the Right to Data Portability, and it has to be in place before any user exercises this right. Otherwise, the time needed to execute it risks exceeding any reasonable delay: the data subject will be entitled to compensation, and our company will likely be fined for not having established a proper means to export the data in a timely fashion, in conformity with the GDPR.

4 Client and Server

Most, if not all, data processing involves at least two tiers: a client, usually running on the operator’s device, enabling the operator to query, add, amend or delete data; and a server, where the data are stored and where most of the computation is performed.

Sometimes there are multiple servers, and the server interfacing with the human client is usually, in turn, a client of other servers. All this talking between machines has one major implication: data are exchanged over a network which, by its very nature, is almost never secure.

Figure 2 – Enhanced Gatekeeper implementation

4.1 An enhanced Gatekeeper

As I mentioned in the introduction, in this paper I will not touch on securing the IT infrastructure, so let us go up a few logical levels, to the Application Layer, where I will illustrate how to secure the data we exchange.

The obvious first step is to design the application to leverage Transport Layer Security (TLS, formerly SSL, the Secure Socket Layer). We are therefore assuming that our Client-Server architecture is based on a RESTful approach, not on a raw transport-layer connection; otherwise we would have to reinvent the wheel, and that is not something we like to do, especially given the costs involved in establishing and then maintaining a custom security protocol based on raw TCP/IP packet exchange.

In Figure 2 we can see the process diagram of my own implementation (which I placed in the public domain a few years ago) of the well-known Gatekeeper Design Pattern. In my vision, the Gatekeeper has one single REST endpoint, expecting a POST message structured as follows:

Sender Public Key – the sender’s public key, which the receiver will then use to encrypt the CBC key protecting the answer
Encrypted CBC Key – a 256-bit symmetric key used to encrypt the Body; this key travels encrypted with the other party’s RSA Public Key
Body – the actual payload (API call and body), encrypted with the CBC key

The 256-bit CBC (Cipher Block Chaining) key is generated anew every time we send a message: never recycling this key is paramount to securing the data transferred between Client and Server.

I make two assumptions in this algorithm:

  • The Server’s RSA Public Key can be retrieved from a location known to the client: this allows the key to be regenerated every day.
  • The client will generate a new RSA Key Pair every time it is executed.
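
Under those assumptions, a minimal C# sketch of building the envelope on the client could be (field layout and names are illustrative, not a published wire format):

    using System;
    using System.Security.Cryptography;
    using System.Text;

    static class GatekeeperEnvelope
    {
        // Returns the fields of the POST message described above (plus the
        // AES IV, which must also travel with the message).
        public static (byte[] SenderPublicKey, byte[] EncryptedCbcKey, byte[] Iv, byte[] Body)
            Build(RSA clientRsa, RSA serverPublicRsa, string payloadJson)
        {
            using (var aes = Aes.Create())
            {
                aes.KeySize = 256;
                aes.Mode = CipherMode.CBC; // fresh key and IV for every message

                byte[] body;
                using (var enc = aes.CreateEncryptor())
                {
                    byte[] clear = Encoding.UTF8.GetBytes(payloadJson);
                    body = enc.TransformFinalBlock(clear, 0, clear.Length);
                }

                // The one-time CBC key travels encrypted with the server's RSA public key.
                byte[] encryptedKey = serverPublicRsa.Encrypt(aes.Key, RSAEncryptionPadding.OaepSHA256);

                // Ship the client's public key so the server can encrypt the reply key.
                byte[] senderPublicKey = clientRsa.ExportRSAPublicKey(); // .NET Core 3.0+

                return (senderPublicKey, encryptedKey, aes.IV, body);
            }
        }
    }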

The body is the actual API call that the Gatekeeper will relay, acting as a proxy. The usual structure of the encrypted body is a JSON document carrying four fields, representing the API call to be forwarded by the Gatekeeper, as shown in the following table:

API Query – the query string, something like /api/user?param=xyz
Body – any payload to pass to the API endpoint
Verb – the HTTP verb to use (GET, POST, PUT and so on)
Headers – the header values for the endpoint, especially necessary to pass on a JWT token for authenticated services

The Gatekeeper will add the base URL, prepare a new message using the header and body information provided, and place the call on behalf of the client on the internal API server.

Naturally, the Gatekeeper will relay the result back to the client using the same approach: the raw answer from the internal API Server will be encrypted using a new CBC key, and that key will be encrypted using the client’s RSA Public Key.

This approach will guarantee state of the art in data protection, at least while I am writing this paper.

Naturally, the Gatekeeper can be extended to also play the role of an API Gateway, simply by implementing a set of Access Control Lists (ACLs) for the internal APIs, so that the Gatekeeper allows or denies access based on the rules in said ACLs.

4.2 Never trust the client

We have all been trained to repeat the mantra “Never, ever trust the Client”; still, it is good to touch on this subject, especially in the context of the liabilities established by the GDPR.

We must remember in our design that any client can easily be hacked and that a client device can be lost or stolen, so we must design our solutions assuming that our clients will fall into the hands of an attacker.

The consequence of this is that no business logic should be executed on the client, and in particular no personal data should ever be stored on the client device. We also have to pay attention to always encrypting in-memory strings, and to designing the user interface so that each screen shows the least possible identifiable data, displayed in a way that is easy to understand only for a human. This is in anticipation of the device being contaminated by a screen scraper: a piece of malicious software capturing screenshots and trying to extract meaningful information from them.

Screen-scraper protection can be implemented by designing the user interface with uncommon fonts, varied for each data type, so that the screen remains easy and meaningful for the user but makes little sense to a program trying to decode its content.

Another measure, optional but very effective and worth implementing when the client consumes sensitive data, is to use screens with a low viewing angle and to leverage LCD colour distortion, as suggested by Harrison and Hudson in their paper [7], so as to scramble the screen content for anyone who is not directly in front of the monitor.

5 Conclusion

In this paper, I have explored the extent of the impact of the GDPR on software architecture and proposed practical solutions to ensure compliance with the regulation. The proposed solutions have been verified in real-world applications, so they represent a pragmatic approach to implementing compliance. Because the legislator wrote the GDPR in a form that takes into account the pace at which technology evolves, we should establish a protocol for the periodic revision of our solutions, to ensure that they remain compliant. Companies should reflect these protocols in their ISO certification processes and in their software maintenance budgets.

References

  • [1] European Union (2016). Regulation (EU) 2016/679. Brussels: European Union, pp. 1-88. [Online]. Available: http://bit.ly/GDPR_Pdf.
  • [2] A. Cavoukian, “Privacy by Design – The 7 Foundational Principles,” Internet Architecture Board, 02-Nov-2010. [Online]. Available: http://bit.ly/AnnCavoukianPhDPbDD. [Accessed: 05-Feb-2018].
  • [3] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, “SP 800-90B, Recommendation for the Entropy Sources Used for Random Bit Generation,” NIST CSRC. [Online]. Available: http://bit.ly/SP800-90B. [Accessed: 17-Feb-2018].
  • [4] E. Barker and J. Kelsey, “SP 800-90C (Draft), Recommendation for Random Bit Generator (RBG) Constructions,” NIST CSRC. [Online]. Available: http://bit.ly/SP800-90C-DRAFT. [Accessed: 02-Apr-2018].
  • [5] “PureQuantum® Model PQ4000KS,” ComScire. [Online]. Available: https://comscire.com/product/pq4000ks/. [Accessed: 17-Jan-2018].
  • [6] “SwiftRNG Pro,” TectroLabs. [Online]. Available: https://tectrolabs.com/swiftrng-pro/. [Accessed: 17-Feb-2018].
  • [7] C. Harrison and S. E. Hudson, “A New Angle on Cheap LCDs: Making Positive Use of Optical Distortion.” [Online]. Available: http://chrisharrison.net/index.php/Research/ObliqueLCD. [Accessed: 14-Apr-2018].

Did you know that C# foreach statement is your enemy in games development?

When a developer lands in the games industry, he has to change his state of mind about performance. In this industry we have to perform a lot of operations in less than 33 milliseconds (30 FPS, frames per second), possibly tuning the logic and the art assets to achieve 60 FPS on standalone platforms (Windows/Linux/Mac) and consoles (Xbox One/PS4), which means rendering the scene content and computing physics and game logic in no more than 16 milliseconds! Not an easy task: that’s why in our industry every CPU tick counts a lot.

So, what about the foreach statement? Well, this one is really bad, burning hundreds of CPU ticks just to allow the programmer to write less code! You think I’m exaggerating? Let’s have a look at some code for definitive proof.

Let’s open Visual Studio (originally tested in VS2008 Professional, then VS2010 Professional, then VS2015 Enterprise, and tested again with VS2017 Enterprise with .NET 4.6.2 to produce the compiled code below), create a simple C# console app, and write the following very simple code:
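
The original listing was an image; what follows is a minimal sketch of the two methods, consistent with the IL discussed below (the exact collection types are an assumption):

    using System.Collections.Generic;

    static class Program
    {
        // The "Cheap" way: a plain for loop over an int array.
        static int Cheap(int[] values)
        {
            int total = 0;
            for (int i = 0; i < values.Length; i++)
                total += values[i];
            return total;
        }

        // The "Costly" way: foreach over IEnumerable<object> forces the
        // allocation of an enumerator and an unbox for every element.
        static int Costly(IEnumerable<object> values)
        {
            int total = 0;
            foreach (int value in values)
                total += value;
            return total;
        }

        static void Main() { }
    }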

That’s an easy one, right? These two methods perform the same job, but one costs a lot in terms of CPU ticks… let’s see why. I use ILSpy (http://ilspy.net/) to look into the compiled code, so let’s analyze the IL (intermediate language) I get after Visual Studio builds it (the result has been unchanged over the years!).

Let’s start with the Cheap method:
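
The original listing was an image; this is an approximate reconstruction of that IL (offsets are illustrative and vary slightly between compiler versions):

    .method private hidebysig static int32 Cheap(int32[] values) cil managed
    {
        .locals init (int32 total, int32 i)
        IL_0000: ldc.i4.0          // total = 0
        IL_0001: stloc.0
        IL_0002: ldc.i4.0          // i = 0
        IL_0003: stloc.1
        IL_0004: br.s IL_0010      // jump to the loop condition
        IL_0006: ldloc.0           // total += values[i]
        IL_0007: ldarg.0
        IL_0008: ldloc.1
        IL_0009: ldelem.i4
        IL_000a: add
        IL_000b: stloc.0
        IL_000c: ldloc.1           // i++
        IL_000d: ldc.i4.1
        IL_000e: add
        IL_000f: stloc.1
        IL_0010: ldloc.1           // i < values.Length ?
        IL_0011: ldarg.0
        IL_0012: ldlen
        IL_0013: conv.i4
        IL_0014: blt.s IL_0006
        IL_0016: ldloc.0
        IL_0017: ret
    }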

So, nothing odd in the above: it’s pretty much what I would expect, a simple loop and a straight move of the value, nothing more.

Now let’s have a look at what we get in IL from the Costly method:
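
Again the original listing was an image; this approximate reconstruction shows the shape of the generated code (offsets are illustrative and may differ slightly from those quoted below):

    .method private hidebysig static int32 Costly(class [mscorlib]System.Collections.Generic.IEnumerable`1<object> values) cil managed
    {
        .locals init (int32 total,
                      class [mscorlib]System.Collections.Generic.IEnumerator`1<object> e,
                      int32 'value')
        IL_0000: ldc.i4.0          // total = 0
        IL_0001: stloc.0
        IL_0002: ldarg.0           // enumerator allocated on the heap
        IL_0003: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0> class [mscorlib]System.Collections.Generic.IEnumerable`1<object>::GetEnumerator()
        IL_0008: stloc.1
        .try
        {
            IL_0009: br.s IL_001b
            IL_000b: ldloc.1       // value = (int)e.Current
            IL_000c: callvirt instance !0 class [mscorlib]System.Collections.Generic.IEnumerator`1<object>::get_Current()
            IL_0011: unbox.any [mscorlib]System.Int32   // the costly unbox
            IL_0016: stloc.2
            IL_0017: ldloc.0       // total += value
            IL_0018: ldloc.2
            IL_0019: add
            IL_001a: stloc.0
            IL_001b: ldloc.1       // while (e.MoveNext())
            IL_001c: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
            IL_0021: brtrue.s IL_000b
            IL_0023: leave.s IL_002f
        }
        finally
        {
            IL_0025: ldloc.1       // dispose of the enumerator
            IL_0026: brfalse.s IL_002e
            IL_0028: ldloc.1
            IL_0029: callvirt instance void [mscorlib]System.IDisposable::Dispose()
            IL_002e: endfinally
        }
        IL_002f: ldloc.0
        IL_0030: ret
    }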

Well, well, well… it’s many lines longer, and it contains some quite nasty logic. As we can see, it allocates a generic enumerator (IL_0006) that is finally disposed of (IL_0028 to IL_002e), which puts load on the GC (Garbage Collector). Is that it? Not really! We can also see (IL_0015) the nasty unbox operation, one of the most costly and slow in the framework! Please also note how the end of the loop is caught by the finally clause in case something happens (mostly an invalid cast): not really code we would write in the first place, and still we get it just by using a foreach.

So, imagine having a few of these in your game logic executing at every frame… and real code is obviously never as simple as this example, so it will be way nastier than the result shown above.

We already struggle so much to keep our games above 30 FPS while presenting beautiful artwork (really costly to render) and a lot of nice VFX (visual effects, definitely costly), and we all love to rely on the underlying physics engine to improve the overall gaming experience: all of that costs quite a lot. So when it comes to the game logic we have to write, every clock cycle and CPU tick is valuable… we cannot possibly waste any of them, so let’s remember two rules of thumb:

  • Language helpers that make it easier to code come with a performance cost
  • Always verify the efficiency of your code habits looking into the generated IL code

In the games industry we are all aiming at improving gamers’ experiences, making them as immersive as technically possible: gamers are quite demanding, so let’s make sure that we always keep performance testing at the top of our coding practice, because losing even one frame in a second can be a failure factor from a market perspective.

How to securely store user credentials

The most common mistake is storing passwords in clear text, accompanied by the equally dangerous mistake of sending them in clear text over the network, the latter based on the naïve assumption that an SSL connection grants enough security.

SSL connectivity alone cannot guarantee security: every week new vulnerabilities are identified and fixed, and that shows how wrong it is to assume that SSL alone grants security. That assumption also implies that all users always install the latest security fixes, which is another quite wrong assumption.

Now let’s talk about “salting” mistakes, but first let’s define the term: when we add something to a password to make it more complex, we say that we are adding salt to it. For instance, if the user chooses mypass as his/her password, we might want to generate a random number to salt it, say 1234. We add that to the original password, getting the salted password mypass1234.

Salting the password and then storing the salt in the database is another common mistake. Why? Because by doing that we assume that the attacker is an entity outside the company, and that is an incorrect assumption. What about a disgruntled employee dumping the client database on the Torrent network? This is an issue we encounter frequently, trending all over the world because of current bad employment practices. We need to remember that threats do not come only from outside the company; they might just as well hit us from within, and those are the most dangerous.

Because of this, we should never store any salt in the database. Instead, we make sure that we code the algorithm that computes the salt in a deterministic way. We also need to ensure that the algorithm is known only to a close circle of people and that its parts have been developed within that “circle of trust”; if this is not possible, then make sure to break it down into methods that can be aggregated separately, and have those produced by different people. The full source code should never be available outside the company’s “circle of trust”.

One other definition we need before continuing is “hashing”. The actual dictionary definition brings us quite close to what we do: we break a word or phrase into pieces, scramble its bits and, in doing so, reduce it to a fixed-length string of hexadecimal values. To do this we implement algorithms that have been approved by NIST (and other organisations). In this algorithm we’ll use MD5, SHA256 and SHA512.

The password generation process in detail

The above implies that the first password hashing must always happen on the client side, with the result of that sent over to the server. But is this safe enough? Not entirely: we need to add “salt” to ensure security.

What does that mean? Let’s look at it from an attacker’s perspective. The common way to hack an account is to use lookup tables of common passwords: the hacker takes the hashed password found in the hacked database and runs it against a database of password hashes like the following:

  • https://hashkiller.co.uk/
  • https://crackstation.net/

The above are simple test resources; in reality any hacker has a seriously complete database compiled from full phraseological dictionaries and thesauruses in multiple languages: that’s what we are up against. Have you tested your email on Have I Been Pwned? That resource (quite nice!) can tell you whether your email was part of a known security breach, and what the breach was about. Most likely your email address and your hashed password have been compromised, and if the hash was simply the MD5 or SHA256 of the password itself, then a hacker relying on a good dictionary database will most likely get your password back in clear text.

Because of this we have to be very careful, starting with establishing a strong password protection algorithm. Mine is shown in the following diagram.

Let’s see this in a step-by-step practical example from a real-world algorithm in use in DFT Games Ltd games (in a very simplified version). In the sign-up phase, the user types his user name; let’s assume it’s the email address, as that is a common scenario (we force it to lower case to ensure determinism in the next steps):

jhon.doe@somedomain.com

then he types his password:

my password

Such weak passwords are painfully common, so we have to make sure we correct this, to protect the client and zero out our liabilities. To do that, let’s compute our salt from the user name. Here I use one of many possible approaches; any other is fine as long as it’s deterministic.

Because the first letter is “j”, whose ASCII value is 106, an even number, we pick all the characters in odd positions from the user name, so that

jhon.doe@somedomain.com

becomes for our purpose the following string:

jo.o@oeoancm

Now we hash this new string, and for this step a simple MD5 will be enough, giving us the following:

d9e2feaea42f0f4b6891f8030f357041

Now we have all we need to fix that weak password, so we chain it all together, adding a star character in between just to increase the complexity, getting the following string:

my password* d9e2feaea42f0f4b6891f8030f357041

Now this looks much better, and it is quite hard to hack using common tools. This is what we are now going to hash using SHA512, getting our first secure value as an all-lower-case hex string:

27677977880efb384b4ef40dbc8713650d6aa41e6b75619a6e29c540f9b75eae721958c35001bbf102b481847716699114bcb31dbf97f744a13ecb1f5e4eabc6

Well, how can we improve this even more? For instance, we can apply the same odd/even rule to this hash, making sure that, because “j” is even, every non-numeric character in an odd position is upper case, getting this final string:

27677977880eFb384b4eF40dBc8713650d6aA41e6b75619a6e29C540F9B75eAe721958C35001Bbf102B481847716699114BcB31dBf97f744A13eCb1f5e4eAbC6

This final string is not just the SHA512 of a salted password: it has also been parsed to apply a letter-casing rule derived from the username, further reducing the already dramatically low odds of cracking the code via any brute-force attack, with or without a hash database.

But… will this be enough? Not really! We could still be cracked given enough computing power, because the only sources of this hash are the username and the password: we need to add something on top of it on the server side. A good way to do this is the one I show in the diagram above. Basically, when the user signs up, the server creates the user record first, then takes the record’s time-stamp as a string and strips all the white space from it:

2016-11-0616:31:05.026

then it selects a part of this new string, maybe using an approach similar to the one used for the password by the client algorithm, but with some changes, like keying on the odd/even value of the third character of the password hash sent by the client, getting a result like this:

061663:506

Then we concatenate this new information to the hash received from the client, to salt it, and we hash the result again via SHA512, getting the final hash:

8da1b8d36898a72601a92311f63f442f55de12122ac698308e54d8db2b43b54f8597b08d109bbe0706765619a6e956d6d177e9488575b26a229e59223572fb39

If you want, you can also apply the upper-casing process to this result, to make it harder to crack. The result of all the above steps is the one we store in the database. We’ll perform the very same steps when the user signs in again in the future.
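
For illustration, a C# sketch of this server-side step (the fragment-selection rule below is a hypothetical one in the spirit of the example, not the exact rule that produced the fragment above):

    using System;
    using System.Security.Cryptography;
    using System.Text;

    static class ServerPasswordHasher
    {
        public static string Compute(string clientHash, DateTime recordCreatedUtc)
        {
            // Record time-stamp as a string with all white space stripped.
            string stamp = recordCreatedUtc
                .ToString("yyyy-MM-dd HH:mm:ss.fff")
                .Replace(" ", string.Empty);

            // Hypothetical rule: the even/odd value of the third character of
            // the client hash decides which positions of the stamp we keep.
            bool even = clientHash[2] % 2 == 0;
            var fragment = new StringBuilder();
            for (int i = 0; i < stamp.Length; i++)
                if ((i % 2 == 0) == even)
                    fragment.Append(stamp[i]);

            // Salt the client hash with the fragment and hash again with SHA512.
            using (var sha = SHA512.Create())
            {
                byte[] final = sha.ComputeHash(Encoding.UTF8.GetBytes(clientHash + fragment));
                var sb = new StringBuilder(final.Length * 2);
                foreach (byte b in final) sb.Append(b.ToString("x2"));
                return sb.ToString();
            }
        }
    }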

There is no dictionary attack that can crack this algorithm, no matter how good the hacker is… even the US NSA and CIA combined would fail to crack this, no matter how many resources they put on the task!

Naturally, the above steps are just a sample, a suggestion: be creative and original. Don’t just use it as it is in this article because… I just published it, so it’s now possible to crack it 🙂 Use different combinations to compute your salt on both client and server, and make sure it’s complex enough that it cannot be easily derived.

The actual full algorithm behind the above real-life implementation is in use in our family company, DFT Games Ltd., and the real details about the salt are known only to family members, provided via a scrambled DLL to the development teams for in-app implementation. Keeping the client and server algorithms secret is key to making this really secure. On top of this, we have two-factor authentication based on our own authenticator app (and algorithm) for specific high-security applications.

Women in Games, Latinos in Technology and so on

I have spent my whole life since I was 13 (I’m 57 as I write this) as a front-line Human Rights activist. I fought the good fight when women were treated like objects in western countries like Italy, France and the USA, and it was a hard fight. The same applies to the anti-racist battles, fought when the majority of people still thought that black people could only do manual work.

Back then it was hard, but in my view those battles have been won in western countries. Today’s society stigmatises racism and sexual discrimination, and I feel really good about that! My focus is now on helping these very same causes in the countries that still have these issues, mostly in Africa and the Middle East. However, what I see in western countries is not really to my liking, so I feel it necessary to say something about it.

It seems that an odd phenomenon is now going on: as women and racial groups are no longer oppressed, they now have so-called leaders who promote what are in reality racial and gender hate groups, keeping alive a partition that is no longer there, and leveraging the social stigma around discrimination to spread their lies.

They are reversing the phenomenon, organising groups like “Women in Games”, “Latinos in Technology” and so on, in the same way we see black people adopting strongly racially biased language, stressing “being black” as if it would somehow matter to any intelligent person…

Are there racist idiots out there? Sure! Are there “pussy-grabbing” males? Sure there are, but they are stigmatised; they are seen as old idiots, so let’s not empower them by keeping alive groups that are now meaningless.

Why should I even care whether a colleague is a woman, Latino or black? What does that mean today? Nothing at all, if you ask any intelligent person. A good developer is a good developer; I couldn’t care less about gender, skin colour or religion…

Moreover, on the sex-related issue: why “Women in Games”? We are winning a more current and socially significant struggle with the LGBT communities (leaving aside Mr Trump’s very narrow mind), so I can understand groups like LGBT In Tech, because that cause is current, essential and about to be won for good.

I hope to see no more of those anachronistic and sectarian groups keeping alive old issues that are no longer there, wasting our time and empowering a small minority of idiots. Let’s focus on real social problems: LGBT rights, women’s rights in third-world countries, freedom of information, child soldiers and the like. Let’s fight the good fight!