Artificial Intelligence Security: Introduction
This article is still being written. Last updated: 11/16/2021.
I am updating the content over time (there are a million things to talk about). You can forward your suggestions to cihan@deeplab.co.
Two of the most challenging problems in the IT sector, in software and in the broader technology landscape, are undoubtedly security and performance. Every large enterprise/company must make its most serious investments in these two areas while building and evolving its general infrastructure. The reason is that both topics are multidisciplinary: they touch hundreds, sometimes thousands, of sub-concepts, algorithms, and bodies of knowledge and experience, both technical and theoretical. Because of my interest in these two topics, I personally try to understand and apply the performance/security mindset with a specific focus on software, databases and artificial intelligence. In this article series, I will focus on artificial intelligence security.
Avoiding Confusion:
Generally, when AI Security is mentioned, it can be misunderstood as the use of artificial intelligence in the security industry. However, these are two different things.
AI for Cyber Security: The use of artificial intelligence in the cyber security industry. In other words, strengthening a product/service developed for cyber security, or a problem area within cyber security, with artificial intelligence.
AI Security: An artificial intelligence infrastructure and/or algorithm (model) has security vulnerabilities of its own (you can't even imagine their depth!), and this field covers the attack and defense mechanisms needed to keep those vulnerabilities from being manipulated and used for malicious purposes. Its goal, in short, is to protect the artificial intelligence infrastructure itself.
Apart from this article, you can watch the live broadcast recording in which I covered the same subject here:
https://www.youtube.com/watch?v=oN6twhEwRWw (Turkish)
You can also find some AI Security and AI for Security demo videos I’ve posted here:
AI Security Demos
- Attack on Image (CNN) with GANs & PyTorch
- GANs Adversarial Attack
AI for Cyber Security Demos
- Malicious URL Detection with Deep Learning (CNN/LSTM)
- Phishing URL/Website Detection
- SQL Injection Detection with Deep Learning
- Static Code Analysis with Deep Learning
- XSS Detection with Deep Learning
As I mentioned in the introduction, cyber security is a multidisciplinary topic that is deeply related to many fields, and when we add artificial intelligence, the picture becomes much more complex. I will try to untangle this complexity thoroughly but briefly; otherwise it would not be possible to convey the logic of a subject this deep correctly.
Therefore, I will gather my AI Security narrative under two main headings:
- AI Infrastructure Security
- AI Model Security
AI Infrastructure Security
This is the part that is deeply tied to general cyber security: the security and development aspects of databases, web, systems, hardware, architecture and many other concepts.
This heading falls within the AI Engineer's area of interest and expertise. The AI Engineer, simply put, is the person responsible for developing an end-to-end artificial intelligence project (model and infrastructure), moving it to production, and handling its development, maintenance, performance, scaling, tracking and monitoring processes, which we generally call AIOps/MLOps.
Those who want to join our AI Engineer and AI Security Engineer training programs: https://cihanozhan.com/ai
AI Model Security
This is the area that focuses entirely on the security and deception (hacking) of the artificial intelligence model (algorithm) itself.
It covers which security vulnerabilities a developed artificial intelligence model has, which techniques can manipulate those vulnerabilities and how, and the available protection methods. Like AI Infrastructure Security, this field is very deep in its own right, and it also includes vulnerabilities that have no exact solution due to logical and technical limitations (there is no 100% security). Still, protection methods for these vulnerabilities are actively researched, and the risks can be minimized by using different technologies and software techniques.
'About Me' in a nutshell (those who want to get straight to the point can skip this part)
In the early 2000s, I started my IT journey with hacking/security studies. Because of my interest in security research, I tried to examine the sub-headings of the techniques I used in depth. After working on security-focused studies with these techniques (OWASP, Data Security, etc.) for a long time, I went deeper to understand them better and moved toward software/database/artificial intelligence-oriented work; since then I have been advancing my software/database work and my security/performance work in parallel. In other words, I both develop products/projects in these areas, and security/performance come in the default package for me.
As a result of the software research driven by this security-oriented curiosity, I published Turkey's first 'Oracle Database Programming' course in 2010 and my book 'Advanced T-SQL Programming' on SQL Server in March 2013, and I have been using C#, Python and Go for years. I give training/consultancy on both advanced programming and corporate 'Secure Application Development'.
Cyber Security Overview
Before getting into artificial intelligence security, it is necessary to understand the general concept of Cyber Security and the logic underlying it. We'll go through this part very quickly.
Cyber security (IT Security) is a general umbrella term with many main headings under it. Some of these:
- Web Security
- Database Security
- Network Security
- System Security
- Application Security
- Hardware Security
Of course, these are very generalized and simply the most common topics we need; each of these headings has plenty of sub-branches. Two more have also been added to them in recent years:
- Blockchain Security
- Artificial Intelligence Security
If we branch out by jobs, roles and specialties, many more headings emerge. For example, there is a need for security-oriented specializations in the field of autonomous systems, which we have been working on for years:
- Autonomous Systems Security
- Self-Driving Car Security
- Autonomous Factory Security (Industrial Security)
- Smart Home Security
- Smart City Security
However, without straying from the subject, let me just give some information about a web-oriented approach:
OWASP (Open Web Application Security Project)
The OWASP organization has existed since 2001, when I started my security studies, and it has done (and continues to do) very useful work. Its main purpose is to compile the security risks encountered in web applications/projects and present them as reports and rankings at regular intervals. It does not only provide statistics; it also describes protection methods in a technology-agnostic way.
OWASP (Wikipedia) : https://en.wikipedia.org/wiki/OWASP
It has also published a list called the OWASP Top 10 since its early years; the list is updated every few years.
Here are a few resources for those curious about OWASP and developer-focused security:
To give a few software-focused examples of software/AI security and the depth of their interrelationship:
1 - Database systems and operating systems running under them can be manipulated with SQL.
With SQL queries, databases/database servers that have configuration errors or security weaknesses, or the operating systems running beneath them, can be hacked, because advanced SQL servers are also capable of running operating system commands.
The code above is from the security section of my publicly shared book. In those code samples, by reaching the operating system via SQL (through SQL Injection), information such as the IP/port or server name can be obtained, user/authorization operations can be performed on the operating system, the server's OS can be formatted with SQL, the server (IIS or another web server) can be shut down, and even the copier and the coffee machine on the server's network can be hacked... And of course, while all this is happening, the data in the relevant database can be changed/deleted at will. If you can access the operating system via SQL, the files on the OS (including your AI project's datasets, AI model and settings/parameter files) are accessible too.
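The examples referred to above are SQL Server-oriented; as a minimal, language-neutral sketch of the injection mechanism itself (my own illustration, using Python's built-in sqlite3 as a stand-in server and a made-up users table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "' OR '1'='1"  # attacker-controlled value

# Vulnerable: the input is pasted straight into the SQL text,
# so the injected OR '1'='1' clause matches every row.
vulnerable = conn.execute(
    f"SELECT * FROM users WHERE name = '{user_input}'"
).fetchall()

# Safer: a parameterized query treats the input as data, not as SQL.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()

print(vulnerable)  # [('alice', 'admin')] -> the injection worked
print(safe)        # [] -> no user is literally named "' OR '1'='1"
```

On a full-featured server such as SQL Server, the same injection point can go much further (for example toward OS commands), which is exactly the scenario described above.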
Also, we don't use the database just to save data. Over its years of existence, an application constantly consumes the millions/billions of records kept in its DB, including every insecure record that was never filtered or checked from a security standpoint. In other words, a malicious data row added to the database years ago can later be used as training data by your artificial intelligence application, causing it to poison itself.
2 - An application/DLL can be hacked and modified using a programming language and its infrastructure.
In this scenario, an attacker who gains access to your application source code can hack and modify your data sources, data paths, and AI pipeline/MLOps infrastructure. It also means that your entire Online Machine Learning (realtime learning) infrastructure, reachable through the hacked application, can be manipulated. Although difficult, one possible (and actually encountered) scenario is that, after the server systems are accessed, the relevant assembly files (lacking a security signature) are decompiled at runtime, the desired manipulations are made in the code, and the code is recompiled at runtime, changing how the system works. So even swapping out the ML pipeline is a plausible scenario in a professional hacking attack.
To understand the logic of decompiling code at runtime:
c-sharpcorner.com/UploadFile/84c85b/using-reflection-with-C-Sharp-net/
c-sharpcorner.com/uploadfile/puranindia/reflection-and-reflection-emit-in-C-Sharp/
See the topics 'Reflection' and 'Reflection.Emit'.
Since Python is used so frequently in ML/DL projects, it may be helpful to focus specifically on this language (see the sketch after the resource list below).
For Python, you can start with:
- www.python.org/dev/security/
- snyk.io/blog/python-security-best-practices-cheat-sheet/
- py.checkio.org/blog/how-to-write-secure-code-in-python/
- securecoding.com/blog/python-security-practices-you-should-maintain/
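The resources above cover general Python hygiene; one ML-specific risk worth a concrete illustration (my addition, not taken from those links) is that pickle, the default serialization format behind many Python model files, executes code while being loaded. A minimal sketch:

```python
import os
import pickle

# A malicious "model" file: unpickling it runs an OS command.
class MaliciousModel:
    def __reduce__(self):
        # Whatever this returns is called back during pickle.loads().
        return (os.system, ("echo pwned",))

payload = pickle.dumps(MaliciousModel())

# The victim assumes this is a trained model and loads it blindly;
# the command above runs with the application's privileges.
pickle.loads(payload)
```

This is why model artifacts should be signed/hashed, loaded only from trusted storage, or kept in formats with integrity checks built into the MLOps pipeline.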
I also recommend reviewing Snyk's open source security tooling.
My favorite programming languages are Go, Python, Rust and C#. Along these lines, I am sharing a .NET-oriented review document.
.NET-focused resource: https://res-4.cloudinary.com/eventpower/image/upload/v1/19ncs/presentation_files/sao3ktg3tjrhairwd7hb.pptx.pdf
Regarding C#, you can start with:
- https://snyk.io/blog/snyk-code-security-scanning-c-sharp-dot-net/
- https://www.microsoft.com/en-us/securityengineering/sdl/practices
Of the programming languages I mentioned above, Go, Rust and C# are designed with a 'security focus' by default. I won't go into an in-depth security analysis of each, but overall, all three have very strong language designs. I usually prefer Go as the back-end for large-scale projects, and Rust for networking, high-security or performance-critical projects (or the relevant parts of a project). In general, though, Go is a language in which you can develop both back-ends and ML/DL models comfortably, soundly and with high performance.
You can also start with Go here:
- https://github.com/OWASP/Go-SCP
- https://golang.org/security
- https://snyk.io/blog/go-security-cheatsheet-for-go-developers/
Rust is also used in demanding projects within a certain niche segment of the industry thanks to its high security and performance. I think the rise of Go and Rust in AI-driven projects is unstoppable.
Finally, a few resources on Rust:
- https://www.rust-lang.org/policies/security
- https://github.com/rust-secure-code/projects
- https://anssi-fr.github.io/rust-guide/
Unfortunately, whether in web, mobile or AI, the root of security problems is often the developer's limited knowledge of the relevant technology/programming language and of its security perspective and depth. To draw attention to this, I have listed a few starting resources for each language. In general, you can search for 'Secure Application Development' or 'Secure Coding'.
3 - Manipulating decision mechanisms by directly targeting the OWASP-style (web, mobile, etc.) vulnerabilities of the artificial intelligence application architecture, along with hundreds of other critical security vulnerabilities.
Many security researchers or programmers think that security vulnerabilities are limited to OWASP, and this is a very common mistake.
If we must generalize, security vulnerabilities split roughly 50/50 between back-end and front-end sources. Securing a system therefore requires attention not only to the database and the back-end software, but to the front-end to the same degree. Moreover, the client-side ML/DL projects built with JavaScript in recent years (such as TensorFlow.js) are not to be underestimated: protecting an algorithm developed for the client side, and the service offered through it, against manipulation is just as important as protecting the back-end.
Online Machine Learning (Realtime ML): The technique/method/infrastructure that keeps an artificial intelligence model learning continuously by using both the instantaneous/current data obtained in the production environment and the data in existing datasets (a minimal sketch follows the links below).
- Wikipedia : https://en.wikipedia.org/wiki/Online_machine_learning
- Medium : https://medium.com/value-stream-design/online-machine-learning-515556ff72c5
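A minimal sketch of the idea (my own example using scikit-learn's partial_fit; the data here is synthetic):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()          # an incremental (online) learner
classes = np.array([0, 1])
rng = np.random.default_rng(42)

# Simulate a stream: each batch stands in for fresh production data.
for step in range(10):
    X_batch = rng.normal(size=(32, 4))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    # partial_fit updates the model without retraining from scratch.
    # This is exactly why poisoned streaming data is dangerous:
    # malicious samples fed here shift the decision boundary over time.
    model.partial_fit(X_batch, y_batch, classes=classes)

print(model.predict(rng.normal(size=(3, 4))))
```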
Introduction to Artificial Intelligence
In recent years, the rise of artificial intelligence has continued at a much higher rate than in past decades.
For those who have not studied artificial intelligence, I recommend first following the guide above. To understand the security perspective in artificial intelligence, one must first understand the concept itself correctly.
The Trick : ‘Learning by Algorithms’
Anything we call data or a system can be hacked.
Model
What we define as the learning algorithm in Machine Learning/Deep Learning projects is the model.
ML Model = ML Algorithm
For more information on the MNIST model and code sample:
- https://www.tensorflow.org/tutorials/quickstart/beginner
- https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb
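The quickstart linked above contains the full walkthrough; a condensed sketch along the same lines (a small feed-forward classifier on MNIST):

```python
import tensorflow as tf

# Load and normalize the MNIST digits (28x28 grayscale images).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10),   # logits for the 10 digit classes
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(x_train, y_train, epochs=1)
model.evaluate(x_test, y_test, verbose=2)
```

We will refer back to a model like this when we look at adversarial examples further down.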
First, let's try to analyze an artificial intelligence project in production and understand the underlying logic of its potential security problems.
The first thing to realize when developing a real artificial intelligence product/service is that the artificial intelligence algorithm (model) occupies a very small place within the overall software, BigData and Data Engineering architecture. This is the 'ML Code' box highlighted in the image above, from the document Google published in 2015. At that time, ML Code was estimated at about 5% of the project; today it may make up at most 2%, because of the proliferation of software products and services we now use within the AI Infrastructure/Architecture to manage and maintain it.
To review the "Hidden Technical Debt in Machine Learning Systems" paper I mentioned above:
https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
Machine Learning Model Lifecycle
Every piece of software and every system has a lifecycle, and the situation is no different for our ML/DL algorithms.
To explain the ML Model Lifecycle a little more:
Data preparation, the training process, packaging, validation, deployment of the model, and then the monitoring processes. The lifecycle image above represents the most primitive lifecycle of a model; the complexity of this cycle grows with criteria such as the project's goals and size.
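A deliberately oversimplified sketch of those stages (every function name here is a hypothetical placeholder, not a real API):

```python
# Each stage of the lifecycle is also an attack surface:
# poisoned data, tampered artifacts, hijacked deployments,
# or manipulated monitoring signals.

def prepare_data(raw):           # collect, clean, label, split
    ...

def train(dataset):              # fit the model on the training split
    ...

def validate(model, dataset):    # check metrics against a threshold
    ...

def package(model):              # serialize and version the artifact
    ...

def deploy(artifact):            # push to the serving environment
    ...

def monitor(endpoint):           # watch drift, performance and abuse
    ...
```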
Deploying a Machine Learning Project
The release process of an ML project is broadly similar to the deployment process of a web or mobile application, but there are some critical differences.
Above you can see a process running roughly from development to production. And of course, there are specializations responsible for each of these stages. Although AI Engineers may design/develop end-to-end systems in general, in large teams these stages are distributed across different specializations. You can see an example of this below.
The Data Engineer role in this image can be easy to misunderstand: this person is not responsible for data labeling. :) Let me explain the role with a simple formula:
Data Engineer = (DevOps + AI)
This person does not need to be an end-to-end AI expert. They are simply expected to be a DevOps persona with general knowledge of the AI project's model and production concerns.
Artificial Intelligence, Data and Security Relationship
As mentioned, artificial intelligence applications have security vulnerabilities of their own. We can visualize some of them as follows.
Through these weaknesses, the decisions of an artificial intelligence system can be altered, a developed model can be stolen by hackers, the structure/logic of the ML system can be changed through manipulation, the AI infrastructure can be destroyed, and the data used for training can be manipulated so that the system produces false/biased results, among many other malicious scenarios.
I plan to explain these security vulnerabilities separately over time. Unfortunately, each is too detailed to fit into a single article.
Now let’s take a closer look at the data types part.
We work on various types of data in AI projects. Let’s take a look at these data types and potential security issues.
Data types we use for training models developed in Artificial Intelligence:
- Text
- File
- Image
- Voice
- Video
- 3D Object
- Frequency
- Data
Our main data types are as above, and each has many sub-areas of work. For example, you can work on AI through a game or simulation, but that still falls into the image/video/voice or even text category: voice if you will work with the sounds in the simulation, video if you will work over streaming visual data (frames), or 3D Object if you are working on a 3D physical/virtual object. You can adapt this to any industry. The only difference between working on image data (images, video) taken from MRI devices in healthcare and an AI project that recognizes cats/dogs is domain (medical) expertise.
We’ve talked about data types and logic so far, but why?
This is because every digital asset (and in fact physical ones too) has security vulnerabilities. For example, long before artificial intelligence, attackers have for years embedded noise data (also called poison) in picture, video or audio files in order to manipulate them or to hack the computers that download them into their memory/buffer. Those who want to examine this subject can start with steganography (a small code sketch follows the resource links below):
Steganography : Steganography is the practice of concealing a message within another message or a physical object. In computing/electronic contexts, a computer file, message, image, or video is concealed within another file, message, image, or video.
Wikipedia : https://en.wikipedia.org/wiki/Steganography
Comptia : https://www.comptia.org/blog/what-is-steganography
- https://www.sentinelone.com/blog/hiding-code-inside-images-malware-steganography/
- https://www.virusbulletin.com/virusbulletin/2016/04/how-it-works-steganography-hides-malware-image-files/
- https://blog.reversinglabs.com/blog/malware-in-images
- https://null-byte.wonderhowto.com/how-to/hide-virus-inside-fake-picture-0168183/
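To make the idea concrete, here is a tiny, self-contained sketch of least-significant-bit (LSB) steganography (my own illustration; real-world payload hiding is far more sophisticated than this):

```python
import numpy as np

def hide_message(image: np.ndarray, message: bytes) -> np.ndarray:
    """Hide bytes in the least significant bit of each pixel value."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = image.flatten()
    if bits.size > flat.size:
        raise ValueError("message too large for this image")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # overwrite LSBs
    return flat.reshape(image.shape)

def extract_message(image: np.ndarray, length: int) -> bytes:
    """Read `length` bytes back out of the LSBs."""
    bits = image.flatten()[: length * 8] & 1
    return np.packbits(bits).tobytes()

# A fake 64x64 grayscale "photo"; to the eye, the stego copy looks identical.
cover = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
secret = b"hidden payload"
stego = hide_message(cover, secret)
print(extract_message(stego, len(secret)))  # b'hidden payload'
```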
Video format-oriented resources:
- https://www.opswat.com/blog/can-video-file-contain-virus
- https://securityintelligence.com/articles/how-video-became-a-dangerous-delivery-vehicle-for-malware-attacks/
Video and Sound Based Attacks
Computer vision and artificial intelligence-oriented studies on image data have been in our lives for decades. It plays a critical role in everything from healthcare to education, from our computers to our smartphones, from unmanned cars to military systems. So, are we working on security issues in this area?
- Hacking Risk for Computer Vision Systems in Autonomous Cars
- Robust Physical-World Attacks on Deep Learning Visual Classification
- Hacking Autonomous Vehicles: Is This Why We Don’t Have Self-Driving Cars Yet?
- The Role of Machine Learning in Autonomous Vehicles
- How Adversarial Attacks Could Destabilize Military AI Systems
Some examples of Adversarial Attacks…
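As a simple, widely known illustration of how such attacks work (my own addition, not one of the demos listed earlier), the Fast Gradient Sign Method (FGSM) perturbs an input in the direction that increases the model's loss, so an image that looks unchanged to a human gets misclassified:

```python
import tensorflow as tf

def fgsm_perturb(model, image, label, epsilon=0.1):
    """Fast Gradient Sign Method: nudge each pixel slightly in the
    direction that increases the loss, producing an adversarial image."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)                  # track gradients w.r.t. the input
        prediction = model(image)
        loss = loss_fn(label, prediction)
    gradient = tape.gradient(loss, image)
    adversarial = image + epsilon * tf.sign(gradient)
    return tf.clip_by_value(adversarial, 0.0, 1.0)

# Usage sketch, assuming the MNIST-style model trained earlier:
# x = x_test[:1].astype("float32")
# x_adv = fgsm_perturb(model, x, y_test[:1])
# print(model(x).numpy().argmax(), model(x_adv).numpy().argmax())
```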
NLP-Focused Attacks
Similar attack scenarios can also target Natural Language Processing (NLP), one of the sub-titles of Machine Learning.
Frequency data is also used in artificial intelligence for defense, medicine, everyday life and many more advanced purposes. So, is it possible to hack a frequency, and if so, how?
- Radio Hacking: Cars, Hardware, and more! — Samy Kamkar — AppSec California 2016 (https://www.youtube.com/watch?v=1RipwqJG50c)
- A New Threat for Pseudorange-Based RAIM: Adversarial Attacks on GNSS Positioning
In fact, the frequency issue goes much deeper... Considering that everything in the universe has its own frequency and operates through frequencies, it is possible both to benefit by developing AI-oriented products in this field, and to cause individual and mass harm by maliciously manipulating these frequencies and the AI projects built on them.
Here are a few resources for those who want to understand the connection between frequency, the universe, humans, and natural life:
- Understanding Emotions
- Emotion classification based on brain wave: a survey
- High-Frequency Electroencephalographic Activity in Left Temporal Area Is Associated with Pleasant Emotion Induced by Video Clips
- Deep Learning for Signal Processing: What You Need to Know
- Model-Aided Deep Learning Method for Path Loss Prediction in Mobile Communication Systems at 2.6 GHz
As even these few sources among hundreds of thousands of studies show, signals/frequencies are present everywhere, from the human brain and emotions to the furnishings in your home, from smart cars to satellites in space, and both software and artificial intelligence work can be done on them. And wherever software and artificial intelligence exist, the importance of their security is indisputable.
Cihan Ozhan