Sling Academy

JavaScript: Ways to Remove CSS and Scripts from Raw HTML

Last updated: February 01, 2024

Introduction

When building web applications, you sometimes need to strip CSS and scripts from raw HTML. Common reasons include security (preventing malicious scripts from executing), data extraction, and adapting content for an environment where CSS and scripts are not needed.

Understanding the Situation

Raw HTML typically comes from one of two sources: server response data or the DOM (Document Object Model) itself. In either case, it is usually a string containing a mix of elements, text, <script> tags, <link> tags (for CSS), <style> tags, and style attributes.

The methods of removing CSS and scripts vary depending on the situation:

  • Using browser-based JavaScript to manipulate the DOM.
  • Handling HTML strings within a Node.js environment.

Let’s examine ways to handle this in both contexts.

Browser-based JavaScript

Removing Script Tags

const container = document.createElement('div');
container.innerHTML = rawHTML;
container.querySelectorAll('script').forEach(script => script.remove());

In the above example, we create an element as a container and then inject the raw HTML as its innerHTML. This parses the HTML into a temporary DOM structure within the container. We then select and remove all script tags using querySelectorAll and the remove method.

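Removing Stylesheet Links

container.querySelectorAll('link[rel~="stylesheet"]').forEach(link => link.remove());

The same procedure removes <link> tags that reference external stylesheets. The ~= operator matches any rel value that contains stylesheet as one of its space-separated tokens, so values such as rel="alternate stylesheet" are covered as well.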

Removing Inline Styles

container.querySelectorAll('[style]').forEach(el => el.removeAttribute('style'));

This removes the style attribute from every element that has one, stripping all inline CSS.

Removing Style Tags

container.querySelectorAll('style').forEach(style => style.remove());

Any <style> tags within the HTML are also removed using the same technique.

The processed HTML can then be extracted using container.innerHTML.
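
The removal steps above can be combined into one helper. The following is a minimal sketch, not a complete sanitizer: the function name stripScriptsAndStyles and the sample input are illustrative and do not come from the snippets above. It also assumes the input may be untrusted, so it parses with DOMParser instead of assigning innerHTML on a div. DOMParser produces an inert document, whereas markup assigned through innerHTML can still trigger inline handlers (for example an onerror attribute on an <img>) even while the container is detached.

```javascript
// Everything that should be removed as a whole element.
const REMOVABLE = 'script, style, link[rel~="stylesheet"]';

// Strips scripts and CSS from an already-parsed subtree.
// `root` can be any node that supports querySelectorAll
// (an Element, a Document, or a DocumentFragment).
function stripScriptsAndStyles(root) {
  root.querySelectorAll(REMOVABLE).forEach(node => node.remove());
  root.querySelectorAll('[style]').forEach(el => el.removeAttribute('style'));
  return root;
}

// Browser usage, guarded so the snippet does nothing where DOMParser is missing.
if (typeof DOMParser !== 'undefined') {
  // The closing tag is written as <\/script> so this also works in an inline script.
  const rawHTML = '<p style="color:red">Hi</p><script>alert(1)<\/script>';
  const doc = new DOMParser().parseFromString(rawHTML, 'text/html');
  stripScriptsAndStyles(doc);
  console.log(doc.body.innerHTML); // <p>Hi</p>
}
```

Because the helper only needs querySelectorAll, the same function should also work on the container element used earlier or on a document created by jsdom.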

Node.js Environment

Node.js has no built-in DOM, so the browser approach does not work there as is. Instead, we use modules such as jsdom, which implements the DOM in JavaScript, or cheerio, which parses HTML and exposes a jQuery-style API for traversing and modifying it.

Using jsdom

const { JSDOM } = require('jsdom');
const dom = new JSDOM(rawHTML);
const { window } = dom;

const scriptElements = window.document.querySelectorAll('script');
scriptElements.forEach(script => script.remove());

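The JSDOM constructor parses the HTML string and provides a `window` object that simulates the browser’s window. We use it to remove script elements; the same querySelectorAll calls from the browser section apply to <link> tags, <style> tags, and style attributes.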
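
The jsdom snippet above removes only <script> elements and stops before producing output. A possible completion, assuming the jsdom package is installed, is sketched below. The function name stripWithJsdom and its optional second parameter are illustrative; the parameter exists only so that the jsdom module can be replaced with a stub in tests.

```javascript
// Sketch: strip scripts and CSS from an HTML string with jsdom.
// `jsdomModule` defaults to the real package; require() runs only when
// the caller does not supply a replacement.
function stripWithJsdom(rawHTML, jsdomModule = require('jsdom')) {
  const dom = new jsdomModule.JSDOM(rawHTML);
  const { document } = dom.window;

  // Remove whole elements first, then inline style attributes.
  document
    .querySelectorAll('script, style, link[rel~="stylesheet"]')
    .forEach(node => node.remove());
  document
    .querySelectorAll('[style]')
    .forEach(el => el.removeAttribute('style'));

  // serialize() returns the whole document, including <html>, <head>, and <body>.
  return dom.serialize();
}
```

When the input is a fragment rather than a full page, reading document.body.innerHTML instead of calling dom.serialize() avoids the added <html>, <head>, and <body> wrappers.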

Using Cheerio

const cheerio = require('cheerio');
const $ = cheerio.load(rawHTML);
$('script').remove();
$('link[rel~="stylesheet"]').remove(); // ~= also matches multi-token values such as rel="alternate stylesheet"
$('[style]').removeAttr('style');
$('style').remove();

Cheerio provides a jQuery-like API for the server. We can use familiar jQuery syntax to manipulate the loaded HTML string.
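
The Cheerio snippet above modifies the parsed tree but does not show how to get the cleaned markup back. One way, sketched here on the assumption that the cheerio package is installed, is to call $.html(). The function name stripWithCheerio and the injectable second parameter are illustrative. Passing false as the third argument to cheerio.load is a choice made for this sketch: it treats the input as a fragment so that Cheerio does not wrap the output in <html>, <head>, and <body>.

```javascript
// Sketch: strip scripts and CSS from an HTML string with Cheerio.
// `cheerioModule` defaults to the real package; require() runs only when
// the caller does not supply a replacement.
function stripWithCheerio(rawHTML, cheerioModule = require('cheerio')) {
  // Third argument false: parse as a fragment instead of a full document.
  const $ = cheerioModule.load(rawHTML, null, false);

  $('script, style, link[rel~="stylesheet"]').remove();
  $('[style]').removeAttr('style');

  // $.html() serializes the current state of the parsed tree.
  return $.html();
}
```

For full pages, dropping the second and third arguments to load keeps Cheerio's default document mode, in which the output retains the <html>, <head>, and <body> structure.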

Considerations

When you’re removing scripts and styles:

  • Removing <script> tags alone does not protect against XSS: inline event-handler attributes such as onclick, and javascript: URLs, can still run code. Sanitize untrusted HTML thoroughly before rendering it.
  • For large HTML documents, performance matters; consider stream-based processing.
  • Removing scripts and styles can break the behavior and layout of the content, so check that the result still works as intended.
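
To illustrate the XSS point, the sketch below removes two further script carriers that the selectors in this article do not touch: on* event-handler attributes and URL attributes that use the javascript: scheme. The function names and the URL_ATTRIBUTES list are assumptions made for this example, and the list is not exhaustive. For genuinely untrusted input, a maintained sanitizer library is a safer choice than a hand-written filter like this one.

```javascript
// URL-valued attributes that can carry a "javascript:" scheme (illustrative, not exhaustive).
const URL_ATTRIBUTES = new Set(['href', 'src', 'action', 'formaction', 'xlink:href']);

// Returns true when an attribute can execute script on its own.
function isScriptBearingAttribute(name, value) {
  const attr = name.toLowerCase();
  if (attr.startsWith('on')) return true; // onclick, onerror, onload, ...
  if (URL_ATTRIBUTES.has(attr)) {
    // Drop whitespace and control characters before checking the scheme,
    // since URL parsers tolerate some of them around and inside it.
    const normalized = value.replace(/[\u0000-\u0020]/g, '').toLowerCase();
    return normalized.startsWith('javascript:');
  }
  return false;
}

// Removes script-bearing attributes from every element under `root`.
// `root` is any node that supports querySelectorAll.
function stripScriptBearingAttributes(root) {
  root.querySelectorAll('*').forEach(el => {
    // Copy the attribute list first, because it changes as attributes are removed.
    Array.from(el.attributes).forEach(({ name, value }) => {
      if (isScriptBearingAttribute(name, value)) el.removeAttribute(name);
    });
  });
  return root;
}
```

In the browser or with jsdom, this could run on the same parsed document right after the script and style elements have been removed.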

In conclusion, JavaScript offers multiple ways to remove CSS and scripts from raw HTML, suitable for both browser-based applications and server-side processing with Node.js. The methods shared in this article serve as a foundation, and you can adapt them to fit the specific requirements of your projects.
