This project involves a comprehensive analysis of Netflix's movies and TV shows data using SQL. The goal is to extract valuable insights and answer various business questions based on the dataset. The following README provides a detailed account of the project's objectives, business problems, solutions, findings, and conclusions.
- Analyze the distribution of content types: Movies vs TV shows.
- Identify the most common ratings: Understand the most frequent ratings for movies and TV shows.
- List and analyze content: Group content by release years, countries, and durations.
- Explore and categorize content: Use specific criteria and keywords to classify content.
The data for this project is sourced from the Kaggle dataset:
- Dataset Link : Movie Dataset
CREATE TABLE netflix
(
show_id VARCHAR(6),
type VARCHAR(10),
title VARCHAR(110),
director VARCHAR(210),
casts VARCHAR(1000),
country VARCHAR(200),
date_added VARCHAR(50),
release_year INT,
rating VARCHAR(10),
duration VARCHAR(15),
listed_in VARCHAR(250),
description VARCHAR(500)
)SELECT
type,
COUNT(*) AS Total_Content
FROM netflix
GROUP BY type;SELECT
type,
rating
FROM
(SELECT
type,
rating,
COUNT(*) AS frequency_of_rating,
ROW_NUMBER() OVER(PARTITION BY type ORDER BY COUNT(*) DESC)AS ranking
FROM netflix
GROUP BY type, rating) AS t1
WHERE ranking = 1SELECT * FROM netflix
WHERE
type = 'Movie'
AND
release_year = 2020SELECT TOP 5
TRIM(value) AS new_country,
COUNT(DISTINCT n.show_id) AS total_content
FROM netflix AS n
CROSS APPLY STRING_SPLIT(n.country, ',')
GROUP BY TRIM(value)
ORDER BY total_content DESC;SELECT TOP 1
title,
CAST(LEFT(duration, CHARINDEX(' ', duration) - 1) AS INT) AS movie_length
FROM netflix
WHERE type = 'Movie'
ORDER BY movie_length DESC;SELECT
title,
date_added
FROM netflix
WHERE date_added >= DATEADD(YEAR, -5, GETDATE())
ORDER BY date_added DESC;SELECT
title,
type
FROM netflix
WHERE director LIKE '%Rajiv Chilaka%';SELECT
title,
duration
FROM netflix
WHERE type = 'TV Show'
AND CAST(LEFT(duration, CHARINDEX(' ', duration) - 1) AS INT) > 5;SELECT
value AS genre,
COUNT(show_id) AS number_of_content
FROM netflix
CROSS APPLY STRING_SPLIT(listed_in, ',')
GROUP BY value
ORDER BY number_of_content DESC;SELECT
YEAR(date_added) AS date_added_year,
COUNT(*) AS no_of_content,
ROUND(
CAST(COUNT(*) AS FLOAT) / (SELECT CAST(COUNT(*) AS FLOAT) FROM netflix WHERE country LIKE '%India%') * 100, 2
) AS average_content
FROM netflix
WHERE country LIKE '%India%'
GROUP BY YEAR(date_added);SELECT *
FROM netflix
WHERE listed_in LIKE '%Documentaries%';SELECT *
FROM netflix
WHERE director IS NULL;SELECT
COUNT(*) AS no_of_movies_actor_appeared
FROM netflix
WHERE casts LIKE '%Salman Khan%'
AND release_year > YEAR(GETDATE()) - 10;SELECT TOP 10
value AS actor,
COUNT(DISTINCT show_id) AS no_of_times_actor_appeared
FROM netflix
CROSS APPLY STRING_SPLIT(casts,',')
WHERE country LIKE '%India%'
GROUP BY value
ORDER BY no_of_times_actor_appeared DESC;15. Categorize the content based on the presence of the keywords 'kill' and 'violence' in the description field. Label content contatining these keywords as 'Not Kid Friendly' and all other content as 'Kid Friendly'. Count how many items fall into each category.
WITH new_table AS
(SELECT
*,
CASE
WHEN description LIKE '%kill%' OR description LIKE '%violence%' THEN 'not_kid_friendly'
ELSE 'kid_friendly'
END AS category
FROM netflix)
SELECT
category,
COUNT(*) as total_content
FROM new_table
GROUP BY category;Objective: Categorize content as 'not_kid_friendly' if it contains 'kill' or 'violence' and 'kid_friendly' otherwise. Count the number of items in each category.
- Content Distribution: The dataset contains a diverse range of movies and TV shows with varying ratings and genres.
- Common Ratings: Insights into the most common ratings provide an understanding of the content's target audience.
- Geographical Insights: The top countries and the average content releases by India highlight regional content distribution.
- Content Categorization: Categorizing content based on specific keywords helps in understanding the nature of content available on Netflix.
This analysis provides a comprehensive view of Netflix's content and can help inform content strategy and decision-making.
- Email: anusandrawilliam@gmail.com
- LinkedIn: Anu Sandra William
