AI News Hub

Python 101

DEV Community
Renee Mumbi

PYTHON

Python is an interpreted, high-level, general-purpose programming language designed for readability and simplicity. Created by Guido van Rossum and first released in 1991, it has become one of the most popular languages in the world thanks to its English-like syntax and versatility. Python is like oxygen in the tech world: it is one of the most influential programming languages because it prioritizes simplicity and readability. It allows developers to turn complex ideas into working code quickly, often with fewer lines and less complexity than many other languages.

Before analysis can begin, the data must be cleaned to remove errors and inconsistencies. This process is known as data cleaning, and it is a step that cannot be skipped in data analytics. Python is well suited to cleaning data because libraries such as Pandas provide simple, powerful tools to detect and fix common problems. Poor-quality data leads to inaccurate insights and unreliable decisions. Cleaning ensures that:
- Calculations are correct
- Visualizations are meaningful
- Models are trustworthy
- Business decisions are based on accurate information

Several data issues come up often in practice. For example:

Missing values - Data entries that are absent, blank, or recorded as NULL. This is a problem because many calculations cannot handle nulls, so the code may raise errors or return misleading results.
Duplicate records - Identical rows that appear multiple times in a dataset, also known as redundant data. They lead to wrong conclusions because some records are overrepresented.
Incorrect data types - Data stored in a format that does not match its nature, such as numbers stored as text when they should be stored as int (integer). Arithmetic on a number stored with the wrong data type either fails with an error or gives an unexpected result, so check a value's data type before manipulating it.
Inconsistent text formatting - Data that has different representations for the same entity, such as "I.D" and "ID", or variations in casing.
Invalid entries - Data that is structurally correct but illogical or outside the allowed range.

Dirty data ruins the integrity of any analysis built on it. Fixing the issues above produces high-quality data that can be trusted when making decisions, in whichever field the data is used.

Python is easy for a beginner to understand because its clear syntax resembles English. There are also plenty of free resources, including YouTube tutorials, documentation, and articles, as well as developers on platforms such as Reddit, X, GitHub, and Discord who can help when you are stuck. Many organizations use Python for work such as analytics, software engineering, and machine learning.

The print() function is used for output in various formats, and the input() function enables interaction with users. To display text or a value, use the print() function. Let's start with the famous hello world, one of the first examples in most programming tutorials: passing the greeting to print() in double quotes displays it as output. Text in Python must be inside quotes, either double quotes (") or single quotes ('). The hello world greeting is quoted because it is text, which Python calls a string. You can also use the print() function to display numbers. Numbers are written without quotes; passing the number to print() is enough. It is also possible to show text and numbers on the same line by separating them with commas in a single print() call.

Comments can be used to explain Python code, make it more readable, and prevent a line from executing while the code is being tested.
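The print() calls and comments described above can be sketched as follows. This is a minimal illustration, not the article's original listing: the greeting text, the number 2026, and the variable name years are assumptions chosen for the example.

```python
# A comment starts with # and is ignored when the program runs.

# Text (a string) must be inside quotes; double or single quotes both work.
print("Hello, World!")
print('Hello, World!')

# Numbers are passed without quotes.
print(2026)

# A comma combines text and numbers in one output.
# print() places a space between the items automatically.
years = 25  # illustrative value, not from the article
print("I am", years, "years old")

# Commenting out a line prevents it from running while you test.
# print("This line is skipped.")
```

Running the sketch displays the greeting twice, then 2026, then "I am 25 years old"; the last print() call never runs because it is commented out.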
Comments matter because they help anyone who works with the code, who may not be its author, understand what it does. Use them only where necessary.

Python's popularity in data analytics is backed by a collection of libraries that simplify the work. The libraries can be grouped by the role they play. The following are some of them:

Pandas is a powerful, open-source Python library designed for data manipulation and analysis. It is the industry standard for handling tabular data, often described as a hybrid between Excel and SQL because of its ability to clean, filter, and analyze large datasets using code. Pandas organizes data as Series and DataFrames; a DataFrame resembles a spreadsheet. Key functionalities in Pandas:
- Data input/output
- Data cleaning
- Selection and filtering
- Transformation
- Time series

NumPy is a fundamental package for scientific computing. It provides high-performance array objects and advanced functions that are significantly faster than standard Python lists.

Polars is an open-source DataFrame library for Python and Rust designed to handle large datasets with extreme efficiency. It is often used as a high-performance alternative to Pandas, particularly when dealing with data that is too large for traditional tools but doesn't yet require a distributed system like Spark.

Matplotlib is the foundational library for creating static, animated, and interactive visualizations. It offers precise control over every element of a chart but requires more code for complex designs. It is used to create charts and graphs such as line plots, bar charts, and histograms.

Database toolkits such as SQLAlchemy connect Python to databases so that data can be queried using SQL.

Requests is used to retrieve data from APIs and other web sources. It allows you to send HTTP requests extremely easily, with no need to manually add query strings to your URLs.
Dask is designed for parallel computing, allowing you to handle datasets larger than your computer's memory.

PySpark is used for processing massive datasets across distributed clusters.

As noted earlier, libraries such as Pandas provide simple, powerful tools to detect and fix common data problems. Cleaning tasks include:
- Removing duplicate rows
- Filling or deleting missing values
- Converting data types
- Filtering out invalid records

One example of cleaning is converting a data type: a value stored as a string is converted to an integer, and its type is then checked to confirm that the change took effect.

Once the data has been cleaned, it can be analyzed for patterns, trends, and relationships. Common operations include computing averages, finding totals, and checking how datasets relate to each other. For example, you can find the total across the various products in a dataset.

Visualization turns abstract numbers into easy-to-understand charts and graphs.

Basic Plotting - Matplotlib is the foundation for creating line charts, bar graphs, and histograms to show distributions and trends over time.
Statistical Visuals - Seaborn is built on top of Matplotlib and provides more attractive, high-level graphics like heatmaps and box plots to identify outliers or data density.
Interactive Dashboards - Tools like Streamlit or Plotly allow analysts to turn their static visualizations into interactive web apps for stakeholders to explore.

Hospitals use Python to calculate patient ages from birth records, analyze prescriptions across age groups, and verify that medications and dosages are appropriate. This helps improve patient safety, regulatory compliance, and evidence-based clinical decision-making through efficient healthcare data analytics.
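For the hospital scenario, a minimal sketch of an age calculation might look like this. The function name age() and its parameters current_year and birth_year follow the article's description of the example; the sample years and the variable patient_age are assumptions for illustration.

```python
def age(current_year, birth_year):
    """Return a person's age in years, ignoring month and day."""
    return current_year - birth_year


# Illustrative values, not taken from real patient records.
patient_age = age(2025, 1990)
print(patient_age)  # 35
```

Because the function only compares years, the result can be off by one for someone whose birthday has not yet occurred in the current year; a real system would compare full dates.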
Python functions allow us to package a task into reusable code. A simple example is calculating a person's age by subtracting their birth year from the current year. A function for this, age(), accepts two inputs: current_year and birth_year. Inside the function, the return statement performs the subtraction and sends the result back to the caller.

Supermarkets use Python to calculate profits, analyze product performance, and identify the most profitable items. This helps managers optimize pricing, inventory, and purchasing decisions through efficient retail data analytics. For example, a function calculate_profit() accepts two inputs: selling_price and buying_price. Inside the function, the return statement subtracts the buying price from the selling price and sends the resulting profit back to the caller. When the function is called as calculate_profit(40000, 2000), Python calculates a profit of 38,000. In a supermarket, this can be used to determine how much profit is earned on each product and to identify the most profitable items.

Cryptocurrency companies in Kenya use Python to convert Bitcoin prices into Kenyan shillings, analyze market trends, and monitor exchange rates. This helps improve trading decisions, risk management, and financial forecasting through efficient financial data analytics. For example, btc_priceusd contains Bitcoin prices in US dollars, and usd_ksh_rate stores the exchange rate. A for loop processes each price, converts it to Kenyan shillings, and stores the result in btc_ksh. A print() statement displays both values.

Python is an excellent programming language for beginners because its syntax is clean and easy to read, almost like plain English. This makes it easier to understand core programming concepts without getting overwhelmed by complicated rules.
Python is also used in many areas, including web development, automation, data analytics, and artificial intelligence, making it a valuable skill to learn. Because beginners can build useful programs quickly and Python is widely used in industry, it provides both an accessible starting point and strong long-term career opportunities.