How to Perform Time Series Analysis Using Python?

Condition for Performing Time Series Analysis Using Python

  • Description:
    Time series analysis examines data points recorded at successive, usually regular, time intervals. In Python, libraries such as pandas, matplotlib, seaborn, and statsmodels support preprocessing, visualization, and forecasting of time series data. This guide shows how to plot a series, decompose it into trend and seasonal components, test it for stationarity, and fit a statistical model (ARIMA) to forecast future values.
Step-by-Step Process
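  • Data Collection:
    Choose a dataset with a timestamp for each observation, ideally recorded at a regular interval (daily, monthly, etc.).
  • Data Preprocessing:
    Parse the timestamps, set them as a regularly spaced index, and handle missing values.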
  • Exploratory Data Analysis (EDA):
    Visualize trends, seasonal components, and any potential anomalies.
  • Modeling:
    Choose a suitable model for forecasting (ARIMA, SARIMA, or machine learning models like Random Forest).
  • Evaluation:
    Assess model performance using suitable metrics (RMSE, MAE, etc.).
  • Forecasting:
    Make future predictions based on the trained model.
  • Visualization:
    Use plots (line charts, heatmaps, etc.) to visualize trends, seasonal patterns, and model forecasts.
  • Conclusion:
    Summarize findings and model accuracy.
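The preprocessing step above can be sketched as follows. This is a minimal illustration, not a prescribed method: the `raw` DataFrame, its column names (`date`, `value`), and the helper `preprocess` are hypothetical, and linear interpolation is only one of several reasonable ways to fill gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: daily readings with one missing day (Jan 3)
# and one recorded-but-empty value (Jan 4).
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05", "2024-01-06"],
    "value": [10.0, 12.0, np.nan, 18.0, 20.0],
})


def preprocess(frame: pd.DataFrame) -> pd.Series:
    """Return a regularly spaced daily series with gaps filled by interpolation."""
    series = (
        frame.assign(date=pd.to_datetime(frame["date"]))  # parse timestamps
        .set_index("date")["value"]
        .sort_index()
    )
    # asfreq inserts absent calendar days as NaN so the index becomes regular.
    series = series.asfreq("D")
    # Fill NaN values by linear interpolation between the nearest known points.
    return series.interpolate(method="linear")


clean = preprocess(raw)
print(clean)
```

A regular index matters because models such as ARIMA in statsmodels use the index frequency to date their forecasts.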
Why Should We Choose Time Series Analysis?
  • Forecasting: Predict future values based on past data.
  • Trend Analysis: Identify long-term trends in the data.
  • Seasonality: Capture periodic fluctuations that can be crucial for business decisions.
  • Anomaly Detection: Detect outliers or unusual behavior in the data.
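As one illustration of the anomaly detection use case, the sketch below flags points that sit far from a trailing rolling mean. The function name `rolling_zscore_anomalies`, the window of 12, and the threshold of 3 are assumptions chosen for the example; a rolling z-score is a simple heuristic, not the only approach.

```python
import numpy as np
import pandas as pd


def rolling_zscore_anomalies(series: pd.Series, window: int = 12,
                             threshold: float = 3.0) -> pd.Series:
    """Flag points more than `threshold` trailing standard deviations
    away from the trailing mean."""
    # shift(1) so each point is compared only with the points before it.
    baseline = series.shift(1).rolling(window)
    z = (series - baseline.mean()) / baseline.std()
    # Points without a full trailing window have NaN z-scores and are not flagged.
    return z.abs() > threshold


# Synthetic daily data with one injected spike at position 40.
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=60, freq="D")
values = pd.Series(rng.normal(loc=50, scale=1.0, size=60), index=idx)
values.iloc[40] += 15

flags = rolling_zscore_anomalies(values)
print(values[flags])
```

Because the baseline is estimated from only 12 points, occasional false positives are possible; a longer window or a higher threshold trades sensitivity for stability.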
Sample Source Code
  • # Import necessary libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller, acf, pacf
    from statsmodels.tsa.seasonal import seasonal_decompose
    from statsmodels.tsa.arima.model import ARIMA

    # Sample time series data generation (for demonstration purposes)
    np.random.seed(42)
    # Monthly index ("MS" = month start); the "M" month-end alias is deprecated in pandas 2.2+
    date_range = pd.date_range(start="2020-01-01", end="2024-01-01", freq="MS")
    data = np.random.randn(len(date_range)) + np.linspace(0, 10, len(date_range))

    # Create DataFrame
    df = pd.DataFrame(data, index=date_range, columns=["Value"])

    # Plot the original time series
    plt.figure(figsize=(10,6))
    plt.plot(df, label='Original Time Series')
    plt.title("Time Series Plot")
    plt.xlabel("Date")
    plt.ylabel("Value")
    plt.legend()
    plt.show()

    # Decompose the time series
    decomposition = seasonal_decompose(df, model='additive', period=12)
    decomposition.plot()
    plt.show()

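    # ADF test for stationarity (null hypothesis: the series has a unit root, i.e. is non-stationary)
    result = adfuller(df["Value"])
    print("ADF Statistic:", result[0])
    print("p-value:", result[1])
    # A p-value above 0.05 means non-stationarity cannot be rejected, which motivates differencing (d=1 below)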

    # Compute ACF and PACF; pacf requires nlags to be less than half the sample size
    lag_acf = acf(df["Value"], nlags=20)
    lag_pacf = pacf(df["Value"], nlags=20)

    # Plot ACF and PACF side by side
    plt.figure(figsize=(12, 6))
    plt.subplot(121)
    plt.plot(lag_acf)
    plt.title("ACF (AutoCorrelation Function)")
    plt.subplot(122)
    plt.plot(lag_pacf)
    plt.title("PACF (Partial AutoCorrelation Function)")
    plt.show()

    # Fit ARIMA model (Example: ARIMA(1,1,1))
    model = ARIMA(df, order=(1, 1, 1))
    model_fit = model.fit()

    # Print the ARIMA model summary
    print(model_fit.summary())

    # Forecasting the next 10 periods
    forecast = model_fit.forecast(steps=10)
    print("Forecasted values for next 10 periods:", forecast)

    # Plot the forecast (the forecast Series carries its own date index, continuing from the last observation)
    plt.figure(figsize=(10,6))
    plt.plot(df, label='Original Time Series')
    plt.plot(forecast.index, forecast, label='Forecasted Values', color='red')
    plt.title("Time Series Forecast")
    plt.xlabel("Date")
    plt.ylabel("Value")
    plt.legend()
    plt.show()

Screenshots
  • original time series