utils.py API

Utility functions for the data visualization application.

Provides data analysis, data cleaning, interaction with the Gemini and Claude models, and plot generation.

utils.analyze_data(df: DataFrame) → dict

Analyzes the DataFrame for data quality issues (missing values, duplicates, data types).
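The three checks named above can be sketched with plain pandas calls. This is only an illustration of the idea, not the module's implementation: the function name analyze_data_sketch and the dictionary keys are assumptions, since the source does not document the shape of the returned dict.

```python
import pandas as pd


def analyze_data_sketch(df: pd.DataFrame) -> dict:
    """Hypothetical stand-in for utils.analyze_data; key names are assumed."""
    return {
        # Number of missing values per column
        "missing_values": {col: int(n) for col, n in df.isna().sum().items()},
        # Number of rows that repeat an earlier row exactly
        "duplicate_rows": int(df.duplicated().sum()),
        # Data type of each column, as a string
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }


df = pd.DataFrame({"a": [1.0, None, 1.0], "b": ["x", "y", "x"]})
report = analyze_data_sketch(df)
print(report)
```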

utils.apply_fixes_to_data(df: DataFrame) → tuple[DataFrame, str]

Applies basic data cleaning (removes duplicates, fills numerical NaNs with mean).
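A minimal sketch of the two cleaning steps described above, assuming duplicates are dropped before the means are computed and that the returned string is a short report of what changed. The name apply_fixes_sketch and the report wording are hypothetical; the real function may order or describe the steps differently.

```python
import pandas as pd


def apply_fixes_sketch(df: pd.DataFrame) -> tuple[pd.DataFrame, str]:
    """Hypothetical stand-in for utils.apply_fixes_to_data."""
    # Step 1: drop exact duplicate rows (assumed to happen first)
    cleaned = df.drop_duplicates().copy()
    removed = len(df) - len(cleaned)

    # Step 2: fill NaNs in numeric columns with that column's mean
    filled = 0
    for col in cleaned.select_dtypes(include="number").columns:
        filled += int(cleaned[col].isna().sum())
        cleaned[col] = cleaned[col].fillna(cleaned[col].mean())

    report = (
        f"Removed {removed} duplicate rows; "
        f"filled {filled} numeric NaNs with column means."
    )
    return cleaned, report


raw = pd.DataFrame({"x": [1.0, 1.0, None, 3.0], "y": ["a", "a", "b", "c"]})
cleaned, report = apply_fixes_sketch(raw)
print(report)
```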

utils.create_dataset_summary(df: DataFrame) → str

Generates a textual summary of the dataset.

utils.exec_code_to_generate_plot(code_str, df)

Executes Python code (provided as a string) to generate a plot.

This function takes code generated by an LLM and tries to run it. It includes workarounds for common errors in LLM-generated plotting code. The generated plot is returned as a base64-encoded PNG image.

Args:

code_str (str): The Python code to execute (base64 encoded).

df (pandas.DataFrame): The DataFrame to use for plotting.

Returns:

str or None: A base64-encoded string representing the generated plot image, or None if an error occurred during code execution.
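The general pattern (run a code string against a DataFrame, capture the current matplotlib figure, return it as base64 PNG text, and return None on failure) might look roughly like the sketch below. It is an assumption-laden illustration: exec_plot_code_sketch is a hypothetical name, the sketch takes plain Python source text, and it omits the workarounds for LLM-generated code that the real function is said to include. Because exec runs arbitrary code, a pattern like this is only appropriate for code you are prepared to trust.

```python
import base64
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd


def exec_plot_code_sketch(code_str: str, df: pd.DataFrame):
    """Hypothetical stand-in for utils.exec_code_to_generate_plot."""
    # Names made visible to the executed code (an assumed set)
    namespace = {"df": df, "pd": pd, "plt": plt}
    try:
        exec(code_str, namespace)
        buffer = io.BytesIO()
        plt.savefig(buffer, format="png")  # saves the current figure
        return base64.b64encode(buffer.getvalue()).decode("ascii")
    except Exception:
        # Any error in the executed code yields None
        return None
    finally:
        plt.close("all")  # avoid leaking figures between calls


data = pd.DataFrame({"a": [1, 2, 3], "b": [2, 4, 8]})
good = exec_plot_code_sketch("plt.plot(df['a'], df['b'])", data)
bad = exec_plot_code_sketch("raise ValueError('boom')", data)
```

Here good holds base64 text that decodes to PNG bytes, while bad is None because the executed code raised.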

utils.filter_data_by_top_variables(df: DataFrame, column_name: str, top_n_variables: list) → DataFrame

Filters the DataFrame to include only the rows whose value in the given column is one of the top N values.

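Parameters:

df (DataFrame): The DataFrame to filter.

column_name (str): The column to filter on.

top_n_variables (list): The top N values to keep.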

Returns:

DataFrame: The filtered DataFrame.
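One plausible way to express this filter is a boolean mask built with Series.isin. The sketch below assumes that approach; filter_by_values_sketch is a hypothetical name and the module may implement the filter differently.

```python
import pandas as pd


def filter_by_values_sketch(
    df: pd.DataFrame, column_name: str, top_n_variables: list
) -> pd.DataFrame:
    """Hypothetical stand-in for utils.filter_data_by_top_variables."""
    # Keep only rows whose value in column_name is in the given list
    return df[df[column_name].isin(top_n_variables)]


cities = pd.DataFrame({"city": ["x", "y", "z", "x"], "sales": [10, 20, 30, 40]})
filtered = filter_by_values_sketch(cities, "city", ["x", "z"])
print(filtered)
```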

utils.fix_json(json_str: str) → str

Attempts to fix common JSON formatting errors (trailing commas).
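As an illustration of the trailing-comma case, the sketch below removes a comma that directly precedes a closing brace or bracket. It assumes a regex-based repair; fix_json_sketch is a hypothetical name, and this simple pattern would also alter a matching comma that happens to sit inside a string value.

```python
import json
import re


def fix_json_sketch(json_str: str) -> str:
    """Hypothetical stand-in for utils.fix_json (trailing commas only)."""
    # Drop a comma followed only by whitespace and then } or ]
    return re.sub(r",(\s*[}\]])", r"\1", json_str)


broken = '{"a": [1, 2,], "b": 3,}'
repaired = fix_json_sketch(broken)
parsed = json.loads(repaired)
print(parsed)
```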

utils.generate_graph_interpretation_claude(suggestion_text: str, dataset_summary: str, api_key: str) → str

Generates a graph interpretation using the Claude language model.

utils.generate_graph_interpretation_gemini(suggestion_text: str, dataset_summary: str, api_key: str) → str

Generates a graph interpretation using the Gemini language model.

utils.get_plot_suggestion_from_claude(df: DataFrame, api_key: str) → list | None

Gets plot suggestions and Python code from Claude, with retries.

utils.get_plot_suggestion_from_gemini(df: DataFrame, api_key: str) → list | None

Gets plot suggestions and Python code from Gemini, with retries.

utils.get_top_n_variables(df: DataFrame, column_name: str, n: int = 10) → list

Gets the top N most frequent values in a column.

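Parameters:

df (DataFrame): The DataFrame to inspect.

column_name (str): The column whose values are counted.

n (int): The number of most frequent values to return. Defaults to 10.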

Returns:

A list of the top N most frequent values.
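A frequency ranking like this is commonly computed with Series.value_counts, which sorts values by descending count. The sketch below assumes that approach; top_n_values_sketch is a hypothetical name, and how the real function handles ties or missing values is not stated in the source.

```python
import pandas as pd


def top_n_values_sketch(df: pd.DataFrame, column_name: str, n: int = 10) -> list:
    """Hypothetical stand-in for utils.get_top_n_variables."""
    # value_counts() orders values from most to least frequent
    return df[column_name].value_counts().head(n).index.tolist()


letters = pd.DataFrame({"letter": ["a", "a", "a", "b", "b", "c"]})
top_two = top_n_values_sketch(letters, "letter", n=2)
print(top_two)
```

A list produced this way has the shape that the top_n_variables argument of filter_data_by_top_variables expects.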

utils.handle_graph_communication_claude(image_data: bytes, dataset_str: str, user_message: str, api_key: str) → str

Handles user-model communication about a graph image using Claude.

utils.handle_graph_communication_gemini(image_data: bytes, dataset_str: str, user_message: str, api_key: str) → str

Handles user-model communication about a graph image using Gemini.