The 30 Best AI Prompts for Data Scientists and ML Engineers

March 20, 2026 by Promptzy

If you work with data every day, you know the pattern: the analysis is often the easy part. Explaining it to a non-technical stakeholder, writing clean documentation, or turning a messy dataset into a compelling story is where the time disappears. These prompts handle exactly that, along with the technical side.

Most prompts use {{clipboard}} where you'd paste in your data, code, or context; a few use [bracketed] placeholders that you fill in yourself. Most are AI-tool agnostic: they work in ChatGPT, Claude, or Gemini equally well.


Exploratory Data Analysis

1. First-look EDA brief

I've just loaded a new dataset. Here's the output of df.describe() and the first 10 rows:

{{clipboard}}

Give me a structured EDA brief: data shape, obvious data quality issues, distributions worth investigating, columns that look correlated at a glance, and the 3 most important questions I should answer before modelling.
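
To produce the text that goes in place of {{clipboard}}, a small helper along these lines may be enough. This is only a sketch: the function name `build_eda_context` and the toy DataFrame are made up for illustration, and you may prefer a different mix of summaries.

```python
import pandas as pd


def build_eda_context(df: pd.DataFrame, n_rows: int = 10) -> str:
    """Return shape, dtypes, describe() and the first rows as one text block."""
    parts = [
        f"Shape: {df.shape[0]} rows x {df.shape[1]} columns",
        "Dtypes:\n" + df.dtypes.to_string(),
        "df.describe(include='all'):\n" + df.describe(include="all").to_string(),
        f"First {n_rows} rows:\n" + df.head(n_rows).to_string(),
    ]
    return "\n\n".join(parts)


# Hypothetical toy data, only to show the output format.
toy = pd.DataFrame(
    {
        "age": [34, 51, None, 29],
        "plan": ["free", "pro", "pro", "free"],
        "monthly_spend": [0.0, 49.0, 49.0, 0.0],
    }
)
context = build_eda_context(toy)
print(context)  # copy this output and paste it into the prompt
```

Including the dtypes alongside `describe()` gives the model a better chance of spotting numeric columns that were read in as strings.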

2. Data quality audit

Audit this dataset summary for data quality issues:

{{clipboard}}

Flag: missing values, outliers that look like encoding errors, columns with suspicious cardinality, date range issues, and any columns that might be duplicates or leaky features. Be specific about column names.
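
The "dataset summary" this prompt expects could be a per-column table of dtype, missing values, and cardinality. The sketch below assumes pandas; `column_summary` and the toy data are hypothetical names used for illustration.

```python
import pandas as pd


def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing counts, and cardinality."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "pct_missing": (df.isna().mean() * 100).round(1),
            "n_unique": df.nunique(),  # NaN is not counted as a value
        }
    )


# Hypothetical toy data with a missing date, a suspicious 1970 date,
# and inconsistent country casing.
toy = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "signup_date": ["2024-01-03", "2024-01-03", None, "1970-01-01"],
        "country": ["DE", "DE", "de", "FR"],
    }
)
summary = column_summary(toy)
print(summary.to_string())  # paste this table into the prompt
```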

3. Feature engineering suggestions

Given this dataset schema and the prediction target, suggest 5-10 feature engineering ideas that could improve model performance:

{{clipboard}}

Include: interaction terms worth trying, time-based features if dates are present, encoding strategies for high-cardinality categoricals, and any domain-specific transformations that seem obvious from the column names.

Model Explanation & Interpretation

4. Explain a model to a non-technical stakeholder

I need to explain this machine learning model to a non-technical business stakeholder. Here are the model type, features, and performance metrics:

{{clipboard}}

Write a plain-English explanation (max 200 words) that covers: what the model predicts, what inputs it uses, how accurate it is, and what the business should do with its predictions. No jargon.

5. Interpret SHAP values

Here are SHAP values from my model:

{{clipboard}}

Explain what these tell us about feature importance and model behaviour. Which features are driving predictions most? Are there any surprising or counterintuitive findings worth investigating? What would you recommend looking into next?

6. Write a model card

Write a model card for this ML model. Here are the details:

{{clipboard}}

Include: model overview, intended use, training data summary, performance metrics by subgroup if available, known limitations, and how it should/shouldn't be used. Format it clearly for internal documentation.

SQL Query Generation & Review

7. Generate SQL from a plain English description

Write a SQL query for the following question. Here's the database schema:

{{clipboard}}

Question: [describe what you need]

Use standard SQL. Add comments explaining the logic. If there are multiple approaches, show the most readable one and briefly note alternatives.

8. Review and optimise a SQL query

Review this SQL query for correctness, performance, and readability:

{{clipboard}}

Point out: any logic errors, joins that could be expensive at scale, indexes that would help, places where the query could be simplified, and any edge cases it doesn't handle (NULLs, duplicates, etc.).

9. Translate SQL to pandas (or vice versa)

Convert this SQL query to pandas code:

{{clipboard}}

Maintain the same logic. Use idiomatic pandas — avoid loops. Add a comment above each block explaining what step it corresponds to in the SQL.
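
For a sense of the output to expect, here is a hand-written translation of a small query. The `orders` table, its columns, and the query itself are invented for this example.

```python
import pandas as pd

# SQL being translated (hypothetical):
#   SELECT region, SUM(amount) AS total
#   FROM orders
#   WHERE status = 'paid'
#   GROUP BY region
#   ORDER BY total DESC;

orders = pd.DataFrame(
    {
        "region": ["EU", "US", "EU", "US", "APAC"],
        "amount": [100.0, 250.0, 50.0, 75.0, 30.0],
        "status": ["paid", "paid", "refunded", "paid", "paid"],
    }
)

# WHERE status = 'paid'
paid = orders[orders["status"] == "paid"]

# GROUP BY region, SUM(amount) AS total
totals = paid.groupby("region", as_index=False).agg(total=("amount", "sum"))

# ORDER BY total DESC
result = totals.sort_values("total", ascending=False).reset_index(drop=True)
print(result)
```

A translation that keeps one pandas step per SQL clause, as here, is easier to check against the original query than a single chained expression.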

Data Storytelling & Communication

10. Turn analysis results into an executive summary

Here are the results of a data analysis:

{{clipboard}}

Write a 3-paragraph executive summary for a business audience. Paragraph 1: the key finding in one sentence. Paragraph 2: what drove it (the data story). Paragraph 3: the recommended action. No charts, no numbers that don't need to be there, no hedging.

11. Write a hypothesis for an A/B test

I want to run an A/B test on the following change:

{{clipboard}}

Write a formal hypothesis statement: the proposed change, the metric we're measuring, the expected direction and magnitude of the effect, and how we'll know if the test was successful. Also flag any risks or confounders we should control for.

12. Interpret A/B test results

Here are the results of an A/B test:

{{clipboard}}

Tell me: Is the result statistically significant? Is it practically significant? What caveats apply (sample size, segment differences, novelty effect)? What's the decision — ship it, don't ship it, or run longer?
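
It can help to compute the significance check yourself and compare it with the model's answer. The sketch below assumes the metric is a conversion rate and uses a two-sided, pooled two-proportion z-test, which is one common choice rather than the only valid one; the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical counts: control converts 200/5000, variant converts 250/5000.
lift, z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"absolute lift={lift:.4f}  z={z:.2f}  p={p_value:.4f}")
```

A small p-value only addresses statistical significance; whether a one-point lift is worth shipping is the practical-significance question the prompt asks separately.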

Statistical Interpretation

13. Explain a statistical concept in plain English

Explain [statistical concept] to someone who knows basic statistics but isn't a statistician. Use one concrete example, avoid formulas unless essential, and tell me when I'd actually use this vs when I wouldn't.

14. Diagnose a regression model

Here are the diagnostics from a linear regression:

{{clipboard}}

Walk me through what each diagnostic tells us. Are the assumptions met? What violations do you see? What would you try to fix them (transformation, different model, different features)?

15. Design an experiment

I want to test whether [hypothesis]. Help me design a proper experiment:

- What's the correct statistical test?
- What sample size do I need for 80% power at a 0.05 significance level (α = 0.05), given the smallest effect I care about detecting?
- What control variables should I include?
- What would invalidate the results?
- What would "success" look like?
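
The sample-size question can be sanity-checked by hand. The sketch below assumes the outcome is a proportion (for example a conversion rate) and uses the usual normal-approximation formula for comparing two proportions; the baseline and expected rates are hypothetical inputs.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(
    p_baseline: float, p_expected: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Approximate n per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p_baseline + p_expected) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_baseline * (1 - p_baseline) + p_expected * (1 - p_expected))
    ) ** 2
    return ceil(numerator / (p_expected - p_baseline) ** 2)


# Hypothetical scenario: detect a lift from 4% to 5% conversion.
n = sample_size_per_group(p_baseline=0.04, p_expected=0.05)
print(f"about {n} users per group")
```

Halving the effect you want to detect roughly quadruples the required sample, which is why the prompt works best when you state the smallest effect that would change your decision.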

Python Code Assistance

16. Debug a Python data error

I'm getting this error in my Python data code:

{{clipboard}}

Explain what's causing it, show the fix, and tell me if there's a broader data quality issue that might have caused it.

17. Write unit tests for a data pipeline

Write pytest unit tests for this data transformation function:

{{clipboard}}

Cover: normal inputs, edge cases (empty DataFrames, nulls, wrong dtypes), and any business logic assertions. Use fixtures where appropriate.

18. Refactor a messy data script

Refactor this data processing script for readability and maintainability:

{{clipboard}}

Apply: functions for repeated logic, descriptive variable names, inline comments for non-obvious steps, and split it into logical sections. Don't change the output — just clean up the code.

Reporting & Documentation

19. Write a data dictionary

Generate a data dictionary for this table schema:

{{clipboard}}

For each column, write: column name, data type, description (inferred from context), example values if obvious, and any business rules or constraints that apply.

20. Summarise a research paper's methodology

Summarise the methodology of this research paper section:

{{clipboard}}

Tell me: what they did, what assumptions they made, what could go wrong with this approach, and whether the conclusions follow from the method. Be critical but fair.

21. Write a stakeholder update on a data project

Write a weekly stakeholder update email for this data project. Here's what happened this week:

{{clipboard}}

Format: what was planned, what was completed, what's blocked, what's next. Keep it under 150 words. No jargon. If something is behind, say so plainly.

Machine Learning Engineering

22. Review a model training script

Review this model training script for correctness and best practices:

{{clipboard}}

Check for: data leakage, train/val/test split issues, incorrect loss function for the task, missing reproducibility seeds, evaluation metric appropriateness, and any obvious bugs.
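
Two items on that checklist, reproducibility seeds and preprocessing leakage, are easy to show in miniature. The sketch below uses only the standard library and a single hypothetical feature column; a real script would more likely use a library scaler, but the principle is the same.

```python
import random
from statistics import mean, pstdev


def split_then_scale(values, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, split, then standardise using train-only statistics."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(values)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Compute the scaling statistics on the training split only. Computing them
    # on the full dataset would leak test-set information into preprocessing.
    mu, sigma = mean(train), pstdev(train)

    def scale(xs):
        return [(x - mu) / sigma for x in xs]

    return scale(train), scale(test)


values = [float(v) for v in range(1, 51)]  # hypothetical feature column
train_scaled, test_scaled = split_then_scale(values)
print(len(train_scaled), len(test_scaled))
```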

23. Write a problem statement for a classification task

I'm building a [classification task] model. Help me write a clear problem statement that includes: the target variable and its classes, why each class matters to the business, the evaluation metric I should optimise for and why, and the baseline I should compare against.

24. Evaluate model fairness

Here are model performance metrics broken down by demographic groups:

{{clipboard}}

Analyse for fairness concerns. Are there significant performance disparities? Which groups are most affected? What are the potential causes? What mitigation strategies would you recommend?
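
If you do not yet have the per-group breakdown, something like the following could produce it for a binary classifier. The function name, the record layout, and the toy predictions are assumptions made for this sketch, and the three metrics shown are a starting point rather than a complete fairness audit.

```python
from collections import defaultdict


def metrics_by_group(records):
    """records: iterable of (group, y_true, y_pred) tuples with binary 0/1 labels."""
    counts = defaultdict(
        lambda: {"n": 0, "correct": 0, "pos": 0, "true_pos": 0, "pred_pos": 0}
    )
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(y_true == y_pred)
        c["pos"] += int(y_true == 1)
        c["true_pos"] += int(y_true == 1 and y_pred == 1)
        c["pred_pos"] += int(y_pred == 1)

    return {
        group: {
            "n": c["n"],
            "accuracy": c["correct"] / c["n"],
            # Recall is undefined for a group with no positive examples.
            "recall": c["true_pos"] / c["pos"] if c["pos"] else None,
            "selection_rate": c["pred_pos"] / c["n"],
        }
        for group, c in counts.items()
    }


# Hypothetical predictions for two groups, A and B.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 0), ("B", 1, 1), ("B", 0, 0), ("B", 0, 0),
]
by_group = metrics_by_group(records)
for group, metrics in by_group.items():
    print(group, metrics)
```

In this toy data both groups have the same accuracy but different recall and selection rates, which is the kind of disparity an overall accuracy number hides.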

Exploration Prompts

25. Generate hypotheses from data patterns

Here's an interesting pattern I found in the data:

{{clipboard}}

Generate 5 hypotheses that could explain it. For each: state the hypothesis clearly, say what additional data would confirm or refute it, and say how confident you'd be in the explanation if that data confirmed it.

26. Suggest ML approaches for a problem

I have this business problem and dataset:

{{clipboard}}

Suggest 3-5 ML approaches I could take, from simplest to most complex. For each: describe the approach, its pros and cons for this specific problem, what performance metric makes sense, and when you'd move to the next level of complexity.

27. Write a data pipeline design doc

Help me write a design document for a data pipeline that does the following:

{{clipboard}}

Structure: problem statement, proposed solution, data flow diagram (text description), components and their responsibilities, error handling strategy, monitoring approach, and open questions.

Miscellaneous

28. Clean up messy column names

Here are column names from a raw dataset:

{{clipboard}}

Rename them to follow snake_case, be descriptive, be consistent, and remove any special characters. Return a Python dict mapping old name → new name.
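
The mechanical half of this task (casing and special characters) can also be done locally, which leaves the AI tool to handle the part that needs judgement: making names descriptive and consistent. A rough sketch, with a hypothetical helper name and made-up column names:

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a raw column name to snake_case (mechanical cleanup only)."""
    # Insert an underscore at camelCase boundaries: orderDate -> order_Date
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


raw_columns = ["Customer ID", "orderDate", "Total $ (USD)", " e-mail "]
mapping = {old: to_snake_case(old) for old in raw_columns}
print(mapping)
```

This sketch does not detect collisions (two raw names that clean to the same result), so check the mapping for duplicate values before passing it to `DataFrame.rename`.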

29. Write a README for a data project

Write a README for this data science project:

{{clipboard}}

Include: project overview (one paragraph), data sources, how to set up the environment, how to run the pipeline, key outputs and where to find them, and any known limitations or caveats.

30. Post-mortem analysis of a failed model

This model failed to meet performance targets in production. Here are the details:

{{clipboard}}

Write a post-mortem: what went wrong, what warning signs were missed during development, what we'd do differently, and what monitoring we should add to catch this earlier next time.

These 30 prompts cover most of what data scientists and ML engineers reach for in AI tools daily. If you're using several of them regularly, the quickest workflow upgrade is storing them in a prompt manager where you can fire any of them in under 2 seconds — Promptzy is built for exactly this, and it's free to start.
