AI Engineer Projects in Delhi
NanoMech: The Real-Time Multimodal AI Trading Assistant 📈🤖
Built for the Gemini Hackathon
Traders know that in the market, seconds equal dollars. By the time you switch between your chart, your analysis tools, and your risk calculator, the candle has already moved, and your setup is gone.
I wanted to fix this. For the Gemini Hackathon, I built NanoMech—an AI that sits on top of your screen, sees exactly what you see, and gives you a complete trade plan in seconds. No API keys required, and no leaving your chart.
💡 The Solution
NanoMech runs as two frameless, transparent overlays on top of any trading platform. It uses Google Gemini 2.5 Flash's multimodal vision capabilities to visually read your screen and deliver instant insights.
Overlay 1: Market Analysis
Trend: Analyzes bullish/bearish market structure and moving average crossovers.
Liquidity: Evaluates order book depth, bid/ask walls, and support/resistance zones.
Momentum: Breaks down candlestick patterns, volume behavior, and price velocity.
Overlay 2: Trade Setup & Risk Management
AI-Extracted Targets: Instantly provides Entry Price, Target Price, and Stop Loss.
Live CALC Engine: Calculates Risk Amount ($), Position Size (Units), and Risk-to-Reward (R:R) Ratio. Everything updates live as you type in your desired risk percentage.
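The arithmetic behind the CALC engine can be sketched in a few lines. This is a minimal illustration, not NanoMech's actual code: the function name compute_risk, the RiskPlan container, and the account balance input are assumptions, and the formulas are the standard fixed-percentage risk ones (risk amount = balance × risk %, position size = risk amount ÷ distance from entry to stop).

```python
from dataclasses import dataclass


@dataclass
class RiskPlan:
    risk_amount: float    # money at risk if the stop is hit
    position_size: float  # units to buy or sell
    risk_reward: float    # reward per unit of risk (R:R)


def compute_risk(balance, risk_percent, entry, target, stop):
    """Hypothetical helper: derive the three CALC values from the extracted prices."""
    risk_per_unit = abs(entry - stop)
    if risk_per_unit == 0:
        raise ValueError("entry and stop must differ")
    risk_amount = balance * risk_percent / 100
    position_size = risk_amount / risk_per_unit
    risk_reward = abs(target - entry) / risk_per_unit
    return RiskPlan(risk_amount, position_size, risk_reward)


# Example: 10,000 account, 1% risk, long from 100 with target 106 and stop 98.
plan = compute_risk(balance=10_000, risk_percent=1, entry=100, target=106, stop=98)
print(plan)  # RiskPlan(risk_amount=100.0, position_size=50.0, risk_reward=3.0)
```

Re-running compute_risk whenever the risk percentage field changes would give the "updates live as you type" behavior described above.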
🛠️ How It Works (Under the Hood)
Vision-to-Text Processing: Captures the screen in real time using the mss library and sends the raw screenshot to Google Gemini 2.5 Flash via the Google GenAI SDK.
Prompt Engineering: Engineered strict structured prompts using [ANALYSIS] and [TRADE] tags to force the LLM to output reliably parseable price data.
Regex Extraction: Uses regex to pull the exact Entry, Target, and Stop prices from the AI's response and wire them directly into the local risk calculator.
Custom Desktop UI: Built always-on-top transparent overlays using Python's Tkinter, utilizing threading to keep the UI fully responsive during API calls.
Hands-Free Scanning: Integrated a global hotkey (Ctrl+A+I) and an Auto Mode that scans the chart every 20 seconds.
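To make the prompt-plus-regex step concrete, here is a rough sketch of how a tagged response could be parsed. The exact response layout is an assumption: the sample text, the closing [/TRADE] tag, the field labels, and the function name extract_trade are illustrative only and may differ from what NanoMech actually uses.

```python
import re

# Hypothetical model output; the real prompt and layout may differ.
SAMPLE_RESPONSE = """[ANALYSIS]
Trend: bullish, price above the 20 and 50 period moving averages.
[/ANALYSIS]
[TRADE]
Entry: 64,250.50
Target: 65,800
Stop: 63,400.00
[/TRADE]"""

# Everything between [TRADE] and its closing tag (or the end of the text).
TRADE_BLOCK = re.compile(r"\[TRADE\](.*?)(?:\[/TRADE\]|\Z)", re.DOTALL | re.IGNORECASE)

# A labelled price such as "Entry: 64,250.50" or "Stop Loss: $63,400".
PRICE_FIELD = re.compile(
    r"^\s*(Entry|Target|Stop)[^:\n]*:\s*\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)",
    re.IGNORECASE | re.MULTILINE,
)


def extract_trade(response):
    """Return {'entry', 'target', 'stop'} as floats, or None if anything is missing."""
    block = TRADE_BLOCK.search(response)
    if block is None:
        return None
    prices = {}
    for label, raw in PRICE_FIELD.findall(block.group(1)):
        prices[label.lower()] = float(raw.replace(",", ""))
    if not {"entry", "target", "stop"} <= prices.keys():
        return None  # fallback path: the caller can retry or show a warning
    return prices


print(extract_trade(SAMPLE_RESPONSE))
```

Returning None instead of raising keeps the overlay usable when the model drifts from the requested format, which is one simple form of fallback handling.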
🧗‍♂️ Challenges Overcome
Structured LLM Outputs: Getting an LLM to consistently return prices in a parseable numeric format is notoriously tricky. We solved this with rigorous prompt engineering and robust fallback handling.
Thread-Safe UI: Tkinter isn’t thread-safe, so the analysis thread never touches widgets directly. Instead, every UI update is scheduled through a root.after() callback so that it runs on Tkinter's main loop.
UX/UI Friction: We tuned the transparency and colors so the text stays readable across both dark and light chart themes, and made sure our global hotkeys didn't conflict with native trading platforms.
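For readers who want to see the thread-safety idea in code, here is a small sketch. Note the swap: rather than calling root.after() from the worker thread as described above, it uses a conservative variant in which the worker only writes to a queue and the main loop polls that queue with root.after(). The names start_worker, poll_results, and demo are hypothetical.

```python
import queue
import threading


def start_worker(task, results):
    """Run task() on a background thread and put its outcome on the queue."""
    def run():
        try:
            results.put(("ok", task()))
        except Exception as exc:  # surface failures to the UI instead of dying silently
            results.put(("error", exc))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def poll_results(root, results, on_result, interval_ms=100):
    """Drain the queue on the UI thread, then reschedule itself via root.after()."""
    try:
        while True:
            status, payload = results.get_nowait()
            on_result(status, payload)
    except queue.Empty:
        pass
    root.after(interval_ms, poll_results, root, results, on_result, interval_ms)


def demo():
    """Open a small always-on-top window; needs a desktop session."""
    import time
    import tkinter as tk

    root = tk.Tk()
    root.attributes("-topmost", True)
    label = tk.Label(root, text="Analyzing...")
    label.pack(padx=20, pady=20)
    results = queue.Queue()

    def slow_analysis():
        time.sleep(2)  # stands in for the Gemini API call
        return "Entry 100 / Target 106 / Stop 98"

    def show(status, payload):
        label.config(text=str(payload))  # runs on the main loop only

    start_worker(slow_analysis, results)
    poll_results(root, results, show)
    root.mainloop()
```

Calling demo() on a machine with a display shows the placeholder text for about two seconds while the window stays responsive, then swaps in the result.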
🚀 What We Learned & What's Next
This project showed how capable Gemini 2.5 Flash is at visual reasoning: it accurately identified complex candlestick patterns, moving averages, and volume spikes from a raw screenshot alone.
The Roadmap for NanoMech:
Voice Output: Speaking the trade setup aloud for a 100% hands-free experience.
Multi-Monitor Support: Allowing users to select which screen the AI tracks.
Cloud Hosting: Running NanoMech as a scalable web service on Google Cloud Run.
Trade Logging: Automatically tracking how the AI's setups perform over time.
💻 Built With
Python | Google Gemini 2.5 Flash | Google GenAI SDK | Google Cloud | Tkinter | mss | Pillow | Regex
Ready to try it out? Check out the code and run it yourself: https://github.com/omshukla24/NanoMech
Problem:
Many organizations still process invoices manually by reading PDF documents and entering key details (invoice number, vendor, amount, etc.) into systems. This process is slow, error-prone, and difficult to scale, and it also makes it harder to detect duplicate invoices or incorrect totals.
Solution:
This project builds an automated invoice processing pipeline that converts uploaded invoice PDFs into structured data. It uses OCR to extract text, LLMs to identify invoice fields, validation checks to ensure correctness, and Kafka-based event streaming to manage the processing pipeline. The extracted data is stored in PostgreSQL and visualized through a dashboard, enabling faster, scalable, and more reliable invoice processing.
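As an illustration of the validation step, the sketch below checks an extracted invoice for missing fields, a total that does not match its line items, and duplicates. The dictionary layout, the field names, and the function validate_invoice are assumptions made for the example; the project's real schema and checks may look different.

```python
from decimal import Decimal


def validate_invoice(invoice, seen_keys):
    """Hypothetical validator: return a list of problems found in one extracted invoice."""
    errors = []

    # Required fields named in the problem statement.
    for field in ("invoice_number", "vendor", "total"):
        if not invoice.get(field):
            errors.append(f"missing {field}")

    # The stated total should equal the sum of the line items, when both exist.
    line_items = invoice.get("line_items", [])
    if line_items and invoice.get("total"):
        expected = sum(Decimal(item["amount"]) for item in line_items)
        if expected != Decimal(invoice["total"]):
            errors.append(f"total {invoice['total']} does not match line items sum {expected}")

    # Treat the same invoice number from the same vendor as a duplicate.
    vendor = str(invoice.get("vendor", "")).strip().lower()
    number = str(invoice.get("invoice_number", "")).strip()
    if vendor and number:
        key = (vendor, number)
        if key in seen_keys:
            errors.append("duplicate invoice")
        else:
            seen_keys.add(key)

    return errors


seen = set()
invoice = {
    "invoice_number": "INV-1",
    "vendor": "Acme",
    "total": "30.00",
    "line_items": [{"amount": "10.00"}, {"amount": "20.00"}],
}
print(validate_invoice(invoice, seen))  # []
print(validate_invoice(invoice, seen))  # ['duplicate invoice']
```

Amounts are kept as strings and compared with Decimal so that currency values are not distorted by floating-point rounding. In the real pipeline the set of seen keys would presumably be a lookup against PostgreSQL rather than an in-memory set.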