Build Your Own Cuely-Style AI Assistant on Windows
Cuely is an impressive AI assistant for macOS: you summon it with a simple keyboard shortcut, it reads your screen content, and it provides contextual assistance. But what if you're on Windows? In this guide, I'll show you how I built a similar AI assistant for Windows using Electron, Tesseract.js for OCR, and the GPT-4o API.
What We're Building
Our Windows AI assistant will have these key features:
- Global hotkey activation: Press Ctrl+\ anywhere to bring up the assistant
- Screen OCR capability: Read and understand text on your screen
- GPT-4o integration: Powered by OpenAI's multimodal GPT-4o model for intelligent responses
- Quick dismissal: Press Esc or click outside to close instantly
- Minimal UI: A clean interface that doesn't get in the way
- System tray integration: Runs quietly in the background
Prerequisites
Before we begin, make sure you have:
- Windows 10 or 11: The operating system we're targeting
- Node.js: Download from nodejs.org (LTS version recommended)
- OpenAI API key: Sign up at platform.openai.com and create an API key
- Basic JavaScript knowledge: Understanding of async/await and promises helps
- Text editor: VS Code, Sublime Text, or any editor you prefer
Technology Stack
Here's what powers our AI assistant:
- Electron: Framework for building cross-platform desktop apps with web technologies
- Tesseract.js: JavaScript OCR library for reading screen text
- OpenAI API: GPT-4o model for natural language understanding and generation
- Electron's globalShortcut module: For registering system-wide keyboard shortcuts (built into Electron, no extra package needed)
- screenshot-desktop: Capturing screen content programmatically
Step 1: Project Setup
Create a new directory and initialize the project:
mkdir windows-ai-assistant
cd windows-ai-assistant
npm init -y
Install required dependencies:
npm install electron tesseract.js openai screenshot-desktop electron-store
Step 2: Basic Electron Structure
Create the main process file (main.js) that handles:
- Creating the application window
- Registering global shortcuts
- Managing system tray icon
- Handling window visibility
The window should be frameless, always on top, and positioned near the cursor when activated.
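As a minimal sketch of that main process (assuming an index.html renderer, an optional preload.js, and an assets/icon.ico tray icon exist in your project):

```javascript
// main.js — minimal sketch; file names and window dimensions are illustrative
const { app, BrowserWindow, Tray, Menu, globalShortcut, screen } = require('electron');

let mainWindow;
let tray;

function createWindow() {
  mainWindow = new BrowserWindow({
    width: 480,
    height: 320,
    frame: false,        // frameless
    alwaysOnTop: true,
    show: false,         // start hidden; the hotkey reveals it
    skipTaskbar: true,
    webPreferences: { preload: `${__dirname}/preload.js` },
  });
  mainWindow.loadFile('index.html');
}

function showNearCursor() {
  // Position the window near the mouse pointer before showing it
  const { x, y } = screen.getCursorScreenPoint();
  mainWindow.setPosition(x, y);
  mainWindow.show();
  mainWindow.focus();
}

app.whenReady().then(() => {
  createWindow();
  tray = new Tray('assets/icon.ico');
  tray.setContextMenu(Menu.buildFromTemplate([
    { label: 'Show', click: showNearCursor },
    { label: 'Quit', click: () => app.quit() },
  ]));
  globalShortcut.register('CommandOrControl+\\', showNearCursor);
});

app.on('will-quit', () => globalShortcut.unregisterAll());
```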
Step 3: Configure OpenAI API
Store your API key securely using electron-store or environment variables. Never hardcode API keys in your source code!
Create a settings panel where users can input their API key on first run. The key should be encrypted and stored locally.
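One way to sketch this with electron-store: its built-in `encryptionKey` option obfuscates the config file at rest. Note this deters casual snooping but is not true secrecy, since the key ships with the app. The helper names here are illustrative:

```javascript
const Store = require('electron-store');

// encryptionKey obfuscates the config file on disk; it is not strong
// security because the key is bundled with the application itself
const store = new Store({ encryptionKey: 'replace-with-app-specific-key' });

function saveApiKey(key) {
  store.set('openaiApiKey', key);
}

function getApiKey() {
  return store.get('openaiApiKey'); // undefined until first-run setup
}
```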
Step 4: Implement Screen OCR
The OCR functionality involves:
- Capture a screenshot: Use screenshot-desktop to grab the current screen
- Process with Tesseract: Extract text from the image
- Send to GPT-4o: Include the extracted text as context
- Display the response: Show the AI's answer in the interface
Tesseract.js can take a few seconds to process images, so include a loading indicator to show progress.
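The four steps above can be sketched as a single async function (assuming both packages are installed; `readScreen` is an illustrative name):

```javascript
const screenshot = require('screenshot-desktop');
const Tesseract = require('tesseract.js');

async function readScreen() {
  // 1. Grab the primary display as a PNG buffer
  const imageBuffer = await screenshot({ format: 'png' });

  // 2. Run OCR over the image (English language data)
  const { data } = await Tesseract.recognize(imageBuffer, 'eng');

  // 3. The extracted text is what gets sent to GPT-4o as context
  return data.text;
}
```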
Step 5: Create the User Interface
Build a minimal, clean interface with:
- Input field: Where users type their questions
- Screen read button: Triggers OCR to read the current screen
- Response area: Displays the AI's answers with markdown support
- Settings button: Access to API key configuration
- Loading animation: Shown while processing requests
Use CSS to create a modern, semi-transparent window with rounded corners and smooth animations.
Step 6: Hotkey Implementation
Register the global shortcut in the main process:
globalShortcut.register('CommandOrControl+\\', () => {
  mainWindow.show();
  mainWindow.focus();
});
Also implement:
- Esc key: Hide the window
- Click outside: Dismiss the assistant
- Blur event: Auto-hide when focus is lost
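A sketch of those dismissal behaviors; the `hide-window` channel name and the `electronAPI` preload bridge are arbitrary choices for illustration:

```javascript
// Main process: hiding on blur also covers clicks outside the window
const { ipcMain } = require('electron');

mainWindow.on('blur', () => mainWindow.hide());
ipcMain.on('hide-window', () => mainWindow.hide());

// Renderer (via a preload script that exposes ipcRenderer.send):
// window.addEventListener('keydown', (e) => {
//   if (e.key === 'Escape') window.electronAPI.hideWindow();
// });
```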
Step 7: Integrate GPT-4o
Create an API wrapper that:
- Sends user queries to OpenAI's GPT-4o endpoint
- Includes screen text as context when OCR is used
- Handles rate limiting and errors gracefully
- Streams responses for better UX (show text as it generates)
- Maintains conversation history for context
Example API call structure:
const messages = [
  { role: "system", content: "You are a helpful AI assistant." }
];

// Provide screen context before the question so the model reads it first
if (screenText) {
  messages.push({ role: "user", content: `Screen content:\n${screenText}` });
}

messages.push({ role: "user", content: userQuestion });
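A streaming call with the official openai Node SDK (v4) might look like the following; the `ask` function name and `onToken` callback are illustrative:

```javascript
const OpenAI = require('openai');

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function ask(messages, onToken) {
  const stream = await client.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true, // tokens arrive as they are generated
  });

  let full = '';
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? '';
    full += token;
    onToken(token); // render incrementally for better perceived latency
  }
  return full;
}
```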
Step 8: Add Advanced Features
Conversation History
Store recent conversations so the AI can reference previous exchanges. Clear history when the window is closed or after a timeout.
Quick Actions
Implement shortcuts for common tasks:
- "Explain this" Quick explanation of selected screen text
- "Summarize" Condense long screen content
- "Translate" Convert text to another language
- "Fix grammar" Correct writing mistakes
Custom Prompts
Allow users to create custom prompt templates they use frequently.
Clipboard Integration
Option to automatically copy AI responses to clipboard for easy pasting.
Step 9: Optimize Performance
OCR Optimization
- Only process the relevant portion of the screen, not the entire display
- Cache Tesseract worker to avoid initialization on every use
- Compress screenshots before processing
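Worker caching can be sketched like this (using the tesseract.js v5 `createWorker` API, where the worker loads its language data once and is then reused):

```javascript
const Tesseract = require('tesseract.js');

let workerPromise = null;

// Create the worker once; subsequent calls reuse the same instance
function getWorker() {
  if (!workerPromise) {
    workerPromise = Tesseract.createWorker('eng'); // loads language data once
  }
  return workerPromise;
}

async function ocr(imageBuffer) {
  const worker = await getWorker();
  const { data } = await worker.recognize(imageBuffer);
  return data.text;
}
```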
API Efficiency
- Implement request queuing to avoid simultaneous API calls
- Add response caching for identical queries
- Use shorter context windows when appropriate
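Response caching for identical queries needs little more than a Map with a time-to-live; a minimal sketch (the factory name and default TTL are arbitrary):

```javascript
// Simple in-memory cache; entries older than ttlMs are treated as expired
function createResponseCache(ttlMs = 5 * 60 * 1000) {
  const cache = new Map();
  return {
    get(key) {
      const entry = cache.get(key);
      if (!entry) return undefined;
      if (Date.now() - entry.time > ttlMs) {
        cache.delete(key); // evict stale entry
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      cache.set(key, { value, time: Date.now() });
    },
  };
}
```

Check the cache before calling the API, and store the response after a successful call.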
Resource Management
- Hide window instead of closing to keep it ready
- Lazy load heavy components
- Release resources when window is hidden
Step 10: Package and Distribute
Use electron-builder to create an installer:
npm install --save-dev electron-builder
Configure in package.json:
"build": {
"appId": "com.yourname.ai-assistant",
"productName": "AI Assistant",
"win": {
"target": "nsis",
"icon": "assets/icon.ico"
}
}
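Note that `npm run build` assumes a build script is defined; one way to wire it up in package.json:

```json
"scripts": {
  "start": "electron .",
  "build": "electron-builder"
}
```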
Build the installer:
npm run build
Usage Tips
Screen Reading Best Practices
- Works best with clear, high-contrast text
- May struggle with handwritten text or unusual fonts
- Give Tesseract 2-3 seconds to process accurately
Effective Prompting
- Be specific about what you want from screen content
- Use context: "Based on the code shown, suggest improvements"
- Combine screen reading with specific questions
API Cost Management
- GPT-4o is powerful but costs more than GPT-3.5
- Set token limits to control costs
- Consider using GPT-3.5 for simple queries
- Monitor usage through OpenAI dashboard
Troubleshooting Common Issues
Hotkey Not Working
- Check if another app is using the same shortcut
- Try running the app with administrator privileges
- Try a different key combination
OCR Producing Gibberish
- Ensure screenshot is captured correctly
- Check image quality (resolution, clarity)
- Try preprocessing image (contrast adjustment, noise reduction)
Slow Response Times
- Check internet connection
- Reduce context size sent to API
- Use streaming responses to show progress
Privacy and Security Considerations
- Screen content: OCR reads everything on screen, including sensitive information
- API transmission: Screen text is sent to OpenAI's servers
- Local storage: Conversation history is stored on your machine
- API key security: Keep your key encrypted and never share it
Consider adding:
- Option to exclude certain apps from screen reading
- Clear button to delete conversation history
- Local-only mode without cloud AI (using local models)
- Encryption for stored conversations
Future Enhancements
Ideas to expand functionality:
- Multi-monitor support: Choose which screen to read
- Selection mode: Draw a box around a specific screen area to analyze
- Voice input: Speak your questions instead of typing
- Plugin system: Extend with custom actions and integrations
- Cross-platform: Make it work on macOS and Linux too
- Local LLM support: Use Ollama or similar for offline AI
- Image understanding: Analyze charts, diagrams, and UI elements visually
Learning Resources
To dive deeper:
- Electron documentation: electronjs.org/docs
- Tesseract.js guide: github.com/naptha/tesseract.js
- OpenAI API reference: platform.openai.com/docs
- Electron sample apps: github.com/electron/electron-quick-start
Conclusion
Building an AI assistant like Cuely for Windows is surprisingly achievable with modern web technologies. Electron makes desktop development accessible to web developers, Tesseract.js brings powerful OCR capabilities, and GPT-4o provides the intelligence.
This project taught me about:
- Desktop application architecture
- Working with system-level APIs (screenshots, global shortcuts)
- Integrating AI models into real applications
- Optimizing for performance and user experience
- Handling sensitive data responsibly
The best part? Once you build it, you have a personalized AI assistant that works exactly how you want it to. You can customize the UI, add features specific to your workflow, and extend it indefinitely.
Start building, experiment, and create your own AI-powered productivity tool. The code is yours; make it perfect for your needs.