Intermediate | 18 min read

Searching GitHub Repositories

Connect public or private Git repositories and search through your codebase with AI.

What You'll Learn

How to connect GitHub repositories
Authenticating with public and private repos
Searching code with natural language
Understanding code structure with AI
Finding bugs and documentation

Step 1: Prepare Your Repository Access

For Public Repositories:

  • You only need the repository URL (e.g., https://github.com/username/repo-name)
  • No authentication required
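
Before pasting a URL into the form, it can help to confirm it has the expected https://github.com/owner/repo shape. The sketch below is only an illustration: `parse_github_url` is a hypothetical helper, not part of dataTamer, and it assumes plain github.com URLs (no GitHub Enterprise hosts, no SSH-style URLs).

```python
from urllib.parse import urlparse


def parse_github_url(url: str) -> tuple[str, str]:
    """Split a GitHub repository URL into (owner, repo).

    Hypothetical helper for illustration; assumes https://github.com/owner/repo.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != "github.com":
        raise ValueError(f"not an https://github.com URL: {url!r}")

    # Drop empty segments produced by leading or trailing slashes.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) != 2:
        raise ValueError(f"expected /owner/repo in the path, got {parsed.path!r}")

    owner, repo = parts
    # Accept the clone-style form that ends in ".git".
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

Calling `parse_github_url("https://github.com/username/repo-name")` gives the owner and repository name as a pair; anything that does not match raises `ValueError`.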

For Private Repositories:

  1. Go to GitHub Settings → Developer settings → Personal access tokens
  2. Click "Generate new token (classic)"
  3. Give it a name (e.g., "dataTamer Access")
  4. Select scopes: repo (full control of private repositories)
  5. Click "Generate token" and copy it immediately

⚠️ Security: Store your Personal Access Token securely. dataTamer encrypts tokens in the database.
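
One way to keep the token out of source files is to read it from an environment variable, and one way to check it before connecting is to request the repository through the GitHub REST API. The sketch below only builds the request; it does not describe how dataTamer itself authenticates. The variable name `GITHUB_TOKEN` and both function names are assumptions made for this example.

```python
import os
import urllib.request


def token_from_env(var_name: str = "GITHUB_TOKEN") -> str:
    """Read the Personal Access Token from an environment variable.

    The variable name is an assumption for this sketch.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return token


def build_repo_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GitHub REST API request for one repository."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` should succeed when the token can read the repository; an HTTP error at that point usually means the token is wrong, expired, or lacks the repo scope.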

Step 2: Add Git Repository Datasource

  1. Navigate to Datasources in the left sidebar
  2. Click "Add Datasource"
  3. Select "Git Repository"
  4. Click "Continue"

Step 3: Configure Repository Connection

Fill in the repository connection form:

  • Repository URL: https://github.com/username/repo-name
  • Branch: Specify which branch to clone (default: main or master)
  • Authentication: For public repos, select "Public"; for private repos, paste your Personal Access Token
  • File Filters (Optional): Include only specific file types (e.g., *.js,*.ts,*.py)
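
To preview which files a filter string such as *.js,*.ts,*.py would keep, the patterns can be tried locally with Python's standard glob matching. This is a sketch under assumptions: dataTamer's exact matching rules are not documented here, so the code assumes comma-separated, case-sensitive patterns applied to the file name only. `parse_filters` and `is_included` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_filters(raw: str) -> list[str]:
    """Split a comma-separated filter string into individual glob patterns."""
    return [pattern.strip() for pattern in raw.split(",") if pattern.strip()]


def is_included(path: str, patterns: list[str]) -> bool:
    """Return True if the file name matches any pattern.

    An empty pattern list means no filtering, so every file is included.
    """
    if not patterns:
        return True
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

With the patterns from the example, `src/app.ts` would be included and `docs/README.md` would not.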

Step 4: Configure Processing Options

Optimize how dataTamer processes your code:

  • Index Comments: Include code comments in search (recommended)
  • Index Documentation: Process README and other docs
  • Extract Functions: Identify and index individual functions/methods
  • Create Summaries: Generate AI summaries of files and modules

Step 5: Wait for Cloning and Indexing

dataTamer will now:

  • Clone the repository from GitHub
  • Extract and parse source code files
  • Create embeddings for semantic code search
  • Generate summaries for better search results

💡 Processing Time: Typically 5-15 minutes for medium repos. Large repos (100MB+) may take 20-30 minutes.
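
The embedding step is what makes natural-language search possible: the query and each code chunk are turned into vectors, and the chunks closest to the query are returned. The sketch below shows that ranking idea with a deliberately simple stand-in. It uses word counts instead of a real embedding model, and `embed`, `cosine`, and `search` are hypothetical names, not dataTamer functions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: counts of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, chunks: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank code chunks by similarity to the query, best match first."""
    query_vector = embed(query)
    scored = [(name, cosine(query_vector, embed(text))) for name, text in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

A real embedding model also matches text that shares meaning but not words, which this word-count version cannot do; the ranking logic is the part that carries over.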

Step 6: Search Your Codebase with AI

Ask questions about your code using natural language:

Architecture Questions:

  • "How does authentication work in this project?"
  • "Where is the user session managed?"

Finding Specific Code:

  • "Show me all functions that connect to the database"
  • "Find API endpoints that handle user authentication"

Code Quality:

  • "Are there any SQL injection vulnerabilities?"
  • "Find all TODO comments in the codebase"
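
For questions with an exact textual answer, such as the TODO query, a plain local scan is a useful cross-check on what the AI search returns. The sketch below assumes TODO markers appear in `#` or `//` comments; `find_todos` is a hypothetical helper and is unrelated to how dataTamer answers the query.

```python
import re

# Matches "# TODO ..." and "// TODO ..." comments, with an optional colon.
TODO_PATTERN = re.compile(r"(?:#|//)\s*TODO\b[:\s]*(.*)")


def find_todos(source: str) -> list[tuple[int, str]]:
    """Return (line number, note text) for each TODO comment in the source."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1).strip()))
    return hits
```

If the AI answer lists noticeably fewer or more TODOs than a scan like this, the file filters or indexing options from the earlier steps are a good first place to look.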

Next Steps

Great job! You can now search and understand code repositories with AI. Ready to set up automated rules and alerts?