A Coding Implementation to Build a Self-Adaptive Goal-Oriented AI Agent Using Google Gemini and the SAGE Framework

In this tutorial, we dive into building an advanced AI agent system based on the SAGE framework (Self-Adaptive Goal-oriented Execution) using Google’s Gemini API. We walk through each core component of the framework: Self-Assessment, Adaptive Planning, Goal-oriented Execution, and Experience Integration. By combining these, we create an intelligent, self-improving agent that can deconstruct a high-level goal, plan its steps, execute tasks methodically, and learn from its outcomes. This hands-on walkthrough helps us understand the underlying architecture and demonstrates how to orchestrate complex decision-making using real-time AI generation. Check out the FULL CODES here.
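Before we dive into the agent itself, we set up the environment. The short sketch below assumes the google-generativeai package and an API key exposed through a GEMINI_API_KEY environment variable (the variable name is our own convention); the smoke-test prompt is purely illustrative.

# One-time setup (terminal or notebook cell):
#   pip install -q google-generativeai

import os
import google.generativeai as genai

# Read the key from an environment variable instead of hard-coding it.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Quick smoke test that the key and model name work before running the full SAGE cycle.
model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Reply with the single word: ready").text)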

import google.generativeai as genai
import json
import time
from typing import Dict, List, Any, Optional
from dataclasses import dataclass, asdict
from enum import Enum


class TaskStatus(Enum):
   PENDING = "pending"
   IN_PROGRESS = "in_progress"
   COMPLETED = "completed"
   FAILED = "failed"

We start by importing the necessary libraries, including google.generativeai for interacting with the Gemini model, and Python modules like json, time, and dataclasses for task management. We define a TaskStatus enum to help us track the progress of each task as pending, in progress, completed, or failed. Check out the FULL CODES here.
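One quick aside before moving on: TaskStatus is a plain Enum, so a member is never equal to its raw string value. The tiny snippet below (for illustration only, not part of the agent) shows the behavior the bookkeeping code relies on later when it compares statuses.

# Enum members round-trip through their string values, but do not equal them directly.
print(TaskStatus.PENDING.value)                         # "pending"
print(TaskStatus("completed") is TaskStatus.COMPLETED)  # True: lookup by value returns the member
print(TaskStatus.COMPLETED == "completed")              # False: compare against the member, not the string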

@dataclass
class Task:
   id: str
   description: str
   priority: int
   status: TaskStatus = TaskStatus.PENDING
   dependencies: Optional[List[str]] = None
   result: Optional[str] = None
  
   def __post_init__(self):
       if self.dependencies is None:
           self.dependencies = []


class SAGEAgent:
   """Self-Adaptive Goal-oriented Execution AI Agent"""
  
   def __init__(self, api_key: str, model_name: str = "gemini-1.5-flash"):
       genai.configure(api_key=api_key)
       self.model = genai.GenerativeModel(model_name)
       self.memory = []
       self.tasks = {}
       self.context = {}
       self.iteration_count = 0
      
   def self_assess(self, goal: str, context: Dict[str, Any]) -> Dict[str, Any]:
       """S: Self-Assessment - Evaluate current state and capabilities"""
       assessment_prompt = f"""
       You are an AI agent conducting self-assessment. Respond ONLY with valid JSON, no additional text.


       GOAL: {goal}
       CONTEXT: {json.dumps(context, indent=2)}
       TASKS_PROCESSED: {len(self.tasks)}
      
       Provide assessment as JSON with these exact keys:
       {{
           "progress_score": <number 0-100>,
           "resources": ["list of available resources"],
           "gaps": ["list of knowledge gaps"],
           "risks": ["list of potential risks"],
           "recommendations": ["list of next steps"]
       }}
       """
      
       response = self.model.generate_content(assessment_prompt)
       try:
           text = response.text.strip()
           if text.startswith('```'):
               text = text.split('```')[1]
               if text.startswith('json'):
                   text = text[4:]
           text = text.strip()
           return json.loads(text)
       except Exception as e:
           print(f"Assessment parsing error: {e}")
           return {
               "progress_score": 25,
               "resources": ["AI capabilities", "Internet knowledge"],
               "gaps": ["Specific domain expertise", "Real-time data"],
               "risks": ["Information accuracy", "Scope complexity"],
               "recommendations": ["Break down into smaller tasks", "Focus on research first"]
           }
  
   def adaptive_plan(self, goal: str, assessment: Dict[str, Any]) -> List[Task]:
       """A: Adaptive Planning - Create dynamic, context-aware task decomposition"""
       planning_prompt = f"""
       You are an AI task planner. Respond ONLY with valid JSON array, no additional text.


       MAIN_GOAL: {goal}
       ASSESSMENT: {json.dumps(assessment, indent=2)}
      
       Create 3-4 actionable tasks as JSON array:
       [
           {{
               "id": "task_1",
               "description": "Clear, specific task description",
               "priority": 5,
               "dependencies": []
           }},
           {{
               "id": "task_2",
               "description": "Another specific task",
               "priority": 4,
               "dependencies": ["task_1"]
           }}
       ]
      
       Each task must have: id (string), description (string), priority (1-5), dependencies (array of strings)
       """
      
       response = self.model.generate_content(planning_prompt)
       try:
           text = response.text.strip()
           if text.startswith('```'):
               text = text.split('```')[1]
               if text.startswith('json'):
                   text = text[4:]
           text = text.strip()
          
           task_data = json.loads(text)
           tasks = []
           for i, task_info in enumerate(task_data):
               task = Task(
                   id=task_info.get('id', f'task_{i+1}'),
                   description=task_info.get('description', 'Undefined task'),
                   priority=task_info.get('priority', 3),
                   dependencies=task_info.get('dependencies', [])
               )
               tasks.append(task)
           return tasks
       except Exception as e:
           print(f"Planning parsing error: {e}")
           return [
               Task(id="research_1", description="Research sustainable urban gardening basics", priority=5),
               Task(id="research_2", description="Identify space-efficient growing methods", priority=4),
               Task(id="compile_1", description="Organize findings into structured guide", priority=3, dependencies=["research_1", "research_2"])
           ]
  
   def execute_goal_oriented(self, task: Task) -> str:
       """G: Goal-oriented Execution - Execute specific task with focused attention"""
       execution_prompt = f"""
       GOAL-ORIENTED EXECUTION:
       Task: {task.description}
       Priority: {task.priority}
       Context: {json.dumps(self.context, indent=2)}
      
       Execute this task step-by-step:
       1. Break down the task into concrete actions
       2. Execute each action methodically
       3. Validate results at each step
       4. Provide comprehensive output
      
       Focus on practical, actionable results. Be specific and thorough.
       """
      
       response = self.model.generate_content(execution_prompt)
       return response.text.strip()
  
   def integrate_experience(self, task: Task, result: str, success: bool) -> Dict[str, Any]:
       """E: Experience Integration - Learn from outcomes and update knowledge"""
       integration_prompt = f"""
       You are learning from task execution. Respond ONLY with valid JSON, no additional text.


       TASK: {task.description}
       RESULT: {result[:200]}...
       SUCCESS: {success}
      
       Provide learning insights as JSON:
       {{
           "learnings": ["key insight 1", "key insight 2"],
           "patterns": ["pattern observed 1", "pattern observed 2"],
           "adjustments": ["adjustment for future 1", "adjustment for future 2"],
           "confidence_boost": <number -10 to 10>
       }}
       """
      
       response = self.model.generate_content(integration_prompt)
       try:
           text = response.text.strip()
           if text.startswith('```'):
               text = text.split('```')[1]
               if text.startswith('json'):
                   text = text[4:]
           text = text.strip()
          
           experience = json.loads(text)
           experience['task_id'] = task.id
           experience['timestamp'] = time.time()
           self.memory.append(experience)
           return experience
       except Exception as e:
           print(f"Experience parsing error: {e}")
           experience = {
               "learnings": [f"Completed task: {task.description}"],
               "patterns": ["Task execution follows planned approach"],
               "adjustments": ["Continue systematic approach"],
               "confidence_boost": 5 if success else -2,
               "task_id": task.id,
               "timestamp": time.time()
           }
           self.memory.append(experience)
           return experience
  
   def execute_sage_cycle(self, goal: str, max_iterations: int = 3) -> Dict[str, Any]:
       """Execute complete SAGE cycle for goal achievement"""
       print(f"🎯 Starting SAGE cycle for goal: {goal}")
       results = {"goal": goal, "iterations": [], "final_status": "unknown"}
      
       for iteration in range(max_iterations):
           self.iteration_count += 1
           print(f"\n🔄 SAGE Iteration {iteration + 1}")
          
           print("📊 Self-Assessment...")
           assessment = self.self_assess(goal, self.context)
           print(f"Progress Score: {assessment.get('progress_score', 0)}/100")
          
           print("🗺  Adaptive Planning...")
           tasks = self.adaptive_plan(goal, assessment)
           print(f"Generated {len(tasks)} tasks")
          
           print("⚡ Goal-oriented Execution...")
           iteration_results = []
          
           for task in sorted(tasks, key=lambda x: x.priority, reverse=True):
               if self._dependencies_met(task):
                   print(f"  Executing: {task.description}")
                   task.status = TaskStatus.IN_PROGRESS
                  
                   try:
                       result = self.execute_goal_oriented(task)
                       task.result = result
                       task.status = TaskStatus.COMPLETED
                       success = True
                       print(f"  ✅ Completed: {task.id}")
                   except Exception as e:
                       task.status = TaskStatus.FAILED
                       task.result = f"Error: {str(e)}"
                       success = False
                       print(f"  ❌ Failed: {task.id}")
                  
                   experience = self.integrate_experience(task, task.result, success)
                  
                   self.tasks[task.id] = task
                   iteration_results.append({
                       "task": asdict(task),
                       "experience": experience
                   })
          
           self._update_context(iteration_results)
          
           results["iterations"].append({
               "iteration": iteration + 1,
               "assessment": assessment,
               "tasks_generated": len(tasks),
               "tasks_completed": len([r for r in iteration_results if r["task"]["status"] == TaskStatus.COMPLETED]),
               "results": iteration_results
           })
          
           if assessment.get('progress_score', 0) >= 90:
               results["final_status"] = "achieved"
               print("🎉 Goal achieved!")
               break
      
       if results["final_status"] == "unknown":
           results["final_status"] = "in_progress"
      
       return results
  
   def _dependencies_met(self, task: Task) -> bool:
       """Check if task dependencies are satisfied"""
       for dep_id in task.dependencies:
           if dep_id not in self.tasks or self.tasks[dep_id].status != TaskStatus.COMPLETED:
               return False
       return True
  
   def _update_context(self, results: List[Dict[str, Any]]):
       """Update agent context based on execution results"""
       completed_tasks = [r for r in results if r["task"]["status"] == TaskStatus.COMPLETED]
       self.context.update({
           "completed_tasks": len(completed_tasks),
           "total_tasks": len(self.tasks),
           "success_rate": len(completed_tasks) / len(results) if results else 0,
           "last_update": time.time()
       })

We define a Task data class to encapsulate each unit of work, including its ID, description, priority, and dependencies. Then, we build the SAGEAgent class, which serves as the brain of our framework: it orchestrates the full cycle, self-assessing progress, planning adaptive tasks, executing each task with focus, and learning from outcomes to improve performance in future iterations. Check out the FULL CODES here.
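To see how the Task dataclass and the dependency rule interact without calling the API, here is a small illustrative sketch that mirrors the agent’s _dependencies_met logic; the task ids and descriptions are made up for demonstration.

# Build a tiny task graph by hand, similar to what adaptive_plan() produces.
research = Task(id="research_1", description="Collect background material", priority=5)
summary = Task(id="summary_1", description="Summarize findings", priority=3, dependencies=["research_1"])

finished = {}  # id -> Task, playing the role of SAGEAgent.tasks

def deps_met(task: Task) -> bool:
    # Same rule as SAGEAgent._dependencies_met: every dependency must already be COMPLETED.
    return all(d in finished and finished[d].status == TaskStatus.COMPLETED for d in task.dependencies)

print(deps_met(summary))   # False: research_1 has not run yet
research.status = TaskStatus.COMPLETED
finished[research.id] = research
print(deps_met(summary))   # True: the dependency is now satisfied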

if __name__ == "__main__":
   API_KEY = "Use Your Own API Key Here" 
  
   try:
       agent = SAGEAgent(API_KEY, model_name="gemini-1.5-flash")
      
       goal = "Research and create a comprehensive guide on sustainable urban gardening practices"
      
       results = agent.execute_sage_cycle(goal, max_iterations=2)
      
       print("\n" + "="*50)
       print("📋 SAGE EXECUTION SUMMARY")
       print("="*50)
       print(f"Goal: {results['goal']}")
       print(f"Status: {results['final_status']}")
       print(f"Iterations: {len(results['iterations'])}")
      
       for i, iteration in enumerate(results['iterations'], 1):
           print(f"\nIteration {i}:")
           print(f"  Assessment Score: {iteration['assessment'].get('progress_score', 0)}/100")
           print(f"  Tasks Generated: {iteration['tasks_generated']}")
           print(f"  Tasks Completed: {iteration['tasks_completed']}")
      
       print("\n🧠 Agent Memory Entries:", len(agent.memory))
       print("🎯 Total Tasks Processed:", len(agent.tasks))
      
   except Exception as e:
       print(f"Demo requires valid Gemini API key. Error: {e}")
       print("Get your free API key from: https://makersuite.google.com/app/apikey")

We wrap up the tutorial by initializing the SAGEAgent with our Gemini API key and defining a sample goal on sustainable urban gardening. We then execute the full SAGE cycle and print a detailed summary, including progress scores, task counts, and memory insights, allowing us to evaluate how effectively our agent performed across iterations.
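If we want to keep a record of a run, the results dictionary can be written straight to disk. The minimal sketch below is one option; default=str is needed because asdict() keeps TaskStatus enum members inside the task records, and the sage_run.json filename is just an example.

import json

with open("sage_run.json", "w") as f:
    # default=str stringifies values json cannot handle natively, such as TaskStatus members.
    json.dump(results, f, indent=2, default=str)

print("Saved", len(results["iterations"]), "iteration(s) to sage_run.json")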

In conclusion, we successfully implemented and ran a complete SAGE cycle with our Gemini-powered agent. We observe how the system assesses its progress, dynamically generates actionable tasks, executes them with precision, and refines its strategy through learned experience. This modular design empowers us to extend the framework further for more complex, multi-agent environments or domain-specific applications.


Check out the FULL CODES here.

The post A Coding Implementation to Build a Self-Adaptive Goal-Oriented AI Agent Using Google Gemini and the SAGE Framework appeared first on MarkTechPost.

MarkTechPost

OpenAI stops ChatGPT from telling people to break up with partners

Instead of giving definitive answers to personal challenges the chatbot will help people reflect on a problem

ChatGPT will not tell people to break up with their partner and will encourage users to take breaks from long chatbot sessions, under new changes to the artificial intelligence tool.

OpenAI, ChatGPT’s developer, said the chatbot would stop giving definitive answers to personal challenges and would instead help people to mull over problems such as potential breakups.

Artificial intelligence (AI) | The Guardian

OpenAI Launches ‘gpt-oss’: Two New Open-Weight AI Models You Can Test Now for Free

OpenAI has finally launched two open-weight models, gpt-oss-120b and gpt-oss-20b. After years of guarding its AI models, the company has released gpt-oss: two openly licensed language models that anyone can download, fine-tune, and even run on a mid-range laptop. Developers have been asking OpenAI for this for years, and it now seems ready to compete head-to-head in the fast-expanding world of community-driven models.

What Makes gpt-oss Different?

The gpt-oss family introduces two open-weight AI language models, gpt-oss-120b and gpt-oss-20b, under the flexible Apache 2.0 license. This means you can run, fine-tune, and integrate these models locally, on your own hardware, or virtually anywhere.

Here’s a quick look at what makes them stand out:

Performance for Everyone:

  • gpt-oss-120b rivals proprietary models like OpenAI’s o4-mini in core reasoning benchmarks, running on a single 80GB GPU.
  • gpt-oss-20b matches or surpasses o3-mini for mainstream tasks, and it operates on common edge devices with just 16GB of memory.

Low-cost Local AI: 

These open-weight AI models are optimized for efficient deployment, even on consumer-grade devices, making advanced AI affordable and accessible, no cloud required.

Serious Reasoning and Tool Use:

  • Both models are great for real-world problem-solving, tool usage (like web search or Python code execution), and following instructions.
  • They’re adaptable, supporting full “chain-of-thought” (CoT) reasoning, structured outputs, and multiple levels of reasoning effort to balance speed and complexity.

Open by Design: Full weights and reference implementations are available on platforms like Hugging Face. Developers can fine-tune models and customize how they interact with users, powering everything from chatbots to search to healthcare tools.
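For readers who prefer to pull the weights down rather than use a hosted demo, a minimal Hugging Face transformers sketch might look like the following; the repository id openai/gpt-oss-20b and the generation settings are assumptions to verify against the official model card.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed repo id; confirm on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the benefits of open-weight models in two sentences."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Generate a short completion and strip the prompt tokens from the output.
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))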

Safety as a Foundation:

  • OpenAI applied strict safety training and adversarial testing, measuring these open models against their proprietary systems and working with external experts.
  • The company even ran worst-case scenarios, fine-tuning models for risky purposes to ensure robust defenses, and is inviting the public to identify new issues via a “Red Teaming Challenge” with a $500,000 bounty.

Flexible Integration: The gpt-oss models are ready to run via widely used engines and platforms, from Azure to Hugging Face, or even directly on Windows devices with GPU optimizations.

Why did OpenAI launch open-weight models now?

Earlier this year, OpenAI’s CEO Sam Altman admitted the company had been “on the wrong side of history” by keeping its tech locked up while rivals like Llama and DeepSeek captured the open-source mind-share. There is also geopolitical pressure: U.S. policymakers have urged domestic labs to share more of their technology so that American values shape global AI adoption through AI that is both powerful and transparent.

How to test the new OpenAI gpt-oss models:

You can test the new gpt-oss model right now for free.

Step 1: Visit this gpt-oss website. You’ll be asked to either continue with visible reasoning or not show reasoning.

Step 2: Once you are in, you can choose the model and reasoning level.

Step 3: Enter a prompt to continue testing these two new open-weight AI models by OpenAI.

Why This Matters

OpenAI’s gpt-oss release isn’t just a technical milestone; it’s about democratizing access to powerful AI.

Open-weight models bring:

  • Lower costs and barriers for startups, researchers, and those in resource-constrained sectors.
  • Control for developers and organizations over their own data and infrastructure.
  • Transparency, safety research, and creative experimentation on a global scale.

In Conclusion:

OpenAI’s late but ambitious entry could reset the conversation about what “open” can mean for a commercial lab. By sharing the gpt-oss-120b and gpt-oss-20b models as open weights, OpenAI is inviting the world to imagine, create, and safeguard the future of AI together.

It’s a bet on the power of open collaboration and a recognition that the future of AI will be built not by a select few, but by a global community of creators and innovators. These models put advanced, adaptable AI tools directly into the hands of developers, researchers, and curious innovators, and the next breakthrough might just come from you.


AI Tools Club

Claude Subagents: Automate Your Workflow with Custom AI Agents by Anthropic AI

Autonomous AI agents are getting more popular and capable day by day, and Anthropic AI has contributed massively to the growth and development of these new AI systems. One of Anthropic AI’s successful flagship AI agents is Claude Code, an AI coding agent designed to work directly in your terminal, helping users automate coding tasks, explain complex code, edit files, and run commands using prompts. Now, Anthropic AI is offering Claude Subagents.

What are Claude Subagents?

Claude Subagents are custom AI agents within Claude Code that are purpose-built versions of Claude for task automation. Each subagent focuses on one job, such as testing code, finding bugs, maintaining quality standards, or gathering research. Instead of using one AI for all tasks, you can assign a group of these subagents, each with its own role, custom instructions, special tools, and separate contexts.

Here’s a closer, easy-to-digest breakdown of their main features and how they work:

  • Collaboration and Orchestration:
    The main Claude model acts as an orchestrator, coordinating the work of the different subagents and ensuring that they are all working together towards the common goal. It’s this ability to manage and collaborate that makes the system so effective without needing to micromanage.
  • Specialized AI Teammates:
    Each subagent is designed for a specific task. You can create one for managing databases, another for reviewing code, and a third for running tests. They can work together on your project while also operating independently. After you build a subagent, you can use it for other projects or share it with your teammates. This helps create consistent workflows and improves teamwork.
  • Dedicated Context Windows:
    Subagents operate in their own isolated space, preventing their work from mixing inappropriately with other conversations or tasks. This means more focused answers, better memory, and less risk of confusion across tasks. By distributing work among domain-focused subagents, you can stay clear of the typical mistakes that come from juggling too much at once with a single, generalist assistant.
  • Customizable Setups:
    You control what tools each subagent has access to, their prompts, and even their style. This allows you to customize their behavior and function to meet your preferences or your organization’s needs. Users can create multiple subagents that can handle different jobs at once. For example, you can run up to 10 tasks in parallel, with new work queued up as soon as a task is done, allowing fast, efficient, and scalable project work.

Claude Subagents can automate complex workflows that were once too difficult for AI to manage. For example, you can ask it to “research the latest trends in artificial intelligence, write a blog post about the findings, and then post it to your website.” Previously, this task would have involved many steps and a lot of human help. Now, with subagents, Claude can complete the entire process by itself from start to finish.

How they slot into real-world workflows

  1. Code quality gate: A code-reviewer agent can scan every pull request, flag insecure patterns, and even suggest fixes before a human reviewer looks.
  2. Automated testing: A test-runner can bootstrap unit tests, execute them, and report failures, all without derailing the main chat.
  3. Data wrangling: A data-scientist agent can write SQL, run BigQuery tasks, and return with clean summaries you can drop straight into a slide deck.
  4. Research sprints: Anthropic’s own teams spawn multiple research subagents in parallel, then have a lead agent combine the findings into a single brief, showing a 90% precision bump over single-agent runs in internal tests.

For developers and project leads, Claude Subagents offer an edge: less routine stress, more accurate results, and the ability to focus on bigger-picture thinking. For organizations, it means higher speed, less error-prone development, and an AI workforce that never gets tired or burned out.

A few words of caution

Running multiple agents can consume tokens roughly 4× faster than a single chat, and poorly scoped agents can step on each other or duplicate work. Keep prompts sharp, monitor usage, and resist the urge to “agent-ify” every trivial action. Think of subagents as specialists, not interns, and only deploy them where their expertise justifies their cost.

Conclusion

Claude Subagents turn a single large language model (LLM) into a team of focused co-workers. Instead of relying on a single AI that’s a jack-of-all-trades, users can now build a team of focused experts, each handling their specific task with precision. Claude Subagents allow you to automate your entire workflow without losing control or flooding your chat history by isolating context, granting precise tool access, and allowing parallel execution.

Treat these custom AI agents like any new hire: give each a clear role, the right tools, and a tight feedback loop, and they will make your AI workflows noticeably smarter and more reliable.


AI Tools Club

Anthropic AI Introduces Persona Vectors to Monitor and Control Personality Shifts in LLMs

LLMs are deployed through conversational interfaces that present helpful, harmless, and honest assistant personas. However, they often fail to maintain consistent personality traits across training and deployment. LLMs can show dramatic and unpredictable persona shifts when exposed to different prompting strategies or contextual inputs. The training process can also cause unintended personality shifts, as seen when modifications to RLHF unintentionally created overly sycophantic behavior in GPT-4o, leading it to validate harmful content and reinforce negative emotions. This highlights weaknesses in current LLM deployment practices and underscores the urgent need for reliable tools to detect and prevent harmful persona shifts.

Related works like linear probing techniques extract interpretable directions for behaviors like entity recognition, sycophancy, and refusal patterns by creating contrastive sample pairs and computing activation differences. However, these methods struggle with unexpected generalization during finetuning, where training on narrow domain examples can cause broader misalignment through emergent shifts along meaningful linear directions. Current prediction and control methods, including gradient-based analysis for identifying harmful training samples, sparse autoencoder ablation techniques, and directional feature removal during training, show limited effectiveness in preventing unwanted behavioral changes.

A team of researchers from Anthropic, UT Austin, Constellation, Truthful AI, and UC Berkeley present an approach that addresses persona instability in LLMs through persona vectors in activation space. The method extracts directions corresponding to specific personality traits, such as evil behavior, sycophancy, and hallucination propensity, using an automated pipeline that requires only natural-language descriptions of the target traits. The researchers show that intended and unintended personality shifts after finetuning correlate strongly with movements along these persona vectors, opening the door to intervention via post-hoc correction or preventative steering. They also show that finetuning-induced persona shifts can be predicted before finetuning, identifying problematic training data at both the dataset and individual-sample levels.

To monitor persona shifts during finetuning, two kinds of datasets are constructed. The first consists of trait-eliciting datasets that contain explicit examples of malicious responses, sycophantic behavior, and fabricated information. The second consists of “emergent misalignment-like” (“EM-like”) datasets, which contain narrow, domain-specific issues such as incorrect medical advice, flawed political arguments, invalid math problems, and vulnerable code. To detect behavioral shifts mediated by persona vectors, the researchers extract average hidden states at the last prompt token across evaluation sets before and after finetuning and compute their difference to obtain activation shift vectors. These shift vectors are then projected onto the previously extracted persona directions to measure finetuning-induced changes along specific trait dimensions.
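To make the monitoring idea concrete, here is a toy numpy sketch of the two core operations described above: extracting a persona direction from contrastive activations and projecting a finetuning-induced activation shift onto it. The shapes and random data are placeholders; the actual pipeline works with specific layers, prompts, and evaluation sets.

import numpy as np

def persona_direction(trait_acts: np.ndarray, neutral_acts: np.ndarray) -> np.ndarray:
    # Direction = mean activation on trait-expressing responses minus mean on neutral responses.
    v = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def finetuning_shift(base_acts: np.ndarray, finetuned_acts: np.ndarray, direction: np.ndarray) -> float:
    # Project the average activation shift induced by finetuning onto a persona direction.
    shift = finetuned_acts.mean(axis=0) - base_acts.mean(axis=0)
    return float(shift @ direction)

rng = np.random.default_rng(0)
hidden_dim = 64
direction = persona_direction(rng.normal(size=(32, hidden_dim)), rng.normal(size=(32, hidden_dim)))
print(finetuning_shift(rng.normal(size=(16, hidden_dim)), rng.normal(size=(16, hidden_dim)), direction))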

Dataset-level projection difference metrics show a strong correlation with trait expression after finetuning, allowing early detection of training datasets that may trigger unwanted persona characteristics. This metric proves more effective than raw projection methods in predicting trait shifts, as it accounts for the base model’s natural response patterns to specific prompts. Sample-level detection achieves high separability between problematic and control samples across trait-eliciting datasets (Evil II, Sycophantic II, Hallucination II) and “EM-like” datasets (Opinion Mistake II). The persona directions identify individual training samples that induce persona shifts with fine-grained precision, outperforming traditional data filtering methods and providing broad coverage across trait-eliciting content and domain-specific errors.

In conclusion, researchers introduced an automated pipeline that extracts persona vectors from natural-language trait descriptions, providing tools for monitoring and controlling personality shifts across deployment, training, and pre-training phases in LLMs. Future research directions include characterizing the complete persona space dimensionality, identifying natural persona bases, exploring correlations between persona vectors and trait co-expression patterns, and investigating limitations of linear methods for certain personality traits. This study builds a foundational understanding of persona dynamics in models and offers practical frameworks for creating more reliable and controllable language model systems.


Check out the Paper, Technical Blog and GitHub Page.

The post Anthropic AI Introduces Persona Vectors to Monitor and Control Personality Shifts in LLMs appeared first on MarkTechPost.

MarkTechPost

OpenAI Just Released the Hottest Open-Weight LLMs: gpt-oss-120B (Runs on a High-End Laptop) and gpt-oss-20B (Runs on a Phone)

OpenAI has just sent seismic waves through the AI world: for the first time since GPT-2 hit the scene in 2019, the company is releasing not one, but TWO open-weight language models. Meet gpt-oss-120b and gpt-oss-20b—models that anyone can download, inspect, fine-tune, and run on their own hardware. This launch doesn’t just shift the AI landscape; it detonates a new era of transparency, customization, and raw computational power for researchers, developers, and enthusiasts everywhere.

Why Is This Release a Big Deal?

OpenAI has long cultivated a reputation for both jaw-dropping model capabilities and a fortress-like approach to proprietary tech. That changed on August 5, 2025. These new models are distributed under the permissive Apache 2.0 license, making them open for commercial and experimental use. The difference? Instead of hiding behind cloud APIs, anyone can now put OpenAI-grade models under their microscope—or put them directly to work on problems at the edge, in enterprise, or even on consumer devices.

Meet the Models: Technical Marvels with Real-World Muscle

gpt-oss-120B

  • Size: 117 billion parameters (with 5.1 billion active parameters per token, thanks to Mixture-of-Experts tech)
  • Performance: Punches at the level of OpenAI’s o4-mini (or better) in real-world benchmarks.
  • Hardware: Runs on a single high-end GPU—think Nvidia H100, or 80GB-class cards. No server farm required.
  • Reasoning: Features chain-of-thought and agentic capabilities—ideal for research automation, technical writing, code generation, and more.
  • Customization: Supports configurable “reasoning effort” (low, medium, high), so you can dial up power when needed or save resources when you don’t.
  • Context: Handles up to a massive 128,000 tokens—enough text to read entire books at a time.
  • Fine-Tuning: Built for easy customization and local/private inference—no rate limits, full data privacy, and total deployment control.

gpt-oss-20B

  • Size: 21 billion parameters (with 3.6 billion active parameters per token, also Mixture-of-Experts).
  • Performance: Sits squarely between o3-mini and o4-mini in reasoning tasks—on par with the best “small” models available.
  • Hardware: Runs on consumer-grade laptops—with just 16GB RAM or equivalent, it’s the most powerful open-weight reasoning model you can fit on a phone or local PC.
  • Mobile Ready: Specifically optimized to deliver low-latency, private on-device AI for smartphones (including Qualcomm Snapdragon support), edge devices, and any scenario needing local inference minus the cloud.
  • Agentic Powers: Like its big sibling, 20B can use APIs, generate structured outputs, and execute Python code on demand.

Technical Details: Mixture-of-Experts and MXFP4 Quantization

Both models use a Mixture-of-Experts (MoE) architecture, only activating a handful of “expert” subnetworks per token. The result? Enormous parameter counts with modest memory usage and lightning-fast inference—perfect for today’s high-performance consumer and enterprise hardware.

Add to that native MXFP4 quantization, shrinking model memory footprints without sacrificing accuracy. The 120B model fits snugly onto a single advanced GPU; the 20B model can run comfortably on laptops, desktops, and even mobile hardware.
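As a rough illustration of why MoE keeps the active parameter count low, the toy routing sketch below runs only the top-k scoring experts for each token; the dimensions and routing details are illustrative, not the gpt-oss internals.

import numpy as np

def moe_layer(x, expert_weights, router_weights, top_k=2):
    # x: (d,) token state; expert_weights: (n_experts, d, d); router_weights: (n_experts, d)
    scores = router_weights @ x                  # one routing score per expert
    chosen = np.argsort(scores)[-top_k:]         # only the k best-scoring experts run for this token
    gate = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    # All other experts stay idle, which is why active parameters are far fewer than total parameters.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gate, chosen))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
y = moe_layer(rng.normal(size=d), rng.normal(size=(n_experts, d, d)), rng.normal(size=(n_experts, d)))
print(y.shape)  # (16,)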

Real-World Impact: Tools for Enterprise, Developers, and Hobbyists

  • For Enterprises: On-premises deployment for data privacy and compliance. No more black-box cloud AI: financial, healthcare, and legal sectors can now own and secure every bit of their LLM workflow.
  • For Developers: Freedom to tinker, fine-tune, and extend. No API limits, no SaaS bills, just pure, customizable AI with full control over latency or cost.
  • For the Community: Models are already available on Hugging Face, Ollama, and more—go from download to deployment in minutes.

How Does GPT-OSS Stack Up?

Here’s the kicker: gpt-oss-120B is the first freely available open-weight model that matches the performance of top-tier commercial models like o4-mini. The 20B variant not only bridges the performance gap for on-device AI but will likely accelerate innovation and push boundaries on what’s possible with local LLMs.

The Future Is Open (Again)

OpenAI’s GPT-OSS isn’t just a release; it’s a clarion call. By making state-of-the-art reasoning, tool use, and agentic capabilities available for anyone to inspect and deploy, OpenAI throws open the door to an entire community of makers, researchers, and enterprises—not just to use, but to build on, iterate, and evolve.


Check out the gpt-oss-120B, gpt-oss-20B and Technical Blog.

The post OpenAI Just Released the Hottest Open-Weight LLMs: gpt-oss-120B (Runs on a High-End Laptop) and gpt-oss-20B (Runs on a Phone) appeared first on MarkTechPost.

MarkTechPost

Helping data storage keep up with the AI revolution

Artificial intelligence is changing the way businesses store and access their data. That’s because traditional data storage systems were designed to handle simple commands from a handful of users at once, whereas today, AI systems with millions of agents need to continuously access and process large amounts of data in parallel. Traditional data storage systems now have layers of complexity, which slows AI systems down because data must pass through multiple tiers before reaching the graphics processing units (GPUs) that are the brain cells of AI.

Cloudian, co-founded by Michael Tso ’93, SM ’93 and Hiroshi Ohta, is helping storage keep up with the AI revolution. The company has developed a scalable storage system for businesses that helps data flow seamlessly between storage and AI models. The system reduces complexity by applying parallel computing to data storage, consolidating AI functions and data onto a single parallel-processing platform that stores, retrieves, and processes scalable datasets, with direct, high-speed transfers between storage and GPUs and CPUs.

Cloudian’s integrated storage-computing platform simplifies the process of building commercial-scale AI tools and gives businesses a storage foundation that can keep up with the rise of AI.

“One of the things people miss about AI is that it’s all about the data,” Tso says. “You can’t get a 10 percent improvement in AI performance with 10 percent more data or even 10 times more data — you need 1,000 times more data. Being able to store that data in a way that’s easy to manage, and in such a way that you can embed computations into it so you can run operations while the data is coming in without moving the data — that’s where this industry is going.”

From MIT to industry

As an undergraduate at MIT in the 1990s, Tso was introduced by Professor William Dally to parallel computing — a type of computation in which many calculations occur simultaneously. Tso also worked on parallel computing with Associate Professor Greg Papadopoulos.

“It was an incredible time because most schools had one super-computing project going on — MIT had four,” Tso recalls.

As a graduate student, Tso worked with MIT senior research scientist David Clark, a computing pioneer who contributed to the internet’s early architecture, particularly the transmission control protocol (TCP) that delivers data between systems.

“As a graduate student at MIT, I worked on disconnected and intermittent networking operations for large scale distributed systems,” Tso says. “It’s funny — 30 years on, that’s what I’m still doing today.”

Following his graduation, Tso worked at Intel’s Architecture Lab, where he invented data synchronization algorithms used by Blackberry. He also created specifications for Nokia that ignited the ringtone download industry. He then joined Inktomi, a startup co-founded by Eric Brewer SM ’92, PhD ’94 that pioneered search and web content distribution technologies.

In 2001, Tso started Gemini Mobile Technologies with Joseph Norton ’93, SM ’93 and others. The company went on to build the world’s largest mobile messaging systems to handle the massive data growth from camera phones. Then, in the late 2000s, cloud computing became a powerful way for businesses to rent virtual servers as they grew their operations. Tso noticed the amount of data being collected was growing far faster than the speed of networking, so he decided to pivot the company.

“Data is being created in a lot of different places, and that data has its own gravity: It’s going to cost you money and time to move it,” Tso explains. “That means the end state is a distributed cloud that reaches out to edge devices and servers. You have to bring the cloud to the data, not the data to the cloud.”

Tso officially launched Cloudian out of Gemini Mobile Technologies in 2012, with a new emphasis on helping customers with scalable, distributed, cloud-compatible data storage.

“What we didn’t see when we first started the company was that AI was going to be the ultimate use case for data on the edge,” Tso says.

Although Tso’s research at MIT began more than two decades ago, he sees strong connections between what he worked on and the industry today.

“It’s like my whole life is playing back because David Clark and I were dealing with disconnected and intermittently connected networks, which are part of every edge use case today, and Professor Dally was working on very fast, scalable interconnects,” Tso says, noting that Dally is now the senior vice president and chief scientist at the leading AI company NVIDIA. “Now, when you look at the modern NVIDIA chip architecture and the way they do interchip communication, it’s got Dally’s work all over it. With Professor Papadopoulos, I worked on accelerating application software with parallel computing hardware without having to rewrite the applications, and that’s exactly the problem we are trying to solve with NVIDIA. Coincidentally, all the stuff I was doing at MIT is playing out.”

Today Cloudian’s platform uses an object storage architecture in which all kinds of data — documents, videos, sensor data — are stored as unique objects with metadata. Object storage can manage massive datasets in a flat file structure, making it ideal for unstructured data and AI systems, but it traditionally hasn’t been able to send data directly to AI models without the data first being copied into a computer’s memory system, creating latency and energy bottlenecks for businesses.

In July, Cloudian announced that it has extended its object storage system with a vector database that stores data in a form which is immediately usable by AI models. As the data are ingested, Cloudian is computing in real-time the vector form of that data to power AI tools like recommender engines, search, and AI assistants. Cloudian also announced a partnership with NVIDIA that allows its storage system to work directly with the AI company’s GPUs. Cloudian says the new system enables even faster AI operations and reduces computing costs.

“NVIDIA contacted us about a year and a half ago because GPUs are useful only with data that keeps them busy,” Tso says. “Now people are realizing it’s easier to move the AI to the data than it is to move huge datasets. Our storage systems embed a lot of AI functions, so we’re able to pre- and post-process data for AI near where we collect and store the data.”

AI-first storage

Cloudian is helping about 1,000 companies around the world get more value out of their data, including large manufacturers, financial service providers, health care organizations, and government agencies.

Cloudian’s storage platform is helping one large automaker, for instance, use AI to determine when each of its manufacturing robots needs to be serviced. Cloudian is also working with the National Library of Medicine to store research articles and patents, and the National Cancer Database to store DNA sequences of tumors — rich datasets that AI models could process to help researchers develop new treatments or gain new insights.

“GPUs have been an incredible enabler,” Tso says. “Moore’s Law doubles the amount of compute every two years, but GPUs are able to parallelize operations on chips, so you can network GPUs together and shatter Moore’s Law. That scale is pushing AI to new levels of intelligence, but the only way to make GPUs work hard is to feed them data at the same speed that they compute — and the only way to do that is to get rid of all the layers between them and your data.”

MIT News – Artificial intelligence

‘We didn’t vote for ChatGPT’: Swedish PM under fire for using AI in role

Tech experts criticise Ulf Kristersson as newspaper accuses him of falling for ‘the oligarchs’ AI psychosis’

The Swedish prime minister, Ulf Kristersson, has come under fire after admitting that he regularly consults AI tools for a second opinion in his role running the country.

Kristersson, whose Moderate party leads Sweden’s centre-right coalition government, said he used tools including ChatGPT and the French service LeChat. His colleagues also used AI in their daily work, he said.

Artificial intelligence (AI) | The Guardian

Google says its new ‘world model’ could train AI robots in virtual warehouses

Genie 3 is latest step towards human-level artificial general intelligence, tech company claims

Google has outlined its latest step towards artificial general intelligence (AGI) with a new model that allows AI systems to interact with a convincing simulation of the real world.

The Genie 3 “world model” could be used to train robots and autonomous vehicles as they engage with realistic recreations of environments such as warehouses, according to Google.

Artificial intelligence (AI) | The Guardian
