The Complete Guide to DeepSeek-R1-0528 Inference Providers: Where to Run the Leading Open-Source Reasoning Model


DeepSeek-R1-0528 has emerged as a groundbreaking open-source reasoning model that rivals proprietary alternatives such as OpenAI's o1 and Google's Gemini 2.5 Pro. With 87.5% accuracy on the AIME 2025 benchmark and significantly lower costs, it has become a go-to choice for developers and enterprises seeking powerful AI reasoning capabilities.

This comprehensive guide covers all the major providers where you can access DeepSeek-R1-0528, from cloud APIs to local deployment options, with current pricing and performance comparisons. (Updated August 11, 2025)

Cloud & API Providers

DeepSeek Official API

The most cost-effective option

  • Pricing: $0.55/M input tokens, $2.19/M output tokens
  • Features: 64K context length, native reasoning capabilities
  • Best for: Cost-sensitive applications, high-volume usage
  • Note: Includes off-peak pricing discounts (16:30-00:30 UTC daily)
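
For a quick start, here is a minimal Python sketch that calls the official API through its OpenAI-compatible endpoint. It assumes the openai SDK is installed and a DEEPSEEK_API_KEY environment variable is set; verify the base URL and the deepseek-reasoner model name against DeepSeek's current documentation before relying on them.

import os
from openai import OpenAI

# Minimal sketch: the official DeepSeek endpoint speaks the OpenAI chat-completions protocol.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # your DeepSeek key
    base_url="https://api.deepseek.com",      # official endpoint (confirm in the docs)
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",                # R1-0528 reasoning model
    messages=[{"role": "user", "content": "How many prime numbers are there below 50?"}],
)
print(resp.choices[0].message.content)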

Amazon Bedrock (AWS)

Enterprise-grade managed solution

  • Availability: Fully managed serverless deployment
  • Regions: US East (N. Virginia), US East (Ohio), US West (Oregon)
  • Features: Enterprise security, Amazon Bedrock Guardrails integration
  • Best for: Enterprise deployments, regulated industries
  • Note: AWS was the first cloud provider to offer DeepSeek-R1 as a fully managed model

Together AI

Performance-optimized options

  • DeepSeek-R1: $3.00 input / $7.00 output per 1M tokens
  • DeepSeek-R1 Throughput: $0.55 input / $2.19 output per 1M tokens
  • Features: Serverless endpoints, dedicated reasoning clusters
  • Best for: Production applications requiring consistent performance

Novita AI

Competitive cloud option

  • Pricing: $0.70/M input tokens, $2.50/M output tokens
  • Features: OpenAI-compatible API, multi-language SDKs
  • GPU Rental: Available with hourly pricing for A100/H100/H200 instances
  • Best for: Developers wanting flexible deployment options

Fireworks AI

Premium performance provider

  • Pricing: Premium tier (contact for current rates)
  • Features: Fast inference, enterprise support
  • Best for: Applications where speed is critical

Other Notable Providers

  • Nebius AI Studio: Competitive API pricing
  • Parasail: Listed as an API provider
  • Microsoft Azure: Available (some sources indicate preview pricing)
  • Hyperbolic: Fast performance with FP8 quantization
  • DeepInfra: API access available
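
Because most of these hosts expose OpenAI-compatible endpoints, switching providers usually comes down to changing the base URL and model identifier. The sketch below illustrates the pattern; the URLs and model ids are placeholders to confirm against each provider's documentation.

import os
from openai import OpenAI

# Illustrative registry: base URLs and model ids are assumptions to verify per provider.
PROVIDERS = {
    "deepseek": ("https://api.deepseek.com", "deepseek-reasoner", "DEEPSEEK_API_KEY"),
    "together": ("https://api.together.xyz/v1", "deepseek-ai/DeepSeek-R1", "TOGETHER_API_KEY"),
    "novita":   ("https://api.novita.ai/v3/openai", "deepseek/deepseek-r1-0528", "NOVITA_API_KEY"),
}

def ask(provider: str, prompt: str) -> str:
    base_url, model, key_env = PROVIDERS[provider]
    client = OpenAI(api_key=os.environ[key_env], base_url=base_url)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("deepseek", "Summarize the trade-offs between cost and latency for reasoning models."))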

GPU Rental & Infrastructure Providers

Novita AI GPU Instances

  • Hardware: A100, H100, H200 GPU instances
  • Pricing: Hourly rental available (contact for current rates)
  • Features: Step-by-step setup guides, flexible scaling

Amazon SageMaker

  • Requirements: ml.p5e.48xlarge instances minimum
  • Features: Custom model import, enterprise integration
  • Best for: AWS-native deployments with customization needs

Local & Open-Source Deployment

Hugging Face Hub

  • Access: Free model weights download
  • License: MIT License (commercial use allowed)
  • Formats: Safetensors format, ready for deployment
  • Tools: Transformers library, pipeline support
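
For local experimentation with the Transformers library, a minimal sketch follows. It assumes the distilled checkpoint under the repo id deepseek-ai/DeepSeek-R1-0528-Qwen3-8B and a GPU with enough memory; adjust both to your setup and check the model card for the exact identifier.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: load the distilled 8B checkpoint (assumed repo id) and answer one prompt.
model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))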

Local Deployment Options

  • Ollama: Popular framework for local LLM deployment
  • vLLM: High-performance inference server (see the sketch after this list)
  • Unsloth: Optimized for lower-resource deployments
  • Open Web UI: User-friendly local interface
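
Both Ollama and vLLM can expose local OpenAI-compatible servers, so the same client code used against cloud APIs also works offline. A minimal sketch, assuming a vLLM server is already running on localhost:8000 and serving the distilled model:

from openai import OpenAI

# Sketch: query a locally hosted OpenAI-compatible server, e.g. one started with
# `vllm serve deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`; the port and model id are assumptions.
client = OpenAI(api_key="not-needed", base_url="http://localhost:8000/v1")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    messages=[{"role": "user", "content": "Explain chain-of-thought reasoning in two sentences."}],
)
print(resp.choices[0].message.content)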

Hardware Requirements

  • Full Model: Requires significant GPU memory (671B parameters, 37B active)
  • Distilled Version (Qwen3-8B): Can run on consumer hardware
    • RTX 4090 or RTX 3090 (24GB VRAM) recommended
    • Minimum 20GB RAM for quantized versions

Pricing Comparison Table

Provider | Input Price / 1M Tokens | Output Price / 1M Tokens | Key Features | Best For
DeepSeek Official | $0.55 | $2.19 | Lowest cost, off-peak discounts | High-volume, cost-sensitive
Together AI (Throughput) | $0.55 | $2.19 | Production-optimized | Balanced cost/performance
Novita AI | $0.70 | $2.50 | GPU rental options | Flexible deployment
Together AI (Standard) | $3.00 | $7.00 | Premium performance | Speed-critical applications
Amazon Bedrock | Contact AWS | Contact AWS | Enterprise features | Regulated industries
Hugging Face | Free | Free | Open source | Local deployment

Prices are subject to change. Always verify current pricing with providers.
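
To estimate spend for a concrete workload, per-request cost is simply the input tokens divided by one million times the input price, plus the output tokens divided by one million times the output price. A small sketch using the rates listed above (the token counts are illustrative; reasoning models tend to produce long outputs):

# Rough cost comparison for 1,000 requests, each with ~2K input and ~8K output tokens.
# Prices are the per-1M-token rates from the table above and may change.
RATES = {
    "DeepSeek Official": (0.55, 2.19),
    "Together AI (Throughput)": (0.55, 2.19),
    "Novita AI": (0.70, 2.50),
    "Together AI (Standard)": (3.00, 7.00),
}

requests, in_tok, out_tok = 1_000, 2_000, 8_000
for provider, (in_price, out_price) in RATES.items():
    cost = requests * (in_tok / 1e6 * in_price + out_tok / 1e6 * out_price)
    print(f"{provider}: ${cost:.2f}")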

Performance Considerations

Speed vs. Cost Trade-offs

  • DeepSeek Official: Cheapest but may have higher latency
  • Premium Providers: 2-4x cost but sub-5 second response times
  • Local Deployment: No per-token costs but requires hardware investment

Regional Availability

  • Some providers have limited regional availability
  • AWS Bedrock: Currently US regions only
  • Check provider documentation for latest regional support

DeepSeek-R1-0528 Key Improvements

Enhanced Reasoning Capabilities

  • AIME 2025: 87.5% accuracy (up from 70%)
  • Deeper thinking: 23K average tokens per question (vs 12K previously)
  • HMMT 2025: 79.4% accuracy (a substantial improvement over the previous release)

New Features

  • System prompt support
  • JSON output format (see the sketch after this list)
  • Function calling capabilities
  • Reduced hallucination rates
  • No manual thinking activation required
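
As an illustration of the new structured-output support, here is a hedged sketch that requests JSON through the OpenAI-compatible API. The response_format parameter follows the OpenAI convention; confirm that your chosen provider has enabled it for R1-0528.

import os, json
from openai import OpenAI

# Sketch: ask R1-0528 for structured JSON output (verify provider support first).
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system", "content": "Return a JSON object with keys 'answer' and 'confidence'."},
        {"role": "user", "content": "What is 17 * 23?"},
    ],
    response_format={"type": "json_object"},
)
print(json.loads(resp.choices[0].message.content))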

Distilled Model Option

DeepSeek-R1-0528-Qwen3-8B

  • 8B parameter efficient version
  • Runs on consumer hardware
  • Matches the performance of much larger models on reasoning benchmarks
  • Perfect for resource-constrained deployments

Choosing the Right Provider

For Startups & Small Projects

Recommendation: DeepSeek Official API

  • Lowest cost at $0.55/$2.19 per 1M tokens
  • Sufficient performance for most use cases
  • Off-peak discounts available

For Production Applications

Recommendation: Together AI or Novita AI

  • Better performance guarantees
  • Enterprise support
  • Scalable infrastructure

For Enterprise & Regulated Industries

Recommendation: Amazon Bedrock

  • Enterprise-grade security
  • Compliance features
  • Integration with AWS ecosystem

For Local Development

Recommendation: Hugging Face + Ollama

  • Free to use
  • Full control over data
  • No API rate limits

Conclusion

DeepSeek-R1-0528 offers unprecedented access to advanced AI reasoning capabilities at a fraction of the cost of proprietary alternatives. Whether you’re a startup experimenting with AI or an enterprise deploying at scale, there’s a deployment option that fits your needs and budget.

The key is choosing the right provider based on your specific requirements for cost, performance, security, and scale. Start with the DeepSeek official API for testing, then scale to enterprise providers as your needs grow.

Disclaimer: Always verify current pricing and availability directly with providers, as the AI landscape evolves rapidly.


The Best Chinese Open Agentic/Reasoning Models (2025): Expanded Review, Comparative Insights & Use Cases

China continues to set the pace in open-source large-language-model innovation, especially for agentic architectures and deep reasoning. Here is a comprehensive, up-to-date guide to the best Chinese open agentic/reasoning models, expanded with the newest and most influential entrants.

1. Kimi K2 (Moonshot AI)

  • Profile: Mixture-of-Experts architecture, up to 128K context, superior agentic ability and bilingual (Chinese/English) fluency.
  • Strengths:
    • High benchmark performance in reasoning, coding, mathematics, and long-document workflows.
    • Well-rounded agentic skills: tool-use, multi-step automation, protocol adherence.
  • Use Cases: General-purpose agentic workflows, document intelligence, code generation, multi-language enterprise.
  • Why Pick: The most balanced all-rounder for open source agentic systems.

2. GLM‑4.5 (Zhipu AI)

  • Profile: 355B total parameters, native agentic design, long-context support.
  • Strengths:
    • Purpose-built for complex agent execution, workflow automation, and tool orchestration.
    • MIT-licensed, established ecosystem (700,000+ developers), rapid community adoption.
  • Use Cases: Multi-agent applications, cost-effective autonomous agents, research requiring agent-native logic.
  • Why Pick: For building deeply agentic, tool-integrated, open LLM apps at scale.

3. Qwen3 / Qwen3-Coder (Alibaba DAMO)

  • Profile: Next-gen Mixture-of-Experts, control over reasoning depth/modes, dominant multilingual model (119+ languages), repo-scale coding specialist.
  • Strengths:
    • Dynamic “thinking/non-thinking” switching, advanced function-calling, top scores in math/code/tool tasks.
    • Qwen3-Coder: Handles 1M tokens for code, excels at step-by-step repo analysis and complex dev workflows.
  • Use Cases: Multilingual tools, global SaaS, multi-modal logic/coding apps, Chinese-centric dev teams.
  • Why Pick: Precise control, best multilingual support, world-class code agent.

4. DeepSeek-R1 / V3

  • Profile: Reasoning-first design with multi-stage reinforcement-learning training; both R1 and V3 are built on a 671B-parameter Mixture-of-Experts architecture with roughly 37B parameters activated per query, delivering world-class math/code.
  • Strengths:
    • State-of-the-art on logic and chain-of-thought reasoning, surpasses most Western rivals in scientific tasks.
    • “Agentic Deep Research” protocols for fully autonomous planning/searching/synthesizing information.
  • Use Cases: Technical/scientific research, factual analytics, environments that value interpretability.
  • Why Pick: Maximum reasoning accuracy, agentic extensions for research and planning.

5. Wu Dao 3.0 (BAAI)

  • Profile: Modular family (AquilaChat, EVA, AquilaCode), open-source, strong long-context and multimodal capabilities.
  • Strengths:
    • Handles both text and images, supports multilingual workflows, well suited for startups and low-compute users.
  • Use Cases: Multimodal agentic deployment, SMEs, flexible application development.
  • Why Pick: Most practical and modular for multimodal and smaller-scope agentic tasks.

6. ChatGLM (Zhipu AI)

  • Profile: Edge-ready, bilingual, context windows up to 1M, quantized for low-memory hardware.
  • Strengths:
    • Best for on-device agentic applications, long-document reasoning, mobile deployments.
  • Use Cases: Local/gov deployments, privacy-sensitive scenarios, resource-constrained environments.
  • Why Pick: Flexible scaling from the cloud to edge/mobile, strong bilingual proficiency.

7. Manus & OpenManus (Monica AI / Community)

  • Profile: China’s new benchmark for general AI agents: independent reasoning, real-world tool use, and agentic orchestration. OpenManus enables agentic workflows based on many underlying models (Llama variants, GLM, DeepSeek).
  • Strengths:
    • Natural autonomous behavior: web search, travel planning, research writing, voice commands.
    • OpenManus is highly modular, integrating Chinese open models or proprietary LLMs for tailored agentic tasks.
  • Use Cases: True mission-completion agents, multi-agent orchestration, open-source agentic frameworks.
  • Why Pick: First major step towards AGI-like agentic applications in China.

8. Doubao 1.5 Pro

  • Profile: Known for superior fact consistency and reasoning logic structure, high context window (expected 1M+ tokens).
  • Strengths:
    • Real-time problem-solving, superior logic structure, scalable to multiple enterprise deployments.
  • Use Cases: Scenarios emphasizing logical rigor, enterprise-level automation.
  • Why Pick: Enhanced reasoning and logic, strong in scalable business environments.

9. Baichuan, Stepfun, Minimax, 01.AI

  • Profile: “Six Tigers” of Chinese open AI (per MIT Tech Review), each offering strong reasoning/agentic features in their domain (Stepfun/AIGC, Minimax/memory, Baichuan/multilingual legal).
  • Strengths:
    • Diverse applications: from conversational agents to domain-specific logic in law/finance/science.
  • Why Pick: Choose for sector-specific requirements, especially high-value business apps.

Comparative Table

Model | Best For | Agentic? | Multilingual? | Context Window | Coding | Reasoning | Unique Features
Kimi K2 | All-purpose agentic | Yes | Yes | 128K | High | High | Mixture-of-Experts, fast, open
GLM-4.5 | Agent-native applications | Yes | Yes | 128K+ | High | High | Native task/planning API
Qwen3 | Control, multilingual, SaaS | Yes | Yes (119+) | 32K–1M | Top | Top | Fast mode switching
Qwen3-Coder | Repo-scale coding | Yes | Yes | Up to 1M | Top | High | Step-by-step repo analysis
DeepSeek-R1/V3 | Reasoning/math/science | Some | Yes | Large | Top | Highest | RLHF, agentic science, V3: 671B
Wu Dao 3.0 | Modular, multimodal, SME | Yes | Yes | Large | Mid | High | Text/image, code, modular builds
ChatGLM | Edge/mobile agentic use | Yes | Yes | 1M | Mid | High | Quantized, resource-efficient
Manus | Autonomous agents/voice | Yes | Yes | Large | Task | Top | Voice/smartphone, real-world AGI
Doubao 1.5 Pro | Logic-heavy enterprise | Yes | Yes | 1M+ | Mid | Top | 1M+ tokens, logic structure
Baichuan/etc | Industry-specific logic | Yes | Yes | Varies | Varies | High | Sector specialization

Key Takeaways & When to Use Which Model

  • Kimi K2: Best all-rounder—if you want balanced agentic power and reasoning, long context, broad language support.
  • GLM-4.5: Native agent, great for autonomous task apps or tool orchestration; open-source ecosystem leader.
  • Qwen3/Qwen3-Coder: Superior for agile control, multilingual/enterprise tasks, and high-level code agentics.
  • DeepSeek-R1/V3: Gold standard for chain-of-thought reasoning, math/science, and research-grade logic.
  • Wu Dao 3.0: Most practical for SMEs/startups, especially for multimodal (text/image/code) agentic solutions.
  • ChatGLM/Manus/OpenManus: Field deployment, privacy, and truly autonomous agents—recommended for cutting-edge real-world use, on-device, or collaborative multi-agent tasks.
  • Doubao 1.5 Pro/Baichuan/Six Tigers: Consider for sector-specific deployments or if factual consistency and specialized logic are critical.

Genie Envisioner: A Unified Video-Generative Platform for Scalable, Instruction-Driven Robotic Manipulation

Embodied AI agents that can perceive, think, and act in the real world mark a key step toward the future of robotics. A central challenge is building scalable, reliable robotic manipulation, the skill of deliberately interacting with and controlling objects through selective contact. While progress spans analytic methods, model-based approaches, and large-scale data-driven learning, most systems still operate in disjoint stages of data collection, training, and evaluation. These stages often require custom setups, manual curation, and task-specific tweaks, creating friction that slows progress, hides failure patterns, and hampers reproducibility. This highlights the need for a unified framework to streamline learning and assessment. 

Robotic manipulation research has progressed from analytical models to neural world models that learn dynamics directly from sensory inputs, using both pixel and latent spaces. Large-scale video generation models can produce realistic visuals but often lack action conditioning, long-term temporal consistency, and multi-view reasoning needed for control. Vision-language-action models follow instructions but are limited by imitation-based learning, preventing error recovery and planning. Policy evaluation remains challenging, as physics simulators require heavy tuning, and real-world testing is resource-intensive. Existing evaluation metrics often emphasize visual quality over task success, highlighting the need for benchmarks that better capture real-world manipulation performance. 

The Genie Envisioner (GE), developed by researchers from AgiBot Genie Team, NUS LV-Lab, and BUAA, is a unified platform for robotic manipulation that combines policy learning, simulation, and evaluation in a video-generative framework. Its core, GE-Base, is a large-scale, instruction-driven video diffusion model capturing spatial, temporal, and semantic dynamics of real-world tasks. GE-Act maps these representations to precise action trajectories, while GE-Sim offers fast, action-conditioned video-based simulation. The EWMBench benchmark evaluates visual realism, physical accuracy, and instruction-action alignment. Trained on over a million episodes, GE generalizes across robots and tasks, enabling scalable, memory-aware, and physically grounded embodied intelligence research. 

GE’s design unfolds in three key parts. GE-Base is a multi-view, instruction-conditioned video diffusion model trained on over 1 million robotic manipulation episodes. It learns latent trajectories that capture how scenes evolve under given commands. Building on that, GE-Act translates these latent video representations into real action signals via a lightweight, flow-matching decoder, offering quick, precise motor control even on robots not in the training data. GE-Sim repurposes GE-Base’s generative power into an action-conditioned neural simulator, enabling closed-loop, video-based rollout at speeds far beyond real hardware. The EWMBench suite then evaluates the system holistically across video realism, physical consistency, and alignment between instructions and resulting actions.

In evaluations, Genie Envisioner showed strong real-world and simulated performance across varied robotic manipulation tasks. GE-Act achieved rapid control generation (54-step trajectories in 200 ms) and consistently outperformed leading vision-language-action baselines in both step-wise and end-to-end success rates. It adapted to new robot types, like Agilex Cobot Magic and Dual Franka, with only an hour of task-specific data, excelling in complex deformable object tasks. GE-Sim delivered high-fidelity, action-conditioned video simulations for scalable, closed-loop policy testing. The EWMBench benchmark confirmed GE-Base’s superior temporal alignment, motion consistency, and scene stability over state-of-the-art video models, aligning closely with human quality judgments. 

In conclusion, Genie Envisioner is a unified, scalable platform for dual-arm robotic manipulation that merges policy learning, simulation, and evaluation into one video-generative framework. Its core, GE-Base, is an instruction-guided video diffusion model capturing the spatial, temporal, and semantic patterns of real-world robot interactions. GE-Act builds on this by converting these representations into precise, adaptable action plans, even on new robot types with minimal retraining. GE-Sim offers high-fidelity, action-conditioned simulation for closed-loop policy refinement, while EWMBench provides rigorous evaluation of realism, alignment, and consistency. Extensive real-world tests highlight the system’s superior performance, making it a strong foundation for general-purpose, instruction-driven embodied intelligence. 


Check out the Paper and GitHub Page.

NuMind AI Releases NuMarkdown-8B-Thinking: A Reasoning Breakthrough in OCR and Document-to-Markdown Conversion

NuMind AI has officially released NuMarkdown-8B-Thinking, an open-source (MIT License) reasoning OCR Vision-Language Model (VLM) that redefines how complex documents are digitized and structured. Unlike traditional OCR systems, NuMarkdown-8B-Thinking doesn’t just extract text—it thinks about a document’s layout, structure, and formatting before generating a precise, ready-to-use Markdown file.

This makes it the first reasoning VLM purpose-built for converting PDFs, scanned documents, and spreadsheets into clean, structured Markdown—ideal for Retrieval-Augmented Generation (RAG) workflows, AI-powered knowledge bases, and large-scale document archiving.

How NuMarkdown-8B-Thinking Is Different

The model introduces a reasoning-first approach to OCR. Instead of directly rendering extracted text, NuMarkdown-8B-Thinking generates “thinking tokens” — internal reasoning steps that help it understand document layouts before producing the final output.

This capability allows it to handle formats and structures that stump most conventional and even AI-powered OCR systems, including:

  • Multi-column layouts with complex reading orders
  • Tables with merged, nested, or irregular cells
  • Mixed visual elements (images, decorative headers, watermarks)
  • Historical or degraded scans where layout inference is crucial

The number of reasoning tokens varies with complexity—anywhere from 20% to 500% of the final Markdown length—showing how much the model “thinks” before it “writes.”

Training and Architecture

NuMarkdown-8B-Thinking is a fine-tuned version of Qwen 2.5-VL-7B from Alibaba—one of the strongest open-source multi-modal models available.

Its training pipeline involved two key phases:

  1. Supervised Fine-Tuning (SFT) on synthetic document samples where each example included:
    • Raw document input
    • Intermediate reasoning steps (layout parsing, structure inference)
    • Final Markdown representation
  2. Reinforcement Learning with GRPO, using a layout-centric reward that encouraged accurate reconstruction of document formatting and spatial relationships.

This two-stage process gave NuMarkdown-8B-Thinking the ability to maintain high accuracy even on challenging layouts that typically require human-level judgment.

Benchmark Results: Outperforming OCR Heavyweights

In independent evaluations and user testing, NuMarkdown-8B-Thinking demonstrates state-of-the-art reasoning for OCR-to-Markdown tasks:

  • Beats:
    • Generalist models like GPT-4o
    • Specialized OCR-focused models like OCRFlux
  • Competitive with:
    • Large closed-source reasoning models like Gemini 2.5
    • Just behind elite models like Gemini Flash Reasoning in blind, multi-model user rankings

Users particularly highlight its ability to:

  • Correctly infer reading order in non-linear layouts
  • Preserve intricate table formatting
  • Output clean, parsing-friendly Markdown for RAG ingestion without further post-processing

Example in Action

Imagine a scanned annual report page with:

  • Multi-level headings
  • Sidebars and multiple columns
  • A financial table with merged cells and uneven row spacing
  • A footer with legal disclaimers

NuMarkdown-8B-Thinking first produces reasoning tokens outlining the structure (“Column 1: Intro paragraph… Column 2: Continue paragraph… Footer text at bottom… Table spans two columns…”), then outputs Markdown that accurately reflects both content and layout.

This transparent reasoning layer makes the model’s decisions auditable—a major plus in enterprise, legal, and archival contexts.

Deployment Options

Whether you’re a researcher, developer, or enterprise AI engineer, NuMarkdown-8B-Thinking is ready to slot into your workflow:

  • Hugging Face: Available for direct testing and integration.
  • Local Execution: Model weights and quantized GGUF versions are published for CPU/GPU-friendly deployment.
  • API-friendly: Compatible with OpenAI-style APIs and Hugging Face Transformers for rapid integration into pipelines.

Its MIT License ensures full freedom for commercial, academic, or personal projects—no vendor lock-in or costly API gates.
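
As a rough illustration of local use, the sketch below loads the model with Hugging Face Transformers and converts a single scanned page. The repo id numind/NuMarkdown-8B-Thinking, the Qwen2.5-VL model class, and the chat format are assumptions based on the model's Qwen 2.5-VL lineage; check the model card for the exact loading recipe.

from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Assumed repo id and model class (the model is a Qwen 2.5-VL fine-tune); verify on the model card.
model_id = "numind/NuMarkdown-8B-Thinking"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open("scanned_page.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Convert this document page to Markdown."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=4096)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))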

Why This Matters

For industries that rely on accurate document digitization—finance, legal, healthcare, government archives—layout fidelity is as important as textual accuracy. Most OCR systems treat layout as an afterthought; NuMarkdown-8B-Thinking treats it as a reasoning problem.

By combining open-sourcing, layout reasoning, and RAG-optimized Markdown output, NuMarkdown-8B-Thinking offers a transparent, verifiable, and high-performance alternative to proprietary document AI solutions.


Check out the Model on Hugging Face and GitHub Page.

How to Make Perfect Songs for Any Moment Using This New AI Music Generator

Music is often called the universal language of mankind. You can agree or disagree, but one thing is for sure: music can change our mood. You could be feeling sad, but then you hear your favourite song and suddenly feel better. Music makes a difference, which is why it is everywhere: in ads, films, YouTube intros, and meditation apps. ElevenLabs*, already well known for its realistic AI voice generation, has now introduced Eleven Music.

What is Eleven Music all about?

Eleven Music is an AI music generator that lets anyone make studio-quality songs for any moment, in any genre or style, with vocals or as an instrumental, in minutes using simple text prompts. It also lets you tweak parts of the song, such as editing a chorus or verse with prompts, and it supports multiple languages, including English, Spanish, German, and Japanese.

Crucially, Eleven Music comes with commercial clearance. That’s thanks to licensing deals with Merlin Network and Kobalt Music Group, so creators, from freelancers to indie filmmakers, can use the tracks in films, ads, games, social videos, podcasts, and more without legal worry. Plus, built‑in safeguards prevent the AI from mimicking known artists, using copyrighted lyrics, or generating hateful or violent content.

According to company docs, music is generated in MP3 format at studio quality (44.1 kHz), tracks range from 10 seconds to 5 minutes, and both playlists and APIs are rolling out soon.

Top 3 features of Eleven Music that make it a game-changer:

  • Text-to-Music: Being a prompt-based tool, you can generate a complete musical piece simply by describing it and letting the AI compose a track based on your input.
  • Vocal and Instrumental Tracks: This AI music generator by ElevenLabs can generate both purely instrumental music and tracks with AI-generated vocals in different languages like English, Spanish, and Japanese.
  • Fine Control: You aren’t just stuck with the first thing the AI creates. The platform allows for section-by-section editing. You can generate an intro, then a verse, and fine-tune each part to build a complete song with seamless transitions.

We often see creators facing licensing and copyright issues. In such cases, if an AI tool can create the perfect track for your work, why not give it a try?

How to make perfect songs using Eleven Music:

As an ElevenLabs user, I was eager to test this AI music generator and show how you can create your first song with it.

Step 1: Visit the ElevenLabs website*. Click the Platforms option at the top, then select Music.

  • Scroll down a bit and click on Get Started. You’ll need to sign up for ElevenLabs if you haven’t already.

Step 2: If you are new to ElevenLabs, you will be asked a few quick questions to optimize your experience.

Step 3: Once you are in, you can get started immediately with your 10,000 free credits. 


For my song, I went with the following settings:

  • 2 variants
  • 30 seconds
  • Prompt: A fun UK drill rap song to help increase work productivity

Step 4: Eleven Music generates songs quickly. Since I requested two variants, I could choose between them; I personally liked the chorus on the second one.



[Embedded audio: generated track, 1:22]


Step 5: You can edit the song using a text prompt. Since I loved the chorus on the second variant, I wanted the verse to match that vibe.

  • I'm glad I decided to edit the verse; this time I liked the first verse of the first edited variant.



[Embedded audio: edited track, 1:22]


If you are happy with the result, as I was, you can either download or share the Eleven Music-generated song.

In Conclusion:

ElevenLabs* was already a capable AI voice generator, and this new music generation ability only makes it better. It isn't perfect, and it has to compete against Suno AI and Udio, which have been in the game for quite some time and are very capable. Still, it is a solid AI music generator that makes studio-quality music production accessible to everyone, letting anyone make the perfect song for any moment.



*Affiliate: We make a small profit from sales of this AI product through affiliate marketing.

Building a Secure and Memory-Enabled Cipher Workflow for AI Agents with Dynamic LLM Selection and API Integration

In this tutorial, we walk through building a compact but fully functional Cipher-based workflow. We start by securely capturing our Gemini API key in the Colab UI without exposing it in code. We then implement a dynamic LLM selection function that can automatically switch between OpenAI, Gemini, or Anthropic based on which API key is available. The setup phase ensures Node.js and the Cipher CLI are installed, after which we programmatically generate a cipher.yml configuration to enable a memory agent with long-term recall. We create helper functions to run Cipher commands directly from Python, store key project decisions as persistent memories, retrieve them on demand, and finally spin up Cipher in API mode for external integration. Check out the FULL CODES here.

import os, getpass
os.environ["GEMINI_API_KEY"] = getpass.getpass("Enter your Gemini API key: ").strip()


import subprocess, tempfile, pathlib, textwrap, time, requests, shlex


def choose_llm():
   if os.getenv("OPENAI_API_KEY"):
       return "openai", "gpt-4o-mini", "OPENAI_API_KEY"
   if os.getenv("GEMINI_API_KEY"):
       return "gemini", "gemini-2.5-flash", "GEMINI_API_KEY"
   if os.getenv("ANTHROPIC_API_KEY"):
       return "anthropic", "claude-3-5-haiku-20241022", "ANTHROPIC_API_KEY"
   raise RuntimeError("Set one API key before running.")

We start by securely entering our Gemini API key using getpass so it stays hidden in the Colab UI. We then define a choose_llm() function that checks our environment variables and automatically selects the appropriate LLM provider, model, and key based on what is available. Check out the FULL CODES here.

def run(cmd, check=True, env=None):
   print("▸", cmd)
   p = subprocess.run(cmd, shell=True, text=True, capture_output=True, env=env)
   if p.stdout: print(p.stdout)
   if p.stderr: print(p.stderr)
   if check and p.returncode != 0:
       raise RuntimeError(f"Command failed: {cmd}")
   return p

We create a run() helper function that executes shell commands, prints both stdout and stderr for visibility, and raises an error if the command fails when check is enabled, making our workflow execution more transparent and reliable. Check out the FULL CODES here.

def ensure_node_and_cipher():
   run("sudo apt-get update -y && sudo apt-get install -y nodejs npm", check=False)
   run("npm install -g @byterover/cipher")

We define ensure_node_and_cipher() to install Node.js, npm, and the Cipher CLI globally, ensuring our environment has all the necessary dependencies before running any Cipher-related commands. Check out the FULL CODES here.

def write_cipher_yml(workdir, provider, model, key_env):
   cfg = """
llm:
 provider: {provider}
 model: {model}
 apiKey: ${key_env}
systemPrompt:
 enabled: true
 content: |
   You are an AI programming assistant with long-term memory of prior decisions.
embedding:
 disabled: true
mcpServers:
 filesystem:
   type: stdio
   command: npx
   args: ['-y','@modelcontextprotocol/server-filesystem','.']
""".format(provider=provider, model=model, key_env=key_env)


   (workdir / "memAgent").mkdir(parents=True, exist_ok=True)
   (workdir / "memAgent" / "cipher.yml").write_text(cfg.strip() + "n")

We implement write_cipher_yml() to generate a cipher.yml configuration file inside a memAgent folder, setting the chosen LLM provider, model, and API key, enabling a system prompt with long-term memory, and registering a filesystem MCP server for file operations. Check out the FULL CODES here.

def cipher_once(text, env=None, cwd=None):
   cmd = f'cipher {shlex.quote(text)}'
   p = subprocess.run(cmd, shell=True, text=True, capture_output=True, env=env, cwd=cwd)
   print("Cipher says:n", p.stdout or p.stderr)
   return p.stdout.strip() or p.stderr.strip()

We define cipher_once() to run a single Cipher CLI command with the provided text, capture and display its output, and return the response, allowing us to interact with Cipher programmatically from Python. Check out the FULL CODES here.

def start_api(env, cwd):
   proc = subprocess.Popen("cipher --mode api", shell=True, env=env, cwd=cwd,
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
   for _ in range(30):
       try:
           r = requests.get("http://127.0.0.1:3000/health", timeout=2)
           if r.ok:
               print("API /health:", r.text)
               break
       except: pass
       time.sleep(1)
   return proc

We create start_api() to launch Cipher in API mode as a subprocess, then repeatedly poll its /health endpoint until it responds, ensuring the API server is ready before proceeding. Check out the FULL CODES here.

def main():
   provider, model, key_env = choose_llm()
   ensure_node_and_cipher()
   workdir = pathlib.Path(tempfile.mkdtemp(prefix="cipher_demo_"))
   write_cipher_yml(workdir, provider, model, key_env)
   env = os.environ.copy()


   cipher_once("Store decision: use pydantic for config validation; pytest fixtures for testing.", env, str(workdir))
   cipher_once("Remember: follow conventional commits; enforce black + isort in CI.", env, str(workdir))


   cipher_once("What did we standardize for config validation and Python formatting?", env, str(workdir))


   api_proc = start_api(env, str(workdir))
   time.sleep(3)
   api_proc.terminate()


if __name__ == "__main__":
   main()

In main(), we select the LLM provider, install dependencies, and create a temporary working directory with a cipher.yml configuration. We then store key project decisions in Cipher’s memory, query them back, and finally start the Cipher API server briefly before shutting it down, demonstrating both CLI and API-based interactions.

In conclusion, we have a working Cipher environment that securely manages API keys, selects the right LLM provider automatically, and configures a memory-enabled agent entirely through Python automation. Our implementation includes decision logging, memory retrieval, and a live API endpoint, all orchestrated in a Notebook/Colab-friendly workflow. This makes the setup reusable for other AI-assisted development pipelines, allowing us to store and query project knowledge programmatically while keeping the environment lightweight and easy to redeploy.


Check out the FULL CODES here.

NVIDIA AI Introduces End-to-End AI Stack, Cosmos Physical AI Models and New Omniverse Libraries for Advanced Robotics

Nvidia made major waves at SIGGRAPH 2025 by unveiling a suite of new Cosmos world models, robust simulation libraries, and cutting-edge infrastructure—all designed to accelerate the next era of physical AI for robotics, autonomous vehicles, and industrial applications. Let’s break down the technological details, what this means for developers, and why it matters to the future of embodied intelligence and simulation.

Cosmos World Foundation Models: Reasoning for Robots

Cosmos Reason: Vision-Language Model for Physical AI

At the heart of the announcement is Cosmos Reason, a 7-billion-parameter reasoning vision-language model. This AI is engineered for robots and embodied agents tackling real-world tasks:

  • Memory and Physics Awareness: Cosmos Reason incorporates advanced memory for spatial and temporal reasoning, plus an understanding of physical laws. This lets robots and AI agents actually “plan” step-by-step actions in complex environments—making it ideal for data curation, robot planning, and video analytics.
  • Planning Capability: The model feeds structured video and sensor data (like segmentation maps and LIDAR) into a reasoning engine that decides what moves an agent should take next. It supports both high-level instruction parsing and low-level action generation, mimicking human-like logic for navigation and manipulation.

Cosmos Transfer Models: Turbocharging Synthetic Data Generation

  • Cosmos Transfer-2: Accelerates generation of synthetic datasets from 3D simulation scenes or spatial control inputs, vastly reducing the time and cost to produce realistic robot training data. This is especially helpful for reinforcement learning and policy model validation—where edge cases, diverse lighting, and weather scenarios must be modeled at scale.
  • Distilled Transfer Variant: Optimized for speed, letting developers iterate fast on dataset creation.

Practical Impact

The Cosmos WFM family spans three categories (Nano, Super, Ultra), ranging from 4 billion to 14 billion parameters, and can be fine-tuned for varied latency, fidelity, and use cases from real-time streaming to photorealistic rendering.

Simulation and Rendering Libraries: Creating Virtual Worlds for Training

Nvidia’s Omniverse platform gets a major update, adding:

  • Neural Reconstruction Libraries: These tools allow developers to import sensor data and simulate the physical world in 3D with lifelike photorealism, powered by neural rendering techniques.
  • Integration with OpenUSD and CARLA Simulator: The addition of new conversion tools and rendering capabilities helps standardize complex simulation workflows, making it easier to interoperate between robotics frameworks (like Mujoco) and Nvidia’s USD-based pipeline.
  • SimReady Materials Library: Offers thousands of substrate materials for creating highly realistic virtual environments, boosting the fidelity of robotics training and simulation.

Isaac Sim 5.0.0: Nvidia’s simulation engine now includes enhanced actuator models, broader Python and ROS support, and new neural rendering for better synthetic data.

Infrastructure for Robotics Workflows

  • RTX Pro Blackwell Servers: Purpose-built for robotic development workloads, providing unified architecture for simulation, training, and inference tasks.
  • DGX Cloud: Enables cloud-based management and scaling of physical AI workflows, so teams can develop, train, and deploy AI agents remotely.

Industry Adoption and Open Innovation

Industry leaders—including Amazon Devices, Agility Robotics, Figure AI, Uber, Boston Dynamics, and more—are already piloting Cosmos models and Omniverse tools to generate training data, build digital twins, and accelerate the deployment of robotics in manufacturing, transportation, and logistics.

Cosmos models are broadly available through Nvidia’s API and developer catalogs, with a permissive license supporting both research and commercial usage.

A New Era for Physical AI

Nvidia’s vision is clear: physical AI is a full-stack challenge, demanding smarter models, richer simulation, and scalable infrastructure. With the Cosmos model suite, Omniverse libraries, and Blackwell-powered servers, Nvidia is closing the gap between virtual training and real-world deployment—reducing costly trial-and-error and unlocking new levels of autonomy for robots and intelligent agents.


Check out the technical article from the NVIDIA blog.
