DeepSeek-R1-0528 has emerged as a groundbreaking open-source reasoning model that rivals proprietary alternatives like OpenAI’s o1 and Google’s Gemini 2.5 Pro. With its impressive 87.5% accuracy on AIME 2025 tests and significantly lower costs, it’s become the go-to choice for developers and enterprises seeking powerful AI reasoning capabilities.
This comprehensive guide covers all the major providers where you can access DeepSeek-R1-0528, from cloud APIs to local deployment options, with current pricing and performance comparisons. (Updated August 11, 2025)
The guide covers several tiers of access:
- The most cost-effective option
- An enterprise-grade managed solution
- Performance-optimized options
- A competitive cloud option
- A premium performance provider
| Provider | Input Price / 1M tokens | Output Price / 1M tokens | Key Features | Best For |
|---|---|---|---|---|
| DeepSeek Official | $0.55 | $2.19 | Lowest cost, off-peak discounts | High-volume, cost-sensitive |
| Together AI (Throughput) | $0.55 | $2.19 | Production-optimized | Balanced cost/performance |
| Novita AI | $0.70 | $2.50 | GPU rental options | Flexible deployment |
| Together AI (Standard) | $3.00 | $7.00 | Premium performance | Speed-critical applications |
| Amazon Bedrock | Contact AWS | Contact AWS | Enterprise features | Regulated industries |
| Hugging Face | Free | Free | Open source | Local deployment |
Prices are subject to change. Always verify current pricing with providers.
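As a concrete illustration of the lowest-cost route in the table, here is a minimal sketch of calling the official DeepSeek API with the openai Python SDK, which DeepSeek documents as OpenAI-compatible. The base URL, the deepseek-reasoner model alias, and the reasoning_content field reflect DeepSeek's documentation at the time of writing and should be verified before use.

```python
# Minimal sketch: calling DeepSeek-R1-0528 through the official, OpenAI-compatible API.
# Assumes DEEPSEEK_API_KEY is set and that "deepseek-reasoner" is the current model alias.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)

message = response.choices[0].message
# The reasoning trace is exposed as a provider-specific field and may not always be present.
print(getattr(message, "reasoning_content", None))
print(message.content)  # final answer
```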
A distilled variant, DeepSeek-R1-0528-Qwen3-8B, is also available for lighter-weight local deployments.

Recommendations at a glance:
- For high-volume, cost-sensitive workloads: DeepSeek Official API
- For balanced cost and performance: Together AI or Novita AI
- For regulated, enterprise-grade deployments: Amazon Bedrock
- For local deployment: Hugging Face + Ollama (see the sketch after this list)
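As a hedged illustration of the local route, the sketch below queries a locally running Ollama server over its HTTP API. It assumes you have installed Ollama and already pulled a DeepSeek-R1 model; the tag deepseek-r1:8b is an assumption, so check the Ollama model library for the exact tag of the 0528 distill.

```python
# Minimal sketch of local inference via Ollama's HTTP API
# (after something like: `ollama pull deepseek-r1:8b`).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:8b",  # assumed tag; verify in the Ollama library
        "prompt": "Explain the AM-GM inequality in two sentences.",
        "stream": False,            # return a single JSON object instead of a stream
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])
```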
DeepSeek-R1-0528 offers unprecedented access to advanced AI reasoning capabilities at a fraction of the cost of proprietary alternatives. Whether you’re a startup experimenting with AI or an enterprise deploying at scale, there’s a deployment option that fits your needs and budget.
The key is choosing the right provider based on your specific requirements for cost, performance, security, and scale. Start with the DeepSeek official API for testing, then scale to enterprise providers as your needs grow.
Disclaimer: Always verify current pricing and availability directly with providers, as the AI landscape evolves rapidly.
The post The Complete Guide to DeepSeek-R1-0528 Inference Providers: Where to Run the Leading Open-Source Reasoning Model appeared first on MarkTechPost.
China continues to set the pace in open-source large-language-model innovation, especially for agentic architectures and deep reasoning. Here is a comprehensive, up-to-date guide to the best Chinese open agentic/reasoning models, expanded with the newest and most influential entrants.
| Model | Best For | Agentic? | Multilingual? | Context Window | Coding | Reasoning | Unique Features |
|---|---|---|---|---|---|---|---|
| Kimi K2 | All-purpose agentic | Yes | Yes | 128K | High | High | Mixture-of-Experts, fast, open |
| GLM-4.5 | Agent-native applications | Yes | Yes | 128K+ | High | High | Native task/planning API |
| Qwen3 | Control, multilingual, SaaS | Yes | Yes (119+) | 32K–1M | Top | Top | Fast mode switching |
| Qwen3-Coder | Repo-scale coding | Yes | Yes | Up to 1M | Top | High | Step-by-step repo analysis |
| DeepSeek-R1/V3 | Reasoning/math/science | Some | Yes | Large | Top | Highest | RLHF, agentic science, V3: 671B |
| Wu Dao 3.0 | Modular, multimodal, SME | Yes | Yes | Large | Mid | High | Text/image, code, modular builds |
| ChatGLM | Edge/mobile agentic use | Yes | Yes | 1M | Mid | High | Quantized, resource-efficient |
| Manus | Autonomous agents/voice | Yes | Yes | Large | Task | Top | Voice/smartphone, real-world AGI |
| Doubao 1.5 Pro | Logic-heavy enterprise | Yes | Yes | 1M+ | Mid | Top | 1M+ tokens, logic structure |
| Baichuan, etc. | Industry-specific logic | Yes | Yes | Varies | Varies | High | Sector specialization |
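Since most of the models in the table publish open weights, a common way to try one is to serve a checkpoint behind an OpenAI-compatible endpoint and query it with the standard client. The sketch below assumes a local vLLM server started with `vllm serve Qwen/Qwen3-8B`; the checkpoint, port, and URL are assumptions, so substitute whatever you actually deploy.

```python
# Hedged sketch: query any locally served open model through an
# OpenAI-compatible endpoint (vLLM shown as one common option).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

reply = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # assumed checkpoint; use whichever model id you serve
    messages=[
        {"role": "system", "content": "You are a careful reasoning assistant."},
        {"role": "user", "content": "Plan the steps to refactor a 50-file Python repo."},
    ],
    temperature=0.6,
)
print(reply.choices[0].message.content)
```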
The post The Best Chinese Open Agentic/Reasoning Models (2025): Expanded Review, Comparative Insights & Use Cases appeared first on MarkTechPost.
Embodied AI agents that can perceive, think, and act in the real world mark a key step toward the future of robotics. A central challenge is building scalable, reliable robotic manipulation, the skill of deliberately interacting with and controlling objects through selective contact. While progress spans analytic methods, model-based approaches, and large-scale data-driven learning, most systems still operate in disjoint stages of data collection, training, and evaluation. These stages often require custom setups, manual curation, and task-specific tweaks, creating friction that slows progress, hides failure patterns, and hampers reproducibility. This highlights the need for a unified framework to streamline learning and assessment.
Robotic manipulation research has progressed from analytical models to neural world models that learn dynamics directly from sensory inputs, using both pixel and latent spaces. Large-scale video generation models can produce realistic visuals but often lack action conditioning, long-term temporal consistency, and multi-view reasoning needed for control. Vision-language-action models follow instructions but are limited by imitation-based learning, preventing error recovery and planning. Policy evaluation remains challenging, as physics simulators require heavy tuning, and real-world testing is resource-intensive. Existing evaluation metrics often emphasize visual quality over task success, highlighting the need for benchmarks that better capture real-world manipulation performance.
The Genie Envisioner (GE), developed by researchers from AgiBot Genie Team, NUS LV-Lab, and BUAA, is a unified platform for robotic manipulation that combines policy learning, simulation, and evaluation in a video-generative framework. Its core, GE-Base, is a large-scale, instruction-driven video diffusion model capturing spatial, temporal, and semantic dynamics of real-world tasks. GE-Act maps these representations to precise action trajectories, while GE-Sim offers fast, action-conditioned video-based simulation. The EWMBench benchmark evaluates visual realism, physical accuracy, and instruction-action alignment. Trained on over a million episodes, GE generalizes across robots and tasks, enabling scalable, memory-aware, and physically grounded embodied intelligence research.
GE’s design unfolds in three key parts. GE-Base is a multi-view, instruction-conditioned video diffusion model trained on over 1 million robotic manipulation episodes. It learns latent trajectories that capture how scenes evolve under given commands. Building on that, GE-Act translates these latent video representations into real action signals via a lightweight, flow-matching decoder, offering quick, precise motor control even on robots not in the training data. GE-Sim repurposes GE-Base’s generative power into an action-conditioned neural simulator, enabling closed-loop, video-based rollout at speeds far beyond real hardware. The EWMBench suite then evaluates the system holistically across video realism, physical consistency, and alignment between instructions and resulting actions.
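To make the division of labor concrete, here is a purely schematic Python sketch of the closed loop described above. Every class and method name is a hypothetical stand-in rather than the released Genie Envisioner API; it only illustrates how GE-Base, GE-Act, and GE-Sim hand data to one another.

```python
# Purely schematic, with hypothetical names: GE-Base imagines, GE-Act decodes
# actions, GE-Sim rolls them forward. This is NOT the released API.

class GEBaseWorldModel:
    """Hypothetical wrapper for the instruction-conditioned video diffusion model."""
    def imagine(self, observations, instruction):
        # Return a latent video trajectory predicting how the scene evolves.
        raise NotImplementedError

class GEActDecoder:
    """Hypothetical wrapper for the lightweight flow-matching action decoder."""
    def decode(self, latent_trajectory):
        # Map latent video features to a short horizon of motor commands.
        raise NotImplementedError

class GESimSimulator:
    """Hypothetical wrapper for the action-conditioned neural simulator."""
    def step(self, observations, actions):
        # Render the predicted consequences of the actions as new camera views.
        raise NotImplementedError

def closed_loop_rollout(world, policy, sim, obs, instruction, horizon=50):
    """Sketch of one evaluation episode: imagine, act, simulate, repeat."""
    for _ in range(horizon):
        latent = world.imagine(obs, instruction)  # GE-Base
        actions = policy.decode(latent)           # GE-Act
        obs = sim.step(obs, actions)              # GE-Sim instead of real hardware
    return obs
```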
In evaluations, Genie Envisioner showed strong real-world and simulated performance across varied robotic manipulation tasks. GE-Act achieved rapid control generation (54-step trajectories in 200 ms) and consistently outperformed leading vision-language-action baselines in both step-wise and end-to-end success rates. It adapted to new robot types, like Agilex Cobot Magic and Dual Franka, with only an hour of task-specific data, excelling in complex deformable object tasks. GE-Sim delivered high-fidelity, action-conditioned video simulations for scalable, closed-loop policy testing. The EWMBench benchmark confirmed GE-Base’s superior temporal alignment, motion consistency, and scene stability over state-of-the-art video models, aligning closely with human quality judgments.
In conclusion, Genie Envisioner is a unified, scalable platform for dual-arm robotic manipulation that merges policy learning, simulation, and evaluation into one video-generative framework. Its core, GE-Base, is an instruction-guided video diffusion model capturing the spatial, temporal, and semantic patterns of real-world robot interactions. GE-Act builds on this by converting these representations into precise, adaptable action plans, even on new robot types with minimal retraining. GE-Sim offers high-fidelity, action-conditioned simulation for closed-loop policy refinement, while EWMBench provides rigorous evaluation of realism, alignment, and consistency. Extensive real-world tests highlight the system’s superior performance, making it a strong foundation for general-purpose, instruction-driven embodied intelligence.
Check out the Paper and GitHub Page.
The post Genie Envisioner: A Unified Video-Generative Platform for Scalable, Instruction-Driven Robotic Manipulation appeared first on MarkTechPost.
NuMind AI has officially released NuMarkdown-8B-Thinking, an open-source (MIT License) reasoning OCR Vision-Language Model (VLM) that redefines how complex documents are digitized and structured. Unlike traditional OCR systems, NuMarkdown-8B-Thinking doesn’t just extract text—it thinks about a document’s layout, structure, and formatting before generating a precise, ready-to-use Markdown file.
This makes it the first reasoning VLM purpose-built for converting PDFs, scanned documents, and spreadsheets into clean, structured Markdown—ideal for Retrieval-Augmented Generation (RAG) workflows, AI-powered knowledge bases, and large-scale document archiving.
The model introduces a reasoning-first approach to OCR. Instead of directly rendering extracted text, NuMarkdown-8B-Thinking generates “thinking tokens” — internal reasoning steps that help it understand document layouts before producing the final output.
This capability allows it to handle formats and structures that stump most conventional and even AI-powered OCR systems, such as multi-column pages, tables that span columns, and mixed text-and-table layouts.
The number of reasoning tokens varies with complexity—anywhere from 20% to 500% of the final Markdown length—showing how much the model “thinks” before it “writes.”
NuMarkdown-8B-Thinking is a fine-tuned version of Qwen 2.5-VL-7B from Alibaba—one of the strongest open-source multi-modal models available.
Its training pipeline involved two key fine-tuning phases, and this two-stage process gave NuMarkdown-8B-Thinking the ability to maintain high accuracy even on challenging layouts that typically require human-level judgment.
In independent evaluations and user testing, NuMarkdown-8B-Thinking demonstrates state-of-the-art reasoning for OCR-to-Markdown tasks, and users particularly highlight its handling of complex, multi-part layouts.
Imagine a scanned annual report page with two columns of text, a footer, and a table that spans both columns.
NuMarkdown-8B-Thinking first produces reasoning tokens outlining the structure (“Column 1: Intro paragraph… Column 2: Continue paragraph… Footer text at bottom… Table spans two columns…”), then outputs Markdown that accurately reflects both content and layout.
This transparent reasoning layer makes the model’s decisions auditable—a major plus in enterprise, legal, and archival contexts.
Whether you’re a researcher, developer, or enterprise AI engineer, NuMarkdown-8B-Thinking is ready to slot into your workflow.
Its MIT License ensures full freedom for commercial, academic, or personal projects—no vendor lock-in or costly API gates.
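As a hedged starting point, the sketch below loads the model with Hugging Face transformers, assuming the checkpoint exposes the standard Qwen2.5-VL chat interface and that you are on a recent transformers release. The repo id numind/NuMarkdown-8B-Thinking and the prompt wording are assumptions; follow the model card for the exact usage the authors recommend.

```python
# Hedged sketch: run NuMarkdown-8B-Thinking on a scanned page with transformers.
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "numind/NuMarkdown-8B-Thinking"  # assumed repo id; confirm on Hugging Face
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

page = Image.open("scanned_page.png")  # any rasterized document page
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this page to Markdown."},  # assumed prompt
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[page], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=4096)  # reasoning tokens can be long
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```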
For industries that rely on accurate document digitization—finance, legal, healthcare, government archives—layout fidelity is as important as textual accuracy. Most OCR systems treat layout as an afterthought; NuMarkdown-8B-Thinking treats it as a reasoning problem.
By combining open-sourcing, layout reasoning, and RAG-optimized Markdown output, NuMarkdown-8B-Thinking offers a transparent, verifiable, and high-performance alternative to proprietary document AI solutions.
Check out the Model on Hugging Face and GitHub Page.
The post NuMind AI Releases NuMarkdown-8B-Thinking: A Reasoning Breakthrough in OCR and Document-to-Markdown Conversion appeared first on MarkTechPost.
Music is often called the universal language of mankind. You can agree or disagree, but one thing is for sure: music can change our mood. You could be feeling sad, but then you hear your favourite song and suddenly feel better. Music makes a difference, and that is why it is everywhere: in ads, films, YouTube intros, and meditation apps. ElevenLabs*, already well known for its realistic AI voice generation, has introduced Eleven Music.
Eleven Music is an AI music generator that lets anyone make studio-quality songs for any moment, in any genre or style, with vocals or as an instrumental, in minutes using simple text prompts. It also lets you tweak parts of the song, such as editing a chorus or verse, with prompts, and it supports multiple languages, including English, Spanish, German, and Japanese.
Crucially, Eleven Music comes with commercial clearance. That’s thanks to licensing deals with Merlin Network and Kobalt Music Group, so creators, from freelancers to indie filmmakers, can use the tracks in films, ads, games, social videos, podcasts, and more without legal worry. Plus, built‑in safeguards prevent the AI from mimicking known artists, using copyrighted lyrics, or generating hateful or violent content.
According to company docs, music is generated in MP3 format at studio quality (44.1 kHz), tracks range from 10 seconds to 5 minutes, and both playlists and APIs are rolling out soon.
We often see creators facing licensing and copyright issues. In such cases, if an AI tool can create the perfect track for your work, why not give it a try?
As an ElevenLabs user, I was eager to test this AI music generator and show how you can create your first song with it.
Step 1: Visit ElevenLabs’ website*. Click on the platforms option at the top and then select the Music option.
Step 2: If you are new to ElevenLabs, you will be asked a few quick questions to optimize your experience.
Step 3: Once you are in, you can get started immediately with your 10,000 free credits.
For my song, I went with my own custom settings.
Step 4: Eleven Music generates songs quickly. Since I generated two variants, I had two versions to choose from, and I personally liked the chorus on the second one.
Step 5: You can edit the song using a text prompt. Since I loved the chorus on the second variant, I prompted it to make the verse match that vibe.
If you are happy with the result, as I was, you can either download or share the Eleven Music-generated song.
ElevenLabs* was already a capable AI voice generator, and by adding music generation it has only gotten better. It isn't perfect, and it has to compete against Suno AI and Udio, which have been in the game for quite some time and are very capable. Still, it is a solid AI music generator that makes studio-quality music production accessible to everyone, letting anyone create a fitting song for any moment.
*Affiliate: We do make a small profit from the sales of this AI product through affiliate marketing.
AI Tools Club
In this tutorial, we walk through building a compact but fully functional Cipher-based workflow. We start by securely capturing our Gemini API key in the Colab UI without exposing it in code. We then implement a dynamic LLM selection function that can automatically switch between OpenAI, Gemini, or Anthropic based on which API key is available. The setup phase ensures Node.js and the Cipher CLI are installed, after which we programmatically generate a cipher.yml configuration to enable a memory agent with long-term recall. We create helper functions to run Cipher commands directly from Python, store key project decisions as persistent memories, retrieve them on demand, and finally spin up Cipher in API mode for external integration. Check out the FULL CODES here.
import os, getpass
os.environ["GEMINI_API_KEY"] = getpass.getpass("Enter your Gemini API key: ").strip()
import subprocess, tempfile, pathlib, textwrap, time, requests, shlex
def choose_llm():
    if os.getenv("OPENAI_API_KEY"):
        return "openai", "gpt-4o-mini", "OPENAI_API_KEY"
    if os.getenv("GEMINI_API_KEY"):
        return "gemini", "gemini-2.5-flash", "GEMINI_API_KEY"
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic", "claude-3-5-haiku-20241022", "ANTHROPIC_API_KEY"
    raise RuntimeError("Set one API key before running.")
We start by securely entering our Gemini API key using getpass so it stays hidden in the Colab UI. We then define a choose_llm() function that checks our environment variables and automatically selects the appropriate LLM provider, model, and key based on what is available. Check out the FULL CODES here.
def run(cmd, check=True, env=None):
    print("▸", cmd)
    p = subprocess.run(cmd, shell=True, text=True, capture_output=True, env=env)
    if p.stdout: print(p.stdout)
    if p.stderr: print(p.stderr)
    if check and p.returncode != 0:
        raise RuntimeError(f"Command failed: {cmd}")
    return p
We create a run() helper function that executes shell commands, prints both stdout and stderr for visibility, and raises an error if the command fails when check is enabled, making our workflow execution more transparent and reliable. Check out the FULL CODES here.
def ensure_node_and_cipher():
    run("sudo apt-get update -y && sudo apt-get install -y nodejs npm", check=False)
    run("npm install -g @byterover/cipher")
We define ensure_node_and_cipher() to install Node.js, npm, and the Cipher CLI globally, ensuring our environment has all the necessary dependencies before running any Cipher-related commands. Check out the FULL CODES here.
def write_cipher_yml(workdir, provider, model, key_env):
    cfg = """
llm:
  provider: {provider}
  model: {model}
  apiKey: ${key_env}
systemPrompt:
  enabled: true
  content: |
    You are an AI programming assistant with long-term memory of prior decisions.
embedding:
  disabled: true
mcpServers:
  filesystem:
    type: stdio
    command: npx
    args: ['-y','@modelcontextprotocol/server-filesystem','.']
""".format(provider=provider, model=model, key_env=key_env)
    (workdir / "memAgent").mkdir(parents=True, exist_ok=True)
    (workdir / "memAgent" / "cipher.yml").write_text(cfg.strip() + "\n")
We implement write_cipher_yml() to generate a cipher.yml configuration file inside a memAgent folder, setting the chosen LLM provider, model, and API key, enabling a system prompt with long-term memory, and registering a filesystem MCP server for file operations. Check out the FULL CODES here.
def cipher_once(text, env=None, cwd=None):
    cmd = f'cipher {shlex.quote(text)}'
    p = subprocess.run(cmd, shell=True, text=True, capture_output=True, env=env, cwd=cwd)
    print("Cipher says:\n", p.stdout or p.stderr)
    return p.stdout.strip() or p.stderr.strip()
We define cipher_once() to run a single Cipher CLI command with the provided text, capture and display its output, and return the response, allowing us to interact with Cipher programmatically from Python. Check out the FULL CODES here.
def start_api(env, cwd):
    proc = subprocess.Popen("cipher --mode api", shell=True, env=env, cwd=cwd,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for _ in range(30):
        try:
            r = requests.get("http://127.0.0.1:3000/health", timeout=2)
            if r.ok:
                print("API /health:", r.text)
                break
        except requests.RequestException:
            pass
        time.sleep(1)
    return proc
We create start_api() to launch Cipher in API mode as a subprocess, then repeatedly poll its /health endpoint until it responds, ensuring the API server is ready before proceeding. Check out the FULL CODES here.
def main():
    provider, model, key_env = choose_llm()
    ensure_node_and_cipher()
    workdir = pathlib.Path(tempfile.mkdtemp(prefix="cipher_demo_"))
    write_cipher_yml(workdir, provider, model, key_env)
    env = os.environ.copy()

    cipher_once("Store decision: use pydantic for config validation; pytest fixtures for testing.", env, str(workdir))
    cipher_once("Remember: follow conventional commits; enforce black + isort in CI.", env, str(workdir))
    cipher_once("What did we standardize for config validation and Python formatting?", env, str(workdir))

    api_proc = start_api(env, str(workdir))
    time.sleep(3)
    api_proc.terminate()

if __name__ == "__main__":
    main()
In main(), we select the LLM provider, install dependencies, and create a temporary working directory with a cipher.yml configuration. We then store key project decisions in Cipher’s memory, query them back, and finally start the Cipher API server briefly before shutting it down, demonstrating both CLI and API-based interactions.
In conclusion, we have a working Cipher environment that securely manages API keys, selects the right LLM provider automatically, and configures a memory-enabled agent entirely through Python automation. Our implementation includes decision logging, memory retrieval, and a live API endpoint, all orchestrated in a Notebook/Colab-friendly workflow. This makes the setup reusable for other AI-assisted development pipelines, allowing us to store and query project knowledge programmatically while keeping the environment lightweight and easy to redeploy.
Check out the FULL CODES here.
The post Building a Secure and Memory-Enabled Cipher Workflow for AI Agents with Dynamic LLM Selection and API Integration appeared first on MarkTechPost.
Nvidia made major waves at SIGGRAPH 2025 by unveiling a suite of new Cosmos world models, robust simulation libraries, and cutting-edge infrastructure—all designed to accelerate the next era of physical AI for robotics, autonomous vehicles, and industrial applications. Let’s break down the technological details, what this means for developers, and why it matters to the future of embodied intelligence and simulation.
At the heart of the announcement is Cosmos Reason, a 7-billion-parameter reasoning vision-language model engineered for robots and embodied agents tackling real-world tasks.
The Cosmos WFM family spans three categories (Nano, Super, Ultra), ranging from 4 billion to 14 billion parameters, and can be fine-tuned for varied latency, fidelity, and use cases from real-time streaming to photorealistic rendering.
Nvidia’s Omniverse platform also gets a major update, adding new libraries for simulation and physical AI development.
Isaac Sim 5.0.0: Nvidia’s simulation engine now includes enhanced actuator models, broader Python and ROS support, and new neural rendering for better synthetic data.
Industry leaders—including Amazon Devices, Agility Robotics, Figure AI, Uber, Boston Dynamics, and more—are already piloting Cosmos models and Omniverse tools to generate training data, build digital twins, and accelerate the deployment of robotics in manufacturing, transportation, and logistics.
Cosmos models are broadly available through Nvidia’s API and developer catalogs, with a permissive license supporting both research and commercial usage.
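For developers who want to experiment, NVIDIA's API catalog exposes hosted models through an OpenAI-compatible endpoint. The sketch below is a hedged example only: the base URL is NVIDIA's documented integration endpoint, while the Cosmos Reason model identifier is an assumption that should be looked up in the catalog (the hosted variant may also expect image or video inputs).

```python
# Hedged sketch: query a Cosmos model hosted in NVIDIA's API catalog.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="nvidia/cosmos-reason1-7b",  # assumed identifier; verify in the API catalog
    messages=[{
        "role": "user",
        "content": "A robot must restack three boxes from heaviest to lightest. Outline the steps.",
    }],
    temperature=0.2,
)
print(completion.choices[0].message.content)
```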
Nvidia’s vision is clear: physical AI is a full-stack challenge, demanding smarter models, richer simulation, and scalable infrastructure. With the Cosmos model suite, Omniverse libraries, and Blackwell-powered servers, Nvidia is closing the gap between virtual training and real-world deployment—reducing costly trial-and-error and unlocking new levels of autonomy for robots and intelligent agents.
Check out the technical article from the NVIDIA blog.
The post NVIDIA AI Introduces End-to-End AI Stack, Cosmos Physical AI Models and New Omniverse Libraries for Advanced Robotics appeared first on MarkTechPost.