✨ NEW · Bare Metal Servers with 20G Dedicated Unmetered Bandwidth

For the AI workloads you actually run

GPU Dedicated Servers

A100 80GB, A40 48GB and A10 24GB GPU-based servers for AI inference, fine-tuning and small teams of AI developers. Provisioned in minutes, with no hourly billing and a choice of jurisdictions.

A100 80GB · A40 48GB · A10 24GB · zero hourly billing

Configurations

Pick a configuration, deploy in minutes

10G networking, free DDoS protection, no hourly charges. Choose a plan and go live in minutes.

SKU | CPU | RAM | Storage | Network | Bandwidth | Location | Price (from, /mo)
ACCELERATOR-A10-EU | AMD EPYC 7702P, 64c / 128t | 256 GB (up to 1024 GB) | 2x 4 TB NVMe | 2x 10GbE | 100 TB | NL | Quote on request
ACCELERATOR-A100-EU | AMD EPYC 7702P, 64c / 128t | 512 GB (up to 1024 GB) | 2x 4 TB NVMe | 2x 10GbE | 100 TB | NL | Quote on request


Use cases

Match the GPU to the workload

Most production AI workloads do not need an H100: an A100 covers the heaviest jobs, and an A40 or A10 handles the rest.

  • LLM inference: An A100 80GB runs Llama 3.1 70B at 4-bit quantization (Q4) at approximately 30 tokens/second. An A40 48GB should be fast enough for 13B–34B models, and the A10 24GB is the price-performance sweet spot for 7B–13B models.
  • Fine-tuning + LoRA: 2× A100 with NVLink for fine-tuning 70B models; a single A100 80GB for LoRA fine-tuning of 13B–34B models. Storage is charged, but there is no egress charge for transferring data you already have stored.
  • CV + embeddings: A40 48GB for large-batch CV pipelines (Detectron2, SAM, Stable Diffusion XL); A10 for serving large-scale embedding and ranking models in production.
  • Render + simulation: Several A40s in parallel for hours-long renders in Blender, Houdini or OctaneRender, taking advantage of the RT cores and NVENC.
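As a rough way to check these pairings, you can estimate the memory a model needs as parameter count × bytes per parameter, plus headroom. The Python sketch below assumes a flat 20% overhead for KV cache, activations and CUDA context (an illustrative figure; long contexts and large batches need more) and only checks whether a model fits in memory on a single card, not how fast it runs. The function names are hypothetical, not part of any Netrouting tooling.

```python
"""Rough VRAM fit check for the GPUs in this lineup (a sketch, not a sizing tool)."""

# Memory per card, in GB, as listed on this page.
GPU_VRAM_GB = {"A100 80GB": 80, "A40 48GB": 48, "A10 24GB": 24}


def estimate_vram_gb(params_billion: float, bits_per_param: float, overhead: float = 0.2) -> float:
    """Estimate the GPU memory (GB) needed to serve a model.

    Weights take params * bits / 8 bytes, so 1e9 parameters at 1 byte each is ~1 GB.
    `overhead` is an assumed fraction added for KV cache, activations and CUDA context.
    """
    weights_gb = params_billion * bits_per_param / 8
    return weights_gb * (1 + overhead)


def gpus_that_fit(params_billion: float, bits_per_param: float, overhead: float = 0.2) -> list[str]:
    """Return the single cards whose memory covers the estimate (memory only, not speed)."""
    needed = estimate_vram_gb(params_billion, bits_per_param, overhead)
    return [gpu for gpu, vram in GPU_VRAM_GB.items() if vram >= needed]


if __name__ == "__main__":
    for params, bits in [(7, 16), (13, 16), (70, 4)]:
        need = estimate_vram_gb(params, bits)
        print(f"{params}B @ {bits}-bit: ~{need:.0f} GB -> {gpus_that_fit(params, bits)}")
```

Under these assumptions a 7B model at 16-bit precision needs about 17 GB and fits all three cards, while a 13B model at 16-bit needs about 31 GB and rules out the A10. Fitting in memory is only a floor: the recommendations above also take tokens/second and price into account.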

The lineup

A100, A40, A10 — purpose-built for production AI

H100 supply is currently restricted, and its price makes it a poor fit for the vast majority of production AI workloads. For that reason, Netrouting does not resell the H100 or keep H100 units in stock for immediate deployment.

  • A100 80GB: SXM4, 80 GB HBM2e, 6,912 CUDA cores, 432 Tensor cores, 600 GB/s NVLink. Built for training and 70B inference; available as a single card or in a 2-way NVLink configuration.
  • A40 48GB: The inference and rendering workhorse. 48 GB GDDR6 with ECC, 10,752 CUDA cores, 336 Tensor cores, 84 RT cores. Best price/performance for 13B–34B inference and CV pipelines.
  • A10 24GB: Our lowest-cost inference option, inexpensive both to order and to serve with. That makes it the best fit for large-scale embedding services (7B–13B parameters) and rerankers of similar size, workloads that do not need the cost of an A100 sized for training 70B models.

Included with every plan

No surprises in the bill or the install

All GPU plans come with everything you'll need after you deploy your inference endpoint.

  • Free always-on DDoS: Always-on DDoS protection with edge scrubbing, free for every customer on every plan. We recently absorbed a 600+ Gbps attack on our edge without a single inference endpoint at any edge location dropping a request.
  • Unmetered 10G network: One price, however many inferences you serve. No egress fees and no per-GB charges. Whether you download a 70B model from Hugging Face (as we did here for testing) or a smaller one, and whether you serve 100,000 inferences a month or many more, the quoted price covers as many inferences as your application requires.
  • 24/7 monitoring + engineers: Our network engineers monitor your infrastructure around the clock. If an incident occurs, they respond within 5 minutes on working days and within 15 minutes on non-working days.
  • CUDA-ready images: Ubuntu 24.04 Server with the latest available NVIDIA drivers and the CUDA Toolkit preinstalled. If you run into kernel problems, we can re-image your server in minutes.
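To put the 10G port in perspective, the sketch below estimates how long it takes to pull model weights onto the server. It assumes the transfer runs at full line rate, which is optimistic since the remote side (e.g. Hugging Face) often sets the real limit, and that a 70B model in 16-bit weights is about 140 GB (70 billion parameters × 2 bytes). The function name is illustrative.

```python
"""Ideal transfer time for model weights over a given link (line rate assumed)."""

BITS_PER_BYTE = 8


def transfer_seconds(size_gb: float, link_gbps: float = 10.0) -> float:
    """Seconds to move `size_gb` gigabytes (10^9 bytes) over a `link_gbps` Gbit/s link.

    Ignores protocol overhead, disk speed and remote-side limits, so real
    transfers will take longer.
    """
    return size_gb * BITS_PER_BYTE / link_gbps


if __name__ == "__main__":
    # Assumed sizes: 70B parameters x 2 bytes (16-bit) and x 0.5 bytes (4-bit).
    for label, size_gb in [("70B, 16-bit", 140), ("70B, 4-bit", 35)]:
        for gbps in (1, 10):
            minutes = transfer_seconds(size_gb, gbps) / 60
            print(f"{label}: ~{minutes:.1f} min at {gbps} Gbit/s")
```

At line rate, the 140 GB download takes roughly 2 minutes on a 10 Gbit/s port versus about 19 minutes on 1 Gbit/s.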

Why Netrouting

Predictable, fast, jurisdiction-flexible

  • Predictable monthly pricing: No hourly charges and no egress fees. A model running 24/7 costs no more for long hours or peak-traffic hours; your monthly bill is the same number every month.
  • Fast deploy, real engineers: All GPU plans are deployed the same day. Netrouting's network engineers, not automated support bots, will also help with any problems in your CUDA environment, NVIDIA drivers or kernel panics.
  • Pick your jurisdiction: Host in Amsterdam, Frankfurt or Stockholm under our Dutch B.V., or in Miami or New York under our Florida Inc. Your model weights and customer data are then subject to the law of the country you choose, not that of your hosting provider.

Common questions

Pricing FAQ

  • Which types of attacks does Netrouting DDoS Protection block automatically?

    Volumetric floods (e.g. UDP, ICMP and TCP floods), reflection/amplification attacks (e.g. DNS, NTP and memcached) and protocol-based attacks (e.g. SYN and ACK floods, fragmentation and malformed-packet attacks). We also absorb multi-vector attacks that combine these.

  • What happens if a DDoS attack exceeds our mitigation capacity?

    If an attack exceeds the volume we can filter, the targeted IP address is black-holed, meaning all traffic to it is dropped.

  • Do I need to enable Netrouting DDoS Protection?

    No. Netrouting DDoS Protection is enabled automatically for all relevant services, including Cloud Compute, Colocation, Bare Metal servers and GPU servers. Protection starts as soon as the resource goes online.

  • Does Netrouting DDoS Protection cover application-layer (Layer 7) attacks?

    No. Netrouting DDoS Protection is a network-layer mitigation service, designed to protect against Layer 3/4 attacks such as volumetric floods. For application-layer (Layer 7) attacks such as HTTP floods or Slowloris, use a Web Application Firewall (WAF) or a Layer 7 filtering and edge caching service such as Cloudflare or Akamai.

  • What is Netrouting DDoS Protection?

    Netrouting DDoS Protection is a free, always-on service that protects all Netrouting infrastructure (Cloud Compute, Colocation, Bare Metal servers, GPU servers) from network-layer (L3/L4) DDoS attacks. No setup is required.

  • How do I test the network speed from my location?

    Every city page lists the public test IP address for that location, along with links to 100 MB, 1 GB and 10 GB test files. Download these from the location nearest to you with curl or wget, or in your browser.

  • Is the bandwidth dedicated or shared?

    Dedicated. Netrouting provides dedicated uplink ports (e.g. 1GbE, 10GbE), so your bandwidth is not shared and you avoid the performance and quality loss that shared uplinks cause at other providers.
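For the speed test described in the FAQ above, a short script can stand in for curl or wget and report average throughput. The Python sketch below uses only the standard library; the script name `speedtest.py` and the function names are illustrative, and you pass the real test-file URL from the relevant city page on the command line (no Netrouting URL is assumed here). The timing includes connection and TLS setup, so small files understate the link speed; the 1 GB or 10 GB files give a more representative figure.

```python
"""Measure average download throughput from a test file (sketch, stdlib only)."""
import sys
import time
import urllib.request

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so the file is never held in memory


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Convert bytes transferred over `seconds` into megabits per second (10^6 bit/s)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes * 8 / seconds / 1e6


def measure_download(url: str) -> tuple[int, float]:
    """Stream `url`, discarding the data; return (bytes received, elapsed seconds)."""
    received = 0
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        while chunk := response.read(CHUNK_SIZE):
            received += len(chunk)
    return received, time.perf_counter() - start


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python speedtest.py <test-file-url>")
    else:
        size, elapsed = measure_download(sys.argv[1])
        print(f"{size / 1e6:.1f} MB in {elapsed:.2f} s = {throughput_mbps(size, elapsed):.1f} Mbit/s")
```

Run it against the location nearest to you, for example `python speedtest.py <link-to-1GB-test-file>`, and compare the result with a run against a more distant location.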