Every Nodejitsu app is down (jit.su)
34 points by switz on April 9, 2013 | 45 comments


We are working on a major outage from our database provider, so, please, keep calm :) Our platform will be up asap.

Check http://twitter.com/nodejitsu for updates.

Sorry for the inconveniences and thanks for your patience.


A friendly word of PR advice: I'm not a customer and not affected by your outage, but if I were, being told to keep calm would really irritate me, especially if I were fighting fires right now. If you want someone to calm down, telling them to calm down generally won't have the desired effect.


Seconded.


My nodejitsu applications have been having a notable amount of trouble over the past two weeks. At various times there would be momentary downtime, broken deployers, and slow updates to status.jit.su. This, however, has been the longest and most agitating downtime yet. Looks like I'll have to move back to handling my servers myself.


I think the main issue with this is the status page. It should be hosted off their normal infrastructure; Heroku went through this a year or two ago, if I recall correctly. I don't mind downtime with a PaaS (within reason); it's having zero visibility into the problems, what exactly they are doing to remedy them, and what the ETA for the fixes is.


As is often the case, their Twitter stream is a backup for news:

https://twitter.com/nodejitsu


So then what happens if twitter also fail-whales?


You go to http://status.twitter.com/, which is powered by Tumblr which, in one of my favourite bits of Internet insanity, uses Twitter for status announcements: https://twitter.com/tumblr (there is a status.tumblr.com but it's blank).


Nodejitsu is incredibly unreliable. I use them only because they are so cheap and quick. Anything of value should not be hosted on Nodejitsu. This disappoints me greatly as Nodejitsu is a company I want to love.


Can you please give some details? That's quite a damning statement for a company that professes to be built on enterprise-ready infrastructure. I would default to being skeptical of statements like yours; everyone has downtime.


Here's what I can remember off the top of my head:

- Deployments often fail with a nonspecific 500 error. I'd say near 50% of my deploys end up failing because of this.

- Downtime occurs at random times. It's usually just a few minutes (maybe 10)

- In the past I've had problems with certain routes always returning 500 errors when loaded in groups, but not alone. This problem hasn't happened for a while.

- The admin panel is at its best only "kind of" responsive. Most of the time it's a complete mystery as to whether commands issued there ever make it out to your instance(s).


Seconded. Not to mention the rest...


We also really wanted to love Nodejitsu, but our patience on the downtime has been exhausted. SSL has been down at least once a day recently from load balancing issues.


I feel the same - intermittent short (1min) down times happen pretty regularly, about once a month, and I only know that from my own monitoring tools - no emails or notifications from them. I want to love them too :(


So far its uptime seems comparable to other PaaS services.

Curious what opinions HNers have on this, though:

How important is it to have pretty downtime screens, or nothing at all? In this case it's 404-ing, which is inaccurate. Would it be better to just not respond to the HTTP request than to show this?

Also, have other PaaS services like Heroku got these messages right?


I frankly quite hate it, and no.

Depending on the kind of error they're experiencing, Heroku will show a quite unhelpful and generic "Application is down" screen.

I know they're limited in what they can show back given a certain kind of error, but it's real crappy to leave your users or clients hung out to dry.

It's especially loathsome with clients because it's indistinguishable from you fucking up. I do wish they showed a "Oh hello we're having a spot of trouble. Please hang tight" instead. Mad points for letting you style the page to match your site.

Of course, as a service provider this would not quite be something you spend a lot of time thinking about or optimizing for.


This is one of those perennial problems: if you're spending engineering time making your "we're down" screen look good, you're really optimizing for the wrong thing, but on the other hand, some downtime is unavoidable and you don't want to look unprofessional.

An HTTP 500 would definitely be more appropriate in this case, but it seems a 400 is accurate -- it's down because it has somehow forgotten where the apps are (their Twitter stream says a problem with load balancers).


I know I'm getting in the weeds here but no, 400 is not appropriate -- all 4xx errors indicate an issue with user input (with the possible exception of 418). 5xx errors indicate the problem is occurring on the server side.


Yeah, that's fair. There's some grey area though -- like, if you accidentally delete a file on your site, your server will return 400. How is the server supposed to know it's accidental? So it's accurate (the user has requested a file that doesn't exist) while at the same time being misleading -- the file should be there.


I don't see a grey area in that example. I do see a bug tho :) - the server should return 404 (Not Found) or 410 (Gone). As you say, the server has no way to know whether the file was deleted by accident. A smart server should return 410 if possible. That signals to the user: "Your request was perfect, but the file you wanted is no longer here". If your server returns 404, what it really means is "Hey, your request is fine, I just can't find what you're asking for". A 400 is the "Bad Request" error code; you would typically return this in web API contexts. The message here is "Hey, your request wasn't fine. I don't know what to do with it".

If nodejitsu was returning any 4xx errors during its downtime, that's a bug that should be fixed. In this situation, they should have returned either a 500 or a 503, and I'd probably pick 503, and leave 500 to any web apps that suffered an internal error. This way you know that it's not your app but the platform it's running on.

As always, the best place to look up the details is in the spec: http://tools.ietf.org/html/rfc2616 -- long, but quite readable.
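
To make the distinction above concrete, here is a minimal JavaScript sketch of how a front-end router might pick a status code. The function name `chooseStatus` and the fields on `state` are hypothetical, invented for illustration; nothing here reflects how Nodejitsu's balancers actually work.

```javascript
// Hypothetical helper: maps the situation of one request to an HTTP status code.
// All field names on `state` are assumptions made for this sketch.
function chooseStatus(state) {
  // The platform itself is unhealthy: a server-side problem, so 5xx.
  if (!state.platformHealthy) return 503; // Service Unavailable

  // The client sent something the server cannot interpret.
  if (!state.requestWellFormed) return 400; // Bad Request

  // The request was fine, but the resource is known to have been removed.
  if (state.resourceDeleted) return 410; // Gone

  // The request was fine, but nothing is known about the resource.
  if (!state.resourceExists) return 404; // Not Found

  // The app was reached but failed while handling the request.
  if (state.appCrashed) return 500; // Internal Server Error

  return 200; // OK
}

// Example call: the platform is down, so the other fields do not matter.
console.log(chooseStatus({
  platformHealthy: false,
  requestWellFormed: true,
  resourceDeleted: false,
  resourceExists: true,
  appCrashed: false,
}));
```

With this split, a 503 tells the client that the platform is at fault, while a 500 is left for failures inside an individual app.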


A lot of people are talking shit about a service they've either never tried, or are paying $3 per drone per month for. Relax.

Having been with nodejitsu for 4 months now, I can say confidently that they provide the best support out there. 24/7 IRC support with guys that go beyond the call of duty is worth way more than $3.

Yes, jitsu deploy used to have its 500 problems, but when we reduced our package size these all went away. We've had +98% deploy success for the last 4 weeks.

Yes, they are working on enterprise solutions, and if you're paying for enterprise, nobody forced you to buy into a provider that is less than 3 months old.

Nodejitsu have teething problems, but as far as I'm concerned, I'm happy with the experience, and nobody has a gun to my head to stick with nodejitsu if I didn't want to.


https://twitter.com/nodejitsu/status/321775765962240000

"Sorry for the inconveniences, we found that our database provider (@IrisCouch) is down at the moment, we are bringing up the cloud."

I'm pretty surprised to learn that they don't run their own databases (and also that their database is CouchDB, but maybe I'm just old fashioned).


My two cents: CouchDB is great in that it allows you to remove the backend and rely solely on a database with REST baked in. However, it has its drawbacks; for example, database queries can be costly due to transactions over HTTP.

It's like every other argument in the community, how much control over ease do you want?


So you think these drawbacks are in some way related to the fact that they are offline? It seems like you are just jumping at a chance to kick a project and/or spread some FUD.

Not to mention "database queries can be costly due to transactions over HTTP" => Not gonna get into this too much, but CouchDB has an optimistic concurrency model, yeah?
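
For readers unfamiliar with the term, optimistic concurrency means a write is accepted only if the writer presents the revision it last read; otherwise the write is rejected and the client re-reads and retries. The sketch below is an in-memory stand-in, not CouchDB's actual API: `RevisionedStore`, `put`, and the integer revisions are assumptions made for illustration (CouchDB itself uses a `_rev` field and answers a stale update with HTTP 409 Conflict).

```javascript
// In-memory stand-in for a revisioned document store. NOT CouchDB's real API.
class RevisionedStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // Returns a copy of the stored document, or null if it does not exist.
  get(id) {
    const doc = this.docs.get(id);
    return doc ? { rev: doc.rev, body: doc.body } : null;
  }

  // A write succeeds only if `rev` matches the current revision.
  // For a new document, `rev` must be left undefined.
  put(id, body, rev) {
    const current = this.docs.get(id);
    const currentRev = current ? current.rev : undefined;
    if (rev !== currentRev) {
      // Someone else wrote first: the caller must re-read and retry.
      return { ok: false, status: 409 };
    }
    const nextRev = (currentRev || 0) + 1;
    this.docs.set(id, { rev: nextRev, body });
    return { ok: true, status: 201, rev: nextRev };
  }
}

// Two writers read the same revision; only the first update is accepted.
const demoStore = new RevisionedStore();
const created = demoStore.put("app-config", { port: 80 });
console.log(demoStore.put("app-config", { port: 8080 }, created.rev).status);
console.log(demoStore.put("app-config", { port: 9090 }, created.rev).status);
```

No locks are held between the read and the write; the cost of a conflict is paid only by the writer that lost the race.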

An implementation that uses a technology (or, like IrisCouch, offers a DB as a service) is not that technology. If Heroku's Postgres databases go down because they configure things incorrectly, is that the fault of the project? Could be; it could be that documentation was lacking, or that it was a real footgun. It is also possible to deploy technology within a budget where failures are the expected outcome, and it is also very possible to deploy technology in a manner which does not support a use case, or which may not even be something the technology can support. That does not make it 'bad' or 'great', and has nothing to do with "control vs ease" or HTTP vs a yet-to-be-named wire protocol...

I just don't see how this is helpful, accurate, or adds to the discussion, sorry if I am jumping down your throat about it.


You are a bit, but that's ok, fwiw, I'm speaking from my experience whether it's right or wrong, which I felt had value. I only meant that after using it on several projects, I felt that it had some cool features, but there are quite a few 'gotchas' that come with it. That is, come with thinking it's like any other database, which it's not.


We are back to life, thanks for your support and your patience.


I have to agree, I was pretty excited when I started using nodejitsu but it's insane how little regard they have for uptime and reliability. Just the other day my app suddenly started pointing to some other nodejitsu customer. I asked about this in IRC and was told "I'm rolling out new balancing architecture today" and that once that was done my app would start working "normally." Totally. Crazy.


Nodejitsu itself is down. http://isup.me/nodejitsu.com


Come on guys, every provider goes down some time or the other! Unfortunately, today it was their turn.

Let's just give them some space to improve and they will!

I think Nodejitsu is a great company! They are cheap, and awesome in support! :)


This is the kind of downtime that would never happen again: http://webcache.googleusercontent.com/search?q=cache:rk8FQr-...

The employee page is unavailable right now. But if you look through the bios, they have a bunch of young open source hackers, but virtually zero operations experience, and virtually zero ops culture, unless you count hanging out in an IRC channel sometimes.


Operations, like management, is something that engineers who haven't been directly involved in it underestimate. It's like Python programmers telling an embedded systems guy that an ad-hoc database is pretty straightforward, just use a b-tree for heaven's sake, everyone learns how to build a basic tree.

That said, there is also a tremendous seduction in leaving the problems to others. Nobody really wants their programming or configuration mistakes to call them up in the middle of the night.

Scaling is hard and it's why many products die when they find they can't scale. But it also keeps good devops people in high demand. So I really can't complain.


Obvious troll is obvious. You don't have a clue what you're talking about. They run thousands of instances on an infrastructure they've built mostly themselves. They built the platform to be provider agnostic, meaning they can run it on EC2, Rackspace Cloud, or an internal rack.

If you actually look at the open source projects listed you'll see most of them are from Nodejitsu and are used internally.


Ops is not about building the platform, it's about running the service.


The existence of this thread is the difference between high availability and disaster recovery.


Error handling in Node sucks. I hope they can find what is causing this.


This problem has nothing to do with node. It is a problem with a hosting service for node.

JavaScript has exceptions. It has booleans. It has integers. What more do you need to handle an error?


Error handling in Node is terrible due to a lack of consistency about errors. Any function can potentially either:

a) throw an exception, which will crash your app

b) emit an error event, which will crash your app if you aren't handling it

c) pass the error as the first argument of the callback

Domains are a knee-jerk reaction to try and remedy this, but it's a pretty gross band-aid. I doubt that Nodejitsu's specific issues here are related to this, but it's one of the main reasons that I've moved away from building things in Node.
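
The three styles listed above can be sketched side by side. This is a minimal illustration using only Node's built-in `events` module; the function names `parseSettings`, `startJob`, and `lookup` are hypothetical and exist only for this example.

```javascript
const { EventEmitter } = require("events");

// (a) Synchronous throw: JSON.parse throws on bad input,
// so the caller needs try/catch to stay alive.
function parseSettings(text) {
  try {
    return { error: null, value: JSON.parse(text) };
  } catch (err) {
    return { error: err, value: null };
  }
}

// (b) "error" event: if nobody listens for "error", emitting it throws.
function startJob(shouldFail) {
  const job = new EventEmitter();
  process.nextTick(() => {
    if (shouldFail) job.emit("error", new Error("job failed"));
    else job.emit("done", "job finished");
  });
  return job;
}

// (c) Error-first callback: the error, if any, is the first argument.
function lookup(key, callback) {
  process.nextTick(() => {
    if (key === "missing") callback(new Error("no such key: " + key));
    else callback(null, "value for " + key);
  });
}

// Each style needs a different kind of handling at the call site.
console.log("a:", parseSettings("{not json").error !== null);
startJob(true).on("error", (err) => console.log("b:", err.message));
lookup("missing", (err, value) => console.log("c:", err ? err.message : value));
```

A caller that guards against only one of these styles is still exposed to the other two, which is the inconsistency being described.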


"JavaScript has exceptions"

But I miss those colorful exceptions like javax.management.modelmbean.InvalidTargetObjectTypeException :P


hosting service presumably built on node... it's a PITA tracking down errors in production in node moreso than any other popular language ever


Looks like they were down for about 90 minutes, maybe less.


My nodejitsu site is down, it was up yesterday.

404

No application found for "adams.jit.su"


this is shitty. sorry about your business Nodejitsu, but you just lost us all.


AWS, Heroku, and many other PaaS providers have had their own share of downtime occasionally. I see nothing so damning about this particular instance of PaaS going down that warrants complete abandonment of the platform.


Speak for yourself.


totally agree



