BIG Scalability
Nov 19, 2008
I have used this forum before to find good suggestions for VPSs, but now I'm in need of something and I have no idea where to go to get the solution, or even what that solution may be.
First, my situation is this...
I've built a web application in PHP for an emerging company and its primary function is to crawl remote websites that we are provided API access to (via lib_curl_multi). The company's clients log in to their account and initiate the crawler on a domain of their choice that has API access. They are then added to a queue (to prevent abuse/server overloading) which performs the crawl. The crawl process takes anywhere from 1 minute to an hour and uses anywhere from 50KB of bandwidth to 30MB depending on the remote domain.
The client anticipates their first 3 months' needs to be 1,000 crawls every 24-hour period, obviously meaning crawls would have to be done in simultaneous 'groups' to ensure all 1,000 in the queue can be done throughout the day.
That means for their first 3 months, they need bandwidth of about 10-15GB per day and I have NO CLUE what kind of hardware setup they'll need.
That's not really the issue though as I'm sure most dedicated server setups found here can support that. The problem is that after that 3 months they anticipate a ten-fold increase (obviously this would be a gradual build-up) in the number of crawls needed to be done daily, meaning 10,000.
Now that's a huge increase in bandwidth as well as CPU and memory needs. What kind of setup and/or host could accommodate this constant need for scaling without charging ridiculous prices?
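To put rough numbers on the figures above, here is a back-of-the-envelope sketch in Python. It only uses the ranges quoted in this post (1 minute to 1 hour per crawl, 50KB to 30MB per crawl); the function name and its parameters are made up for illustration, and it assumes crawls are spread evenly over the day, which real traffic will not be.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
GIB = 1024 ** 3


def estimate_daily_load(crawls_per_day,
                        min_secs=60, max_secs=3600,
                        min_bytes=50 * 1024, max_bytes=30 * 1024 ** 2):
    """Best/worst-case concurrency and bandwidth for one day of crawls.

    Hypothetical helper: the defaults are the ranges quoted in the post.
    """
    # Total crawl-seconds divided by the seconds in a day gives the average
    # number of crawls that must be running at the same time.
    min_workers = math.ceil(crawls_per_day * min_secs / SECONDS_PER_DAY)
    max_workers = math.ceil(crawls_per_day * max_secs / SECONDS_PER_DAY)
    return {
        "min_workers": min_workers,
        "max_workers": max_workers,
        "min_gb": crawls_per_day * min_bytes / GIB,
        "max_gb": crawls_per_day * max_bytes / GIB,
    }


if __name__ == "__main__":
    for crawls in (1_000, 10_000):
        print(crawls, estimate_daily_load(crawls))
```

If every crawl took the full hour, 1,000 crawls a day would need about 42 running at once on average, and 10,000 would need about 417. The worst-case transfer is roughly 29GB and 293GB per day respectively, so the 10-15GB guess falls inside the possible range for the first three months.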
My theory is there needs to be one dedicated server or VPS to serve up the website and its content, whereas there can be one or many dedicated servers (expand as they grow kind of deal) that process the crawler queue in the background (hopefully geographically dispersed as their clients are worldwide). EDIT: I forgot to mention that if the website server is separate from the others, they MUST share the same MySQL database as that is where the queue is stored.
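For what it's worth, here is a minimal sketch of how several crawler servers could pull jobs from one shared queue table without two of them grabbing the same job. It is written in Python with the built-in sqlite3 module purely so it runs on its own; the real setup would use the shared MySQL database, and the table name, column names and function names are all invented for the example.

```python
import sqlite3


def create_queue(conn):
    # Hypothetical schema: one row per requested crawl.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_queue ("
        " id INTEGER PRIMARY KEY,"
        " domain TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " worker TEXT)"
    )
    conn.commit()


def enqueue(conn, domain):
    """Called by the web server when a client starts a crawl."""
    cur = conn.execute("INSERT INTO crawl_queue (domain) VALUES (?)", (domain,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn, worker_name):
    """Called by a crawler server; returns (id, domain) or None if idle."""
    while True:
        row = conn.execute(
            "SELECT id, domain FROM crawl_queue"
            " WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        job_id, domain = row
        # The status check in the WHERE clause makes the claim conditional:
        # if another worker took this job first, no row is updated.
        cur = conn.execute(
            "UPDATE crawl_queue SET status = 'running', worker = ?"
            " WHERE id = ? AND status = 'queued'",
            (worker_name, job_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return job_id, domain
        # Lost the race for this job; loop and try the next queued one.
```

The idea is that the web server only ever inserts rows, and each crawler server claims a row with a conditional update and checks how many rows were affected before starting work. Adding another crawler server then means pointing one more worker at the same table.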
I hope I didn't confuse anyone. I'm great with programming, but hardware and hosting's not my strong point so please let me know if you need clarification.
View 11 Replies
Jul 2, 2009
MySQL just released an update including "scalability improvements" -- how badly were these needed?
"An update has been released for initial preview release of MySQL 5.4. The release contains scalability improvements and additional DTrace probes for diagnostic troubleshooting on Solaris."
View 0 Replies
Jun 11, 2008
I'm just curious as to what kind of things the huge sites--Youtube, Myspace, etc.--are doing to try to keep scalable. What sites do you guys just hate for failing in this regard, and perhaps most importantly, what are some ways we can prevent downtime?
View 4 Replies
Dec 1, 2008
Does support matter if there was 100% uptime and scalability?
Our team has been developing scalable sites since 2004. We started renting servers from Layeredtech then, since they had good reviews and they were still good until we migrated away from dedicated server land. Although we have systems administration backgrounds, it still took time away from developing software in order to administer the servers (look over logs, backups/restores, performance graphs, hardware failure, etc). Having said that, one thing I've noticed is that customers are usually happy if servers are always running and running fast.
To get rid of the systems administration part we tried Mosso (they had just released, great support but a lot of problems), we tried mediatemple's grid (also had a lot of problems), couldn't try EC2 because of persistent storage, and lastly we are currently using thegridlayer (it lags, the initial request takes about a second to display a page with no load on the server).
The next things to try were VPS then managed dedicated servers. We decided to try VPSes so we can isolate sites from each other and add VPSes as needed for specific sites. So I got a zone.net and they were running fine until they had a problem mentioned here. People recommended them because they had fast servers; now it is the opposite because of this one downtime.
So finally, my questions:
1) How much do you think support is needed if your host provides fast servers and 100% uptime?
2) What measures do you take (if any) to verify the host's procedures such as backups, company size, profitability, etc?
3) How do you verify that a host is not overselling before buying a hosting package (assuming shared or VPS)?
View 9 Replies